---
license: mit
datasets:
- MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
- liuhaotian/LLaVA-Pretrain
language:
- en
metrics:
- accuracy
base_model:
- microsoft/bitnet-b1.58-2B-4T
pipeline_tag: image-text-to-text
tags:
- 1-bit
- VLA
- VLM
---

# BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

[[paper]](https://arxiv.org/abs/2506.07530) [[model]](https://huggingface.co/collections/hongyuw/bitvla-68468fb1e3aae15dd8a4e36e) [[code]](https://github.com/ustcwhy/BitVLA)

- June 2025: [BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation](https://arxiv.org/abs/2506.07530)

## Open Source Plan

- ✅ Paper, pre-trained VLM, and evaluation code.
- 🧭 Fine-tuned VLA models, pre-training and fine-tuning code.
- 🧭 Pre-trained VLA.

## Evaluation on VQA

We use the [LMM-Eval](https://github.com/ustcwhy/BitVLA/tree/main/lmms-eval) toolkit to conduct evaluations on VQA tasks. We provide a modified [transformers repo](https://github.com/ustcwhy/BitVLA/tree/main/transformers) in which [modeling_llava.py](https://github.com/ustcwhy/BitVLA/blob/main/transformers/src/transformers/models/llava/modeling_llava.py) and [modeling_siglip.py](https://github.com/ustcwhy/BitVLA/blob/main/transformers/src/transformers/models/siglip/modeling_siglip.py) are adapted to support W1.58-A8 quantization (ternary 1.58-bit weights with 8-bit activations).
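
As context for what W1.58-A8 means here, the sketch below assumes BitVLA follows the BitNet b1.58 recipe of its base model: weights are quantized to the ternary set {-1, 0, +1} with a per-tensor absmean scale, and activations are quantized per token to 8-bit integers with an absmax scale. The modified modeling files linked above are the authoritative implementation.

$$
\widetilde{W} = \mathrm{RoundClip}\!\left(\frac{W}{\gamma + \epsilon},\ -1,\ 1\right),
\qquad \gamma = \frac{1}{nm}\sum_{i,j}\lvert W_{ij}\rvert,
$$

$$
\widetilde{x} = \mathrm{Clip}\!\left(\mathrm{Round}\!\left(\frac{127\,x}{\max_i \lvert x_i\rvert}\right),\ -127,\ 127\right),
$$

where $\mathrm{RoundClip}(z, a, b) = \max(a, \min(b, \mathrm{round}(z)))$.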

The evaluation should be run inside an NVIDIA 24.07 PyTorch Docker container (nvcr.io/nvidia/pytorch:24.07-py3). Install the packages:

```bash
docker run --name nvidia_24_07 --privileged --net=host --ipc=host --gpus=all -v /mnt:/mnt -v /tmp:/tmp -d nvcr.io/nvidia/pytorch:24.07-py3 sleep infinity # only needed for multimodal evaluation
docker exec -it nvidia_24_07 bash
git clone https://github.com/ustcwhy/BitVLA.git
cd BitVLA/
bash vl_eval_setup.sh # only needed for multimodal evaluation
```

First, download the BitVLA models from Hugging Face:

```bash
git clone https://huggingface.co/hongyuw/bitvla-bitsiglipL-224px-bf16 # BitVLA w/ W1.58-A8 SigLIP-L
git clone https://huggingface.co/hongyuw/bitvla-siglipL-224px-bf16 # BitVLA w/ BF16 SigLIP-L
```
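
Cloning from the Hub requires git-lfs for the large weight files. As one possible alternative (not part of the original instructions), the same checkpoints can be fetched with the Hugging Face CLI; the target directory names below are arbitrary:

```bash
# Optional alternative to git clone. Assumes the Hugging Face CLI is installed:
#   pip install -U "huggingface_hub[cli]"
huggingface-cli download hongyuw/bitvla-bitsiglipL-224px-bf16 --local-dir ./bitvla-bitsiglipL-224px-bf16
huggingface-cli download hongyuw/bitvla-siglipL-224px-bf16 --local-dir ./bitvla-siglipL-224px-bf16
```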

Then run the following scripts to conduct evaluations:

```bash
cd lmms-eval/
bash eval-dense-hf.sh /YOUR_PATH_TO_EXP/bitvla-bitsiglipL-224px-bf16
bash eval-dense-hf.sh /YOUR_PATH_TO_EXP/bitvla-siglipL-224px-bf16
```

Note that we release the BF16 master weights of BitVLA and perform quantization online. To realize actual memory savings, you may quantize the weights offline to 1.58-bit precision. We recommend using the [bitnet.cpp](https://github.com/microsoft/bitnet) inference framework to accurately measure the reduction in inference cost.
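
A rough sketch of that workflow follows, assuming the standard bitnet.cpp setup; the exact conversion scripts and flags for BitVLA checkpoints are not specified here, so defer to the bitnet.cpp README:

```bash
# Hypothetical sketch; the offline-conversion flow for BitVLA checkpoints is an
# assumption -- follow the bitnet.cpp README for the authoritative commands.
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt
# Convert the BF16 master weights to a packed ternary (1.58-bit) format offline,
# then benchmark CPU inference to measure the actual memory and latency savings,
# e.g. (hypothetical flags -- verify against the upstream docs):
#   python setup_env.py -md /YOUR_PATH_TO_EXP/bitvla-bitsiglipL-224px-bf16 -q i2_s
```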

## Acknowledgement

This repository is built using [LMM-Eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) and Hugging Face's [transformers](https://github.com/huggingface/transformers).

## License

This project is licensed under the MIT License.

### Contact Information

For help or issues with the models, please submit a GitHub issue.