pritamqu committed · verified
Commit 9b96dbf · Parent(s): e17b23d

Update README.md

README.md CHANGED
@@ -1 +1,197 @@
- Please find more information on our GitHub page: https://github.com/pritamqu/HALVA

<!-- <div align="center">
<img src="assets/halva_icon.png" alt="HALVA" style="width:auto;height:144px;">
</div>
<h1 align="center">
HALVA
</h1> -->

<h1 align="center">
Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination
</h1>


<h3 align="center">
ICLR 2025
</h3>
<h3 align="center">
<a href="https://www.pritamsarkar.com">Pritam Sarkar</a>
&nbsp;
Sayna Ebrahimi
&nbsp;
Ali Etemad
&nbsp; <br>
Ahmad Beirami
&nbsp;
Sercan Ö. Arık
&nbsp;
Tomas Pfister
</h3>

<h4 align="center">
<a href="https://arxiv.org/abs/2405.18654">[arXiv]</a>
<a href="https://openreview.net/forum?id=yG1fW8igzP">[OpenReview]</a>
<a href="https://github.com/pritamqu/HALVA/">[GitHub]</a>
<a href="https://huggingface.co/collections/pritamqu/halva-6797efacaa78d98bccb8e57a">[Model Weights &#129303;]</a>
<a href="./?tab=readme-ov-file#data">[Training Data]</a>
</h4>


<hr>

Please see our [GitHub](https://github.com/pritamqu/HALVA/) repo for details.

### Setup environment

```
conda create -n halva python=3.10 -y
conda activate halva
pip install --upgrade pip
pip install -r req.txt
# `module load` is cluster-specific; skip it if CUDA 11.7 is already available on your machine
module load cuda/11.7.1
pip install flash-attn --no-build-isolation
```

### Try HALVA!

We share a minimal setup to quickly try HALVA! See this [notebook](https://github.com/pritamqu/HALVA/blob/master/try_halva.ipynb).

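If you prefer a plain script over the notebook, the sketch below shows one way to run inference through the LLaVA codebase that HALVA builds on, mirroring LLaVA's quick-start `eval_model` helper. The checkpoint names match the model weights listed below; the prompt, image path, and generation settings are illustrative placeholders, and the exact entry point may differ in this repo, so treat this as a sketch rather than the supported API.

```
# Hedged inference sketch: assumes the LLaVA package installed above is importable.
from llava.eval.run_llava import eval_model
from llava.mm_utils import get_model_name_from_path

model_path = "pritamqu/halva7b-lora"        # HALVA LoRA weights
model_base = "liuhaotian/llava-v1.5-7b"     # matching LLaVA-1.5 base model

args = type("Args", (), {
    "model_path": model_path,
    "model_base": model_base,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe this image in detail.",   # placeholder prompt
    "image_file": "path/to/any/image.jpg",       # placeholder image path or URL
    "conv_mode": None,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)  # prints the generated response
```
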
### Model weights

- [HALVA 7B](https://huggingface.co/pritamqu/halva7b-lora)
- [HALVA 13B](https://huggingface.co/pritamqu/halva13b-lora)
- [HALVA 13B/384](https://huggingface.co/pritamqu/halva13b384-lora)

### Training HALVA

### Data

**Generative data-augmented contrastive samples**

- Vision-language instructions and their correct and hallucinated responses are available here: [data](https://github.com/pritamqu/HALVA/blob/master/data/data.json) (a quick way to inspect these JSON files is sketched after this section).
- Download the images from Visual Genome and save both part 1 and part 2 as `data/vg/VG_100K` and `data/vg/VG_100K_2`.

**Reference samples**

- A random subset of [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/tree/main). For reproducibility, we share the exact subset used in our study: [ref data](data/ref_data.json).
- Image sources:
    - MSCOCO - download them as `data/MSCOCO2017`
    - TextVQA - download them as `data/textvqa`
    - GQA - download them as `data/gqa`
    - OCR-VQA - download them as `data/ocr_vqa`

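As a quick sanity check after downloading, the sketch below loads the two JSON files and reports how many records each contains and which fields the first record carries. It only assumes the files sit at the paths used above; the field names are read from the data rather than hard-coded, since they are not spelled out here.

```
# Sanity-check sketch: assumes data/data.json and data/ref_data.json exist locally.
import json

for path in ["data/data.json", "data/ref_data.json"]:
    with open(path) as f:
        records = json.load(f)
    print(f"{path}: {len(records)} records")
    # Peek at one record without assuming any particular field names.
    first = records[0] if isinstance(records, list) else next(iter(records.values()))
    if isinstance(first, dict):
        print("  fields:", sorted(first.keys()))
```
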
### Train

- The base model LLaVA-v1.5 weights can be found here: [7B](https://huggingface.co/liuhaotian/llava-v1.5-7b) and [13B](https://huggingface.co/liuhaotian/llava-v1.5-13b).
- We train on 4 A100 80GB GPUs; training takes about 1.5 hours for the 7B variant and 3 hours for the 13B variant. If you use a different GPU setup, please make sure to match our default batch_size x gradient accumulation steps (i.e., the effective batch size) so that the default hyperparameters remain optimal; see the sketch after this list.
- The following scripts train HALVA with LLaVA-1.5 as the base model:
    - HALVA-7B: `src/hallava_7b.sh`
    - HALVA-13B: `src/hallava_13b.sh`

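To make the batch-size bookkeeping concrete, here is a tiny helper for the effective batch size. The values in the example are hypothetical placeholders; the actual defaults live in `src/hallava_7b.sh` and `src/hallava_13b.sh`.

```
# Effective batch size = per-device batch size x gradient accumulation steps x number of GPUs.
# The numbers below are placeholders; read the real defaults from the training scripts.
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    return per_device_batch * grad_accum_steps * num_gpus

# Example: halving the GPU count while doubling gradient accumulation
# keeps the effective batch size (and hence the default hyperparameters) unchanged.
assert effective_batch_size(8, 2, 4) == effective_batch_size(8, 4, 2)
```
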
### Evaluation on hallucination benchmarks

Choose a HALVA variant and its base model. We provide sample evaluation scripts; **please make sure to update the paths based on your setup**.

```
MODEL="halva13b-lora"
MODEL_BASE="liuhaotian/llava-v1.5-13b"

# OR

MODEL="halva7b-lora"
MODEL_BASE="liuhaotian/llava-v1.5-7b"
```

#### CHAIR

- Download the validation images from [MSCOCO2014](https://cocodataset.org/#download) and store them as `data/MSCOCO2014/val2014`. We use the same 500 images for validation as used in [prior work](https://github.com/yuezih/less-is-more/blob/main/CHAIR-eval/data/chair-500.jsonl).
- You can use the given sample script for evaluation; a reference sketch of the CHAIR metrics follows the command below.

```
##### run chair
bash src/evaluate_hall/chair.sh ${MODEL} ${MODEL_BASE}
```
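
The evaluation script handles CHAIR end to end; purely for reference, the sketch below computes the two standard scores, CHAIR_i (hallucinated object mentions over all object mentions) and CHAIR_s (captions with at least one hallucinated object over all captions), from already-extracted object mentions. The toy inputs are made up, and mapping caption words to MSCOCO object categories is left to the script.

```
# Reference sketch of the CHAIR metrics (not the repo's evaluation code).
def chair_scores(per_caption_objects, per_caption_groundtruth):
    total_mentions = hallucinated_mentions = hallucinated_captions = 0
    for mentioned, gt in zip(per_caption_objects, per_caption_groundtruth):
        hallucinated = [obj for obj in mentioned if obj not in gt]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        hallucinated_captions += bool(hallucinated)
    chair_i = hallucinated_mentions / max(total_mentions, 1)
    chair_s = hallucinated_captions / max(len(per_caption_objects), 1)
    return chair_i, chair_s

# Toy example: the second caption mentions a "dog" that is not in the image.
print(chair_scores([["person", "car"], ["dog", "car"]],
                   [{"person", "car"}, {"car"}]))  # -> (0.25, 0.5)
```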

#### MME-Hall

- MME-Hall is a subset of MME consisting of `existence`, `count`, `position`, and `color` (see the aggregation sketch below).
- Follow the official instructions for MME evaluation ([link](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation)) to download the MME benchmark.
- Once the data is downloaded, you can use the given sample script for evaluation.

```
##### run mme
bash src/evaluate_hall/mme.sh ${MODEL} ${MODEL_BASE}
```
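
The MME toolkit reports a score per category; the sketch below shows how the four MME-Hall categories could be aggregated into a single number. Reading MME-Hall as the sum of those four category scores is our assumption from the bullet above, and the per-category values are placeholders, not results.

```
# Hedged aggregation sketch: assumes MME-Hall is the sum of the four category scores.
# The numbers below are placeholders, not results from this repo.
MME_HALL_CATEGORIES = ["existence", "count", "position", "color"]

def mme_hall_score(per_category_scores: dict) -> float:
    return sum(per_category_scores[c] for c in MME_HALL_CATEGORIES)

print(mme_hall_score({"existence": 190.0, "count": 155.0,
                      "position": 133.3, "color": 170.0,
                      "OCR": 125.0}))  # categories outside MME-Hall are ignored
```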

#### AMBER

- Download the validation images from the source repo [AMBER](https://github.com/junyangwang0410/AMBER/tree/master) and keep them as `data/amber/image/`.
- Download the annotation [data](https://github.com/junyangwang0410/AMBER/tree/master/data) directory and save it as `eval_hall/amber/data`.
- Once the data is downloaded, you can use the given sample script for evaluation.


```
##### run amber evaluation on 4 GPUs in parallel if available; otherwise run sequentially by removing & from the end of each line
bash src/evaluate_hall/amber.sh g ${MODEL} ${MODEL_BASE} 0 &
bash src/evaluate_hall/amber.sh da ${MODEL} ${MODEL_BASE} 1 &
bash src/evaluate_hall/amber.sh dr ${MODEL} ${MODEL_BASE} 2 &
bash src/evaluate_hall/amber.sh de ${MODEL} ${MODEL_BASE} 3 &
wait
# get AMBER F1 for all discriminative tasks
bash src/evaluate_hall/amber_f1.sh ${MODEL}
```

#### MMHal-Bench

- The validation data will be downloaded directly from HuggingFace. You can use the given sample script for evaluation.

```
##### run mmhal-bench
bash src/evaluate_hall/mmhal.sh ${MODEL} ${MODEL_BASE} 0
```


#### HallusionBench

- Download the validation images from [link](https://drive.google.com/file/d/1eeO1i0G9BSZTE1yd5XeFwmrbe1hwyf_0/view?usp=sharing) and save them in `data/hallusion_bench`.
- Download the annotation file from [link](https://github.com/tianyi-lab/HallusionBench/blob/main/HallusionBench.json) and save it in `eval_hall/hallusion_bench`.
- For more details, check the [official repo](https://github.com/tianyi-lab/HallusionBench). You can use the given sample script for evaluation.

```
##### run hallusion-bench
bash src/evaluate_hall/hallusionbench.sh ${MODEL} ${MODEL_BASE} 0
```


### Evaluation on general vision-language tasks

In addition to the hallucination benchmarks above, we also evaluate on general vision-language benchmarks. For those, we directly follow the LLaVA repo:

- [VQA](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#vqav2)
- [MM-Vet](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#mm-vet)
- [TextVQA](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#textvqa)
- [MME](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#mme)

### VILA

The instructions above mainly cover the LLaVA-1.5-based checkpoints; you can find the VILA-based code inside the `*_vila` directories.

### Citation

If you find this repository useful, please consider giving it a star :star: and citing it using the BibTeX entry below:

```
@misc{sarkar2024halva,
      title={Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination},
      author={Pritam Sarkar and Sayna Ebrahimi and Ali Etemad and Ahmad Beirami and Sercan Ö. Arık and Tomas Pfister},
      year={2024},
      eprint={2405.18654},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

### Acknowledgement

This codebase is built upon [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main) and [VILA](https://github.com/NVlabs/VILA).