### Introduction
Paper: [arXiv:2502.18411](https://arxiv.org/abs/2502.18411)

GitHub: [OmniAlign-V](https://github.com/PhoenixZ810/OmniAlign-V)

Project page: [OmniAlign-V](https://phoenixz810.github.io/OmniAlign-V/)

SFT Dataset: [OmniAlign-V](https://huggingface.co/datasets/PhoenixZ/OmniAlign-V)

DPO Dataset: [OmniAlign-V-DPO](https://huggingface.co/datasets/PhoenixZ/OmniAlign-V-DPO)

MM-AlignBench: [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), [Hugging Face](https://huggingface.co/datasets/PhoenixZ/MM-AlignBench)

Checkpoints: [LLaVANext-OA-7B](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-7B), [LLaVANext-OA-32B](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-32B), [LLaVANext-OA-32B-DPO](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-32B-DPO)

This is the official repository for LLaVANext-OmniAlign(OA)-32B-DPO, a model from the paper *OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference*.

LLaVANext-OmniAlign-32B-DPO follows the [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT) architecture and uses [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as its language model.

Adding a DPO stage trained on the OmniAlign-V-DPO dataset further improves the model's alignment with human preference.

### Performance
On MM-AlignBench, the DPO stage raises the win rate from 62.3 (LLaVANext-OA-32B) to 74.2 (LLaVANext-OA-32B-DPO) and the reward from +19.4 to +36.9. LLaVANext-OA-32B-DPO also surpasses Qwen2VL-72B, which scores 61.5 and +21.6.

| Model                         | Win Rate         | Reward | Better+         | Better   | Tie  | Worse   | Worse+  |
|-------------------------------|------------------------------|---------------------------|------------|-----|----|-----|-----|
| Claude3.5V-Sonnet             | 84.9                         | +51.4                     | 70         | 144 | 13 | 25  | 0   |
| GPT-4o                        | 81.3                         | +49.0                     | 81         | 124 | 12 | 31  | 4   |
| GPT-4V                        | 82.5                         | +46.0                     | 57         | 151 | 12 | 31  | 1   |
| GeminiFlash1.5-002            | 77.0                         | +39.1                     | 56         | 138 | 14 | 35  | 9   |
| LLaVANext-OA-32B-DPO | 74.2                         | +36.9                     | 49         | 138 | 20 | 40  | 5   |
| Qwen2VL-72B                   | 61.5                         | +21.6                     | 43         | 112 | 15 | 75  | 7   |
| LLaVANext-OA-32B     | 62.3                         | +19.4                     | 31         | 126 | 19 | 62  | 14  |
| Claude-3V-Sonnet              | 50                           | 0                         | -          | -   | -  | -   | -   |
| Qwen2VL-7B                    | 44.4                         | -5.8                      | 28         | 84  | 5  | 101 | 34  |
| InternVL2-72B                 | 44.4                         | -6.9                      | 19         | 93  | 8  | 98  | 34  |
| InternVL2-8B-MPO              | 40.1                         | -10.9                     | 26         | 75  | 10 | 100 | 41  |
| InternVL2-8B                  | 31.3                         | -21.8                     | 18         | 61  | 15 | 109 | 49  |
| LLaMA3.2-Vision-11B           | 27.8                         | -33.7                     | 18         | 52  | 4  | 98  | 80  |
| LLaVANext-Qwen32B    | 26.6                         | -29.0                     | 16         | 51  | 10 | 121 | 54  |
| LLaVA-OneVision-7B            | 23.8                         | -46.2                     | 14         | 46  | 1  | 75  | 116 |
| MiniCPM-V-2.5                 | 12.7                         | -53.0                     | 9          | 23  | 8  | 116 | 96  |
| Xcomposer2.5-7B               | 7.5                          | -74.0                     | 5          | 14  | 3  | 63  | 167 |
| Idefics3-8B                   | 2.7                          | -92.3                     | 3          | 4   | 0  | 15  | 230 |
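The Win Rate and Reward columns appear to be derived from the five judgment counts. The sketch below reproduces the reported numbers for two rows. It assumes that win rate is the share of `Better+` and `Better` judgments, and that reward weights the five outcomes as +100, +50, 0, -50, and -100. These weights are inferred from the table, not taken from the evaluation code, and the function name `mm_align_scores` is hypothetical.

```python
def mm_align_scores(better_plus, better, tie, worse, worse_plus):
    """Return (win_rate, reward) from pairwise judgment counts.

    Assumed scoring, inferred from the table above:
      win rate = (Better+ + Better) / total * 100
      reward   = mean of per-sample scores with weights
                 +100, +50, 0, -50, -100.
    """
    total = better_plus + better + tie + worse + worse_plus
    win_rate = 100.0 * (better_plus + better) / total
    reward = (100 * better_plus + 50 * better
              - 50 * worse - 100 * worse_plus) / total
    return win_rate, reward


# Counts copied from the table rows.
rows = {
    "LLaVANext-OA-32B-DPO": (49, 138, 20, 40, 5),
    "Claude3.5V-Sonnet": (70, 144, 13, 25, 0),
}

for name, counts in rows.items():
    win_rate, reward = mm_align_scores(*counts)
    print(f"{name}: win rate {win_rate:.1f}, reward {reward:+.1f}")
```

Each of these rows sums to 252 judgments, so the same helper can be used to sanity-check other entries in the table.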

### How to use
Please refer to our [GitHub repository](https://github.com/PhoenixZ810/OmniAlign-V) for details on training and evaluation.