### Introduction

Paper: [Paper](https://arxiv.org/abs/2502.18411), Github: [Github](https://github.com/PhoenixZ810/OmniAlign-V), Page: [Page](https://phoenixz810.github.io/OmniAlign-V/), SFT Dataset: [OmniAlign-V](https://huggingface.co/datasets/PhoenixZ/OmniAlign-V), DPO Dataset: [OmniAlign-V-DPO](https://huggingface.co/datasets/PhoenixZ/OmniAlign-V-DPO)

MM-AlignBench: [VLMEvalkit](https://github.com/open-compass/VLMEvalKit), [Huggingface](https://huggingface.co/datasets/PhoenixZ/MM-AlignBench)

Checkpoints: [LLaVANext-OA-7B](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-7B), [LLaVANext-OA-32B](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-32B), [LLaVANext-OA-32B-DPO](https://huggingface.co/PhoenixZ/LLaVANext-OmniAlign-32B-DPO)

This is the official repo of LLaVANext-OmniAlign(OA)-32B-DPO, introduced in OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference.

LLaVANext-OmniAlign-32B-DPO follows the [LLaVA-Next](https://github.com/LLaVA-VL/LLaVA-NeXT) structure and uses [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as its language model. Adding a DPO stage trained on the OmniAlign-V-DPO dataset further improves the alignment of MLLMs with human preference.

### Performance

With the OmniAlign-V-DPO dataset used in the DPO stage, LLaVANext-OA-32B-DPO reaches a 74.2 win rate on MM-AlignBench, up from 62.3 for LLaVANext-OA-32B, and surpasses Qwen2VL-72B (61.5).

| Model | Win Rate | Reward | Better+ | Better | Tie | Worse | Worse+ |
|----------------------|----------|--------|---------|--------|-----|-------|--------|
| Claude3.5V-Sonnet | 84.9 | +51.4 | 70 | 144 | 13 | 25 | 0 |
| GPT-4o | 81.3 | +49.0 | 81 | 124 | 12 | 31 | 4 |
| GPT-4V | 82.5 | +46.0 | 57 | 151 | 12 | 31 | 1 |
| GeminiFlash1.5-002 | 77.0 | +39.1 | 56 | 138 | 14 | 35 | 9 |
| LLaVANext-OA-32B-DPO | 74.2 | +36.9 | 49 | 138 | 20 | 40 | 5 |
| Qwen2VL-72B | 61.5 | +21.6 | 43 | 112 | 15 | 75 | 7 |
| LLaVANext-OA-32B | 62.3 | +19.4 | 31 | 126 | 19 | 62 | 14 |
| Claude-3V-Sonnet | 50 | 0 | - | - | - | - | - |
| Qwen2VL-7B | 44.4 | -5.8 | 28 | 84 | 5 | 101 | 34 |
| InternVL2-72B | 44.4 | -6.9 | 19 | 93 | 8 | 98 | 34 |
| InternVL2-8B-MPO | 40.1 | -10.9 | 26 | 75 | 10 | 100 | 41 |
| InternVL2-8B | 31.3 | -21.8 | 18 | 61 | 15 | 109 | 49 |
| LLaMA3.2-Vision-11B | 27.8 | -33.7 | 18 | 52 | 4 | 98 | 80 |
| LLaVANext-Qwen32B | 26.6 | -29.0 | 16 | 51 | 10 | 121 | 54 |
| LLaVA-OneVision-7B | 23.8 | -46.2 | 14 | 46 | 1 | 75 | 116 |
| MiniCPM-V-2.5 | 12.7 | -53.0 | 9 | 23 | 8 | 116 | 96 |
| Xcomposer2.5-7B | 7.5 | -74.0 | 5 | 14 | 3 | 63 | 167 |
| Idefics3-8B | 2.7 | -92.3 | 3 | 4 | 0 | 15 | 230 |

### How to use

Please refer to our [Github](https://github.com/PhoenixZ810/OmniAlign-V) for more details about training and evaluation.
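
As a reading aid for the MM-AlignBench table in the Performance section, the sketch below shows one scoring rule that appears to reproduce the Win Rate and Reward columns from the five judgment counts. It is inferred from the table's numbers, not taken from the OmniAlign-V or VLMEvalKit code: it assumes Win Rate is the percentage of Better+ and Better verdicts, and Reward weights the five verdicts by +100, +50, 0, -50, and -100 before averaging. The names `JudgeCounts`, `win_rate`, and `reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeCounts:
    """Pairwise verdict counts for one model against the reference model."""
    better_plus: int
    better: int
    tie: int
    worse: int
    worse_plus: int

    @property
    def total(self) -> int:
        return (self.better_plus + self.better + self.tie
                + self.worse + self.worse_plus)


def win_rate(c: JudgeCounts) -> float:
    # Assumption: a "win" is any Better+ or Better verdict; ties do not count.
    return 100.0 * (c.better_plus + c.better) / c.total


def reward(c: JudgeCounts) -> float:
    # Assumption: verdict weights of +100, +50, 0, -50, -100, averaged over all samples.
    weighted = (100 * c.better_plus + 50 * c.better
                - 50 * c.worse - 100 * c.worse_plus)
    return weighted / c.total


if __name__ == "__main__":
    # Counts copied from the LLaVANext-OA-32B-DPO row of the table.
    oa_32b_dpo = JudgeCounts(better_plus=49, better=138, tie=20, worse=40, worse_plus=5)
    print(f"win rate: {win_rate(oa_32b_dpo):.1f}, reward: {reward(oa_32b_dpo):+.1f}")
```

Under these assumptions, a model that matches the reference exactly would score a reward of 0, which is consistent with the Claude-3V-Sonnet baseline row. For the official scoring procedure, the Github repository and VLMEvalKit remain the authoritative sources.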