---
license: mit
datasets:
- CodeGoat24/HPD
- CodeGoat24/LiFT-HRA
- CodeGoat24/OIP
- CodeGoat24/EvalMuse
- CodeGoat24/ShareGPTVideo-DPO
- CodeGoat24/VideoFeedback
- CodeGoat24/LLaVA-Critic-113k
- CodeGoat24/VideoDPO
base_model:
- Qwen/Qwen2.5-VL-32B-Instruct
---

# UnifiedReward-2.0-qwen-32b
We are actively gathering community feedback to improve our models. **We welcome your input and encourage you to follow updates through our repository!**

🔥🔥🔥 We release **UnifiedReward-2.0**-qwen-[[3b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-3b)/[7b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-7b)/[32b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-32b)/[72b](https://huggingface.co/CodeGoat24/UnifiedReward-2.0-qwen-72b)].
This version introduces several new capabilities:
>1. **Pairwise scoring** for image and video generation assessment along the **_Alignment_**, **_Coherence_**, and **_Style_** dimensions.
>
>2. **Pointwise scoring** for image and video generation assessment along the **_Alignment_**, **_Coherence/Physics_**, and **_Style_** dimensions.

We welcome you to try the latest version; the inference code is available [here](https://github.com/CodeGoat24/UnifiedReward/tree/main/inference_qwen/UnifiedReward-2.0-inference). A minimal usage sketch is shown below.
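For quick orientation, here is a minimal, hypothetical sketch of the pairwise mode using Hugging Face `transformers` (a recent release with Qwen2.5-VL support) together with the `qwen-vl-utils` helper package. The prompt wording and image paths are illustrative assumptions only; consult the linked inference code for the exact templates the model expects.

```python
# Minimal pairwise-ranking sketch (illustrative; see the official inference
# code linked above for the exact prompt templates). Assumes a recent
# `transformers` with Qwen2.5-VL support plus the `qwen-vl-utils` package.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "CodeGoat24/UnifiedReward-2.0-qwen-32b"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Two candidate images generated from the same caption (placeholder paths).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "candidate_1.png"},
        {"type": "image", "image": "candidate_2.png"},
        {"type": "text", "text": (
            "Caption: a red bicycle leaning against a brick wall.\n"
            "Compare Image 1 and Image 2 on Alignment, Coherence, and Style, "
            "then state which image is better overall."
        )},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, i.e. the model's verdict.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0]
print(answer)
```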

## Model Summary

`UnifiedReward-2.0-qwen-32b` is the first unified reward model for multimodal understanding and generation assessment, built on [Qwen/Qwen2.5-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct). It supports both pairwise ranking and pointwise scoring, and can be employed for preference alignment of vision models.
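As a companion sketch for the pointwise mode, the same pipeline can score a single image on each dimension. This snippet reuses `model`, `processor`, and `process_vision_info` from the pairwise example above; the 1-to-10 scale in the prompt is an assumption rather than the official template.

```python
# Pointwise-scoring sketch, reusing `model`, `processor`, and
# `process_vision_info` from the pairwise example above. The request for
# 1-10 scores is illustrative; the official templates may differ.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "candidate_1.png"},  # placeholder path
        {"type": "text", "text": (
            "Caption: a red bicycle leaning against a brick wall.\n"
            "Rate this image from 1 to 10 on Alignment, Coherence, and "
            "Style, and briefly justify each score."
        )},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, padding=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0])
```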

For further details, please refer to the following resources:
- 📰 Paper: https://arxiv.org/pdf/2503.05236
- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
- 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)


## 🏁 Compared with Current Reward Models

| Reward Model | Method | Image Generation | Image Understanding | Video Generation | Video Understanding |
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| [PickScore](https://github.com/yuvalkirstain/PickScore) | Point | √ | | | |
| [HPS](https://github.com/tgxs002/HPSv2) | Point | √ | | | |
| [ImageReward](https://github.com/THUDM/ImageReward) | Point | √ | | | |
| [LLaVA-Critic](https://huggingface.co/lmms-lab/llava-critic-7b) | Pair/Point | | √ | | |
| [IXC-2.5-Reward](https://github.com/InternLM/InternLM-XComposer) | Pair/Point | | √ | | √ |
| [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore) | Point | | | √ | |
| [LiFT](https://github.com/CodeGoat24/LiFT) | Point | | | √ | |
| [VisionReward](https://github.com/THUDM/VisionReward) | Point | √ | | √ | |
| [VideoReward](https://github.com/KwaiVGI/VideoAlign) | Point | | | √ | |
| UnifiedReward (Ours) | Pair/Point | √ | √ | √ | √ |


## Citation

```
@article{unifiedreward,
  title={Unified reward model for multimodal understanding and generation},
  author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2503.05236},
  year={2025}
}
```