---
license: mit
datasets:
  - CodeGoat24/HPD
  - CodeGoat24/LiFT-HRA
  - CodeGoat24/OIP
  - CodeGoat24/EvalMuse
  - CodeGoat24/ShareGPTVideo-DPO
  - CodeGoat24/VideoFeedback
  - CodeGoat24/LLaVA-Critic-113k
  - CodeGoat24/VideoDPO
base_model:
  - Qwen/Qwen2.5-VL-32B-Instruct
---

# UnifiedReward-2.0-qwen-32B

We are actively gathering community feedback to improve our models. We welcome your input and encourage you to stay updated through our repository!

πŸ”₯πŸ”₯πŸ”₯ We release UnifiedReward-2.0-qwen-[3b/7b/32b/72b]. This version introduces several new capabilities:

  1. Pairwise scoring for image and video generation assessment on the Alignment, Coherence, and Style dimensions.

  2. Pointwise scoring for image and video generation assessment on the Alignment, Coherence/Physics, and Style dimensions.

We welcome you to try the latest version; the inference code is available here.
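As a rough illustration of the pointwise scoring flow, the sketch below builds a scoring prompt for one dimension and parses a numeric score from the model's text reply. The prompt template, dimension wording, and `Score: X` answer format are assumptions for illustration; the exact format used by UnifiedReward-2.0 may differ, so please refer to the official inference code.

```python
import re

# Hypothetical prompt template for pointwise generation scoring.
# Dimension names follow the card (Alignment, Coherence, Style); the real
# UnifiedReward-2.0 prompt format may differ.
def build_pointwise_prompt(caption: str, dimension: str = "Alignment") -> str:
    return (
        f'You are given an image generated from the prompt: "{caption}".\n'
        f'Rate its {dimension} on a scale of 1 to 5 and answer in the form '
        f'"Score: X".'
    )

def parse_score(response: str) -> float:
    """Extract a numeric score such as 'Score: 4.5' from the model's reply."""
    match = re.search(r"Score:\s*([0-9]+(?:\.[0-9]+)?)", response)
    if match is None:
        raise ValueError(f"no score found in: {response!r}")
    return float(match.group(1))

prompt = build_pointwise_prompt("a red fox in the snow", dimension="Coherence")
print(parse_score("Score: 4.5"))  # -> 4.5
```

In practice the prompt would be sent, together with the image or video frames, through the Qwen2.5-VL chat template, and `parse_score` applied to the decoded reply.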

## Model Summary

UnifiedReward-2.0-qwen-32b is the first unified reward model for multimodal understanding and generation assessment, built on Qwen/Qwen2.5-VL-32B-Instruct. It supports both pairwise ranking and pointwise scoring and can be employed for vision model preference alignment.

For further details, please refer to the following resources:

## 🏁 Compared with Current Reward Models

| Reward Model | Method | Image Generation | Image Understanding | Video Generation | Video Understanding |
| --- | --- | :---: | :---: | :---: | :---: |
| PickScore | Point | √ | | | |
| HPS | Point | √ | | | |
| ImageReward | Point | √ | | | |
| LLaVA-Critic | Pair/Point | | √ | | |
| IXC-2.5-Reward | Pair/Point | | √ | | √ |
| VideoScore | Point | | | √ | |
| LiFT | Point | | | √ | |
| VisionReward | Point | √ | | √ | |
| VideoReward | Point | | | √ | |
| UnifiedReward (Ours) | Pair/Point | √ | √ | √ | √ |

## Citation

```bibtex
@article{unifiedreward,
  title={Unified reward model for multimodal understanding and generation},
  author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2503.05236},
  year={2025}
}
```