merve/Qwen2.5-VL-3B-Instruct-trl-mpo-rlaif-v
Transformers
TensorBoard
Safetensors
Generated from Trainer
trl
dpo
arxiv:2305.18290
main
Commit History
Training in progress, step 124
34dcc84
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 120
b7bec26
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 110
f709810
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 100
8347fd4
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 90
b94dbb3
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 80
4a76c01
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 70
2fc1ba5
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 60
211829c
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 50
de872cf
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 40
097b87d
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 30
578f443
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 20
2bb931d
verified
merve
HF Staff
committed on
Jul 23
Training in progress, step 10
7de3cfc
verified
merve
HF Staff
committed on
Jul 23
initial commit
6fafaa1
verified
merve
HF Staff
committed on
Jul 23