qwen_cCPO_entropy

This model is a fine-tuned version of trl-lib/qwen1.5-0.5b-sft on the yakazimir/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4698
  • Rewards/chosen: -1.7544
  • Rewards/rejected: -2.3724
  • Rewards/accuracies: 0.6840
  • Rewards/margins: 0.6180
  • Logps/rejected: -2.3724
  • Logps/chosen: -1.7544
  • Logits/rejected: 0.2213
  • Logits/chosen: 0.1180
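
Since the card provides no usage example, here is a minimal inference sketch with transformers. It assumes the checkpoint behaves like a standard Qwen1.5 causal-LM chat model (i.e., that it inherits the base model's chat template); the prompt is illustrative.

```python
# Minimal inference sketch; assumes the checkpoint inherits Qwen1.5's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_cCPO_entropy"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Briefly explain preference optimization."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```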

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
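
For context, these settings resemble a TRL preference-optimization run. The sketch below shows one way they might map onto a trl.CPOConfig / CPOTrainer setup; it is a hedged reconstruction, not the authors' actual script, and TRL's stock loss types do not include the "cCPO + entropy" variant this model's name suggests.

```python
# Hedged sketch only: maps the listed hyperparameters onto a TRL CPOTrainer run.
# The actual "cCPO_entropy" objective is not documented in this card and is
# presumably a custom loss variant; the dataset split name is also an assumption.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import CPOConfig, CPOTrainer

model = AutoModelForCausalLM.from_pretrained("trl-lib/qwen1.5-0.5b-sft")
tokenizer = AutoTokenizer.from_pretrained("trl-lib/qwen1.5-0.5b-sft")
train_dataset = load_dataset("yakazimir/ultrafeedback_binarized", split="train")

args = CPOConfig(
    output_dir="qwen_cCPO_entropy",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # "train_batch_size" above is per device
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # 2 per device x 16 steps (x GPUs) = 32 total
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # the published checkpoint is BF16
)

trainer = CPOTrainer(model=model, args=args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()
```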

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.547 | 0.2141 | 400 | 0.5455 | -1.3437 | -1.4812 | 0.5579 | 0.1374 | -1.4812 | -1.3437 | 0.3541 | 0.2674 |
| 0.5301 | 0.4282 | 800 | 0.5165 | -1.3889 | -1.6415 | 0.5927 | 0.2526 | -1.6415 | -1.3889 | 0.4272 | 0.3345 |
| 0.5265 | 0.6422 | 1200 | 0.4985 | -1.4579 | -1.8204 | 0.6224 | 0.3625 | -1.8204 | -1.4579 | 0.3699 | 0.2760 |
| 0.4765 | 0.8563 | 1600 | 0.4935 | -1.4994 | -1.8829 | 0.6380 | 0.3836 | -1.8829 | -1.4994 | 0.3198 | 0.2257 |
| 0.5542 | 1.0704 | 2000 | 0.4872 | -1.4687 | -1.8582 | 0.6372 | 0.3895 | -1.8582 | -1.4687 | 0.3054 | 0.2090 |
| 0.4732 | 1.2845 | 2400 | 0.4775 | -1.6420 | -2.1625 | 0.6669 | 0.5204 | -2.1625 | -1.6420 | 0.3805 | 0.2752 |
| 0.5055 | 1.4986 | 2800 | 0.4755 | -1.6156 | -2.1129 | 0.6639 | 0.4973 | -2.1129 | -1.6156 | 0.4048 | 0.2981 |
| 0.4945 | 1.7127 | 3200 | 0.4738 | -1.5940 | -2.0956 | 0.6677 | 0.5016 | -2.0956 | -1.5940 | 0.3909 | 0.2834 |
| 0.4619 | 1.9267 | 3600 | 0.4700 | -1.6914 | -2.2530 | 0.6728 | 0.5617 | -2.2530 | -1.6914 | 0.3536 | 0.2473 |
| 0.4109 | 2.1408 | 4000 | 0.4699 | -1.7062 | -2.2883 | 0.6780 | 0.5822 | -2.2883 | -1.7062 | 0.3677 | 0.2556 |
| 0.4282 | 2.3549 | 4400 | 0.4707 | -1.7749 | -2.3952 | 0.6877 | 0.6202 | -2.3952 | -1.7749 | 0.2280 | 0.1239 |
| 0.4299 | 2.5690 | 4800 | 0.4704 | -1.7425 | -2.3507 | 0.6803 | 0.6082 | -2.3507 | -1.7425 | 0.3027 | 0.1929 |
| 0.4414 | 2.7831 | 5200 | 0.4698 | -1.7506 | -2.3686 | 0.6847 | 0.6181 | -2.3686 | -1.7506 | 0.2344 | 0.1302 |
| 0.404 | 2.9972 | 5600 | 0.4698 | -1.7544 | -2.3724 | 0.6840 | 0.6180 | -2.3724 | -1.7544 | 0.2213 | 0.1180 |
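
Note that Rewards/chosen and Rewards/rejected coincide with Logps/chosen and Logps/rejected throughout, which is consistent with a reference-free objective whose per-completion reward is the (length-normalized) policy log-probability. Under that assumption, the derived metrics follow directly, as in this sketch (the helper function is illustrative, not from the training code):

```python
# Hedged sketch of the pairwise metrics above, assuming each completion's
# reward is its policy log-probability (which would explain Rewards == Logps).
import torch

def preference_metrics(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> dict:
    margins = chosen_rewards - rejected_rewards          # Rewards/margins
    accuracy = (margins > 0).float().mean()              # Rewards/accuracies
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracy.item(),
    }

# Final evaluation averages from the table: margin = -1.7544 - (-2.3724) = 0.6180.
print(preference_metrics(torch.tensor([-1.7544]), torch.tensor([-2.3724])))
```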

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1