ViDolphin-v0
This model is a fine-tuned version of ByteDance/Dolphin. The training dataset is not specified. It achieves the following result on the evaluation set:
- Loss: 0.0628
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 15
- mixed_precision_training: Native AMP
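The batch-size and scheduler settings above interact, so a small plain-Python sketch may help when reading them. It recomputes the total train batch size and evaluates a linear schedule with warmup. The single-device assumption, the rounding of the warmup ratio up to a whole step, the helper names, and the example total step count are illustrative assumptions, not values taken from the training run.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.01

# Assumption: a single device, which is consistent with 8 * 4 = 32.
NUM_DEVICES = 1


def total_train_batch_size() -> int:
    """Samples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def warmup_steps(total_steps: int) -> int:
    """Warmup length implied by the warmup ratio (assumed to round up)."""
    return math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return LEARNING_RATE * step / max(1, warmup)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup)


if __name__ == "__main__":
    # Hypothetical total; the card does not state the real number of optimizer steps.
    total = 17_000
    print("total_train_batch_size:", total_train_batch_size())
    for step in (0, warmup_steps(total), total // 2, total):
        print(step, linear_lr(step, total))
```

With a warmup ratio of 0.01, the learning rate reaches its peak of 2e-05 after roughly 1% of the optimizer steps and then decays linearly toward zero for the rest of training.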
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.3233 | 0.2910 | 500 | 0.2670 |
0.2159 | 0.5821 | 1000 | 0.1814 |
0.1222 | 1.3099 | 1500 | 0.1093 |
0.1081 | 1.7464 | 2000 | 0.0942 |
0.0886 | 2.1825 | 2500 | 0.0865 |
0.0813 | 2.6189 | 3000 | 0.0811 |
0.076 | 3.0550 | 3500 | 0.0777 |
0.0663 | 3.4915 | 4000 | 0.0745 |
0.0591 | 3.9280 | 4500 | 0.0720 |
0.0673 | 4.3640 | 5000 | 0.0697 |
0.0531 | 4.8005 | 5500 | 0.0674 |
0.0557 | 5.2366 | 6000 | 0.0673 |
0.0545 | 5.6731 | 6500 | 0.0655 |
0.0561 | 6.1091 | 7000 | 0.0655 |
0.0421 | 6.5456 | 7500 | 0.0646 |
0.044 | 6.9821 | 8000 | 0.0636 |
0.0398 | 7.4182 | 8500 | 0.0637 |
0.0448 | 7.8546 | 9000 | 0.0639 |
0.0355 | 8.2907 | 9500 | 0.0635 |
0.042 | 8.7272 | 10000 | 0.0631 |
0.0396 | 9.1632 | 10500 | 0.0635 |
0.038 | 9.5997 | 11000 | 0.0634 |
0.0379 | 10.0358 | 11500 | 0.0627 |
0.0349 | 10.4723 | 12000 | 0.0627 |
0.0334 | 10.9088 | 12500 | 0.0626 |
0.0359 | 11.3448 | 13000 | 0.0626 |
0.035 | 11.7813 | 13500 | 0.0626 |
0.0305 | 12.2174 | 14000 | 0.0629 |
0.0293 | 12.6539 | 14500 | 0.0628 |
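To make the trend in the table easier to inspect, the sketch below copies the logged rows into plain Python, finds the first step with the lowest validation loss, and reports the gap between validation and training loss at the last logged step. The helper names `best_checkpoint` and `generalization_gap` are hypothetical and are not part of any library.

```python
# (step, training_loss, validation_loss) rows copied from the table above.
ROWS = [
    (500, 0.3233, 0.2670),
    (1000, 0.2159, 0.1814),
    (1500, 0.1222, 0.1093),
    (2000, 0.1081, 0.0942),
    (2500, 0.0886, 0.0865),
    (3000, 0.0813, 0.0811),
    (3500, 0.076, 0.0777),
    (4000, 0.0663, 0.0745),
    (4500, 0.0591, 0.0720),
    (5000, 0.0673, 0.0697),
    (5500, 0.0531, 0.0674),
    (6000, 0.0557, 0.0673),
    (6500, 0.0545, 0.0655),
    (7000, 0.0561, 0.0655),
    (7500, 0.0421, 0.0646),
    (8000, 0.044, 0.0636),
    (8500, 0.0398, 0.0637),
    (9000, 0.0448, 0.0639),
    (9500, 0.0355, 0.0635),
    (10000, 0.042, 0.0631),
    (10500, 0.0396, 0.0635),
    (11000, 0.038, 0.0634),
    (11500, 0.0379, 0.0627),
    (12000, 0.0349, 0.0627),
    (12500, 0.0334, 0.0626),
    (13000, 0.0359, 0.0626),
    (13500, 0.035, 0.0626),
    (14000, 0.0305, 0.0629),
    (14500, 0.0293, 0.0628),
]


def best_checkpoint(rows):
    """Return the first row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one logged row."""
    return row[2] - row[1]


if __name__ == "__main__":
    step, train_loss, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss:.4f} first reached at step {step}")
    print(f"gap at last logged step: {generalization_gap(ROWS[-1]):.4f}")
```

In these rows the validation loss first reaches its minimum of 0.0626 at step 12500 and stays within 0.0003 of that value afterwards, while the training loss keeps falling. That pattern suggests later steps add little on the evaluation set, although the card gives no further evidence either way.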
Framework versions
- Transformers 4.56.0.dev0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
Model tree for htdung167/ViDolphin-v0
Base model
ByteDance/Dolphin