ViDolphin-v0

This model is a fine-tuned version of ByteDance/Dolphin. The training dataset is not specified. Evaluation results are reported as validation loss in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: AdamW (fused PyTorch implementation, OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.01
num_epochs: 15
mixed_precision_training: Native AMP
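
The settings above can be related to each other with a little arithmetic. The following is a minimal sketch in plain Python, not the training script used for this model. It assumes a single device (which is consistent with 8 × 4 = 32), assumes the warmup length is the ceiling of the warmup ratio times the total step count, and uses a hypothetical `total_steps` value because the card does not state the total number of optimizer steps. The helper names `effective_batch_size` and `linear_lr` are illustrative only.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8             # per-device batch size
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.01

# Assumption: one device; the card does not list a device count.
NUM_DEVICES = 1


def effective_batch_size(per_device, accumulation_steps, num_devices):
    """Number of samples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=LEARNING_RATE, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    # Hypothetical total step count, chosen only for illustration.
    total_steps = 10_000
    for step in (0, 50, 100, 5_050, 10_000):
        print(step, linear_lr(step, total_steps))
```

With a warmup ratio of 0.01, the learning rate peaks at 2e-05 after the first 1% of steps and then falls linearly for the rest of training, so most updates run below the nominal rate.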

Training Loss	Epoch	Step	Validation Loss
0.3233	0.2910	500	0.2670
0.2159	0.5821	1000	0.1814
0.1222	1.3099	1500	0.1093
0.1081	1.7464	2000	0.0942
0.0886	2.1825	2500	0.0865
0.0813	2.6189	3000	0.0811
0.0760	3.0550	3500	0.0777
0.0663	3.4915	4000	0.0745
0.0591	3.9280	4500	0.0720
0.0673	4.3640	5000	0.0697
0.0531	4.8005	5500	0.0674
0.0557	5.2366	6000	0.0673
0.0545	5.6731	6500	0.0655
0.0561	6.1091	7000	0.0655
0.0421	6.5456	7500	0.0646
0.0440	6.9821	8000	0.0636
0.0398	7.4182	8500	0.0637
0.0448	7.8546	9000	0.0639
0.0355	8.2907	9500	0.0635
0.0420	8.7272	10000	0.0631
0.0396	9.1632	10500	0.0635
0.0380	9.5997	11000	0.0634
0.0379	10.0358	11500	0.0627
0.0349	10.4723	12000	0.0627
0.0334	10.9088	12500	0.0626
0.0359	11.3448	13000	0.0626
0.0350	11.7813	13500	0.0626
0.0305	12.2174	14000	0.0629
0.0293	12.6539	14500	0.0628
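
One way to read the table is to find where the validation loss bottoms out. The following is a minimal sketch in plain Python, with the (step, validation loss) pairs copied from the table above. The helper name `best_checkpoints` is hypothetical and not part of any library, and the sketch makes no claim about which checkpoint was actually published.

```python
# (step, validation_loss) pairs copied from the results table.
VALIDATION_LOG = [
    (500, 0.2670), (1000, 0.1814), (1500, 0.1093), (2000, 0.0942),
    (2500, 0.0865), (3000, 0.0811), (3500, 0.0777), (4000, 0.0745),
    (4500, 0.0720), (5000, 0.0697), (5500, 0.0674), (6000, 0.0673),
    (6500, 0.0655), (7000, 0.0655), (7500, 0.0646), (8000, 0.0636),
    (8500, 0.0637), (9000, 0.0639), (9500, 0.0635), (10000, 0.0631),
    (10500, 0.0635), (11000, 0.0634), (11500, 0.0627), (12000, 0.0627),
    (12500, 0.0626), (13000, 0.0626), (13500, 0.0626), (14000, 0.0629),
    (14500, 0.0628),
]


def best_checkpoints(log):
    """Return the lowest validation loss and every step that reached it."""
    best_loss = min(loss for _, loss in log)
    steps = [step for step, loss in log if loss == best_loss]
    return best_loss, steps


if __name__ == "__main__":
    loss, steps = best_checkpoints(VALIDATION_LOG)
    print(f"lowest validation loss {loss:.4f} at steps {steps}")
```

The validation loss changes by at most about 0.001 from step 8000 onward while the training loss keeps falling, so the later epochs mostly widen the gap between the two columns.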