oh_scale_x.5_compute_equal

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.5x dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4058
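
Since the usage sections below are still placeholders, here is a minimal, hedged sketch of loading this checkpoint for inference with transformers; the repository id matches the Hugging Face page for this model, and the prompt and generation settings are illustrative placeholders, not part of the original card.

```python
# Hedged sketch: load the fine-tuned checkpoint for inference.
# The prompt and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.5_compute_equal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```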

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 25.0
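
For reference, the list above maps roughly onto the following TrainingArguments configuration. This is a hedged reconstruction, not the original training script; the output_dir is a placeholder. Note that the effective batch size of 512 follows from 8 per-device examples × 8 GPUs × 8 gradient-accumulation steps.

```python
# Approximate reconstruction of the hyperparameters above as a
# transformers TrainingArguments object; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.5_compute_equal",  # placeholder, not from the run
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 * 8 GPUs * 8 steps = 512 effective
    num_train_epochs=25.0,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # checkpoint is stored in BF16
)
```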

Training results

Training Loss   Epoch     Step   Validation Loss
0.7807          0.9947    165    0.7690
0.7139          1.9955    331    0.7520
0.6642          2.9962    497    0.7525
0.6186          3.9970    663    0.7615
0.5777          4.9977    829    0.7785
0.5287          5.9985    995    0.8154
0.4730          6.9992    1161   0.8710
0.4134          8.0000    1327   0.9475
0.3615          8.9947    1492   1.0203
0.3057          9.9955    1658   1.1177
0.2565          10.9962   1824   1.2368
0.2099          11.9970   1990   1.3552
0.1676          12.9977   2156   1.5071
0.1283          13.9985   2322   1.6324
0.1022          14.9992   2488   1.7542
0.0779          16.0000   2654   1.8729
0.0607          16.9947   2819   1.9862
0.0481          17.9955   2985   2.0547
0.0380          18.9962   3151   2.1351
0.0306          19.9970   3317   2.2255
0.0256          20.9977   3483   2.2699
0.0221          21.9985   3649   2.3515
0.0197          22.9992   3815   2.3599
0.0186          24.0000   3981   2.3888
0.0169          24.8681   4125   2.4058
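
The table shows validation loss bottoming out at 0.7520 around epoch 2 (step 331) and rising steadily thereafter while training loss keeps falling, so anyone re-running this recipe may want checkpoint selection on eval loss. The sketch below uses the standard early-stopping features built into transformers; these settings are an assumption for illustration, not part of the original run.

```python
# Hedged sketch: keep the best checkpoint by validation loss rather than
# training through all 25 epochs. EarlyStoppingCallback and these
# arguments are standard transformers features, not from the original run.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.5_compute_equal",  # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # restore the best checkpoint at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Passed to a Trainer via callbacks=[stopper]; stops training after 3
# consecutive epochs without improvement in eval_loss.
stopper = EarlyStoppingCallback(early_stopping_patience=3)
```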

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.3.0
  • Datasets 3.1.0
  • Tokenizers 0.20.3