oh_scale_x.5_compute_equal

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B on the mlfoundations-dev/oh-dcft-v1.3_no-curation_gpt-4o-mini_scale_0.5x dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4058
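
Since the usage sections below are still placeholders, here is a minimal, hedged sketch of loading this checkpoint for inference with transformers; the repository id matches the Hugging Face page for this model, and the prompt and generation settings are illustrative placeholders, not part of the original card.

```python
# Hedged sketch: load the fine-tuned checkpoint for inference.
# The prompt and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlfoundations-dev/oh_scale_x.5_compute_equal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```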

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 25.0
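
For reference, the list above maps roughly onto the following TrainingArguments configuration. This is a hedged reconstruction, not the original training script; the output_dir is a placeholder. Note that the effective batch size of 512 follows from 8 per-device examples × 8 GPUs × 8 gradient-accumulation steps.

```python
# Approximate reconstruction of the hyperparameters above as a
# transformers TrainingArguments object; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.5_compute_equal",  # placeholder, not from the run
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 * 8 GPUs * 8 steps = 512 effective
    num_train_epochs=25.0,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # checkpoint is stored in BF16
)
```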

Training results

Training Loss   Epoch     Step   Validation Loss
0.7807          0.9947    165    0.7690
0.7139          1.9955    331    0.7520
0.6642          2.9962    497    0.7525
0.6186          3.9970    663    0.7615
0.5777          4.9977    829    0.7785
0.5287          5.9985    995    0.8154
0.4730          6.9992    1161   0.8710
0.4134          8.0000    1327   0.9475
0.3615          8.9947    1492   1.0203
0.3057          9.9955    1658   1.1177
0.2565          10.9962   1824   1.2368
0.2099          11.9970   1990   1.3552
0.1676          12.9977   2156   1.5071
0.1283          13.9985   2322   1.6324
0.1022          14.9992   2488   1.7542
0.0779          16.0000   2654   1.8729
0.0607          16.9947   2819   1.9862
0.0481          17.9955   2985   2.0547
0.0380          18.9962   3151   2.1351
0.0306          19.9970   3317   2.2255
0.0256          20.9977   3483   2.2699
0.0221          21.9985   3649   2.3515
0.0197          22.9992   3815   2.3599
0.0186          24.0000   3981   2.3888
0.0169          24.8681   4125   2.4058
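
The table shows validation loss bottoming out at 0.7520 around epoch 2 (step 331) and rising steadily thereafter while training loss keeps falling, so anyone re-running this recipe may want checkpoint selection on eval loss. The sketch below uses the standard early-stopping features built into transformers; these settings are an assumption for illustration, not part of the original run.

```python
# Hedged sketch: keep the best checkpoint by validation loss rather than
# training through all 25 epochs. EarlyStoppingCallback and these
# arguments are standard transformers features, not from the original run.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="oh_scale_x.5_compute_equal",  # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # restore the best checkpoint at the end
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Passed to a Trainer via callbacks=[stopper]; stops training after 3
# consecutive epochs without improvement in eval_loss.
stopper = EarlyStoppingCallback(early_stopping_patience=3)
```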

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.3.0
  • Datasets 3.1.0
  • Tokenizers 0.20.3