counterfactual_seed-63_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1845
  • Accuracy: 0.4009

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 63
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
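As a sanity check on the list above, the effective batch size and the shape of the linear-warmup schedule can be sketched in plain Python. This is an illustrative sketch, not the Trainer's actual scheduler code; the total step count is taken from the results table below (20 epochs at roughly 1,486 steps each):

```python
train_batch_size = 32
gradient_accumulation_steps = 8

# Effective batch size per optimizer step:
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 256

def linear_lr(step, peak_lr=1e-3, warmup_steps=32000, total_steps=29720):
    """Linear schedule with warmup (illustrative helper): the LR rises
    linearly to peak_lr over warmup_steps, then decays linearly to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Note: the run lasted ~29,720 steps, which is fewer than the 32,000 warmup
# steps, so the learning rate was still in its warmup ramp when training ended.
print(linear_lr(14860))  # LR roughly halfway through training
```

Because warmup_steps exceeds the total number of training steps, the decay phase of the schedule is never reached in this run.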

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|---------------|---------|-------|-----------------|----------|
| 5.9705        | 0.9994  | 1486  | 4.4079          | 0.2941   |
| 4.3089        | 1.9992  | 2972  | 3.9092          | 0.3319   |
| 3.7030        | 2.9991  | 4458  | 3.6332          | 0.3553   |
| 3.5333        | 3.9996  | 5945  | 3.4678          | 0.3710   |
| 3.3129        | 4.9994  | 7431  | 3.3714          | 0.3801   |
| 3.2404        | 5.9992  | 8917  | 3.3118          | 0.3860   |
| 3.1329        | 6.9991  | 10403 | 3.2677          | 0.3900   |
| 3.0918        | 7.9996  | 11890 | 3.2436          | 0.3927   |
| 3.0319        | 8.9994  | 13376 | 3.2273          | 0.3947   |
| 3.0061        | 9.9992  | 14862 | 3.2162          | 0.3960   |
| 2.9698        | 10.9991 | 16348 | 3.2088          | 0.3971   |
| 2.9485        | 11.9996 | 17835 | 3.2029          | 0.3982   |
| 2.9272        | 12.9994 | 19321 | 3.2025          | 0.3986   |
| 2.9081        | 13.9992 | 20807 | 3.1958          | 0.3994   |
| 2.8982        | 14.9991 | 22293 | 3.1934          | 0.3992   |
| 2.8793        | 15.9996 | 23780 | 3.1888          | 0.3998   |
| 2.8779        | 16.9994 | 25266 | 3.1852          | 0.4005   |
| 2.8592        | 17.9992 | 26752 | 3.1844          | 0.4004   |
| 2.8662        | 18.9991 | 28238 | 3.1828          | 0.4007   |
| 2.8451        | 19.9962 | 29720 | 3.1845          | 0.4009   |
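Assuming the reported losses are mean token-level cross-entropy in nats (the usual convention for Trainer language-model runs), they can be read as perplexities via `exp(loss)`. A quick sketch using the final-epoch values from the table above:

```python
import math

# Final-epoch losses from the results table above.
final_train_loss = 2.8451
final_val_loss = 3.1845

# Perplexity is exp(cross-entropy loss), assuming the loss is mean
# token-level cross-entropy measured in nats.
train_ppl = math.exp(final_train_loss)
val_ppl = math.exp(final_val_loss)

print(f"train ppl ~ {train_ppl:.1f}, val ppl ~ {val_ppl:.1f}")
```

The gap between training and validation perplexity (roughly 17 vs. 24) gives a quick read on how much the model overfits by the end of training.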

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.0
Model details

  • Model size: 110M params
  • Tensor type: F32
  • Format: Safetensors