datives_removed_seed-42_1e-3

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1700
  • Accuracy: 0.4016

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
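The hyperparameters above map directly onto the Hugging Face `TrainingArguments` API (the card lists Transformers 4.46.2). A minimal sketch of the equivalent configuration, written as a plain keyword dictionary rather than the original training script, which is not published:

```python
# Hypothetical reconstruction of the training configuration above, using
# the keyword names of transformers.TrainingArguments (Transformers 4.46).
# This is a sketch for illustration, not the original training script.
training_kwargs = dict(
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    warmup_steps=32000,
    num_train_epochs=20.0,
    fp16=True,                # "Native AMP" mixed-precision training
    optim="adamw_torch",      # AdamW, betas=(0.9, 0.999), eps=1e-08
)

# The reported total_train_batch_size is the per-device batch size times
# the gradient-accumulation steps (times the number of devices, here 1):
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 256, matching the value reported above
```

These kwargs would be passed as `TrainingArguments(output_dir=..., **training_kwargs)`; the consistency check confirms that 32 × 8 = 256 reproduces the listed total_train_batch_size.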

Training results

Training Loss   Epoch     Step     Validation Loss   Accuracy
6.0149          0.9995     1504    4.4077            0.2931
3.9470          1.9993     3008    3.8892            0.3339
3.6966          2.9992     4512    3.6094            0.3572
3.4034          3.9997     6017    3.4486            0.3726
3.2971          4.9995     7521    3.3558            0.3813
3.1805          5.9993     9025    3.2927            0.3872
3.1139          6.9992    10529    3.2557            0.3909
3.0634          7.9997    12034    3.2280            0.3938
3.0111          8.9995    13538    3.2141            0.3953
2.9921          9.9993    15042    3.1998            0.3969
2.9476         10.9992    16546    3.1914            0.3984
2.9475         11.9997    18051    3.1887            0.3987
2.9023         12.9995    19555    3.1828            0.3996
2.9125         13.9993    21059    3.1795            0.4003
2.8721         14.9992    22563    3.1771            0.4005
2.8886         15.9997    24068    3.1767            0.4010
2.8514         16.9995    25572    3.1728            0.4009
2.8731         17.9993    27076    3.1758            0.4015
2.8378         18.9992    28580    3.1706            0.4013
2.8617         19.9963    30080    3.1700            0.4016
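For a language model, a per-token cross-entropy loss converts to perplexity via exp(loss). Assuming the reported validation loss is cross-entropy in nats (the usual convention for Trainer-reported losses), the final checkpoint's perplexity works out as:

```python
import math

# Perplexity is the exponential of the per-token cross-entropy loss
# (assuming the reported validation loss is in nats).
final_val_loss = 3.1700
perplexity = math.exp(final_val_loss)
print(round(perplexity, 1))  # 23.8
```

So the final validation loss of 3.1700 corresponds to a perplexity of roughly 23.8.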

Framework versions

  • Transformers 4.46.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.20.0
Model size: 110M params (Safetensors, F32)