---
library_name: transformers
license: mit
base_model: IAMRS23/RouteTisaleoGPT2
tags:
  - generated_from_trainer
model-index:
  - name: RouteTisaleoGPT2
    results: []
---

# RouteTisaleoGPT2

This model is a fine-tuned version of [IAMRS23/RouteTisaleoGPT2](https://huggingface.co/IAMRS23/RouteTisaleoGPT2) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0332
- Perplexity: 4718.9899
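
The checkpoint can be loaded with the standard `transformers` causal-LM classes. A minimal sketch, assuming the model is hosted on the Hub as `IAMRS23/RouteTisaleoGPT2` (the prompt is a placeholder):

```python
# Minimal loading / generation sketch; the repo id is taken from the card
# metadata and the prompt is purely illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IAMRS23/RouteTisaleoGPT2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Example prompt"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```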

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 10
- mixed_precision_training: Native AMP
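
For reference, the list above roughly corresponds to the following `transformers` `TrainingArguments`. This is a reconstruction from the reported values, not the original training script; `output_dir` is a placeholder and data loading / the `Trainer` call are omitted.

```python
# Hedged reconstruction of the reported hyperparameters; output_dir is a
# placeholder, not taken from the original run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="RouteTisaleoGPT2",   # placeholder
    learning_rate=3e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=10,
    fp16=True,                       # "Native AMP" mixed precision
)
```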

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Perplexity |
|:-------------:|:------:|:----:|:---------------:|:----------:|
| 1.323         | 0.9804 | 100  | 0.0435          | 4779.9616  |
| 0.0424        | 1.9608 | 200  | 0.0357          | 4948.1139  |
| 0.0374        | 2.9412 | 300  | 0.0343          | 4729.6717  |
| 0.0358        | 3.9216 | 400  | 0.0360          | 4759.4082  |
| 0.0355        | 4.9020 | 500  | 0.0337          | 4736.7832  |
| 0.0349        | 5.8824 | 600  | 0.0345          | 4804.4681  |
| 0.0347        | 6.8627 | 700  | 0.0335          | 4938.8585  |
| 0.0345        | 7.8431 | 800  | 0.0333          | 4736.7832  |
| 0.0346        | 8.8235 | 900  | 0.0335          | 4759.4082  |
| 0.0341        | 9.8039 | 1000 | 0.0332          | 4718.9899  |
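
Perplexity for a causal language model is commonly reported as the exponential of the mean cross-entropy loss. The sketch below shows only that standard convention on a user-supplied text; the evaluation data and the exact perplexity computation used for the table above are not documented here.

```python
# Illustrative perplexity computation (exp of mean cross-entropy loss);
# the evaluation text is a placeholder, not the card's evaluation set.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IAMRS23/RouteTisaleoGPT2"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Example evaluation text."  # placeholder
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"loss = {loss.item():.4f}, perplexity = {math.exp(loss.item()):.2f}")
```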

### Framework versions

- Transformers 4.48.2
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0