---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2
    results: []
---

collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1952
  • Num Input Tokens Seen: 5025616
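The checkpoint can be loaded like any other transformers causal LM. Below is a minimal inference sketch; the repo id is inferred from this card's model name and is an assumption, not a confirmed path:

```python
# Minimal inference sketch. The repo id below is an assumption based on this
# card's model name; adjust it if the checkpoint lives elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # gemma-2 weights are distributed in bfloat16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```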

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
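
These settings map directly onto TRL's SFTTrainer. The sketch below is a hedged reconstruction, assuming a TRL version that provides SFTConfig; since the training dataset is unknown (see above), the dataset used here is only a placeholder:

```python
# Reconstruction sketch of the training configuration with TRL's SFTTrainer.
# The dataset is a placeholder assumption; the card does not identify the real one.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("stanfordnlp/imdb")  # placeholder dataset (assumption)

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",           # the results table logs an eval every 5 steps
    eval_steps=5,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the
    # TrainingArguments defaults, so no optimizer fields are overridden here.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```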

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3245        | 0.0534 | 5    | 1.2768          | 275360            |
| 1.1423        | 0.1069 | 10   | 1.2008          | 543560            |
| 0.9748        | 0.1603 | 15   | 1.1848          | 809872            |
| 1.0866        | 0.2138 | 20   | 1.2027          | 1077984           |
| 0.8487        | 0.2672 | 25   | 1.2109          | 1343264           |
| 0.8541        | 0.3206 | 30   | 1.2285          | 1613856           |
| 0.7718        | 0.3741 | 35   | 1.2338          | 1883800           |
| 0.752         | 0.4275 | 40   | 1.2181          | 2154856           |
| 0.6467        | 0.4810 | 45   | 1.2274          | 2428208           |
| 0.5452        | 0.5344 | 50   | 1.2074          | 2695040           |
| 0.5495        | 0.5878 | 55   | 1.2047          | 2970696           |
| 0.5562        | 0.6413 | 60   | 1.2104          | 3245864           |
| 0.5367        | 0.6947 | 65   | 1.1986          | 3512208           |
| 0.4594        | 0.7482 | 70   | 1.1975          | 3784176           |
| 0.5366        | 0.8016 | 75   | 1.1995          | 4052712           |
| 0.3897        | 0.8550 | 80   | 1.1944          | 4323640           |
| 0.4671        | 0.9085 | 85   | 1.1959          | 4591856           |
| 0.4434        | 0.9619 | 90   | 1.1870          | 4864704           |
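
Assuming the reported losses are the usual mean per-token cross-entropies (in nats) for causal LM training, they convert to perplexities via exp(loss); the final evaluation loss of 1.1952, for example, corresponds to a perplexity of roughly 3.30:

```python
# Convert a reported loss to perplexity, assuming it is the standard mean
# per-token cross-entropy (in nats) used for causal LM fine-tuning.
import math

final_eval_loss = 1.1952  # from the evaluation set results above
print(f"perplexity ~ {math.exp(final_eval_loss):.2f}")  # ~ 3.30
```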

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1