collapse_gemma-2-2b_hs2_replace_iter11_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5435
  • Num Input Tokens Seen: 4784104
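
This card does not include a usage snippet; the following is a minimal, hedged sketch for loading the checkpoint with the `transformers` auto classes, assuming the repo id `RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter11_sftsd0` and the bfloat16 weights stored in this repo. Adjust the prompt and generation settings as needed.

```python
# Minimal sketch: load this fine-tuned checkpoint and generate text.
# The repo id and BF16 dtype come from this card; everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter11_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```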

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
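
For reference, these values map onto `transformers.TrainingArguments` roughly as sketched below. Only the values listed above are grounded in this card; the `output_dir`, the `bf16` flag, and the surrounding training script are assumptions.

```python
# Hedged sketch: the listed hyperparameters expressed as TrainingArguments.
# Anything not in the list above (output_dir, bf16, logging, etc.) is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter11_sftsd0",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,      # Adam betas=(0.9, 0.999) per the list above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,           # assumption, consistent with the BF16 checkpoint
)
```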

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5272        | 0.0511 | 5    | 1.2822          | 252768            |
| 0.8166        | 0.1022 | 10   | 1.3380          | 509408            |
| 0.4892        | 0.1533 | 15   | 1.5438          | 759784            |
| 0.206         | 0.2043 | 20   | 1.7476          | 1010152           |
| 0.2173        | 0.2554 | 25   | 1.9773          | 1248280           |
| 0.0722        | 0.3065 | 30   | 2.1778          | 1497952           |
| 0.0457        | 0.3576 | 35   | 2.3481          | 1738968           |
| 0.0783        | 0.4087 | 40   | 2.4090          | 1989136           |
| 0.0274        | 0.4598 | 45   | 2.4217          | 2237392           |
| 0.0255        | 0.5109 | 50   | 2.4545          | 2485808           |
| 0.0261        | 0.5619 | 55   | 2.4836          | 2737960           |
| 0.0232        | 0.6130 | 60   | 2.4909          | 2979504           |
| 0.0375        | 0.6641 | 65   | 2.4994          | 3225896           |
| 0.028         | 0.7152 | 70   | 2.4842          | 3463664           |
| 0.0235        | 0.7663 | 75   | 2.4755          | 3711680           |
| 0.0217        | 0.8174 | 80   | 2.4925          | 3954272           |
| 0.0213        | 0.8685 | 85   | 2.5131          | 4204248           |
| 0.0204        | 0.9195 | 90   | 2.5247          | 4446520           |
| 0.0226        | 0.9706 | 95   | 2.5365          | 4686904           |
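
The training loss falls toward zero while the validation loss climbs from 1.2822 back above its starting value, consistent with the "collapse" setting named in the model id. A minimal matplotlib sketch to plot the two curves, with values copied verbatim from the table above:

```python
# Plot training vs. validation loss from the results table above.
# Data is copied verbatim from the table; "No log" at step 0 means
# there is no training-loss point for the first evaluation.
import matplotlib.pyplot as plt

steps = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45,
         50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
val_loss = [1.3909, 1.2822, 1.3380, 1.5438, 1.7476, 1.9773, 2.1778,
            2.3481, 2.4090, 2.4217, 2.4545, 2.4836, 2.4909, 2.4994,
            2.4842, 2.4755, 2.4925, 2.5131, 2.5247, 2.5365]
train_loss = [1.5272, 0.8166, 0.4892, 0.206, 0.2173, 0.0722, 0.0457,
              0.0783, 0.0274, 0.0255, 0.0261, 0.0232, 0.0375, 0.028,
              0.0235, 0.0217, 0.0213, 0.0204, 0.0226]  # starts at step 5

plt.plot(steps, val_loss, label="validation loss")
plt.plot(steps[1:], train_loss, label="training loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```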

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
