collapse_gemma-2-2b_hs2_replace_iter12_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6137
  • Num Input Tokens Seen: 4692640
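A minimal loading sketch with transformers (the repository id RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter12_sftsd1 and the BF16 dtype are taken from this page; the prompt and generation settings are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id as published on the Hugging Face Hub; the checkpoint
# tensors are stored in BF16, so load in that dtype.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter12_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder prompt, purely illustrative.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```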

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (mirrored in the TrainingArguments sketch after the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
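For reference, a minimal sketch of the equivalent transformers.TrainingArguments; output_dir is a placeholder, and any setting not listed above is assumed to keep its transformers default:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is a placeholder, unlisted settings are assumed
# to be the transformers defaults.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter12_sftsd1",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = total batch size 128 (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```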

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4164        | 0.0511 | 5    | 1.2765          | 248104            |
| 0.942         | 0.1021 | 10   | 1.2582          | 482960            |
| 0.5589        | 0.1532 | 15   | 1.4596          | 727376            |
| 0.3508        | 0.2042 | 20   | 1.6556          | 975432            |
| 0.1903        | 0.2553 | 25   | 1.8531          | 1215984           |
| 0.0848        | 0.3063 | 30   | 2.1690          | 1455400           |
| 0.0438        | 0.3574 | 35   | 2.3138          | 1697704           |
| 0.0314        | 0.4084 | 40   | 2.4183          | 1942680           |
| 0.0274        | 0.4595 | 45   | 2.4952          | 2183992           |
| 0.026         | 0.5105 | 50   | 2.5004          | 2424896           |
| 0.0256        | 0.5616 | 55   | 2.5534          | 2673496           |
| 0.0219        | 0.6126 | 60   | 2.5495          | 2911280           |
| 0.0228        | 0.6637 | 65   | 2.5383          | 3151984           |
| 0.0219        | 0.7147 | 70   | 2.5423          | 3401680           |
| 0.0211        | 0.7658 | 75   | 2.5687          | 3640496           |
| 0.022         | 0.8168 | 80   | 2.5789          | 3884360           |
| 0.0216        | 0.8679 | 85   | 2.5962          | 4127864           |
| 0.0223        | 0.9190 | 90   | 2.6077          | 4361768           |
| 0.0218        | 0.9700 | 95   | 2.6126          | 4595248           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
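A small sketch to check that a local environment matches the versions above (note the torch build string includes the CUDA suffix):

```python
# Verify the local environment against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers.__version__, "4.44.0"),
    "torch": (torch.__version__, "2.4.0+cu121"),
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (found, wanted) in expected.items():
    status = "OK" if found == wanted else f"expected {wanted}"
    print(f"{name}: {found} ({status})")
```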