collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading example follows the list):

  • Loss: 1.1952
  • Num Input Tokens Seen: 5,025,616
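As a quick-start reference, here is a minimal loading sketch, not part of the original card: it assumes only the repo id shown on the model page, while the prompt and generation settings are illustrative placeholders.

```python
# Hedged loading sketch: repo id from the model page; prompt and generation
# settings are illustrative, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```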

Model description

More information needed. (Per the model page, the safetensors checkpoint contains 2.61B parameters stored as BF16.)

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 8 × 16)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
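For reproduction, the hyperparameters above map onto a transformers TrainingArguments roughly as sketched below. This is a reconstruction from the list, not the author's actual training script; the Adam betas and epsilon match the transformers defaults, so they are left implicit.

```python
# Hedged reconstruction of the training configuration; output_dir and bf16 are
# assumptions, everything else mirrors the hyperparameter list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: consistent with the BF16 checkpoint
)
```

With constant_with_warmup and warmup_ratio=0.05, the learning rate ramps up over the first 5% of optimizer steps and then stays fixed at 8e-06 for the rest of the epoch.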

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3245        | 0.0534 | 5    | 1.2768          | 275360            |
| 1.1423        | 0.1069 | 10   | 1.2008          | 543560            |
| 0.9748        | 0.1603 | 15   | 1.1848          | 809872            |
| 1.0866        | 0.2138 | 20   | 1.2027          | 1077984           |
| 0.8487        | 0.2672 | 25   | 1.2109          | 1343264           |
| 0.8541        | 0.3206 | 30   | 1.2285          | 1613856           |
| 0.7718        | 0.3741 | 35   | 1.2338          | 1883800           |
| 0.752         | 0.4275 | 40   | 1.2181          | 2154856           |
| 0.6467        | 0.4810 | 45   | 1.2274          | 2428208           |
| 0.5452        | 0.5344 | 50   | 1.2074          | 2695040           |
| 0.5495        | 0.5878 | 55   | 1.2047          | 2970696           |
| 0.5562        | 0.6413 | 60   | 1.2104          | 3245864           |
| 0.5367        | 0.6947 | 65   | 1.1986          | 3512208           |
| 0.4594        | 0.7482 | 70   | 1.1975          | 3784176           |
| 0.5366        | 0.8016 | 75   | 1.1995          | 4052712           |
| 0.3897        | 0.8550 | 80   | 1.1944          | 4323640           |
| 0.4671        | 0.9085 | 85   | 1.1959          | 4591856           |
| 0.4434        | 0.9619 | 90   | 1.1870          | 4864704           |
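Since the reported values are cross-entropy losses (assuming the usual transformers convention of a mean per-token loss in nats), they convert to perplexity via exp(loss); the final validation loss of 1.1870 at step 90, and the 1.1952 reported above, correspond to perplexities of about 3.28 and 3.30 respectively.

```python
# Convert reported validation losses to perplexity: ppl = exp(loss).
# Assumes mean per-token cross-entropy in nats (transformers convention).
import math

for loss in (1.1870, 1.1952):
    print(f"loss={loss:.4f} -> perplexity={math.exp(loss):.2f}")
```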

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
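A small sanity check for matching the environment above; the exact-version assertion is an assumption, and nearby compatible releases may also work.

```python
# Check installed framework versions against those listed on the card.
import datasets
import tokenizers
import torch
import transformers

expected = [
    (transformers, "4.44.0"),
    (torch, "2.4.0"),        # card lists 2.4.0+cu121, a CUDA 12.1 build
    (datasets, "2.20.0"),
    (tokenizers, "0.19.1"),
]
for module, version in expected:
    assert module.__version__.startswith(version), (module.__name__, module.__version__)
```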