collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1920
  • Num Input Tokens Seen: 4,990,392
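For context, a mean evaluation cross-entropy loss of 1.1920 corresponds to a per-token perplexity of exp(1.1920) ≈ 3.29, since perplexity is the exponential of the cross-entropy:

```python
import math

eval_loss = 1.1920  # final validation loss reported above
perplexity = math.exp(eval_loss)  # perplexity = exp(cross-entropy)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 3.29
```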

Model description

More information needed

Intended uses & limitations

More information needed
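No usage guidance is documented, but since this is a causal-LM fine-tune of google/gemma-2-2b stored as BF16 safetensors, it should load through the standard transformers API. The snippet below is a minimal sketch; the prompt and generation settings are illustrative and not taken from the original card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires the accelerate package
)

# Illustrative prompt; the card does not document intended usage.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```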

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
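As a rough reconstruction, these settings map onto transformers TrainingArguments as sketched below. The output_dir and bf16 values are assumptions (the original training script is not part of this card); the listed Adam betas and epsilon match the Transformers optimizer defaults.

```python
from transformers import TrainingArguments

# Minimal sketch reconstructed from the hyperparameter list above.
# output_dir and bf16 are assumptions, not documented in the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 per device * 16 steps = 128 effective
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,    # Adam defaults, as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,         # assumption: matches the BF16 checkpoint dtype
)
```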

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.2869        | 0.0530 | 5    | 1.2735          | 262,696           |
| 1.1702        | 0.1060 | 10   | 1.1965          | 527,024           |
| 0.9937        | 0.1589 | 15   | 1.1996          | 790,848           |
| 0.9627        | 0.2119 | 20   | 1.2120          | 1,058,784         |
| 0.7511        | 0.2649 | 25   | 1.2339          | 1,324,632         |
| 0.6814        | 0.3179 | 30   | 1.2293          | 1,593,184         |
| 0.5676        | 0.3709 | 35   | 1.2292          | 1,855,640         |
| 0.5643        | 0.4238 | 40   | 1.2202          | 2,122,816         |
| 0.4603        | 0.4768 | 45   | 1.2314          | 2,388,616         |
| 0.5547        | 0.5298 | 50   | 1.2148          | 2,659,144         |
| 0.3848        | 0.5828 | 55   | 1.2187          | 2,917,112         |
| 0.3427        | 0.6358 | 60   | 1.2079          | 3,188,928         |
| 0.4605        | 0.6887 | 65   | 1.1907          | 3,455,416         |
| 0.4421        | 0.7417 | 70   | 1.2011          | 3,723,320         |
| 0.3895        | 0.7947 | 75   | 1.1885          | 3,986,816         |
| 0.3918        | 0.8477 | 80   | 1.1865          | 4,250,632         |
| 0.4303        | 0.9007 | 85   | 1.1873          | 4,512,448         |
| 0.3382        | 0.9536 | 90   | 1.1831          | 4,782,984         |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1

  • Base model: google/gemma-2-2b