collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2025
  • Num Input Tokens Seen: 5030112
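
The checkpoint is a standard Gemma-2 causal LM, so it should load with the usual transformers API. A minimal sketch, assuming the Hub repo id RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd2 and bfloat16 weights:

```python
# Minimal loading sketch (repo id and bfloat16 dtype are assumptions from this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```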

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
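
For reference, a hedged sketch of how these settings map onto transformers TrainingArguments; the Adam betas and epsilon listed above are the optimizer defaults, and model/dataset wiring is omitted:

```python
# Hypothetical reconstruction of the hyperparameters above as TrainingArguments;
# this is a sketch, not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter12_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,     # train_batch_size: 8
    per_device_eval_batch_size=16,     # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,    # 8 * 16 = 128 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```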

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3362        | 0.0532 | 5    | 1.2769          | 265880            |
| 1.1081        | 0.1063 | 10   | 1.2084          | 539392            |
| 0.9652        | 0.1595 | 15   | 1.1969          | 804592            |
| 0.9587        | 0.2126 | 20   | 1.2121          | 1075208           |
| 0.7927        | 0.2658 | 25   | 1.2314          | 1341568           |
| 0.7717        | 0.3189 | 30   | 1.2362          | 1611200           |
| 0.8045        | 0.3721 | 35   | 1.2276          | 1882360           |
| 0.5481        | 0.4252 | 40   | 1.2431          | 2142016           |
| 0.5799        | 0.4784 | 45   | 1.2190          | 2414496           |
| 0.6372        | 0.5316 | 50   | 1.2269          | 2678120           |
| 0.6331        | 0.5847 | 55   | 1.2109          | 2943720           |
| 0.4929        | 0.6379 | 60   | 1.2131          | 3206336           |
| 0.5115        | 0.6910 | 65   | 1.2175          | 3478880           |
| 0.5867        | 0.7442 | 70   | 1.2034          | 3749192           |
| 0.4530        | 0.7973 | 75   | 1.2146          | 4012808           |
| 0.5101        | 0.8505 | 80   | 1.2072          | 4274584           |
| 0.3742        | 0.9037 | 85   | 1.2038          | 4541400           |
| 0.4019        | 0.9568 | 90   | 1.2069          | 4814864           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
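
To reproduce this environment, one option is to pin the versions above and verify them at runtime; the install line below is an assumption about packaging, not taken from the card:

```python
# Sketch: check that the pinned framework versions match this card.
# Install (assumed): pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1
import datasets
import tokenizers
import torch
import transformers

print("Transformers:", transformers.__version__)  # expect 4.44.0
print("PyTorch:", torch.__version__)              # expect 2.4.0+cu121
print("Datasets:", datasets.__version__)          # expect 2.20.0
print("Tokenizers:", tokenizers.__version__)      # expect 0.19.1
```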
