RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1



	---
	license: gemma
	base_model: google/gemma-2-2b
	tags:
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1

	This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
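	At the end of training, it achieves the following results on the evaluation set (the token count is the cumulative number of input tokens seen during training):
	- Loss: 1.0880
	- Num Input Tokens Seen: 10886024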
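
For readers who prefer perplexity, the evaluation loss above can be converted as sketched below. This assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language-model training with the Trainer but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.0880

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.3f}")
```

If the loss were instead measured in bits, the conversion would use `2 ** eval_loss`, so the figure is only as reliable as the unit assumption.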

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 8e-06
	- train_batch_size: 8
	- eval_batch_size: 16
	- seed: 1
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant_with_warmup
	- lr_scheduler_warmup_ratio: 0.05
	- num_epochs: 1
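
The `total_train_batch_size` entry is not an independent setting: it is the product of the per-device batch size, the gradient accumulation steps, and the number of devices. The sketch below checks that arithmetic and collects the listed values under what appear to be the matching `TrainingArguments` field names in Transformers. The device count of 1 is inferred from 128 = 8 × 16 and is an assumption, as is the mapping from the card's labels to argument names; the output directory and any dataset settings are not given in this card and are left out.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8              # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 128

# Assumption: effective batch = per-device batch * accumulation steps * devices.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)
assert train_batch_size * gradient_accumulation_steps * num_devices == total_train_batch_size

# The same settings keyed by Transformers `TrainingArguments` field names.
# The label-to-field mapping is an assumption, not something the card states.
training_kwargs = {
    "learning_rate": 8e-06,
    "per_device_train_batch_size": train_batch_size,
    "per_device_eval_batch_size": 16,
    "seed": 1,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
}

print(f"inferred number of devices: {num_devices}")
```

A dictionary like this could be unpacked into `TrainingArguments` together with an output directory of your choosing; it is meant as a starting point for reproducing the configuration, not as the original training script.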

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
	|:-------------:|:------:|:----:|:---------------:|:-----------------:|
	| No log | 0 | 0 | 1.3909 | 0 |
	| 1.4818 | 0.0264 | 5 | 1.3328 | 284168 |
	| 1.4014 | 0.0528 | 10 | 1.2146 | 570464 |
	| 1.247 | 0.0792 | 15 | 1.1552 | 859552 |
	| 1.2344 | 0.1056 | 20 | 1.1316 | 1139712 |
	| 1.0727 | 0.1321 | 25 | 1.1148 | 1425952 |
	| 1.0489 | 0.1585 | 30 | 1.1144 | 1712584 |
	| 1.0564 | 0.1849 | 35 | 1.1157 | 1999000 |
	| 1.0475 | 0.2113 | 40 | 1.1221 | 2278656 |
	| 1.0397 | 0.2377 | 45 | 1.1144 | 2567096 |
	| 0.9626 | 0.2641 | 50 | 1.1186 | 2858408 |
	| 0.9346 | 0.2905 | 55 | 1.1198 | 3145312 |
	| 0.9472 | 0.3169 | 60 | 1.1231 | 3435992 |
	| 0.9308 | 0.3433 | 65 | 1.1217 | 3729256 |
	| 0.7938 | 0.3698 | 70 | 1.1223 | 4015952 |
	| 0.8555 | 0.3962 | 75 | 1.1211 | 4305600 |
	| 0.8708 | 0.4226 | 80 | 1.1195 | 4599712 |
	| 0.8453 | 0.4490 | 85 | 1.1167 | 4888360 |
	| 0.7371 | 0.4754 | 90 | 1.1169 | 5180504 |
	| 0.8233 | 0.5018 | 95 | 1.1128 | 5473352 |
	| 0.8823 | 0.5282 | 100 | 1.1131 | 5765104 |
	| 0.623 | 0.5546 | 105 | 1.1111 | 6052128 |
	| 0.7361 | 0.5810 | 110 | 1.1069 | 6343856 |
	| 0.8444 | 0.6075 | 115 | 1.1103 | 6631416 |
	| 0.7777 | 0.6339 | 120 | 1.1068 | 6921552 |
	| 0.6832 | 0.6603 | 125 | 1.1054 | 7209048 |
	| 0.8106 | 0.6867 | 130 | 1.1039 | 7489664 |
	| 0.6772 | 0.7131 | 135 | 1.1007 | 7782048 |
	| 0.7388 | 0.7395 | 140 | 1.0992 | 8068440 |
	| 0.8197 | 0.7659 | 145 | 1.0968 | 8360312 |
	| 0.6981 | 0.7923 | 150 | 1.0959 | 8648720 |
	| 0.6736 | 0.8188 | 155 | 1.0956 | 8940416 |
	| 0.7139 | 0.8452 | 160 | 1.0935 | 9223368 |
	| 0.8445 | 0.8716 | 165 | 1.0927 | 9508432 |
	| 0.6475 | 0.8980 | 170 | 1.0919 | 9797464 |
	| 0.7119 | 0.9244 | 175 | 1.0904 | 10086248 |
	| 0.8095 | 0.9508 | 180 | 1.0897 | 10378552 |
	| 0.6255 | 0.9772 | 185 | 1.0894 | 10659304 |
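
A few quantities implied by the table can be derived with simple arithmetic. The sketch below uses a handful of rows copied from the table (not the full log) to estimate the optimizer steps per epoch, the average number of input tokens per step and per sequence, and the gap between validation and training loss at the last logged step. The per-sequence figure assumes every optimizer step consumes the full `total_train_batch_size` of 128 sequences, which the card does not confirm.

```python
# (train_loss, epoch, step, val_loss, input_tokens_seen), copied from the table above.
rows = [
    (None,   0.0,    0,   1.3909, 0),
    (1.0489, 0.1585, 30,  1.1144, 1712584),
    (0.8823, 0.5282, 100, 1.1131, 5765104),
    (0.6981, 0.7923, 150, 1.0959, 8648720),
    (0.6255, 0.9772, 185, 1.0894, 10659304),
]

train_loss, epoch, step, val_loss, tokens = rows[-1]

steps_per_epoch = step / epoch               # approximate: the logged epoch is rounded
tokens_per_step = tokens / step              # average input tokens per optimizer step
tokens_per_sequence = tokens_per_step / 128  # assumes 128 sequences per optimizer step
loss_gap = val_loss - train_loss             # validation minus training loss

# Row with the lowest validation loss within this subset of the log.
best = min(rows, key=lambda r: r[3])

print(f"steps per epoch ~ {steps_per_epoch:.1f}")
print(f"tokens per step ~ {tokens_per_step:.0f}")
print(f"tokens per sequence ~ {tokens_per_sequence:.0f}")
print(f"val - train loss at step {step}: {loss_gap:.4f}")
print(f"lowest validation loss in subset: {best[3]} at step {best[2]}")
```

The training-loss column is a per-logging-step value and is noisy, so the gap is only indicative; comparing it across several rows gives a steadier picture than any single step.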


	### Framework versions

	- Transformers 4.44.0
	- PyTorch 2.4.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
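
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical, and the distribution names (for example `torch` for PyTorch) are assumed to be the usual PyPI names.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

An exact string match is deliberately strict; a different CUDA build suffix on PyTorch will be reported as a difference even when the base version is the same.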