RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2


	---
	license: gemma
	base_model: google/gemma-2-2b
	tags:
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# collapse_gemma-2-2b_hs2_accumulatesubsample_iter9_sftsd2

	This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.1952
	- Num Input Tokens Seen: 5025616
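
The reported loss can be read as a perplexity. The sketch below assumes the value is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 1.1952

# Perplexity is the exponential of the mean per-token negative log-likelihood.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")  # prints: perplexity ~ 3.30
```

Under that assumption, the model is on average about as uncertain as a uniform choice among roughly 3.3 tokens at each position of the evaluation set.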

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 8e-06
	- train_batch_size: 8
	- eval_batch_size: 16
	- seed: 2
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant_with_warmup
	- lr_scheduler_warmup_ratio: 0.05
	- num_epochs: 1
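
The relationship between these values can be sketched in a few lines. This is an illustration, not the training script: the formula for the total batch size, the total step count (`assumed_total_steps`), and the helper `lr_at` are assumptions introduced here, and the schedule shape follows the common meaning of `constant_with_warmup` (linear warmup, then a flat rate).

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: total batch = per-step batch * accumulation steps * number of processes.
# Solving for the number of processes gives 1, i.e. a single-process run.
num_processes = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)

# Assumption (hypothetical value): about 94 optimizer steps in the single epoch.
assumed_total_steps = 94
warmup_steps = math.ceil(assumed_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup from 0 to the peak rate, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(num_processes, warmup_steps, lr_at(2), lr_at(50))
```

With these assumptions the warmup covers the first 5 optimizer steps, after which the learning rate stays at 8e-06 for the rest of the epoch.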

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
	|:-------------:|:------:|:----:|:---------------:|:-----------------:|
	| No log        | 0      | 0    | 1.3909          | 0                 |
	| 1.3245        | 0.0534 | 5    | 1.2768          | 275360            |
	| 1.1423        | 0.1069 | 10   | 1.2008          | 543560            |
	| 0.9748        | 0.1603 | 15   | 1.1848          | 809872            |
	| 1.0866        | 0.2138 | 20   | 1.2027          | 1077984           |
	| 0.8487        | 0.2672 | 25   | 1.2109          | 1343264           |
	| 0.8541        | 0.3206 | 30   | 1.2285          | 1613856           |
	| 0.7718        | 0.3741 | 35   | 1.2338          | 1883800           |
	| 0.752         | 0.4275 | 40   | 1.2181          | 2154856           |
	| 0.6467        | 0.4810 | 45   | 1.2274          | 2428208           |
	| 0.5452        | 0.5344 | 50   | 1.2074          | 2695040           |
	| 0.5495        | 0.5878 | 55   | 1.2047          | 2970696           |
	| 0.5562        | 0.6413 | 60   | 1.2104          | 3245864           |
	| 0.5367        | 0.6947 | 65   | 1.1986          | 3512208           |
	| 0.4594        | 0.7482 | 70   | 1.1975          | 3784176           |
	| 0.5366        | 0.8016 | 75   | 1.1995          | 4052712           |
	| 0.3897        | 0.8550 | 80   | 1.1944          | 4323640           |
	| 0.4671        | 0.9085 | 85   | 1.1959          | 4591856           |
	| 0.4434        | 0.9619 | 90   | 1.1870          | 4864704           |
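
A short script can summarize the logged results. The sketch below copies the rows from the table above and computes the step with the lowest validation loss, the gap between training and validation loss at the last logged step, and the average number of input tokens per optimizer step. The variable names are illustrative only.

```python
# (step, training_loss, validation_loss, input_tokens_seen), copied from the table above.
# The step-0 row has no training loss ("No log"), so it is stored as None.
rows = [
    (0, None, 1.3909, 0),
    (5, 1.3245, 1.2768, 275360),
    (10, 1.1423, 1.2008, 543560),
    (15, 0.9748, 1.1848, 809872),
    (20, 1.0866, 1.2027, 1077984),
    (25, 0.8487, 1.2109, 1343264),
    (30, 0.8541, 1.2285, 1613856),
    (35, 0.7718, 1.2338, 1883800),
    (40, 0.752, 1.2181, 2154856),
    (45, 0.6467, 1.2274, 2428208),
    (50, 0.5452, 1.2074, 2695040),
    (55, 0.5495, 1.2047, 2970696),
    (60, 0.5562, 1.2104, 3245864),
    (65, 0.5367, 1.1986, 3512208),
    (70, 0.4594, 1.1975, 3784176),
    (75, 0.5366, 1.1995, 4052712),
    (80, 0.3897, 1.1944, 4323640),
    (85, 0.4671, 1.1959, 4591856),
    (90, 0.4434, 1.1870, 4864704),
]

# Logged step with the lowest validation loss.
best_step, _, best_val, _ = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last_step, last_train, last_val, last_tokens = rows[-1]
gap = last_val - last_train

# Average number of input tokens consumed per optimizer step.
tokens_per_step = last_tokens / last_step

print(f"best validation loss {best_val} at step {best_step}")
print(f"validation - training loss at step {last_step}: {gap:.4f}")
print(f"average input tokens per step: {tokens_per_step:.0f}")
```

In these logs the lowest validation loss (1.1848) occurs at step 15, while the training loss keeps falling to 0.4434 by step 90. One possible reading is that the model increasingly fits the training data without improving on held-out data, though the card gives no information about the datasets that would confirm this.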


	### Framework versions

	- Transformers 4.44.0
	- PyTorch 2.4.0+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1