RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1

collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.1920
Num Input Tokens Seen: 4990392
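
The card does not include a usage example. The following is a minimal sketch of how the checkpoint could be loaded with the standard Transformers auto classes, assuming the repository is publicly accessible and that `torch` and `transformers` are installed. The helper name `generate_text`, the default `max_new_tokens`, and the choice of greedy decoding are illustrative assumptions, not settings taken from the card.

```python
# Repository id as it appears on this model page.
MODEL_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter10_sftsd1"


def generate_text(prompt, max_new_tokens=32):
    """Load the checkpoint and return a greedy continuation of `prompt`.

    Hypothetical helper for illustration; it downloads the weights on first use.
    """
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 is chosen to match the BF16 tensor type listed for the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Once upon a time")` would download the weights and return the prompt followed by up to 32 generated tokens; the prompt here is only a placeholder.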

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 1
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
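
The relationship between these values can be checked with a little arithmetic. The sketch below recomputes the total batch size and illustrates the shape of a `constant_with_warmup` schedule (linear ramp to the base rate, then flat). It assumes a single device, which is consistent with 8 × 16 = 128, and it assumes roughly 94 optimizer steps in the epoch, estimated from the results table where step 5 is logged at epoch 0.0530. Rounding the warmup length up is also an assumption about how the ratio is converted to steps.

```python
import math

# Values copied from the hyperparameter list above.
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
LEARNING_RATE = 8e-06
WARMUP_RATIO = 0.05

# Assumption: about 94 optimizer steps per epoch (5 steps / 0.0530 epochs).
ASSUMED_TOTAL_STEPS = 94


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Sequences consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def constant_with_warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup from 0 to base_lr, then constant at base_lr."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)
warmup_steps = math.ceil(ASSUMED_TOTAL_STEPS * WARMUP_RATIO)
schedule = [
    constant_with_warmup_lr(s, LEARNING_RATE, warmup_steps)
    for s in range(ASSUMED_TOTAL_STEPS)
]
```

Under these assumptions the warmup covers only the first handful of optimizer steps, so almost the whole epoch runs at the full learning rate of 8e-06.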

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.2869	0.0530	5	1.2735	262696
1.1702	0.1060	10	1.1965	527024
0.9937	0.1589	15	1.1996	790848
0.9627	0.2119	20	1.2120	1058784
0.7511	0.2649	25	1.2339	1324632
0.6814	0.3179	30	1.2293	1593184
0.5676	0.3709	35	1.2292	1855640
0.5643	0.4238	40	1.2202	2122816
0.4603	0.4768	45	1.2314	2388616
0.5547	0.5298	50	1.2148	2659144
0.3848	0.5828	55	1.2187	2917112
0.3427	0.6358	60	1.2079	3188928
0.4605	0.6887	65	1.1907	3455416
0.4421	0.7417	70	1.2011	3723320
0.3895	0.7947	75	1.1885	3986816
0.3918	0.8477	80	1.1865	4250632
0.4303	0.9007	85	1.1873	4512448
0.3382	0.9536	90	1.1831	4782984
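
To make the trend in the table easier to read, the sketch below copies the logged rows into Python and derives three quantities: the step with the lowest validation loss, the average number of input tokens per optimizer step, and the gap between validation and training loss at the last logged step. The variable and function names are illustrative; the numbers are taken directly from the table.

```python
# (step, training_loss, validation_loss, input_tokens_seen), copied from the table.
ROWS = [
    (0, None, 1.3909, 0),
    (5, 1.2869, 1.2735, 262696),
    (10, 1.1702, 1.1965, 527024),
    (15, 0.9937, 1.1996, 790848),
    (20, 0.9627, 1.2120, 1058784),
    (25, 0.7511, 1.2339, 1324632),
    (30, 0.6814, 1.2293, 1593184),
    (35, 0.5676, 1.2292, 1855640),
    (40, 0.5643, 1.2202, 2122816),
    (45, 0.4603, 1.2314, 2388616),
    (50, 0.5547, 1.2148, 2659144),
    (55, 0.3848, 1.2187, 2917112),
    (60, 0.3427, 1.2079, 3188928),
    (65, 0.4605, 1.1907, 3455416),
    (70, 0.4421, 1.2011, 3723320),
    (75, 0.3895, 1.1885, 3986816),
    (80, 0.3918, 1.1865, 4250632),
    (85, 0.4303, 1.1873, 4512448),
    (90, 0.3382, 1.1831, 4782984),
]


def best_validation_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def tokens_per_step(rows):
    """Average input tokens consumed per optimizer step up to the last row."""
    last_step, _, _, last_tokens = rows[-1]
    return last_tokens / last_step


best_step, _, best_val_loss, _ = best_validation_row(ROWS)
avg_tokens = tokens_per_step(ROWS)

# Gap between validation and training loss at the last logged step.
last_step, last_train_loss, last_val_loss, _ = ROWS[-1]
final_gap = last_val_loss - last_train_loss
```

On these rows the lowest validation loss is the last logged one (step 90), while the training loss has fallen much further than the validation loss over the same interval.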

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
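
When trying to reproduce the run, it can help to compare the local environment against these versions. The following is a small sketch using only the standard library; the PyPI distribution names in `EXPECTED` (in particular `torch` for PyTorch) and the helper name `version_mismatches` are assumptions made for illustration, and the reported PyTorch version string may or may not include the `+cu121` suffix depending on how it was installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected):
    """Map each package whose installed version differs to (wanted, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

Calling `version_mismatches(EXPECTED)` returns an empty dictionary when the environment matches the card exactly.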

Safetensors

Model size: 2.61B params
Tensor type: BF16
