---
license: gemma
base_model: google/gemma-2-27b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0
  results: []
---

# collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9209
- Num Input Tokens Seen: 9209636

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.1282          | 0                 |
| 1.7181        | 0.0278 | 5    | 1.0178          | 258940            |
| 1.5727        | 0.0556 | 10   | 0.9732          | 523840            |
| 1.4846        | 0.0834 | 15   | 0.9611          | 775600            |
| 1.5742        | 0.1113 | 20   | 0.9573          | 1031780           |
| 1.5061        | 0.1391 | 25   | 0.9571          | 1291404           |
| 1.2746        | 0.1669 | 30   | 0.9544          | 1551680           |
| 1.2702        | 0.1947 | 35   | 0.9557          | 1808156           |
| 1.329         | 0.2225 | 40   | 0.9525          | 2060568           |
| 1.1092        | 0.2503 | 45   | 0.9495          | 2319496           |
| 0.9658        | 0.2782 | 50   | 0.9482          | 2567632           |
| 1.0994        | 0.3060 | 55   | 0.9444          | 2831744           |
| 1.0686        | 0.3338 | 60   | 0.9435          | 3087788           |
| 1.115         | 0.3616 | 65   | 0.9405          | 3340312           |
| 1.0044        | 0.3894 | 70   | 0.9375          | 3602000           |
| 1.1384        | 0.4172 | 75   | 0.9357          | 3868648           |
| 1.0943        | 0.4451 | 80   | 0.9361          | 4121888           |
| 1.0129        | 0.4729 | 85   | 0.9323          | 4375104           |
| 0.9281        | 0.5007 | 90   | 0.9314          | 4629144           |
| 0.9001        | 0.5285 | 95   | 0.9316          | 4881800           |
| 1.0471        | 0.5563 | 100  | 0.9303          | 5142288           |
| 1.0141        | 0.5841 | 105  | 0.9302          | 5398480           |
| 1.0427        | 0.6120 | 110  | 0.9280          | 5651544           |
| 0.9628        | 0.6398 | 115  | 0.9274          | 5904284           |
| 0.8986        | 0.6676 | 120  | 0.9257          | 6160992           |
| 0.9081        | 0.6954 | 125  | 0.9279          | 6427076           |
| 0.957         | 0.7232 | 130  | 0.9241          | 6686176           |
| 0.9556        | 0.7510 | 135  | 0.9246          | 6942364           |
| 0.9609        | 0.7789 | 140  | 0.9244          | 7193836           |
| 0.9889        | 0.8067 | 145  | 0.9228          | 7452352           |
| 0.9009        | 0.8345 | 150  | 0.9231          | 7708728           |
| 0.8942        | 0.8623 | 155  | 0.9217          | 7969644           |
| 0.9304        | 0.8901 | 160  | 0.9216          | 8223032           |
| 0.9462        | 0.9179 | 165  | 0.9212          | 8481188           |
| 0.9904        | 0.9458 | 170  | 0.9204          | 8743924           |
| 0.9147        | 0.9736 | 175  | 0.9204          | 8999112           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
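
For reference, below is a minimal sketch of how the hyperparameters above might map onto a TRL SFT run, consistent with the `trl`/`sft` tags in this card. The training dataset is unknown, so `your/dataset` is a placeholder, and the original launcher script is not documented here; this is an illustration under those assumptions, not the actual training code. It assumes a TRL version (0.9+) where `SFTConfig`/`SFTTrainer` accept a model id string.

```python
# Sketch of an SFT setup matching the hyperparameters listed in this card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("your/dataset", split="train")  # placeholder: dataset is unknown

config = SFTConfig(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=32,  # 4 x 32 = effective batch of 128 on one device
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Transformers default,
    # so no optimizer arguments need to be set explicitly.
)

trainer = SFTTrainer(
    model="google/gemma-2-27b",  # base model this card fine-tunes
    args=config,
    train_dataset=dataset,
)
trainer.train()
```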
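
And a minimal inference sketch using the standard Transformers causal-LM API. The repo id below is assumed to match this card's model name; adjust it, along with the dtype and device placement, to wherever the checkpoint is actually hosted and to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-27b_hs2_accumulate_iter2_sftsd0"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 27B parameters; bf16 halves memory vs fp32
    device_map="auto",           # requires `accelerate`; shards across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```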