---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter17_sftsd0
  results: []
---


# collapse_gemma-2-2b_hs2_accumulatesubsample_iter17_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2165
- Num Input Tokens Seen: 4964320
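
For readers who prefer perplexity to raw loss, the conversion can be sketched as below. This assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. The variable names are illustrative only.

```python
import math

# Final evaluation loss reported in this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 1.2165

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval loss = {eval_loss:.4f}, perplexity = {perplexity:.3f}")
```

Under that assumption, the evaluation loss corresponds to a perplexity of roughly 3.4.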

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
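
The `total_train_batch_size` above follows from the per-device batch size and the gradient accumulation steps. A minimal sketch of that arithmetic, assuming training ran on a single device (the card does not record the device count, so `num_devices` is a hypothetical value chosen because it reproduces the reported total):

```python
# Values taken from the hyperparameter list in this card.
train_batch_size = 8               # sequences per device per forward/backward pass
gradient_accumulation_steps = 16   # passes accumulated before each optimizer step

# Assumption: one device. Not stated in the card.
num_devices = 1

# Sequences contributing to each optimizer update.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

print(total_train_batch_size)
```

With these values the product is 128, matching the reported `total_train_batch_size`.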

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.2726        | 0.0535 | 5    | 1.2797          | 261768            |
| 1.1191        | 0.1070 | 10   | 1.2274          | 527656            |
| 0.9603        | 0.1605 | 15   | 1.2346          | 792592            |
| 0.7861        | 0.2140 | 20   | 1.2535          | 1060392           |
| 0.7055        | 0.2676 | 25   | 1.2497          | 1331816           |
| 0.6513        | 0.3211 | 30   | 1.2599          | 1600048           |
| 0.6785        | 0.3746 | 35   | 1.2513          | 1862592           |
| 0.5816        | 0.4281 | 40   | 1.2579          | 2132648           |
| 0.5033        | 0.4816 | 45   | 1.2418          | 2397080           |
| 0.4926        | 0.5351 | 50   | 1.2292          | 2665584           |
| 0.5115        | 0.5886 | 55   | 1.2360          | 2939440           |
| 0.3950        | 0.6421 | 60   | 1.2264          | 3206336           |
| 0.4836        | 0.6957 | 65   | 1.2312          | 3475784           |
| 0.4008        | 0.7492 | 70   | 1.2145          | 3740448           |
| 0.4104        | 0.8027 | 75   | 1.2251          | 4008264           |
| 0.4466        | 0.8562 | 80   | 1.2196          | 4277008           |
| 0.3173        | 0.9097 | 85   | 1.2176          | 4540200           |
| 0.4054        | 0.9632 | 90   | 1.2160          | 4799696           |
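
The table can be summarized programmatically. The sketch below copies the step, validation loss, and token columns from the rows above, then finds the evaluation step with the lowest validation loss and the average number of input tokens per optimizer step. The variable names are illustrative, and the token average is only an estimate based on the last logged row.

```python
# (step, validation_loss, input_tokens_seen), copied from the table above.
rows = [
    (0, 1.3909, 0),
    (5, 1.2797, 261768),
    (10, 1.2274, 527656),
    (15, 1.2346, 792592),
    (20, 1.2535, 1060392),
    (25, 1.2497, 1331816),
    (30, 1.2599, 1600048),
    (35, 1.2513, 1862592),
    (40, 1.2579, 2132648),
    (45, 1.2418, 2397080),
    (50, 1.2292, 2665584),
    (55, 1.2360, 2939440),
    (60, 1.2264, 3206336),
    (65, 1.2312, 3475784),
    (70, 1.2145, 3740448),
    (75, 1.2251, 4008264),
    (80, 1.2196, 4277008),
    (85, 1.2176, 4540200),
    (90, 1.2160, 4799696),
]

# Evaluation step with the lowest validation loss.
best_step, best_loss, _ = min(rows, key=lambda row: row[1])

# Average input tokens consumed per optimizer step, from the last logged row.
last_step, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"approx. input tokens per optimizer step: {tokens_per_step:.0f}")
```

Training loss falls steadily across the epoch while validation loss stays within a narrow band after step 10, so the lowest logged validation loss is not at the final logged step.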


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1