End of training

Files changed:
- README.md (+7 -7)
- adapter_config.json (+5 -5)
- adapter_model.bin (+1 -1)
- adapter_model.safetensors (+1 -1)
- training_args.bin (+1 -1)
README.md CHANGED
@@ -43,7 +43,7 @@ early_stopping_patience: null
 eval_max_new_tokens: 128
 eval_table_size: null
 evals_per_epoch: 1
-flash_attention:
+flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
@@ -103,7 +103,7 @@ xformers_attention: null

 This model is a fine-tuned version of [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss:
+- Loss: 0.9404

 ## Model description

@@ -137,11 +137,11 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-
-
-
-
-
+| 1.7563        | 0.0002 | 1    | 3.0182          |
+| 1.531         | 0.0055 | 25   | 1.2687          |
+| 1.1124        | 0.0111 | 50   | 1.0121          |
+| 0.5517        | 0.0166 | 75   | 0.9560          |
+| 0.3658        | 0.0221 | 100  | 0.9404          |


 ### Framework versions
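For reference, a minimal sketch of attaching the adapter in this repository to its base model with transformers and peft. The repository id below is a placeholder; PeftModel.from_pretrained and AutoModelForCausalLM.from_pretrained are standard library calls, not anything prescribed by this commit.

# Minimal sketch: attach this LoRA adapter to its base model with peft.
# "your-username/your-adapter-repo" is a placeholder for this repository's id.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-1B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

# Loads adapter_config.json and adapter_model.safetensors from the repo.
model = PeftModel.from_pretrained(base, "your-username/your-adapter-repo")
model.eval()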
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "k_proj",
-    "o_proj",
-    "q_proj",
-    "down_proj",
     "gate_proj",
+    "q_proj",
+    "v_proj",
     "up_proj",
-    "v_proj"
+    "o_proj",
+    "down_proj",
+    "k_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
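The change above only reorders the target_modules list; the set of adapted projection layers is unchanged. A hedged sketch of the corresponding peft LoraConfig follows; r and lora_alpha are illustrative values, since the real ones sit outside this hunk.

from peft import LoraConfig

# Sketch of a LoraConfig matching the target_modules in this file.
config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,           # assumed rank; not shown in this hunk
    lora_alpha=32,  # assumed scaling; not shown in this hunk
    target_modules=[
        "gate_proj", "q_proj", "v_proj", "up_proj",
        "o_proj", "down_proj", "k_proj",
    ],
)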
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:abf7dce0f2eb241853271543c313030dcb7fa9fb841ff314e90c8222190df620
 size 90258378
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3abd600633004f737757a13247a23693caa58d5002d763d8e51a00c693dab67b
 size 90207248
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:74fb7478fc1012b216a24ea1fb3ee5f5f6fc81742a0ea3f826b6fc1e8aae82d5
 size 6776
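The three binary files above are git-lfs pointer files: the oid sha256 field is the SHA-256 digest of the actual payload, and size is its byte length. A quick local check, assuming the real file (not the pointer) has been downloaded to the working directory:

import hashlib

# Hash a downloaded artifact and compare against the pointer's oid.
with open("adapter_model.safetensors", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == "3abd600633004f737757a13247a23693caa58d5002d763d8e51a00c693dab67b"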