Update README.md
README.md CHANGED
@@ -69,18 +69,18 @@ This technique avoids training unnecessarily low-performing weights that can turn
 
 Axolotl was used for training and dataset tokenization.
 
-#### Preprocessing
+#### Preprocessing
 
 The dataset was formatted in the ShareGPT conversational format for use with Axolotl.
 
 #### Training Hyperparameters
 
-lora_r: 64
-lora_alpha: 16
-lora_dropout: 0.05
-gradient_accumulation_steps: 4
-micro_batch_size: 1
-num_epochs: 3
-optimizer: adamw_bnb_8bit
-lr_scheduler: cosine
-learning_rate: 0.00025
+- lora_r: 64
+- lora_alpha: 16
+- lora_dropout: 0.05
+- gradient_accumulation_steps: 4
+- micro_batch_size: 1
+- num_epochs: 3
+- optimizer: adamw_bnb_8bit
+- lr_scheduler: cosine
+- learning_rate: 0.00025
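
For context, the hyperparameters listed in this change correspond directly to keys in an Axolotl YAML config. The sketch below shows roughly how they would sit in such a file, assuming a LoRA adapter and a ShareGPT-formatted conversational dataset; the `base_model`, `adapter`, and `datasets` entries are illustrative placeholders, not values taken from this commit.

```yaml
# Minimal Axolotl-style config sketch. Only the keys under
# "Hyperparameters from the README" are taken from this change;
# everything else is an assumed placeholder for illustration.
base_model: "<base model id>"        # assumption: not specified in this commit
adapter: lora                        # assumption: LoRA implied by the lora_* keys

datasets:
  - path: "<sharegpt dataset>"       # assumption: ShareGPT-formatted conversations
    type: sharegpt

# Hyperparameters from the README
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00025
```

A config like this is typically launched with Axolotl's CLI, e.g. `accelerate launch -m axolotl.cli.train config.yml`.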