Update README.md
README.md
CHANGED
@@ -145,7 +145,7 @@ Machine translated train set of [ArgKP_2021_GR](https://huggingface.co/datasets/
 |LoRA r | 20 |
 |LoRA alpha | 9 |
 |LoRA dropout |0.0 |
-|LoRA bias
+|LoRA bias |'none' |
 |target_modules |q_proj, v_proj |
 |task_type |"SEQ_CLS" |
 |Loss |BCE |
@@ -153,20 +153,23 @@ Machine translated train set of [ArgKP_2021_GR](https://huggingface.co/datasets/
 
 ### Training Procedure
 The following hyperparameters were used during training:
-learning_rate: 1e-4
-train_batch_size: 16
-eval_batch_size: 16
-seed: 42
-num_devices: 1
-gradient_accumulation_steps: 2
-optimizer: paged Adam optimizer
-lr_scheduler_type: linear
-Weight Decay: 0.01
-M. G. Norm: 0.3
-max_seq_length: 512
-num_epochs: 1
 
 
+|Hyperparameter | Value |
+|----------------------------|-------------------------------------|
+|l_r | 1e-4 |
+|lr_scheduler_type |linear |
+|train_batch_size | 16 |
+|eval_batch_size |16 |
+|seed |42 |
+|num_devices |1 |
+|gradient_accumulation_steps |2 |
+|optimizer |paged Adam |
+|Weight Decay | 0.01 |
+|max grad norm | 0.3 |
+|max_seq_length |512 |
+|num_epochs |1 |
+
 
 #### Training hyperparameters
 
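The updated tables list values only. As a rough illustration of how some of them combine, the sketch below stores the post-change values in a plain Python dataclass and derives several quantities: examples per optimizer step, the LoRA scaling factor, the learning rate under a linear schedule, and a single-logit BCE loss. The following are assumptions and do not come from the README: standard LoRA scaling of `alpha / r`, zero warmup steps, and a linear decay to 0 (the behaviour of Hugging Face's `linear` scheduler). `TrainConfig`, `effective_batch_size`, `lora_scaling`, `linear_lr` and `bce_loss` are hypothetical names and are not part of the repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for values taken from the updated README tables."""
    learning_rate: float = 1e-4
    train_batch_size: int = 16            # assumed to be per device
    gradient_accumulation_steps: int = 2
    num_devices: int = 1
    lora_r: int = 20
    lora_alpha: int = 9


def effective_batch_size(cfg: TrainConfig) -> int:
    # Examples consumed per optimizer step: per-device batch x accumulation steps x devices.
    return cfg.train_batch_size * cfg.gradient_accumulation_steps * cfg.num_devices


def lora_scaling(cfg: TrainConfig) -> float:
    # Standard LoRA multiplies the low-rank update B @ A by alpha / r
    # (assumption: plain LoRA scaling, not a rank-stabilised variant).
    return cfg.lora_alpha / cfg.lora_r


def linear_lr(cfg: TrainConfig, step: int, total_steps: int) -> float:
    # Linear decay from the initial learning rate down to 0, assuming no warmup.
    remaining = max(0, total_steps - step)
    return cfg.learning_rate * remaining / max(1, total_steps)


def bce_loss(logit: float, label: int) -> float:
    # Binary cross-entropy on one raw logit, in the numerically stable form
    # max(z, 0) - z * y + log(1 + exp(-|z|)).
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    cfg = TrainConfig()
    print(effective_batch_size(cfg))  # → 32
    print(lora_scaling(cfg))          # → 0.45
    # total_steps=100 is a hypothetical run length, not a value from the README.
    print(linear_lr(cfg, 50, 100))
    print(bce_loss(0.0, 1))
```

With these values each optimizer step sees 32 examples. The LoRA update is scaled by 0.45, which means it is scaled down rather than up, because alpha (9) is smaller than r (20).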