Update README.md
README.md CHANGED
@@ -120,6 +120,28 @@ for idx, result in enumerate(results, start=1):
120 |
|
121 |
Machine translated train set of [ArgKP_2021_GR](https://huggingface.co/datasets/Kleo/ArgKP_2021_GR)
|
122 |
|
+### Quantization
+4-bit quantization with bitsandbytes:
+```
+import torch
+from transformers import BitsAndBytesConfig
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,                     # store weights in 4-bit precision
+    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
+    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
+    bnb_4bit_compute_dtype=torch.bfloat16  # dtype used for de-quantized compute
+)
+```
+
+### PEFT (LoRA)
+LoRA hyperparameters:
+LoRA r: 8
+LoRA alpha: 8
+LoRA dropout: 0.0
+LoRA bias: 'none'
+target_modules: q_proj, v_proj
+task_type: "SEQ_CLS"
+Loss: Binary Cross Entropy
+trainable parameters: 3,416,064 (~5% of the original model)
+
 ### Training Procedure
 The following hyperparameters were used during training:
 learning_rate: 1e-4
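
As an aside from the diff itself: a minimal sketch of how a bnb_config like the one added above is typically passed when loading the base model for sequence classification. The model id, num_labels, and device_map are illustrative assumptions, not values taken from this card:

```
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "base-model-id" is a placeholder: the base model is not named in this hunk.
# num_labels=1 assumes a single-logit head trained with binary cross entropy,
# matching the loss listed in the PEFT section.
model = AutoModelForSequenceClassification.from_pretrained(
    "base-model-id",
    num_labels=1,
    quantization_config=bnb_config,
    device_map="auto",
)
```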
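Similarly, the hyperparameters in the new PEFT section map one-to-one onto a peft LoraConfig; this is a reconstruction from the listed values, not code from the commit:

```
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=8,                         # scaling factor
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type="SEQ_CLS",
)

model = get_peft_model(model, lora_config)
# The card reports 3,416,064 trainable parameters (~5% of the model):
model.print_trainable_parameters()
```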
@@ -134,16 +156,7 @@ Weight Decay: 0.01
 Max Gradient Norm: 0.3
 max_seq_length: 512
 num_epochs: 1
-
-LoRA Hyperparameters
-LoRA r: 8
-LoRA alpha: 8
-LoRA dropout: 0.0
-LoRA bias: 'none'
-target_modules: q_proj, v_proj
-task_type: "SEQ_CLS"
-Loss: Binary Cross Entropy
-trainable parameters: 3,416,064 (~5% of the original model)
+
 
 
 #### Training hyperparameters
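
For reference, a hedged sketch of how the training hyperparameters listed across the two hunks would map onto transformers TrainingArguments. The README lines elided between the hunks may set further options; output_dir and the batch size below are assumptions, and max_seq_length is applied at tokenization time rather than here:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder, not from the card
    learning_rate=1e-4,
    weight_decay=0.01,               # shown in the second hunk's context
    max_grad_norm=0.3,
    num_train_epochs=1,
    per_device_train_batch_size=8,   # not stated in the visible diff
    bf16=True,                       # consistent with the bfloat16 compute dtype
)

# max_seq_length: 512 would be enforced when tokenizing, e.g.
# tokenizer(..., truncation=True, max_length=512).
```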