Kleo committed · Commit e8b3405 · verified · 1 Parent(s): 8b5da4b

Update README.md

Files changed (1):
  1. README.md (+23 −10)
README.md CHANGED
@@ -120,6 +120,28 @@ for idx, result in enumerate(results, start=1):

  Machine-translated train set of [ArgKP_2021_GR](https://huggingface.co/datasets/Kleo/ArgKP_2021_GR)

+ ### Quantization
+ 4-bit quantization with bitsandbytes:
+ ```
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_compute_dtype=torch.bfloat16
+ )
+ ```
+
+ ### PEFT (LoRA)
+ LoRA hyperparameters:
+ LoRA r: 8
+ LoRA alpha: 8
+ LoRA dropout: 0.0
+ LoRA bias: "none"
+ target_modules: q_proj, v_proj
+ task_type: "SEQ_CLS"
+ Loss: Binary Cross Entropy
+ trainable parameters: 3,416,064 (~5% of the original model)
+
  ### Training Procedure
  The following hyperparameters were used during training:
  learning_rate: 1e-4
@@ -134,16 +156,7 @@ Weight Decay: 0.01
  Max Grad Norm: 0.3
  max_seq_length: 512
  num_epochs: 1
- ##################################################################
- LoRA Hyperparameters
- LoRA r: 8
- LoRA alpha: 8
- LoRA dropout: 0.0
- LoRA bias: "none"
- target_modules: q_proj, v_proj
- task_type: "SEQ_CLS"
- Loss: Binary Cross Entropy
- trainable parameters: 3,416,064 (~5% of the original model)
+


  #### Training hyperparameters
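
A note on the added Quantization section: a `BitsAndBytesConfig` like the one above is typically passed to `from_pretrained()` when the base model is loaded. A minimal sketch, assuming the standard transformers API; the base-model ID and `num_labels` below are placeholders, not taken from this commit:

```
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

# Same 4-bit NF4 config as in the README diff above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights quantized to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bfloat16
)

base_model_id = "base-model-id"  # placeholder: the commit does not name the base model
model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit, consistent with the binary cross-entropy loss listed
    quantization_config=bnb_config,
)
```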
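The PEFT (LoRA) values map one-to-one onto a `peft.LoraConfig`. A sketch with the peft library, continuing from the loading example above:

```
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=8,                         # scaling factor
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type=TaskType.SEQ_CLS,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # README reports 3,416,064 trainable (~5%)
```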
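Likewise, the training hyperparameters in the surrounding context lines fit `transformers.TrainingArguments`. This is an assumption, since the commit does not show the training loop, and `max_seq_length` is applied at tokenization time rather than here:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",    # placeholder path
    learning_rate=1e-4,
    weight_decay=0.01,
    max_grad_norm=0.3,   # "Max Grad Norm" in the README
    num_train_epochs=1,
)
```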