Update README.md
README.md CHANGED
@@ -120,6 +120,28 @@ for idx, result in enumerate(results, start=1):
120 |
|
121 |
Machine translated train set of [ArgKP_2021_GR](https://huggingface.co/datasets/Kleo/ArgKP_2021_GR)
|
122 |
|
+### Quantization
+4-bit quantization with bitsandbytes:
+```
+import torch
+from transformers import BitsAndBytesConfig
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,                     # store weights in 4-bit precision
+    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
+    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
+    bnb_4bit_compute_dtype=torch.bfloat16  # dtype used for de-quantized compute
+)
+```
+
+### PEFT (LoRA)
+LoRA hyperparameters:
+LoRA r: 8
+LoRA alpha: 8
+LoRA dropout: 0.0
+LoRA bias: 'none'
+target_modules: q_proj, v_proj
+task_type: "SEQ_CLS"
+Loss: Binary Cross Entropy
+trainable parameters: 3,416,064 (~5% of the original model)
+
 ### Training Procedure
 The following hyperparameters were used during training:
 learning_rate: 1e-4
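
As an aside from the diff itself: a minimal sketch of how a bnb_config like the one added above is typically passed when loading the base model for sequence classification. The model id, num_labels, and device_map are illustrative assumptions, not values taken from this card:

```
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "base-model-id" is a placeholder: the base model is not named in this hunk.
# num_labels=1 assumes a single-logit head trained with binary cross entropy,
# matching the loss listed in the PEFT section.
model = AutoModelForSequenceClassification.from_pretrained(
    "base-model-id",
    num_labels=1,
    quantization_config=bnb_config,
    device_map="auto",
)
```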
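Similarly, the hyperparameters in the new PEFT section map one-to-one onto a peft LoraConfig; this is a reconstruction from the listed values, not code from the commit:

```
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=8,                         # scaling factor
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    task_type="SEQ_CLS",
)

model = get_peft_model(model, lora_config)
# The card reports 3,416,064 trainable parameters (~5% of the model):
model.print_trainable_parameters()
```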
@@ -134,16 +156,7 @@ Weight Decay: 0.01
 Max Gradient Norm: 0.3
 max_seq_length: 512
 num_epochs: 1
-
-LoRA Hyperparameters
-LoRA r: 8
-LoRA alpha: 8
-LoRA dropout: 0.0
-LoRA bias: 'none'
-target_modules: q_proj, v_proj
-task_type: "SEQ_CLS"
-Loss: Binary Cross Entropy
-trainable parameters: 3,416,064 (~5% of the original model)
+
 
 
 #### Training hyperparameters
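
For reference, a hedged sketch of how the training hyperparameters listed across the two hunks would map onto transformers TrainingArguments. The README lines elided between the hunks may set further options; output_dir and the batch size below are assumptions, and max_seq_length is applied at tokenization time rather than here:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder, not from the card
    learning_rate=1e-4,
    weight_decay=0.01,               # shown in the second hunk's context
    max_grad_norm=0.3,
    num_train_epochs=1,
    per_device_train_batch_size=8,   # not stated in the visible diff
    bf16=True,                       # consistent with the bfloat16 compute dtype
)

# max_seq_length: 512 would be enforced when tokenizing, e.g.
# tokenizer(..., truncation=True, max_length=512).
```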