# About
This model is Lightblue's QLoRA fine-tune of OpenOrca's [Open-Orca/OpenOrcaxOpenChat-Preview2-13B](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B) model on Japanese fine-tuning datasets.

We trained on equal samples of the following three datasets:

* [SNOW](https://huggingface.co/datasets/snow_simplified_japanese_corpus)
* [TyDiQA (Ja)](https://huggingface.co/datasets/khalidalt/tydiqa-goldp)
* [XLSUM (Ja)](https://huggingface.co/datasets/csebuetnlp/xlsum)

which resulted in a dataset of 13,167 samples in total (the equal-sampling step is sketched below).

The initial letters of these three datasets make up the model name: STX.
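The exact preprocessing script is not included here, but the equal sampling described above can be illustrated roughly as follows. The dataset configuration names (`snow_t15`, `japanese`) and the shuffle seed are assumptions made for the sketch, not a record of what was actually run; 4,389 samples per dataset is simply 13,167 / 3.

```python
# Rough illustration of the equal-sampling step described above.
# Config names and the seed are assumptions; this is not the original script.
from datasets import load_dataset

SAMPLES_PER_DATASET = 13167 // 3  # 4,389 samples from each source

snow = load_dataset("snow_simplified_japanese_corpus", "snow_t15", split="train")
tydiqa = load_dataset("khalidalt/tydiqa-goldp", "japanese", split="train")
xlsum = load_dataset("csebuetnlp/xlsum", "japanese", split="train")

# Take an equally sized random subset of each dataset; each subset would then be
# converted into a shared prompt/response text format before being mixed for training.
subsets = [
    ds.shuffle(seed=42).select(range(SAMPLES_PER_DATASET))
    for ds in (snow, tydiqa, xlsum)
]
```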
# How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_dir = "path/to/this/model"  # set this to the model's Hugging Face Hub ID or a local checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map='auto',
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)


def create_summarization_prompt(text_to_sum):
    return "記事：\n" + text_to_sum + "\n\n要約：\n"
def do_closed_qa(context, question):
    return context + "\n\n" + question
test_article = """ใใขใใใใฎใฌใใผใใชใผใซใใชใผใใปใใคใฑใซ้ธๆใใใใใฌใคใถใผใฉใขใณRGใใใๆฌไบบๅ
ฌ่ชใฎใขใใใใงใใใใฉใฐใใผใใกใณใฎๅๅฟใซๅฐใ้ฉใใใใใงใใ
ใใชใผใใปใใคใฑใซ้ธๆใฎใขใใใใฏใไฝใใใฃใใใงใใใ
ใ2015ๅนดใฎใฏใผใซใใซใใ๏ผWๆฏ๏ผใคใณใฐใฉใณใๅคงไผใงๆฅๆฌใๅใขใใชใซใๅใใๆฌกใฎๆฅใใไบฌ้ฝใงใฎ็ช็ตใญใฑใงใใใๅฝๆใฏใใขใใใซใฎๅ
ฑๅๅตๆฅญ่
ในใใฃใผใใปใธใงใใบใฎใขใใใใฐใใใงใใใใไธ็ทใซใญใฑใใใฆใใใธใฃใณใฐใซใใฑใใใใใใชใผใใปใใคใฑใซใซไผผใฆใพใใใใธใงใใบใฎใพใพใใใใใใใใชใใงใใ๏ผใใจ่จใใใใฎใๅงใพใใงใใ
ใใใ ใใฟใใช็ฅ่ญใใชใใใฉใฐใใผใทใงใใใๆขใใๆฅๆฌไปฃ่กจใฎใฆใใใผใ ใๅฃฒใๅใใ ใฃใใฎใงใ่ตคใฃใฝใใฆใใใผใ ใจใใใใใฎ็ญใใณใใฏใใฆใใจใใใใSNSใงใใชใผใใปใใคใฑใซใงใใใฃใฆใใฃใฑใๅ็ใ่ผใใพใใใ
ใใใใจใใใใ่ฆใใชใผใใใๆฌไบบใใDM๏ผใใคใฌใฏใใกใใปใผใธ๏ผใๅฑใใพใใใใใขใใใใใใใจใใใใใพใใใใใขใใใใใใใชใใๅใฎใฆใใใผใ ใ้ใใพใใฎใง็ใฆใใ ใใใใจใWๆฏๅพใซใฆใใใผใ 2็ใจใใณใใใฝใใฏในใชใฉใใปใใพใซ้ใฃใฆใใฆใใใพใใใไป็ใฆใใใฎใใใใงใใ
ใใใพใงใๆฐใ
ใฎ่ๅไบบใใขใใใใใฆใใใใพใใใใชใผใ้ธๆใฎใใฟใฎๅ้ฟใฏใใใใงใใใใ
ใใๅใฏใฉใฐใใผ็ต้จใใชใใงใใใใฉใฐใใผใๅ
จ็ถ็ฅใใชใใฃใใใฉใใใฃใฑใๆฌไบบใใใฆใใใผใ ใ้ ใใฆใใฃใฆใใโๅฐ็ฑ ๏ผใใใใ๏ผโใฟใใใชใฎใใใฃใฆใใใใใคใฏใชใผใใใๆฌไบบใซ่ชใใใใฆใใใจใไธ็ฎ็ฝฎใใใฆใใใฎใใชใจๆใใพใใ
ใใใใฃใฆใใใใจใฏใ่ฆใ็ฎใๆฌไบบใซๅฏใใฆใฏใณใใผใ ใฃใฆ่จใใ ใใชใใงใใใฉใญใใใใงใใใใใใชใผใใใใ ใใจ่จใฃใฆใใใใพใใ
ใใใชใผใใใใจๅฎ้ใซไผใใใจใชใใฆใ็ฐกๅใซใฏใงใใชใใใใชใใงใใใใงใใใชใผใใใใฎใพใญใใใฆใใRGใซใฏไผใใใใใฟใใใช๏ผ็ฌ๏ผใไฝใ ใใใชใๆๅใช็ฅ็คพใฎๆฏ็คพใฎใใใชๅญๅจใงใใใญใใใใใใใใใใจใใๆๅณใงใฏไปใฎใขใใใใจใฏใใใ้ใใพใใญใ
"""
test_question = "リーチ・マイケルは何を送っていましたか？"
pipe(create_summarization_prompt(test_article), max_new_tokens=256, temperature=0)[0]["generated_text"]
pipe(do_closed_qa(test_article, test_question), max_new_tokens=128, temperature=0)[0]["generated_text"]
```
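The snippet above loads the model in bfloat16. As an optional variant (an assumption on our part, not part of the documented usage, and requiring `bitsandbytes` alongside a recent `transformers`), the model can also be loaded with 4-bit NF4 quantization to reduce GPU memory use, mirroring the quantization settings used for training:

```python
# Optional: 4-bit NF4 loading to reduce GPU memory use. This is an assumed
# alternative to the bfloat16 setup above, not the documented inference path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_dir = "path/to/this/model"  # same placeholder as above

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, quantization_config=bnb_config, device_map="auto"
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```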
# Training details
This model was trained for 1000 steps (1.2 epochs), with an evaluation every 50 steps, and we kept the checkpoint with the lowest validation loss.
We used the [qlora](https://github.com/artidoro/qlora) package from artidoro.
We trained with the following hyperparameters (an equivalent configuration is sketched after the list):
* Per device evaluation batch size: 16
* Per device train batch size: 8
* LoRA rank (lora_r): 64
* LoRA alpha (lora_alpha): 16
* LoRA modules: all
* Double quantization: Enabled
* Quantization type: nf4
* BF16: Enabled
* Bits: 4
* Warmup ratio: 0.03
* Learning rate scheduler type: Constant
* Gradient checkpointing: Enabled
* Gradient accumulation steps: 2
* Learning rate: 0.0002
* Adam beta2: 0.999
* Maximum gradient norm: 0.3
* LoRA dropout: 0.05
* Weight decay: 0.0
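For readers who want these settings in code form, the sketch below maps the hyperparameters onto the `peft`/`transformers` APIs. It is not the command we actually ran (training used artidoro's qlora script, whose flag names differ), and the data loading and prompt formatting are omitted.

```python
# Illustrative mapping of the hyperparameters above onto peft / transformers.
# This is a reference sketch, not the original qlora training invocation.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# "Bits: 4", "Quantization type: nf4", "Double quantization: Enabled", "BF16: Enabled"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# "LoRA rank: 64", "LoRA alpha: 16", "LoRA dropout: 0.05";
# "LoRA modules: all" means adapters on all linear projection layers
# (target_modules would be set accordingly).
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Remaining optimizer / schedule settings, including the 1000-step budget with
# evaluation every 50 steps and best-checkpoint selection by validation loss.
training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    max_grad_norm=0.3,
    weight_decay=0.0,
    adam_beta2=0.999,
    bf16=True,
    gradient_checkpointing=True,
    max_steps=1000,
    evaluation_strategy="steps",
    eval_steps=50,
    save_steps=50,
    load_best_model_at_end=True,
)
```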

