# About

This model is Lightblue's QLoRA fine-tune of OpenOrca's [Open-Orca/OpenOrcaxOpenChat-Preview2-13B](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B) model on Japanese fine-tuning datasets.

We trained on equal samples of the following three datasets:

* [SNOW](https://huggingface.co/datasets/snow_simplified_japanese_corpus)
* [TyDiQA (Ja)](https://huggingface.co/datasets/khalidalt/tydiqa-goldp)
* [XLSUM (Ja)](https://huggingface.co/datasets/csebuetnlp/xlsum)

which together gave a fine-tuning dataset of 13,167 samples in total (a sketch of this sampling follows below).

The first letters of these three datasets make up the model name: STX.
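
For illustration, equal sampling along these lines could be done with the `datasets` library. This is a minimal sketch, not our exact preprocessing: the config names (`snow_t15`, `japanese`) and the uniform random sampling are assumptions.

```python
import random
from datasets import load_dataset

random.seed(42)
n_per_source = 13167 // 3  # 4,389 samples from each of the three datasets

# NOTE: config names are assumptions for illustration.
snow = load_dataset("snow_simplified_japanese_corpus", "snow_t15", split="train")
tydiqa = load_dataset("khalidalt/tydiqa-goldp", "japanese", split="train")
xlsum = load_dataset("csebuetnlp/xlsum", "japanese", split="train")

def take_equal_sample(ds, n):
    """Select n rows from a datasets.Dataset uniformly at random."""
    idx = random.sample(range(len(ds)), n)
    return ds.select(idx)

subsets = [take_equal_sample(ds, n_per_source) for ds in (snow, tydiqa, xlsum)]
```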

# How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_dir = "lightblue/openorca_stx"  # this model's Hub ID, or a local path

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map='auto',
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)


def create_summarization_prompt(text_to_sum):
    # "記事" = "Article", "要約" = "Summary"
    return "記事：\n" + text_to_sum + "\n\n要約：\n"


def do_closed_qa(context, question):
    return context + "\n\n" + question


# Test input: a Japanese newspaper interview with the comedian Razor Ramon RG
# about his impression of the rugby player Michael Leitch (リーチ・マイケル).
test_article = """　モノマネのレパートリーに「リーチ・マイケル選手」があるレイザーラモンRGさん。本人公認のモノマネですが、ラグビーファンの反応に少し驚いたそうです。
　リーチ・マイケル選手のモノマネは、何がきっかけですか。
「2015年のワールドカップ（W杯）イングランド大会で日本が南アフリカを倒した次の日が、京都での番組ロケでした。当時は、アップルの共同創業者スティーブ・ジョブズのモノマネばかりでしたが、一緒にロケをしていたジャングルポケットから『リーチ・マイケルに似てますよ。ジョブズのまま、いけるんじゃないですか？』と言われたのが始まりです」
「ただ、みんな知識がない。ラグビーショップを探し、日本代表のユニホームが売り切れだったので、赤っぽいユニホームとピチピチの短パンをはいて。とりあえずSNSで『リーチ・マイケルです』っていっぱい写真を載せました」
「すると、それを見たリーチさん本人からDM（ダイレクトメッセージ）が届きました。『モノマネありがとうございます。もしモノマネをするなら、僕のユニホームを送りますので着てください』と。W杯後にユニホーム2着とパンツやソックスなどをほんまに送ってきてくれました。今着ているのがそれです」
これまで、数々の著名人をモノマネしてこられました。リーチ選手のネタの反響はいかがでしたか。
　「僕はラグビー経験がないですし、ラグビーを全然知らなかったけど、やっぱり本人からユニホームを頂いてるっていう“印籠（いんろう）”みたいなのがあって。『あいつはリーチさん本人に認められてる』と。一目置かれているのかなと感じます」
　「やっていることは、見た目を本人に寄せてワンチームって言うだけなんですけどね。それでも『ああ、リーチさんだ』と言ってもらえます」
　「リーチさんと実際に会うことなんて、簡単にはできないじゃないですか。でも、リーチさんのまねをしているRGには会えたわ、みたいな（笑）。何だろうな、有名な神社の支社のような存在ですかね。ありがたがられるという意味では他のモノマネとはすごく違いますね」
"""

# "What did Michael Leitch send?"
test_question = "　リーチ・マイケルは何を送ってきましたか？"

print(pipe(create_summarization_prompt(test_article), max_new_tokens=256, temperature=0)[0]["generated_text"])
print(pipe(do_closed_qa(test_article, test_question), max_new_tokens=128, temperature=0)[0]["generated_text"])
```
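
Note that with the pipeline's default `do_sample=False`, passing `temperature=0` amounts to greedy decoding; set `do_sample=True` and a positive temperature for more varied outputs.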

# Training details

This model was trained for 1,000 steps (1.2 epochs), with an evaluation every 50 steps; we then chose the checkpoint with the lowest validation loss.
We used the [qlora](https://github.com/artidoro/qlora) package from artidoro.
We trained with the following hyperparameters (a rough `peft`/`bitsandbytes` equivalent is sketched after the list):

* Per device evaluation batch size: 16
* Per device train batch size: 8
* LoRA rank (lora_r): 64
* LoRA alpha (lora_alpha): 16
* LoRA modules: all
* Double quantization: Enabled
* Quantization type: nf4
* BF16: Enabled
* Bits: 4
* Warmup ratio: 0.03
* Learning rate scheduler type: Constant
* Gradient checkpointing: Enabled
* Gradient accumulation steps: 2
* Learning rate: 0.0002
* Adam beta2: 0.999
* Maximum gradient norm: 0.3
* LoRA dropout: 0.05
* Weight decay: 0.0
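
For reference, the quantization and LoRA settings above correspond roughly to the following `peft`/`bitsandbytes` configuration. This is a sketch only, not the exact training code, which went through the qlora package:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization and bf16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Bits: 4
    bnb_4bit_use_double_quant=True,         # Double quantization: Enabled
    bnb_4bit_quant_type="nf4",              # Quantization type: nf4
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16: Enabled
)

# LoRA adapter settings; "LoRA modules: all" means adapters on all linear
# layers, whose exact names depend on the base model architecture.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```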

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/UWiE7z5tG8t_vdSFrb5WC.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/_fKBf9sdq9UAKKYMxM6ad.png)