---
license: apache-2.0
datasets:
- hiyouga/glaive-function-calling-v2-sharegpt
language:
- en
library_name: transformers
tags:
- llama-factory
- unsloth
base_model: h2oai/h2o-danube2-1.8b-base
---

# h2o-danube2 with ChatML template

This is a danube2 base model fine-tuned with [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") and [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models"). It uses the ChatML template and was trained on the [glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) dataset from [GlaiveAI](https://huggingface.co/glaiveai), converted to [ShareGPT format](https://huggingface.co/datasets/hiyouga/glaive-function-calling-v2-sharegpt) by [hiyouga](https://huggingface.co/hiyouga) of [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) fame.

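Assuming the tokenizer ships the ChatML chat template shown below, the model can be queried with the standard `transformers` chat flow. A minimal sketch; the repository id is a placeholder for this model's actual Hub path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/danube2-chatml-function-calling"  # placeholder, replace with the real Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write one sentence about Danube river cruises."},
]

# apply_chat_template renders the ChatML turns shown below and appends the assistant header
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
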
## Template

### ChatML
```jinja2
<|im_start|>system
{{system}}
<tools>
{{json_format_tools}}
</tools>
<|im_end|>
<|im_start|>user
{{instruction}}<|im_end|>
<|im_start|>assistant
<tool_call>
{{tool_call}}
</tool_call><|im_end|>
<|im_start|>tool
<tool_response>
{{response}}
</tool_response><|im_end|>
```

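For function calling, the tool schemas go inside the `<tools>` block of the system turn. A hand-assembled prompt following the template above; the weather function and system text are illustrative placeholders, not excerpts from the training data:

```python
import json

# Illustrative tool schema in the glaive-function-calling-v2 style
tools = [{
    "name": "get_current_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "City name"}},
        "required": ["location"],
    },
}]

system = "You are a helpful assistant with access to the following functions. Use them if required."

prompt = (
    f"<|im_start|>system\n{system}\n<tools>\n{json.dumps(tools, indent=2)}\n</tools>\n<|im_end|>\n"
    "<|im_start|>user\nWhat is the weather like in Vienna?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(prompt)  # the model is expected to reply with a <tool_call>...</tool_call> block
```
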
### LLaMA-Factory
```python
# Registered in LLaMA-Factory's data/template.py, where _register_template
# and the formatter classes are already in scope.
_register_template(
    name="hermes_chatml",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n"]),
    format_assistant=StringFormatter(slots=["{{content}}<|im_end|>\n"]),
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}<|im_end|>\n"]),
    format_function=FunctionFormatter(slots=["<tool_call>\n{\"name\":\"{{name}}\", \"arguments\":{{arguments}}}\n</tool_call><|im_end|>\n"]),
    format_observation=StringFormatter(slots=["<|im_start|>tool\n<tool_response>\n{{content}}\n</tool_response><|im_end|>\n<|im_start|>assistant\n"]),
    format_tools=ToolFormatter(tool_format="chatml"),
    stop_words=["<|im_end|>"],
)
```

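Given the `format_function` slot above, the model emits each call as a JSON object wrapped in `<tool_call>` tags, so the client only needs to pull that JSON back out. A small, illustrative helper (not part of LLaMA-Factory):

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Pull JSON payloads out of <tool_call>...</tool_call> spans in a generation."""
    calls = []
    for payload in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, flags=re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            pass  # malformed call; skip or log as needed
    return calls

example = '<tool_call>\n{"name": "get_current_weather", "arguments": {"location": "Vienna"}}\n</tool_call><|im_end|>'
print(extract_tool_calls(example))
# [{'name': 'get_current_weather', 'arguments': {'location': 'Vienna'}}]
```
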
## BAdam config

```yaml
### model
model_name_or_path: danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 5
seed: 404

### dataset
dataset: glaive_toolcall_100k
template: hermes_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: glaive-tool-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.000005
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```

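The `badam_*` keys configure LLaMA-Factory's block-wise schedule: only one block of decoder layers is trainable at a time, and the active block rotates every `badam_switch_interval` optimizer steps in the order given by `badam_switch_mode`, starting from `badam_start_block`. A rough sketch of that schedule under these settings (not the actual BAdam implementation):

```python
def active_block(step: int, num_blocks: int, start_block: int = 5,
                 switch_interval: int = 50, mode: str = "ascending") -> int:
    """Index of the decoder block that would be trainable at a given optimizer step."""
    switches = step // switch_interval  # how many block switches have happened so far
    if mode == "ascending":
        return (start_block + switches) % num_blocks
    if mode == "descending":
        return (start_block - switches) % num_blocks
    raise ValueError(f"unhandled mode: {mode}")

# e.g. with 24 decoder blocks: steps 0-49 train block 5, 50-99 block 6, 100-149 block 7, ...
for step in (0, 49, 50, 100, 149):
    print(step, active_block(step, num_blocks=24))
```
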
### BAdam Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3914        | 0.1607 | 1000 | 0.2984          |
| 0.3256        | 0.3214 | 2000 | 0.2819          |
| 0.4131        | 0.4821 | 3000 | 0.2765          |
| 0.3922        | 0.6428 | 4000 | 0.2736          |
| 0.3528        | 0.8036 | 5000 | 0.2724          |
| 0.3477        | 0.9643 | 6000 | 0.2724          |