Commit 4929c37 by YOYO-AI · verified · 1 Parent(s): 19f66c9

Update README.md

Files changed (1)
  1. README.md +177 -1
README.md CHANGED
@@ -16,4 +16,180 @@ tags:
  ---
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CfIE4_oZgpNsNZyurjO7D.png)
- # Qwen2.5-14B-1M-YOYO-V3
+ I’m excited to introduce my third-generation model:
+ # Qwen2.5-14B-1M-YOYO-V3
+ This time, I’m not only releasing the model but also sharing some model merging techniques, which might be even more valuable than the model itself.
+
+ Let’s start by looking at the initial merge configuration (YAML):
+ ```yaml
+ merge_method: model_stock
+ base_model: Qwen/Qwen2.5-14B
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+ dtype: bfloat16
+ ```
+ Seems straightforward, right? But the merged model occasionally produced **uncontrollable outputs**, likely because the instruction-tuned models diverge substantially from the base model.
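To make the role of divergence in `model_stock` more concrete, here is a minimal sketch of the interpolation ratio described in the Model Stock paper, written on toy flat weight vectors. It assumes the ratio `t = k*cos / (1 + (k-1)*cos)` is applied once to the whole vector; real implementations work per layer on tensors and may differ in details, and the helper names `cosine` and `model_stock_merge` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flat weight-delta vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def model_stock_merge(base, tuned_models):
    """Toy Model Stock merge: returns (t, merged_weights).

    Assumes at least two fine-tuned models, each a flat list of floats
    with the same length as `base`.
    """
    k = len(tuned_models)
    # Task vectors: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in tuned_models]
    # Average pairwise cosine between the task vectors.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    cos_theta = sum(cosine(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)
    # Interpolation ratio between the fine-tuned average and the base.
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    average = [sum(model[i] for model in tuned_models) / k for i in range(len(base))]
    merged = [t * a + (1 - t) * b for a, b in zip(average, base)]
    return t, merged


# Two fine-tunes that moved in the same direction keep their average (t = 1).
t_same, merged_same = model_stock_merge([0.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
# Two fine-tunes that moved in orthogonal directions fall back to the base (t = 0).
t_orth, merged_orth = model_stock_merge([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setting, the less the fine-tuned models agree on the direction they moved from the base, the more the merged weights are pulled back toward the base.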
+
+ To address this, I first tried adding a fine-tuned model that diverges less from the base model, such as **Virtuoso-Small-v2**.
+
+ This gave rise to [Qwen2.5-14B-YOYO-latest-V2](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-latest-V2).
+ ```yaml
+ merge_method: model_stock
+ base_model: Qwen/Qwen2.5-14B
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+   - model: arcee-ai/Virtuoso-Small-v2
+ dtype: bfloat16
+ name: Qwen2.5-14B-YOYO-latest-V2
+ ```
+ This reduced runaway outputs but still left the model unstable.
+
+ Through experimentation, I found that merging **"high-divergence"** models into **"low-divergence"** models (those close to the base) using the `della` method produced more stable, better-performing results.
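The README does not define how divergence is measured. One possible proxy, shown here purely as an assumption, is the norm of the weight difference between a fine-tuned model and the base, relative to the norm of the base weights. The function name `relative_divergence` and the toy vectors are hypothetical; a real check would iterate over the tensors of two checkpoints.

```python
import math


def relative_divergence(base, tuned):
    """L2 norm of (tuned - base) divided by the L2 norm of base.

    Both arguments are flat lists of floats of equal length; this is an
    illustrative proxy, not a metric taken from the README.
    """
    delta_norm = math.sqrt(sum((t - b) ** 2 for t, b in zip(tuned, base)))
    base_norm = math.sqrt(sum(b * b for b in base))
    return delta_norm / base_norm


# Toy weights standing in for a base model and two fine-tunes.
base_weights = [1.0, -2.0, 0.5]
light_finetune = [1.1, -2.0, 0.5]   # stays close to the base
heavy_finetune = [2.0, -0.5, 1.5]   # moves far from the base

print("light:", relative_divergence(base_weights, light_finetune))
print("heavy:", relative_divergence(base_weights, heavy_finetune))
```

Under this proxy, a "low-divergence" model is one whose score is small compared with the instruction-tuned models.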
+
+ ## Key models used:
+ *1. Low-divergence, high-performance models:*
+
+ - Virtuoso-Small-v2
+ - Blossom-V6-14B
+
+ *2. High-divergence, instruction-focused models:*
+
+ - Qwen2.5-14B-Instruct
+ - Qwen2.5-14B-Instruct-1M
+
+ ## DELLA Merge Configuration:
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: arcee-ai/Virtuoso-Small-v2
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della1
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: arcee-ai/Virtuoso-Small-v2
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della2
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Azure99/Blossom-V6-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della3
+ ```
+ ```yaml
+ models:
+   - model: Qwen/Qwen2.5-14B-Instruct-1M
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Azure99/Blossom-V6-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: Qwen2.5-14B-YOYO-della4
+ ```
+ This approach yielded four variants:
+ - `Qwen2.5-14B-YOYO-della1`
+ - `Qwen2.5-14B-YOYO-della2`
+ - `Qwen2.5-14B-YOYO-della3`
+ - `Qwen2.5-14B-YOYO-della4`
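The four variants are the cross product of the two low-divergence bases and the two high-divergence instruct models. As a rough sketch, the configurations could be generated rather than written by hand. The function `build_della_configs` and the constant names are hypothetical, and the dictionaries only mirror the YAML structure shown in this README; writing them to disk as YAML is left out.

```python
from itertools import product

# Low-divergence models act as the merge base.
LOW_DIVERGENCE_BASES = [
    "arcee-ai/Virtuoso-Small-v2",
    "Azure99/Blossom-V6-14B",
]
# High-divergence instruct models are merged into each base.
HIGH_DIVERGENCE_MODELS = [
    "Qwen/Qwen2.5-14B-Instruct",
    "Qwen/Qwen2.5-14B-Instruct-1M",
]


def build_della_configs():
    """Return one config dict per (base, instruct model) pair, numbered from 1."""
    configs = []
    pairs = product(LOW_DIVERGENCE_BASES, HIGH_DIVERGENCE_MODELS)
    for index, (base, donor) in enumerate(pairs, start=1):
        configs.append({
            "models": [{
                "model": donor,
                "parameters": {"density": 1, "weight": 1, "lambda": 0.9},
            }],
            "merge_method": "della",
            "base_model": base,
            "parameters": {
                "density": 1,
                "weight": 1,
                "lambda": 0.9,
                "normalize": True,
                "int8_mask": True,
            },
            "dtype": "bfloat16",
            "tokenizer_source": "base",
            "name": f"Qwen2.5-14B-YOYO-della{index}",
        })
    return configs


for cfg in build_della_configs():
    print(cfg["name"], ":", cfg["models"][0]["model"], "->", cfg["base_model"])
```

Generating the configs this way keeps the shared parameters in one place, so a change to `lambda` or `density` applies to all four variants at once.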
+
+ ## Base Model:
+ To enhance the base model’s roleplay and creative writing capabilities, I applied the same strategy:
+ ```yaml
+ models:
+   - model: EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
+     parameters:
+       density: 1
+       weight: 1
+       lambda: 0.9
+ merge_method: della
+ base_model: Qwen/Qwen2.5-14B
+ parameters:
+   density: 1
+   weight: 1
+   lambda: 0.9
+   normalize: true
+   int8_mask: true
+ dtype: bfloat16
+ tokenizer_source: base
+ name: EVA-Qwen2.5-14B-base
+ ```
+ Next, I extended the context length using the SCE method:
+ ```yaml
+ merge_method: sce
+ models:
+   - model: EVA-Qwen2.5-14B-base
+ base_model: Qwen/Qwen2.5-14B-Instruct-1M
+ parameters:
+   select_topk: 1
+ dtype: bfloat16
+ tokenizer_source: base
+ normalize: true
+ int8_mask: true
+ name: Qwen2.5-14B-pro
+ ```
+ ## Final Merge Step:
+ ```yaml
+ merge_method: model_stock
+ base_model: Qwen2.5-14B-pro
+ models:
+   - model: Qwen2.5-14B-YOYO-della1
+   - model: Qwen2.5-14B-YOYO-della2
+   - model: Qwen2.5-14B-YOYO-della3
+   - model: Qwen2.5-14B-YOYO-della4
+ dtype: bfloat16
+ tokenizer_source: base
+ int8_mask: true
+ normalize: true
+ name: Qwen2.5-14B-1M-YOYO-V3
+ ```
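Because the final merge consumes the outputs of the earlier steps, the intermediate models have to be built in dependency order. The sketch below records each merge and its inputs as described in this README and derives a valid build order with the standard-library `graphlib` module (Python 3.9+). The `MERGE_INPUTS` mapping and the `merge_order` helper are hypothetical names used only for illustration.

```python
from graphlib import TopologicalSorter

# Each locally built model maps to the models it is merged from.
MERGE_INPUTS = {
    "Qwen2.5-14B-YOYO-della1": {"arcee-ai/Virtuoso-Small-v2", "Qwen/Qwen2.5-14B-Instruct"},
    "Qwen2.5-14B-YOYO-della2": {"arcee-ai/Virtuoso-Small-v2", "Qwen/Qwen2.5-14B-Instruct-1M"},
    "Qwen2.5-14B-YOYO-della3": {"Azure99/Blossom-V6-14B", "Qwen/Qwen2.5-14B-Instruct"},
    "Qwen2.5-14B-YOYO-della4": {"Azure99/Blossom-V6-14B", "Qwen/Qwen2.5-14B-Instruct-1M"},
    "EVA-Qwen2.5-14B-base": {"EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2", "Qwen/Qwen2.5-14B"},
    "Qwen2.5-14B-pro": {"EVA-Qwen2.5-14B-base", "Qwen/Qwen2.5-14B-Instruct-1M"},
    "Qwen2.5-14B-1M-YOYO-V3": {
        "Qwen2.5-14B-pro",
        "Qwen2.5-14B-YOYO-della1",
        "Qwen2.5-14B-YOYO-della2",
        "Qwen2.5-14B-YOYO-della3",
        "Qwen2.5-14B-YOYO-della4",
    },
}


def merge_order(graph):
    """Return the locally built models in an order that respects dependencies."""
    ordered = TopologicalSorter(graph).static_order()
    # Upstream checkpoints are downloaded, not built, so keep only merge outputs.
    return [name for name in ordered if name in graph]


for step, name in enumerate(merge_order(MERGE_INPUTS), start=1):
    print(step, name)
```

The four DELLA variants and `EVA-Qwen2.5-14B-base` have no local dependencies, so they can be built in any order; only `Qwen2.5-14B-pro` and the final model need to wait for their inputs.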
+ Feel free to adapt these strategies for your own merging experiments! 🚀