Update README.md
````diff
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ tags:
 
 I’m excited to introduce my third-generation model:
 # Qwen2.5-14B-1M-YOYO-V3
-This time, I
+This time, I not only released the model but also shared some model merging insights that might be even more valuable than the model itself.
 
 Let’s start by looking at the initial merge configuration (YAML):
 ```yaml
@@ -29,9 +29,9 @@ models:
 - model: Qwen/Qwen2.5-14B-instruct-1M
 dtype: bfloat16
 ```
-
+Does this seem flawless? However, merged models occasionally exhibit **uncontrollable outputs**, likely due to significant discrepancies between instruction-tuned models and base models.
 
-To address this, I first
+To address this, I first attempted to directly integrate a fine-tuned model with smaller divergence from the base model, such as **Virtuoso-Small-v2**.
 
 This gave rise to [Qwen2.5-14B-YOYO-latest-V2](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-latest-V2).
 ```yaml
@@ -192,4 +192,4 @@ int8_mask: true
 normalize: true
 name: Qwen2.5-14B-1M-YOYO-V3
 ```
-
+I hope this helps!
````
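The updated README prefers a fine-tune "with smaller divergence from the base model" (Virtuoso-Small-v2) but does not say how divergence was judged. One common proxy, offered here purely as an assumption, is the relative L2 norm of the task vector: how far the fine-tuned weights moved from the base, relative to the size of the base weights. The sketch below computes that on toy tensors stored as flat Python lists. `relative_divergence` is a hypothetical helper, not part of any merging tool or of the author's pipeline, and the weights are invented for illustration.

```python
import math


def relative_divergence(base, tuned):
    """Return ||tuned - base|| / ||base|| over all parameters in `base`.

    Hypothetical helper. `base` and `tuned` map parameter names to flat
    lists of floats, a toy stand-in for real model tensors.
    """
    delta_sq = 0.0
    base_sq = 0.0
    for name, base_values in base.items():
        tuned_values = tuned[name]
        if len(tuned_values) != len(base_values):
            raise ValueError(f"shape mismatch for {name}")
        for b, t in zip(base_values, tuned_values):
            delta_sq += (t - b) ** 2  # squared task-vector entry
            base_sq += b ** 2         # squared base-weight entry
    return math.sqrt(delta_sq) / math.sqrt(base_sq)


# Toy weights, invented for illustration only.
base_weights = {"layer.weight": [1.0, 2.0, 2.0]}
light_finetune = {"layer.weight": [1.0, 2.0, 2.3]}
heavy_finetune = {"layer.weight": [2.0, 0.0, 4.0]}

print(round(relative_divergence(base_weights, light_finetune), 3))
print(round(relative_divergence(base_weights, heavy_finetune), 3))
```

Under this proxy the lightly tuned toy model scores about 0.1 and the heavily tuned one about 1.0. With real checkpoints you would iterate over tensors loaded from the model files instead of Python lists.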
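The final configuration sets `normalize: true` (and `int8_mask: true`). These keys match options of the mergekit toolkit. Assuming that is the tool in use, `normalize` rescales the per-model weights so the merged tensor becomes the base plus a weighted *average* of the task vectors instead of a weighted *sum*. The sketch below illustrates only that idea on toy lists. It deliberately omits sparsification, sign election, and `int8_mask` (a memory-saving option for the masks), so it is a simplified illustration, not mergekit's implementation. `merge_task_vectors` is a hypothetical name.

```python
def merge_task_vectors(base, finetuned, weights, normalize=True):
    """Simplified task-arithmetic merge of toy tensors (hypothetical helper).

    base:      flat list of floats (one toy tensor)
    finetuned: list of flat lists, one per fine-tuned model
    weights:   one float per fine-tuned model
    With normalize=True the weighted sum of task vectors is divided by the
    sum of the weights, i.e. it becomes a weighted average of the deltas.
    """
    if len(finetuned) != len(weights):
        raise ValueError("need one weight per fine-tuned model")
    divisor = sum(weights) if normalize else 1.0
    if abs(divisor) < 1e-8:
        divisor = 1.0  # avoid dividing by (almost) zero
    merged = []
    for i, b in enumerate(base):
        # Task-vector entry per model: fine-tuned value minus base value.
        weighted_delta = sum(w * (model[i] - b) for model, w in zip(finetuned, weights))
        merged.append(b + weighted_delta / divisor)
    return merged


base_tensor = [0.0, 1.0]
model_a = [1.0, 1.0]  # task vector [1, 0]
model_b = [0.0, 3.0]  # task vector [0, 2]

print(merge_task_vectors(base_tensor, [model_a, model_b], [1.0, 1.0]))
print(merge_task_vectors(base_tensor, [model_a, model_b], [1.0, 1.0], normalize=False))
```

With equal weights, normalization halves each model's contribution, so the normalized result lands between the two fine-tunes. Without normalization, both task vectors are added in full.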