Update README.md
README.md (changed)
````diff
@@ -28,7 +28,7 @@ models:
   - model: Qwen/Qwen2.5-14B-instruct-1M
 dtype: bfloat16
 ```
-Does
+Does it seem like there are no issues at all? However, merged models occasionally exhibit **uncontrollable outputs**, likely due to significant discrepancies between instruction-tuned models and base models.
 
 To address this, I first attempted to directly integrate a fine-tuned model with smaller divergence from the base model, such as **Virtuoso-Small-v2**.
 
@@ -43,7 +43,7 @@ models:
 dtype: bfloat16
 name: Qwen2.5-14B-YOYO-latest-V2
 ```
-
+Although the uncontrollable output issue has been addressed, the model still lacks stability.
 
 Through practical experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the [DELLA](https://arxiv.org/abs/2406.11617) method, then applying the [Model Stock](https://arxiv.org/abs/2403.19522) method, ultimately produces a model that is not only more stable but also achieves better performance.
````
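For a concrete picture of the two-stage recipe described in the updated README text, here is a minimal sketch of what the corresponding mergekit configurations could look like. It assumes mergekit's YAML schema for the `della` and `model_stock` merge methods; the base model, the placeholder id `some-org/high-divergence-finetune`, the intermediate output path, and all density/weight values are illustrative assumptions, not the actual Qwen2.5-14B-YOYO-latest-V2 recipe.

```yaml
# Stage 1 (illustrative): merge a high-divergence fine-tune into a
# low-divergence model (e.g. Virtuoso-Small-v2) with DELLA.
# "some-org/high-divergence-finetune" is a hypothetical placeholder id;
# density/weight are example values, not the author's settings.
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2          # low-divergence model used as the merge target
models:
  - model: some-org/high-divergence-finetune    # placeholder for a high-divergence model
    parameters:
      density: 0.5
      weight: 1.0
dtype: bfloat16
name: della-intermediate
```

```yaml
# Stage 2 (illustrative): combine the stage-1 outputs with Model Stock.
# "./della-intermediate" assumes stage 1 was written to a local directory,
# e.g. via `mergekit-yaml stage1.yml ./della-intermediate`.
merge_method: model_stock
base_model: Qwen/Qwen2.5-14B                    # assumed common base; adjust as needed
models:
  - model: ./della-intermediate                 # output of stage 1
  - model: ./another-della-intermediate         # placeholder for another stage-1 output
dtype: bfloat16
name: Qwen2.5-14B-YOYO-latest-V2
```

In practice, the first step would presumably be repeated once per high-divergence model, with each resulting intermediate fed into the final Model Stock merge.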