YOYO-AI committed on
Commit abf499d · verified · 1 Parent(s): 39bbc2f

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -28,7 +28,7 @@ models:
   - model: Qwen/Qwen2.5-14B-instruct-1M
   dtype: bfloat16
   ```
- Does this seem flawless? However, merged models occasionally exhibit **uncontrollable outputs**, likely due to significant discrepancies between instruction-tuned models and base models.
+ Does it seem like there are no issues at all? However, merged models occasionally exhibit **uncontrollable outputs**, likely due to significant discrepancies between instruction-tuned models and base models.
 
  To address this, I first attempted to directly integrate a fine-tuned model with smaller divergence from the base model, such as **Virtuoso-Small-v2**.
 
@@ -43,7 +43,7 @@ models:
   dtype: bfloat16
   name: Qwen2.5-14B-YOYO-latest-V2
   ```
- This reduced runaway outputs but still left the model unstable.
+ Although the uncontrollable output issue has been addressed, the model still lacks stability.
 
  Through practical experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the DELLA method, then applying the Model Stock method, ultimately produces a model that is not only more stable but also achieves better performance.
 
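
For readers unfamiliar with the two-stage recipe described in the last context paragraph, it might look roughly like the following pair of mergekit configs. This is a minimal sketch, not the author's actual configuration. It assumes that merging "into" a low-divergence model means using that model as `base_model` in the DELLA step. Every path (`path/to/...`, `./stage1-...`), every output name, and every parameter value is a hypothetical placeholder; only the `della` and `model_stock` method names and the model `Qwen/Qwen2.5-14B-instruct-1M` come from mergekit and the README text.

```yaml
# Stage 1 (sketch): merge a "high-divergence" model into a "low-divergence" one with DELLA.
# Assumption: the low-divergence model serves as base_model; all values are illustrative.
merge_method: della
base_model: path/to/low-divergence-model      # hypothetical placeholder
models:
  - model: Qwen/Qwen2.5-14B-instruct-1M       # treated here as the "high-divergence" model
    parameters:
      density: 0.6      # fraction of delta parameters kept (assumed value)
      epsilon: 0.1      # spread of drop probabilities around density (assumed value)
      lambda: 1.0       # scaling applied to the merged deltas (assumed value)
      weight: 1.0
dtype: bfloat16
```

```yaml
# Stage 2 (sketch): combine several stage-1 outputs with Model Stock.
# Assumption: stage 1 was run more than once, writing to the hypothetical paths below.
merge_method: model_stock
base_model: path/to/base-model                # hypothetical placeholder
models:
  - model: ./stage1-della-a                   # hypothetical stage-1 output
  - model: ./stage1-della-b                   # hypothetical stage-1 output
dtype: bfloat16
```

Each config would live in its own file and be run separately, with the second stage pointing at the directories the first stage wrote.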