Update README.md
README.md
CHANGED
@@ -45,7 +45,7 @@ name: Qwen2.5-14B-YOYO-latest-V2
 ```
 Although the uncontrollable output issue has been addressed, the model still lacks stability.

-Through practical experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the [DELLA](https://arxiv.org/abs/
+Through practical experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the [DELLA](https://arxiv.org/abs/2406.11617) method, then applying the [Model Stock](https://arxiv.org/abs/2403.19522) method, ultimately produces a model that is not only more stable but also achieves better performance.

 ## Key models used:
 *1. Low-divergence, high-performance models:*
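The two-stage order described in the changed paragraph can be sketched on toy weight vectors. This is only an illustration under stated assumptions, not the recipe used for this model. "Divergence" is taken here to mean L2 distance from the base weights, which the README does not specify. The `threshold` parameter and all function names are hypothetical. The first stage uses a plain scaled-delta merge as a stand-in for DELLA and omits DELLA's magnitude-based pruning and rescaling. The second stage uses the Model Stock interpolation ratio `t = N*cos / (1 + (N-1)*cos)` on whole vectors rather than per layer.

```python
import math
from itertools import combinations


def divergence(model, base):
    """L2 distance from the base weights (assumed metric, not from the README)."""
    return math.sqrt(sum((m - b) ** 2 for m, b in zip(model, base)))


def linear_delta_merge(target, donor, base, weight=0.5):
    """Simplified stand-in for DELLA: add a scaled donor delta to the target.

    Real DELLA also drops and rescales delta parameters by magnitude;
    that step is deliberately omitted in this sketch.
    """
    return [t + weight * (d - b) for t, d, b in zip(target, donor, base)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def model_stock(models, base):
    """Interpolate between the average of the models and the base.

    The ratio t grows with the mean pairwise cosine between the models'
    deltas from the base: agreeing models pull the result toward their
    average, disagreeing models pull it back toward the base.
    """
    if len(models) < 2:
        raise ValueError("model_stock needs at least two models")
    deltas = [[m - b for m, b in zip(model, base)] for model in models]
    pairs = list(combinations(deltas, 2))
    cos_theta = sum(cosine(a, b) for a, b in pairs) / len(pairs)
    n = len(models)
    t = n * cos_theta / (1 + (n - 1) * cos_theta)
    average = [sum(column) / n for column in zip(*models)]
    return [t * a + (1 - t) * b for a, b in zip(average, base)]


def two_stage_merge(models, base, threshold):
    """Stage 1: fold high-divergence models into each low-divergence model.
    Stage 2: combine the stage-1 results with model_stock."""
    low = [m for m in models if divergence(m, base) <= threshold]
    high = [m for m in models if divergence(m, base) > threshold]
    stage_one = []
    for target in low:
        merged = target
        for donor in high:
            merged = linear_delta_merge(merged, donor, base)
        stage_one.append(merged)
    return model_stock(stage_one, base)


# Toy example: two models close to the base and one far from it.
base = [0.0, 0.0]
candidates = [[1.0, 0.0], [0.8, 0.3], [0.0, 4.0]]
print(two_stage_merge(candidates, base, threshold=2.0))
```

For real checkpoints, a merge toolkit that applies both methods tensor by tensor would replace these toy functions; the sketch only shows the order of operations.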