YOYO-AI/Qwen2.5-14B-1M-YOYO-V3

YOYO-AI committed 18 days ago

Commit bb54e01 · verified · 1 Parent(s): 3f1dbd8

Update README.md

Files changed (1)

README.md +1 -1

@@ -45,7 +45,7 @@ name: Qwen2.5-14B-YOYO-latest-V2
 ```
 Although the uncontrollable output issue has been addressed, the model still lacks stability.
-Through practical experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the  [DELLA](https://arxiv.org/abs/2403.19522)  method, then applying the  [Model Stock](https://arxiv.org/abs/2406.11617)  method, ultimately produces a model that is not only more stable but also achieves better performance.
+Through practical experimentation, I found that first merging **"high-divergence"** models (significantly different from the base) into **"low-divergence"** models (closer to the base) using the  [DELLA](https://arxiv.org/abs/2406.11617)  method, then applying the  [Model Stock](https://arxiv.org/abs/2403.19522)  method, ultimately produces a model that is not only more stable but also achieves better performance.
 ## Key models used:
 *1. Low-divergence, high-performance models:*