YOYO-AI/Qwen2.5-14B-1M-YOYO-V3

Text Generation


YOYO-AI committed 18 days ago

Commit a346c0b (verified) · 1 parent: 7157849

Update README.md

Files changed (1)

README.md +4 -4

README.md CHANGED

@@ -18,7 +18,7 @@ tags:
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CfIE4_oZgpNsNZyurjO7D.png)
 I’m excited to introduce my third-generation model:
 # Qwen2.5-14B-1M-YOYO-V3
-This time, I’m not only releasing the model but also sharing some model merging techniques, which might be even more valuable than the model itself.
+This time, I not only released the model but also shared some model merging insights that might be even more valuable than the model itself.
 Let’s start by looking at the initial merge configuration (YAML):
 ```yaml
@@ -29,9 +29,9 @@ models:
   - model: Qwen/Qwen2.5-14B-instruct-1M
 dtype: bfloat16
 ```
-Seems straightforward, right? But the merged model occasionally suffered from **uncontrollable outputs**, likely due to the large divergence between the instruction-tuned models and the base model.
-To address this, I first tried integrating a fine-tuned model with smaller divergence from the base model, like **Virtuoso-Small-v2**.
+Does this seem flawless? However, merged models occasionally exhibit **uncontrollable outputs**, likely due to significant discrepancies between instruction-tuned models and base models.
+To address this, I first attempted to directly integrate a fine-tuned model with smaller divergence from the base model, such as **Virtuoso-Small-v2**.
 This gave rise to [Qwen2.5-14B-YOYO-latest-V2](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-latest-V2).
 ```yaml
@@ -192,4 +192,4 @@ int8_mask: true
 normalize: true
 name: Qwen2.5-14B-1M-YOYO-V3
 ```
-Feel free to adapt these strategies for your own merging experiments! 🚀
+I hope this helps!
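The edited README paragraph attributes the uncontrollable outputs to large divergence between instruction-tuned models and the base model, and responds by picking a fine-tune with smaller divergence. The README does not say how divergence was judged. The sketch below assumes one simple proxy, the relative L2 norm of the task vector, ||tuned - base|| / ||base||, and uses it to rank candidate fine-tunes. The plain-dict weight format, the function names, and the candidate names are hypothetical placeholders, not part of the author's workflow or of any merging tool's API.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical checkpoint format for this sketch: tensor name -> flat list of
# floats. Real checkpoints would be loaded with a tensor library instead.
StateDict = Dict[str, List[float]]


def relative_divergence(base: StateDict, tuned: StateDict) -> float:
    """Return ||tuned - base|| / ||base|| over all tensors the two share.

    This is an assumed divergence proxy (relative task-vector norm), not a
    metric stated in the README.
    """
    diff_sq = 0.0
    base_sq = 0.0
    for name, base_values in base.items():
        if name not in tuned:
            continue  # ignore tensors missing from the fine-tuned checkpoint
        tuned_values = tuned[name]
        if len(tuned_values) != len(base_values):
            raise ValueError(f"size mismatch for tensor {name!r}")
        diff_sq += sum((t - b) ** 2 for t, b in zip(tuned_values, base_values))
        base_sq += sum(b * b for b in base_values)
    if base_sq == 0.0:
        raise ValueError("base weights have zero norm over the shared tensors")
    return math.sqrt(diff_sq) / math.sqrt(base_sq)


def rank_by_divergence(
    base: StateDict, candidates: Dict[str, StateDict]
) -> List[Tuple[str, float]]:
    """Sort candidate fine-tunes from smallest to largest divergence."""
    scores = [(name, relative_divergence(base, sd)) for name, sd in candidates.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy weights standing in for a base model and two hypothetical fine-tunes.
    base = {"w": [1.0, 0.0], "b": [0.0, 1.0]}
    candidates = {
        "far_finetune": {"w": [2.0, 1.0], "b": [1.0, 0.0]},
        "close_finetune": {"w": [1.1, 0.0], "b": [0.0, 1.0]},
    }
    for name, score in rank_by_divergence(base, candidates):
        print(f"{name}: {score:.4f}")
```

With these toy weights, `close_finetune` is listed before `far_finetune`. A per-tensor or per-layer breakdown may be more informative for real 14B checkpoints, since a single global ratio can hide a few heavily shifted layers.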