YOYO-AI committed on
Commit a346c0b (verified) · 1 parent: 7157849

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CfIE4_oZgpNsNZyurjO7D.png)
  I’m excited to introduce my third-generation model:
  # Qwen2.5-14B-1M-YOYO-V3
- This time, I’m not only releasing the model but also sharing some model merging techniques, which might be even more valuable than the model itself.
+ This time, I not only released the model but also shared some model merging insights that might be even more valuable than the model itself.
 
  Let’s start by looking at the initial merge configuration (YAML):
  ```yaml
@@ -29,9 +29,9 @@ models:
  - model: Qwen/Qwen2.5-14B-instruct-1M
  dtype: bfloat16
  ```
- Seems straightforward, right? But the merged model occasionally suffered from **uncontrollable outputs**, likely due to the large divergence between the instruction-tuned models and the base model.
+ Does this seem flawless? However, merged models occasionally exhibit **uncontrollable outputs**, likely due to significant discrepancies between instruction-tuned models and base models.
 
- To address this, I first tried integrating a fine-tuned model with smaller divergence from the base model, like **Virtuoso-Small-v2**.
+ To address this, I first attempted to directly integrate a fine-tuned model with smaller divergence from the base model, such as **Virtuoso-Small-v2**.
 
  This gave rise to [Qwen2.5-14B-YOYO-latest-V2](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-latest-V2).
  ```yaml
@@ -192,4 +192,4 @@ int8_mask: true
  normalize: true
  name: Qwen2.5-14B-1M-YOYO-V3
  ```
- Feel free to adapt these strategies for your own merging experiments! 🚀
+ I hope this helps!
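The README text in this diff attributes the uncontrollable outputs to divergence between the instruction-tuned models and the base model, and the final config ends with `normalize: true`. The sketch below illustrates both ideas under stated assumptions. It is not the author's merge method, which these hunks do not show. It uses a crude divergence proxy (the norm of the fine-tuned minus base difference, relative to the base norm) and a weighted task-vector merge in which `normalize=True` divides by the summed weights. This mirrors how mergekit's documentation describes `normalize` for its `linear` method. The function names, the `Params` type, and the dict-of-lists representation are hypothetical; real checkpoints would hold tensors.

```python
"""Hypothetical sketch: divergence proxy and normalized task-vector merge.

Models are plain dicts mapping parameter names to flat lists of floats
(an assumption for illustration; real checkpoints hold tensors).
"""
from math import sqrt

Params = dict[str, list[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Per-parameter difference between a fine-tuned model and its base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def relative_divergence(finetuned: Params, base: Params) -> float:
    """Rough divergence proxy: ||finetuned - base|| / ||base|| over all params."""
    tv = task_vector(finetuned, base)
    num = sqrt(sum(x * x for vals in tv.values() for x in vals))
    den = sqrt(sum(x * x for vals in base.values() for x in vals))
    return num / den


def merge(base: Params, models: list[Params], weights: list[float],
          normalize: bool = True) -> Params:
    """Return base + sum_i w_i * (model_i - base).

    With normalize=True the weights are divided by their sum, so they
    effectively sum to 1 (assumed semantics of `normalize: true`).
    """
    total = sum(weights) if normalize else 1.0
    merged: Params = {}
    for k, base_vals in base.items():
        out = list(base_vals)
        for m, w in zip(models, weights):
            for j, (mv, bv) in enumerate(zip(m[k], base_vals)):
                out[j] += (w / total) * (mv - bv)
        merged[k] = out
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 1.0]}
    close = {"w": [1.1, 0.9]}   # toy model with small divergence from base
    far = {"w": [3.0, -1.0]}    # toy model with large divergence from base
    print(relative_divergence(close, base), relative_divergence(far, base))
    print(merge(base, [close, far], [1.0, 1.0]))
```

In this toy example the far model's task vector is twenty times larger than the close one's, so with equal weights it dominates the merged parameters. This is one plausible reading of why adding a lower-divergence model was tried first.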