---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-14B
- Qwen/Qwen2.5-14B-Instruct
- Qwen/Qwen2.5-14B-Instruct-1M
- Qwen/Qwen2.5-Coder-14B
- Qwen/Qwen2.5-Coder-14B-Instruct
- Azure99/Blossom-V6-14B
- arcee-ai/SuperNova-Medius
- arcee-ai/Virtuoso-Small-v2
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
pipeline_tag: text-generation
tags:
- merge
model-index:
- name: ZYH-LLM-Qwen2.5-14B-V4
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 83.65
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 50.27
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 53.93
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 15.66
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 46.71
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4
      name: Open LLM Leaderboard
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/CpkVlkXWV0_9Qnz0nDIP4.jpeg)
# ZYH-LLM-Qwen2.5-14B-V4
*The fourth-generation model of ZYH-LLM-Qwen2.5 has been released!*

*This version increases the share of **R1-distilled models** in the merge recipe while preserving the model's **instruction-following ability** and **general capabilities**.*

## Merge Template

```yaml
merge_method: model_stock  
base_model: Instruction Model  
models:  
  - model: Instruction Fine-tuned Model 1  
  - model: Instruction Fine-tuned Model 2  
  - model: Reasoning Fine-tuned Model 1  
  - model: Reasoning Fine-tuned Model 2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
```
Merging with this template can improve the model's **calculation accuracy** and **reasoning ability** without reducing the **general capabilities** of the instruction model.

**ZYH-LLM-Qwen2.5-14B-V4** uses this template in its merge process.
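For intuition about what `model_stock` does with this template, here is a minimal Python sketch of the Model Stock interpolation rule as we understand it: compute the average pairwise cosine similarity between the task vectors (fine-tuned weights minus base weights), turn it into a ratio `t = k*cos_theta / (1 + (k - 1)*cos_theta)`, and interpolate between the average of the fine-tuned weights and the base. The sketch treats each model as one flat list of floats, whereas mergekit applies the rule separately to each weight tensor, so this is an illustration rather than mergekit's code. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two flat vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def model_stock(base, finetuned):
    """Simplified Model Stock merge of flat weight vectors (needs >= 2 models).

    Returns the merged weights and the interpolation ratio t.
    """
    k = len(finetuned)
    # Task vectors: how far each fine-tuned model moved from the base.
    deltas = [[w - w0 for w, w0 in zip(m, base)] for m in finetuned]
    # Average pairwise cosine similarity between task vectors.
    sims = [cosine(deltas[i], deltas[j]) for i in range(k) for j in range(i + 1, k)]
    cos_theta = sum(sims) / len(sims)
    # Interpolation ratio: t = k*cos / (1 + (k - 1)*cos).
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    # Interpolate between the fine-tuned average and the base weights.
    avg = [sum(col) / k for col in zip(*finetuned)]
    merged = [t * a + (1 - t) * w0 for a, w0 in zip(avg, base)]
    return merged, t


# Orthogonal task vectors give t = 0, so the merge falls back to the base.
print(model_stock([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # ([0.0, 0.0], 0.0)
```

When the fine-tuned deltas point in unrelated directions (cosine near 0), `t` shrinks toward 0 and the result stays close to the base instruction model; when they agree, it moves toward their average. This is one plausible reading of why the template preserves the instruction model's general capabilities.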

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/YOYO-AI__ZYH-LLM-Qwen2.5-14B-V4-details).

|      Metric       |Value|
|-------------------|----:|
|Avg.               |43.14|
|IFEval (0-Shot)    |83.65|
|BBH (3-Shot)       |50.27|
|MATH Lvl 5 (4-Shot)|53.93|
|GPQA (0-shot)      | 8.61|
|MuSR (0-shot)      |15.66|
|MMLU-PRO (5-shot)  |46.71|
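The Avg. row matches the unweighted mean of the six benchmark scores, which is easy to check:

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 83.65,
    "BBH (3-Shot)": 50.27,
    "MATH Lvl 5 (4-Shot)": 53.93,
    "GPQA (0-shot)": 8.61,
    "MuSR (0-shot)": 15.66,
    "MMLU-PRO (5-shot)": 46.71,
}

# Unweighted mean over the six benchmarks.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # Avg. = 43.14
```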

## First stage:
*Create four different instruction models and one code model.*
```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-base
```
```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/Virtuoso-Small-v2  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-v2
```
```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: arcee-ai/SuperNova-Medius  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-Nova
```
```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Azure99/Blossom-V6-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-V6
```
```yaml
models:  
  - model: Qwen/Qwen2.5-Coder-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen/Qwen2.5-Coder-14B  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-Coder-14B-della
```
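All five first-stage merges use `della`. The following is a simplified Python sketch of the steps DELLA combines, as we understand them: prune each task vector stochastically with keep probabilities that rise with magnitude rank (spread over `density ± epsilon`), rescale the survivors by `1 / p_keep`, elect a sign per element as in TIES, average the deltas that agree with it (dividing by their summed weights, which is what `normalize: true` suggests), and scale the result by `lambda`. `della_merge` and its `epsilon=0.0` default are hypothetical; mergekit's implementation works per tensor and may differ in details.

```python
import random


def della_merge(base, finetuned, weights, density=1.0, epsilon=0.0, lam=1.0, seed=0):
    """Simplified DELLA-style merge of flat weight vectors (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(base)
    pruned = []
    for model in finetuned:
        delta = [w - w0 for w, w0 in zip(model, base)]
        # Rank elements by magnitude: rank 0 is the smallest |delta|.
        order = sorted(range(n), key=lambda i: abs(delta[i]))
        kept = [0.0] * n
        for rank, i in enumerate(order):
            # Keep probability rises linearly with rank across density +/- epsilon.
            frac = rank / (n - 1) if n > 1 else 0.5
            p_keep = min(1.0, max(0.0, density - epsilon + 2 * epsilon * frac))
            if p_keep > 0 and rng.random() < p_keep:
                kept[i] = delta[i] / p_keep  # rescale survivors to preserve expectation
        pruned.append(kept)

    merged = []
    for i in range(n):
        # Elect a sign per element from the weighted sum of pruned deltas.
        sign = 1.0 if sum(w * d[i] for w, d in zip(weights, pruned)) >= 0 else -1.0
        num = den = 0.0
        for w, d in zip(weights, pruned):
            if d[i] * sign > 0:  # keep only non-zero entries that agree with the sign
                num += w * d[i]
                den += w
        # Normalized average of agreeing deltas, scaled by lambda.
        merged.append(base[i] + lam * (num / den if den else 0.0))
    return merged


print(della_merge([0.0, 0.0, 0.0], [[1.0, -1.0, 2.0], [3.0, 1.0, 2.0]], [1.0, 1.0], lam=0.9))
# [1.8, 0.9, 1.8]
```

With `density=1.0` and `epsilon=0.0`, mirroring `density: 1` in the configs above, the sketch prunes nothing, so each merge reduces to sign election, a weighted average of the agreeing deltas, and scaling by `lambda` (0.9 above).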
## Second stage:

### Step 1:
*Use the merge template to create three instruction models biased towards reasoning.*
```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-Coder-14B-della  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Coder
```
```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-V6  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-V6
```
```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B  
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-Nova
```
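The three Step 1 configs differ only in the first model slot and the output name: each fills the template's two instruction slots with a first-stage model plus `Qwen2.5-14B-della-v2`, and its two reasoning slots with the R1 distills. A hypothetical Python helper that fills this template makes the shared structure explicit; it builds plain dicts mirroring the YAML above and prints them as JSON for inspection.

```python
import json


def step1_config(variant, name):
    """Fill the Step 1 model_stock template (hypothetical helper)."""
    return {
        "merge_method": "model_stock",
        "base_model": "Qwen2.5-14B-della-base",
        "models": [
            {"model": variant},  # the only slot that changes between configs
            {"model": "Qwen2.5-14B-della-v2"},
            {"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"},
            {"model": "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2"},
        ],
        "dtype": "bfloat16",
        "tokenizer_source": "base",
        "int8_mask": True,
        "normalize": True,
        "name": name,
    }


STEP1 = [
    step1_config("Qwen2.5-Coder-14B-della", "Qwen2.5-14B-mst-Coder"),
    step1_config("Qwen2.5-14B-della-V6", "Qwen2.5-14B-mst-V6"),
    step1_config("Qwen2.5-14B-della-Nova", "Qwen2.5-14B-mst-Nova"),
]

for cfg in STEP1:
    print(json.dumps(cfg, indent=2))
```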
### Step 2:
*Create a pure instruction model to restore the generality of the final model.*
```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-base  
models:  
  - model: Qwen2.5-14B-della-Nova  
  - model: Qwen2.5-14B-della-v2  
  - model: Qwen2.5-14B-della-V6   
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: Qwen2.5-14B-mst-it
```
## Third stage:
*Create a base model with a 1-million-token context length.*
```yaml
merge_method: sce  
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models  
  - model: Qwen/Qwen2.5-14B  
base_model: Qwen/Qwen2.5-14B-Instruct-1M  
parameters:  
  select_topk: 1  
dtype: bfloat16  
tokenizer_source: base  
normalize: true  
int8_mask: true  
name: Qwen2.5-14B-1M
```
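The first config above uses `sce`. Below is a simplified Python sketch of the Select, Calculate, Erase procedure as we understand it: keep the `select_topk` fraction of elements whose deltas vary most across models, weight each model by the squared magnitude of its selected delta, drop entries whose sign disagrees with the sign of the summed deltas, and average what remains. `sce_merge` is hypothetical and operates on flat lists; mergekit's `sce` method works per tensor and may differ in details such as how signs are elected.

```python
def sce_merge(base, finetuned, select_topk=1.0):
    """Simplified SCE (Select, Calculate, Erase) merge of flat weight vectors."""
    n, k = len(base), len(finetuned)
    deltas = [[w - w0 for w, w0 in zip(m, base)] for m in finetuned]

    # Select: keep the top fraction of elements by variance across models.
    def variance(i):
        col = [d[i] for d in deltas]
        mean = sum(col) / k
        return sum((x - mean) ** 2 for x in col) / k

    n_keep = max(1, int(select_topk * n))
    keep = set(sorted(range(n), key=variance, reverse=True)[:n_keep])
    selected = [[d[i] if i in keep else 0.0 for i in range(n)] for d in deltas]

    # Calculate: weight each model by the squared magnitude of its selected delta.
    energy = [sum(x * x for x in d) for d in selected]
    total = sum(energy)
    weights = [e / total if total else 0.0 for e in energy]

    merged = []
    for i in range(n):
        # Erase: drop entries whose sign disagrees with the summed deltas.
        sign = 1.0 if sum(d[i] for d in selected) >= 0 else -1.0
        num = den = 0.0
        for w, d in zip(weights, selected):
            if d[i] * sign > 0:
                num += w * d[i]
                den += w
        # Weighted average of the surviving deltas, added back to the base.
        merged.append(base[i] + (num / den if den else 0.0))
    return merged


# A model identical to the base contributes a zero delta and zero weight.
print(sce_merge([0.0, 0.0, 0.0], [[0.0, 0.0, 0.0], [1.0, -2.0, 3.0]]))
# [1.0, -2.0, 3.0]
```

In this sketch, a model identical to `base_model` (like the pivot entry above, which is also the base) contributes a zero delta and therefore zero weight, so the other models' deltas determine the merged weights.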
```yaml
models:  
  - model: Qwen/Qwen2.5-14B-Instruct  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M  
    parameters:  
      density: 1  
      weight: 1  
      lambda: 0.9  
merge_method: della  
base_model: Qwen2.5-14B-1M  
parameters:  
  density: 1  
  weight: 1  
  lambda: 0.9  
  normalize: true  
  int8_mask: true  
dtype: bfloat16  
tokenizer_source: base  
name: Qwen2.5-14B-della-1M
```
## Final stage:
*Merge the four second-stage models onto the 1M-context model from the third stage.*
```yaml
merge_method: model_stock  
base_model: Qwen2.5-14B-della-1M  
models:  
  - model: Qwen2.5-14B-mst-Coder  
  - model: Qwen2.5-14B-mst-V6  
  - model: Qwen2.5-14B-mst-Nova  
  - model: Qwen2.5-14B-mst-it  
dtype: bfloat16  
tokenizer_source: base  
int8_mask: true  
normalize: true  
name: ZYH-LLM-Qwen2.5-14B-V4
```
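Because later stages refer to earlier outputs by `name`, the recipe forms a dependency graph that has to be built in order. A sketch using Python's standard `graphlib` module derives one valid order from the inputs transcribed from the configs above and prints a `mergekit-yaml <config> <output-dir>` command for each stage. The per-stage file names and output directories are hypothetical, and running the stages one by one this way assumes each intermediate name resolves to a local model directory.

```python
from graphlib import TopologicalSorter

INSTRUCT = "Qwen/Qwen2.5-14B-Instruct"
INSTRUCT_1M = "Qwen/Qwen2.5-14B-Instruct-1M"
R1 = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
R1_ABL = "huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2"

# Inputs of each named merge (base_model first), transcribed from the configs above.
RECIPE = {
    "Qwen2.5-14B-della-base": ["Qwen/Qwen2.5-14B", INSTRUCT, INSTRUCT_1M],
    "Qwen2.5-14B-della-v2": ["arcee-ai/Virtuoso-Small-v2", INSTRUCT, INSTRUCT_1M],
    "Qwen2.5-14B-della-Nova": ["arcee-ai/SuperNova-Medius", INSTRUCT, INSTRUCT_1M],
    "Qwen2.5-14B-della-V6": ["Azure99/Blossom-V6-14B", INSTRUCT, INSTRUCT_1M],
    "Qwen2.5-Coder-14B-della": ["Qwen/Qwen2.5-Coder-14B", "Qwen/Qwen2.5-Coder-14B-Instruct"],
    "Qwen2.5-14B-mst-Coder": ["Qwen2.5-14B-della-base", "Qwen2.5-Coder-14B-della",
                              "Qwen2.5-14B-della-v2", R1, R1_ABL],
    "Qwen2.5-14B-mst-V6": ["Qwen2.5-14B-della-base", "Qwen2.5-14B-della-V6",
                           "Qwen2.5-14B-della-v2", R1, R1_ABL],
    "Qwen2.5-14B-mst-Nova": ["Qwen2.5-14B-della-base", "Qwen2.5-14B-della-Nova",
                             "Qwen2.5-14B-della-v2", R1, R1_ABL],
    "Qwen2.5-14B-mst-it": ["Qwen2.5-14B-della-base", "Qwen2.5-14B-della-Nova",
                           "Qwen2.5-14B-della-v2", "Qwen2.5-14B-della-V6"],
    "Qwen2.5-14B-1M": [INSTRUCT_1M, "Qwen/Qwen2.5-14B"],
    "Qwen2.5-14B-della-1M": ["Qwen2.5-14B-1M", INSTRUCT, INSTRUCT_1M],
    "ZYH-LLM-Qwen2.5-14B-V4": ["Qwen2.5-14B-della-1M", "Qwen2.5-14B-mst-Coder",
                               "Qwen2.5-14B-mst-V6", "Qwen2.5-14B-mst-Nova",
                               "Qwen2.5-14B-mst-it"],
}


def build_order(recipe):
    """Return the named merges in an order where every input is built first."""
    order = TopologicalSorter(recipe).static_order()
    # Hub models also appear in the graph; keep only the merges we must run.
    return [node for node in order if node in recipe]


for name in build_order(RECIPE):
    # Hypothetical layout: one config file per stage, output to ./<name>.
    print(f"mergekit-yaml {name}.yml ./{name}")
```

Any order `graphlib` returns is valid; the final model always comes last because every other merge is one of its (direct or indirect) inputs.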