marcuscedricridia committed on
Commit b3fb741 · verified · 1 Parent(s): cd8c9dd

Update README.md

Files changed (1): README.md (+86 -22)
README.md CHANGED

---
library_name: transformers
tags:
- mergekit
- merge
license: mit
---
# Cheng-1: Multi-Specialty Merged Language Model

## Model Overview
**Cheng-1** is a high-performance language model created by strategically merging top-tier, pre-existing fine-tuned models. It excels at **coding, math, translation, and roleplay** without requiring additional fine-tuning. The final model was built with the **model_stock** method, using a restore model as the base to preserve strong instruction-following and mathematical ability.

## Development Process

### 1. Foundation Model - "Yell-Qwen2.5-7B-1M"
- **Base Merge:** Combined `Qwen2.5-7B-Instruct-1M` with `Qwen2.5-7B` using **SCE merging**.
- **Purpose:** Established a strong general-purpose foundation for later merges.

#### **Merge Code:**
```yaml
merge_method: sce
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B
base_model: Qwen/Qwen2.5-7B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Yell-Qwen2.5-7B-1M
```
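
Any of the configs in this card can be executed with mergekit. Below is a minimal driver sketch, not the author's actual invocation: it assumes the YAML above is saved as `sce.yaml` and uses mergekit's documented Python entry point (`MergeConfiguration`, `MergeOptions`, `run_merge`); the `mergekit-yaml` CLI is the equivalent one-liner.

```python
# Hypothetical driver script: runs the SCE merge config above with mergekit.
# Assumes `pip install mergekit` and that sce.yaml contains the config above.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("sce.yaml", "r", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    config,
    out_path="./Yell-Qwen2.5-7B-1M",     # merged weights land here
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is present
        copy_tokenizer=True,             # honor tokenizer_source: base
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```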

### 2. Domain-Specific Merges
- **Coding:** Merged `AceCoder-Qwen2.5-7B-Ins-Rule` with Yell-Qwen2.5-7B-1M.
- **Translation:** Merged `DRT-7B` with Yell-Qwen2.5-7B-1M.
- **Math:** Merged `AceMath-7B-Instruct` with Yell-Qwen2.5-7B-1M.
- **Method:** All three were merged using **della merging**, producing three intermediate models (a per-domain sketch follows the config below).

#### **Merge Code:**
```yaml
merge_method: della
base_model: marcuscedricridia/Yell-Qwen2.5-7B-1M
models:
  - model: TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Krystalan/DRT-7B
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: nvidia/AceMath-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Cheng-1
```
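
Note that the config above applies della to all three specialists in a single pass, while step 3 consumes three separate `mergekit-della-*` intermediates. A hypothetical sketch of the per-domain variant (the loop, output paths, and pairings are illustrative assumptions, not the author's published configs):

```python
# Hypothetical per-domain della merges: one intermediate per specialist.
# Output names are illustrative; the published intermediates are the
# marcuscedricridia/mergekit-della-* repositories.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

SPECIALISTS = [
    ("coding", "TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule"),
    ("translation", "Krystalan/DRT-7B"),
    ("math", "nvidia/AceMath-7B-Instruct"),
]

for domain, model_id in SPECIALISTS:
    config_text = f"""
merge_method: della
base_model: marcuscedricridia/Yell-Qwen2.5-7B-1M
models:
  - model: {model_id}
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
"""
    config = MergeConfiguration.model_validate(yaml.safe_load(config_text))
    run_merge(
        config,
        out_path=f"./della-{domain}",  # e.g. ./della-coding
        options=MergeOptions(cuda=torch.cuda.is_available(), copy_tokenizer=True),
    )
```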

### 3. Final Model Stock Merge
- **Models Combined:**
  - `mergekit-della-wpunuct`
  - `mergekit-della-phphmhr`
  - `mergekit-della-qejrhsk`
  - `Hush-Qwen2.5-7B-RP-v1.2-1M` (roleplay model)
- **Base Model:** `YOYO-AI/Qwen2.5-7B-it-restore`
- **Final Method:** Used **model_stock merging** to integrate all four models into Cheng-1.

#### **Merge Code:**
```yaml
merge_method: model_stock
base_model: YOYO-AI/Qwen2.5-7B-it-restore
models:
  - model: marcuscedricridia/mergekit-della-wpunuct
  - model: marcuscedricridia/mergekit-della-phphmhr
  - model: marcuscedricridia/mergekit-della-qejrhsk
  - model: marcuscedricridia/Hush-Qwen2.5-7B-RP-v1.2-1M
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Cheng-1
```
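
After the final merge, Cheng-1 loads like any Qwen2.5-7B checkpoint. A minimal inference sketch with Hugging Face Transformers, assuming the merged weights are published under the `marcuscedricridia/Cheng-1` id used in the benchmark listing below:

```python
# Minimal inference sketch for the merged model with Hugging Face Transformers.
# Assumes the merged weights are available as marcuscedricridia/Cheng-1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marcuscedricridia/Cheng-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge was produced in bfloat16 (see dtype above)
    device_map="auto",
)

# Qwen2.5-style chat template
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```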

## Benchmarks
Evaluated as `marcuscedricridia/Cheng-1` (precision: `torch.bfloat16`, revision: `cd8c9dd37c67c2e1b7c683fdd5e72b7f08c074b9`):

| Benchmark | Score |
|-----------|-------|
| Average   | 36.06 |
| IFEval    | 77.89 |
| BBH       | 36.54 |
| MATH      | 48.94 |
| GPQA      | 6.15  |
| MUSR      | 9.62  |
| MMLU-PRO  | 37.21 |

## Conclusion
Cheng-1 is a versatile model optimized for multiple domains. By merging top-performing models in coding, math, translation, and roleplay, it achieves balanced and strong benchmark results without direct fine-tuning.