---
library_name: transformers
tags:
- mergekit
- merge
license: mit
---

# Cheng-1: Multi-Specialty Merged Language Model

## Model Overview

**Cheng-1** is a language model created by strategically merging strong, pre-existing fine-tuned models. It targets **coding, math, translation, and roleplay** without requiring any additional fine-tuning. The final model was assembled with the **model_stock** method, using a restore model as the base to preserve strong instruction-following and mathematical abilities.

## Development Process

### 1. Foundation Model - "Yell-Qwen2.5-7B-1M"

- **Base Merge:** Combined `Qwen2.5-7B-Instruct-1M` with `Qwen2.5-7B` using **SCE merging**.
- **Purpose:** Established a strong general-purpose foundation for the later merges.

#### **Merge Code:**

```yaml
merge_method: sce
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Qwen/Qwen2.5-7B
base_model: Qwen/Qwen2.5-7B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Yell-Qwen2.5-7B-1M
```

### 2. Domain-Specific Merges

- **Coding:** Merged `AceCoder-Qwen2.5-7B-Ins-Rule` with Yell-Qwen2.5-7B-1M.
- **Translation:** Merged `DRT-7B` with Yell-Qwen2.5-7B-1M.
- **Math:** Merged `AceMath-7B-Instruct` with Yell-Qwen2.5-7B-1M.
- **Method:** Each domain model was merged with the foundation using **della merging**, producing three intermediate models.

#### **Merge Code:**

```yaml
merge_method: della
base_model: marcuscedricridia/Yell-Qwen2.5-7B-1M
models:
  - model: TIGER-Lab/AceCoder-Qwen2.5-7B-Ins-Rule
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Krystalan/DRT-7B
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: nvidia/AceMath-7B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Cheng-1
```

### 3. Final Model Stock Merge

- **Models Combined:**
  - `mergekit-della-wpunuct`
  - `mergekit-della-phphmhr`
  - `mergekit-della-qejrhsk`
  - `Hush-Qwen2.5-7B-RP-v1.2-1M` (roleplay model)
- **Base Model:** `YOYO-AI/Qwen2.5-7B-it-restore`
- **Final Method:** Used **model_stock merging** to integrate all models into Cheng-1.

#### **Merge Code:**

```yaml
merge_method: model_stock
base_model: YOYO-AI/Qwen2.5-7B-it-restore
models:
  - model: marcuscedricridia/mergekit-della-wpunuct
  - model: marcuscedricridia/mergekit-della-phphmhr
  - model: marcuscedricridia/mergekit-della-qejrhsk
  - model: marcuscedricridia/Hush-Qwen2.5-7B-RP-v1.2-1M
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Cheng-1
```

## Benchmarks

Evaluated checkpoint: `marcuscedricridia/Cheng-1`, precision `torch.bfloat16`, revision `cd8c9dd37c67c2e1b7c683fdd5e72b7f08c074b9`.

| Benchmark   | Score     |
|-------------|-----------|
| IFEval      | 77.89     |
| BBH         | 36.54     |
| MATH        | 48.94     |
| GPQA        | 6.15      |
| MUSR        | 9.62      |
| MMLU-PRO    | 37.21     |
| **Average** | **36.06** |

## Conclusion

Cheng-1 is a versatile model optimized for multiple domains. By merging top-performing models for coding, math, translation, and roleplay, it achieves balanced, strong benchmark results without any direct fine-tuning.
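
## Example Usage

Since the card lists `library_name: transformers`, Cheng-1 can be loaded like any other Qwen2.5-based chat model. The snippet below is a minimal sketch, assuming the merged weights are published under the repo id `marcuscedricridia/Cheng-1` shown in the Benchmarks section and that the chat template is inherited from the base tokenizer.

```python
# Minimal usage sketch; the repo id and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marcuscedricridia/Cheng-1"  # assumed repo id (see Benchmarks)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge was performed in bfloat16
    device_map="auto",
)

# Build a chat prompt using the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```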