---
license: apache-2.0
base_model:
  - Qwen/Qwen1.5-7B-Chat
  - deepseek-ai/deepseek-coder-6.7b-instruct
tags:
  - merge
  - mergekit
  - qwen
  - deepseek
  - coder
  - slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through an efficient SLERP fusion.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) with tuned parameters to balance the two parent models:

- **Weighted blend**: `t: 0.6` gives a slightly stronger influence to the DeepSeek Coder model
- **Complete layer merging**: the full layer range (0-32) of both models is interpolated, rather than splicing whole blocks from a single parent
- **Format**: bfloat16 precision for efficient memory usage

### Models Merged

* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model, known for strong conversational capabilities and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model, with excellent programming-language understanding and code generation abilities

### Configuration

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```

## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly released weights and merge configuration

The resulting model is intended to perform well on tasks that require both conversational fluency and programming expertise, such as:

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions

## Limitations

- Inherits the limitations of both base models
- May behave inconsistently on some advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created through parameter merging alone, with no additional training data
- The slight size mismatch between the parents (7B vs. 6.7B) may introduce parameter interpolation artifacts

## License

This model is released under the Apache 2.0 license. Users should also review and comply with the license terms of the underlying base models.
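## Usage

Below is a minimal inference sketch using the 🤗 Transformers library. The repository id is a placeholder for wherever the merged weights are published, and the example assumes the merge ships the base model's (Qwen) chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: substitute the actual location of the merged weights
model_id = "your-username/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's bfloat16 dtype
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```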
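## Reproducing the Merge

To reproduce this merge, save the configuration above as `config.yaml` and run MergeKit's command-line entry point (for example, `mergekit-yaml config.yaml ./Qwen15-DeepSeek-Coder-Merge`); see the MergeKit documentation for hardware-specific options such as GPU acceleration.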
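## Appendix: SLERP Sketch

For reference, spherical linear interpolation blends two weight tensors along an arc on the hypersphere rather than along a straight line, which tends to better preserve the geometry of the merged weights than plain averaging. The following is an illustrative sketch of the idea, not MergeKit's exact implementation:

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors with blend factor t."""
    a_flat = a.flatten().float()
    b_flat = b.flatten().float()
    # Angle between the two weight vectors, measured on the unit hypersphere
    cos_omega = torch.clamp(
        (a_flat / (a_flat.norm() + eps)) @ (b_flat / (b_flat.norm() + eps)),
        -1.0, 1.0,
    )
    omega = torch.arccos(cos_omega)
    if omega.abs() < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation
        return (1 - t) * a + t * b
    sin_omega = torch.sin(omega)
    merged = (
        torch.sin((1 - t) * omega) / sin_omega * a_flat
        + torch.sin(t * omega) / sin_omega * b_flat
    )
    return merged.reshape(a.shape).to(a.dtype)
```

With `t = 0.6`, as in the configuration above, the result leans slightly toward the second tensor (DeepSeek Coder).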