---
license: apache-2.0
base_model:
- Qwen/Qwen1.5-7B-Chat
- deepseek-ai/deepseek-coder-6.7b-instruct
tags:
- merge
- mergekit
- qwen
- deepseek
- coder
- slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through an efficient SLERP fusion.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) with the following settings; a formula sketch follows the list:

- **Weighted Blend**: t=0.6 gives the DeepSeek Coder model a slightly stronger influence
- **Complete Layer Merging**: the full layer range [0, 32] of both models is interpolated, so no layers are excluded
- **Format**: bfloat16 precision for efficient memory usage
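
For context, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which preserves their geometric relationship better than plain linear averaging. For weight vectors $p$ and $q$ separated by angle $\Omega$:

$$
\operatorname{slerp}(p, q; t) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, p + \frac{\sin(t\,\Omega)}{\sin \Omega}\, q,
\qquad
\cos \Omega = \frac{p \cdot q}{\lVert p \rVert\, \lVert q \rVert}
$$

At t=0.6 the result lies slightly closer to $q$ (here, the DeepSeek Coder weights).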
					
					
						
### Models Merged

* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model, known for its strong conversational capabilities and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model, with excellent programming language understanding and code generation abilities

### Configuration

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
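
With MergeKit installed (`pip install mergekit`), a configuration like the one above is typically executed with `mergekit-yaml config.yaml ./merged-model`. As a minimal sketch of the underlying operation (an illustration only, not MergeKit's actual implementation), SLERP between two same-shaped weight tensors can be written as:

```python
import numpy as np

def slerp(t: float, p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two same-shaped weight tensors."""
    p_flat, q_flat = p.ravel(), q.ravel()
    # Angle between the two weight directions
    p_unit = p_flat / (np.linalg.norm(p_flat) + eps)
    q_unit = q_flat / (np.linalg.norm(q_flat) + eps)
    omega = np.arccos(np.clip(np.dot(p_unit, q_unit), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation
        return (1.0 - t) * p + t * q
    sin_omega = np.sin(omega)
    coeff_p = np.sin((1.0 - t) * omega) / sin_omega
    coeff_q = np.sin(t * omega) / sin_omega
    return (coeff_p * p_flat + coeff_q * q_flat).reshape(p.shape)

# t=0.6 weights the second model (DeepSeek Coder) slightly more heavily
w_qwen = np.random.randn(512, 512).astype(np.float32)
w_deepseek = np.random.randn(512, 512).astype(np.float32)
w_merged = slerp(0.6, w_qwen, w_deepseek)
```

MergeKit applies this kind of interpolation tensor by tensor across the selected layer range.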
					
					
						
## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge capabilities
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly released weights under the Apache 2.0 license

The resulting model is intended for tasks that require both conversational fluency and programming expertise, such as (a usage sketch follows the list):

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
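
The sketch below shows how such a merge is typically loaded with the Hugging Face `transformers` library. The repository id is a placeholder, and the example assumes the merged repo ships a tokenizer with a chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; replace with the actual id of this merge
model_id = "your-username/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```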
					
					
						
## Limitations

- Inherits limitations from both base models
- May exhibit inconsistent behavior on certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created through parameter merging alone, with no additional training data
- The base models differ in size (7B vs 6.7B) and in architectural details, so parameter interpolation may introduce artifacts
					
					
						
## License

This model is released under the Apache 2.0 license. Users should also review the license terms of the underlying base models before use.