---
license: apache-2.0
base_model:
- Qwen/Qwen1.5-7B-Chat
- deepseek-ai/deepseek-coder-6.7b-instruct
tags:
- merge
- mergekit
- qwen
- deepseek
- coder
- slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through an efficient SLERP fusion.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) with the following settings; a formula sketch follows the list:

- **Weighted Blend**: t=0.6 gives the DeepSeek Coder model a slightly stronger influence
- **Complete Layer Merging**: the full layer range [0, 32] of both models is interpolated, so no layers are excluded
- **Format**: bfloat16 precision for efficient memory usage
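
For context, SLERP interpolates along the arc between two weight vectors rather than along the straight line between them, which preserves their geometric relationship better than plain linear averaging. For weight vectors $p$ and $q$ separated by angle $\Omega$:

$$
\operatorname{slerp}(p, q; t) = \frac{\sin\big((1-t)\,\Omega\big)}{\sin \Omega}\, p + \frac{\sin(t\,\Omega)}{\sin \Omega}\, q,
\qquad
\cos \Omega = \frac{p \cdot q}{\lVert p \rVert\, \lVert q \rVert}
$$

At t=0.6 the result lies slightly closer to $q$ (here, the DeepSeek Coder weights).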
					
					
						
### Models Merged

* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model, known for its strong conversational capabilities and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model, with excellent programming language understanding and code generation abilities

### Configuration

```yaml
slices:
  - sources:
      - model: Qwen/Qwen1.5-7B-Chat
        layer_range: [0, 32]
      - model: deepseek-ai/deepseek-coder-6.7b-instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat
parameters:
  t: 0.6
dtype: bfloat16
```
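
With MergeKit installed (`pip install mergekit`), a configuration like the one above is typically executed with `mergekit-yaml config.yaml ./merged-model`. As a minimal sketch of the underlying operation (an illustration only, not MergeKit's actual implementation), SLERP between two same-shaped weight tensors can be written as:

```python
import numpy as np

def slerp(t: float, p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two same-shaped weight tensors."""
    p_flat, q_flat = p.ravel(), q.ravel()
    # Angle between the two weight directions
    p_unit = p_flat / (np.linalg.norm(p_flat) + eps)
    q_unit = q_flat / (np.linalg.norm(q_flat) + eps)
    omega = np.arccos(np.clip(np.dot(p_unit, q_unit), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel tensors: fall back to plain linear interpolation
        return (1.0 - t) * p + t * q
    sin_omega = np.sin(omega)
    coeff_p = np.sin((1.0 - t) * omega) / sin_omega
    coeff_q = np.sin(t * omega) / sin_omega
    return (coeff_p * p_flat + coeff_q * q_flat).reshape(p.shape)

# t=0.6 weights the second model (DeepSeek Coder) slightly more heavily
w_qwen = np.random.randn(512, 512).astype(np.float32)
w_deepseek = np.random.randn(512, 512).astype(np.float32)
w_merged = slerp(0.6, w_qwen, w_deepseek)
```

MergeKit applies this kind of interpolation tensor by tensor across the selected layer range.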
					
					
						
## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge capabilities
- DeepSeek Coder's specialized programming expertise and code generation abilities
- Enhanced technical understanding and explanation capabilities
- Openly released weights under the Apache 2.0 license

The resulting model is intended for tasks that require both conversational fluency and programming expertise, such as (a usage sketch follows the list):

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural language understanding
- Debugging and code optimization suggestions
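
The sketch below shows how such a merge is typically loaded with the Hugging Face `transformers` library. The repository id is a placeholder, and the example assumes the merged repo ships a tokenizer with a chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; replace with the actual id of this merge
model_id = "your-username/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```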
					
					
						
## Limitations

- Inherits limitations from both base models
- May exhibit inconsistent behavior on certain advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created through parameter merging alone, with no additional training data
- The base models differ in size (7B vs 6.7B) and in architectural details, so parameter interpolation may introduce artifacts
					
					
						
## License

This model is released under the Apache 2.0 license. Users should also review the license terms of the underlying base models before use.