|
--- |
|
base_model:
- meta-llama/Llama-3.1-8B-Instruct
- arcee-ai/Llama-3.1-SuperNova-Lite
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- FuseAI/FuseChat-Llama-3.1-8B-Instruct
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
|
|
--- |
|
# Llama3.1-SuperDeepFuse |
|
|
|
An 8B-parameter language model created by merging three high-performance distilled models to improve reasoning, instruction following, and performance on mathematics and coding tasks.
|
|
|
## Model Highlights |
|
|
|
- **Size**: 8 billion parameters |
|
- **Base**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
|
- **Merged Sources**:

  - [arcee-ai/Llama-3.1-**Super**Nova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite)

  - [deepseek-ai/**Deep**Seek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)

  - [FuseAI/**Fuse**Chat-Llama-3.1-8B-Instruct](https://huggingface.co/FuseAI/FuseChat-Llama-3.1-8B-Instruct)
|
- **Merge Method**: `model_stock` (via [mergekit](https://github.com/arcee-ai/mergekit)); a sample configuration is sketched below
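
The exact mergekit configuration was not published with this card, but a `model_stock` merge of these sources is typically specified along these lines (the `dtype` choice here is an assumption):

```yaml
# Hypothetical mergekit configuration for a model_stock merge.
# model_stock averages the source models around the shared Instruct base.
models:
  - model: arcee-ai/Llama-3.1-SuperNova-Lite
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - model: FuseAI/FuseChat-Llama-3.1-8B-Instruct
merge_method: model_stock
base_model: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
```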
|
|
|
## Key Capabilities |
|
|
|
- Enhanced multi-task reasoning |
|
- Improved mathematical and coding performance |
|
- Multilingual support |
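
## Usage

A minimal `transformers` sketch; the repo id below is a placeholder for wherever this merge is hosted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/Llama3.1-SuperDeepFuse"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

# Llama 3.1 Instruct models expect chat-formatted prompts.
messages = [
    {"role": "user", "content": "Write a Python function that tests whether a number is prime."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```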
|
|
|
## Performance Notes |
|
|
|
- Intended to preserve the safety behavior of the Llama 3.1 Instruct base

- Small enough for consumer GPU deployment: roughly 16 GB of weights in BF16, far less with 4-bit quantization (see the sketch below)

- Balanced performance across diverse tasks
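
For GPUs with less memory, 4-bit loading through bitsandbytes is one option (repo id again a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize weights to 4-bit at load time; cuts memory from ~16 GB
# in BF16 to roughly 5-6 GB including overhead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-org/Llama3.1-SuperDeepFuse",  # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",
)
```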
|
|
|
## Considerations |
|
|
|
- Formal benchmark results are not yet available

- Capabilities are limited compared to larger Llama 3.1 variants (70B, 405B)

- Like all language models, it can produce inaccurate or misleading output

- Outputs should be independently verified
|
|
|
## Licensing |
|
|
|
Use of this model is governed by the Llama 3.1 Community License, inherited from the Llama 3.1 base and source models.