metadata
license: apache-2.0
language:
- en
tags:
- merge
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
- ehartford/dolphin-2.2.1-mistral-7b
- SciPhi/SciPhi-Mistral-7B-32k
- ehartford/samantha-1.2-mistral-7b
- Arc53/docsgpt-7b-mistral
- HuggingFaceH4/zephyr-7b-beta
- meta-math/MetaMath-Mistral-7B
- Open-Orca/Mistral-7B-OpenOrca
- openchat/openchat-3.5-1210
- beowolx/MistralHermes-CodePro-7B-v1
- TIGER-Lab/MAmmoTH-7B-Mistral
- teknium/OpenHermes-2.5-Mistral-7B
- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
- mlabonne/NeuralHermes-2.5-Mistral-7B
Update 2024-01-03
Check out our v0.4 model which is based on this and achieves better average score of 71.19 versus 69.66.
Model Description
This is an update to EmbeddedLLM/Mistral-7B-Merge-14-v0.2 that removes potentially TruthfulQA-contaminated models and non-commercially licensed models:
- berkeley-nest/Starling-LM-7B-alpha
- Q-bert/MetaMath-Cybertron-Starling
- v1olet/v1olet_marcoroni-go-bruins-merge-7B
This is an experiment to test merging 14 models using DARE TIES 🦙
The result is a base model that performs quite well but may need some further chat fine-tuning.
The 14 models are as follows:
- mistralai/Mistral-7B-Instruct-v0.2
- ehartford/dolphin-2.2.1-mistral-7b
- SciPhi/SciPhi-Mistral-7B-32k
- ehartford/samantha-1.2-mistral-7b
- Arc53/docsgpt-7b-mistral
- HuggingFaceH4/zephyr-7b-beta
- meta-math/MetaMath-Mistral-7B
- Open-Orca/Mistral-7B-OpenOrca
- openchat/openchat-3.5-1210
- beowolx/MistralHermes-CodePro-7B-v1
- TIGER-Lab/MAmmoTH-7B-Mistral
- teknium/OpenHermes-2.5-Mistral-7B
- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
- mlabonne/NeuralHermes-2.5-Mistral-7B
- base model: mistralai/Mistral-7B-v0.1
Open LLM Leaderboard
v0.3 | v0.4 | |
---|---|---|
Average | 69.66 | 71.19 |
ARC | 65.96 | 66.81 |
HellaSwag | 85.29 | 86.15 |
MMLU | 64.35 | 65.10 |
TruthfulQA | 57.80 | 58.25 |
Winogrande | 78.30 | 80.03 |
GSM8K | 66.26 | 70.81 |
Chat Template
We tried ChatML and Llama-2 chat template, but feel free to try other templates.
Merge Configuration
The merge config file for this model is here:
models:
- model: mistralai/Mistral-7B-v0.1
# no parameters necessary for base model
- model: ehartford/dolphin-2.2.1-mistral-7b
parameters:
weight: 0.08
density: 0.4
- model: SciPhi/SciPhi-Mistral-7B-32k
parameters:
weight: 0.08
density: 0.4
- model: ehartford/samantha-1.2-mistral-7b
parameters:
weight: 0.08
density: 0.4
- model: Arc53/docsgpt-7b-mistral
parameters:
weight: 0.08
density: 0.4
- model: HuggingFaceH4/zephyr-7b-beta
parameters:
weight: 0.08
density: 0.4
- model: meta-math/MetaMath-Mistral-7B
parameters:
weight: 0.08
density: 0.4
- model: Open-Orca/Mistral-7B-OpenOrca
parameters:
weight: 0.08
density: 0.4
- model: openchat/openchat-3.5-1210
parameters:
weight: 0.08
density: 0.4
- model: beowolx/MistralHermes-CodePro-7B-v1
parameters:
weight: 0.08
density: 0.4
- model: TIGER-Lab/MAmmoTH-7B-Mistral
parameters:
weight: 0.08
density: 0.4
- model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
weight: 0.08
density: 0.4
- model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
parameters:
weight: 0.08
density: 0.4
- model: mlabonne/NeuralHermes-2.5-Mistral-7B
parameters:
weight: 0.08
density: 0.4
- model: mistralai/Mistral-7B-Instruct-v0.2
parameters:
weight: 0.08
density: 0.5
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
int8_mask: true
dtype: bfloat16