---
base_model:
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
tags:
- merge
- mergekit
- lazymergekit
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
license: llama3
language:
- en
- de
---

# llama3.1-8b-spaetzle-v90

These are q4_k_m quants, made with llama.cpp release b3472, of [cstr/llama3.1-8b-spaetzle-v90](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90), which is a progressive merge of merges.

EQ-Bench v2_de score: 69.93 (171/171 parseable).

The merge tree involves the following models:

- NousResearch/Hermes-3-Llama-3.1-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- Dampfinchen/Llama-3.1-8B-Ultra-Instruct
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
- akjindal53244/Llama-3.1-Storm-8B
- nbeerbower/llama3.1-gutenberg-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
- nbeerbower/llama-3-wissenschaft-8B-v2
- Azure99/blossom-v5-llama3-8b
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- princeton-nlp/Llama-3-Instruct-8B-SimPO
- Locutusque/llama-3-neural-chat-v1-8b
- Locutusque/Llama-3-Orca-1.0-8B
- DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental
- seedboxai/Llama-3-Kafka-8B-v0.2
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- nbeerbower/llama-3-wissenschaft-8B-v2
- mlabonne/Daredevil-8B-abliterated-dpomix

A number of steps were involved, among them slerp merges over only the middle layers (the interpolation weight t stays at 0 for the outermost layers and peaks at 0.7 in the middle), which compensates for tokenizer and chat template differences between the source models. An illustration is given below.

## 🧩 Configuration

The final merge step was:

```yaml
models:
  - model: cstr/llama3.1-8b-spaetzle-v59
    # no parameters necessary for base model
  - model: cstr/llama3.1-8b-spaetzle-v85
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v86
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v74
    parameters:
      density: 0.65
      weight: 0.3
merge_method: dare_ties
base_model: cstr/llama3.1-8b-spaetzle-v59
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```

Among the previous steps was, for example, this slerp merge:

```yaml
models:
  - model: NousResearch/Hermes-3-Llama-3.1-8B
merge_method: slerp
base_model: cstr/llama3.1-8b-spaetzle-v74
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
dtype: float16
```
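
For reference, configurations like these can be executed with mergekit. The snippet below is only a minimal sketch using mergekit's Python entry points; the exact API can differ between mergekit versions, and the config/output paths are placeholders, not files from this repository:

```python
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "merge-config.yml"  # placeholder: one of the YAML recipes above saved to disk
OUTPUT_PATH = "./merged-model"   # placeholder: directory for the merged weights

# Parse the YAML recipe into a mergekit configuration object.
with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Execute the merge; copy_tokenizer writes the base model's tokenizer into the output directory.
run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```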
## 💻 Usage

Use the standard Llama 3 chat template. The q4_k_m quants here are from [cstr/llama3.1-8b-spaetzle-v90](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90).
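
For example, a minimal llama-cpp-python sketch (the GGUF filename is a placeholder for whichever quant file you download; recent llama-cpp-python versions apply the chat template stored in the GGUF metadata):

```python
from llama_cpp import Llama

# Placeholder path: point this at the downloaded q4_k_m GGUF file.
llm = Llama(
    model_path="llama3.1-8b-spaetzle-v90.Q4_K_M.gguf",
    n_ctx=8192,       # context length; reduce if memory is tight
    n_gpu_layers=-1,  # offload all layers when built with GPU support; ignored otherwise
)

# Chat completion using the Llama 3 template embedded in the GGUF metadata.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Erkläre kurz den Unterschied zwischen Spätzle und Knödeln."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```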