|
--- |
|
license: apache-2.0 |
|
--- |
|
Self-merging Meta-Llama-3-70b to expand its parameter count to 120B has been reported to improve performance.
|
To further improve the accuracy of [karakuri-ai/karakuri-lm-8x7b-chat-v0.1](https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-chat-v0.1), a high-quality Japanese LLM, we performed a self-extension merge expanding `"num_hidden_layers"` from 32 to 56.
|
Regarding the interval of the slices used for the merge: this model (Ex-karakuri-8x12B-chat-v2) is configured with 4-layer slices, while [Ex-karakuri-8x12B-chat-v1](https://huggingface.co/aixsatoshi/Ex-karakuri-8x12B-chat-v1) uses 8-layer slices.
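The overlapping slice pattern used in the merge config can be generated programmatically. A minimal sketch follows; the `make_slices` helper is hypothetical (not part of mergekit) and simply reproduces the 4-layer-window, 2-layer-stride pattern shown in the config below:

```python
def make_slices(model: str, num_layers: int = 32, window: int = 4, stride: int = 2):
    """Build overlapping layer_range slices for a mergekit passthrough
    self-merge. Hypothetical helper for illustration only."""
    return [
        {"sources": [{"model": model, "layer_range": [start, start + window]}]}
        for start in range(0, num_layers - window + 1, stride)
    ]

slices = make_slices("karakuri-ai/karakuri-lm-8x7b-chat-v0.1")
# Produces 15 overlapping 4-layer slices: [0, 4], [2, 6], ..., [28, 32]
```

With `window=8` instead of `window=4`, the same helper would describe the coarser slicing used for Ex-karakuri-8x12B-chat-v1.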
|
|
|
|
|
|
|
It was inspired by large merges like: |
|
- [Meta-Llama-3-120B-Instruct](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct/) |
|
- [alpindale/goliath-120b](https://huggingface.co/alpindale/goliath-120b) |
|
- [nsfwthrowitaway69/Venus-120b-v1.0](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0) |
|
- [cognitivecomputations/MegaDolphin-120b](https://huggingface.co/cognitivecomputations/MegaDolphin-120b) |
|
- [wolfram/miquliz-120b-v2.0](https://huggingface.co/wolfram/miquliz-120b-v2.0)
|
|
|
|
|
```yaml
slices:
  - sources:
      - layer_range: [0, 4]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [2, 6]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [4, 8]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [6, 10]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [8, 12]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [10, 14]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [12, 16]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [14, 18]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [16, 20]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [18, 22]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [20, 24]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [22, 26]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [24, 28]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [26, 30]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
  - sources:
      - layer_range: [28, 32]
        model: karakuri-ai/karakuri-lm-8x7b-chat-v0.1
merge_method: passthrough
dtype: bfloat16
```