Llama-3-VNTL-Yollisa-8B
This is a merge of pre-trained language models created using mergekit.
This merge expands on the idea of merging at extremely low weight as an alternative to finetuning, with the added step of subtracting the base model from each finetune before merging. The instruct format is the custom version of llama3 that VNTL uses, but you should be able to mix in some regular llama3 formats as well, and with the right prompt that might even improve translation quality.
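As a rough illustration of the idea above, the toy sketch below subtracts a base model from a finetune to get a delta, then adds that delta onto a target model at a very low weight. The helper names `task_vector` and `merge_low_weight` and all numbers are hypothetical, and plain lists stand in for weight tensors. It shows only the arithmetic as described in words; it is not how mergekit implements it, and the configs further down may differ in detail.

```python
def task_vector(finetune, base):
    """Return what the finetune changed relative to its base (finetune - base)."""
    return [f - b for f, b in zip(finetune, base)]


def merge_low_weight(target, delta, weight):
    """Add a scaled delta onto the target model's weights."""
    return [t + weight * d for t, d in zip(target, delta)]


# Toy stand-ins for weight tensors (made-up values, not real model weights).
base = [1.0, 2.0, 3.0]
finetune = [1.5, 1.0, 3.0]
target = [0.2, -0.4, 0.9]

delta = task_vector(finetune, base)               # what finetuning changed
merged = merge_low_weight(target, delta, 25e-5)   # 25e-5 is 0.00025
print(merged)
```

At a weight this small, the merged values stay very close to the target model, which is the point of the low-weight approach.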
Usage
Presets
For SillyTavern use these presets.
When adding prompts outside of Metadata, set the role to system and add the instruct format manually. Because the system prompt formats are blank, you can write ST scripts that add old chat pairs to the Data Bank with the instruct format RegExed in, then inject them via RAG. I found that doing so greatly improves translation quality.
The Data Bank entry should look something like this with instruct format included:
<|start_header_id|>Japanese<|end_header_id|>
千春「2人も参加っと。<|eot_id|><|start_header_id|>English<|end_header_id|>
Chiharu "So, both of you are in, huh?"<|eot_id|><|start_header_id|>Japanese<|end_header_id|>
千春後は柚ちゃんだけだけど、もちろんやるよね」<|eot_id|><|start_header_id|>English<|end_header_id|>
Chiharu "Now it’s just Yuzu-chan left. Of course, you’re in, right?"<|eot_id|>
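To make the layout of such an entry concrete, here is a minimal Python sketch that builds the same string from Japanese/English pairs. The helper `format_pairs` is hypothetical, and the single newline after each header simply mirrors the example as shown above. In SillyTavern itself this would be done with an ST script and RegEx; the Python only illustrates the target string.

```python
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Build a header for the given role, followed by a single newline."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n"


def format_pairs(pairs):
    """Turn (japanese, english) pairs into one Data Bank entry string."""
    parts = []
    for japanese, english in pairs:
        parts.append(header("Japanese") + japanese + EOT)
        parts.append(header("English") + english + EOT)
    return "".join(parts)


# The two chat pairs from the example above.
pairs = [
    ("千春「2人も参加っと。", 'Chiharu "So, both of you are in, huh?"'),
    (
        "千春後は柚ちゃんだけだけど、もちろんやるよね」",
        'Chiharu "Now it’s just Yuzu-chan left. Of course, you’re in, right?"',
    ),
]

entry = format_pairs(pairs)
print(entry)
```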
Samplers
top_k: 1
# or
temp: 0
Writing and translation quality can be a bit unstable, so I recommend using RAG to stabilize it.
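Both sampler settings above aim at deterministic decoding: `top_k: 1` keeps only the single most likely token, and a temperature of 0 is commonly treated by inference backends as always picking the most likely token (an assumption about your backend, so check its documentation). The toy sketch below uses made-up logits and hypothetical helper names to show that a top-k of 1 and a temperature close to 0 end up selecting the same token.

```python
import math


def top_k_indices(logits, k):
    """Return the indices of the k largest logits, best first."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:k]


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by a (strictly positive) temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a four-token vocabulary.
logits = [1.2, 3.4, 0.5, 3.1]

# top_k = 1 leaves exactly one candidate: the highest-scoring token.
print(top_k_indices(logits, 1))

# As the temperature approaches 0, nearly all probability mass
# concentrates on that same token.
probs_cold = softmax_with_temperature(logits, 0.01)
print(probs_cold.index(max(probs_cold)))
```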
Configuration
The following YAML configuration was used to produce this model:
Llama-3-Yollow-8B
models:
  # Pivot model
  - model: meta-llama/Meta-Llama-3-8B
  # Target models
  - model: rinna/llama-3-youko-8b
  - model: tokyotech-llm/Llama-3-Swallow-8B-v0.1
merge_method: sce
base_model: meta-llama/Meta-Llama-3-8B
parameters:
  select_topk: 1.0
dtype: float32
Llama-3-Minus-Base-8B
models:
  # Finetune model
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  normalize: false
dtype: float32
Llama-3-Youko-Minus-Base-8B
models:
  # Finetune model
  - model: rinna/llama-3-youko-8b-instruct
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: rinna/llama-3-youko-8b-instruct
parameters:
  normalize: false
dtype: float32
Llama-3-Swallow-Minus-Base-8B
models:
  # Finetune model
  - model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
parameters:
  normalize: false
dtype: float32
Llama-3-Shisa-Minus-Base-8B
models:
  # Finetune model
  - model: shisa-ai/shisa-v1-llama3-8b
    parameters:
      weight: 1.0
  # Base model
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: -1.0
merge_method: task_arithmetic
base_model: shisa-ai/shisa-v1-llama3-8b
parameters:
  normalize: false
dtype: float32
Llama-3-VNTL-Yollisa-8B
models:
  # Base
  - model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
    parameters:
      weight: 1.0
  # Models
  - model: Casual-Autopsy/Llama-3-Minus-Base-8B
    parameters:
      density: 0.35
      weight: 10e-5
  - model: Casual-Autopsy/Llama-3-Shisa-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Swallow-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
  - model: Casual-Autopsy/Llama-3-Youko-Minus-Base-8B
    parameters:
      density: 0.85
      weight: 25e-5
merge_method: ties
base_model: Casual-Autopsy/Llama-3-Yollow-8B+lmg-anon/vntl-llama3-8b-v2-qlora
parameters:
  normalize: false
  int8_mask: false
dtype: float32
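For readers unfamiliar with the `ties` method and its `density` and `weight` parameters, here is a simplified sketch of the general TIES procedure (trim each delta, elect a sign per parameter, sum the agreeing values) on toy lists. The function names and numbers are hypothetical. Summing rather than averaging the agreeing deltas is my assumption about what `normalize: false` implies, and mergekit's actual implementation differs in detail.

```python
def trim(delta, density):
    """Keep the largest-magnitude fraction `density` of entries, zero the rest."""
    k = int(round(density * len(delta)))
    if k <= 0:
        return [0.0] * len(delta)
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, densities, weights):
    """Simplified TIES merge over flat lists of parameters."""
    # 1. Delta from the base for each model, trimmed by density, then weighted.
    deltas = []
    for model, density, weight in zip(models, densities, weights):
        delta = [m - b for m, b in zip(model, base)]
        deltas.append([weight * d for d in trim(delta, density)])

    merged = []
    for j, b in enumerate(base):
        column = [d[j] for d in deltas]
        # 2. Elect the sign with the larger total weighted magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # 3. Sum only the contributions that agree with the elected sign.
        agreeing = sum(v for v in column if v * sign > 0)
        merged.append(b + agreeing)
    return merged


# Toy example with a sign conflict on the third parameter (made-up values).
toy_base = [0.0, 0.0, 0.0, 0.0]
model_a = [4.0, -1.0, 2.0, 0.1]
model_b = [-1.0, -3.0, -1.5, 0.2]

toy_merged = ties_merge(toy_base, [model_a, model_b], [0.5, 0.5], [1.0, 1.0])
print(toy_merged)
```

In the configuration above, the weights `10e-5` and `25e-5` are 0.0001 and 0.00025, so each contribution is scaled down heavily before it is added to the base.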