🐂 domestic-yak, a Macedonian LM (instruct version)
Model Summary
This is the instruct-tuned version of domestic-yak-8B, fine-tuned for three epochs on the sft-mk dataset to improve instruction-following capabilities in Macedonian. Building on the foundation of domestic-yak-8B, this version is optimized for generating coherent, task-specific responses to user queries, making it well suited for chatbots, virtual assistants, and other interactive applications.
📊 Results
The table below compares the performance of our model, domestic-yak-8B-instruct, with four other models. Our model is on par with Llama 70B and even beats it on three of the benchmarks. It is also currently the best model in the 8B parameter range.
The results were obtained using the macedonian-llm-eval benchmark.
*(Benchmark results figure: macedonian-llm-eval scores for domestic-yak-8B-instruct and four comparison models.)*
🔑 Key Details
- Language: Macedonian (mk)
- Base Model: domestic-yak-8B (itself built on meta-llama/Llama-3.1-8B)
- Dataset: sft-mk, ~100k samples across multiple categories (question answering (QA), chat-style conversations, reasoning, essays, and code), consolidated from translations of publicly available datasets and custom synthetic data.
- Fine-tuning Objective: Supervised fine-tuning (SFT) on Macedonian-specific instruction-following data
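For readers who want a concrete picture of the training setup, here is a minimal sketch of what SFT on a chat-style dataset can look like with trl's SFTTrainer. This is not the authors' actual training code: the dataset path, the choice of trl, and all hyperparameters except the epoch count are assumptions.

```python
# Hypothetical SFT sketch (assumes a recent version of trl); not the authors' actual code.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset path is an assumption based on the "sft-mk" name mentioned above.
dataset = load_dataset("LVSTCK/sft-mk", split="train")

config = SFTConfig(
    output_dir="domestic-yak-8B-instruct",
    num_train_epochs=3,              # three epochs, per the model summary
    per_device_train_batch_size=2,   # illustrative value
    gradient_accumulation_steps=8,   # illustrative value
    learning_rate=2e-5,              # illustrative value
    bf16=True,                       # matches the bfloat16 inference setup below
)

trainer = SFTTrainer(
    model="LVSTCK/domestic-yak-8B",  # the base model named in Key Details
    train_dataset=dataset,
    args=config,
)
trainer.train()
```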
Usage
The transformers pipeline automatically applies the tokenizer's apply_chat_template, which formats the input appropriately. The model was trained using the default Llama 3.1 chat format.
```python
import transformers
import torch

model_id = "LVSTCK/domestic-yak-8B-instruct"

# Load the model in bfloat16 and place it automatically on available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# System prompt (in Macedonian): "You are a virtual assistant that helps users
# in Macedonian. Answer questions in a clear, understandable, and professional
# manner. Use correct grammar and try to make your answers as useful and
# relevant as possible."
messages = [
    {"role": "system", "content": "Ти си виртуелен асистент кој помага на корисници на македонски јазик. Одговарај на прашања на јасен, разбирлив и професионален начин. Користи правилна граматика и обиди се одговорите да бидат што е можно покорисни и релевантни."},
    # User prompt: "What is the highest peak in Macedonia?"
    {"role": "user", "content": "Кој е највисок врв во Македонија?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,  # you can increase this
    temperature=0.1,     # low temperature keeps answers focused
)
print(outputs[0]["generated_text"][-1])  # the assistant's reply (last turn)
```
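If you want to see exactly what the pipeline feeds the model, you can render the chat template yourself. The snippet below is a small optional check, reusing the `messages` list from above; it prints the raw Llama 3.1-formatted prompt string.

```python
# Optional: inspect the prompt string that apply_chat_template produces.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LVSTCK/domestic-yak-8B-instruct")
prompt = tokenizer.apply_chat_template(
    messages,                    # the same list defined above
    tokenize=False,              # return a string instead of token ids
    add_generation_prompt=True,  # append the assistant header the model completes
)
print(prompt)
```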
📬 Contact
For inquiries, feedback, or contributions, please feel free to reach out to the core team.
Citation
```bibtex
@misc{domestic-yak-8B,
  title={domestic-yak-8B: A Macedonian Language Model},
  author={Stefan Krsteski and Matea Tashkovska and Borjan Sazdov},
  year={2024},
  url={https://huggingface.co/LVSTCK/domestic-yak-8B},
  note={Macedonian adaptation of Llama 8B.}
}
```