
🐂 domestic-yak, a Macedonian LM (instruct version)

Model Summary

This is the instruct-tuned version of domestic-yak-8B, fine-tuned on the sft-mk dataset for three epochs to improve instruction-following in Macedonian. Building on the foundation of domestic-yak-8B, this version is optimized for generating coherent, task-specific responses to user queries, making it well suited for chatbots, virtual assistants, and other interactive applications.

📊 Results

The table below compares the performance of our model, domestic-yak-8B-instruct, with four other models. Our model is on par with Llama 70B and even beats it on three of the benchmarks. It is also worth noting that this model is currently the best in the 8B-parameter range.

The results were obtained using the macedonian-llm-eval benchmark.

(Figure: macedonian-llm-eval benchmark results comparing domestic-yak-8B-instruct with the four other models.)

🔑 Key Details

  • Language: Macedonian (mk)
  • Base Model: domestic-yak-8B
  • Dataset: ~100k samples across multiple categories (question answering (QA), chat-like conversations, reasoning, essays, and code), consolidated by translating publicly available datasets and generating custom synthetic data. The dataset can be found here (see the loading sketch after this list).
  • Fine-tuning Objective: Supervised fine-tuning (SFT) on Macedonian-specific instruction-following data
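
As referenced in the dataset item above, the snippet below is a minimal sketch for loading and inspecting the SFT data with the datasets library. The exact Hugging Face dataset ID is an assumption inferred from the sft-mk name in the summary; substitute the actual path linked on the model card.

from datasets import load_dataset

# Hypothetical dataset ID inferred from the "sft-mk" name mentioned in the
# model summary; replace with the actual path linked on the model card.
dataset = load_dataset("LVSTCK/sft-mk", split="train")
print(dataset[0])  # inspect one instruction-following sample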

Usage

The pipeline automatically calls apply_chat_template, which formats the input appropriately. The model was trained using the default Llama 3.1 chat format.

import transformers
import torch

model_id = "LVSTCK/domestic-yak-8B-instruct"

# Load the model in bfloat16 and map it across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    # System prompt (Macedonian): "You are a virtual assistant that helps users in
    # Macedonian. Answer questions in a clear, understandable, and professional manner.
    # Use correct grammar and try to make your answers as useful and relevant as possible."
    {"role": "system", "content": "Ти си виртуелен асистент кој помага на корисници на македонски јазик. Одговарај на прашања на јасен, разбирлив и професионален начин. Користи правилна граматика и обиди се одговорите да бидат што е можно покорисни и релевантни."},
    # User question (Macedonian): "What is the highest peak in Macedonia?"
    {"role": "user", "content": "Кој е највисок врв во Македонија?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256, # You can increase this
    temperature=0.1,
)
print(outputs[0]["generated_text"][-1])  # the assistant's reply (last chat message)
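
If you prefer to format the prompt yourself rather than going through the pipeline, the sketch below calls apply_chat_template directly via the standard transformers AutoModelForCausalLM and AutoTokenizer classes. It mirrors the generation settings of the pipeline example above; treat it as a minimal sketch rather than the only supported path.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LVSTCK/domestic-yak-8B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# "What is the highest peak in Macedonia?"
messages = [
    {"role": "user", "content": "Кој е највисок врв во Македонија?"},
]

# apply_chat_template renders the messages in the default Llama 3.1 chat format
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.1,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))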

📬 Contact

For inquiries, feedback, or contributions, please feel free to reach out to the core team.

Citation

@misc{domestic-yak-8B,
  title={domestic-yak-8B: A Macedonian Language Model},
  author={Stefan Krsteski and Matea Tashkovska and Borjan Sazdov},
  year={2024},
  url={https://huggingface.co/LVSTCK/domestic-yak-8B},
  note={Macedonian adaptation of Llama 8B.}
}