---
license: llama3.2
language:
- ar
base_model:
- meta-llama/Llama-3.2-3B-Instruct
tags:
- arabic
---
This model is built upon Meta-Llama 3.2 Instruct (3B parameters) and extended through supervised fine-tuning on a large-scale bilingual dataset of approximately 2 million entries. The training corpus combines the ToMe dataset, which offers diverse instruction–response pairs and conversational contexts, with the Arabic Wikipedia dataset, which provides high-quality factual content and broad coverage of knowledge in Arabic. This combination was chosen to balance instruction-following ability with knowledge grounding, especially in domains where Arabic resources are often underrepresented.
During supervised fine-tuning, the model was optimized to better understand natural instructions, generate more coherent and contextually accurate responses, and handle a wide range of tasks spanning reasoning, summarization, and factual question answering. The inclusion of Arabic Wikipedia allows the model to provide stronger support for Arabic-language queries, enabling it to handle both monolingual Arabic tasks and mixed bilingual prompts more effectively than the base Llama 3.2 Instruct model.
The resulting model is well suited for general-purpose instruction following, with a particular emphasis on Arabic fluency and comprehension. It is expected to be useful in applications such as educational tools, knowledge assistants, conversational agents, and research systems where instruction compliance and multilingual support are critical. While the model shows improved reliability in following prompts and generating informative content, users should remain aware of potential limitations, including biases inherited from the training data and the possibility of occasional hallucinations in factual outputs.
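Since the model inherits the Llama 3.2 Instruct architecture, it can be loaded with the standard 🤗 Transformers API. The sketch below is a minimal usage example, not a definitive recipe: the repository ID is a hypothetical placeholder, and it assumes the fine-tuned model keeps the base Llama 3.2 Instruct chat template.

```python
# Minimal usage sketch. Assumptions: the repo ID below is a placeholder
# for the actual model repository, and the fine-tuned model retains the
# base Llama 3.2 Instruct chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llama-3.2-3b-arabic-sft"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 3B model memory-friendly
    device_map="auto",
)

# Arabic instruction: "Summarize the importance of the Arabic language in three sentences."
messages = [
    {"role": "user", "content": "لخّص أهمية اللغة العربية في ثلاث جمل."}
]

# Build the model's chat-formatted prompt and generate a response.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; for more varied conversational output, sampling with a moderate temperature may be preferable.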