SILMA Kashif: The Arabic RAG Model

Community Article Published January 28, 2025

silma-ragqa-benchmark-colored-2.png

Meet SILMA Kashif 2B Instruct v1.0 the first in the SILMA Kashif family, is specifically designed for Arabic and English RAG tasks, excelling at answering questions based on provided context. While its primary strength lies in question answering, Kashif also boasts entity extraction as a secondary skill.

A Performance Powerhouse

SILMA Kashif 2B v1.0 stands at the pinnacle of open-source RAG models in the 3-9 billion parameter range. Rigorous evaluations using the SILMA RAGQA Benchmark confirm its superior performance. Built upon the robust foundation of Google Gemma, Kashif merges the best of both worlds to provide unparalleled results. As an open-weight model, it's freely available for use under an open license, further democratizing access to powerful AI tools. With a 12k context length, Kashif can handle substantial textual input, allowing for nuanced and comprehensive question answering.

A Multifaceted Skillset

Kashif's rigorous training has honed its abilities across a diverse range of tasks:

  • Bilingual Proficiency: Seamlessly answers questions in both Arabic and English.
  • Contextual Mastery: Handles both short snippets and lengthy passages with equal finesse.
  • Flexible Responses: Provides concise answers or detailed explanations as needed.
  • Numerical Acumen: Tackles complex numerical questions, although limitations exist (see below).
  • Tabular Data Comprehension: Extracts information from tables to answer related queries.
  • Multi-Hop Reasoning: Synthesizes information from multiple paragraphs to answer complex questions.
  • Negative Rejection: Intelligently identifies and rejects inaccurate answers, opting instead for a clear "The answer cannot be found in the given context" response.
  • Multi-Domain Expertise: Answers questions across diverse fields, including finance, medicine, and law.
  • Ambiguity Resolution: Navigates ambiguous contexts to provide accurate and relevant answers.
  • Entity Extraction: Identifies and extracts key entities from text.
  • Prompt Versatility: Handles a variety of complex and diverse prompts.

Evaluation and Benchmarks

The SILMA RAGQA Benchmark rigorously tested Kashif across a range of datasets, including FinQA, TatQA, MS MARCO, SciQ, COVIDQA, EManual, XQuAD, BoolQ, and HotpotQA, in both Arabic and English. While achieving impressive average scores across metrics like exact_match, ROUGE1, BLEU, and BERTScore, Kashif’s overall benchmark score stands at a respectable 0.3478, demonstrating its robust performance.

Getting Started with Kashif

Using Kashif is straightforward thanks to the Transformers library. A simple pip install followed by a few lines of code using the pipeline API allows you to quickly start querying the model. Examples are provided for both Arabic and English prompts, outlining the recommended format for optimal performance. Ollama users can also run the model with a simplified command.

Hardware Requirements and Quantization

For optimal performance, a GPU with at least 24GB of memory (e.g., Nvidia RTX 4090) is recommended. However, Kashif can run on GPUs with as little as 8GB of memory (e.g., Nvidia RTX 3070, 3080, or T4), though performance may be impacted. Quantizing the model to 4-bit can reduce the memory footprint, but comes with a slight performance trade-off (around a 2.6% drop in score).

Limitations and Intended Use

Despite its strengths, Kashif has limitations. Its performance on complex numerical and financial reasoning tasks is not optimal due to its parameter size. Additionally, its specialization in text-based question answering means it may struggle with tasks outside this scope.

A Valuable Tool for Arabic NLP

Developed by SILMA AI, a leading GenAI startup specializing in Arabic language models, SILMA Kashif represents a significant advancement in Arabic NLP. Its open availability, robust performance, and specialized focus on RAG tasks make it a valuable tool for researchers, developers, and anyone working with Arabic and English text. While it's not a general-purpose model, its strengths within the RAG domain are undeniable, paving the way for exciting new possibilities in question answering and information retrieval.

Community

Sign up or log in to comment