---
base_model: meta-llama/Meta-Llama-3.1-8B
datasets:
- M-A-D/Mixed-Arabic-Dataset-Main
language:
- ar
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- Supporting Arabic Language
- unsloth
---


# Model Card for AlberBshara/ar_llama3.1

<!-- Provide a quick summary of what the model is/does. -->


## Model Details


- This model is a fine-tune of Llama 3.1 on a dataset of Arabic conversations, intended to improve the model's support for the Arabic language.

Fine-tuning a large language model (LLM) such as Llama 3.1 on text in an additional language, here Arabic,
improves its ability to understand and generate that language. During fine-tuning the model is exposed to more of the grammar,
vocabulary, nuances, and cultural context specific to Arabic. As a result, it is expected to produce more coherent and
contextually relevant Arabic text, which broadens its multilingual capabilities.


### Model Description

 - Base model: Llama 3.1 8B
 - Context window: 128k tokens


- **Developed by:** Alber Bshara
- **Language(s) (NLP):** Arabic (ar), English (en)
- **License:** NeptoneAI
- **Finetuned from model:** meta-llama/Meta-Llama-3.1-8B

### Model Sources [optional]

- **Core Model:** https://ai.meta.com/blog/meta-llama-3-1/


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

## How to Get Started with the Model

- To use this model, see the usage example at the bottom of this page.


## Training Details


### Training Data

The model was trained on `M-A-D/Mixed-Arabic-Dataset-Main`, part of the [Mixed Arabic Datasets (MAD)](https://huggingface.co/M-A-D) collection of Arabic-language datasets gathered from across the Internet.


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5
- mixed_precision_training: Native AMP
#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data


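Sample question and model answer (Arabic original):

```
question = "كيف يمكنك إيجاد القاسم المشترك الأكبر للعددين 81 و 27؟"

ans =

يمكننا استخدام "خوارزمية إقليدس" لإيجاد القاسم المشترك الأكبر للعددين 81 و 27. هذه الخوارزمية تعتمد على القسمة المتكررة، حيث نقسم العدد الأكبر على العدد الأصغر، ثم نقسم العدد الأصغر على الباقي من القسمة السابقة، ونجري هذه العملية حتى يكون الباقي صفرًا.

1. **الخطوة 1**: نقسم 81 على 27: 81 ÷ 27 = 3 والباقي R = 0.

2. **النتيجة النهائية**: عندما يكون الباقي صفرًا، فإن العدد الذي تم القسمة عليه في تلك الخطوة هو القاسم المشترك الأكبر (GCD). في هذه الحالة، عندما أصبح الباقي صفرًا، كان العدد المقسوم عليه هو 27، وبالتالي GCD لـ 81 و27 هو 27.
```

English translation:

```
question = "How can you find the greatest common divisor of the numbers 81 and 27?"

ans =

We can use the "Euclidean algorithm" to find the greatest common divisor of 81 and 27. This algorithm relies on repeated division: we divide the larger number by the smaller one, then divide the smaller number by the remainder of the previous division, and repeat this process until the remainder is zero.

1. **Step 1**: Divide 81 by 27: 81 ÷ 27 = 3 with remainder R = 0.

2. **Final result**: When the remainder is zero, the number that was divided by in that step is the greatest common divisor (GCD). In this case, when the remainder became zero, the divisor was 27, so the GCD of 81 and 27 is 27.
```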
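
The repeated-division procedure described in the sample answer can be written out in a few lines, which makes it easy to check the model's result. This is an illustrative sketch of the Euclidean algorithm; the function name `gcd` is chosen here for the example and is not part of the model or its evaluation code.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm (repeated division)."""
    while b != 0:
        # Replace (a, b) with (b, remainder of a divided by b).
        a, b = b, a % b
    # When the remainder reaches zero, the last divisor is the GCD.
    return abs(a)


if __name__ == "__main__":
    print(gcd(81, 27))
```

For 81 and 27 the loop runs once (81 mod 27 is 0), so the result is 27, in agreement with the answer above.
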

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary


## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

- Runs on NVIDIA T4 or L4 GPUs, or on other, more powerful GPUs.

#### Software

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]


## How to Use It


```python
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from unsloth.chat_templates import get_chat_template
from typing import Tuple, Dict, Any, List
import torch

class LLM:
    def __init__(self, load_in_4bit: bool = True,
                 load_cpu_mem_usage: bool = True,
                 hf_model_path: str = "AlberBshara/ar_llama3.1",
                 max_new_tokens: int = 4096):
        """
        Args:
            load_in_4bit (bool): Use 4-bit quantization. Defaults to True.
            load_cpu_mem_usage (bool): Reduce CPU memory usage while loading. Defaults to True.
            hf_model_path (str): Hugging Face Hub repo ID, e.g. "your-user-name/model-name".
            max_new_tokens (int): Maximum number of tokens to generate per call. Defaults to 4096.
        """
        assert torch.cuda.is_available(), "CUDA is not available. An NVIDIA GPU is required."
        # Read the Hugging Face access token from the environment (the variable name is
        # an assumption); None is fine for public repositories.
        hf_auth_token = os.environ.get("HUGGING_FACE_API_TOKEN")
        # Specify the quantization config
        self._bnb_config = BitsAndBytesConfig(load_in_4bit=load_in_4bit)

        # Load model directly with quantization config
        self.model = AutoModelForCausalLM.from_pretrained(
            hf_model_path,
            low_cpu_mem_usage=load_cpu_mem_usage,
            quantization_config=self._bnb_config,
            token=hf_auth_token  # `use_auth_token` is deprecated in recent Transformers releases
        )

        # Load the tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            hf_model_path,
            token=hf_auth_token
        )
        self.__tokenizer = get_chat_template(
            self.tokenizer,
            chat_template="llama-3",
            mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
        )

        self._hf_model_path = hf_model_path
        self._EOS_TOKEN_ID = self.__tokenizer.eos_token_id
        self.max_new_tokens = max_new_tokens

        self._prompt = lambda context, question: f"""
        Please provide a detailed answer to the question using only the information provided in the context. Do not include any information that is not explicitly mentioned in the context.

        Context: [{context}]

        - If the context is in Arabic, answer in Arabic; otherwise, answer in English.

        Question: [{question}]

        Your answer should be comprehensive, thoroughly explaining the topic while staying within the boundaries of the provided context.
        """

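    def invoke(self, context: str, question: str) -> Tuple[str, List[Dict[str, str]]]:
        # Guard against None as well as empty or whitespace-only strings
        if not question or not question.strip():
            raise ValueError("question cannot be empty or None")

        if not context or not context.strip():
            raise ValueError("context cannot be empty or None")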

        inputs = self._prompt(context, question)

        messages = [{"from": "human", "value": inputs}]
        inputs = self.__tokenizer.apply_chat_template(
              messages,
              tokenize=True,
              add_generation_prompt=True, # Must add for generation
              return_tensors="pt",
        ).to("cuda")
        
        # Generate up to max_new_tokens new tokens for the single prompt in the batch
        output_ids = self.model.generate(
            inputs,
            max_new_tokens=self.max_new_tokens,
            pad_token_id=self.__tokenizer.pad_token_id,
        )

        # The batch holds one sequence (prompt + completion); decode it without special tokens
        output_text = self.__tokenizer.decode(output_ids[0], skip_special_tokens=True)

        # Release cached GPU memory held by the input and output tensors
        del inputs
        del output_ids
        torch.cuda.empty_cache()

        return output_text, messages

    def extract_answer(self, response: str) -> str:
        start_with: str = ".assistant"
        start_index = response.find(start_with)

        # If the word is found, extract the substring from that point onward
        if start_index != -1:
            # Move start_index to the end of the word
            start_index += len(start_with)
            return response[start_index:]
        else:
            return response

    def get_metadata(self) -> Dict[str, Any]:
        return {
            "class_name": self.__class__.__name__,
            "init_params": {
                # Report the values stored on this instance where available
                "load_in_4bit": self._bnb_config.load_in_4bit,
                "load_cpu_mem_usage": True,
                "hf_model_path": self._hf_model_path,
                "hf_auth_token": "--%$%--",
                "max_new_tokens": self.max_new_tokens
            },
            "methods": ["invoke", "extract_answer"]
        }


llm = LLM()
```
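
The marker-based extraction in `extract_answer` can be tried without a GPU or the model weights. The sketch below copies that logic into a standalone function and applies it to a made-up decoded string. The string is hypothetical: the exact text that `decode(..., skip_special_tokens=True)` returns depends on the chat template, and the `.assistant` marker only matches when the prompt text ends with a period immediately before the assistant header.

```python
def extract_answer(response: str, marker: str = ".assistant") -> str:
    """Return the text after the first occurrence of marker, or the input unchanged."""
    start = response.find(marker)
    if start == -1:
        # Marker not found: fall back to the full decoded text.
        return response
    return response[start + len(marker):]


# Hypothetical decoded output (prompt followed by the completion).
decoded = (
    "user\n\nQuestion: [What is 2 + 2?]\n"
    "Your answer should stay within the provided context.assistant\n\n"
    "2 + 2 equals 4."
)

if __name__ == "__main__":
    print(extract_answer(decoded).strip())
```

If the marker is not present in the decoded text, the function returns the full string, so callers may still need to strip the prompt themselves.
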