---
base_model: meta-llama/Meta-Llama-3.1-8B
datasets:
- M-A-D/Mixed-Arabic-Dataset-Main
language:
- ar
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- Supporting Arabic Language
- unsloth
---

# Model Card for ar_llama3.1

## Model Details

I fine-tuned Llama 3.1 on a dataset of Arabic conversations to improve its support for the Arabic language. Fine-tuning a large language model (LLM) such as Llama 3.1 on text in a language outside its officially supported set, such as Arabic, is intended to improve its ability to understand, generate, and use that language. The goal is for the model to learn the grammar, vocabulary, nuances, and cultural context specific to Arabic, so that it produces more coherent and contextually relevant Arabic text and extends its multilingual capabilities.

### Model Description

- **Base model:** Llama 3.1 8B, with a 128K-token context window
- **Developed by:** Alber Bshara
- **Language(s) (NLP):** Arabic (ar), English (en)
- **License:** NeptoneAI
- **Finetuned from model:** `meta-llama/Meta-Llama-3.1-8B`

### Model Sources [optional]

- **Core Model:** https://ai.meta.com/blog/meta-llama-3-1/

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

## How to Get Started with the Model

See the **How to Use it** section at the end of this card for a usage example.

## Training Details

### Training Data

The Mixed Arabic Datasets (MAD) collection (`M-A-D/Mixed-Arabic-Dataset-Main` on the Hugging Face Hub): https://huggingface.co/M-A-D#:~:text=The%20Mixed%20Arabic%20Datasets%20(MAD,language%20datasets%20across%20the%20Internet.
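The hyperparameters listed in the next subsection follow the Hugging Face `Trainer` conventions: the total batch size is the per-device batch size times the gradient-accumulation steps (times the number of devices), and the linear scheduler warms the learning rate up from zero before decaying it linearly to zero. The sketch below reproduces that arithmetic, assuming a single GPU; it mirrors the formula used by `transformers.get_linear_schedule_with_warmup`, and `TOTAL_STEPS = 60` is a hypothetical placeholder because the card does not report the number of training steps.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0 at `total_steps`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Values taken from the card; TOTAL_STEPS is a hypothetical placeholder.
LEARNING_RATE = 2e-4
WARMUP_STEPS = 5
TOTAL_STEPS = 60

print(effective_batch_size(2, 4))  # 8, matching total_train_batch_size on one GPU
for step in (0, 2, 5, 30, 60):
    print(step, linear_warmup_lr(step, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS))
```

The learning rate peaks at 0.0002 once the 5 warmup steps are done and reaches zero at the final step.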
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5
- mixed_precision_training: Native AMP

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Example question and model answer (in Arabic), followed by an English translation.

**Question:** كيف يمكنك إيجاد القاسم المشترك الأكبر للعددين 81 و 27؟

**Answer:**

> يمكننا استخدام "خوارزمية إقليدس" لإيجاد القاسم المشترك الأكبر للعددين 81 و 27. هذه الخوارزمية تعتمد على القسمة المتكررة، حيث نقسم العدد الأكبر على العدد الأصغر، ثم نقسم العدد الأصغر على الباقي من القسمة السابقة، ونجري هذه العملية حتى يكون الباقي صفرًا.
>
> 1. **الخطوة 1**: نقسم 81 على 27: 81 ÷ 27 = 3 والباقي R = 0.
> 2. **النتيجة النهائية**: عندما يكون الباقي صفرًا، فإن العدد الذي تم القسمة عليه في تلك الخطوة هو القاسم المشترك الأكبر (GCD). في هذه الحالة، عندما أصبح الباقي صفرًا، كان العدد المقسوم عليه هو 27، وبالتالي GCD لـ 81 و27 هو 27.

**English translation**

**Question:** How can you find the greatest common divisor of 81 and 27?

**Answer:**

> We can use the "Euclidean algorithm" to find the greatest common divisor of 81 and 27. This algorithm relies on repeated division: we divide the larger number by the smaller one, then divide the smaller number by the remainder of the previous division, and repeat until the remainder is zero.
>
> 1. **Step 1**: Divide 81 by 27: 81 ÷ 27 = 3 with remainder R = 0.
> 2. **Final result**: When the remainder is zero, the divisor used in that step is the greatest common divisor (GCD). Here, the remainder became zero when the divisor was 27, so the GCD of 81 and 27 is 27.

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
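The calculator's estimate is, in essence, the energy consumed (hardware power × hours) multiplied by the carbon intensity of the local power grid. A minimal sketch of that arithmetic follows. The 70 W figure is the rated board power of an NVIDIA T4; the 10 training hours and the grid intensity of 0.4 kg CO2eq/kWh are hypothetical placeholders, not measurements for this model.

```python
def training_emissions_kg(power_watts: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Estimate emissions in kg CO2-equivalent: energy (kWh) x grid carbon intensity."""
    energy_kwh = power_watts * hours / 1000.0  # W x h -> kWh
    return energy_kwh * grid_kg_per_kwh


# Hypothetical inputs: one T4 (70 W) for 10 hours on a 0.4 kg CO2eq/kWh grid
print(f"{training_emissions_kg(70, 10, 0.4):.2f} kg CO2eq")  # 0.28 kg CO2eq
```

Filling in the real hardware, hours, and region below would turn this placeholder into an actual estimate.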
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

The model can run on an NVIDIA T4 or L4 GPU, or on more powerful GPUs. With the default 4-bit quantization used in the example below, the 8B-parameter weights take roughly 6 GB of GPU memory; long prompts and long generations need additional memory for the key-value cache.

#### Software

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

## How to Use it

The example below requires an NVIDIA GPU with CUDA and the `unsloth` and `bitsandbytes` packages. It reads a Hugging Face access token from the `HF_TOKEN` environment variable.

```python
import os
from typing import Any, Dict, List, Tuple

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from unsloth.chat_templates import get_chat_template


class LLM:
    def __init__(self,
                 load_in_4bit: bool = True,
                 load_cpu_mem_usage: bool = True,
                 hf_model_path: str = "AlberBshara/ar_llama3.1",
                 max_new_tokens: int = 4096):
        """
        Args:
            load_in_4bit (bool): Use 4-bit quantization. Defaults to True.
            load_cpu_mem_usage (bool): Reduce CPU memory usage while loading. Defaults to True.
            hf_model_path (str): The path of your model on the Hugging Face Hub, like "your-user-name/model-name".
            max_new_tokens (int): Maximum number of tokens to generate per answer. Defaults to 4096.
        """
        assert torch.cuda.is_available(), "CUDA is not available. An NVIDIA GPU is required."

        # Read the Hugging Face access token from the environment (None if unset)
        hf_auth_token = os.environ.get("HF_TOKEN")

        # Quantization config; None loads the model without bitsandbytes quantization
        self._bnb_config = BitsAndBytesConfig(load_in_4bit=True) if load_in_4bit else None

        # Load the model directly with the quantization config
        self.model = AutoModelForCausalLM.from_pretrained(
            hf_model_path,
            low_cpu_mem_usage=load_cpu_mem_usage,
            quantization_config=self._bnb_config,
            token=hf_auth_token,
        )

        # Load the tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(hf_model_path, token=hf_auth_token)

        # Apply Unsloth's Llama 3 chat template with ShareGPT-style keys ("from"/"value")
        self.__tokenizer = get_chat_template(
            self.tokenizer,
            chat_template="llama-3",
            mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
        )

        self._hf_model_path = hf_model_path
        self._load_in_4bit = load_in_4bit
        self._load_cpu_mem_usage = load_cpu_mem_usage
        self._EOS_TOKEN_ID = self.__tokenizer.eos_token_id
        self.max_new_tokens = max_new_tokens

        # Prompt template: answer only from the given context, in the context's language
        self._prompt = lambda context, question: f"""
        Please provide a detailed answer to the question using only the information provided in the context. Do not include any information that is not explicitly mentioned in the context.
        Context: [{context}]
        - If the context is in Arabic, answer in Arabic; otherwise, answer in English.
        Question: [{question}]
        Your answer should be comprehensive, thoroughly explaining the topic while staying within the boundaries of the provided context.
        """

    def invoke(self, context: str, question: str) -> Tuple[str, List[Dict[str, str]]]:
        # Reject missing or blank inputs
        if not question or not question.strip():
            raise ValueError("question cannot be empty or None")
        if not context or not context.strip():
            raise ValueError("context cannot be empty or None")

        prompt = self._prompt(context, question)
        messages = [{"from": "human", "value": prompt}]
        input_ids = self.__tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,  # Must add for generation
            return_tensors="pt",
        ).to("cuda")

        # Fall back to the EOS token if the tokenizer defines no padding token
        pad_token_id = self.__tokenizer.pad_token_id
        if pad_token_id is None:
            pad_token_id = self._EOS_TOKEN_ID

        # A larger max_new_tokens allows more detailed responses
        output_ids = self.model.generate(
            input_ids,
            attention_mask=torch.ones_like(input_ids),
            max_new_tokens=self.max_new_tokens,
            pad_token_id=pad_token_id,
        )
        # Decode the first (and only) sequence; it contains the prompt followed by the answer
        output_text = self.__tokenizer.decode(output_ids[0], skip_special_tokens=True)

        # Release cached GPU memory
        del input_ids, output_ids
        torch.cuda.empty_cache()
        return output_text, messages

    def extract_answer(self, response: str) -> str:
        start_with: str = ".assistant"
        start_index = response.find(start_with)
        # If the marker is found, return everything after it
        if start_index != -1:
            start_index += len(start_with)
            return response[start_index:]
        return response

    def get_metadata(self) -> Dict[str, Any]:
        return {
            "class_name": self.__class__.__name__,
            "init_params": {
                "load_in_4bit": self._load_in_4bit,
                "load_cpu_mem_usage": self._load_cpu_mem_usage,
                "hf_model_path": self._hf_model_path,
                "hf_auth_token": "--%$%--",  # masked; never expose the real token
                "max_new_tokens": self.max_new_tokens,
            },
            "methods": ["invoke", "extract_answer"],
        }


llm = LLM()

# Usage: replace the placeholder context and question with your own text
context = "..."
question = "..."
response, _ = llm.invoke(context=context, question=question)
print(llm.extract_answer(response))
```
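`invoke` and `extract_answer` depend on how the Llama 3 chat template lays out a conversation. The sketch below re-creates that layout in plain Python so it runs without a GPU. It assumes the rendered prompt matches Unsloth's `llama-3` template (each message's content trimmed, then an assistant header appended when `add_generation_prompt=True`); the helpers `render_llama3` and `strip_special_tokens` are hypothetical, and the regex only approximates `decode(..., skip_special_tokens=True)`. It illustrates why `extract_answer` searches for `.assistant`: once special tokens are removed, the trimmed prompt, which ends with a period, runs directly into the word `assistant`.

```python
import re
from typing import Dict, List

# ShareGPT "from" values mapped to Llama 3 role names (same mapping as the LLM class)
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def render_llama3(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: render ShareGPT-style messages in the Llama 3 chat layout."""
    text = "<|begin_of_text|>"
    for message in messages:
        role = ROLE_MAP.get(message["from"], message["from"])
        text += f"<|start_header_id|>{role}<|end_header_id|>\n\n{message['value'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


def strip_special_tokens(text: str) -> str:
    """Rough stand-in for skip_special_tokens=True: drop every <|...|> marker."""
    return re.sub(r"<\|[^|]*\|>", "", text)


def extract_answer(response: str, marker: str = ".assistant") -> str:
    """Same logic as LLM.extract_answer: return the text after the first marker."""
    start = response.find(marker)
    return response[start + len(marker):] if start != -1 else response


prompt = "Context: [...]\nQuestion: [...]\nYour answer should be comprehensive."
rendered = render_llama3([{"from": "human", "value": prompt}])
# Simulate generation by appending a made-up answer and an end-of-turn token
decoded = strip_special_tokens(rendered + "The answer is 27.<|eot_id|>")
print(repr(decoded))
print(repr(extract_answer(decoded)))
```

If the prompt did not end with a period, the `.assistant` marker would not appear and `extract_answer` would return the full decoded text, prompt included.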