---
library_name: transformers
language: fa
tags:
- persian
- text-generation
- qlora
- 4-bit-quantization
license: apache-2.0
datasets:
- mshojaei77/Persian_sft
metrics:
- bleu
base_model:
- google/gemma-3-4b-it
---

# Gemma 3-4B Persian (v0)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6556b1bb85d43542fa1a8f91/hZ0P6q3fUqONSXv0MxGzN.png)

`mshojaei77/gemma-3-4b-persian-v0` is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA (4-bit quantization) to reduce the computational cost of training while adapting the model to understand and generate Persian text. In addition to text generation, it retains the image input capabilities of its base model.

## Usage

This model is compatible with both the Hugging Face Transformers library and Ollama.

### Running with Ollama

```bash
ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
```

### Running with Hugging Face Transformers

1. **Install Dependencies:**

   ```bash
   pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
   ```

2. **Load Model and Tokenizer:**

   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer
   import torch

   model_id = "mshojaei77/gemma-3-4b-persian-v0"

   model = AutoModelForCausalLM.from_pretrained(
       model_id,
       device_map="auto",           # place weights on the available GPU(s) automatically
       torch_dtype=torch.bfloat16,  # use torch.float16 if your GPU lacks bfloat16 support
   )
   tokenizer = AutoTokenizer.from_pretrained(model_id)

   messages = [
       {"role": "user", "content": "توماس جفرسون کیست؟"}  # "Who is Thomas Jefferson?"
   ]

   inputs = tokenizer.apply_chat_template(
       messages,
       add_generation_prompt=True,
       tokenize=True,
       return_dict=True,    # return a dict so it can be unpacked into generate()
       return_tensors="pt",
   ).to(model.device)

   outputs = model.generate(**inputs, max_new_tokens=200)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
   ```

## Training Data and Fine-Tuning

### Training Dataset

The model was fine-tuned on the [mshojaei77/Persian_sft](https://huggingface.co/datasets/mshojaei77/Persian_sft) dataset, which contains approximately 681,000 rows of Persian text focused on instruction following and conversational interactions.

### Fine-Tuning

- **Method:** Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
- **Hardware:** A single NVIDIA T4 GPU
- **Software:** Hugging Face Transformers, with `peft` for QLoRA and `bitsandbytes` for quantization
- **Trade-offs:** Reduced memory footprint at the expense of some precision compared to full-precision fine-tuning

## Evaluation

[SOON]

## Usage Considerations and Limitations

### Intended Use Cases

- **Question Answering:** Responding accurately to Persian-language queries
- **Instruction Following:** Interpreting and executing text-based instructions in Persian
- **Text Generation:** Producing fluent, context-aware Persian content
- **Conversational AI:** Powering chatbots and virtual assistants
- **Image Processing:** Accepting image inputs, a capability inherited from the base model

### Limitations

- **Quantization Impact:** 4-bit quantization may reduce output precision and occasionally produce incoherent responses.
- **Evaluation Scope:** Comprehensive evaluation metrics for this variant are not yet available.
- **Bias:** The model may mirror biases present in both the original Gemma 3 training data and the Persian_sft dataset.
- **Hallucination:** As with all LLMs, the model can generate plausible-sounding but inaccurate information.
- **Safety:** The model has not undergone safety tuning, so extra caution is advised when deploying it in sensitive contexts.
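### Low-Memory Inference (4-bit)

Since the model was trained with QLoRA, it can also be loaded with 4-bit quantization at inference time via `bitsandbytes` (`pip install bitsandbytes`) to fit on smaller GPUs. Below is a minimal sketch; the specific `BitsAndBytesConfig` values are illustrative assumptions, not the exact configuration used during training:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"

# Illustrative 4-bit settings (assumed, not the verified training config):
# NF4 quantization with bfloat16 compute is a common QLoRA-style setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # quantize weights to 4-bit on load
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Generation then works exactly as in the example above; expect the same precision trade-offs noted under Limitations.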
## Maintenance and Future Work

This model is under active maintenance. Future updates may include:

- Additional evaluation metrics and benchmarks
- Enhanced safety tuning and bias mitigation strategies
- Expanded documentation and usage examples
- Incorporation of community feedback for iterative improvements

For any queries, contributions, or issues, please contact me.