Gemma 3-4B Persian (v0)
mshojaei77/gemma-3-4b-persian-v0
is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA, which applies 4-bit quantization to cut memory and compute requirements, to improve Persian text generation and understanding. In addition to text generation, the model retains the image input capabilities inherited from its base model.
Usage
This model is compatible with both the Hugging Face Transformers library and Ollama.
Running with Ollama
```bash
ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
```
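Ollama also serves a local HTTP API (port 11434 by default), so the model can be called programmatically once it has been pulled. Below is a minimal sketch using Python's `requests` library; the prompt is illustrative:

```python
import requests

# Ollama's chat endpoint; assumes the server is running locally on the default port
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0",
        "messages": [
            {"role": "user", "content": "توماس جفرسون کیست؟"}  # "Who is Thomas Jefferson?"
        ],
        "stream": False,  # return one complete JSON response instead of a token stream
    },
)
print(response.json()["message"]["content"])
```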
Running with Hugging Face Transformers
Install Dependencies:
```bash
pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 accelerate
```
Load Model and Tokenizer:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # use "cuda" to force GPU placement if available
    torch_dtype=torch.bfloat16,  # alternatively, use torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "توماس جفرسون کیست؟"}  # "Who is Thomas Jefferson?"
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,    # return a dict so it unpacks cleanly into generate()
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
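Because the model targets low-resource setups, the checkpoint can also be loaded in 4-bit at inference time with `bitsandbytes`. A minimal sketch; the NF4/bfloat16 settings below are common defaults and an assumption, not values taken from this model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mshojaei77/gemma-3-4b-persian-v0"

# 4-bit NF4 quantization; requires the bitsandbytes package and a CUDA GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```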
Training Data and Fine-Tuning
Training Dataset
This model was fine-tuned on the mshojaei77/Persian_sft dataset, which contains approximately 681,000 rows of Persian text focused on instruction-following and conversational interactions.
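To inspect the data yourself, the dataset can be loaded with the `datasets` library. A minimal sketch; the `"train"` split name is an assumption, so check the dataset card for the actual layout:

```python
from datasets import load_dataset

# Load the SFT dataset from the Hugging Face Hub
# (the "train" split name is an assumption)
dataset = load_dataset("mshojaei77/Persian_sft", split="train")

print(dataset)     # row count and column names
print(dataset[0])  # inspect one instruction/response example
```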
Fine-Tuning
- Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
- Hardware: a single NVIDIA T4 GPU
- Software: Hugging Face Transformers, with `peft` for QLoRA and `bitsandbytes` for quantization (see the configuration sketch after this list)
- Trade-offs: Reduced memory footprint at the expense of some precision compared to full-precision models
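For reference, a QLoRA setup of this kind with `peft` typically looks like the sketch below. The base checkpoint, rank, alpha, and target modules are illustrative assumptions, not the hyperparameters actually used for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "google/gemma-3-4b-it"  # assumed base checkpoint

# Load the base model in 4-bit; QLoRA keeps these weights frozen and quantized
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach trainable LoRA adapters; r/alpha/target_modules are hypothetical values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable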
Evaluation
[SOON]
Usage Considerations and Limitations
Intended Use Cases
- Question Answering: Responding accurately to Persian language queries
- Instruction Following: Interpreting and executing text-based instructions in Persian
- Text Generation: Producing fluent, context-aware Persian content
- Conversational AI: Integrating into chatbots and virtual assistants
- Image Processing: Accepting image inputs, a capability retained from the base model (see the sketch after this list)
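Since image input is inherited from the base model, multimodal calls would follow the standard Gemma 3 processor pattern. This is a sketch under the assumption that the fine-tune preserved the base vision weights; the image URL is a placeholder:

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "mshojaei77/gemma-3-4b-persian-v0"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
).eval()
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "این تصویر را توصیف کن."},  # "Describe this image."
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```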
Limitations
- Quantization Impact: 4-bit quantization may reduce output precision and result in occasional incoherent responses.
- Evaluation Scope: Absence of comprehensive evaluation metrics specific to this variant.
- Bias: The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
- Hallucination: As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
- Safety: The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.
Maintenance and Future Work
This model is under active maintenance. Future updates may include:
- Additional evaluation metrics and benchmarks
- Enhanced safety tuning and bias mitigation strategies
- Expanded documentation and usage examples
- Incorporation of community feedback for iterative improvements
For any queries, contributions, or issues, please contact me.