Gemma 3-4B Persian (v0)

mshojaei77/gemma-3-4b-persian-v0 is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA, which trains low-rank adapters on top of a 4-bit quantized base model, to reduce memory and compute cost while targeting Persian text generation and understanding. The model also retains the image input capabilities of its base model.

Usage

This model is compatible with both the Hugging Face Transformers library and Ollama.

Running with Ollama

ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
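Once the model has been pulled, it can also be queried from code through Ollama's local REST API. The following is a minimal sketch using only the Python standard library. It assumes a default Ollama server listening on http://localhost:11434; the helper names build_chat_payload and chat are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint; adjust if your server runs elsewhere.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL_NAME = "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0"


def build_chat_payload(prompt, model=MODEL_NAME):
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a token stream
    }


def chat(prompt, url=OLLAMA_CHAT_URL):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Example call (requires a running Ollama server with the model pulled):
# print(chat("توماس جفرسون کیست؟"))
```

Setting "stream" to False keeps the example short; for interactive use, streaming responses would usually feel more responsive.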

Running with Hugging Face Transformers

Install Dependencies:

pip install git+https://github.com/huggingface/[email protected] accelerate

Load Model and Tokenizer:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mshojaei77/gemma-3-4b-persian-v0"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # Use "cuda" for GPU usage if available
    torch_dtype=torch.bfloat16,  # Alternatively, use torch.float16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": "توماس جفرسون کیست؟"
    }
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True, tokenize=True,
    return_dict=True,  # return input_ids and attention_mask so **inputs unpacks correctly
    return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
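The decode call above prints the prompt together with the model's reply, because generate returns the prompt tokens followed by the newly generated ones. A small sketch of how the reply alone could be extracted is shown below. The helper name new_tokens_only is hypothetical, the token ids in the demonstration are made up, and the commented usage assumes inputs is a dict-like object exposing "input_ids".

```python
def new_tokens_only(output_ids, prompt_length):
    """Drop the first `prompt_length` token ids, keeping only the generated ones."""
    return output_ids[prompt_length:]


# Toy demonstration with made-up token ids: 4 prompt tokens, 2 generated tokens.
full_sequence = [2, 105, 2364, 107, 9001, 9002]
reply_ids = new_tokens_only(full_sequence, prompt_length=4)
print(reply_ids)  # [9001, 9002]

# With the objects from the example above:
# prompt_length = inputs["input_ids"].shape[-1]
# reply_ids = new_tokens_only(outputs[0], prompt_length)
# print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```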

Training Data and Fine-Tuning

Training Dataset

This model was fine-tuned on the mshojaei77/Persian_sft dataset, which contains approximately 681,000 rows of Persian text focused on instruction following and conversational interactions.

Fine-Tuning

Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
Hardware: one T4 GPU
Software: Hugging Face Transformers, with peft for QLoRA and bitsandbytes for quantization
Trade-offs: Reduced memory footprint at the expense of some precision compared to full-precision models
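To make the memory trade-off concrete, the back-of-the-envelope calculation below compares the storage needed for the weights alone at 16-bit and 4-bit precision. It is a rough sketch, not a measurement of this model. It assumes a nominal 4 billion parameters (taking "4B" at face value) and ignores activations, LoRA adapter weights, optimizer state, and quantization metadata. The function name weight_memory_gb is illustrative.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 4e9  # assumption: "4B" taken at face value

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {bf16_gb:.1f} GB")  # 16-bit weights: 8.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 4-bit weights:  2.0 GB
```

Under these assumptions, 4-bit storage needs about a quarter of the memory of 16-bit weights, which leaves more of a T4's 16 GB for activations and adapter training.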

Evaluation

Evaluation results will be added soon.

Usage Considerations and Limitations

Intended Use Cases

Question Answering: Responding accurately to Persian language queries
Instruction Following: Interpreting and executing text-based instructions in Persian
Text Generation: Producing fluent, context-aware Persian content
Conversational AI: Integrating into chatbots and virtual assistants
Image Processing: Retaining image input capabilities from the base model

Limitations

Quantization Impact: 4-bit quantization may reduce numerical precision and occasionally produce incoherent responses.
Evaluation Scope: No comprehensive evaluation metrics are available yet for this variant.
Bias: The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
Hallucination: As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
Safety: The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.

Maintenance and Future Work

This model is under active maintenance. Future updates may include:

Additional evaluation metrics and benchmarks
Enhanced safety tuning and bias mitigation strategies
Expanded documentation and usage examples
Incorporation of community feedback for iterative improvements

For any queries, contributions, or issues, please contact me.
