Gemma 3-4B Persian (v0)

mshojaei77/gemma-3-4b-persian-v0 is a Persian-specialized model built on the Gemma 3 architecture. It was fine-tuned with QLoRA, which keeps the base weights in 4-bit precision to reduce memory and compute requirements, and it is aimed at generating and understanding Persian text. The model also retains the image input capability of its base model.

Usage

This model is compatible with both the Hugging Face Transformers library and Ollama.

Running with Ollama

ollama run hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0
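
Once the model has been pulled, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch that assumes an Ollama server is running on its default address (`http://localhost:11434`); the helper names `build_chat_payload` and `chat` are illustrative and not part of this model or of Ollama.

```python
import json
import urllib.request

# Model tag taken from the `ollama run` command above.
MODEL = "hf.co/mshojaei77/gemma-3-4b-persian-v0:Q8_0"
# Assumption: Ollama is serving on its default local address.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def chat(prompt: str, url: str = OLLAMA_CHAT_URL) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["message"]["content"]
```

With the server running, `print(chat("توماس جفرسون کیست؟"))` should print the model's answer.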

Running with Hugging Face Transformers

  1. Install Dependencies:

    pip install git+https://github.com/huggingface/transformers accelerate
    
  2. Load Model and Tokenizer:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch
    
    model_id = "mshojaei77/gemma-3-4b-persian-v0"
    
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # places the model on a GPU when one is available
        torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bfloat16 support
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    
    messages = [
        {
            "role": "user",
            "content": "توماس جفرسون کیست؟"
        }
    ]
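    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,  # return input_ids and attention_mask so **inputs works below
        return_tensors="pt",
    ).to(model.device)
    
    outputs = model.generate(**inputs, max_new_tokens=200)
    # Decode only the newly generated tokens, skipping the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))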
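
Because the model is described as keeping the base model's image input capability, an image can in principle be passed alongside the text prompt. The sketch below follows the general Gemma 3 multimodal pattern in Transformers and has not been verified against this fine-tune. It assumes the repository ships the base model's processor configuration; `build_image_messages` and `describe_image` are hypothetical helper names.

```python
def build_image_messages(image_url: str, question: str) -> list:
    """Build a single-turn chat with one image part and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def describe_image(image_url: str, question: str, max_new_tokens: int = 200) -> str:
    """Load the model and answer a question about an image (downloads weights)."""
    # Imported here so the message helper above works without these packages.
    import torch
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "mshojaei77/gemma-3-4b-persian-v0"
    # Assumption: the repository includes the base model's processor files.
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.bfloat16
    )

    inputs = processor.apply_chat_template(
        build_image_messages(image_url, question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device, dtype=torch.bfloat16)

    prompt_length = inputs["input_ids"].shape[-1]
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return processor.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `describe_image(url, "در این تصویر چه چیزی دیده می‌شود؟")` with a reachable image URL would return the model's description as a string.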
    

Training Data and Fine-Tuning

Training Dataset

This model was fine-tuned on the mshojaei77/Persian_sft dataset, which contains approximately 681,000 rows of Persian text focused on instruction-following and conversational interactions.

Fine-Tuning

  • Method: Supervised Fine-Tuning (SFT) using QLoRA (4-bit quantization)
  • Hardware: A single NVIDIA T4 GPU
  • Software: Hugging Face Transformers, with peft for QLoRA and bitsandbytes for quantization
  • Trade-offs: Reduced memory footprint at the expense of some precision compared to full-precision models
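
The memory trade-off can be made concrete with a rough back-of-the-envelope estimate. The sketch below assumes the 4.3B parameter count listed for this model and counts weight storage only; it ignores activations, LoRA adapter weights, optimizer state, and the small overhead of quantization constants, so real usage will be higher.

```python
PARAMS = 4.3e9  # parameter count listed for this model

# Bits used to store one weight in each format.
BITS_PER_WEIGHT = {"float32": 32, "bfloat16": 16, "4-bit": 4}


def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>8}: {weight_memory_gb(PARAMS, bits):5.2f} GB")
```

Under these assumptions the weights take roughly 17.2 GB in float32, 8.6 GB in bfloat16, and 2.15 GB in 4-bit form. Only the 4-bit figure leaves comfortable headroom on a T4, which has 16 GB of memory.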

Evaluation

Evaluation results are not yet available.

Usage Considerations and Limitations

Intended Use Cases

  • Question Answering: Responding accurately to Persian language queries
  • Instruction Following: Interpreting and executing text-based instructions in Persian
  • Text Generation: Producing fluent, context-aware Persian content
  • Conversational AI: Integrating into chatbots and virtual assistants
  • Image Processing: Retaining image input capabilities from the base model

Limitations

  • Quantization Impact: 4-bit quantization may reduce output precision and result in occasional incoherent responses.
  • Evaluation Scope: No evaluation results are available yet for this variant.
  • Bias: The model might mirror biases present in both the original Gemma 3 data and the Persian_sft dataset.
  • Hallucination: As with all LLMs, there is a risk of generating plausible-sounding but inaccurate information.
  • Safety: The model has not undergone safety tuning, so extra caution is advised when deploying in sensitive contexts.
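
To give a feel for the precision loss mentioned under Quantization Impact, here is a toy round trip through 4-bit integers. It is a simplified illustration only: it uses a single shared absmax scale, whereas QLoRA uses a block-wise NF4 scheme, and the weight values are made up for the example.

```python
def quantize_4bit(values):
    """Map floats to integers in [-7, 7] using one shared absmax scale."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid a zero scale
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up weights, for illustration only.
weights = [0.82, -0.31, 0.05, -0.77, 0.44, 0.0, -0.12, 0.63]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(f"largest round-trip error: {max_error:.4f}")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). Small per-weight errors of this kind are the source of the reduced precision.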

Maintenance and Future Work

This model is under active maintenance. Future updates may include:

  • Additional evaluation metrics and benchmarks
  • Enhanced safety tuning and bias mitigation strategies
  • Expanded documentation and usage examples
  • Incorporation of community feedback for iterative improvements

For any queries, contributions, or issues, please contact me.

Model size: 4.3B params (Safetensors, tensor type F32)