|
--- |
|
license: apache-2.0 |
|
language: |
|
- sw |
|
base_model: |
|
- google/gemma-2-2b-it |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
tags: |
|
- swahili |
|
- gemma2 |
|
- text-generation-inference |
|
- text-generation |
|
inference: |
|
parameters: |
|
temperature: 0.7 |
|
top_p: 0.95 |
|
max_new_tokens: 500 |
|
do_sample: true |
|
|
--- |
|
|
# Gemma2-2B-Swahili-IT |
|
|
|
Gemma2-2B-Swahili-IT is a lightweight, efficient open variant of Google's Gemma2-2B-IT model, fine-tuned for natural Swahili language understanding and generation. This model provides a resource-efficient option for Swahili language tasks while maintaining strong performance. |
|
|
|
## Model Details |
|
|
|
- **Developer:** Alfaxad Eyembe |
|
- **Base Model:** google/gemma-2-2b-it |
|
- **Model Type:** Decoder-only transformer |
|
- **Language(s):** Swahili |
|
- **License:** Apache 2.0 |
|
- **Finetuning Approach:** Low-Rank Adaptation (LoRA) |
|
|
|
## Training Data |
|
|
|
The model was fine-tuned on a comprehensive dataset containing: |
|
- 67,017 instruction-response pairs |
|
- 16,273,709 total tokens |
|
- An average of 242.83 tokens per example
|
- High-quality, naturally-written Swahili content |
|
|
|
|
|
 |
|
|
|
## Performance |
|
|
|
### Massive Multitask Language Understanding (MMLU) - Swahili |
|
- Base Model: 31.58% accuracy |
|
- Fine-tuned Model: 38.60% accuracy |
|
- Improvement: +7.02 percentage points
|
|
|
### Sentiment Analysis |
|
- Base Model: 84.85% accuracy |
|
- Fine-tuned Model: 86.00% accuracy |
|
- Improvement: +1.15 percentage points
|
- Response Validity: 100% |
|
|
|
## Intended Use |
|
|
|
This model is designed for: |
|
- Basic Swahili text generation |
|
- Question answering |
|
- Sentiment analysis |
|
- Simple creative writing |
|
- General instruction following in Swahili |
|
- Resource-constrained environments |
|
|
|
## Usage |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
import torch |
|
|
|
# Load tokenizer and model |
|
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it") |
|
model = AutoModelForCausalLM.from_pretrained( |
|
"alfaxadeyembe/gemma2-2b-swahili-it", |
|
device_map="auto", |
|
torch_dtype=torch.bfloat16 |
|
) |
|
|
|
# Always set to eval mode for inference |
|
model.eval() |
|
|
|
# Example usage |
|
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo." |
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
with torch.no_grad(): |
|
outputs = model.generate( |
|
**inputs, |
|
max_new_tokens=500, |
|
do_sample=True, |
|
temperature=0.7, |
|
top_p=0.95 |
|
) |
|
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(response) |
|
``` |
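
Gemma 2 instruction-tuned checkpoints expect the Gemma chat turn format, so you can also let the tokenizer build the prompt via its chat template. This assumes the fine-tune keeps the base model's template (not stated on this card); the plain-prompt example above works either way.

```python
# Optional: reuse the tokenizer/model loaded above and build the prompt with
# the Gemma chat template instead of a raw string (assumes the base template
# was preserved during fine-tuning).
messages = [
    {
        "role": "user",
        "content": "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo.",
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the model-turn prefix so generation starts cleanly
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```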
|
|
|
## Training Details |
|
|
|
- **Fine-tuning Method:** LoRA |
|
- **Training Steps:** 400 |
|
- **Batch Size:** 2 |
|
- **Gradient Accumulation Steps:** 32 |
|
- **Learning Rate:** 2e-4 |
|
- **Training Time:** ~8 hours on an A100 GPU
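
For reference, a minimal PEFT/LoRA setup consistent with these hyperparameters might look like the sketch below. The LoRA rank, alpha, target modules, and dataset handling are assumptions for illustration; the card only specifies the values listed above.

```python
# Illustrative LoRA fine-tuning sketch (not the exact training script).
# Values marked "assumed" are not specified on this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                      # assumed rank
    lora_alpha=32,             # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="gemma2-2b-swahili-it",
    max_steps=400,                      # Training Steps
    per_device_train_batch_size=2,      # Batch Size
    gradient_accumulation_steps=32,     # Gradient Accumulation Steps
    learning_rate=2e-4,                 # Learning Rate
    bf16=True,
    logging_steps=10,
)

# `tokenized_swahili_dataset` is a hypothetical placeholder for the tokenized
# instruction-response data described under Training Data.
# trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_swahili_dataset)
# trainer.train()
```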
|
|
|
## Key Features |
|
|
|
- Lightweight and efficient (2B parameters) |
|
- Suitable for resource-constrained environments |
|
- Good performance on basic language tasks |
|
- Fast inference speed |
|
- Low memory footprint |
|
|
|
## Advantages |
|
|
|
1. Resource Efficiency: |
|
- Small model size (2B parameters) |
|
- Lower memory requirements |
|
- Faster inference time |
|
- Suitable for deployment on less powerful hardware (see the optional 4-bit loading sketch after this list)
|
|
|
2. Task Performance: |
|
- Strong sentiment analysis capabilities |
|
- Decent MMLU performance |
|
- Good instruction following |
|
- Natural Swahili generation |
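
For tighter memory budgets, the model can also be loaded with 4-bit quantization through the standard transformers + bitsandbytes path. This is an optional deployment sketch, not the configuration behind the reported results, and it requires the `bitsandbytes` package.

```python
# Optional 4-bit loading for resource-constrained hardware (requires bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-2b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-2b-swahili-it",
    quantization_config=bnb_config,
    device_map="auto",
)
model.eval()
```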
|
|
|
## Limitations |
|
|
|
- Produces simpler, less detailed responses than the 9B and 27B variants
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{gemma2-2b-swahili-it, |
|
author = {Alfaxad Eyembe}, |
|
title = {Gemma2-2B-Swahili-IT: A Lightweight Swahili Variant of Gemma2-2B-IT}, |
|
year = {2025}, |
|
publisher = {Hugging Face}, |
|
howpublished = {Hugging Face Model Hub},
|
} |
|
``` |
|
|
|
## Contact |
|
|
|
For questions or feedback, please reach out through: |
|
- HuggingFace: [@alfaxadeyembe](https://huggingface.co/alfaxad) |
|
- Twitter: [@alfxad](https://twitter.com/alfxad) |