|
--- |
|
base_model: google/gemma-3-4b-it |
|
library_name: transformers |
|
model_name: gemma3-4b-thinking
|
tags: |
|
- generated_from_trainer |
|
- trl |
|
- grpo |
|
- reasoning |
|
- math |
|
- step-by-step-thinking |
|
license: gemma
|
--- |
|
|
|
# gemma3-4b-thinking |
|
|
|
This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) trained to enhance its reasoning and step-by-step thinking capabilities. It was trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).
|
|
|
## Model Description |
|
|
|
This model was specifically tuned to demonstrate step-by-step reasoning when solving problems, particularly mathematical word problems. The training process used reinforcement learning to reward the model for: |
|
- Providing clear reasoning steps |
|
- Using logical deduction |
|
- Arriving at the correct numerical answer |
|
|
|
## Quick start |
|
|
|
```python |
|
from transformers import pipeline, AutoProcessor |
|
|
|
# Load the model and processor |
|
processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking") |
|
generator = pipeline("text-generation", model="real-jiakai/gemma3-4b-thinking", tokenizer=processor.tokenizer) |
|
|
|
# Example math problem |
|
question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?" |
|
|
|
# Format the input with chat template |
|
input_text = processor.apply_chat_template([{"role": "user", "content": question}], tokenize=False, add_generation_prompt=True)
|
|
|
# Generate response with reasoning |
|
output = generator(input_text, max_new_tokens=1024) |
|
print(output[0]["generated_text"]) |
|
``` |
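
Recent versions of `transformers` also let you pass the chat messages straight to the text-generation pipeline, which applies the chat template internally; a minimal equivalent sketch:

```python
# Pass chat messages directly; the pipeline applies the chat template.
messages = [{"role": "user", "content": question}]
output = generator(messages, max_new_tokens=1024)

# With chat input, "generated_text" is the message list including the
# new assistant turn; take the content of the last message.
print(output[0]["generated_text"][-1]["content"])
```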
|
|
|
## Model Performance |
|
|
|
The model demonstrates enhanced reasoning capabilities compared to the base model, particularly for: |
|
- Mathematical word problems |
|
- Step-by-step logical deduction |
|
- Breaking complex problems into solvable components |
|
|
|
## Training Procedure |
|
|
|
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300). |
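
Concretely, for each prompt GRPO samples a group of $G$ completions, scores each with the reward functions, and normalizes by the group statistics instead of training a separate value model. The group-relative advantage for the $i$-th completion with reward $r_i$ is:

```latex
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}
```

Dropping the learned critic is what makes GRPO comparatively memory-friendly, which matters when fine-tuning a 4B model on a single GPU.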
|
|
|
### Training Details |
|
- **Dataset**: GSM8K (Grade School Math 8K), a dataset of diverse grade school math word problems
|
- **Fine-tuning Method**: GRPO (Group Relative Policy Optimization)
|
- **Training Steps**: 100 |
|
- **Batch Size**: 2 |
|
- **Learning Rate**: 5e-6 |
|
- **Hardware**: A100 80GB GPU |
|
- **Parameter-Efficient Fine-Tuning**: LoRA with r=16, alpha=32
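
The training script itself is not included in this card. Below is a minimal sketch of how a comparable run could be configured with TRL and PEFT, using the hyperparameters listed above; the dataset column mapping and the stub reward are illustrative assumptions (the actual rewards are sketched in the next section):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# GSM8K questions become prompts; the reference answers remain available
# to reward functions as a dataset column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def stub_reward(completions, **kwargs):
    # Placeholder; see the reward sketches in the next section.
    return [0.0 for _ in completions]

# LoRA adapter matching the listed settings (r=16, alpha=32).
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Hyperparameters from the list above; everything else is a TRL default.
training_args = GRPOConfig(
    output_dir="trainer_output",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    max_steps=100,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=stub_reward,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```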
|
|
|
### Reward Functions |
|
The training used multiple reward functions to guide the model (illustrative sketches follow this list):
|
- Correctness of final answer |
|
- Using proper numerical formats |
|
- Demonstrating clear reasoning steps |
|
- Following structured formats |
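
The exact reward implementations are not published; here is a hedged sketch of two of them in TRL's functional reward style (function names, reward magnitudes, and the `<reasoning>`/`<answer>` tag layout are illustrative assumptions). GSM8K reference answers end in `#### <number>`, so correctness can be checked by comparing final numbers:

```python
import re

FINAL_NUMBER = re.compile(r"-?\d[\d,]*\.?\d*")

def correctness_reward(completions, answer, **kwargs):
    # TRL passes extra dataset columns (here: GSM8K's "answer") as kwargs.
    # Reward completions whose last number matches the reference answer,
    # i.e. the text after "####". The 2.0 magnitude is illustrative.
    rewards = []
    for completion, ref in zip(completions, answer):
        target = ref.split("####")[-1].strip().replace(",", "")
        found = FINAL_NUMBER.findall(completion)
        pred = found[-1].replace(",", "") if found else None
        rewards.append(2.0 if pred == target else 0.0)
    return rewards

def format_reward(completions, **kwargs):
    # Reward completions that follow an assumed structured layout.
    pattern = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)
    return [0.5 if pattern.search(c) else 0.0 for c in completions]
```

Both would be passed together as `reward_funcs=[correctness_reward, format_reward]` in the trainer sketch above.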
|
|
|
### Framework versions |
|
- TRL: 0.16.0.dev0 |
|
- Transformers: 4.50.0.dev0 |
|
- PyTorch: 2.6.0
|
- Datasets: 3.3.2 |
|
- Tokenizers: 0.21.1 |
|
|
|
## Limitations |
|
|
|
- The model sometimes reverts to its base output format rather than following the structured reasoning format used during training |
|
- Performance may vary across different types of problems |
|
- The model is primarily optimized for mathematical reasoning and may not show the same level of improvement on other tasks |
|
|
|
## Ethics and Responsible Use |
|
|
|
- This model is intended to demonstrate reasoning capabilities and should not be used as a sole solution for educational assessments |
|
- Users should verify mathematical results independently for critical applications |
|
- The model can still make reasoning errors despite showing its work |
|
|
|
## Citations |
|
|
|
``` |
|
@article{gemma_2025, |
|
title={Gemma 3}, |
|
url={https://goo.gle/Gemma3Report}, |
|
publisher={Kaggle}, |
|
author={Gemma Team}, |
|
year={2025} |
|
} |
|
|
|
@article{shao2024deepseekmath, |
|
title={DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
|
author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Y and others}, |
|
journal={arXiv preprint arXiv:2402.03300}, |
|
year={2024} |
|
} |
|
``` |