---
base_model: google/gemma-3-4b-it
library_name: transformers
model_name: trainer_output
tags:
- generated_from_trainer
- trl
- grpo
- reasoning
- math
- step-by-step-thinking
license: license
---
# gemma3-4b-thinking
This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it), trained to enhance its reasoning and step-by-step thinking capabilities. It was trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).
## Model Description
This model was specifically tuned to demonstrate step-by-step reasoning when solving problems, particularly mathematical word problems. The training process used reinforcement learning to reward the model for:
- Providing clear reasoning steps
- Using logical deduction
- Arriving at the correct numerical answer
## Quick start
```python
from transformers import pipeline, AutoProcessor

# Load the model and processor
processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking")
generator = pipeline("text-generation", model="real-jiakai/gemma3-4b-thinking", tokenizer=processor.tokenizer)

# Example math problem
question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"

# Format the input with the chat template; add_generation_prompt starts the model's turn
input_text = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate the response with step-by-step reasoning
output = generator(input_text, max_new_tokens=1024, return_full_text=False)
print(output[0]["generated_text"])
```
## Model Performance
The model demonstrates enhanced reasoning capabilities compared to the base model, particularly for:
- Mathematical word problems
- Step-by-step logical deduction
- Breaking complex problems into solvable components
## Training Procedure
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
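At a high level, GRPO samples a group of $G$ completions per prompt, scores each one with the reward functions, and normalizes the rewards within the group to obtain advantages, which removes the need for a separate value model:

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}
$$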
### Training Details
- **Dataset**: GSM8k (Grade School Math 8k), a dataset of diverse grade school math word problems
- **Fine-tuning Method**: GRPO (Group Relative Policy Optimization)
- **Training Steps**: 100
- **Batch Size**: 2
- **Learning Rate**: 5e-6
- **Hardware**: A100 80GB GPU
- **Parameter-Efficient Fine-Tuning**: LoRA with r=16, alpha=32 (see the configuration sketch below)
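For orientation, here is a minimal sketch of how such a run might be wired up with TRL's `GRPOTrainer`. The dataset preprocessing, the `num_generations` value, and the stub reward are illustrative assumptions, not the exact training script:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# GSM8k word problems; GRPOTrainer expects a "prompt" column
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": [{"role": "user", "content": x["question"]}]})

def dummy_reward(completions, **kwargs):
    # Stand-in only; the actual run combined several rewards (see "Reward Functions")
    return [0.0] * len(completions)

# Hyperparameters from the list above
training_args = GRPOConfig(
    output_dir="trainer_output",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    max_steps=100,
    num_generations=2,  # must divide the effective batch size
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=[dummy_reward],
    args=training_args,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```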
### Reward Functions
The training used multiple reward functions to guide the model (a correctness-reward sketch follows the list):
- Correctness of final answer
- Using proper numerical formats
- Demonstrating clear reasoning steps
- Following structured formats
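As an illustration, a correctness reward in the signature TRL expects might look like the sketch below; the regex-based answer extraction and the 0/1 scoring are assumptions, not the exact functions used in training:

```python
import re

def correctness_reward(completions, answer, **kwargs):
    """Score 1.0 when the last number in a completion matches the GSM8k
    reference answer (the text after '####'), else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        # Conversational completions arrive as lists of {"role", "content"} messages
        text = completion[0]["content"] if isinstance(completion, list) else completion
        target = ref.split("####")[-1].strip().replace(",", "")
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1] == target else 0.0)
    return rewards
```

Format rewards follow the same signature, typically returning partial credit when a completion matches the expected reasoning template.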
### Framework versions
- TRL: 0.16.0.dev0
- Transformers: 4.50.0.dev0
- PyTorch: 2.6.0
- Datasets: 3.3.2
- Tokenizers: 0.21.1
## Limitations
- The model sometimes reverts to its base output format rather than following the structured reasoning format used during training
- Performance may vary across different types of problems
- The model is primarily optimized for mathematical reasoning and may not show the same level of improvement on other tasks
## Ethics and Responsible Use
- This model is intended to demonstrate reasoning capabilities and should not be used as a sole solution for educational assessments
- Users should verify mathematical results independently for critical applications
- The model can still make reasoning errors despite showing its work
## Citations
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}

@article{shao2024deepseekmath,
    title={DeepSeekMath: Pushing the limits of mathematical reasoning in open language models},
    author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Y and others},
    journal={arXiv preprint arXiv:2402.03300},
    year={2024}
}
```