---
base_model: google/gemma-3-4b-it
library_name: transformers
model_name: gemma3-4b-thinking
tags:
- generated_from_trainer
- trl
- grpo
- reasoning
- math
- step-by-step-thinking
license: gemma
---
# gemma3-4b-thinking
This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) trained to enhance its reasoning and step-by-step thinking capabilities. It has been trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).
## Model Description
This model was specifically tuned to demonstrate step-by-step reasoning when solving problems, particularly mathematical word problems. The training process used reinforcement learning to reward the model for:
- Providing clear reasoning steps
- Using logical deduction
- Arriving at the correct numerical answer (see the answer-checking sketch after this list)
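GSM8K-style reference answers put the final number after a `####` marker, so "arriving at the correct numerical answer" can be checked mechanically by comparing extracted numbers. A minimal sketch of such a check; the regex and helper name are illustrative, not the exact training code:

```python
import re

def extract_final_number(text: str) -> str | None:
    """Return the last number appearing in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

# GSM8K stores the gold answer after a '####' marker
gold = "56 + 44 = 100 students in total. 100 / 4 = 25. #### 25"
completion = "Each classroom has (56 + 44) / 4 = 25 students."
assert extract_final_number(gold.split("####")[-1]) == extract_final_number(completion)
```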
## Quick start
```python
from transformers import pipeline, AutoProcessor

# Load the model and processor
processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking")
generator = pipeline("text-generation", model="real-jiakai/gemma3-4b-thinking", tokenizer=processor.tokenizer)

# Example math problem
question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"

# Format the input with the chat template, returning text (not token IDs)
# and appending the generation prompt
input_text = processor.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

# Generate a response with step-by-step reasoning
output = generator(input_text, max_new_tokens=1024)
print(output[0]["generated_text"])
```
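For reference, the expected final answer to this example is (56 + 44) / 4 = 100 / 4 = 25 students per classroom.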
## Model Performance
The model demonstrates enhanced reasoning capabilities compared to the base model, particularly for:
- Mathematical word problems
- Step-by-step logical deduction
- Breaking complex problems into solvable components
## Training Procedure
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
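The key idea in GRPO is to sample a group of completions for each prompt and score each completion relative to the group average, which removes the need for a separate value model. A minimal sketch of the group-relative advantage computation (illustrative; TRL's internals differ in detail):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each completion's reward by its group's mean and standard deviation."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Four completions sampled for one prompt; two reached the correct answer
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # ~[1, -1, 1, -1]
```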
### Training Details
- **Dataset**: GSM8K (Grade School Math 8K), a dataset of diverse grade school math word problems
- **Fine-tuning Method**: GRPO (Group Relative Policy Optimization)
- **Training Steps**: 100
- **Batch Size**: 2
- **Learning Rate**: 5e-6
- **Hardware**: A100 80GB GPU
- **Parameter-Efficient Fine-Tuning**: LoRA with r=16, alpha=32 (see the configuration sketch after this list)
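Based on the hyperparameters above, a comparable run could be set up with TRL's `GRPOTrainer` and a PEFT `LoraConfig`. This is a hedged sketch, not the exact training script; the dataset mapping and the `correctness_reward` function (sketched under Reward Functions below) are assumptions:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; GSM8K provides "question" and "answer"
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

training_args = GRPOConfig(
    output_dir="gemma3-4b-thinking",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    max_steps=100,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=[correctness_reward],  # see the sketch under Reward Functions
    args=training_args,
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32),
)
trainer.train()
```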
### Reward Functions
The training used multiple reward functions to guide the model (see the sketch after this list):
- Correctness of final answer
- Using proper numerical formats
- Demonstrating clear reasoning steps
- Following structured formats
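As an illustration, TRL reward functions receive the sampled completions (plus any dataset columns as keyword arguments) and return one score per completion. A sketch of the correctness and reasoning rewards described above, assuming plain-string completions and GSM8K's `####`-marked answers; the names, weights, and patterns are illustrative, not the exact training code:

```python
import re

def correctness_reward(completions, answer, **kwargs):
    """2.0 if the completion's last number matches the gold answer, else 0.0."""
    rewards = []
    for completion, gold in zip(completions, answer):
        gold_num = re.findall(r"-?\d+(?:\.\d+)?", gold.split("####")[-1])[-1]
        nums = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
        rewards.append(2.0 if nums and nums[-1] == gold_num else 0.0)
    return rewards

def reasoning_reward(completions, **kwargs):
    """0.5 if the completion shows intermediate working (an '=' step), else 0.0."""
    return [0.5 if "=" in completion else 0.0 for completion in completions]
```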
### Framework versions
- TRL: 0.16.0.dev0
- Transformers: 4.50.0.dev0
- PyTorch: 2.6.0
- Datasets: 3.3.2
- Tokenizers: 0.21.1
## Limitations
- The model sometimes reverts to its base output format rather than following the structured reasoning format used during training
- Performance may vary across different types of problems
- The model is primarily optimized for mathematical reasoning and may not show the same level of improvement on other tasks
## Ethics and Responsible Use
- This model is intended to demonstrate reasoning capabilities and should not be used as a sole solution for educational assessments
- Users should verify mathematical results independently for critical applications
- The model can still make reasoning errors despite showing its work
## Citations
```bibtex
@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}

@article{shao2024deepseekmath,
    title={DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
    author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, Y. K. and Wu, Y. and others},
    journal={arXiv preprint arXiv:2402.03300},
    year={2024}
}
```