real-jiakai/gemma3-4b-thinking

@@ -12,7 +12,7 @@ tags:
 licence: license
 ---
-# Model Card for trainer_output
 This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) trained to enhance its reasoning and step-by-step thinking capabilities. It has been trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).
@@ -29,8 +29,8 @@ This model was specifically tuned to demonstrate step-by-step reasoning when sol
 from transformers import pipeline, AutoProcessor
 # Load the model and processor
-processor = AutoProcessor.from_pretrained("real-jiakai/trainer_output")
-generator = pipeline("text-generation", model="real-jiakai/trainer_output", tokenizer=processor.tokenizer)
 # Example math problem
 question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"
@@ -91,34 +91,19 @@ The training used multiple reward functions to guide the model:
 ## Citations
-Cite GRPO as:
-```bibtex
-@article{zhihong2024deepseekmath,
-    title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-    author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-    year         = 2024,
-    eprint       = {arXiv:2402.03300},
-}
 ```
-Cite TRL as:
-```bibtex
-@misc{vonwerra2022trl,
-    title        = {{TRL: Transformer Reinforcement Learning}},
-    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-    year         = 2020,
-    journal      = {GitHub repository},
-    publisher    = {GitHub},
-    howpublished = {\url{https://github.com/huggingface/trl}}
 }
-```
-Cite GSM8k as:
-```bibtex
-@article{cobbe2021gsm8k,
-    title        = {{Training Verifiers to Solve Math Word Problems}},
-    author       = {Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
-    year         = 2021,
-    eprint       = {arXiv:2110.14168},
 }
 ```

 licence: license
 ---
+# gemma3-4b-thinking
 This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) trained to enhance its reasoning and step-by-step thinking capabilities. It has been trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).
 from transformers import pipeline, AutoProcessor
 # Load the model and processor
+processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking")
+generator = pipeline("text-generation", model="real-jiakai/gemma3-4b-thinking", tokenizer=processor.tokenizer)
 # Example math problem
 question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"
 ## Citations
 ```
+@article{gemma_2025,
+    title={Gemma 3},
+    url={https://goo.gle/Gemma3Report},
+    publisher={Kaggle},
+    author={Gemma Team},
+    year={2025}
 }
+@article{shao2024deepseekmath,
+  title={Deepseekmath: Pushing the limits of mathematical reasoning in open language models},
+  author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Y and others},
+  journal={arXiv preprint arXiv:2402.03300},
+  year={2024}
 }
 ```
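
The card names GRPO as the training method. For readers unfamiliar with it, the core idea is that several completions are sampled for the same prompt and each one's reward is scored relative to the rest of its group. The snippet below is a minimal sketch of that normalization step only, not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the sample rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Center one group's rewards on the group mean and scale by its spread.

    `rewards` holds the scalar rewards of completions sampled for a single
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same math question
rewards = [1.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.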