real-jiakai committed on
Commit 75449fb · verified · 1 Parent(s): d6eefb8

Update README.md

Files changed (1)
  1. README.md +14 -29
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
  licence: license
  ---

- # Model Card for trainer_output
+ # gemma3-4b-thinking

  This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) trained to enhance its reasoning and step-by-step thinking capabilities. It has been trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).

@@ -29,8 +29,8 @@ This model was specifically tuned to demonstrate step-by-step reasoning when sol
  from transformers import pipeline, AutoProcessor

  # Load the model and processor
- processor = AutoProcessor.from_pretrained("real-jiakai/trainer_output")
- generator = pipeline("text-generation", model="real-jiakai/trainer_output", tokenizer=processor.tokenizer)
+ processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking")
+ generator = pipeline("text-generation", model="real-jiakai/gemma3-4b-thinking", tokenizer=processor.tokenizer)

  # Example math problem
  question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"
@@ -91,34 +91,19 @@ The training used multiple reward functions to guide the model:

  ## Citations

- Cite GRPO as:
- ```bibtex
- @article{zhihong2024deepseekmath,
- title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
- author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
- year = 2024,
- eprint = {arXiv:2402.03300},
- }
  ```
-
- Cite TRL as:
- ```bibtex
- @misc{vonwerra2022trl,
- title = {{TRL: Transformer Reinforcement Learning}},
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
- year = 2020,
- journal = {GitHub repository},
- publisher = {GitHub},
- howpublished = {\url{https://github.com/huggingface/trl}}
+ @article{gemma_2025,
+ title={Gemma 3},
+ url={https://goo.gle/Gemma3Report},
+ publisher={Kaggle},
+ author={Gemma Team},
+ year={2025}
  }
- ```

- Cite GSM8k as:
- ```bibtex
- @article{cobbe2021gsm8k,
- title = {{Training Verifiers to Solve Math Word Problems}},
- author = {Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
- year = 2021,
- eprint = {arXiv:2110.14168},
+ @article{shao2024deepseekmath,
+ title={Deepseekmath: Pushing the limits of mathematical reasoning in open language models},
+ author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Y and others},
+ journal={arXiv preprint arXiv:2402.03300},
+ year={2024}
  }
  ```
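
Note on the example question in the usage snippet: the README hunk shown here does not include the model's output, so the following is only a hand check of the arithmetic the model is expected to reason through. It is a minimal sketch; the function name `students_per_classroom` is hypothetical and does not come from the README or from any library.

```python
def students_per_classroom(boys: int, girls: int, classrooms: int) -> int:
    """Split boys and girls evenly across classrooms and return the class size.

    Hypothetical helper for checking the example question by hand.
    """
    boys_each, boys_left = divmod(boys, classrooms)
    girls_each, girls_left = divmod(girls, classrooms)
    if boys_left or girls_left:
        # The even split the question asks for is impossible in this case.
        raise ValueError("counts do not divide evenly across classrooms")
    return boys_each + girls_each


# Values taken from the example question: 56 boys, 44 girls, 4 classrooms.
answer = students_per_classroom(boys=56, girls=44, classrooms=4)
print(answer)  # 14 boys + 11 girls per classroom = 25
```

A step-by-step response from the model can be compared against this value when trying out the renamed `real-jiakai/gemma3-4b-thinking` checkpoint.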