Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ tags:
 licence: license
 ---
 
-# 
+# gemma3-4b-thinking
 
 This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) trained to enhance its reasoning and step-by-step thinking capabilities. It has been trained using [TRL](https://github.com/huggingface/trl) with GRPO (Group Relative Policy Optimization).
 
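The GRPO training setup itself is not part of this diff. As a rough sketch of what a TRL GRPO run for this kind of model could look like (the dataset, reward function, and output directory below are illustrative assumptions, not taken from this repository):

```python
# Hypothetical sketch of a TRL GRPO run; not the actual training script for this model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumed dataset: GSM8K-style math word problems with a "question" field.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def format_reward(completions, **kwargs):
    """Toy reward: favor completions that state a final answer after showing work."""
    return [1.0 if "answer" in completion.lower() else 0.0 for completion in completions]

training_args = GRPOConfig(
    output_dir="gemma3-4b-thinking-grpo",  # hypothetical output directory
    num_generations=4,                     # completions sampled per prompt for the group baseline
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=format_reward,  # the card mentions multiple reward functions; one toy example shown here
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO scores each group of sampled completions with the reward functions and updates the policy toward completions that score above the group average, which is what pushes the model toward explicit step-by-step solutions.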
@@ -29,8 +29,8 @@ This model was specifically tuned to demonstrate step-by-step reasoning when sol
 from transformers import pipeline, AutoProcessor
 
 # Load the model and processor
-processor = AutoProcessor.from_pretrained("real-jiakai/
-generator = pipeline("text-generation", model="real-jiakai/
+processor = AutoProcessor.from_pretrained("real-jiakai/gemma3-4b-thinking")
+generator = pipeline("text-generation", model="real-jiakai/gemma3-4b-thinking", tokenizer=processor.tokenizer)
 
 # Example math problem
 question = "The school principal decided that she wanted every class to have an equal number of boys and girls in each first-grade classroom. There are 4 classrooms. There are 56 boys and 44 girls. How many total students are in each classroom?"
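The hunk above shows only the loading portion of the README's usage example. A minimal sketch of how the snippet might continue, assuming the standard `transformers` chat-template API (the prompt formatting and generation settings are illustrative, not taken from the README):

```python
# Continuation sketch (assumed): format the question as a chat prompt and generate.
messages = [{"role": "user", "content": question}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = generator(prompt, max_new_tokens=512, do_sample=False, return_full_text=False)
print(outputs[0]["generated_text"])  # expected: step-by-step reasoning ending in the answer (100 students / 4 classrooms = 25)
```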
@@ -91,34 +91,19 @@ The training used multiple reward functions to guide the model:
 
 ## Citations
 
-Cite GRPO as:
-```bibtex
-@article{zhihong2024deepseekmath,
-    title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-    year = 2024,
-    eprint = {arXiv:2402.03300},
-}
 ```
-Cite TRL as:
-
-```bibtex
-@misc{vonwerra2022trl,
-    title = {{TRL: Transformer Reinforcement Learning}},
-    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thite and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-    year = 2020,
-    journal = {GitHub repository},
-    publisher = {GitHub},
-    howpublished = {\url{https://github.com/huggingface/trl}}
+@article{gemma_2025,
+    title={Gemma 3},
+    url={https://goo.gle/Gemma3Report},
+    publisher={Kaggle},
+    author={Gemma Team},
+    year={2025}
 }
-```
 
-Cite GSM8K as:
-```bibtex
-@article{cobbe2021gsm8k,
-    title = {{Training Verifiers to Solve Math Word Problems}},
-    author = {Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
-    year = 2021,
-    eprint = {arXiv:2110.14168},
+@article{shao2024deepseekmath,
+    title={Deepseekmath: Pushing the limits of mathematical reasoning in open language models},
+    author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and Zhang, Haowei and Zhang, Mingchuan and Li, YK and Wu, Y and others},
+    journal={arXiv preprint arXiv:2402.03300},
+    year={2024}
 }
 ```