Update README.md
README.md (changed)

@@ -22,7 +22,7 @@ GRPO is applied after a distilled R1 model is created to further refine its reas
 *Special thanks to Dongwei for fine-tuning this version of DeepSeek-R1-Distill-Qwen-7B. More information about it can be found here:*
 [https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math](https://huggingface.co/Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math)
 
-- Converted to MLX format with a quantization of 4-bit for better performance on Apple Silicon Macs
+- Converted to MLX format with a quantization of 4-bit for better performance on Apple Silicon Macs.
 
 # Notes:
 - Seems to brush over the "thinking" process and immediately start answering, leading to extremely quick but correct answers.
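The README bullet mentions 4-bit quantization. The following is a minimal plain-Python sketch of group-wise affine 4-bit quantization, which shows what "4-bit" means for the stored weights. It is an illustration under stated assumptions, not the converter used for this model. The commit does not say which tool or settings produced the MLX weights. The group size of 64 and the function names `quantize_group`, `dequantize_group` and `quantize` are hypothetical.

```python
import random
from typing import List, Tuple

BITS = 4
LEVELS = 2**BITS - 1  # 15 is the largest value a 4-bit unsigned code can hold


def quantize_group(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map one group of floats to 4-bit codes plus a per-group scale and bias."""
    lo, hi = min(weights), max(weights)
    # A constant group has zero range; use scale 1.0 so every code is 0.
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    # Shift by the group minimum, divide by the step size, round, clamp to 0..15.
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Approximately reconstruct the original floats from the 4-bit codes."""
    return [c * scale + bias for c in codes]


def quantize(weights: List[float], group_size: int = 64):
    """Quantize a flat weight list group by group (group_size is hypothetical)."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


if __name__ == "__main__":
    random.seed(0)
    w = [random.uniform(-1.0, 1.0) for _ in range(128)]
    groups = quantize(w)
    restored = [x for codes, s, b in groups for x in dequantize_group(codes, s, b)]
    max_err = max(abs(a - b) for a, b in zip(w, restored))
    print(f"{len(groups)} groups, max abs error {max_err:.4f}")
```

Each weight is stored as a 4-bit code, and each group also stores one scale and one bias. The rounding error per weight is therefore at most half a step (`scale / 2`). This trade of precision for memory is why a 4-bit model fits and runs more comfortably on Apple Silicon Macs.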