---
library_name: transformers
tags:
- math
license: mit
datasets:
- sparkle-reasoning/hardmath
pipeline_tag: reinforcement-learning
---
# SparkleRL-7B-Stage2-aug

SparkleRL-7B-Stage2-aug is the Stage 2 RL-tuned model trained with partial-step scaffolding, introduced in the paper *Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning*.
## Links

- Paper: https://arxiv.org/abs/2506.04723
- Code: https://github.com/sparkle-reasoning/sparkle
- Project Page: https://sparkle-reasoning.github.io/
## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

name = "sparkle-reasoning/SparkleRL-7B-Stage2-aug"

# Load the tokenizer and model; fp16 weights, sharded across available devices
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

# Generate a step-by-step solution for a simple math prompt
prompt = "Solve step by step: If 3x + 5 = 20, what is x?"
inp = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```
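If the tokenizer ships a chat template (an assumption; check `tok.chat_template`), wrapping the prompt through it may match the model's training-time input format more closely than raw text. A minimal sketch:

```python
# Optional: format the prompt with the tokenizer's chat template, if one exists.
# This is a sketch under the assumption that the model was tuned on chat-style
# inputs; fall back to the raw-prompt usage above otherwise.
if tok.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    chat_inp = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    chat_out = model.generate(chat_inp, max_new_tokens=256)
    print(tok.decode(chat_out[0], skip_special_tokens=True))
```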
## Citation

```bibtex
@misc{wang2025accuracydissectingmathematicalreasoning,
      title={Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning},
      author={Jiayu Wang and Yifei Ming and Zixuan Ke and Caiming Xiong and Shafiq Joty and Aws Albarghouthi and Frederic Sala},
      year={2025},
      eprint={2506.04723},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2506.04723},
}
```