---
library_name: transformers
tags:
  - math
license: mit
datasets:
  - sparkle-reasoning/hardmath
pipeline_tag: reinforcement-learning
---

# SparkleRL-7B-Stage2-aug

**SparkleRL-7B-Stage2-aug** is the Stage 2 RL-tuned model trained with partial-step scaffolding, introduced in the paper [Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning](https://arxiv.org/abs/2506.04723).


## Links

- **Paper:** https://arxiv.org/abs/2506.04723
- **Code:** https://github.com/sparkle-reasoning/sparkle
- **Project page:** https://sparkle-reasoning.github.io/


## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

name = "sparkle-reasoning/SparkleRL-7B-Stage2-aug"

tok = AutoTokenizer.from_pretrained(name)
# Load in half precision and let accelerate place layers across devices.
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Solve step by step: If 3x + 5 = 20, what is x?"
inp = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inp, max_new_tokens=256)

print(tok.decode(out[0], skip_special_tokens=True))
```
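If you want to score or post-process outputs, math-reasoning models commonly end their solution with a `\boxed{...}` final answer. Whether this model follows that convention is an assumption here, not something the card guarantees; the sketch below is a minimal helper for extracting such an answer from decoded text.

```python
import re


def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Assumes the model emits a LaTeX-style boxed final answer; adjust the
    pattern if your prompts elicit a different answer format.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


sample = r"Subtracting 5 gives 3x = 15, so x = 5. Final answer: \boxed{5}."
print(extract_boxed(sample))  # → 5
```

Taking the *last* match matters because chain-of-thought outputs sometimes box intermediate quantities before the final answer.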

## Citation

```bibtex
@misc{wang2025accuracydissectingmathematicalreasoning,
    title={Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning},
    author={Jiayu Wang and Yifei Ming and Zixuan Ke and Caiming Xiong and Shafiq Joty and Aws Albarghouthi and Frederic Sala},
    year={2025},
    eprint={2506.04723},
    archivePrefix={arXiv},
    primaryClass={cs.AI},
    url={https://arxiv.org/abs/2506.04723},
}
```