The paper introduces rStar-Math, which the authors claim rivals OpenAI o1's math reasoning by combining Monte Carlo Tree Search (MCTS) with step-by-step verified reasoning trajectories.
A Process Preference Model (PPM) enables fine-grained evaluation of intermediate steps, improving training data quality.
🧪 The system underwent four rounds of self-evolution, progressively refining both the policy and reward models to tackle Olympiad-level math problems, without GPT-4-based data distillation.
💾 While we wait for the release of code and datasets, you can already download the prompts the authors used from the HF Hub!
Details and links here:
Prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
Templates on the hub: MoritzLaurer/rstar-math-prompts
Prompt-templates collection: MoritzLaurer/prompt-templates-6776aa0b0b8a923957920bb4
Paper: https://arxiv.org/pdf/2501.04519
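
To make the two core ideas above more concrete, here is a minimal toy sketch of (1) picking the next reasoning step in a tree search with the standard UCT rule and (2) a Bradley-Terry style pairwise loss of the kind a preference model over steps could be trained with. This is not the rStar-Math code, which has not been released yet. The function names, the exploration constant `c=1.4`, and the dictionary layout are all assumptions made for illustration, and whether this matches the paper's exact formulation is also an assumption.

```python
import math


def uct_score(q_value, visits, parent_visits, c=1.4):
    """Standard UCT: average value plus an exploration bonus.

    The constant c is a hypothetical default, not a value from the paper.
    """
    if visits == 0:
        return math.inf  # unvisited candidate steps are explored first
    return q_value + c * math.sqrt(math.log(parent_visits) / visits)


def select_step(children, parent_visits, c=1.4):
    """Return the candidate step with the highest UCT score.

    children: list of dicts with keys 'step', 'q', 'visits' (illustrative layout).
    """
    return max(
        children,
        key=lambda ch: uct_score(ch["q"], ch["visits"], parent_visits, c),
    )


def pairwise_preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log(sigmoid(s_preferred - s_rejected)).

    The loss shrinks as the preferred step is scored further above the rejected one.
    """
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    candidates = [
        {"step": "factor the quadratic", "q": 0.8, "visits": 10},
        {"step": "guess and check", "q": 0.2, "visits": 10},
    ]
    print(select_step(candidates, parent_visits=20)["step"])
    print(pairwise_preference_loss(0.8, 0.2))
```

With equal visit counts the exploration bonus is the same for both candidates, so the step with the higher Q-value wins. The loss only depends on the score gap between the two steps, which is why it needs preference pairs rather than exact per-step scores.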