Abstract
Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning: the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift, because it binds symbols to physical reality and underpins most modern technology. In this work, we advance physics research by developing large language models with exceptional physics reasoning capabilities, which excel in particular at solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve Gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and it wins 12 gold medals across 13 international and regional physics competitions held in 2024-2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal. Further equipped with the agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions achieves the overall No. 1 score on IPhO 2025 and obtains the highest average score across the 13 physics competitions. Beyond physics, the P1 models also perform strongly on other reasoning tasks such as math and coding, demonstrating the strong generalizability of the P1 series.
Community
We release the P1 series models for physics reasoning, along with a full-stack open-source ecosystem spanning models, algorithms, benchmarks, and agent frameworks. The flagship model P1-235B-A22B wins 12 golds and 1 silver across 13 top international/regional physics contests, tying Gemini-2.5-Pro for first on the medal table, and becomes the first open-source model to win gold at IPhO 2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal.
Many thanks to the team for open-sourcing this superb series of models for solving Physics Olympiad problems with benchmark-leading performance.
A couple of questions for the authors:
- Regarding the data construction process in Section 2.2: the datasets are built with a focus on "depth and verifiable solutions," yet they still rely on the expert solver's assumptions and approximations for certain parameters in the solution (e.g., friction coefficient, viscosity). That is fine for physics problems with a single precise result, as in the example, but if a problem admits multiple valid answers, will the dataset include all rule-verifiable results? And in that case, would the reward design remain the mean of all per-answer rewards, or a weighted mean?
I am also currently building a dataset of competitive science examinations in multiple languages, so I will certainly cite your implementation.
Thanks!
arXiv Explained breakdown of this paper: https://arxivexplained.com/papers/p1-mastering-physics-olympiads-with-reinforcement-learning
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System (2025)
- SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning (2025)
- CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning (2025)
- Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning (2025)
- Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models (2025)
- Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning (2025)
- Reasoning Curriculum: Bootstrapping Broad LLM Reasoning from Math (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend