DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 4 days ago • 89
Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published 12 days ago • 20
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Paper • 2502.09082 • Published Feb 13 • 28
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 354
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 94