reasoning
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published • 257
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper
• 2502.03373
• Published • 58
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper
• 2501.12599
• Published • 128
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
• 2501.17161
• Published • 125
s1: Simple test-time scaling
Paper
• 2501.19393
• Published • 125
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
• 2501.04682
• Published • 99
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper
• 2411.04282
• Published • 37
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper
• 2502.14768
• Published • 47
LIMO: Less is More for Reasoning
Paper
• 2502.03387
• Published • 62
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
• 2501.09686
• Published • 41
LIMA: Less Is More for Alignment
Paper
• 2305.11206
• Published • 27
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper
• 2501.07301
• Published • 100
Let's Verify Math Questions Step by Step
Paper
• 2505.13903
• Published • 2