PaSa: An LLM Agent for Comprehensive Academic Paper Search • Paper • 2501.10120 • Published 26 days ago • 43
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training • Paper • 2501.11425 • Published 23 days ago • 90
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 21 days ago • 316
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding • Paper • 2501.13200 • Published 21 days ago • 62
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 16 days ago • 24
Optimizing Large Language Model Training Using FP4 Quantization • Paper • 2501.17116 • Published 15 days ago • 33
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 15 days ago • 102
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate • Paper • 2501.17703 • Published 14 days ago • 51
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps • Paper • 2501.09732 • Published 27 days ago • 67
The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • 2501.07301 • Published 30 days ago • 90
MiniMax-01: Scaling Foundation Models with Lightning Attention • Paper • 2501.08313 • Published 29 days ago • 273