-
Reasoning Language Models: A Blueprint
Paper • 2501.11223 • Published • 31 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 88 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 302
Collections
Discover the best community collections!
Collections including paper arxiv:2501.12948
-
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published • 60 -
apple/OpenELM
Updated • 1.43k -
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Text Generation • Updated • 526k • • 703 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 302
-
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 47 -
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 36 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 302 -
s1: Simple test-time scaling
Paper • 2501.19393 • Published • 88
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 36 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 302 -
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 48 -
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 22
-
Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 68 -
Phi-4 Technical Report
Paper • 2412.08905 • Published • 106 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 273 -
DeepSeek-V3 Technical Report
Paper • 2412.19437 • Published • 49
-
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 98 -
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 91 -
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 254 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 302
-
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 9 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 302
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 85 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 345 -
Progressive Multimodal Reasoning via Active Retrieval
Paper • 2412.14835 • Published • 73 -
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 67
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 37 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 45 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 35 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 46
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 85 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 46 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 30 -
Outcome-Refining Process Supervision for Code Generation
Paper • 2412.15118 • Published • 19