Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 28 days ago • 26
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 27 days ago • 107
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization Paper • 2412.12098 • Published Dec 16, 2024 • 4
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning Paper • 2412.09858 • Published Dec 13, 2024 • 1
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published 25 days ago • 55
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles Paper • 2502.01081 • Published 21 days ago • 14
Improving Transformer World Models for Data-Efficient RL Paper • 2502.01591 • Published 21 days ago • 9
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Paper • 2502.02508 • Published 20 days ago • 21
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Paper • 2502.02339 • Published 20 days ago • 22
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods Paper • 2502.01618 • Published 21 days ago • 9
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published 18 days ago • 23
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models Paper • 2502.04404 • Published 18 days ago • 21
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 14 days ago • 134
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published 14 days ago • 19
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 13 days ago • 33
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published 19 days ago • 23
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging Paper • 2502.09056 • Published 11 days ago • 30
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 12 days ago • 53
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Paper • 2502.11775 • Published 7 days ago • 8
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 6 days ago • 73
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published 7 days ago • 15