LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 3 days ago • 47
Multi-Docker-Eval: A `Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering Paper • 2512.06915 • Published 19 days ago • 12
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9 • 51
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published Mar 28 • 45
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published Mar 28 • 45
Running 1.48k Big Code Models Leaderboard 📈 1.48k Explore and compare code generation models on a leaderboard