Next-Embedding Prediction Makes Strong Vision Learners • Paper • 2512.16922 • Published 10 days ago • 79
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights • Paper • 2512.01816 • Published 27 days ago • 88
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models • Paper • 2505.02735 • Published May 5 • 33
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper • 2511.13612 • Published Nov 17 • 134
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning • Paper • 2511.01833 • Published Nov 3 • 15
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! • Paper • 2509.26495 • Published Sep 30 • 10
SGP-Generation • Collection • Symbolic Graphic Programming with Large Language Model • 5 items • Updated Sep 11 • 3
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification • Paper • 2508.05629 • Published Aug 7 • 180
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs • Paper • 2506.10128 • Published Jun 11 • 22