ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published 19 days ago • 51
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published 19 days ago • 141
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 26 days ago • 168
Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing Paper • 2504.21356 • Published Apr 30 • 1
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment Paper • 2507.20984 • Published Jul 28 • 56
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28 • 81
SpatialLM: Training Large Language Models for Structured Indoor Modeling Paper • 2506.07491 • Published Jun 9 • 49
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25 • 46
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7 • 46
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Paper • 2507.07484 • Published Jul 10 • 17
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published Jul 14 • 69
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published Jul 7 • 15
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning Paper • 2507.12508 • Published Jul 16 • 26
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Paper • 2504.17207 • Published Apr 24 • 29
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 93
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 535