Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 11 days ago • 38
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 10 days ago • 94
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 11 days ago • 53
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning Paper • 2503.07608 • Published 11 days ago • 19
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published 11 days ago • 25
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published 16 days ago • 212
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning Paper • 2503.07002 • Published 11 days ago • 37
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published 10 days ago • 57
Identifying Sensitive Weights via Post-quantization Integral Paper • 2503.01901 • Published 21 days ago • 7
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published 15 days ago • 22
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper • 2503.02972 • Published 17 days ago • 23
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 15 days ago • 81
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 15 days ago • 64
Remasking Discrete Diffusion Models with Inference-Time Scaling Paper • 2503.00307 • Published 20 days ago • 9
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published 23 days ago • 45