MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 28
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 104
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 99
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 52
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 279
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 39
PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published Jan 17 • 48
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published Jan 30 • 19
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 146
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published Feb 10 • 128
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published Feb 11 • 43
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published Feb 11 • 37
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published Feb 12 • 52
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 51
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published Feb 14 • 53
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17 • 43
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 186
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 137
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 98
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 15 days ago • 54
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 20 days ago • 84
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 19 days ago • 51
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 22 days ago • 77
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 15 days ago • 80
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 15 days ago • 95
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 19 days ago • 107
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Paper • 2503.11579 • Published 11 days ago • 17
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published 12 days ago • 45
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published 12 days ago • 16
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published 12 days ago • 32
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published 8 days ago • 20
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning Paper • 2503.11495 • Published 11 days ago • 11
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Paper • 2503.13444 • Published 8 days ago • 13
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published 7 days ago • 41
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published 9 days ago • 28
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Paper • 2503.06053 • Published 18 days ago • 84
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published 5 days ago • 59
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models Paper • 2503.18923 • Published 1 day ago • 10