VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 8 days ago
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 9 days ago
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 9 days ago
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024
Large Action Models: From Inception to Implementation Paper • 2412.10047 • Published Dec 13, 2024
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations Paper • 2412.05994 • Published Dec 8, 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views Paper • 2412.06767 • Published Dec 9, 2024
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published Dec 3, 2024
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing Paper • 2411.16781 • Published Nov 25, 2024
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset Paper • 2411.15640 • Published Nov 23, 2024
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5, 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024
Can Knowledge Editing Really Correct Hallucinations? Paper • 2410.16251 • Published Oct 21, 2024
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23, 2024