Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers Paper • 2503.11579 • Published 8 days ago • 14
Comics Pick-A-Panel Collection Dataset, Models and Paper from ComicsPAP: understanding comic strips by picking the correct panel • 4 items • Updated 8 days ago • 2
HoloMine: A Synthetic Dataset for Buried Landmines Recognition using Microwave Holographic Imaging Paper • 2502.21054 • Published 22 days ago
ComicsPAP: understanding comic strips by picking the correct panel Paper • 2503.08561 • Published 11 days ago • 1
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 15 days ago • 33
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 16 days ago • 50
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published 13 days ago • 23
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 19 days ago • 75
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 30 days ago • 131