VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary Paper • 2503.09402 • Published 1 day ago • 4
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 7 days ago • 59
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality 10 days ago • 65
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 9 days ago • 63
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published 20 days ago • 16