Collections
Discover the best community collections!
Collections including paper arxiv:2409.17146
-
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 108 -
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 76 -
mistralai/Pixtral-12B-2409
Image-Text-to-Text • Updated • • 608 -
HuggingFaceTB/SmolVLM-Instruct
Image-Text-to-Text • Updated • 114k • 392
-
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87 -
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13 -
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 37 -
PALO: A Polyglot Large Multimodal Model for 5B People
Paper • 2402.14818 • Published • 23
-
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 125 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 59 -
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Paper • 2408.16725 • Published • 53 -
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 86
-
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Paper • 2408.10188 • Published • 51 -
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Paper • 2408.08872 • Published • 98 -
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 125 -
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper • 2408.12528 • Published • 51
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 40 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 117 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 48 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 43