Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 865
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • Updated Sep 26, 2024 • 10.5M • 493
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 86
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 27
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47
google-research-datasets/conceptual_captions • Updated Jun 17, 2024 • 5.34M • 10.8k • 93
Consolidating Attention Features for Multi-view Image Editing Paper • 2402.14792 • Published Feb 22, 2024 • 8