-
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
Paper • 2401.10529 • Published • 1 -
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18 -
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 28 -
SVIT: Scaling up Visual Instruction Tuning
Paper • 2307.04087 • Published • 7
Collections
Discover the best community collections!
Collections including paper arxiv:2311.06783
-
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 28 -
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Paper • 2311.07574 • Published • 16 -
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Paper • 2401.04575 • Published • 17 -
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper • 2402.00159 • Published • 62
-
LayoutPrompter: Awaken the Design Ability of Large Language Models
Paper • 2311.06495 • Published • 12 -
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 28 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 50 -
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper • 2311.04589 • Published • 23
-
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 28 -
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
Paper • 2311.07361 • Published • 14 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192 -
teknium/openhermes
Viewer • Updated • 243k • 791 • 208
-
Random Field Augmentations for Self-Supervised Representation Learning
Paper • 2311.03629 • Published • 10 -
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Paper • 2311.04589 • Published • 23 -
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Paper • 2311.04901 • Published • 11 -
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper • 2311.06783 • Published • 28
-
OmnimatteRF: Robust Omnimatte with 3D Background Modeling
Paper • 2309.07749 • Published • 8 -
AudioSR: Versatile Audio Super-resolution at Scale
Paper • 2309.07314 • Published • 27 -
Generative Image Dynamics
Paper • 2309.07906 • Published • 53 -
MagiCapture: High-Resolution Multi-Concept Portrait Customization
Paper • 2309.06895 • Published • 27