Collections
Discover the best community collections!
Collections including paper arxiv:2402.03766

- The Evolution of Multimodal Model Architectures
  Paper • 2405.17927 • Published • 1
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 102
- Efficient Architectures for High Resolution Vision-Language Models
  Paper • 2501.02584 • Published
- Building and better understanding vision-language models: insights and future directions
  Paper • 2408.12637 • Published • 125

- Exploring the Potential of Encoder-free Architectures in 3D LMMs
  Paper • 2502.09620 • Published • 25
- The Evolution of Multimodal Model Architectures
  Paper • 2405.17927 • Published • 1
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 102
- Efficient Architectures for High Resolution Vision-Language Models
  Paper • 2501.02584 • Published

- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 13
- Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
  Paper • 2405.09215 • Published • 22
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
  Paper • 2405.14129 • Published • 13

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 126
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 52
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 65

- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 45
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14
- MobileVLM : A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices
  Paper • 2312.16886 • Published • 20
- Lenna: Language Enhanced Reasoning Detection Assistant
  Paper • 2312.02433 • Published • 2

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 142
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
  Paper • 2401.02330 • Published • 16
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 13

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 23
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 15
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 50
- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
  Paper • 2312.02949 • Published • 14
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models
  Paper • 2402.14289 • Published • 19

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 12
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 48