- Llava • 57
  Chat with LLaVA using images and text

- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 48
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
  Paper • 2310.08166 • Published • 1
- OOTDiffusion • 996
  High-quality virtual try-on ~ Your cyber fitting room
Collections including paper arxiv:2310.08166

- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 40
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 48
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
  Paper • 2310.08166 • Published • 1
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
  Paper • 2310.00653 • Published • 3

- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 5
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
  Paper • 2309.09582 • Published • 4
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
  Paper • 2310.13127 • Published • 12
- Evaluating the Robustness to Instructions of Large Language Models
  Paper • 2308.14306 • Published • 1

- Dissecting In-Context Learning of Translations in GPTs
  Paper • 2310.15987 • Published • 6
- Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
  Paper • 2309.08958 • Published • 2
- X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
  Paper • 2305.04160 • Published • 2
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
  Paper • 2310.08166 • Published • 1

- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 16
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 26
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20