-
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Paper • 2412.04626 • Published • 13 -
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Paper • 2411.14522 • Published • 32 -
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Paper • 2411.03823 • Published • 45 -
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Paper • 2410.18558 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2411.14522
-
A Survey of Medical Vision-and-Language Applications and Their Techniques
Paper • 2411.12195 • Published -
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
Paper • 2411.14522 • Published • 32 -
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
Paper • 2412.07769 • Published • 26
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 53 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 30 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 73 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 43
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 54 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 39 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 17
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 13 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 31