- Random Field Augmentations for Self-Supervised Representation Learning
  Paper • 2311.03629 • Published • 10
- TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
  Paper • 2311.04589 • Published • 23
- GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
  Paper • 2311.04901 • Published • 11
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
  Paper • 2311.06783 • Published • 28
Collections
Collections including paper arxiv:2309.07915
- Dissecting In-Context Learning of Translations in GPTs
  Paper • 2310.15987 • Published • 6
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 43
- ZeroGen: Efficient Zero-shot Learning via Dataset Generation
  Paper • 2202.07922 • Published • 1
- Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
  Paper • 2310.08101 • Published • 2

- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 16
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
  Paper • 2310.14566 • Published • 27
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- Conditional Diffusion Distillation
  Paper • 2310.01407 • Published • 20

- AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
  Paper • 2309.16414 • Published • 19
- Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
  Paper • 2309.13018 • Published • 9
- Robust Speech Recognition via Large-Scale Weak Supervision
  Paper • 2212.04356 • Published • 27
- Language models in molecular discovery
  Paper • 2309.16235 • Published • 10

- MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
  Paper • 2309.07915 • Published • 4
- Skywork: A More Open Bilingual Foundation Model
  Paper • 2310.19341 • Published • 6
- Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
  Paper • 2310.19061 • Published • 8
- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 85