Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2309.07915

Generative Multiple Modality

Random Field Augmentations for Self-Supervised Representation Learning

Paper • 2311.03629 • Published Nov 7, 2023 • 10
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 23
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Paper • 2311.04901 • Published Nov 8, 2023 • 11
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 28

Dissecting In-Context Learning of Translations in GPTs

Paper • 2310.15987 • Published Oct 24, 2023 • 6
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
ZeroGen: Efficient Zero-shot Learning via Dataset Generation

Paper • 2202.07922 • Published Feb 16, 2022 • 1
Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques

Paper • 2310.08101 • Published Oct 12, 2023 • 2

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 16
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Advanced and Recent Papers

Advanced and recent papers about deep learning. Please send your recommend paper to email: [email protected]

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Paper • 2309.16414 • Published Sep 28, 2023 • 19
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Paper • 2309.13018 • Published Sep 22, 2023 • 9
Robust Speech Recognition via Large-Scale Weak Supervision

Paper • 2212.04356 • Published Dec 6, 2022 • 27
Language models in molecular discovery

Paper • 2309.16235 • Published Sep 28, 2023 • 10

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Paper • 2309.07915 • Published Sep 14, 2023 • 4
Skywork: A More Open Bilingual Foundation Model

Paper • 2310.19341 • Published Oct 30, 2023 • 6
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V

Paper • 2310.19061 • Published Oct 29, 2023 • 8
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23, 2024 • 85

Image Captioning

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Paper • 2309.07915 • Published Sep 14, 2023 • 4
VidChapters-7M: Video Chapters at Scale

Paper • 2309.13952 • Published Sep 25, 2023 • 11
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Paper • 2309.16058 • Published Sep 27, 2023 • 55

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs