Collections

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 40

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108

allenai/Molmo-72B-0924

allenai/Molmo-7B-D-0924

allenai/Molmo-7B-O-0924

allenai/MolmoE-1B-0924

LoRA+: Efficient Low Rank Adaptation of Large Models

The FinBen: An Holistic Financial Benchmark for Large Language Models

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

TrustLLM: Trustworthiness in Large Language Models

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Can Large Language Models Understand Context?

OLMo: Accelerating the Science of Language Models

Self-Rewarding Language Models

SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Anychat

Qwen2.5 Coder Artifacts

QwQ-32B-Preview

Open LLM Leaderboard

NVLM: Open Frontier-Class Multimodal LLMs

BRAVE: Broadening the visual encoding of vision-language models

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

AutoTrain: No-code training for state-of-the-art models

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

EuroLLM: Multilingual Language Models for Europe
