Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2401.14405

Paper - Multimodal

Paper related to Multimodal Model - Research for a : Modular, Multimodal, Multi-Stream, Mixture of Expert, Universal Transformer, Matryoshka embedding

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published Dec 19, 2024 • 26
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published Dec 16, 2024 • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published Dec 18, 2024 • 126
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published Dec 18, 2024 • 14

General Multimodal Learning

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published Jun 26, 2024 • 29
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 36
Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29, 2024 • 93

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Paper • 2312.06134 • Published Dec 11, 2023 • 3
Efficient Monotonic Multihead Attention

Paper • 2312.04515 • Published Dec 7, 2023 • 7
Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 38
Exploring Format Consistency for Instruction Tuning

Paper • 2307.15504 • Published Jul 28, 2023 • 8

YOLO-World: Real-Time Open-Vocabulary Object Detection

Paper • 2401.17270 • Published Jan 30, 2024 • 36
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 17
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 27

Transformer improvements

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13

Autonomous agents

WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models

Paper • 2401.13919 • Published Jan 25, 2024 • 30
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5, 2024 • 94
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25, 2024 • 65

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Paper • 2407.06358 • Published Jul 8, 2024 • 19
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

Paper • 2412.07825 • Published Dec 10, 2024 • 12

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Paper • 2312.08344 • Published Dec 13, 2023 • 10
Diffusion Priors for Dynamic View Synthesis from Monocular Videos

Paper • 2401.05583 • Published Jan 10, 2024 • 10
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published 25 days ago • 89

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs