Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2407.20229

about 1 hour ago

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 33
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 26
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 123
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Paper • 2406.09406 • Published Jun 13, 2024 • 15
VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9
What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12, 2024 • 39

GECO: Generative Image-to-3D within a SECOnd

Paper • 2405.20327 • Published May 30, 2024 • 10
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Paper • 2406.03184 • Published Jun 5, 2024 • 20
NPGA: Neural Parametric Gaussian Avatars

Paper • 2405.19331 • Published May 29, 2024 • 10
Unified Text-to-Image Generation and Retrieval

Paper • 2406.05814 • Published Jun 9, 2024 • 13

about 19 hours ago

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion

Paper • 2401.09416 • Published Jan 17, 2024 • 11
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

Paper • 2401.10171 • Published Jan 18, 2024 • 14
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Paper • 2311.09217 • Published Nov 15, 2023 • 22
GALA: Generating Animatable Layered Assets from a Single Scan

Paper • 2401.12979 • Published Jan 23, 2024 • 9

foundation model

DreamLLM: Synergistic Multimodal Comprehension and Creation

Paper • 2309.11499 • Published Sep 20, 2023 • 58
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 88
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14, 2024 • 15

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs