-
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
Paper • 2409.07452 • Published • 20 -
Generating 3D-Consistent Videos from Unposed Internet Photos
Paper • 2411.13549 • Published -
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Paper • 2411.04928 • Published • 51 -
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Paper • 2412.12093 • Published
Collections
Collections including paper arxiv:2411.04928
-
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Paper • 2411.05738 • Published • 15 -
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
Paper • 2410.22476 • Published • 27 -
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Paper • 2410.23218 • Published • 47 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 25
-
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Paper • 2408.06072 • Published • 39 -
AtomoVideo: High Fidelity Image-to-Video Generation
Paper • 2403.01800 • Published • 23 -
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Paper • 2411.04928 • Published • 51 -
AnimateAnything: Consistent and Controllable Animation for Video Generation
Paper • 2411.10836 • Published • 22
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Paper • 2410.02740 • Published • 52 -
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper • 2410.01215 • Published • 32 -
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 108 -
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 26
-
Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 65 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 36 -
Real-Time Video Generation with Pyramid Attention Broadcast
Paper • 2408.12588 • Published • 16 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 59
-
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Paper • 2406.09406 • Published • 15 -
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Paper • 2406.10227 • Published • 9 -
What If We Recaption Billions of Web Images with LLaMA-3?
Paper • 2406.08478 • Published • 40
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 29 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 38
-
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Paper • 2401.09416 • Published • 11 -
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Paper • 2401.10171 • Published • 14 -
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Paper • 2311.09217 • Published • 22 -
GALA: Generating Animatable Layered Assets from a Single Scan
Paper • 2401.12979 • Published • 9