- Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
  Paper • 2411.07126 • Published • 29
- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 36
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models
  Paper • 2411.09449 • Published
- OminiControl: Minimal and Universal Control for Diffusion Transformer
  Paper • 2411.15098 • Published • 55
Collections
Collections including paper arxiv:2408.16766
- Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era
  Paper • 2305.06131 • Published • 2
- Perpetual Humanoid Control for Real-time Simulated Avatars
  Paper • 2305.06456 • Published • 1
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
  Paper • 2305.10973 • Published • 35
- LDM3D: Latent Diffusion Model for 3D
  Paper • 2305.10853 • Published • 10

- Magic Insert: Style-Aware Drag-and-Drop
  Paper • 2407.02489 • Published • 22
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling
  Paper • 2408.05492 • Published • 7
- CSGO: Content-Style Composition in Text-to-Image Generation
  Paper • 2408.16766 • Published • 18
- Style-Friendly SNR Sampler for Style-Driven Generation
  Paper • 2411.14793 • Published • 36

- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 17
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 60
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 74

- To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
  Paper • 2311.07574 • Published • 16
- CSGO: Content-Style Composition in Text-to-Image Generation
  Paper • 2408.16766 • Published • 18
- Law of Vision Representation in MLLMs
  Paper • 2408.16357 • Published • 93
- CogVLM2: Visual Language Models for Image and Video Understanding
  Paper • 2408.16500 • Published • 57