Transform images based on text prompts
Image to Compositional 3D Scene Generation
Photorealistic Human Reconstruction with Cross-Scale Diff