Generate a talking-head video from a face image and audio/text
Generate lip-synced videos from images and audio