F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Upgraded to v1.0!
Generate speech from text using a short reference audio clip