Extract sounds from audio using text prompts
Separate noisy audio into clean speaker tracks
Generate edited English speech from audio and text