Separate audio sources using text, image, and audio queries
Cutting-edge open-vocabulary object detection app
Describe masked parts of images