Talk to Qwen2Audio with Gradio and WebRTC ⚡️
Talk to OpenAI using their multimodal API
Segment objects in images using prompts