Generate text and audio responses from images and videos
Generate detailed descriptions from images and videos