A collection of video-language models
Upload images or videos to generate descriptions
Upload images or videos to get answers to questions about them