VLMEvalKit evaluation results on video understanding benchmarks
Text-to-Video
Chat with an AI that understands text and images