Introduction

This repository contains the stage 2 6B model from the InternVideo2 paper.

Code: https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality

🚀 Installation

Please refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/INSTALL.md

Usage

import cv2
from transformers import AutoModel
# Helper functions from modeling_internvideo2.py, which must be importable from the working directory.
from modeling_internvideo2 import retrieve_text, vid2tensor, _frame_from_video


if __name__ == '__main__':
    # trust_remote_code=True lets transformers load the custom model class from the model repository.
    model = AutoModel.from_pretrained("OpenGVLab/InternVideo2-Stage2_6B", trust_remote_code=True).eval()

    # Collect the frames yielded by _frame_from_video for the example clip.
    video = cv2.VideoCapture('example1.mp4')
    frames = list(_frame_from_video(video))
    text_candidates = ["A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon.",
                    "A man in a gray coat walks through the snowy landscape, pulling a sleigh loaded with toys.",
                    "A person dressed in a blue jacket shovels the snow-covered pavement outside their house.",
                    "A cat excitedly runs through the yard, chasing a rabbit.",
                    "A person bundled up in a blanket walks through the snowy landscape, enjoying the serene winter scenery."]

    # Video-to-text retrieval: rank the candidate captions and print each with its probability.
    texts, probs = retrieve_text(frames, text_candidates, model=model, topk=5)
    for t, p in zip(texts, probs):
        print(f'text: {t} ~ prob: {p:.4f}')

    # Video feature extraction from a file path (fnum presumably sets the number of sampled frames).
    vidtensor = vid2tensor('example1.mp4', fnum=4)
    feat = model.get_vid_feat(vidtensor)
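
The internals of retrieve_text are not shown in this card. As a rough illustration only, contrastive video-text models commonly rank captions by cosine similarity between a video feature and each text feature, then apply a softmax. The sketch below assumes that scheme; the names rank_texts and cosine, the temperature value, and the toy feature vectors are hypothetical and are not part of the InternVideo2 API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_texts(video_feat, text_feats, texts, topk=5, temperature=0.01):
    """Rank captions by softmax over temperature-scaled cosine similarity.

    Hypothetical helper: the temperature is an assumed value, not one
    taken from the model.
    """
    logits = [cosine(video_feat, t) / temperature for t in text_feats]
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the topk highest-probability captions, best first.
    order = sorted(range(len(texts)), key=lambda i: probs[i], reverse=True)[:topk]
    return [texts[i] for i in order], [probs[i] for i in order]


if __name__ == '__main__':
    # Toy 2-D features standing in for real model outputs.
    ranked, probs = rank_texts(
        [1.0, 0.0],
        [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
        ["caption a", "caption b", "caption c"],
        topk=2,
    )
    for t, p in zip(ranked, probs):
        print(f'text: {t} ~ prob: {p:.4f}')
```

With real features, video_feat would come from a call such as model.get_vid_feat, and the text features from the model's text encoder.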
Model size: 6.37B params (Safetensors). Tensor types: I64, F32.