VideoChat2-TPO

This model is based on the paper Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.

🏃 Installation

pip install -r requirements.txt
python app.py
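
Before launching app.py, it can help to confirm that the installation step actually left the key packages importable. The sketch below is not part of the repository: `missing_packages` is a hypothetical helper, and the package list (`torch`, `transformers`) is an assumption based on the usage snippet rather than on the contents of requirements.txt.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list; adjust it to match requirements.txt.
    required = ["torch", "transformers"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

If anything is reported as missing, rerunning the pip command above in the same environment is the first thing to try.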

🔧 Usage

from transformers import AutoModel, AutoTokenizer
from tokenizer import MultimodalLlamaTokenizer

model_path = "OpenGVLab/VideoChat-TPO"
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
    use_fast=False,
)
# Pass the tokenizer loaded above to the model's custom code.
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    _tokenizer=tokenizer,
).eval()
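
The loading steps above can also be wrapped in a small function so that the tokenizer and model are always created together. This is a minimal sketch, not the repository's own API: `load_videochat_tpo` and its `device` parameter are hypothetical names, and the optional `.to(device)` call assumes the custom model class behaves like a standard PyTorch module.

```python
MODEL_PATH = "OpenGVLab/VideoChat-TPO"


def load_videochat_tpo(model_path=MODEL_PATH, device=None):
    """Load the tokenizer and model as in the usage snippet; returns (tokenizer, model)."""
    # Deferred import: defining this function does not require transformers
    # to be installed or trigger any download.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        model_path,
        trust_remote_code=True,  # the repository ships custom code
        use_fast=False,
    )
    model = AutoModel.from_pretrained(
        model_path,
        trust_remote_code=True,
        _tokenizer=tokenizer,  # keyword taken from the usage snippet
    ).eval()
    if device is not None:
        # Assumption: the model is a regular torch.nn.Module.
        model = model.to(device)
    return tokenizer, model
```

Calling `tokenizer, model = load_videochat_tpo()` downloads the weights on first use, so expect it to take a while for a model of this size.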

Model size: 8.1B params (Safetensors)
Tensor types: I64, BF16

Base model: mistralai/Mistral-7B-Instruct-v0.2 (this model is a fine-tune)