
VRAM usage

#5 by marcoaleixo - opened
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name=model_path,
    embed_batch_size=1,
    device="cuda",
    trust_remote_code=True,
)

The model is using 9.4 GB of VRAM.
I'm using transformers==4.49.0 and the latest llama-index-embeddings-huggingface.
The model card says this model should use only about 4.4 GB of VRAM.
I only need to run the image part of the model for this API.

Am I missing something?
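One possible explanation (an assumption, since the model card doesn't state the dtype): transformers loads weights in float32 by default unless a half-precision `torch_dtype` is requested, while published VRAM figures usually assume fp16/bf16 weights. A rough back-of-the-envelope check, assuming a ~2.2B-parameter Qwen2-VL-2B-sized model, makes the two numbers consistent with a fp32-vs-fp16 difference:

```python
def estimate_weight_vram_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough VRAM needed just to hold the weights, in decimal GB.

    Ignores activations, KV cache, and CUDA overhead, so real usage
    will be somewhat higher than this estimate.
    """
    return num_params * bytes_per_param / 1e9

# ~2.2e9 parameters is an assumption for a Qwen2-VL-2B-sized model.
params = 2.2e9
print(estimate_weight_vram_gb(params, 2))  # fp16/bf16: ~4.4 GB
print(estimate_weight_vram_gb(params, 4))  # fp32: ~8.8 GB, close to the 9.4 GB observed
```

If that is the cause, loading the model in half precision (e.g. passing a fp16/bf16 `torch_dtype` through to `from_pretrained`, if your embedding wrapper exposes a way to do so) should bring usage close to the model-card figure.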
