
VRAM usage

#5 by marcoaleixo - opened
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name=model_path,
    embed_batch_size=1,
    device="cuda",
    trust_remote_code=True,
)

The model is using 9.4 GB of VRAM.
I'm using transformers==4.49.0 and the latest llama-index-embeddings-huggingface.
The model card says this model should use only about 4.4 GB of VRAM.
I only need to run the image part of the model for this API.

Am I missing something?
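One possible explanation (an assumption, since the model card doesn't state the dtype): transformers loads weights in float32 by default unless a half-precision `torch_dtype` is requested, while published VRAM figures usually assume fp16/bf16 weights. A rough back-of-the-envelope check, assuming a ~2.2B-parameter Qwen2-VL-2B-sized model, makes the two numbers consistent with a fp32-vs-fp16 difference:

```python
def estimate_weight_vram_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough VRAM needed just to hold the weights, in decimal GB.

    Ignores activations, KV cache, and CUDA overhead, so real usage
    will be somewhat higher than this estimate.
    """
    return num_params * bytes_per_param / 1e9

# ~2.2e9 parameters is an assumption for a Qwen2-VL-2B-sized model.
params = 2.2e9
print(estimate_weight_vram_gb(params, 2))  # fp16/bf16: ~4.4 GB
print(estimate_weight_vram_gb(params, 4))  # fp32: ~8.8 GB, close to the 9.4 GB observed
```

If that is the cause, loading the model in half precision (e.g. passing a fp16/bf16 `torch_dtype` through to `from_pretrained`, if your embedding wrapper exposes a way to do so) should bring usage close to the model-card figure.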
