How to exclude files in the "original" folder during model download
Hi
When I try to run a gpt-oss model using vLLM, it downloads the model from Hugging Face. It also downloads the original folder, which takes a long time and consumes additional disk space. The original folder is not used when running inference on the model.
Please suggest a workaround for this.
Thanks
You could use the huggingface_hub.snapshot_download() function to pre-download the model; it lets you filter out directories/files by glob pattern: https://huggingface.co/docs/huggingface_hub/en/guides/download#filter-files-to-download
Then when you run vLLM, you can set HF_HOME or HF_HUB_CACHE to point to the model path: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhome
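For example, a minimal sketch (assuming huggingface_hub is installed; the ignore pattern mirrors the one used later in this thread):
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('openai/gpt-oss-120b', ignore_patterns=['original/*']))"
The printed path is the local snapshot directory, which you can also pass directly to vllm serve instead of the repo id.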
I've already downloaded the "original" folder with this command:
vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
If I delete the following files manually from /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b/blobs/:
68a8dc1f8e2e5996cb702f14332a25ddf3463daeab2df68e21ca09ef181203c3 (original/model--00001-of-00007.safetensors)
19b8f0d5c7dc3195c61a711d08384a1f85624f018186da541585c0f97ac61020 (original/model--00002-of-00007.safetensors)
0dbccd746d50e9543e8016d0a43ab4487c7f86d72349b1ef17abdfec509d0701 (original/model--00003-of-00007.safetensors)
bcc73cf6d18f96a2e62428758463157cc12768f410873152a50d3929a64cd049 (original/model--00004-of-00007.safetensors)
15fd69843e9cc6fdf2db0efe0cf0979b49a6ba84b3a38169b2fabc5479d04a7d (original/model--00005-of-00007.safetensors)
3aedef2ee0a5a78a003b3f74fd6883033946b80097bf41e4f4715d95066f0588 (original/model--00006-of-00007.safetensors)
20d5dfcad1ed6c50aa3c0da7d3f08828dba72b5f58686a987bf3a8f01659cda6 (original/model--00007-of-00007.safetensors)
how can I prevent the next serve command from downloading the same 7 big files?
vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
Could you please tell me an alternative command?
The default HF_HOME and HF_HUB_CACHE already point to /home/my-user-name/.cache/huggingface/ and /home/my-user-name/.cache/huggingface/hub/ respectively.
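To double-check where the hub cache actually resolves on a given machine, you can print the constant huggingface_hub uses (a quick sketch, assuming huggingface_hub is installed):
python -c "from huggingface_hub import constants; print(constants.HF_HUB_CACHE)"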
A working solution:
HF_HUB_OFFLINE=1 vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
OpenAI suggests downloading the original folder manually, so this is an intended action.
gpt-oss-120b
hf download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
gpt-oss-20b
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
Reference:
https://github.com/openai/gpt-oss
What about --exclude rather than --include?
huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --local-dir ./gpt-oss-120b
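Then, instead of the repo id, you can point vLLM at the local directory, which should skip the Hub lookup entirely (a sketch, assuming ./gpt-oss-120b contains config.json and the main safetensors shards):
vllm serve ./gpt-oss-120b --tensor-parallel-size 4 --async-scheduling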
What is working, if you delete the big 10.5 GB files manually:
HF_HUB_OFFLINE=1 vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
Reference: https://github.com/vllm-project/vllm/issues/1910
What is not working:
vllm serve /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
What is also not working, even though it is referenced as a working solution in https://github.com/vllm-project/vllm/issues/9255:
vllm serve /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b/blobs --tensor-parallel-size 4 --async-scheduling
In both cases vLLM could not find the file named 'config.json' in the local dirs.
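That is expected with the standard hub cache layout: config.json and the weight files live under snapshots/<revision>/ (as symlinks into blobs/), not at the repo root or inside blobs/. Pointing vLLM at the snapshot directory itself should work; <revision> below is a placeholder for whatever commit hash sits in your cache:
vllm serve /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b/snapshots/<revision> --tensor-parallel-size 4 --async-scheduling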