How to exclude files in the "original" folder during model download
Hi
When I try to run a gpt-oss model using vLLM, it downloads the model from Hugging Face. It also downloads the original folder, which takes a long time and consumes additional disk space. The original folder is not used when running inference on the model.
Please suggest a workaround for this.
Thanks
You could use the huggingface_hub.snapshot_download() function to pre-download the model; it lets you filter out directories/files by glob pattern: https://huggingface.co/docs/huggingface_hub/en/guides/download#filter-files-to-download
Then when you run vLLM, you can set HF_HOME or HF_HUB_CACHE to point to the model path: https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhome
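For example, a minimal sketch (assuming huggingface_hub is installed; the ignore pattern mirrors the one used later in this thread):
python -c "from huggingface_hub import snapshot_download; print(snapshot_download('openai/gpt-oss-120b', ignore_patterns=['original/*']))"
The printed path is the local snapshot directory, which you can also pass directly to vllm serve instead of the repo id.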
I've already downloaded the "original" folder with this command:
vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
If I delete the following files manually from /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b/blobs/:
68a8dc1f8e2e5996cb702f14332a25ddf3463daeab2df68e21ca09ef181203c3 (original/model--00001-of-00007.safetensors)
19b8f0d5c7dc3195c61a711d08384a1f85624f018186da541585c0f97ac61020 (original/model--00002-of-00007.safetensors)
0dbccd746d50e9543e8016d0a43ab4487c7f86d72349b1ef17abdfec509d0701 (original/model--00003-of-00007.safetensors)
bcc73cf6d18f96a2e62428758463157cc12768f410873152a50d3929a64cd049 (original/model--00004-of-00007.safetensors)
15fd69843e9cc6fdf2db0efe0cf0979b49a6ba84b3a38169b2fabc5479d04a7d (original/model--00005-of-00007.safetensors)
3aedef2ee0a5a78a003b3f74fd6883033946b80097bf41e4f4715d95066f0588 (original/model--00006-of-00007.safetensors)
20d5dfcad1ed6c50aa3c0da7d3f08828dba72b5f58686a987bf3a8f01659cda6 (original/model--00007-of-00007.safetensors)
how can I prevent the next serve command from downloading the same 7 big files?
vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
Could you please tell me an alternative command?
The default HF_HOME and HF_HUB_CACHE already point to /home/my-user-name/.cache/huggingface/ and /home/my-user-name/.cache/huggingface/hub/ respectively.
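To double-check where the hub cache actually resolves on a given machine, you can print the constant huggingface_hub uses (a quick sketch, assuming huggingface_hub is installed):
python -c "from huggingface_hub import constants; print(constants.HF_HUB_CACHE)"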
A working solution:
HF_HUB_OFFLINE=1 vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
OpenAI suggests downloading the original folder manually, so this is an intended action.
gpt-oss-120b
hf download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
gpt-oss-20b
hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
Reference:
https://github.com/openai/gpt-oss
What about --exclude rather than --include?
huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --local-dir ./gpt-oss-120b
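Then, instead of the repo id, you can point vLLM at the local directory, which should skip the Hub lookup entirely (a sketch, assuming ./gpt-oss-120b contains config.json and the main safetensors shards):
vllm serve ./gpt-oss-120b --tensor-parallel-size 4 --async-scheduling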
What is working, if you delete the big 10.5 GB files manually:
HF_HUB_OFFLINE=1 vllm serve openai/gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
Reference: https://github.com/vllm-project/vllm/issues/1910
What is not working:
vllm serve /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b --tensor-parallel-size 4 --async-scheduling
What is also not working, even though it is referenced as a working solution in https://github.com/vllm-project/vllm/issues/9255:
vllm serve /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b/blobs --tensor-parallel-size 4 --async-scheduling
In both cases vLLM could not find the file named 'config.json' in the local dirs.
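That is expected with the standard hub cache layout: config.json and the weight files live under snapshots/<revision>/ (as symlinks into blobs/), not at the repo root or inside blobs/. Pointing vLLM at the snapshot directory itself should work; <revision> below is a placeholder for whatever commit hash sits in your cache:
vllm serve /home/my-user-name/.cache/huggingface/hub/models--openai--gpt-oss-120b/snapshots/<revision> --tensor-parallel-size 4 --async-scheduling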