---
title: Deploy VLLM
emoji: 🐢
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
```shell
poetry export -f requirements.txt --output requirements.txt --without-hashes
```
* Both `HUGGING_FACE_HUB_TOKEN` and `HF_TOKEN` must be set at runtime. Use the same value for both; the token must have read permission for the model.
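
The token requirement above can be checked before the server starts. The following is a minimal sketch, not part of this Space's code: the helper name `check_hf_tokens` is hypothetical, and treating differing values as an error is an assumption drawn from the "use the same value" note.

```python
import os


def check_hf_tokens(env=None):
    """Return the shared token, or raise if the variables are missing or differ.

    Hypothetical helper for illustration; it is not part of vLLM or this Space.
    """
    env = os.environ if env is None else env
    names = ("HUGGING_FACE_HUB_TOKEN", "HF_TOKEN")

    # Both variables must be present and non-empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))

    # Assumption: both variables should carry the same token value.
    if env[names[0]] != env[names[1]]:
        raise RuntimeError("HUGGING_FACE_HUB_TOKEN and HF_TOKEN must be identical")

    return env["HF_TOKEN"]
```

Calling `check_hf_tokens()` with no argument reads from `os.environ`. It cannot confirm that the token has read permission for the model; that still has to be verified on the Hub.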
## VLLM OpenAI Compatible API Server
> References: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196
This `api_server.py` file is an exact copy of https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py
Changes (use a diff tool to see the exact changes to the file):
* [x] Change every route in `api_server.py` that starts with `/v1/xxx` to `/api/v1/xxx`.

Then run `python api_server.py` with arguments. See https://discuss.huggingface.co/t/run-vllm-docker-on-space/70228/5?u=yusufs
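
To make the route change concrete, here is a minimal sketch of the same rewrite applied to source text. It is an illustration only: the change in this repository was made by editing the file, the helper name `prefix_routes` is hypothetical, and the sample decorator lines are assumed examples of route declarations.

```python
import re

# Match a quote character immediately followed by "/v1/", so paths that are
# already prefixed (for example "/api/v1/models") are left untouched.
ROUTE_PATTERN = re.compile(r"""(["'])/v1/""")


def prefix_routes(source: str) -> str:
    """Rewrite quoted route strings starting with /v1/ to start with /api/v1/."""
    return ROUTE_PATTERN.sub(r"\1/api/v1/", source)


# Assumed sample lines, for illustration only.
sample = '@router.get("/v1/models")\n@router.post("/v1/chat/completions")\n'
print(prefix_routes(sample))
```

Because the pattern requires the quote directly before `/v1/`, running the function twice gives the same result as running it once.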
## Documentation about config
* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/utils.py#L1207-L1221
```shell
"serve,chat,complete",
"facebook/opt-12B",
'--config', 'config.yaml',
'-tp', '2'
```
The YAML config is equivalent to passing command-line flags. Consider passing the flags defined here instead, for better documentation:
https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237
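
The equivalence between the config file and flags can be sketched as follows. This is a simplified illustration, not vLLM's actual implementation: the function name `expand_config` is hypothetical, the config is passed as an already-parsed dict rather than read from `config.yaml`, and the `port` and `tensor-parallel-size` values are assumed for the example.

```python
def expand_config(args, config):
    """Replace `--config <file>` in args with the flags the config stands for.

    `config` is the already-parsed content of the YAML file, as a dict.
    Simplification: every value is rendered as `--key value`; boolean
    switches are not handled here.
    """
    if "--config" not in args:
        return list(args)

    index = args.index("--config")
    if index + 1 >= len(args):
        raise ValueError("--config requires a file path")

    expanded = []
    for key, value in config.items():
        expanded.extend([f"--{key}", str(value)])

    # Keep everything before and after the `--config <file>` pair in place.
    return args[:index] + expanded + args[index + 2:]


args = ["serve", "facebook/opt-12B", "--config", "config.yaml", "-tp", "2"]
config = {"port": 12323, "tensor-parallel-size": 4}  # assumed example values
print(expand_config(args, config))
# ['serve', 'facebook/opt-12B', '--port', '12323', '--tensor-parallel-size', '4', '-tp', '2']
```

In this sketch the explicit `-tp 2` ends up after the config-derived `--tensor-parallel-size 4` in the expanded list.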
Other arguments are the same as for the `LLM` class, such as `--max-model-len`, `--dtype`, or `--otlp-traces-endpoint`:
* https://github.com/vllm-project/vllm/blob/v0.6.4/vllm/config.py#L1061-L1086
* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/engine/arg_utils.py#L221-L913
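
As a rough illustration of how the flags named above relate to keyword-style names, the sketch below turns underscore-separated names into hyphenated flags. The helper name `kwargs_to_flags` is hypothetical, the values `4096` and `half` are assumed examples, and the naming rule is an assumption based on the flags listed above; check the linked sources for the authoritative argument list.

```python
def kwargs_to_flags(kwargs):
    """Turn {"max_model_len": 4096} into ["--max-model-len", "4096"].

    Hypothetical helper for illustration; it only handles `--key value` pairs.
    """
    argv = []
    for name, value in kwargs.items():
        argv.extend(["--" + name.replace("_", "-"), str(value)])
    return argv


# Assumed example values; pick ones that fit the model and hardware.
flags = kwargs_to_flags({"max_model_len": 4096, "dtype": "half"})
print(["python", "api_server.py"] + flags)
```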