feat(refactor): move the files to root
- README.md +30 -0
- openai/README.md +0 -26
- openai/__init__.py +0 -0
- openai/api_server.py → openai_compatible_api_server.py +0 -0
- poetry.lock +1 -1
- pyproject.toml +1 -0
- run.sh +1 -1
README.md
CHANGED

@@ -6,9 +6,39 @@ colorTo: blue
 sdk: docker
 pinned: false
 ---
+
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


 ```shell
 poetry export -f requirements.txt --output requirements.txt --without-hashes
 ```
+
+
+## VLLM OpenAI Compatible API Server
+
+> References: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196
+
+This `openai_compatible_api_server.py` file is an exact copy of https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py
+
+* The `HUGGING_FACE_HUB_TOKEN` must exist at runtime.
+
+## Documentation about config
+
+* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/utils.py#L1207-L1221
+
+```shell
+"serve,chat,complete",
+"facebook/opt-12B",
+'--config', 'config.yaml',
+'-tp', '2'
+```
+
+The YAML config is equivalent to the argument flags. For better documentation, consider passing the flags defined here instead:
+https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237
+
+Other arguments are the same as for the LLM class, such as `--max-model-len`, `--dtype`, or `--otlp-traces-endpoint`:
+* https://github.com/vllm-project/vllm/blob/v0.6.4/vllm/config.py#L1061-L1086
+* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/engine/arg_utils.py#L221-L913
+
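To make the YAML-versus-flags equivalence described above concrete, here is a minimal sketch. The config keys are assumed to mirror the CLI flag names (per the `utils.py` lines referenced in the README); the model, port, and tensor-parallel values are taken from the rest of this commit or invented for illustration, and the token value is a placeholder.

```shell
# Hypothetical config.yaml — keys assumed to mirror the CLI flag names:
#   model: meta-llama/Llama-3.2-3B-Instruct
#   host: "0.0.0.0"
#   port: 7860
#   tensor-parallel-size: 2

# The token must be present in the environment before launch (placeholder value):
export HUGGING_FACE_HUB_TOKEN=hf_xxx

# Launching with the config file...
python -u openai_compatible_api_server.py --config config.yaml

# ...is assumed to be equivalent to spelling the flags out, as run.sh does:
python -u openai_compatible_api_server.py \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --host 0.0.0.0 \
  --port 7860 \
  -tp 2
```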
openai/README.md
DELETED

@@ -1,26 +0,0 @@
-# VLLM OpenAI Compatible API Server
-
-> References: https://huggingface.co/spaces/sofianhw/ai/tree/c6527a750644a849b6705bb6fe2fcea4e54a8196
-
-This `api_server.py` file is an exact copy of https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/api_server.py
-
-* The `HUGGING_FACE_HUB_TOKEN` must exist at runtime.
-
-## Documentation about config
-
-* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/utils.py#L1207-L1221
-
-```shell
-"serve,chat,complete",
-"facebook/opt-12B",
-'--config', 'config.yaml',
-'-tp', '2'
-```
-
-The YAML config is equivalent to the argument flags. For better documentation, consider passing the flags defined here instead:
-https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/entrypoints/openai/cli_args.py#L77-L237
-
-Other arguments are the same as for the LLM class, such as `--max-model-len`, `--dtype`, or `--otlp-traces-endpoint`:
-* https://github.com/vllm-project/vllm/blob/v0.6.4/vllm/config.py#L1061-L1086
-* https://github.com/vllm-project/vllm/blob/v0.6.4.post1/vllm/engine/arg_utils.py#L221-L913
-
openai/__init__.py
DELETED
File without changes

openai/api_server.py → openai_compatible_api_server.py
RENAMED
File without changes
poetry.lock
CHANGED

@@ -4117,4 +4117,4 @@ type = ["pytest-mypy"]
 [metadata]
 lock-version = "2.0"
 python-versions = ">=3.12,<3.13"
-content-hash = "
+content-hash = "cb3970f2566497f77454d834fb9d3dfe2dfe25be5a327e21bae997924b5c0619"
pyproject.toml
CHANGED

@@ -13,6 +13,7 @@ fastapi = "^0.115.5"
 pydantic = "^2.10.2"
 uvicorn = "^0.32.1"
 torch = "^2.5.1"
+openai = "^1.55.1"


 [build-system]
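The new `openai` dependency and the updated `poetry.lock` content-hash above are consistent with adding the package through Poetry. A sketch of the assumed workflow (the export step is the one documented in README.md):

```shell
# Add the client library; this is assumed to update both pyproject.toml and the
# poetry.lock content-hash shown in this commit.
poetry add "openai@^1.55.1"

# Regenerate requirements.txt for the Docker build, as documented in README.md.
poetry export -f requirements.txt --output requirements.txt --without-hashes
```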
run.sh
CHANGED

@@ -3,7 +3,7 @@

 printf "Running vLLM OpenAI compatible API Server at port %s\n" "7860"

-python -u /app/
+python -u /app/openai_compatible_api_server.py \
 --model meta-llama/Llama-3.2-3B-Instruct \
 --revision 0cb88a4f764b7a12671c53f0838cd831a0843b95 \
 --host 0.0.0.0 \
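Once run.sh has started the server on port 7860, it should expose the usual OpenAI-compatible routes. A minimal smoke test, assuming the standard `/v1` endpoints and that the served model name matches the `--model` flag:

```shell
# List the served model(s); /v1/models is assumed to be exposed by the server.
curl http://localhost:7860/v1/models

# Minimal chat completion request against the model configured in run.sh.
curl http://localhost:7860/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```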