fix(runner.sh): enable eager mode (disabling cuda graph)
Because of this warning, I tried to disable CUDA graphs:
Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
runner.sh
CHANGED
@@ -51,4 +51,5 @@ python -u /app/openai_compatible_api_server.py \
 --max-num-batched-tokens 32768 \
 --max-model-len 32768 \
 --dtype float16 \
+--enforce-eager \
 --gpu-memory-utilization 0.9
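If eager mode turns out to cost too much throughput, the flag could be made switchable instead of hard-coded. The following is only a sketch, not what this commit does: `ENFORCE_EAGER` is a hypothetical environment variable, and the command lists only the flags visible in the hunk above, so any earlier arguments in runner.sh are left out. It prints the command rather than running it.

```shell
#!/bin/sh
# ENFORCE_EAGER is a hypothetical toggle, not part of the original runner.sh.
# It defaults to 1 so the behavior matches this commit.
ENFORCE_EAGER="${ENFORCE_EAGER:-1}"

# Only pass --enforce-eager when the toggle is on.
EAGER_FLAG=""
if [ "$ENFORCE_EAGER" = "1" ]; then
  EAGER_FLAG="--enforce-eager"
fi

# Build the command as a string and print it instead of running it,
# so the sketch can be checked without a GPU or the server script.
CMD="python -u /app/openai_compatible_api_server.py --max-num-batched-tokens 32768 --max-model-len 32768 --dtype float16 $EAGER_FLAG --gpu-memory-utilization 0.9"
echo "$CMD"
```

Setting `ENFORCE_EAGER=0` before launching would then drop the flag and let CUDA graph capture run again, which makes it easy to compare both modes.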