Commit 5bd7bc7 · committed by yusufs · 1 Parent(s): cb15911

fix(runner.sh): enable eager mode (disabling cuda graph)

Because of this warning, I tried disabling CUDA graphs:

Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.

Files changed (1)
  1. runner.sh +1 -0
runner.sh CHANGED
@@ -51,4 +51,5 @@ python -u /app/openai_compatible_api_server.py \
   --max-num-batched-tokens 32768 \
   --max-model-len 32768 \
   --dtype float16 \
+  --enforce-eager \
   --gpu-memory-utilization 0.9
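
The change above hard-codes eager mode. If CUDA graph capture needs to be re-enabled later for comparison, one possible variation is to gate the flag behind an environment variable. The sketch below is only an illustration: `ENFORCE_EAGER`, `EXTRA_ARGS`, and `CMD` are hypothetical names that do not appear in runner.sh, and the command is printed rather than executed so it can be checked without a GPU.

```shell
# ENFORCE_EAGER is a hypothetical toggle, not part of the original runner.sh.
# It defaults to 1, which matches the behavior introduced by this commit.
ENFORCE_EAGER="${ENFORCE_EAGER:-1}"

# Add the flag only when the toggle is on.
EXTRA_ARGS=""
if [ "$ENFORCE_EAGER" = "1" ]; then
  EXTRA_ARGS="--enforce-eager"
fi

# Assemble the same server invocation shown in the diff.
CMD="python -u /app/openai_compatible_api_server.py \
  --max-num-batched-tokens 32768 \
  --max-model-len 32768 \
  --dtype float16 \
  $EXTRA_ARGS \
  --gpu-memory-utilization 0.9"

# Print instead of running, so the sketch is safe to try anywhere.
echo "$CMD"
```

Running with `ENFORCE_EAGER=0` would leave the flag out, so the server would attempt CUDA graph capture again, as it did before this commit.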