yusufs committed on
Commit 6dac0d0 · 1 Parent(s): 0f3cd25

docs(sailor): add note about minimum resources of Sailor

Files changed (1)
  1. run-sailor.sh +2 -0
run-sailor.sh CHANGED
@@ -12,6 +12,8 @@ printf "Running sail/Sailor-4B-Chat using vLLM OpenAI compatible API Server at p
 # ERROR 11-27 15:32:10 engine.py:366] The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (7536). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
 
 # After increasing gpu_memory_utilization to 0.9, the maximum token count for this model is: 9456
+# Using NVIDIA 1xL4 (8 vCPU, 30 GB RAM, 24 GB VRAM) still only supports 23712 tokens.
+# Using NVIDIA 1xL40S (8 vCPU, 62 GB RAM, 48 GB VRAM) can support the full 32768 tokens. (Increasing RAM does not help; only increasing VRAM does.)
 
 # 7536 tokens ÷ 1.2 = 6280 words.
 # 6280 words ÷ 500 words/page = 12.56 pages. (single-spaced)
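The fix the error message suggests, and the tokens-to-pages estimate in the comments, can be sketched as below. This is a sketch, not the verbatim contents of `run-sailor.sh`: the serve invocation assumes vLLM's OpenAI-compatible server entrypoint with its `--gpu-memory-utilization` and `--max-model-len` flags, with the 0.9 / 9456 values taken from the notes above.

```shell
#!/usr/bin/env sh
# Sketch of the remedy from the error message (assumed flag names from
# vLLM's OpenAI-compatible API server; values from the notes above):
#
#   python -m vllm.entrypoints.openai.api_server \
#       --model sail/Sailor-4B-Chat \
#       --gpu-memory-utilization 0.9 \
#       --max-model-len 9456
#
# Back-of-the-envelope capacity estimate from the comments:
# tokens -> words -> single-spaced pages.
TOKENS=7536
WORDS=$((TOKENS * 10 / 12))                             # tokens / 1.2 = 6280 words
PAGES=$(awk "BEGIN { printf \"%.2f\", $WORDS / 500 }")  # 500 words/page = 12.56 pages
echo "${TOKENS} tokens ~= ${WORDS} words ~= ${PAGES} pages (single-spaced)"
```

On the 24 GB L4 this model still cannot hold its full 32768-token context in KV cache, which is why the notes pin `max_model_len` below the model's maximum instead.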