docs(sailor): add note about minimum resources of Sailor
run-sailor.sh +2 -0
@@ -12,6 +12,8 @@ printf "Running sail/Sailor-4B-Chat using vLLM OpenAI compatible API Server at p
 # ERROR 11-27 15:32:10 engine.py:366] The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (7536). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

 # After increasing gpu utilization to 0.9, the maximum number of tokens for this model is: 9456
+# Using NVIDIA 1xL4 (8 vCPU, 30 GB RAM, 24 GB VRAM) still only supports 23712 tokens.
+# Using NVIDIA 1xL40S (8 vCPU, 62 GB RAM, 48 GB VRAM) can support the full 32768 tokens. (Increasing RAM does not help; only increasing VRAM does.)

 # 7536 tokens ÷ 1.2 = 6280 words.
 # 6280 words ÷ 500 words/page = 12.56 pages. (single-spaced)
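The error message above names the two knobs vLLM exposes for this trade-off. A minimal launch sketch applying that advice: the model name comes from the script's own header, while port 8000 and the `--max-model-len 9456` value (the measured maximum at 0.9 utilization) are illustrative assumptions, not confirmed contents of `run-sailor.sh`.

```shell
# Sketch: start the vLLM OpenAI-compatible server with the KV-cache knobs set.
# --gpu-memory-utilization: fraction of VRAM vLLM may claim (raised to 0.9 here).
# --max-model-len: capped to what actually fits in the KV cache (assumed 9456).
python -m vllm.entrypoints.openai.api_server \
  --model sail/Sailor-4B-Chat \
  --gpu-memory-utilization 0.9 \
  --max-model-len 9456 \
  --port 8000
```

If `--max-model-len` exceeds what the KV cache can hold, the engine refuses to start with the error quoted above, so capping it is the cheaper fix when more VRAM is not available.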
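The tokens-to-pages arithmetic in the comments can be checked with a quick awk snippet; 1.2 tokens per word and 500 words per page are the heuristics the comments already use, not vLLM values.

```shell
# Rough capacity estimate: KV-cache token budget -> words -> single-spaced pages.
awk 'BEGIN {
  tokens = 7536              # KV cache capacity reported by the engine
  words  = tokens / 1.2      # heuristic: ~1.2 tokens per English word
  pages  = words / 500       # heuristic: ~500 words per single-spaced page
  printf "%.0f words, %.2f pages\n", words, pages
}'
# → 6280 words, 12.56 pages
```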