Update README.md
README.md CHANGED
```diff
@@ -100,7 +100,7 @@ You can refer to the content in [Tencent-Hunyuan-Large](https://github.com/Tence
 
 This section presents the efficiency test results of deploying various models using vLLM, including inference speed (tokens/s) under different batch sizes.
 
-| Inference Framework | Model | Number of GPUs (
+| Inference Framework | Model | Number of GPUs (GPU productA) | input_length | batch=1 | batch=4 |
 |------|------------|-------------------------|-------------------------|---------------------|----------------------|
 | vLLM | hunyuan-7B | 1 | 2048 | 78.9 | 279.5 |
 
```
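The tokens/s figures in the table above come from timing batched generation. A minimal sketch of that measurement, assuming generation returns one token sequence per prompt; `fake_generate` is a hypothetical stand-in for a real vLLM engine call, not part of vLLM's API:

```python
import time

def tokens_per_second(generate, prompts, max_new_tokens):
    """Time one batched generate() call and return aggregate tokens/s."""
    start = time.perf_counter()
    outputs = generate(prompts, max_new_tokens)  # one token list per prompt
    elapsed = time.perf_counter() - start
    total_tokens = sum(len(tokens) for tokens in outputs)
    return total_tokens / elapsed

# Hypothetical stand-in for a real engine: returns max_new_tokens
# dummy token ids per prompt so the timing harness can be exercised.
def fake_generate(prompts, max_new_tokens):
    return [[0] * max_new_tokens for _ in prompts]

# batch=4, as in the table's rightmost column
rate = tokens_per_second(fake_generate, ["hello"] * 4, 128)
```

With a real engine, larger batches amortize per-step overhead across prompts, which is why the batch=4 column reports higher aggregate tokens/s than batch=1.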