Update README.md
README.md
@@ -300,11 +300,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Latency (s)</th>
-<th>
-<th>Latency (s)th>
-<th>QPD</th>
+<th>Queries Per Dollar</th>
 <th>Latency (s)</th>
-<th>
+<th>Queries Per Dollar</th>
+<th>Latency (s)</th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -404,7 +404,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 </tbody>
 </table>
 
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
 
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
 
 ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
 
@@ -423,11 +425,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <th>Model</th>
 <th>Average Cost Reduction</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 <th>Maximum throughput (QPS)</th>
-<th>
+<th>Queries Per Dollar</th>
 </tr>
 </thead>
 <tbody style="text-align: center">
@@ -525,4 +527,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
 <td>6777</td>
 </tr>
 </tbody>
-</table>
+</table>
+
+**Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
+
+**QPS: Queries per second.
+
+**QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
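The QPS and QPD footnotes added in this commit can be tied together with a small calculation. The sketch below is an assumption, not the method used to produce the tables: it takes QPD to be the number of queries served in one hour divided by the hourly on-demand GPU price, and, for the single-stream (latency) table, treats throughput as 1 / latency with requests issued back to back. The function names and the prices in the example are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def qpd_from_throughput(qps: float, hourly_price_usd: float) -> float:
    """Queries per dollar for a server sustaining `qps` queries per second.

    Assumption (hypothetical formula): QPD = queries served in one hour
    divided by the on-demand price of one hour of compute.
    """
    if qps < 0 or hourly_price_usd <= 0:
        raise ValueError("qps must be >= 0 and hourly_price_usd must be > 0")
    return qps * SECONDS_PER_HOUR / hourly_price_usd


def qpd_from_latency(latency_s: float, hourly_price_usd: float) -> float:
    """Queries per dollar for a single stream that completes one query every `latency_s` seconds.

    Assumption: requests run back to back, so throughput = 1 / latency.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be > 0")
    return qpd_from_throughput(1.0 / latency_s, hourly_price_usd)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not taken from the benchmark tables.
    print(qpd_from_throughput(qps=2.0, hourly_price_usd=3.0))     # 2400.0
    print(qpd_from_latency(latency_s=0.5, hourly_price_usd=2.0))  # 3600.0
```

Under this assumption QPD scales linearly with throughput and inversely with the hourly price, so halving latency on the same instance doubles QPD.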