shubhrapandit committed on
Commit 54c7f99 (verified)
1 Parent(s): e9b324d

Update README.md

Files changed (1)
  1. README.md +16 -8
README.md CHANGED
@@ -300,11 +300,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
300
  <th>Model</th>
301
  <th>Average Cost Reduction</th>
302
  <th>Latency (s)</th>
303
- <th>QPD</th>
304
- <th>Latency (s)th>
305
- <th>QPD</th>
306
  <th>Latency (s)</th>
307
- <th>QPD</th>
 
 
308
  </tr>
309
  </thead>
310
  <tbody style="text-align: center">
@@ -404,7 +404,9 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
404
  </tbody>
405
  </table>
406
 
 
407
 
 
408
 
409
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
410
 
@@ -423,11 +425,11 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
423
  <th>Model</th>
424
  <th>Average Cost Reduction</th>
425
  <th>Maximum throughput (QPS)</th>
426
- <th>QPD</th>
427
  <th>Maximum throughput (QPS)</th>
428
- <th>QPD</th>
429
  <th>Maximum throughput (QPS)</th>
430
- <th>QPD</th>
431
  </tr>
432
  </thead>
433
  <tbody style="text-align: center">
@@ -525,4 +527,10 @@ The following performance benchmarks were conducted with [vLLM](https://docs.vll
525
  <td>6777</td>
526
  </tr>
527
  </tbody>
528
- </table>
 
 
 
 
 
 
 
300
  <th>Model</th>
301
  <th>Average Cost Reduction</th>
302
  <th>Latency (s)</th>
303
+ <th>Queries Per Dollar</th>
 
 
304
  <th>Latency (s)</th>
305
+ <th>Queries Per Dollar</th>
306
+ <th>Latency (s)</th>
307
+ <th>Queries Per Dollar</th>
308
  </tr>
309
  </thead>
310
  <tbody style="text-align: center">
 
404
  </tbody>
405
  </table>
406
 
407
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
408
 
409
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
410
 
411
  ### Multi-stream asynchronous performance (measured with vLLM version 0.7.2)
412
 
 
425
  <th>Model</th>
426
  <th>Average Cost Reduction</th>
427
  <th>Maximum throughput (QPS)</th>
428
+ <th>Queries Per Dollar</th>
429
  <th>Maximum throughput (QPS)</th>
430
+ <th>Queries Per Dollar</th>
431
  <th>Maximum throughput (QPS)</th>
432
+ <th>Queries Per Dollar</th>
433
  </tr>
434
  </thead>
435
  <tbody style="text-align: center">
 
527
  <td>6777</td>
528
  </tr>
529
  </tbody>
530
+ </table>
531
+
532
+ **Use case profiles: Image Size (WxH) / prompt tokens / generation tokens
533
+
534
+ **QPS: Queries per second.
535
+
536
+ **QPD: Queries per dollar, based on on-demand cost at [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) (observed on 2/18/2025).
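The QPD footnote added in this commit says the figure is based on on-demand cost, but the README does not spell out the formula. A minimal sketch of one plausible derivation follows, assuming QPD is the number of queries served in one hour divided by the hourly instance price, and assuming single-stream throughput can be approximated as the inverse of latency. The function names and the example price are hypothetical and are not taken from the README or from Lambda Labs pricing.

```python
def qps_from_latency(latency_s: float) -> float:
    """Approximate single-stream throughput as 1 / latency (an assumption)."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return 1.0 / latency_s


def queries_per_dollar(queries_per_second: float, hourly_cost_usd: float) -> float:
    """Queries served in one hour divided by the hourly instance price.

    This is an assumed formula; the README only states that QPD is
    based on on-demand cost.
    """
    if queries_per_second < 0:
        raise ValueError("queries_per_second must be non-negative")
    if hourly_cost_usd <= 0:
        raise ValueError("hourly_cost_usd must be positive")
    return queries_per_second * 3600.0 / hourly_cost_usd


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not measured values.
    example_qps = 2.0
    example_hourly_cost = 1.5  # hypothetical USD per hour
    print(queries_per_dollar(example_qps, example_hourly_cost))
```

Under these assumptions, the single-stream tables would use `queries_per_dollar(qps_from_latency(latency), cost)` and the multi-stream tables would pass the maximum throughput (QPS) directly.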