Commit 383a8c8 (verified) by schmidt-sebastian · 1 Parent(s): ae5ca61

Add files using upload-large-folder tool

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -56,19 +56,19 @@ Note that all benchmark stats are from a Samsung S24 Ultra with
  <tr>
  <td>fp32 (baseline)</td>
  <td>cpu</td>
- <td><p style="text-align: right">339.97 tk/s</p></td>
- <td><p style="text-align: right">49.27 tk/s</p></td>
- <td><p style="text-align: right">1.25 s</p></td>
- <td><p style="text-align: right">1,398 MB</p></td>
  <td><p style="text-align: right">527 MB</p></td>
  </tr>
  <tr>
  <td>dynamic_int8</td>
  <td>cpu</td>
- <td><p style="text-align: right">505.93 tk/s</p></td>
- <td><p style="text-align: right">89.88 tk/s</p></td>
- <td><p style="text-align: right">0.86 s</p></td>
- <td><p style="text-align: right">583 MB</p></td>
  <td><p style="text-align: right">159 MB</p></td>
  </tr>
 
@@ -79,5 +79,7 @@ Note that all benchmark stats are from a Samsung S24 Ultra with
  * Memory: indicator of peak RAM usage
  * The inference on CPU is accelerated via the LiteRT
  [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
- * Benchmark is done assuming XNNPACK cache is enabled
  * dynamic_int8: quantized model with int8 weights and float activations.
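
The `dynamic_int8` note above describes int8 weights combined with float activations. The following is a minimal sketch of that idea in plain Python, assuming symmetric per-tensor scaling; the scheme the converted model actually uses is not stated in this README and may differ. The function names `quantize_int8` and `dot_float_activations` are hypothetical and exist only for this illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: float weights -> (int8 values, scale).

    Assumption: one scale for the whole tensor; real converters may use
    per-channel scales instead.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dot_float_activations(q_weights, scale, activations):
    """Dot product where weights are stored as int8 and activations stay float."""
    return scale * sum(q * a for q, a in zip(q_weights, activations))


weights = [0.50, -1.27, 0.03, 0.90]
activations = [1.0, 2.0, -3.0, 0.5]

q, scale = quantize_int8(weights)
exact = sum(w * a for w, a in zip(weights, activations))
approx = dot_float_activations(q, scale, activations)

print("int8 weights:", q)
print("float result:", exact, "quantized result:", approx)
```

Each weight takes one byte instead of four, which is in line with the smaller size and memory figures reported for the `dynamic_int8` row, although the exact savings depend on which tensors are quantized.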
 
  <tr>
  <td>fp32 (baseline)</td>
  <td>cpu</td>
+ <td><p style="text-align: right">576.58 tk/s</p></td>
+ <td><p style="text-align: right">52.23 tk/s</p></td>
+ <td><p style="text-align: right">0.73 s</p></td>
+ <td><p style="text-align: right">927 MB</p></td>
  <td><p style="text-align: right">527 MB</p></td>
  </tr>
  <tr>
  <td>dynamic_int8</td>
  <td>cpu</td>
+ <td><p style="text-align: right">1142.86 tk/s</p></td>
+ <td><p style="text-align: right">96.65 tk/s</p></td>
+ <td><p style="text-align: right">0.45 s</p></td>
+ <td><p style="text-align: right">567 MB</p></td>
  <td><p style="text-align: right">159 MB</p></td>
  </tr>
 
 
  * Memory: indicator of peak RAM usage
  * The inference on CPU is accelerated via the LiteRT
  [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
+ * Benchmark is run with cache enabled and initialized. During the first run,
+ the time to first token may differ.
+ * dynamic_int4: quantized model with int4 weights and float activations.
  * dynamic_int8: quantized model with int8 weights and float activations.
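
To make the size of this update easier to read, here is a small sketch that computes the ratio of each new benchmark value to the old one. The numbers are copied from the removed and added table cells in this diff. The metric names are assumptions, since the table header is not part of the diff: the two `tk/s` columns are taken to be prefill and decode throughput, the `s` column time to first token, and the changed `MB` column peak memory.

```python
# Values from the "-" lines of the diff (before this commit).
# Metric names are assumed; the table header is not shown in the diff.
old = {
    "fp32": {"prefill_tks": 339.97, "decode_tks": 49.27, "ttft_s": 1.25, "memory_mb": 1398},
    "dynamic_int8": {"prefill_tks": 505.93, "decode_tks": 89.88, "ttft_s": 0.86, "memory_mb": 583},
}

# Values from the "+" lines of the diff (after this commit).
new = {
    "fp32": {"prefill_tks": 576.58, "decode_tks": 52.23, "ttft_s": 0.73, "memory_mb": 927},
    "dynamic_int8": {"prefill_tks": 1142.86, "decode_tks": 96.65, "ttft_s": 0.45, "memory_mb": 567},
}


def ratio_table(before, after):
    """Return after/before for every model and metric."""
    return {
        model: {metric: after[model][metric] / before[model][metric] for metric in before[model]}
        for model in before
    }


changes = ratio_table(old, new)
for model, metrics in changes.items():
    for metric, ratio in metrics.items():
        print(f"{model:13s} {metric:12s} x{ratio:.2f}")
```

A ratio above 1 means the value went up (better for throughput), and a ratio below 1 means it went down (better for time to first token and memory).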