---
license: apache-2.0
language:
- en
---

# litert-community/Gecko-110m-en

This model provides several variants of the embedding model published in the [Gecko paper](https://arxiv.org/abs/2403.20327), ready for deployment on Android or iOS using the [LiteRT stack](https://ai.google.dev/edge/litert) or the [Google AI Edge RAG SDK](https://ai.google.dev/edge/mediapipe/solutions/genai/rag).

## Use the models

### Android

* Try out the Gecko embedding model in the [Google AI Edge RAG SDK](https://ai.google.dev/edge/mediapipe/solutions/genai/rag)
* Follow the instructions in the [Android guide](https://ai.google.dev/edge/mediapipe/solutions/genai/rag/android)
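In a RAG pipeline, the embeddings produced by Gecko are typically compared by cosine similarity to retrieve the closest document chunks for a query. The sketch below is a framework-agnostic illustration of that retrieval step; the function names are ours, not part of the SDK:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_emb: np.ndarray, doc_embs: list, k: int = 2) -> list:
    """Indices of the k document embeddings most similar to the query."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

In practice you would obtain `query_emb` and `doc_embs` by running the Gecko model over your query and document chunks; the SDK handles this step for you on-device.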

## Performance

### Android

Note that all benchmark stats are from a Samsung S23 Ultra.
<table border="1">
  <tr>
    <th></th>
    <th>Backend</th>
    <th>Max sequence length</th>
    <th>Init time (ms)</th>
    <th>Inference time (ms)</th>
    <th>Memory (RSS in MB)</th>
    <th>Model size (MB)</th>
  </tr>
  <tr>
    <td><p style="text-align: right">dynamic_int8</p></td>
    <td><p style="text-align: right">GPU</p></td>
    <td><p style="text-align: right">256</p></td>
    <td><p style="text-align: right">1306.06</p></td>
    <td><p style="text-align: right">76.2</p></td>
    <td><p style="text-align: right">604.5</p></td>
    <td><p style="text-align: right">114</p></td>
  </tr>
  <tr>
    <td><p style="text-align: right">dynamic_int8</p></td>
    <td><p style="text-align: right">GPU</p></td>
    <td><p style="text-align: right">512</p></td>
    <td><p style="text-align: right">1363.38</p></td>
    <td><p style="text-align: right">173.2</p></td>
    <td><p style="text-align: right">604.6</p></td>
    <td><p style="text-align: right">120</p></td>
  </tr>
  <tr>
    <td><p style="text-align: right">dynamic_int8</p></td>
    <td><p style="text-align: right">GPU</p></td>
    <td><p style="text-align: right">1024</p></td>
    <td><p style="text-align: right">1419.87</p></td>
    <td><p style="text-align: right">397</p></td>
    <td><p style="text-align: right">871.1</p></td>
    <td><p style="text-align: right">145</p></td>
  </tr>
  <tr>
    <td><p style="text-align: right">dynamic_int8</p></td>
    <td><p style="text-align: right">CPU</p></td>
    <td><p style="text-align: right">256</p></td>
    <td><p style="text-align: right">11.03</p></td>
    <td><p style="text-align: right">147.6</p></td>
    <td><p style="text-align: right">126.3</p></td>
    <td><p style="text-align: right">114</p></td>
  </tr>
  <tr>
    <td><p style="text-align: right">dynamic_int8</p></td>
    <td><p style="text-align: right">CPU</p></td>
    <td><p style="text-align: right">512</p></td>
    <td><p style="text-align: right">30.04</p></td>
    <td><p style="text-align: right">353.1</p></td>
    <td><p style="text-align: right">225.6</p></td>
    <td><p style="text-align: right">120</p></td>
  </tr>
  <tr>
    <td><p style="text-align: right">dynamic_int8</p></td>
    <td><p style="text-align: right">CPU</p></td>
    <td><p style="text-align: right">1024</p></td>
    <td><p style="text-align: right">79.17</p></td>
    <td><p style="text-align: right">954</p></td>
    <td><p style="text-align: right">619.5</p></td>
    <td><p style="text-align: right">145</p></td>
  </tr>
</table>

* Model size: measured by the size of the .tflite flatbuffer (the serialization format for LiteRT models)
* Memory: an indicator of peak RAM usage
* CPU inference is accelerated via the LiteRT [XNNPACK](https://github.com/google/XNNPACK) delegate with 4 threads
* GPU inference is accelerated via the LiteRT GPU delegate
* Benchmarks are run with the XNNPACK cache enabled
* dynamic_int8: quantized model with int8 weights and float activations
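To illustrate what dynamic-range int8 quantization does, here is a minimal numpy sketch: weights are stored as int8 with a per-tensor float scale and dequantized on the fly, while activations remain in float. This conveys the idea only; it is not how LiteRT's kernels are actually implemented.

```python
import numpy as np

def quantize_weights(w: np.ndarray):
    """Per-tensor symmetric int8 quantization: w is approximated by scale * w_int8."""
    scale = float(np.max(np.abs(w))) / 127.0
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_int8, scale

def dynamic_int8_matmul(x: np.ndarray, w_int8: np.ndarray, scale: float) -> np.ndarray:
    """Float activations times int8 weights: dequantize the weights, then matmul."""
    return x @ (w_int8.astype(np.float32) * scale)
```

Storing int8 weights instead of float32 is what shrinks the model roughly 4x, while keeping activations in float preserves most of the embedding quality.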