# litert-community/Gecko-110m-en
This model provides a few variants of the embedding model published in the Gecko paper that are ready for deployment on Android or iOS using the LiteRT stack or the Google AI Edge RAG SDK.
## Use the models
### Android
- Try out the Gecko embedding model in the Google AI Edge RAG SDK. You can find the SDK on GitHub or follow our Android guide to install it directly from Maven. We have also published a sample app; a minimal initialization sketch follows this list.
- Use the SentencePiece model as the tokenizer for the Gecko embedding model.
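As an illustration, here is a minimal sketch of wiring the model into the RAG SDK, assuming the SDK's `GeckoEmbeddingModel` wrapper; the file paths and file names are illustrative only (see the Android guide and sample app for the exact setup):

```kotlin
import com.google.ai.edge.localagents.rag.models.GeckoEmbeddingModel
import java.util.Optional

// Illustrative on-device paths: push the .tflite embedder and the
// SentencePiece tokenizer model to the device first (e.g. with adb push).
const val GECKO_MODEL_PATH = "/data/local/tmp/gecko.tflite"
const val TOKENIZER_MODEL_PATH = "/data/local/tmp/sentencepiece.model"
// Set to false to run on CPU instead of the GPU delegate.
const val USE_GPU_FOR_EMBEDDINGS = true

// The embedder can then be plugged into the rest of the RAG pipeline
// (chunking, vector store, retrieval) as described in the Android guide.
val embedder = GeckoEmbeddingModel(
    GECKO_MODEL_PATH,
    Optional.of(TOKENIZER_MODEL_PATH),
    USE_GPU_FOR_EMBEDDINGS,
)
```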
## Performance
### Android
Note that all benchmark stats are from a Samsung S23 Ultra.
| Quantization | Backend | Max sequence length | Init time (ms) | Inference time (ms) | Memory (RSS in MB) | Model size (MB) |
|---|---|---|---|---|---|---|
| dynamic_int8 | GPU | 256 | 1306.06 | 76.2 | 604.5 | 114 |
| dynamic_int8 | GPU | 512 | 1363.38 | 173.2 | 604.6 | 120 |
| dynamic_int8 | GPU | 1024 | 1419.87 | 397 | 871.1 | 145 |
| dynamic_int8 | CPU | 256 | 11.03 | 147.6 | 126.3 | 114 |
| dynamic_int8 | CPU | 512 | 30.04 | 353.1 | 225.6 | 120 |
| dynamic_int8 | CPU | 1024 | 79.17 | 954 | 619.5 | 145 |
- Model size: measured as the size of the .tflite flatbuffer (the serialization format for LiteRT models).
- Memory: an indicator of peak RAM usage.
- CPU inference is accelerated via the LiteRT XNNPACK delegate with 4 threads.
- GPU inference is accelerated via the LiteRT GPU delegate.
- Benchmarks are run with the XNNPACK cache enabled.
- dynamic_int8: a quantized model with int8 weights and float activations.
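For reference, the benchmark configuration above corresponds roughly to the following interpreter setup. This is a sketch using the LiteRT interpreter API (which retains the `org.tensorflow.lite` package namespace); the model file name is illustrative:

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

// Illustrative path; use whichever .tflite variant you downloaded.
val modelFile = File("/data/local/tmp/gecko.tflite")

// CPU path: XNNPACK-accelerated inference with 4 threads, as in the table.
val cpuInterpreter = Interpreter(
    modelFile,
    Interpreter.Options()
        .setNumThreads(4)
        .setUseXNNPACK(true)
)

// GPU path: route inference through the LiteRT GPU delegate instead.
val gpuInterpreter = Interpreter(
    modelFile,
    Interpreter.Options().addDelegate(GpuDelegate())
)
```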