# litert-community/Gecko-110m-en
This model provides a few variants of the embedding model published in the Gecko paper that are ready for deployment on Android or iOS using the LiteRT stack or the Google AI Edge RAG SDK.
## Use the models
### Android
- Try out the Gecko embedding model in the Google AI Edge RAG SDK. You can find the SDK on GitHub or follow our Android guide to install it directly from Maven. We have also published a sample app; a minimal initialization sketch follows this list.
- Use the SentencePiece model as the tokenizer for the Gecko embedding model.
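As an illustration, here is a minimal sketch of wiring the model into the RAG SDK, assuming the SDK's `GeckoEmbeddingModel` wrapper; the file paths and file names are illustrative only (see the Android guide and sample app for the exact setup):

```kotlin
import com.google.ai.edge.localagents.rag.models.GeckoEmbeddingModel
import java.util.Optional

// Illustrative on-device paths: push the .tflite embedder and the
// SentencePiece tokenizer model to the device first (e.g. with adb push).
const val GECKO_MODEL_PATH = "/data/local/tmp/gecko.tflite"
const val TOKENIZER_MODEL_PATH = "/data/local/tmp/sentencepiece.model"
// Set to false to run on CPU instead of the GPU delegate.
const val USE_GPU_FOR_EMBEDDINGS = true

// The embedder can then be plugged into the rest of the RAG pipeline
// (chunking, vector store, retrieval) as described in the Android guide.
val embedder = GeckoEmbeddingModel(
    GECKO_MODEL_PATH,
    Optional.of(TOKENIZER_MODEL_PATH),
    USE_GPU_FOR_EMBEDDINGS,
)
```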
## Performance
### Android
Note that all benchmark stats are from a Samsung S23 Ultra.
| Quantization | Backend | Max sequence length | Init time (ms) | Inference time (ms) | Memory (RSS in MB) | Model size (MB) |
|---|---|---|---|---|---|---|
| dynamic_int8 | GPU | 256 | 1306.06 | 76.2 | 604.5 | 114 |
| dynamic_int8 | GPU | 512 | 1363.38 | 173.2 | 604.6 | 120 |
| dynamic_int8 | GPU | 1024 | 1419.87 | 397 | 871.1 | 145 |
| dynamic_int8 | CPU | 256 | 11.03 | 147.6 | 126.3 | 114 |
| dynamic_int8 | CPU | 512 | 30.04 | 353.1 | 225.6 | 120 |
| dynamic_int8 | CPU | 1024 | 79.17 | 954 | 619.5 | 145 |
- Model size: measured as the size of the .tflite flatbuffer (the serialization format for LiteRT models).
- Memory: an indicator of peak RAM usage.
- CPU inference is accelerated via the LiteRT XNNPACK delegate with 4 threads.
- GPU inference is accelerated via the LiteRT GPU delegate.
- Benchmarks are run with the XNNPACK cache enabled.
- dynamic_int8: a quantized model with int8 weights and float activations.
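For reference, the benchmark configuration above corresponds roughly to the following interpreter setup. This is a sketch using the LiteRT interpreter API (which retains the `org.tensorflow.lite` package namespace); the model file name is illustrative:

```kotlin
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

// Illustrative path; use whichever .tflite variant you downloaded.
val modelFile = File("/data/local/tmp/gecko.tflite")

// CPU path: XNNPACK-accelerated inference with 4 threads, as in the table.
val cpuInterpreter = Interpreter(
    modelFile,
    Interpreter.Options()
        .setNumThreads(4)
        .setUseXNNPACK(true)
)

// GPU path: route inference through the LiteRT GPU delegate instead.
val gpuInterpreter = Interpreter(
    modelFile,
    Interpreter.Options().addDelegate(GpuDelegate())
)
```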