litert-community/Gecko-110m-en

Question Answering · LiteRT · English

This model card provides several variants of the embedding model published in the Gecko paper, ready for deployment on Android or iOS using the LiteRT stack or the Google AI Edge RAG SDK.

Use the models

Android

  • Try out the Gecko embedding model in the Google AI Edge RAG SDK. You can find the SDK on GitHub or follow our Android guide to install it directly from Maven. We have also published a sample app.
  • Use the SentencePiece model as the tokenizer for the Gecko embedding model; the RAG SDK accepts its path alongside the embedding model, as shown in the sketch after this list.
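
For orientation, here is a minimal Kotlin sketch of wiring Gecko into the RAG SDK, following the `GeckoEmbeddingModel` constructor shown in the Android guide. Treat it as a sketch, not a definitive integration: the Maven coordinate and version, the on-device file paths, and the GPU flag are placeholder assumptions to adapt to your app.

```kotlin
// Dependency (version is an assumption; check Maven for the current release):
// implementation("com.google.ai.edge.localagents:localagents-rag:0.1.0")

import com.google.ai.edge.localagents.rag.models.GeckoEmbeddingModel
import java.util.Optional

// Placeholder on-device paths: push the Gecko .tflite variant of your choice
// and the SentencePiece tokenizer model to the device first.
private const val GECKO_MODEL_PATH = "/data/local/tmp/gecko.tflite"
private const val TOKENIZER_MODEL_PATH = "/data/local/tmp/sentencepiece.model"

// Set to false to embed on CPU instead of GPU.
private const val USE_GPU_FOR_EMBEDDINGS = true

fun buildEmbedder(): GeckoEmbeddingModel =
    GeckoEmbeddingModel(
        GECKO_MODEL_PATH,
        Optional.of(TOKENIZER_MODEL_PATH),
        USE_GPU_FOR_EMBEDDINGS,
    )
```

From here, the SDK's retrieval pipeline consumes the embedder; see the sample app for an end-to-end example.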

Performance

Android

Note that all benchmark numbers below were measured on a Samsung Galaxy S23 Ultra.

| Quantization | Backend | Max sequence length | Init time (ms) | Inference time (ms) | Memory (RSS in MB) | Model size (MB) |
|---|---|---|---|---|---|---|
| dynamic_int8 | GPU | 256 | 1306.06 | 76.2 | 604.5 | 114 |
| dynamic_int8 | GPU | 512 | 1363.38 | 173.2 | 604.6 | 120 |
| dynamic_int8 | GPU | 1024 | 1419.87 | 397 | 871.1 | 145 |
| dynamic_int8 | CPU | 256 | 11.03 | 147.6 | 126.3 | 114 |
| dynamic_int8 | CPU | 512 | 30.04 | 353.1 | 225.6 | 120 |
| dynamic_int8 | CPU | 1024 | 79.17 | 954 | 619.5 | 145 |

  • Model size: measured as the size of the .tflite flatbuffer (the serialization format for LiteRT models).
  • Memory: indicator of peak RSS (RAM) usage.
  • Inference on CPU is accelerated via the LiteRT XNNPACK delegate with 4 threads (see the sketch after this list).
  • Inference on GPU is accelerated via the LiteRT GPU delegate.
  • Benchmarks assume the XNNPACK cache is enabled.
  • dynamic_int8: quantized model with int8 weights and float activations.
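
If you drive the model with the LiteRT interpreter directly rather than through the RAG SDK, the CPU and GPU rows above correspond roughly to the configuration sketched below. This is an illustrative setup under stated assumptions, not the benchmark harness itself: the model path is a placeholder, and the assumed I/O layout (a [1, seqLen] int32 tensor of SentencePiece token IDs in, a [1, dim] float embedding out) should be verified against the converted model's tensors.

```kotlin
// Dependencies (artifact names are assumptions; see the LiteRT docs):
// com.google.ai.edge.litert:litert and com.google.ai.edge.litert:litert-gpu.
// LiteRT keeps the org.tensorflow.lite package namespace.
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.File

fun embedTokens(tokenIds: IntArray, useGpu: Boolean): FloatArray {
    // Placeholder path to one of the dynamic_int8 .tflite flatbuffers.
    val modelFile = File("/data/local/tmp/gecko_256_quant.tflite")

    val options = Interpreter.Options()
    val gpuDelegate = if (useGpu) GpuDelegate() else null
    if (gpuDelegate != null) {
        // GPU rows in the table: the LiteRT GPU delegate.
        options.addDelegate(gpuDelegate)
    } else {
        // CPU rows in the table: XNNPACK with 4 threads (XNNPACK is enabled
        // by default in recent LiteRT/TFLite releases).
        options.setNumThreads(4)
    }

    val interpreter = Interpreter(modelFile, options)
    try {
        // Assumed shapes; confirm with getInputTensor/getOutputTensor.
        val seqLen = interpreter.getInputTensor(0).shape()[1]   // e.g. 256
        val embedDim = interpreter.getOutputTensor(0).shape()[1]
        val input = Array(1) { IntArray(seqLen) }               // zero-padded
        tokenIds.copyInto(input[0], endIndex = minOf(tokenIds.size, seqLen))
        val output = Array(1) { FloatArray(embedDim) }
        interpreter.run(input, output)
        return output[0]
    } finally {
        interpreter.close()
        gpuDelegate?.close()
    }
}
```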