---
license: mit
---
The [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) model converted to ONNX in fp16 and int8 formats for use with [Vespa Embedding](https://docs.vespa.ai/en/embedding.html).
- intfloat-multilingual-e5-large_fp16.onnx (fp16)
- intfloat-multilingual-e5-large_quantized.onnx (int8 quantized)
The model was quantized using the [optimum](https://github.com/huggingface/optimum) toolkit.
## Example Vespa services.xml
**Notice**: FP16 works well with Vespa versions `8.325.46` and above.
```
<component id="me5_large" type="hugging-face-embedder">
<transformer-model
url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_fp16.onnx" />
<!-- or int8 quantization model
<transformer-model
url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/intfloat-multilingual-e5-large_quantized.onnx"
/>
-->
<tokenizer-model
url="https://huggingface.co/hotchpotch/vespa-onnx-intfloat-multilingual-e5-large/resolve/main/tokenizer.json" />
<normalize>true</normalize>
<pooling-strategy>mean</pooling-strategy>
</component>
```
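The `<pooling-strategy>mean</pooling-strategy>` and `<normalize>true</normalize>` settings above turn the model's per-token outputs into a single unit-length vector. The following pure-Python sketch illustrates that computation on toy data. It is not Vespa's actual implementation: the helper names `mean_pool` and `l2_normalize` and the input values are assumptions made for illustration, and it assumes padding positions are excluded via the attention mask.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping positions where the mask is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


def l2_normalize(vector):
    """Scale the vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


# Toy input: three token vectors, the last one is padding (mask 0).
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # [3.0, 4.0]
embedding = l2_normalize(pooled)      # approximately [0.6, 0.8]
```

With normalized embeddings, a dot product between two vectors equals their cosine similarity, which is why normalization is commonly enabled for E5-style models.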
### Deploy
```
# FP16 model has a larger file size, which can result in longer deployment times.
vespa deploy --wait 1800 .
```
## Tips: convert to int8 quantized
```
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
```
```
optimum-cli onnxruntime quantize --onnx_model ./me5-large -o me5-large-large_quantized --avx512_vnni
```
## Tips: convert to fp16
```
# https://github.com/vespa-engine/sample-apps/blob/master/simple-semantic-search/export_hf_model_from_hf.py
./export_hf_model_from_hf.py --hf_model intfloat/multilingual-e5-large --output_dir me5-large
```
- https://gist.github.com/hotchpotch/64fa52d32886fe61cc1d110066afef38
```
# https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16
onnx_model = onnx.load("me5-large/intfloat-multilingual-e5-large.onnx")
model_fp16 = convert_float_to_float16(onnx_model, disable_shape_infer=True)
onnx.save(model_fp16, "me5-large/intfloat-multilingual-e5-large_fp16.onnx")
```
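After converting, it may be worth checking that embeddings from the fp16 model stay close to those from the fp32 original. The sketch below shows only the comparison step in pure Python. `cosine_similarity` is an illustrative helper, and the two vectors are made-up placeholders standing in for embeddings you would obtain by running both ONNX files on the same input; they are not real model outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder values only (assumption): replace with embeddings produced by
# the fp32 and fp16 models for the same input text.
fp32_embedding = [0.12, -0.48, 0.33, 0.80]
fp16_embedding = [0.1201, -0.4797, 0.3302, 0.7998]

similarity = cosine_similarity(fp32_embedding, fp16_embedding)
print(f"cosine similarity: {similarity:.6f}")
```

A value very close to 1.0 across a handful of representative inputs suggests the conversion did not noticeably change the embeddings; what threshold is acceptable depends on your application.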
## License
This model follows the license of the original model, which is the MIT License (see the LICENSE file in the project's root directory).
- https://huggingface.co/intfloat/multilingual-e5-large
## Attribution
All credit for this model goes to the authors of Multilingual-E5-large and the associated researchers and organizations. When using this model, please be sure to attribute the original authors.