---
license: mit
pipeline_tag: text-generation
library_name: transformers.js
tags:
- ONNX
- DML
- ONNXRuntime
- nlp
- conversational
---
# Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web
This is the same model as the [official Phi-3 ONNX model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx), with a few changes to make it work with onnxruntime-web:
1. The model is fp16 with int4 block quantization for the weights.
2. The 'logits' output is fp32.
3. The model uses MHA (multi-head attention) instead of GQA (grouped-query attention).
4. The ONNX file and its external data file each need to stay below 2 GB to be cacheable in Chromium.
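Since the card declares `library_name: transformers.js`, a minimal usage sketch with Transformers.js might look like the following. The model id `microsoft/Phi-3-mini-4k-instruct-onnx-web` and the generation parameters are assumptions; substitute this repository's actual id.

```javascript
import { pipeline } from '@huggingface/transformers';

// Load the text-generation pipeline in the browser.
// NOTE: the model id below is an assumption; replace it with this repo's id.
const generator = await pipeline(
  'text-generation',
  'microsoft/Phi-3-mini-4k-instruct-onnx-web',
  { device: 'webgpu' } // executed via onnxruntime-web in the browser
);

// Phi-3 is an instruct model, so pass a chat-style message list.
const messages = [
  { role: 'user', content: 'Explain int4 block quantization in one sentence.' },
];

const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text);
```

Because of change 4 above, the browser can cache both the ONNX graph and the external weight data, so subsequent page loads skip the download.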