See the benchmark scripts in this repo.

Install the required packages:

pip install deepsparse-nightly[llm]==1.6.0.20231120
pip install openvino==2023.3.0
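
A quick way to confirm both installs succeeded (a minimal sanity check, not part of the repo's scripts; assumes deepsparse exposes __version__, which it does in recent releases):

python -c "import deepsparse; print(deepsparse.__version__)"
python -c "from openvino.runtime import get_version; print(get_version())"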

Benchmarking

  1. Clone this repo.
  2. Concatenate the split fp32 IR weights into a single file:
cd ./models/neuralmagic/mpt-7b-gsm8k-pt/fp32
cat openvino_model.bin.part-a* > openvino_model.bin
  3. Reproduce the Neural Magic paper results: deepsparse_reproduce.bash
  4. Run the OpenVINO benchmark_app scripts: benchmarkapp_*.bash (see the sketches after this list)
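
After step 2, you can check that the reassembled IR loads before benchmarking (a minimal sketch; the model path comes from step 2, run from the repo root):

python -c "from openvino.runtime import Core; m = Core().read_model('./models/neuralmagic/mpt-7b-gsm8k-pt/fp32/openvino_model.xml'); print(len(m.inputs), 'inputs')"

For step 4, a typical benchmark_app invocation on the same IR looks like the following (illustrative only; the exact flags used are in the repo's benchmarkapp_*.bash scripts):

benchmark_app -m ./models/neuralmagic/mpt-7b-gsm8k-pt/fp32/openvino_model.xml -d CPU -hint latency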

Generating these IRs

https://github.com/yujiepan-work/24h1-sparse-quantized-llm-ov
