Quantized Mistral-NeMo-Instruct-2407 Versions for the Prompt Sensitivity Blog Post

This repository contains four quantized versions of Mistral-NeMo-Instruct-2407, created using llama.cpp. The goal was to examine how different quantization methods affect prompt sensitivity on sentiment classification tasks.

Quantization Details

Models were quantized using llama.cpp (release b3922). The imatrix versions were quantized using an imatrix.dat file generated from Bartowski's calibration dataset.
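
For reference, below is a minimal sketch of the standard llama.cpp workflow for producing quants like these. The source-model directory, calibration-file name, and output paths are illustrative, not necessarily the exact ones used for this repository.

```bash
# Convert the Hugging Face model to a full-precision GGUF (paths illustrative)
python convert_hf_to_gguf.py Mistral-Nemo-Instruct-2407 \
  --outtype f16 --outfile Mistral-NeMo-12B-Instruct-2407-f16.gguf

# Generate an importance matrix from a calibration text file
./llama-imatrix -m Mistral-NeMo-12B-Instruct-2407-f16.gguf \
  -f calibration_data.txt -o imatrix.dat

# Default quantization (no imatrix)
./llama-quantize Mistral-NeMo-12B-Instruct-2407-f16.gguf \
  Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf Q5_0

# Imatrix-guided quantization
./llama-quantize --imatrix imatrix.dat \
  Mistral-NeMo-12B-Instruct-2407-f16.gguf \
  Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf Q5_0
```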

Models

| Filename | Size | Description |
|----------|------|-------------|
| Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf | 13 GB | 8-bit, default quantization |
| Mistral-NeMo-12B-Instruct-2407-Q5_0.gguf | 8.73 GB | 5-bit, default quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q8_0.gguf | 13 GB | 8-bit, imatrix quantization |
| Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf | 8.73 GB | 5-bit, imatrix quantization |

I've also included the imatrix.dat file (7.05 MB) used to create the imatrix-quantized versions.
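
To try one of these files locally, something like the following should work, assuming the llama.cpp binaries are built and on your path (the prompt is only an example):

```bash
./llama-cli -m Mistral-NeMo-12B-Instruct-2407-Q8_0.gguf \
  --temp 0 -n 8 \
  -p "Classify the sentiment of this review as positive or negative: 'Great battery life, terrible screen.'"
```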

Findings

Prompt sensitivity was observed only in the 5-bit model quantized with the imatrix; the 5-bit model produced with llama.cpp's default quantization settings did not show it. Neither 8-bit model exhibited prompt sensitivity, regardless of quantization method.
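
As an illustration of what "prompt sensitivity" means here: give the model two semantically equivalent wordings of the same classification request and compare the labels it returns. A sensitive model may answer differently. The prompts and review text below are illustrative, not the ones used in the experiments.

```bash
# Same review, two paraphrased instructions; compare the returned labels.
./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf --temp 0 -n 4 \
  -p "Classify the sentiment of this review as positive or negative: 'The plot dragged, but the acting was superb.'"
./llama-cli -m Mistral-NeMo-12B-Instruct-2407-imatrix-Q5_0.gguf --temp 0 -n 4 \
  -p "Is the following review positive or negative? 'The plot dragged, but the acting was superb.'"
```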

For further discussion, please see my accompanying blog post.

Author

Simon Barnes
