Pixtral-12B-2409: int4 Weight Quant

W4A16 quantization of mistral-community/pixtral-12b, created with the kylesayrs/gptq-partition branch of LLM Compressor for optimised inference on vLLM.

The vision_tower is kept at FP16; the language_model weights are quantized to 4-bit (group size 128).

Calibrated on 512 Flickr samples.
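
As a rough sketch of how such a recipe can be expressed with LLM Compressor's oneshot flow (the exact dataset handling and any changes made on the gptq-partition branch are not shown; the dataset name below is only a placeholder):

```python
# Sketch of a W4A16 GPTQ recipe via LLM Compressor. The oneshot/GPTQModifier
# API shown is the upstream one; the gptq-partition branch may change the
# calibration internals, and the dataset string below is a placeholder.
from transformers import AutoProcessor, LlavaForConditionalGeneration
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "mistral-community/pixtral-12b"

model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Quantize only the language model's Linear layers to 4-bit weights (W4A16,
# group size 128 by default); skip the vision tower, projector, and lm_head.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["re:.*lm_head", "re:.*vision_tower.*", "re:.*multi_modal_projector.*"],
)

oneshot(
    model=model,
    dataset="flickr30k",            # placeholder: 512 Flickr calibration samples
    recipe=recipe,
    num_calibration_samples=512,
    max_seq_length=2048,
)

model.save_pretrained("pixtral-12b-2409-W4A16-G128", save_compressed=True)
processor.save_pretrained("pixtral-12b-2409-W4A16-G128")
```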

Example vLLM usage

```shell
vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
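
The server exposes an OpenAI-compatible API, so a basic image-plus-text request can be sent with the official openai client (the port, image URL, and prompt below are placeholders):

```python
# Query the vLLM server started above through its OpenAI-compatible endpoint.
# Base URL assumes the default port 8000; the image URL and prompt are examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-W4A16-G128",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```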

If you want a more advanced, fully featured chat template, you can use this Jinja template and pass it to vLLM via the --chat-template argument.
