Pixtral-12B-2409: 2:4 sparse
A 2:4 sparse version of mistral-community/pixtral-12b, created with the kylesayrs/gptq-partition branch of LLM Compressor for optimised inference on vLLM.
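For context, the sketch below shows roughly how a 2:4 sparse checkpoint can be produced with LLM Compressor's SparseGPTModifier. The dataset, calibration settings, and output path are placeholders, and the exact recipe used on the kylesayrs/gptq-partition branch may differ:

```python
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

# 2:4 structured sparsity: prune 2 of every 4 weights in each group,
# i.e. 50% overall sparsity in a pattern vLLM can accelerate.
recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
)

# One-shot pruning with a calibration dataset (all settings here are
# illustrative placeholders, not the values used for this checkpoint).
oneshot(
    model="mistral-community/pixtral-12b",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="pixtral-12b-2409-2of4-sparse",
    num_calibration_samples=512,
    max_seq_length=2048,
)
```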
Example vLLM usage
vllm serve nintwentydo/pixtral-12b-2409-2of4-sparse --max-model-len 131072 --limit-mm-per-prompt 'image=4'
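Once the server is running, it exposes an OpenAI-compatible API. A minimal sketch of a multimodal request, assuming the default address http://localhost:8000 and a placeholder image URL:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; no real API key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-2of4-sparse",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # Placeholder URL; up to 4 images per prompt with the
                # --limit-mm-per-prompt 'image=4' setting above.
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```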
If you want a more advanced, fully featured chat template, you can use this Jinja template.