InternLM2-Chat NF4 Quant

Usage

As of 2024/1/17, Transformers must be installed from source, and bitsandbytes >=0.42.0 is required in order to load serialized 4-bit quants:

pip install -U git+https://github.com/huggingface/transformers bitsandbytes

Quantization config

import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
)
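
This config is passed to from_pretrained when quantizing the original full-precision checkpoint. A minimal sketch, assuming internlm/internlm2-chat-20b as the base model (the exact base checkpoint is an assumption) and that accelerate is installed:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-20b",            # assumption: full-precision base model
    quantization_config=quantization_config,  # quantize to NF4 on load
    device_map="auto",
    trust_remote_code=True,                   # InternLM2 ships custom modeling code
)
model.save_pretrained("internlm2-chat-nf4")   # serialize the 4-bit weights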

The config above is only needed when creating the quant. It is not necessary for inference; since the serialized weights are already 4-bit, just load the model without specifying any quantization_config or load_in_*bit arguments.
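
For example (the repo id below is a placeholder for this repository):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "user/internlm2-chat-nf4"  # placeholder: replace with this repo's id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,  # required: InternLM2 uses custom modeling code
)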

Model Details

Safetensors model size: 10.8B params
Tensor types: F32, FP16, U8
This model is not available via any of the supported Inference Providers and cannot be deployed to the HF Inference API, which does not support models that require custom code execution.