gpt2 quantized to 4-bit using AutoGPTQ.

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/gpt2-AutoGPTQ-4bit-128g"
# Download and load the 4-bit quantized weights from the Hub.
model = AutoGPTQForCausalLM.from_quantized(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
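
Once loaded, the quantized model can be used like any other causal LM. A minimal generation sketch (the prompt and `max_new_tokens` value are illustrative, not from the model card):

```python
import torch

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding with an arbitrary token budget.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```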
| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/gpt2-AutoGPTQ-4bit-128g | 26.5000 | 25.1875 | 1.3125 |
| smpanaro/gpt2-medium-AutoGPTQ-4bit-128g | 19.1719 | 18.4739 | 0.698 |
| smpanaro/gpt2-large-AutoGPTQ-4bit-128g | 16.6875 | 16.4541 | 0.2334 |
| smpanaro/gpt2-xl-AutoGPTQ-4bit-128g | 14.9297 | 14.7951 | 0.1346 |
WikiText perplexity measured as described in the Hugging Face docs; lower is better.
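
The Hugging Face docs describe a fixed-length, sliding-window perplexity evaluation. A sketch of that procedure, assuming the WikiText-2 raw test split and the docs' default window/stride rather than the exact settings used for the table above:

```python
import torch
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length, stride = 1024, 512  # gpt2 context length; stride is an assumption
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens past the previous window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask the overlapping context from the loss

    with torch.no_grad():
        # The returned loss is averaged over scored tokens; rescale to a sum.
        nlls.append(model(input_ids, labels=target_ids).loss * trg_len)

    prev_end = end
    if end == seq_len:
        break

ppl = torch.exp(torch.stack(nlls).sum() / prev_end)
print(f"WikiText-2 perplexity: {ppl.item():.4f}")
```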