Midnight Rose 70B v2.0.3 - GPTQ

a second attempt at quantizing a pre-trained model with AutoGPTQ, targeting sophosympatheia/Midnight-Rose-70B-v2.0.3

the base is a popular open-source model that scored highly on EQ-Bench and was specifically designed for storytelling and roleplaying

drops the model size from ~140GB down to ~35GB
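
that drop is about what you'd expect: fp16 is ~2 bytes per parameter (70B ≈ 140GB), 4-bit is ~0.5 bytes per parameter (≈ 35GB). here's a minimal sketch of the AutoGPTQ flow; the settings are assumptions, the real config is in the PR linked under Notes:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

base = "sophosympatheia/Midnight-Rose-70B-v2.0.3"

# assumed quantization settings -- see the PR under Notes for the real config
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights: ~0.5 bytes/param vs ~2 bytes/param in fp16
    group_size=128,  # one scale/zero-point per group of 128 weights
    desc_act=True,   # quantize in activation order, usually better accuracy
)

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)

# placeholder calibration data -- the real run used VMware/open-instruct, see Notes
calibration_texts = ["Write a short story about a rose garden at midnight."]
examples = [tokenizer(t, return_tensors="pt") for t in calibration_texts]

model.quantize(examples)
model.save_quantized("Midnight-Rose-70B-v2.0.3-GPTQ", use_safetensors=True)
```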

Notes

first (naive) attempt & notes on learnings here: sambarnes/Midnight-Rose-70B-v2.0.3-GPTQ-naive

this time, i used the same VMware/open-instruct calibration dataset that i saw TheBloke use in some of his quantizations (sketch of the calibration prep after the quote below)

he describes its purpose in his model cards:

> GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy.
>
> Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
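
for illustration, calibration examples could be built from that dataset roughly like this; the field names and sample count here are assumptions, so check the dataset card for the actual schema:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sophosympatheia/Midnight-Rose-70B-v2.0.3")

# field names are an assumption -- check the VMware/open-instruct dataset card
dataset = load_dataset("VMware/open-instruct", split="train")

examples = []
for row in dataset.select(range(128)):  # on the order of 100 samples is typical for GPTQ
    text = row["instruction"] + "\n" + row["response"]
    examples.append(tokenizer(text, truncation=True, max_length=2048, return_tensors="pt"))
```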

running on an H100 on modal.com, it took ~1.5hrs to quantize, costing roughly $10

code to perform the quantization here: https://github.com/OpenRouterTeam/openrouter-runner/pull/79
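
that PR has the real job definition; a stripped-down sketch of what an H100 job on modal could look like (app name, image contents, and timeout are all assumptions):

```python
import modal

app = modal.App("gptq-quantize")  # hypothetical app name
image = modal.Image.debian_slim().pip_install("auto-gptq", "transformers", "datasets")

@app.function(gpu="H100", image=image, timeout=3 * 60 * 60)  # headroom over the ~1.5hr run
def quantize_model():
    # load the base model, build calibration examples, run AutoGPTQ
    # as sketched above, then upload the quantized weights
    ...

@app.local_entrypoint()
def main():
    quantize_model.remote()
```

you'd kick it off with `modal run quantize.py` (filename assumed).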

Comparison to naive version

i uh couldn't really tell a difference, lol. granted, it was just a brief vibe check. decided to just ship this version on openrouter.ai
