Midnight Rose 70B v2.0.3 - GPTQ
A second attempt at quantizing a pre-trained model with AutoGPTQ, targeting sophosympatheia/Midnight-Rose-70B-v2.0.3.
The base is a popular open-source model that scored highly on EQBench and was specifically designed for storytelling and roleplaying.
Quantization drops the model size from ~140 GB down to ~35 GB.
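For reference, the AutoGPTQ side of the job looks roughly like this. This is a minimal sketch, not the actual script from the PR linked below; the quantization parameters (bits, group_size, desc_act) and the inline calibration texts are illustrative assumptions:

```python
# Minimal AutoGPTQ sketch: 4-bit quantization of the base model.
# bits/group_size/desc_act are common defaults, not necessarily the
# exact settings used for this repo.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "sophosympatheia/Midnight-Rose-70B-v2.0.3"
out_dir = "Midnight-Rose-70B-v2.0.3-GPTQ"

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights: roughly 140 GB fp16 -> 35 GB
    group_size=128,  # common GPTQ group size
    desc_act=True,   # activation-order quantization, usually better accuracy
)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# Calibration samples; in practice these come from a real dataset
# (see the Notes section below), not a hardcoded list.
calibration_texts = [
    "Write a short story about a lighthouse keeper and a storm.",
]
examples = [
    {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}
    for enc in (tokenizer(t, return_tensors="pt") for t in calibration_texts)
]

model.quantize(examples)

model.save_quantized(out_dir, use_safetensors=True)
tokenizer.save_pretrained(out_dir)
```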
Notes
First (naive) attempt, with notes on learnings, here: sambarnes/Midnight-Rose-70B-v2.0.3-GPTQ-naive
This time, I used the same VMware/open-instruct calibration dataset that I saw TheBloke use in some of his quantizations; a sketch of the dataset prep follows the quote below.
He describes its purpose here:
GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy.
Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
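Preparing that calibration set looks roughly like this. A minimal sketch only: the column names ("instruction", "response"), the 128-sample count, and the 2048-token truncation are assumptions; check the dataset card for the actual schema:

```python
# Sketch: building GPTQ calibration examples from VMware/open-instruct.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "sophosympatheia/Midnight-Rose-70B-v2.0.3", use_fast=True
)

# A small random slice is enough for calibration; ~128 samples is typical.
dataset = load_dataset("VMware/open-instruct", split="train")
dataset = dataset.shuffle(seed=42).select(range(128))

examples = []
for row in dataset:
    # Field names are an assumption about the dataset schema.
    text = row["instruction"] + "\n" + row["response"]
    enc = tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
    examples.append(
        {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}
    )
```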
Running on an H100 on modal.com, quantization took ~1.5 hours and cost roughly $10.
Code to perform the quantization is here: https://github.com/OpenRouterTeam/openrouter-runner/pull/79 (a rough sketch of the Modal setup is below).
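The Modal scaffolding is roughly this shape. A minimal sketch using Modal's current API; the image contents, timeout, and function body are assumptions, not the actual code from the PR:

```python
# Minimal Modal sketch: run the quantization job on a rented H100.
import modal

# Package list is an illustrative assumption.
image = modal.Image.debian_slim().pip_install(
    "auto-gptq", "transformers", "datasets", "accelerate"
)
app = modal.App("midnight-rose-gptq", image=image)


@app.function(gpu="H100", timeout=3 * 60 * 60)  # the run took ~1.5 hrs
def quantize():
    # Load the base model, build calibration examples, quantize, and
    # save/upload the result -- see the AutoGPTQ sketch above.
    ...


@app.local_entrypoint()
def main():
    quantize.remote()
```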
Comparison to naive version
I couldn't really tell a difference between the two, though that was just a brief vibe check. Decided to just ship this version on openrouter.ai.