TheDrummer's GLM Steam quantized using ik_llama.cpp

First attempt at quantizing something "on my own".

I tried using both bartowski's and mradermacher's imatrix files, but couldn't get either of them to work, so I had to make one myself following the guide (skill issue).
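Making the imatrix boiled down to one command. This is a rough sketch, not my exact invocation: the model path, calibration file name, and layer/context settings are placeholders, and the flags follow the `llama-imatrix` tool as it exists in ik_llama.cpp (inherited from upstream llama.cpp).

```shell
# build an importance matrix from a calibration text file
# (paths and filenames below are placeholders, not the actual ones used)
./build/bin/llama-imatrix \
    -m GLM-Steam-106B-A12B-v1-BF16.gguf \
    -f calibration_data.txt \
    -o imatrix.dat \
    -ngl 99 --ctx-size 512
```

The resulting `imatrix.dat` is then passed to the quantization step so the quantizer knows which weights matter most.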

This quant requires the ik_llama.cpp fork to work properly.
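If you haven't built the fork before, it uses the same CMake flow as mainline llama.cpp. A minimal sketch (the CUDA flag is optional; drop it for a CPU-only build):

```shell
# clone and build the ik_llama.cpp fork
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```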

I followed ubergarm's quant cooker's basic guide, but since I had no idea what I was doing I just copied his recipes and applied them to TheDrummer's model.
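The recipes in that style drive ik_llama.cpp's `llama-quantize` with per-tensor overrides. The sketch below shows the general shape only: the `--custom-q` regexes and quant types here are illustrative stand-ins, not ubergarm's actual recipe or the one used for this quant.

```shell
# quantize with an imatrix plus per-tensor overrides via --custom-q
# (the regex=type pairs below are made-up examples, not the real recipe)
./build/bin/llama-quantize \
    --imatrix imatrix.dat \
    --custom-q "token_embd\.weight=q8_0,attn=iq4_ks" \
    GLM-Steam-106B-A12B-v1-BF16.gguf \
    GLM-Steam-106B-A12B-v1-IQ2_KS.gguf \
    IQ2_KS
```

Tensors matching the regexes get the listed types; everything else falls back to the default type given at the end.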

I also used general calibration data instead of RP-focused data, so performance may suffer a bit.

Feel free to roast me if I messed something up (which I certainly did).

Format: GGUF · Model size: 110B params · Architecture: glm4moe · Quantization: 2-bit


Repo: StarFighter12/GLM-Steam-106B-A12B-v1-GGUF (quantized from TheDrummer's model)