gguf / Quants

#3
by PsiPi - opened

Someone had to say it.
I get that it's tiny.

I guess some testing would be needed to see how much quality is retained at a Q4 or Q5 level.
It would be particularly interesting to see whether quantization affects the "vibe" of the voice, and whether the compression significantly impacts the frame rate, which seems to be key to how this model excels.
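
For a rough feel of why Q4/Q5 might lose more than Q8, here is a minimal sketch of the numeric side only. It assumes a simplified symmetric per-block scheme (one scale per block of 32 weights) and made-up Gaussian "weights"; it is not the actual GGUF quant formats and not this model's real tensors, and the names `quantize_roundtrip` and `rms_error` are just illustrative. It says nothing about how the voice sounds or how fast inference runs, which would still need listening tests and benchmarks.

```python
import math
import random


def quantize_roundtrip(weights, bits, block_size=32):
    """Quantize to signed `bits`-bit integers per block, then dequantize.

    Simplified illustration: one scale per block, symmetric range.
    Real GGUF quant types differ in their details.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 15 for 5 bits, 7 for 4 bits
    restored = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / qmax if amax > 0 else 1.0
        for w in block:
            q = max(-qmax, min(qmax, round(w / scale)))  # integer code
            restored.append(q * scale)                   # back to float
    return restored


def rms_error(original, restored):
    """Root-mean-square difference between two equal-length lists."""
    total = sum((a - b) ** 2 for a, b in zip(original, restored))
    return math.sqrt(total / len(original))


random.seed(0)
# Hypothetical stand-in for a weight tensor; not taken from any real model.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: rms_error(weights, quantize_roundtrip(weights, bits))
          for bits in (8, 5, 4)}
for bits, err in errors.items():
    print(f"{bits}-bit round-trip RMS error: {err:.6f}")
```

The round-trip error grows as the bit width shrinks, but whether that shows up as an audible change is exactly the part that needs real testing.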

Yeah, the thought was a q8 to start. With the advent of the incoming 0.5 (and perhaps a 7?), quants may become less (or more) important.
In some cases, simply being in the correct format "wrapper" also lets people use their preferred tooling.
