gguf / Quants
#3, opened by PsiPi
Someone had to say it
I get it's tiny.
I guess it would take some testing to see how much quality is retained at the Q4 or Q5 level.
It would be particularly interesting to see whether quantization affects the "vibe" of the voice, and whether the compression significantly impacts the frame rate, which seems to be key to how this model excels.
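For a rough feel of what "quality retained" could mean at the weight level, here is a minimal sketch. It assumes plain symmetric round-to-nearest quantization over blocks of 32 values, which is a simplification and not the actual GGUF Q4/Q5/Q8 schemes, and it uses random Gaussian values as stand-in weights rather than this model's tensors. The names `quantize_block` and `rms_error` are made up for illustration. It only measures reconstruction error, so it says nothing about the voice "vibe" or frame rate, which would need listening tests and timing runs on the real model.

```python
import math
import random

def quantize_block(block, bits):
    """Symmetric round-to-nearest quantization of one block, then dequantize."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 15 for 5-bit, 127 for 8-bit
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)                  # all-zero block: nothing to scale
    scale = amax / qmax                     # one shared scale per block
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in block]

def rms_error(weights, bits, block_size=32):
    """RMS difference between the original and the dequantized weights."""
    total = 0.0
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        deq = quantize_block(block, bits)
        total += sum((a - b) ** 2 for a, b in zip(block, deq))
    return math.sqrt(total / len(weights))

# Stand-in "weights" (assumption): Gaussian noise, not real model tensors.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: rms_error(weights, bits) for bits in (4, 5, 8)}
for bits, err in errors.items():
    print(f"{bits}-bit RMS error: {err:.6f}")
```

The error should shrink as the bit width grows, but how that maps to audible quality is exactly the part that would need testing.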
Yeah, the thought was a Q8 to start. With the incoming 0.5 (and perhaps a 7?), quants may become less (or more) important.
In some cases, simply being in the correct format "wrapper" also lets people use their preferred tooling.