3.5bpw quant of 70b model

#1
by Nexesenex - opened

Hey LoneStriker,
I run a heterogeneous GPU setup with 36GB of VRAM (3090 + 3060), and I wondered if you could provide a 3.50bpw exl2-2 quant of at least this LZLV model and, if possible, add that size to the other 70b models that you quantize in exl2-2.
It would be optimal for my setup at 8k context in fp8 (and for at least a few other folks I spotted on Reddit with setups similar to mine), and it would spare me the purchase of a second 3090! ^^
Thanks for providing all these exl2 quants in any case!
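
For anyone checking whether a given bpw fits their own cards, here is a rough back-of-the-envelope sketch of the arithmetic behind the request. It assumes a Llama-2-70B-style architecture (80 layers, 8 KV heads with grouped-query attention, head dimension 128), a round 70e9 parameters, and that "fp8" refers to a 1-byte-per-element KV cache. The function names are hypothetical, and the estimate ignores activations, framework overhead, and how the layers split across the two GPUs, so treat the numbers as a lower bound rather than a guarantee.

```python
GIB = 1024 ** 3

def weight_bytes(n_params, bpw):
    """Approximate size of the quantized weights at a given bits per weight."""
    return n_params * bpw / 8

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    """Approximate KV cache size: one key and one value vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed values (not stated in the thread): Llama-2-70B-style shapes.
N_PARAMS = 70e9
N_LAYERS, N_KV_HEADS, HEAD_DIM = 80, 8, 128
CTX_LEN = 8192
FP8_BYTES = 1

cache = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX_LEN, FP8_BYTES)
for bpw in (3.5, 3.75, 4.0):
    weights = weight_bytes(N_PARAMS, bpw)
    total_gib = (weights + cache) / GIB
    print(f"{bpw:.2f} bpw: weights {weights / GIB:.1f} GiB, "
          f"cache {cache / GIB:.2f} GiB, total {total_gib:.1f} GiB")
```

Under these assumptions, the 3.5bpw weights come to roughly 28.5 GiB and the 8k fp8 cache to 1.25 GiB, which leaves some headroom on a 24GB + 12GB pair for overhead.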


I'll add it to my list of bpw sizes. I can't go back and add it to all of the existing quants, though, as that would take ages to churn through. If you have a list of favorite models, let me know.

I reran a bunch of the requested 3.5bpw quants as well.

Thank you very much, LoneStriker.
I'm testing them all, plus many older models in Oobabooga.
Next weekend I will post a big table of my perplexity results (for all of them, since it's still a pertinent baseline indicator) and my usage impressions (for some of them), covering the quants you made at my request and several other models you quantized.
For now, I can tell that Dolphin 2.2 3.5bpw has a problem (it feels low quality, with a ppl of 6.6 vs 4.8 for the others at 512 ctx), and XWin 70b 3.5bpw seems to be broken (ppl of 76000 at 512 ctx and gibberish output; maybe there's a parameter to set that I don't know about?). The rest behave as expected.
I also learned to install and use exllamav2 on my Windows setup and started making my own quants, but only on 1b to 14b models, because my hardware is still too limited to complete the longer quantization jobs. I'm eager to see the 0.0.12 wheel released, because I'm unable to build the dev wheels myself.
Also, I know it's extra work to ask for, but for the sake of testing, could you possibly provide an exl2-2 3.75bpw quant of lzlv 70b? I like that model very much because of its XWin base.
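
To put the perplexity figures above in context, here is a minimal sketch of how perplexity relates to per-token log-probabilities. It is not the evaluation code used in Oobabooga or exllamav2; the function name is hypothetical, and the 32000-token vocabulary is an assumption based on the Llama 2 tokenizer. The point is that a model guessing uniformly at random over the vocabulary would score a perplexity equal to the vocabulary size, so a value like 76000 is worse than random guessing and suggests a broken quant or a loading problem rather than merely lower quality.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Assumption: a 32000-entry vocabulary, as in the Llama 2 tokenizer.
VOCAB_SIZE = 32000

# A model that assigns the same probability to every token in the vocabulary,
# evaluated over a 512-token window.
uniform_logprobs = [math.log(1.0 / VOCAB_SIZE)] * 512
print(f"uniform-guess perplexity: {perplexity(uniform_logprobs):.0f}")
```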
