How to quantize the hunyuan model to fp8
#1 by hz094 - opened
Hi sir, thanks for the excellent work. I am curious about how you quantized the hunyuan model; could you share more details?
You need torch and llama.cpp. You could try converting the safetensors to gguf and testing it first; simply execute: ggc t
Actually, if you just want fp8, the updated node has a tool, tensor cutter, which helps you make your own fp8 scaled model (50% decrease in file size) in an easy way; you don't need llama.cpp or any extra dependency in that case.
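For anyone wondering what "fp8 scaled" roughly means, here is a minimal sketch in plain Python. It is not the tensor cutter's actual code; the function names are hypothetical, and it assumes a per-tensor scale, the float8 e4m3fn format (largest finite value 448) and a 16-bit source checkpoint, which is where the 50% figure would come from.

```python
# Illustrative sketch only; NOT the tensor cutter's implementation.
# Assumptions: per-tensor scaling, float8 e4m3fn target, 16-bit source weights.

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def per_tensor_scale(values, fp8_max=E4M3_MAX):
    """Return a scale such that every value / scale lies in [-fp8_max, fp8_max]."""
    amax = max(abs(v) for v in values)
    return amax / fp8_max if amax > 0 else 1.0


def scale_for_fp8(values, fp8_max=E4M3_MAX):
    """Divide the weights by the scale; the result is what would be cast to fp8.

    The scale is stored next to the tensor so it can be multiplied back at load time.
    """
    scale = per_tensor_scale(values, fp8_max)
    scaled = [v / scale for v in values]
    return scaled, scale


def file_size_bytes(num_params, bytes_per_param):
    """Rough weight-only size estimate (ignores headers and the stored scales)."""
    return num_params * bytes_per_param


weights = [0.5, -1200.0, 3.25, 600.0]   # toy tensor with values outside the fp8 range
scaled, scale = scale_for_fp8(weights)   # now fits inside [-448, 448]
restored = [s * scale for s in scaled]   # undo the scaling (before any rounding loss)

size_16bit = file_size_bytes(1_000_000, 2)  # 2 bytes per parameter
size_fp8 = file_size_bytes(1_000_000, 1)    # 1 byte per parameter
```

With torch, the actual cast step would be something like `tensor.to(torch.float8_e4m3fn)` applied to the scaled values; the rounding in that cast is where the precision loss happens, which the sketch above leaves out.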
calcuis changed discussion status to closed
calcuis changed discussion status to open