How to quantize the hunyuan model to fp8

#1
by hz094 - opened

Hi sir, thanks for the excellent work. I am curious about how you quantized the hunyuan model; could you share more details?

You need torch and llama.cpp. You could try converting the safetensors to gguf and testing it first; simply execute: ggc t

Actually, if you just want fp8, the updated node has a tool, tensor cutter, which helps you make your own fp8 scaled model (50% smaller in file size) in an easy way; you don't need llama.cpp or any extra dependency in that case.
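
For anyone wondering what "fp8 scaled" roughly means, here is a minimal pure-Python sketch of per-tensor scaled quantization to the float8 e4m3fn value grid. This is not the tensor cutter's actual code, which is not shown in this thread. The function names (round_to_e4m3, quantize_scaled, dequantize) are made up for illustration, and the choices of one scale per tensor and of saturating at the format maximum are assumptions.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the float8 e4m3fn format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest e4m3fn-representable value (saturating; an assumption)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so the binary exponent is e - 1.
    # -6 is the smallest normal exponent; below it the spacing stays fixed (subnormals).
    exponent = max(math.frexp(a)[1] - 1, -6)
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits -> 8 steps per power of two
    # Python's round() rounds ties to even, matching the usual float rounding mode.
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_scaled(weights):
    """Hypothetical helper: map the largest |weight| to E4M3_MAX, then round."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate weights from the fp8 grid values and the stored scale."""
    return [q * scale for q in quantized]


weights = [0.02, -0.5, 0.13, 1.2]
quantized, scale = quantize_scaled(weights)
restored = dequantize(quantized, scale)
```

Each stored value then needs 1 byte instead of the 2 bytes of a 16-bit weight, which is consistent with the 50% file size reduction mentioned above, assuming the source checkpoint is fp16 or bf16.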

calcuis changed discussion status to closed
calcuis changed discussion status to open
