We are releasing an empirical study, "How Good Are Low-bit Quantized #LLaMA3 🦙 Models?", evaluating existing LLM quantization techniques!
In this study, the low-bit LLaMA3 models (especially LLaMA3-70B) deliver impressively strong performance. 🚀 However, the results also expose significant degradation with existing quantization techniques on LLaMA3, especially at ultra-low bit-widths.
We hope this study serves as a reference for the LLM quantization community and spurs the development of stronger LLM quantization methods in the wake of LLaMA3's release. More work is on the way...