Qwen/Qwen2.5-7B-Instruct-1M please...

by koesn - opened Jun 17

Your quants are amazing on my 3060: I achieved prompt processing speeds above 15,000 tokens/sec when running many concurrent requests. I've read that INT8 is optimized on Ampere cards. I hope you also quantize Qwen/Qwen2.5-7B-Instruct-1M. Thanks.