Great work! But there's a fine-tuned model called DeepScaleR that has better performance. Could you quantize it with NexaQuant, starting from the original Q8_0?
I think Q8_0 and FP16 give nearly identical quality, so starting from the Q8_0 file may even be faster.
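For context, a minimal sketch of what re-quantizing from a Q8_0 GGUF looks like with llama.cpp's `llama-quantize` tool. NexaQuant's own pipeline is proprietary, so this only illustrates the generic workflow; the filenames and the `Q4_K_M` target type are assumptions, not anything from this thread:

```python
# Sketch: re-quantize a Q8_0 GGUF to a smaller type via llama.cpp's
# llama-quantize CLI (assumed to be on PATH). NexaQuant itself is not
# public tooling, so this is only a generic illustration.
import subprocess
from pathlib import Path

SRC = Path("DeepScaleR-1.5B-Preview-Q8_0.gguf")    # hypothetical input name
DST = Path("DeepScaleR-1.5B-Preview-Q4_K_M.gguf")  # hypothetical output name

# llama-quantize takes positional args: <input.gguf> <output.gguf> <type>
subprocess.run(["llama-quantize", str(SRC), str(DST), "Q4_K_M"], check=True)

print(f"Wrote {DST} ({DST.stat().st_size / 2**30:.2f} GiB)")
```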