Can BAAI/bge-reranker-v2-gemma be run quantized?

#3
by dophys - opened

Hello, I'm interested in the bge-reranker based on Gemma. My question is whether this model can be run in a quantized form, which would greatly improve inference efficiency and reduce memory requirements.
I used torch to quantize the model to int8, but FlagEmbedding doesn't seem to support loading quantized models. Can anyone give me some guidance?
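One workaround, if FlagEmbedding doesn't expose a quantized path, is to skip its wrapper and load the model directly through `transformers` with bitsandbytes 8-bit quantization. This is only a sketch: the helper names (`format_pair`, `load_quantized`, `score`) are illustrative, and the yes/no prompt format and "Yes"-logit scoring are assumptions based on how FlagEmbedding's LLM-based rerankers generally work, so verify them against the model card before relying on the scores.

```python
# Sketch: bypass FlagEmbedding and load the reranker via transformers
# with bitsandbytes 8-bit quantization. Helper names are illustrative,
# not library API.

def format_pair(query: str, passage: str) -> str:
    """Build a yes/no reranking prompt (format assumed from the
    FlagEmbedding LLM-reranker convention; check the model card)."""
    task = ("Given a query A and a passage B, determine whether the passage "
            "contains an answer to the query by providing a prediction of "
            "either 'Yes' or 'No'.")
    return f"A: {query}\nB: {passage}\n{task}"

def load_quantized(model_id: str = "BAAI/bge-reranker-v2-gemma"):
    """Load tokenizer + 8-bit model; requires a CUDA GPU plus the
    bitsandbytes and accelerate packages installed."""
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    return tok, model

def score(tok, model, query: str, passage: str) -> float:
    """Relevance score = logit of the 'Yes' token at the last position
    (assumed scoring rule for this family of rerankers)."""
    import torch
    inputs = tok(format_pair(query, passage),
                 return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    yes_id = tok.convert_tokens_to_ids("Yes")
    return logits[0, -1, yes_id].item()
```

With that in place, `tok, model = load_quantized()` followed by `score(tok, model, query, passage)` should give a relevance score per query-passage pair; 4-bit loading (`load_in_4bit=True`) would cut memory further at some accuracy cost.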
