Colbert Mode Usage

#41
by pulkitchahar - opened

I want to store the ColBERT embeddings so that I can quickly rerank results retrieved with the dense vectors. But if a document has 1024 tokens on average (truncated if longer), I end up with a 1024*1024 matrix per document, which takes 2 MB in fp16. That seems huge, especially when I think about scaling up. Am I doing this right, or am I missing something? Are there any ways to reduce the size while keeping performance close to the original?
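For reference, the arithmetic behind the 2 MB figure can be checked with a minimal sketch like the one below. The token count, embedding dimension, and fp16 width are the numbers from the question; the helper name `colbert_storage_bytes` and the one-million-document corpus are hypothetical, used only to illustrate how the footprint scales.

```python
def colbert_storage_bytes(num_tokens: int, dim: int, bytes_per_value: int) -> int:
    """Bytes needed to store one document's token-level embedding matrix."""
    return num_tokens * dim * bytes_per_value


# Numbers from the question: 1024 tokens, 1024-dim vectors, fp16 (2 bytes per value).
per_doc = colbert_storage_bytes(num_tokens=1024, dim=1024, bytes_per_value=2)
per_doc_mib = per_doc / 2**20

# Hypothetical corpus size, chosen only to show the scaling.
num_docs = 1_000_000
corpus_gib = per_doc * num_docs / 2**30

print(f"per document: {per_doc_mib:.1f} MiB")
print(f"{num_docs:,} documents: {corpus_gib:.1f} GiB")
```

Because the size is a plain product of the three factors, halving any one of them (fewer stored tokens, fewer dimensions, or fewer bytes per value) halves the footprint. Whether retrieval quality holds up after such a change is a separate question that this sketch does not address.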

I'm also interested.
