Update README.md
README.md CHANGED
@@ -177,7 +177,7 @@ The numbers below are with the mDPR model, but miniDense_arabic_v1 should give even better results.
 
 *Note: The MIRACL paper shows a different (higher) value for BM25 Arabic, so we take that value from the BGE-M3 paper; all the rest are from the MIRACL paper.*
 
-# MTEB numbers:
+# MTEB Retrieval numbers:
 MTEB is a general-purpose embedding evaluation benchmark covering a wide range of tasks, but miniDense models (like BGE-M3) are predominantly tuned for retrieval tasks aimed at search & IR use cases.
 So it makes sense to evaluate our models on the retrieval slice of the MTEB benchmark.
 
@@ -185,13 +185,22 @@ So it makes sense to evaluate our models on the retrieval slice of the MTEB benchmark.
 
 Refer to the tables above.
 
+#### Sadeem Question Retrieval
+
+<center>
+<img src="./ar_metrics_6.png" width=150%/>
+<b><p>Table 3: Detailed Arabic retrieval performance on the SadeemQA eval set (measured by nDCG@10)</p></b>
+</center>
+
+
+
 #### Long Document Retrieval
 
 This is a very ambitious eval because we have not trained for long context: max_len was 512 for all the models below except BGE-M3, which has an 8192 context and was finetuned for long documents.
 
 <center>
 <img src="./ar_metrics_4.png" width=150%/>
-<b><p>Table
+<b><p>Table 4: Detailed Arabic retrieval performance on the MultiLongDoc dev set (measured by nDCG@10)</p></b>
 </center>
 
 
@@ -202,7 +211,7 @@ This explains its overall competitive performance when compared to models that
 
 <center>
 <img src="./ar_metrics_5.png" width=120%/>
-<b><p>Table
+<b><p>Table 5: Detailed Arabic retrieval performance on the 3 X-lingual test sets (measured by nDCG@10)</p></b>
 </center>
 
 <br/>
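For context, retrieval numbers like the ones added above can be produced with the `mteb` Python package, which reports nDCG@10 for retrieval tasks by default. The snippet below is only a minimal sketch under assumptions: the model ID is a placeholder (not the repo's published path), and `SadeemQuestionRetrieval` / `MultiLongDocRetrieval` are the task identifiers I believe MTEB uses for the SadeemQA and MultiLongDoc evals; check `mteb.get_tasks()` in your installed version for the exact names.

```python
# Sketch: evaluate an Arabic dense retriever on MTEB retrieval tasks.
# Assumptions: the model ID is a placeholder; task names may differ across mteb versions.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-org/miniDense_arabic_v1")  # hypothetical model ID
model.max_seq_length = 512  # context length used for all rows above except BGE-M3

evaluation = MTEB(tasks=["SadeemQuestionRetrieval", "MultiLongDocRetrieval"])
results = evaluation.run(model, output_folder="results/miniDense_arabic_v1")

# Each task's output includes ndcg_at_10, the metric reported in the tables.
print(results)
```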