# Natural Questions Models
[Google's Natural Questions dataset](https://ai.google.com/research/NaturalQuestions) consists of about 100k real Google search queries, each paired with a relevant passage from Wikipedia. Models trained on this dataset work well for question-answer retrieval.

## Usage

```python
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('nq-distilbert-base-v1')

query_embedding = model.encode('How many people live in London?')

# The passages are encoded as [[title1, text1], [title2, text2], ...]
passage_embedding = model.encode([['London', 'London has 9,787,426 inhabitants at the 2011 census.']])

print("Similarity:", util.cos_sim(query_embedding, passage_embedding))
```

Note: Each passage must be encoded as a `[title, text]` pair, i.e. the Wikipedia article title together with a paragraph from that article, as shown above.
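With many passages, the cosine scores are ranked to find the best match (sentence-transformers provides `util.semantic_search` for exactly this). The mechanics can be sketched with plain PyTorch; the random tensors below are stand-ins for the `model.encode` outputs, so this runs without downloading a model:

```python
import torch

torch.manual_seed(0)
query_emb = torch.randn(1, 768)     # stand-in for model.encode(query)
passage_embs = torch.randn(5, 768)  # stand-ins for the encoded passages

# Cosine similarity = dot product of L2-normalized vectors
q = torch.nn.functional.normalize(query_emb, dim=1)
p = torch.nn.functional.normalize(passage_embs, dim=1)
scores = q @ p.T  # shape (1, 5): one similarity score per passage

# Rank passages by score, highest first
top = torch.topk(scores[0], k=3)
for score, idx in zip(top.values, top.indices):
    print(f"Passage {idx.item()}: {score.item():.4f}")
```

For large passage collections, `util.semantic_search(query_embedding, passage_embeddings, top_k=...)` performs this ranking in batches.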


## Performance
The models are evaluated on the Natural Questions development dataset using MRR@10.

| Approach       |  MRR@10 (NQ dev set small) |  
| ------------- |:-------------: |
| nq-distilbert-base-v1 | 72.36 |
| *Other models* | |
| [DPR](https://huggingface.co/transformers/model_doc/dpr.html) | 58.96 |
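MRR@10 (mean reciprocal rank) averages, over all queries, the reciprocal of the rank at which the first relevant passage appears in the top 10 results (0 if none appears). A minimal sketch of the metric, using a hypothetical helper name:

```python
def mrr_at_10(ranked_relevance):
    """ranked_relevance: one list per query, marking each of the
    retrieved passages as relevant (True) or not, in rank order."""
    total = 0.0
    for ranks in ranked_relevance:
        for i, rel in enumerate(ranks[:10]):
            if rel:
                total += 1.0 / (i + 1)  # reciprocal rank of first hit
                break
    return total / len(ranked_relevance)

# First hit at rank 2, rank 1, and no hit: (1/2 + 1 + 0) / 3 = 0.5
print(mrr_at_10([[False, True], [True], [False] * 10]))
```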