# Natural Questions Models
[Google's Natural Questions dataset](https://ai.google.com/research/NaturalQuestions) consists of about 100k real Google search queries, each paired with a relevant passage from Wikipedia. Models trained on this dataset work well for question-answer retrieval.
## Usage
```python
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('nq-distilbert-base-v1')
query_embedding = model.encode('How many people live in London?')
#The passages are encoded as [ [title1, text1], [title2, text2], ...]
passage_embedding = model.encode([['London', 'London has 9,787,426 inhabitants at the 2011 census.']])
print("Similarity:", util.cos_sim(query_embedding, passage_embedding))
```
Note: For the passage, we have to encode the Wikipedia article title together with a text paragraph from that article.
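Under the hood, retrieval amounts to ranking passages by the cosine similarity of their embeddings to the query embedding (what `util.cos_sim` computes). A minimal sketch of that ranking step, using toy NumPy vectors in place of real model embeddings:

```python
import numpy as np

def cos_sim(query, passages):
    # Cosine similarity between one query vector and each passage row
    query = query / np.linalg.norm(query)
    passages = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    return passages @ query

# Toy 4-dim embeddings standing in for model.encode() output
query_embedding = np.array([1.0, 0.0, 1.0, 0.0])
passage_embeddings = np.array([
    [1.0, 0.0, 0.9, 0.1],  # passage similar to the query
    [0.0, 1.0, 0.0, 1.0],  # unrelated passage
])

scores = cos_sim(query_embedding, passage_embeddings)
best = int(np.argmax(scores))  # index of the highest-scoring passage
```

In practice you would encode a large passage collection once, cache the embeddings, and only encode the query at search time.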
## Performance
The models are evaluated on the Natural Questions development dataset using MRR@10.
| Approach | MRR@10 (NQ dev set small) |
| ------------- |:-------------: |
| nq-distilbert-base-v1 | 72.36 |
| *Other models* | |
| [DPR](https://huggingface.co/transformers/model_doc/dpr.html) | 58.96 |
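MRR@10 (mean reciprocal rank) averages, over all queries, the reciprocal rank of the first relevant passage within the top 10 retrieved results (0 if none is found). A minimal sketch of the metric, with hypothetical passage IDs:

```python
def mrr_at_10(ranked_ids, relevant_id):
    # Reciprocal rank of the first relevant hit within the top 10, else 0
    for rank, pid in enumerate(ranked_ids[:10], start=1):
        if pid == relevant_id:
            return 1.0 / rank
    return 0.0

# Two toy queries: (ranked passage IDs from the retriever, gold passage ID)
results = [
    (["p3", "p1", "p7"], "p1"),  # relevant passage at rank 2 -> 0.5
    (["p9", "p2"], "p5"),        # relevant passage not retrieved -> 0.0
]
score = sum(mrr_at_10(r, rel) for r, rel in results) / len(results)
```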