|
# Natural Questions Models |
|
[Google's Natural Questions dataset](https://ai.google.com/research/NaturalQuestions) consists of about 100k real Google search queries, each paired with a relevant passage from Wikipedia. Models trained on this dataset work well for question-answer retrieval.
|
|
|
## Usage |
|
|
|
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('nq-distilbert-base-v1')

query_embedding = model.encode('How many people live in London?')

# The passages are encoded as [[title1, text1], [title2, text2], ...]
passage_embedding = model.encode([['London', 'London has 9,787,426 inhabitants at the 2011 census.']])

print("Similarity:", util.cos_sim(query_embedding, passage_embedding))
```
|
|
|
Note: Each passage must be encoded as a pair of the Wikipedia article title and a text paragraph from that article, since the model was trained on this format.
|
|
|
|
|
## Performance |
|
The models are evaluated on the Natural Questions development dataset using MRR@10. |
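MRR@10 credits each query with the reciprocal rank of the first relevant passage among its top 10 results (0 if none appears), averaged over all queries. A minimal sketch of the metric (the function name and toy relevance lists are illustrative):

```python
def mrr_at_10(ranked_relevance):
    """ranked_relevance: one list of 0/1 flags per query,
    ordered by the model's ranking (top hit first)."""
    total = 0.0
    for flags in ranked_relevance:
        for rank, relevant in enumerate(flags[:10], start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)

# Relevant hit at rank 2, rank 1, and not found: (0.5 + 1.0 + 0.0) / 3
print(mrr_at_10([[0, 1, 0], [1], [0, 0, 0]]))  # 0.5
```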
|
|
|
| Approach | MRR@10 (NQ dev set small) |
| ------------- | :-------------: |
| nq-distilbert-base-v1 | 72.36 |
| *Other models* | |
| [DPR](https://huggingface.co/transformers/model_doc/dpr.html) | 58.96 |