|
# Natural Questions Models |
|
[Google's Natural Questions dataset](https://ai.google.com/research/NaturalQuestions) consists of about 100k real Google search queries, each paired with a relevant passage from Wikipedia. Models trained on this dataset work well for question-answer retrieval.
|
|
|
## Usage |
|
|
|
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('nq-distilbert-base-v1')

query_embedding = model.encode('How many people live in London?')

# The passages are encoded as [[title1, text1], [title2, text2], ...]
passage_embedding = model.encode([['London', 'London has 9,787,426 inhabitants at the 2011 census.']])

print("Similarity:", util.cos_sim(query_embedding, passage_embedding))
```
|
|
|
Note: Each passage must be encoded as a pair of the Wikipedia article title and a text paragraph from that article, since the model was trained on this format.
|
|
|
|
|
## Performance |
|
The models are evaluated on the Natural Questions development dataset using MRR@10. |
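MRR@10 credits each query with the reciprocal rank of the first relevant passage among its top 10 results (0 if none appears), averaged over all queries. A minimal sketch of the metric (the function name and toy relevance lists are illustrative):

```python
def mrr_at_10(ranked_relevance):
    """ranked_relevance: one list of 0/1 flags per query,
    ordered by the model's ranking (top hit first)."""
    total = 0.0
    for flags in ranked_relevance:
        for rank, relevant in enumerate(flags[:10], start=1):
            if relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)

# Relevant hit at rank 2, rank 1, and not found: (0.5 + 1.0 + 0.0) / 3
print(mrr_at_10([[0, 1, 0], [1], [0, 0, 0]]))  # 0.5
```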
|
|
|
| Approach | MRR@10 (NQ dev set small) |
| ------------- | :-------------: |
| nq-distilbert-base-v1 | 72.36 |
| *Other models* | |
| [DPR](https://huggingface.co/transformers/model_doc/dpr.html) | 58.96 |