---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1340
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: Can you tell me about the origin of the word 'Shehnai'?
sentences:
- Krishan Kant (28 February 1927 – 27 July 2002) was the tenth Vice President of
India from 1997 until his death. Previously, he was Governor of Andhra Pradesh
from 1990 to 1997.
- Acherontia lachesis is a large (up to 13 cm wingspan) Sphingid moth found in India
and much of the Oriental region, one of the three species of Death's-head Hawkmoth,
also known as the "Bee Robber".
- A Shehnai is a South Asian music instrument which is normally played at marriages
and other ceremonies, rites and rituals. The word itself is of Muslim/Turkish
origin, combining 'Sheh' (or 'Shah') 'Royal' and '-Nai' or 'Ney', a type of Flute.
A version of the "Shehnai", the "Surnai", is also played in the Northern and North-western
areas of India and Pakistan, in particular at traditional Polo matches.
- source_sentence: How do scammers typically operate in these scams?
sentences:
- The Singapore strategy was a strategy about defending the British Empire in the
Asian Far East, mainly against the Empire of Japan. The strategy involved a number
of different plans and stages, developed between 1919 and 1941. The basic idea
was to base a fleet of ships in the Far East. This fleet could then be used to
stop and defeat a Japanese force heading towards India or Australia. In 1919,
Singapore was chosen because of its strategic location at the end of the Strait
of Malacca.
- The Non-Cooperation Movement was a significant phase of the Indian independence
movement from British rule. It was led by Mohandas Karamchand Gandhi after the
Jallianwala Bagh Massacre. It aimed to resist British rule in India through non-violent
means or "satyagraha". Protestors would refuse to buy British goods, adopt nihal
use of local handicrafts and picket and liquor shops. The ideas of Ahimsa and
nonviolence, and Gandhi's ability to rally hundreds of thousands of common citizens
towards the cause of Indian independence, were first seen on a large scale in
this movement through the summer 1920. Gandhi feared that the movement might lead
to popular violence. The non-cooperation movement was launched on 12th August,
1921.
- A technical support scam is a form of telephone fraud that tricks people by pretending
that they are a service which helps people fix their computers. In most cases
they convince the victim they have a computer problem that does not actually exist.
A common type is when someone gets a call from someone (usually from places like
India or Pakistan) pretending to be from a company that sounds real such as "Microsoft"
or "Windows" support. Often the caller tries to gain the victim's trust. They
may use confusing and very technical language to sound authentic. They may ask
the victim to perform several tasks on their computer. Often they target legitimate
files on the victim's computer saying these are viruses. These tactics are designed
to scare people into letting the scammer fix the problem (that does not really
exist). The caller may have the victim install malicious software that could capture
sensitive data, such as online banking passwords or credit card information.
- source_sentence: How is Northeast India connected to the rest of India?
sentences:
- Air India Flight 182 was a passenger plane which, on June 23, 1985, exploded from
a bomb that was placed on the plane. The aircraft was going between Montréal-Mirabel
International Airport, Montreal, Quebec, and New Delhi, India. It was an Air India
Boeing 747-237B, registration VT-EFO. The bombing was called the largest mass
murder in modern Canadian history, and the deadliest act of air terrorism before
9/11.
- Hinduism is not only a religion but also a way of life. Hinduism is widely practiced
in South Asia mainly in India and Nepal. Hinduism is the oldest religion in the
world, and Hindus refer to it as "", "the eternal tradition," or the "eternal
way," beyond human history. Scholars regard Hinduism as a combination of different
Indian cultures and traditions, with diverse roots. Hinduism has no founder and
origins of Hinduism is unknown. What we now call Hinduism have roots in cave paintings
that have been preserved from Mesolithic sites dating from c. 30,000 BCE in Bhimbetka,
near present-day Bhopal, in the Vindhya Mountains in the Madhya Pradesh." There
was no concept of religion in India and Hinduism was not a religion. Hinduism
as a religion started to develop between 500 BCE and 300 CE, after the Vedic period
(1500 BCE to 500 BCE).
- Various groups are involved in the Insurgency in Northeast India, India's northeast
states, which are connected to the rest of the Republic of India by a narrow strip
of land known as the Siliguri Corridor. In the region several armed factions operate.
Some groups call for a separate state, others for regional autonomy, while some
extreme groups demand complete independence.
- source_sentence: How many songs did Rafi sing during his career?
sentences:
- Inder Kumar Gujral (4 December 1919 – 30 November 2012) was an Indian politician.
He was the 12th Prime Minister of India from April 1997 to March 1998. Gujral
was the third Prime Minister to be from the Rajya Sabha.
- Mohammed Rafi (, , December 24, 1924 – July 31, 1980) was a popular Bollywood
playback singer. In a career of over 40 years, Rafi sang more than 26,000 songs
in the national languages of India and sometimes in other languages.
- The University of Calcutta (informally known as Calcutta University or CU) is
a public state university located in Kolkata (formerly "Calcutta"), West Bengal,
India. It was created on 24 January 1857. Within India it is recognized as a "Five-Star
University" and a "Centre with Potential for Excellence" by the University Grants
Commission and the National Assessment and Accreditation Council.
- source_sentence: Who was Lal Bahadur Shastri?
sentences:
- The Bharatiya Janata Party (abbreviated BJP) is one of the two major political
parties in India. (The second being the Indian National Congress). Since the Indian
elections in 2014, the BJP has 303 of the 542 seats in the Lok Sabha, the lower
house of the Parliament of India and 78 of the 238 seats in Rajya Sabha, the upper
house of the Parliament of India. Amit Shah is the national president of BJP since
2014.
- Rex Vernon Whitehead (26 October 1948 – 26 June 2014) was an Australian Test cricket
match umpire and cricketer. He umpired four Test matches between 1981 and 1982.
His first match was between Australia and India in Sydney on 2 January to 4 January
1981. Altogether, he umpired 15 first-class matches in his career between 1979
and 1983.
- Lal Bahadur Shastri (, , 2 October 1904 – 11 January 1966) was an Indian politician.
He was the 2nd Prime Minister of India from 1964 to 1966. He was a senior leader
of the Indian National Congress political party.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on BAAI/bge-base-en-v1.5
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
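The architecture above chains three modules: a BERT encoder, a pooling layer with `pooling_mode_cls_token` enabled, and a `Normalize` layer. As a rough illustration of what the last two steps do to the encoder output, here is a minimal pure-Python sketch on made-up 4-dimensional token vectors (the real model produces 768-dimensional ones). The helper names `cls_pool`, `l2_normalize`, and `dot` are hypothetical and not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence vector."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy "token embeddings" for two sentences: 3 tokens x 4 dimensions each.
# These numbers are invented; real encoder outputs are seq_len x 768.
sentence_a = [[3.0, 4.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0], [1.0, 0.0, 0.0, 0.0]]
sentence_b = [[0.0, 4.0, 3.0, 0.0], [5.0, 5.0, 5.0, 5.0], [0.0, 1.0, 0.0, 0.0]]

vec_a = l2_normalize(cls_pool(sentence_a))
vec_b = l2_normalize(cls_pool(sentence_b))

# Both vectors now have unit length, so their dot product equals their cosine similarity.
cosine = dot(vec_a, vec_b)
print(vec_a, vec_b, cosine)
```

Because the final module normalizes every embedding, cosine similarity and dot product should rank results identically for this model.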
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sunjupskilling/sunj-bge-base-en-v1.5")

# Run inference: one question followed by two candidate passages
sentences = [
    'Who was Lal Bahadur Shastri?',
    'Lal Bahadur Shastri (, , 2 October 1904\xa0– 11 January 1966) was an Indian politician. He was the 2nd Prime Minister of India from 1964 to 1966. He was a senior leader of the Indian National Congress political party.',
    'Rex Vernon Whitehead (26 October 1948 – 26 June 2014) was an Australian Test cricket match umpire and cricketer. He umpired four Test matches between 1981 and 1982. His first match was between Australia and India in Sydney on 2 January to 4 January 1981. Altogether, he umpired 15 first-class matches in his career between 1979 and 1983.',
]
embeddings = model.encode(sentences)  # NumPy array by default
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings (returned as a torch.Tensor)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
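For semantic search, the similarity scores are usually turned into a ranking of passages for a given question. The sketch below shows that step on plain Python lists; `rank_by_similarity` is a hypothetical helper, and the scores are invented stand-ins for one row of the `similarities` matrix from the snippet above, not real model output.

```python
def rank_by_similarity(similarity_row, candidates):
    """Pair each candidate with its score and sort from most to least similar."""
    scored = zip(candidates, similarity_row)
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder labels and made-up scores (question vs. each passage).
passages = ["passage about Lal Bahadur Shastri", "passage about Rex Whitehead"]
query_scores = [0.71, 0.32]  # illustrative values only

ranking = rank_by_similarity(query_scores, passages)
best_passage, best_score = ranking[0]
print(best_passage, best_score)
```

With the real model, `similarities[0, 1:].tolist()` would give the question's scores against the two passages, assuming the question is the first entry of `sentences` as in the snippet above.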
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 1,340 training samples
* Columns: `question` and `context`
* Approximate statistics based on the first 1000 samples:
  |      | question | context |
  |:-----|:---------|:--------|
  | type | string   | string  |
* Samples:
  | question | context |
  |:---------|:--------|
  | What is Basil commonly known as? | Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60 cm tall. It has light green, silky leaves 3–5 cm long and 1–3 cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike. |
  | Where is Basil originally native to? | Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60 cm tall. It has light green, silky leaves 3–5 cm long and 1–3 cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike. |
  | What is the significance of the Roerich Pact? | The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the Oval Office of the White House on 15 April 1935. As of January 1, 1990, the Roerich Pact had been ratified by ten nations: Brazil, Chile, Colombia, Cuba, the Dominican Republic, El Salvador, Guatemala, Mexico, the United States, and Venezuela. It went into effect on 26 August 1935. The Government of India approved the Treaty in 1948, but did not take any further formal action. The Roerich Pact is also known as "Pax Cultura" ("Cultural Peace" or "Peace through Culture"). The most important part of the Roerich Pact is the legal recognition that the protection of culture is always more important than any military necessity. |
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
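To make the two parameters concrete: with this loss, each question is scored against every context in the batch, the scores are cosine similarities multiplied by `scale`, and the question's own context is treated as the correct class in a cross-entropy. The following is a simplified pure-Python sketch of that idea, not the library's implementation; `cosine` and `in_batch_ranking_loss` are hypothetical names and the 2-dimensional vectors are toy data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(questions, contexts, scale=20.0):
    """Mean cross-entropy where contexts[i] is the positive for questions[i]
    and every other context in the batch serves as a negative."""
    losses = []
    for i, question in enumerate(questions):
        logits = [scale * cosine(question, context) for context in contexts]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

questions = [[1.0, 0.0], [0.0, 1.0]]
# Each question points at its own context: loss is close to zero.
aligned = in_batch_ranking_loss(questions, [[1.0, 0.0], [0.0, 1.0]])
# Contexts swapped: each question is closest to the wrong context, so loss is large.
swapped = in_batch_ranking_loss(questions, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

A larger `scale` sharpens the softmax, so the same cosine gap between the positive and the negatives produces a stronger training signal.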
### Evaluation Dataset
#### Unnamed Dataset
* Size: 100 evaluation samples
* Columns: `question` and `context`
* Approximate statistics based on the first 100 samples:
  |      | question | context |
  |:-----|:---------|:--------|
  | type | string   | string  |
* Samples:
  | question | context |
  |:---------|:--------|
  | What are the bases of political relations between India and Ireland? | Indo-Irish relations between the Republic of Ireland and the Republic of India picked up steam during the freedom struggles of the respective countries against a common imperial empire in the United Kingdom. Political relations between the two states have largely been based on socio-cultural ties, although political and economic ties have also helped build relations. Indians recognise Northern Ireland as part of its country. |
  | When did Rex Whitehead umpire his first Test match? | Rex Vernon Whitehead (26 October 1948 – 26 June 2014) was an Australian Test cricket match umpire and cricketer. He umpired four Test matches between 1981 and 1982. His first match was between Australia and India in Sydney on 2 January to 4 January 1981. Altogether, he umpired 15 first-class matches in his career between 1979 and 1983. |
  | What can you tell me about Nayaganj? | Nayaganj is a village in Vaishali District, Bihar, India. It is very close to the river Ganga. It is also a postal office of India |
* Loss: [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
```json
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 3e-06
- `weight_decay`: 0.03
- `max_steps`: 332
- `warmup_ratio`: 0.1
- `warmup_steps`: 1
- `fp16`: True
- `batch_sampler`: no_duplicates
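The `no_duplicates` batch sampler matters for this dataset because several questions share one context (the two Basil questions in the training samples, for example). With an in-batch-negatives loss, a repeated context inside a batch would be counted as a negative for a question it actually answers. The sketch below shows the idea with a simple greedy grouping; `batches_without_duplicates` is a hypothetical helper written for illustration, the passage labels are placeholders, and the library's sampler may differ in details such as shuffling or handling of incomplete batches.

```python
def batches_without_duplicates(pairs, batch_size):
    """Greedily group (question, context) pairs so that no text repeats within a batch."""
    remaining = list(pairs)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for question, context in remaining:
            fits = len(batch) < batch_size
            if fits and question not in seen and context not in seen:
                batch.append((question, context))
                seen.update((question, context))
            else:
                # Full batch or repeated text: try again in a later batch.
                deferred.append((question, context))
        batches.append(batch)
        remaining = deferred
    return batches

pairs = [
    ("What is Basil commonly known as?", "Basil passage"),
    ("Where is Basil originally native to?", "Basil passage"),
    ("What is the significance of the Roerich Pact?", "Roerich Pact passage"),
]
batches = batches_without_duplicates(pairs, batch_size=2)
for batch in batches:
    print([question for question, _ in batch])
```

In this toy run the two Basil questions end up in different batches, so neither sees its own passage as a negative.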
#### All Hyperparameters