SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
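
As a rough illustration of what modules (1) and (2) above do to the transformer output, the sketch below applies CLS pooling followed by L2 normalization to a toy list of token vectors. The function name `cls_pool_and_normalize` and the 2-dimensional toy input are assumptions made for illustration; the library itself works on batched tensors, so treat this as a simplified sketch and not its implementation.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Return the L2-normalized vector of the first (CLS-position) token.

    token_embeddings: list of per-token vectors (lists of floats) for one
    input, with the CLS-position token at index 0.
    """
    cls_vector = token_embeddings[0]  # CLS pooling: keep only the first token
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)  # avoid dividing by zero for an all-zero vector
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 2-dimensional hidden states (the model uses 1024).
toy_tokens = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)  # [0.6, 0.8]
```

Because the output has unit length, the cosine similarity between two such embeddings equals their dot product.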

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jrinky/model4")
# Run inference
sentences = [
    'What is the significance of the first written mention of Metylovice, and in which year did it occur',
    'The Olešná Stream flows through the municipality. History\nThe first written mention of Metylovice is in a deed of Bishop Dětřich from 1299. From the second half of the 17th century, tanning developed in the village, thanks to which the originally agricultural village began to prosper and grow. Brick houses began to replace the original wooden ones and the education and cultural life of the inhabitants increased. Sights\nThe most important monument is the Church of All Saints.',
    'Users could also get discounts when they bought the coins in bulk and earn coins through certain apps on the Appstore. In 2014, with the release of the Fire Phone, Amazon offered app developers 500,000 Amazon Coins for each paid app or app with in-app purchasing developed and optimized for the Fire Phone.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
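
For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step in plain Python on toy unit-length vectors; `rank_documents` is a hypothetical helper, not part of Sentence Transformers, and it assumes the embeddings are already normalized (as the Normalize module suggests), so the dot product stands in for cosine similarity.

```python
def rank_documents(query_embedding, doc_embeddings):
    """Return (index, score) pairs sorted by descending dot product."""
    scores = [
        sum(q * d for q, d in zip(query_embedding, doc))
        for doc in doc_embeddings
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors standing in for real 1024-dimensional embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_documents(query, docs)
print(ranking)  # [(2, 1.0), (1, 0.6), (0, 0.0)]
```

With the model loaded as above, one could presumably pass `embeddings[0]` as the query and `embeddings[1:]` as the documents, since the first sentence in the example is a question and the other two are passages.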

Training Details

Training Dataset

Unnamed Dataset

  • Size: 20,816 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean           max
    anchor    string  6 tokens  17.92 tokens   42 tokens
    positive  string  9 tokens  168.82 tokens  1024 tokens
  • Samples:
    anchor: What was the birth date and place of Helena Binder, also known as Blanche Blotto
    positive: Born June 13, 1955 in Batavia, New York. Helena Binder, aka Blanche Blotto (keyboards, vocals; 1978-1980).

    anchor: What incidents involving Israeli soldiers occurred in the occupied West Bank on Tuesday
    positive: Also Tuesday, Israeli soldiers fired a barrage of gas bombs and concussion grenades at a Palestinian home in the Masafer Yatta area, south of Hebron, in the southern part of the occupied West Bank, wounding an entire family, including children. On Tuesday evening, Israeli soldiers invaded the al-Maghayir village northeast of Ramallah, in the central West Bank, after many illegal colonizers attacked Palestinian cars. In related news, the soldiers shot three Palestinian construction workers near the illegal Annexation Wall, west of Hebron, in the southern part of the occupied West Bank, and abducted them.

    anchor: How was the Mosbrucher Maar formed, and when did it occur
    positive: The Mosbrucher Weiher, also called the Mosbrucher Maar, is a silted up maar east of the municipal boundary of the village of Mosbruch in the county Vulkaneifel in Germany. It is located immediately at the foot of the 675-metre-high Hochkelberg, a former volcano. The floor of the maar is in the shape of an elongated oval and is about 700×500 metres in size, its upper boundary has a diameter of about 1,300 × 1,050 metres. This makes the Mosbrucher Maar the third largest of the maars in the western Eifel region. The Üßbach stream flows past and close to the Mosbrucher Weiher. Origin
      According to pollen analysis studies, the crater was formed about 11,000 years ago by a volcanic eruption. In the area around the maar there are very few volcanic tuffs in comparison to other Eifel maars; only in two places are there greater accumulations of tuff; the rest of the surrounding area is covered only by a thin layer.
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
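
The card does not include the source of `selfloss.Infonce`, so its exact behavior is unknown. Given the parameters above and the Henderson et al. citation at the end of this card, one plausible reading is an in-batch-negatives contrastive loss, similar in spirit to the library's MultipleNegativesRankingLoss. The sketch below is a minimal plain-Python version under that assumption; `cos_sim` and `info_nce_loss` are illustrative names, and the inputs are assumed to be non-zero vectors.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the wrong positive
loss_aligned = info_nce_loss(anchors, aligned)
loss_swapped = info_nce_loss(anchors, swapped)
print(loss_aligned)  # very close to 0
print(loss_swapped)  # very close to 20
```

The `scale` factor multiplies the cosine similarities before the softmax, so with a value of 20.0 a perfectly mismatched pair is penalized by roughly 20 in this toy case.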
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,096 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean           max
    anchor    string  6 tokens  18.26 tokens   574 tokens
    positive  string  6 tokens  189.9 tokens   1024 tokens
  • Samples:
    anchor: What architectural features are present on the front and southern sides of the Martínez Adobe house
    positive: The front and southern sides of the house have wooden wrap-around porches at each level. Wood shingles of either cedar or redwood originally covered the roof. The Martínez Adobe is now part of the John Muir National Historic Site and is open to the public. See also
      California Historical Landmarks in Contra Costa County
      National Register of Historic Places listings in Contra Costa County, California

      References

      Further reading
      Feasibility Report John Muir Home and Vicente Martinez Adobe, Martinez, California. (1963). United States: National Park Service, U.S. Department of the Interior. Western Regional Office. Vincent, G., Mariotti, J., Rubin, J. (2009). Pinole. United States: Arcadia Publishing.

    anchor: What are the cognitive aspects being assessed in relation to TBI, and how do they impact the rehabilitation services for individuals, including warfighters with hearing problems
    positive: “Within AASC, we’ve been very proactive as part of interdisciplinary teams assessing TBI. Another area we’re looking at involves cognitive aspects associated with TBI and mild TBI and the best approach to providing rehabilitative services.”
      As with warfighters who return to duty – including combat – with prosthetic feet or legs, many with hearing problems also want to continue serving rather than accept medical discharges.

    anchor: What are the benefits mentioned by BIO President & CEO Jim Greenwood regarding the energy title programs in rural America
    positive: BIO President & CEO Jim Greenwood said, “The important energy title programs authorized and funded in this bill are just beginning to have a positive impact in revitalizing rural America, fueling economic growth and creating well-paying opportunities where we need it most -- in manufacturing, energy, agriculture and forestry. These programs can also help meet our responsibilities to revitalize rural areas, reduce dependence on foreign oil, and renew economic growth.
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
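
To make the interaction of learning_rate, warmup_ratio, and the linear scheduler (lr_scheduler_type: linear in the full list) more concrete, the sketch below computes the learning rate at a given step for linear warmup followed by linear decay. The function `linear_schedule_lr` is an illustrative helper written for this card, and the total of 1000 steps in the example is a made-up value, not the actual length of this training run.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total_steps = 1000  # hypothetical run length, for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

In this toy run the rate rises to its peak of about 2e-05 at step 100, is about half of that at step 550, and reaches 0 at the final step.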

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Validation Loss
0.0961  100   0.2849         0.0915
0.1921  200   0.0963         0.0511
0.2882  300   0.069          0.0459
0.3842  400   0.0622         0.0445
0.4803  500   0.0544         0.0441
0.5764  600   0.0615         0.0418
0.6724  700   0.0573         0.0416
0.7685  800   0.0524         0.0435
0.8646  900   0.0523         0.0398
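
The Epoch and Step columns can be cross-checked against the dataset size with a little arithmetic. The sketch below estimates the number of optimizer steps per epoch from a few logged rows and derives the global batch size this would imply. It assumes the Epoch column is simply step divided by steps per epoch, rounded to four decimals; the card does not state the number of devices used, so the implied batch size is an inference and not a documented fact.

```python
train_samples = 20816  # size of the training dataset reported in this card

# (step, epoch) pairs copied from the training log.
logged = [(100, 0.0961), (200, 0.1921), (300, 0.2882), (900, 0.8646)]

# Each row gives a rough estimate of steps per epoch; average and round.
estimates = [step / epoch for step, epoch in logged]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Samples consumed per optimizer step under this estimate.
global_batch = train_samples / steps_per_epoch

# Check that the estimate reproduces the logged epoch fractions.
matches = all(round(step / steps_per_epoch, 4) == epoch for step, epoch in logged)

print(steps_per_epoch)         # 1041
print(round(global_batch, 2))  # 20.0
print(matches)                 # True
```

An implied global batch size of about 20, together with per_device_train_batch_size: 2 and gradient_accumulation_steps: 1, would be consistent with training on roughly 10 devices, although other explanations are possible.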

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 3.4.0
  • Transformers: 4.42.4
  • PyTorch: 2.2.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Infonce

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (Safetensors, tensor type F32)