SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
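
The same stack can also be assembled module by module. This is a minimal sketch, assuming the standard sentence_transformers.models API; loading the pretrained checkpoint directly (as shown under Usage) is equivalent and simpler.

from sentence_transformers import SentenceTransformer, models

# XLM-RoBERTa backbone, truncating inputs at 1024 tokens
word_embedding_model = models.Transformer("BAAI/bge-m3", max_seq_length=1024)

# CLS-token pooling, matching pooling_mode_cls_token=True above
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="cls",
)

# L2-normalize embeddings so dot product equals cosine similarity
normalize = models.Normalize()

model = SentenceTransformer(modules=[word_embedding_model, pooling_model, normalize])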

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jrinky/model3")
# Run inference
sentences = [
    'When did he migrate to New South Wales',
    'He attended Derby Grammar School and Beaufort House in London, and migrated to New South Wales in 1883. He settled in Newcastle, where he worked as a shipping agent, eventually partnering with his brothers in a firm. On 6 May 1893 he married Gertrude Mary Saddington, with whom he had five children.',
    'Shizuka Shirakawa, Scholar of Chinese-language literature. Horin Fukuoji, Nihonga painter. 2005\n Mitsuko Mori. Actress. Makoto Saitō (1921–2008). Political scientist, specializing in American diplomatic and political history. Ryuzan Aoki, Ceramic artist. Toshio Sawada, Civil engineer. Shigeaki Hinohara, Doctor. 2006\n Yoshiaki Arata. A pioneer of nuclear fusion research. Jakuchō Setouchi. Writer/Buddhist nun. Hidekazu Yoshida. Music critic. Chusaku Oyama, Nihonga painter. Miyohei Shinohara, Economist. 2007\n Akira Mikazuki. Former justice minister and professor emeritus. Shinya Nakamura. Sculptor. Kōji Nakanishi. Organic chemist. Tokindo Okada, Developmental biologist. Shigeyama Sensaku, Kyogen performer. 2008\n Hironoshin Furuhashi (1928–2009). Sportsman and sports bureaucrat. Kiyoshi Itō. A mathematician whose work is now called Itō calculus. Donald Keene.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
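
For retrieval-style use, a query can be scored against a small document collection with the built-in similarity function. The sketch below reuses strings from this card as an illustrative corpus; it is not part of the original card.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Jrinky/model3")

# Illustrative query and corpus
query = "When did he migrate to New South Wales"
documents = [
    "He attended Derby Grammar School and Beaufort House in London, and migrated to New South Wales in 1883.",
    "Hector Guimard was a French architect and a prominent figure of the Art Nouveau style.",
]

query_embedding = model.encode([query])          # shape [1, 1024]
document_embeddings = model.encode(documents)    # shape [2, 1024]

# Cosine similarities between the query and each document (embeddings are normalized)
scores = model.similarity(query_embedding, document_embeddings)
print(scores)  # shape [1, 2]; the first document should score higher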

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,808 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6 tokens, mean: 17.85 tokens, max: 48 tokens
    • positive (string): min: 6 tokens, mean: 186.46 tokens, max: 1024 tokens
  • Samples:
    • anchor: What type of tournament structure was used in this freestyle wrestling competition
      positive: This freestyle wrestling competition consisted of a single-elimination tournament, with a repechage used to determine the winners of two bronze medals. Results
      Legend
      F — Won by fall
      Final
      Top half
      Bottom half
      Repechage
      References
      Official website
      Women's freestyle 58 kg
      World
    • anchor: What was the status of Josip Broz Tito under the 1974 Constitution of Yugoslavia regarding his presidency
      positive: 1 Wednesday, 22 April 1998. 2 (8.30 a.m.). 3 JUDGE CASSESE: Good morning. May I ask the 4 Registrar to call out the case number, please. 5 THE REGISTRAR: Case number IT-95-13a-T, 6 Prosecutor versus Slavko Dokmanovic. 7 MR. NIEMANN: My name is Niemann. I appear 8 with my colleagues, Mr. Williamson, Mr. Waespi and 9 Mr. Vos. 10 MR. FILA: My name is Mr. Toma Fila and 11 I appear with Ms. Lopicic and Mr. Petrovic in Defence of 12 my client, Mr. Slavko Dokmanovic. 13 JUDGE CASSESE: Mr. Dokmanovic, can you 14 follow me? Before we call the witness, may I ask you 15 whether you agree to this note from the Registrar about 16 the two documents which we discussed yesterday -- you 17 have probably received the English translation of the 18 bibliography of our witness, plus the missing pages of 19 the other document, so I think it is agreed that they 20 can be admitted into evidence. 21 MR. NIEMANN: Yes. 22 JUDGE CASSESE: Shall we proceed with the 24 MR. FILA: Your Honour, before we continue 25 wi...
    • anchor: How quickly can you get loan approval and funds transferred with Crawfort
      positive: Then click on the submit button, and it’s done. Make your dream come true with Crawfort
      When you all submit the loan form, then the agency takes a few hours to process and for approval of the loan. Not only that, you can get your loan amount in your account within a day after getting approval. Many money lenders all take more time in processing things and to credit the amount as well. So, for all that, a customer suffers more as they can’t get the money immediately. But here all these things are not done, and the staff here always make sure to provide you best and fast services. For all these things, you can get the best loan services from here without any doubt.
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
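
selfloss.Infonce is a custom loss; judging from the scale and cos_sim parameters above, it appears to behave like an InfoNCE objective over in-batch negatives, in the spirit of MultipleNegativesRankingLoss. The following is a minimal standalone sketch of that computation, not the actual selfloss implementation.

import torch
import torch.nn.functional as F

def infonce_loss(anchor_emb: torch.Tensor, positive_emb: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # Scaled cosine-similarity matrix between all anchors and all positives in the batch
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    scores = anchor_emb @ positive_emb.T * scale  # [batch, batch]
    # Each anchor's matching positive sits on the diagonal; every other positive
    # in the batch acts as a negative, so this is cross-entropy over the rows
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)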
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,476 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6 tokens, mean: 17.61 tokens, max: 47 tokens
    • positive (string): min: 6 tokens, mean: 171.81 tokens, max: 1024 tokens
  • Samples:
    • anchor: What is Hector Guimard best known for
      positive: Hector Guimard (, 10 March 1867 – 20 May 1942) was a French architect and designer, and a prominent figure of the Art Nouveau style. He achieved early fame with his design for the Castel Beranger, the first Art Nouveau apartment building in Paris, which was selected in an 1899 competition as one of the best new building facades in the city. He is best known for the glass and iron edicules or canopies, with ornamental Art Nouveau curves, which he designed to cover the entrances of the first stations of the Paris Metro. Between 1890 and 1930, Guimard designed and built some fifty buildings, in addition to one hundred and forty-one subway entrances for Paris Metro, as well as numerous pieces of furniture and other decorative works. However, in the 1910s Art Nouveau went out of fashion and by the 1960s most of his works had been demolished, and only two of his original Metro edicules were still in place. Guimard's critical reputation revived in the 1960s, in part due to subsequent acquisit...
    • anchor: What does Mark Kantrowitz say about the inclusion of loans in financial aid packages
      positive: "They don't always understand that part of the financial aid package includes loans," he says. But loans "don't really reduce your costs," explains Mark Kantrowitz, founder of the financial aid website FinAid.org and publisher of Edvisors Network. "They simply spread them out over time. ... A loan is a loan.
    • anchor: How can Ayurveda support women's health during menopause
      positive: Especially as we journey towards menopause, Ayurveda is there to support us with its millenary wisdom. These are some easy routines to incorporate for the daily care of the vulva and vagina, our most delicate flower. Sesame oil: our best allied against dryness, it cannot be missing in our diet.
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
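
These non-default values map directly onto the trainer's training arguments. A minimal sketch of how they could be passed, assuming the SentenceTransformerTrainingArguments API from Sentence Transformers 3.x; the output_dir is illustrative only.

from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/model3",  # illustrative path, not from the original training run
    eval_strategy="steps",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)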

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step   Training Loss   Validation Loss
0.2033   100    0.2694          0.0690
0.4065   200    0.0822          0.0528
0.6098   300    0.0689          0.0497
0.8130   400    0.0644          0.0469
1.0163   500    0.0643          0.0443
1.2195   600    0.0378          0.0473
1.4228   700    0.0400          0.0479
1.6260   800    0.0358          0.0461
1.8293   900    0.0332          0.0507
2.0325   1000   0.0283          0.0538

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 3.4.0
  • Transformers: 4.42.4
  • PyTorch: 2.2.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Infonce

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}