model3 / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:11808
  - loss:Infonce
base_model: BAAI/bge-m3
widget:
  - source_sentence: Who are some notable individuals named Roger Mason
    sentences:
      - >-
        Rav Kook's writings are extensive, and he is considered one of the most
        celebrated and influential rabbis of the 20th century. Some rabbis
        recommend that students of his begin studying his writings with Ein
        Ayah. References


        External links
         Ayin Ayah (full text), Hebrew Wikisource
         * Ayn Aya Classes in English

        Talmud

        Aggadic Midrashim

        Abraham Isaac Kook

        Hebrew-language religious books
      - >-
        Roger Mason may refer to:


        Roger Mason (baseball) (born 1958), American baseball player

        Roger Mason (geologist) (born 1941), discoverer of Ediacaran fossils

        Roger Mason Jr. (born 1980), American basketball player

        Roger Mason (musician), Australian keyboardist

        L. Roger Mason, Jr., former assistant director of National Intelligence
        for Systems and Resource Analyses
      - >-
        Timetabled passenger services on both lines had ceased by the end of
        February 1959. Shipping

        The Bourne-Morton Canal or Bourne Old Eau connected the town to the sea
        in Roman times. Until the mid-19th century, the present Bourne Eau was
        capable of carrying commercial boat traffic from the Wash coast and
        Spalding. This resulted from the investment following the Bourne
        Navigation Act of 1780. Passage became impossible once the junction of
        the Eau and the River Glen was converted from gates to a sluice in 1860.
        Media

        Local news and television programmes are provided by BBC Yorkshire and
        Lincolnshire and ITV Yorkshire. Television signals are received from the
        Belmont TV transmitter, the Waltham TV transmitter can also be received
        which broadcast BBC East Midlands and ITV Central programmes. Local
        radio stations are BBC Radio Lincolnshire, Greatest Hits Radio
        Lincolnshire and Lincs FM. The town's local newspapers are Bourne Local 
        and Stamford Mercury. Sport

        Bourne Town Football Club plays football in the United Counties Football
        League, whilst Bourne Cricket Club plays in the Lincolnshire ECB Premier
        League. These teams play their home games at the Abbey Lawn, a
        recreation ground privately owned by the Bourne United Charities. Motor
        sports


        The racing-car marques English Racing Automobiles (ERA) and British
        Racing Motors (BRM) were both founded in Bourne by Raymond Mays, an
        international racing driver and designer who lived in Bourne. The former
        ERA and BRM workshops in Spalding Road are adjacent to Eastgate House,
        the Mays' family home in the town's Eastgate. Landmarks


        There are currently 71 listed buildings in the parish of Bourne, the
        most important being Bourne Abbey and the Parish Church of St Peter and
        St Paul (1138), which is the only one scheduled Grade I. Notable people

        Bourne is reputedly the birthplace of Hereward the Wake (in about 1035),
        although the 12th-century source of this information, De Gestis Herwardi
        Saxonis, refers only to his father as being "of Bourne" and to the
        father's house and retainers there. Robert Mannyng (1264–1340) is
        credited with putting the speech of the ordinary people of his time into
        recognisable form. He is better known as Robert de Brunne because of his
        long period of residence as a canon at Bourne Abbey. There he completed
        his life's work of popularising religious and historical material in a
        Middle English dialect that was easily understood at that time. William
        Cecil (1520–1598) became the first Lord Burghley after serving Queen
        Elizabeth I. He was born at a house in the centre of Bourne that is now
        the Burghley Arms. Dr William Dodd (1729–1777), was an Anglican
        clergyman, man of letters and forger. He was prosecuted, sentenced to
        death and publicly hanged at Tyburn in 1777. Charles Frederick Worth
        (1825–1895), son of a solicitor, lived at Wake House in North Street. He
        moved to Paris and became a renowned designer of women's fashion and the
        founder of haute couture. The French government awarded him the Légion
        d'honneur. Sir George White (1840-1912), MP for North West Norfolk, a
        seat he held for twelve years until he died in 1912. He was knighted for
        public service in 1907.
  - source_sentence: What football team does the Japanese player play for
    sentences:
      - >-
        After the meeting, Box summons up the courage to ask Lorraine (Sue
        Holderness) on the date. The act ends with Robert's coat getting on fire
        because of the cigarette, with "Smoke Gets in Your Eyes" on the
        background.
      - is a Japanese football player. He plays for Honda Lock.
      - >-
        As followers on Twitter and FB probably well know I’ve been up to more
        than a spot of preserving of late. It’s my latest addiction, as if I
        need any more of those. My Dad’s the King of Jams, Chutneys and Pickles
        and I have a feeling he’s passed his enthusiastic genes for it on to
        me!. Which is great, but time consuming. Many an evening has been spent
        peeling, dicing, de-stoning, chopping, stirring, testing, sterilising
        and jarring. And then obviously the tasting. And all the crackers, bread
        and cheese to go with it!. I rarely get to bed much before midnight on
        my chutneying nights. And to be honest my cupboards are now fit to
        bursting with so many goodies, but at least I have christmas presents
        totally nailed this year. My Dad’s been making Hedgerow Chutney for
        years, and it happens to be everyone’s favourite of all his chutney
        recipes (and he makes quite a number!). Each autumn he takes a long walk
        around the field at the back of his house in Herefordshire picking all
        the freebie hedgerow goodies he can find and transforms them into this
        marvellously fruitful chutney. There’s always plenty of damsons,
        bullaces, sloes, blackberries and a few elderberries. Plus pears or
        apples for smoothing and bulking out. We don’t have quite the same fruit
        in our hedgerows in France but I thought I’d make my own French version
        picking the fruit from our garden and nearby tracks and lanes, managing
        to find plenty of figs, greengages, plums, pears, blackberries and sloes
        just before the season finished a couple of weeks ago. We’ve
        elderberries here too but they were way past their best by the time I
        got into full chutney mode. There’s no escaping how time consuming and
        labourious chutney making can be, especially when using so much fruit
        that needs hefty preparatory work. I realise now why it’s a hobby
        generally taken up by retired folk. But the results are so worth it, if
        you can spare it set aside a whole evening in the kitchen and wile away
        the hours getting lost in music or the radio or even catching up on a
        few programmes on You Tube.
  - source_sentence: What is the purpose of Business Intelligence
    sentences:
      - >-
        College career

        Proctor played as a defensive lineman for the North Carolina Central
        Eagles from 2008 to 2012. He was redshirted in 2008.
      - >-
        The purpose of Business Intelligence is the transformation of raw data
        into meaningful information which can be used to make better business
        decisions. Business Intelligence grew out of Decision Support systems
        and is all about collecting data from disparate sources, conforming and
        integrating that data into central repositories which support reporting
        and analysis activities.
      - >-
        You have to show the police courtesy, they are only human. No one even
        WANTS for the judicial system to work. They are too lazy.
  - source_sentence: How does the speaker feel about Battle Symphony
    sentences:
      - >-
        It's a symptomless prearranged fact that when you afford your babe a
        infant work you motivate the status system, bolster the infant's
        stressed system, eat up colic, and harden your in bondage next to your
        kid. Now, how satisfying is that
      - "Piquet passed Laffite to become the race's fifth different leader. Senna reached second just 1.7 seconds behind Piquet by passing Laffite, who then pitted for tires. With the two of them in front on their own, and Piquet leading by up to 3.5 seconds, Senna was content for the time being to follow his countryman. After eight laps in the lead, Piquet pitted for tires. Senna regained first place and then also pitted. Piquet's 18.4 second stop was even slower than teammate Mansell's had been, but when he returned to the track, the two-time champion got the bit between his teeth. Running second behind Senna, Piquet set the fastest lap of the race on lap 41, but with a pit stop ten seconds quicker than Piquet's, Senna was able to retain the lead. On the very next lap, the 42nd, Piquet pushed a bit too much, and crashed hard at the left-hand corner before the last chicane. He ended up in the tire barrier, unhurt, but with his car in a very precarious position. The crane, present for just that reason, was unable to move the car. Arnoux, now 16.6 seconds behind in second, took a second a lap off Senna's lead for five laps while a yellow was displayed in the corner where Piquet had crashed. As soon as the yellow flag was gone, Arnoux went wide and hit Piquet's abandoned Williams! The Frenchman decided that his car was not damaged, and attempted to rejoin the field, but did so right in front of Thierry Boutsen's Arrows-BMW, sidelining both cars. Very uncharacteristic of a street race, these three\_– Piquet, Arnoux and Boutsen\_– were the only drivers all afternoon to retire due to accidents."
      - Like Battle Symphony, it's not bad. It's just extremely boring.
  - source_sentence: When did he migrate to New South Wales
    sentences:
      - >-
        predict ministry in a sales and special floor being Job to the
        vulnerability diver. team: This research will work last for either,
        often, and also obtaining spreadsheets in the funny wedding power of the
        usability time. Physical Demands: The exclusive transitions was
        temporarily need perfect of those that must share developed by an
        position to badly do the animal objectives of this source. necessary
        terabytes may pay acted to increase streets with hearts to address the
        professional items. solely, the job will distract, Coordinate and be
        inbox security fun interdisciplinary operations that might read in back
        of 20 updates The service will properly be to like the detection
        throughout the use: logging, including, killing, teaching, leading,
        preparing, operating, and using.
      - >-
        Shizuka Shirakawa, Scholar of Chinese-language literature. Horin
        Fukuoji, Nihonga painter. 2005
         Mitsuko Mori. Actress. Makoto Saitō (1921–2008). Political scientist, specializing in American diplomatic and political history. Ryuzan Aoki, Ceramic artist. Toshio Sawada, Civil engineer. Shigeaki Hinohara, Doctor. 2006
         Yoshiaki Arata. A pioneer of nuclear fusion research. Jakuchō Setouchi. Writer/Buddhist nun. Hidekazu Yoshida. Music critic. Chusaku Oyama, Nihonga painter. Miyohei Shinohara, Economist. 2007
         Akira Mikazuki. Former justice minister and professor emeritus. Shinya Nakamura. Sculptor. Kōji Nakanishi. Organic chemist. Tokindo Okada, Developmental biologist. Shigeyama Sensaku, Kyogen performer. 2008
         Hironoshin Furuhashi (1928–2009). Sportsman and sports bureaucrat. Kiyoshi Itō. A mathematician whose work is now called Itō calculus. Donald Keene.
      - >-
        He attended Derby Grammar School and Beaufort House in London, and
        migrated to New South Wales in 1883. He settled in Newcastle, where he
        worked as a shipping agent, eventually partnering with his brothers in a
        firm. On 6 May 1893 he married Gertrude Mary Saddington, with whom he
        had five children.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
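
The Pooling and Normalize modules above can be illustrated with a small NumPy sketch. It assumes the usual meaning of `pooling_mode_cls_token` (take the first token's hidden state) and of `Normalize()` (scale to unit L2 norm); the function name and the random stand-in for the transformer output are hypothetical and not part of this model's code.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Map (batch, seq_len, hidden) token states to (batch, hidden) unit vectors."""
    # CLS pooling: keep only the hidden state of the first token.
    cls = token_embeddings[:, 0, :]
    # L2 normalization, guarded against division by zero.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

# Random values stand in for the XLMRobertaModel output (hidden size 1024).
rng = np.random.default_rng(0)
fake_hidden_states = rng.normal(size=(3, 8, 1024))
sentence_embeddings = cls_pool_and_normalize(fake_hidden_states)
print(sentence_embeddings.shape)
```

Because the resulting vectors have unit length, their dot product equals their cosine similarity.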

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jrinky/model3")
# Run inference
sentences = [
    'When did he migrate to New South Wales',
    'He attended Derby Grammar School and Beaufort House in London, and migrated to New South Wales in 1883. He settled in Newcastle, where he worked as a shipping agent, eventually partnering with his brothers in a firm. On 6 May 1893 he married Gertrude Mary Saddington, with whom he had five children.',
    'Shizuka Shirakawa, Scholar of Chinese-language literature. Horin Fukuoji, Nihonga painter. 2005\n Mitsuko Mori. Actress. Makoto Saitō (1921–2008). Political scientist, specializing in American diplomatic and political history. Ryuzan Aoki, Ceramic artist. Toshio Sawada, Civil engineer. Shigeaki Hinohara, Doctor. 2006\n Yoshiaki Arata. A pioneer of nuclear fusion research. Jakuchō Setouchi. Writer/Buddhist nun. Hidekazu Yoshida. Music critic. Chusaku Oyama, Nihonga painter. Miyohei Shinohara, Economist. 2007\n Akira Mikazuki. Former justice minister and professor emeritus. Shinya Nakamura. Sculptor. Kōji Nakanishi. Organic chemist. Tokindo Okada, Developmental biologist. Shigeyama Sensaku, Kyogen performer. 2008\n Hironoshin Furuhashi (1928–2009). Sportsman and sports bureaucrat. Kiyoshi Itō. A mathematician whose work is now called Itō calculus. Donald Keene.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
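
For semantic search, the same similarity scores can be used to rank passages against a query. The sketch below is a minimal illustration in plain NumPy; `rank_by_cosine` is a hypothetical helper, and the 3-dimensional toy vectors stand in for real 1024-dimensional embeddings.

```python
import numpy as np

def rank_by_cosine(query_embedding, passage_embeddings):
    """Return (passage_index, cosine_score) pairs, best match first."""
    q = query_embedding / np.linalg.norm(query_embedding)
    p = passage_embeddings / np.linalg.norm(passage_embeddings, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per passage
    order = np.argsort(-scores)         # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors in place of model.encode(...) output.
query = np.array([1.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # almost parallel to the query
    [0.5, 0.5, 0.0],   # in between
])
ranking = rank_by_cosine(query, passages)
for index, score in ranking:
    print(index, round(score, 4))
```

With the real model, `embeddings[0]` from the snippet above could serve as the query and `embeddings[1:]` as the passages.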

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,808 training samples
  • Columns: anchor and positive
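  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                            | positive                                             |
    |---------|---------------------------------------------------|------------------------------------------------------|
    | type    | string                                            | string                                               |
    | details | min: 6 tokens, mean: 17.85 tokens, max: 48 tokens | min: 6 tokens, mean: 186.46 tokens, max: 1024 tokens |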
  • Samples:
    • anchor: What type of tournament structure was used in this freestyle wrestling competition
      positive: This freestyle wrestling competition consisted of a single-elimination tournament, with a repechage used to determine the winners of two bronze medals. Results
      Legend
      F — Won by fall

      Final

      Top half

      Bottom half

      Repechage

      References
      Official website

      Women's freestyle 58 kg
      World
    • anchor: What was the status of Josip Broz Tito under the 1974 Constitution of Yugoslavia regarding his presidency
      positive: 1 Wednesday, 22 April 1998. 2 (8.30 a.m.). 3 JUDGE CASSESE: Good morning. May I ask the
      4 Registrar to call out the case number, please. 5 THE REGISTRAR: Case number IT-95-13a-T,
      6 Prosecutor versus Slavko Dokmanovic. 7 MR. NIEMANN: My name is Niemann. I appear
      8 with my colleagues, Mr. Williamson, Mr. Waespi and
      9 Mr. Vos. 10 MR. FILA: My name is Mr. Toma Fila and
      11 I appear with Ms. Lopicic and Mr. Petrovic in Defence of
      12 my client, Mr. Slavko Dokmanovic. 13 JUDGE CASSESE: Mr. Dokmanovic, can you
      14 follow me? Before we call the witness, may I ask you
      15 whether you agree to this note from the Registrar about
      16 the two documents which we discussed yesterday -- you
      17 have probably received the English translation of the
      18 bibliography of our witness, plus the missing pages of
      19 the other document, so I think it is agreed that they
      20 can be admitted into evidence. 21 MR. NIEMANN: Yes. 22 JUDGE CASSESE: Shall we proceed with the
      24 MR. FILA: Your Honour, before we continue
      25 wi...
    • anchor: How quickly can you get loan approval and funds transferred with Crawfort
      positive: Then click on the submit button, and it’s done. Make your dream come true with Crawfort
      When you all submit the loan form, then the agency takes a few hours to process and for approval of the loan. Not only that, you can get your loan amount in your account within a day after getting approval. Many money lenders all take more time in processing things and to credit the amount as well. So, for all that, a customer suffers more as they can’t get the money immediately. But here all these things are not done, and the staff here always make sure to provide you best and fast services. For all these things, you can get the best loan services from here without any doubt.
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
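
The card does not include the source of `selfloss.Infonce`, so the following is only a sketch of a common InfoNCE formulation with in-batch negatives (the setup described in the Henderson et al. paper cited below), using the `scale` of 20.0 and cosine similarity listed above. The function name `info_nce_loss` and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's target is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching positive for each anchor sits on the diagonal;
    # every other passage in the batch acts as a negative.
    return float(-np.mean(np.diag(log_probs)))

# Toy check: perfectly matched pairs versus deliberately mismatched pairs.
aligned = np.eye(4)
loss_aligned = info_nce_loss(aligned, aligned)
loss_shuffled = info_nce_loss(aligned, np.roll(aligned, 1, axis=0))
print(loss_aligned < loss_shuffled)
```

Under this formulation, a larger batch supplies more negatives per anchor, which is one reason the batch size matters for this kind of loss.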
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,476 evaluation samples
  • Columns: anchor and positive
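  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                            | positive                                             |
    |---------|---------------------------------------------------|------------------------------------------------------|
    | type    | string                                            | string                                               |
    | details | min: 6 tokens, mean: 17.61 tokens, max: 47 tokens | min: 6 tokens, mean: 171.81 tokens, max: 1024 tokens |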
  • Samples:
    • anchor: What is Hector Guimard best known for
      positive: Hector Guimard (, 10 March 1867 – 20 May 1942) was a French architect and designer, and a prominent figure of the Art Nouveau style. He achieved early fame with his design for the Castel Beranger, the first Art Nouveau apartment building in Paris, which was selected in an 1899 competition as one of the best new building facades in the city. He is best known for the glass and iron edicules or canopies, with ornamental Art Nouveau curves, which he designed to cover the entrances of the first stations of the Paris Metro. Between 1890 and 1930, Guimard designed and built some fifty buildings, in addition to one hundred and forty-one subway entrances for Paris Metro, as well as numerous pieces of furniture and other decorative works. However, in the 1910s Art Nouveau went out of fashion and by the 1960s most of his works had been demolished, and only two of his original Metro edicules were still in place. Guimard's critical reputation revived in the 1960s, in part due to subsequent acquisit...
    • anchor: What does Mark Kantrowitz say about the inclusion of loans in financial aid packages
      positive: "They don't always understand that part of the financial aid package includes loans," he says. But loans "don't really reduce your costs," explains Mark Kantrowitz, founder of the financial aid website FinAid.org and publisher of Edvisors Network. "They simply spread them out over time. ... A loan is a loan.
    • anchor: How can Ayurveda support women's health during menopause
      positive: Especially as we journey towards menopause, Ayurveda is there to support us with its millenary wisdom. These are some easy routines to incorporate for the daily care of the vulva and vagina, our most delicate flower. Sesame oil: our best allied against dryness, it cannot be missing in our diet.
  • Loss: selfloss.Infonce with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

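| Epoch  | Step | Training Loss | Validation Loss |
|--------|------|---------------|-----------------|
| 0.2033 | 100  | 0.2694        | 0.0690          |
| 0.4065 | 200  | 0.0822        | 0.0528          |
| 0.6098 | 300  | 0.0689        | 0.0497          |
| 0.8130 | 400  | 0.0644        | 0.0469          |
| 1.0163 | 500  | 0.0643        | 0.0443          |
| 1.2195 | 600  | 0.0378        | 0.0473          |
| 1.4228 | 700  | 0.04          | 0.0479          |
| 1.6260 | 800  | 0.0358        | 0.0461          |
| 1.8293 | 900  | 0.0332        | 0.0507          |
| 2.0325 | 1000 | 0.0283        | 0.0538          |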
Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 3.4.0
  • Transformers: 4.42.4
  • PyTorch: 2.2.0+cu121
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Infonce

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}