---
language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:311351
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: >-
      What specialized services does Equifax's Workforce Solutions segment
      offer?
    sentences:
      - >-
        Supermarkets are generally operated under one of the following formats:
        combination food and drug stores ('combo stores'); multi-department
        stores; marketplace stores; or price impact warehouses.
      - >-
        Workforce Solutions — provides services enabling customers to verify
        income, employment, educational history, criminal justice data,
        healthcare professional licensure and sanctions of people in the U.S.
        (Verification Services), as well as providing our employer customers
        with services which include unemployment claims management, I-9 and
        onboarding services, Affordable Care Act compliance management, tax
        credits and incentives and other complementary employment-based
        transaction services (Employer Services)
      - >-
        International Business Machines Corporation (IBM or the company) was
        incorporated in the State of New York on June 16, 1911, as the
        Computing-Tabulating-Recording Co. (C-T-R).
  - source_sentence: >-
      What factors contributed to the increase in operating income for the
      Company in 2023?
    sentences:
      - >-
        Operating income increased $5.8 billion, or 72.8%, in 2023 compared to
        2022. The increase in operating income was primarily driven by the
        absence of $5.8 billion of opioid litigation charges recorded in 2022
        and increases in the Pharmacy & Consumer Wellness segment, primarily
        driven by the absence of a $2.5 billion loss on assets held for sale
        recorded in 2022 related to the write-down of the Company’s Omnicare®
        long-term care business which was partially offset by continued pharmacy
        reimbursement pressure and decreased COVID-19 vaccinations and
        diagnostic testing compared to 2022, as well as an increase in the
        Health Services segment.
      - >-
        Pennsylvania law requires that the Office of Attorney General be
        provided advance notice of any transaction that would result in Hershey
        Trust Company, as trustee for the Trust, no longer having voting control
        of the Company.
      - >-
        In 2023, UnitedHealthcare invested $3,386 million in property,
        equipment, and capitalized software.
  - source_sentence: >-
      What event took place in September 2021 involving the Company and the
      counsel representing plaintiffs?
    sentences:
      - >-
        Item 8, which requires the inclusion of financial statements and
        supplementary data, directs readers to Item 15(a) for this information.
      - >-
        In September 2021, the Company entered into a settlement in principle
        with the counsel representing plaintiffs in this matter and in
        substantially all of the outstanding cases in the United States. The
        costs associated with this and other settlements are reflected in the
        Company’s accruals.
      - >-
        GM empowers employees to 'Speak Up for Safety' through the Employee
        Safety Concern Process which makes it easier for employees to report
        potential safety issues or suggest improvements without fear of
        retaliation and ensures their safety every day.
  - source_sentence: >-
      What was the total cash consideration for Comcast's acquisition of Masergy
      in October 2021?
    sentences:
      - >-
        In October 2021, Comcast acquired Masergy, a provider of
        software-defined networking and cloud platforms for global enterprises,
        for a total cash consideration of $1.2 billion.
      - >-
        The net unit growth for Hilton in the year ended December 31, 2023, was
        4.9 percent.
      - >-
        Financial Statements and Supplementary Data are addressed in Item 8 of
        the financial document.
  - source_sentence: >-
      What does the term 'Acquired brands' refer to and how does it affect the
      reported volumes?
    sentences:
      - >-
        Phrases such as 'anticipates', 'believes', 'estimates', 'seeks',
        'expects', 'plans', 'intends', 'remains', 'positions', and similar
        expressions are intended to identify forward-looking statements related
        to the company or management.
      - >-
        'Acquired brands' refers to brands acquired during the past 12 months.
        Typically, the Company has not reported unit case volume or recognized
        concentrate sales volume related to acquired brands in periods prior to
        the closing of a transaction. Therefore, the unit case volume and
        concentrate sales volume related to an acquired brand are incremental to
        prior year volume.
      - >-
        The Company made matching contributions to employee accounts in
        connection with the 401(k) plan of $37.3 million in fiscal 2023, $37.9
        million in fiscal 2022 and $34.1 million in fiscal 2021.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: Vignesh finetuned bge2
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.7
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8414285714285714
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8785714285714286
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.92
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.28047619047619043
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17571428571428568
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09199999999999998
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8414285714285714
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8785714285714286
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.92
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8129831819187487
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7784263038548753
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7817486756411115
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.6914285714285714
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.84
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8857142857142857
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9242857142857143
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6914285714285714
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.28
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1771428571428571
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09242857142857142
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6914285714285714
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.84
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8857142857142857
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9242857142857143
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.812081821657879
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7757766439909298
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7786577115899984
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.69
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8285714285714286
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8728571428571429
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9142857142857143
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.69
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.27619047619047615
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17457142857142854
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09142857142857141
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.69
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8285714285714286
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8728571428571429
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9142857142857143
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.8040804108630832
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.768536281179138
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7719825285723502
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.67
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8171428571428572
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8657142857142858
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9071428571428571
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.67
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2723809523809524
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17314285714285713
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0907142857142857
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.67
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.8171428571428572
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8657142857142858
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9071428571428571
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7904898848742749
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7528854875283444
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7566672358984098
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.6314285714285715
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7942857142857143
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8385714285714285
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8828571428571429
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6314285714285715
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.26476190476190475
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.16771428571428568
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08828571428571427
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6314285714285715
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7942857142857143
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.8385714285714285
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8828571428571429
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7591380417514834
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7191768707482988
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.7235543749437979
            name: Cosine Map@100
---

Vignesh finetuned bge2

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
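
The Pooling module uses the CLS token ('pooling_mode_cls_token': True) and the final Normalize module L2-normalizes the output, so cosine similarity reduces to a dot product. For illustration, a rough equivalent with plain transformers (assuming the checkpoint loads with AutoModel, as is typical for Sentence Transformer repositories):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("viggypoker1/Vignesh-finetuned-bge2")
model = AutoModel.from_pretrained("viggypoker1/Vignesh-finetuned-bge2")

inputs = tokenizer(
    ["example sentence"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)
cls_embeddings = outputs.last_hidden_state[:, 0]      # CLS-token pooling
embeddings = F.normalize(cls_embeddings, p=2, dim=1)  # unit-length vectors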

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge2")
# Run inference
sentences = [
    "What does the term 'Acquired brands' refer to and how does it affect the reported volumes?",
    "'Acquired brands' refers to brands acquired during the past 12 months. Typically, the Company has not reported unit case volume or recognized concentrate sales volume related to acquired brands in periods prior to the closing of a transaction. Therefore, the unit case volume and concentrate sales volume related to an acquired brand are incremental to prior year volume.",
    'The Company made matching contributions to employee accounts in connection with the 401(k) plan of $37.3 million in fiscal 2023, $37.9 million in fiscal 2022 and $34.1 million in fiscal 2021.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
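
Because the model was trained with MatryoshkaLoss at dimensions 768, 512, 256, 128, and 64, embeddings can be truncated to a smaller size with only a modest drop in retrieval quality (see the per-dimension metrics below). A minimal sketch using the library's truncate_dim option:

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings
model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge2", truncate_dim=256)
embeddings = model.encode([
    "What does the term 'Acquired brands' refer to?",
    "'Acquired brands' refers to brands acquired during the past 12 months.",
])
print(embeddings.shape)
# (2, 256)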

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.7
cosine_accuracy@3 0.8414
cosine_accuracy@5 0.8786
cosine_accuracy@10 0.92
cosine_precision@1 0.7
cosine_precision@3 0.2805
cosine_precision@5 0.1757
cosine_precision@10 0.092
cosine_recall@1 0.7
cosine_recall@3 0.8414
cosine_recall@5 0.8786
cosine_recall@10 0.92
cosine_ndcg@10 0.813
cosine_mrr@10 0.7784
cosine_map@100 0.7817

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.6914
cosine_accuracy@3 0.84
cosine_accuracy@5 0.8857
cosine_accuracy@10 0.9243
cosine_precision@1 0.6914
cosine_precision@3 0.28
cosine_precision@5 0.1771
cosine_precision@10 0.0924
cosine_recall@1 0.6914
cosine_recall@3 0.84
cosine_recall@5 0.8857
cosine_recall@10 0.9243
cosine_ndcg@10 0.8121
cosine_mrr@10 0.7758
cosine_map@100 0.7787

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.69
cosine_accuracy@3 0.8286
cosine_accuracy@5 0.8729
cosine_accuracy@10 0.9143
cosine_precision@1 0.69
cosine_precision@3 0.2762
cosine_precision@5 0.1746
cosine_precision@10 0.0914
cosine_recall@1 0.69
cosine_recall@3 0.8286
cosine_recall@5 0.8729
cosine_recall@10 0.9143
cosine_ndcg@10 0.8041
cosine_mrr@10 0.7685
cosine_map@100 0.772

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.67
cosine_accuracy@3 0.8171
cosine_accuracy@5 0.8657
cosine_accuracy@10 0.9071
cosine_precision@1 0.67
cosine_precision@3 0.2724
cosine_precision@5 0.1731
cosine_precision@10 0.0907
cosine_recall@1 0.67
cosine_recall@3 0.8171
cosine_recall@5 0.8657
cosine_recall@10 0.9071
cosine_ndcg@10 0.7905
cosine_mrr@10 0.7529
cosine_map@100 0.7567

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.6314
cosine_accuracy@3 0.7943
cosine_accuracy@5 0.8386
cosine_accuracy@10 0.8829
cosine_precision@1 0.6314
cosine_precision@3 0.2648
cosine_precision@5 0.1677
cosine_precision@10 0.0883
cosine_recall@1 0.6314
cosine_recall@3 0.7943
cosine_recall@5 0.8386
cosine_recall@10 0.8829
cosine_ndcg@10 0.7591
cosine_mrr@10 0.7192
cosine_map@100 0.7236
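
The five tables above correspond to the five Matryoshka dimensions and come from an information-retrieval evaluation over the 700 held-out query–passage pairs. A minimal reproduction sketch with sentence-transformers' InformationRetrievalEvaluator, where the query/corpus dictionaries shown are illustrative placeholders rather than the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder data: ids mapped to texts, and each query id mapped to the
# set of relevant corpus ids.
queries = {"q1": "What was the operating margin for UnitedHealthcare in 2023?"}
corpus = {"d1": "The operating margin for UnitedHealthcare in 2023 was reported as 5.8%."}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("viggypoker1/Vignesh-finetuned-bge2", truncate_dim=768)
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
results = evaluator(model)
print(results)  # includes cosine_accuracy@k, cosine_ndcg@10, cosine_mrr@10, cosine_map@100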

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 311,351 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
                anchor                positive
    type        string                string
    details     min: 9 tokens         min: 7 tokens
                mean: 20.47 tokens    mean: 46.65 tokens
                max: 41 tokens        max: 512 tokens
  • Samples:
    anchor: What section from item 8 addresses financial information?
    positive: Item 8 covers 'Financial Statements and Supplementary Data' relating to financial information.

    anchor: What was the percentage increase in interest income from 2022 to 2023?
    positive: Interest income increased $769 million, or 259%, in the year ended December 31, 2023 as compared to the year ended December 31, 2022. This increase was primarily due to higher interest earned on our cash and cash equivalents and short-term investments in the year ended December 31, 2023 as compared to the prior year due to rising interest rates and our increasing portfolio balance.

    anchor: What was the operating margin for UnitedHealthcare in 2023?
    positive: The operating margin for UnitedHealthcare in 2023 was reported as 5.8%.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
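
A sketch of how this loss would be constructed with the sentence-transformers losses API (the model wiring is assumed; the parameters mirror the JSON above):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
# Wrap the in-batch-negatives loss so the same objective is applied at each
# truncated embedding size, with equal weight per dimension.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # use all dimensions at every training step
)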
    

Evaluation Dataset

json

  • Dataset: json
  • Size: 700 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 700 samples:
                anchor                positive
    type        string                string
    details     min: 7 tokens         min: 6 tokens
                mean: 20.59 tokens    mean: 47.59 tokens
                max: 40 tokens        max: 326 tokens
  • Samples:
    anchor: What was the maximum borrowing capacity available from the Federal Home Loan Bank of Boston as of December 31, 2023?
    positive: The maximum borrowing capacity available from the FHLBB as of December 31, 2023 was approximately $1.0 billion.

    anchor: What new compliance requirement was established by the CFPB's final rule issued on March 30, 2023, regarding small business credit applications?
    positive: On March 30, 2023, the CFPB adopted a final rule requiring covered financial institutions, such as us, to collect and report data to the CFPB regarding certain small business credit applications.

    anchor: What potential impact could continued geopolitical tensions have on the business?
    positive: While the ongoing Russia-Ukraine and Israel conflicts are still evolving and outcomes remain uncertain, the business does not expect the resulting challenging macroeconomic conditions to have a material impact currently. However, if conflicts continue or worsen, it could lead to greater disruptions and uncertainty, negatively impacting the business.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • fp16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
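
Putting the pieces together, a training sketch under the assumptions flagged in the comments (the output directory, save strategy, and placeholder datasets are illustrative, not the author's actual script):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

# Placeholder pairs; the actual training set has 311,351 (anchor, positive)
# rows and the evaluation set has 700.
train_dataset = Dataset.from_dict({
    "anchor": ["example query"],
    "positive": ["example passage"],
})
eval_dataset = Dataset.from_dict({
    "anchor": ["another query"],
    "positive": ["another passage"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="vignesh-finetuned-bge2",  # assumed name
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed; load_best_model_at_end needs matching strategies
    per_device_train_batch_size=128,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,   # mixed precision; assumes a CUDA GPU
    tf32=False,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()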

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0658 10 12.7958 - - - - - -
0.1315 20 16.8225 - - - - - -
0.1973 30 20.1236 - - - - - -
0.2630 40 22.0845 - - - - - -
0.3288 50 19.7865 - - - - - -
0.3946 60 6.0102 - - - - - -
0.4603 70 3.7813 - - - - - -
0.5261 80 2.8675 - - - - - -
0.5919 90 2.2002 - - - - - -
0.6576 100 1.8334 - - - - - -
0.7234 110 1.5052 - - - - - -
0.7891 120 1.3454 - - - - - -
0.8549 130 1.2089 - - - - - -
0.9207 140 1.0615 - - - - - -
0.9864 150 1.011 - - - - - -
0.9996 152 - 0.2963 0.7043 0.7228 0.7462 0.6496 0.7566
1.0522 160 7.9844 - - - - - -
1.1180 170 12.726 - - - - - -
1.1837 180 17.3762 - - - - - -
1.2495 190 19.358 - - - - - -
1.3152 200 19.4805 - - - - - -
1.3810 210 5.7452 - - - - - -
1.4468 220 1.3857 - - - - - -
1.5125 230 0.9792 - - - - - -
1.5783 240 0.8632 - - - - - -
1.6441 250 0.8256 - - - - - -
1.7098 260 0.742 - - - - - -
1.7756 270 0.7307 - - - - - -
1.8413 280 0.7064 - - - - - -
1.9071 290 0.6492 - - - - - -
1.9729 300 0.6265 - - - - - -
1.9992 304 - 0.2345 0.7145 0.7317 0.7548 0.6706 0.7609
2.0386 310 4.0854 - - - - - -
2.1044 320 11.4485 - - - - - -
2.1702 330 14.1851 - - - - - -
2.2359 340 17.7422 - - - - - -
2.3017 350 19.2742 - - - - - -
2.3674 360 7.3918 - - - - - -
2.4332 370 1.0444 - - - - - -
2.4990 380 0.6947 - - - - - -
2.5647 390 0.6 - - - - - -
2.6305 400 0.6005 - - - - - -
2.6963 410 0.5314 - - - - - -
2.7620 420 0.5238 - - - - - -
2.8278 430 0.5207 - - - - - -
2.8935 440 0.5075 - - - - - -
2.9593 450 0.4673 - - - - - -
2.9988 456 - 0.2111 0.7252 0.7333 0.7530 0.6821 0.7617
3.0251 460 1.5162 - - - - - -
3.0908 470 10.5824 - - - - - -
3.1566 480 11.8184 - - - - - -
3.2224 490 16.3944 - - - - - -
3.2881 500 18.1591 - - - - - -
3.3539 510 10.8653 - - - - - -
3.4196 520 0.8936 - - - - - -
3.4854 530 0.5606 - - - - - -
3.5512 540 0.4724 - - - - - -
3.6169 550 0.4681 - - - - - -
3.6827 560 0.4334 - - - - - -
3.7485 570 0.4005 - - - - - -
3.8142 580 0.4224 - - - - - -
3.8800 590 0.4296 - - - - - -
3.9457 600 0.3788 - - - - - -
3.9984 608 - 0.1889 0.7345 0.7469 0.7647 0.6906 0.7633
4.0115 610 0.5548 - - - - - -
4.0773 620 8.6803 - - - - - -
4.1430 630 10.6235 - - - - - -
4.2088 640 14.5689 - - - - - -
4.2746 650 17.649 - - - - - -
4.3403 660 13.9682 - - - - - -
4.4061 670 0.7801 - - - - - -
4.4718 680 0.4848 - - - - - -
4.5376 690 0.4082 - - - - - -
4.6034 700 0.3883 - - - - - -
4.6691 710 0.3737 - - - - - -
4.7349 720 0.3485 - - - - - -
4.8007 730 0.3547 - - - - - -
4.8664 740 0.357 - - - - - -
4.9322 750 0.3223 - - - - - -
4.9979 760 0.3322 0.1843 0.7364 0.7482 0.7645 0.6911 0.7652
5.0637 770 6.5343 - - - - - -
5.1295 780 10.1093 - - - - - -
5.1952 790 13.3253 - - - - - -
5.2610 800 16.6724 - - - - - -
5.3268 810 15.6655 - - - - - -
5.3925 820 2.0319 - - - - - -
5.4583 830 0.4315 - - - - - -
5.5240 840 0.3544 - - - - - -
5.5898 850 0.3488 - - - - - -
5.6556 860 0.3301 - - - - - -
5.7213 870 0.3035 - - - - - -
5.7871 880 0.3123 - - - - - -
5.8529 890 0.3149 - - - - - -
5.9186 900 0.2857 - - - - - -
5.9844 910 0.3021 - - - - - -
5.9975 912 - 0.1704 0.7442 0.7527 0.7643 0.7031 0.7700
6.0501 920 4.5418 - - - - - -
6.1159 930 8.909 - - - - - -
6.1817 940 12.7023 - - - - - -
6.2474 950 15.6328 - - - - - -
6.3132 960 17.1026 - - - - - -
6.3790 970 3.8174 - - - - - -
6.4447 980 0.4035 - - - - - -
6.5105 990 0.3281 - - - - - -
6.5762 1000 0.3126 - - - - - -
6.6420 1010 0.304 - - - - - -
6.7078 1020 0.2692 - - - - - -
6.7735 1030 0.2807 - - - - - -
6.8393 1040 0.2993 - - - - - -
6.9051 1050 0.2721 - - - - - -
6.9708 1060 0.2674 - - - - - -
6.9971 1064 - 0.1596 0.7481 0.7607 0.7723 0.7074 0.7735
7.0366 1070 2.5499 - - - - - -
7.1023 1080 8.8274 - - - - - -
7.1681 1090 11.3224 - - - - - -
7.2339 1100 15.0825 - - - - - -
7.2996 1110 17.6647 - - - - - -
7.3654 1120 6.0271 - - - - - -
7.4312 1130 0.3838 - - - - - -
7.4969 1140 0.3137 - - - - - -
7.5627 1150 0.285 - - - - - -
7.6284 1160 0.2913 - - - - - -
7.6942 1170 0.268 - - - - - -
7.7600 1180 0.2643 - - - - - -
7.8257 1190 0.2702 - - - - - -
7.8915 1200 0.2775 - - - - - -
7.9573 1210 0.2563 - - - - - -
7.9967 1216 - 0.1543 0.7495 0.7645 0.7715 0.7124 0.7802
8.0230 1220 0.7657 - - - - - -
8.0888 1230 8.542 - - - - - -
8.1545 1240 9.9807 - - - - - -
8.2203 1250 14.3646 - - - - - -
8.2861 1260 16.877 - - - - - -
8.3518 1270 10.2992 - - - - - -
8.4176 1280 0.363 - - - - - -
8.4834 1290 0.304 - - - - - -
8.5491 1300 0.2851 - - - - - -
8.6149 1310 0.2853 - - - - - -
8.6806 1320 0.2676 - - - - - -
8.7464 1330 0.2522 - - - - - -
8.8122 1340 0.2619 - - - - - -
8.8779 1350 0.2757 - - - - - -
8.9437 1360 0.2528 - - - - - -
8.9963 1368 - 0.1483 0.7529 0.7680 0.7759 0.7172 0.7807
9.0095 1370 0.3564 - - - - - -
9.0752 1380 7.1402 - - - - - -
9.1410 1390 9.4364 - - - - - -
9.2067 1400 13.1391 - - - - - -
9.2725 1410 16.7827 - - - - - -
9.3383 1420 13.456 - - - - - -
9.4040 1430 0.5238 - - - - - -
9.4698 1440 0.3073 - - - - - -
9.5356 1450 0.2773 - - - - - -
9.6013 1460 0.2783 - - - - - -
9.6671 1470 0.2645 - - - - - -
9.7328 1480 0.2495 - - - - - -
9.7986 1490 0.2649 - - - - - -
9.8644 1500 0.2655 - - - - - -
9.9301 1510 0.2395 - - - - - -
9.9959 1520 0.2569 0.1453 0.7567 0.772 0.7787 0.7236 0.7817
  • The final row (epoch 9.9959, step 1520) is the saved checkpoint; its per-dimension cosine_map@100 values match the metrics reported in the Evaluation section.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.3.0
  • Datasets: 2.19.1
  • Tokenizers: 0.20.3
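
To approximate this environment, the key packages can be pinned to the versions listed above (the PyTorch/CUDA install varies by platform and is left out):

pip install "sentence-transformers==3.1.1" "transformers==4.45.2" "datasets==2.19.1" "accelerate==1.3.0" "tokenizers==0.20.3"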

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}