---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: >-
The fair value of consideration transferred of $212.1 million consisted
of: (1) cash consideration paid of $211.3 million, net of cash acquired,
and (2) non-cash consideration of $0.8 million representing the portion of
the replacement equity awards issued in connection with the acquisition
that was associated with services rendered through the date of the
acquisition.
sentences:
- >-
What is the monthly cost of a Connected Fitness Subscription if it
includes a combination of a Bike, Tread, Guide, or Row product in the
same household as of June 2022?
- >-
What was the fair value of the total consideration transferred for the
acquisition discussed, and how was it composed?
- >-
How did the Tax Court rule on November 18, 2020, regarding the company's
dispute with the IRS?
- source_sentence: >-
Each of the UK LSA members has agreed, on a several and not joint basis,
to compensate the Company for certain losses which may be incurred by the
Company, Visa Europe or their affiliates as a result of certain existing
and potential litigation relating to the setting and implementation of
domestic multilateral interchange fee rates in the United Kingdom prior to
the closing of the Visa Europe acquisition (Closing), subject to the terms
and conditions set forth therein and, with respect to each UK LSA member,
up to a maximum amount of the up-front cash consideration received by such
UK LSA member. The UK LSA members’ obligations under the UK loss sharing
agreement are conditional upon, among other things, either (a) losses
valued in excess of the sterling equivalent on June 21, 2016 of €1.0
billion having arisen in UK covered claims (and such losses having reduced
the conversion rate of the series B preferred stock accordingly), or (b)
the conversion rate of the series B preferred stock having been reduced to
zero pursuant to losses arising in claims...
sentences:
- >-
Are AbbVie's corporate governance materials available to the public, and
if so, where?
- >-
What conditions must be met for the UK loss sharing agreement to
compensate for losses?
- >-
How much did Delta Air Lines recognize in government grants from the
Payroll Support Programs during the year ended December 31, 2021?
- source_sentence: >-
We provide our customers with an opportunity to trade-in their pre-owned
gaming, mobility, and other products at our stores in exchange for cash or
credit which can be applied towards the purchase of other products.
sentences:
- What is GameStop's trade-in program?
- >-
What were the total unrealized losses on U.S. Treasury securities as of
the last reporting date?
- >-
What methods can a refinery use to meet its Environmental Protection
Agency (EPA) requirements for blending renewable fuels?
- source_sentence: >-
Diluted earnings per share is calculated using our weighted-average
outstanding common shares including the dilutive effect of stock awards as
determined under the treasury stock method.
sentences:
- >-
How do changes in the assumed long-term rate of return affect AbbVie's
net periodic benefit cost for pension plans?
- >-
What are the primary factors discussed in the Management’s Discussion
and Analysis that affect the financial statements year-to-year changes?
- What is the method used to calculate diluted earnings per share?
- source_sentence: >-
Item 8 in the document covers 'Financial Statements and Supplementary
Data'.
sentences:
- What type of information does Item 8 in the document cover?
- >-
What are some of the potential consequences for Meta Platforms, Inc.
from inquiries or investigations as noted in the provided text?
- How is the take rate calculated and what does it represent?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.68
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8242857142857143
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8571428571428571
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8985714285714286
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.68
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.27476190476190476
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1714285714285714
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08985714285714284
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.68
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8242857142857143
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8571428571428571
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8985714285714286
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7931022011968226
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.759021541950113
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7627727073081649
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.6685714285714286
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.82
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.86
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9042857142857142
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6685714285714286
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2733333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.172
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09042857142857141
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6685714285714286
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.82
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.86
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9042857142857142
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7907009828560375
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7540430839002267
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7572918009226873
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.6771428571428572
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8142857142857143
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8571428571428571
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8857142857142857
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6771428571428572
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2714285714285714
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1714285714285714
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08857142857142855
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6771428571428572
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8142857142857143
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8571428571428571
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8857142857142857
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7870155634206691
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7548027210884352
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7592885578023618
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.6542857142857142
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8071428571428572
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8514285714285714
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8857142857142857
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6542857142857142
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.26904761904761904
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.17028571428571426
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08857142857142856
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6542857142857142
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8071428571428572
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8514285714285714
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8857142857142857
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7751084647376248
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.73912925170068
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7430473786684797
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.6157142857142858
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.7771428571428571
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.8214285714285714
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.8728571428571429
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.6157142857142858
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.259047619047619
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.16428571428571428
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.08728571428571427
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.6157142857142858
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.7771428571428571
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.8214285714285714
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.8728571428571429
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.7472883962433147
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.7067517006802716
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.7111439006196084
name: Cosine Map@100
---
BGE base Financial Matryoshka
This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5 on the json dataset described under Training Details. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- json
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation (https://www.sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
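For illustration only, the same three-module pipeline could be assembled by hand with the sentence_transformers models API; this is a sketch, and loading the published checkpoint (shown under Usage below) is the normal route:
from sentence_transformers import SentenceTransformer, models

# Rebuild the three modules shown above: BERT encoder -> [CLS] pooling -> L2 normalization
transformer = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),  # 768
    pooling_mode_cls_token=True,    # BGE pools the [CLS] token
    pooling_mode_mean_tokens=False,
)
model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])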
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Shivam1311/bge-base-financial-matryoshka")
# Run inference
sentences = [
"Item 8 in the document covers 'Financial Statements and Supplementary Data'.",
'What type of information does Item 8 in the document cover?',
'What are some of the potential consequences for Meta Platforms, Inc. from inquiries or investigations as noted in the provided text?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Information Retrieval
- Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
- Evaluated with InformationRetrievalEvaluator
Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
---|---|---|---|---|---|
cosine_accuracy@1 | 0.68 | 0.6686 | 0.6771 | 0.6543 | 0.6157 |
cosine_accuracy@3 | 0.8243 | 0.82 | 0.8143 | 0.8071 | 0.7771 |
cosine_accuracy@5 | 0.8571 | 0.86 | 0.8571 | 0.8514 | 0.8214 |
cosine_accuracy@10 | 0.8986 | 0.9043 | 0.8857 | 0.8857 | 0.8729 |
cosine_precision@1 | 0.68 | 0.6686 | 0.6771 | 0.6543 | 0.6157 |
cosine_precision@3 | 0.2748 | 0.2733 | 0.2714 | 0.269 | 0.259 |
cosine_precision@5 | 0.1714 | 0.172 | 0.1714 | 0.1703 | 0.1643 |
cosine_precision@10 | 0.0899 | 0.0904 | 0.0886 | 0.0886 | 0.0873 |
cosine_recall@1 | 0.68 | 0.6686 | 0.6771 | 0.6543 | 0.6157 |
cosine_recall@3 | 0.8243 | 0.82 | 0.8143 | 0.8071 | 0.7771 |
cosine_recall@5 | 0.8571 | 0.86 | 0.8571 | 0.8514 | 0.8214 |
cosine_recall@10 | 0.8986 | 0.9043 | 0.8857 | 0.8857 | 0.8729 |
cosine_ndcg@10 | 0.7931 | 0.7907 | 0.787 | 0.7751 | 0.7473 |
cosine_mrr@10 | 0.759 | 0.754 | 0.7548 | 0.7391 | 0.7068 |
cosine_map@100 | 0.7628 | 0.7573 | 0.7593 | 0.743 | 0.7111 |
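The table above can be reproduced with InformationRetrievalEvaluator, one evaluator per embedding dimension. A sketch with a toy corpus; the actual held-out evaluation split is not distributed with this card, so the IDs and texts here are hypothetical:
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Toy stand-ins for the real evaluation split
queries = {"q1": "What is GameStop's trade-in program?"}
corpus = {"d1": "We provide our customers with an opportunity to trade-in their pre-owned gaming, mobility, and other products..."}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("Shivam1311/bge-base-financial-matryoshka")
for dim in [768, 512, 256, 128, 64]:
    evaluator = InformationRetrievalEvaluator(
        queries, corpus, relevant_docs, name=f"dim_{dim}", truncate_dim=dim
    )
    print(evaluator(model))  # dict of metric name -> value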
Training Details
Training Dataset
json
- Dataset: json
- Size: 6,300 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 1000 samples:
 | positive | anchor |
---|---|---|
type | string | string |
details | min: 8 tokens, mean: 46.61 tokens, max: 439 tokens | min: 7 tokens, mean: 20.72 tokens, max: 51 tokens |
- Samples:
positive | anchor |
---|---|
Operating costs and expenses increased $80.3 million, or 7.1%, during the year ended December 31, 2023, compared to the year ended December 31, 2022 primarily due to increases in film exhibition and food and beverage costs. | What factors contributed to the escalation in operating costs and expenses in 2023? |
In the United States, the company purchases HFCS to meet its and its bottlers’ requirements with the assistance of Coca-Cola Bottlers’ Sales & Services Company LLC, which is a procurement service provider for their North American operations. | How does the company source high fructose corn syrup (HFCS) in the United States? |
Item 8. Financial Statements and Supplementary Data The index to Financial Statements and Supplementary Data is presented | What is presented in Item 8 according to Financial Statements and Supplementary Data? |
- Loss: MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
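Put together, a hedged sketch of how this dataset and loss could be constructed; the file path and variable names are assumptions, while the column names, dims, and weights come from the card:
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# The card describes a local "json" dataset with "positive" and "anchor"
# columns; the file path here is a placeholder.
train_dataset = load_dataset("json", data_files="train.json", split="train")

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # train on all dimensions at every step
)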
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
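These settings map directly onto SentenceTransformerTrainingArguments. A sketch, where output_dir is an assumption and model, train_dataset, and loss are the objects from the sketch above:
from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # assumption
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids in-batch false negatives
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()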
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0.4061 | 10 | 16.0873 | - | - | - | - | - |
0.8122 | 20 | 8.3282 | - | - | - | - | - |
1.0 | 25 | - | 0.7841 | 0.7796 | 0.7774 | 0.7631 | 0.7320 |
1.2030 | 30 | 5.1781 | - | - | - | - | - |
1.6091 | 40 | 4.0947 | - | - | - | - | - |
2.0 | 50 | 3.9824 | 0.7888 | 0.7867 | 0.7851 | 0.7701 | 0.7401 |
2.4061 | 60 | 2.854 | - | - | - | - | - |
2.8122 | 70 | 2.9878 | - | - | - | - | - |
3.0 | 75 | - | 0.7913 | 0.7903 | 0.7869 | 0.7755 | 0.7469 |
3.2030 | 80 | 2.5653 | - | - | - | - | - |
3.6091 | 90 | 2.999 | - | - | - | - | - |
**3.8528** | **96** | **-** | **0.7931** | **0.7907** | **0.7870** | **0.7751** | **0.7473** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}