---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: 'The fair value of consideration transferred of $212.1 million
    consisted of: (1) cash consideration paid of $211.3 million, net of cash acquired,
    and (2) non-cash consideration of $0.8 million representing the portion of the
    replacement equity awards issued in connection with the acquisition that was associated
    with services rendered through the date of the acquisition.'
  sentences:
  - What is the monthly cost of a Connected Fitness Subscription if it includes a
    combination of a Bike, Tread, Guide, or Row product in the same household as of
    June 2022?
  - What was the fair value of the total consideration transferred for the acquisition
    discussed, and how was it composed?
  - How did the Tax Court rule on November 18, 2020, regarding the company's dispute
    with the IRS?
- source_sentence: Each of the UK LSA members has agreed, on a several and not joint
    basis, to compensate the Company for certain losses which may be incurred by the
    Company, Visa Europe or their affiliates as a result of certain existing and potential
    litigation relating to the setting and implementation of domestic multilateral
    interchange fee rates in the United Kingdom prior to the closing of the Visa Europe
    acquisition (Closing), subject to the terms and conditions set forth therein and,
    with respect to each UK LSA member, up to a maximum amount of the up-front cash
    consideration received by such UK LSA member. The UK LSA members’ obligations
    under the UK loss sharing agreement are conditional upon, among other things,
    either (a) losses valued in excess of the sterling equivalent on June 21, 2016
    of €1.0 billion having arisen in UK covered claims (and such losses having reduced
    the conversion rate of the series B preferred stock accordingly), or (b) the conversion
    rate of the series B preferred stock having been reduced to zero pursuant to losses
    arising in claims...
  sentences:
  - Are AbbVie's corporate governance materials available to the public, and if so,
    where?
  - What conditions must be met for the UK loss sharing agreement to compensate for
    losses?
  - How much did Delta Air Lines recognize in government grants from the Payroll Support
    Programs during the year ended December 31, 2021?
- source_sentence: We provide our customers with an opportunity to trade-in their
    pre-owned gaming, mobility, and other products at our stores in exchange for cash
    or credit which can be applied towards the purchase of other products.
  sentences:
  - What is GameStop's trade-in program?
  - What were the total unrealized losses on U.S. Treasury securities as of the last
    reporting date?
  - What methods can a refinery use to meet its Environmental Protection Agency (EPA)
    requirements for blending renewable fuels?
- source_sentence: Diluted earnings per share is calculated using our weighted-average
    outstanding common shares including the dilutive effect of stock awards as determined
    under the treasury stock method.
  sentences:
  - How do changes in the assumed long-term rate of return affect AbbVie's net periodic
    benefit cost for pension plans?
  - What are the primary factors discussed in the Management’s Discussion and Analysis
    that affect the financial statements year-to-year changes?
  - What is the method used to calculate diluted earnings per share?
- source_sentence: Item 8 in the document covers 'Financial Statements and Supplementary
    Data'.
  sentences:
  - What type of information does Item 8 in the document cover?
  - What are some of the potential consequences for Meta Platforms, Inc. from inquiries
    or investigations as noted in the provided text?
  - How is the take rate calculated and what does it represent?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.68
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8242857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8571428571428571
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8985714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.68
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27476190476190476
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1714285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08985714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.68
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8242857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8571428571428571
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8985714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7931022011968226
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.759021541950113
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7627727073081649
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.6685714285714286
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.82
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.86
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9042857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6685714285714286
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2733333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.172
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09042857142857141
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6685714285714286
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.82
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.86
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9042857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7907009828560375
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7540430839002267
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7572918009226873
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6771428571428572
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8142857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8571428571428571
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8857142857142857
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6771428571428572
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2714285714285714
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1714285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08857142857142855
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6771428571428572
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8142857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8571428571428571
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8857142857142857
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7870155634206691
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7548027210884352
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7592885578023618
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6542857142857142
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8071428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8514285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8857142857142857
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6542857142857142
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26904761904761904
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17028571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08857142857142856
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6542857142857142
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8071428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8514285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8857142857142857
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7751084647376248
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.73912925170068
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7430473786684797
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6157142857142858
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7771428571428571
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8214285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8728571428571429
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6157142857142858
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.259047619047619
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16428571428571428
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08728571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6157142857142858
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7771428571428571
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8214285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8728571428571429
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7472883962433147
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7067517006802716
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7111439006196084
      name: Cosine Map@100
---

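# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on a JSON dataset of 6,300 question/passage pairs from financial documents. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with `MatryoshkaLoss`, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions at a modest cost in retrieval quality.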

## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
    - json
- Language: en
- License: apache-2.0

### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
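Read top to bottom, the stack runs BERT over the input, keeps only the first ([CLS]) token vector (`pooling_mode_cls_token: True`), and rescales that vector to unit length, so cosine similarity and dot product rank results identically. The sketch below mimics modules (1) and (2) on toy numbers in plain Python; the helper name is hypothetical and this is not the library's implementation.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Hypothetical helper mimicking Pooling (CLS mode) followed by Normalize."""
    cls_vector = token_embeddings[0]  # CLS pooling: keep only the first token's vector
    norm = math.sqrt(sum(x * x for x in cls_vector))
    return [x / norm for x in cls_vector]  # L2-normalize to unit length


# Toy output of module (0): 3 tokens with hidden size 4 (the real model uses 768).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, -0.5, 0.0, 2.0],
]
embedding = cls_pool_and_normalize(token_embeddings)
print(embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Only the [CLS] row contributes; the other token vectors are ignored, which is why the choice of pooling mode matters when reproducing embeddings outside Sentence Transformers.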

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Shivam1311/bge-base-financial-matryoshka")
# Run inference
sentences = [
    "Item 8 in the document covers 'Financial Statements and Supplementary Data'.",
    'What type of information does Item 8 in the document cover?',
    'What are some of the potential consequences for Meta Platforms, Inc. from inquiries or investigations as noted in the provided text?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
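Because the model was trained with `MatryoshkaLoss`, the leading 512, 256, 128 or 64 values of each embedding are meant to be usable on their own. Sentence Transformers can return such embeddings directly via the `truncate_dim` argument of `SentenceTransformer(...)`. The plain-Python sketch below only illustrates what truncation amounts to, on toy 4-dimensional vectors; the helper names are hypothetical. A truncated slice is no longer unit length, so it is renormalized before a dot product is used as the score.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def truncate(embedding, dim):
    """Keep the first `dim` (Matryoshka) dimensions and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


# Toy stand-ins for full-size embeddings of a query and a passage.
query = [3.0, 4.0, 1.0, -2.0]
passage = [3.0, 4.0, -2.0, 1.0]

full_score = cosine(query, passage)
short_q, short_p = truncate(query, 2), truncate(passage, 2)
short_score = sum(x * y for x, y in zip(short_q, short_p))  # dot product == cosine after renormalizing
print(round(full_score, 4), round(short_score, 4))  # 0.7 1.0
```

The toy vectors agree on their first two values and differ only in the last two, so the truncated score (1.0) overstates the full one (0.7). That loss of resolution is what the `dim_*` columns under Evaluation quantify for this model: nDCG@10 falls from 0.7931 at 768 dimensions to 0.7473 at 64.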


## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64` (one evaluation set, scored with embeddings truncated to the first 768, 512, 256, 128 and 64 dimensions)
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--------------------|:--------|:--------|:--------|:--------|:-------|
| cosine_accuracy@1   | 0.68    | 0.6686  | 0.6771  | 0.6543  | 0.6157 |
| cosine_accuracy@3   | 0.8243  | 0.82    | 0.8143  | 0.8071  | 0.7771 |
| cosine_accuracy@5   | 0.8571  | 0.86    | 0.8571  | 0.8514  | 0.8214 |
| cosine_accuracy@10  | 0.8986  | 0.9043  | 0.8857  | 0.8857  | 0.8729 |
| cosine_precision@1  | 0.68    | 0.6686  | 0.6771  | 0.6543  | 0.6157 |
| cosine_precision@3  | 0.2748  | 0.2733  | 0.2714  | 0.269   | 0.259  |
| cosine_precision@5  | 0.1714  | 0.172   | 0.1714  | 0.1703  | 0.1643 |
| cosine_precision@10 | 0.0899  | 0.0904  | 0.0886  | 0.0886  | 0.0873 |
| cosine_recall@1     | 0.68    | 0.6686  | 0.6771  | 0.6543  | 0.6157 |
| cosine_recall@3     | 0.8243  | 0.82    | 0.8143  | 0.8071  | 0.7771 |
| cosine_recall@5     | 0.8571  | 0.86    | 0.8571  | 0.8514  | 0.8214 |
| cosine_recall@10    | 0.8986  | 0.9043  | 0.8857  | 0.8857  | 0.8729 |
| cosine_ndcg@10      | 0.7931  | 0.7907  | 0.787   | 0.7751  | 0.7473 |
| cosine_mrr@10       | 0.759   | 0.754   | 0.7548  | 0.7391  | 0.7068 |
| cosine_map@100      | 0.7628  | 0.7573  | 0.7593  | 0.743   | 0.7111 |
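In every column, recall@k equals accuracy@k and precision@k equals accuracy@k / k (for example, 0.8243 / 3 ≈ 0.2748 for dim_768), which is what one expects when each query has exactly one relevant passage. Under that assumption the metrics reduce to the plain-Python sketch below. The `evaluate` helper is hypothetical and follows the standard metric definitions; it is not the `InformationRetrievalEvaluator` source, and it omits MAP@100.

```python
import math


def evaluate(rankings, relevant, k=10):
    """Average accuracy@k, precision@k, recall@k, MRR@k and nDCG@k over queries.

    rankings: query id -> list of doc ids, best first
    relevant: query id -> its single relevant doc id (assumption: one per query)
    """
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "mrr": 0.0, "ndcg": 0.0}
    for qid, ranked in rankings.items():
        top_k = ranked[:k]
        if relevant[qid] in top_k:
            rank = top_k.index(relevant[qid]) + 1  # 1-based position of the hit
            totals["accuracy"] += 1.0
            totals["precision"] += 1.0 / k  # one relevant doc among k retrieved
            totals["recall"] += 1.0  # the only relevant doc was found
            totals["mrr"] += 1.0 / rank
            # Binary gain at `rank`; the ideal DCG is 1 / log2(2) = 1 with one relevant doc
            totals["ndcg"] += 1.0 / math.log2(rank + 1)
    n = len(rankings)
    return {name: value / n for name, value in totals.items()}


rankings = {
    "q1": ["d1", "d7", "d4"],  # relevant doc ranked 1st
    "q2": ["d5", "d9", "d2"],  # relevant doc ranked 3rd
    "q3": ["d8", "d6", "d0"],  # relevant doc not retrieved
}
relevant = {"q1": "d1", "q2": "d2", "q3": "d3"}
scores = evaluate(rankings, relevant, k=10)
print({name: round(v, 4) for name, v in scores.items()})
# {'accuracy': 0.6667, 'precision': 0.0667, 'recall': 0.6667, 'mrr': 0.4444, 'ndcg': 0.5}
```

MRR and nDCG reward a hit at rank 1 more than a hit at rank 3, while accuracy, precision and recall only check whether the passage appears in the top k at all.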


## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details | <ul><li>min: 8 tokens</li><li>mean: 46.61 tokens</li><li>max: 439 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.72 tokens</li><li>max: 51 tokens</li></ul> |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | <code>Operating costs and expenses increased $80.3 million, or 7.1%, during the year ended December 31, 2023, compared to the year ended December 31, 2022 primarily due to increases in film exhibition and food and beverage costs.</code> | <code>What factors contributed to the escalation in operating costs and expenses in 2023?</code> |
  | <code>In the United States, the company purchases HFCS to meet its and its bottlers’ requirements with the assistance of Coca-Cola Bottlers’ Sales & Services Company LLC, which is a procurement service provider for their North American operations.</code> | <code>How does the company source high fructose corn syrup (HFCS) in the United States?</code> |
  | <code>Item 8. Financial Statements and Supplementary Data The index to Financial Statements and Supplementary Data is presented</code> | <code>What is presented in Item 8 according to Financial Statements and Supplementary Data?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
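`MultipleNegativesRankingLoss` treats the other positives in a batch as negatives for each anchor, and `MatryoshkaLoss` applies it at every prefix length in `matryoshka_dims`, combining the results with `matryoshka_weights` (`n_dims_per_step: -1` means all five lengths are used on every step). The plain-Python sketch below assumes the library defaults of cosine similarity with a scale of 20.0 for the inner loss and that the weighted per-dimension losses are summed; it illustrates the computation and is not the library's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl(anchors, positives, scale=20.0):
    """In-batch negatives: anchor i should score highest against positive i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Cross-entropy with target i, computed as logsumexp(logits) - logits[i]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_mnrl(anchors, positives, dims, weights):
    """Weighted sum of MNRL over embeddings truncated to each prefix length in `dims`."""
    return sum(
        w * mnrl([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Toy batch of two (anchor, positive) pairs with 4-dim embeddings;
# the real run uses dims [768, 512, 256, 128, 64] with all weights 1.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.0, 0.3], [0.1, 0.9, 0.3, 0.0]]
loss = matryoshka_mnrl(anchors, positives, dims=[4, 2], weights=[1, 1])
print(round(loss, 4))
```

Because the short prefixes are penalized just as heavily as the full vector, the model is pushed to place the most discriminative information in the leading dimensions.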

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
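`batch_sampler: no_duplicates` matters because `MultipleNegativesRankingLoss` uses the other positives in a micro-batch as negatives: if the same passage appeared twice in one batch, it would be scored as a negative for an anchor it actually answers. With `per_device_train_batch_size: 16`, each anchor sees 15 in-batch negatives; `gradient_accumulation_steps: 16` raises the number of pairs per optimizer update to 256 but does not enlarge that negative pool. The greedy batching below is a simplified, order-preserving sketch of the no-duplicates idea (the library's sampler also shuffles); the function name is hypothetical.

```python
def no_duplicate_batches(samples, batch_size):
    """Hypothetical sketch: group samples so no text value repeats within a batch.

    samples: list of dicts such as {"anchor": ..., "positive": ...}
    Returns a list of batches, each a list of sample indices.
    """
    remaining = list(range(len(samples)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            values = set(samples[idx].values())
            if len(batch) < batch_size and not (values & seen):
                batch.append(idx)
                seen |= values
            else:
                deferred.append(idx)  # clashes with this batch (or batch is full): try again later
        batches.append(batch)
        remaining = deferred
    return batches


samples = [
    {"anchor": "q1", "positive": "p1"},
    {"anchor": "q2", "positive": "p1"},  # same passage as sample 0
    {"anchor": "q3", "positive": "p3"},
    {"anchor": "q4", "positive": "p4"},
]
print(no_duplicate_batches(samples, batch_size=2))  # [[0, 2], [1, 3]]
```

Samples 0 and 1 share the passage `p1`, so sample 1 is deferred to the next batch instead of becoming a false negative for sample 0.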

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.4061     | 10     | 16.0873       | -                      | -                      | -                      | -                      | -                     |
| 0.8122     | 20     | 8.3282        | -                      | -                      | -                      | -                      | -                     |
| 1.0        | 25     | -             | 0.7841                 | 0.7796                 | 0.7774                 | 0.7631                 | 0.7320                |
| 1.2030     | 30     | 5.1781        | -                      | -                      | -                      | -                      | -                     |
| 1.6091     | 40     | 4.0947        | -                      | -                      | -                      | -                      | -                     |
| 2.0        | 50     | 3.9824        | 0.7888                 | 0.7867                 | 0.7851                 | 0.7701                 | 0.7401                |
| 2.4061     | 60     | 2.854         | -                      | -                      | -                      | -                      | -                     |
| 2.8122     | 70     | 2.9878        | -                      | -                      | -                      | -                      | -                     |
| 3.0        | 75     | -             | 0.7913                 | 0.7903                 | 0.7869                 | 0.7755                 | 0.7469                |
| 3.2030     | 80     | 2.5653        | -                      | -                      | -                      | -                      | -                     |
| 3.6091     | 90     | 2.999         | -                      | -                      | -                      | -                      | -                     |
| **3.8528** | **96** | **-**         | **0.7931**             | **0.7907**             | **0.7870**             | **0.7751**             | **0.7473**            |

* The bold row denotes the saved checkpoint.
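The logged epoch fractions, and the fact that training ends at step 96 rather than completing a fourth epoch, follow from the batch arithmetic. The sketch below reproduces the logged numbers under three assumptions inferred from the table, not taken from the training script: the dataloader keeps its last partial batch (`dataloader_drop_last: False`), the schedule length is sized as `(batches per epoch // gradient_accumulation_steps) × epochs`, and each real epoch ends with one extra optimizer update for its leftover micro-batches.

```python
import math

train_samples = 6300  # "Size: 6,300 training samples"
per_device_batch = 16  # per_device_train_batch_size
grad_accum = 16  # gradient_accumulation_steps
epochs = 4  # num_train_epochs

# dataloader_drop_last=False keeps the final partial micro-batch
micro_batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 394
# Assumption: the schedule length is sized with floor division per epoch
updates_per_epoch = micro_batches_per_epoch // grad_accum  # 24
max_steps = updates_per_epoch * epochs  # 96
warmup_steps = math.ceil(0.1 * max_steps)  # warmup_ratio=0.1 -> 10
# Assumption: each real epoch ends with an extra update for its 10 leftover micro-batches
steps_per_real_epoch = math.ceil(micro_batches_per_epoch / grad_accum)  # 25


def epoch_at(step):
    """Fractional epoch reported at a given optimizer step."""
    full_epochs, steps_into_epoch = divmod(step, steps_per_real_epoch)
    return full_epochs + steps_into_epoch * grad_accum / micro_batches_per_epoch


print(max_steps, warmup_steps)  # 96 10
print([round(epoch_at(s), 4) for s in (10, 25, 30, 96)])  # [0.4061, 1.0, 1.203, 3.8528]
```

Under these assumptions, each epoch takes 25 updates but the schedule budgets only 24 per epoch, so the fourth epoch stops after 21 of its 25 updates (epoch 3.8528, step 96).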

### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
