---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: Our effective tax rate for fiscal years 2023 and 2022 was 19% and 13%, respectively.
  sentences:
  - What does the Corporate and Other segment include in its composition?
  - What was the effective tax rate for Microsoft in fiscal year 2023?
  - What roles did Elizabeth Rutledge hold before being appointed as Chief Marketing Officer in February 2018?
- source_sentence: Many factors are considered when assessing whether it is more likely than not that the deferred tax assets will be realized, including recent cumulative earnings, expectations of future taxable income, carryforward periods and other relevant quantitative and qualitative factors.
  sentences:
  - What factors are considered when evaluating the realization of deferred tax assets?
  - What are the contents of Item 8 in the financial document?
  - Are goodwill and indefinite-lived intangible assets amortized?
- source_sentence: Cost of net revenues represents costs associated with customer support, site operations, and payment processing. Significant components of these costs primarily consist of employee compensation (including stock-based compensation), contractor costs, facilities costs, depreciation of equipment and amortization expense, bank transaction fees, credit card interchange and assessment fees, authentication costs, shipping costs and digital services tax.
  sentences:
  - What was the total percentage of U.S. dialysis patient service revenues coming from government-based programs in 2023?
  - What are the key components of cost of net revenues?
  - What elements define Ford Credit's balance sheet liquidity profile?
- source_sentence: Net revenue from outside of the United States decreased 15.5% to $34.9 billion in fiscal year 2023.
  sentences:
  - How did the company's net revenue perform internationally in fiscal year 2023?
  - What was the fair value of money market mutual funds measured at as of January 31, 2023 and how was it categorized in the fair value hierarchy?
  - How much did professional services expenses increase in 2023 from the previous year?
- source_sentence: Marketplace revenue increased $86.3 million to $2.0 billion in the year ended December 31, 2023 compared to the year ended December 31, 2022.
  sentences:
  - What were the main factors considered in the audit process to evaluate the self-insurance reserve?
  - How much did Marketplace revenue increase in the year ended December 31, 2023?
  - Why did operations and support expenses decrease in 2023, and what factors offset this decrease?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.7
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8285714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8785714285714286
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9085714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27619047619047615
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17571428571428568
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09085714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8285714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8785714285714286
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9085714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8070713920635244
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.774145124716553
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7778677437532947
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.6942857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.83
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8728571428571429
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9042857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6942857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27666666666666667
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17457142857142854
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09042857142857143
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6942857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.83
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8728571428571429
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9042857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8031148082413071
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.770209750566893
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7742865136346454
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6828571428571428
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8242857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8657142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9042857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6828571428571428
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2747619047619047
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17314285714285713
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09042857142857143
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6828571428571428
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8242857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8657142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9042857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7969921030232127
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.762270975056689
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7658165867130817
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.68
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8085714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8514285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8842857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.68
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2695238095238095
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17028571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08842857142857141
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.68
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8085714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8514285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8842857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7840025892817639
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.751556689342403
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7563834249655896
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6371428571428571
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7814285714285715
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8271428571428572
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8728571428571429
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6371428571428571
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2604761904761905
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1654285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08728571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6371428571428571
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7814285714285715
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8271428571428572
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8728571428571429
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7566246856089167
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7193163265306118
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7237471572016445
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
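
Because training used MatryoshkaLoss over the dimensions 768, 512, 256, 128, and 64, a leading slice of each embedding can stand in for the full vector, at some cost in retrieval quality (see the per-dimension evaluation tables in this card). The following is a minimal sketch of that truncation in plain Python. The vectors are made-up stand-ins rather than real model outputs, and the helper names `truncate_and_normalize` and `cosine` are illustrative, not part of the Sentence Transformers API.

```python
import math
from typing import List, Sequence

# Dimensions this card lists for MatryoshkaLoss.
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if dim <= 0 or dim > len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero slice: nothing to rescale
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 768-dimensional vectors (hypothetical values, NOT real embeddings).
query = [math.sin(0.01 * i) for i in range(768)]
passage = [math.sin(0.01 * i + 0.1) for i in range(768)]

for dim in MATRYOSHKA_DIMS:
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

With the real model, the same slicing would be applied to the rows returned by `model.encode(...)`. Recent Sentence Transformers releases also offer a `truncate_dim` argument on the `SentenceTransformer` constructor for this purpose; check the documentation for your installed version before relying on it.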
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("viggypoker1/bge-base-financial-matryoshka")

# Run inference
sentences = [
    'Marketplace revenue increased $86.3 million to $2.0 billion in the year ended December 31, 2023 compared to the year ended December 31, 2022.',
    'How much did Marketplace revenue increase in the year ended December 31, 2023?',
    'Why did operations and support expenses decrease in 2023, and what factors offset this decrease?',
]
embeddings = model.encode(sentences)  # NumPy array, one row per sentence
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # torch.Tensor
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7        |
| cosine_accuracy@3   | 0.8286     |
| cosine_accuracy@5   | 0.8786     |
| cosine_accuracy@10  | 0.9086     |
| cosine_precision@1  | 0.7        |
| cosine_precision@3  | 0.2762     |
| cosine_precision@5  | 0.1757     |
| cosine_precision@10 | 0.0909     |
| cosine_recall@1     | 0.7        |
| cosine_recall@3     | 0.8286     |
| cosine_recall@5     | 0.8786     |
| cosine_recall@10    | 0.9086     |
| cosine_ndcg@10      | 0.8071     |
| cosine_mrr@10       | 0.7741     |
| **cosine_map@100**  | **0.7779** |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6943     |
| cosine_accuracy@3   | 0.83       |
| cosine_accuracy@5   | 0.8729     |
| cosine_accuracy@10  | 0.9043     |
| cosine_precision@1  | 0.6943     |
| cosine_precision@3  | 0.2767     |
| cosine_precision@5  | 0.1746     |
| cosine_precision@10 | 0.0904     |
| cosine_recall@1     | 0.6943     |
| cosine_recall@3     | 0.83       |
| cosine_recall@5     | 0.8729     |
| cosine_recall@10    | 0.9043     |
| cosine_ndcg@10      | 0.8031     |
| cosine_mrr@10       | 0.7702     |
| **cosine_map@100**  | **0.7743** |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6829     |
| cosine_accuracy@3   | 0.8243     |
| cosine_accuracy@5   | 0.8657     |
| cosine_accuracy@10  | 0.9043     |
| cosine_precision@1  | 0.6829     |
| cosine_precision@3  | 0.2748     |
| cosine_precision@5  | 0.1731     |
| cosine_precision@10 | 0.0904     |
| cosine_recall@1     | 0.6829     |
| cosine_recall@3     | 0.8243     |
| cosine_recall@5     | 0.8657     |
| cosine_recall@10    | 0.9043     |
| cosine_ndcg@10      | 0.797      |
| cosine_mrr@10       | 0.7623     |
| **cosine_map@100**  | **0.7658** |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.68       |
| cosine_accuracy@3   | 0.8086     |
| cosine_accuracy@5   | 0.8514     |
| cosine_accuracy@10  | 0.8843     |
| cosine_precision@1  | 0.68       |
| cosine_precision@3  | 0.2695     |
| cosine_precision@5  | 0.1703     |
| cosine_precision@10 | 0.0884     |
| cosine_recall@1     | 0.68       |
| cosine_recall@3     | 0.8086     |
| cosine_recall@5     | 0.8514     |
| cosine_recall@10    | 0.8843     |
| cosine_ndcg@10      | 0.784      |
| cosine_mrr@10       | 0.7516     |
| **cosine_map@100**  | **0.7564** |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6371     |
| cosine_accuracy@3   | 0.7814     |
| cosine_accuracy@5   | 0.8271     |
| cosine_accuracy@10  | 0.8729     |
| cosine_precision@1  | 0.6371     |
| cosine_precision@3  | 0.2605     |
| cosine_precision@5  | 0.1654     |
| cosine_precision@10 | 0.0873     |
| cosine_recall@1     | 0.6371     |
| cosine_recall@3     | 0.7814     |
| cosine_recall@5     | 0.8271     |
| cosine_recall@10    | 0.8729     |
| cosine_ndcg@10      | 0.7566     |
| cosine_mrr@10       | 0.7193     |
| **cosine_map@100**  | **0.7237** |

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 6,300 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:

|         | positive | anchor |
|:--------|:---------|:-------|
| type    | string   | string |
| details |          |        |

* Samples:

| positive | anchor |
|:---------|:-------|
| GM Financial's penetration of our retail sales in the U.S. was 42% in the year ended December 31, 2023, compared to 43% in the corresponding period in 2022. | How did the penetration rate of GM Financial's retail sales in the U.S. change from 2022 to 2023? |
| Net cash provided by operating activities decreased by $2.0 billion in fiscal 2022 compared to fiscal 2021. | How did the cash flow from operating activities change in fiscal 2022 compared to fiscal 2021? |
| Total revenues increased $8.2 billion, or 7.5%, in 2023 compared to 2022. The increase was primarily driven by pharmacy drug mix, increased prescription volume, brand inflation, and increased contributions from vaccinations. | How much did total revenues increase in 2023 compared to the previous year? |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```

### Evaluation Dataset

#### json

* Dataset: json
* Size: 700 evaluation samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 700 samples:

|         | positive | anchor |
|:--------|:---------|:-------|
| type    | string   | string |
| details |          |        |

* Samples:

| positive | anchor |
|:---------|:-------|
| Using these constant rates, total revenue and advertising revenue would have been $374 million and $379 million lower than actual total revenue and advertising revenue, respectively, for the full year 2023. | How much would total revenue and advertising revenue have been lower in 2023 using constant foreign exchange rates compared to actual figures? |
| Interest expense increased $42.9 million to $348.8 million for the year ended December 31, 2023, compared to $305.9 million during the year ended December 31, 2022. | What was the total interest expense for the year ended December 31, 2023? |
| Net cash provided by operating activities increased $183.3 million in 2022 compared to 2021 primarily as a result of higher current year earnings, net of non-cash items, and smaller decreases in liability balances, partially offset by higher inventory levels and a smaller increase in accounts payable. | How much did net cash provided by operating activities increase in 2022 compared to 2021? |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `fp16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
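
The batch settings interact: 32 samples per device with 16 gradient accumulation steps gives an effective batch of 512 samples per optimizer step. The sketch below is a small arithmetic check that reproduces the fractional epoch values in the Training Logs table from those numbers. It assumes a single device and that one epoch yields ceil(6300 / 32) = 197 mini-batches; both are inferences from the logged values, not something the card states. The function name `epoch_at_step` is illustrative.

```python
import math

# Values taken from this card.
TRAIN_SAMPLES = 6300
PER_DEVICE_TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 16

# Assumption: single device, last partial batch kept (dataloader_drop_last is False).
BATCHES_PER_EPOCH = math.ceil(TRAIN_SAMPLES / PER_DEVICE_TRAIN_BATCH_SIZE)
EFFECTIVE_BATCH_SIZE = PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step * GRADIENT_ACCUMULATION_STEPS / BATCHES_PER_EPOCH


print(BATCHES_PER_EPOCH)     # 197
print(EFFECTIVE_BATCH_SIZE)  # 512

# Steps that appear in the Training Logs table.
for step in (10, 12, 20, 24, 30, 36, 40, 48):
    print(step, round(epoch_at_step(step), 4))
# 10 0.8122
# 12 0.9746
# 20 1.6244
# 24 1.9492
# 30 2.4365
# 36 2.9239
# 40 3.2487
# 48 3.8985
```

Under these assumptions each epoch spans about 12.3 optimizer steps, which is why the per-epoch evaluations land at steps 12, 24, 36, and 48 with epoch values just below a whole number.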

### Training Logs

| Epoch      | Step   | Training Loss | loss       | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:----------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.8122     | 10     | 1.6144        | -          | -                      | -                      | -                      | -                     | -                      |
| 0.9746     | 12     | -             | 0.2439     | 0.7301                 | 0.7428                 | 0.7539                 | 0.6957                | 0.7607                 |
| 1.6244     | 20     | 0.6547        | -          | -                      | -                      | -                      | -                     | -                      |
| 1.9492     | 24     | -             | 0.1966     | 0.7496                 | 0.7631                 | 0.7729                 | 0.7187                | 0.7733                 |
| 2.4365     | 30     | 0.4734        | -          | -                      | -                      | -                      | -                     | -                      |
| 2.9239     | 36     | -             | 0.1822     | 0.7556                 | 0.7643                 | 0.7743                 | 0.7242                | 0.7756                 |
| 3.2487     | 40     | 0.3833        | -          | -                      | -                      | -                      | -                     | -                      |
| **3.8985** | **48** | **-**         | **0.1794** | **0.7564**             | **0.7658**             | **0.7743**             | **0.7237**            | **0.7779**             |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.8.10
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.1.2+cu121
- Accelerate: 1.0.1
- Datasets: 2.19.1
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```