---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
- source_sentence: Our effective tax rate for fiscal years 2023 and 2022 was 19% and
    13%, respectively.
  sentences:
  - What does the Corporate and Other segment include in its composition?
  - What was the effective tax rate for Microsoft in fiscal year 2023?
  - What roles did Elizabeth Rutledge hold before being appointed as Chief Marketing
    Officer in February 2018?
- source_sentence: Many factors are considered when assessing whether it is more likely
    than not that the deferred tax assets will be realized, including recent cumulative
    earnings, expectations of future taxable income, carryforward periods and other
    relevant quantitative and qualitative factors.
  sentences:
  - What factors are considered when evaluating the realization of deferred tax assets?
  - What are the contents of Item 8 in the financial document?
  - Are goodwill and indefinite-lived intangible assets amortized?
- source_sentence: Cost of net revenues represents costs associated with customer
    support, site operations, and payment processing. Significant components of these
    costs primarily consist of employee compensation (including stock-based compensation),
    contractor costs, facilities costs, depreciation of equipment and amortization
    expense, bank transaction fees, credit card interchange and assessment fees, authentication
    costs, shipping costs and digital services tax.
  sentences:
  - What was the total percentage of U.S. dialysis patient service revenues coming
    from government-based programs in 2023?
  - What are the key components of cost of net revenues?
  - What elements define Ford Credit's balance sheet liquidity profile?
- source_sentence: Net revenue from outside of the United States decreased 15.5% to
    $34.9 billion in fiscal year 2023.
  sentences:
  - How did the company's net revenue perform internationally in fiscal year 2023?
  - What was the fair value of money market mutual funds measured at as of January
    31, 2023 and how was it categorized in the fair value hierarchy?
  - How much did professional services expenses increase in 2023 from the previous
    year?
- source_sentence: Marketplace revenue increased $86.3 million to $2.0 billion in
    the year ended December 31, 2023 compared to the year ended December 31, 2022.
  sentences:
  - What were the main factors considered in the audit process to evaluate the self-insurance
    reserve?
  - How much did Marketplace revenue increase in the year ended December 31, 2023?
  - Why did operations and support expenses decrease in 2023, and what factors offset
    this decrease?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.7
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8285714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8785714285714286
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9085714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27619047619047615
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17571428571428568
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09085714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8285714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8785714285714286
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9085714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8070713920635244
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.774145124716553
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7778677437532947
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.6942857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.83
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8728571428571429
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9042857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6942857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27666666666666667
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17457142857142854
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09042857142857143
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6942857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.83
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8728571428571429
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9042857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8031148082413071
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.770209750566893
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7742865136346454
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6828571428571428
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8242857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8657142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9042857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6828571428571428
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2747619047619047
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17314285714285713
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09042857142857143
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6828571428571428
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8242857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8657142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9042857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7969921030232127
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.762270975056689
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7658165867130817
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.68
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8085714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8514285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8842857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.68
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2695238095238095
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17028571428571426
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08842857142857141
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.68
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8085714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8514285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8842857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7840025892817639
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.751556689342403
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7563834249655896
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6371428571428571
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7814285714285715
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8271428571428572
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8728571428571429
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6371428571428571
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2604761904761905
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1654285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08728571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6371428571428571
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7814285714285715
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8271428571428572
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8728571428571429
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7566246856089167
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7193163265306118
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7237471572016445
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

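This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on a JSON dataset of 6,300 financial passage and question pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with `MatryoshkaLoss`, embeddings truncated to 512, 256, 128, or 64 dimensions were also evaluated for retrieval.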

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
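
Read top to bottom, the pipeline takes the BERT token embeddings, keeps only the vector of the first (`[CLS]`) token (`pooling_mode_cls_token: True`), and L2-normalizes it. The following is a minimal sketch of those last two modules on made-up numbers, using only the standard library; `cls_pool_and_normalize` is an illustrative name, not a library function.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Mimic modules (1) and (2): take the first token's vector, then L2-normalize it."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    return [x / norm for x in cls_vector]

# Toy sequence of 3 tokens with 4-dimensional hidden states (made-up values).
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # position 0 plays the role of [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the cosine similarity of two such embeddings equals their dot product.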

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("viggypoker1/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'Marketplace revenue increased $86.3 million to $2.0 billion in the year ended December 31, 2023 compared to the year ended December 31, 2022.',
    'How much did Marketplace revenue increase in the year ended December 31, 2023?',
    'Why did operations and support expenses decrease in 2023, and what factors offset this decrease?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
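
Because the model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128, and 64, a prefix of each embedding can stand in for the full vector at lower storage cost. The sketch below shows the truncate-then-renormalize step on toy vectors using only the standard library. `truncate_and_normalize` and `cosine` are illustrative helper names, and the 8-dimensional vectors are made up for the example, not model output.

```python
import math

def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    prefix = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix
    return [x / norm for x in prefix]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional "embeddings" (made-up values, not model output).
passage = [0.9, 0.1, 0.3, 0.2, 0.05, 0.4, 0.1, 0.2]
question = [0.8, 0.2, 0.25, 0.1, 0.3, 0.1, 0.4, 0.05]

# Compare the same pair at full size and at two shorter prefixes.
for dim in (8, 4, 2):
    p = truncate_and_normalize(passage, dim)
    q = truncate_and_normalize(question, dim)
    print(dim, round(cosine(p, q), 4))
```

With the real model, the `truncate_dim` argument of the `SentenceTransformer` constructor should handle the truncation step at encode time, so a helper like this is mainly useful for embeddings that are already stored at full size.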

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7        |
| cosine_accuracy@3   | 0.8286     |
| cosine_accuracy@5   | 0.8786     |
| cosine_accuracy@10  | 0.9086     |
| cosine_precision@1  | 0.7        |
| cosine_precision@3  | 0.2762     |
| cosine_precision@5  | 0.1757     |
| cosine_precision@10 | 0.0909     |
| cosine_recall@1     | 0.7        |
| cosine_recall@3     | 0.8286     |
| cosine_recall@5     | 0.8786     |
| cosine_recall@10    | 0.9086     |
| cosine_ndcg@10      | 0.8071     |
| cosine_mrr@10       | 0.7741     |
| **cosine_map@100**  | **0.7779** |
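
The metrics in the table above all follow from one quantity per query: the rank at which the relevant passage is retrieved. The sketch below assumes each query has exactly one relevant passage, which is consistent with the reported numbers: recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (for example, 0.8286 / 3 ≈ 0.2762). `retrieval_metrics` and the toy ranks are illustrative and are not the `InformationRetrievalEvaluator` implementation.

```python
import math

def retrieval_metrics(ranks, k):
    """Compute retrieval metrics at cutoff k.

    `ranks` holds, per query, the 1-based rank of its single relevant
    passage, or None if the passage was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    # Reciprocal rank counts only when the passage appears in the top k.
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    # With one relevant passage the ideal DCG is 1, so NDCG equals DCG.
    ndcg = sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,
        "recall": accuracy,
        "mrr": mrr,
        "ndcg": ndcg,
    }

# Made-up ranks for ten queries.
toy_ranks = [1, 1, 3, 2, None, 7, 1, 12, 1, 4]
for k in (1, 3, 10):
    print(k, retrieval_metrics(toy_ranks, k))
```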

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6943     |
| cosine_accuracy@3   | 0.83       |
| cosine_accuracy@5   | 0.8729     |
| cosine_accuracy@10  | 0.9043     |
| cosine_precision@1  | 0.6943     |
| cosine_precision@3  | 0.2767     |
| cosine_precision@5  | 0.1746     |
| cosine_precision@10 | 0.0904     |
| cosine_recall@1     | 0.6943     |
| cosine_recall@3     | 0.83       |
| cosine_recall@5     | 0.8729     |
| cosine_recall@10    | 0.9043     |
| cosine_ndcg@10      | 0.8031     |
| cosine_mrr@10       | 0.7702     |
| **cosine_map@100**  | **0.7743** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6829     |
| cosine_accuracy@3   | 0.8243     |
| cosine_accuracy@5   | 0.8657     |
| cosine_accuracy@10  | 0.9043     |
| cosine_precision@1  | 0.6829     |
| cosine_precision@3  | 0.2748     |
| cosine_precision@5  | 0.1731     |
| cosine_precision@10 | 0.0904     |
| cosine_recall@1     | 0.6829     |
| cosine_recall@3     | 0.8243     |
| cosine_recall@5     | 0.8657     |
| cosine_recall@10    | 0.9043     |
| cosine_ndcg@10      | 0.797      |
| cosine_mrr@10       | 0.7623     |
| **cosine_map@100**  | **0.7658** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.68       |
| cosine_accuracy@3   | 0.8086     |
| cosine_accuracy@5   | 0.8514     |
| cosine_accuracy@10  | 0.8843     |
| cosine_precision@1  | 0.68       |
| cosine_precision@3  | 0.2695     |
| cosine_precision@5  | 0.1703     |
| cosine_precision@10 | 0.0884     |
| cosine_recall@1     | 0.68       |
| cosine_recall@3     | 0.8086     |
| cosine_recall@5     | 0.8514     |
| cosine_recall@10    | 0.8843     |
| cosine_ndcg@10      | 0.784      |
| cosine_mrr@10       | 0.7516     |
| **cosine_map@100**  | **0.7564** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6371     |
| cosine_accuracy@3   | 0.7814     |
| cosine_accuracy@5   | 0.8271     |
| cosine_accuracy@10  | 0.8729     |
| cosine_precision@1  | 0.6371     |
| cosine_precision@3  | 0.2605     |
| cosine_precision@5  | 0.1654     |
| cosine_precision@10 | 0.0873     |
| cosine_recall@1     | 0.6371     |
| cosine_recall@3     | 0.7814     |
| cosine_recall@5     | 0.8271     |
| cosine_recall@10    | 0.8729     |
| cosine_ndcg@10      | 0.7566     |
| cosine_mrr@10       | 0.7193     |
| **cosine_map@100**  | **0.7237** |
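
One way to read the five tables together is as a trade-off between embedding size and retrieval quality. The sketch below takes the reported `cosine_ndcg@10` values and expresses each as a fraction of the 768-dimension score. The numbers are copied from the tables above; the variable names are illustrative.

```python
# cosine_ndcg@10 per embedding dimension, copied from the tables above.
ndcg_at_10 = {768: 0.8071, 512: 0.8031, 256: 0.7970, 128: 0.7840, 64: 0.7566}

full = ndcg_at_10[768]
for dim, score in ndcg_at_10.items():
    retained = score / full    # share of full-size quality kept
    size_ratio = dim / 768     # share of full-size storage used
    print(f"{dim:>3} dims: {retained:.1%} of NDCG@10 at {size_ratio:.1%} of the size")
```

At 64 dimensions the embedding is one twelfth of the full size while keeping roughly 94% of the full-size NDCG@10.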

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive                                                                           | anchor                                                                            |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                            |
  | details | <ul><li>min: 8 tokens</li><li>mean: 45.56 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 20.61 tokens</li><li>max: 42 tokens</li></ul> |
* Samples:
  | positive                                                                                                                                                                                                                                      | anchor                                                                                                         |
  |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------|
  | <code>GM Financial's penetration of our retail sales in the U.S. was 42% in the year ended December 31, 2023, compared to 43% in the corresponding period in 2022.</code>                                                                     | <code>How did the penetration rate of GM Financial's retail sales in the U.S. change from 2022 to 2023?</code> |
  | <code>Net cash provided by operating activities decreased by $2.0 billion in fiscal 2022 compared to fiscal 2021.</code>                                                                                                                      | <code>How did the cash flow from operating activities change in fiscal 2022 compared to fiscal 2021?</code>    |
  | <code>Total revenues increased $8.2 billion, or 7.5%, in 2023 compared to 2022. The increase was primarily driven by pharmacy drug mix, increased prescription volume, brand inflation, and increased contributions from vaccinations.</code> | <code>How much did total revenues increase in 2023 compared to the previous year?</code>                       |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
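
The configuration above wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss`: the inner loss is evaluated on embedding prefixes of each listed size and the results are combined with the listed weights. The following is a minimal pure-Python sketch of that idea, not the library implementation. The similarity scale of 20 is an assumed value, the helper names are illustrative, and the toy batch uses made-up 8-dimensional vectors with prefix sizes 8, 4, and 2 in place of the real dimensions.

```python
import math

def _normalize(v):
    """Rescale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: each anchor should score its own
    positive above every other positive in the batch."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(ai, pj)) for pj in p]
        # Cross-entropy with the matching positive (index i) as the label,
        # computed with a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embedding prefixes."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of three (anchor, positive) pairs with made-up values.
anchors = [
    [0.9, 0.1, 0.1, 0.0, 0.2, 0.0, 0.1, 0.0],
    [0.1, 0.9, 0.0, 0.1, 0.0, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.9, 0.1, 0.1, 0.0, 0.2, 0.0],
]
positives = [
    [0.8, 0.2, 0.0, 0.1, 0.1, 0.0, 0.0, 0.1],
    [0.2, 0.8, 0.1, 0.0, 0.0, 0.1, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.2, 0.0, 0.1, 0.1, 0.0],
]
print(matryoshka_loss(anchors, positives, dims=(8, 4, 2), weights=(1, 1, 1)))
```

Because every other positive in the batch acts as a negative, a batch that contains the same passage twice would penalize a correct match, which is presumably why this card lists the `no_duplicates` batch sampler.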

### Evaluation Dataset

#### json

* Dataset: json
* Size: 700 evaluation samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 700 samples:
  |         | positive                                                                            | anchor                                                                             |
  |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                             |
  | details | <ul><li>min: 10 tokens</li><li>mean: 44.82 tokens</li><li>max: 439 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 20.31 tokens</li><li>max: 51 tokens</li></ul> |
* Samples:
  | positive                                                                                                                                                                                                                                                                                                                    | anchor                                                                                                                                                      |
  |:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>Using these constant rates, total revenue and advertising revenue would have been $374 million and $379 million lower than actual total revenue and advertising revenue, respectively, for the full year 2023.</code>                                                                                                 | <code>How much would total revenue and advertising revenue have been lower in 2023 using constant foreign exchange rates compared to actual figures?</code> |
  | <code>Interest expense increased $42.9 million to $348.8 million for the year ended December 31, 2023, compared to $305.9 million during the year ended December 31, 2022.</code>                                                                                                                                           | <code>What was the total interest expense for the year ended December 31, 2023?</code>                                                                      |
  | <code>Net cash provided by operating activities increased $183.3 million in 2022 compared to 2021 primarily as a result of higher current year earnings, net of non-cash items, and smaller decreases in liability balances, partially offset by higher inventory levels and a smaller increase in accounts payable.</code> | <code>How much did net cash provided by operating activities increase in 2022 compared to 2021?</code>                                                      |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `fp16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
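
To make the schedule settings concrete, the sketch below evaluates a linear-warmup, cosine-decay learning rate using the values listed above (`learning_rate` 2e-05, `warmup_ratio` 0.1, `lr_scheduler_type` cosine). The total of 48 optimizer steps is taken from the last row of the training log in this card. Rounding the warmup length up to a whole step is an assumption about how the trainer derives it, and `lr_at_step` is an illustrative helper rather than a library function.

```python
import math

BASE_LR = 2e-05
TOTAL_STEPS = 48       # final optimizer step reported in the training log
WARMUP_RATIO = 0.1
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # assumed rounding

def lr_at_step(step):
    """Learning rate after `step` optimizer updates."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 5, 12, 24, 36, 48):
    print(step, f"{lr_at_step(step):.2e}")
```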

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step   | Training Loss | Validation Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:---------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.8122     | 10     | 1.6144        | -               | -                      | -                      | -                      | -                     | -                      |
| 0.9746     | 12     | -             | 0.2439          | 0.7301                 | 0.7428                 | 0.7539                 | 0.6957                | 0.7607                 |
| 1.6244     | 20     | 0.6547        | -               | -                      | -                      | -                      | -                     | -                      |
| 1.9492     | 24     | -             | 0.1966          | 0.7496                 | 0.7631                 | 0.7729                 | 0.7187                | 0.7733                 |
| 2.4365     | 30     | 0.4734        | -               | -                      | -                      | -                      | -                     | -                      |
| 2.9239     | 36     | -             | 0.1822          | 0.7556                 | 0.7643                 | 0.7743                 | 0.7242                | 0.7756                 |
| 3.2487     | 40     | 0.3833        | -               | -                      | -                      | -                      | -                     | -                      |
| **3.8985** | **48** | **-**         | **0.1794**      | **0.7564**             | **0.7658**             | **0.7743**             | **0.7237**            | **0.7779**             |

* The bold row denotes the saved checkpoint.
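
The fractional epochs in the log can be reproduced from the batch settings. The sketch below assumes a single device and that the final partial batch is kept (`dataloader_drop_last` is False), which gives 197 batches per epoch for 6,300 samples at a batch size of 32. With `gradient_accumulation_steps` of 16, each optimizer step consumes 16 batches, or up to 512 samples. The helper name `epoch_at_step` is illustrative.

```python
import math

TRAIN_SAMPLES = 6300
PER_DEVICE_BATCH = 32
GRAD_ACCUM = 16

batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)  # 197
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM                  # 512

def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer updates."""
    return step * GRAD_ACCUM / batches_per_epoch

print(batches_per_epoch, effective_batch)
for step in (10, 12, 24, 36, 48):
    print(step, round(epoch_at_step(step), 4))
```

The printed values match the Epoch column of the table (0.8122, 0.9746, 1.9492, 2.9239, 3.8985), which also explains why the final evaluation lands just short of epoch 4.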

### Framework Versions
- Python: 3.8.10
- Sentence Transformers: 3.1.1
- Transformers: 4.45.2
- PyTorch: 2.1.2+cu121
- Accelerate: 1.0.1
- Datasets: 2.19.1
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
