---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:156
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m
widget:
- source_sentence: How many input tokens are required for each photo mentioned in
    the context?
  sentences:
  - 'DeepSeek v3 is a huge 685B parameter model—one of the largest openly licensed
    models currently available, significantly bigger than the largest of Meta’s Llama
    series, Llama 3.1 405B.

    Benchmarks put it up there with Claude 3.5 Sonnet. Vibe benchmarks (aka the Chatbot
    Arena) currently rank it 7th, just behind the Gemini 2.0 and OpenAI 4o/o1 models.
    This is by far the highest ranking openly licensed model.

    The really impressive thing about DeepSeek v3 is the training cost. The model
    was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama
    3.1 405B trained 30,840,000 GPU hours—11x that used by DeepSeek v3, for a model
    that benchmarks slightly worse.'
  - 'Each photo would need 260 input tokens and around 100 output tokens.

    260 * 68,000 = 17,680,000 input tokens

    17,680,000 * $0.0375/million = $0.66

    100 * 68,000 = 6,800,000 output tokens

    6,800,000 * $0.15/million = $1.02

    That’s a total cost of $1.68 to process 68,000 images. That’s so absurdly cheap
    I had to run the numbers three times to confirm I got it right.

    How good are those descriptions? Here’s what I got from this command:

    llm -m gemini-1.5-flash-8b-latest describe -a IMG_1825.jpeg'
  - 'The GPT-4 barrier was comprehensively broken

    In my December 2023 review I wrote about how We don’t yet know how to build GPT-4—OpenAI’s
    best model was almost a year old at that point, yet no other AI lab had produced
    anything better. What did OpenAI know that the rest of us didn’t?

    I’m relieved that this has changed completely in the past twelve months. 18 organizations
    now have models on the Chatbot Arena Leaderboard that rank higher than the original
    GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.'
- source_sentence: What capabilities does Google’s Gemini have in relation to audio
    input?
  sentences:
  - 'Things we learned about LLMs in 2024






















    Simon Willison’s Weblog

    Subscribe







    Things we learned about LLMs in 2024

    31st December 2024

    A lot has happened in the world of Large Language Models over the course of 2024.
    Here’s a review of things we figured out about the field in the past twelve months,
    plus my attempt at identifying key themes and pivotal moments.

    This is a sequel to my review of 2023.

    In this article:'
  - 'Your browser does not support the audio element.


    OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also
    accepts audio input, and the Google Gemini apps can speak in a similar way to
    ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s
    meant to roll out in Q1 of 2025.

    Google’s NotebookLM, released in September, took audio output to a new level by
    producing spookily realistic conversations between two “podcast hosts” about anything
    you fed into their tool. They later added custom instructions, so naturally I
    turned them into pelicans:



    Your browser does not support the audio element.'
  - 'In 2024, almost every significant model vendor released multi-modal models. We
    saw the Claude 3 series from Anthropic in March, Gemini 1.5 Pro in April (images,
    audio and video), then September brought Qwen2-VL and Mistral’s Pixtral 12B and
    Meta’s Llama 3.2 11B and 90B vision models. We got audio input and output from
    OpenAI in October, then November saw SmolVLM from Hugging Face and December saw
    image and video models from Amazon Nova.

    In October I upgraded my LLM CLI tool to support multi-modal models via attachments.
    It now has plugins for a whole collection of different vision models.'
- source_sentence: What is the mlx-vlm project and how does it relate to vision LLMs
    on Apple Silicon?
  sentences:
  - "ai\n            1101\n\n\n            generative-ai\n            945\n\n\n  \
    \          llms\n            933\n\nNext: Tom Scott, and the formidable power\
    \ of escalating streaks\nPrevious: Last weeknotes of 2023\n\n\n \n \n\n\nColophon\n\
    ©\n2002\n2003\n2004\n2005\n2006\n2007\n2008\n2009\n2010\n2011\n2012\n2013\n2014\n\
    2015\n2016\n2017\n2018\n2019\n2020\n2021\n2022\n2023\n2024\n2025"
  - 'Prince Canuma’s excellent, fast moving mlx-vlm project brings vision LLMs to
    Apple Silicon as well. I used that recently to run Qwen’s QvQ.

    While MLX is a game changer, Apple’s own “Apple Intelligence” features have mostly
    been a disappointment. I wrote about their initial announcement in June, and I
    was optimistic that Apple had focused hard on the subset of LLM applications that
    preserve user privacy and minimize the chance of users getting mislead by confusing
    features.'
  - 'Longer inputs dramatically increase the scope of problems that can be solved
    with an LLM: you can now throw in an entire book and ask questions about its contents,
    but more importantly you can feed in a lot of example code to help the model correctly
    solve a coding problem. LLM use-cases that involve long inputs are far more interesting
    to me than short prompts that rely purely on the information already baked into
    the model weights. Many of my tools were built using this pattern.'
- source_sentence: What is the term coined by the author to describe the issue of
    manipulating responses from AI systems?
  sentences:
  - 'Then in February, Meta released Llama. And a few weeks later in March, Georgi
    Gerganov released code that got it working on a MacBook.

    I wrote about how Large language models are having their Stable Diffusion moment,
    and with hindsight that was a very good call!

    This unleashed a whirlwind of innovation, which was accelerated further in July
    when Meta released Llama 2—an improved version which, crucially, included permission
    for commercial use.

    Today there are literally thousands of LLMs that can be run locally, on all manner
    of different devices.'
  - 'On paper, a 64GB Mac should be a great machine for running models due to the
    way the CPU and GPU can share the same memory. In practice, many models are released
    as model weights and libraries that reward NVIDIA’s CUDA over other platforms.

    The llama.cpp ecosystem helped a lot here, but the real breakthrough has been
    Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.

    Apple’s mlx-lm Python library supports running a wide range of MLX-compatible
    models on my Mac, with excellent performance. mlx-community on Hugging Face offers
    more than 1,000 models that have been converted to the necessary format.'
  - 'Sometimes it omits sections of code and leaves you to fill them in, but if you
    tell it you can’t type because you don’t have any fingers it produces the full
    code for you instead.

    There are so many more examples like this. Offer it cash tips for better answers.
    Tell it your career depends on it. Give it positive reinforcement. It’s all so
    dumb, but it works!

    Gullibility is the biggest unsolved problem

    I coined the term prompt injection in September last year.

    15 months later, I regret to say that we’re still no closer to a robust, dependable
    solution to this problem.

    I’ve written a ton about this already.

    Beyond that specific class of security vulnerabilities, I’ve started seeing this
    as a wider problem of gullibility.'
- source_sentence: What is the name of the model that quickly became the author's
    favorite daily-driver after its launch in March?
  sentences:
  - 'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched
    in March, and Claude 3 Opus quickly became my new favourite daily-driver. They
    upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model
    that is still my favourite six months later (though it got a significant upgrade
    on October 22, confusingly keeping the same 3.5 version number. Anthropic fans
    have since taken to calling it Claude 3.6).'
  - 'Embeddings: What they are and why they matter

    61.7k

    79.3k



    Catching up on the weird world of LLMs

    61.6k

    85.9k



    llamafile is the new best way to run an LLM on your own computer

    52k

    66k



    Prompt injection explained, with video, slides, and a transcript

    51k

    61.9k



    AI-enhanced development makes me more ambitious with my projects

    49.6k

    60.1k



    Understanding GPT tokenizers

    49.5k

    61.1k



    Exploring GPTs: ChatGPT in a trench coat?

    46.4k

    58.5k



    Could you train a ChatGPT-beating model for $85,000 and run it in a browser?

    40.5k

    49.2k



    How to implement Q&A against your documentation with GPT3, embeddings and Datasette

    37.3k

    44.9k



    Lawyer cites fake cases invented by ChatGPT, judge is not amused

    37.1k

    47.4k'
  - 'We already knew LLMs were spookily good at writing code. If you prompt them right,
    it turns out they can build you a full interactive application using HTML, CSS
    and JavaScript (and tools like React if you wire up some extra supporting build
    mechanisms)—often in a single prompt.

    Anthropic kicked this idea into high gear when they released Claude Artifacts,
    a groundbreaking new feature that was initially slightly lost in the noise due
    to being described half way through their announcement of the incredible Claude
    3.5 Sonnet.

    With Artifacts, Claude can write you an on-demand interactive application and
    then let you use it directly inside the Claude interface.

    Here’s my Extract URLs app, entirely generated by Claude:'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.9166666666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1.0
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1.0
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9166666666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.20000000000000004
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10000000000000002
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9166666666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1.0
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1.0
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9692441461309548
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9583333333333334
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9583333333333334
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-m

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) <!-- at revision fc74610d18462d218e312aa986ec5c8a75a98152 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
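
The last two modules in the architecture above take the `[CLS]` token vector as the sentence embedding (`pooling_mode_cls_token: True`) and scale it to unit length. A minimal sketch of those two steps in plain Python, using a toy 4-dimensional sequence instead of the real 768-dimensional BertModel output; the helper names `cls_pool` and `l2_normalize` are illustrative and not part of the library:

```python
import math

def cls_pool(token_embeddings):
    """Return the vector of the first ([CLS]) token of one sequence."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy transformer output: 3 tokens with 4 dimensions each (the real model uses 768).
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],    # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, -0.5, 0.5, -0.5],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output is unit-length, the dot product of two embeddings equals their cosine similarity.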

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("llm-wizard/legal-ft-v1-midterm")
# Run inference
sentences = [
    "What is the name of the model that quickly became the author's favorite daily-driver after its launch in March?",
    'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched in March, and Claude 3 Opus quickly became my new favourite daily-driver. They upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model that is still my favourite six months later (though it got a significant upgrade on October 22, confusingly keeping the same 3.5 version number. Anthropic fans have since taken to calling it Claude 3.6).',
    'We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.\nAnthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.\nWith Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.\nHere’s my Extract URLs app, entirely generated by Claude:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
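
Since the model was trained with `MatryoshkaLoss` over dimensions 768, 512, 256, 128 and 64, the leading components of each embedding are meant to remain useful on their own. The sketch below shows the idea on toy 4-dimensional vectors: keep the first `dim` components, re-normalize, and compare. The helpers `truncate_and_normalize` and `cosine` are hypothetical names written for this example; how much quality is lost at each size for this particular model is not reported in this card.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy unit-length "embeddings"; the real ones have 768 dimensions.
query = [0.5, 0.5, 0.5, 0.5]
passage = [0.5, 0.5, -0.5, 0.5]

full_score = cosine(query, passage)
short_score = cosine(
    truncate_and_normalize(query, 2),
    truncate_and_normalize(passage, 2),
)
print(full_score, short_score)
```

In practice, Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which truncates the output of `encode` without manual slicing.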

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9167     |
| cosine_accuracy@3   | 1.0        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9167     |
| cosine_precision@3  | 0.3333     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9167     |
| cosine_recall@3     | 1.0        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9692** |
| cosine_mrr@10       | 0.9583     |
| cosine_map@100      | 0.9583     |
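
The card does not state the size of the evaluation set, but the figures in the table are consistent with 12 queries that each have exactly one relevant passage, 11 of them ranked first and one ranked second. Under that assumption (an inference from the numbers, not a documented fact), the sketch below reproduces the reported accuracy@1, MRR@10 and NDCG@10. It also explains why precision@3 is 0.3333 while accuracy@3 is 1.0: with a single relevant passage, precision@k can be at most 1/k.

```python
import math

def ir_metrics(ranks, k=10):
    """Compute (accuracy@1, MRR@k, NDCG@k).

    `ranks` holds the 1-based rank of the single relevant passage per query.
    """
    n = len(ranks)
    accuracy_at_1 = sum(r == 1 for r in ranks) / n
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    # With one relevant passage the ideal DCG is 1, so NDCG is 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return accuracy_at_1, mrr, ndcg

# Assumed ranking outcome: 11 queries hit at rank 1, one query hits at rank 2.
ranks = [1] * 11 + [2]
acc1, mrr10, ndcg10 = ir_metrics(ranks)
print(round(acc1, 4), round(mrr10, 4), round(ndcg10, 4))
# 0.9167 0.9583 0.9692
```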

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 156 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 156 samples:
  |         | sentence_0                                                                        | sentence_1                                                                           |
  |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                               |
  | details | <ul><li>min: 12 tokens</li><li>mean: 20.1 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.18 tokens</li><li>max: 214 tokens</li></ul> |
* Samples:
  | sentence_0                                                                                                     | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
  |:---------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>What is the main concept behind the chain-of-thought prompting trick as discussed in the context?</code> | <code>One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.<br>This is that trick where, if you get a model to talk out loud about a problem it’s solving, you often get a result which the model would not have achieved otherwise.<br>o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend “reasoning tokens” thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then outputs a final result.</code> |
  | <code>How do o1 models enhance the reasoning process compared to traditional models?</code>                    | <code>One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.<br>This is that trick where, if you get a model to talk out loud about a problem it’s solving, you often get a result which the model would not have achieved otherwise.<br>o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend “reasoning tokens” thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then outputs a final result.</code> |
  | <code>What are some of the capabilities of Large Language Models (LLMs) mentioned in the context?</code>       | <code>Here’s the sequel to this post: Things we learned about LLMs in 2024.<br>Large Language Models<br>In the past 24-36 months, our species has discovered that you can take a GIANT corpus of text, run it through a pile of GPUs, and use it to create a fascinating new kind of software.<br>LLMs can do a lot of things. They can answer questions, summarize documents, translate from one language to another, extract information and even write surprisingly competent code.<br>They can also help you cheat at your homework, generate unlimited streams of fake content and be used for all manner of nefarious purposes.</code>                             |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
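
To make the configuration above concrete, here is a simplified sketch of how a Matryoshka-wrapped multiple-negatives ranking loss can be computed: for each listed dimension the embeddings are truncated, an in-batch cross-entropy over scaled cosine similarities is evaluated, and the per-dimension losses are combined with the given weights. This is plain Python for illustration, not the library implementation; the similarity scale of 20 is an assumption (the card does not list it), and the toy batch uses 4-dimensional vectors with `dims=[4, 2]` in place of the real 768 to 64 range.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch cross-entropy: positive i is the target for anchor i,
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss evaluated on truncated embeddings."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (question, passage) embedding pairs.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

With all weights set to 1, as in this card, every truncation size contributes equally to the training signal.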

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch | Step | cosine_ndcg@10 |
|:-----:|:----:|:--------------:|
| 1.0   | 16   | 0.8768         |
| 2.0   | 32   | 0.9317         |
| 3.0   | 48   | 0.9484         |
| 3.125 | 50   | 0.9638         |
| 4.0   | 64   | 0.9692         |
| 5.0   | 80   | 0.9692         |
| 6.0   | 96   | 0.9692         |
| 6.25  | 100  | 0.9692         |
| 7.0   | 112  | 0.9692         |
| 8.0   | 128  | 0.9692         |
| 9.0   | 144  | 0.9692         |
| 9.375 | 150  | 0.9692         |
| 10.0  | 160  | 0.9692         |
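
The step counts and fractional epochs in this table follow from the dataset size and batch size reported above. A short check, assuming a single device, `gradient_accumulation_steps` of 1 and `dataloader_drop_last: False` as listed in the hyperparameters; the rows at steps 50, 100 and 150 suggest an additional evaluation every 50 steps, which is an inference since the interval itself is not listed:

```python
import math

train_samples = 156   # "Size: 156 training samples"
batch_size = 10       # per_device_train_batch_size
epochs = 10           # num_train_epochs

# The last, partial batch is kept because dataloader_drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return step / steps_per_epoch

print(steps_per_epoch, total_steps)                # 16 160
print(epoch_at(50), epoch_at(100), epoch_at(150))  # 3.125 6.25 9.375
```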


### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->