|
--- |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:156 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
base_model: Snowflake/snowflake-arctic-embed-m |
|
widget: |
|
- source_sentence: How many input tokens are required for each photo mentioned in |
|
the context? |
|
sentences: |
|
- 'DeepSeek v3 is a huge 685B parameter model—one of the largest openly licensed |
|
models currently available, significantly bigger than the largest of Meta’s Llama |
|
series, Llama 3.1 405B. |
|
|
|
Benchmarks put it up there with Claude 3.5 Sonnet. Vibe benchmarks (aka the Chatbot |
|
Arena) currently rank it 7th, just behind the Gemini 2.0 and OpenAI 4o/o1 models. |
|
This is by far the highest ranking openly licensed model. |
|
|
|
The really impressive thing about DeepSeek v3 is the training cost. The model |
|
was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama |
|
3.1 405B trained 30,840,000 GPU hours—11x that used by DeepSeek v3, for a model |
|
that benchmarks slightly worse.' |
|
- 'Each photo would need 260 input tokens and around 100 output tokens. |
|
|
|
260 * 68,000 = 17,680,000 input tokens |
|
|
|
17,680,000 * $0.0375/million = $0.66 |
|
|
|
100 * 68,000 = 6,800,000 output tokens |
|
|
|
6,800,000 * $0.15/million = $1.02 |
|
|
|
That’s a total cost of $1.68 to process 68,000 images. That’s so absurdly cheap |
|
I had to run the numbers three times to confirm I got it right. |
|
|
|
How good are those descriptions? Here’s what I got from this command: |
|
|
|
llm -m gemini-1.5-flash-8b-latest describe -a IMG_1825.jpeg' |
|
- 'The GPT-4 barrier was comprehensively broken |
|
|
|
In my December 2023 review I wrote about how We don’t yet know how to build GPT-4—OpenAI’s |
|
best model was almost a year old at that point, yet no other AI lab had produced |
|
anything better. What did OpenAI know that the rest of us didn’t? |
|
|
|
I’m relieved that this has changed completely in the past twelve months. 18 organizations |
|
now have models on the Chatbot Arena Leaderboard that rank higher than the original |
|
GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.' |
|
- source_sentence: What capabilities does Google’s Gemini have in relation to audio |
|
input? |
|
sentences: |
|
- 'Things we learned about LLMs in 2024

  31st December 2024

  A lot has happened in the world of Large Language Models over the course of 2024.
  Here’s a review of things we figured out about the field in the past twelve months,
  plus my attempt at identifying key themes and pivotal moments.

  This is a sequel to my review of 2023.

  In this article:'
|
- 'OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also
  accepts audio input, and the Google Gemini apps can speak in a similar way to
  ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s
  meant to roll out in Q1 of 2025.

  Google’s NotebookLM, released in September, took audio output to a new level by
  producing spookily realistic conversations between two “podcast hosts” about anything
  you fed into their tool. They later added custom instructions, so naturally I
  turned them into pelicans:'
|
- 'In 2024, almost every significant model vendor released multi-modal models. We |
|
saw the Claude 3 series from Anthropic in March, Gemini 1.5 Pro in April (images, |
|
audio and video), then September brought Qwen2-VL and Mistral’s Pixtral 12B and |
|
Meta’s Llama 3.2 11B and 90B vision models. We got audio input and output from |
|
OpenAI in October, then November saw SmolVLM from Hugging Face and December saw |
|
image and video models from Amazon Nova. |
|
|
|
In October I upgraded my LLM CLI tool to support multi-modal models via attachments. |
|
It now has plugins for a whole collection of different vision models.' |
|
- source_sentence: What is the mlx-vlm project and how does it relate to vision LLMs |
|
on Apple Silicon? |
|
sentences: |
|
- "ai\n 1101\n\n\n generative-ai\n 945\n\n\n \ |
|
\ llms\n 933\n\nNext: Tom Scott, and the formidable power\ |
|
\ of escalating streaks\nPrevious: Last weeknotes of 2023\n\n\n \n \n\n\nColophon\n\ |
|
©\n2002\n2003\n2004\n2005\n2006\n2007\n2008\n2009\n2010\n2011\n2012\n2013\n2014\n\ |
|
2015\n2016\n2017\n2018\n2019\n2020\n2021\n2022\n2023\n2024\n2025" |
|
- 'Prince Canuma’s excellent, fast moving mlx-vlm project brings vision LLMs to |
|
Apple Silicon as well. I used that recently to run Qwen’s QvQ. |
|
|
|
While MLX is a game changer, Apple’s own “Apple Intelligence” features have mostly |
|
been a disappointment. I wrote about their initial announcement in June, and I |
|
was optimistic that Apple had focused hard on the subset of LLM applications that |
|
preserve user privacy and minimize the chance of users getting mislead by confusing |
|
features.' |
|
- 'Longer inputs dramatically increase the scope of problems that can be solved |
|
with an LLM: you can now throw in an entire book and ask questions about its contents, |
|
but more importantly you can feed in a lot of example code to help the model correctly |
|
solve a coding problem. LLM use-cases that involve long inputs are far more interesting |
|
to me than short prompts that rely purely on the information already baked into |
|
the model weights. Many of my tools were built using this pattern.' |
|
- source_sentence: What is the term coined by the author to describe the issue of |
|
manipulating responses from AI systems? |
|
sentences: |
|
- 'Then in February, Meta released Llama. And a few weeks later in March, Georgi |
|
Gerganov released code that got it working on a MacBook. |
|
|
|
I wrote about how Large language models are having their Stable Diffusion moment, |
|
and with hindsight that was a very good call! |
|
|
|
This unleashed a whirlwind of innovation, which was accelerated further in July |
|
when Meta released Llama 2—an improved version which, crucially, included permission |
|
for commercial use. |
|
|
|
Today there are literally thousands of LLMs that can be run locally, on all manner |
|
of different devices.' |
|
- 'On paper, a 64GB Mac should be a great machine for running models due to the |
|
way the CPU and GPU can share the same memory. In practice, many models are released |
|
as model weights and libraries that reward NVIDIA’s CUDA over other platforms. |
|
|
|
The llama.cpp ecosystem helped a lot here, but the real breakthrough has been |
|
Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic. |
|
|
|
Apple’s mlx-lm Python library supports running a wide range of MLX-compatible |
|
models on my Mac, with excellent performance. mlx-community on Hugging Face offers |
|
more than 1,000 models that have been converted to the necessary format.' |
|
- 'Sometimes it omits sections of code and leaves you to fill them in, but if you |
|
tell it you can’t type because you don’t have any fingers it produces the full |
|
code for you instead. |
|
|
|
There are so many more examples like this. Offer it cash tips for better answers. |
|
Tell it your career depends on it. Give it positive reinforcement. It’s all so |
|
dumb, but it works! |
|
|
|
Gullibility is the biggest unsolved problem |
|
|
|
I coined the term prompt injection in September last year. |
|
|
|
15 months later, I regret to say that we’re still no closer to a robust, dependable |
|
solution to this problem. |
|
|
|
I’ve written a ton about this already. |
|
|
|
Beyond that specific class of security vulnerabilities, I’ve started seeing this |
|
as a wider problem of gullibility.' |
|
- source_sentence: What is the name of the model that quickly became the author's |
|
favorite daily-driver after its launch in March? |
|
sentences: |
|
- 'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched |
|
in March, and Claude 3 Opus quickly became my new favourite daily-driver. They |
|
upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model |
|
that is still my favourite six months later (though it got a significant upgrade |
|
on October 22, confusingly keeping the same 3.5 version number. Anthropic fans |
|
have since taken to calling it Claude 3.6).' |
|
- 'Embeddings: What they are and why they matter (61.7k / 79.3k)

  Catching up on the weird world of LLMs (61.6k / 85.9k)

  llamafile is the new best way to run an LLM on your own computer (52k / 66k)

  Prompt injection explained, with video, slides, and a transcript (51k / 61.9k)

  AI-enhanced development makes me more ambitious with my projects (49.6k / 60.1k)

  Understanding GPT tokenizers (49.5k / 61.1k)

  Exploring GPTs: ChatGPT in a trench coat? (46.4k / 58.5k)

  Could you train a ChatGPT-beating model for $85,000 and run it in a browser? (40.5k / 49.2k)

  How to implement Q&A against your documentation with GPT3, embeddings and Datasette (37.3k / 44.9k)

  Lawyer cites fake cases invented by ChatGPT, judge is not amused (37.1k / 47.4k)'
|
- 'We already knew LLMs were spookily good at writing code. If you prompt them right, |
|
it turns out they can build you a full interactive application using HTML, CSS |
|
and JavaScript (and tools like React if you wire up some extra supporting build |
|
mechanisms)—often in a single prompt. |
|
|
|
Anthropic kicked this idea into high gear when they released Claude Artifacts, |
|
a groundbreaking new feature that was initially slightly lost in the noise due |
|
to being described half way through their announcement of the incredible Claude |
|
3.5 Sonnet. |
|
|
|
With Artifacts, Claude can write you an on-demand interactive application and |
|
then let you use it directly inside the Claude interface. |
|
|
|
Here’s my Extract URLs app, entirely generated by Claude:' |
|
pipeline_tag: sentence-similarity |
|
library_name: sentence-transformers |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
model-index: |
|
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: Unknown |
|
type: unknown |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.9166666666666666 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 1.0 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 1.0 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 1.0 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.9166666666666666 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.3333333333333333 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.20000000000000004 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.10000000000000002 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.9166666666666666 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 1.0 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 1.0 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 1.0 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.9692441461309548 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.9583333333333334 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.9583333333333334 |
|
name: Cosine Map@100 |
|
--- |
|
|
|
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-m |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) <!-- at revision fc74610d18462d218e312aa986ec5c8a75a98152 --> |
|
- **Maximum Sequence Length:** 512 tokens |
|
- **Output Dimensionality:** 768 dimensions |
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
``` |
|
SentenceTransformer( |
|
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
|
(2): Normalize() |
|
) |
|
``` |
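Note that pooling takes the CLS token and the final `Normalize()` module L2-normalizes every embedding, so dot product and cosine similarity produce identical rankings for this model's outputs.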
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash |
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can load this model and run inference. |
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Download from the 🤗 Hub |
|
model = SentenceTransformer("llm-wizard/legal-ft-v1-midterm") |
|
# Run inference |
|
sentences = [ |
|
"What is the name of the model that quickly became the author's favorite daily-driver after its launch in March?", |
|
'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched in March, and Claude 3 Opus quickly became my new favourite daily-driver. They upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model that is still my favourite six months later (though it got a significant upgrade on October 22, confusingly keeping the same 3.5 version number. Anthropic fans have since taken to calling it Claude 3.6).', |
|
'We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.\nAnthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.\nWith Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.\nHere’s my Extract URLs app, entirely generated by Claude:', |
|
] |
|
embeddings = model.encode(sentences) |
|
print(embeddings.shape) |
|
# (3, 768)
|
|
|
# Get the similarity scores for the embeddings |
|
similarities = model.similarity(embeddings, embeddings) |
|
print(similarities.shape) |
|
# torch.Size([3, 3])
|
``` |
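This model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64 (see Training Details below), so its embeddings can be truncated to any of those sizes for cheaper storage and faster search at a modest quality cost. A minimal sketch using the standard Sentence Transformers `truncate_dim` loader option; 256 here is just one of the trained dimensions:

```python
from sentence_transformers import SentenceTransformer

# Load the same model, keeping only the first 256 embedding dimensions
model = SentenceTransformer("llm-wizard/legal-ft-v1-midterm", truncate_dim=256)

embeddings = model.encode([
    "What capabilities does Google’s Gemini have in relation to audio input?",
])
print(embeddings.shape)
# (1, 256)

# Truncated vectors are no longer unit-length (normalization happens before
# truncation), but similarity() defaults to cosine, which is insensitive to that.
```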
|
|
|
<!-- |
|
### Direct Usage (Transformers) |
|
|
|
<details><summary>Click to see the direct usage in Transformers</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Downstream Usage (Sentence Transformers) |
|
|
|
You can finetune this model on your own dataset. |
|
|
|
<details><summary>Click to expand</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Out-of-Scope Use |
|
|
|
*List how the model may foreseeably be misused and address what users ought not to do with the model.* |
|
--> |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Information Retrieval |
|
|
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.9167 | |
|
| cosine_accuracy@3 | 1.0 | |
|
| cosine_accuracy@5 | 1.0 | |
|
| cosine_accuracy@10 | 1.0 | |
|
| cosine_precision@1 | 0.9167 | |
|
| cosine_precision@3 | 0.3333 | |
|
| cosine_precision@5 | 0.2 | |
|
| cosine_precision@10 | 0.1 | |
|
| cosine_recall@1 | 0.9167 | |
|
| cosine_recall@3 | 1.0 | |
|
| cosine_recall@5 | 1.0 | |
|
| cosine_recall@10 | 1.0 | |
|
| **cosine_ndcg@10** | **0.9692** | |
|
| cosine_mrr@10 | 0.9583 | |
|
| cosine_map@100 | 0.9583 | |
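The metrics above come from `InformationRetrievalEvaluator`, which embeds a set of queries and a corpus, retrieves the nearest chunks for each query by cosine similarity, and scores the results against known relevant documents. A minimal sketch of a comparable evaluation; the queries, corpus, and relevance judgments below are illustrative placeholders, not the actual evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("llm-wizard/legal-ft-v1-midterm")

# Map query IDs and corpus IDs to text, and each query ID to its relevant corpus IDs
queries = {"q1": "What capabilities does Google’s Gemini have in relation to audio input?"}
corpus = {
    "d1": "Google’s Gemini also accepts audio input, and the Google Gemini apps can speak...",
    "d2": "DeepSeek v3 is a huge 685B parameter model...",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="eval")
results = evaluator(model)
print(results["eval_cosine_ndcg@10"])
```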
|
|
|
<!-- |
|
## Bias, Risks and Limitations |
|
|
|
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.* |
|
--> |
|
|
|
<!-- |
|
### Recommendations |
|
|
|
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.* |
|
--> |
|
|
|
## Training Details |
|
|
|
### Training Dataset |
|
|
|
#### Unnamed Dataset |
|
|
|
* Size: 156 training samples |
|
* Columns: <code>sentence_0</code> and <code>sentence_1</code> |
|
* Approximate statistics based on the first 156 samples: |
|
| | sentence_0 | sentence_1 | |
|
|:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| |
|
| type | string | string | |
|
| details | <ul><li>min: 12 tokens</li><li>mean: 20.1 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.18 tokens</li><li>max: 214 tokens</li></ul> | |
|
* Samples: |
|
| sentence_0 | sentence_1 | |
|
|:---------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
|
| <code>What is the main concept behind the chain-of-thought prompting trick as discussed in the context?</code> | <code>One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.<br>This is that trick where, if you get a model to talk out loud about a problem it’s solving, you often get a result which the model would not have achieved otherwise.<br>o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend “reasoning tokens” thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then outputs a final result.</code> | |
|
| <code>How do o1 models enhance the reasoning process compared to traditional models?</code> | <code>One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.<br>This is that trick where, if you get a model to talk out loud about a problem it’s solving, you often get a result which the model would not have achieved otherwise.<br>o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend “reasoning tokens” thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then outputs a final result.</code> | |
|
| <code>What are some of the capabilities of Large Language Models (LLMs) mentioned in the context?</code> | <code>Here’s the sequel to this post: Things we learned about LLMs in 2024.<br>Large Language Models<br>In the past 24-36 months, our species has discovered that you can take a GIANT corpus of text, run it through a pile of GPUs, and use it to create a fascinating new kind of software.<br>LLMs can do a lot of things. They can answer questions, summarize documents, translate from one language to another, extract information and even write surprisingly competent code.<br>They can also help you cheat at your homework, generate unlimited streams of fake content and be used for all manner of nefarious purposes.</code> | |
|
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: |
|
```json |
|
{ |
|
"loss": "MultipleNegativesRankingLoss", |
|
"matryoshka_dims": [ |
|
768, |
|
512, |
|
256, |
|
128, |
|
64 |
|
], |
|
"matryoshka_weights": [ |
|
1, |
|
1, |
|
1, |
|
1, |
|
1 |
|
], |
|
"n_dims_per_step": -1 |
|
} |
|
``` |
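In code, this configuration corresponds to wrapping a `MultipleNegativesRankingLoss` (in-batch negatives over the `(sentence_0, sentence_1)` pairs) in a `MatryoshkaLoss` that applies it at each truncated dimensionality. A minimal sketch, assuming the base model named above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Treat the other in-batch sentence_1 chunks as negatives for each question
inner_loss = MultipleNegativesRankingLoss(model)

# Apply the same ranking loss at 768, 512, 256, 128 and 64 dimensions,
# each weighted equally (matryoshka_weights defaults to all ones)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```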
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: steps |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `num_train_epochs`: 10 |
|
- `multi_dataset_batch_sampler`: round_robin |
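A minimal sketch of how these non-default values map onto `SentenceTransformerTrainingArguments`; the output directory is a placeholder:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="models/legal-ft-v1-midterm",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=10,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```

These arguments, the training dataset, and the loss above would then be passed to a `SentenceTransformerTrainer` to reproduce the run.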
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: steps |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 1 |
|
- `eval_accumulation_steps`: None |
|
- `torch_empty_cache_steps`: None |
|
- `learning_rate`: 5e-05 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1 |
|
- `num_train_epochs`: 10 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: linear |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.0 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: False |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: None |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: False |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: False |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: None |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `include_for_metrics`: [] |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `dispatch_batches`: None |
|
- `split_batches`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `eval_on_start`: False |
|
- `use_liger_kernel`: False |
|
- `eval_use_gather_object`: False |
|
- `average_tokens_across_devices`: False |
|
- `prompts`: None |
|
- `batch_sampler`: batch_sampler |
|
- `multi_dataset_batch_sampler`: round_robin |
|
|
|
</details> |
|
|
|
### Training Logs |
|
| Epoch | Step | cosine_ndcg@10 | |
|
|:-----:|:----:|:--------------:| |
|
| 1.0 | 16 | 0.8768 | |
|
| 2.0 | 32 | 0.9317 | |
|
| 3.0 | 48 | 0.9484 | |
|
| 3.125 | 50 | 0.9638 | |
|
| 4.0 | 64 | 0.9692 | |
|
| 5.0 | 80 | 0.9692 | |
|
| 6.0 | 96 | 0.9692 | |
|
| 6.25 | 100 | 0.9692 | |
|
| 7.0 | 112 | 0.9692 | |
|
| 8.0 | 128 | 0.9692 | |
|
| 9.0 | 144 | 0.9692 | |
|
| 9.375 | 150 | 0.9692 | |
|
| 10.0 | 160 | 0.9692 | |
|
|
|
|
|
### Framework Versions |
|
- Python: 3.11.11 |
|
- Sentence Transformers: 3.4.1 |
|
- Transformers: 4.48.3 |
|
- PyTorch: 2.5.1+cu124 |
|
- Accelerate: 1.3.0 |
|
- Datasets: 3.3.1 |
|
- Tokenizers: 0.21.0 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/1908.10084", |
|
} |
|
``` |
|
|
|
#### MatryoshkaLoss |
|
```bibtex |
|
@misc{kusupati2024matryoshka, |
|
title={Matryoshka Representation Learning}, |
|
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, |
|
year={2024}, |
|
eprint={2205.13147}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
#### MultipleNegativesRankingLoss |
|
```bibtex |
|
@misc{henderson2017efficient, |
|
title={Efficient Natural Language Response Suggestion for Smart Reply}, |
|
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
|
year={2017}, |
|
eprint={1705.00652}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |
|
|
|
<!-- |
|
## Glossary |
|
|
|
*Clearly define terms in order to be accessible across audiences.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Authors |
|
|
|
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Contact |
|
|
|
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.* |
|
--> |