|
--- |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:164 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
base_model: Snowflake/snowflake-arctic-embed-l |
|
widget: |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'An interesting point of comparison here could be the way railways rolled out |
|
around the world in the 1800s. Constructing these required enormous investments |
|
and had a massive environmental impact, and many of the lines that were built |
|
turned out to be unnecessary—sometimes multiple lines from different companies |
|
serving the exact same routes! |
|
|
|
The resulting bubbles contributed to several financial crashes, see Wikipedia |
|
for Panic of 1873, Panic of 1893, Panic of 1901 and the UK’s Railway Mania. They |
|
left us with a lot of useful infrastructure and a great deal of bankruptcies and |
|
environmental damage. |
|
|
|
The year of slop' |
|
- 'This remains astonishing to me. I thought a model with the capabilities and output |
|
quality of GPT-4 needed a datacenter class server with one or more $40,000+ GPUs. |
|
|
|
These models take up enough of my 64GB of RAM that I don’t run them often—they |
|
don’t leave much room for anything else. |
|
|
|
The fact that they run at all is a testament to the incredible training and inference |
|
performance gains that we’ve figured out over the past year. It turns out there |
|
was a lot of low-hanging fruit to be harvested in terms of model efficiency. I |
|
expect there’s still more to come.' |
|
- 'Things we learned about LLMs in 2024 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Simon Willison’s Weblog |
|
|
|
Subscribe |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Things we learned about LLMs in 2024 |
|
|
|
31st December 2024 |
|
|
|
A lot has happened in the world of Large Language Models over the course of 2024. |
|
Here’s a review of things we figured out about the field in the past twelve months, |
|
plus my attempt at identifying key themes and pivotal moments. |
|
|
|
This is a sequel to my review of 2023. |
|
|
|
In this article:' |
|
- source_sentence: 'QUESTION #2\n...\n\nContext:\nJust this week, the New York Times |
|
launched a landmark lawsuit against OpenAI and Microsoft over this issue. The |
|
69 page PDF is genuinely worth reading—especially the first few pages, which lay |
|
out the issues in a way that’s surprisingly easy to follow. The rest of the document |
|
includes some of the clearest explanations of what LLMs are, how they work and |
|
how they are built that I’ve read anywhere.\nThe legal arguments here are complex. |
|
I’m not a lawyer, but I don’t think this one will be easily decided. Whichever |
|
way it goes, I expect this case to have a profound impact on how this technology |
|
develops in the future.\n'', additional_kwargs={}, response_metadata={})]' |
|
sentences: |
|
- 'A lot of people are excited about AI agents—an infuriatingly vague term that |
|
seems to be converging on “AI systems that can go away and act on your behalf”. |
|
We’ve been talking about them all year, but I’ve seen few if any examples of them |
|
running in production, despite lots of exciting prototypes. |
|
|
|
I think this is because of gullibility. |
|
|
|
Can we solve this? Honestly, I’m beginning to suspect that you can’t fully solve |
|
gullibility without achieving AGI. So it may be quite a while before those agent |
|
dreams can really start to come true! |
|
|
|
Code may be the best application |
|
|
|
Over the course of the year, it’s become increasingly clear that writing code |
|
is one of the things LLMs are most capable of.' |
|
- 'Just this week, the New York Times launched a landmark lawsuit against OpenAI |
|
and Microsoft over this issue. The 69 page PDF is genuinely worth reading—especially |
|
the first few pages, which lay out the issues in a way that’s surprisingly easy |
|
to follow. The rest of the document includes some of the clearest explanations |
|
of what LLMs are, how they work and how they are built that I’ve read anywhere. |
|
|
|
The legal arguments here are complex. I’m not a lawyer, but I don’t think this |
|
one will be easily decided. Whichever way it goes, I expect this case to have |
|
a profound impact on how this technology develops in the future.' |
|
- 'Then there’s the rest. If you browse the Chatbot Arena leaderboard today—still |
|
the most useful single place to get a vibes-based evaluation of models—you’ll |
|
see that GPT-4-0314 has fallen to around 70th place. The 18 organizations with |
|
higher scoring models are Google, OpenAI, Alibaba, Anthropic, Meta, Reka AI, 01 |
|
AI, Amazon, Cohere, DeepSeek, Nvidia, Mistral, NexusFlow, Zhipu AI, xAI, AI21 |
|
Labs, Princeton and Tencent. |
|
|
|
Training a GPT-4 beating model was a huge deal in 2023. In 2024 it’s an achievement |
|
that isn’t even particularly notable, though I personally still celebrate any |
|
time a new organization joins that list. |
|
|
|
Some of those GPT-4 models run on my laptop' |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'The biggest innovation here is that it opens up a new way to scale a model: instead |
|
of improving model performance purely through additional compute at training time, |
|
models can now take on harder problems by spending more compute on inference. |
|
|
|
The sequel to o1, o3 (they skipped “o2” for European trademark reasons) was announced |
|
on 20th December with an impressive result against the ARC-AGI benchmark, albeit |
|
one that likely involved more than $1,000,000 of compute time expense! |
|
|
|
o3 is expected to ship in January. I doubt many people have real-world problems |
|
that would benefit from that level of compute expenditure—I certainly don’t!—but |
|
it appears to be a genuine next step in LLM architecture for taking on much harder |
|
problems.' |
|
- 'Those US export regulations on GPUs to China seem to have inspired some very |
|
effective training optimizations! |
|
|
|
The environmental impact got better |
|
|
|
A welcome result of the increased efficiency of the models—both the hosted ones |
|
and the ones I can run locally—is that the energy usage and environmental impact |
|
of running a prompt has dropped enormously over the past couple of years. |
|
|
|
OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days. |
|
I have it on good authority that neither Google Gemini nor Amazon Nova (two of |
|
the least expensive model providers) are running prompts at a loss.' |
|
- 'OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely |
|
available from its launch in June. This was a momentus change, because for the |
|
previous year free users had mostly been restricted to GPT-3.5 level models, meaning |
|
new users got a very inaccurate mental model of what a capable LLM could actually |
|
do. |
|
|
|
That era appears to have ended, likely permanently, with OpenAI’s launch of ChatGPT |
|
Pro. This $200/month subscription service is the only way to access their most |
|
capable model, o1 Pro. |
|
|
|
Since the trick behind the o1 series (and the future models it will undoubtedly |
|
inspire) is to expend more compute time to get better results, I don’t think those |
|
days of free access to the best available models are likely to return.' |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'The May 13th announcement of GPT-4o included a demo of a brand new voice mode, |
|
where the true multi-modal GPT-4o (the o is for “omni”) model could accept audio |
|
input and output incredibly realistic sounding speech without needing separate |
|
TTS or STT models. |
|
|
|
The demo also sounded conspicuously similar to Scarlett Johansson... and after |
|
she complained the voice from the demo, Skye, never made it to a production product. |
|
|
|
The delay in releasing the new voice mode after the initial demo caused quite |
|
a lot of confusion. I wrote about that in ChatGPT in “4o” mode is not running |
|
the new features yet.' |
|
- 'Against this photo of butterflies at the California Academy of Sciences: |
|
|
|
|
|
|
|
A shallow dish, likely a hummingbird or butterfly feeder, is red. Pieces of orange |
|
slices of fruit are visible inside the dish. |
|
|
|
Two butterflies are positioned in the feeder, one is a dark brown/black butterfly |
|
with white/cream-colored markings. The other is a large, brown butterfly with |
|
patterns of lighter brown, beige, and black markings, including prominent eye |
|
spots. The larger brown butterfly appears to be feeding on the fruit.' |
|
- 'The year of slop |
|
|
|
Synthetic training data works great |
|
|
|
LLMs somehow got even harder to use |
|
|
|
Knowledge is incredibly unevenly distributed |
|
|
|
LLMs need better criticism |
|
|
|
Everything tagged “llms” on my blog in 2024' |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'Terminology aside, I remain skeptical as to their utility based, once again, |
|
on the challenge of gullibility. LLMs believe anything you tell them. Any systems |
|
that attempts to make meaningful decisions on your behalf will run into the same |
|
roadblock: how good is a travel agent, or a digital assistant, or even a research |
|
tool if it can’t distinguish truth from fiction? |
|
|
|
Just the other day Google Search was caught serving up an entirely fake description |
|
of the non-existant movie “Encanto 2”. It turned out to be summarizing an imagined |
|
movie listing from a fan fiction wiki.' |
|
- 'Your browser does not support the audio element. |
|
|
|
|
|
OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also |
|
accepts audio input, and the Google Gemini apps can speak in a similar way to |
|
ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s |
|
meant to roll out in Q1 of 2025. |
|
|
|
Google’s NotebookLM, released in September, took audio output to a new level by |
|
producing spookily realistic conversations between two “podcast hosts” about anything |
|
you fed into their tool. They later added custom instructions, so naturally I |
|
turned them into pelicans: |
|
|
|
|
|
|
|
Your browser does not support the audio element.' |
|
- 'Then in February, Meta released Llama. And a few weeks later in March, Georgi |
|
Gerganov released code that got it working on a MacBook. |
|
|
|
I wrote about how Large language models are having their Stable Diffusion moment, |
|
and with hindsight that was a very good call! |
|
|
|
This unleashed a whirlwind of innovation, which was accelerated further in July |
|
when Meta released Llama 2—an improved version which, crucially, included permission |
|
for commercial use. |
|
|
|
Today there are literally thousands of LLMs that can be run locally, on all manner |
|
of different devices.' |
|
pipeline_tag: sentence-similarity |
|
library_name: sentence-transformers |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
model-index: |
|
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: Unknown |
|
type: unknown |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.56 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.64 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.72 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.92 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.56 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.21333333333333332 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.14400000000000002 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09200000000000001 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.56 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.64 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.72 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.92 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.7017423735235339 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.63715873015873 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.6441284271284272 |
|
name: Cosine Map@100 |
|
--- |
|
|
|
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b --> |
|
- **Maximum Sequence Length:** 512 tokens |
|
- **Output Dimensionality:** 1024 dimensions |
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
``` |
|
SentenceTransformer( |
|
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
|
(2): Normalize() |
|
) |
|
``` |
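
By way of illustration, the same module stack can be assembled by hand from the `sentence_transformers.models` building blocks. This is only a sketch of the architecture listed above; to get the finetuned weights, load `dataera2013/legal-ft-2` directly as shown in the Usage section below.

```python
from sentence_transformers import SentenceTransformer, models

# Sketch of the module stack above: BERT encoder -> CLS-token pooling -> L2 normalization.
# For the finetuned weights, load "dataera2013/legal-ft-2" directly (see Usage).
transformer = models.Transformer("Snowflake/snowflake-arctic-embed-l", max_seq_length=512)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
normalize = models.Normalize()
model = SentenceTransformer(modules=[transformer, pooling, normalize])
```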
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash |
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can load this model and run inference. |
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Download from the 🤗 Hub |
|
model = SentenceTransformer("dataera2013/legal-ft-2") |
|
# Run inference |
|
sentences = [ |
|
'QUESTION #1\\n', |
|
'Your browser does not support the audio element.\n\nOpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also accepts audio input, and the Google Gemini apps can speak in a similar way to ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s meant to roll out in Q1 of 2025.\nGoogle’s NotebookLM, released in September, took audio output to a new level by producing spookily realistic conversations between two “podcast hosts” about anything you fed into their tool. They later added custom instructions, so naturally I turned them into pelicans:\n\n\nYour browser does not support the audio element.', |
|
'Then in February, Meta released Llama. And a few weeks later in March, Georgi Gerganov released code that got it working on a MacBook.\nI wrote about how Large language models are having their Stable Diffusion moment, and with hindsight that was a very good call!\nThis unleashed a whirlwind of innovation, which was accelerated further in July when Meta released Llama 2—an improved version which, crucially, included permission for commercial use.\nToday there are literally thousands of LLMs that can be run locally, on all manner of different devices.', |
|
] |
|
embeddings = model.encode(sentences) |
|
print(embeddings.shape) |
|
# (3, 1024)
|
|
|
# Get the similarity scores for the embeddings |
|
similarities = model.similarity(embeddings, embeddings) |
|
print(similarities.shape) |
|
# torch.Size([3, 3])
|
``` |
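
Since training used `MatryoshkaLoss` (see Training Details), the embeddings can also be truncated to a smaller dimensionality for cheaper storage and faster search, at some cost in retrieval quality. A minimal sketch using the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Truncate embeddings to one of the Matryoshka dimensions (768, 512, 256, 128 or 64)
model = SentenceTransformer("dataera2013/legal-ft-2", truncate_dim=256)

embeddings = model.encode([
    "QUESTION #1\\n",
    "Today there are literally thousands of LLMs that can be run locally, on all manner of different devices.",
])
print(embeddings.shape)
# (2, 256)
```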
|
|
|
<!-- |
|
### Direct Usage (Transformers) |
|
|
|
<details><summary>Click to see the direct usage in Transformers</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Downstream Usage (Sentence Transformers) |
|
|
|
You can finetune this model on your own dataset. |
|
|
|
<details><summary>Click to expand</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Out-of-Scope Use |
|
|
|
*List how the model may foreseeably be misused and address what users ought not to do with the model.* |
|
--> |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Information Retrieval |
|
|
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.56 | |
|
| cosine_accuracy@3 | 0.64 | |
|
| cosine_accuracy@5 | 0.72 | |
|
| cosine_accuracy@10 | 0.92 | |
|
| cosine_precision@1 | 0.56 | |
|
| cosine_precision@3 | 0.2133 | |
|
| cosine_precision@5 | 0.144 | |
|
| cosine_precision@10 | 0.092 | |
|
| cosine_recall@1 | 0.56 | |
|
| cosine_recall@3 | 0.64 | |
|
| cosine_recall@5 | 0.72 | |
|
| cosine_recall@10 | 0.92 | |
|
| **cosine_ndcg@10** | **0.7017** | |
|
| cosine_mrr@10 | 0.6372 | |
|
| cosine_map@100 | 0.6441 | |
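
The sketch below shows how a comparable evaluation could be run with the same evaluator class. The queries, corpus, and relevance judgments here are placeholders standing in for the held-out split that produced the numbers above.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data; the real evaluation used a held-out question/context split.
queries = {"q1": "QUESTION #1\\n"}
corpus = {"d1": "Things we learned about LLMs in 2024 ...", "d2": "The year of slop ..."}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("dataera2013/legal-ft-2")
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="example-eval")
results = evaluator(model)
print(results)  # cosine_accuracy@k, cosine_precision@k, cosine_recall@k, ndcg@10, mrr@10, map@100
```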
|
|
|
<!-- |
|
## Bias, Risks and Limitations |
|
|
|
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.* |
|
--> |
|
|
|
<!-- |
|
### Recommendations |
|
|
|
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.* |
|
--> |
|
|
|
## Training Details |
|
|
|
### Training Dataset |
|
|
|
#### Unnamed Dataset |
|
|
|
* Size: 164 training samples |
|
* Columns: <code>sentence_0</code> and <code>sentence_1</code> |
|
* Approximate statistics based on the first 164 samples: |
|
| | sentence_0 | sentence_1 | |
|
|:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| |
|
| type | string | string | |
|
| details | <ul><li>min: 4 tokens</li><li>mean: 72.05 tokens</li><li>max: 228 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.85 tokens</li><li>max: 214 tokens</li></ul> | |
|
* Samples: |
|
| sentence_0 | sentence_1 | |
|
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
|
| <code>QUESTION #1\n</code> | <code>Stuff we figured out about AI in 2023<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>Simon Willison’s Weblog<br>Subscribe<br><br><br><br><br><br><br>Stuff we figured out about AI in 2023<br>31st December 2023<br>2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.<br>Here’s my attempt to round up the highlights in one place!</code> | |
|
| <code>QUESTION #2\n...\n\nContext:\nStuff we figured out about AI in 2023\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSimon Willison’s Weblog\nSubscribe\n\n\n\n\n\n\nStuff we figured out about AI in 2023\n31st December 2023\n2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.\nHere’s my attempt to round up the highlights in one place!\n', additional_kwargs={}, response_metadata={})]</code> | <code>Stuff we figured out about AI in 2023<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>Simon Willison’s Weblog<br>Subscribe<br><br><br><br><br><br><br>Stuff we figured out about AI in 2023<br>31st December 2023<br>2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.<br>Here’s my attempt to round up the highlights in one place!</code> | |
|
| <code>QUESTION #1\n</code> | <code>Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023</code> | |
|
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: |
|
```json |
|
{ |
|
"loss": "MultipleNegativesRankingLoss", |
|
"matryoshka_dims": [ |
|
768, |
|
512, |
|
256, |
|
128, |
|
64 |
|
], |
|
"matryoshka_weights": [ |
|
1, |
|
1, |
|
1, |
|
1, |
|
1 |
|
], |
|
"n_dims_per_step": -1 |
|
} |
|
``` |
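
A minimal sketch of instantiating this loss configuration (dataset and trainer wiring omitted):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Wrap the ranking loss so it is applied at each Matryoshka dimension with equal weight
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```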
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: steps |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `num_train_epochs`: 10 |
|
- `multi_dataset_batch_sampler`: round_robin |
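
A minimal sketch of passing these non-default values to `SentenceTransformerTrainingArguments` (the output directory is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

# Non-default hyperparameters from the list above; "output" is a placeholder path.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=10,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```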
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: steps |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 1 |
|
- `eval_accumulation_steps`: None |
|
- `torch_empty_cache_steps`: None |
|
- `learning_rate`: 5e-05 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1 |
|
- `num_train_epochs`: 10 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: linear |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.0 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: False |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: None |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: False |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: False |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: None |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `include_for_metrics`: [] |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `dispatch_batches`: None |
|
- `split_batches`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `eval_on_start`: False |
|
- `use_liger_kernel`: False |
|
- `eval_use_gather_object`: False |
|
- `average_tokens_across_devices`: False |
|
- `prompts`: None |
|
- `batch_sampler`: batch_sampler |
|
- `multi_dataset_batch_sampler`: round_robin |
|
|
|
</details> |
|
|
|
### Training Logs |
|
| Epoch | Step | cosine_ndcg@10 | |
|
|:------:|:----:|:--------------:| |
|
| 1.0 | 17 | 0.7017 | |
|
| 2.0 | 34 | 0.7017 | |
|
| 2.9412 | 50 | 0.7017 | |
|
| 3.0 | 51 | 0.7017 | |
|
| 4.0 | 68 | 0.7017 | |
|
| 5.0 | 85 | 0.7017 | |
|
| 5.8824 | 100 | 0.7017 | |
|
| 6.0 | 102 | 0.7017 | |
|
| 7.0 | 119 | 0.7017 | |
|
| 8.0 | 136 | 0.7017 | |
|
| 8.8235 | 150 | 0.7017 | |
|
| 9.0 | 153 | 0.7017 | |
|
| 10.0 | 170 | 0.7017 | |
|
|
|
|
|
### Framework Versions |
|
- Python: 3.13.1 |
|
- Sentence Transformers: 3.4.1 |
|
- Transformers: 4.48.3 |
|
- PyTorch: 2.6.0+cu124 |
|
- Accelerate: 1.3.0 |
|
- Datasets: 3.2.0 |
|
- Tokenizers: 0.21.0 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/1908.10084", |
|
} |
|
``` |
|
|
|
#### MatryoshkaLoss |
|
```bibtex |
|
@misc{kusupati2024matryoshka, |
|
title={Matryoshka Representation Learning}, |
|
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, |
|
year={2024}, |
|
eprint={2205.13147}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
#### MultipleNegativesRankingLoss |
|
```bibtex |
|
@misc{henderson2017efficient, |
|
title={Efficient Natural Language Response Suggestion for Smart Reply}, |
|
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
|
year={2017}, |
|
eprint={1705.00652}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |
|
|
|
<!-- |
|
## Glossary |
|
|
|
*Clearly define terms in order to be accessible across audiences.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Authors |
|
|
|
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Contact |
|
|
|
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.* |
|
--> |