|
--- |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:164 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
base_model: Snowflake/snowflake-arctic-embed-l |
|
widget: |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'An interesting point of comparison here could be the way railways rolled out |
|
around the world in the 1800s. Constructing these required enormous investments |
|
and had a massive environmental impact, and many of the lines that were built |
|
turned out to be unnecessary—sometimes multiple lines from different companies |
|
serving the exact same routes! |
|
|
|
The resulting bubbles contributed to several financial crashes, see Wikipedia |
|
for Panic of 1873, Panic of 1893, Panic of 1901 and the UK’s Railway Mania. They |
|
left us with a lot of useful infrastructure and a great deal of bankruptcies and |
|
environmental damage. |
|
|
|
The year of slop' |
|
- 'This remains astonishing to me. I thought a model with the capabilities and output |
|
quality of GPT-4 needed a datacenter class server with one or more $40,000+ GPUs. |
|
|
|
These models take up enough of my 64GB of RAM that I don’t run them often—they |
|
don’t leave much room for anything else. |
|
|
|
The fact that they run at all is a testament to the incredible training and inference |
|
performance gains that we’ve figured out over the past year. It turns out there |
|
was a lot of low-hanging fruit to be harvested in terms of model efficiency. I |
|
expect there’s still more to come.' |
|
- 'Things we learned about LLMs in 2024 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Simon Willison’s Weblog |
|
|
|
Subscribe |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Things we learned about LLMs in 2024 |
|
|
|
31st December 2024 |
|
|
|
A lot has happened in the world of Large Language Models over the course of 2024. |
|
Here’s a review of things we figured out about the field in the past twelve months, |
|
plus my attempt at identifying key themes and pivotal moments. |
|
|
|
This is a sequel to my review of 2023. |
|
|
|
In this article:' |
|
- source_sentence: 'QUESTION #2\n...\n\nContext:\nJust this week, the New York Times |
|
launched a landmark lawsuit against OpenAI and Microsoft over this issue. The |
|
69 page PDF is genuinely worth reading—especially the first few pages, which lay |
|
out the issues in a way that’s surprisingly easy to follow. The rest of the document |
|
includes some of the clearest explanations of what LLMs are, how they work and |
|
how they are built that I’ve read anywhere.\nThe legal arguments here are complex. |
|
I’m not a lawyer, but I don’t think this one will be easily decided. Whichever |
|
way it goes, I expect this case to have a profound impact on how this technology |
|
develops in the future.\n'', additional_kwargs={}, response_metadata={})]' |
|
sentences: |
|
- 'A lot of people are excited about AI agents—an infuriatingly vague term that |
|
seems to be converging on “AI systems that can go away and act on your behalf”. |
|
We’ve been talking about them all year, but I’ve seen few if any examples of them |
|
running in production, despite lots of exciting prototypes. |
|
|
|
I think this is because of gullibility. |
|
|
|
Can we solve this? Honestly, I’m beginning to suspect that you can’t fully solve |
|
gullibility without achieving AGI. So it may be quite a while before those agent |
|
dreams can really start to come true! |
|
|
|
Code may be the best application |
|
|
|
Over the course of the year, it’s become increasingly clear that writing code |
|
is one of the things LLMs are most capable of.' |
|
- 'Just this week, the New York Times launched a landmark lawsuit against OpenAI |
|
and Microsoft over this issue. The 69 page PDF is genuinely worth reading—especially |
|
the first few pages, which lay out the issues in a way that’s surprisingly easy |
|
to follow. The rest of the document includes some of the clearest explanations |
|
of what LLMs are, how they work and how they are built that I’ve read anywhere. |
|
|
|
The legal arguments here are complex. I’m not a lawyer, but I don’t think this |
|
one will be easily decided. Whichever way it goes, I expect this case to have |
|
a profound impact on how this technology develops in the future.' |
|
- 'Then there’s the rest. If you browse the Chatbot Arena leaderboard today—still |
|
the most useful single place to get a vibes-based evaluation of models—you’ll |
|
see that GPT-4-0314 has fallen to around 70th place. The 18 organizations with |
|
higher scoring models are Google, OpenAI, Alibaba, Anthropic, Meta, Reka AI, 01 |
|
AI, Amazon, Cohere, DeepSeek, Nvidia, Mistral, NexusFlow, Zhipu AI, xAI, AI21 |
|
Labs, Princeton and Tencent. |
|
|
|
Training a GPT-4 beating model was a huge deal in 2023. In 2024 it’s an achievement |
|
that isn’t even particularly notable, though I personally still celebrate any |
|
time a new organization joins that list. |
|
|
|
Some of those GPT-4 models run on my laptop' |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'The biggest innovation here is that it opens up a new way to scale a model: instead |
|
of improving model performance purely through additional compute at training time, |
|
models can now take on harder problems by spending more compute on inference. |
|
|
|
The sequel to o1, o3 (they skipped “o2” for European trademark reasons) was announced |
|
on 20th December with an impressive result against the ARC-AGI benchmark, albeit |
|
one that likely involved more than $1,000,000 of compute time expense! |
|
|
|
o3 is expected to ship in January. I doubt many people have real-world problems |
|
that would benefit from that level of compute expenditure—I certainly don’t!—but |
|
it appears to be a genuine next step in LLM architecture for taking on much harder |
|
problems.' |
|
- 'Those US export regulations on GPUs to China seem to have inspired some very |
|
effective training optimizations! |
|
|
|
The environmental impact got better |
|
|
|
A welcome result of the increased efficiency of the models—both the hosted ones |
|
and the ones I can run locally—is that the energy usage and environmental impact |
|
of running a prompt has dropped enormously over the past couple of years. |
|
|
|
OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days. |
|
I have it on good authority that neither Google Gemini nor Amazon Nova (two of |
|
the least expensive model providers) are running prompts at a loss.' |
|
- 'OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely |
|
available from its launch in June. This was a momentus change, because for the |
|
previous year free users had mostly been restricted to GPT-3.5 level models, meaning |
|
new users got a very inaccurate mental model of what a capable LLM could actually |
|
do. |
|
|
|
That era appears to have ended, likely permanently, with OpenAI’s launch of ChatGPT |
|
Pro. This $200/month subscription service is the only way to access their most |
|
capable model, o1 Pro. |
|
|
|
Since the trick behind the o1 series (and the future models it will undoubtedly |
|
inspire) is to expend more compute time to get better results, I don’t think those |
|
days of free access to the best available models are likely to return.' |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'The May 13th announcement of GPT-4o included a demo of a brand new voice mode, |
|
where the true multi-modal GPT-4o (the o is for “omni”) model could accept audio |
|
input and output incredibly realistic sounding speech without needing separate |
|
TTS or STT models. |
|
|
|
The demo also sounded conspicuously similar to Scarlett Johansson... and after |
|
she complained the voice from the demo, Skye, never made it to a production product. |
|
|
|
The delay in releasing the new voice mode after the initial demo caused quite |
|
a lot of confusion. I wrote about that in ChatGPT in “4o” mode is not running |
|
the new features yet.' |
|
- 'Against this photo of butterflies at the California Academy of Sciences: |
|
|
|
|
|
|
|
A shallow dish, likely a hummingbird or butterfly feeder, is red. Pieces of orange |
|
slices of fruit are visible inside the dish. |
|
|
|
Two butterflies are positioned in the feeder, one is a dark brown/black butterfly |
|
with white/cream-colored markings. The other is a large, brown butterfly with |
|
patterns of lighter brown, beige, and black markings, including prominent eye |
|
spots. The larger brown butterfly appears to be feeding on the fruit.' |
|
- 'The year of slop |
|
|
|
Synthetic training data works great |
|
|
|
LLMs somehow got even harder to use |
|
|
|
Knowledge is incredibly unevenly distributed |
|
|
|
LLMs need better criticism |
|
|
|
Everything tagged “llms” on my blog in 2024' |
|
- source_sentence: 'QUESTION #1\n' |
|
sentences: |
|
- 'Terminology aside, I remain skeptical as to their utility based, once again, |
|
on the challenge of gullibility. LLMs believe anything you tell them. Any systems |
|
that attempts to make meaningful decisions on your behalf will run into the same |
|
roadblock: how good is a travel agent, or a digital assistant, or even a research |
|
tool if it can’t distinguish truth from fiction? |
|
|
|
Just the other day Google Search was caught serving up an entirely fake description |
|
of the non-existant movie “Encanto 2”. It turned out to be summarizing an imagined |
|
movie listing from a fan fiction wiki.' |
|
- 'Your browser does not support the audio element. |
|
|
|
|
|
OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also |
|
accepts audio input, and the Google Gemini apps can speak in a similar way to |
|
ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s |
|
meant to roll out in Q1 of 2025. |
|
|
|
Google’s NotebookLM, released in September, took audio output to a new level by |
|
producing spookily realistic conversations between two “podcast hosts” about anything |
|
you fed into their tool. They later added custom instructions, so naturally I |
|
turned them into pelicans: |
|
|
|
|
|
|
|
Your browser does not support the audio element.' |
|
- 'Then in February, Meta released Llama. And a few weeks later in March, Georgi |
|
Gerganov released code that got it working on a MacBook. |
|
|
|
I wrote about how Large language models are having their Stable Diffusion moment, |
|
and with hindsight that was a very good call! |
|
|
|
This unleashed a whirlwind of innovation, which was accelerated further in July |
|
when Meta released Llama 2—an improved version which, crucially, included permission |
|
for commercial use. |
|
|
|
Today there are literally thousands of LLMs that can be run locally, on all manner |
|
of different devices.' |
|
pipeline_tag: sentence-similarity |
|
library_name: sentence-transformers |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
model-index: |
|
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: Unknown |
|
type: unknown |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.56 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.64 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.72 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.92 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.56 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.21333333333333332 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.14400000000000002 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09200000000000001 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.56 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.64 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.72 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.92 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.7017423735235339 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.63715873015873 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.6441284271284272 |
|
name: Cosine Map@100 |
|
--- |
|
|
|
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b --> |
|
- **Maximum Sequence Length:** 512 tokens |
|
- **Output Dimensionality:** 1024 dimensions |
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
``` |
|
SentenceTransformer( |
|
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
|
(2): Normalize() |
|
) |
|
``` |
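
By way of illustration, the same module stack can be assembled by hand from the `sentence_transformers.models` building blocks. This is only a sketch of the architecture listed above; to get the finetuned weights, load `dataera2013/legal-ft-2` directly as shown in the Usage section below.

```python
from sentence_transformers import SentenceTransformer, models

# Sketch of the module stack above: BERT encoder -> CLS-token pooling -> L2 normalization.
# For the finetuned weights, load "dataera2013/legal-ft-2" directly (see Usage).
transformer = models.Transformer("Snowflake/snowflake-arctic-embed-l", max_seq_length=512)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
normalize = models.Normalize()
model = SentenceTransformer(modules=[transformer, pooling, normalize])
```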
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash |
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can load this model and run inference. |
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Download from the 🤗 Hub |
|
model = SentenceTransformer("dataera2013/legal-ft-2") |
|
# Run inference |
|
sentences = [ |
|
'QUESTION #1\\n', |
|
'Your browser does not support the audio element.\n\nOpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also accepts audio input, and the Google Gemini apps can speak in a similar way to ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s meant to roll out in Q1 of 2025.\nGoogle’s NotebookLM, released in September, took audio output to a new level by producing spookily realistic conversations between two “podcast hosts” about anything you fed into their tool. They later added custom instructions, so naturally I turned them into pelicans:\n\n\nYour browser does not support the audio element.', |
|
'Then in February, Meta released Llama. And a few weeks later in March, Georgi Gerganov released code that got it working on a MacBook.\nI wrote about how Large language models are having their Stable Diffusion moment, and with hindsight that was a very good call!\nThis unleashed a whirlwind of innovation, which was accelerated further in July when Meta released Llama 2—an improved version which, crucially, included permission for commercial use.\nToday there are literally thousands of LLMs that can be run locally, on all manner of different devices.', |
|
] |
|
embeddings = model.encode(sentences) |
|
print(embeddings.shape) |
|
# (3, 1024)
|
|
|
# Get the similarity scores for the embeddings |
|
similarities = model.similarity(embeddings, embeddings) |
|
print(similarities.shape) |
|
# torch.Size([3, 3])
|
``` |
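
Since training used `MatryoshkaLoss` (see Training Details), the embeddings can also be truncated to a smaller dimensionality for cheaper storage and faster search, at some cost in retrieval quality. A minimal sketch using the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Truncate embeddings to one of the Matryoshka dimensions (768, 512, 256, 128 or 64)
model = SentenceTransformer("dataera2013/legal-ft-2", truncate_dim=256)

embeddings = model.encode([
    "QUESTION #1\\n",
    "Today there are literally thousands of LLMs that can be run locally, on all manner of different devices.",
])
print(embeddings.shape)
# (2, 256)
```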
|
|
|
<!-- |
|
### Direct Usage (Transformers) |
|
|
|
<details><summary>Click to see the direct usage in Transformers</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Downstream Usage (Sentence Transformers) |
|
|
|
You can finetune this model on your own dataset. |
|
|
|
<details><summary>Click to expand</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Out-of-Scope Use |
|
|
|
*List how the model may foreseeably be misused and address what users ought not to do with the model.* |
|
--> |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Information Retrieval |
|
|
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.56 | |
|
| cosine_accuracy@3 | 0.64 | |
|
| cosine_accuracy@5 | 0.72 | |
|
| cosine_accuracy@10 | 0.92 | |
|
| cosine_precision@1 | 0.56 | |
|
| cosine_precision@3 | 0.2133 | |
|
| cosine_precision@5 | 0.144 | |
|
| cosine_precision@10 | 0.092 | |
|
| cosine_recall@1 | 0.56 | |
|
| cosine_recall@3 | 0.64 | |
|
| cosine_recall@5 | 0.72 | |
|
| cosine_recall@10 | 0.92 | |
|
| **cosine_ndcg@10** | **0.7017** | |
|
| cosine_mrr@10 | 0.6372 | |
|
| cosine_map@100 | 0.6441 | |
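
The sketch below shows how a comparable evaluation could be run with the same evaluator class. The queries, corpus, and relevance judgments here are placeholders standing in for the held-out split that produced the numbers above.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder evaluation data; the real evaluation used a held-out question/context split.
queries = {"q1": "QUESTION #1\\n"}
corpus = {"d1": "Things we learned about LLMs in 2024 ...", "d2": "The year of slop ..."}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("dataera2013/legal-ft-2")
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="example-eval")
results = evaluator(model)
print(results)  # cosine_accuracy@k, cosine_precision@k, cosine_recall@k, ndcg@10, mrr@10, map@100
```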
|
|
|
<!-- |
|
## Bias, Risks and Limitations |
|
|
|
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.* |
|
--> |
|
|
|
<!-- |
|
### Recommendations |
|
|
|
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.* |
|
--> |
|
|
|
## Training Details |
|
|
|
### Training Dataset |
|
|
|
#### Unnamed Dataset |
|
|
|
* Size: 164 training samples |
|
* Columns: <code>sentence_0</code> and <code>sentence_1</code> |
|
* Approximate statistics based on the first 164 samples: |
|
| | sentence_0 | sentence_1 | |
|
|:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| |
|
| type | string | string | |
|
| details | <ul><li>min: 4 tokens</li><li>mean: 72.05 tokens</li><li>max: 228 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.85 tokens</li><li>max: 214 tokens</li></ul> | |
|
* Samples: |
|
| sentence_0 | sentence_1 | |
|
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
|
| <code>QUESTION #1\n</code> | <code>Stuff we figured out about AI in 2023<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>Simon Willison’s Weblog<br>Subscribe<br><br><br><br><br><br><br>Stuff we figured out about AI in 2023<br>31st December 2023<br>2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.<br>Here’s my attempt to round up the highlights in one place!</code> | |
|
| <code>QUESTION #2\n...\n\nContext:\nStuff we figured out about AI in 2023\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSimon Willison’s Weblog\nSubscribe\n\n\n\n\n\n\nStuff we figured out about AI in 2023\n31st December 2023\n2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.\nHere’s my attempt to round up the highlights in one place!\n', additional_kwargs={}, response_metadata={})]</code> | <code>Stuff we figured out about AI in 2023<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>Simon Willison’s Weblog<br>Subscribe<br><br><br><br><br><br><br>Stuff we figured out about AI in 2023<br>31st December 2023<br>2023 was the breakthrough year for Large Language Models (LLMs). I think it’s OK to call these AI—they’re the latest and (currently) most interesting development in the academic field of Artificial Intelligence that dates back to the 1950s.<br>Here’s my attempt to round up the highlights in one place!</code> | |
|
| <code>QUESTION #1\n</code> | <code>Large Language Models<br>They’re actually quite easy to build<br>You can run LLMs on your own devices<br>Hobbyists can build their own fine-tuned models<br>We don’t yet know how to build GPT-4<br>Vibes Based Development<br>LLMs are really smart, and also really, really dumb<br>Gullibility is the biggest unsolved problem<br>Code may be the best application<br>The ethics of this space remain diabolically complex<br>My blog in 2023</code> | |
|
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: |
|
```json |
|
{ |
|
"loss": "MultipleNegativesRankingLoss", |
|
"matryoshka_dims": [ |
|
768, |
|
512, |
|
256, |
|
128, |
|
64 |
|
], |
|
"matryoshka_weights": [ |
|
1, |
|
1, |
|
1, |
|
1, |
|
1 |
|
], |
|
"n_dims_per_step": -1 |
|
} |
|
``` |
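
A minimal sketch of instantiating this loss configuration (dataset and trainer wiring omitted):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Wrap the ranking loss so it is applied at each Matryoshka dimension with equal weight
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```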
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: steps |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `num_train_epochs`: 10 |
|
- `multi_dataset_batch_sampler`: round_robin |
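
A minimal sketch of passing these non-default values to `SentenceTransformerTrainingArguments` (the output directory is a placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

# Non-default hyperparameters from the list above; "output" is a placeholder path.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=10,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```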
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: steps |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 1 |
|
- `eval_accumulation_steps`: None |
|
- `torch_empty_cache_steps`: None |
|
- `learning_rate`: 5e-05 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1 |
|
- `num_train_epochs`: 10 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: linear |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.0 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: False |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: None |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: False |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: False |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: None |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `include_for_metrics`: [] |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `dispatch_batches`: None |
|
- `split_batches`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `eval_on_start`: False |
|
- `use_liger_kernel`: False |
|
- `eval_use_gather_object`: False |
|
- `average_tokens_across_devices`: False |
|
- `prompts`: None |
|
- `batch_sampler`: batch_sampler |
|
- `multi_dataset_batch_sampler`: round_robin |
|
|
|
</details> |
|
|
|
### Training Logs |
|
| Epoch | Step | cosine_ndcg@10 | |
|
|:------:|:----:|:--------------:| |
|
| 1.0 | 17 | 0.7017 | |
|
| 2.0 | 34 | 0.7017 | |
|
| 2.9412 | 50 | 0.7017 | |
|
| 3.0 | 51 | 0.7017 | |
|
| 4.0 | 68 | 0.7017 | |
|
| 5.0 | 85 | 0.7017 | |
|
| 5.8824 | 100 | 0.7017 | |
|
| 6.0 | 102 | 0.7017 | |
|
| 7.0 | 119 | 0.7017 | |
|
| 8.0 | 136 | 0.7017 | |
|
| 8.8235 | 150 | 0.7017 | |
|
| 9.0 | 153 | 0.7017 | |
|
| 10.0 | 170 | 0.7017 | |
|
|
|
|
|
### Framework Versions |
|
- Python: 3.13.1 |
|
- Sentence Transformers: 3.4.1 |
|
- Transformers: 4.48.3 |
|
- PyTorch: 2.6.0+cu124 |
|
- Accelerate: 1.3.0 |
|
- Datasets: 3.2.0 |
|
- Tokenizers: 0.21.0 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/1908.10084", |
|
} |
|
``` |
|
|
|
#### MatryoshkaLoss |
|
```bibtex |
|
@misc{kusupati2024matryoshka, |
|
title={Matryoshka Representation Learning}, |
|
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, |
|
year={2024}, |
|
eprint={2205.13147}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
#### MultipleNegativesRankingLoss |
|
```bibtex |
|
@misc{henderson2017efficient, |
|
title={Efficient Natural Language Response Suggestion for Smart Reply}, |
|
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
|
year={2017}, |
|
eprint={1705.00652}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |
|
|
|
<!-- |
|
## Glossary |
|
|
|
*Clearly define terms in order to be accessible across audiences.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Authors |
|
|
|
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Contact |
|
|
|
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.* |
|
--> |