dabraldeepti25 committed on
Commit 3e54a33 · verified · 1 Parent(s): 49201dc

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
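A minimal sketch of what this pooling config means, using random stand-in activations in place of real BERT outputs (the array shapes are the only thing taken from the config; the values are made up):

```python
import numpy as np

# Hypothetical token embeddings for one sentence: (seq_len, hidden) = (5, 1024).
token_embeddings = np.random.rand(5, 1024)

# "pooling_mode_cls_token": true — the sentence embedding is the hidden state of
# the first ([CLS]) token; no mean/max pooling over the remaining tokens.
sentence_embedding = token_embeddings[0]

# The model's Normalize() module then rescales it to unit length, so cosine
# similarity later reduces to a plain dot product.
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)
print(sentence_embedding.shape)  # (1024,)
```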
README.md ADDED
@@ -0,0 +1,720 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:156
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: How much has the cost of using OpenAI's most expensive model changed
+     compared to the previous pricing?
+   sentences:
+   - Synthetic data as a substantial component of pretraining is becoming increasingly
+     common, and the Phi series of models has consistently emphasized the importance
+     of synthetic data. Rather than serving as a cheap substitute for organic data,
+     synthetic data has several direct advantages over organic data.
+   - 'Here’s the rest of the transcript. It’s bland and generic, but my phone can pitch
+     bland and generic Christmas movies to Netflix now!
+
+     LLM prices crashed, thanks to competition and increased efficiency
+
+     The past twelve months have seen a dramatic collapse in the cost of running a
+     prompt through the top tier hosted LLMs.
+
+     In December 2023 (here’s the Internet Archive for the OpenAI pricing page) OpenAI
+     were charging $30/million input tokens for GPT-4, $10/mTok for the then-new GPT-4
+     Turbo and $1/mTok for GPT-3.5 Turbo.
+
+     Today $30/mTok gets you OpenAI’s most expensive model, o1. GPT-4o is $2.50 (12x
+     cheaper than GPT-4) and GPT-4o mini is $0.15/mTok—nearly 7x cheaper than GPT-3.5
+     and massively more capable.'
+   - 'Then there’s the rest. If you browse the Chatbot Arena leaderboard today—still
+     the most useful single place to get a vibes-based evaluation of models—you’ll
+     see that GPT-4-0314 has fallen to around 70th place. The 18 organizations with
+     higher scoring models are Google, OpenAI, Alibaba, Anthropic, Meta, Reka AI, 01
+     AI, Amazon, Cohere, DeepSeek, Nvidia, Mistral, NexusFlow, Zhipu AI, xAI, AI21
+     Labs, Princeton and Tencent.
+
+     Training a GPT-4 beating model was a huge deal in 2023. In 2024 it’s an achievement
+     that isn’t even particularly notable, though I personally still celebrate any
+     time a new organization joins that list.
+
+     Some of those GPT-4 models run on my laptop'
+ - source_sentence: What are some potential consequences of making decisions based
+     on hype and misinformation?
+   sentences:
+   - 'The GPT-4 barrier was comprehensively broken
+
+     In my December 2023 review I wrote about how We don’t yet know how to build GPT-4—OpenAI’s
+     best model was almost a year old at that point, yet no other AI lab had produced
+     anything better. What did OpenAI know that the rest of us didn’t?
+
+     I’m relieved that this has changed completely in the past twelve months. 18 organizations
+     now have models on the Chatbot Arena Leaderboard that rank higher than the original
+     GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.'
+   - 'I like people who are skeptical of this stuff. The hype has been deafening for
+     more than two years now, and there are enormous quantities of snake oil and misinformation
+     out there. A lot of very bad decisions are being made based on that hype. Being
+     critical is a virtue.
+
+     If we want people with decision-making authority to make good decisions about
+     how to apply these tools we first need to acknowledge that there ARE good applications,
+     and then help explain how to put those into practice while avoiding the many unintiutive
+     traps.
+
+     (If you still don’t think there are any good applications at all I’m not sure
+     why you made it to this point in the article!)'
+   - '17th: AI for Data Journalism: demonstrating what we can do with this stuff right
+     now
+
+
+     22nd: Options for accessing Llama 3 from the terminal using LLM
+
+
+
+
+     May
+
+
+     8th: Slop is the new name for unwanted AI-generated content
+
+
+     15th: ChatGPT in “4o” mode is not running the new features yet
+
+
+     29th: Training is not the same as chatting: ChatGPT and other LLMs don’t remember
+     everything you say
+
+
+
+
+     June
+
+
+     6th: Accidental prompt injection against RAG applications
+
+
+     10th: Thoughts on the WWDC 2024 keynote on Apple Intelligence
+
+
+     17th: Language models on the command-line
+
+
+     21st: Building search-based RAG using Claude, Datasette and Val Town
+
+
+     27th: Open challenges for AI engineering
+
+
+
+
+     July
+
+
+     14th: Imitation Intelligence, my keynote for PyCon US 2024'
+ - source_sentence: What advancements have been made in multimodal vision and audio/video
+     capabilities in LLMs?
+   sentences:
+   - 'The year of slop
+
+     2024 was the year that the word "slop" became a term of art. I wrote about this
+     in May, expanding on this tweet by @deepfates:'
+   - 'The GPT-4 barrier was comprehensively broken
+
+     Some of those GPT-4 models run on my laptop
+
+     LLM prices crashed, thanks to competition and increased efficiency
+
+     Multimodal vision is common, audio and video are starting to emerge
+
+     Voice and live camera mode are science fiction come to life
+
+     Prompt driven app generation is a commodity already
+
+     Universal access to the best models lasted for just a few short months
+
+     “Agents” still haven’t really happened yet
+
+     Evals really matter
+
+     Apple Intelligence is bad, Apple’s MLX library is excellent
+
+     The rise of inference-scaling “reasoning” models
+
+     Was the best currently available LLM trained in China for less than $6m?
+
+     The environmental impact got better
+
+     The environmental impact got much, much worse'
+   - "Posted 31st December 2024 at 6:07 pm · Follow me on Mastodon or Twitter or subscribe\
+     \ to my newsletter\n\n\nMore recent articles\n\nLLM 0.22, the annotated release\
+     \ notes - 17th February 2025\nRun LLMs on macOS using llm-mlx and Apple's MLX\
+     \ framework - 15th February 2025\nURL-addressable Pyodide Python environments\
+     \ - 13th February 2025\n\n\n \n\n\nThis is Things we learned about LLMs in 2024\
+     \ by Simon Willison, posted on 31st December 2024.\n\nPart of series LLMs annual\
+     \ review\n\nStuff we figured out about AI in 2023 - Dec. 31, 2023, 11:59 p.m.\
+     \ \nThings we learned about LLMs in 2024 - Dec. 31, 2024, 6:07 p.m. \n\n\n\n \
+     \ google\n 347\n\n\n ai\n 1100\n\n\n\
+     \ openai\n 257"
+ - source_sentence: When did the author first run a large language model on their laptop?
+   sentences:
+   - '24th: Notes on the new Claude analysis JavaScript code execution tool
+
+
+     27th: Run a prompt to generate and execute jq programs using llm-jq
+
+
+     29th: You can now run prompts against images, audio and video in your terminal
+     using LLM
+
+
+     30th: W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October
+
+
+
+
+     November
+
+
+     4th: Claude 3.5 Haiku
+
+
+     7th: Project: VERDAD—tracking misinformation in radio broadcasts using Gemini
+     1.5
+
+
+     12th: Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac
+
+
+     19th: Notes from Bing Chat—Our First Encounter With Manipulative AI
+
+
+     25th: Ask questions of SQLite databases and CSV/JSON files in your terminal
+
+
+
+
+     December
+
+
+     4th: First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)
+
+
+     7th: Prompts.js'
+   - '260 input tokens, 92 output tokens. Cost approximately 0.0024 cents (that’s less
+     than a 400th of a cent).
+
+     This increase in efficiency and reduction in price is my single favourite trend
+     from 2024. I want the utility of LLMs at a fraction of the energy cost and it
+     looks like that’s what we’re getting.
+
+     Multimodal vision is common, audio and video are starting to emerge
+
+     My butterfly example above illustrates another key trend from 2024: the rise of
+     multi-modal LLMs.
+
+     A year ago the single most notable example of these was GPT-4 Vision, released
+     at OpenAI’s DevDay in November 2023. Google’s multi-modal Gemini 1.0 was announced
+     on December 7th 2023 so it also (just) makes it into the 2023 window.'
+   - 'My personal laptop is a 64GB M2 MacBook Pro from 2023. It’s a powerful machine,
+     but it’s also nearly two years old now—and crucially it’s the same laptop I’ve
+     been using ever since I first ran an LLM on my computer back in March 2023 (see
+     Large language models are having their Stable Diffusion moment).
+
+     That same laptop that could just about run a GPT-3-class model in March last year
+     has now run multiple GPT-4 class models! Some of my notes on that:'
+ - source_sentence: What notable development in LLM technology occurred in the final
+     quarter of 2024?
+   sentences:
+   - 'Now that those features are rolling out they’re pretty weak. As an LLM power-user
+     I know what these models are capable of, and Apple’s LLM features offer a pale
+     imitation of what a frontier LLM can do. Instead we’re getting notification summaries
+     that misrepresent news headlines and writing assistant tools that I’ve not found
+     useful at all. Genmoji are kind of fun though.
+
+     The rise of inference-scaling “reasoning” models
+
+     The most interesting development in the final quarter of 2024 was the introduction
+     of a new shape of LLM, exemplified by OpenAI’s o1 models—initially released as
+     o1-preview and o1-mini on September 12th.'
+   - 'The year of slop
+
+     Synthetic training data works great
+
+     LLMs somehow got even harder to use
+
+     Knowledge is incredibly unevenly distributed
+
+     LLMs need better criticism
+
+     Everything tagged “llms” on my blog in 2024'
+   - 'Prompt injection is a natural consequence of this gulibility. I’ve seen precious
+     little progress on tackling that problem in 2024, and we’ve been talking about
+     it since September 2022.
+
+     I’m beginning to see the most popular idea of “agents” as dependent on AGI itself.
+     A model that’s robust against gulliblity is a very tall order indeed.
+
+     Evals really matter
+
+     Anthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.8333333333333334
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.8333333333333334
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.8333333333333334
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9330328858630988
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9097222222222222
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9097222222222222
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dabraldeepti25/legal-ft-v0")
+ # Run inference
+ sentences = [
+     'What notable development in LLM technology occurred in the final quarter of 2024?',
+     'Now that those features are rolling out they’re pretty weak. As an LLM power-user I know what these models are capable of, and Apple’s LLM features offer a pale imitation of what a frontier LLM can do. Instead we’re getting notification summaries that misrepresent news headlines and writing assistant tools that I’ve not found useful at all. Genmoji are kind of fun though.\nThe rise of inference-scaling “reasoning” models\nThe most interesting development in the final quarter of 2024 was the introduction of a new shape of LLM, exemplified by OpenAI’s o1 models—initially released as o1-preview and o1-mini on September 12th.',
+     'Prompt injection is a natural consequence of this gulibility. I’ve seen precious little progress on tackling that problem in 2024, and we’ve been talking about it since September 2022.\nI’m beginning to see the most popular idea of “agents” as dependent on AGI itself. A model that’s robust against gulliblity is a very tall order indeed.\nEvals really matter\nAnthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
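Since the model card declares `similarity_fn_name: cosine` and the architecture ends in a `Normalize()` module, `model.similarity(...)` in the snippet above amounts to a dot product of unit vectors. A minimal sketch with stand-in arrays in place of real `model.encode()` output:

```python
import numpy as np

# Stand-in embeddings (3 sentences, 1024 dims) replacing real model output;
# this model's Normalize() module already makes real embeddings unit-length.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 1024))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Cosine similarity over unit vectors is just a matrix of dot products, so each
# sentence is maximally similar (score 1.0) to itself on the diagonal.
similarities = embeddings @ embeddings.T
print(similarities.shape)  # (3, 3)
```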
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | cosine_accuracy@1   | 0.8333    |
+ | cosine_accuracy@3   | 1.0       |
+ | cosine_accuracy@5   | 1.0       |
+ | cosine_accuracy@10  | 1.0       |
+ | cosine_precision@1  | 0.8333    |
+ | cosine_precision@3  | 0.3333    |
+ | cosine_precision@5  | 0.2       |
+ | cosine_precision@10 | 0.1       |
+ | cosine_recall@1     | 0.8333    |
+ | cosine_recall@3     | 1.0       |
+ | cosine_recall@5     | 1.0       |
+ | cosine_recall@10    | 1.0       |
+ | **cosine_ndcg@10**  | **0.933** |
+ | cosine_mrr@10       | 0.9097    |
+ | cosine_map@100      | 0.9097    |
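A toy illustration of how the accuracy@k rows in this table are computed: for each query, check whether its relevant passage appears among the top-k ranked passages. The scores below are made up (query i's relevant passage is passage i); only the procedure mirrors the evaluator:

```python
import numpy as np

# Hypothetical query-passage cosine scores, one row per query.
scores = np.array([
    [0.9, 0.2, 0.1],   # query 0: relevant passage ranked 1st -> hit at k=1
    [0.4, 0.3, 0.5],   # query 1: relevant passage ranked 2nd -> miss at k=1, hit at k=3
    [0.1, 0.2, 0.8],   # query 2: relevant passage ranked 1st -> hit at k=1
])

def accuracy_at_k(scores: np.ndarray, k: int) -> float:
    hits = 0
    for q, row in enumerate(scores):
        top_k = np.argsort(-row)[:k]  # indices of the k highest-scoring passages
        hits += int(q in top_k)       # relevant passage for query q is passage q
    return hits / len(scores)

print(accuracy_at_k(scores, 1))  # 0.666...
print(accuracy_at_k(scores, 3))  # 1.0
```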
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 156 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 156 samples:
+   |         | sentence_0                                                                         | sentence_1                                                                          |
+   |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                              |
+   | details | <ul><li>min: 13 tokens</li><li>mean: 20.06 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 130.5 tokens</li><li>max: 204 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>What is the significance of Claude Artifacts in the context of LLMs and application development?</code> | <code>We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.<br>Anthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.<br>With Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.<br>Here’s my Extract URLs app, entirely generated by Claude:</code> |
+   | <code>How does Claude enable users to interact with applications generated by its capabilities?</code> | <code>We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.<br>Anthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.<br>With Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.<br>Here’s my Extract URLs app, entirely generated by Claude:</code> |
+   | <code>What are some of the new capabilities introduced in multi-modal models that enhance their functionality beyond text?</code> | <code>I think people who complain that LLM improvement has slowed are often missing the enormous advances in these multi-modal models. Being able to run prompts against images (and audio and video) is a fascinating new way to apply these models.<br>Voice and live camera mode are science fiction come to life<br>The audio and live video modes that have started to emerge deserve a special mention.<br>The ability to talk to ChatGPT first arrived in September 2023, but it was mostly an illusion: OpenAI used their excellent Whisper speech-to-text model and a new text-to-speech model (creatively named tts-1) to enable conversations with the ChatGPT mobile apps, but the actual model just saw text.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
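MatryoshkaLoss trains nested prefixes of the embedding (768/512/256/128/64 dims here) to be usable on their own. A minimal sketch of consuming a smaller Matryoshka dimension, with a stand-in unit vector in place of a real embedding (recent sentence-transformers versions can also do this at load time via a `truncate_dim` argument):

```python
import numpy as np

# Stand-in for a 1024-d unit-length embedding from this model.
rng = np.random.default_rng(0)
embedding = rng.normal(size=1024)
embedding /= np.linalg.norm(embedding)

# Keep only the leading 256 dims (one of the trained Matryoshka sizes), then
# re-normalize so cosine similarity remains a dot product of unit vectors.
embedding_256 = embedding[:256]
embedding_256 = embedding_256 / np.linalg.norm(embedding_256)
print(embedding_256.shape)  # (256,)
```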
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 16   | 0.9039         |
+ | 2.0   | 32   | 0.9010         |
+ | 3.0   | 48   | 0.9218         |
+ | 3.125 | 50   | 0.9218         |
+ | 4.0   | 64   | 0.9218         |
+ | 5.0   | 80   | 0.9247         |
+ | 6.0   | 96   | 0.9330         |
+ | 6.25  | 100  | 0.9330         |
+ | 7.0   | 112  | 0.9330         |
+ | 8.0   | 128  | 0.9330         |
+ | 9.0   | 144  | 0.9330         |
+ | 9.375 | 150  | 0.9330         |
+ | 10.0  | 160  | 0.9330         |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.3.1
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 4096,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.48.3",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
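The config above describes a BERT-large-style encoder: 24 layers, hidden size 1024, 16 attention heads, and a 4096-dim feed-forward. As a rough sanity check, the parameter count implied by these values can be tallied in a few lines; the sketch below excludes the pooler head, which this pipeline (CLS pooling) does not use, and the result lands within a few tens of kilobytes of the 1,336,413,848-byte float32 `model.safetensors` (the small remainder is the safetensors header).

```python
# Tally BertModel parameters from the config values above (a rough check,
# pooler head excluded since CLS pooling reads the hidden states directly).
V, H, L, I, P, T = 30522, 1024, 24, 4096, 512, 2  # vocab, hidden, layers, FFN, positions, token types

embeddings = (V + P + T) * H + 2 * H  # word/position/type tables + LayerNorm
per_layer = (
    4 * (H * H + H)  # Q, K, V, O projections with biases
    + 2 * H          # attention LayerNorm
    + H * I + I      # intermediate dense
    + I * H + H      # output dense
    + 2 * H          # output LayerNorm
)
total = embeddings + L * per_layer

print(total)      # 334092288 parameters (~334M)
print(total * 4)  # float32 bytes, close to the 1,336,413,848-byte checkpoint
```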
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.4.1",
+ "transformers": "4.48.3",
+ "pytorch": "2.5.1+cu124"
+ },
+ "prompts": {
+ "query": "Represent this sentence for searching relevant passages: "
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
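Per this config, query texts are prefixed with the `query` prompt before encoding (passages are encoded as-is), and scores use cosine similarity. A minimal sketch of both, using stand-in 2-d vectors in place of real 1024-dim embeddings:

```python
import numpy as np

# Prompt string taken verbatim from config_sentence_transformers.json above.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the similarity_fn_name declared in the config."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With SentenceTransformer the prefix is applied automatically via
# model.encode(..., prompt_name="query"); shown manually here for clarity.
query_text = QUERY_PROMPT + "What is Matryoshka representation learning?"

q, p = np.array([0.6, 0.8]), np.array([0.8, 0.6])  # stand-in embeddings
print(cosine(q, p))  # 0.96
```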
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5ace2531486c4a6541343de9dd41b2217d6229a61854ecc77b8d1496a4c618c
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
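The three modules listed above run in sequence: the Transformer produces per-token embeddings, Pooling keeps the `[CLS]` token (per `1_Pooling/config.json`), and Normalize L2-normalizes the result, so dot product and cosine similarity coincide on the output. A numpy sketch of the pooling and normalization steps, with stand-in token embeddings:

```python
import numpy as np

# Stand-in Transformer output for one sentence: seq_len=4 tokens, dim=1024.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(4, 1024))

pooled = token_embeddings[0]  # CLS pooling: take the first ([CLS]) token
sentence_embedding = pooled / np.linalg.norm(pooled)  # Normalize module

print(sentence_embedding.shape)            # (1024,)
print(np.linalg.norm(sentence_embedding))  # 1.0 — unit length
```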
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
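The ids above (PAD=0, UNK=100, CLS=101, SEP=102, MASK=103) are the standard BERT special-token slots. A single input is framed as `[CLS] … [SEP]`, truncated to `model_max_length`, and padded on the right (`padding_side: "right"`). A sketch with stand-in token ids (the real tokenizer handles this internally):

```python
# Special-token ids from added_tokens_decoder above.
CLS, SEP, PAD = 101, 102, 0

def frame(token_ids: list[int], max_len: int) -> list[int]:
    """Wrap a tokenized sequence in [CLS]/[SEP], truncate, right-pad."""
    ids = [CLS] + token_ids[: max_len - 2] + [SEP]
    return ids + [PAD] * (max_len - len(ids))

# 7592 and 2088 are stand-in wordpiece ids for illustration only.
print(frame([7592, 2088], 8))  # [101, 7592, 2088, 102, 0, 0, 0, 0]
```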
vocab.txt ADDED
The diff for this file is too large to render. See raw diff