Commit fa3bc1d (verified) · llm-wizard committed · Parent: ac14915

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,663 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:156
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: What is the estimated training cost of DeepSeek v3, and how does
+     it compare to the training hours used for Llama 3.1?
+   sentences:
+   - 'Your browser does not support the audio element.
+
+
+     OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also
+     accepts audio input, and the Google Gemini apps can speak in a similar way to
+     ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s
+     meant to roll out in Q1 of 2025.
+
+     Google’s NotebookLM, released in September, took audio output to a new level by
+     producing spookily realistic conversations between two “podcast hosts” about anything
+     you fed into their tool. They later added custom instructions, so naturally I
+     turned them into pelicans:
+
+
+
+     Your browser does not support the audio element.'
+   - 'DeepSeek v3 is a huge 685B parameter model—one of the largest openly licensed
+     models currently available, significantly bigger than the largest of Meta’s Llama
+     series, Llama 3.1 405B.
+
+     Benchmarks put it up there with Claude 3.5 Sonnet. Vibe benchmarks (aka the Chatbot
+     Arena) currently rank it 7th, just behind the Gemini 2.0 and OpenAI 4o/o1 models.
+     This is by far the highest ranking openly licensed model.
+
+     The really impressive thing about DeepSeek v3 is the training cost. The model
+     was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama
+     3.1 405B trained 30,840,000 GPU hours—11x that used by DeepSeek v3, for a model
+     that benchmarks slightly worse.'
+   - 'Those US export regulations on GPUs to China seem to have inspired some very
+     effective training optimizations!
+
+     The environmental impact got better
+
+     A welcome result of the increased efficiency of the models—both the hosted ones
+     and the ones I can run locally—is that the energy usage and environmental impact
+     of running a prompt has dropped enormously over the past couple of years.
+
+     OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days.
+     I have it on good authority that neither Google Gemini nor Amazon Nova (two of
+     the least expensive model providers) are running prompts at a loss.'
+ - source_sentence: How does the launch of ChatGPT Pro impact access to OpenAI's most
+     capable model compared to previous offerings?
+   sentences:
+   - 'These abilities are just a few weeks old at this point, and I don’t think their
+     impact has been fully felt yet. If you haven’t tried them out yet you really should.
+
+     Both Gemini and OpenAI offer API access to these features as well. OpenAI started
+     with a WebSocket API that was quite challenging to use, but in December they announced
+     a new WebRTC API which is much easier to get started with. Building a web app
+     that a user can talk to via voice is easy now!
+
+     Prompt driven app generation is a commodity already
+
+     This was possible with GPT-4 in 2023, but the value it provides became evident
+     in 2024.'
+   - 'OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely
+     available from its launch in June. This was a momentous change, because for the
+     previous year free users had mostly been restricted to GPT-3.5 level models, meaning
+     new users got a very inaccurate mental model of what a capable LLM could actually
+     do.
+
+     That era appears to have ended, likely permanently, with OpenAI’s launch of ChatGPT
+     Pro. This $200/month subscription service is the only way to access their most
+     capable model, o1 Pro.
+
+     Since the trick behind the o1 series (and the future models it will undoubtedly
+     inspire) is to expend more compute time to get better results, I don’t think those
+     days of free access to the best available models are likely to return.'
+   - 'Intuitively, one would expect that systems this powerful would take millions
+     of lines of complex code. Instead, it turns out a few hundred lines of Python
+     is genuinely enough to train a basic version!
+
+     What matters most is the training data. You need a lot of data to make these
+     things work, and the quantity and quality of the training data appears to be the
+     most important factor in how good the resulting model is.
+
+     If you can gather the right data, and afford to pay for the GPUs to train it,
+     you can build an LLM.'
+ - source_sentence: What are the implications of having a Code Interpreter equivalent
+     for fact-checking natural language?
+   sentences:
+   - 'Your browser does not support the audio element.
+
+
+     OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also
+     accepts audio input, and the Google Gemini apps can speak in a similar way to
+     ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s
+     meant to roll out in Q1 of 2025.
+
+     Google’s NotebookLM, released in September, took audio output to a new level by
+     producing spookily realistic conversations between two “podcast hosts” about anything
+     you fed into their tool. They later added custom instructions, so naturally I
+     turned them into pelicans:
+
+
+
+     Your browser does not support the audio element.'
+   - 'Except... you can run generated code to see if it’s correct. And with patterns
+     like ChatGPT Code Interpreter the LLM can execute the code itself, process the
+     error message, then rewrite it and keep trying until it works!
+
+     So hallucination is a much lesser problem for code generation than for anything
+     else. If only we had the equivalent of Code Interpreter for fact-checking natural
+     language!
+
+     How should we feel about this as software engineers?
+
+     On the one hand, this feels like a threat: who needs a programmer if ChatGPT can
+     write code for you?'
+   - 'The biggest innovation here is that it opens up a new way to scale a model: instead
+     of improving model performance purely through additional compute at training time,
+     models can now take on harder problems by spending more compute on inference.
+
+     The sequel to o1, o3 (they skipped “o2” for European trademark reasons) was announced
+     on 20th December with an impressive result against the ARC-AGI benchmark, albeit
+     one that likely involved more than $1,000,000 of compute time expense!
+
+     o3 is expected to ship in January. I doubt many people have real-world problems
+     that would benefit from that level of compute expenditure—I certainly don’t!—but
+     it appears to be a genuine next step in LLM architecture for taking on much harder
+     problems.'
+ - source_sentence: What advantages does a 64GB Mac have for running models compared
+     to other machines?
+   sentences:
+   - 'My personal laptop is a 64GB M2 MacBook Pro from 2023. It’s a powerful machine,
+     but it’s also nearly two years old now—and crucially it’s the same laptop I’ve
+     been using ever since I first ran an LLM on my computer back in March 2023 (see
+     Large language models are having their Stable Diffusion moment).
+
+     That same laptop that could just about run a GPT-3-class model in March last year
+     has now run multiple GPT-4 class models! Some of my notes on that:'
+   - 'This prompt-driven custom interface feature is so powerful and easy to build
+     (once you’ve figured out the gnarly details of browser sandboxing) that I expect
+     it to show up as a feature in a wide range of products in 2025.
+
+     Universal access to the best models lasted for just a few short months
+
+     For a few short months this year all three of the best available models—GPT-4o,
+     Claude 3.5 Sonnet and Gemini 1.5 Pro—were freely available to most of the world.'
+   - 'On paper, a 64GB Mac should be a great machine for running models due to the
+     way the CPU and GPU can share the same memory. In practice, many models are released
+     as model weights and libraries that reward NVIDIA’s CUDA over other platforms.
+
+     The llama.cpp ecosystem helped a lot here, but the real breakthrough has been
+     Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.
+
+     Apple’s mlx-lm Python library supports running a wide range of MLX-compatible
+     models on my Mac, with excellent performance. mlx-community on Hugging Face offers
+     more than 1,000 models that have been converted to the necessary format.'
+ - source_sentence: How does Claude enable users to interact with applications generated
+     by its system?
+   sentences:
+   - 'We already knew LLMs were spookily good at writing code. If you prompt them right,
+     it turns out they can build you a full interactive application using HTML, CSS
+     and JavaScript (and tools like React if you wire up some extra supporting build
+     mechanisms)—often in a single prompt.
+
+     Anthropic kicked this idea into high gear when they released Claude Artifacts,
+     a groundbreaking new feature that was initially slightly lost in the noise due
+     to being described half way through their announcement of the incredible Claude
+     3.5 Sonnet.
+
+     With Artifacts, Claude can write you an on-demand interactive application and
+     then let you use it directly inside the Claude interface.
+
+     Here’s my Extract URLs app, entirely generated by Claude:'
+   - 'An interesting point of comparison here could be the way railways rolled out
+     around the world in the 1800s. Constructing these required enormous investments
+     and had a massive environmental impact, and many of the lines that were built
+     turned out to be unnecessary—sometimes multiple lines from different companies
+     serving the exact same routes!
+
+     The resulting bubbles contributed to several financial crashes, see Wikipedia
+     for Panic of 1873, Panic of 1893, Panic of 1901 and the UK’s Railway Mania. They
+     left us with a lot of useful infrastructure and a great deal of bankruptcies and
+     environmental damage.
+
+     The year of slop'
+   - 'We don’t yet know how to build GPT-4
+
+     Frustratingly, despite the enormous leaps ahead we’ve had this year, we are yet
+     to see an alternative model that’s better than GPT-4.
+
+     OpenAI released GPT-4 in March, though it later turned out we had a sneak peek
+     of it in February when Microsoft used it as part of the new Bing.
+
+     This may well change in the next few weeks: Google’s Gemini Ultra has big claims,
+     but isn’t yet available for us to try out.
+
+     The team behind Mistral are working to beat GPT-4 as well, and their track record
+     is already extremely strong considering their first public model only came out
+     in September, and they’ve released two significant improvements since then.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 1.0
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 1.0
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 1.0
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 1.0
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 1.0
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 1.0
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
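+
+ The stack above is a BERT encoder, CLS-token pooling (`pooling_mode_cls_token: true`), and L2 normalization. As a rough equivalence sketch (not an official snippet from this card; the input sentence is a placeholder), the same embedding can be computed with plain `transformers`:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("llm-wizard/legal-ft-2")
+ model = AutoModel.from_pretrained("llm-wizard/legal-ft-2")
+
+ inputs = tokenizer(
+     ["What advantages does a 64GB Mac have for running models?"],
+     padding=True, truncation=True, max_length=512, return_tensors="pt",
+ )
+ with torch.no_grad():
+     last_hidden = model(**inputs).last_hidden_state
+
+ # CLS-token pooling followed by L2 normalization, mirroring the Pooling
+ # and Normalize modules listed in the architecture above.
+ embeddings = F.normalize(last_hidden[:, 0], p=2, dim=1)
+ print(embeddings.shape)  # torch.Size([1, 1024])
+ ```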
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("llm-wizard/legal-ft-2")
+ # Run inference
+ sentences = [
+     'How does Claude enable users to interact with applications generated by its system?',
+     'We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.\nAnthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.\nWith Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.\nHere’s my Extract URLs app, entirely generated by Claude:',
+     'We don’t yet know how to build GPT-4\nFrustratingly, despite the enormous leaps ahead we’ve had this year, we are yet to see an alternative model that’s better than GPT-4.\nOpenAI released GPT-4 in March, though it later turned out we had a sneak peek of it in February when Microsoft used it as part of the new Bing.\nThis may well change in the next few weeks: Google’s Gemini Ultra has big claims, but isn’t yet available for us to try out.\nThe team behind Mistral are working to beat GPT-4 as well, and their track record is already extremely strong considering their first public model only came out in September, and they’ve released two significant improvements since then.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 1024)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
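+
+ Since `config_sentence_transformers.json` in this commit defines a `query` prompt ("Represent this sentence for searching relevant passages: "), retrieval works best when queries are encoded with that prompt and passages without it. A minimal sketch (the query and passages below are placeholders):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("llm-wizard/legal-ft-2")
+
+ # Queries get the stored "query" prompt prepended; passages are encoded as-is.
+ query_embeddings = model.encode(
+     ["What is the estimated training cost of DeepSeek v3?"],
+     prompt_name="query",
+ )
+ passage_embeddings = model.encode([
+     "DeepSeek v3 was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000.",
+     "Google’s NotebookLM took audio output to a new level.",
+ ])
+
+ # Cosine similarity, the model's configured similarity function.
+ scores = model.similarity(query_embeddings, passage_embeddings)
+ print(scores)  # tensor of shape (1, 2); the first passage should score higher
+ ```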
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | cosine_accuracy@1   | 1.0     |
+ | cosine_accuracy@3   | 1.0     |
+ | cosine_accuracy@5   | 1.0     |
+ | cosine_accuracy@10  | 1.0     |
+ | cosine_precision@1  | 1.0     |
+ | cosine_precision@3  | 0.3333  |
+ | cosine_precision@5  | 0.2     |
+ | cosine_precision@10 | 0.1     |
+ | cosine_recall@1     | 1.0     |
+ | cosine_recall@3     | 1.0     |
+ | cosine_recall@5     | 1.0     |
+ | cosine_recall@10    | 1.0     |
+ | **cosine_ndcg@10**  | **1.0** |
+ | cosine_mrr@10       | 1.0     |
+ | cosine_map@100      | 1.0     |
+
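+ The uniform 1.0 scores (with precision@k equal to 1/k) indicate that each evaluation query has exactly one relevant passage and that it is always retrieved at rank 1, which is what you would expect from a very small held-out set. A minimal sketch of running the same evaluator on your own data (the ids and texts below are placeholders):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("llm-wizard/legal-ft-2")
+
+ queries = {"q1": "What advantages does a 64GB Mac have for running models?"}
+ corpus = {
+     "d1": "On paper, a 64GB Mac should be a great machine for running models...",
+     "d2": "An unrelated passage about railway construction in the 1800s.",
+ }
+ relevant_docs = {"q1": {"d1"}}  # each query id maps to its set of relevant doc ids
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
+ results = evaluator(model)
+ print(results["cosine_ndcg@10"])
+ ```
+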
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 156 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 156 samples:
+   |         | sentence_0                                                                          | sentence_1                                                                            |
+   |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                                |
+   | details | <ul><li>min: 12 tokens</li><li>mean: 20.22 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 134.95 tokens</li><li>max: 214 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>What topics were covered in the annotated presentations given in 2023?</code> | <code>I also gave a bunch of talks and podcast appearances. I’ve started habitually turning my talks into annotated presentations—here are my best from 2023:<br><br>Prompt injection explained, with video, slides, and a transcript<br>Catching up on the weird world of LLMs<br>Making Large Language Models work for you<br>Open questions for AI engineering<br>Embeddings: What they are and why they matter<br>Financial sustainability for open source projects at GitHub Universe<br><br>And in podcasts:<br><br><br>What AI can do for you on the Theory of Change<br><br>Working in public on Path to Citus Con<br><br>LLMs break the internet on the Changelog<br><br>Talking Large Language Models on Rooftop Ruby<br><br>Thoughts on the OpenAI board situation on Newsroom Robots</code> |
+   | <code>Which podcasts featured discussions about Large Language Models?</code> | <code>I also gave a bunch of talks and podcast appearances. I’ve started habitually turning my talks into annotated presentations—here are my best from 2023:<br><br>Prompt injection explained, with video, slides, and a transcript<br>Catching up on the weird world of LLMs<br>Making Large Language Models work for you<br>Open questions for AI engineering<br>Embeddings: What they are and why they matter<br>Financial sustainability for open source projects at GitHub Universe<br><br>And in podcasts:<br><br><br>What AI can do for you on the Theory of Change<br><br>Working in public on Path to Citus Con<br><br>LLMs break the internet on the Changelog<br><br>Talking Large Language Models on Rooftop Ruby<br><br>Thoughts on the OpenAI board situation on Newsroom Robots</code> |
+   | <code>When did Google release their gemini-2.0-flash-thinking-exp model?</code> | <code>OpenAI are not the only game in town here. Google released their first entrant in the category, gemini-2.0-flash-thinking-exp, on December 19th.<br>Alibaba’s Qwen team released their QwQ model on November 28th—under an Apache 2.0 license, and that one I could run on my own machine. They followed that up with a vision reasoning model called QvQ on December 24th, which I also ran locally.<br>DeepSeek made their DeepSeek-R1-Lite-Preview model available to try out through their chat interface on November 20th.<br>To understand more about inference scaling I recommend Is AI progress slowing down? by Arvind Narayanan and Sayash Kapoor.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
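+
+ Because the model was trained with MatryoshkaLoss over nested dimensions, its embeddings can be truncated to any of the listed sizes (768, 512, 256, 128 or 64) for cheaper storage and search, at some cost in quality. A sketch using the Sentence Transformers `truncate_dim` argument (the input sentence is a placeholder):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load the model so that encode() returns 256-dimensional embeddings,
+ # one of the nested dimensions the MatryoshkaLoss above was trained on.
+ model = SentenceTransformer("llm-wizard/legal-ft-2", truncate_dim=256)
+ embeddings = model.encode(["A shorter, cheaper embedding"])
+ print(embeddings.shape)  # (1, 256)
+ ```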
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
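+
+ For reference, a hedged sketch of reproducing this setup with the `SentenceTransformerTrainer` API. The dataset rows and output directory below are placeholders, and the evaluation-related settings listed above are omitted for brevity:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")
+
+ # The real run used 156 (sentence_0, sentence_1) pairs; two placeholder rows here.
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["What is the training cost of DeepSeek v3?",
+                    "Which podcasts discussed Large Language Models?"],
+     "sentence_1": ["DeepSeek v3 was trained on 2,788,000 H800 GPU hours...",
+                    "LLMs break the internet on the Changelog..."],
+ })
+
+ # In-batch negatives loss, wrapped so that the first 768/512/256/128/64
+ # embedding dimensions are each trained to be useful on their own.
+ loss = MatryoshkaLoss(
+     model,
+     MultipleNegativesRankingLoss(model),
+     matryoshka_dims=[768, 512, 256, 128, 64],
+ )
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="models/legal-ft-2",  # hypothetical path
+     num_train_epochs=10,
+     per_device_train_batch_size=10,
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model, args=args, train_dataset=train_dataset, loss=loss,
+ )
+ trainer.train()
+ ```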
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 16   | 1.0            |
+ | 2.0   | 32   | 1.0            |
+ | 3.0   | 48   | 1.0            |
+ | 3.125 | 50   | 1.0            |
+ | 4.0   | 64   | 1.0            |
+ | 5.0   | 80   | 1.0            |
+ | 6.0   | 96   | 1.0            |
+ | 6.25  | 100  | 1.0            |
+ | 7.0   | 112  | 1.0            |
+ | 8.0   | 128  | 1.0            |
+ | 9.0   | 144  | 1.0            |
+ | 9.375 | 150  | 1.0            |
+ | 10.0  | 160  | 1.0            |
+
+
+ ### Framework Versions
+ - Python: 3.13.1
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:675c77bfbd9c7527de94a6f13bc029fc78de9b2c65bd419f154cf0422cf7554e
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff