language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5822
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: nomic-ai/nomic-embed-text-v2-moe
widget:
  - source_sentence: >-
      the Polaris Solicitations as currently drafted do not comply with Section
      3306(c)(3).  In its request 

      to apply Section 3306(c)(3) to the Polaris Solicitations, GSA stated that 
       
       
        
      Supplement 2, AR at 2907–08.  Because GSA adopted an overly broad
      understanding of Section 

      3306(c)(3)’s scope, GSA stated the Solicitations will include a “full
      range of order types,”
    sentences:
      - What did Al-Hamim confirm about the citations?
      - What understanding did GSA adopt regarding Section 3306(c)(3)'s scope?
      - What was the reason for denying the agency's motion without prejudice?
  - source_sentence: >-
      objective (as position, profit, or a prize); [or to] be in a state of
      rivalry.”  Compete, Merriam-

      Webster’s Collegiate Dictionary (11th ed. 2003); see Competing,
      Merriam-Webster Dictionary, 

      https://www.merriam-webster.com/dictionary/competing (last visited Mar. 7,
      2023) (defining 

      “competing” as being “in a state of rivalry or competition (as for
      position, profit, or a prize)”).
    sentences:
      - >-
        Who claims that Congress has done much of the work to reconcile FACA §
        10(b) and the FOIA exemptions?
      - When was the online dictionary last visited according to the document?
      - What action will the Court take regarding Count Nine in No. 11-444?
  - source_sentence: >-
      a witness for the State, Mr. Zimmerman testified that he was shot in the
      back while sitting 

      in the driver’s seat of his vehicle.  Over objection, during Mr.
      Zimmerman’s direct 

      examination, the circuit court admitted into evidence a video, retrieved
      by a detective, that 

      had been recorded by a camera mounted on the exterior wall of a residence
      near the site of
    sentences:
      - What must a complaint do to defeat a Rule 12(b)(6) motion?
      - What was the position of Mr. Zimmerman when he was shot?
      - >-
        What does Rule 11 impose on any party who signs a pleading, motion, or
        other paper?
  - source_sentence: >-
      than if they had submitted a new request on the same subject,” Fifth Lutz
      Decl. ¶ 9, implicitly 

      confirms that the Assignment of Rights Policy tends to prejudice
      requesters.  To the extent an 

      “assignee would be placed in a better position to litigate the assigned
      request than if they had 

      submitted a new request on the same subject,” id., then a FOIA requester
      “submit[ing] a new
    sentences:
      - >-
        What does the Fifth Lutz Declaration paragraph 9 imply about the
        Assignment of Rights Policy?
      - When did Illinois Supreme Court Rule 663 become effective?
      - >-
        What would happen if the Solicitations were amended to comply with the
        regulations according to the plaintiffs?
  - source_sentence: >-
      against six federal agencies pursuant to the Freedom of Information Act
      (“FOIA”), 5 U.S.C. 

      § 552, claiming that the defendant agencies have violated the FOIA in
      numerous ways.1  NSC’s 

      claims run the gamut, including challenges to: the withholding of specific
      information; the 

      adequacy of the agencies’ search efforts; the refusal to process FOIA
      requests; the refusal to
    sentences:
      - >-
        Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior
        regarding the retroactivity of statutes?
      - How many federal agencies is the action against?
      - Who questioned Mr. Zimmerman after the bench conference?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: ModernBERT Embed base Legal Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.5533230293663061
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.6105100463678517
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7125193199381762
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8083462132921174
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5533230293663061
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5275631117980423
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4126738794435858
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.2502318392581144
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1984801648634724
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.5175167439464194
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.6554611025244719
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7895414734672848
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6787324741180409
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.610266553813694
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6544139401960045
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.5502318392581144
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.5996908809891809
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7001545595054096
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7897990726429676
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5502318392581144
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5218959299330241
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4046367851622875
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.24296754250386396
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.19886656362699637
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.5137815558990211
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.643353941267388
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7695775373518804
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6665384668011486
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6033776158582955
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6473311395712609
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.5239567233384853
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.5703245749613601
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.6754250386398764
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.768160741885626
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5239567233384853
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4951056156620299
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.3888717156105101
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.23910355486862445
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.18830499742400822
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4858320453374549
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.6172076249356002
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.750772797527048
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6435527388538038
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.5769025539118272
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6222193004139938
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.46213292117465227
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.5208655332302936
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.6089644513137558
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6862442040185471
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.46213292117465227
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4456465739309634
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.3536321483771252
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.21298299845440496
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1656362699639361
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4363730036063884
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5607934054611026
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6692426584234931
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5742333897429361
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.5144243271754859
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5623047162890543
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.3276661514683153
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.38639876352395675
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.47913446676970634
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.5641421947449768
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.3276661514683153
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3219989696032972
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.2676970633693972
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.16924265842349304
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1172076249356002
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.321483771251932
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.43379701184956204
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5401854714064915
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4411753101398826
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.38149088589583163
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.43191750141987145
            name: Cosine Map@100

ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model finetuned from nomic-ai/nomic-embed-text-v2-moe on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/nomic-embed-text-v2-moe
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
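
Concretely, the Pooling and Normalize modules above amount to masked mean pooling over the token embeddings followed by L2 normalization. The following is a minimal PyTorch sketch of that computation, for illustration only; model.encode already performs it internally.

import torch
import torch.nn.functional as F

def mean_pool_and_normalize(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768) from the Transformer module
    # attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)      # masked sum over tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)           # number of real tokens per sentence
    mean_pooled = summed / counts                      # pooling_mode_mean_tokens
    return F.normalize(mean_pooled, p=2, dim=1)        # Normalize(): unit-length embeddings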

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2")
# Run inference
sentences = [
    'against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1  NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to',
    'How many federal agencies is the action against?',
    'Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
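
Because the model was trained with MatryoshkaLoss, embeddings can also be truncated to the smaller dimensions evaluated below (512, 256, 128, or 64) with a modest drop in retrieval quality. A minimal sketch using the truncate_dim argument of SentenceTransformer:

from sentence_transformers import SentenceTransformer

# Load the same model but keep only the first 256 embedding dimensions
model_256 = SentenceTransformer(
    "tsss1/modernbert-embed-base-legal-matryoshka-2",
    truncate_dim=256,
)
embeddings = model_256.encode(["How many federal agencies is the action against?"])
print(embeddings.shape)
# (1, 256)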

Evaluation

Metrics

Information Retrieval

Metric dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.5533 0.5502 0.524 0.4621 0.3277
cosine_accuracy@3 0.6105 0.5997 0.5703 0.5209 0.3864
cosine_accuracy@5 0.7125 0.7002 0.6754 0.609 0.4791
cosine_accuracy@10 0.8083 0.7898 0.7682 0.6862 0.5641
cosine_precision@1 0.5533 0.5502 0.524 0.4621 0.3277
cosine_precision@3 0.5276 0.5219 0.4951 0.4456 0.322
cosine_precision@5 0.4127 0.4046 0.3889 0.3536 0.2677
cosine_precision@10 0.2502 0.243 0.2391 0.213 0.1692
cosine_recall@1 0.1985 0.1989 0.1883 0.1656 0.1172
cosine_recall@3 0.5175 0.5138 0.4858 0.4364 0.3215
cosine_recall@5 0.6555 0.6434 0.6172 0.5608 0.4338
cosine_recall@10 0.7895 0.7696 0.7508 0.6692 0.5402
cosine_ndcg@10 0.6787 0.6665 0.6436 0.5742 0.4412
cosine_mrr@10 0.6103 0.6034 0.5769 0.5144 0.3815
cosine_map@100 0.6544 0.6473 0.6222 0.5623 0.4319
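
These figures are consistent with running one InformationRetrievalEvaluator per Matryoshka dimension on a held-out split. Below is a hedged sketch of such a setup; the queries, corpus, and relevant_docs dictionaries are placeholders you would build from your own evaluation data.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator, SequentialEvaluator

model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2")

# Placeholder evaluation data: query id -> question, passage id -> passage,
# and query id -> set of relevant passage ids (build these from the eval split).
queries = {"q1": "How many federal agencies is the action against?"}
corpus = {"d1": "against six federal agencies pursuant to the Freedom of Information Act ..."}
relevant_docs = {"q1": {"d1"}}

evaluators = [
    InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,   # score with embeddings truncated to this dimension
    )
    for dim in (768, 512, 256, 128, 64)
]
results = SequentialEvaluator(evaluators)(model)
print(results)  # keys such as dim_768_cosine_ndcg@10, dim_64_cosine_map@100, ...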

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,822 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    positive (string): min 29 tokens, mean 94.33 tokens, max 156 tokens
    anchor (string): min 8 tokens, mean 18.25 tokens, max 35 tokens
  • Samples:
    positive: aspect” of “substantial independent authority.” Dong v. Smithsonian Inst., 125 F.3d 877, 881
      4 See CREW v. Office of Admin., 566 F.3d 219, 220 (D.C. Cir. 2009); Armstrong v. Exec. Office of the President, 90 F.3d 553, 558 (D.C. Cir. 1996); Sweetland v. Walters, 60 F.3d 852, 854
    anchor: What court circuit is mentioned in connection with the case Sweetland v. Walters?
    positive: the entire list of remaining PQPs shifts up one position. Once GSA has verified, through the evaluation and validation process, the point totals claimed by the 100/80/70 highest-scoring offerors, GSA will cease evaluations and award IDIQ contracts to the successful, verified bidders. AR at 1114, 2154, 2645. If, after the evaluation
    anchor: What is the GSA responsible for verifying?
    positive: Department components], to assist with the processing of [FOIA or Privacy Act] requests for purposes of administrative expediency and efficiency.” Third Walter Decl. ¶ 3. Indeed, the State Department’s declarant explains that these five State Department components, including DS, “conduct their own FOIA/Privacy Act reviews and respond directly to requesters,” despite
    anchor: What is the identified purpose for assisting with processing FOIA or Privacy Act requests?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
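
In code, this corresponds to wrapping MultipleNegativesRankingLoss in MatryoshkaLoss so that every batch is scored at all five dimensions. A minimal sketch follows; trust_remote_code is assumed to be required for the Nomic base model.

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

# In-batch negatives ranking loss over (anchor, positive) pairs ...
inner_loss = MultipleNegativesRankingLoss(model)

# ... wrapped so the same loss is also computed on truncated embeddings
loss = MatryoshkaLoss(
    model=model,
    loss=inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)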
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
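
Expressed as code, the non-default values above map onto SentenceTransformerTrainingArguments roughly as follows. The output_dir is a placeholder, and save_strategy="epoch" is an assumption added so that load_best_model_at_end can match the epoch-level evaluation strategy.

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-base-legal-matryoshka-2",  # placeholder output path
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # the "no_duplicates" batch sampler
)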

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0549 10 2.6704 - - - - -
0.1099 20 1.7246 - - - - -
0.1648 30 1.3634 - - - - -
0.2198 40 1.0962 - - - - -
0.2747 50 0.8985 - - - - -
0.3297 60 0.8667 - - - - -
0.3846 70 0.7371 - - - - -
0.4396 80 1.038 - - - - -
0.4945 90 0.733 - - - - -
0.5495 100 0.9032 - - - - -
0.6044 110 0.7283 - - - - -
0.6593 120 0.6085 - - - - -
0.7143 130 0.5774 - - - - -
0.7692 140 0.6164 - - - - -
0.8242 150 0.8098 - - - - -
0.8791 160 0.6534 - - - - -
0.9341 170 0.6035 - - - - -
0.9890 180 0.5209 - - - - -
1.0 182 - 0.6911 0.6719 0.6341 0.5600 0.4203
1.0440 190 0.3718 - - - - -
1.0989 200 0.2309 - - - - -
1.1538 210 0.2128 - - - - -
1.2088 220 0.138 - - - - -
1.2637 230 0.1129 - - - - -
1.3187 240 0.0889 - - - - -
1.3736 250 0.0607 - - - - -
1.4286 260 0.1156 - - - - -
1.4835 270 0.0826 - - - - -
1.5385 280 0.098 - - - - -
1.5934 290 0.0891 - - - - -
1.6484 300 0.0451 - - - - -
1.7033 310 0.0581 - - - - -
1.7582 320 0.0722 - - - - -
1.8132 330 0.0785 - - - - -
1.8681 340 0.1407 - - - - -
1.9231 350 0.1022 - - - - -
1.9780 360 0.0771 - - - - -
2.0 364 - 0.6787 0.6665 0.6436 0.5742 0.4412
  • The saved checkpoint corresponds to the epoch 2.0 row (step 364); its values match the evaluation metrics reported above.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.3.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}