metadata

language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:5822
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: nomic-ai/nomic-embed-text-v2-moe
widget:
  - source_sentence: >-
      the Polaris Solicitations as currently drafted do not comply with Section
      3306(c)(3).  In its request 

      to apply Section 3306(c)(3) to the Polaris Solicitations, GSA stated that 
       
       
        
      Supplement 2, AR at 2907–08.  Because GSA adopted an overly broad
      understanding of Section 

      3306(c)(3)’s scope, GSA stated the Solicitations will include a “full
      range of order types,”
    sentences:
      - What did Al-Hamim confirm about the citations?
      - What understanding did GSA adopt regarding Section 3306(c)(3)'s scope?
      - What was the reason for denying the agency's motion without prejudice?
  - source_sentence: >-
      objective (as position, profit, or a prize); [or to] be in a state of
      rivalry.”  Compete, Merriam-

      Webster’s Collegiate Dictionary (11th ed. 2003); see Competing,
      Merriam-Webster Dictionary, 

      https://www.merriam-webster.com/dictionary/competing (last visited Mar. 7,
      2023) (defining 

      “competing” as being “in a state of rivalry or competition (as for
      position, profit, or a prize)”).
    sentences:
      - >-
        Who claims that Congress has done much of the work to reconcile FACA §
        10(b) and the FOIA exemptions?
      - When was the online dictionary last visited according to the document?
      - What action will the Court take regarding Count Nine in No. 11-444?
  - source_sentence: >-
      a witness for the State, Mr. Zimmerman testified that he was shot in the
      back while sitting 

      in the driver’s seat of his vehicle.  Over objection, during Mr.
      Zimmerman’s direct 

      examination, the circuit court admitted into evidence a video, retrieved
      by a detective, that 

      had been recorded by a camera mounted on the exterior wall of a residence
      near the site of
    sentences:
      - What must a complaint do to defeat a Rule 12(b)(6) motion?
      - What was the position of Mr. Zimmerman when he was shot?
      - >-
        What does Rule 11 impose on any party who signs a pleading, motion, or
        other paper?
  - source_sentence: >-
      than if they had submitted a new request on the same subject,” Fifth Lutz
      Decl. ¶ 9, implicitly 

      confirms that the Assignment of Rights Policy tends to prejudice
      requesters.  To the extent an 

      “assignee would be placed in a better position to litigate the assigned
      request than if they had 

      submitted a new request on the same subject,” id., then a FOIA requester
      “submit[ing] a new
    sentences:
      - >-
        What does the Fifth Lutz Declaration paragraph 9 imply about the
        Assignment of Rights Policy?
      - When did Illinois Supreme Court Rule 663 become effective?
      - >-
        What would happen if the Solicitations were amended to comply with the
        regulations according to the plaintiffs?
  - source_sentence: >-
      against six federal agencies pursuant to the Freedom of Information Act
      (“FOIA”), 5 U.S.C. 

      § 552, claiming that the defendant agencies have violated the FOIA in
      numerous ways.1  NSC’s 

      claims run the gamut, including challenges to: the withholding of specific
      information; the 

      adequacy of the agencies’ search efforts; the refusal to process FOIA
      requests; the refusal to
    sentences:
      - >-
        Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior
        regarding the retroactivity of statutes?
      - How many federal agencies is the action against?
      - Who questioned Mr. Zimmerman after the bench conference?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: ModernBERT Embed base Legal Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.5533230293663061
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.6105100463678517
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7125193199381762
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8083462132921174
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5533230293663061
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5275631117980423
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4126738794435858
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.2502318392581144
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1984801648634724
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.5175167439464194
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.6554611025244719
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7895414734672848
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6787324741180409
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.610266553813694
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6544139401960045
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.5502318392581144
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.5996908809891809
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7001545595054096
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7897990726429676
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5502318392581144
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5218959299330241
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4046367851622875
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.24296754250386396
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.19886656362699637
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.5137815558990211
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.643353941267388
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7695775373518804
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6665384668011486
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6033776158582955
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6473311395712609
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.5239567233384853
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.5703245749613601
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.6754250386398764
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.768160741885626
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5239567233384853
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4951056156620299
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.3888717156105101
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.23910355486862445
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.18830499742400822
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4858320453374549
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.6172076249356002
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.750772797527048
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6435527388538038
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.5769025539118272
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6222193004139938
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.46213292117465227
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.5208655332302936
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.6089644513137558
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6862442040185471
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.46213292117465227
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.4456465739309634
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.3536321483771252
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.21298299845440496
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1656362699639361
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4363730036063884
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5607934054611026
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6692426584234931
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5742333897429361
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.5144243271754859
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5623047162890543
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.3276661514683153
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.38639876352395675
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.47913446676970634
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.5641421947449768
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.3276661514683153
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3219989696032972
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.2676970633693972
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.16924265842349304
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.1172076249356002
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.321483771251932
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.43379701184956204
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5401854714064915
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4411753101398826
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.38149088589583163
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.43191750141987145
            name: Cosine Map@100

ModernBERT Embed base Legal Matryoshka

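This is a sentence-transformers model fine-tuned from nomic-ai/nomic-embed-text-v2-moe on a JSON dataset of 5,822 legal passage/question pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions, at some cost in retrieval quality.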

Model Details

Model Description

Model Type: Sentence Transformer
Base model: nomic-ai/nomic-embed-text-v2-moe
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- json
Language: en
License: apache-2.0

Model Sources

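Documentation: Sentence Transformers Documentation (https://sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)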

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
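The Pooling module above uses mean pooling (`pooling_mode_mean_tokens: True`), and `Normalize()` then rescales each vector to unit L2 norm. A minimal pure-Python sketch of those two steps, assuming token embeddings and an attention mask are already available; `mean_pool` and `l2_normalize` are hypothetical helper names, not the library's API:

```python
import math
from typing import List

def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask value is 1 (padding tokens are ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

# Three tokens with 4-dim embeddings; the last token is padding and is skipped.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
print(pooled, embedding)
```

Because the final embeddings have unit length, the cosine similarity between two of them reduces to a plain dot product.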

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

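# Download from the 🤗 Hub; the NomicBert architecture uses custom modeling code, so trust_remote_code=True is needed
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", trust_remote_code=True)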
# Run inference
sentences = [
    'against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1  NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to',
    'How many federal agencies is the action against?',
    'Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
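For semantic search, the listed similarity function (cosine) is used to rank a corpus against a query. The sketch below uses small made-up vectors in place of real `model.encode` output so it runs without downloading the model; `top_k` is a hypothetical helper, not part of the library:

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; equals the dot product when both inputs are unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: List[float], corpus: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the k most similar documents."""
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]

# Toy stand-ins for model.encode(...) output (real embeddings have 768 dimensions).
query_vec = [0.6, 0.8, 0.0]
corpus_vecs = [
    [0.0, 0.0, 1.0],  # unrelated passage
    [0.6, 0.8, 0.0],  # same direction as the query
    [0.8, 0.6, 0.0],  # close, but not identical
]
print(top_k(query_vec, corpus_vecs, k=2))
```

With the real model, `query_vec` would come from encoding a question and `corpus_vecs` from encoding passages; for large corpora, `sentence_transformers.util.semantic_search` performs the same top-k ranking in chunks.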

Evaluation

Metrics

Information Retrieval

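Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64 (the same evaluation queries and corpus, with embeddings truncated to 768, 512, 256, 128, and 64 dimensions respectively)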
Evaluated with InformationRetrievalEvaluator
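InformationRetrievalEvaluator scores each query and averages over queries. A simplified per-query sketch of the rank-based metrics reported here, assuming binary relevance, a ranked list of retrieved document IDs, and a non-empty set of relevant IDs; it follows the standard definitions rather than reproducing the evaluator's code, and `ir_metrics` is a hypothetical name:

```python
import math
from typing import Dict, List, Set

def ir_metrics(ranked: List[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Binary-relevance retrieval metrics for a single query at cutoff k."""
    top = ranked[:k]
    hits = [1 if doc in relevant else 0 for doc in top]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document inside the cutoff (0 if none).
    mrr = next((1.0 / (rank + 1) for rank, h in enumerate(hits) if h), 0.0)

    # DCG discounts each hit by log2(rank + 2) with 0-based ranks; IDCG is the best achievable DCG.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))

    # Average precision: precision at each relevant position, normalized by min(k, |relevant|).
    running, precisions = 0, []
    for rank, h in enumerate(hits):
        if h:
            running += 1
            precisions.append(running / (rank + 1))
    ap = sum(precisions) / min(k, len(relevant))

    return {
        "accuracy": 1.0 if num_hits else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant),
        "mrr": mrr,
        "ndcg": dcg / idcg,
        "map": ap,
    }

# One query with two relevant passages; the system ranks one of them second.
print(ir_metrics(["d7", "d2", "d9"], relevant={"d2", "d4"}, k=3))
```

This also explains why accuracy@1 and precision@1 coincide in the results table: at a cutoff of 1, both equal 1 exactly when the top-ranked passage is relevant.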

Metric	dim_768	dim_512	dim_256	dim_128	dim_64
cosine_accuracy@1	0.5533	0.5502	0.524	0.4621	0.3277
cosine_accuracy@3	0.6105	0.5997	0.5703	0.5209	0.3864
cosine_accuracy@5	0.7125	0.7002	0.6754	0.609	0.4791
cosine_accuracy@10	0.8083	0.7898	0.7682	0.6862	0.5641
cosine_precision@1	0.5533	0.5502	0.524	0.4621	0.3277
cosine_precision@3	0.5276	0.5219	0.4951	0.4456	0.322
cosine_precision@5	0.4127	0.4046	0.3889	0.3536	0.2677
cosine_precision@10	0.2502	0.243	0.2391	0.213	0.1692
cosine_recall@1	0.1985	0.1989	0.1883	0.1656	0.1172
cosine_recall@3	0.5175	0.5138	0.4858	0.4364	0.3215
cosine_recall@5	0.6555	0.6434	0.6172	0.5608	0.4338
cosine_recall@10	0.7895	0.7696	0.7508	0.6692	0.5402
cosine_ndcg@10	0.6787	0.6665	0.6436	0.5742	0.4412
cosine_mrr@10	0.6103	0.6034	0.5769	0.5144	0.3815
cosine_map@100	0.6544	0.6473	0.6222	0.5623	0.4319
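The dim_* columns show how retrieval quality changes as embeddings are shortened. MatryoshkaLoss trains each prefix of the embedding (the first 512, 256, 128, or 64 values) to work on its own, so a shorter embedding is obtained by keeping the first k values. Sentence Transformers exposes this through the `truncate_dim` argument of `SentenceTransformer`; the pure-Python sketch below shows the underlying operation on toy vectors, re-normalizing so that dot products stay equal to cosine similarity (`truncate_and_normalize` is a hypothetical helper):

```python
import math
from typing import List

def truncate_and_normalize(embedding: List[float], dim: int) -> List[float]:
    """Keep the first `dim` values of a Matryoshka embedding and rescale to unit length."""
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dim embeddings standing in for 768-dim model output.
q = [0.5, 0.5, 0.1, 0.1, 0.3, 0.2, 0.4, 0.2]
d = [0.4, 0.6, 0.0, 0.2, 0.1, 0.5, 0.3, 0.1]
for k in (8, 4, 2):
    qk, dk = truncate_and_normalize(q, k), truncate_and_normalize(d, k)
    print(k, round(dot(qk, dk), 4))  # cosine similarity computed on the first k dims
```

With the real model, `SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", truncate_dim=256)` (adding `trust_remote_code=True` if loading requires the custom NomicBert code) would return 256-dimensional embeddings; per the table, that trades about 0.035 nDCG@10 (0.6787 to 0.6436) for a 3× smaller index.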

Training Details

Training Dataset

json

Dataset: json
Size: 5,822 training samples
Columns: positive and anchor
Approximate statistics based on the first 1000 samples:
	positive	anchor
type	string	string
details	min: 29 tokens mean: 94.33 tokens max: 156 tokens	min: 8 tokens mean: 18.25 tokens max: 35 tokens

Samples:

positive	anchor
`aspect” of “substantial independent authority.” Dong v. Smithsonian Inst., 125 F.3d 877, 881 4 See CREW v. Office of Admin., 566 F.3d 219, 220 (D.C. Cir. 2009); Armstrong v. Exec. Office of the President, 90 F.3d 553, 558 (D.C. Cir. 1996); Sweetland v. Walters, 60 F.3d 852, 854`	`What court circuit is mentioned in connection with the case Sweetland v. Walters?`
`the entire list of remaining PQPs shifts up one position. Once GSA has verified, through the evaluation and validation process, the point totals claimed by the 100/80/70 highest-scoring offerors, GSA will cease evaluations and award IDIQ contracts to the successful, verified bidders. AR at 1114, 2154, 2645. If, after the evaluation`	`What is the GSA responsible for verifying?`
`Department components], to assist with the processing of [FOIA or Privacy Act] requests for purposes of administrative expediency and efficiency.” Third Walter Decl. ¶ 3. Indeed, the State Department’s declarant explains that these five State Department components, including DS, “conduct their own FOIA/Privacy Act reviews and respond directly to requesters,” despite`	`What is the identified purpose for assisting with processing FOIA or Privacy Act requests?`

Loss: MatryoshkaLoss with these parameters:

{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        768,
        512,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
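MultipleNegativesRankingLoss treats each pair in a batch as a classification problem: the matching text is the correct class and every other pair's text in the batch serves as a negative. MatryoshkaLoss applies that loss to each truncated prefix listed in `matryoshka_dims` and sums the results with `matryoshka_weights` (all 1 here). A simplified pure-Python sketch, assuming the library defaults of cosine similarity and a scale of 20; it omits library details such as `n_dims_per_step` sampling and embedding caching, and the function names are hypothetical:

```python
import math
from typing import List, Sequence

def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """In-batch softmax cross-entropy: anchor i should score highest against positive i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # negative log-probability of the true pair
    return total / len(anchors)

def matryoshka_mnrl(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of MNRL computed on the first `d` dimensions for each d in `dims`."""
    return sum(
        w * mnrl([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two pairs with 4-dim embeddings (so the dims are scaled down to 4 and 2).
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.0, 0.1], [0.1, 0.9, 0.1, 0.0]]
loss = matryoshka_mnrl(anchors, positives, dims=(4, 2), weights=(1, 1))
print(loss)
```

Because every prefix is penalized separately, the smaller dimensions are pushed to rank the true pair first on their own, which is what makes truncation at inference time viable.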

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_train_batch_size: 4
per_device_eval_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 2e-05
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: True
tf32: False
load_best_model_at_end: True
optim: adamw_torch_fused
batch_sampler: no_duplicates
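With `lr_scheduler_type: cosine` and `warmup_ratio: 0.1`, the learning rate ramps up linearly over the first 10% of optimizer steps and then follows a half cosine down to zero. The sketch below reproduces that shape in pure Python, assuming the 364 total optimizer steps shown in the training logs and rounding the warmup length up; `lr_at` is an illustrative helper, not the transformers API:

```python
import math

def lr_at(step: int, base_lr: float = 2e-5, total_steps: int = 364, warmup_ratio: float = 0.1) -> float:
    """Linear warmup followed by half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 37 for 364 steps
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for step in (0, 18, 37, 200, 364):
    print(step, f"{lr_at(step):.2e}")
```

The step counts here are optimizer steps; with `gradient_accumulation_steps: 4`, each one aggregates four forward/backward passes per device.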

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 4
per_device_eval_batch_size: 2
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 4
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: False
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
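`batch_sampler: no_duplicates` matters for MultipleNegativesRankingLoss: if the same text appeared twice in a batch, one copy would be scored as a negative for the other's pair. A simplified sketch of the idea, assuming rows are (positive, anchor) string pairs: fill each batch greedily and defer any row that shares a text with a row already in the batch. The library's own sampler also shuffles and handles leftover rows in its own way; `no_duplicate_batches` is a hypothetical name:

```python
from typing import List, Tuple

def no_duplicate_batches(rows: List[Tuple[str, str]], batch_size: int) -> List[List[int]]:
    """Group row indices into batches in which no text occurs more than once."""
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry this row in a later batch
        batches.append(batch)
        remaining = deferred
    return batches

rows = [
    ("passage A", "question 1"),
    ("passage A", "question 2"),  # same passage as row 0, so it must go to another batch
    ("passage B", "question 3"),
    ("passage C", "question 4"),
]
print(no_duplicate_batches(rows, batch_size=2))
```

The first remaining row always fits into an empty batch, so every pass makes progress and the loop terminates.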

Training Logs

Epoch	Step	Training Loss	dim_768_cosine_ndcg@10	dim_512_cosine_ndcg@10	dim_256_cosine_ndcg@10	dim_128_cosine_ndcg@10	dim_64_cosine_ndcg@10
0.0549	10	2.6704	-	-	-	-	-
0.1099	20	1.7246	-	-	-	-	-
0.1648	30	1.3634	-	-	-	-	-
0.2198	40	1.0962	-	-	-	-	-
0.2747	50	0.8985	-	-	-	-	-
0.3297	60	0.8667	-	-	-	-	-
0.3846	70	0.7371	-	-	-	-	-
0.4396	80	1.038	-	-	-	-	-
0.4945	90	0.733	-	-	-	-	-
0.5495	100	0.9032	-	-	-	-	-
0.6044	110	0.7283	-	-	-	-	-
0.6593	120	0.6085	-	-	-	-	-
0.7143	130	0.5774	-	-	-	-	-
0.7692	140	0.6164	-	-	-	-	-
0.8242	150	0.8098	-	-	-	-	-
0.8791	160	0.6534	-	-	-	-	-
0.9341	170	0.6035	-	-	-	-	-
0.9890	180	0.5209	-	-	-	-	-
1.0	182	-	0.6911	0.6719	0.6341	0.5600	0.4203
1.0440	190	0.3718	-	-	-	-	-
1.0989	200	0.2309	-	-	-	-	-
1.1538	210	0.2128	-	-	-	-	-
1.2088	220	0.138	-	-	-	-	-
1.2637	230	0.1129	-	-	-	-	-
1.3187	240	0.0889	-	-	-	-	-
1.3736	250	0.0607	-	-	-	-	-
1.4286	260	0.1156	-	-	-	-	-
1.4835	270	0.0826	-	-	-	-	-
1.5385	280	0.098	-	-	-	-	-
1.5934	290	0.0891	-	-	-	-	-
1.6484	300	0.0451	-	-	-	-	-
1.7033	310	0.0581	-	-	-	-	-
1.7582	320	0.0722	-	-	-	-	-
1.8132	330	0.0785	-	-	-	-	-
1.8681	340	0.1407	-	-	-	-	-
1.9231	350	0.1022	-	-	-	-	-
1.9780	360	0.0771	-	-	-	-	-
2.0	364	-	0.6787	0.6665	0.6436	0.5742	0.4412

The final row (epoch 2.0, step 364) corresponds to the saved checkpoint; its nDCG@10 values match the evaluation results reported above.

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.3.1
Transformers: 4.47.0
PyTorch: 2.3.1+cu121
Accelerate: 1.2.1
Datasets: 3.3.1
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}