---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5822
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/nomic-embed-text-v2-moe
widget:
- source_sentence: >-
    the Polaris Solicitations as currently drafted do not comply with Section
    3306(c)(3). In its request
    to apply Section 3306(c)(3) to the Polaris Solicitations, GSA stated that
    Supplement 2, AR at 2907–08. Because GSA adopted an overly broad
    understanding of Section
    3306(c)(3)’s scope, GSA stated the Solicitations will include a “full
    range of order types,”
  sentences:
  - What did Al-Hamim confirm about the citations?
  - What understanding did GSA adopt regarding Section 3306(c)(3)'s scope?
  - What was the reason for denying the agency's motion without prejudice?
- source_sentence: >-
    objective (as position, profit, or a prize); [or to] be in a state of
    rivalry.” Compete, Merriam-
    Webster’s Collegiate Dictionary (11th ed. 2003); see Competing,
    Merriam-Webster Dictionary,
    https://www.merriam-webster.com/dictionary/competing (last visited Mar. 7,
    2023) (defining
    “competing” as being “in a state of rivalry or competition (as for
    position, profit, or a prize)”).
  sentences:
  - >-
    Who claims that Congress has done much of the work to reconcile FACA §
    10(b) and the FOIA exemptions?
  - When was the online dictionary last visited according to the document?
  - What action will the Court take regarding Count Nine in No. 11-444?
- source_sentence: >-
    a witness for the State, Mr. Zimmerman testified that he was shot in the
    back while sitting
    in the driver’s seat of his vehicle. Over objection, during Mr.
    Zimmerman’s direct
    examination, the circuit court admitted into evidence a video, retrieved
    by a detective, that
    had been recorded by a camera mounted on the exterior wall of a residence
    near the site of
  sentences:
  - What must a complaint do to defeat a Rule 12(b)(6) motion?
  - What was the position of Mr. Zimmerman when he was shot?
  - >-
    What does Rule 11 impose on any party who signs a pleading, motion, or
    other paper?
- source_sentence: >-
    than if they had submitted a new request on the same subject,” Fifth Lutz
    Decl. ¶ 9, implicitly
    confirms that the Assignment of Rights Policy tends to prejudice
    requesters. To the extent an
    “assignee would be placed in a better position to litigate the assigned
    request than if they had
    submitted a new request on the same subject,” id., then a FOIA requester
    “submit[ing] a new
  sentences:
  - >-
    What does the Fifth Lutz Declaration paragraph 9 imply about the
    Assignment of Rights Policy?
  - When did Illinois Supreme Court Rule 663 become effective?
  - >-
    What would happen if the Solicitations were amended to comply with the
    regulations according to the plaintiffs?
- source_sentence: >-
    against six federal agencies pursuant to the Freedom of Information Act
    (“FOIA”), 5 U.S.C.
    § 552, claiming that the defendant agencies have violated the FOIA in
    numerous ways.1 NSC’s
    claims run the gamut, including challenges to: the withholding of specific
    information; the
    adequacy of the agencies’ search efforts; the refusal to process FOIA
    requests; the refusal to
  sentences:
  - >-
    Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior
    regarding the retroactivity of statutes?
  - How many federal agencies is the action against?
  - Who questioned Mr. Zimmerman after the bench conference?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT Embed base Legal Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.5533230293663061
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6105100463678517
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7125193199381762
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8083462132921174
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5533230293663061
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5275631117980423
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4126738794435858
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.2502318392581144
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1984801648634724
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5175167439464194
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6554611025244719
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7895414734672848
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6787324741180409
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.610266553813694
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6544139401960045
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.5502318392581144
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5996908809891809
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7001545595054096
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.7897990726429676
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5502318392581144
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.5218959299330241
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4046367851622875
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.24296754250386396
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.19886656362699637
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5137815558990211
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.643353941267388
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.7695775373518804
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6665384668011486
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6033776158582955
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6473311395712609
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.5239567233384853
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5703245749613601
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6754250386398764
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.768160741885626
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.5239567233384853
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4951056156620299
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3888717156105101
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.23910355486862445
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.18830499742400822
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4858320453374549
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.6172076249356002
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.750772797527048
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.6435527388538038
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5769025539118272
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.6222193004139938
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.46213292117465227
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.5208655332302936
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.6089644513137558
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6862442040185471
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.46213292117465227
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.4456465739309634
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.3536321483771252
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.21298299845440496
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1656362699639361
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.4363730036063884
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.5607934054611026
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.6692426584234931
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.5742333897429361
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.5144243271754859
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.5623047162890543
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.3276661514683153
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.38639876352395675
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.47913446676970634
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.5641421947449768
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.3276661514683153
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3219989696032972
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.2676970633693972
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.16924265842349304
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1172076249356002
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.321483771251932
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.43379701184956204
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.5401854714064915
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.4411753101398826
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.38149088589583163
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.43191750141987145
      name: Cosine Map@100
---
ModernBERT Embed base Legal Matryoshka
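This is a sentence-transformers model fine-tuned from nomic-ai/nomic-embed-text-v2-moe on the json dataset, which consists of question-passage pairs drawn from legal documents. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, its embeddings can also be truncated to 512, 256, 128, or 64 dimensions, at some cost in retrieval quality (see Evaluation).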
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/nomic-embed-text-v2-moe
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
  - json
- Language: en
- License: apache-2.0
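The 768-dimensional output can be shortened because the model was trained with MatryoshkaLoss over 768, 512, 256, 128 and 64 dimensions. To shorten it, keep the first d components and re-normalize. The NumPy sketch below shows only that post-processing step. It uses random unit vectors as stand-ins for `model.encode` output, which is an assumption for illustration, and `truncate_embeddings` is a hypothetical helper. Sentence Transformers can also truncate at load time through the `truncate_dim` argument of `SentenceTransformer`.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


# Random stand-ins for three 768-dimensional, unit-normalized sentence embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

for dim in (768, 512, 256, 128, 64):
    small = truncate_embeddings(full, dim)
    # After re-normalization, a plain dot product equals cosine similarity.
    sims = small @ small.T
    print(dim, small.shape, np.round(np.diag(sims), 6))
```

Cosine similarity ignores vector length, so re-normalizing is optional for cosine scoring. Re-normalizing does keep dot-product search equivalent to cosine search on the truncated vectors.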
Model Sources
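- Documentation: Sentence Transformers Documentation (https://sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)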
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
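The Pooling and Normalize modules average the token embeddings, counting only non-padding positions (mean pooling), and then scale each sentence vector to unit length. The following NumPy sketch shows those two steps. The token embeddings and attention mask are random placeholders rather than real NomicBertModel outputs, and the helper names are hypothetical.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (the role of the Normalize module)."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


# Toy batch: 2 sequences, 4 token positions, hidden size 768.
# The second sequence has only 2 real tokens followed by padding.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])
sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)  # (2, 768)
```

Because every output row has unit length, cosine similarity and dot product give the same scores for this model's embeddings.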
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
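# Download from the 🤗 Hub. The underlying NomicBertModel is custom modeling code,
# so trust_remote_code=True is needed to load it.
model = SentenceTransformer("tsss1/modernbert-embed-base-legal-matryoshka-2", trust_remote_code=True)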
# Run inference
sentences = [
'against six federal agencies pursuant to the Freedom of Information Act (“FOIA”), 5 U.S.C. \n§ 552, claiming that the defendant agencies have violated the FOIA in numerous ways.1 NSC’s \nclaims run the gamut, including challenges to: the withholding of specific information; the \nadequacy of the agencies’ search efforts; the refusal to process FOIA requests; the refusal to',
'How many federal agencies is the action against?',
'Which case was quoted in Entertainment Ltd. v. U.S. Dep’t of Interior regarding the retroactivity of statutes?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
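For semantic search, query and passage embeddings can be compared directly. This sketch assumes unit-normalized embeddings, which the Normalize module produces, so a dot product equals the cosine score. The random corpus stands in for `model.encode` output, and `top_k_passages` is a hypothetical helper. For real embeddings, the library's own `sentence_transformers.util.semantic_search` covers the same task.

```python
import numpy as np


def top_k_passages(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3):
    """Return (index, score) pairs for the k corpus rows most similar to the query.

    Assumes every row is already L2-normalized, so the dot product is cosine similarity.
    """
    scores = corpus_embs @ query_emb
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Stand-in corpus of 100 unit-normalized "passage" embeddings.
rng = np.random.default_rng(42)
corpus = rng.normal(size=(100, 768))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A query built as a slightly perturbed copy of passage 17, so passage 17 should rank first.
query = corpus[17] + 0.01 * rng.normal(size=768)
query /= np.linalg.norm(query)

for idx, score in top_k_passages(query, corpus, k=3):
    print(idx, round(score, 3))
```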
Evaluation
Metrics
Information Retrieval
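- Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64 (the same retrieval evaluation, run with embeddings truncated to 768, 512, 256, 128 and 64 dimensions)
- Evaluated with InformationRetrievalEvaluator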
Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
---|---|---|---|---|---|
cosine_accuracy@1 | 0.5533 | 0.5502 | 0.524 | 0.4621 | 0.3277 |
cosine_accuracy@3 | 0.6105 | 0.5997 | 0.5703 | 0.5209 | 0.3864 |
cosine_accuracy@5 | 0.7125 | 0.7002 | 0.6754 | 0.609 | 0.4791 |
cosine_accuracy@10 | 0.8083 | 0.7898 | 0.7682 | 0.6862 | 0.5641 |
cosine_precision@1 | 0.5533 | 0.5502 | 0.524 | 0.4621 | 0.3277 |
cosine_precision@3 | 0.5276 | 0.5219 | 0.4951 | 0.4456 | 0.322 |
cosine_precision@5 | 0.4127 | 0.4046 | 0.3889 | 0.3536 | 0.2677 |
cosine_precision@10 | 0.2502 | 0.243 | 0.2391 | 0.213 | 0.1692 |
cosine_recall@1 | 0.1985 | 0.1989 | 0.1883 | 0.1656 | 0.1172 |
cosine_recall@3 | 0.5175 | 0.5138 | 0.4858 | 0.4364 | 0.3215 |
cosine_recall@5 | 0.6555 | 0.6434 | 0.6172 | 0.5608 | 0.4338 |
cosine_recall@10 | 0.7895 | 0.7696 | 0.7508 | 0.6692 | 0.5402 |
cosine_ndcg@10 | 0.6787 | 0.6665 | 0.6436 | 0.5742 | 0.4412 |
cosine_mrr@10 | 0.6103 | 0.6034 | 0.5769 | 0.5144 | 0.3815 |
cosine_map@100 | 0.6544 | 0.6473 | 0.6222 | 0.5623 | 0.4319 |
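The function below computes binary-relevance versions of the metrics in the table for a single query. The reported numbers are averages over all evaluation queries. This is an illustrative re-implementation of the standard definitions, not the InformationRetrievalEvaluator source. Details such as normalizing average precision by min(k, number of relevant documents) are assumptions, and the toy ranking is made up.

```python
import math


def ir_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance retrieval metrics at cutoff k for a single query."""
    hits = [doc_id in relevant_ids for doc_id in ranked_ids[:k]]
    n_hits = sum(hits)
    # Reciprocal rank of the first relevant document within the top k (0 if none).
    mrr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)
    # DCG with gain 1 per relevant hit; the ideal ranking puts all relevant docs first.
    dcg = sum(1.0 / math.log2(rank + 2) for rank, hit in enumerate(hits) if hit)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    # Average precision: precision at each relevant hit, normalized by min(k, |relevant|).
    ap, found = 0.0, 0
    for rank, hit in enumerate(hits):
        if hit:
            found += 1
            ap += found / (rank + 1)
    ap /= min(k, len(relevant_ids))
    return {
        "accuracy": 1.0 if n_hits else 0.0,
        "precision": n_hits / k,
        "recall": n_hits / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg,
        "ap": ap,
    }


# Toy query with two relevant passages; the retriever ranks d1 second and d2 fourth.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}
for k in (1, 3, 5):
    print(k, ir_metrics(ranking, relevant, k))
```

A single hit can recover at most 1/|relevant| of a query's relevant passages. When queries have several relevant passages, recall@1 therefore sits well below precision@1, as in the table (0.1985 vs. 0.5533 at 768 dimensions).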
Training Details
Training Dataset
json
- Dataset: json
- Size: 5,822 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 1000 samples:
Statistic | positive | anchor |
---|---|---|
type | string | string |
min tokens | 29 | 8 |
mean tokens | 94.33 | 18.25 |
max tokens | 156 | 35 |
- Samples:
positive | anchor |
---|---|
aspect” of “substantial independent authority.” Dong v. Smithsonian Inst., 125 F.3d 877, 881 4 See CREW v. Office of Admin., 566 F.3d 219, 220 (D.C. Cir. 2009); Armstrong v. Exec. Office of the President, 90 F.3d 553, 558 (D.C. Cir. 1996); Sweetland v. Walters, 60 F.3d 852, 854 | What court circuit is mentioned in connection with the case Sweetland v. Walters? |
the entire list of remaining PQPs shifts up one position. Once GSA has verified, through the evaluation and validation process, the point totals claimed by the 100/80/70 highest-scoring offerors, GSA will cease evaluations and award IDIQ contracts to the successful, verified bidders. AR at 1114, 2154, 2645. If, after the evaluation | What is the GSA responsible for verifying? |
Department components], to assist with the processing of [FOIA or Privacy Act] requests for purposes of administrative expediency and efficiency.” Third Walter Decl. ¶ 3. Indeed, the State Department’s declarant explains that these five State Department components, including DS, “conduct their own FOIA/Privacy Act reviews and respond directly to requesters,” despite | What is the identified purpose for assisting with processing FOIA or Privacy Act requests? |
- Loss: MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
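The loss configuration above works as follows. MultipleNegativesRankingLoss is computed on the full 768-dimensional embeddings and again on each truncated prefix (512, 256, 128, 64), and the results are added using the listed weights, which are all 1. With n_dims_per_step set to -1, every dimension is used at every step. The NumPy sketch below reproduces that arithmetic for one batch of random vectors. It assumes the library's default MultipleNegativesRankingLoss settings (cosine similarity with a scale of 20) and is a forward-pass illustration only, not the Sentence Transformers implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's correct column is i.

    Every other positive in the batch acts as an in-batch negative for anchor i.
    """
    logits = scale * l2_normalize(anchors) @ l2_normalize(positives).T  # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64),
                    weights=(1, 1, 1, 1, 1)) -> float:
    """Weighted sum of the inner loss applied to each truncated embedding prefix."""
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 768))                   # stand-in batch of 4 anchor embeddings
matched = anchors + 0.1 * rng.normal(size=(4, 768))  # positives close to their own anchors
unrelated = rng.normal(size=(4, 768))                 # positives unrelated to their anchors
loss_matched = matryoshka_loss(anchors, matched)
loss_unrelated = matryoshka_loss(anchors, unrelated)
print(loss_matched, loss_unrelated)
```

Well-aligned pairs drive the loss toward zero at every truncation length. Unrelated pairs leave each term near log(batch size).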
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- num_train_epochs: 2
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
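With lr_scheduler_type cosine and warmup_ratio 0.1, the learning rate rises linearly from 0 to 2e-05 over the first 10% of optimizer steps, then follows a half cosine down to 0. The sketch below mirrors that shape. It assumes 364 total optimizer steps, the final step in the training log, and rounds the warmup length up, as the Hugging Face Trainer does when converting warmup_ratio to steps. Each optimizer step accumulates 4 batches of 4, i.e. 16 examples per device.

```python
import math


def cosine_with_warmup(step: int, total_steps: int = 364, base_lr: float = 2e-5,
                       warmup_ratio: float = 0.1) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 37 when total_steps is 364
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 10, 37, 182, 364):
    print(step, f"{cosine_with_warmup(step):.3e}")
```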
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
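The batch_sampler setting no_duplicates matters because MultipleNegativesRankingLoss treats every other text in a batch as a negative. If the same passage or question appeared twice in one batch, one copy would be scored as a negative for the other. The sketch below is a simplified greedy batcher that illustrates the idea by deferring any pair that clashes with the current batch. `no_duplicate_batches` is a hypothetical helper, not the library's NoDuplicatesBatchSampler, and the example pairs are made up.

```python
def no_duplicate_batches(pairs, batch_size):
    """Group (anchor, positive) pairs so that no text occurs twice within one batch.

    Pairs that clash with the current batch are deferred and retried in later batches.
    """
    remaining = list(range(len(pairs)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)
        batches.append(batch)  # the first remaining pair always fits, so this loop terminates
        remaining = deferred
    return batches


# Pairs 0 and 1 share passage "p1"; pairs 2 and 4 share passage "p2".
pairs = [("q1", "p1"), ("q2", "p1"), ("q3", "p2"), ("q4", "p3"), ("q5", "p2")]
print(no_duplicate_batches(pairs, batch_size=2))  # [[0, 2], [1, 3], [4]]
```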
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0.0549 | 10 | 2.6704 | - | - | - | - | - |
0.1099 | 20 | 1.7246 | - | - | - | - | - |
0.1648 | 30 | 1.3634 | - | - | - | - | - |
0.2198 | 40 | 1.0962 | - | - | - | - | - |
0.2747 | 50 | 0.8985 | - | - | - | - | - |
0.3297 | 60 | 0.8667 | - | - | - | - | - |
0.3846 | 70 | 0.7371 | - | - | - | - | - |
0.4396 | 80 | 1.038 | - | - | - | - | - |
0.4945 | 90 | 0.733 | - | - | - | - | - |
0.5495 | 100 | 0.9032 | - | - | - | - | - |
0.6044 | 110 | 0.7283 | - | - | - | - | - |
0.6593 | 120 | 0.6085 | - | - | - | - | - |
0.7143 | 130 | 0.5774 | - | - | - | - | - |
0.7692 | 140 | 0.6164 | - | - | - | - | - |
0.8242 | 150 | 0.8098 | - | - | - | - | - |
0.8791 | 160 | 0.6534 | - | - | - | - | - |
0.9341 | 170 | 0.6035 | - | - | - | - | - |
0.9890 | 180 | 0.5209 | - | - | - | - | - |
1.0 | 182 | - | 0.6911 | 0.6719 | 0.6341 | 0.5600 | 0.4203 |
1.0440 | 190 | 0.3718 | - | - | - | - | - |
1.0989 | 200 | 0.2309 | - | - | - | - | - |
1.1538 | 210 | 0.2128 | - | - | - | - | - |
1.2088 | 220 | 0.138 | - | - | - | - | - |
1.2637 | 230 | 0.1129 | - | - | - | - | - |
1.3187 | 240 | 0.0889 | - | - | - | - | - |
1.3736 | 250 | 0.0607 | - | - | - | - | - |
1.4286 | 260 | 0.1156 | - | - | - | - | - |
1.4835 | 270 | 0.0826 | - | - | - | - | - |
1.5385 | 280 | 0.098 | - | - | - | - | - |
1.5934 | 290 | 0.0891 | - | - | - | - | - |
1.6484 | 300 | 0.0451 | - | - | - | - | - |
1.7033 | 310 | 0.0581 | - | - | - | - | - |
1.7582 | 320 | 0.0722 | - | - | - | - | - |
1.8132 | 330 | 0.0785 | - | - | - | - | - |
1.8681 | 340 | 0.1407 | - | - | - | - | - |
1.9231 | 350 | 0.1022 | - | - | - | - | - |
1.9780 | 360 | 0.0771 | - | - | - | - | - |
**2.0** | **364** | **-** | **0.6787** | **0.6665** | **0.6436** | **0.5742** | **0.4412** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.0
- PyTorch: 2.3.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}