Fine-tuned with QuicKB
This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: nomic-ai/modernbert-embed-base
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Model Sources
- Documentation: [Sentence Transformers Documentation](https://www.sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
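In other words, the pipeline is: ModernBERT token embeddings, mean pooling over non-padding tokens, then L2 normalization. As a rough illustration only (the supported path is the `SentenceTransformer` usage below, and details such as the base model's query/document prefixes are ignored here), the same computation can be sketched with plain `transformers`:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/modernbert-embed-base")
model = AutoModel.from_pretrained("nomic-ai/modernbert-embed-base")

inputs = tokenizer(
    ["example sentence"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling over non-padding tokens (the Pooling module), then
# L2 normalization (the Normalize module).
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```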
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Vinsuka/legora_model")

# Run inference
sentences = [
    'What is described in Section 25 of the Arbitration Act?',
    '. (3) The provision of subsections (1) and (2) shall apply only to the extent agreed to by the parties. (4) The arbitral tribunal shall decide according to considerations of general justice and fairness or trade usages only if the parties have expressly authorised it to do so. Section 25 of the Arbitration Act describes the form and content of the arbitral award as follows: 25',
    '. 9 and 10 based on the objection taken to them by the Counsel for HNB, despite the fact that they did not arise from the pleadings, and were altogether inconsistent with them, answered the afore-stated question of law (in respect of which this Court had granted Leave to Appeal in that case) in the affirmative and in favour of HNB, and stated as follows: “In conclusion, it needs to be emphasised',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
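Because the model was trained with MatryoshkaLoss (see Training Details below), embeddings can also be truncated to 512, 256, 128, or 64 dimensions for cheaper storage and search, at the quality trade-offs shown in the evaluation table. A minimal sketch using the `truncate_dim` argument of `SentenceTransformer`:

```python
from sentence_transformers import SentenceTransformer

# Load the model so that it outputs 256-dimensional embeddings;
# any of the trained Matryoshka sizes (768/512/256/128/64) works.
model = SentenceTransformer("Vinsuka/legora_model", truncate_dim=256)

embeddings = model.encode([
    "What is described in Section 25 of the Arbitration Act?",
])
print(embeddings.shape)  # (1, 256)
```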
Evaluation
Metrics
Information Retrieval
- Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
- Evaluated with `InformationRetrievalEvaluator`
| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:---|:---|:---|:---|:---|:---|
| cosine_accuracy@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_accuracy@3 | 0.7616 | 0.7631 | 0.7282 | 0.6759 | 0.5581 |
| cosine_accuracy@5 | 0.8198 | 0.8212 | 0.7922 | 0.7355 | 0.6221 |
| cosine_accuracy@10 | 0.8852 | 0.875 | 0.8619 | 0.8241 | 0.7253 |
| cosine_precision@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_precision@3 | 0.2539 | 0.2544 | 0.2427 | 0.2253 | 0.186 |
| cosine_precision@5 | 0.164 | 0.1642 | 0.1584 | 0.1471 | 0.1244 |
| cosine_precision@10 | 0.0885 | 0.0875 | 0.0862 | 0.0824 | 0.0725 |
| cosine_recall@1 | 0.5741 | 0.5741 | 0.5552 | 0.4971 | 0.3968 |
| cosine_recall@3 | 0.7616 | 0.7631 | 0.7282 | 0.6759 | 0.5581 |
| cosine_recall@5 | 0.8198 | 0.8212 | 0.7922 | 0.7355 | 0.6221 |
| cosine_recall@10 | 0.8852 | 0.875 | 0.8619 | 0.8241 | 0.7253 |
| cosine_ndcg@10 | 0.7308 | 0.7262 | 0.7078 | 0.6568 | 0.5514 |
| cosine_mrr@10 | 0.6812 | 0.6782 | 0.6586 | 0.6038 | 0.497 |
| cosine_map@100 | 0.6852 | 0.6828 | 0.6631 | 0.609 | 0.505 |
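These numbers come from `InformationRetrievalEvaluator` at each embedding size. A minimal sketch of running the same kind of evaluation on your own held-out data (the query/corpus/relevance dictionaries below are hypothetical placeholders, not the card's evaluation set):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("Vinsuka/legora_model")

# Hypothetical data: id -> text mappings, plus query id -> relevant corpus ids.
queries = {"q1": "What is described in Section 25 of the Arbitration Act?"}
corpus = {"d1": "Section 25 of the Arbitration Act describes the form and content of the arbitral award."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
results = evaluator(model)
print(results)  # includes cosine_ndcg@10, cosine_mrr@10, cosine_map@100, ...
```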
Training Details
Training Dataset
Unnamed Dataset
- Size: 6,190 training samples
- Columns: `anchor` and `positive`
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|:---|:---|:---|
| type | string | string |
| details | min: 7 tokens, mean: 15.11 tokens, max: 32 tokens | min: 3 tokens, mean: 69.53 tokens, max: 214 tokens |

- Samples:

| anchor | positive |
|:---|:---|
| How must the District Court exercise its discretion? | imposition of ‘ a’ term; (5) It is not mandatory to impose security, as evinced by the use of the conjunction “or”; (6) In imposing terms, the District Court must be mindful of the objectives of the Act, and its discretion must be exercised judicially |
| What is the source of the observation made by Christian Appu? | . Christian Appu , (1895) 1 NLR 288 observed that , “possession is "disturbed" either by an action intended to remove the possessor from the land, or by acts which prevent the possessor from enjoying the free and full use of 12 the land of which he is in the course of acquiring the dominion, and which convert his continuous user into a disconnected and divided user ” |
| What must the defendant do regarding the plaintiff's claim? | . The Court of Appeal in Ramanayake v Sampath Bank Ltd and Others [(1993) 1 Sri LR 145 at page 153] has held that, “The defendant has to deal with the plaintiff’s claim on its merits; it is not competent for the defendant to merely set out technical objections. It is also incumbent on the defendant to reveal his defence, if he has any |
- Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
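In Sentence Transformers this configuration corresponds to wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss`, so the in-batch-negatives ranking loss is applied at every truncated embedding size. A minimal sketch matching the parameters above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

# The base loss treats other in-batch positives as negatives; MatryoshkaLoss
# re-applies it at dims 768/512/256/128/64 with equal weights (the default).
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
)
```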
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 8
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
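A minimal sketch of wiring these non-default values into a training run, assuming an `anchor`/`positive` dataset like the one described above (the single-row datasets here are placeholders; the real run used 6,190 samples and an `InformationRetrievalEvaluator`):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Hypothetical two-column dataset matching the card's anchor/positive schema.
train_dataset = Dataset.from_dict({
    "anchor": ["How must the District Court exercise its discretion?"],
    "positive": ["...its discretion must be exercised judicially"],
})
eval_dataset = train_dataset  # placeholder; use a held-out split in practice

loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    tf32=True,  # requires an NVIDIA Ampere+ GPU; drop on other hardware
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```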
All Hyperparameters
Click to expand
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 8
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
Training Logs
| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.1034 | 5 | 29.8712 | - | - | - | - | - |
| 0.2067 | 10 | 26.1323 | - | - | - | - | - |
| 0.3101 | 15 | 17.8585 | - | - | - | - | - |
| 0.4134 | 20 | 14.0232 | - | - | - | - | - |
| 0.5168 | 25 | 11.6897 | - | - | - | - | - |
| 0.6202 | 30 | 10.8431 | - | - | - | - | - |
| 0.7235 | 35 | 9.264 | - | - | - | - | - |
| 0.8269 | 40 | 11.2186 | - | - | - | - | - |
| 0.9302 | 45 | 9.9143 | - | - | - | - | - |
| 1.0 | 49 | - | 0.7134 | 0.7110 | 0.6902 | 0.6341 | 0.5282 |
| 1.0207 | 50 | 7.2581 | - | - | - | - | - |
| 1.1240 | 55 | 6.066 | - | - | - | - | - |
| 1.2274 | 60 | 6.3626 | - | - | - | - | - |
| 1.3307 | 65 | 6.8135 | - | - | - | - | - |
| 1.4341 | 70 | 5.5556 | - | - | - | - | - |
| 1.5375 | 75 | 6.0144 | - | - | - | - | - |
| 1.6408 | 80 | 6.1965 | - | - | - | - | - |
| 1.7442 | 85 | 5.596 | - | - | - | - | - |
| 1.8475 | 90 | 6.631 | - | - | - | - | - |
| 1.9509 | 95 | 6.3319 | - | - | - | - | - |
| 2.0 | 98 | - | 0.7331 | 0.7304 | 0.7074 | 0.6569 | 0.5477 |
| 2.0413 | 100 | 4.7382 | - | - | - | - | - |
| 2.1447 | 105 | 4.1516 | - | - | - | - | - |
| 2.2481 | 110 | 4.3517 | - | - | - | - | - |
| 2.3514 | 115 | 3.7044 | - | - | - | - | - |
| 2.4548 | 120 | 4.1593 | - | - | - | - | - |
| 2.5581 | 125 | 4.8081 | - | - | - | - | - |
| 2.6615 | 130 | 3.908 | - | - | - | - | - |
| 2.7649 | 135 | 3.7684 | - | - | - | - | - |
| 2.8682 | 140 | 3.8927 | - | - | - | - | - |
| **2.9509** | **144** | **-** | **0.7308** | **0.7262** | **0.7078** | **0.6568** | **0.5514** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.13.3
- Sentence Transformers: 3.4.0
- Transformers: 4.48.1
- PyTorch: 2.6.0+cu126
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```