language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:46
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
  - source_sentence: >-
      Medical science is the application of scientific principles to the study
      and practice of medicine. It has transformed medicine by providing a
      deeper understanding of the human body at the cellular and molecular
      levels, allowing for more effective treatments and interventions. Medical
      science has enabled us to develop new treatments, understand the causes of
      diseases, and improve patient outcomes. It's had a profound impact on the
      way medicine is practiced today.
    sentences:
      - >-
        I was reading about health and wellness, and I came across the term
        "quackery." What is quackery in the context of medicine?
      - >-
        That's really interesting. What is medical science, and how has it
        impacted the practice of medicine?
      - >-
        That's helpful to know. What is the primary purpose of a physical
        examination in medicine, anyway?
  - source_sentence: >-
      The purpose of differential diagnosis is to rule out conditions based on
      the information provided, in order to narrow down the possible causes of a
      patient's symptoms. By considering multiple potential diagnoses and
      evaluating the likelihood of each, doctors can arrive at a more accurate
      diagnosis and develop an effective treatment plan.
    sentences:
      - >-
        I've heard the term "differential diagnosis" before. What is the purpose
        of differential diagnosis?
      - >-
        Hello, I'm interested in learning about the various ways that diseases
        can be treated. Can you tell me some common ways to treat disease?
      - >-
        I was just wondering about what happens during a typical doctor's visit.
        What kinds of medical devices are typically used in basic diagnostic
        procedures?
  - source_sentence: >-
      Typically, individual governments establish legal, credentialing, and
      financing frameworks to support health care systems. These frameworks help
      to structure the way health care is delivered and accessed within a
      country.
    sentences:
      - >-
        That makes sense. I'm also curious about the frameworks themselves. What
        types of frameworks are typically established by individual governments
        to support health care systems?
      - I see. Where is contemporary medicine generally conducted?
      - >-
        That makes sense. I've been to the doctor's office a few times and I've
        seen them use those devices. What is the role of physicians and
        physician assistants in modern clinical practice?
  - source_sentence: >-
      The information gathered during a medical encounter is documented in the
      medical record, which is a legal document in many jurisdictions. This
      record contains all the relevant information about the patient's
      condition, treatment, and medical history, and is used to guide future
      care and treatment decisions.
    sentences:
      - >-
        I see. I think I understand, but I'm a bit confused. Is there a more
        general term for medical treatments that are used outside of scientific
        medicine?
      - >-
        That makes sense. What types of medical information might you collect
        from a patient's medical history?
      - What happens to the information gathered during a medical encounter?
  - source_sentence: >-
      Regional differences in culture and technology are significant factors
      that contribute to variations in medical availability and clinical
      practice around the world. These factors can shape the way healthcare is
      delivered, the types of treatments that are available, and even the way
      patients interact with healthcare professionals. It's fascinating to learn
      about these differences and how they impact healthcare outcomes.
    sentences:
      - >-
        I see. I'm curious about the term "therapy" in the context of treating
        disease. Can you explain what you understand by that term?
      - >-
        Hi, I'm learning about medical interviews, and I'm a bit confused about
        the information that's gathered about a patient's occupation and
        lifestyle. What information is typically gathered during the interview?
      - >-
        I see. I'm also interested in learning more about the variations in
        medical availability and clinical practice around the world. What are
        some factors that contribute to variations in medical availability and
        clinical practice around the world?
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: ModernBERT Embed base Legal Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.8333333333333334
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 1
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 1
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8333333333333334
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19999999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09999999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8333333333333334
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 1
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 1
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9384882922619097
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.9166666666666666
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.9166666666666666
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 1
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 1
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 1
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 1
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19999999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09999999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 1
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 1
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 1
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 1
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 1
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 1
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 1
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 1
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 1
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 1
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19999999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09999999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 1
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 1
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 1
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 1
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 1
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 1
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.8333333333333334
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 1
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 1
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8333333333333334
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19999999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09999999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8333333333333334
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 1
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 1
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9384882922619097
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.9166666666666666
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.9166666666666666
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 1
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 1
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 1
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 1
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19999999999999998
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09999999999999999
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 1
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 1
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 1
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 1
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 1
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 1
            name: Cosine Map@100

ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
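
The Pooling and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes toy 4-dimensional token vectors and a hand-written attention mask (the real model works with 768 dimensions and tensors); the helper names `mean_pool` and `l2_normalize` are hypothetical and are not part of the library.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real (non-padding) tokens, as mean-token pooling does."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring the Normalize() step."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy token vectors; the third position is padding and is ignored by the mask.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vector has unit length, the cosine similarity between two embeddings reduces to their dot product.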

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Jonuu/LawyerAI1")
# Run inference
sentences = [
    "Regional differences in culture and technology are significant factors that contribute to variations in medical availability and clinical practice around the world. These factors can shape the way healthcare is delivered, the types of treatments that are available, and even the way patients interact with healthcare professionals. It's fascinating to learn about these differences and how they impact healthcare outcomes.",
    "I see. I'm also interested in learning more about the variations in medical availability and clinical practice around the world. What are some factors that contribute to variations in medical availability and clinical practice around the world?",
    "Hi, I'm learning about medical interviews, and I'm a bit confused about the information that's gathered about a patient's occupation and lifestyle. What information is typically gathered during the interview?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
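
Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, a shorter embedding can in principle be obtained by keeping only the leading components and renormalizing. The following is a minimal sketch of that idea on toy 4-dimensional vectors; the function names and the vectors are illustrative assumptions, not output of this model.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "full" embeddings standing in for 768-dimensional model output.
full_a = [0.6, 0.0, 0.8, 0.0]
full_b = [0.0, 0.6, 0.8, 0.0]

# Toy "truncated" embeddings standing in for, e.g., the 256-dimensional prefix.
short_a = truncate_and_normalize(full_a, 2)
short_b = truncate_and_normalize(full_b, 2)

print(cosine(full_a, full_b))
print(cosine(short_a, short_b))
```

Similarities computed on truncated vectors generally differ from full-size ones, so queries and documents should be truncated to the same dimension. Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model; check the documentation for the installed version before relying on it.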

Evaluation

Metrics

Information Retrieval

| Metric              | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--------------------|:--------|:--------|:--------|:--------|:-------|
| cosine_accuracy@1   | 0.8333  | 1.0     | 1.0     | 0.8333  | 1.0    |
| cosine_accuracy@3   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_accuracy@5   | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_accuracy@10  | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_precision@1  | 0.8333  | 1.0     | 1.0     | 0.8333  | 1.0    |
| cosine_precision@3  | 0.3333  | 0.3333  | 0.3333  | 0.3333  | 0.3333 |
| cosine_precision@5  | 0.2     | 0.2     | 0.2     | 0.2     | 0.2    |
| cosine_precision@10 | 0.1     | 0.1     | 0.1     | 0.1     | 0.1    |
| cosine_recall@1     | 0.8333  | 1.0     | 1.0     | 0.8333  | 1.0    |
| cosine_recall@3     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_recall@5     | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_recall@10    | 1.0     | 1.0     | 1.0     | 1.0     | 1.0    |
| cosine_ndcg@10      | 0.9385  | 1.0     | 1.0     | 0.9385  | 1.0    |
| cosine_mrr@10       | 0.9167  | 1.0     | 1.0     | 0.9167  | 1.0    |
| cosine_map@100      | 0.9167  | 1.0     | 1.0     | 0.9167  | 1.0    |
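
To make the metric definitions concrete, here is a small worked sketch. The size of the evaluation set is not stated in this card; the ranks below are a hypothetical scenario of six queries with one relevant passage each (five retrieved at rank 1, one at rank 2), chosen because it reproduces the dim_768 column above. It should not be read as the actual evaluation data.

```python
import math

# Hypothetical rank of the single relevant passage for each of 6 queries.
ranks = [1, 1, 1, 1, 1, 2]

def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def precision_at_k(ranks, k):
    """With one relevant passage per query, at most 1 of the top k is a hit."""
    return sum(r <= k for r in ranks) / (k * len(ranks))

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """NDCG with a single relevant passage: the ideal DCG is 1, so no division is needed."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)

print(round(accuracy_at_k(ranks, 1), 4))
print(round(precision_at_k(ranks, 3), 4))
print(round(mrr_at_k(ranks, 10), 4))
print(round(ndcg_at_k(ranks, 10), 4))
```

With a single relevant passage per query, recall@k equals accuracy@k and precision@k is accuracy@k divided by k, which is why precision@3, @5, and @10 are 0.3333, 0.2, and 0.1 in every column.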

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 46 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 46 samples:
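    |         | positive                                            | anchor                                             |
    |:--------|:----------------------------------------------------|:---------------------------------------------------|
    | type    | string                                              | string                                             |
    | details | min: 37 tokens, mean: 71.26 tokens, max: 148 tokens | min: 10 tokens, mean: 29.57 tokens, max: 47 tokens |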
  • Samples:
    | positive | anchor |
    |:---------|:-------|
    | The characteristics of a health care system have a significant impact on the way medical care is provided. The structure, financing, and policies of a health care system can all influence the availability, accessibility, and quality of medical care. | That helps clarify things. How do the characteristics of a health care system impact the way medical care is provided? |
    | Ancient philosophers and physicians applied treatments like bloodletting based on theoretical frameworks such as humorism, which attempted to explain the workings of the human body. These early theories were often influenced by cultural and philosophical beliefs, and they laid the groundwork for the development of modern medical science. It's interesting to see how our understanding of the human body has evolved over time, isn't it? | I'm curious about ancient philosophers and physicians. How did they approach medicine? |
    | Quackery is an interesting topic. In the context of medicine, quackery refers to medical treatments that are used outside of scientific medicine, but have significant concerns related to ethics, safety, and efficacy. This means that these treatments are not necessarily supported by scientific evidence, and may even be harmful to patients. | I was reading about health and wellness, and I came across the term "quackery." What is quackery in the context of medicine? |
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
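
The parameters above describe a ranking loss that is evaluated on several embedding prefixes and then summed with the given weights. The following pure-Python sketch illustrates that idea on toy vectors. It is not the library implementation: the function names are hypothetical, the scale of 20.0 is an assumed value for the similarity scaling, and the toy dimensions `[4, 2]` stand in for `[768, 512, 256, 128, 64]`.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    For anchor i, positives[i] is the correct match and every other
    positive in the batch acts as a negative. `scale` is an assumption.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, av in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(av, pv)) for pv in p]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss computed on each truncated prefix."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        loss += weight * in_batch_ranking_loss(
            [v[:dim] for v in anchors], [v[:dim] for v in positives]
        )
    return loss

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training every prefix against the same objective is what allows the shorter embeddings (512 down to 64 dimensions) to be evaluated separately in the Information Retrieval table.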
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
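
As a rough illustration of how these settings interact on a 46-sample dataset, the sketch below estimates the number of optimizer steps and the learning rate at each step. It assumes a single device, a standard linear-warmup plus cosine-decay schedule, and rounding rules that reflect a common trainer convention; these are assumptions rather than values read from the training run, and the no_duplicates batch sampler could change the batch count.

```python
import math

# Values taken from the hyperparameter list above.
train_samples = 46
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4
learning_rate = 2e-05
warmup_ratio = 0.1

# Assumed rounding: at least one optimizer update per epoch.
batches_per_epoch = math.ceil(train_samples / per_device_train_batch_size)
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = updates_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step):
    """Linear warmup followed by cosine decay (assumed schedule shape)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(step) for step in range(total_steps)]
print(total_steps, warmup_steps)
print(schedule)
```

Under these assumptions the run performs one optimizer step per epoch, four in total, which agrees with the step counts in the Training Logs section.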

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

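| Epoch | Step | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:------|:-----|:-----------------------|:-----------------------|:-----------------------|:-----------------------|:----------------------|
| 1.0   | 1    | 0.9385                 | 1.0                    | 0.9385                 | 0.9385                 | 1.0                   |
| 2.0   | 2    | 0.9385                 | 1.0                    | 1.0                    | 0.9385                 | 1.0                   |
| 3.0   | 3    | 0.9385                 | 1.0                    | 1.0                    | 0.9385                 | 1.0                   |
| 4.0   | 4    | 0.9385                 | 1.0                    | 1.0                    | 0.9385                 | 1.0                   |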
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0+cu118
  • Accelerate: 1.3.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}