SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
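
The Pooling module above produces the sentence embedding by mean pooling: averaging the token embeddings while ignoring padding positions (note pooling_mode_mean_tokens: True and pooling_mode_cls_token: False). As a rough illustration of what that computes, here is a minimal sketch using the raw transformers API; it assumes the Hub repository exposes the transformer weights at its root, as sentence-transformers checkpoints typically do:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("LequeuISIR/ModernBERT-base-DPR-8e-05")
model = AutoModel.from_pretrained("LequeuISIR/ModernBERT-base-DPR-8e-05")

inputs = tokenizer(["An example sentence."], return_tensors="pt", padding=True)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: average the token embeddings, masking out padding tokens
mask = inputs["attention_mask"].unsqueeze(-1).float()      # [batch, seq_len, 1]
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embeddings.shape)  # torch.Size([1, 768])

In practice the SentenceTransformer API shown below handles tokenization and pooling for you.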

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LequeuISIR/ModernBERT-base-DPR-8e-05")
# Run inference
sentences = [
    'This incites social hatred, threatens economic and social stability, and undermines trust in the authorities.',
    'The conditions for a healthy entrepreneurship, where the most innovative and creative win and where the source of enrichment cannot be property speculation or guilds and networks.',
    'As a result, the profits of the oligarchs are more than 400 times what our entire country gets from the exploitation of natural resources.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
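
The same embeddings also support semantic search: encode a corpus once, then rank its entries by cosine similarity to a query. A brief sketch (the corpus and query below are invented for illustration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LequeuISIR/ModernBERT-base-DPR-8e-05")

# Hypothetical corpus and query, purely for illustration
corpus = [
    "Investment fell from 20% of output to under 12%.",
    "New visa categories should respond to humanitarian needs.",
    "The audit declared parts of the foreign debt illegitimate.",
]
query = "How did investment levels change?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# model.similarity defaults to cosine similarity for this model
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())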

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 478,146 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                           | sentence2                                           | label                             |
    |:--------|:----------------------------------------------------|:----------------------------------------------------|:----------------------------------|
    | type    | string                                              | string                                              | int                               |
    | details | min: 17 tokens, mean: 33.73 tokens, max: 107 tokens | min: 16 tokens, mean: 33.84 tokens, max: 101 tokens | 0: ~57.50%, 1: ~4.10%, 2: ~38.40% |
  • Samples:

    | sentence1 | sentence2 | label |
    |:----------|:----------|:------|
    | There have also been other important structural changes in the countryside, which have come together to form this new, as yet unknown, country. | Meanwhile, investment, which is the way to increase production, employment capacity and competitiveness of the economy, fell from 20% of output in 1974 to only 11.8% on average between 1984 and 1988. | 0 |
    | Introduce new visa categories so we can be responsive to humanitarian needs and incentivise greater investment in our domestic infrastructure and regional economies | The purpose of the project is to design and implement public policies aimed at achieving greater and faster inclusion of immigrants. | 2 |
    | and economic crimes that seriously and generally affect the fundamental rights of individuals and the international community as a whole. | For the first time in the history, not only of Ecuador, but of the entire world, a government promoted a public audit process of the foreign debt and declared some of its tranches illegitimate and immoral. | 0 |
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    
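CoSENTLoss operates on (sentence1, sentence2, label) triples: it computes the cosine similarity of each sentence pair, then penalizes every case where a pair with a lower label receives a higher similarity than a pair with a higher label, with the margin sharpened by the scale of 20.0. A simplified sketch of the objective (not the library's exact implementation):

import torch

def cosent_loss(cos_sims: torch.Tensor, labels: torch.Tensor, scale: float = 20.0):
    # cos_sims: cosine similarity of each (sentence1, sentence2) pair, shape [batch]
    # labels:   ordinal labels for the same pairs, shape [batch]
    # Entry (i, j) is scale * (cos_sims[j] - cos_sims[i])
    diffs = scale * (cos_sims[None, :] - cos_sims[:, None])
    # Keep only entries where pair i should outrank pair j (labels[i] > labels[j])
    diffs = diffs[labels[:, None] > labels[None, :]]
    # loss = log(1 + sum(exp(diffs))), computed stably via logsumexp
    zero = torch.zeros(1, device=diffs.device)
    return torch.logsumexp(torch.cat([zero, diffs]), dim=0)

# Toy example: three pairs whose labels say pair 2 > pair 1 > pair 0
print(cosent_loss(torch.tensor([0.1, 0.4, 0.8]), torch.tensor([0, 1, 2])))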

Evaluation Dataset

json

  • Dataset: json
  • Size: 478,146 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1                                           | sentence2                                           | label                             |
    |:--------|:----------------------------------------------------|:----------------------------------------------------|:----------------------------------|
    | type    | string                                              | string                                              | int                               |
    | details | min: 17 tokens, mean: 33.62 tokens, max: 103 tokens | min: 16 tokens, mean: 34.48 tokens, max: 111 tokens | 0: ~57.30%, 1: ~2.90%, 2: ~39.80% |
  • Samples:

    | sentence1 | sentence2 | label |
    |:----------|:----------|:------|
    | The anchoring of the Slovak Republic in the European Union allows citizens to feel: secure politically, secure economically, secure socially. | Radikale Venstre wants Denmark to participate fully and firmly in EU cooperation on immigration, asylum and cross-border crime. | 2 |
    | Portugal's participation in the Community's negotiation of the next financial perspective should also be geared in the same direction. | Given the dynamic international framework, safeguarding the national interest requires adjustments to each of these vectors. | 2 |
    | On asylum, the Green Party will: Dismantle the direct provision system and replace it with an efficient and humane system for determining the status of asylum seekers | The crisis in the coal sector subsequently forced these immigrant workers to move into other economic sectors such as metallurgy, chemicals, construction and transport. | 2 |
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 8e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.05
  • bf16: True
  • batch_sampler: no_duplicates
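
Put together, these settings correspond to a training run along the following lines; the data files, column names, and output directory are hypothetical stand-ins, since the card only identifies the dataset as local JSON:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("answerdotai/ModernBERT-base")

# Hypothetical local JSON file with sentence1 / sentence2 / label columns
train_dataset = load_dataset("json", data_files="train.json", split="train")

loss = CoSENTLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="ModernBERT-base-DPR-8e-05",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    learning_rate=8e-5,
    num_train_epochs=5,
    warmup_ratio=0.05,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()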

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 8e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0837 500 6.425
0.1673 1000 6.0308
0.2510 1500 5.9522
0.3346 2000 5.7818
0.4183 2500 5.7122
0.5019 3000 5.6378
0.5856 3500 5.5503
0.6692 4000 5.4429
0.7529 4500 5.4246
0.8365 5000 5.3536
0.9202 5500 5.4072
1.0038 6000 5.3033
1.0875 6500 4.7611
1.1712 7000 4.7535
1.2548 7500 4.7503
1.3385 8000 4.7453
1.4221 8500 4.7413
1.5058 9000 4.6753
1.5894 9500 4.67
1.6731 10000 4.7352
1.7567 10500 4.7164
1.8404 11000 4.6784
1.9240 11500 4.651
2.0077 12000 4.5708
2.0914 12500 3.6274
2.1750 13000 3.5683
2.2587 13500 3.7028
2.3423 14000 3.5859
2.4260 14500 3.6872
2.5096 15000 3.5148
2.5933 15500 3.7241
2.6769 16000 3.5983
2.7606 16500 3.6269
2.8442 17000 3.6078
2.9279 17500 3.6292
3.0115 18000 3.5151
3.0952 18500 2.5933
3.1789 19000 2.599
3.2625 19500 2.5598
3.3462 20000 2.5577
3.4298 20500 2.5827
3.5135 21000 2.5598
3.5971 21500 2.4173
3.6808 22000 2.5884
3.7644 22500 2.4313
3.8481 23000 2.5669
3.9317 23500 2.5162
4.0154 24000 2.2531
4.0990 24500 1.3758
4.1827 25000 1.5491
4.2664 25500 1.4933
4.3500 26000 1.5139
4.4337 26500 1.4607
4.5173 27000 1.6117
4.6010 27500 1.5395
4.6846 28000 1.493
4.7683 28500 1.3984
4.8519 29000 1.4183
4.9356 29500 1.3517
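
The loss falls steadily within each epoch and drops sharply at each epoch boundary (around steps 6000, 12000, 18000, and 24000). To eyeball the curve, a minimal plotting sketch using a few points transcribed from the table above:

import matplotlib.pyplot as plt

# A handful of (step, training loss) points from the table above
steps  = [500, 6000, 12000, 18000, 24000, 29500]
losses = [6.425, 5.3033, 4.5708, 3.5151, 2.2531, 1.3517]

plt.plot(steps, losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Training loss")
plt.title("ModernBERT-base-DPR-8e-05 training loss")
plt.show()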

Framework Versions

  • Python: 3.9.21
  • Sentence Transformers: 3.4.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}