SentenceTransformer based on answerdotai/ModernBERT-base
This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: json
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
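Only pooling_mode_mean_tokens is True in the Pooling module above, so each sentence embedding is the average of the token embeddings produced by ModernBertModel, taken over non-padding positions. The NumPy sketch below shows masked mean pooling. It assumes token embeddings of shape (batch, seq_len, 768) and an attention mask where 1 marks a real token and 0 marks padding. The function name mean_pool is hypothetical and is not part of the sentence-transformers API.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 0/1
    returns:          (batch, dim) array of sentence embeddings
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # guard against empty sequences
    return summed / counts


# Toy input: 2 sequences, 3 positions, 768-dim random stand-ins for model output.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 768)).astype(np.float32)
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second sequence ends with one padding token
sentence_embeddings = mean_pool(tokens, mask)
print(sentence_embeddings.shape)  # (2, 768)
```

Because padded positions are multiplied by zero and excluded from the count, the second embedding is the mean of that sequence's first two token vectors only.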
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("LequeuISIR/ModernBERT-base-DPR-8e-05")
# Run inference
sentences = [
'This incites social hatred, threatens economic and social stability, and undermines trust in the authorities.',
'\xa0The conditions for a healthy entrepreneurship, where the most innovative and creative win and where the source of enrichment cannot be property speculation or guilds and networks. ',
'As a result, the profits of the oligarchs are more than 400 times what our entire country gets from the exploitation of natural resources.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
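The similarity function is cosine similarity, so the matrix returned by model.similarity(embeddings, embeddings) should match what you get by normalizing each embedding to unit length and taking pairwise dot products. The NumPy sketch below shows that computation and a simple top-k ranking of the kind used for semantic search. It uses small made-up vectors in place of real 768-dimensional embeddings. cosine_matrix and top_k are hypothetical helper names, and the sketch returns NumPy arrays, not the torch tensors that model.similarity returns.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d) -> (n, m)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[int]:
    """Indices of the k corpus rows most similar to a single query vector."""
    scores = cosine_matrix(query[None, :], corpus)[0]
    return np.argsort(-scores)[:k].tolist()


# Toy 4-dimensional vectors standing in for 768-dimensional sentence embeddings.
corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
sims = cosine_matrix(corpus, corpus)
print(sims.shape)  # (3, 3)
print(top_k(np.array([1.0, 0.05, 0.0, 0.0]), corpus))
```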
Training Details
Training Dataset
json
- Dataset: json
- Size: 478,146 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
Property | sentence1 | sentence2 | label |
---|---|---|---|
type | string | string | int |
details | min: 17 tokens, mean: 33.73 tokens, max: 107 tokens | min: 16 tokens, mean: 33.84 tokens, max: 101 tokens | 0: ~57.50%, 1: ~4.10%, 2: ~38.40% |
- Samples:
sentence1 | sentence2 | label |
---|---|---|
There have also been other important structural changes in the countryside, which have come together to form this new, as yet unknown, country. | Meanwhile, investment, which is the way to increase production, employment capacity and competitiveness of the economy, fell from 20% of output in 1974 to only 11.8% on average between 1984 and 1988. | 0 |
Introduce new visa categories so we can be responsive to humanitarian needs and incentivise greater investment in our domestic infrastructure and regional economies | The purpose of the project is to design and implement public policies aimed at achieving greater and faster inclusion of immigrants. | 2 |
and economic crimes that seriously and generally affect the fundamental rights of individuals and the international community as a whole. | For the first time in the history, not only of Ecuador, but of the entire world, a government promoted a public audit process of the foreign debt and declared some of its tranches illegitimate and immoral. | 0 |
- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
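CoSENTLoss uses the labels only to order pairs. Consider every two examples i and j in a batch where label_i < label_j. For each such pair, the loss adds exp(scale * (cos_i - cos_j)), and it then takes log(1 + sum). Here cos is the pairwise cosine similarity of sentence1 and sentence2, and scale is 20.0 in this configuration. With this dataset's integer labels, pairs labelled 2 are therefore pushed toward higher cosine similarity than pairs labelled 1, and pairs labelled 1 above pairs labelled 0.

The NumPy sketch below computes that formula for one batch. It is written from the CoSENT definition cited in the Citation section, not copied from sentence-transformers, and cosent_loss is a hypothetical name.

```python
import numpy as np


def cosent_loss(cos_scores: np.ndarray, labels: np.ndarray, scale: float = 20.0) -> float:
    """CoSENT loss for one batch.

    cos_scores: (n,) cosine similarity of each (sentence1, sentence2) pair
    labels:     (n,) gold labels; only their relative order matters
    """
    s = scale * cos_scores
    diff = s[:, None] - s[None, :]            # diff[i, j] = scale * (cos_i - cos_j)
    mask = labels[:, None] < labels[None, :]  # keep (i, j) where label_i < label_j
    terms = diff[mask]
    # log(1 + sum(exp(terms))), computed stably by log-sum-exp over a leading 0.
    return float(np.logaddexp.reduce(np.concatenate(([0.0], terms))))


labels = np.array([0, 2, 1])
well_ordered = np.array([0.1, 0.9, 0.5])   # cosine rises with the label
badly_ordered = np.array([0.9, 0.1, 0.5])  # cosine falls as the label rises
print(cosent_loss(well_ordered, labels))   # close to 0
print(cosent_loss(badly_ordered, labels))  # roughly 16
```

Pairs with equal labels contribute nothing, because the mask uses a strict comparison.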
Evaluation Dataset
json
- Dataset: json
- Size: 478,146 evaluation samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
Property | sentence1 | sentence2 | label |
---|---|---|---|
type | string | string | int |
details | min: 17 tokens, mean: 33.62 tokens, max: 103 tokens | min: 16 tokens, mean: 34.48 tokens, max: 111 tokens | 0: ~57.30%, 1: ~2.90%, 2: ~39.80% |
- Samples:
sentence1 | sentence2 | label |
---|---|---|
The anchoring of the Slovak Republic in the European Union allows citizens to feel: secure politically, secure economically, secure socially. | Radikale Venstre wants Denmark to participate fully and firmly in EU cooperation on immigration, asylum and cross-border crime. | 2 |
Portugal's participation in the Community's negotiation of the next financial perspective should also be geared in the same direction. | Given the dynamic international framework, safeguarding the national interest requires adjustments to each of these vectors. | 2 |
On asylum, the Green Party will: Dismantle the direct provision system and replace it with an efficient and humane system for determining the status of asylum seekers | The crisis in the coal sector subsequently forced these immigrant workers to move into other economic sectors such as metallurgy, chemicals, construction and transport. | 2 |
- Loss: CoSENTLoss with these parameters: { "scale": 20.0, "similarity_fct": "pairwise_cos_sim" }
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- learning_rate: 8e-05
- num_train_epochs: 5
- warmup_ratio: 0.05
- bf16: True
- batch_sampler: no_duplicates
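With lr_scheduler_type: linear and warmup_steps: 0 (both listed under All Hyperparameters), warmup_ratio 0.05 means the learning rate ramps linearly from 0 to 8e-05 over roughly the first 5% of optimizer steps and then decays linearly to 0. The pure-Python sketch below models that schedule. Two inputs are assumptions rather than reported values. The total of 29,885 steps is an estimate, about 5,977 steps per epoch (inferred from the Training Logs) times 5 epochs. The warmup length is rounded up.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 8e-05, warmup_ratio: float = 0.05) -> float:
    """Learning rate at an optimizer step: linear warmup from 0, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding: up
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Estimated, not reported: ~5,977 steps per epoch (from the logs) x 5 epochs.
TOTAL_STEPS = 5977 * 5
for step in (0, 747, 1495, 15000, 29500, TOTAL_STEPS):
    print(step, f"{linear_warmup_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak rate of 8e-05 is reached at step 1,495. By step 29,500 the rate is about 1e-06.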
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 8e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.05
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
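batch_sampler: no_duplicates asks Sentence Transformers to build batches in which no text value appears twice. The library documentation recommends this mainly for losses that use in-batch negatives. The pure-Python sketch below illustrates the idea in simplified form. It fills each batch greedily from a shuffled order and defers any example that would repeat a text already in the batch. It is not the library's NoDuplicatesBatchSampler, which differs in details. no_duplicate_batches is a hypothetical name, and labels are ignored when checking for duplicates.

```python
import random


def no_duplicate_batches(pairs, batch_size, seed=42):
    """Yield lists of indices into `pairs` so that no text repeats within a batch.

    pairs: list of (sentence1, sentence2) tuples
    """
    remaining = list(range(len(pairs)))
    random.Random(seed).shuffle(remaining)
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # try again in a later batch
        yield batch  # never empty: the first remaining index always fits
        remaining = deferred


pairs = [("a", "b"), ("a", "c"), ("d", "e"), ("f", "g")]
batches = list(no_duplicate_batches(pairs, batch_size=2))
for batch in batches:
    texts = [t for i in batch for t in pairs[i]]
    assert len(texts) == len(set(texts))  # no repeated text inside a batch
print(batches)
```

Pairs 0 and 1 share the text "a", so they always land in different batches. Batches near the end may be smaller than batch_size.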
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.0837 | 500 | 6.425 |
0.1673 | 1000 | 6.0308 |
0.2510 | 1500 | 5.9522 |
0.3346 | 2000 | 5.7818 |
0.4183 | 2500 | 5.7122 |
0.5019 | 3000 | 5.6378 |
0.5856 | 3500 | 5.5503 |
0.6692 | 4000 | 5.4429 |
0.7529 | 4500 | 5.4246 |
0.8365 | 5000 | 5.3536 |
0.9202 | 5500 | 5.4072 |
1.0038 | 6000 | 5.3033 |
1.0875 | 6500 | 4.7611 |
1.1712 | 7000 | 4.7535 |
1.2548 | 7500 | 4.7503 |
1.3385 | 8000 | 4.7453 |
1.4221 | 8500 | 4.7413 |
1.5058 | 9000 | 4.6753 |
1.5894 | 9500 | 4.67 |
1.6731 | 10000 | 4.7352 |
1.7567 | 10500 | 4.7164 |
1.8404 | 11000 | 4.6784 |
1.9240 | 11500 | 4.651 |
2.0077 | 12000 | 4.5708 |
2.0914 | 12500 | 3.6274 |
2.1750 | 13000 | 3.5683 |
2.2587 | 13500 | 3.7028 |
2.3423 | 14000 | 3.5859 |
2.4260 | 14500 | 3.6872 |
2.5096 | 15000 | 3.5148 |
2.5933 | 15500 | 3.7241 |
2.6769 | 16000 | 3.5983 |
2.7606 | 16500 | 3.6269 |
2.8442 | 17000 | 3.6078 |
2.9279 | 17500 | 3.6292 |
3.0115 | 18000 | 3.5151 |
3.0952 | 18500 | 2.5933 |
3.1789 | 19000 | 2.599 |
3.2625 | 19500 | 2.5598 |
3.3462 | 20000 | 2.5577 |
3.4298 | 20500 | 2.5827 |
3.5135 | 21000 | 2.5598 |
3.5971 | 21500 | 2.4173 |
3.6808 | 22000 | 2.5884 |
3.7644 | 22500 | 2.4313 |
3.8481 | 23000 | 2.5669 |
3.9317 | 23500 | 2.5162 |
4.0154 | 24000 | 2.2531 |
4.0990 | 24500 | 1.3758 |
4.1827 | 25000 | 1.5491 |
4.2664 | 25500 | 1.4933 |
4.3500 | 26000 | 1.5139 |
4.4337 | 26500 | 1.4607 |
4.5173 | 27000 | 1.6117 |
4.6010 | 27500 | 1.5395 |
4.6846 | 28000 | 1.493 |
4.7683 | 28500 | 1.3984 |
4.8519 | 29000 | 1.4183 |
4.9356 | 29500 | 1.3517 |
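The Epoch and Step columns can be cross-checked with a little arithmetic. The Python sketch below re-analyzes the logged numbers and does not explain them. It estimates optimizer steps per epoch from the last row (29,500 / 4.9356 is about 5,977). It also finds the four largest decreases between consecutive logged losses. With these values, all four fall shortly after an estimated epoch boundary.

```python
# Training loss copied from the Training Logs table; one entry per 500 steps (500, 1000, ..., 29500).
losses = [
    6.425, 6.0308, 5.9522, 5.7818, 5.7122, 5.6378, 5.5503, 5.4429, 5.4246, 5.3536,
    5.4072, 5.3033, 4.7611, 4.7535, 4.7503, 4.7453, 4.7413, 4.6753, 4.67, 4.7352,
    4.7164, 4.6784, 4.651, 4.5708, 3.6274, 3.5683, 3.7028, 3.5859, 3.6872, 3.5148,
    3.7241, 3.5983, 3.6269, 3.6078, 3.6292, 3.5151, 2.5933, 2.599, 2.5598, 2.5577,
    2.5827, 2.5598, 2.4173, 2.5884, 2.4313, 2.5669, 2.5162, 2.2531, 1.3758, 1.5491,
    1.4933, 1.5139, 1.4607, 1.6117, 1.5395, 1.493, 1.3984, 1.4183, 1.3517,
]
steps = [500 * (i + 1) for i in range(len(losses))]

# Estimated steps per epoch from the last logged row: step 29500 at epoch 4.9356.
steps_per_epoch = round(29500 / 4.9356)

# Loss decrease between consecutive log entries, keyed by the later step.
drops = {steps[i]: losses[i - 1] - losses[i] for i in range(1, len(losses))}
largest = sorted(sorted(drops, key=drops.get, reverse=True)[:4])

print("estimated steps per epoch:", steps_per_epoch)
for step in largest:
    epoch_index, offset = divmod(step, steps_per_epoch)
    print(f"step {step}: drop {drops[step]:.4f}, {offset} steps into epoch index {epoch_index}")
```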
Framework Versions
- Python: 3.9.21
- Sentence Transformers: 3.4.0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CoSENTLoss
@online{kexuefm-8847,
title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
author={Su Jianlin},
year={2022},
month={Jan},
url={https://kexue.fm/archives/8847},
}