SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
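
The module stack above can also be reproduced without the sentence-transformers wrapper: the transformer produces token embeddings, module (1) mean-pools them, and module (2) L2-normalizes the result. A minimal sketch with Hugging Face Transformers, assuming the standard checkpoint layout of a sentence-transformers save (the mean_pooling helper below is illustrative, not part of the library):

from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("LucaZilli/all_mpnet_base_v2_190225")
model = AutoModel.from_pretrained("LucaZilli/all_mpnet_base_v2_190225")

encoded = tokenizer(["example sentence"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# Mean pooling + L2 normalization, mirroring modules (1) and (2) above
embeddings = F.normalize(mean_pooling(output, encoded["attention_mask"]), p=2, dim=1)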

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LucaZilli/all_mpnet_base_v2_190225")
# Run inference
sentences = [
    'materiali isolanti per sistemi radianti a soffitto',
    'materiali isolanti per edifici',
    'privacy and data protection training',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
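
Because the model L2-normalizes its outputs (the Normalize module above), cosine similarity reduces to a dot product, which makes the embeddings well suited to semantic search. Continuing from the snippet above with the same model object, a minimal sketch using the library's util helpers (the query and corpus strings are made up for illustration):

from sentence_transformers import util

corpus = [
    "materiali isolanti per edifici",
    "privacy and data protection training",
]
corpus_embeddings = model.encode(corpus)
query_embeddings = model.encode(["insulation materials for buildings"])

# For each query, returns the top_k corpus entries ranked by cosine similarity
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=2)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}, ...]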

Evaluation

Metrics

Semantic Similarity

Metric           custom_dataset  stsbenchmark
pearson_cosine   0.7376          0.8404
spearman_cosine  0.7392          0.8342

Triplet

Metric           Value
cosine_accuracy  0.9319
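
Figures like these come from the library's built-in evaluators. A minimal sketch of how such numbers can be reproduced, with model loaded as in the usage section; the pair and triplet below are hypothetical stand-ins for the actual evaluation data, which is not published with this card:

from sentence_transformers import SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, TripletEvaluator

# Pearson/Spearman correlation between cosine similarities and gold scores
sts_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["insulating materials for buildings"],
    sentences2=["thermal insulation products"],
    scores=[0.8],  # gold similarity in [0, 1]
    main_similarity=SimilarityFunction.COSINE,
    name="custom_dataset",
)

# Fraction of triplets where the anchor is closer to the positive than to the negative
triplet_evaluator = TripletEvaluator(
    anchors=["insulating materials for buildings"],
    positives=["thermal insulation products"],
    negatives=["privacy and data protection training"],
)

print(sts_evaluator(model))
print(triplet_evaluator(model))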

Training Details

Training Dataset

Unnamed Dataset

  • Size: 25,310 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
              sentence1           sentence2           score
    type      string              string              float
    details   min: 4 tokens       min: 4 tokens       min: 0.0
              mean: 13.32 tokens  mean: 11.06 tokens  mean: 0.49
              max: 31 tokens      max: 31 tokens      max: 1.0
  • Samples:
    • sentence1: ottimizzazione dei tempi di produzione per capi sartoriali di lusso (English: "optimization of production times for luxury tailored garments")
      sentence2: strumenti per l'ottimizzazione dei tempi di produzione (English: "tools for optimizing production times")
      score: 0.6
    • sentence1: software di programmazione robotica per lucidatura (English: "robotic programming software for polishing")
      sentence2: software gestionale generico (English: "generic management software")
      score: 0.4
    • sentence1: rete di sensori per l'analisi del suolo in tempo reale (English: "sensor network for real-time soil analysis")
      sentence2: software per gestione aziendale (English: "business management software")
      score: 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
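
In other words, each (sentence1, sentence2) pair is embedded, the cosine similarity of the two embeddings is computed, and the MSE between that similarity and the gold score is minimized. A minimal sketch of how the dataset and loss fit together, reusing a sample row from above (model is a loaded SentenceTransformer as in the usage section):

from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss

train_dataset = Dataset.from_dict({
    "sentence1": ["software di programmazione robotica per lucidatura"],
    "sentence2": ["software gestionale generico"],
    "score": [0.4],
})

# Minimizes MSE(cosine(u, v), score) for each (sentence1, sentence2, score) row
loss = CosineSimilarityLoss(model)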
    

Evaluation Dataset

Unnamed Dataset

  • Size: 3,164 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
              sentence1           sentence2           score
    type      string              string              float
    details   min: 5 tokens       min: 4 tokens       min: 0.0
              mean: 13.61 tokens  mean: 11.39 tokens  mean: 0.49
              max: 31 tokens      max: 27 tokens      max: 1.0
  • Samples:
    • sentence1: ispezioni regolari per camion aziendali (English: "regular inspections for company trucks")
      sentence2: ispezioni regolari per camion di consegna (English: "regular inspections for delivery trucks")
      score: 1.0
    • sentence1: blister packaging machines GMP compliant
      sentence2: food packaging machines
      score: 0.4
    • sentence1: EMI shielding paints for electronics
      sentence2: Vernici per schermatura elettromagnetica dispositivi elettronici (English: "electromagnetic shielding paints for electronic devices")
      score: 0.8
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
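
A minimal sketch of a training run with these non-default settings, using the library's trainer API; output_dir is a placeholder, and train_dataset, eval_dataset, and loss stand for objects like those sketched in the dataset section above:

from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=SentenceTransformer("sentence-transformers/all-mpnet-base-v2"),
    args=args,
    train_dataset=train_dataset,  # placeholder dataset with sentence1/sentence2/score columns
    eval_dataset=eval_dataset,    # placeholder
    loss=loss,
)
trainer.train()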

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Validation Loss  custom_dataset_spearman_cosine  all_nli_dataset_cosine_accuracy  stsbenchmark_spearman_cosine
-1      -1    -              -                0.7392                          0.9319                           0.8342
0.0632  100   0.0484         -                -                               -                                -
0.1264  200   0.0389         0.0363           -                               -                                -
0.1896  300   0.0377         -                -                               -                                -
0.2528  400   0.033          0.0307           -                               -                                -
0.3161  500   0.0302         -                -                               -                                -
0.3793  600   0.0329         0.0298           -                               -                                -
0.4425  700   0.0315         -                -                               -                                -
0.5057  800   0.0308         0.0275           -                               -                                -
0.5689  900   0.0309         -                -                               -                                -
0.6321  1000  0.0279         0.0282           -                               -                                -
0.6953  1100  0.0299         -                -                               -                                -
0.7585  1200  0.0286         0.0271           -                               -                                -
0.8217  1300  0.0292         -                -                               -                                -
0.8850  1400  0.027          0.0261           -                               -                                -
0.9482  1500  0.0254         -                -                               -                                -
1.0114  1600  0.0237         0.0242           -                               -                                -
1.0746  1700  0.0196         -                -                               -                                -
1.1378  1800  0.0196         0.0242           -                               -                                -
1.2010  1900  0.021          -                -                               -                                -
1.2642  2000  0.0225         0.0267           -                               -                                -
1.3274  2100  0.022          -                -                               -                                -
1.3906  2200  0.0203         0.0246           -                               -                                -
1.4539  2300  0.0178         -                -                               -                                -
1.5171  2400  0.019          0.0240           -                               -                                -
1.5803  2500  0.0209         -                -                               -                                -
1.6435  2600  0.0186         0.0240           -                               -                                -
1.7067  2700  0.0261         -                -                               -                                -
1.7699  2800  0.0193         0.0246           -                               -                                -
1.8331  2900  0.02           -                -                               -                                -
1.8963  3000  0.0196         0.0240           -                               -                                -
1.9595  3100  0.0186         -                -                               -                                -
2.0228  3200  0.0164         0.0226           -                               -                                -
2.0860  3300  0.0122         -                -                               -                                -
2.1492  3400  0.0123         0.0221           -                               -                                -
2.2124  3500  0.0135         -                -                               -                                -
2.2756  3600  0.0134         0.0226           -                               -                                -
2.3388  3700  0.0128         -                -                               -                                -
2.4020  3800  0.0126         0.0231           -                               -                                -
2.4652  3900  0.0134         -                -                               -                                -
2.5284  4000  0.0142         0.0231           -                               -                                -
2.5917  4100  0.0124         -                -                               -                                -
2.6549  4200  0.0132         0.0215           -                               -                                -
2.7181  4300  0.0136         -                -                               -                                -
2.7813  4400  0.013          0.0218           -                               -                                -
2.8445  4500  0.0127         -                -                               -                                -
2.9077  4600  0.0126         0.0213           -                               -                                -
2.9709  4700  0.0133         -                -                               -                                -
3.0341  4800  0.0103         0.0209           -                               -                                -
3.0973  4900  0.0086         -                -                               -                                -
3.1606  5000  0.0088         0.0211           -                               -                                -
3.2238  5100  0.0081         -                -                               -                                -
3.2870  5200  0.0079         0.0212           -                               -                                -
3.3502  5300  0.0094         -                -                               -                                -
3.4134  5400  0.0086         0.0210           -                               -                                -
3.4766  5500  0.0089         -                -                               -                                -
3.5398  5600  0.009          0.0209           -                               -                                -
3.6030  5700  0.0086         -                -                               -                                -
3.6662  5800  0.0087         0.0213           -                               -                                -
3.7295  5900  0.0085         -                -                               -                                -
3.7927  6000  0.0094         0.0213           -                               -                                -
3.8559  6100  0.0095         -                -                               -                                -
3.9191  6200  0.01           0.0212           -                               -                                -
3.9823  6300  0.0097         -                -                               -                                -
4.0455  6400  0.0081         0.0214           -                               -                                -
4.1087  6500  0.0095         -                -                               -                                -
4.1719  6600  0.0083         0.0208           -                               -                                -
4.2351  6700  0.0082         -                -                               -                                -
4.2984  6800  0.0074         0.0208           -                               -                                -
4.3616  6900  0.0076         -                -                               -                                -
4.4248  7000  0.0072         0.0205           -                               -                                -
4.4880  7100  0.0072         -                -                               -                                -
4.5512  7200  0.007          0.0205           -                               -                                -
4.6144  7300  0.0069         -                -                               -                                -
4.6776  7400  0.0069         0.0203           -                               -                                -
4.7408  7500  0.0068         -                -                               -                                -
4.8040  7600  0.0073         0.0204           -                               -                                -
4.8673  7700  0.0063         -                -                               -                                -
4.9305  7800  0.0065         0.0203           -                               -                                -
4.9937  7900  0.0069         -                -                               -                                -
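
As a sanity check, the step counts are consistent with the dataset and batch size above: 25,310 training samples at a per-device batch size of 16 give roughly 1,582 steps per epoch, so the final logged step (7,900 at epoch 4.9937) lines up with 5 epochs of training.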

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}