SentenceTransformer based on intfloat/multilingual-e5-base

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-base on the rozetka_positive_pairs dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Dot Product (the final Normalize module makes embeddings unit-length, so dot product coincides with cosine similarity)
  • Training Dataset:
    • rozetka_positive_pairs

Full Model Architecture

RZTKSentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
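
For reference, the three modules above correspond to a plain transformers pipeline: run the XLM-RoBERTa encoder, mean-pool the token embeddings over non-padding positions, then L2-normalize. A minimal sketch of that equivalence, shown with the public intfloat/multilingual-e5-base checkpoint since the finetuned repo itself is private (see the hub_private_repo hyperparameter below):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Public base checkpoint stands in for the (private) finetuned weights
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-base")

batch = tokenizer(
    ["query: мебель для кухни"],
    max_length=512, truncation=True, padding=True, return_tensors="pt",
)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state        # (0) Transformer
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1) mean pooling
embeddings = F.normalize(pooled, p=2, dim=1)           # (2) Normalize
print(embeddings.shape)  # torch.Size([1, 768])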

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rztk/multilingual-e5-base-matryoshka2d-mnr-3")
# Run inference (E5-style 'query: ' / 'passage: ' prefixes are expected by this model)
sentences = [
    'query: мебель для кухни',  # Russian: "kitchen furniture"
    'passage: Кухня Эко модуль Вытяжка 600 Эверест Ясень Шимо Светлый 60х30х28 см',  # Russian: kitchen hood product title
    'passage: Ключниці кишенькові Karya Гарантія 14 днів Для кого Для жінок Колір Червоний Матеріал Шкіра Країна реєстрації бренда Туреччина Країна-виробник товару Туреччина',  # Ukrainian: key case product attributes
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
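
The hand-written 'query: ' and 'passage: ' prefixes mirror the prompts used during training ({'query': 'query: ', 'text': 'passage: '}, see Training Hyperparameters). A minimal sketch that lets encode() prepend them instead, assuming the prompts were saved into the model configuration; otherwise pass them explicitly via the prompts argument of SentenceTransformer:

# Same scoring as above, with the library applying the E5 prefixes
query_emb = model.encode(["мебель для кухни"], prompt_name="query")
passage_emb = model.encode(
    ["Кухня Эко модуль Вытяжка 600 Эверест Ясень Шимо Светлый 60х30х28 см"],
    prompt_name="text",
)
print(model.similarity(query_emb, passage_emb))  # 1x1 tensor of scores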

Training Details

Training Dataset

rozetka_positive_pairs

  • Dataset: rozetka_positive_pairs
  • Size: 58,620,066 training samples
  • Columns: query and text
  • Approximate statistics based on the first 1000 samples:
    • query: string, min 6 / mean 11.27 / max 30 tokens
    • text: string, min 11 / mean 59.47 / max 512 tokens
  • Samples:
    • query: xsiomi 9c скло → passage: Защитные стекла Назначение Для мобильных телефонов Цвет Черный Теги Теги Наличие рамки C рамкой Форм-фактор Плоское Клеевой слой По всей поверхности
    • query: xsiomi 9c скло → passage: Захисне скло Призначення Для мобільних телефонів Колір Чорний Теги Теги Наявність рамки З рамкою Форм-фактор Плоске Клейовий шар По всій поверхні
    • query: xsiomi 9c скло → passage: Захисне скло Glass Full Glue для Xiaomi Redmi 9A/9C/10A (Чорний)
  • Loss: sentence_transformers_training.model.matryoshka2d_loss.RZTKMatryoshka2dLoss with these parameters:
    {
        "loss": "RZTKMultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
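
Because the loss supervises the embeddings at 768, 512, 256, and 128 dimensions, they can be truncated for cheaper storage and retrieval. A minimal sketch using the truncate_dim loading option; note that truncation happens after the Normalize module, so re-normalize the truncated vectors before relying on dot-product scores:

from sentence_transformers import SentenceTransformer
import numpy as np

# Load with Matryoshka truncation to 256 of the 768 dimensions
model_256 = SentenceTransformer("rztk/multilingual-e5-base-matryoshka2d-mnr-3", truncate_dim=256)
emb = model_256.encode(["query: мебель для кухни"])
print(emb.shape)  # (1, 256)

# Truncated vectors are no longer unit-length; restore that for dot-product scoring
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)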
    

Evaluation Dataset

rozetka_positive_pairs

  • Dataset: rozetka_positive_pairs
  • Size: 1,903,728 evaluation samples
  • Columns: query and text
  • Approximate statistics based on the first 1000 samples:
    • query: string, min 6 / mean 8.36 / max 16 tokens
    • text: string, min 8 / mean 45.68 / max 365 tokens
  • Samples:
    • query: создаем нейронную сеть → passage: Створюємо нейронну мережу
    • query: создаем нейронную сеть → passage: Создаем нейронную сеть (1666498)
    • query: создаем нейронную сеть → passage: Научная и техническая литература Переплет Мягкий
  • Loss: sentence_transformers_training.model.matryoshka2d_loss.RZTKMatryoshka2dLoss with these parameters:
    {
        "loss": "RZTKMultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 88
  • per_device_eval_batch_size: 88
  • learning_rate: 2e-05
  • num_train_epochs: 1.0
  • warmup_ratio: 0.1
  • bf16: True
  • bf16_full_eval: True
  • tf32: True
  • dataloader_num_workers: 8
  • load_best_model_at_end: True
  • optim: adafactor
  • push_to_hub: True
  • hub_model_id: rztk/multilingual-e5-base-matryoshka2d-mnr-3
  • hub_private_repo: True
  • prompts: {'query': 'query: ', 'text': 'passage: '}
  • batch_sampler: no_duplicates
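
These settings map directly onto SentenceTransformerTrainingArguments. A minimal sketch reconstructing the non-default values (output_dir is a placeholder, not from the original run; everything else defaults):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=88,
    per_device_eval_batch_size=88,
    learning_rate=2e-5,
    num_train_epochs=1.0,
    warmup_ratio=0.1,
    bf16=True,
    bf16_full_eval=True,
    tf32=True,
    dataloader_num_workers=8,
    load_best_model_at_end=True,
    optim="adafactor",
    push_to_hub=True,
    hub_model_id="rztk/multilingual-e5-base-matryoshka2d-mnr-3",
    hub_private_repo=True,
    prompts={"query": "query: ", "text": "passage: "},
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)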

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 88
  • per_device_eval_batch_size: 88
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: True
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 8
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adafactor
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: rztk/multilingual-e5-base-matryoshka2d-mnr-3
  • hub_strategy: every_save
  • hub_private_repo: True
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: {'query': 'query: ', 'text': 'passage: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • ddp_static_graph: False
  • ddp_comm_hook: bf16
  • gradient_as_bucket_view: False
  • num_proc: 30

Training Logs

Epoch Step Training Loss Validation Loss
0.0050 833 4.8404 -
0.0100 1666 4.6439 -
0.0150 2499 4.2238 -
0.0200 3332 3.5445 -
0.0250 4165 2.7514 -
0.0300 4998 2.4037 -
0.0350 5831 2.1916 -
0.0400 6664 2.0938 -
0.0450 7497 1.9268 -
0.0500 8330 1.8671 -
0.0550 9163 1.7069 -
0.0600 9996 1.6419 -
0.0650 10829 1.55 -
0.0700 11662 1.5483 -
0.0750 12495 1.5419 -
0.0800 13328 1.3582 -
0.0850 14161 1.3537 -
0.0900 14994 1.3067 -
0.0950 15827 1.2128 -
0.1000 16654 - 1.0107
0.1000 16660 1.2248 -
0.1050 17493 1.1565 -
0.1100 18326 1.1351 -
0.1150 19159 1.0808 -
0.1200 19992 1.0561 -
0.1250 20825 1.078 -
0.1301 21658 1.1413 -
0.1351 22491 1.0446 -
0.1401 23324 0.9986 -
0.1451 24157 0.9668 -
0.1501 24990 0.9753 -
0.1551 25823 1.0031 -
0.1601 26656 0.9688 -
0.1651 27489 0.9262 -
0.1701 28322 0.9702 -
0.1751 29155 0.9082 -
0.1801 29988 0.9264 -
0.1851 30821 0.8526 -
0.1901 31654 0.9667 -
0.1951 32487 0.9421 -
0.2000 33308 - 0.6416
0.2001 33320 0.9216 -
0.2051 34153 0.95 -
0.2101 34986 0.8895 -
0.2151 35819 0.8349 -
0.2201 36652 0.8628 -
0.2251 37485 0.8729 -
0.2301 38318 0.9285 -
0.2351 39151 0.8718 -
0.2401 39984 0.8792 -
0.2451 40817 0.8852 -
0.2501 41650 0.877 -
0.2551 42483 0.8325 -
0.2601 43316 0.8446 -
0.2651 44149 0.812 -
0.2701 44982 0.8246 -
0.2751 45815 0.8086 -
0.2801 46648 0.8553 -
0.2851 47481 0.8506 -
0.2901 48314 0.834 -
0.2951 49147 0.8313 -
0.3000 49962 - 0.5377
0.3001 49980 0.8376 -
0.3051 50813 0.7836 -
0.3101 51646 0.8089 -
0.3151 52479 0.8065 -
0.3201 53312 0.8284 -
0.3251 54145 0.7959 -
0.3301 54978 0.8332 -
0.3351 55811 0.7924 -
0.3401 56644 0.8171 -
0.3451 57477 0.7924 -
0.3501 58310 0.7977 -
0.3551 59143 0.7729 -
0.3601 59976 0.7617 -
0.3651 60809 0.8211 -
0.3701 61642 0.8497 -
0.3751 62475 0.8218 -
0.3802 63308 0.7846 -
0.3852 64141 0.7876 -
0.3902 64974 0.7912 -
0.3952 65807 0.7977 -
0.4000 66616 - 0.4974
0.4002 66640 0.8096 -
0.4052 67473 0.8356 -
0.4102 68306 0.788 -
0.4152 69139 0.7683 -
0.4202 69972 0.7358 -
0.4252 70805 0.7634 -
0.4302 71638 0.7535 -
0.4352 72471 0.756 -
0.4402 73304 0.7633 -
0.4452 74137 0.7509 -
0.4502 74970 0.7547 -
0.4552 75803 0.7539 -
0.4602 76636 0.7608 -
0.4652 77469 0.8262 -
0.4702 78302 0.8076 -
0.4752 79135 0.8179 -
0.4802 79968 0.7709 -
0.4852 80801 0.744 -
0.4902 81634 0.7846 -
0.4952 82467 0.7473 -
0.5000 83270 - 0.4776
0.5002 83300 0.7759 -
0.5052 84133 0.755 -
0.5102 84966 0.7308 -
0.5152 85799 0.7256 -
0.5202 86632 0.7703 -
0.5252 87465 0.7823 -
0.5302 88298 0.8109 -
0.5352 89131 0.7795 -
0.5402 89964 0.7833 -
0.5452 90797 0.7752 -
0.5502 91630 0.7975 -
0.5552 92463 0.7863 -
0.5602 93296 0.7337 -
0.5652 94129 0.7755 -
0.5702 94962 0.7928 -
0.5752 95795 0.7604 -
0.5802 96628 0.7983 -
0.5852 97461 0.7665 -
0.5902 98294 0.7749 -
0.5952 99127 0.7838 -
0.6000 99924 - 0.4669
0.6002 99960 0.7727 -
0.6052 100793 0.8049 -
0.6102 101626 0.7857 -
0.6152 102459 0.7622 -
0.6202 103292 0.8117 -
0.6252 104125 0.7711 -
0.6302 104958 0.7892 -
0.6353 105791 0.7938 -
0.6403 106624 0.728 -
0.6453 107457 0.7693 -
0.6503 108290 0.7875 -
0.6553 109123 0.7958 -
0.6603 109956 0.749 -
0.6653 110789 0.7788 -
0.6703 111622 0.7614 -
0.6753 112455 0.7577 -
0.6803 113288 0.7805 -
0.6853 114121 0.7677 -
0.6903 114954 0.7458 -
0.6953 115787 0.7962 -
0.7000 116578 - 0.4641
0.7003 116620 0.7275 -
0.7053 117453 0.7778 -
0.7103 118286 0.7885 -
0.7153 119119 0.8046 -
0.7203 119952 0.8222 -
0.7253 120785 0.7714 -
0.7303 121618 0.7983 -
0.7353 122451 0.7359 -
0.7403 123284 0.7618 -
0.7453 124117 0.783 -
0.7503 124950 0.763 -
0.7553 125783 0.809 -
0.7603 126616 0.794 -
0.7653 127449 0.7366 -
0.7703 128282 0.776 -
0.7753 129115 0.8053 -
0.7803 129948 0.7941 -
0.7853 130781 0.7722 -
0.7903 131614 0.7959 -
0.7953 132447 0.8061 -
0.8000 133232 - 0.4468

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.3.0
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}