SentenceTransformer based on distilbert/distilbert-base-multilingual-cased

This is a sentence-transformers model fine-tuned from distilbert/distilbert-base-multilingual-cased on the default dataset of English-Indonesian sentence pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

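Model Description

  • Model Type: Sentence Transformer
  • Base model: distilbert/distilbert-base-multilingual-cased
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: default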

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
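The Pooling configuration above enables only pooling_mode_mean_tokens, so each sentence vector is the average of the DistilBERT token embeddings over the non-padding positions. A minimal NumPy sketch of that step, assuming the attention mask marks real tokens with 1 and padding with 0; the random arrays stand in for DistilBertModel output, and mean_pool is a hypothetical helper, not the library's implementation:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim) transformer output
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                          # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                          # (batch, 1), avoids division by zero
    return summed / counts

# Toy batch: 2 sentences, 4 token positions, 768-dim vectors (random stand-ins)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_vectors = mean_pool(tokens, mask)
print(sentence_vectors.shape)  # (2, 768)
```

Dividing by the number of real tokens, rather than by the padded length, keeps padding from diluting the vectors of short sentences.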


Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '1 Concrete is typically measured by cubic yards (3’x3’x3’). 2  An average cost for a cubic yard of concrete is $75 to $125, depending on how much is needed and local prices. 3  Labor costs to pour and form concrete run somewhere around $3.50 to $7.00 per square foot. An average cost for a cubic yard of concrete is $75 to $125, depending on how much is needed and local prices. 2  Labor costs to pour and form concrete run somewhere around $3.50 to $7.00 per square foot.',
    '1 Beton biasanya diukur dengan meter kubik (3’x3’x3’). 2 Biaya rata-rata untuk satu yard kubik beton adalah $75 sampai $125, tergantung pada berapa banyak yang dibutuhkan dan harga setempat. 3 Biaya tenaga kerja untuk menuangkan dan membentuk beton berkisar antara $3,50 hingga $7,00 per kaki persegi. Biaya rata-rata untuk satu yard kubik beton adalah $75 sampai $125, tergantung pada berapa banyak yang dibutuhkan dan harga lokal. 2 Biaya tenaga kerja untuk menuangkan dan membentuk beton berkisar antara $3,50 hingga $7,00 per kaki persegi.',
    "Parrot Tattoos - Polly ingin cracker.. Ungkapan ini identik dengan 'parrot', terutama yang duduk di bahu bajak laut, seperti yang dibuat terkenal dalam cerita klasik Robert Louis Stevenson, Treasure Island (1883).lint', yang mungkin tidak adalah burung beo pertama yang diasosiasikan dengan nama 'Polly', tapi dia pasti mempopulerkannya. Dan Polly tentu memastikan burung beo itu akan menjadi simbol ikonik dari tradisi bajak laut. Sebagai pendamping legendaris bagi manusia, burung beo menyarankan semacam wali.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
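model.similarity applies the model's configured similarity function. This card does not name one; assuming the sentence-transformers default of cosine similarity, the [3, 3] matrix can be reproduced from the embeddings as sketched below. The random array stands in for the output of model.encode, and cosine_similarity_matrix is a hypothetical helper:

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (n, d) and the rows of b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)  # unit-length rows
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # (n, m)

# Stand-in embeddings with the same shape as model.encode(sentences): (3, 768)
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 768))
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)
```

Each diagonal entry compares a sentence with itself and equals 1 up to rounding; sorting row i of the matrix ranks the other sentences by similarity to sentence i.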



Knowledge Distillation

Metric Value
negative_mse -3.5554
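In sentence-transformers' MSEEvaluator, negative_mse is, to our understanding, the mean squared error between teacher and student embeddings multiplied by 100 and negated, so values closer to 0 are better. The card itself does not state this scaling, so treat it as an assumption; negative_mse below is a hypothetical re-implementation on toy vectors:

```python
import numpy as np

def negative_mse(student_embeddings: np.ndarray, teacher_embeddings: np.ndarray) -> float:
    """Negated, 100x-scaled mean squared error between two (n, dim) embedding matrices."""
    mse = np.mean((teacher_embeddings - student_embeddings) ** 2)
    return float(-mse * 100)

teacher = np.array([[0.0, 1.0], [1.0, 0.0]])
student = np.array([[0.0, 0.9], [0.8, 0.0]])
print(negative_mse(student, teacher))  # approximately -1.25
```

Under that convention, the reported -3.5554 corresponds to a per-dimension MSE of roughly 0.0356.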

Translation

Metric Value
src2trg_accuracy 0.9894
trg2src_accuracy 0.9861
mean_accuracy 0.9877
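The src2trg and trg2src scores are retrieval accuracies over aligned translation pairs: for each source sentence, the most cosine-similar target should be its own translation, and vice versa; mean_accuracy is their average ((0.9894 + 0.9861) / 2 ≈ 0.9877). The sketch follows that definition as used, to the best of our knowledge, by sentence-transformers' TranslationEvaluator; translation_accuracy and the 2-dimensional toy vectors are hypothetical:

```python
import numpy as np

def translation_accuracy(src: np.ndarray, trg: np.ndarray) -> dict:
    """Retrieval accuracy over aligned pairs, where src[i] and trg[i] are translations of each other."""
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    trg_n = trg / np.linalg.norm(trg, axis=1, keepdims=True)
    sims = src_n @ trg_n.T                                  # sims[i, j] = cosine(src[i], trg[j])
    gold = np.arange(len(src))                              # the correct partner of row i is column i
    src2trg = float(np.mean(sims.argmax(axis=1) == gold))  # best target for each source
    trg2src = float(np.mean(sims.argmax(axis=0) == gold))  # best source for each target
    return {
        "src2trg_accuracy": src2trg,
        "trg2src_accuracy": trg2src,
        "mean_accuracy": (src2trg + trg2src) / 2,
    }

# Toy aligned pairs: the third target is almost parallel to the first source
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
trg = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.05]])
print(translation_accuracy(src, trg))
```

Here every source retrieves its own partner (src2trg_accuracy = 1.0), but the third target is closer to the first source than to its own, so trg2src_accuracy is 2/3 and mean_accuracy is 5/6.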

Training Details

Training Dataset


  • Dataset: default at c8bc0cb
  • Size: 1,000,000 training samples
  • Columns: english, indonesian, and label
  • Approximate statistics based on the first 1000 samples:
    • english: string; min 4 tokens, mean 44.27 tokens, max 128 tokens
    • indonesian: string; min 5 tokens, mean 48.93 tokens, max 128 tokens
    • label: list of 768 elements
  • Samples:
    • english: This sample job description shares how one smaller sized, growing, multi-site nonprofit organization configured the role of executive director.The executive director is responsible for general management as well as designing a national expansion plan. There also is a heavy emphasis on program evaluation.Feel free to use this sample job description in creating one for your organization.osition. Reporting to the Board of Directors, the Executive Director (ED) will have overall strategic and operational responsibility for XYZ Nonprofit's staff, programs, expansion, and execution of its mission. S/he will initially develop deep knowledge of field, core programs, operations, and business plans.
      indonesian: Uraian tugas contoh ini membagikan bagaimana satu organisasi nirlaba multi-situs berukuran lebih kecil, berkembang, mengonfigurasi peran direktur eksekutif. Direktur eksekutif bertanggung jawab atas manajemen umum serta merancang rencana ekspansi nasional. Ada juga penekanan berat pada evaluasi program. Jangan ragu untuk menggunakan contoh deskripsi pekerjaan ini dalam membuat satu untuk posisi organisasi Anda. Melaporkan kepada Dewan Direksi, Direktur Eksekutif (ED) akan memiliki tanggung jawab strategis dan operasional secara keseluruhan untuk staf, program, ekspansi, dan pelaksanaan misi XYZ Nirlaba. Dia awalnya akan mengembangkan pengetahuan yang mendalam tentang lapangan, program inti, operasi, dan rencana bisnis.
      label: [-0.4337165653705597, -0.0650932714343071, -0.04308838024735451, -0.1756953001022339, 0.32854965329170227, ...]
    • english: Industrial revolution occured last in Russia. In Germany, France and United States industrial revolution occured in early-to-mid 1800's. While in Russia creation of railroads, and foundation of factories happened by govermental initiatives towards the end of XIX century.n Germany, France and United States industrial revolution occured in early-to-mid 1800's.
      indonesian: Revolusi industri terakhir terjadi di Rusia. Di Jerman, Perancis dan Amerika Serikat terjadi revolusi industri pada awal hingga pertengahan 1800-an. Sedangkan di Rusia pembuatan rel kereta api, dan pendirian pabrik terjadi atas inisiatif pemerintah menjelang akhir abad XIX. Revolusi industri Jerman, Prancis dan Amerika Serikat terjadi pada awal hingga pertengahan 1800-an.
      label: [-0.22887374460697174, -0.17583712935447693, 0.08270637691020966, -0.15496928989887238, -0.18010610342025757, ...]
    • english: what causes hordeolum internum left lower eyelid
      indonesian: apa penyebab hordeolum internum kelopak mata kiri bawah
      label: [-0.19872592389583588, 0.4119395911693573, 0.3756648004055023, -0.4884617030620575, 0.15375499427318573, ...]
  • Loss: MSELoss
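With MSELoss, the student's embedding of each text column is regressed onto the 768-element label vector. The card does not say where the labels come from; this sketch assumes they are a teacher model's embeddings of the English sentences, as in the usual multilingual knowledge-distillation setup, and that the english and indonesian columns are both pulled toward the same label, which is how sentence-transformers' MSELoss treats multiple input columns to our understanding. distillation_mse_loss is a hypothetical NumPy helper, not the library code:

```python
import numpy as np

def distillation_mse_loss(english_emb: np.ndarray, indonesian_emb: np.ndarray, label: np.ndarray) -> float:
    """Mean squared error between student embeddings of both columns and the shared label vectors.

    english_emb, indonesian_emb: (batch, dim) student embeddings of each text column
    label: (batch, dim) target vectors from the dataset's `label` column (assumed teacher output)
    """
    predictions = np.concatenate([english_emb, indonesian_emb], axis=0)  # (2 * batch, dim)
    targets = np.concatenate([label, label], axis=0)                      # same target for each column
    return float(np.mean((predictions - targets) ** 2))

# Toy batch of 2 pairs with 768-dim vectors
label = np.zeros((2, 768))
english_emb = np.full((2, 768), 0.1)      # student output for the English sentences
indonesian_emb = np.full((2, 768), -0.2)  # student output for the Indonesian translations
loss = distillation_mse_loss(english_emb, indonesian_emb, label)
print(loss)  # about 0.025: the mean of 0.1**2 and 0.2**2
```

Because a sentence and its translation share one target, minimizing this loss pulls both toward the same point in the embedding space, which is what makes cross-lingual retrieval between English and Indonesian possible.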

Evaluation Dataset


  • Dataset: default at c8bc0cb
  • Size: 1,000,000 evaluation samples
  • Columns: english, indonesian, and label
  • Approximate statistics based on the first 1000 samples:
    • english: string; min 5 tokens, mean 46.58 tokens, max 128 tokens
    • indonesian: string; min 5 tokens, mean 51.0 tokens, max 128 tokens
    • label: list of 768 elements
  • Samples:
    • english: do appraisers give adjustments for lot size
      indonesian: apakah penilai memberikan penyesuaian untuk ukuran lot?
      label: [0.12256570905447006, 0.011573846451938152, -0.19426874816417694, -0.17596185207366943, 0.35024771094322205, ...]
    • english: hotels in binghamton ny
      indonesian: hotel di binghamton ny
      label: [0.14259624481201172, -0.048470016568899155, 0.1078888401389122, 0.06728225946426392, 0.6096671223640442, ...]
    • english: guitarist kenny greenberg
      indonesian: gitaris kenny greenberg
      label: [-0.6973275542259216, 0.27737292647361755, -0.09295299649238586, 0.24035970866680145, 0.154855415225029, ...]
  • Loss: MSELoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • fp16: True
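These settings determine the learning-rate schedule. Assuming training ran on a single device with gradient_accumulation_steps 1 (the card lists per-device batch sizes but not the device count), 1,000,000 samples at 64 per batch give 15,625 steps per epoch and 78,125 steps over 5 epochs; warmup_ratio 0.1 then yields 7,813 warmup steps, since the Hugging Face Trainer rounds the warmup count up. The sketch reproduces that arithmetic and the warmup-then-decay shape of lr_scheduler_type linear; linear_schedule_lr is a hypothetical helper:

```python
import math

# Values from the hyperparameters above; a single device is an assumption.
num_samples = 1_000_000
batch_size = 64
epochs = 5
warmup_ratio = 0.1
peak_lr = 2e-5

steps_per_epoch = math.ceil(num_samples / batch_size)  # dataloader_drop_last=False keeps a partial last batch
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_schedule_lr(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)  # 15625 78125 7813
print(linear_schedule_lr(warmup_steps))            # 2e-05, the peak learning rate
```

With more than one device, the per-device batch of 64 would be multiplied by the device count and every step count above would shrink accordingly.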

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.4.0
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3



