---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:19697
  - loss:CosineSimilarityLoss
base_model: neuralmind/bert-large-portuguese-cased
widget:
  - source_sentence: procurar sapato social masculino
    sentences:
      - beleza autocuidado
      - moda acessorio
      - doce chocolate
  - source_sentence: livro ultimo adeus cynthia hand
    sentences:
      - livro material literario
      - item colecao
      - joia bijuterio
  - source_sentence: relogio pulso
    sentences:
      - servico reparo eletronico
      - hortifruti
      - hortifruti
  - source_sentence: medicamento antipulga gato
    sentences:
      - produto pet animal domestico
      - hortifruti
      - padaria confeitaria
  - source_sentence: guitarra gibson Les Paul
    sentences:
      - moda acessorio
      - tinta
      - peixaria pescado
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: SentenceTransformer based on neuralmind/bert-large-portuguese-cased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: eval similarity
          type: eval-similarity
        metrics:
          - type: pearson_cosine
            value: 0.932130151806209
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8467496824207882
            name: Spearman Cosine
---

SentenceTransformer based on neuralmind/bert-large-portuguese-cased

This is a sentence-transformers model finetuned from neuralmind/bert-large-portuguese-cased. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: neuralmind/bert-large-portuguese-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
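
As a quick illustration of the similarity function listed above, cosine similarity is the dot product of two embeddings divided by the product of their norms (a minimal NumPy sketch; model.similarity in the Usage section computes the same quantity in PyTorch):

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # cos(u, v) = (u . v) / (||u|| * ||v||), ranges over [-1, 1]
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))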

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
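
The same stack can be assembled by hand from the models module (a sketch of the architecture only; built this way from the base checkpoint it carries none of this model's fine-tuning):

from sentence_transformers import SentenceTransformer, models

# Transformer module: BERT-large encoder, truncating inputs at 512 tokens
word_embedding = models.Transformer("neuralmind/bert-large-portuguese-cased", max_seq_length=512)
# Pooling module: mean over token embeddings -> one 1024-dim sentence vector
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode_mean_tokens=True)
model = SentenceTransformer(modules=[word_embedding, pooling])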

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SenhorDasMoscas/acho-ptbr-e3-lr0.0001-08-01-2025")
# Run inference
sentences = [
    'guitarra gibson Les Paul',
    'tinta',
    'peixaria pescado',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
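
Continuing from the snippet above, the same similarity call can rank candidate categories for a shopping query, which is the matching task this model appears to target (a usage sketch; the candidate strings are taken from the widget examples):

query_embedding = model.encode(["procurar sapato social masculino"])
categories = ["moda acessorio", "hortifruti", "livro material literario"]
category_embeddings = model.encode(categories)

# Rank candidate categories by cosine similarity to the query
scores = model.similarity(query_embedding, category_embeddings)[0]
best = int(scores.argmax())
print(categories[best], float(scores[best]))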

Evaluation

Metrics

Semantic Similarity

Metric           Value
pearson_cosine   0.9321
spearman_cosine  0.8467
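
The metric names match the library's EmbeddingSimilarityEvaluator, so the scores can be recomputed along these lines (a sketch; the two pairs here are illustrative stand-ins for the real 2,189-sample evaluation set):

from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("SenhorDasMoscas/acho-ptbr-e3-lr0.0001-08-01-2025")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["relogio pulso", "medicamento antipulga gato"],
    sentences2=["servico reparo eletronico", "produto pet animal domestico"],
    scores=[0.1, 1.0],
    main_similarity=SimilarityFunction.COSINE,
    name="eval-similarity",
)
print(evaluator(model))  # dict with eval-similarity_pearson_cosine / _spearman_cosine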

Training Details

Training Dataset

Unnamed Dataset

  • Size: 19,697 training samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
            text1               text2               label
    type    string              string              float
    details min: 3 tokens       min: 3 tokens       min: 0.1
            mean: 7.78 tokens   mean: 6.17 tokens   mean: 0.55
            max: 17 tokens      max: 11 tokens      max: 1.0
  • Samples:
    text1                              text2                  label
    fritadeira eletrico em esse loja   festa decoracao festa  0.1
    vinho                              papelaria escritorio   0.1
    forno eletrico Fischer             eletrodomestico        1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
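
CosineSimilarityLoss embeds both texts, takes the cosine similarity of the two embeddings, and regresses it onto the float label with the MSELoss named above (a minimal sketch of the objective, not the library's internals):

import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb1: torch.Tensor, emb2: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # MSE between cos(emb1, emb2) and the gold similarity label
    return F.mse_loss(F.cosine_similarity(emb1, emb2), labels)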
    

Evaluation Dataset

Unnamed Dataset

  • Size: 2,189 evaluation samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
            text1               text2               label
    type    string              string              float
    details min: 3 tokens       min: 3 tokens       min: 0.1
            mean: 7.7 tokens    mean: 6.16 tokens   mean: 0.52
            max: 18 tokens      max: 11 tokens      max: 1.0
  • Samples:
    text1                                  text2                  label
    querer salgado comida rapido           fastfood               1.0
    ervilha enlatar                        movel                  0.1
    preciso loja artigo esporte aquatico   servico area educacao  0.1
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 0.0001
  • weight_decay: 0.1
  • warmup_ratio: 0.1
  • warmup_steps: 246
  • fp16: True
  • load_best_model_at_end: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.1
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 246
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss eval-similarity_spearman_cosine
0.0081 5 0.1965 - -
0.0162 10 0.2125 - -
0.0244 15 0.1944 - -
0.0325 20 0.1674 - -
0.0406 25 0.1518 - -
0.0487 30 0.1381 - -
0.0568 35 0.1385 - -
0.0649 40 0.109 - -
0.0731 45 0.1054 - -
0.0812 50 0.0963 - -
0.0893 55 0.0917 - -
0.0974 60 0.0797 - -
0.1055 65 0.0877 - -
0.1136 70 0.0755 - -
0.1218 75 0.0773 - -
0.1299 80 0.0605 - -
0.1380 85 0.0669 - -
0.1461 90 0.0698 - -
0.1542 95 0.0595 - -
0.1623 100 0.0382 - -
0.1705 105 0.0723 - -
0.1786 110 0.0448 - -
0.1867 115 0.0703 - -
0.1948 120 0.0694 - -
0.2029 125 0.0515 - -
0.2110 130 0.0581 - -
0.2192 135 0.0458 - -
0.2273 140 0.0643 - -
0.2354 145 0.0602 - -
0.2435 150 0.0651 - -
0.2516 155 0.0662 - -
0.2597 160 0.0712 - -
0.2679 165 0.0546 - -
0.2760 170 0.0419 - -
0.2841 175 0.061 - -
0.2922 180 0.0549 - -
0.3003 185 0.0523 - -
0.3084 190 0.0579 - -
0.3166 195 0.0595 - -
0.3247 200 0.0478 - -
0.3328 205 0.0507 - -
0.3409 210 0.0312 - -
0.3490 215 0.041 - -
0.3571 220 0.0528 - -
0.3653 225 0.0386 - -
0.3734 230 0.0656 - -
0.3815 235 0.0567 - -
0.3896 240 0.0673 - -
0.3977 245 0.103 - -
0.4058 250 0.1704 - -
0.4140 255 0.0844 - -
0.4221 260 0.0883 - -
0.4302 265 0.0728 - -
0.4383 270 0.0531 - -
0.4464 275 0.0715 - -
0.4545 280 0.0623 - -
0.4627 285 0.0679 - -
0.4708 290 0.0577 - -
0.4789 295 0.0781 - -
0.4870 300 0.0541 - -
0.4951 305 0.0876 - -
0.5032 310 0.0648 - -
0.5114 315 0.0583 - -
0.5195 320 0.0506 - -
0.5276 325 0.051 - -
0.5357 330 0.0633 - -
0.5438 335 0.0764 - -
0.5519 340 0.0753 - -
0.5601 345 0.0701 - -
0.5682 350 0.0688 - -
0.5763 355 0.0691 - -
0.5844 360 0.0497 - -
0.5925 365 0.0606 - -
0.6006 370 0.0544 - -
0.6088 375 0.0587 - -
0.6169 380 0.0432 - -
0.625 385 0.0768 - -
0.6331 390 0.0701 - -
0.6412 395 0.0421 - -
0.6494 400 0.0415 - -
0.6575 405 0.0419 - -
0.6656 410 0.0613 - -
0.6737 415 0.0442 - -
0.6818 420 0.0487 - -
0.6899 425 0.0443 - -
0.6981 430 0.0493 - -
0.7062 435 0.0429 - -
0.7143 440 0.0464 - -
0.7224 445 0.0541 - -
0.7305 450 0.0539 - -
0.7386 455 0.0497 - -
0.7468 460 0.0471 - -
0.75 462 - 0.0457 0.8234
0.7549 465 0.0514 - -
0.7630 470 0.0457 - -
0.7711 475 0.0315 - -
0.7792 480 0.0491 - -
0.7873 485 0.0619 - -
0.7955 490 0.0298 - -
0.8036 495 0.0725 - -
0.8117 500 0.043 - -
0.8198 505 0.0392 - -
0.8279 510 0.0275 - -
0.8360 515 0.0509 - -
0.8442 520 0.0508 - -
0.8523 525 0.0394 - -
0.8604 530 0.0309 - -
0.8685 535 0.0601 - -
0.8766 540 0.0524 - -
0.8847 545 0.0491 - -
0.8929 550 0.0626 - -
0.9010 555 0.0395 - -
0.9091 560 0.0655 - -
0.9172 565 0.045 - -
0.9253 570 0.0394 - -
0.9334 575 0.0521 - -
0.9416 580 0.0324 - -
0.9497 585 0.0426 - -
0.9578 590 0.032 - -
0.9659 595 0.0425 - -
0.9740 600 0.0458 - -
0.9821 605 0.0341 - -
0.9903 610 0.0339 - -
0.9984 615 0.0444 - -
1.0065 620 0.0364 - -
1.0146 625 0.0277 - -
1.0227 630 0.0372 - -
1.0308 635 0.0254 - -
1.0390 640 0.0382 - -
1.0471 645 0.0333 - -
1.0552 650 0.0312 - -
1.0633 655 0.0366 - -
1.0714 660 0.0341 - -
1.0795 665 0.0146 - -
1.0877 670 0.0362 - -
1.0958 675 0.0225 - -
1.1039 680 0.038 - -
1.1120 685 0.0406 - -
1.1201 690 0.0392 - -
1.1282 695 0.0343 - -
1.1364 700 0.0494 - -
1.1445 705 0.021 - -
1.1526 710 0.0358 - -
1.1607 715 0.034 - -
1.1688 720 0.0288 - -
1.1769 725 0.0224 - -
1.1851 730 0.0324 - -
1.1932 735 0.0378 - -
1.2013 740 0.0446 - -
1.2094 745 0.0293 - -
1.2175 750 0.0314 - -
1.2256 755 0.0444 - -
1.2338 760 0.0283 - -
1.2419 765 0.0207 - -
1.25 770 0.0413 - -
1.2581 775 0.0317 - -
1.2662 780 0.0382 - -
1.2744 785 0.0363 - -
1.2825 790 0.0324 - -
1.2906 795 0.0225 - -
1.2987 800 0.0316 - -
1.3068 805 0.0438 - -
1.3149 810 0.0298 - -
1.3231 815 0.0395 - -
1.3312 820 0.0388 - -
1.3393 825 0.0289 - -
1.3474 830 0.0233 - -
1.3555 835 0.022 - -
1.3636 840 0.016 - -
1.3718 845 0.0488 - -
1.3799 850 0.0519 - -
1.3880 855 0.033 - -
1.3961 860 0.025 - -
1.4042 865 0.0212 - -
1.4123 870 0.0184 - -
1.4205 875 0.0335 - -
1.4286 880 0.0308 - -
1.4367 885 0.028 - -
1.4448 890 0.0352 - -
1.4529 895 0.0255 - -
1.4610 900 0.0243 - -
1.4692 905 0.0355 - -
1.4773 910 0.0267 - -
1.4854 915 0.0263 - -
1.4935 920 0.0275 - -
1.5 924 - 0.0313 0.8414
1.5016 925 0.0294 - -
1.5097 930 0.0514 - -
1.5179 935 0.0321 - -
1.5260 940 0.0306 - -
1.5341 945 0.0279 - -
1.5422 950 0.0334 - -
1.5503 955 0.0337 - -
1.5584 960 0.0266 - -
1.5666 965 0.036 - -
1.5747 970 0.0328 - -
1.5828 975 0.0224 - -
1.5909 980 0.0404 - -
1.5990 985 0.0293 - -
1.6071 990 0.016 - -
1.6153 995 0.0177 - -
1.6234 1000 0.0216 - -
1.6315 1005 0.029 - -
1.6396 1010 0.0306 - -
1.6477 1015 0.0291 - -
1.6558 1020 0.032 - -
1.6640 1025 0.0277 - -
1.6721 1030 0.0191 - -
1.6802 1035 0.0353 - -
1.6883 1040 0.0304 - -
1.6964 1045 0.0385 - -
1.7045 1050 0.0315 - -
1.7127 1055 0.0428 - -
1.7208 1060 0.0338 - -
1.7289 1065 0.0258 - -
1.7370 1070 0.0303 - -
1.7451 1075 0.0171 - -
1.7532 1080 0.0229 - -
1.7614 1085 0.0278 - -
1.7695 1090 0.0246 - -
1.7776 1095 0.0241 - -
1.7857 1100 0.0182 - -
1.7938 1105 0.0366 - -
1.8019 1110 0.0204 - -
1.8101 1115 0.0208 - -
1.8182 1120 0.01 - -
1.8263 1125 0.0239 - -
1.8344 1130 0.0228 - -
1.8425 1135 0.0228 - -
1.8506 1140 0.0176 - -
1.8588 1145 0.0278 - -
1.8669 1150 0.0242 - -
1.875 1155 0.0174 - -
1.8831 1160 0.0248 - -
1.8912 1165 0.0192 - -
1.8994 1170 0.0293 - -
1.9075 1175 0.017 - -
1.9156 1180 0.0212 - -
1.9237 1185 0.0214 - -
1.9318 1190 0.025 - -
1.9399 1195 0.0246 - -
1.9481 1200 0.0202 - -
1.9562 1205 0.021 - -
1.9643 1210 0.0183 - -
1.9724 1215 0.0313 - -
1.9805 1220 0.0211 - -
1.9886 1225 0.0299 - -
1.9968 1230 0.0222 - -
2.0049 1235 0.0154 - -
2.0130 1240 0.018 - -
2.0211 1245 0.0212 - -
2.0292 1250 0.0123 - -
2.0373 1255 0.013 - -
2.0455 1260 0.0213 - -
2.0536 1265 0.0125 - -
2.0617 1270 0.0175 - -
2.0698 1275 0.0092 - -
2.0779 1280 0.0209 - -
2.0860 1285 0.0135 - -
2.0942 1290 0.0295 - -
2.1023 1295 0.0175 - -
2.1104 1300 0.0252 - -
2.1185 1305 0.0071 - -
2.1266 1310 0.0139 - -
2.1347 1315 0.0104 - -
2.1429 1320 0.0125 - -
2.1510 1325 0.0103 - -
2.1591 1330 0.0171 - -
2.1672 1335 0.0083 - -
2.1753 1340 0.0185 - -
2.1834 1345 0.0141 - -
2.1916 1350 0.0177 - -
2.1997 1355 0.0189 - -
2.2078 1360 0.0254 - -
2.2159 1365 0.0198 - -
2.2240 1370 0.0162 - -
2.2321 1375 0.0139 - -
2.2403 1380 0.013 - -
2.2484 1385 0.0201 - -
2.25 1386 - 0.0292 0.8443
2.2565 1390 0.0202 - -
2.2646 1395 0.0169 - -
2.2727 1400 0.0105 - -
2.2808 1405 0.0136 - -
2.2890 1410 0.0125 - -
2.2971 1415 0.0168 - -
2.3052 1420 0.0108 - -
2.3133 1425 0.0297 - -
2.3214 1430 0.0233 - -
2.3295 1435 0.0164 - -
2.3377 1440 0.0178 - -
2.3458 1445 0.0203 - -
2.3539 1450 0.0112 - -
2.3620 1455 0.0156 - -
2.3701 1460 0.0151 - -
2.3782 1465 0.0097 - -
2.3864 1470 0.0196 - -
2.3945 1475 0.0148 - -
2.4026 1480 0.0154 - -
2.4107 1485 0.0069 - -
2.4188 1490 0.0145 - -
2.4269 1495 0.0204 - -
2.4351 1500 0.0225 - -
2.4432 1505 0.0165 - -
2.4513 1510 0.0079 - -
2.4594 1515 0.0183 - -
2.4675 1520 0.0196 - -
2.4756 1525 0.0085 - -
2.4838 1530 0.0109 - -
2.4919 1535 0.0168 - -
2.5 1540 0.0124 - -
2.5081 1545 0.0218 - -
2.5162 1550 0.0164 - -
2.5244 1555 0.0234 - -
2.5325 1560 0.0115 - -
2.5406 1565 0.0135 - -
2.5487 1570 0.0179 - -
2.5568 1575 0.0104 - -
2.5649 1580 0.0188 - -
2.5731 1585 0.0166 - -
2.5812 1590 0.0228 - -
2.5893 1595 0.015 - -
2.5974 1600 0.0171 - -
2.6055 1605 0.0207 - -
2.6136 1610 0.009 - -
2.6218 1615 0.0111 - -
2.6299 1620 0.0109 - -
2.6380 1625 0.0175 - -
2.6461 1630 0.0155 - -
2.6542 1635 0.0193 - -
2.6623 1640 0.0189 - -
2.6705 1645 0.0123 - -
2.6786 1650 0.0102 - -
2.6867 1655 0.0097 - -
2.6948 1660 0.0116 - -
2.7029 1665 0.0134 - -
2.7110 1670 0.0218 - -
2.7192 1675 0.0148 - -
2.7273 1680 0.0137 - -
2.7354 1685 0.0062 - -
2.7435 1690 0.0075 - -
2.7516 1695 0.0078 - -
2.7597 1700 0.0151 - -
2.7679 1705 0.0157 - -
2.7760 1710 0.0153 - -
2.7841 1715 0.0088 - -
2.7922 1720 0.0093 - -
2.8003 1725 0.0154 - -
2.8084 1730 0.0124 - -
2.8166 1735 0.0128 - -
2.8247 1740 0.0088 - -
2.8328 1745 0.0144 - -
2.8409 1750 0.0184 - -
2.8490 1755 0.0114 - -
2.8571 1760 0.0043 - -
2.8653 1765 0.0151 - -
2.8734 1770 0.0089 - -
2.8815 1775 0.014 - -
2.8896 1780 0.0095 - -
2.8977 1785 0.0106 - -
2.9058 1790 0.007 - -
2.9140 1795 0.0275 - -
2.9221 1800 0.0185 - -
2.9302 1805 0.0158 - -
2.9383 1810 0.0134 - -
2.9464 1815 0.0068 - -
2.9545 1820 0.0144 - -
2.9627 1825 0.0134 - -
2.9708 1830 0.0109 - -
2.9789 1835 0.0114 - -
2.9870 1840 0.0097 - -
2.9951 1845 0.0076 - -
3.0 1848 - 0.0269 0.8467
  • The final row (epoch 3.0, step 1848) denotes the saved checkpoint; it has the lowest validation loss (0.0269) and matches the reported Spearman cosine of 0.8467.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 2.14.4
  • Tokenizers: 0.21.0
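
To approximate this environment, the versions above can be pinned at install time (a sketch; newer compatible releases should also load the model):

pip install sentence-transformers==3.3.1 transformers==4.47.1 torch==2.5.1 accelerate==1.2.1 datasets==2.14.4 tokenizers==0.21.0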

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}