metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:8000
  - loss:SoftmaxLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: I suspect their compliments are disingenuous
    sentences:
      - I feel skeptical about their ideas during planning sessions
      - I believe they intentionally withhold information from me
      - I notice the tension in rushed work hours.
  - source_sentence: I decline their invitations to mutual events
    sentences:
      - I accept moments of uncertainty as part of the job.
      - I embrace constructive criticism for personal growth.
      - I accept change as an integral part of progress.
  - source_sentence: I feel anger simmering when they speak up in meetings
    sentences:
      - I concentrate on tasks without getting sidetracked by emails.
      - I maintain focus by taking regular breaks.
      - I focus on one work task at a time.
  - source_sentence: I stay conscious of my patterned responses to pressure.
    sentences:
      - I ignore background noise to maintain task concentration.
      - I concentrate fully on reading reports.
      - I accept that criticism is a growth opportunity.
  - source_sentence: I accept my mistakes as part of my learning process.
    sentences:
      - I fully concentrate on client communications.
      - I suspect their compliments are disingenuous
      - I remain conscious of my work-life balance.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
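The last two modules in the architecture printout (mean-token pooling followed by Normalize) can be illustrated with a small, dependency-free sketch. This is not the library's implementation: the helper names `mean_pool` and `l2_normalize` and the 4-dimensional toy vectors are assumptions made for illustration (the real model produces 384-dimensional vectors).

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length, as a Normalize step would."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

# Toy input: three token positions, the last one is padding.
tokens = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 4.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, ignored by the mask
]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # [2.0, 1.0, 0.0, 2.0]
sentence_vec = l2_normalize(pooled)   # unit-length sentence vector
```

Because the output is unit-length, the cosine similarity between two sentence vectors equals their dot product.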

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zihoo/all-MiniLM-L6-v2-WMNLI-10epoch")
# Run inference
sentences = [
    'I accept my mistakes as part of my learning process.',
    'I fully concentrate on client communications.',
    'I remain conscious of my work-life balance.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
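To make the similarity step concrete, here is a minimal sketch of pairwise cosine similarity on plain Python lists. It only illustrates the arithmetic behind a cosine similarity matrix; the function names and the 2-dimensional toy vectors are assumptions, not part of the Sentence Transformers API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Pairwise cosine similarities: an n x n matrix for n vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)

# Rank the other vectors by similarity to the first one.
ranked = sorted(range(1, len(toy_embeddings)), key=lambda j: matrix[0][j], reverse=True)
```

The matrix is symmetric with ones on the diagonal, which is why three input sentences yield a 3 x 3 result.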

Training Details

Training Dataset

Unnamed Dataset

  • Size: 8,000 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    column     type    details
    sentence1  string  min: 8 tokens, mean: 11.65 tokens, max: 17 tokens
    sentence2  string  min: 8 tokens, mean: 11.77 tokens, max: 17 tokens
    label      int     0: ~25.80%, 1: ~36.80%, 2: ~37.40%
  • Samples:
    sentence1 | sentence2 | label
    I focus on one work task at a time. | I keep my attention on the task despite office chatter. | 0
    I worry they might spread false rumors about me | I return focus to my work when my mind drifts. | 2
    I stay aware of my posture when working at a desk. | I pay attention to non-verbal cues from others. | 0
  • Loss: SoftmaxLoss
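As a rough illustration of what SoftmaxLoss optimizes, the sketch below follows the Sentence-BERT classification objective: the two sentence embeddings u and v are concatenated with their element-wise absolute difference, passed through a linear classifier over the three labels, and scored with cross-entropy. It assumes the library's default concatenation settings (u, v, |u - v|); the helper names, the 2-dimensional toy vectors, and the zero-initialized classifier are hypothetical.

```python
import math

def softmax_loss_features(u, v):
    """Concatenate u, v and |u - v| into one classifier input."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    return u + v + diff

def linear_logits(features, weights, bias):
    """One logit per label: weights is a (num_labels x len(features)) matrix."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    """Negative log-probability of the true label under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Toy embeddings (the real model uses 384 dimensions, giving 1152 features).
u = [0.5, -0.5]
v = [0.5, 0.5]
features = softmax_loss_features(u, v)   # length 3 * 2 = 6

num_labels = 3                            # labels 0, 1, 2 in this dataset
weights = [[0.0] * len(features) for _ in range(num_labels)]
bias = [0.0] * num_labels

loss = cross_entropy(linear_logits(features, weights, bias), label=2)
```

With an untrained (all-zero) classifier every label gets probability 1/3, so the loss is ln 3 ≈ 1.0986. That is a useful reference point when reading the training and validation losses reported under Training Logs.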

Evaluation Dataset

Unnamed Dataset

  • Size: 2,000 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    column     type    details
    sentence1  string  min: 8 tokens, mean: 11.68 tokens, max: 17 tokens
    sentence2  string  min: 8 tokens, mean: 11.79 tokens, max: 17 tokens
    label      int     0: ~24.40%, 1: ~36.30%, 2: ~39.30%
  • Samples:
    sentence1 | sentence2 | label
    I stay conscious of my emotional responses to work challenges. | I pay close attention to verbal instructions. | 1
    I accept varied perspectives from my team graciously. | I accept team dynamics as they naturally evolve. | 0
    I accept technology upgrades with an open heart. | I am mindful of my facial expressions during discussions. | 1
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 9
  • warmup_ratio: 0.01

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 9
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.01
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss
0.4    100   0.9566         0.8119
0.8    200   0.7499         0.6819
1.2    300   0.6541         0.5908
1.6    400   0.5759         0.5258
2.0    500   0.5112         0.4811
2.4    600   0.4659         0.4377
2.8    700   0.4400         0.4020
3.2    800   0.4112         0.3721
3.6    900   0.3751         0.3462
4.0    1000  0.3517         0.3233
4.4    1100  0.3232         0.3033
4.8    1200  0.3189         0.2871
5.2    1300  0.2961         0.2711
5.6    1400  0.2865         0.2597
6.0    1500  0.2715         0.2499
6.4    1600  0.2639         0.2403
6.8    1700  0.2528         0.2339
7.2    1800  0.2482         0.2277
7.6    1900  0.2406         0.2236
8.0    2000  0.2403         0.2207
8.4    2100  0.2382         0.2184
8.8    2200  0.2314         0.2166

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}