BGE base TM500 Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Model Size: ~109M parameters (F32 safetensors)
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
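
For reference, here is a minimal sketch of what the three modules above compute, written with raw 🤗 Transformers (the input sentence is illustrative; in practice, load the model as shown in Usage below):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("mrhimanshu/bge-base-tm500-matryoshka")
bert = AutoModel.from_pretrained("mrhimanshu/bge-base-tm500-matryoshka")

# (0) Transformer: tokenize with the 512-token limit, then encode
inputs = tokenizer(["example sentence"], padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**inputs).last_hidden_state  # (batch, seq_len, 768)

# (1) Pooling: take the CLS token (pooling_mode_cls_token=True)
cls_embedding = token_embeddings[:, 0]

# (2) Normalize: unit L2 norm, so dot product equals cosine similarity
embedding = F.normalize(cls_embedding, p=2, dim=1)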

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mrhimanshu/bge-base-tm500-matryoshka")
# Run inference
sentences = [
    'What is the purpose of the Percentage block error value parameter in the HarqConfigUlSchForceErrors command?',
    'This parameter specifies the required percentage of retransmissions, with a value in the range 1-100.',
    'The possible values are: 0 = b56, 1 = b120, 2 = b208, 3 = b256, 4 = b328, 5 = b440, 6 = b552, 7 = b680.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
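
Because the model was trained with MatryoshkaLoss (see Training Details), embeddings can also be truncated to 512, 256, 128, or 64 dimensions with only a small drop in retrieval quality (see the Evaluation table below). A minimal sketch, using the `truncate_dim` argument available in recent Sentence Transformers releases:

from sentence_transformers import SentenceTransformer

# Load the model so that every embedding is truncated to 128 dimensions
model = SentenceTransformer("mrhimanshu/bge-base-tm500-matryoshka", truncate_dim=128)
embeddings = model.encode(sentences)  # `sentences` from the example above
print(embeddings.shape)
# [3, 128]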

Evaluation

Metrics

Information Retrieval

| Metric               | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:---------------------|:--------|:--------|:--------|:--------|:-------|
| cosine_accuracy@1    | 0.6359  | 0.6367  | 0.641   | 0.6443  | 0.6386 |
| cosine_accuracy@3    | 0.7213  | 0.7221  | 0.7235  | 0.7261  | 0.7227 |
| cosine_accuracy@5    | 0.7436  | 0.7458  | 0.7458  | 0.7503  | 0.7458 |
| cosine_accuracy@10   | 0.7681  | 0.7709  | 0.7702  | 0.7762  | 0.7694 |
| cosine_precision@1   | 0.6359  | 0.6367  | 0.641   | 0.6443  | 0.6386 |
| cosine_precision@3   | 0.2404  | 0.2407  | 0.2412  | 0.242   | 0.2409 |
| cosine_precision@5   | 0.1487  | 0.1492  | 0.1492  | 0.1501  | 0.1492 |
| cosine_precision@10  | 0.0768  | 0.0771  | 0.077   | 0.0776  | 0.0769 |
| cosine_recall@1      | 0.6359  | 0.6367  | 0.641   | 0.6443  | 0.6386 |
| cosine_recall@3      | 0.7213  | 0.7221  | 0.7235  | 0.7261  | 0.7227 |
| cosine_recall@5      | 0.7436  | 0.7458  | 0.7458  | 0.7503  | 0.7458 |
| cosine_recall@10     | 0.7681  | 0.7709  | 0.7702  | 0.7762  | 0.7694 |
| cosine_ndcg@10       | 0.7043  | 0.7056  | 0.7074  | 0.7115  | 0.7061 |
| cosine_mrr@10        | 0.6837  | 0.6845  | 0.6871  | 0.6906  | 0.6856 |
| cosine_map@100       | 0.6869  | 0.6876  | 0.6901  | 0.6943  | 0.6887 |
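
These columns correspond to information-retrieval evaluations at each Matryoshka dimension. A minimal sketch of how such numbers can be produced with the library's InformationRetrievalEvaluator; the one-query corpus here is illustrative only, not the actual evaluation split:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Toy data for illustration; the real eval split is not published on this card
queries = {"q1": "What is the purpose of the si-RepetitionPattern-r13 field?"}
corpus = {"d1": "This field indicates the starting radio frames within the SI window used for SI message transmission."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    truncate_dim=256,  # evaluate at a truncated Matryoshka dimension
    name="dim_256",
)
results = evaluator(model)  # `model` loaded as in the Usage section
print(results["dim_256_cosine_ndcg@10"])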

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 56,079 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

    |      | anchor       | positive     |
    |:-----|:-------------|:-------------|
    | type | string       | string       |
    | min  | 7 tokens     | 3 tokens     |
    | mean | 18.97 tokens | 36.92 tokens |
    | max  | 80 tokens    | 183 tokens   |

  • Samples:

    | anchor | positive |
    |:---|:---|
    | What is the purpose of the si-RepetitionPattern-r13 field? | This field indicates the starting radio frames within the SI window used for SI message transmission. |
    | How do I deploy the DPDK Ping-Pong app's pod? | You can deploy the DPDK Ping-Pong app's pod using the command 'oc create -f dpdk-ping-pong-app.yaml'. |
    | What are some of the key areas of focus for the TM500 support work in FY22? | Some of the key areas of focus for the TM500 support work in FY22 include fault/error handling improvements, O-RAN FH 7-2 5G NR, and 5G NR R15 March 2021 Specification Analysis for SA/NSA Uplift. |
  • Loss: MatryoshkaLoss (see the construction sketch after this block) with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
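
Putting the dataset and loss together, a minimal construction sketch; the data file name is hypothetical, since the card only records a local JSON dataset with anchor/positive columns:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Hypothetical file name; expects "anchor" and "positive" columns as shown above
train_dataset = load_dataset("json", data_files="train.json", split="train")

inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)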
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 50
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
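
Continuing the dataset/loss sketch above, these non-default values map onto SentenceTransformerTrainingArguments roughly as follows. The output path, the save strategy, and the held-out eval split are assumptions added to make the sketch complete (load_best_model_at_end requires the save and eval strategies to match); they are not listed on this card:

from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-tm500-matryoshka",  # hypothetical output path
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=50,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

# Hypothetical held-out split; the actual eval split is not published
split = train_dataset.train_test_split(test_size=0.05, seed=42)

trainer = SentenceTransformerTrainer(
    model=model,                  # from the loss sketch above
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    loss=loss,
)
trainer.train()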

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 50
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
30.0913 3280 0.3105 - - - - -
30.1825 3290 0.3844 - - - - -
30.2738 3300 0.3692 - - - - -
30.3651 3310 0.4765 - - - - -
30.4564 3320 0.6655 - - - - -
30.5476 3330 0.4266 - - - - -
30.6389 3340 0.4789 - - - - -
30.7302 3350 0.3434 - - - - -
30.8214 3360 0.3293 - - - - -
30.9127 3370 0.3756 - - - - -
30.9949 3379 - 0.7044 0.7056 0.7068 0.7094 0.7051
31.0091 3380 0.365 - - - - -
31.1004 3390 0.2408 - - - - -
31.1917 3400 0.3518 - - - - -
31.2829 3410 0.2206 - - - - -
31.3742 3420 0.4491 - - - - -
31.4655 3430 0.3459 - - - - -
31.5568 3440 0.2167 - - - - -
31.6480 3450 0.3448 - - - - -
31.7393 3460 0.5039 - - - - -
31.8306 3470 0.3631 - - - - -
31.9218 3480 0.391 - - - - -
32.0 3489 - 0.7035 0.7051 0.7063 0.7092 0.7045
32.0091 3490 0.2659 - - - - -
32.1004 3500 0.2745 - - - - -
32.1917 3510 0.2537 - - - - -
32.2829 3520 0.3166 - - - - -
32.3742 3530 0.2939 - - - - -
32.4655 3540 0.2689 - - - - -
32.5568 3550 0.2336 - - - - -
32.6480 3560 0.3077 - - - - -
32.7393 3570 0.1973 - - - - -
32.8306 3580 0.2745 - - - - -
32.9218 3590 0.2869 - - - - -
33.0 3599 - 0.7049 0.7056 0.7065 0.7094 0.7048
33.0091 3600 0.2421 - - - - -
33.1004 3610 0.1584 - - - - -
33.1917 3620 0.2186 - - - - -
33.2829 3630 0.2598 - - - - -
33.3742 3640 0.3069 - - - - -
33.4655 3650 0.2056 - - - - -
33.5568 3660 0.3103 - - - - -
33.6480 3670 0.2616 - - - - -
33.7393 3680 0.2165 - - - - -
33.8306 3690 0.3849 - - - - -
33.9218 3700 0.2509 - - - - -
34.0 3709 - 0.7031 0.7045 0.7085 0.7102 0.7042
34.0091 3710 0.1596 - - - - -
34.1004 3720 0.214 - - - - -
34.1917 3730 0.3127 - - - - -
34.2829 3740 0.251 - - - - -
34.3742 3750 0.1834 - - - - -
34.4655 3760 0.2944 - - - - -
34.5568 3770 0.2227 - - - - -
34.6480 3780 0.17 - - - - -
34.7393 3790 0.2432 - - - - -
34.8306 3800 0.3604 - - - - -
34.9218 3810 0.2174 - - - - -
35.0 3819 - 0.7055 0.7065 0.708 0.7117 0.7062
35.0091 3820 0.1954 - - - - -
35.1004 3830 0.1728 - - - - -
35.1917 3840 0.3067 - - - - -
35.2829 3850 0.193 - - - - -
35.3742 3860 0.3561 - - - - -
35.4655 3870 0.2222 - - - - -
35.5568 3880 0.199 - - - - -
35.6480 3890 0.3151 - - - - -
35.7393 3900 0.1705 - - - - -
35.8306 3910 0.163 - - - - -
35.9218 3920 0.2656 - - - - -
36.0 3929 - 0.7048 0.7061 0.7078 0.7106 0.7056
36.0091 3930 0.2812 - - - - -
36.1004 3940 0.1983 - - - - -
36.1917 3950 0.3152 - - - - -
36.2829 3960 0.3657 - - - - -
36.3742 3970 0.1352 - - - - -
36.4655 3980 0.2214 - - - - -
36.5568 3990 0.1478 - - - - -
36.6480 4000 0.1882 - - - - -
36.7393 4010 0.2573 - - - - -
36.8306 4020 0.1419 - - - - -
36.9218 4030 0.2309 - - - - -
37.0 4039 - 0.7033 0.7046 0.7063 0.7081 0.7054
37.0091 4040 0.1689 - - - - -
37.1004 4050 0.2388 - - - - -
37.1917 4060 0.2038 - - - - -
37.2829 4070 0.1518 - - - - -
37.3742 4080 0.229 - - - - -
37.4655 4090 0.25 - - - - -
37.5568 4100 0.1462 - - - - -
37.6480 4110 0.211 - - - - -
37.7393 4120 0.122 - - - - -
37.8306 4130 0.1597 - - - - -
37.9218 4140 0.1778 - - - - -
38.0 4149 - 0.7031 0.7041 0.7056 0.7100 0.7049
38.0091 4150 0.2633 - - - - -
38.1004 4160 0.1517 - - - - -
38.1917 4170 0.1462 - - - - -
38.2829 4180 0.1307 - - - - -
38.3742 4190 0.1781 - - - - -
38.4655 4200 0.1962 - - - - -
38.5568 4210 0.2277 - - - - -
38.6480 4220 0.1357 - - - - -
38.7393 4230 0.136 - - - - -
38.8306 4240 0.2501 - - - - -
38.9218 4250 0.2639 - - - - -
39.0 4259 - 0.7038 0.7043 0.7064 0.7106 0.7042
39.0091 4260 0.1438 - - - - -
39.1004 4270 0.2083 - - - - -
39.1917 4280 0.2004 - - - - -
39.2829 4290 0.1305 - - - - -
39.3742 4300 0.1537 - - - - -
39.4655 4310 0.2368 - - - - -
39.5568 4320 0.1382 - - - - -
39.6480 4330 0.1886 - - - - -
39.7393 4340 0.2257 - - - - -
39.8306 4350 0.2291 - - - - -
39.9218 4360 0.2334 - - - - -
40.0 4369 - 0.7045 0.7052 0.7068 0.7088 0.7054
40.0091 4370 0.1684 - - - - -
40.1004 4380 0.2032 - - - - -
40.1917 4390 0.2066 - - - - -
40.2829 4400 0.1334 - - - - -
40.3742 4410 0.2159 - - - - -
40.4655 4420 0.3625 - - - - -
40.5568 4430 0.2307 - - - - -
40.6480 4440 0.1256 - - - - -
40.7393 4450 0.1902 - - - - -
40.8306 4460 0.3093 - - - - -
40.9218 4470 0.2015 - - - - -
41.0 4479 - 0.7034 0.7052 0.7056 0.7089 0.7061
41.0091 4480 0.1347 - - - - -
41.1004 4490 0.2251 - - - - -
41.1917 4500 0.1658 - - - - -
41.2829 4510 0.1449 - - - - -
41.3742 4520 0.1744 - - - - -
41.4655 4530 0.3955 - - - - -
41.5568 4540 0.2401 - - - - -
41.6480 4550 0.1298 - - - - -
41.7393 4560 0.1126 - - - - -
41.8306 4570 0.1784 - - - - -
41.9218 4580 0.1656 - - - - -
42.0 4589 - 0.7036 0.7043 0.7069 0.7085 0.7042
42.0091 4590 0.1977 - - - - -
42.1004 4600 0.3292 - - - - -
42.1917 4610 0.2736 - - - - -
42.2829 4620 0.231 - - - - -
42.3742 4630 0.1568 - - - - -
42.4655 4640 0.1616 - - - - -
42.5568 4650 0.1698 - - - - -
42.6480 4660 0.1936 - - - - -
42.7393 4670 0.1758 - - - - -
42.8306 4680 0.2178 - - - - -
42.9218 4690 0.1935 - - - - -
43.0 4699 - 0.7038 0.7040 0.7063 0.7081 0.7037
43.0091 4700 0.1325 - - - - -
43.1004 4710 0.1463 - - - - -
43.1917 4720 0.1971 - - - - -
43.2829 4730 0.242 - - - - -
43.3742 4740 0.1195 - - - - -
43.4655 4750 0.1844 - - - - -
43.5568 4760 0.2116 - - - - -
43.6480 4770 0.2107 - - - - -
43.7393 4780 0.1371 - - - - -
43.8306 4790 0.1556 - - - - -
43.9218 4800 0.1953 - - - - -
44.0 4809 - 0.7045 0.7042 0.7060 0.7078 0.7052
44.0091 4810 0.2728 - - - - -
44.1004 4820 0.1924 - - - - -
44.1917 4830 0.2475 - - - - -
44.2829 4840 0.2132 - - - - -
44.3742 4850 0.1206 - - - - -
44.4655 4860 0.1451 - - - - -
44.5568 4870 0.1233 - - - - -
44.6480 4880 0.2009 - - - - -
44.7393 4890 0.2006 - - - - -
44.8306 4900 0.1673 - - - - -
44.9218 4910 0.1228 - - - - -
45.0 4919 - 0.7044 0.7051 0.7069 0.7085 0.7051
45.0091 4920 0.276 - - - - -
45.1004 4930 0.1732 - - - - -
45.1917 4940 0.2183 - - - - -
45.2829 4950 0.1727 - - - - -
45.3742 4960 0.1709 - - - - -
45.4655 4970 0.1594 - - - - -
45.5568 4980 0.2395 - - - - -
45.6480 4990 0.1834 - - - - -
45.7393 5000 0.1934 - - - - -
45.8306 5010 0.165 - - - - -
45.9218 5020 0.1232 - - - - -
46.0 5029 - 0.7051 0.7061 0.7074 0.7096 0.7055
46.0091 5030 0.1327 - - - - -
46.1004 5040 0.1241 - - - - -
46.1917 5050 0.2127 - - - - -
46.2829 5060 0.2242 - - - - -
46.3742 5070 0.101 - - - - -
46.4655 5080 0.1632 - - - - -
46.5568 5090 0.1537 - - - - -
46.6480 5100 0.1272 - - - - -
46.7393 5110 0.2669 - - - - -
46.8306 5120 0.2315 - - - - -
46.9218 5130 0.1955 - - - - -
47.0 5139 - 0.7047 0.7050 0.7073 0.7067 0.7055
47.0091 5140 0.1464 - - - - -
47.1004 5150 0.2769 - - - - -
47.1917 5160 0.191 - - - - -
47.2829 5170 0.2333 - - - - -
47.3742 5180 0.1276 - - - - -
47.4655 5190 0.2427 - - - - -
47.5568 5200 0.2157 - - - - -
47.6480 5210 0.1175 - - - - -
47.7393 5220 0.1148 - - - - -
47.8306 5230 0.1821 - - - - -
47.9218 5240 0.1594 - - - - -
48.0 5249 - 0.7039 0.7048 0.7079 0.7093 0.7055
48.0091 5250 0.1146 - - - - -
48.1004 5260 0.1747 - - - - -
48.1917 5270 0.1448 - - - - -
48.2829 5280 0.3647 - - - - -
48.3742 5290 0.0926 - - - - -
48.4655 5300 0.163 - - - - -
48.5568 5310 0.2323 - - - - -
48.6480 5320 0.1281 - - - - -
48.7393 5330 0.2165 - - - - -
48.8306 5340 0.2673 - - - - -
48.9218 5350 0.1951 - - - - -
49.0 5359 - 0.7045 0.7060 0.7080 0.7100 0.7058
49.0091 5360 0.1871 - - - - -
49.1004 5370 0.2343 - - - - -
49.1917 5380 0.107 - - - - -
49.2829 5390 0.182 - - - - -
49.3742 5400 0.1722 - - - - -
49.4655 5410 0.1704 - - - - -
49.5568 5420 0.2386 - - - - -
49.6480 5430 0.1723 - - - - -
49.7393 5440 0.1875 - - - - -
49.8306 5450 0.1851 0.7043 0.7056 0.7074 0.7115 0.7061
  • The final row (epoch 49.8306, step 5450) corresponds to the saved checkpoint; its ndcg@10 values match the Information Retrieval metrics table above.

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.2.2+cu121
  • Accelerate: 1.3.0
  • Datasets: 2.19.1
  • Tokenizers: 0.21.0
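
To approximate this environment, the versions above can be pinned; the CUDA 12.1 wheel index for PyTorch is an assumption about the original setup:

pip install "sentence-transformers==3.4.1" "transformers==4.48.3" "accelerate==1.3.0" "datasets==2.19.1" "tokenizers==0.21.0"
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121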

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}