---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6552
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-small-en-v1.5
widget:
- source_sentence: >-
    What problem can reconfigurable intelligent surfaces mitigate in light
    fidelity systems?
  sentences:
  - >-
    The document mentions that blind channel estimation requires a large
    number of data symbols to improve accuracy, which may not be feasible in
    practice.
  - >-
    Empirical evidence suggests that the power decay can even be exponential
    with distance.
  - >-
    Reconfigurable intelligent surface-enabled environments can enhance
    light fidelity coverage by mitigating the dead-zone problem for users at
    the edge of the cell, improving link quality.
- source_sentence: >-
    What is the advantage of conformal arrays in UAV (Unmanned Aerial Vehicle)
    communication systems?
  sentences:
  - >-
    Overfitting occurs when a model fits the training data too well and
    fails to generalize to unseen data, while underfitting occurs when a
    model does not fit the training data well enough to capture the
    underlying patterns.
  - >-
    A point-to-multipoint service is a service type in which data is sent to
    all service subscribers or a pre-defined subset of all subscribers
    within an area defined by the Service Requester.
  - >-
    Conformal arrays offer good aerodynamic performance, enable full-space
    beam scanning, and provide more DoFs for geometry design.
- source_sentence: What is a Virtual Home Environment?
  sentences:
  - >-
    Compressive spectrum sensing utilizes the sparsity property of signals
    to enable sub-Nyquist sampling.
  - >-
    A Virtual Home Environment is a concept that allows for the portability
    of personal service environments across network boundaries and between
    terminals.
  - >-
    In the Client Server model, a Client application waits passively on
    contact while a Server starts the communication actively.
- source_sentence: What is multi-agent RL (Reinforcement learning) concerned with?
  sentences:
  - >-
    Data centers account for about 1% of global electricity demand, as
    stated in the document.
  - >-
    Fog Computing and Communication in the Frugal 5G network architecture
    brings intelligence to the edge and enables more efficient communication
    with reduced resource usage.
  - >-
    Multi-agent RL is concerned with learning in presence of multiple agents
    and encompasses unique problem formulation that draws from game
    theoretical concepts.
- source_sentence: >-
    What is the trade-off between privacy and convergence performance when
    using artificial noise obscuring in federated learning?
  sentences:
  - >-
    The 'decrypt_error' alert indicates a handshake cryptographic operation
    failed, including being unable to verify a signature, decrypt a key
    exchange, or validate a finished message.
  - >-
    The trade-off between privacy and convergence performance when using
    artificial noise obscuring in federated learning is that increasing the
    noise variance improves privacy but degrades convergence.
  - >-
    The design rules for sub-carrier allocations to users in cellular
    systems are to allocate the sub-carriers as spread out as possible and
    hop the sub-carriers every OFDM symbol time.
datasets:
- dinho1597/Telecom-QA-MultipleChoice
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_recall@1
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en-v1.5
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: telecom ir eval
      type: telecom-ir-eval
    metrics:
    - type: cosine_accuracy@1
      value: 0.9679633867276888
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9916094584286804
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9916094584286804
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.992372234935164
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9679633867276888
      name: Cosine Precision@1
    - type: cosine_recall@1
      value: 0.9679633867276888
      name: Cosine Recall@1
    - type: cosine_ndcg@10
      value: 0.9823240649953693
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9788647342995168
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9791402442094453
      name: Cosine Map@100
---
# SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5 on the telecom-qa-multiple_choice dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: telecom-qa-multiple_choice
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
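The three modules above amount to lowercased BERT encoding, CLS-token pooling, and L2 normalization. As a rough illustration only (not the recommended loading path), the following sketch reproduces that pipeline with plain `transformers`; it uses the base checkpoint, since the fine-tuned repo id shown later is a placeholder:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Base checkpoint used here purely for illustration; swap in the
# fine-tuned model id once it is published on the Hub.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
encoder = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")

batch = tokenizer(
    ["What is a Virtual Home Environment?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    out = encoder(**batch)

cls = out.last_hidden_state[:, 0]   # (1) Pooling: take the CLS token
emb = F.normalize(cls, p=2, dim=1)  # (2) Normalize: unit-length vectors
print(emb.shape)                    # torch.Size([1, 384])
```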
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference
sentences = [
    'What is the trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning?',
    'The trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning is that increasing the noise variance improves privacy but degrades convergence.',
    "The 'decrypt_error' alert indicates a handshake cryptographic operation failed, including being unable to verify a signature, decrypt a key exchange, or validate a finished message.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
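Because the embeddings are unit-normalized, the model drops straight into retrieval. A minimal semantic-search sketch using the library's `util.semantic_search` helper (the three-sentence corpus is illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

corpus = [
    "Factory automation applications require an E2E latency of 0.25-10 ms.",
    "RSA is commonly used for digital signatures in S/MIME.",
    "Compressive spectrum sensing utilizes the sparsity property of signals.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode(
    "What latency do factory automation applications need?",
    convert_to_tensor=True,
)

# Rank the corpus by cosine similarity and keep the top-2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```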
## Evaluation

### Metrics

#### Information Retrieval

- Dataset: `telecom-ir-eval`
- Evaluated with `InformationRetrievalEvaluator`
| Metric             | Value  |
|:-------------------|:-------|
| cosine_accuracy@1  | 0.968  |
| cosine_accuracy@3  | 0.9916 |
| cosine_accuracy@5  | 0.9916 |
| cosine_accuracy@10 | 0.9924 |
| cosine_precision@1 | 0.968  |
| cosine_recall@1    | 0.968  |
| cosine_ndcg@10     | 0.9823 |
| cosine_mrr@10      | 0.9789 |
| cosine_map@100     | 0.9791 |
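These numbers come from `InformationRetrievalEvaluator`, which embeds a query set and a corpus and scores the ranked retrieval. A toy sketch of the same setup (two documents stand in for the real telecom corpus; the metric key format matches the training-log column below):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# Toy inputs: each query id maps to the set of relevant corpus ids.
queries = {"q1": "What is a Virtual Home Environment?"}
corpus = {
    "d1": "A Virtual Home Environment is a concept that allows for the "
          "portability of personal service environments across network "
          "boundaries and between terminals.",
    "d2": "RSA is commonly used for digital signatures in S/MIME.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs, name="telecom-ir-eval"
)
results = evaluator(model)
print(results["telecom-ir-eval_cosine_ndcg@10"])
```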
## Training Details

### Training Dataset

#### telecom-qa-multiple_choice

- Dataset: telecom-qa-multiple_choice at 73aebbb
- Size: 6,552 training samples
- Columns: `anchor` and `positive`
- Approximate statistics based on the first 1000 samples:

  |         | anchor                                           | positive                                          |
  |:--------|:-------------------------------------------------|:--------------------------------------------------|
  | type    | string                                           | string                                            |
  | details | min: 4 tokens, mean: 18.8 tokens, max: 48 tokens | min: 8 tokens, mean: 29.27 tokens, max: 92 tokens |
- Samples:

  | anchor | positive |
  |:-------|:---------|
  | What is multi-user multiple input, multiple output (MU-MIMO) in IEEE 802.11-2020? | MU-MIMO is a technique by which multiple stations (STAs) either simultaneously transmit to a single STA or simultaneously receive from a single STA independent data streams over the same radio frequencies. |
  | What is the purpose of wireless network virtualization? | The purpose of wireless network virtualization is to improve resource utilization, support diverse services/use cases, and be cost-effective and flexible for new services. |
  | What is the E2E (end-to-end) latency requirement for factory automation applications? | Factory automation applications require an E2E latency of 0.25-10 ms. |
- Loss: `MultipleNegativesRankingLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
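MultipleNegativesRankingLoss treats, for each anchor, every other positive in the batch as a negative, which is why the large batch size and the `no_duplicates` sampler listed below matter. A minimal construction sketch with the parameters above (which are also the library defaults):

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# scale=20.0 multiplies the cosine similarities before the softmax
# cross-entropy over in-batch negatives.
loss = losses.MultipleNegativesRankingLoss(
    model, scale=20.0, similarity_fct=util.cos_sim
)
```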
### Evaluation Dataset

#### telecom-qa-multiple_choice

- Dataset: telecom-qa-multiple_choice at 73aebbb
- Size: 6,552 evaluation samples
- Columns: `anchor` and `positive`
- Approximate statistics based on the first 1000 samples:

  |         | anchor                                           | positive                                          |
  |:--------|:-------------------------------------------------|:--------------------------------------------------|
  | type    | string                                           | string                                            |
  | details | min: 4 tokens, mean: 18.5 tokens, max: 52 tokens | min: 9 tokens, mean: 28.83 tokens, max: 85 tokens |
- Samples:

  | anchor | positive |
  |:-------|:---------|
  | Which standard enables building Digital Twins of different Physical Twins using combinations of XML (eXtensible Markup Language) and C codes? | The functional mockup interface (FMI) is a standard that enables building Digital Twins of different Physical Twins using combinations of XML and C codes. |
  | What algorithm is commonly used for digital signatures in S/MIME? | RSA is commonly used for digital signatures in S/MIME. |
  | What are the three modes of operation based on the communication range and the SA (subarray) separation? | The three modes of operation based on the communication range and the SA separation are: (1) a mode where the channel paths are independent and the channel is always well-conditioned, (2) a mode where the channel is ill-conditioned, and (3) a mode where the channel is highly correlated. |
- Loss: `MultipleNegativesRankingLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
### Training Hyperparameters

#### Non-Default Hyperparameters

- eval_strategy: steps
- per_device_train_batch_size: 256
- per_device_eval_batch_size: 256
- weight_decay: 0.01
- num_train_epochs: 10
- lr_scheduler_type: cosine_with_restarts
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates
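For reproduction, these map onto the Sentence Transformers 3.x trainer API roughly as in the sketch below. The output directory, the train/eval split, and the assumption that the Hub dataset already exposes `(anchor, positive)` columns are all illustrative, not taken from the original run:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Assumes the dataset yields (anchor, positive) pairs; the split here is
# illustrative rather than the one used for the reported run.
dataset = load_dataset("dinho1597/Telecom-QA-MultipleChoice", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-small-telecom",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    weight_decay=0.01,
    num_train_epochs=10,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```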
#### All Hyperparameters

<details><summary>Click to expand</summary>

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 256
- per_device_eval_batch_size: 256
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

</details>
### Training Logs

| Epoch  | Step | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
|:------:|:----:|:-------------:|:---------------:|:------------------------------:|
| 0.7143 | 15   | 0.824         | 0.1333          | 0.9701                         |
| 1.3810 | 30   | 0.1731        | 0.0759          | 0.9776                         |
| 2.0476 | 45   | 0.0917        | 0.0657          | 0.9807                         |
| 2.7619 | 60   | 0.0676        | 0.0609          | 0.9813                         |
| 3.4286 | 75   | 0.0435        | 0.0596          | 0.9818                         |
| 4.0952 | 90   | 0.038         | 0.0606          | 0.9814                         |
| 4.8095 | 105  | 0.0332        | 0.0594          | 0.9820                         |
| 5.4762 | 120  | 0.0269        | 0.0607          | 0.9817                         |
| 6.1429 | 135  | 0.0219        | 0.0600          | 0.9819                         |
| 6.8571 | 150  | 0.0244        | 0.0599          | 0.9823                         |
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```