dinho1597 committed
Commit dd05793 · verified · 1 Parent(s): f6eea6d

Uploading initial model

Files changed (7)
  1. README.md +102 -105
  2. model.safetensors +1 -1
  3. optimizer.pt +3 -0
  4. rng_state.pth +3 -0
  5. scheduler.pt +3 -0
  6. trainer_state.json +273 -0
  7. training_args.bin +3 -0
README.md CHANGED
@@ -8,53 +8,55 @@ tags:
  - loss:MultipleNegativesRankingLoss
  base_model: BAAI/bge-small-en-v1.5
  widget:
- - source_sentence: What does the RAT (Radio Access Technology) Agnostic Control Functions
-     (RACFs) interface with?
  sentences:
- - TLS/SSL is an encryption protocol that creates a secure data channel on an insecure
-     network to make client/server applications secure.
- - TLS/SSL ensures that data transferred between client and server applications cannot
-     be misused or read, and detects if data have been amended during transmission.
- - The RAT Agnostic Control Functions (RACFs) interface with the Flow Controller
-     and RAT Specific Control Functions (RSCFs).
- - source_sentence: What is an ultra-dense network (UDN)?
  sentences:
- - The Markov decision process approach models the delay-aware cross-layer optimization
-     problem as an infinite horizon average cost MDP.
- - DNS (Domain Name System) is a system that translates human-friendly domain names
-     into IP addresses used by computers to locate servers on the internet.
- - The document defines an ultra-dense network (UDN) as a network with the spatial
-     density of cells much larger than that of active users.
- - source_sentence: What is the main benefit of storage coding for video files?
  sentences:
- - In the context of deep learning, GPU stands for Graphics Processing Unit, which
-     enables parallel computing for faster inference.
- - Storage coding allows the recovery of original video files even in the case of
-     multiple failures.
- - For DMG STAs, the TXOP holder may transmit a frame using a modulation class other
-     than the DMG Control modulation class at the start of the TXOP if the time elapsed
-     since the last frame received from the TXOP responder is shorter than the Heartbeat
-     Elapsed Time value computed using the Heartbeat Elapsed Indication field within
-     the TXOP responder's DMG Capabilities element.
- - source_sentence: What is the primary parameter used to measure the severity of narrowband
-     fading in AG (air to ground) propagation channels?
  sentences:
- - The main difference between the Wyner's wiretap coding scheme and the coding scheme
-     used in FB-CSs is that Wyner's scheme uses codeword rate and rate redundancy,
-     while the FB-CSs scheme uses coding rate directly.
- - The RA field is the individual address of the STA that is the immediate intended
-     receiver of the frame.
- - The severity of narrowband fading in AG propagation channels is measured using
-     the Ricean K-factor, which is the ratio of dominant channel component power to
-     the power in the sum of all other received components.
- - source_sentence: What is one way to enhance the robustness of terahertz in real-time
-     communication?
  sentences:
- - Handover refers to the process of changing the serving cell of a UE in RRC_CONNECTED.
- - Enhancing beam tracking, resource allocation, and user association can improve
-     the robustness of terahertz in real-time communication.
- - NG-PON has a one-way latency of 2.5 μs, which is the lowest among all the fronthaul
-     technologies.
  datasets:
  - dinho1597/Telecom-QA-MultipleChoice
  pipeline_tag: sentence-similarity
@@ -80,31 +82,31 @@ model-index:
   type: telecom-ir-eval
   metrics:
   - type: cosine_accuracy@1
-   value: 0.9482044198895028
    name: Cosine Accuracy@1
   - type: cosine_accuracy@3
-   value: 0.9910220994475138
    name: Cosine Accuracy@3
   - type: cosine_accuracy@5
-   value: 0.9924033149171271
    name: Cosine Accuracy@5
   - type: cosine_accuracy@10
-   value: 0.9951657458563536
    name: Cosine Accuracy@10
   - type: cosine_precision@1
-   value: 0.9482044198895028
    name: Cosine Precision@1
   - type: cosine_recall@1
-   value: 0.9482044198895028
    name: Cosine Recall@1
   - type: cosine_ndcg@10
-   value: 0.9753237736211358
    name: Cosine Ndcg@10
   - type: cosine_mrr@10
-   value: 0.9685674822415152
    name: Cosine Mrr@10
   - type: cosine_map@100
-   value: 0.9688408433724855
    name: Cosine Map@100
  ---
 
@@ -159,9 +161,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
-     'What is one way to enhance the robustness of terahertz in real-time communication?',
-     'Enhancing beam tracking, resource allocation, and user association can improve the robustness of terahertz in real-time communication.',
-     'Handover refers to the process of changing the serving cell of a UE in RRC_CONNECTED.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -208,15 +210,15 @@ You can finetune this model on your own dataset.

  | Metric | Value |
  |:-------------------|:-----------|
- | cosine_accuracy@1 | 0.9482 |
- | cosine_accuracy@3 | 0.991 |
- | cosine_accuracy@5 | 0.9924 |
- | cosine_accuracy@10 | 0.9952 |
- | cosine_precision@1 | 0.9482 |
- | cosine_recall@1 | 0.9482 |
- | **cosine_ndcg@10** | **0.9753** |
- | cosine_mrr@10 | 0.9686 |
- | cosine_map@100 | 0.9688 |

  <!--
  ## Bias, Risks and Limitations
@@ -240,16 +242,16 @@ You can finetune this model on your own dataset.
  * Size: 6,552 training samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
- |         | anchor | positive |
- |:--------|:-------|:---------|
- | type    | string | string |
- | details | <ul><li>min: 4 tokens</li><li>mean: 18.54 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 29.33 tokens</li><li>max: 100 tokens</li></ul> |
  * Samples:
- | anchor | positive |
- |:-------|:---------|
- | <code>What are the two mechanisms in a CDMA (Code Division Multiple Access) system that reduce the chances of significant interference?</code> | <code>In a CDMA system, power control and soft handoff mechanisms are used to reduce the chances of significant interference. Power control ensures that there is no significant intra-cell interference, and soft handoff selects the base station with the best reception to control the user's power, reducing the chance of out-of-cell interference.</code> |
- | <code>What type of traffic is LPWAN (Low-Power Wide Area Network) suitable for?</code> | <code>LPWANs are suitable for sporadic and intermittent transmissions of very small packets.</code> |
- | <code>What is the definition of an Authenticator in IEEE Std 802.11-2020?</code> | <code>An Authenticator is an entity at one end of a point-to-point LAN segment that facilitates authentication of the entity attached to the other end of that link.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
@@ -266,16 +268,16 @@ You can finetune this model on your own dataset.
  * Size: 6,552 evaluation samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
- |         | anchor | positive |
- |:--------|:-------|:---------|
- | type    | string | string |
- | details | <ul><li>min: 4 tokens</li><li>mean: 18.83 tokens</li><li>max: 45 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 28.7 tokens</li><li>max: 99 tokens</li></ul> |
  * Samples:
- | anchor | positive |
- |:-------|:---------|
- | <code>How is the continuous-time impulse response of the end-to-end system computed?</code> | <code>The continuous-time impulse response of the end-to-end system is computed as the convolution of the impulse responses of the RIS paths and the propagation channels.</code> |
- | <code>What is lattice staggering in a multicarrier scheme?</code> | <code>Lattice staggering is a methodology that generates inherent orthogonality between the points in the lattice for the real domain by using different prototype filters for the real and imaginary parts of the scheme.</code> |
- | <code>What are the benefits of using Mobile Edge Computing (MEC) for computation-intensive applications?</code> | <code>MEC enables applications to be split into small tasks with some of the tasks performed at the local or regional clouds, reducing delay and backhaul usage.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
@@ -288,10 +290,10 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
- - `per_device_train_batch_size`: 128
- - `per_device_eval_batch_size`: 128
  - `weight_decay`: 0.01
- - `num_train_epochs`: 5
  - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
@@ -305,8 +307,8 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 128
- - `per_device_eval_batch_size`: 128
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
@@ -318,7 +320,7 @@ You can finetune this model on your own dataset.
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
- - `num_train_epochs`: 5
  - `max_steps`: -1
  - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
@@ -420,24 +422,19 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
- | Epoch | Step | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
- |:----------:|:------:|:-------------:|:---------------:|:------------------------------:|
- | 0.3659 | 15 | 0.5651 | 0.1124 | 0.9593 |
- | 0.7317 | 30 | 0.144 | 0.0674 | 0.9713 |
- | 1.0976 | 45 | 0.0783 | 0.0605 | 0.9730 |
- | **1.4634** | **60** | **0.0633** | **0.0607** | **0.9753** |
- | 1.8293 | 75 | 0.0469 | 0.0570 | 0.9749 |
- | 2.1951 | 90 | 0.0297 | 0.0568 | 0.9755 |
- | 2.5610 | 105 | 0.0287 | 0.0583 | 0.9749 |
- | 2.9268 | 120 | 0.0266 | 0.0594 | 0.9747 |
- | 3.2927 | 135 | 0.0196 | 0.0604 | 0.9738 |
- | 3.6585 | 150 | 0.0191 | 0.0609 | 0.9742 |
- | 4.0244 | 165 | 0.0167 | 0.0608 | 0.9749 |
- | 4.3902 | 180 | 0.0165 | 0.0611 | 0.9746 |
- | 4.7561 | 195 | 0.016 | 0.0611 | 0.9746 |
- | 5.0 | 205 | - | - | 0.9753 |
-
- * The bold row denotes the saved checkpoint.

  ### Framework Versions
  - Python: 3.10.12
 
  - loss:MultipleNegativesRankingLoss
  base_model: BAAI/bge-small-en-v1.5
  widget:
+ - source_sentence: What problem can reconfigurable intelligent surfaces mitigate in
+     light fidelity systems?
  sentences:
+ - The document mentions that blind channel estimation requires a large number of
+     data symbols to improve accuracy, which may not be feasible in practice.
+ - Empirical evidence suggests that the power decay can even be exponential with
+     distance.
+ - Reconfigurable intelligent surface-enabled environments can enhance light fidelity
+     coverage by mitigating the dead-zone problem for users at the edge of the cell,
+     improving link quality.
+ - source_sentence: What is the advantage of conformal arrays in UAV (Unmanned Aerial
+     Vehicle) communication systems?
  sentences:
+ - Overfitting occurs when a model fits the training data too well and fails to generalize
+     to unseen data, while underfitting occurs when a model does not fit the training
+     data well enough to capture the underlying patterns.
+ - A point-to-multipoint service is a service type in which data is sent to all service
+     subscribers or a pre-defined subset of all subscribers within an area defined
+     by the Service Requester.
+ - Conformal arrays offer good aerodynamic performance, enable full-space beam scanning,
+     and provide more DoFs for geometry design.
+ - source_sentence: What is a Virtual Home Environment?
  sentences:
+ - Compressive spectrum sensing utilizes the sparsity property of signals to enable
+     sub-Nyquist sampling.
+ - A Virtual Home Environment is a concept that allows for the portability of personal
+     service environments across network boundaries and between terminals.
+ - In the Client Server model, a Client application waits passively on contact while
+     a Server starts the communication actively.
+ - source_sentence: What is multi-agent RL (Reinforcement learning) concerned with?
  sentences:
+ - Data centers account for about 1% of global electricity demand, as stated in the
+     document.
+ - Fog Computing and Communication in the Frugal 5G network architecture brings intelligence
+     to the edge and enables more efficient communication with reduced resource usage.
+ - Multi-agent RL is concerned with learning in presence of multiple agents and encompasses
+     unique problem formulation that draws from game theoretical concepts.
+ - source_sentence: What is the trade-off between privacy and convergence performance
+     when using artificial noise obscuring in federated learning?
  sentences:
+ - The 'decrypt_error' alert indicates a handshake cryptographic operation failed,
+     including being unable to verify a signature, decrypt a key exchange, or validate
+     a finished message.
+ - The trade-off between privacy and convergence performance when using artificial
+     noise obscuring in federated learning is that increasing the noise variance improves
+     privacy but degrades convergence.
+ - The design rules for sub-carrier allocations to users in cellular systems are
+     to allocate the sub-carriers as spread out as possible and hop the sub-carriers
+     every OFDM symbol time.
  datasets:
  - dinho1597/Telecom-QA-MultipleChoice
  pipeline_tag: sentence-similarity
 
   type: telecom-ir-eval
   metrics:
   - type: cosine_accuracy@1
+   value: 0.9679633867276888
    name: Cosine Accuracy@1
   - type: cosine_accuracy@3
+   value: 0.9916094584286804
    name: Cosine Accuracy@3
   - type: cosine_accuracy@5
+   value: 0.9916094584286804
    name: Cosine Accuracy@5
   - type: cosine_accuracy@10
+   value: 0.992372234935164
    name: Cosine Accuracy@10
   - type: cosine_precision@1
+   value: 0.9679633867276888
    name: Cosine Precision@1
   - type: cosine_recall@1
+   value: 0.9679633867276888
    name: Cosine Recall@1
   - type: cosine_ndcg@10
+   value: 0.9823240649953693
    name: Cosine Ndcg@10
   - type: cosine_mrr@10
+   value: 0.9788647342995168
    name: Cosine Mrr@10
   - type: cosine_map@100
+   value: 0.9791402442094453
    name: Cosine Map@100
  ---
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
+     'What is the trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning?',
+     'The trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning is that increasing the noise variance improves privacy but degrades convergence.',
+     "The 'decrypt_error' alert indicates a handshake cryptographic operation failed, including being unable to verify a signature, decrypt a key exchange, or validate a finished message.",
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
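The snippet above stops at printing the embedding shape; in practice you rank the candidate sentences against the query by cosine similarity, which is the score underlying every `cosine_*` metric on this card. A minimal pure-Python sketch of that comparison (the vectors below are toy stand-ins for `model.encode(...)` output; the real model emits 384-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy stand-ins for embeddings; real ones come from model.encode(sentences).
query = [0.9, 0.1, 0.2]
candidates = [
    [0.8, 0.2, 0.1],  # close in direction to the query
    [0.1, 0.9, 0.3],  # pointing elsewhere
]
scores = [cosine_similarity(query, c) for c in candidates]
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 0 — index of the best-matching candidate
```

With real embeddings the same ranking step is what the retrieval evaluator performs for each query.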
 
  | Metric | Value |
  |:-------------------|:-----------|
+ | cosine_accuracy@1 | 0.968 |
+ | cosine_accuracy@3 | 0.9916 |
+ | cosine_accuracy@5 | 0.9916 |
+ | cosine_accuracy@10 | 0.9924 |
+ | cosine_precision@1 | 0.968 |
+ | cosine_recall@1 | 0.968 |
+ | **cosine_ndcg@10** | **0.9823** |
+ | cosine_mrr@10 | 0.9789 |
+ | cosine_map@100 | 0.9791 |
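These are standard retrieval measures: accuracy@k checks whether the gold passage lands in the top k, MRR@10 averages reciprocal ranks, and nDCG@10 discounts hits logarithmically by rank. A sketch of how they are computed, assuming (as in this anchor/positive setup) exactly one relevant passage per query; the ranks below are made up for illustration:

```python
import math

# 1-based rank of the single gold passage for each toy query.
gold_ranks = [1, 1, 2, 1, 4]

def accuracy_at_k(ranks, k):
    """Fraction of queries whose gold passage appears within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k=10):
    """With one relevant item the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)

print(accuracy_at_k(gold_ranks, 1))  # 0.6
print(mrr_at_k(gold_ranks))          # 0.75
print(round(ndcg_at_k(gold_ranks), 4))
```

Since each anchor has a single positive here, precision@1 and recall@1 coincide with accuracy@1, which is why those three rows share a value in the table.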

  <!--
  ## Bias, Risks and Limitations
 
  * Size: 6,552 training samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
+ |         | anchor | positive |
+ |:--------|:-------|:---------|
+ | type    | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 18.8 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 29.27 tokens</li><li>max: 92 tokens</li></ul> |
  * Samples:
+ | anchor | positive |
+ |:-------|:---------|
+ | <code>What is multi-user multiple input, multiple output (MU-MIMO) in IEEE 802.11-2020?</code> | <code>MU-MIMO is a technique by which multiple stations (STAs) either simultaneously transmit to a single STA or simultaneously receive from a single STA independent data streams over the same radio frequencies.</code> |
+ | <code>What is the purpose of wireless network virtualization?</code> | <code>The purpose of wireless network virtualization is to improve resource utilization, support diverse services/use cases, and be cost-effective and flexible for new services.</code> |
+ | <code>What is the E2E (end-to-end) latency requirement for factory automation applications?</code> | <code>Factory automation applications require an E2E latency of 0.25-10 ms.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
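MultipleNegativesRankingLoss treats each anchor's paired positive as the target and every other positive in the batch as an in-batch negative, then applies cross-entropy over the scaled similarity matrix. A pure-Python sketch of that objective (toy 2-D embeddings; the scale of 20 mirrors the library's usual default and is used here only for illustration):

```python
import math

def mnrl(anchors, positives, scale=20.0):
    """Cross-entropy over scaled anchor-positive cosine similarities.
    Row i's target is positive i; the other positives act as in-batch negatives."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos(a, p) for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]  # negative log-softmax of the true pair
    return total / len(anchors)

# Well-aligned toy pairs give a near-zero loss; mismatched pairs give a large one.
anchors   = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnrl(anchors, positives))
```

This is why larger batches (here 256) tend to help with this loss: every extra row in the batch contributes one more negative to each anchor's softmax.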
 
  * Size: 6,552 evaluation samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
+ |         | anchor | positive |
+ |:--------|:-------|:---------|
+ | type    | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 18.5 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 28.83 tokens</li><li>max: 85 tokens</li></ul> |
  * Samples:
+ | anchor | positive |
+ |:-------|:---------|
+ | <code>Which standard enables building Digital Twins of different Physical Twins using combinations of XML (eXtensible Markup Language) and C codes?</code> | <code>The functional mockup interface (FMI) is a standard that enables building Digital Twins of different Physical Twins using combinations of XML and C codes.</code> |
+ | <code>What algorithm is commonly used for digital signatures in S/MIME?</code> | <code>RSA is commonly used for digital signatures in S/MIME.</code> |
+ | <code>What are the three modes of operation based on the communication range and the SA (subarray) separation?</code> | <code>The three modes of operation based on the communication range and the SA separation are: (1) a mode where the channel paths are independent and the channel is always well-conditioned, (2) a mode where the channel is ill-conditioned, and (3) a mode where the channel is highly correlated.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
 
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
  - `weight_decay`: 0.01
+ - `num_train_epochs`: 10
  - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
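The `cosine_with_restarts` scheduler combined with `warmup_ratio: 0.1` means the learning rate climbs linearly over the first 10% of steps and then follows a cosine decay that can restart one or more times. A simplified sketch of the multiplier's shape (step counts and cycle count below are illustrative, not taken from this run):

```python
import math

def lr_lambda(step, total_steps, warmup_steps, num_cycles=1):
    """Multiplier on the base LR: linear warmup, then hard-restart cosine decay."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Hard restarts: the cosine runs num_cycles times, jumping back to 1.0 each time.
    return 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))

total, warmup = 200, 20  # warmup = 0.1 * total, mirroring warmup_ratio: 0.1
schedule = [lr_lambda(s, total, warmup) for s in range(total)]
print(schedule[0], schedule[20], round(schedule[110], 3))  # 0.0 at start, 1.0 after warmup, 0.5 mid-decay
```

With `num_cycles=1` this reduces to plain warmup-plus-cosine; larger cycle counts add the restarts the scheduler is named for.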
 
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
 
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 10
  - `max_steps`: -1
  - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
 
  </details>

  ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
+ |:------:|:----:|:-------------:|:---------------:|:------------------------------:|
+ | 0.7143 | 15 | 0.824 | 0.1333 | 0.9701 |
+ | 1.3810 | 30 | 0.1731 | 0.0759 | 0.9776 |
+ | 2.0476 | 45 | 0.0917 | 0.0657 | 0.9807 |
+ | 2.7619 | 60 | 0.0676 | 0.0609 | 0.9813 |
+ | 3.4286 | 75 | 0.0435 | 0.0596 | 0.9818 |
+ | 4.0952 | 90 | 0.038 | 0.0606 | 0.9814 |
+ | 4.8095 | 105 | 0.0332 | 0.0594 | 0.9820 |
+ | 5.4762 | 120 | 0.0269 | 0.0607 | 0.9817 |
+ | 6.1429 | 135 | 0.0219 | 0.0600 | 0.9819 |
+ | 6.8571 | 150 | 0.0244 | 0.0599 | 0.9823 |

  ### Framework Versions
  - Python: 3.10.12
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3a89325a91a1814f74f9f16c67b9e60d4e246ff9577a9858772f23410241b1d
  size 133462128

  version https://git-lfs.github.com/spec/v1
+ oid sha256:2788d601e71fb4f15d15d5780dce8b56db76d4301e25196c35a72a70c1a3e625
  size 133462128
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f28e9dfbc6751727e9f8f3d4493a6b20d0699b33e1f5c8e9917aa1e00b4ca69
+ size 265862074
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:679261b13fadc34524ddf3ac6ba660d0c6599219939ca3ff724f158ac156674d
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e77392b3a8e4ea80ac4740e1e6f52f30fdf708598646017d54d25d49fa8a1ca
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,273 @@
+ {
+   "best_metric": 0.9679633867276888,
+   "best_model_checkpoint": "/content/drive/MyDrive/Papers/RAG_3GPP/models/checkpoints/embedding/bge-small-telecom_10e_256bs/checkpoint-150",
+   "epoch": 6.857142857142857,
+   "eval_steps": 15,
+   "global_step": 150,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.7142857142857143,
+       "grad_norm": 1.681250810623169,
+       "learning_rate": 3.571428571428572e-05,
+       "loss": 0.824,
+       "step": 15
+     },
+     {
+       "epoch": 0.7142857142857143,
+       "eval_loss": 0.13330750167369843,
+       "eval_runtime": 3.6814,
+       "eval_samples_per_second": 356.115,
+       "eval_steps_per_second": 1.63,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9397406559877955,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9839816933638444,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9893211289092296,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9625163452108533,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9623769568849659,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9701258981216676,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9397406559877955,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9397406559877955,
+       "step": 15
+     },
+     {
+       "epoch": 1.380952380952381,
+       "grad_norm": 0.8189207315444946,
+       "learning_rate": 4.972077065562821e-05,
+       "loss": 0.1731,
+       "step": 30
+     },
+     {
+       "epoch": 1.380952380952381,
+       "eval_loss": 0.07593704760074615,
+       "eval_runtime": 4.0688,
+       "eval_samples_per_second": 322.209,
+       "eval_steps_per_second": 1.475,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9565217391304348,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9938977879481312,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9877955758962624,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9908466819221968,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9723266300874301,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9721883210441564,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9776352051817517,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9565217391304348,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9565217391304348,
+       "step": 30
+     },
+     {
+       "epoch": 2.0476190476190474,
+       "grad_norm": 0.7057574391365051,
+       "learning_rate": 4.803690529676019e-05,
+       "loss": 0.0917,
+       "step": 45
+     },
+     {
+       "epoch": 2.0476190476190474,
+       "eval_loss": 0.06566686183214188,
+       "eval_runtime": 3.7186,
+       "eval_samples_per_second": 352.553,
+       "eval_steps_per_second": 1.614,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9900839054157132,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9908466819221968,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9768047979761636,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9765700483091787,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9807364362901521,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
+       "step": 45
+     },
+     {
+       "epoch": 2.761904761904762,
+       "grad_norm": 0.7498806118965149,
+       "learning_rate": 4.4928312680573064e-05,
+       "loss": 0.0676,
+       "step": 60
+     },
+     {
+       "epoch": 2.761904761904762,
+       "eval_loss": 0.06091764196753502,
+       "eval_runtime": 3.7927,
+       "eval_samples_per_second": 345.667,
+       "eval_steps_per_second": 1.582,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9641495041952708,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_map@100": 0.977428148947981,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9771802695143658,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9812569737659373,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9641495041952708,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9641495041952708,
+       "step": 60
+     },
+     {
+       "epoch": 3.4285714285714284,
+       "grad_norm": 0.48658156394958496,
+       "learning_rate": 4.058724504646834e-05,
+       "loss": 0.0435,
+       "step": 75
+     },
+     {
+       "epoch": 3.4285714285714284,
+       "eval_loss": 0.05956002324819565,
+       "eval_runtime": 4.2667,
+       "eval_samples_per_second": 307.261,
+       "eval_steps_per_second": 1.406,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_map@100": 0.978052610298987,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9778295376121463,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9817518617980646,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
+       "step": 75
+     },
+     {
+       "epoch": 4.095238095238095,
+       "grad_norm": 0.4985809624195099,
+       "learning_rate": 3.5282177578265296e-05,
+       "loss": 0.038,
+       "step": 90
+     },
+     {
+       "epoch": 4.095238095238095,
+       "eval_loss": 0.060632411390542984,
+       "eval_runtime": 4.6488,
+       "eval_samples_per_second": 282.008,
+       "eval_steps_per_second": 1.291,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.992372234935164,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9775869566334031,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9773646071700992,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9813932046352999,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
+       "step": 90
+     },
+     {
+       "epoch": 4.809523809523809,
+       "grad_norm": 0.4105435609817505,
+       "learning_rate": 2.9341204441673266e-05,
+       "loss": 0.0332,
+       "step": 105
+     },
+     {
+       "epoch": 4.809523809523809,
+       "eval_loss": 0.05935605987906456,
+       "eval_runtime": 4.0644,
+       "eval_samples_per_second": 322.554,
+       "eval_steps_per_second": 1.476,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.965675057208238,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.992372234935164,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9783638236659703,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9781273836765828,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9819743331685896,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.965675057208238,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.965675057208238,
+       "step": 105
+     },
+     {
+       "epoch": 5.476190476190476,
+       "grad_norm": 0.468258261680603,
+       "learning_rate": 2.3131747660339394e-05,
+       "loss": 0.0269,
+       "step": 120
+     },
+     {
+       "epoch": 5.476190476190476,
+       "eval_loss": 0.060672808438539505,
+       "eval_runtime": 4.0797,
+       "eval_samples_per_second": 321.343,
+       "eval_steps_per_second": 1.471,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9664378337147216,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9908466819221968,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9780891289133677,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9778688871938299,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9817380288044749,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9664378337147216,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9664378337147216,
+       "step": 120
+     },
+     {
+       "epoch": 6.142857142857143,
+       "grad_norm": 0.192308709025383,
+       "learning_rate": 1.7037833743707892e-05,
+       "loss": 0.0219,
+       "step": 135
+     },
+     {
+       "epoch": 6.142857142857143,
+       "eval_loss": 0.06004022806882858,
+       "eval_runtime": 3.6988,
+       "eval_samples_per_second": 354.443,
+       "eval_steps_per_second": 1.622,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.965675057208238,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9938977879481312,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9908466819221968,
219
+ "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
220
+ "eval_telecom-ir-eval_cosine_map@100": 0.9779666698415427,
221
+ "eval_telecom-ir-eval_cosine_mrr@10": 0.9778095601322145,
222
+ "eval_telecom-ir-eval_cosine_ndcg@10": 0.9818676160795978,
223
+ "eval_telecom-ir-eval_cosine_precision@1": 0.965675057208238,
224
+ "eval_telecom-ir-eval_cosine_recall@1": 0.965675057208238,
225
+ "step": 135
226
+ },
227
+ {
228
+ "epoch": 6.857142857142857,
229
+ "grad_norm": 0.3330775499343872,
230
+ "learning_rate": 1.1436343403356017e-05,
231
+ "loss": 0.0244,
232
+ "step": 150
233
+ },
234
+ {
235
+ "epoch": 6.857142857142857,
236
+ "eval_loss": 0.05985964834690094,
237
+ "eval_runtime": 3.8386,
238
+ "eval_samples_per_second": 341.53,
239
+ "eval_steps_per_second": 1.563,
240
+ "eval_telecom-ir-eval_cosine_accuracy@1": 0.9679633867276888,
241
+ "eval_telecom-ir-eval_cosine_accuracy@10": 0.992372234935164,
242
+ "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
243
+ "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
244
+ "eval_telecom-ir-eval_cosine_map@100": 0.9791402442094453,
245
+ "eval_telecom-ir-eval_cosine_mrr@10": 0.9788647342995168,
246
+ "eval_telecom-ir-eval_cosine_ndcg@10": 0.9823240649953693,
247
+ "eval_telecom-ir-eval_cosine_precision@1": 0.9679633867276888,
248
+ "eval_telecom-ir-eval_cosine_recall@1": 0.9679633867276888,
249
+ "step": 150
250
+ }
251
+ ],
252
+ "logging_steps": 15,
253
+ "max_steps": 210,
254
+ "num_input_tokens_seen": 0,
255
+ "num_train_epochs": 10,
256
+ "save_steps": 15,
257
+ "stateful_callbacks": {
258
+ "TrainerControl": {
259
+ "args": {
260
+ "should_epoch_stop": false,
261
+ "should_evaluate": false,
262
+ "should_log": false,
263
+ "should_save": true,
264
+ "should_training_stop": false
265
+ },
266
+ "attributes": {}
267
+ }
268
+ },
269
+ "total_flos": 0.0,
270
+ "train_batch_size": 256,
271
+ "trial_name": null,
272
+ "trial_params": null
273
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17256d515423b97b1d01b2965f1f797edb0c05bede71be4bcea12c22c4dc76c5
+ size 5816