Uploading initial model

Files changed:
- README.md: +102 -105
- model.safetensors: +1 -1
- optimizer.pt: +3 -0
- rng_state.pth: +3 -0
- scheduler.pt: +3 -0
- trainer_state.json: +273 -0
- training_args.bin: +3 -0
README.md
CHANGED
@@ -8,53 +8,55 @@ tags:
 - loss:MultipleNegativesRankingLoss
 base_model: BAAI/bge-small-en-v1.5
 widget:
-- source_sentence: What
   sentences:
   sentences:
   sentences:
-  Elapsed Time value computed using the Heartbeat Elapsed Indication field within
-  the TXOP responder's DMG Capabilities element.
-- source_sentence: What is the primary parameter used to measure the severity of narrowband
-  fading in AG (air to ground) propagation channels?
   sentences:
-- source_sentence: What is one way to enhance the robustness of terahertz in real-time
-  communication?
   sentences:
 datasets:
 - dinho1597/Telecom-QA-MultipleChoice
 pipeline_tag: sentence-similarity
@@ -80,31 +82,31 @@ model-index:
 type: telecom-ir-eval
 metrics:
 - type: cosine_accuracy@1
-  value: 0.
   name: Cosine Accuracy@1
 - type: cosine_accuracy@3
-  value: 0.
   name: Cosine Accuracy@3
 - type: cosine_accuracy@5
-  value: 0.
   name: Cosine Accuracy@5
 - type: cosine_accuracy@10
-  value: 0.
   name: Cosine Accuracy@10
 - type: cosine_precision@1
-  value: 0.
   name: Cosine Precision@1
 - type: cosine_recall@1
-  value: 0.
   name: Cosine Recall@1
 - type: cosine_ndcg@10
-  value: 0.
   name: Cosine Ndcg@10
 - type: cosine_mrr@10
-  value: 0.
   name: Cosine Mrr@10
 - type: cosine_map@100
-  value: 0.
   name: Cosine Map@100
 ---
@@ -159,9 +161,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
-    'What is
-    '
-    '
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -208,15 +210,15 @@ You can finetune this model on your own dataset.
 
 | Metric | Value |
 |:-------------------|:-----------|
-| cosine_accuracy@1 | 0.
-| cosine_accuracy@3 | 0.
-| cosine_accuracy@5 | 0.
-| cosine_accuracy@10 | 0.
-| cosine_precision@1 | 0.
-| cosine_recall@1 | 0.
-| **cosine_ndcg@10** | **0.
-| cosine_mrr@10 | 0.
-| cosine_map@100 | 0.
 
 <!--
 ## Bias, Risks and Limitations
@@ -240,16 +242,16 @@ You can finetune this model on your own dataset.
 * Size: 6,552 training samples
 * Columns: <code>anchor</code> and <code>positive</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | anchor
-  | type    | string
-  | details | <ul><li>min: 4 tokens</li><li>mean: 18.
 * Samples:
-  | anchor
-  | <code>What
-  | <code>What
-  | <code>What is the
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -266,16 +268,16 @@ You can finetune this model on your own dataset.
 * Size: 6,552 evaluation samples
 * Columns: <code>anchor</code> and <code>positive</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | anchor
-  | type    | string
-  | details | <ul><li>min: 4 tokens</li><li>mean: 18.
 * Samples:
-  | anchor
-  | <code>
-  | <code>What is
-  | <code>What are the
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -288,10 +290,10 @@ You can finetune this model on your own dataset.
 #### Non-Default Hyperparameters
 
 - `eval_strategy`: steps
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `weight_decay`: 0.01
-- `num_train_epochs`:
 - `lr_scheduler_type`: cosine_with_restarts
 - `warmup_ratio`: 0.1
 - `fp16`: True
@@ -305,8 +307,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -318,7 +320,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1.0
-- `num_train_epochs`:
 - `max_steps`: -1
 - `lr_scheduler_type`: cosine_with_restarts
 - `lr_scheduler_kwargs`: {}
@@ -420,24 +422,19 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch
-| 0.
-| 4.3902 | 180 | 0.0165 | 0.0611 | 0.9746 |
-| 4.7561 | 195 | 0.016 | 0.0611 | 0.9746 |
-| 5.0 | 205 | - | - | 0.9753 |
-
-* The bold row denotes the saved checkpoint.
 
 ### Framework Versions
 - Python: 3.10.12
 - loss:MultipleNegativesRankingLoss
 base_model: BAAI/bge-small-en-v1.5
 widget:
+- source_sentence: What problem can reconfigurable intelligent surfaces mitigate in
+    light fidelity systems?
   sentences:
+  - The document mentions that blind channel estimation requires a large number of
+    data symbols to improve accuracy, which may not be feasible in practice.
+  - Empirical evidence suggests that the power decay can even be exponential with
+    distance.
+  - Reconfigurable intelligent surface-enabled environments can enhance light fidelity
+    coverage by mitigating the dead-zone problem for users at the edge of the cell,
+    improving link quality.
+- source_sentence: What is the advantage of conformal arrays in UAV (Unmanned Aerial
+    Vehicle) communication systems?
   sentences:
+  - Overfitting occurs when a model fits the training data too well and fails to generalize
+    to unseen data, while underfitting occurs when a model does not fit the training
+    data well enough to capture the underlying patterns.
+  - A point-to-multipoint service is a service type in which data is sent to all service
+    subscribers or a pre-defined subset of all subscribers within an area defined
+    by the Service Requester.
+  - Conformal arrays offer good aerodynamic performance, enable full-space beam scanning,
+    and provide more DoFs for geometry design.
+- source_sentence: What is a Virtual Home Environment?
   sentences:
+  - Compressive spectrum sensing utilizes the sparsity property of signals to enable
+    sub-Nyquist sampling.
+  - A Virtual Home Environment is a concept that allows for the portability of personal
+    service environments across network boundaries and between terminals.
+  - In the Client Server model, a Client application waits passively on contact while
+    a Server starts the communication actively.
+- source_sentence: What is multi-agent RL (Reinforcement learning) concerned with?
   sentences:
+  - Data centers account for about 1% of global electricity demand, as stated in the
+    document.
+  - Fog Computing and Communication in the Frugal 5G network architecture brings intelligence
+    to the edge and enables more efficient communication with reduced resource usage.
+  - Multi-agent RL is concerned with learning in presence of multiple agents and encompasses
+    unique problem formulation that draws from game theoretical concepts.
+- source_sentence: What is the trade-off between privacy and convergence performance
+    when using artificial noise obscuring in federated learning?
   sentences:
+  - The 'decrypt_error' alert indicates a handshake cryptographic operation failed,
+    including being unable to verify a signature, decrypt a key exchange, or validate
+    a finished message.
+  - The trade-off between privacy and convergence performance when using artificial
+    noise obscuring in federated learning is that increasing the noise variance improves
+    privacy but degrades convergence.
+  - The design rules for sub-carrier allocations to users in cellular systems are
+    to allocate the sub-carriers as spread out as possible and hop the sub-carriers
+    every OFDM symbol time.
 datasets:
 - dinho1597/Telecom-QA-MultipleChoice
 pipeline_tag: sentence-similarity
 type: telecom-ir-eval
 metrics:
 - type: cosine_accuracy@1
+  value: 0.9679633867276888
   name: Cosine Accuracy@1
 - type: cosine_accuracy@3
+  value: 0.9916094584286804
   name: Cosine Accuracy@3
 - type: cosine_accuracy@5
+  value: 0.9916094584286804
   name: Cosine Accuracy@5
 - type: cosine_accuracy@10
+  value: 0.992372234935164
   name: Cosine Accuracy@10
 - type: cosine_precision@1
+  value: 0.9679633867276888
   name: Cosine Precision@1
 - type: cosine_recall@1
+  value: 0.9679633867276888
   name: Cosine Recall@1
 - type: cosine_ndcg@10
+  value: 0.9823240649953693
   name: Cosine Ndcg@10
 - type: cosine_mrr@10
+  value: 0.9788647342995168
   name: Cosine Mrr@10
 - type: cosine_map@100
+  value: 0.9791402442094453
   name: Cosine Map@100
 ---
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
+    'What is the trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning?',
+    'The trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning is that increasing the noise variance improves privacy but degrades convergence.',
+    "The 'decrypt_error' alert indicates a handshake cryptographic operation failed, including being unable to verify a signature, decrypt a key exchange, or validate a finished message.",
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
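Usage note (not part of the commit): once the embeddings from the snippet above are computed, they can be compared with the cosine-similarity helper bundled with sentence-transformers. This is a minimal, self-contained sketch; the model id is the same placeholder used in the card and should be replaced by the published repository id.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder id copied from the card; substitute the real repository id.
model = SentenceTransformer("sentence_transformers_model_id")

sentences = [
    "What is the trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning?",
    "Increasing the noise variance improves privacy but degrades convergence.",
    "The 'decrypt_error' alert indicates a failed handshake cryptographic operation.",
]
embeddings = model.encode(sentences)

# Cosine similarity between the query (row 0) and the two candidate passages.
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)  # the matching passage (first candidate) should receive the highest score
```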
 
 | Metric              | Value      |
 |:--------------------|:-----------|
+| cosine_accuracy@1   | 0.968      |
+| cosine_accuracy@3   | 0.9916     |
+| cosine_accuracy@5   | 0.9916     |
+| cosine_accuracy@10  | 0.9924     |
+| cosine_precision@1  | 0.968      |
+| cosine_recall@1     | 0.968      |
+| **cosine_ndcg@10**  | **0.9823** |
+| cosine_mrr@10       | 0.9789     |
+| cosine_map@100      | 0.9791     |
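For context on how retrieval numbers of this kind are usually produced: sentence-transformers ships an `InformationRetrievalEvaluator` that scores a model on anchor-to-positive lookup. The sketch below is illustrative only; the example pairs are taken from the widget section of this card, and nothing in the diff specifies that this exact script was used.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Example (anchor, positive) pairs borrowed from the widget section above;
# in practice the full 6,552-pair telecom QA set described in this card would be used.
pairs = [
    ("What is a Virtual Home Environment?",
     "A Virtual Home Environment is a concept that allows for the portability of personal "
     "service environments across network boundaries and between terminals."),
    ("What is multi-agent RL (Reinforcement learning) concerned with?",
     "Multi-agent RL is concerned with learning in presence of multiple agents and encompasses "
     "unique problem formulation that draws from game theoretical concepts."),
]

queries = {str(i): anchor for i, (anchor, _) in enumerate(pairs)}
corpus = {str(i): positive for i, (_, positive) in enumerate(pairs)}
relevant_docs = {qid: {qid} for qid in queries}  # each anchor has exactly one relevant passage

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id from the card
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="telecom-ir-eval")
results = evaluator(model)  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100 style metrics
print(results)
```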
 
 <!--
 ## Bias, Risks and Limitations
 * Size: 6,552 training samples
 * Columns: <code>anchor</code> and <code>positive</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                            | positive                                                                           |
+  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+  | type    | string                                                                            | string                                                                             |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 18.8 tokens</li><li>max: 48 tokens</li></ul>  | <ul><li>min: 8 tokens</li><li>mean: 29.27 tokens</li><li>max: 92 tokens</li></ul>  |
 * Samples:
+  | anchor | positive |
+  |:-------|:---------|
+  | <code>What is multi-user multiple input, multiple output (MU-MIMO) in IEEE 802.11-2020?</code> | <code>MU-MIMO is a technique by which multiple stations (STAs) either simultaneously transmit to a single STA or simultaneously receive from a single STA independent data streams over the same radio frequencies.</code> |
+  | <code>What is the purpose of wireless network virtualization?</code> | <code>The purpose of wireless network virtualization is to improve resource utilization, support diverse services/use cases, and be cost-effective and flexible for new services.</code> |
+  | <code>What is the E2E (end-to-end) latency requirement for factory automation applications?</code> | <code>Factory automation applications require an E2E latency of 0.25-10 ms.</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
 * Size: 6,552 evaluation samples
 * Columns: <code>anchor</code> and <code>positive</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                            | positive                                                                           |
+  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+  | type    | string                                                                            | string                                                                             |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 18.5 tokens</li><li>max: 52 tokens</li></ul>  | <ul><li>min: 9 tokens</li><li>mean: 28.83 tokens</li><li>max: 85 tokens</li></ul>  |
 * Samples:
+  | anchor | positive |
+  |:-------|:---------|
+  | <code>Which standard enables building Digital Twins of different Physical Twins using combinations of XML (eXtensible Markup Language) and C codes?</code> | <code>The functional mockup interface (FMI) is a standard that enables building Digital Twins of different Physical Twins using combinations of XML and C codes.</code> |
+  | <code>What algorithm is commonly used for digital signatures in S/MIME?</code> | <code>RSA is commonly used for digital signatures in S/MIME.</code> |
+  | <code>What are the three modes of operation based on the communication range and the SA (subarray) separation?</code> | <code>The three modes of operation based on the communication range and the SA separation are: (1) a mode where the channel paths are independent and the channel is always well-conditioned, (2) a mode where the channel is ill-conditioned, and (3) a mode where the channel is highly correlated.</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
 #### Non-Default Hyperparameters
 
 - `eval_strategy`: steps
+- `per_device_train_batch_size`: 256
+- `per_device_eval_batch_size`: 256
 - `weight_decay`: 0.01
+- `num_train_epochs`: 10
 - `lr_scheduler_type`: cosine_with_restarts
 - `warmup_ratio`: 0.1
 - `fp16`: True
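To make the non-default hyperparameters above concrete, here is a minimal, illustrative sketch of wiring those values into the sentence-transformers v3 trainer API. The dataset below is a placeholder standing in for the 6,552-sample telecom QA set; only the listed hyperparameter values, the base model, and the loss come from this card, and nothing here claims to be the exact training script used.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # base model named in the card

# Placeholder anchor/positive pairs; replace with the real telecom QA pairs.
train_dataset = Dataset.from_dict({
    "anchor": ["What is a Virtual Home Environment?",
               "What algorithm is commonly used for digital signatures in S/MIME?"],
    "positive": ["A concept that allows portability of personal service environments.",
                 "RSA is commonly used for digital signatures in S/MIME."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="bge-small-telecom",            # illustrative output path
    num_train_epochs=10,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    weight_decay=0.01,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,                                 # as listed in the card; requires a GPU
    eval_strategy="steps",
    eval_steps=15,                             # matches the eval cadence seen in trainer_state.json
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,                # placeholder; use the held-out split in practice
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()
```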
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 256
+- `per_device_eval_batch_size`: 256
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1

 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1.0
+- `num_train_epochs`: 10
 - `max_steps`: -1
 - `lr_scheduler_type`: cosine_with_restarts
 - `lr_scheduler_kwargs`: {}

 </details>
 
 ### Training Logs
+| Epoch  | Step | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
+|:------:|:----:|:-------------:|:---------------:|:------------------------------:|
+| 0.7143 | 15   | 0.824         | 0.1333          | 0.9701                         |
+| 1.3810 | 30   | 0.1731        | 0.0759          | 0.9776                         |
+| 2.0476 | 45   | 0.0917        | 0.0657          | 0.9807                         |
+| 2.7619 | 60   | 0.0676        | 0.0609          | 0.9813                         |
+| 3.4286 | 75   | 0.0435        | 0.0596          | 0.9818                         |
+| 4.0952 | 90   | 0.038         | 0.0606          | 0.9814                         |
+| 4.8095 | 105  | 0.0332        | 0.0594          | 0.9820                         |
+| 5.4762 | 120  | 0.0269        | 0.0607          | 0.9817                         |
+| 6.1429 | 135  | 0.0219        | 0.0600          | 0.9819                         |
+| 6.8571 | 150  | 0.0244        | 0.0599          | 0.9823                         |
+
 
 ### Framework Versions
 - Python: 3.10.12
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:2788d601e71fb4f15d15d5780dce8b56db76d4301e25196c35a72a70c1a3e625
 size 133462128
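The pointer above stores only the SHA-256 of the actual weights; git-lfs uses that oid for integrity checking. As an optional helper (not part of the commit), a downloaded model.safetensors can be checked against the oid with the standard library:

```python
import hashlib

# Compare a local file against the oid recorded in the LFS pointer above.
EXPECTED_OID = "2788d601e71fb4f15d15d5780dce8b56db76d4301e25196c35a72a70c1a3e625"

sha = hashlib.sha256()
with open("model.safetensors", "rb") as f:            # path is illustrative
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        sha.update(chunk)

print(sha.hexdigest() == EXPECTED_OID)  # True if the download matches the pointer
```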
optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f28e9dfbc6751727e9f8f3d4493a6b20d0699b33e1f5c8e9917aa1e00b4ca69
size 265862074
rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:679261b13fadc34524ddf3ac6ba660d0c6599219939ca3ff724f158ac156674d
size 14244
scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e77392b3a8e4ea80ac4740e1e6f52f30fdf708598646017d54d25d49fa8a1ca
size 1064
trainer_state.json
ADDED
@@ -0,0 +1,273 @@
{
  "best_metric": 0.9679633867276888,
  "best_model_checkpoint": "/content/drive/MyDrive/Papers/RAG_3GPP/models/checkpoints/embedding/bge-small-telecom_10e_256bs/checkpoint-150",
  "epoch": 6.857142857142857,
  "eval_steps": 15,
  "global_step": 150,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.7142857142857143,
      "grad_norm": 1.681250810623169,
      "learning_rate": 3.571428571428572e-05,
      "loss": 0.824,
      "step": 15
    },
    {
      "epoch": 0.7142857142857143,
      "eval_loss": 0.13330750167369843,
      "eval_runtime": 3.6814,
      "eval_samples_per_second": 356.115,
      "eval_steps_per_second": 1.63,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9397406559877955,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9839816933638444,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9893211289092296,
      "eval_telecom-ir-eval_cosine_map@100": 0.9625163452108533,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9623769568849659,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9701258981216676,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9397406559877955,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9397406559877955,
      "step": 15
    },
    {
      "epoch": 1.380952380952381,
      "grad_norm": 0.8189207315444946,
      "learning_rate": 4.972077065562821e-05,
      "loss": 0.1731,
      "step": 30
    },
    {
      "epoch": 1.380952380952381,
      "eval_loss": 0.07593704760074615,
      "eval_runtime": 4.0688,
      "eval_samples_per_second": 322.209,
      "eval_steps_per_second": 1.475,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9565217391304348,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9938977879481312,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9877955758962624,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9908466819221968,
      "eval_telecom-ir-eval_cosine_map@100": 0.9723266300874301,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9721883210441564,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9776352051817517,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9565217391304348,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9565217391304348,
      "step": 30
    },
    {
      "epoch": 2.0476190476190474,
      "grad_norm": 0.7057574391365051,
      "learning_rate": 4.803690529676019e-05,
      "loss": 0.0917,
      "step": 45
    },
    {
      "epoch": 2.0476190476190474,
      "eval_loss": 0.06566686183214188,
      "eval_runtime": 3.7186,
      "eval_samples_per_second": 352.553,
      "eval_steps_per_second": 1.614,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9900839054157132,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9908466819221968,
      "eval_telecom-ir-eval_cosine_map@100": 0.9768047979761636,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9765700483091787,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9807364362901521,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
      "step": 45
    },
    {
      "epoch": 2.761904761904762,
      "grad_norm": 0.7498806118965149,
      "learning_rate": 4.4928312680573064e-05,
      "loss": 0.0676,
      "step": 60
    },
    {
      "epoch": 2.761904761904762,
      "eval_loss": 0.06091764196753502,
      "eval_runtime": 3.7927,
      "eval_samples_per_second": 345.667,
      "eval_steps_per_second": 1.582,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9641495041952708,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_map@100": 0.977428148947981,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9771802695143658,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9812569737659373,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9641495041952708,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9641495041952708,
      "step": 60
    },
    {
      "epoch": 3.4285714285714284,
      "grad_norm": 0.48658156394958496,
      "learning_rate": 4.058724504646834e-05,
      "loss": 0.0435,
      "step": 75
    },
    {
      "epoch": 3.4285714285714284,
      "eval_loss": 0.05956002324819565,
      "eval_runtime": 4.2667,
      "eval_samples_per_second": 307.261,
      "eval_steps_per_second": 1.406,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_map@100": 0.978052610298987,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9778295376121463,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9817518617980646,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
      "step": 75
    },
    {
      "epoch": 4.095238095238095,
      "grad_norm": 0.4985809624195099,
      "learning_rate": 3.5282177578265296e-05,
      "loss": 0.038,
      "step": 90
    },
    {
      "epoch": 4.095238095238095,
      "eval_loss": 0.060632411390542984,
      "eval_runtime": 4.6488,
      "eval_samples_per_second": 282.008,
      "eval_steps_per_second": 1.291,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.992372234935164,
      "eval_telecom-ir-eval_cosine_map@100": 0.9775869566334031,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9773646071700992,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9813932046352999,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
      "step": 90
    },
    {
      "epoch": 4.809523809523809,
      "grad_norm": 0.4105435609817505,
      "learning_rate": 2.9341204441673266e-05,
      "loss": 0.0332,
      "step": 105
    },
    {
      "epoch": 4.809523809523809,
      "eval_loss": 0.05935605987906456,
      "eval_runtime": 4.0644,
      "eval_samples_per_second": 322.554,
      "eval_steps_per_second": 1.476,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.965675057208238,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.992372234935164,
      "eval_telecom-ir-eval_cosine_map@100": 0.9783638236659703,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9781273836765828,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9819743331685896,
      "eval_telecom-ir-eval_cosine_precision@1": 0.965675057208238,
      "eval_telecom-ir-eval_cosine_recall@1": 0.965675057208238,
      "step": 105
    },
    {
      "epoch": 5.476190476190476,
      "grad_norm": 0.468258261680603,
      "learning_rate": 2.3131747660339394e-05,
      "loss": 0.0269,
      "step": 120
    },
    {
      "epoch": 5.476190476190476,
      "eval_loss": 0.060672808438539505,
      "eval_runtime": 4.0797,
      "eval_samples_per_second": 321.343,
      "eval_steps_per_second": 1.471,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9664378337147216,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9908466819221968,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_map@100": 0.9780891289133677,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9778688871938299,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9817380288044749,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9664378337147216,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9664378337147216,
      "step": 120
    },
    {
      "epoch": 6.142857142857143,
      "grad_norm": 0.192308709025383,
      "learning_rate": 1.7037833743707892e-05,
      "loss": 0.0219,
      "step": 135
    },
    {
      "epoch": 6.142857142857143,
      "eval_loss": 0.06004022806882858,
      "eval_runtime": 3.6988,
      "eval_samples_per_second": 354.443,
      "eval_steps_per_second": 1.622,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.965675057208238,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.9938977879481312,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9908466819221968,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_map@100": 0.9779666698415427,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9778095601322145,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9818676160795978,
      "eval_telecom-ir-eval_cosine_precision@1": 0.965675057208238,
      "eval_telecom-ir-eval_cosine_recall@1": 0.965675057208238,
      "step": 135
    },
    {
      "epoch": 6.857142857142857,
      "grad_norm": 0.3330775499343872,
      "learning_rate": 1.1436343403356017e-05,
      "loss": 0.0244,
      "step": 150
    },
    {
      "epoch": 6.857142857142857,
      "eval_loss": 0.05985964834690094,
      "eval_runtime": 3.8386,
      "eval_samples_per_second": 341.53,
      "eval_steps_per_second": 1.563,
      "eval_telecom-ir-eval_cosine_accuracy@1": 0.9679633867276888,
      "eval_telecom-ir-eval_cosine_accuracy@10": 0.992372234935164,
      "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
      "eval_telecom-ir-eval_cosine_map@100": 0.9791402442094453,
      "eval_telecom-ir-eval_cosine_mrr@10": 0.9788647342995168,
      "eval_telecom-ir-eval_cosine_ndcg@10": 0.9823240649953693,
      "eval_telecom-ir-eval_cosine_precision@1": 0.9679633867276888,
      "eval_telecom-ir-eval_cosine_recall@1": 0.9679633867276888,
      "step": 150
    }
  ],
  "logging_steps": 15,
  "max_steps": 210,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 10,
  "save_steps": 15,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 0.0,
  "train_batch_size": 256,
  "trial_name": null,
  "trial_params": null
}
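Since trainer_state.json packs the whole evaluation history into log_history, the curve can be pulled out with a few lines of standard-library code; this sketch only relies on the keys visible in the file above.

```python
import json

# Point this at the trainer_state.json uploaded in this commit.
with open("trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_loss" in entry:  # evaluation entries; the others hold training loss / learning rate
        print(
            f'step {entry["step"]:>3} | epoch {entry["epoch"]:.2f} | '
            f'eval_loss {entry["eval_loss"]:.4f} | '
            f'ndcg@10 {entry["eval_telecom-ir-eval_cosine_ndcg@10"]:.4f}'
        )
```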
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:17256d515423b97b1d01b2965f1f797edb0c05bede71be4bcea12c22c4dc76c5
size 5816