dinho1597 committed
Commit dd05793 · verified · 1 Parent(s): f6eea6d

Uploading initial model

Files changed (7)
  1. README.md +102 -105
  2. model.safetensors +1 -1
  3. optimizer.pt +3 -0
  4. rng_state.pth +3 -0
  5. scheduler.pt +3 -0
  6. trainer_state.json +273 -0
  7. training_args.bin +3 -0
README.md CHANGED
@@ -8,53 +8,55 @@ tags:
  - loss:MultipleNegativesRankingLoss
  base_model: BAAI/bge-small-en-v1.5
  widget:
- - source_sentence: What does the RAT (Radio Access Technology) Agnostic Control Functions
-     (RACFs) interface with?
  sentences:
- - TLS/SSL is an encryption protocol that creates a secure data channel on an insecure
-     network to make client/server applications secure.
- - TLS/SSL ensures that data transferred between client and server applications cannot
-     be misused or read, and detects if data have been amended during transmission.
- - The RAT Agnostic Control Functions (RACFs) interface with the Flow Controller
-     and RAT Specific Control Functions (RSCFs).
- - source_sentence: What is an ultra-dense network (UDN)?
  sentences:
- - The Markov decision process approach models the delay-aware cross-layer optimization
-     problem as an infinite horizon average cost MDP.
- - DNS (Domain Name System) is a system that translates human-friendly domain names
-     into IP addresses used by computers to locate servers on the internet.
- - The document defines an ultra-dense network (UDN) as a network with the spatial
-     density of cells much larger than that of active users.
- - source_sentence: What is the main benefit of storage coding for video files?
  sentences:
- - In the context of deep learning, GPU stands for Graphics Processing Unit, which
-     enables parallel computing for faster inference.
- - Storage coding allows the recovery of original video files even in the case of
-     multiple failures.
- - For DMG STAs, the TXOP holder may transmit a frame using a modulation class other
-     than the DMG Control modulation class at the start of the TXOP if the time elapsed
-     since the last frame received from the TXOP responder is shorter than the Heartbeat
-     Elapsed Time value computed using the Heartbeat Elapsed Indication field within
-     the TXOP responder's DMG Capabilities element.
- - source_sentence: What is the primary parameter used to measure the severity of narrowband
-     fading in AG (air to ground) propagation channels?
  sentences:
- - The main difference between the Wyner's wiretap coding scheme and the coding scheme
-     used in FB-CSs is that Wyner's scheme uses codeword rate and rate redundancy,
-     while the FB-CSs scheme uses coding rate directly.
- - The RA field is the individual address of the STA that is the immediate intended
-     receiver of the frame.
- - The severity of narrowband fading in AG propagation channels is measured using
-     the Ricean K-factor, which is the ratio of dominant channel component power to
-     the power in the sum of all other received components.
- - source_sentence: What is one way to enhance the robustness of terahertz in real-time
-     communication?
  sentences:
- - Handover refers to the process of changing the serving cell of a UE in RRC_CONNECTED.
- - Enhancing beam tracking, resource allocation, and user association can improve
-     the robustness of terahertz in real-time communication.
- - NG-PON has a one-way latency of 2.5 μs, which is the lowest among all the fronthaul
-     technologies.
  datasets:
  - dinho1597/Telecom-QA-MultipleChoice
  pipeline_tag: sentence-similarity
@@ -80,31 +82,31 @@ model-index:
   type: telecom-ir-eval
   metrics:
   - type: cosine_accuracy@1
-   value: 0.9482044198895028
    name: Cosine Accuracy@1
   - type: cosine_accuracy@3
-   value: 0.9910220994475138
    name: Cosine Accuracy@3
   - type: cosine_accuracy@5
-   value: 0.9924033149171271
    name: Cosine Accuracy@5
   - type: cosine_accuracy@10
-   value: 0.9951657458563536
    name: Cosine Accuracy@10
   - type: cosine_precision@1
-   value: 0.9482044198895028
    name: Cosine Precision@1
   - type: cosine_recall@1
-   value: 0.9482044198895028
    name: Cosine Recall@1
   - type: cosine_ndcg@10
-   value: 0.9753237736211358
    name: Cosine Ndcg@10
   - type: cosine_mrr@10
-   value: 0.9685674822415152
    name: Cosine Mrr@10
   - type: cosine_map@100
-   value: 0.9688408433724855
    name: Cosine Map@100
  ---
 
@@ -159,9 +161,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
-     'What is one way to enhance the robustness of terahertz in real-time communication?',
-     'Enhancing beam tracking, resource allocation, and user association can improve the robustness of terahertz in real-time communication.',
-     'Handover refers to the process of changing the serving cell of a UE in RRC_CONNECTED.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -208,15 +210,15 @@ You can finetune this model on your own dataset.

  | Metric | Value |
  |:-------------------|:-----------|
- | cosine_accuracy@1 | 0.9482 |
- | cosine_accuracy@3 | 0.991 |
- | cosine_accuracy@5 | 0.9924 |
- | cosine_accuracy@10 | 0.9952 |
- | cosine_precision@1 | 0.9482 |
- | cosine_recall@1 | 0.9482 |
- | **cosine_ndcg@10** | **0.9753** |
- | cosine_mrr@10 | 0.9686 |
- | cosine_map@100 | 0.9688 |

  <!--
  ## Bias, Risks and Limitations
@@ -240,16 +242,16 @@ You can finetune this model on your own dataset.
  * Size: 6,552 training samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
- |         | anchor | positive |
- |:--------|:-------|:---------|
- | type    | string | string |
- | details | <ul><li>min: 4 tokens</li><li>mean: 18.54 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 29.33 tokens</li><li>max: 100 tokens</li></ul> |
  * Samples:
- | anchor | positive |
- |:-------|:---------|
- | <code>What are the two mechanisms in a CDMA (Code Division Multiple Access) system that reduce the chances of significant interference?</code> | <code>In a CDMA system, power control and soft handoff mechanisms are used to reduce the chances of significant interference. Power control ensures that there is no significant intra-cell interference, and soft handoff selects the base station with the best reception to control the user's power, reducing the chance of out-of-cell interference.</code> |
- | <code>What type of traffic is LPWAN (Low-Power Wide Area Network) suitable for?</code> | <code>LPWANs are suitable for sporadic and intermittent transmissions of very small packets.</code> |
- | <code>What is the definition of an Authenticator in IEEE Std 802.11-2020?</code> | <code>An Authenticator is an entity at one end of a point-to-point LAN segment that facilitates authentication of the entity attached to the other end of that link.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
@@ -266,16 +268,16 @@ You can finetune this model on your own dataset.
  * Size: 6,552 evaluation samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
- |         | anchor | positive |
- |:--------|:-------|:---------|
- | type    | string | string |
- | details | <ul><li>min: 4 tokens</li><li>mean: 18.83 tokens</li><li>max: 45 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 28.7 tokens</li><li>max: 99 tokens</li></ul> |
  * Samples:
- | anchor | positive |
- |:-------|:---------|
- | <code>How is the continuous-time impulse response of the end-to-end system computed?</code> | <code>The continuous-time impulse response of the end-to-end system is computed as the convolution of the impulse responses of the RIS paths and the propagation channels.</code> |
- | <code>What is lattice staggering in a multicarrier scheme?</code> | <code>Lattice staggering is a methodology that generates inherent orthogonality between the points in the lattice for the real domain by using different prototype filters for the real and imaginary parts of the scheme.</code> |
- | <code>What are the benefits of using Mobile Edge Computing (MEC) for computation-intensive applications?</code> | <code>MEC enables applications to be split into small tasks with some of the tasks performed at the local or regional clouds, reducing delay and backhaul usage.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
@@ -288,10 +290,10 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
- - `per_device_train_batch_size`: 128
- - `per_device_eval_batch_size`: 128
  - `weight_decay`: 0.01
- - `num_train_epochs`: 5
  - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
@@ -305,8 +307,8 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 128
- - `per_device_eval_batch_size`: 128
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
@@ -318,7 +320,7 @@ You can finetune this model on your own dataset.
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
- - `num_train_epochs`: 5
  - `max_steps`: -1
  - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
@@ -420,24 +422,19 @@ You can finetune this model on your own dataset.
  </details>

  ### Training Logs
- | Epoch | Step | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
- |:----------:|:------:|:-------------:|:---------------:|:------------------------------:|
- | 0.3659 | 15 | 0.5651 | 0.1124 | 0.9593 |
- | 0.7317 | 30 | 0.144 | 0.0674 | 0.9713 |
- | 1.0976 | 45 | 0.0783 | 0.0605 | 0.9730 |
- | **1.4634** | **60** | **0.0633** | **0.0607** | **0.9753** |
- | 1.8293 | 75 | 0.0469 | 0.0570 | 0.9749 |
- | 2.1951 | 90 | 0.0297 | 0.0568 | 0.9755 |
- | 2.5610 | 105 | 0.0287 | 0.0583 | 0.9749 |
- | 2.9268 | 120 | 0.0266 | 0.0594 | 0.9747 |
- | 3.2927 | 135 | 0.0196 | 0.0604 | 0.9738 |
- | 3.6585 | 150 | 0.0191 | 0.0609 | 0.9742 |
- | 4.0244 | 165 | 0.0167 | 0.0608 | 0.9749 |
- | 4.3902 | 180 | 0.0165 | 0.0611 | 0.9746 |
- | 4.7561 | 195 | 0.016 | 0.0611 | 0.9746 |
- | 5.0 | 205 | - | - | 0.9753 |
-
- * The bold row denotes the saved checkpoint.

  ### Framework Versions
  - Python: 3.10.12
 
  - loss:MultipleNegativesRankingLoss
  base_model: BAAI/bge-small-en-v1.5
  widget:
+ - source_sentence: What problem can reconfigurable intelligent surfaces mitigate in
+     light fidelity systems?
  sentences:
+ - The document mentions that blind channel estimation requires a large number of
+     data symbols to improve accuracy, which may not be feasible in practice.
+ - Empirical evidence suggests that the power decay can even be exponential with
+     distance.
+ - Reconfigurable intelligent surface-enabled environments can enhance light fidelity
+     coverage by mitigating the dead-zone problem for users at the edge of the cell,
+     improving link quality.
+ - source_sentence: What is the advantage of conformal arrays in UAV (Unmanned Aerial
+     Vehicle) communication systems?
  sentences:
+ - Overfitting occurs when a model fits the training data too well and fails to generalize
+     to unseen data, while underfitting occurs when a model does not fit the training
+     data well enough to capture the underlying patterns.
+ - A point-to-multipoint service is a service type in which data is sent to all service
+     subscribers or a pre-defined subset of all subscribers within an area defined
+     by the Service Requester.
+ - Conformal arrays offer good aerodynamic performance, enable full-space beam scanning,
+     and provide more DoFs for geometry design.
+ - source_sentence: What is a Virtual Home Environment?
  sentences:
+ - Compressive spectrum sensing utilizes the sparsity property of signals to enable
+     sub-Nyquist sampling.
+ - A Virtual Home Environment is a concept that allows for the portability of personal
+     service environments across network boundaries and between terminals.
+ - In the Client Server model, a Client application waits passively on contact while
+     a Server starts the communication actively.
+ - source_sentence: What is multi-agent RL (Reinforcement learning) concerned with?
  sentences:
+ - Data centers account for about 1% of global electricity demand, as stated in the
+     document.
+ - Fog Computing and Communication in the Frugal 5G network architecture brings intelligence
+     to the edge and enables more efficient communication with reduced resource usage.
+ - Multi-agent RL is concerned with learning in presence of multiple agents and encompasses
+     unique problem formulation that draws from game theoretical concepts.
+ - source_sentence: What is the trade-off between privacy and convergence performance
+     when using artificial noise obscuring in federated learning?
  sentences:
+ - The 'decrypt_error' alert indicates a handshake cryptographic operation failed,
+     including being unable to verify a signature, decrypt a key exchange, or validate
+     a finished message.
+ - The trade-off between privacy and convergence performance when using artificial
+     noise obscuring in federated learning is that increasing the noise variance improves
+     privacy but degrades convergence.
+ - The design rules for sub-carrier allocations to users in cellular systems are
+     to allocate the sub-carriers as spread out as possible and hop the sub-carriers
+     every OFDM symbol time.
  datasets:
  - dinho1597/Telecom-QA-MultipleChoice
  pipeline_tag: sentence-similarity
 
   type: telecom-ir-eval
   metrics:
   - type: cosine_accuracy@1
+   value: 0.9679633867276888
    name: Cosine Accuracy@1
   - type: cosine_accuracy@3
+   value: 0.9916094584286804
    name: Cosine Accuracy@3
   - type: cosine_accuracy@5
+   value: 0.9916094584286804
    name: Cosine Accuracy@5
   - type: cosine_accuracy@10
+   value: 0.992372234935164
    name: Cosine Accuracy@10
   - type: cosine_precision@1
+   value: 0.9679633867276888
    name: Cosine Precision@1
   - type: cosine_recall@1
+   value: 0.9679633867276888
    name: Cosine Recall@1
   - type: cosine_ndcg@10
+   value: 0.9823240649953693
    name: Cosine Ndcg@10
   - type: cosine_mrr@10
+   value: 0.9788647342995168
    name: Cosine Mrr@10
   - type: cosine_map@100
+   value: 0.9791402442094453
    name: Cosine Map@100
  ---
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
+     'What is the trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning?',
+     'The trade-off between privacy and convergence performance when using artificial noise obscuring in federated learning is that increasing the noise variance improves privacy but degrades convergence.',
+     "The 'decrypt_error' alert indicates a handshake cryptographic operation failed, including being unable to verify a signature, decrypt a key exchange, or validate a finished message.",
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
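The snippet above stops at printing the embedding shape; in practice you rank the candidate sentences against the query by cosine similarity, which is the score underlying every `cosine_*` metric on this card. A minimal pure-Python sketch of that comparison (the vectors below are toy stand-ins for `model.encode(...)` output; the real model emits 384-dimensional embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy stand-ins for embeddings; real ones come from model.encode(sentences).
query = [0.9, 0.1, 0.2]
candidates = [
    [0.8, 0.2, 0.1],  # close in direction to the query
    [0.1, 0.9, 0.3],  # pointing elsewhere
]
scores = [cosine_similarity(query, c) for c in candidates]
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 0 — index of the best-matching candidate
```

With real embeddings the same ranking step is what the retrieval evaluator performs for each query.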
 
  | Metric | Value |
  |:-------------------|:-----------|
+ | cosine_accuracy@1 | 0.968 |
+ | cosine_accuracy@3 | 0.9916 |
+ | cosine_accuracy@5 | 0.9916 |
+ | cosine_accuracy@10 | 0.9924 |
+ | cosine_precision@1 | 0.968 |
+ | cosine_recall@1 | 0.968 |
+ | **cosine_ndcg@10** | **0.9823** |
+ | cosine_mrr@10 | 0.9789 |
+ | cosine_map@100 | 0.9791 |
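These are standard retrieval measures: accuracy@k checks whether the gold passage lands in the top k, MRR@10 averages reciprocal ranks, and nDCG@10 discounts hits logarithmically by rank. A sketch of how they are computed, assuming (as in this anchor/positive setup) exactly one relevant passage per query; the ranks below are made up for illustration:

```python
import math

# 1-based rank of the single gold passage for each toy query.
gold_ranks = [1, 1, 2, 1, 4]

def accuracy_at_k(ranks, k):
    """Fraction of queries whose gold passage appears within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k=10):
    """With one relevant item the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)

print(accuracy_at_k(gold_ranks, 1))  # 0.6
print(mrr_at_k(gold_ranks))          # 0.75
print(round(ndcg_at_k(gold_ranks), 4))
```

Since each anchor has a single positive here, precision@1 and recall@1 coincide with accuracy@1, which is why those three rows share a value in the table.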

  <!--
  ## Bias, Risks and Limitations
 
  * Size: 6,552 training samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
+ |         | anchor | positive |
+ |:--------|:-------|:---------|
+ | type    | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 18.8 tokens</li><li>max: 48 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 29.27 tokens</li><li>max: 92 tokens</li></ul> |
  * Samples:
+ | anchor | positive |
+ |:-------|:---------|
+ | <code>What is multi-user multiple input, multiple output (MU-MIMO) in IEEE 802.11-2020?</code> | <code>MU-MIMO is a technique by which multiple stations (STAs) either simultaneously transmit to a single STA or simultaneously receive from a single STA independent data streams over the same radio frequencies.</code> |
+ | <code>What is the purpose of wireless network virtualization?</code> | <code>The purpose of wireless network virtualization is to improve resource utilization, support diverse services/use cases, and be cost-effective and flexible for new services.</code> |
+ | <code>What is the E2E (end-to-end) latency requirement for factory automation applications?</code> | <code>Factory automation applications require an E2E latency of 0.25-10 ms.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
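MultipleNegativesRankingLoss treats each anchor's paired positive as the target and every other positive in the batch as an in-batch negative, then applies cross-entropy over the scaled similarity matrix. A pure-Python sketch of that objective (toy 2-D embeddings; the scale of 20 mirrors the library's usual default and is used here only for illustration):

```python
import math

def mnrl(anchors, positives, scale=20.0):
    """Cross-entropy over scaled anchor-positive cosine similarities.
    Row i's target is positive i; the other positives act as in-batch negatives."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos(a, p) for p in positives]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]  # negative log-softmax of the true pair
    return total / len(anchors)

# Well-aligned toy pairs give a near-zero loss; mismatched pairs give a large one.
anchors   = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnrl(anchors, positives))
```

This is why larger batches (here 256) tend to help with this loss: every extra row in the batch contributes one more negative to each anchor's softmax.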
 
  * Size: 6,552 evaluation samples
  * Columns: <code>anchor</code> and <code>positive</code>
  * Approximate statistics based on the first 1000 samples:
+ |         | anchor | positive |
+ |:--------|:-------|:---------|
+ | type    | string | string |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 18.5 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 28.83 tokens</li><li>max: 85 tokens</li></ul> |
  * Samples:
+ | anchor | positive |
+ |:-------|:---------|
+ | <code>Which standard enables building Digital Twins of different Physical Twins using combinations of XML (eXtensible Markup Language) and C codes?</code> | <code>The functional mockup interface (FMI) is a standard that enables building Digital Twins of different Physical Twins using combinations of XML and C codes.</code> |
+ | <code>What algorithm is commonly used for digital signatures in S/MIME?</code> | <code>RSA is commonly used for digital signatures in S/MIME.</code> |
+ | <code>What are the three modes of operation based on the communication range and the SA (subarray) separation?</code> | <code>The three modes of operation based on the communication range and the SA separation are: (1) a mode where the channel paths are independent and the channel is always well-conditioned, (2) a mode where the channel is ill-conditioned, and (3) a mode where the channel is highly correlated.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
 
  #### Non-Default Hyperparameters

  - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
  - `weight_decay`: 0.01
+ - `num_train_epochs`: 10
  - `lr_scheduler_type`: cosine_with_restarts
  - `warmup_ratio`: 0.1
  - `fp16`: True
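The `cosine_with_restarts` scheduler combined with `warmup_ratio: 0.1` means the learning rate climbs linearly over the first 10% of steps and then follows a cosine decay that can restart one or more times. A simplified sketch of the multiplier's shape (step counts and cycle count below are illustrative, not taken from this run):

```python
import math

def lr_lambda(step, total_steps, warmup_steps, num_cycles=1):
    """Multiplier on the base LR: linear warmup, then hard-restart cosine decay."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Hard restarts: the cosine runs num_cycles times, jumping back to 1.0 each time.
    return 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))

total, warmup = 200, 20  # warmup = 0.1 * total, mirroring warmup_ratio: 0.1
schedule = [lr_lambda(s, total, warmup) for s in range(total)]
print(schedule[0], schedule[20], round(schedule[110], 3))  # 0.0 at start, 1.0 after warmup, 0.5 mid-decay
```

With `num_cycles=1` this reduces to plain warmup-plus-cosine; larger cycle counts add the restarts the scheduler is named for.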
 
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
 
  - `adam_beta2`: 0.999
  - `adam_epsilon`: 1e-08
  - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 10
  - `max_steps`: -1
  - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {}
 
  </details>

  ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | telecom-ir-eval_cosine_ndcg@10 |
+ |:------:|:----:|:-------------:|:---------------:|:------------------------------:|
+ | 0.7143 | 15 | 0.824 | 0.1333 | 0.9701 |
+ | 1.3810 | 30 | 0.1731 | 0.0759 | 0.9776 |
+ | 2.0476 | 45 | 0.0917 | 0.0657 | 0.9807 |
+ | 2.7619 | 60 | 0.0676 | 0.0609 | 0.9813 |
+ | 3.4286 | 75 | 0.0435 | 0.0596 | 0.9818 |
+ | 4.0952 | 90 | 0.038 | 0.0606 | 0.9814 |
+ | 4.8095 | 105 | 0.0332 | 0.0594 | 0.9820 |
+ | 5.4762 | 120 | 0.0269 | 0.0607 | 0.9817 |
+ | 6.1429 | 135 | 0.0219 | 0.0600 | 0.9819 |
+ | 6.8571 | 150 | 0.0244 | 0.0599 | 0.9823 |

  ### Framework Versions
  - Python: 3.10.12
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3a89325a91a1814f74f9f16c67b9e60d4e246ff9577a9858772f23410241b1d
  size 133462128

  version https://git-lfs.github.com/spec/v1
+ oid sha256:2788d601e71fb4f15d15d5780dce8b56db76d4301e25196c35a72a70c1a3e625
  size 133462128
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f28e9dfbc6751727e9f8f3d4493a6b20d0699b33e1f5c8e9917aa1e00b4ca69
+ size 265862074
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:679261b13fadc34524ddf3ac6ba660d0c6599219939ca3ff724f158ac156674d
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e77392b3a8e4ea80ac4740e1e6f52f30fdf708598646017d54d25d49fa8a1ca
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,273 @@
+ {
+   "best_metric": 0.9679633867276888,
+   "best_model_checkpoint": "/content/drive/MyDrive/Papers/RAG_3GPP/models/checkpoints/embedding/bge-small-telecom_10e_256bs/checkpoint-150",
+   "epoch": 6.857142857142857,
+   "eval_steps": 15,
+   "global_step": 150,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.7142857142857143,
+       "grad_norm": 1.681250810623169,
+       "learning_rate": 3.571428571428572e-05,
+       "loss": 0.824,
+       "step": 15
+     },
+     {
+       "epoch": 0.7142857142857143,
+       "eval_loss": 0.13330750167369843,
+       "eval_runtime": 3.6814,
+       "eval_samples_per_second": 356.115,
+       "eval_steps_per_second": 1.63,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9397406559877955,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9839816933638444,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9893211289092296,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9625163452108533,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9623769568849659,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9701258981216676,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9397406559877955,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9397406559877955,
+       "step": 15
+     },
+     {
+       "epoch": 1.380952380952381,
+       "grad_norm": 0.8189207315444946,
+       "learning_rate": 4.972077065562821e-05,
+       "loss": 0.1731,
+       "step": 30
+     },
+     {
+       "epoch": 1.380952380952381,
+       "eval_loss": 0.07593704760074615,
+       "eval_runtime": 4.0688,
+       "eval_samples_per_second": 322.209,
+       "eval_steps_per_second": 1.475,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9565217391304348,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9938977879481312,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9877955758962624,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9908466819221968,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9723266300874301,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9721883210441564,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9776352051817517,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9565217391304348,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9565217391304348,
+       "step": 30
+     },
+     {
+       "epoch": 2.0476190476190474,
+       "grad_norm": 0.7057574391365051,
+       "learning_rate": 4.803690529676019e-05,
+       "loss": 0.0917,
+       "step": 45
+     },
+     {
+       "epoch": 2.0476190476190474,
+       "eval_loss": 0.06566686183214188,
+       "eval_runtime": 3.7186,
+       "eval_samples_per_second": 352.553,
+       "eval_steps_per_second": 1.614,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9900839054157132,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9908466819221968,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9768047979761636,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9765700483091787,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9807364362901521,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
+       "step": 45
+     },
+     {
+       "epoch": 2.761904761904762,
+       "grad_norm": 0.7498806118965149,
+       "learning_rate": 4.4928312680573064e-05,
+       "loss": 0.0676,
+       "step": 60
+     },
+     {
+       "epoch": 2.761904761904762,
+       "eval_loss": 0.06091764196753502,
+       "eval_runtime": 3.7927,
+       "eval_samples_per_second": 345.667,
+       "eval_steps_per_second": 1.582,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9641495041952708,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_map@100": 0.977428148947981,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9771802695143658,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9812569737659373,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9641495041952708,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9641495041952708,
+       "step": 60
+     },
+     {
+       "epoch": 3.4285714285714284,
+       "grad_norm": 0.48658156394958496,
+       "learning_rate": 4.058724504646834e-05,
+       "loss": 0.0435,
+       "step": 75
+     },
+     {
+       "epoch": 3.4285714285714284,
+       "eval_loss": 0.05956002324819565,
+       "eval_runtime": 4.2667,
+       "eval_samples_per_second": 307.261,
+       "eval_steps_per_second": 1.406,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_map@100": 0.978052610298987,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9778295376121463,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9817518617980646,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
+       "step": 75
+     },
+     {
+       "epoch": 4.095238095238095,
+       "grad_norm": 0.4985809624195099,
+       "learning_rate": 3.5282177578265296e-05,
+       "loss": 0.038,
+       "step": 90
+     },
+     {
+       "epoch": 4.095238095238095,
+       "eval_loss": 0.060632411390542984,
+       "eval_runtime": 4.6488,
+       "eval_samples_per_second": 282.008,
+       "eval_steps_per_second": 1.291,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.992372234935164,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9775869566334031,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9773646071700992,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9813932046352999,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9649122807017544,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9649122807017544,
+       "step": 90
+     },
+     {
+       "epoch": 4.809523809523809,
+       "grad_norm": 0.4105435609817505,
+       "learning_rate": 2.9341204441673266e-05,
+       "loss": 0.0332,
+       "step": 105
+     },
+     {
+       "epoch": 4.809523809523809,
+       "eval_loss": 0.05935605987906456,
+       "eval_runtime": 4.0644,
+       "eval_samples_per_second": 322.554,
+       "eval_steps_per_second": 1.476,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.965675057208238,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.992372234935164,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9783638236659703,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9781273836765828,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9819743331685896,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.965675057208238,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.965675057208238,
+       "step": 105
+     },
+     {
+       "epoch": 5.476190476190476,
+       "grad_norm": 0.468258261680603,
+       "learning_rate": 2.3131747660339394e-05,
+       "loss": 0.0269,
+       "step": 120
+     },
+     {
+       "epoch": 5.476190476190476,
+       "eval_loss": 0.060672808438539505,
+       "eval_runtime": 4.0797,
+       "eval_samples_per_second": 321.343,
+       "eval_steps_per_second": 1.471,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.9664378337147216,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9931350114416476,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9908466819221968,
+       "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
+       "eval_telecom-ir-eval_cosine_map@100": 0.9780891289133677,
+       "eval_telecom-ir-eval_cosine_mrr@10": 0.9778688871938299,
+       "eval_telecom-ir-eval_cosine_ndcg@10": 0.9817380288044749,
+       "eval_telecom-ir-eval_cosine_precision@1": 0.9664378337147216,
+       "eval_telecom-ir-eval_cosine_recall@1": 0.9664378337147216,
+       "step": 120
+     },
+     {
+       "epoch": 6.142857142857143,
+       "grad_norm": 0.192308709025383,
+       "learning_rate": 1.7037833743707892e-05,
+       "loss": 0.0219,
+       "step": 135
+     },
+     {
+       "epoch": 6.142857142857143,
+       "eval_loss": 0.06004022806882858,
+       "eval_runtime": 3.6988,
+       "eval_samples_per_second": 354.443,
+       "eval_steps_per_second": 1.622,
+       "eval_telecom-ir-eval_cosine_accuracy@1": 0.965675057208238,
+       "eval_telecom-ir-eval_cosine_accuracy@10": 0.9938977879481312,
+       "eval_telecom-ir-eval_cosine_accuracy@3": 0.9908466819221968,
219
+ "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
220
+ "eval_telecom-ir-eval_cosine_map@100": 0.9779666698415427,
221
+ "eval_telecom-ir-eval_cosine_mrr@10": 0.9778095601322145,
222
+ "eval_telecom-ir-eval_cosine_ndcg@10": 0.9818676160795978,
223
+ "eval_telecom-ir-eval_cosine_precision@1": 0.965675057208238,
224
+ "eval_telecom-ir-eval_cosine_recall@1": 0.965675057208238,
225
+ "step": 135
226
+ },
227
+ {
228
+ "epoch": 6.857142857142857,
229
+ "grad_norm": 0.3330775499343872,
230
+ "learning_rate": 1.1436343403356017e-05,
231
+ "loss": 0.0244,
232
+ "step": 150
233
+ },
234
+ {
235
+ "epoch": 6.857142857142857,
236
+ "eval_loss": 0.05985964834690094,
237
+ "eval_runtime": 3.8386,
238
+ "eval_samples_per_second": 341.53,
239
+ "eval_steps_per_second": 1.563,
240
+ "eval_telecom-ir-eval_cosine_accuracy@1": 0.9679633867276888,
241
+ "eval_telecom-ir-eval_cosine_accuracy@10": 0.992372234935164,
242
+ "eval_telecom-ir-eval_cosine_accuracy@3": 0.9916094584286804,
243
+ "eval_telecom-ir-eval_cosine_accuracy@5": 0.9916094584286804,
244
+ "eval_telecom-ir-eval_cosine_map@100": 0.9791402442094453,
245
+ "eval_telecom-ir-eval_cosine_mrr@10": 0.9788647342995168,
246
+ "eval_telecom-ir-eval_cosine_ndcg@10": 0.9823240649953693,
247
+ "eval_telecom-ir-eval_cosine_precision@1": 0.9679633867276888,
248
+ "eval_telecom-ir-eval_cosine_recall@1": 0.9679633867276888,
249
+ "step": 150
250
+ }
251
+ ],
252
+ "logging_steps": 15,
253
+ "max_steps": 210,
254
+ "num_input_tokens_seen": 0,
255
+ "num_train_epochs": 10,
256
+ "save_steps": 15,
257
+ "stateful_callbacks": {
258
+ "TrainerControl": {
259
+ "args": {
260
+ "should_epoch_stop": false,
261
+ "should_evaluate": false,
262
+ "should_log": false,
263
+ "should_save": true,
264
+ "should_training_stop": false
265
+ },
266
+ "attributes": {}
267
+ }
268
+ },
269
+ "total_flos": 0.0,
270
+ "train_batch_size": 256,
271
+ "trial_name": null,
272
+ "trial_params": null
273
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17256d515423b97b1d01b2965f1f797edb0c05bede71be4bcea12c22c4dc76c5
+ size 5816