Training in progress, step 150, checkpoint

Files changed (11)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +0 -0
last-checkpoint/trainer_state.json +1108 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: unsloth/Mistral-Nemo-Instruct-2407
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/Mistral-Nemo-Instruct-2407",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.3,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "up_proj",
+    "down_proj",
+    "k_proj",
+    "v_proj",
+    "gate_proj",
+    "o_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
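
As a reading aid for the adapter settings above, here is a minimal sketch of how `lora_alpha`, `r`, and `use_rslora` combine into the scaling factor applied to the low-rank update. It assumes the usual LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper names `lora_scaling` and `lora_param_count` and the 4096x4096 layer shape are hypothetical and not taken from this commit.

```python
import json
import math

# Subset of the values shown in the adapter_config.json hunk above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 128,
  "lora_dropout": 0.3,
  "r": 64,
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Factor multiplied into the low-rank update B @ A (hypothetical helper)."""
    alpha, rank = config["lora_alpha"], config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return rank * (in_features + out_features)


scaling = lora_scaling(ADAPTER_CONFIG)
print(scaling)  # 2.0

# Illustrative layer shape only; the real projection sizes are not in this diff.
example_params = lora_param_count(4096, 4096, ADAPTER_CONFIG["r"])
print(example_params)  # 524288
```

With `lora_alpha` set to twice the rank, the update is scaled by 2.0, which is a common pairing for these two values.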

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02c5275f7034bfe8dc226269d8f1533353bf48d3965661a72637ed84e63ce58a
+size 912336848
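
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses such a pointer with the standard library only. `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the adapter_model.safetensors hunk above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:02c5275f7034bfe8dc226269d8f1533353bf48d3965661a72637ed84e63ce58a
size 912336848
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

The same parser applies to the other pointer hunks in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.json`, `training_args.bin`), since they share the same three-line layout.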

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:15416a84b7db6ee0953f697e20bbbd539e5721c8bac11dc533c5f2fd8851db97
+size 463916180

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd0a9779afad7ab95e8d25fa9d24c12711c083618f37760a46f8101070c0f933
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bcf6c2743f0632628c20d16fde51d82be9077fac7fbc2c9011927c3124f1796c
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
+size 17078292

last-checkpoint/tokenizer_config.json ADDED

The diff for this file is too large to render.

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1108 @@

+{
+  "best_metric": 0.8662680983543396,
+  "best_model_checkpoint": "miner_id_24/checkpoint-150",
+  "epoch": 0.02822732404968009,
+  "eval_steps": 150,
+  "global_step": 150,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0001881821603312006,
+      "grad_norm": 8.021297454833984,
+      "learning_rate": 5e-06,
+      "loss": 4.7627,
+      "step": 1
+    },
+    {
+      "epoch": 0.0001881821603312006,
+      "eval_loss": 1.206099271774292,
+      "eval_runtime": 977.2207,
+      "eval_samples_per_second": 9.159,
+      "eval_steps_per_second": 2.29,
+      "step": 1
+    },
+    {
+      "epoch": 0.0003763643206624012,
+      "grad_norm": 8.330506324768066,
+      "learning_rate": 1e-05,
+      "loss": 4.7186,
+      "step": 2
+    },
+    {
+      "epoch": 0.0005645464809936018,
+      "grad_norm": 7.244218826293945,
+      "learning_rate": 1.5e-05,
+      "loss": 4.7416,
+      "step": 3
+    },
+    {
+      "epoch": 0.0007527286413248024,
+      "grad_norm": 7.7417449951171875,
+      "learning_rate": 2e-05,
+      "loss": 4.8388,
+      "step": 4
+    },
+    {
+      "epoch": 0.000940910801656003,
+      "grad_norm": 7.4192376136779785,
+      "learning_rate": 2.5e-05,
+      "loss": 4.4709,
+      "step": 5
+    },
+    {
+      "epoch": 0.0011290929619872036,
+      "grad_norm": 4.583150386810303,
+      "learning_rate": 3e-05,
+      "loss": 4.305,
+      "step": 6
+    },
+    {
+      "epoch": 0.0013172751223184042,
+      "grad_norm": 3.533625602722168,
+      "learning_rate": 3.5e-05,
+      "loss": 4.0565,
+      "step": 7
+    },
+    {
+      "epoch": 0.0015054572826496049,
+      "grad_norm": 4.560721397399902,
+      "learning_rate": 4e-05,
+      "loss": 4.0204,
+      "step": 8
+    },
+    {
+      "epoch": 0.0016936394429808055,
+      "grad_norm": 13.375167846679688,
+      "learning_rate": 4.5e-05,
+      "loss": 4.3096,
+      "step": 9
+    },
+    {
+      "epoch": 0.001881821603312006,
+      "grad_norm": 6.8327813148498535,
+      "learning_rate": 5e-05,
+      "loss": 4.279,
+      "step": 10
+    },
+    {
+      "epoch": 0.0020700037636432068,
+      "grad_norm": 5.540746688842773,
+      "learning_rate": 5.500000000000001e-05,
+      "loss": 4.2908,
+      "step": 11
+    },
+    {
+      "epoch": 0.002258185923974407,
+      "grad_norm": 4.140532970428467,
+      "learning_rate": 6e-05,
+      "loss": 4.2689,
+      "step": 12
+    },
+    {
+      "epoch": 0.0024463680843056076,
+      "grad_norm": 1.9986830949783325,
+      "learning_rate": 6.500000000000001e-05,
+      "loss": 4.0747,
+      "step": 13
+    },
+    {
+      "epoch": 0.0026345502446368085,
+      "grad_norm": 1.936602234840393,
+      "learning_rate": 7e-05,
+      "loss": 3.956,
+      "step": 14
+    },
+    {
+      "epoch": 0.002822732404968009,
+      "grad_norm": 1.9803358316421509,
+      "learning_rate": 7.500000000000001e-05,
+      "loss": 4.0274,
+      "step": 15
+    },
+    {
+      "epoch": 0.0030109145652992097,
+      "grad_norm": 2.102954864501953,
+      "learning_rate": 8e-05,
+      "loss": 3.9315,
+      "step": 16
+    },
+    {
+      "epoch": 0.00319909672563041,
+      "grad_norm": 1.8604391813278198,
+      "learning_rate": 8.5e-05,
+      "loss": 3.9579,
+      "step": 17
+    },
+    {
+      "epoch": 0.003387278885961611,
+      "grad_norm": 5.816336631774902,
+      "learning_rate": 9e-05,
+      "loss": 3.8109,
+      "step": 18
+    },
+    {
+      "epoch": 0.0035754610462928114,
+      "grad_norm": 1.8846168518066406,
+      "learning_rate": 9.5e-05,
+      "loss": 3.8936,
+      "step": 19
+    },
+    {
+      "epoch": 0.003763643206624012,
+      "grad_norm": 2.1951682567596436,
+      "learning_rate": 0.0001,
+      "loss": 3.8866,
+      "step": 20
+    },
+    {
+      "epoch": 0.003951825366955213,
+      "grad_norm": 1.8232979774475098,
+      "learning_rate": 9.999866555428618e-05,
+      "loss": 3.7866,
+      "step": 21
+    },
+    {
+      "epoch": 0.0041400075272864136,
+      "grad_norm": 1.7551851272583008,
+      "learning_rate": 9.999466228837451e-05,
+      "loss": 3.848,
+      "step": 22
+    },
+    {
+      "epoch": 0.0043281896876176135,
+      "grad_norm": 3.396578788757324,
+      "learning_rate": 9.998799041595064e-05,
+      "loss": 3.8432,
+      "step": 23
+    },
+    {
+      "epoch": 0.004516371847948814,
+      "grad_norm": 1.9480079412460327,
+      "learning_rate": 9.997865029314463e-05,
+      "loss": 3.9188,
+      "step": 24
+    },
+    {
+      "epoch": 0.004704554008280015,
+      "grad_norm": 1.8144193887710571,
+      "learning_rate": 9.996664241851197e-05,
+      "loss": 3.9241,
+      "step": 25
+    },
+    {
+      "epoch": 0.004892736168611215,
+      "grad_norm": 1.6157795190811157,
+      "learning_rate": 9.995196743300692e-05,
+      "loss": 3.7515,
+      "step": 26
+    },
+    {
+      "epoch": 0.005080918328942416,
+      "grad_norm": 1.8001606464385986,
+      "learning_rate": 9.993462611994832e-05,
+      "loss": 3.8946,
+      "step": 27
+    },
+    {
+      "epoch": 0.005269100489273617,
+      "grad_norm": 2.1896607875823975,
+      "learning_rate": 9.991461940497786e-05,
+      "loss": 3.6935,
+      "step": 28
+    },
+    {
+      "epoch": 0.005457282649604818,
+      "grad_norm": 1.9953135251998901,
+      "learning_rate": 9.989194835601048e-05,
+      "loss": 3.6832,
+      "step": 29
+    },
+    {
+      "epoch": 0.005645464809936018,
+      "grad_norm": 1.9537560939788818,
+      "learning_rate": 9.986661418317759e-05,
+      "loss": 3.6689,
+      "step": 30
+    },
+    {
+      "epoch": 0.005833646970267219,
+      "grad_norm": 1.7720470428466797,
+      "learning_rate": 9.983861823876231e-05,
+      "loss": 3.8731,
+      "step": 31
+    },
+    {
+      "epoch": 0.0060218291305984195,
+      "grad_norm": 1.8037577867507935,
+      "learning_rate": 9.980796201712734e-05,
+      "loss": 3.7388,
+      "step": 32
+    },
+    {
+      "epoch": 0.0062100112909296195,
+      "grad_norm": 1.558807611465454,
+      "learning_rate": 9.977464715463524e-05,
+      "loss": 3.5732,
+      "step": 33
+    },
+    {
+      "epoch": 0.00639819345126082,
+      "grad_norm": 1.7031415700912476,
+      "learning_rate": 9.973867542956104e-05,
+      "loss": 3.7046,
+      "step": 34
+    },
+    {
+      "epoch": 0.006586375611592021,
+      "grad_norm": 1.8096400499343872,
+      "learning_rate": 9.97000487619973e-05,
+      "loss": 3.7951,
+      "step": 35
+    },
+    {
+      "epoch": 0.006774557771923222,
+      "grad_norm": 1.6755597591400146,
+      "learning_rate": 9.965876921375165e-05,
+      "loss": 3.7345,
+      "step": 36
+    },
+    {
+      "epoch": 0.006962739932254422,
+      "grad_norm": 1.6606807708740234,
+      "learning_rate": 9.961483898823678e-05,
+      "loss": 3.6923,
+      "step": 37
+    },
+    {
+      "epoch": 0.007150922092585623,
+      "grad_norm": 1.602203130722046,
+      "learning_rate": 9.956826043035268e-05,
+      "loss": 3.6913,
+      "step": 38
+    },
+    {
+      "epoch": 0.007339104252916824,
+      "grad_norm": 1.6571152210235596,
+      "learning_rate": 9.951903602636166e-05,
+      "loss": 3.6178,
+      "step": 39
+    },
+    {
+      "epoch": 0.007527286413248024,
+      "grad_norm": 1.737025499343872,
+      "learning_rate": 9.946716840375551e-05,
+      "loss": 3.6084,
+      "step": 40
+    },
+    {
+      "epoch": 0.0077154685735792245,
+      "grad_norm": 1.6381107568740845,
+      "learning_rate": 9.94126603311153e-05,
+      "loss": 3.4131,
+      "step": 41
+    },
+    {
+      "epoch": 0.007903650733910425,
+      "grad_norm": 1.668062686920166,
+      "learning_rate": 9.935551471796358e-05,
+      "loss": 3.4251,
+      "step": 42
+    },
+    {
+      "epoch": 0.008091832894241625,
+      "grad_norm": 1.6551685333251953,
+      "learning_rate": 9.92957346146091e-05,
+      "loss": 3.4862,
+      "step": 43
+    },
+    {
+      "epoch": 0.008280015054572827,
+      "grad_norm": 1.5389114618301392,
+      "learning_rate": 9.923332321198395e-05,
+      "loss": 3.3558,
+      "step": 44
+    },
+    {
+      "epoch": 0.008468197214904027,
+      "grad_norm": 1.6278464794158936,
+      "learning_rate": 9.916828384147331e-05,
+      "loss": 3.6091,
+      "step": 45
+    },
+    {
+      "epoch": 0.008656379375235227,
+      "grad_norm": 1.642901062965393,
+      "learning_rate": 9.910061997473752e-05,
+      "loss": 3.3503,
+      "step": 46
+    },
+    {
+      "epoch": 0.008844561535566429,
+      "grad_norm": 2.450819730758667,
+      "learning_rate": 9.903033522352687e-05,
+      "loss": 3.7451,
+      "step": 47
+    },
+    {
+      "epoch": 0.009032743695897629,
+      "grad_norm": 1.6337559223175049,
+      "learning_rate": 9.895743333948874e-05,
+      "loss": 3.7759,
+      "step": 48
+    },
+    {
+      "epoch": 0.009220925856228829,
+      "grad_norm": 2.088834285736084,
+      "learning_rate": 9.888191821396744e-05,
+      "loss": 3.8653,
+      "step": 49
+    },
+    {
+      "epoch": 0.00940910801656003,
+      "grad_norm": 2.713131904602051,
+      "learning_rate": 9.880379387779637e-05,
+      "loss": 3.468,
+      "step": 50
+    },
+    {
+      "epoch": 0.00959729017689123,
+      "grad_norm": 1.8180787563323975,
+      "learning_rate": 9.872306450108292e-05,
+      "loss": 3.5826,
+      "step": 51
+    },
+    {
+      "epoch": 0.00978547233722243,
+      "grad_norm": 1.6677582263946533,
+      "learning_rate": 9.863973439298597e-05,
+      "loss": 3.7982,
+      "step": 52
+    },
+    {
+      "epoch": 0.009973654497553632,
+      "grad_norm": 1.569620966911316,
+      "learning_rate": 9.855380800148572e-05,
+      "loss": 3.7162,
+      "step": 53
+    },
+    {
+      "epoch": 0.010161836657884832,
+      "grad_norm": 1.6128700971603394,
+      "learning_rate": 9.846528991314639e-05,
+      "loss": 3.5721,
+      "step": 54
+    },
+    {
+      "epoch": 0.010350018818216034,
+      "grad_norm": 1.8484185934066772,
+      "learning_rate": 9.837418485287127e-05,
+      "loss": 3.9133,
+      "step": 55
+    },
+    {
+      "epoch": 0.010538200978547234,
+      "grad_norm": 1.706660270690918,
+      "learning_rate": 9.828049768365068e-05,
+      "loss": 3.7007,
+      "step": 56
+    },
+    {
+      "epoch": 0.010726383138878434,
+      "grad_norm": 1.7316683530807495,
+      "learning_rate": 9.818423340630228e-05,
+      "loss": 3.7366,
+      "step": 57
+    },
+    {
+      "epoch": 0.010914565299209636,
+      "grad_norm": 1.5684834718704224,
+      "learning_rate": 9.808539715920414e-05,
+      "loss": 3.6784,
+      "step": 58
+    },
+    {
+      "epoch": 0.011102747459540836,
+      "grad_norm": 1.5667593479156494,
+      "learning_rate": 9.798399421802056e-05,
+      "loss": 3.7003,
+      "step": 59
+    },
+    {
+      "epoch": 0.011290929619872036,
+      "grad_norm": 1.526808500289917,
+      "learning_rate": 9.78800299954203e-05,
+      "loss": 3.7078,
+      "step": 60
+    },
+    {
+      "epoch": 0.011479111780203237,
+      "grad_norm": 2.5877087116241455,
+      "learning_rate": 9.777351004078783e-05,
+      "loss": 3.6875,
+      "step": 61
+    },
+    {
+      "epoch": 0.011667293940534437,
+      "grad_norm": 1.545409917831421,
+      "learning_rate": 9.766444003992703e-05,
+      "loss": 3.8299,
+      "step": 62
+    },
+    {
+      "epoch": 0.011855476100865637,
+      "grad_norm": 1.5253973007202148,
+      "learning_rate": 9.755282581475769e-05,
+      "loss": 3.6405,
+      "step": 63
+    },
+    {
+      "epoch": 0.012043658261196839,
+      "grad_norm": 1.8160947561264038,
+      "learning_rate": 9.743867332300478e-05,
+      "loss": 3.7575,
+      "step": 64
+    },
+    {
+      "epoch": 0.012231840421528039,
+      "grad_norm": 1.62673819065094,
+      "learning_rate": 9.732198865788047e-05,
+      "loss": 3.736,
+      "step": 65
+    },
+    {
+      "epoch": 0.012420022581859239,
+      "grad_norm": 1.609445571899414,
+      "learning_rate": 9.72027780477588e-05,
+      "loss": 3.5029,
+      "step": 66
+    },
+    {
+      "epoch": 0.01260820474219044,
+      "grad_norm": 1.640405535697937,
+      "learning_rate": 9.708104785584323e-05,
+      "loss": 3.6871,
+      "step": 67
+    },
+    {
+      "epoch": 0.01279638690252164,
+      "grad_norm": 1.546713948249817,
+      "learning_rate": 9.695680457982713e-05,
+      "loss": 3.5635,
+      "step": 68
+    },
+    {
+      "epoch": 0.012984569062852842,
+      "grad_norm": 1.525071620941162,
+      "learning_rate": 9.683005485154677e-05,
+      "loss": 3.6238,
+      "step": 69
+    },
+    {
+      "epoch": 0.013172751223184042,
+      "grad_norm": 1.5092836618423462,
+      "learning_rate": 9.67008054366274e-05,
+      "loss": 3.6092,
+      "step": 70
+    },
+    {
+      "epoch": 0.013360933383515242,
+      "grad_norm": 1.6913105249404907,
+      "learning_rate": 9.656906323412217e-05,
+      "loss": 3.6286,
+      "step": 71
+    },
+    {
+      "epoch": 0.013549115543846444,
+      "grad_norm": 2.151688814163208,
+      "learning_rate": 9.643483527614372e-05,
+      "loss": 3.5817,
+      "step": 72
+    },
+    {
+      "epoch": 0.013737297704177644,
+      "grad_norm": 1.5763871669769287,
+      "learning_rate": 9.629812872748901e-05,
+      "loss": 3.5043,
+      "step": 73
+    },
+    {
+      "epoch": 0.013925479864508844,
+      "grad_norm": 1.5946530103683472,
+      "learning_rate": 9.615895088525677e-05,
+      "loss": 3.5902,
+      "step": 74
+    },
+    {
+      "epoch": 0.014113662024840046,
+      "grad_norm": 1.5125662088394165,
+      "learning_rate": 9.601730917845797e-05,
+      "loss": 3.5002,
+      "step": 75
+    },
+    {
+      "epoch": 0.014301844185171246,
+      "grad_norm": 1.952149510383606,
+      "learning_rate": 9.587321116761938e-05,
+      "loss": 3.5881,
+      "step": 76
+    },
+    {
+      "epoch": 0.014490026345502446,
+      "grad_norm": 1.5184606313705444,
+      "learning_rate": 9.57266645443799e-05,
+      "loss": 3.3213,
+      "step": 77
+    },
+    {
+      "epoch": 0.014678208505833647,
+      "grad_norm": 1.5643829107284546,
+      "learning_rate": 9.557767713108009e-05,
+      "loss": 3.7112,
+      "step": 78
+    },
+    {
+      "epoch": 0.014866390666164847,
+      "grad_norm": 1.5019704103469849,
+      "learning_rate": 9.542625688034449e-05,
+      "loss": 3.4665,
+      "step": 79
+    },
+    {
+      "epoch": 0.015054572826496047,
+      "grad_norm": 1.647053837776184,
+      "learning_rate": 9.527241187465734e-05,
+      "loss": 3.33,
+      "step": 80
+    },
+    {
+      "epoch": 0.015242754986827249,
+      "grad_norm": 1.527642846107483,
+      "learning_rate": 9.511615032593096e-05,
+      "loss": 3.6355,
+      "step": 81
+    },
+    {
+      "epoch": 0.015430937147158449,
+      "grad_norm": 1.6316813230514526,
+      "learning_rate": 9.49574805750675e-05,
+      "loss": 3.5475,
+      "step": 82
+    },
+    {
+      "epoch": 0.01561911930748965,
+      "grad_norm": 1.6150362491607666,
+      "learning_rate": 9.479641109151373e-05,
+      "loss": 3.5272,
+      "step": 83
+    },
+    {
+      "epoch": 0.01580730146782085,
+      "grad_norm": 1.5907738208770752,
+      "learning_rate": 9.463295047280891e-05,
+      "loss": 3.6044,
+      "step": 84
+    },
+    {
+      "epoch": 0.015995483628152053,
+      "grad_norm": 1.5602425336837769,
+      "learning_rate": 9.446710744412595e-05,
+      "loss": 3.4846,
+      "step": 85
+    },
+    {
+      "epoch": 0.01618366578848325,
+      "grad_norm": 1.5599199533462524,
+      "learning_rate": 9.429889085780557e-05,
+      "loss": 3.598,
+      "step": 86
+    },
+    {
+      "epoch": 0.016371847948814452,
+      "grad_norm": 1.4632290601730347,
+      "learning_rate": 9.41283096928839e-05,
+      "loss": 3.3955,
+      "step": 87
+    },
+    {
+      "epoch": 0.016560030109145654,
+      "grad_norm": 1.5269198417663574,
+      "learning_rate": 9.395537305461311e-05,
+      "loss": 3.5423,
+      "step": 88
+    },
+    {
+      "epoch": 0.016748212269476852,
+      "grad_norm": 1.6142536401748657,
+      "learning_rate": 9.378009017397542e-05,
+      "loss": 3.5366,
+      "step": 89
+    },
+    {
+      "epoch": 0.016936394429808054,
+      "grad_norm": 1.4717563390731812,
+      "learning_rate": 9.360247040719039e-05,
+      "loss": 3.4092,
+      "step": 90
+    },
+    {
+      "epoch": 0.017124576590139256,
+      "grad_norm": 2.3025355339050293,
+      "learning_rate": 9.342252323521545e-05,
+      "loss": 3.441,
+      "step": 91
+    },
+    {
+      "epoch": 0.017312758750470454,
+      "grad_norm": 1.6364383697509766,
+      "learning_rate": 9.324025826323994e-05,
+      "loss": 3.4081,
+      "step": 92
+    },
+    {
+      "epoch": 0.017500940910801656,
+      "grad_norm": 1.6637694835662842,
+      "learning_rate": 9.305568522017227e-05,
+      "loss": 3.4648,
+      "step": 93
+    },
+    {
+      "epoch": 0.017689123071132858,
+      "grad_norm": 1.5033453702926636,
+      "learning_rate": 9.286881395812066e-05,
+      "loss": 3.4444,
+      "step": 94
+    },
+    {
+      "epoch": 0.017877305231464056,
+      "grad_norm": 1.7990686893463135,
+      "learning_rate": 9.267965445186733e-05,
+      "loss": 3.4532,
+      "step": 95
+    },
+    {
+      "epoch": 0.018065487391795258,
+      "grad_norm": 1.5659180879592896,
+      "learning_rate": 9.248821679833596e-05,
+      "loss": 3.444,
+      "step": 96
+    },
+    {
+      "epoch": 0.01825366955212646,
+      "grad_norm": 1.495245337486267,
+      "learning_rate": 9.229451121605279e-05,
+      "loss": 3.5251,
+      "step": 97
+    },
+    {
+      "epoch": 0.018441851712457658,
+      "grad_norm": 1.6818476915359497,
+      "learning_rate": 9.209854804460121e-05,
+      "loss": 3.323,
+      "step": 98
+    },
+    {
+      "epoch": 0.01863003387278886,
+      "grad_norm": 1.9330247640609741,
+      "learning_rate": 9.190033774406977e-05,
+      "loss": 3.4345,
+      "step": 99
+    },
+    {
+      "epoch": 0.01881821603312006,
+      "grad_norm": 2.6322007179260254,
+      "learning_rate": 9.16998908944939e-05,
+      "loss": 3.2795,
+      "step": 100
+    },
+    {
+      "epoch": 0.01900639819345126,
+      "grad_norm": 1.6533123254776,
+      "learning_rate": 9.149721819529119e-05,
+      "loss": 3.6009,
+      "step": 101
+    },
+    {
+      "epoch": 0.01919458035378246,
+      "grad_norm": 1.739073395729065,
+      "learning_rate": 9.129233046469022e-05,
+      "loss": 3.8017,
+      "step": 102
+    },
+    {
+      "epoch": 0.019382762514113663,
+      "grad_norm": 1.728940486907959,
+      "learning_rate": 9.108523863915314e-05,
+      "loss": 3.7084,
+      "step": 103
+    },
+    {
+      "epoch": 0.01957094467444486,
+      "grad_norm": 1.6274477243423462,
+      "learning_rate": 9.087595377279192e-05,
+      "loss": 3.5057,
+      "step": 104
+    },
+    {
+      "epoch": 0.019759126834776063,
+      "grad_norm": 1.6262691020965576,
+      "learning_rate": 9.066448703677828e-05,
+      "loss": 3.6988,
+      "step": 105
+    },
+    {
+      "epoch": 0.019947308995107264,
+      "grad_norm": 4.310033321380615,
+      "learning_rate": 9.045084971874738e-05,
+      "loss": 3.7893,
+      "step": 106
+    },
+    {
+      "epoch": 0.020135491155438466,
+      "grad_norm": 6.6696977615356445,
+      "learning_rate": 9.023505322219536e-05,
+      "loss": 3.5515,
+      "step": 107
+    },
+    {
+      "epoch": 0.020323673315769664,
+      "grad_norm": 1.6663572788238525,
+      "learning_rate": 9.001710906587064e-05,
+      "loss": 3.4112,
+      "step": 108
+    },
+    {
+      "epoch": 0.020511855476100866,
+      "grad_norm": 1.6214839220046997,
+      "learning_rate": 8.9797028883159e-05,
+      "loss": 3.3542,
+      "step": 109
+    },
+    {
+      "epoch": 0.020700037636432068,
+      "grad_norm": 1.540847897529602,
+      "learning_rate": 8.957482442146272e-05,
+      "loss": 3.6587,
+      "step": 110
+    },
+    {
+      "epoch": 0.020888219796763266,
+      "grad_norm": 1.5367134809494019,
+      "learning_rate": 8.935050754157344e-05,
+      "loss": 3.5199,
+      "step": 111
+    },
+    {
+      "epoch": 0.021076401957094468,
+      "grad_norm": 1.7050776481628418,
+      "learning_rate": 8.912409021703913e-05,
+      "loss": 3.6533,
+      "step": 112
+    },
+    {
+      "epoch": 0.02126458411742567,
+      "grad_norm": 1.5632902383804321,
+      "learning_rate": 8.889558453352492e-05,
+      "loss": 3.5123,
+      "step": 113
+    },
+    {
+      "epoch": 0.021452766277756868,
+      "grad_norm": 1.5934125185012817,
+      "learning_rate": 8.866500268816803e-05,
+      "loss": 3.5423,
+      "step": 114
+    },
+    {
+      "epoch": 0.02164094843808807,
+      "grad_norm": 1.5730620622634888,
+      "learning_rate": 8.84323569889266e-05,
+      "loss": 3.3918,
+      "step": 115
+    },
+    {
+      "epoch": 0.02182913059841927,
+      "grad_norm": 1.5384756326675415,
+      "learning_rate": 8.819765985392296e-05,
+      "loss": 3.5834,
+      "step": 116
+    },
+    {
+      "epoch": 0.02201731275875047,
+      "grad_norm": 1.4878556728363037,
+      "learning_rate": 8.79609238107805e-05,
+      "loss": 3.5273,
+      "step": 117
+    },
+    {
+      "epoch": 0.02220549491908167,
+      "grad_norm": 1.8022403717041016,
+      "learning_rate": 8.772216149595513e-05,
+      "loss": 3.6865,
+      "step": 118
+    },
+    {
+      "epoch": 0.022393677079412873,
+      "grad_norm": 1.6074386835098267,
+      "learning_rate": 8.748138565406081e-05,
+      "loss": 3.35,
+      "step": 119
+    },
+    {
+      "epoch": 0.02258185923974407,
+      "grad_norm": 5.877849102020264,
+      "learning_rate": 8.72386091371891e-05,
+      "loss": 3.5387,
+      "step": 120
+    },
+    {
+      "epoch": 0.022770041400075273,
+      "grad_norm": 1.6074655055999756,
+      "learning_rate": 8.699384490422331e-05,
+      "loss": 3.4726,
+      "step": 121
+    },
+    {
+      "epoch": 0.022958223560406475,
+      "grad_norm": 1.5693819522857666,
+      "learning_rate": 8.674710602014671e-05,
+      "loss": 3.4279,
+      "step": 122
+    },
+    {
+      "epoch": 0.023146405720737673,
+      "grad_norm": 1.5131280422210693,
+      "learning_rate": 8.649840565534513e-05,
+      "loss": 3.4739,
+      "step": 123
+    },
+    {
+      "epoch": 0.023334587881068874,
+      "grad_norm": 1.5035758018493652,
+      "learning_rate": 8.624775708490402e-05,
+      "loss": 3.4492,
+      "step": 124
+    },
+    {
+      "epoch": 0.023522770041400076,
+      "grad_norm": 1.6009769439697266,
+      "learning_rate": 8.59951736878998e-05,
+      "loss": 3.6759,
+      "step": 125
+    },
+    {
+      "epoch": 0.023710952201731274,
+      "grad_norm": 1.539108395576477,
+      "learning_rate": 8.574066894668573e-05,
+      "loss": 3.5554,
+      "step": 126
+    },
+    {
+      "epoch": 0.023899134362062476,
+      "grad_norm": 1.8990849256515503,
+      "learning_rate": 8.548425644617224e-05,
+      "loss": 3.4451,
+      "step": 127
+    },
+    {
+      "epoch": 0.024087316522393678,
+      "grad_norm": 1.5071018934249878,
+      "learning_rate": 8.522594987310184e-05,
+      "loss": 3.4815,
+      "step": 128
+    },
+    {
+      "epoch": 0.024275498682724876,
+      "grad_norm": 1.5156155824661255,
+      "learning_rate": 8.49657630153185e-05,
+      "loss": 3.526,
+      "step": 129
+    },
+    {
+      "epoch": 0.024463680843056078,
+      "grad_norm": 1.4841188192367554,
+      "learning_rate": 8.47037097610317e-05,
+      "loss": 3.4503,
+      "step": 130
+    },
+    {
+      "epoch": 0.02465186300338728,
+      "grad_norm": 1.8373347520828247,
+      "learning_rate": 8.443980409807512e-05,
+      "loss": 3.4492,
+      "step": 131
+    },
+    {
+      "epoch": 0.024840045163718478,
+      "grad_norm": 1.9632869958877563,
+      "learning_rate": 8.417406011315998e-05,
+      "loss": 3.5533,
+      "step": 132
+    },
+    {
+      "epoch": 0.02502822732404968,
+      "grad_norm": 1.525312900543213,
+      "learning_rate": 8.390649199112315e-05,
+      "loss": 3.5785,
+      "step": 133
+    },
+    {
+      "epoch": 0.02521640948438088,
+      "grad_norm": 1.7048442363739014,
+      "learning_rate": 8.363711401417e-05,
+      "loss": 3.5644,
+      "step": 134
+    },
+    {
+      "epoch": 0.025404591644712083,
+      "grad_norm": 1.4897756576538086,
+      "learning_rate": 8.336594056111197e-05,
+      "loss": 3.4262,
+      "step": 135
+    },
+    {
+      "epoch": 0.02559277380504328,
+      "grad_norm": 1.5408360958099365,
+      "learning_rate": 8.309298610659916e-05,
+      "loss": 3.5162,
+      "step": 136
+    },
+    {
+      "epoch": 0.025780955965374483,
+      "grad_norm": 1.613747239112854,
+      "learning_rate": 8.281826522034764e-05,
+      "loss": 3.6368,
+      "step": 137
+    },
+    {
+      "epoch": 0.025969138125705685,
+      "grad_norm": 1.5057644844055176,
+      "learning_rate": 8.254179256636179e-05,
+      "loss": 3.4841,
+      "step": 138
+    },
+    {
+      "epoch": 0.026157320286036883,
+      "grad_norm": 1.5167312622070312,
+      "learning_rate": 8.226358290215151e-05,
+      "loss": 3.3391,
+      "step": 139
+    },
+    {
+      "epoch": 0.026345502446368085,
+      "grad_norm": 1.5216038227081299,
+      "learning_rate": 8.198365107794457e-05,
+      "loss": 3.3654,
+      "step": 140
+    },
+    {
+      "epoch": 0.026533684606699286,
+      "grad_norm": 1.5505784749984741,
+      "learning_rate": 8.17020120358939e-05,
+      "loss": 3.461,
+      "step": 141
+    },
+    {
+      "epoch": 0.026721866767030485,
+      "grad_norm": 3.795452833175659,
+      "learning_rate": 8.141868080927996e-05,
+      "loss": 3.4515,
+      "step": 142
+    },
+    {
+      "epoch": 0.026910048927361686,
+      "grad_norm": 1.4928878545761108,
+      "learning_rate": 8.113367252170844e-05,
+      "loss": 3.4339,
+      "step": 143
+    },
+    {
+      "epoch": 0.027098231087692888,
+      "grad_norm": 1.5022090673446655,
+      "learning_rate": 8.084700238630283e-05,
+      "loss": 3.3639,
+      "step": 144
+    },
+    {
+      "epoch": 0.027286413248024086,
+      "grad_norm": 1.4917182922363281,
+      "learning_rate": 8.055868570489247e-05,
+      "loss": 3.2841,
+      "step": 145
+    },
+    {
+      "epoch": 0.027474595408355288,
+      "grad_norm": 1.5091995000839233,
+      "learning_rate": 8.026873786719573e-05,
+      "loss": 3.4067,
+      "step": 146
+    },
+    {
+      "epoch": 0.02766277756868649,
+      "grad_norm": 1.455728530883789,
+      "learning_rate": 7.997717434999861e-05,
+      "loss": 3.2996,
+      "step": 147
+    },
+    {
+      "epoch": 0.027850959729017688,
+      "grad_norm": 1.852761149406433,
+      "learning_rate": 7.968401071632855e-05,
+      "loss": 3.4227,
+      "step": 148
+    },
+    {
+      "epoch": 0.02803914188934889,
+      "grad_norm": 1.8084666728973389,
+      "learning_rate": 7.938926261462366e-05,
+      "loss": 3.4068,
+      "step": 149
+    },
+    {
+      "epoch": 0.02822732404968009,
+      "grad_norm": 2.763946056365967,
+      "learning_rate": 7.909294577789766e-05,
+      "loss": 3.1156,
+      "step": 150
+    },
+    {
+      "epoch": 0.02822732404968009,
+      "eval_loss": 0.8662680983543396,
+      "eval_runtime": 982.7503,
+      "eval_samples_per_second": 9.107,
+      "eval_steps_per_second": 2.277,
+      "step": 150
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 450,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 150,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 2,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.4262434839527424e+17,
+  "train_batch_size": 8,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5cf951f87e4a8617e150bca2f14b321b0fa027f5107931992665eb1d65a9cdaa
+size 6840