bxw315-umd committed on May 1

Commit a96db1c (verified)

1 Parent(s): dfbb238

Upload folder using huggingface_hub

Files changed (18)

.gitattributes +1 -0
README.md +202 -0
adapter_config.json +31 -0
adapter_model.safetensors +3 -0
added_tokens.json +16 -0
merges.txt +0 -0
optimizer.pt +3 -0
rng_state_0.pth +3 -0
rng_state_1.pth +3 -0
rng_state_2.pth +3 -0
rng_state_3.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +145 -0
trainer_state.json +1133 -0
training_args.bin +3 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2-VL-2B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2

adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-VL-2B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj",
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
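
The `target_modules` value above is a single string, which PEFT documents as a regex matched against module names rather than a list of suffixes. The sketch below shows which names such a pattern would select, assuming the whole module name must match. The candidate names are hypothetical examples, not a dump of the base model's modules.

```python
import json
import re

# The pattern exactly as it appears in adapter_config.json (JSON-escaped backslash).
raw_json = r'{"target_modules": "model.layers.[\\d]+.(mlp|cross_attn|self_attn).(up|down|gate|q|k|v|o)_proj"}'
pattern = json.loads(raw_json)["target_modules"]

# Hypothetical module names, chosen only to illustrate the pattern.
candidates = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.27.mlp.down_proj",
    "model.layers.3.self_attn.rotary_emb",
    "visual.blocks.0.attn.qkv",
    "lm_head",
]


def is_targeted(name: str) -> bool:
    # Assumption: a string-valued target_modules must match the full module name.
    return re.fullmatch(pattern, name) is not None


matches = {name: is_targeted(name) for name in candidates}
for name, hit in matches.items():
    print(f"{name}: {'LoRA applied' if hit else 'left frozen'}")
```

Under that assumption, only the attention and MLP projections inside `model.layers.N` receive LoRA weights; names outside that prefix, such as a vision tower or `lm_head`, are not selected.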

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1611f7e57b0e71d4fefa595d0bdf8acb826c61cfbefd87b417b3b184ca677e13
+size 147770496
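
As a rough sanity check, the LFS size above can be compared with the number of LoRA parameters implied by `r = 32` in `adapter_config.json`. The sketch below is an estimate only: the base-model dimensions (hidden size 1536, MLP size 8960, 28 decoder layers, 256-wide key/value projections) and fp32 storage are assumptions and are not stated anywhere in this commit.

```python
# Assumed base-model dimensions (not stated in this commit).
HIDDEN = 1536
INTERMEDIATE = 8960
KV_DIM = 256          # assumed: 2 key/value heads x 128 head dim
NUM_LAYERS = 28
RANK = 32             # "r" from adapter_config.json

# (in_features, out_features) for each projection named in target_modules.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int = RANK) -> int:
    # LoRA adds A (r x in_features) and B (out_features x r) per adapted layer.
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in shapes.values())
total_params = per_layer * NUM_LAYERS
fp32_bytes = total_params * 4          # assumed: weights stored as float32
lfs_size = 147770496                   # size from the LFS pointer above
remainder = lfs_size - fp32_bytes      # would be the safetensors header

print(total_params, fp32_bytes, remainder)
```

Under these assumptions the estimate is 36,929,536 parameters, or 147,718,144 bytes, which leaves 52,352 bytes. That remainder is a plausible size for a safetensors header, so the file size is consistent with an fp32 rank-32 adapter over all seven projections.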

added_tokens.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

merges.txt ADDED

The diff for this file is too large to render. See raw diff

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d83ebe95f3b4704d826ceb22f2a818928b4e2bf5ec4c5cc42b36467ce5a51506
+size 75471860

rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f468eb3bda29235a22e50a1d4a93f0e73d6e9156f3c92e06cc5b7020cd46bbb
+size 15024

rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9742577ecbe5367851a3be372592dbb9dbb3b2dabf655c60e5813491c5f50d9
+size 15024

rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35cd17226cf768d6137d8b72e50c382bb6f3d514c5a45a8e5a0b33ef7e9aa7a6
+size 15024

rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:14c110ee99d870159422e4b5c03e9aab5e722d210c9c33de767b837376711e45
+size 15024

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e0738320863223ebbb3380cfe5a102839731d45856d40da34875ee4eb29904cd
+size 1064

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:88a3a6fcb80132f76da8aa40cdc3fccd7e5d8468ef15421f5b0c2715e85217d2
+size 11420538

tokenizer_config.json ADDED

	@@ -0,0 +1,145 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "left",
+  "processor_class": "Qwen2VLProcessor",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
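
The `chat_template` above is dense Jinja, so a plain-Python mirror of its branches may make the resulting prompt easier to see. This is an illustrative re-implementation written from the template text, not the code the library runs; in practice the tokenizer or processor renders the template itself, and `render_chat` is a hypothetical helper name.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"
VISION = "<|vision_start|><|{kind}_pad|><|vision_end|>"


def render_chat(messages, add_generation_prompt=False, add_vision_id=False):
    """Mirror the branches of the Jinja chat_template in plain Python."""
    out = []
    image_count = video_count = 0
    for i, message in enumerate(messages):
        # A default system turn is injected when the first message is not a system message.
        if i == 0 and message["role"] != "system":
            out.append(f"{IM_START}system\nYou are a helpful assistant.{IM_END}\n")
        out.append(f"{IM_START}{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(f"{content}{IM_END}\n")
            continue
        for part in content:
            if part.get("type") == "image" or "image" in part or "image_url" in part:
                image_count += 1
                if add_vision_id:
                    out.append(f"Picture {image_count}: ")
                out.append(VISION.format(kind="image"))
            elif part.get("type") == "video" or "video" in part:
                video_count += 1
                if add_vision_id:
                    out.append(f"Video {video_count}: ")
                out.append(VISION.format(kind="video"))
            elif "text" in part:
                out.append(part["text"])
        out.append(f"{IM_END}\n")
    if add_generation_prompt:
        out.append(f"{IM_START}assistant\n")
    return "".join(out)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = render_chat(messages, add_generation_prompt=True)
print(prompt)
```

Each image or video contributes a single `<|image_pad|>` or `<|video_pad|>` placeholder at this stage; any expansion of that placeholder to the real number of vision tokens happens outside the template.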

trainer_state.json ADDED

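The `learning_rate` values logged in this file's `log_history` are consistent with a linear warmup over 15 steps to a peak of 2e-4, followed by cosine decay to zero at step 157 (the `global_step` recorded for the epoch). The schedule parameters below are inferred from the logged numbers, not read from `training_args.bin`, so treat them as assumptions; `logged_lr` is a hypothetical helper.

```python
import math

# Inferred from the logged values; not stated explicitly in this commit.
PEAK_LR = 2e-4
WARMUP_STEPS = 15
TOTAL_STEPS = 157


def logged_lr(step: int) -> float:
    """Learning rate reported at a given logged step (1-based).

    The value logged at step N corresponds to scheduler step N - 1.
    """
    s = step - 1
    if s < WARMUP_STEPS:
        # Linear warmup from 0 to PEAK_LR.
        return PEAK_LR * s / WARMUP_STEPS
    # Half-cosine decay from PEAK_LR to 0 over the remaining steps.
    progress = (s - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (1, 2, 16, 17, 87, 107):
    print(step, logged_lr(step))
```

Comparing these values with the corresponding `log_history` entries is a quick way to confirm the schedule before resuming from `scheduler.pt` and `optimizer.pt`.
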
	@@ -0,0 +1,1133 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 157,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.006369426751592357,
+      "grad_norm": 0.4737717807292938,
+      "learning_rate": 0.0,
+      "loss": 2.4387,
+      "step": 1
+    },
+    {
+      "epoch": 0.012738853503184714,
+      "grad_norm": 0.4696364402770996,
+      "learning_rate": 1.3333333333333333e-05,
+      "loss": 2.4216,
+      "step": 2
+    },
+    {
+      "epoch": 0.01910828025477707,
+      "grad_norm": 0.46825963258743286,
+      "learning_rate": 2.6666666666666667e-05,
+      "loss": 2.4473,
+      "step": 3
+    },
+    {
+      "epoch": 0.025477707006369428,
+      "grad_norm": 0.4676540195941925,
+      "learning_rate": 4e-05,
+      "loss": 2.4151,
+      "step": 4
+    },
+    {
+      "epoch": 0.03184713375796178,
+      "grad_norm": 0.4555565416812897,
+      "learning_rate": 5.333333333333333e-05,
+      "loss": 2.3768,
+      "step": 5
+    },
+    {
+      "epoch": 0.03821656050955414,
+      "grad_norm": 0.4768979549407959,
+      "learning_rate": 6.666666666666667e-05,
+      "loss": 2.3961,
+      "step": 6
+    },
+    {
+      "epoch": 0.044585987261146494,
+      "grad_norm": 0.4434276521205902,
+      "learning_rate": 8e-05,
+      "loss": 2.3393,
+      "step": 7
+    },
+    {
+      "epoch": 0.050955414012738856,
+      "grad_norm": 0.41826605796813965,
+      "learning_rate": 9.333333333333334e-05,
+      "loss": 2.2803,
+      "step": 8
+    },
+    {
+      "epoch": 0.05732484076433121,
+      "grad_norm": 0.38292792439460754,
+      "learning_rate": 0.00010666666666666667,
+      "loss": 2.2154,
+      "step": 9
+    },
+    {
+      "epoch": 0.06369426751592357,
+      "grad_norm": 0.35984838008880615,
+      "learning_rate": 0.00012,
+      "loss": 2.1313,
+      "step": 10
+    },
+    {
+      "epoch": 0.07006369426751592,
+      "grad_norm": 0.36338669061660767,
+      "learning_rate": 0.00013333333333333334,
+      "loss": 2.0336,
+      "step": 11
+    },
+    {
+      "epoch": 0.07643312101910828,
+      "grad_norm": 0.32774657011032104,
+      "learning_rate": 0.00014666666666666666,
+      "loss": 1.9673,
+      "step": 12
+    },
+    {
+      "epoch": 0.08280254777070063,
+      "grad_norm": 0.30120769143104553,
+      "learning_rate": 0.00016,
+      "loss": 1.9096,
+      "step": 13
+    },
+    {
+      "epoch": 0.08917197452229299,
+      "grad_norm": 0.3341425359249115,
+      "learning_rate": 0.00017333333333333334,
+      "loss": 1.8116,
+      "step": 14
+    },
+    {
+      "epoch": 0.09554140127388536,
+      "grad_norm": 0.3417530357837677,
+      "learning_rate": 0.0001866666666666667,
+      "loss": 1.7509,
+      "step": 15
+    },
+    {
+      "epoch": 0.10191082802547771,
+      "grad_norm": 0.3540063798427582,
+      "learning_rate": 0.0002,
+      "loss": 1.6482,
+      "step": 16
+    },
+    {
+      "epoch": 0.10828025477707007,
+      "grad_norm": 0.46677955985069275,
+      "learning_rate": 0.00019997552766852432,
+      "loss": 1.5641,
+      "step": 17
+    },
+    {
+      "epoch": 0.11464968152866242,
+      "grad_norm": 0.4193592369556427,
+      "learning_rate": 0.00019990212265199738,
+      "loss": 1.5238,
+      "step": 18
+    },
+    {
+      "epoch": 0.12101910828025478,
+      "grad_norm": 0.3629758358001709,
+      "learning_rate": 0.00019977982087825713,
+      "loss": 1.393,
+      "step": 19
+    },
+    {
+      "epoch": 0.12738853503184713,
+      "grad_norm": 0.37026986479759216,
+      "learning_rate": 0.00019960868220749448,
+      "loss": 1.3252,
+      "step": 20
+    },
+    {
+      "epoch": 0.1337579617834395,
+      "grad_norm": 0.44961684942245483,
+      "learning_rate": 0.00019938879040295508,
+      "loss": 1.2826,
+      "step": 21
+    },
+    {
+      "epoch": 0.14012738853503184,
+      "grad_norm": 0.31476661562919617,
+      "learning_rate": 0.00019912025308994148,
+      "loss": 1.2013,
+      "step": 22
+    },
+    {
+      "epoch": 0.1464968152866242,
+      "grad_norm": 0.25825104117393494,
+      "learning_rate": 0.0001988032017031364,
+      "loss": 1.1777,
+      "step": 23
+    },
+    {
+      "epoch": 0.15286624203821655,
+      "grad_norm": 0.230068638920784,
+      "learning_rate": 0.00019843779142227256,
+      "loss": 1.1182,
+      "step": 24
+    },
+    {
+      "epoch": 0.1592356687898089,
+      "grad_norm": 0.21921853721141815,
+      "learning_rate": 0.0001980242010961803,
+      "loss": 1.103,
+      "step": 25
+    },
+    {
+      "epoch": 0.16560509554140126,
+      "grad_norm": 0.2058652639389038,
+      "learning_rate": 0.0001975626331552507,
+      "loss": 1.1107,
+      "step": 26
+    },
+    {
+      "epoch": 0.17197452229299362,
+      "grad_norm": 0.19609420001506805,
+      "learning_rate": 0.00019705331351235674,
+      "loss": 1.0931,
+      "step": 27
+    },
+    {
+      "epoch": 0.17834394904458598,
+      "grad_norm": 0.20052394270896912,
+      "learning_rate": 0.00019649649145228102,
+      "loss": 1.0617,
+      "step": 28
+    },
+    {
+      "epoch": 0.18471337579617833,
+      "grad_norm": 0.20906223356723785,
+      "learning_rate": 0.00019589243950970402,
+      "loss": 1.0567,
+      "step": 29
+    },
+    {
+      "epoch": 0.1910828025477707,
+      "grad_norm": 0.2615731656551361,
+      "learning_rate": 0.00019524145333581317,
+      "loss": 1.0118,
+      "step": 30
+    },
+    {
+      "epoch": 0.19745222929936307,
+      "grad_norm": 0.29617559909820557,
+      "learning_rate": 0.00019454385155359702,
+      "loss": 1.0332,
+      "step": 31
+    },
+    {
+      "epoch": 0.20382165605095542,
+      "grad_norm": 0.15338869392871857,
+      "learning_rate": 0.00019379997560189675,
+      "loss": 1.0087,
+      "step": 32
+    },
+    {
+      "epoch": 0.21019108280254778,
+      "grad_norm": 0.15214228630065918,
+      "learning_rate": 0.00019301018956828964,
+      "loss": 0.9625,
+      "step": 33
+    },
+    {
+      "epoch": 0.21656050955414013,
+      "grad_norm": 0.14021392166614532,
+      "learning_rate": 0.00019217488001088784,
+      "loss": 0.9786,
+      "step": 34
+    },
+    {
+      "epoch": 0.2229299363057325,
+      "grad_norm": 0.13724100589752197,
+      "learning_rate": 0.00019129445576913888,
+      "loss": 0.9617,
+      "step": 35
+    },
+    {
+      "epoch": 0.22929936305732485,
+      "grad_norm": 0.14516188204288483,
+      "learning_rate": 0.0001903693477637204,
+      "loss": 0.9436,
+      "step": 36
+    },
+    {
+      "epoch": 0.2356687898089172,
+      "grad_norm": 0.16043990850448608,
+      "learning_rate": 0.00018940000878562758,
+      "loss": 0.9361,
+      "step": 37
+    },
+    {
+      "epoch": 0.24203821656050956,
+      "grad_norm": 0.16804192960262299,
+      "learning_rate": 0.0001883869132745561,
+      "loss": 0.9318,
+      "step": 38
+    },
+    {
+      "epoch": 0.2484076433121019,
+      "grad_norm": 0.1397908627986908,
+      "learning_rate": 0.00018733055708668926,
+      "loss": 0.9509,
+      "step": 39
+    },
+    {
+      "epoch": 0.25477707006369427,
+      "grad_norm": 0.1381651759147644,
+      "learning_rate": 0.00018623145725200278,
+      "loss": 0.9048,
+      "step": 40
+    },
+    {
+      "epoch": 0.2611464968152866,
+      "grad_norm": 0.14483928680419922,
+      "learning_rate": 0.00018509015172120621,
+      "loss": 0.8873,
+      "step": 41
+    },
+    {
+      "epoch": 0.267515923566879,
+      "grad_norm": 0.13835562765598297,
+      "learning_rate": 0.00018390719910244487,
+      "loss": 0.8715,
+      "step": 42
+    },
+    {
+      "epoch": 0.27388535031847133,
+      "grad_norm": 0.1395685374736786,
+      "learning_rate": 0.00018268317838789088,
+      "loss": 0.9215,
+      "step": 43
+    },
+    {
+      "epoch": 0.2802547770700637,
+      "grad_norm": 0.12839213013648987,
+      "learning_rate": 0.00018141868867035745,
+      "loss": 0.857,
+      "step": 44
+    },
+    {
+      "epoch": 0.28662420382165604,
+      "grad_norm": 0.13072729110717773,
+      "learning_rate": 0.00018011434885007482,
+      "loss": 0.8962,
+      "step": 45
+    },
+    {
+      "epoch": 0.2929936305732484,
+      "grad_norm": 0.12887758016586304,
+      "learning_rate": 0.00017877079733177184,
+      "loss": 0.8788,
+      "step": 46
+    },
+    {
+      "epoch": 0.29936305732484075,
+      "grad_norm": 0.13233087956905365,
+      "learning_rate": 0.00017738869171221068,
+      "loss": 0.8521,
+      "step": 47
+    },
+    {
+      "epoch": 0.3057324840764331,
+      "grad_norm": 0.12415429204702377,
+      "learning_rate": 0.0001759687084583285,
+      "loss": 0.8319,
+      "step": 48
+    },
+    {
+      "epoch": 0.31210191082802546,
+      "grad_norm": 0.13997192680835724,
+      "learning_rate": 0.00017451154257614287,
+      "loss": 0.8506,
+      "step": 49
+    },
+    {
+      "epoch": 0.3184713375796178,
+      "grad_norm": 0.1296936273574829,
+      "learning_rate": 0.00017301790727058345,
+      "loss": 0.8495,
+      "step": 50
+    },
+    {
+      "epoch": 0.3248407643312102,
+      "grad_norm": 0.1260387897491455,
+      "learning_rate": 0.00017148853359641626,
+      "loss": 0.8714,
+      "step": 51
+    },
+    {
+      "epoch": 0.33121019108280253,
+      "grad_norm": 0.14483502507209778,
+      "learning_rate": 0.00016992417010043142,
+      "loss": 0.8284,
+      "step": 52
+    },
+    {
+      "epoch": 0.3375796178343949,
+      "grad_norm": 0.13002006709575653,
+      "learning_rate": 0.00016832558245506935,
+      "loss": 0.8478,
+      "step": 53
+    },
+    {
+      "epoch": 0.34394904458598724,
+      "grad_norm": 0.12996485829353333,
+      "learning_rate": 0.0001666935530836651,
+      "loss": 0.8351,
+      "step": 54
+    },
+    {
+      "epoch": 0.3503184713375796,
+      "grad_norm": 0.13534753024578094,
+      "learning_rate": 0.0001650288807774937,
+      "loss": 0.8092,
+      "step": 55
+    },
+    {
+      "epoch": 0.35668789808917195,
+      "grad_norm": 0.12969057261943817,
+      "learning_rate": 0.0001633323803048047,
+      "loss": 0.8425,
+      "step": 56
+    },
+    {
+      "epoch": 0.3630573248407643,
+      "grad_norm": 0.12496688216924667,
+      "learning_rate": 0.00016160488201203644,
+      "loss": 0.8313,
+      "step": 57
+    },
+    {
+      "epoch": 0.36942675159235666,
+      "grad_norm": 0.12409767508506775,
+      "learning_rate": 0.00015984723141740576,
+      "loss": 0.8226,
+      "step": 58
+    },
+    {
+      "epoch": 0.37579617834394907,
+      "grad_norm": 0.13024461269378662,
+      "learning_rate": 0.0001580602887970721,
+      "loss": 0.8281,
+      "step": 59
+    },
+    {
+      "epoch": 0.3821656050955414,
+      "grad_norm": 0.13077828288078308,
+      "learning_rate": 0.0001562449287640781,
+      "loss": 0.7939,
+      "step": 60
+    },
+    {
+      "epoch": 0.3885350318471338,
+      "grad_norm": 0.13101692497730255,
+      "learning_rate": 0.00015440203984027324,
+      "loss": 0.8085,
+      "step": 61
+    },
+    {
+      "epoch": 0.39490445859872614,
+      "grad_norm": 0.13003262877464294,
+      "learning_rate": 0.00015253252402142988,
+      "loss": 0.8179,
+      "step": 62
+    },
+    {
+      "epoch": 0.4012738853503185,
+      "grad_norm": 0.13072040677070618,
+      "learning_rate": 0.0001506372963357644,
+      "loss": 0.7945,
+      "step": 63
+    },
+    {
+      "epoch": 0.40764331210191085,
+      "grad_norm": 0.11719010770320892,
+      "learning_rate": 0.00014871728439607966,
+      "loss": 0.8202,
+      "step": 64
+    },
+    {
+      "epoch": 0.4140127388535032,
+      "grad_norm": 0.12282049655914307,
+      "learning_rate": 0.00014677342794574817,
+      "loss": 0.7987,
+      "step": 65
+    },
+    {
+      "epoch": 0.42038216560509556,
+      "grad_norm": 0.11916442960500717,
+      "learning_rate": 0.00014480667839875786,
+      "loss": 0.7963,
+      "step": 66
+    },
+    {
+      "epoch": 0.4267515923566879,
+      "grad_norm": 0.13636524975299835,
+      "learning_rate": 0.00014281799837404552,
+      "loss": 0.808,
+      "step": 67
+    },
+    {
+      "epoch": 0.43312101910828027,
+      "grad_norm": 0.12172853946685791,
+      "learning_rate": 0.0001408083612243465,
+      "loss": 0.7889,
+      "step": 68
+    },
+    {
+      "epoch": 0.4394904458598726,
+      "grad_norm": 0.12408412247896194,
+      "learning_rate": 0.00013877875055979023,
+      "loss": 0.8145,
+      "step": 69
+    },
+    {
+      "epoch": 0.445859872611465,
+      "grad_norm": 0.1280149221420288,
+      "learning_rate": 0.00013673015976647568,
+      "loss": 0.796,
+      "step": 70
+    },
+    {
+      "epoch": 0.45222929936305734,
+      "grad_norm": 0.13267850875854492,
+      "learning_rate": 0.00013466359152026195,
+      "loss": 0.7928,
+      "step": 71
+    },
+    {
+      "epoch": 0.4585987261146497,
+      "grad_norm": 0.12811611592769623,
+      "learning_rate": 0.00013258005729601177,
+      "loss": 0.781,
+      "step": 72
+    },
+    {
+      "epoch": 0.46496815286624205,
+      "grad_norm": 0.1286015510559082,
+      "learning_rate": 0.00013048057687252865,
+      "loss": 0.7603,
+      "step": 73
+    },
+    {
+      "epoch": 0.4713375796178344,
+      "grad_norm": 0.14502450823783875,
+      "learning_rate": 0.0001283661778334297,
+      "loss": 0.7862,
+      "step": 74
+    },
+    {
+      "epoch": 0.47770700636942676,
+      "grad_norm": 0.15144243836402893,
+      "learning_rate": 0.0001262378950641979,
+      "loss": 0.8128,
+      "step": 75
+    },
+    {
+      "epoch": 0.4840764331210191,
+      "grad_norm": 0.14334194362163544,
+      "learning_rate": 0.00012409677024566144,
+      "loss": 0.7535,
+      "step": 76
+    },
+    {
+      "epoch": 0.49044585987261147,
+      "grad_norm": 0.13474944233894348,
+      "learning_rate": 0.00012194385134414608,
+      "loss": 0.8058,
+      "step": 77
+    },
+    {
+      "epoch": 0.4968152866242038,
+      "grad_norm": 0.13721239566802979,
+      "learning_rate": 0.00011978019209855174,
+      "loss": 0.7731,
+      "step": 78
+    },
+    {
+      "epoch": 0.5031847133757962,
+      "grad_norm": 0.1446215659379959,
+      "learning_rate": 0.00011760685150460362,
+      "loss": 0.7494,
+      "step": 79
+    },
+    {
+      "epoch": 0.5095541401273885,
+      "grad_norm": 0.14299342036247253,
+      "learning_rate": 0.00011542489329653024,
+      "loss": 0.7749,
+      "step": 80
+    },
+    {
+      "epoch": 0.5159235668789809,
+      "grad_norm": 0.13673968613147736,
+      "learning_rate": 0.00011323538542642227,
+      "loss": 0.7402,
+      "step": 81
+    },
+    {
+      "epoch": 0.5222929936305732,
+      "grad_norm": 0.13603895902633667,
+      "learning_rate": 0.000111039399541527,
+      "loss": 0.7651,
+      "step": 82
+    },
+    {
+      "epoch": 0.5286624203821656,
+      "grad_norm": 0.13204550743103027,
+      "learning_rate": 0.00010883801045973425,
+      "loss": 0.7432,
+      "step": 83
+    },
+    {
+      "epoch": 0.535031847133758,
+      "grad_norm": 0.14662222564220428,
+      "learning_rate": 0.00010663229564351041,
+      "loss": 0.7648,
+      "step": 84
+    },
+    {
+      "epoch": 0.5414012738853503,
+      "grad_norm": 0.14292187988758087,
+      "learning_rate": 0.00010442333467253789,
+      "loss": 0.758,
+      "step": 85
+    },
+    {
+      "epoch": 0.5477707006369427,
+      "grad_norm": 0.13865429162979126,
+      "learning_rate": 0.00010221220871531869,
+      "loss": 0.7427,
+      "step": 86
+    },
+    {
+      "epoch": 0.554140127388535,
+      "grad_norm": 0.13653217256069183,
+      "learning_rate": 0.0001,
+      "loss": 0.7466,
+      "step": 87
+    },
+    {
+      "epoch": 0.5605095541401274,
+      "grad_norm": 0.1345141977071762,
+      "learning_rate": 9.778779128468132e-05,
+      "loss": 0.7842,
+      "step": 88
+    },
+    {
+      "epoch": 0.5668789808917197,
+      "grad_norm": 0.13763494789600372,
+      "learning_rate": 9.557666532746213e-05,
+      "loss": 0.7757,
+      "step": 89
+    },
+    {
+      "epoch": 0.5732484076433121,
+      "grad_norm": 0.14381903409957886,
+      "learning_rate": 9.336770435648964e-05,
+      "loss": 0.7413,
+      "step": 90
+    },
+    {
+      "epoch": 0.5796178343949044,
+      "grad_norm": 0.15023089945316315,
+      "learning_rate": 9.116198954026577e-05,
+      "loss": 0.7528,
+      "step": 91
+    },
+    {
+      "epoch": 0.5859872611464968,
+      "grad_norm": 0.13393183052539825,
+      "learning_rate": 8.896060045847304e-05,
+      "loss": 0.7356,
+      "step": 92
+    },
+    {
+      "epoch": 0.5923566878980892,
+      "grad_norm": 0.14455927908420563,
+      "learning_rate": 8.676461457357776e-05,
+      "loss": 0.7661,
+      "step": 93
+    },
+    {
+      "epoch": 0.5987261146496815,
+      "grad_norm": 0.1516951471567154,
+      "learning_rate": 8.457510670346976e-05,
+      "loss": 0.7611,
+      "step": 94
+    },
+    {
+      "epoch": 0.6050955414012739,
+      "grad_norm": 0.15287864208221436,
+      "learning_rate": 8.239314849539638e-05,
+      "loss": 0.7751,
+      "step": 95
+    },
+    {
+      "epoch": 0.6114649681528662,
+      "grad_norm": 0.13344816863536835,
+      "learning_rate": 8.021980790144827e-05,
+      "loss": 0.7555,
+      "step": 96
+    },
+    {
+      "epoch": 0.6178343949044586,
+      "grad_norm": 0.12972086668014526,
+      "learning_rate": 7.805614865585396e-05,
+      "loss": 0.744,
+      "step": 97
+    },
+    {
+      "epoch": 0.6242038216560509,
+      "grad_norm": 0.13361124694347382,
+      "learning_rate": 7.590322975433857e-05,
+      "loss": 0.7616,
+      "step": 98
+    },
+    {
+      "epoch": 0.6305732484076433,
+      "grad_norm": 0.14594393968582153,
+      "learning_rate": 7.376210493580212e-05,
+      "loss": 0.7778,
+      "step": 99
+    },
+    {
+      "epoch": 0.6369426751592356,
+      "grad_norm": 0.14642472565174103,
+      "learning_rate": 7.163382216657034e-05,
+      "loss": 0.7516,
+      "step": 100
+    },
+    {
+      "epoch": 0.643312101910828,
+      "grad_norm": 0.1501522958278656,
+      "learning_rate": 6.951942312747134e-05,
+      "loss": 0.7631,
+      "step": 101
+    },
+    {
+      "epoch": 0.6496815286624203,
+      "grad_norm": 0.1488322615623474,
+      "learning_rate": 6.741994270398826e-05,
+      "loss": 0.7446,
+      "step": 102
+    },
+    {
+      "epoch": 0.6560509554140127,
+      "grad_norm": 0.13777847588062286,
+      "learning_rate": 6.533640847973808e-05,
+      "loss": 0.7726,
+      "step": 103
+    },
+    {
+      "epoch": 0.6624203821656051,
+      "grad_norm": 0.15372638404369354,
+      "learning_rate": 6.326984023352435e-05,
+      "loss": 0.7472,
+      "step": 104
+    },
+    {
+      "epoch": 0.6687898089171974,
+      "grad_norm": 0.1486649364233017,
+      "learning_rate": 6.122124944020977e-05,
+      "loss": 0.779,
+      "step": 105
+    },
+    {
+      "epoch": 0.6751592356687898,
+      "grad_norm": 0.15514899790287018,
+      "learning_rate": 5.91916387756535e-05,
+      "loss": 0.7297,
+      "step": 106
+    },
+    {
+      "epoch": 0.6815286624203821,
+      "grad_norm": 0.14487750828266144,
+      "learning_rate": 5.718200162595449e-05,
+      "loss": 0.7044,
+      "step": 107
+    },
+    {
+      "epoch": 0.6878980891719745,
+      "grad_norm": 0.1558275818824768,
+      "learning_rate": 5.5193321601242156e-05,
+      "loss": 0.7442,
+      "step": 108
+    },
+    {
+      "epoch": 0.6942675159235668,
+      "grad_norm": 0.14666889607906342,
+      "learning_rate": 5.322657205425183e-05,
+      "loss": 0.7321,
+      "step": 109
+    },
+    {
+      "epoch": 0.7006369426751592,
+      "grad_norm": 0.1434071809053421,
+      "learning_rate": 5.1282715603920374e-05,
+      "loss": 0.749,
+      "step": 110
+    },
+    {
+      "epoch": 0.7070063694267515,
+      "grad_norm": 0.13561522960662842,
+      "learning_rate": 4.936270366423563e-05,
+      "loss": 0.7085,
+      "step": 111
+    },
+    {
+      "epoch": 0.7133757961783439,
+      "grad_norm": 0.15304827690124512,
+      "learning_rate": 4.746747597857014e-05,
+      "loss": 0.6975,
+      "step": 112
+    },
+    {
+      "epoch": 0.7197452229299363,
+      "grad_norm": 0.14578115940093994,
+      "learning_rate": 4.559796015972677e-05,
+      "loss": 0.7282,
+      "step": 113
+    },
+    {
+      "epoch": 0.7261146496815286,
+      "grad_norm": 0.14384236931800842,
+      "learning_rate": 4.375507123592194e-05,
+      "loss": 0.7241,
+      "step": 114
+    },
+    {
+      "epoch": 0.732484076433121,
+      "grad_norm": 0.16036854684352875,
+      "learning_rate": 4.1939711202927936e-05,
+      "loss": 0.7581,
+      "step": 115
+    },
+    {
+      "epoch": 0.7388535031847133,
+      "grad_norm": 0.1477956473827362,
+      "learning_rate": 4.015276858259427e-05,
+      "loss": 0.7372,
+      "step": 116
+    },
+    {
+      "epoch": 0.7452229299363057,
+      "grad_norm": 0.13899311423301697,
+      "learning_rate": 3.839511798796357e-05,
+      "loss": 0.7358,
+      "step": 117
+    },
+    {
+      "epoch": 0.7515923566878981,
+      "grad_norm": 0.14456766843795776,
+      "learning_rate": 3.6667619695195285e-05,
+      "loss": 0.7386,
+      "step": 118
+    },
+    {
+      "epoch": 0.7579617834394905,
+      "grad_norm": 0.13479867577552795,
+      "learning_rate": 3.49711192225063e-05,
+      "loss": 0.7083,
+      "step": 119
+    },
+    {
+      "epoch": 0.7643312101910829,
+      "grad_norm": 0.15040116012096405,
+      "learning_rate": 3.330644691633492e-05,
+      "loss": 0.7387,
+      "step": 120
+    },
+    {
+      "epoch": 0.7707006369426752,
+      "grad_norm": 0.13942091166973114,
+      "learning_rate": 3.167441754493066e-05,
+      "loss": 0.7583,
+      "step": 121
+    },
+    {
+      "epoch": 0.7770700636942676,
+      "grad_norm": 0.1347065269947052,
+      "learning_rate": 3.0075829899568597e-05,
+      "loss": 0.7298,
+      "step": 122
+    },
+    {
+      "epoch": 0.7834394904458599,
+      "grad_norm": 0.13468332588672638,
+      "learning_rate": 2.8511466403583766e-05,
+      "loss": 0.7437,
+      "step": 123
+    },
+    {
+      "epoch": 0.7898089171974523,
+      "grad_norm": 0.13734403252601624,
+      "learning_rate": 2.6982092729416587e-05,
+      "loss": 0.7262,
+      "step": 124
+    },
+    {
+      "epoch": 0.7961783439490446,
+      "grad_norm": 0.13269653916358948,
+      "learning_rate": 2.548845742385717e-05,
+      "loss": 0.7228,
+      "step": 125
+    },
+    {
+      "epoch": 0.802547770700637,
+      "grad_norm": 0.13551217317581177,
+      "learning_rate": 2.403129154167153e-05,
+      "loss": 0.7317,
+      "step": 126
+    },
+    {
+      "epoch": 0.8089171974522293,
+      "grad_norm": 0.13381463289260864,
+      "learning_rate": 2.2611308287789344e-05,
+      "loss": 0.7263,
+      "step": 127
+    },
+    {
+      "epoch": 0.8152866242038217,
+      "grad_norm": 0.1421995759010315,
+      "learning_rate": 2.1229202668228197e-05,
+      "loss": 0.7302,
+      "step": 128
+    },
+    {
+      "epoch": 0.821656050955414,
+      "grad_norm": 0.13885881006717682,
+      "learning_rate": 1.988565114992519e-05,
+      "loss": 0.7597,
+      "step": 129
+    },
+    {
+      "epoch": 0.8280254777070064,
+      "grad_norm": 0.1440562754869461,
+      "learning_rate": 1.858131132964259e-05,
+      "loss": 0.7532,
+      "step": 130
+    },
+    {
+      "epoch": 0.8343949044585988,
+      "grad_norm": 0.13995864987373352,
+      "learning_rate": 1.7316821612109136e-05,
+      "loss": 0.7205,
+      "step": 131
+    },
+    {
+      "epoch": 0.8407643312101911,
+      "grad_norm": 0.1380428522825241,
+      "learning_rate": 1.609280089755515e-05,
+      "loss": 0.7176,
+      "step": 132
+    },
+    {
+      "epoch": 0.8471337579617835,
+      "grad_norm": 0.12905895709991455,
+      "learning_rate": 1.4909848278793782e-05,
+      "loss": 0.7404,
+      "step": 133
+    },
+    {
+      "epoch": 0.8535031847133758,
+      "grad_norm": 0.15510842204093933,
+      "learning_rate": 1.3768542747997215e-05,
+      "loss": 0.7249,
+      "step": 134
+    },
+    {
+      "epoch": 0.8598726114649682,
+      "grad_norm": 0.13218414783477783,
+      "learning_rate": 1.2669442913310725e-05,
+      "loss": 0.719,
+      "step": 135
+    },
+    {
+      "epoch": 0.8662420382165605,
+      "grad_norm": 0.1409848928451538,
+      "learning_rate": 1.161308672544389e-05,
+      "loss": 0.705,
+      "step": 136
+    },
+    {
+      "epoch": 0.8726114649681529,
+      "grad_norm": 0.13037748634815216,
+      "learning_rate": 1.059999121437244e-05,
+      "loss": 0.7165,
+      "step": 137
+    },
+    {
+      "epoch": 0.8789808917197452,
+      "grad_norm": 0.139460951089859,
+      "learning_rate": 9.630652236279625e-06,
+      "loss": 0.69,
+      "step": 138
+    },
+    {
+      "epoch": 0.8853503184713376,
+      "grad_norm": 0.14767377078533173,
+      "learning_rate": 8.70554423086114e-06,
+      "loss": 0.7517,
+      "step": 139
+    },
+    {
+      "epoch": 0.89171974522293,
+      "grad_norm": 0.13341829180717468,
+      "learning_rate": 7.825119989112173e-06,
+      "loss": 0.7135,
+      "step": 140
+    },
+    {
+      "epoch": 0.8980891719745223,
+      "grad_norm": 0.12980401515960693,
+      "learning_rate": 6.989810431710375e-06,
+      "loss": 0.6892,
+      "step": 141
+    },
+    {
+      "epoch": 0.9044585987261147,
+      "grad_norm": 0.13363727927207947,
+      "learning_rate": 6.200024398103255e-06,
+      "loss": 0.7217,
+      "step": 142
+    },
+    {
+      "epoch": 0.910828025477707,
+      "grad_norm": 0.14944419264793396,
+      "learning_rate": 5.456148446402976e-06,
+      "loss": 0.742,
+      "step": 143
+    },
+    {
+      "epoch": 0.9171974522292994,
+      "grad_norm": 0.134084090590477,
+      "learning_rate": 4.758546664186869e-06,
+      "loss": 0.735,
+      "step": 144
+    },
+    {
+      "epoch": 0.9235668789808917,
+      "grad_norm": 0.1409689486026764,
+      "learning_rate": 4.107560490295992e-06,
+      "loss": 0.7192,
+      "step": 145
+    },
+    {
+      "epoch": 0.9299363057324841,
+      "grad_norm": 0.13256186246871948,
+      "learning_rate": 3.5035085477190143e-06,
+      "loss": 0.6932,
+      "step": 146
+    },
+    {
+      "epoch": 0.9363057324840764,
+      "grad_norm": 0.1366000920534134,
+      "learning_rate": 2.94668648764328e-06,
+      "loss": 0.7295,
+      "step": 147
+    },
+    {
+      "epoch": 0.9426751592356688,
+      "grad_norm": 0.13672901690006256,
+      "learning_rate": 2.4373668447493224e-06,
+      "loss": 0.7059,
+      "step": 148
+    },
+    {
+      "epoch": 0.9490445859872612,
+      "grad_norm": 0.13341008126735687,
+      "learning_rate": 1.9757989038197146e-06,
+      "loss": 0.7184,
+      "step": 149
+    },
+    {
+      "epoch": 0.9554140127388535,
+      "grad_norm": 0.13967232406139374,
+      "learning_rate": 1.562208577727442e-06,
+      "loss": 0.7264,
+      "step": 150
+    },
+    {
+      "epoch": 0.9617834394904459,
+      "grad_norm": 0.1259932667016983,
+      "learning_rate": 1.1967982968635993e-06,
+      "loss": 0.7352,
+      "step": 151
+    },
+    {
+      "epoch": 0.9681528662420382,
+      "grad_norm": 0.14053477346897125,
+      "learning_rate": 8.797469100585431e-07,
+      "loss": 0.7145,
+      "step": 152
+    },
+    {
+      "epoch": 0.9745222929936306,
+      "grad_norm": 0.12578168511390686,
+      "learning_rate": 6.11209597044926e-07,
+      "loss": 0.7092,
+      "step": 153
+    },
+    {
+      "epoch": 0.9808917197452229,
+      "grad_norm": 0.14532066881656647,
+      "learning_rate": 3.913177925055189e-07,
+      "loss": 0.7048,
+      "step": 154
+    },
+    {
+      "epoch": 0.9872611464968153,
+      "grad_norm": 0.13926520943641663,
+      "learning_rate": 2.201791217428917e-07,
+      "loss": 0.7192,
+      "step": 155
+    },
+    {
+      "epoch": 0.9936305732484076,
+      "grad_norm": 0.1295260787010193,
+      "learning_rate": 9.78773480026396e-08,
+      "loss": 0.7363,
+      "step": 156
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.13316026329994202,
+      "learning_rate": 2.447233147570005e-08,
+      "loss": 0.7182,
+      "step": 157
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 157,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 8.999866416811213e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
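
Note on the logged `learning_rate` values: they appear consistent with a cosine decay after linear warmup, with the value logged at step `s` being the schedule evaluated at `s - 1`. The sketch below checks a few of the logged entries against that formula. The peak rate (`2e-4`) and warmup length (15 steps) are assumptions inferred from the numbers in the log; neither is stated in `trainer_state.json`, and `cosine_lr` is a hypothetical helper, not a function from the training code.

```python
import math

# Assumed hyperparameters, inferred from the logged values (not stated in the file).
PEAK_LR = 2e-4
WARMUP_STEPS = 15
# "max_steps" as recorded in trainer_state.json.
MAX_STEPS = 157


def cosine_lr(logged_step: int) -> float:
    """Cosine-with-warmup schedule, evaluated at logged_step - 1."""
    current = logged_step - 1
    if current < WARMUP_STEPS:
        # Linear warmup from 0 to PEAK_LR.
        return PEAK_LR * current / WARMUP_STEPS
    progress = (current - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from the log above.
logged = {
    86: 0.00010221220871531869,
    87: 0.0001,
    88: 9.778779128468132e-05,
    157: 2.447233147570005e-08,
}

for step, lr in logged.items():
    print(f"step {step}: logged={lr:.6e} formula={cosine_lr(step):.6e}")
```

If the assumptions hold, the two columns should agree to several significant digits; a mismatch would suggest a different warmup length or scheduler.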

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e7ad3a5846e8dba34f7faf3c5820de1c903cd66cafe879846f0b6dc5f79affd
+size 7672

vocab.json ADDED

The diff for this file is too large to render. See raw diff