ErrorAI committed on
Commit 42d8a13 · verified · 1 parent: abfff06

Training in progress, step 345, checkpoint

last-checkpoint/README.md ADDED
@@ -0,0 +1,202 @@
---
base_model: katuni4ka/tiny-random-falcon-40b
library_name: peft
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and BibTeX information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.13.2
last-checkpoint/adapter_config.json ADDED
@@ -0,0 +1,31 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "katuni4ka/tiny-random-falcon-40b",
  "bias": "none",
  "fan_in_fan_out": null,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "dense",
    "dense_h_to_4h",
    "dense_4h_to_h",
    "query_key_value"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
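The adapter config above pins down the LoRA shape: rank `r=8` and `lora_alpha=16` (so the update scaling `alpha/r` is 2.0), dropout 0.05, applied to the four Falcon projection modules listed in `target_modules`. As an illustration of what those numbers imply (plain Python, no PEFT dependency; the layer widths used below are made-up placeholders, not the real tiny-random-falcon-40b dimensions):

```python
# Sketch of the bookkeeping implied by adapter_config.json.
# NOTE: d_in/d_out are hypothetical example widths, not read from the model.
config = {
    "r": 8,
    "lora_alpha": 16,
    "target_modules": ["dense", "dense_h_to_4h", "dense_4h_to_h",
                       "query_key_value"],
}

def lora_scaling(cfg):
    # LoRA applies delta_W = (alpha / r) * B @ A
    return cfg["lora_alpha"] / cfg["r"]

def lora_trainable_params(cfg, d_in, d_out):
    # A has shape (r, d_in) and B has shape (d_out, r),
    # so each adapted module adds r * (d_in + d_out) trainable weights.
    return cfg["r"] * (d_in + d_out)

print(lora_scaling(config))                     # 2.0
print(lora_trainable_params(config, 256, 256))  # 4096
```

The small `r` and tiny base model are consistent with the modest 125 KB adapter file committed below.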
last-checkpoint/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26c2732ae22d6babda155af79b5cb91f4d4e11b94c72cb4d54373dc84e41a8dc
size 125040
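The adapter weights are committed as a Git LFS pointer file rather than the binary itself: three key/value lines giving the spec version, a sha256 object id, and the payload size in bytes. A minimal sketch of reading such a pointer (plain Python; the pointer text is copied from the file above):

```python
# Parse a Git LFS pointer file (spec v1): one "key value" pair per line.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:26c2732ae22d6babda155af79b5cb91f4d4e11b94c72cb4d54373dc84e41a8dc
size 125040
"""

def parse_lfs_pointer(text):
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,          # hash algorithm, here "sha256"
        "digest": digest,      # 64 hex chars for sha256
        "size": int(fields["size"]),  # payload size in bytes
    }

info = parse_lfs_pointer(POINTER)
print(info["size"])  # 125040
```

The same layout applies to the `optimizer.pt`, `rng_state.pth`, and `scheduler.pt` pointers in this commit.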
last-checkpoint/optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b470787b406db98f576038d7ba8ee1fa63eef5334e0bea32ae3aaba4312559e
size 162868
last-checkpoint/rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f45c92add258b0e6eb71d53b54445c891dd1b453c771bfe824e8b6a5a1ecf4f7
size 14244
last-checkpoint/scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f549cd01d3025604157e79707eeb03b302e046e33bf895a4b0d7a56ddff8dec2
size 1064
last-checkpoint/special_tokens_map.json ADDED
@@ -0,0 +1,29 @@
{
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>PREFIX<<",
    ">>SUFFIX<<",
    ">>MIDDLE<<"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
last-checkpoint/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/tokenizer_config.json ADDED
@@ -0,0 +1,124 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {"content": ">>TITLE<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "1": {"content": ">>ABSTRACT<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "2": {"content": ">>INTRODUCTION<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "3": {"content": ">>SUMMARY<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "4": {"content": ">>COMMENT<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "5": {"content": ">>ANSWER<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "6": {"content": ">>QUESTION<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "7": {"content": ">>DOMAIN<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "8": {"content": ">>PREFIX<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "9": {"content": ">>SUFFIX<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "10": {"content": ">>MIDDLE<<", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true},
    "11": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}
  },
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>PREFIX<<",
    ">>SUFFIX<<",
    ">>MIDDLE<<"
  ],
  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "model_input_names": [
    "input_ids",
    "attention_mask"
  ],
  "model_max_length": 2048,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "PreTrainedTokenizerFast"
}
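The `chat_template` above is a Llama-3-style Jinja template: each message is wrapped in `<|start_header_id|>role<|end_header_id|>\n\n ... <|eot_id|>`, the first message is prefixed with `bos_token`, and an assistant header is appended when `add_generation_prompt` is set. A pure-Python sketch of what it renders (illustration only; real rendering goes through `tokenizer.apply_chat_template`, and `bos_token` is left as a parameter here since this tokenizer_config.json does not define one):

```python
# Mirror of the Jinja chat_template's logic, for inspection.
def render_chat(messages, bos_token="", add_generation_prompt=False):
    out = []
    for i, m in enumerate(messages):
        # '| trim' in the template corresponds to str.strip() here
        content = ("<|start_header_id|>" + m["role"] + "<|end_header_id|>\n\n"
                   + m["content"].strip() + "<|eot_id|>")
        if i == 0:
            content = bos_token + content  # bos only before the first message
        out.append(content)
    if add_generation_prompt:
        out.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(out)

print(render_chat([{"role": "user", "content": "hi"}],
                  add_generation_prompt=True))
```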
last-checkpoint/trainer_state.json ADDED
@@ -0,0 +1,2464 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.03421940091251736,
  "eval_steps": 345,
  "global_step": 345,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 9.918666931164451e-05, "grad_norm": 1.620390772819519, "learning_rate": 2e-05, "loss": 44.4878, "step": 1},
    {"epoch": 9.918666931164451e-05, "eval_loss": 11.11313247680664, "eval_runtime": 12.6618, "eval_samples_per_second": 335.261, "eval_steps_per_second": 167.67, "step": 1},
    {"epoch": 0.00019837333862328903, "grad_norm": 1.4469047784805298, "learning_rate": 4e-05, "loss": 44.5058, "step": 2},
    {"epoch": 0.00029756000793493357, "grad_norm": 1.5632233619689941, "learning_rate": 6e-05, "loss": 44.4768, "step": 3},
    {"epoch": 0.00039674667724657806, "grad_norm": 1.4555176496505737, "learning_rate": 8e-05, "loss": 44.4999, "step": 4},
    {"epoch": 0.0004959333465582226, "grad_norm": 1.6239758729934692, "learning_rate": 0.0001, "loss": 44.3626, "step": 5},
    {"epoch": 0.0005951200158698671, "grad_norm": 1.8492672443389893, "learning_rate": 0.00012, "loss": 44.5033, "step": 6},
    {"epoch": 0.0006943066851815116, "grad_norm": 1.629564881324768, "learning_rate": 0.00014, "loss": 44.5071, "step": 7},
    {"epoch": 0.0007934933544931561, "grad_norm": 1.5436326265335083, "learning_rate": 0.00016, "loss": 44.5455, "step": 8},
    {"epoch": 0.0008926800238048007, "grad_norm": 1.712166666984558, "learning_rate": 0.00018, "loss": 44.3016, "step": 9},
    {"epoch": 0.0009918666931164452, "grad_norm": 1.554046392440796, "learning_rate": 0.0002, "loss": 44.4516, "step": 10},
    {"epoch": 0.0010910533624280897, "grad_norm": 1.6125948429107666, "learning_rate": 0.00019999973592181317, "loss": 44.3897, "step": 11},
    {"epoch": 0.0011902400317397343, "grad_norm": 1.6851387023925781, "learning_rate": 0.00019999894368864744, "loss": 44.5396, "step": 12},
    {"epoch": 0.0012894267010513786, "grad_norm": 1.5964475870132446, "learning_rate": 0.00019999762330468704, "loss": 44.5239, "step": 13},
    {"epoch": 0.0013886133703630231, "grad_norm": 1.717020034790039, "learning_rate": 0.00019999577477690568, "loss": 44.5325, "step": 14},
    {"epoch": 0.0014878000396746677, "grad_norm": 1.6545541286468506, "learning_rate": 0.00019999339811506642, "loss": 44.3654, "step": 15},
    {"epoch": 0.0015869867089863122, "grad_norm": 1.6294705867767334, "learning_rate": 0.00019999049333172182, "loss": 44.5449, "step": 16},
    {"epoch": 0.0016861733782979568, "grad_norm": 1.6682461500167847, "learning_rate": 0.0001999870604422136, "loss": 44.3407, "step": 17},
    {"epoch": 0.0017853600476096013, "grad_norm": 1.4981224536895752, "learning_rate": 0.0001999830994646729, "loss": 44.407, "step": 18},
    {"epoch": 0.0018845467169212459, "grad_norm": 1.670657753944397, "learning_rate": 0.00019997861042001972, "loss": 44.3801, "step": 19},
    {"epoch": 0.0019837333862328904, "grad_norm": 1.7518388032913208, "learning_rate": 0.00019997359333196339, "loss": 44.4139, "step": 20},
    {"epoch": 0.002082920055544535, "grad_norm": 1.7747282981872559, "learning_rate": 0.0001999680482270019, "loss": 44.3652, "step": 21},
    {"epoch": 0.0021821067248561795, "grad_norm": 1.973069667816162, "learning_rate": 0.00019996197513442204, "loss": 44.4699, "step": 22},
    {"epoch": 0.002281293394167824, "grad_norm": 1.9039539098739624, "learning_rate": 0.00019995537408629932, "loss": 44.5052, "step": 23},
    {"epoch": 0.0023804800634794686, "grad_norm": 1.5412516593933105, "learning_rate": 0.00019994824511749757, "loss": 44.3189, "step": 24},
    {"epoch": 0.0024796667327911127, "grad_norm": 1.517510175704956, "learning_rate": 0.00019994058826566886, "loss": 44.269, "step": 25},
    {"epoch": 0.002578853402102757, "grad_norm": 1.7756372690200806, "learning_rate": 0.00019993240357125337, "loss": 44.2606, "step": 26},
    {"epoch": 0.0026780400714144018, "grad_norm": 1.606436014175415, "learning_rate": 0.00019992369107747906, "loss": 44.4704, "step": 27},
    {"epoch": 0.0027772267407260463, "grad_norm": 2.018287181854248, "learning_rate": 0.00019991445083036155, "loss": 44.3927, "step": 28},
    {"epoch": 0.002876413410037691, "grad_norm": 2.0493083000183105, "learning_rate": 0.0001999046828787038, "loss": 44.2501, "step": 29},
    {"epoch": 0.0029756000793493354, "grad_norm": 1.8427711725234985, "learning_rate": 0.00019989438727409584, "loss": 44.4513, "step": 30},
    {"epoch": 0.00307478674866098, "grad_norm": 1.6964459419250488, "learning_rate": 0.00019988356407091455, "loss": 44.2702, "step": 31},
    {"epoch": 0.0031739734179726245, "grad_norm": 1.9234622716903687, "learning_rate": 0.00019987221332632343, "loss": 44.4652, "step": 32},
    {"epoch": 0.003273160087284269, "grad_norm": 1.8713277578353882, "learning_rate": 0.0001998603351002721, "loss": 44.2959, "step": 33},
    {"epoch": 0.0033723467565959135, "grad_norm": 2.15635085105896, "learning_rate": 0.0001998479294554962, "loss": 44.2018, "step": 34},
    {"epoch": 0.003471533425907558, "grad_norm": 2.2415690422058105, "learning_rate": 0.00019983499645751692, "loss": 44.387, "step": 35},
    {"epoch": 0.0035707200952192026, "grad_norm": 1.9001964330673218, "learning_rate": 0.00019982153617464072, "loss": 44.2773, "step": 36},
    {"epoch": 0.003669906764530847, "grad_norm": 2.3937878608703613, "learning_rate": 0.00019980754867795898, "loss": 44.372, "step": 37},
    {"epoch": 0.0037690934338424917, "grad_norm": 2.18562650680542, "learning_rate": 0.00019979303404134746, "loss": 44.206, "step": 38},
    {"epoch": 0.0038682801031541363, "grad_norm": 2.0082168579101562, "learning_rate": 0.00019977799234146618, "loss": 44.2083, "step": 39},
    {"epoch": 0.003967466772465781, "grad_norm": 2.294412612915039, "learning_rate": 0.0001997624236577589, "loss": 44.2663, "step": 40},
    {"epoch": 0.004066653441777425, "grad_norm": 2.418440818786621, "learning_rate": 0.00019974632807245253, "loss": 44.0445, "step": 41},
    {"epoch": 0.00416584011108907, "grad_norm": 2.312135934829712, "learning_rate": 0.00019972970567055698, "loss": 44.0673, "step": 42},
    {"epoch": 0.004265026780400714, "grad_norm": 2.2133700847625732, "learning_rate": 0.00019971255653986452, "loss": 44.2475, "step": 43},
    {"epoch": 0.004364213449712359, "grad_norm": 2.2712738513946533, "learning_rate": 0.00019969488077094933, "loss": 44.1074, "step": 44},
    {"epoch": 0.0044634001190240035, "grad_norm": 2.1439149379730225, "learning_rate": 0.0001996766784571672, "loss": 44.037, "step": 45},
    {"epoch": 0.004562586788335648, "grad_norm": 2.561309814453125, "learning_rate": 0.00019965794969465473, "loss": 44.0506, "step": 46},
    {"epoch": 0.004661773457647293, "grad_norm": 2.5531442165374756, "learning_rate": 0.00019963869458232907, "loss": 44.2289, "step": 47},
    {"epoch": 0.004760960126958937, "grad_norm": 2.356276512145996, "learning_rate": 0.0001996189132218874, "loss": 44.224, "step": 48},
    {"epoch": 0.004860146796270581, "grad_norm": 2.1195573806762695, "learning_rate": 0.00019959860571780615, "loss": 44.2144, "step": 49},
    {"epoch": 0.004959333465582225, "grad_norm": 2.272167205810547, "learning_rate": 0.00019957777217734078, "loss": 44.1248, "step": 50},
    {"epoch": 0.00505852013489387, "grad_norm": 2.4642817974090576, "learning_rate": 0.00019955641271052487, "loss": 43.9688, "step": 51},
    {"epoch": 0.005157706804205514, "grad_norm": 2.2487308979034424, "learning_rate": 0.00019953452743016987, "loss": 43.9119, "step": 52},
    {"epoch": 0.005256893473517159, "grad_norm": 2.326240062713623, "learning_rate": 0.00019951211645186426, "loss": 44.0474, "step": 53},
    {"epoch": 0.0053560801428288035, "grad_norm": 2.248692035675049, "learning_rate": 0.00019948917989397308, "loss": 44.0475, "step": 54},
    {"epoch": 0.005455266812140448, "grad_norm": 2.3791592121124268, "learning_rate": 0.00019946571787763718, "loss": 43.9322, "step": 55},
    {"epoch": 0.005554453481452093, "grad_norm": 2.520576000213623, "learning_rate": 0.00019944173052677272, "loss": 43.7626, "step": 56},
    {"epoch": 0.005653640150763737, "grad_norm": 2.3382210731506348, "learning_rate": 0.00019941721796807042, "loss": 43.8322, "step": 57},
    {"epoch": 0.005752826820075382, "grad_norm": 2.1466667652130127, "learning_rate": 0.0001993921803309949, "loss": 43.9254, "step": 58},
    {"epoch": 0.005852013489387026, "grad_norm": 2.225789785385132, "learning_rate": 0.00019936661774778408, "loss": 43.9358, "step": 59},
    {"epoch": 0.005951200158698671, "grad_norm": 2.6257212162017822, "learning_rate": 0.00019934053035344836, "loss": 43.694, "step": 60},
    {"epoch": 0.006050386828010315, "grad_norm": 2.5033576488494873, "learning_rate": 0.00019931391828576991, "loss": 43.6864, "step": 61},
    {"epoch": 0.00614957349732196, "grad_norm": 2.434095621109009, "learning_rate": 0.00019928678168530212, "loss": 43.7584, "step": 62},
    {"epoch": 0.006248760166633604, "grad_norm": 2.2750728130340576, "learning_rate": 0.0001992591206953687, "loss": 43.7666, "step": 63},
    {"epoch": 0.006347946835945249, "grad_norm": 1.9984159469604492, "learning_rate": 0.0001992309354620629, "loss": 43.7153, "step": 64},
    {"epoch": 0.0064471335052568935, "grad_norm": 2.116001844406128, "learning_rate": 0.00019920222613424682, "loss": 43.8651, "step": 65},
    {"epoch": 0.006546320174568538, "grad_norm": 2.1209514141082764, "learning_rate": 0.00019917299286355058, "loss": 43.7459, "step": 66},
    {"epoch": 0.0066455068438801826, "grad_norm": 2.773214340209961, "learning_rate": 0.00019914323580437162, "loss": 43.6404, "step": 67},
    {"epoch": 0.006744693513191827, "grad_norm": 2.400967836380005, "learning_rate": 0.00019911295511387372, "loss": 43.6164, "step": 68},
    {"epoch": 0.006843880182503472, "grad_norm": 2.4684884548187256, "learning_rate": 0.00019908215095198626, "loss": 43.68, "step": 69},
    {"epoch": 0.006943066851815116, "grad_norm": 2.086573600769043, "learning_rate": 0.00019905082348140341, "loss": 43.7257, "step": 70},
    {"epoch": 0.007042253521126761, "grad_norm": 2.0327978134155273, "learning_rate": 0.00019901897286758318, "loss": 43.5133, "step": 71},
    {"epoch": 0.007141440190438405, "grad_norm": 2.020599842071533, "learning_rate": 0.00019898659927874662, "loss": 43.5116, "step": 72},
    {"epoch": 0.00724062685975005, "grad_norm": 2.3334968090057373, "learning_rate": 0.00019895370288587695, "loss": 43.5966, "step": 73},
    {"epoch": 0.007339813529061694, "grad_norm": 2.053218126296997, "learning_rate": 0.0001989202838627185, "loss": 43.74, "step": 74},
    {"epoch": 0.007439000198373339, "grad_norm": 2.083590269088745, "learning_rate": 0.000198886342385776, "loss": 43.6054, "step": 75},
    {"epoch": 0.007538186867684983, "grad_norm": 2.0403592586517334, "learning_rate": 0.00019885187863431352, "loss": 43.5867, "step": 76},
    {"epoch": 0.007637373536996628, "grad_norm": 2.0573623180389404, "learning_rate": 0.00019881689279035354, "loss": 43.6566, "step": 77},
    {"epoch": 0.0077365602063082725, "grad_norm": 1.7056894302368164, "learning_rate": 0.00019878138503867608, "loss": 43.604, "step": 78},
    {"epoch": 0.007835746875619916, "grad_norm": 2.111693859100342, "learning_rate": 0.00019874535556681756, "loss": 43.4557, "step": 79},
    {"epoch": 0.007934933544931562, "grad_norm": 2.048046350479126, "learning_rate": 0.0001987088045650699, "loss": 43.5062, "step": 80},
    {"epoch": 0.008034120214243205, "grad_norm": 2.2594175338745117, "learning_rate": 0.0001986717322264796, "loss": 43.5631, "step": 81},
    {"epoch": 0.00813330688355485, "grad_norm": 2.0121192932128906, "learning_rate": 0.00019863413874684653, "loss": 43.681, "step": 82},
    {"epoch": 0.008232493552866494, "grad_norm": 1.7314908504486084, "learning_rate": 0.0001985960243247231, "loss": 43.601, "step": 83},
    {"epoch": 0.00833168022217814, "grad_norm": 1.969624638557434, "learning_rate": 0.00019855738916141303, "loss": 43.3999, "step": 84},
    {"epoch": 0.008430866891489783, "grad_norm": 1.8517192602157593, "learning_rate": 0.0001985182334609704, "loss": 43.6391, "step": 85},
    {"epoch": 0.008530053560801429, "grad_norm": 1.8946924209594727, "learning_rate": 0.00019847855743019858, "loss": 43.5514, "step": 86},
    {"epoch": 0.008629240230113073, "grad_norm": 1.6315033435821533, "learning_rate": 0.00019843836127864895, "loss": 43.4945, "step": 87},
    {"epoch": 0.008728426899424718, "grad_norm": 2.0259015560150146, "learning_rate": 0.00019839764521862015, "loss": 43.5663, "step": 88},
    {"epoch": 0.008827613568736362, "grad_norm": 2.0214736461639404, "learning_rate": 0.0001983564094651566, "loss": 43.4939, "step": 89},
    {"epoch": 0.008926800238048007, "grad_norm": 2.1720495223999023,
645
+ "learning_rate": 0.00019831465423604752,
646
+ "loss": 43.4768,
647
+ "step": 90
648
+ },
649
+ {
650
+ "epoch": 0.00902598690735965,
651
+ "grad_norm": 1.972532868385315,
652
+ "learning_rate": 0.00019827237975182592,
653
+ "loss": 43.5539,
654
+ "step": 91
655
+ },
656
+ {
657
+ "epoch": 0.009125173576671296,
658
+ "grad_norm": 1.7981455326080322,
659
+ "learning_rate": 0.00019822958623576714,
660
+ "loss": 43.5811,
661
+ "step": 92
662
+ },
663
+ {
664
+ "epoch": 0.00922436024598294,
665
+ "grad_norm": 1.8357104063034058,
666
+ "learning_rate": 0.0001981862739138878,
667
+ "loss": 43.5306,
668
+ "step": 93
669
+ },
670
+ {
671
+ "epoch": 0.009323546915294585,
672
+ "grad_norm": 1.6894700527191162,
673
+ "learning_rate": 0.00019814244301494474,
674
+ "loss": 43.5594,
675
+ "step": 94
676
+ },
677
+ {
678
+ "epoch": 0.009422733584606229,
679
+ "grad_norm": 1.7075917720794678,
680
+ "learning_rate": 0.00019809809377043368,
681
+ "loss": 43.6353,
682
+ "step": 95
683
+ },
684
+ {
685
+ "epoch": 0.009521920253917874,
686
+ "grad_norm": 1.8798929452896118,
687
+ "learning_rate": 0.0001980532264145879,
688
+ "loss": 43.4567,
689
+ "step": 96
690
+ },
691
+ {
692
+ "epoch": 0.009621106923229518,
693
+ "grad_norm": 1.9370898008346558,
694
+ "learning_rate": 0.00019800784118437727,
695
+ "loss": 43.5642,
696
+ "step": 97
697
+ },
698
+ {
699
+ "epoch": 0.009720293592541162,
700
+ "grad_norm": 1.7264550924301147,
701
+ "learning_rate": 0.00019796193831950673,
702
+ "loss": 43.5234,
703
+ "step": 98
704
+ },
705
+ {
706
+ "epoch": 0.009819480261852807,
707
+ "grad_norm": 2.1001346111297607,
708
+ "learning_rate": 0.0001979155180624152,
709
+ "loss": 43.3069,
710
+ "step": 99
711
+ },
712
+ {
713
+ "epoch": 0.00991866693116445,
714
+ "grad_norm": 1.971402883529663,
715
+ "learning_rate": 0.00019786858065827425,
716
+ "loss": 43.2957,
717
+ "step": 100
718
+ },
719
+ {
720
+ "epoch": 0.010017853600476096,
721
+ "grad_norm": 1.9676079750061035,
722
+ "learning_rate": 0.00019782112635498676,
723
+ "loss": 43.3939,
724
+ "step": 101
725
+ },
726
+ {
727
+ "epoch": 0.01011704026978774,
728
+ "grad_norm": 1.6693346500396729,
729
+ "learning_rate": 0.00019777315540318565,
730
+ "loss": 43.4189,
731
+ "step": 102
732
+ },
733
+ {
734
+ "epoch": 0.010216226939099385,
735
+ "grad_norm": 1.7066096067428589,
736
+ "learning_rate": 0.00019772466805623252,
737
+ "loss": 43.5922,
738
+ "step": 103
739
+ },
740
+ {
741
+ "epoch": 0.010315413608411029,
742
+ "grad_norm": 1.8379673957824707,
743
+ "learning_rate": 0.00019767566457021648,
744
+ "loss": 43.5591,
745
+ "step": 104
746
+ },
747
+ {
748
+ "epoch": 0.010414600277722674,
749
+ "grad_norm": 1.9390805959701538,
750
+ "learning_rate": 0.00019762614520395247,
751
+ "loss": 43.5645,
752
+ "step": 105
753
+ },
754
+ {
755
+ "epoch": 0.010513786947034318,
756
+ "grad_norm": 1.8997091054916382,
757
+ "learning_rate": 0.00019757611021898024,
758
+ "loss": 43.6896,
759
+ "step": 106
760
+ },
761
+ {
762
+ "epoch": 0.010612973616345963,
763
+ "grad_norm": 1.7241935729980469,
764
+ "learning_rate": 0.0001975255598795627,
765
+ "loss": 43.6881,
766
+ "step": 107
767
+ },
768
+ {
769
+ "epoch": 0.010712160285657607,
770
+ "grad_norm": 1.5904136896133423,
771
+ "learning_rate": 0.00019747449445268474,
772
+ "loss": 43.4718,
773
+ "step": 108
774
+ },
775
+ {
776
+ "epoch": 0.010811346954969252,
777
+ "grad_norm": 1.940871238708496,
778
+ "learning_rate": 0.00019742291420805165,
779
+ "loss": 43.3384,
780
+ "step": 109
781
+ },
782
+ {
783
+ "epoch": 0.010910533624280896,
784
+ "grad_norm": 1.9792636632919312,
785
+ "learning_rate": 0.0001973708194180878,
786
+ "loss": 43.5913,
787
+ "step": 110
788
+ },
789
+ {
790
+ "epoch": 0.011009720293592542,
791
+ "grad_norm": 2.008183240890503,
792
+ "learning_rate": 0.00019731821035793507,
793
+ "loss": 43.4205,
794
+ "step": 111
795
+ },
796
+ {
797
+ "epoch": 0.011108906962904185,
798
+ "grad_norm": 2.022135019302368,
799
+ "learning_rate": 0.00019726508730545162,
800
+ "loss": 43.385,
801
+ "step": 112
802
+ },
803
+ {
804
+ "epoch": 0.01120809363221583,
805
+ "grad_norm": 1.4658271074295044,
806
+ "learning_rate": 0.00019721145054121028,
807
+ "loss": 43.5396,
808
+ "step": 113
809
+ },
810
+ {
811
+ "epoch": 0.011307280301527474,
812
+ "grad_norm": 1.9923700094223022,
813
+ "learning_rate": 0.00019715730034849696,
814
+ "loss": 43.4149,
815
+ "step": 114
816
+ },
817
+ {
818
+ "epoch": 0.01140646697083912,
819
+ "grad_norm": 2.1149795055389404,
820
+ "learning_rate": 0.00019710263701330937,
821
+ "loss": 43.4565,
822
+ "step": 115
823
+ },
824
+ {
825
+ "epoch": 0.011505653640150763,
826
+ "grad_norm": 1.8142424821853638,
827
+ "learning_rate": 0.0001970474608243554,
828
+ "loss": 43.4097,
829
+ "step": 116
830
+ },
831
+ {
832
+ "epoch": 0.011604840309462409,
833
+ "grad_norm": 1.8109841346740723,
834
+ "learning_rate": 0.00019699177207305166,
835
+ "loss": 43.3104,
836
+ "step": 117
837
+ },
838
+ {
839
+ "epoch": 0.011704026978774052,
840
+ "grad_norm": 1.83271062374115,
841
+ "learning_rate": 0.0001969355710535218,
842
+ "loss": 43.5675,
843
+ "step": 118
844
+ },
845
+ {
846
+ "epoch": 0.011803213648085698,
847
+ "grad_norm": 2.103610038757324,
848
+ "learning_rate": 0.00019687885806259504,
849
+ "loss": 43.1547,
850
+ "step": 119
851
+ },
852
+ {
853
+ "epoch": 0.011902400317397342,
854
+ "grad_norm": 1.7136248350143433,
855
+ "learning_rate": 0.00019682163339980474,
856
+ "loss": 43.618,
857
+ "step": 120
858
+ },
859
+ {
860
+ "epoch": 0.012001586986708987,
861
+ "grad_norm": 1.6430232524871826,
862
+ "learning_rate": 0.00019676389736738656,
863
+ "loss": 43.5748,
864
+ "step": 121
865
+ },
866
+ {
867
+ "epoch": 0.01210077365602063,
868
+ "grad_norm": 1.9517197608947754,
869
+ "learning_rate": 0.00019670565027027706,
870
+ "loss": 43.4926,
871
+ "step": 122
872
+ },
873
+ {
874
+ "epoch": 0.012199960325332276,
875
+ "grad_norm": 2.109898567199707,
876
+ "learning_rate": 0.00019664689241611196,
877
+ "loss": 43.6146,
878
+ "step": 123
879
+ },
880
+ {
881
+ "epoch": 0.01229914699464392,
882
+ "grad_norm": 1.8593815565109253,
883
+ "learning_rate": 0.00019658762411522464,
884
+ "loss": 43.3389,
885
+ "step": 124
886
+ },
887
+ {
888
+ "epoch": 0.012398333663955565,
889
+ "grad_norm": 1.5332725048065186,
890
+ "learning_rate": 0.0001965278456806444,
891
+ "loss": 43.4073,
892
+ "step": 125
893
+ },
894
+ {
895
+ "epoch": 0.012497520333267209,
896
+ "grad_norm": 2.055220603942871,
897
+ "learning_rate": 0.00019646755742809487,
898
+ "loss": 43.051,
899
+ "step": 126
900
+ },
901
+ {
902
+ "epoch": 0.012596707002578854,
903
+ "grad_norm": 1.8248649835586548,
904
+ "learning_rate": 0.00019640675967599224,
905
+ "loss": 43.3472,
906
+ "step": 127
907
+ },
908
+ {
909
+ "epoch": 0.012695893671890498,
910
+ "grad_norm": 1.7612577676773071,
911
+ "learning_rate": 0.0001963454527454438,
912
+ "loss": 43.4539,
913
+ "step": 128
914
+ },
915
+ {
916
+ "epoch": 0.012795080341202143,
917
+ "grad_norm": 1.922441005706787,
918
+ "learning_rate": 0.00019628363696024592,
919
+ "loss": 43.3149,
920
+ "step": 129
921
+ },
922
+ {
923
+ "epoch": 0.012894267010513787,
924
+ "grad_norm": 1.77986478805542,
925
+ "learning_rate": 0.00019622131264688267,
926
+ "loss": 43.2937,
927
+ "step": 130
928
+ },
929
+ {
930
+ "epoch": 0.01299345367982543,
931
+ "grad_norm": 1.7966907024383545,
932
+ "learning_rate": 0.00019615848013452387,
933
+ "loss": 43.373,
934
+ "step": 131
935
+ },
936
+ {
937
+ "epoch": 0.013092640349137076,
938
+ "grad_norm": 1.9414621591567993,
939
+ "learning_rate": 0.00019609513975502342,
940
+ "loss": 43.2283,
941
+ "step": 132
942
+ },
943
+ {
944
+ "epoch": 0.01319182701844872,
945
+ "grad_norm": 1.8673967123031616,
946
+ "learning_rate": 0.0001960312918429176,
947
+ "loss": 43.5471,
948
+ "step": 133
949
+ },
950
+ {
951
+ "epoch": 0.013291013687760365,
952
+ "grad_norm": 2.33743953704834,
953
+ "learning_rate": 0.00019596693673542322,
954
+ "loss": 43.4404,
955
+ "step": 134
956
+ },
957
+ {
958
+ "epoch": 0.013390200357072009,
959
+ "grad_norm": 1.9352983236312866,
960
+ "learning_rate": 0.00019590207477243588,
961
+ "loss": 43.4078,
962
+ "step": 135
963
+ },
964
+ {
965
+ "epoch": 0.013489387026383654,
966
+ "grad_norm": 1.6548969745635986,
967
+ "learning_rate": 0.00019583670629652814,
968
+ "loss": 43.5059,
969
+ "step": 136
970
+ },
971
+ {
972
+ "epoch": 0.013588573695695298,
973
+ "grad_norm": 1.8436064720153809,
974
+ "learning_rate": 0.0001957708316529478,
975
+ "loss": 43.4167,
976
+ "step": 137
977
+ },
978
+ {
979
+ "epoch": 0.013687760365006943,
980
+ "grad_norm": 1.878476619720459,
981
+ "learning_rate": 0.00019570445118961602,
982
+ "loss": 43.4777,
983
+ "step": 138
984
+ },
985
+ {
986
+ "epoch": 0.013786947034318587,
987
+ "grad_norm": 1.7631700038909912,
988
+ "learning_rate": 0.0001956375652571254,
989
+ "loss": 43.4291,
990
+ "step": 139
991
+ },
992
+ {
993
+ "epoch": 0.013886133703630232,
994
+ "grad_norm": 1.9526875019073486,
995
+ "learning_rate": 0.00019557017420873825,
996
+ "loss": 43.6412,
997
+ "step": 140
998
+ },
999
+ {
1000
+ "epoch": 0.013985320372941876,
1001
+ "grad_norm": 1.9807161092758179,
1002
+ "learning_rate": 0.00019550227840038476,
1003
+ "loss": 43.1551,
1004
+ "step": 141
1005
+ },
1006
+ {
1007
+ "epoch": 0.014084507042253521,
1008
+ "grad_norm": 1.9654165506362915,
1009
+ "learning_rate": 0.0001954338781906609,
1010
+ "loss": 43.4855,
1011
+ "step": 142
1012
+ },
1013
+ {
1014
+ "epoch": 0.014183693711565165,
1015
+ "grad_norm": 1.8886873722076416,
1016
+ "learning_rate": 0.00019536497394082676,
1017
+ "loss": 43.4846,
1018
+ "step": 143
1019
+ },
1020
+ {
1021
+ "epoch": 0.01428288038087681,
1022
+ "grad_norm": 1.9163804054260254,
1023
+ "learning_rate": 0.0001952955660148045,
1024
+ "loss": 43.5417,
1025
+ "step": 144
1026
+ },
1027
+ {
1028
+ "epoch": 0.014382067050188454,
1029
+ "grad_norm": 1.6911556720733643,
1030
+ "learning_rate": 0.00019522565477917653,
1031
+ "loss": 43.737,
1032
+ "step": 145
1033
+ },
1034
+ {
1035
+ "epoch": 0.0144812537195001,
1036
+ "grad_norm": 1.8554767370224,
1037
+ "learning_rate": 0.0001951552406031835,
1038
+ "loss": 43.5507,
1039
+ "step": 146
1040
+ },
1041
+ {
1042
+ "epoch": 0.014580440388811743,
1043
+ "grad_norm": 1.7839596271514893,
1044
+ "learning_rate": 0.00019508432385872238,
1045
+ "loss": 43.5163,
1046
+ "step": 147
1047
+ },
1048
+ {
1049
+ "epoch": 0.014679627058123389,
1050
+ "grad_norm": 1.5847241878509521,
1051
+ "learning_rate": 0.00019501290492034444,
1052
+ "loss": 43.4539,
1053
+ "step": 148
1054
+ },
1055
+ {
1056
+ "epoch": 0.014778813727435032,
1057
+ "grad_norm": 2.139988422393799,
1058
+ "learning_rate": 0.00019494098416525336,
1059
+ "loss": 43.457,
1060
+ "step": 149
1061
+ },
1062
+ {
1063
+ "epoch": 0.014878000396746678,
1064
+ "grad_norm": 1.82491934299469,
1065
+ "learning_rate": 0.00019486856197330324,
1066
+ "loss": 43.4277,
1067
+ "step": 150
1068
+ },
1069
+ {
1070
+ "epoch": 0.014977187066058321,
1071
+ "grad_norm": 1.5014609098434448,
1072
+ "learning_rate": 0.00019479563872699647,
1073
+ "loss": 43.3014,
1074
+ "step": 151
1075
+ },
1076
+ {
1077
+ "epoch": 0.015076373735369967,
1078
+ "grad_norm": 1.6398491859436035,
1079
+ "learning_rate": 0.0001947222148114818,
1080
+ "loss": 43.2079,
1081
+ "step": 152
1082
+ },
1083
+ {
1084
+ "epoch": 0.01517556040468161,
1085
+ "grad_norm": 1.5315080881118774,
1086
+ "learning_rate": 0.00019464829061455234,
1087
+ "loss": 43.3428,
1088
+ "step": 153
1089
+ },
1090
+ {
1091
+ "epoch": 0.015274747073993256,
1092
+ "grad_norm": 1.8612087965011597,
1093
+ "learning_rate": 0.00019457386652664346,
1094
+ "loss": 43.672,
1095
+ "step": 154
1096
+ },
1097
+ {
1098
+ "epoch": 0.0153739337433049,
1099
+ "grad_norm": 1.6463797092437744,
1100
+ "learning_rate": 0.0001944989429408307,
1101
+ "loss": 43.3032,
1102
+ "step": 155
1103
+ },
1104
+ {
1105
+ "epoch": 0.015473120412616545,
1106
+ "grad_norm": 1.8886929750442505,
1107
+ "learning_rate": 0.00019442352025282776,
1108
+ "loss": 43.409,
1109
+ "step": 156
1110
+ },
1111
+ {
1112
+ "epoch": 0.015572307081928189,
1113
+ "grad_norm": 1.5860872268676758,
1114
+ "learning_rate": 0.00019434759886098442,
1115
+ "loss": 43.5002,
1116
+ "step": 157
1117
+ },
1118
+ {
1119
+ "epoch": 0.015671493751239832,
1120
+ "grad_norm": 1.5877327919006348,
1121
+ "learning_rate": 0.0001942711791662843,
1122
+ "loss": 43.1642,
1123
+ "step": 158
1124
+ },
1125
+ {
1126
+ "epoch": 0.015770680420551478,
1127
+ "grad_norm": 1.9605953693389893,
1128
+ "learning_rate": 0.00019419426157234288,
1129
+ "loss": 43.5132,
1130
+ "step": 159
1131
+ },
1132
+ {
1133
+ "epoch": 0.015869867089863123,
1134
+ "grad_norm": 1.751679539680481,
1135
+ "learning_rate": 0.00019411684648540538,
1136
+ "loss": 43.5054,
1137
+ "step": 160
1138
+ },
1139
+ {
1140
+ "epoch": 0.01596905375917477,
1141
+ "grad_norm": 1.4614412784576416,
1142
+ "learning_rate": 0.00019403893431434445,
1143
+ "loss": 43.3814,
1144
+ "step": 161
1145
+ },
1146
+ {
1147
+ "epoch": 0.01606824042848641,
1148
+ "grad_norm": 1.5248690843582153,
1149
+ "learning_rate": 0.00019396052547065827,
1150
+ "loss": 43.5447,
1151
+ "step": 162
1152
+ },
1153
+ {
1154
+ "epoch": 0.016167427097798056,
1155
+ "grad_norm": 1.5921061038970947,
1156
+ "learning_rate": 0.0001938816203684681,
1157
+ "loss": 43.5961,
1158
+ "step": 163
1159
+ },
1160
+ {
1161
+ "epoch": 0.0162666137671097,
1162
+ "grad_norm": 1.978134036064148,
1163
+ "learning_rate": 0.00019380221942451623,
1164
+ "loss": 43.318,
1165
+ "step": 164
1166
+ },
1167
+ {
1168
+ "epoch": 0.016365800436421343,
1169
+ "grad_norm": 1.8620237112045288,
1170
+ "learning_rate": 0.00019372232305816387,
1171
+ "loss": 43.4081,
1172
+ "step": 165
1173
+ },
1174
+ {
1175
+ "epoch": 0.01646498710573299,
1176
+ "grad_norm": 1.890067219734192,
1177
+ "learning_rate": 0.00019364193169138876,
1178
+ "loss": 43.37,
1179
+ "step": 166
1180
+ },
1181
+ {
1182
+ "epoch": 0.016564173775044634,
1183
+ "grad_norm": 1.7698861360549927,
1184
+ "learning_rate": 0.000193561045748783,
1185
+ "loss": 43.2327,
1186
+ "step": 167
1187
+ },
1188
+ {
1189
+ "epoch": 0.01666336044435628,
1190
+ "grad_norm": 2.023280382156372,
1191
+ "learning_rate": 0.00019347966565755084,
1192
+ "loss": 43.5565,
1193
+ "step": 168
1194
+ },
1195
+ {
1196
+ "epoch": 0.01676254711366792,
1197
+ "grad_norm": 1.60719895362854,
1198
+ "learning_rate": 0.00019339779184750647,
1199
+ "loss": 43.4494,
1200
+ "step": 169
1201
+ },
1202
+ {
1203
+ "epoch": 0.016861733782979567,
1204
+ "grad_norm": 1.7296546697616577,
1205
+ "learning_rate": 0.0001933154247510716,
1206
+ "loss": 43.4502,
1207
+ "step": 170
1208
+ },
1209
+ {
1210
+ "epoch": 0.016960920452291212,
1211
+ "grad_norm": 1.568786382675171,
1212
+ "learning_rate": 0.00019323256480327335,
1213
+ "loss": 43.53,
1214
+ "step": 171
1215
+ },
1216
+ {
1217
+ "epoch": 0.017060107121602858,
1218
+ "grad_norm": 1.7664810419082642,
1219
+ "learning_rate": 0.00019314921244174173,
1220
+ "loss": 43.3674,
1221
+ "step": 172
1222
+ },
1223
+ {
1224
+ "epoch": 0.0171592937909145,
1225
+ "grad_norm": 1.7805827856063843,
1226
+ "learning_rate": 0.0001930653681067076,
1227
+ "loss": 43.3056,
1228
+ "step": 173
1229
+ },
1230
+ {
1231
+ "epoch": 0.017258480460226145,
1232
+ "grad_norm": 1.9585574865341187,
1233
+ "learning_rate": 0.00019298103224100018,
1234
+ "loss": 43.4174,
1235
+ "step": 174
1236
+ },
1237
+ {
1238
+ "epoch": 0.01735766712953779,
1239
+ "grad_norm": 1.9280157089233398,
1240
+ "learning_rate": 0.00019289620529004467,
1241
+ "loss": 43.4921,
1242
+ "step": 175
1243
+ },
1244
+ {
1245
+ "epoch": 0.017456853798849436,
1246
+ "grad_norm": 1.6859203577041626,
1247
+ "learning_rate": 0.00019281088770186004,
1248
+ "loss": 43.2668,
1249
+ "step": 176
1250
+ },
1251
+ {
1252
+ "epoch": 0.017556040468161078,
1253
+ "grad_norm": 1.5108959674835205,
1254
+ "learning_rate": 0.00019272507992705657,
1255
+ "loss": 43.4231,
1256
+ "step": 177
1257
+ },
1258
+ {
1259
+ "epoch": 0.017655227137472723,
1260
+ "grad_norm": 1.8462445735931396,
1261
+ "learning_rate": 0.00019263878241883352,
1262
+ "loss": 43.5939,
1263
+ "step": 178
1264
+ },
1265
+ {
1266
+ "epoch": 0.01775441380678437,
1267
+ "grad_norm": 2.0469777584075928,
1268
+ "learning_rate": 0.00019255199563297665,
1269
+ "loss": 43.4041,
1270
+ "step": 179
1271
+ },
1272
+ {
1273
+ "epoch": 0.017853600476096014,
1274
+ "grad_norm": 1.8307596445083618,
1275
+ "learning_rate": 0.0001924647200278559,
1276
+ "loss": 43.568,
1277
+ "step": 180
1278
+ },
1279
+ {
1280
+ "epoch": 0.017952787145407656,
1281
+ "grad_norm": 1.7826749086380005,
1282
+ "learning_rate": 0.00019237695606442293,
1283
+ "loss": 43.4189,
1284
+ "step": 181
1285
+ },
1286
+ {
1287
+ "epoch": 0.0180519738147193,
1288
+ "grad_norm": 1.6939207315444946,
1289
+ "learning_rate": 0.00019228870420620874,
1290
+ "loss": 43.7362,
1291
+ "step": 182
1292
+ },
1293
+ {
1294
+ "epoch": 0.018151160484030947,
1295
+ "grad_norm": 1.5782077312469482,
1296
+ "learning_rate": 0.00019219996491932114,
1297
+ "loss": 43.4713,
1298
+ "step": 183
1299
+ },
1300
+ {
1301
+ "epoch": 0.018250347153342592,
1302
+ "grad_norm": 1.732811450958252,
1303
+ "learning_rate": 0.00019211073867244228,
1304
+ "loss": 43.2893,
1305
+ "step": 184
1306
+ },
1307
+ {
1308
+ "epoch": 0.018349533822654234,
1309
+ "grad_norm": 2.1155354976654053,
1310
+ "learning_rate": 0.00019202102593682632,
1311
+ "loss": 43.6014,
1312
+ "step": 185
1313
+ },
1314
+ {
1315
+ "epoch": 0.01844872049196588,
1316
+ "grad_norm": 1.980027675628662,
1317
+ "learning_rate": 0.00019193082718629677,
1318
+ "loss": 43.2281,
1319
+ "step": 186
1320
+ },
1321
+ {
1322
+ "epoch": 0.018547907161277525,
1323
+ "grad_norm": 1.9374451637268066,
1324
+ "learning_rate": 0.0001918401428972441,
1325
+ "loss": 43.6718,
1326
+ "step": 187
1327
+ },
1328
+ {
1329
+ "epoch": 0.01864709383058917,
1330
+ "grad_norm": 2.091322898864746,
1331
+ "learning_rate": 0.00019174897354862317,
1332
+ "loss": 43.4641,
1333
+ "step": 188
1334
+ },
1335
+ {
1336
+ "epoch": 0.018746280499900812,
1337
+ "grad_norm": 1.642065167427063,
1338
+ "learning_rate": 0.00019165731962195065,
1339
+ "loss": 43.4111,
1340
+ "step": 189
1341
+ },
1342
+ {
1343
+ "epoch": 0.018845467169212458,
1344
+ "grad_norm": 1.905786156654358,
1345
+ "learning_rate": 0.00019156518160130267,
1346
+ "loss": 43.5117,
1347
+ "step": 190
1348
+ },
1349
+ {
1350
+ "epoch": 0.018944653838524103,
1351
+ "grad_norm": 1.964237928390503,
1352
+ "learning_rate": 0.00019147255997331198,
1353
+ "loss": 43.5668,
1354
+ "step": 191
1355
+ },
1356
+ {
1357
+ "epoch": 0.01904384050783575,
1358
+ "grad_norm": 1.7513357400894165,
1359
+ "learning_rate": 0.00019137945522716568,
1360
+ "loss": 43.5371,
1361
+ "step": 192
1362
+ },
1363
+ {
1364
+ "epoch": 0.01914302717714739,
1365
+ "grad_norm": 1.5807865858078003,
1366
+ "learning_rate": 0.00019128586785460236,
1367
+ "loss": 43.4721,
1368
+ "step": 193
1369
+ },
1370
+ {
1371
+ "epoch": 0.019242213846459036,
1372
+ "grad_norm": 1.9611173868179321,
1373
+ "learning_rate": 0.00019119179834990973,
1374
+ "loss": 43.5727,
1375
+ "step": 194
1376
+ },
1377
+ {
1378
+ "epoch": 0.01934140051577068,
1379
+ "grad_norm": 1.5683705806732178,
1380
+ "learning_rate": 0.0001910972472099219,
1381
+ "loss": 43.5582,
1382
+ "step": 195
1383
+ },
1384
+ {
1385
+ "epoch": 0.019440587185082323,
1386
+ "grad_norm": 1.8240303993225098,
1387
+ "learning_rate": 0.00019100221493401672,
1388
+ "loss": 43.5326,
1389
+ "step": 196
1390
+ },
1391
+ {
1392
+ "epoch": 0.01953977385439397,
1393
+ "grad_norm": 1.6050409078598022,
1394
+ "learning_rate": 0.00019090670202411315,
1395
+ "loss": 43.5931,
1396
+ "step": 197
1397
+ },
1398
+ {
1399
+ "epoch": 0.019638960523705614,
1400
+ "grad_norm": 1.769248127937317,
1401
+ "learning_rate": 0.00019081070898466882,
1402
+ "loss": 43.4457,
1403
+ "step": 198
1404
+ },
1405
+ {
1406
+ "epoch": 0.01973814719301726,
1407
+ "grad_norm": 1.4461135864257812,
1408
+ "learning_rate": 0.000190714236322677,
1409
+ "loss": 43.5051,
1410
+ "step": 199
1411
+ },
1412
+ {
1413
+ "epoch": 0.0198373338623289,
1414
+ "grad_norm": 1.6308741569519043,
1415
+ "learning_rate": 0.00019061728454766423,
1416
+ "loss": 43.3338,
1417
+ "step": 200
1418
+ },
1419
+ {
1420
+ "epoch": 0.019936520531640547,
1421
+ "grad_norm": 1.7925598621368408,
1422
+ "learning_rate": 0.0001905198541716875,
1423
+ "loss": 43.4649,
1424
+ "step": 201
1425
+ },
1426
+ {
1427
+ "epoch": 0.020035707200952192,
1428
+ "grad_norm": 1.7756211757659912,
1429
+ "learning_rate": 0.00019042194570933156,
1430
+ "loss": 43.5132,
1431
+ "step": 202
1432
+ },
1433
+ {
1434
+ "epoch": 0.020134893870263838,
1435
+ "grad_norm": 1.5081807374954224,
1436
+ "learning_rate": 0.00019032355967770617,
1437
+ "loss": 43.4499,
1438
+ "step": 203
1439
+ },
1440
+ {
1441
+ "epoch": 0.02023408053957548,
1442
+ "grad_norm": 1.9493539333343506,
1443
+ "learning_rate": 0.00019022469659644344,
1444
+ "loss": 43.4622,
1445
+ "step": 204
1446
+ },
1447
+ {
1448
+ "epoch": 0.020333267208887125,
1449
+ "grad_norm": 1.603097915649414,
1450
+ "learning_rate": 0.00019012535698769502,
1451
+ "loss": 43.2635,
1452
+ "step": 205
1453
+ },
1454
+ {
1455
+ "epoch": 0.02043245387819877,
1456
+ "grad_norm": 1.8952986001968384,
1457
+ "learning_rate": 0.0001900255413761294,
1458
+ "loss": 43.3947,
1459
+ "step": 206
1460
+ },
1461
+ {
1462
+ "epoch": 0.020531640547510416,
1463
+ "grad_norm": 1.8197367191314697,
1464
+ "learning_rate": 0.0001899252502889291,
1465
+ "loss": 43.4419,
1466
+ "step": 207
1467
+ },
1468
+ {
1469
+ "epoch": 0.020630827216822058,
1470
+ "grad_norm": 1.6402533054351807,
1471
+ "learning_rate": 0.00018982448425578787,
1472
+ "loss": 43.422,
1473
+ "step": 208
1474
+ },
1475
+ {
1476
+ "epoch": 0.020730013886133703,
1477
+ "grad_norm": 1.760541558265686,
1478
+ "learning_rate": 0.00018972324380890794,
1479
+ "loss": 43.2571,
1480
+ "step": 209
1481
+ },
1482
+ {
1483
+ "epoch": 0.02082920055544535,
1484
+ "grad_norm": 1.8905282020568848,
1485
+ "learning_rate": 0.0001896215294829972,
1486
+ "loss": 43.4353,
1487
+ "step": 210
1488
+ },
1489
+ {
1490
+ "epoch": 0.020928387224756994,
1491
+ "grad_norm": 1.965397834777832,
1492
+ "learning_rate": 0.0001895193418152663,
1493
+ "loss": 43.3452,
1494
+ "step": 211
1495
+ },
1496
+ {
1497
+ "epoch": 0.021027573894068636,
1498
+ "grad_norm": 1.6746777296066284,
1499
+ "learning_rate": 0.00018941668134542602,
1500
+ "loss": 43.4485,
1501
+ "step": 212
1502
+ },
1503
+ {
1504
+ "epoch": 0.02112676056338028,
1505
+ "grad_norm": 1.8389016389846802,
1506
+ "learning_rate": 0.00018931354861568406,
1507
+ "loss": 43.3101,
1508
+ "step": 213
1509
+ },
1510
+ {
1511
+ "epoch": 0.021225947232691927,
1512
+ "grad_norm": 1.922573208808899,
1513
+ "learning_rate": 0.00018920994417074256,
1514
+ "loss": 43.3094,
1515
+ "step": 214
1516
+ },
1517
+ {
1518
+ "epoch": 0.021325133902003572,
1519
+ "grad_norm": 1.7113759517669678,
1520
+ "learning_rate": 0.000189105868557795,
1521
+ "loss": 43.437,
1522
+ "step": 215
1523
+ },
1524
+ {
1525
+ "epoch": 0.021424320571315214,
1526
+ "grad_norm": 1.8494682312011719,
1527
+ "learning_rate": 0.00018900132232652336,
1528
+ "loss": 43.6071,
1529
+ "step": 216
1530
+ },
1531
+ {
1532
+ "epoch": 0.02152350724062686,
1533
+ "grad_norm": 1.8459663391113281,
1534
+ "learning_rate": 0.00018889630602909521,
1535
+ "loss": 43.3704,
1536
+ "step": 217
1537
+ },
1538
+ {
1539
+ "epoch": 0.021622693909938505,
1540
+ "grad_norm": 1.892861008644104,
1541
+ "learning_rate": 0.00018879082022016085,
1542
+ "loss": 43.4787,
1543
+ "step": 218
1544
+ },
1545
+ {
1546
+ "epoch": 0.02172188057925015,
1547
+ "grad_norm": 1.9636040925979614,
1548
+ "learning_rate": 0.00018868486545685028,
1549
+ "loss": 43.2111,
1550
+ "step": 219
1551
+ },
1552
+ {
1553
+ "epoch": 0.021821067248561792,
1554
+ "grad_norm": 2.4153854846954346,
1555
+ "learning_rate": 0.00018857844229877032,
1556
+ "loss": 43.5971,
1557
+ "step": 220
1558
+ },
1559
+ {
1560
+ "epoch": 0.021920253917873438,
1561
+ "grad_norm": 1.7833011150360107,
1562
+ "learning_rate": 0.00018847155130800167,
1563
+ "loss": 43.5453,
1564
+ "step": 221
1565
+ },
1566
+ {
1567
+ "epoch": 0.022019440587185083,
1568
+ "grad_norm": 2.046844720840454,
1569
+ "learning_rate": 0.00018836419304909594,
1570
+ "loss": 43.5533,
1571
+ "step": 222
1572
+ },
1573
+ {
1574
+ "epoch": 0.02211862725649673,
1575
+ "grad_norm": 1.7641832828521729,
1576
+ "learning_rate": 0.00018825636808907258,
1577
+ "loss": 43.493,
1578
+ "step": 223
1579
+ },
1580
+ {
1581
+ "epoch": 0.02221781392580837,
1582
+ "grad_norm": 1.7998050451278687,
1583
+ "learning_rate": 0.00018814807699741603,
1584
+ "loss": 43.4208,
1585
+ "step": 224
1586
+ },
1587
+ {
1588
+ "epoch": 0.022317000595120016,
1589
+ "grad_norm": 1.6221156120300293,
1590
+ "learning_rate": 0.00018803932034607256,
1591
+ "loss": 43.6161,
1592
+ "step": 225
1593
+ },
1594
+ {
1595
+ "epoch": 0.02241618726443166,
1596
+ "grad_norm": 1.863451600074768,
1597
+ "learning_rate": 0.00018793009870944738,
1598
+ "loss": 43.3304,
1599
+ "step": 226
1600
+ },
1601
+ {
1602
+ "epoch": 0.022515373933743307,
1603
+ "grad_norm": 2.0428097248077393,
1604
+ "learning_rate": 0.00018782041266440148,
1605
+ "loss": 43.3551,
1606
+ "step": 227
1607
+ },
1608
+ {
1609
+ "epoch": 0.02261456060305495,
1610
+ "grad_norm": 1.7376773357391357,
1611
+ "learning_rate": 0.00018771026279024877,
1612
+ "loss": 43.4487,
1613
+ "step": 228
1614
+ },
1615
+ {
1616
+ "epoch": 0.022713747272366594,
1617
+ "grad_norm": 2.0446527004241943,
1618
+ "learning_rate": 0.00018759964966875273,
1619
+ "loss": 43.5901,
1620
+ "step": 229
1621
+ },
1622
+ {
1623
+ "epoch": 0.02281293394167824,
1624
+ "grad_norm": 2.2305171489715576,
1625
+ "learning_rate": 0.0001874885738841237,
1626
+ "loss": 43.6051,
1627
+ "step": 230
1628
+ },
1629
+ {
1630
+ "epoch": 0.02291212061098988,
1631
+ "grad_norm": 1.8739280700683594,
1632
+ "learning_rate": 0.0001873770360230155,
1633
+ "loss": 43.397,
1634
+ "step": 231
1635
+ },
1636
+ {
1637
+ "epoch": 0.023011307280301527,
1638
+ "grad_norm": 1.6180644035339355,
1639
+ "learning_rate": 0.00018726503667452243,
1640
+ "loss": 43.5229,
1641
+ "step": 232
1642
+ },
1643
+ {
1644
+ "epoch": 0.023110493949613172,
1645
+ "grad_norm": 2.046315908432007,
1646
+ "learning_rate": 0.00018715257643017621,
1647
+ "loss": 43.6235,
1648
+ "step": 233
1649
+ },
1650
+ {
1651
+ "epoch": 0.023209680618924818,
1652
+ "grad_norm": 1.8750463724136353,
1653
+ "learning_rate": 0.00018703965588394276,
1654
+ "loss": 43.5124,
1655
+ "step": 234
1656
+ },
1657
+ {
1658
+ "epoch": 0.02330886728823646,
1659
+ "grad_norm": 1.9105528593063354,
1660
+ "learning_rate": 0.00018692627563221916,
1661
+ "loss": 43.467,
1662
+ "step": 235
1663
+ },
1664
+ {
1665
+ "epoch": 0.023408053957548105,
1666
+ "grad_norm": 1.710308313369751,
1667
+ "learning_rate": 0.00018681243627383042,
1668
+ "loss": 43.4374,
1669
+ "step": 236
1670
+ },
1671
+ {
1672
+ "epoch": 0.02350724062685975,
1673
+ "grad_norm": 1.8155282735824585,
1674
+ "learning_rate": 0.0001866981384100264,
1675
+ "loss": 43.3374,
1676
+ "step": 237
1677
+ },
1678
+ {
1679
+ "epoch": 0.023606427296171396,
1680
+ "grad_norm": 1.9001719951629639,
1681
+ "learning_rate": 0.0001865833826444785,
1682
+ "loss": 43.3604,
1683
+ "step": 238
1684
+ },
1685
+ {
1686
+ "epoch": 0.023705613965483038,
1687
+ "grad_norm": 1.7378791570663452,
1688
+ "learning_rate": 0.00018646816958327667,
1689
+ "loss": 43.3533,
1690
+ "step": 239
1691
+ },
1692
+ {
1693
+ "epoch": 0.023804800634794683,
1694
+ "grad_norm": 1.938234806060791,
1695
+ "learning_rate": 0.00018635249983492603,
1696
+ "loss": 43.3329,
1697
+ "step": 240
1698
+ },
1699
+ {
1700
+ "epoch": 0.02390398730410633,
1701
+ "grad_norm": 1.8653101921081543,
1702
+ "learning_rate": 0.00018623637401034366,
1703
+ "loss": 43.4116,
1704
+ "step": 241
1705
+ },
1706
+ {
1707
+ "epoch": 0.024003173973417974,
+ "grad_norm": 2.307813882827759,
+ "learning_rate": 0.00018611979272285556,
+ "loss": 43.2887,
+ "step": 242
+ },
+ {
+ "epoch": 0.024102360642729616,
+ "grad_norm": 1.9324138164520264,
+ "learning_rate": 0.00018600275658819324,
+ "loss": 43.3422,
+ "step": 243
+ },
+ {
+ "epoch": 0.02420154731204126,
+ "grad_norm": 1.9545015096664429,
+ "learning_rate": 0.00018588526622449048,
+ "loss": 43.2783,
+ "step": 244
+ },
+ {
+ "epoch": 0.024300733981352907,
+ "grad_norm": 2.312723159790039,
+ "learning_rate": 0.00018576732225228013,
+ "loss": 43.4625,
+ "step": 245
+ },
+ {
+ "epoch": 0.024399920650664552,
+ "grad_norm": 1.7571669816970825,
+ "learning_rate": 0.00018564892529449078,
+ "loss": 43.2928,
+ "step": 246
+ },
+ {
+ "epoch": 0.024499107319976194,
+ "grad_norm": 1.896710753440857,
+ "learning_rate": 0.00018553007597644353,
+ "loss": 43.2533,
+ "step": 247
+ },
+ {
+ "epoch": 0.02459829398928784,
+ "grad_norm": 1.737964153289795,
+ "learning_rate": 0.00018541077492584864,
+ "loss": 43.638,
+ "step": 248
+ },
+ {
+ "epoch": 0.024697480658599485,
+ "grad_norm": 1.7832790613174438,
+ "learning_rate": 0.0001852910227728022,
+ "loss": 43.5125,
+ "step": 249
+ },
+ {
+ "epoch": 0.02479666732791113,
+ "grad_norm": 1.793320655822754,
+ "learning_rate": 0.00018517082014978282,
+ "loss": 43.5349,
+ "step": 250
+ },
+ {
+ "epoch": 0.024895853997222772,
+ "grad_norm": 1.9972788095474243,
+ "learning_rate": 0.00018505016769164833,
+ "loss": 43.3333,
+ "step": 251
+ },
+ {
+ "epoch": 0.024995040666534418,
+ "grad_norm": 1.9738582372665405,
+ "learning_rate": 0.00018492906603563238,
+ "loss": 43.4897,
+ "step": 252
+ },
+ {
+ "epoch": 0.025094227335846063,
+ "grad_norm": 1.8204716444015503,
+ "learning_rate": 0.0001848075158213411,
+ "loss": 43.285,
+ "step": 253
+ },
+ {
+ "epoch": 0.02519341400515771,
+ "grad_norm": 1.6776862144470215,
+ "learning_rate": 0.00018468551769074964,
+ "loss": 43.523,
+ "step": 254
+ },
+ {
+ "epoch": 0.02529260067446935,
+ "grad_norm": 1.6923521757125854,
+ "learning_rate": 0.00018456307228819897,
+ "loss": 43.5013,
+ "step": 255
+ },
+ {
+ "epoch": 0.025391787343780996,
+ "grad_norm": 2.2982516288757324,
+ "learning_rate": 0.00018444018026039224,
+ "loss": 43.2198,
+ "step": 256
+ },
+ {
+ "epoch": 0.02549097401309264,
+ "grad_norm": 1.6277918815612793,
+ "learning_rate": 0.00018431684225639155,
+ "loss": 43.6655,
+ "step": 257
+ },
+ {
+ "epoch": 0.025590160682404287,
+ "grad_norm": 1.9118983745574951,
+ "learning_rate": 0.0001841930589276144,
+ "loss": 43.4623,
+ "step": 258
+ },
+ {
+ "epoch": 0.02568934735171593,
+ "grad_norm": 1.7027877569198608,
+ "learning_rate": 0.0001840688309278304,
+ "loss": 43.3977,
+ "step": 259
+ },
+ {
+ "epoch": 0.025788534021027574,
+ "grad_norm": 2.1675608158111572,
+ "learning_rate": 0.00018394415891315758,
+ "loss": 43.1506,
+ "step": 260
+ },
+ {
+ "epoch": 0.02588772069033922,
+ "grad_norm": 1.6530122756958008,
+ "learning_rate": 0.00018381904354205914,
+ "loss": 43.545,
+ "step": 261
+ },
+ {
+ "epoch": 0.02598690735965086,
+ "grad_norm": 2.0166702270507812,
+ "learning_rate": 0.0001836934854753399,
+ "loss": 43.4604,
+ "step": 262
+ },
+ {
+ "epoch": 0.026086094028962507,
+ "grad_norm": 1.9425437450408936,
+ "learning_rate": 0.00018356748537614283,
+ "loss": 43.2025,
+ "step": 263
+ },
+ {
+ "epoch": 0.026185280698274152,
+ "grad_norm": 1.4773979187011719,
+ "learning_rate": 0.00018344104390994543,
+ "loss": 43.4627,
+ "step": 264
+ },
+ {
+ "epoch": 0.026284467367585797,
+ "grad_norm": 1.9007498025894165,
+ "learning_rate": 0.00018331416174455635,
+ "loss": 43.3737,
+ "step": 265
+ },
+ {
+ "epoch": 0.02638365403689744,
+ "grad_norm": 2.1626980304718018,
+ "learning_rate": 0.00018318683955011192,
+ "loss": 43.5208,
+ "step": 266
+ },
+ {
+ "epoch": 0.026482840706209085,
+ "grad_norm": 2.390268564224243,
+ "learning_rate": 0.00018305907799907233,
+ "loss": 43.3615,
+ "step": 267
+ },
+ {
+ "epoch": 0.02658202737552073,
+ "grad_norm": 1.826054573059082,
+ "learning_rate": 0.00018293087776621836,
+ "loss": 43.3623,
+ "step": 268
+ },
+ {
+ "epoch": 0.026681214044832376,
1897
+ "grad_norm": 1.793283224105835,
+ "learning_rate": 0.00018280223952864781,
+ "loss": 43.2613,
+ "step": 269
+ },
+ {
+ "epoch": 0.026780400714144018,
+ "grad_norm": 1.8169645071029663,
+ "learning_rate": 0.00018267316396577166,
+ "loss": 43.3729,
+ "step": 270
+ },
+ {
+ "epoch": 0.026879587383455663,
+ "grad_norm": 1.6684504747390747,
+ "learning_rate": 0.0001825436517593107,
+ "loss": 43.4644,
+ "step": 271
+ },
+ {
+ "epoch": 0.02697877405276731,
+ "grad_norm": 1.916520357131958,
+ "learning_rate": 0.00018241370359329192,
+ "loss": 43.6573,
+ "step": 272
+ },
+ {
+ "epoch": 0.027077960722078954,
+ "grad_norm": 1.9397066831588745,
+ "learning_rate": 0.0001822833201540449,
+ "loss": 43.4218,
+ "step": 273
+ },
+ {
+ "epoch": 0.027177147391390596,
+ "grad_norm": 1.697509765625,
+ "learning_rate": 0.000182152502130198,
+ "loss": 43.4535,
+ "step": 274
+ },
+ {
+ "epoch": 0.02727633406070224,
+ "grad_norm": 1.9047552347183228,
+ "learning_rate": 0.000182021250212675,
+ "loss": 43.4582,
+ "step": 275
+ },
+ {
+ "epoch": 0.027375520730013887,
+ "grad_norm": 2.150933265686035,
+ "learning_rate": 0.00018188956509469125,
+ "loss": 43.3068,
+ "step": 276
+ },
+ {
+ "epoch": 0.027474707399325532,
+ "grad_norm": 2.0963590145111084,
+ "learning_rate": 0.00018175744747175008,
+ "loss": 43.3796,
+ "step": 277
+ },
+ {
+ "epoch": 0.027573894068637174,
+ "grad_norm": 2.0096163749694824,
+ "learning_rate": 0.0001816248980416392,
+ "loss": 43.4927,
+ "step": 278
+ },
+ {
+ "epoch": 0.02767308073794882,
+ "grad_norm": 2.438795566558838,
+ "learning_rate": 0.0001814919175044268,
+ "loss": 43.2995,
+ "step": 279
+ },
+ {
+ "epoch": 0.027772267407260465,
+ "grad_norm": 1.8637646436691284,
+ "learning_rate": 0.00018135850656245808,
+ "loss": 43.4106,
+ "step": 280
+ },
+ {
+ "epoch": 0.02787145407657211,
+ "grad_norm": 1.6904587745666504,
+ "learning_rate": 0.00018122466592035148,
+ "loss": 43.3585,
+ "step": 281
+ },
+ {
+ "epoch": 0.027970640745883752,
+ "grad_norm": 1.963031530380249,
+ "learning_rate": 0.00018109039628499483,
+ "loss": 43.326,
+ "step": 282
+ },
+ {
+ "epoch": 0.028069827415195397,
+ "grad_norm": 1.8556482791900635,
+ "learning_rate": 0.00018095569836554178,
+ "loss": 43.4717,
+ "step": 283
+ },
+ {
+ "epoch": 0.028169014084507043,
+ "grad_norm": 2.059662103652954,
+ "learning_rate": 0.000180820572873408,
+ "loss": 43.3201,
+ "step": 284
+ },
+ {
+ "epoch": 0.02826820075381869,
+ "grad_norm": 1.9710227251052856,
+ "learning_rate": 0.00018068502052226733,
+ "loss": 43.548,
+ "step": 285
+ },
+ {
+ "epoch": 0.02836738742313033,
+ "grad_norm": 1.7940763235092163,
+ "learning_rate": 0.0001805490420280482,
+ "loss": 43.4886,
+ "step": 286
+ },
+ {
+ "epoch": 0.028466574092441976,
+ "grad_norm": 1.9778364896774292,
+ "learning_rate": 0.00018041263810892973,
+ "loss": 43.5431,
+ "step": 287
+ },
+ {
+ "epoch": 0.02856576076175362,
+ "grad_norm": 2.030728340148926,
+ "learning_rate": 0.0001802758094853378,
+ "loss": 43.2431,
+ "step": 288
+ },
+ {
+ "epoch": 0.028664947431065266,
+ "grad_norm": 2.230532169342041,
+ "learning_rate": 0.00018013855687994164,
+ "loss": 43.2208,
+ "step": 289
+ },
+ {
+ "epoch": 0.02876413410037691,
+ "grad_norm": 1.9482969045639038,
+ "learning_rate": 0.00018000088101764953,
+ "loss": 43.5798,
+ "step": 290
+ },
+ {
+ "epoch": 0.028863320769688554,
+ "grad_norm": 1.7520207166671753,
+ "learning_rate": 0.00017986278262560536,
+ "loss": 43.5962,
+ "step": 291
+ },
+ {
+ "epoch": 0.0289625074390002,
+ "grad_norm": 1.9360411167144775,
+ "learning_rate": 0.00017972426243318456,
+ "loss": 43.5776,
+ "step": 292
+ },
+ {
+ "epoch": 0.02906169410831184,
+ "grad_norm": 1.9918673038482666,
+ "learning_rate": 0.0001795853211719904,
+ "loss": 43.3685,
+ "step": 293
+ },
+ {
+ "epoch": 0.029160880777623487,
+ "grad_norm": 1.8499594926834106,
+ "learning_rate": 0.00017944595957584999,
+ "loss": 43.2545,
+ "step": 294
+ },
+ {
+ "epoch": 0.029260067446935132,
+ "grad_norm": 1.7617238759994507,
+ "learning_rate": 0.00017930617838081046,
+ "loss": 43.2486,
+ "step": 295
+ },
+ {
+ "epoch": 0.029359254116246777,
+ "grad_norm": 1.8923050165176392,
2087
+ "learning_rate": 0.0001791659783251351,
+ "loss": 43.1631,
+ "step": 296
+ },
+ {
+ "epoch": 0.02945844078555842,
+ "grad_norm": 2.038088798522949,
+ "learning_rate": 0.00017902536014929946,
+ "loss": 43.249,
+ "step": 297
+ },
+ {
+ "epoch": 0.029557627454870065,
+ "grad_norm": 1.8649237155914307,
+ "learning_rate": 0.00017888432459598744,
+ "loss": 43.5624,
+ "step": 298
+ },
+ {
+ "epoch": 0.02965681412418171,
+ "grad_norm": 2.608098268508911,
+ "learning_rate": 0.0001787428724100872,
+ "loss": 43.3078,
+ "step": 299
+ },
+ {
+ "epoch": 0.029756000793493356,
+ "grad_norm": 2.03300404548645,
+ "learning_rate": 0.00017860100433868755,
+ "loss": 43.3856,
+ "step": 300
+ },
+ {
+ "epoch": 0.029855187462804997,
+ "grad_norm": 1.9915575981140137,
+ "learning_rate": 0.0001784587211310737,
+ "loss": 43.5435,
+ "step": 301
+ },
+ {
+ "epoch": 0.029954374132116643,
+ "grad_norm": 1.8422963619232178,
+ "learning_rate": 0.00017831602353872357,
+ "loss": 43.4246,
+ "step": 302
+ },
+ {
+ "epoch": 0.03005356080142829,
+ "grad_norm": 2.0159566402435303,
+ "learning_rate": 0.00017817291231530348,
+ "loss": 43.4693,
+ "step": 303
+ },
+ {
+ "epoch": 0.030152747470739934,
+ "grad_norm": 2.0639212131500244,
+ "learning_rate": 0.00017802938821666458,
+ "loss": 43.3722,
+ "step": 304
+ },
+ {
+ "epoch": 0.030251934140051576,
+ "grad_norm": 1.985589623451233,
+ "learning_rate": 0.00017788545200083847,
+ "loss": 43.1785,
+ "step": 305
+ },
+ {
+ "epoch": 0.03035112080936322,
+ "grad_norm": 1.6914057731628418,
+ "learning_rate": 0.00017774110442803347,
+ "loss": 43.4961,
+ "step": 306
+ },
+ {
+ "epoch": 0.030450307478674866,
+ "grad_norm": 2.404210329055786,
+ "learning_rate": 0.0001775963462606305,
+ "loss": 43.4228,
+ "step": 307
+ },
+ {
+ "epoch": 0.030549494147986512,
+ "grad_norm": 2.215900182723999,
+ "learning_rate": 0.000177451178263179,
+ "loss": 43.4482,
+ "step": 308
+ },
+ {
+ "epoch": 0.030648680817298154,
+ "grad_norm": 1.7335083484649658,
+ "learning_rate": 0.00017730560120239307,
+ "loss": 43.5255,
+ "step": 309
+ },
+ {
+ "epoch": 0.0307478674866098,
+ "grad_norm": 2.051217794418335,
+ "learning_rate": 0.0001771596158471472,
+ "loss": 43.3276,
+ "step": 310
+ },
+ {
+ "epoch": 0.030847054155921445,
+ "grad_norm": 1.7068490982055664,
+ "learning_rate": 0.00017701322296847236,
+ "loss": 43.2431,
+ "step": 311
+ },
+ {
+ "epoch": 0.03094624082523309,
+ "grad_norm": 1.6466596126556396,
+ "learning_rate": 0.00017686642333955183,
+ "loss": 43.3234,
+ "step": 312
+ },
+ {
+ "epoch": 0.031045427494544732,
+ "grad_norm": 1.9981932640075684,
+ "learning_rate": 0.00017671921773571727,
+ "loss": 43.3187,
+ "step": 313
+ },
+ {
+ "epoch": 0.031144614163856377,
+ "grad_norm": 1.7652865648269653,
+ "learning_rate": 0.0001765716069344444,
+ "loss": 43.1768,
+ "step": 314
+ },
+ {
+ "epoch": 0.031243800833168023,
+ "grad_norm": 1.8989441394805908,
+ "learning_rate": 0.0001764235917153491,
+ "loss": 43.2334,
+ "step": 315
+ },
+ {
+ "epoch": 0.031342987502479665,
+ "grad_norm": 2.141063928604126,
+ "learning_rate": 0.00017627517286018321,
+ "loss": 43.288,
+ "step": 316
+ },
+ {
+ "epoch": 0.031442174171791314,
+ "grad_norm": 2.336796522140503,
+ "learning_rate": 0.0001761263511528303,
+ "loss": 43.5281,
+ "step": 317
+ },
+ {
+ "epoch": 0.031541360841102956,
+ "grad_norm": 2.18377685546875,
+ "learning_rate": 0.00017597712737930178,
+ "loss": 43.5268,
+ "step": 318
+ },
+ {
+ "epoch": 0.0316405475104146,
+ "grad_norm": 2.292229652404785,
+ "learning_rate": 0.00017582750232773248,
+ "loss": 43.4562,
+ "step": 319
+ },
+ {
+ "epoch": 0.031739734179726246,
+ "grad_norm": 1.9269510507583618,
+ "learning_rate": 0.0001756774767883767,
+ "loss": 43.35,
+ "step": 320
+ },
+ {
+ "epoch": 0.03183892084903789,
+ "grad_norm": 1.8633321523666382,
+ "learning_rate": 0.00017552705155360378,
+ "loss": 43.3391,
+ "step": 321
+ },
+ {
+ "epoch": 0.03193810751834954,
+ "grad_norm": 2.1785457134246826,
+ "learning_rate": 0.00017537622741789429,
+ "loss": 43.6481,
+ "step": 322
+ },
+ {
+ "epoch": 0.03203729418766118,
+ "grad_norm": 1.988545298576355,
+ "learning_rate": 0.00017522500517783547,
2277
+ "loss": 43.1767,
+ "step": 323
+ },
+ {
+ "epoch": 0.03213648085697282,
+ "grad_norm": 1.80686354637146,
+ "learning_rate": 0.0001750733856321172,
+ "loss": 43.2304,
+ "step": 324
+ },
+ {
+ "epoch": 0.03223566752628447,
+ "grad_norm": 1.8045567274093628,
+ "learning_rate": 0.00017492136958152784,
+ "loss": 43.4061,
+ "step": 325
+ },
+ {
+ "epoch": 0.03233485419559611,
+ "grad_norm": 1.9179420471191406,
+ "learning_rate": 0.00017476895782894979,
+ "loss": 43.2181,
+ "step": 326
+ },
+ {
+ "epoch": 0.032434040864907754,
+ "grad_norm": 1.8775529861450195,
+ "learning_rate": 0.00017461615117935546,
+ "loss": 43.5342,
+ "step": 327
+ },
+ {
+ "epoch": 0.0325332275342194,
+ "grad_norm": 1.6033300161361694,
+ "learning_rate": 0.0001744629504398029,
+ "loss": 43.4323,
+ "step": 328
+ },
+ {
+ "epoch": 0.032632414203531045,
+ "grad_norm": 1.9226906299591064,
+ "learning_rate": 0.00017430935641943163,
+ "loss": 43.5733,
+ "step": 329
+ },
+ {
+ "epoch": 0.03273160087284269,
+ "grad_norm": 2.0332751274108887,
+ "learning_rate": 0.00017415536992945816,
+ "loss": 43.1741,
+ "step": 330
+ },
+ {
+ "epoch": 0.032830787542154335,
+ "grad_norm": 2.020566701889038,
+ "learning_rate": 0.00017400099178317203,
+ "loss": 43.4961,
+ "step": 331
+ },
+ {
+ "epoch": 0.03292997421146598,
+ "grad_norm": 2.0364789962768555,
+ "learning_rate": 0.00017384622279593122,
+ "loss": 43.143,
+ "step": 332
+ },
+ {
+ "epoch": 0.033029160880777626,
+ "grad_norm": 1.9677212238311768,
+ "learning_rate": 0.000173691063785158,
+ "loss": 43.5331,
+ "step": 333
+ },
+ {
+ "epoch": 0.03312834755008927,
+ "grad_norm": 1.6932072639465332,
+ "learning_rate": 0.0001735355155703346,
+ "loss": 43.3948,
+ "step": 334
+ },
+ {
+ "epoch": 0.03322753421940091,
+ "grad_norm": 2.192953109741211,
+ "learning_rate": 0.00017337957897299884,
+ "loss": 43.5575,
+ "step": 335
+ },
+ {
+ "epoch": 0.03332672088871256,
+ "grad_norm": 1.7866209745407104,
+ "learning_rate": 0.00017322325481673975,
+ "loss": 43.2401,
+ "step": 336
+ },
+ {
+ "epoch": 0.0334259075580242,
+ "grad_norm": 1.863152265548706,
+ "learning_rate": 0.00017306654392719337,
+ "loss": 43.4706,
+ "step": 337
+ },
+ {
+ "epoch": 0.03352509422733584,
+ "grad_norm": 1.6541751623153687,
+ "learning_rate": 0.00017290944713203822,
+ "loss": 43.2841,
+ "step": 338
+ },
+ {
+ "epoch": 0.03362428089664749,
+ "grad_norm": 2.455803394317627,
+ "learning_rate": 0.00017275196526099108,
+ "loss": 43.0331,
+ "step": 339
+ },
+ {
+ "epoch": 0.033723467565959134,
+ "grad_norm": 1.9039475917816162,
+ "learning_rate": 0.00017259409914580242,
+ "loss": 43.3408,
+ "step": 340
+ },
+ {
+ "epoch": 0.03382265423527078,
+ "grad_norm": 1.642409324645996,
+ "learning_rate": 0.00017243584962025224,
+ "loss": 43.1996,
+ "step": 341
+ },
+ {
+ "epoch": 0.033921840904582425,
+ "grad_norm": 1.591845154762268,
+ "learning_rate": 0.00017227721752014548,
+ "loss": 43.2826,
+ "step": 342
+ },
+ {
+ "epoch": 0.034021027573894067,
+ "grad_norm": 1.970920443534851,
+ "learning_rate": 0.0001721182036833077,
+ "loss": 43.5663,
+ "step": 343
+ },
+ {
+ "epoch": 0.034120214243205715,
+ "grad_norm": 2.2533814907073975,
+ "learning_rate": 0.00017195880894958063,
+ "loss": 42.8656,
+ "step": 344
+ },
+ {
+ "epoch": 0.03421940091251736,
+ "grad_norm": 1.8976967334747314,
+ "learning_rate": 0.00017179903416081763,
+ "loss": 43.4036,
+ "step": 345
+ },
+ {
+ "epoch": 0.03421940091251736,
+ "eval_loss": 10.84935188293457,
+ "eval_runtime": 11.4279,
+ "eval_samples_per_second": 371.46,
+ "eval_steps_per_second": 185.774,
+ "step": 345
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 1377,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 1,
+ "save_steps": 345,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3361213513728.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
last-checkpoint/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f15b7133227076cd1867948b59f82ed8ae1cf3bea5fc6355dcfecaff9e397731
+ size 6776