Askinkaty committed on
Commit
7743cfd
·
verified ·
1 Parent(s): 3d479ef

Upload 17 files

README.md CHANGED
@@ -1,139 +1,104 @@
1
  ---
2
  base_model: meta-llama/Llama-3.2-1B-Instruct
3
  library_name: peft
4
- license: apache-2.0
5
- language:
6
- - en
7
- metrics:
8
- - accuracy
9
- tags:
10
- - finance
11
- - relation_extraction
12
- - relation_types
13
  ---
14
 
 
15
 
16
- ## Model Description
 
 
 
 
 
 
17
 
18
  <!-- Provide a longer summary of what this model is. -->
19
 
20
 
21
- - **Finetuned from model:** Llama-3.2-1B-Instruct
22
 
 
 
 
 
 
 
 
23
 
 
24
 
25
- ## Downstream Use
26
 
27
- Model for predicting relations between entities in financial documents.
 
 
28
 
29
- ### Relation Types
30
- - no_relation
31
- - title
32
- - operations_in
33
- - employee_of
34
- - agreement_with
35
- - formed_on
36
- - member_of
37
- - subsidiary_of
38
- - shares_of
39
- - revenue_of
40
- - loss_of
41
- - headquartered_in
42
- - acquired_on
43
- - founder_of
44
- - formed_in
45
 
46
- ## How to Get Started with the Model
47
 
48
- ```python
49
- import torch
50
- from peft import AutoPeftModelForCausalLM
51
- from transformers import AutoTokenizer, pipeline
52
 
53
- # Load Model with PEFT adapter
54
 
55
- finetune_name = 'Askinkaty/llama-finance-relations'
56
 
57
- finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
58
-     pretrained_model_name_or_path=finetune_name,
59
-     torch_dtype=torch.float16,
60
-     low_cpu_mem_usage=True,
61
- )
62
 
 
63
 
64
- base_model_name = "meta-llama/Llama-3.2-1B-Instruct"
65
- tokenizer = AutoTokenizer.from_pretrained(base_model_name)  # tokenizer comes from the base model
66
- finetuned_model.config.pad_token_id = tokenizer.eos_token_id  # reuse EOS as the padding token
67
 
 
68
 
69
- device = "cuda" if torch.cuda.is_available() else "cpu"
70
- generator = pipeline('text-generation', model=finetuned_model, tokenizer=tokenizer, device=device)
71
 
 
72
 
73
- ```
74
 
 
75
 
76
- ## Training Details
77
 
78
- ### Training Data
 
 
 
 
 
 
 
 
 
 
79
 
80
- Samples from the [ReFinD dataset](https://refind-re.github.io/). 100 examples were used for each relation type; the least frequent relation types were omitted.
81
 
 
82
 
83
- #### Preprocessing
84
 
85
- The dataset is converted into a chat-message format, as in the code snippet below:
86
 
87
- ```python
88
- def batch_convert_to_messages(data):
89
-
90
-     questions = data.apply(
91
-         lambda x: f"Entity 1: {' '.join(x['token'][x['e1_start']:x['e1_end']])}. "
92
-         f"Entity 2: {' '.join(x['token'][x['e2_start']:x['e2_end']])}. "
93
-         f"Input sentence: {' '.join(x['token'])}",
94
-         axis=1
95
-     )
96
 
97
-     relations = data['relation'].apply(lambda relation: relation.split(':')[-1])
98
-
99
-     messages = [
100
-         [
101
-             {
102
-                 "role": "system",
103
-                 "content": "You are an expert in financial documentation and market analysis. Define relations between two specified entities: entity 1 [E1] and entity 2 [E2] in a sentence. Return a short response of the required format. "
104
-             },
105
-             {"role": "user", "content": question},
106
-             {"role": "assistant", "content": relation},
107
-         ]
108
-         for question, relation in zip(questions, relations)
109
-     ]
110
-
111
-     return messages
112
- ```
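To make the message format concrete, here is a minimal pure-Python sketch of the same conversion applied to a single record, without pandas. `convert_record` is a hypothetical helper, and the toy record with its `org:org:subsidiary_of` label is made up for illustration; the colon-separated label form is an assumption inferred from the `split(':')` call in the removed snippet.

```python
# System prompt copied from the removed README snippet.
SYSTEM_PROMPT = (
    "You are an expert in financial documentation and market analysis. "
    "Define relations between two specified entities: entity 1 [E1] and "
    "entity 2 [E2] in a sentence. Return a short response of the required format. "
)


def convert_record(record):
    """Build the system/user/assistant messages for one tokenized example."""
    tokens = record["token"]
    entity_1 = " ".join(tokens[record["e1_start"]:record["e1_end"]])
    entity_2 = " ".join(tokens[record["e2_start"]:record["e2_end"]])
    question = (
        f"Entity 1: {entity_1}. "
        f"Entity 2: {entity_2}. "
        f"Input sentence: {' '.join(tokens)}"
    )
    # Keep only the last colon-separated part of the label.
    relation = record["relation"].split(":")[-1]
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": relation},
    ]


# Hypothetical record, not taken from the dataset.
example = {
    "token": ["Acme", "Corp", "is", "a", "subsidiary", "of", "Globex", "Inc", "."],
    "e1_start": 0, "e1_end": 2,
    "e2_start": 6, "e2_end": 8,
    "relation": "org:org:subsidiary_of",
}

messages = convert_record(example)
print(messages[1]["content"])
print(messages[2]["content"])
```

The entity spans are half-open (`start` inclusive, `end` exclusive), which is what the slicing in the removed snippet implies.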
113
 
 
114
 
 
115
 
116
 
117
  #### Training Hyperparameters
118
 
119
- SFT parameters:
120
- - num_train_epochs=1
121
- - per_device_train_batch_size=2
122
- - gradient_accumulation_steps=2
123
- - gradient_checkpointing=True
124
- - optim="adamw_torch_fused"
125
- - learning_rate=2e-4
126
- - max_grad_norm=0.3
127
- - warmup_ratio=0.01
128
- - lr_scheduler_type="cosine"
129
- - bf16=True
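The `learning_rate`, `warmup_ratio` and `lr_scheduler_type` values above can be checked against the `learning_rate` entries logged in `trainer_state.json`. The sketch below is a plain-Python cosine schedule with linear warmup; `cosine_lr` is a hypothetical helper, and rounding the warmup length up with `math.ceil` is an assumption about how the warmup ratio is turned into a step count.

```python
import math


def cosine_lr(step, max_steps, base_lr=2e-4, warmup_ratio=0.01):
    """Learning rate after `step` optimizer steps (linear warmup, cosine decay)."""
    # Assumption: the warmup length is the ratio times max_steps, rounded up.
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Old run: max_steps = 368, logged every 10 steps.
lr_step_10 = cosine_lr(10, max_steps=368)
lr_step_20 = cosine_lr(20, max_steps=368)
print(lr_step_10, lr_step_20)
```

Under these assumptions the values agree with the logged `0.0001998659482680456` (step 10) and `0.00019904804439875633` (step 20) of the 368-step run. With `max_steps=184`, step 10 lands on the same point of the curve as step 20 of the longer run, which would explain why every second learning rate survives unchanged in the `trainer_state.json` diff.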
130
 
131
- LoRA parameters:
132
- - rank_dimension = 6
133
- - lora_alpha = 8
134
- - lora_dropout = 0.05
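As a rough sanity check on these LoRA settings, the sketch below computes the update scaling factor and the number of trainable adapter weights. The target modules are the seven listed in `adapter_config.json`; the layer dimensions are assumptions about the Llama-3.2-1B base model and should be checked against its `config.json`. Standard LoRA scaling (`lora_alpha / r`, not rsLoRA) is also assumed.

```python
# Assumed Llama-3.2-1B dimensions (verify against the base model's config.json).
HIDDEN = 2048
INTERMEDIATE = 8192
KV_DIM = 512        # assumed: 8 key/value heads x head dimension 64
NUM_LAYERS = 16

RANK = 6            # rank_dimension from the list above
LORA_ALPHA = 8      # lora_alpha from the list above

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Weights in one LoRA pair: A is rank x in_features, B is out_features x rank."""
    return rank * (in_features + out_features)


scaling = LORA_ALPHA / RANK
per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling = {scaling:.3f}")
print(f"LoRA weights per layer = {per_layer}, total = {total}")
```

With these assumed dimensions the adapter has about 4.2 million trainable LoRA weights, a small fraction of the base model's roughly one billion parameters.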
135
 
 
136
 
 
137
 
138
  ## Evaluation
139
 
@@ -143,18 +108,95 @@ LORA parameters:
143
 
144
  #### Testing Data
145
 
146
- Test set sampled from the [ReFinD dataset](https://refind-re.github.io/).
 
 
 
 
147
 
 
 
 
148
 
149
  #### Metrics
150
 
151
- Accuracy. Other metrics: work in progress.
 
 
152
 
153
  ### Results
154
 
155
- Accuracy before cleaning the output: 0.41
156
 
157
- Accuracy after cleaning the output: 0.62
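The README does not say how the output was cleaned. One plausible approach, sketched below purely as an assumption, is to map each raw generation to the first known relation label it contains and then score exact matches. `clean_prediction` and `accuracy` are hypothetical helpers, and the four toy generations are invented for illustration; they are not the evaluation data behind the numbers above.

```python
import re

# Relation types listed in the removed README section.
RELATION_TYPES = [
    "no_relation", "title", "operations_in", "employee_of", "agreement_with",
    "formed_on", "member_of", "subsidiary_of", "shares_of", "revenue_of",
    "loss_of", "headquartered_in", "acquired_on", "founder_of", "formed_in",
]

_LABEL_RE = re.compile("|".join(re.escape(label) for label in RELATION_TYPES))


def clean_prediction(raw_output):
    """Return the first known relation label found in the text, or None."""
    match = _LABEL_RE.search(raw_output.lower())
    return match.group(0) if match else None


def accuracy(predictions, gold):
    """Fraction of predictions that exactly equal the gold label."""
    correct = sum(pred == label for pred, label in zip(predictions, gold))
    return correct / len(gold)


# Invented generations and gold labels, for illustration only.
raw_outputs = [
    "subsidiary_of",
    "The relation is: Employee_Of.",
    "org:org:agreement_with",
    "I am not sure",
]
gold_labels = ["subsidiary_of", "employee_of", "agreement_with", "no_relation"]

raw_accuracy = accuracy(raw_outputs, gold_labels)
clean_accuracy = accuracy([clean_prediction(o) for o in raw_outputs], gold_labels)
print(raw_accuracy, clean_accuracy)
```

On this toy input, exact-match accuracy rises from 0.25 to 0.75 after cleaning, which illustrates why free-form generations can understate accuracy when they are compared to the labels verbatim.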
158
 
 
 
159
 
160
  - PEFT 0.14.0
 
1
  ---
2
  base_model: meta-llama/Llama-3.2-1B-Instruct
3
  library_name: peft
 
 
 
 
 
 
 
 
 
4
  ---
5
 
6
+ # Model Card for Model ID
7
 
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
 
16
  <!-- Provide a longer summary of what this model is. -->
17
 
18
 
 
19
 
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
+ ### Model Sources [optional]
29
 
30
+ <!-- Provide the basic links for the model. -->
31
 
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
 
36
+ ## Uses
37
 
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
 
40
+ ### Direct Use
 
 
 
41
 
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
 
44
+ [More Information Needed]
45
 
46
+ ### Downstream Use [optional]
 
 
 
 
47
 
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
 
50
+ [More Information Needed]
 
 
51
 
52
+ ### Out-of-Scope Use
53
 
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
55
 
56
+ [More Information Needed]
57
 
58
+ ## Bias, Risks, and Limitations
59
 
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
+ [More Information Needed]
63
 
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
 
76
+ ## Training Details
77
 
78
+ ### Training Data
79
 
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
 
82
+ [More Information Needed]
83
 
84
+ ### Training Procedure
 
 
 
 
 
 
 
 
85
 
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
 
88
+ #### Preprocessing [optional]
89
 
90
+ [More Information Needed]
91
 
92
 
93
  #### Training Hyperparameters
94
 
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
 
 
 
 
 
 
 
 
 
96
 
97
+ #### Speeds, Sizes, Times [optional]
 
 
 
98
 
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
 
101
+ [More Information Needed]
102
 
103
  ## Evaluation
104
 
 
108
 
109
  #### Testing Data
110
 
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
 
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
 
121
  #### Metrics
122
 
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
 
127
  ### Results
128
 
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
 
197
+ ## Model Card Contact
198
 
199
+ [More Information Needed]
200
+ ### Framework versions
201
 
202
  - PEFT 0.14.0
adapter_config.json CHANGED
@@ -23,13 +23,13 @@
23
  "rank_pattern": {},
24
  "revision": null,
25
  "target_modules": [
26
- "k_proj",
27
- "up_proj",
28
- "o_proj",
29
- "gate_proj",
30
  "v_proj",
 
31
  "q_proj",
32
- "down_proj"
 
 
 
33
  ],
34
  "task_type": "CAUSAL_LM",
35
  "use_dora": false,
 
23
  "rank_pattern": {},
24
  "revision": null,
25
  "target_modules": [
 
 
 
 
26
  "v_proj",
27
+ "up_proj",
28
  "q_proj",
29
+ "gate_proj",
30
+ "k_proj",
31
+ "down_proj",
32
+ "o_proj"
33
  ],
34
  "task_type": "CAUSAL_LM",
35
  "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:93a96bd9a22e1bc9ba99b9f303bb95318c937238d6217e06707c1d5ffff21001
3
  size 2118301632
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bbfc469fc0f359611284b9a33ffea89a4c2cd108a6411a23c605797eac90d6ef
3
  size 2118301632
model-00001-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:81b62127279b14b81dc5ba337f179c0eff707e894f824c7b9a1bf889c671bc2b
3
  size 1990270808
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e876dadb94eaf20db78c2ca1778580419eacb10c8032d42f5396d446ce18a1c2
3
  size 1990270808
model-00002-of-00003.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d1b19c21a12ba689d7772537834f3a206f26546166543d64fcbe078c5315f950
3
  size 1006719368
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7c2098a54b6104c08f791ffd311b2c8b264f47830750fc2f261e3f9f8efab622
3
  size 1006719368
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3b5a53107f741c6fba66c4467e45596e9a583177380e53e494035d8463be6b38
3
  size 34007674
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:60ec98f4495156924b88229bdbe56f8bb1202e5a8203e76105d5d2b0acc01442
3
  size 34007674
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0025193221626e8e44624e550369b44ae92802c57be84c04b10bc0ca73e990e8
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e0f3a6090e681048a6c2caa8a2f14f616e5a16d68b188a93348c4482d6393a0c
3
  size 14244
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b799fc6e89913f6020bade7e21eca60ed0da11a47da89511a1092cf00a63e990
3
  size 1064
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c92fc546447cfc1151596b19e46f7088e39100f9de60871cd8a0f9b1a8d1df9
3
  size 1064
trainer_state.json CHANGED
@@ -1,268 +1,142 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 0.9986431478968792,
5
  "eval_steps": 500,
6
- "global_step": 368,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.027137042062415198,
13
- "grad_norm": 1.4497582912445068,
14
- "learning_rate": 0.0001998659482680456,
15
- "loss": 3.4336,
16
- "step": 10
17
- },
18
- {
19
- "epoch": 0.054274084124830396,
20
- "grad_norm": 1.846526861190796,
21
  "learning_rate": 0.00019904804439875633,
22
- "loss": 1.9601,
23
- "step": 20
24
- },
25
- {
26
- "epoch": 0.0814111261872456,
27
- "grad_norm": 0.9556183815002441,
28
- "learning_rate": 0.00019749279121818235,
29
- "loss": 1.202,
30
- "step": 30
31
  },
32
  {
33
- "epoch": 0.10854816824966079,
34
- "grad_norm": 1.0818341970443726,
35
  "learning_rate": 0.00019521176659107142,
36
- "loss": 1.1366,
37
- "step": 40
38
- },
39
- {
40
- "epoch": 0.13568521031207598,
41
- "grad_norm": 0.9879324436187744,
42
- "learning_rate": 0.00019222195128618106,
43
- "loss": 1.0269,
44
- "step": 50
45
  },
46
  {
47
- "epoch": 0.1628222523744912,
48
- "grad_norm": 1.1216270923614502,
49
  "learning_rate": 0.000188545602565321,
50
- "loss": 1.0766,
51
- "step": 60
52
- },
53
- {
54
- "epoch": 0.18995929443690637,
55
- "grad_norm": 1.1110620498657227,
56
- "learning_rate": 0.00018421008849228118,
57
- "loss": 0.9947,
58
- "step": 70
59
  },
60
  {
61
- "epoch": 0.21709633649932158,
62
- "grad_norm": 0.8764580488204956,
63
  "learning_rate": 0.00017924768419510904,
64
- "loss": 1.0024,
65
- "step": 80
66
- },
67
- {
68
- "epoch": 0.24423337856173677,
69
- "grad_norm": 1.2004687786102295,
70
- "learning_rate": 0.00017369533159843369,
71
- "loss": 1.04,
72
- "step": 90
73
  },
74
  {
75
- "epoch": 0.27137042062415195,
76
- "grad_norm": 0.9665369391441345,
77
  "learning_rate": 0.00016759436441447545,
78
- "loss": 0.9809,
79
- "step": 100
80
- },
81
- {
82
- "epoch": 0.29850746268656714,
83
- "grad_norm": 1.5474181175231934,
84
- "learning_rate": 0.00016099020044000727,
85
- "loss": 0.9405,
86
- "step": 110
87
  },
88
  {
89
- "epoch": 0.3256445047489824,
90
- "grad_norm": 1.014674186706543,
91
  "learning_rate": 0.00015393200344991995,
92
- "loss": 0.9054,
93
- "step": 120
94
- },
95
- {
96
- "epoch": 0.35278154681139756,
97
- "grad_norm": 1.013979434967041,
98
- "learning_rate": 0.00014647231720437686,
99
- "loss": 0.9754,
100
- "step": 130
101
  },
102
  {
103
- "epoch": 0.37991858887381275,
104
- "grad_norm": 1.0418416261672974,
105
  "learning_rate": 0.0001386666742941419,
106
- "loss": 0.8711,
107
- "step": 140
108
- },
109
- {
110
- "epoch": 0.40705563093622793,
111
- "grad_norm": 0.9280593395233154,
112
- "learning_rate": 0.0001305731827359753,
113
- "loss": 0.8358,
114
- "step": 150
115
  },
116
  {
117
- "epoch": 0.43419267299864317,
118
- "grad_norm": 1.0268974304199219,
119
  "learning_rate": 0.00012225209339563145,
120
- "loss": 0.8568,
121
- "step": 160
122
- },
123
- {
124
- "epoch": 0.46132971506105835,
125
- "grad_norm": 0.9812881350517273,
126
- "learning_rate": 0.00011376535145871684,
127
- "loss": 0.888,
128
- "step": 170
129
  },
130
  {
131
- "epoch": 0.48846675712347354,
132
- "grad_norm": 1.2109171152114868,
133
  "learning_rate": 0.00010517613528842097,
134
- "loss": 0.9166,
135
- "step": 180
136
- },
137
- {
138
- "epoch": 0.5156037991858887,
139
- "grad_norm": 0.9860134124755859,
140
- "learning_rate": 9.654838610302923e-05,
141
- "loss": 0.8357,
142
- "step": 190
143
  },
144
  {
145
- "epoch": 0.5427408412483039,
146
- "grad_norm": 1.1094295978546143,
147
  "learning_rate": 8.79463319744677e-05,
148
- "loss": 0.928,
149
- "step": 200
150
- },
151
- {
152
- "epoch": 0.5698778833107191,
153
- "grad_norm": 0.9386014342308044,
154
- "learning_rate": 7.943400969140635e-05,
155
- "loss": 0.9522,
156
- "step": 210
157
  },
158
  {
159
- "epoch": 0.5970149253731343,
160
- "grad_norm": 1.0436668395996094,
161
  "learning_rate": 7.107478804634325e-05,
162
- "loss": 0.8619,
163
- "step": 220
164
- },
165
- {
166
- "epoch": 0.6241519674355496,
167
- "grad_norm": 1.0143946409225464,
168
- "learning_rate": 6.293089609549325e-05,
169
- "loss": 0.8046,
170
- "step": 230
171
  },
172
  {
173
- "epoch": 0.6512890094979648,
174
- "grad_norm": 1.107064127922058,
175
  "learning_rate": 5.506295990328385e-05,
176
- "loss": 0.8532,
177
- "step": 240
178
- },
179
- {
180
- "epoch": 0.6784260515603799,
181
- "grad_norm": 0.8748170137405396,
182
- "learning_rate": 4.75295512200992e-05,
183
- "loss": 0.8276,
184
- "step": 250
185
  },
186
  {
187
- "epoch": 0.7055630936227951,
188
- "grad_norm": 1.0721319913864136,
189
  "learning_rate": 4.038675145307747e-05,
190
- "loss": 0.7517,
191
- "step": 260
192
- },
193
- {
194
- "epoch": 0.7327001356852103,
195
- "grad_norm": 1.1542731523513794,
196
- "learning_rate": 3.36877341759205e-05,
197
- "loss": 0.8542,
198
- "step": 270
199
  },
200
  {
201
- "epoch": 0.7598371777476255,
202
- "grad_norm": 0.89125657081604,
203
  "learning_rate": 2.7482369285662378e-05,
204
- "loss": 0.8601,
205
- "step": 280
206
- },
207
- {
208
- "epoch": 0.7869742198100407,
209
- "grad_norm": 0.9904446601867676,
210
- "learning_rate": 2.181685175319702e-05,
211
- "loss": 0.8787,
212
- "step": 290
213
  },
214
  {
215
- "epoch": 0.8141112618724559,
216
- "grad_norm": 0.9311710596084595,
217
  "learning_rate": 1.6733357731279377e-05,
218
- "loss": 0.7493,
219
- "step": 300
220
- },
221
- {
222
- "epoch": 0.841248303934871,
223
- "grad_norm": 1.0216141939163208,
224
- "learning_rate": 1.2269730580055805e-05,
225
- "loss": 0.7497,
226
- "step": 310
227
  },
228
  {
229
- "epoch": 0.8683853459972863,
230
- "grad_norm": 0.9951556324958801,
231
  "learning_rate": 8.45919914746337e-06,
232
- "loss": 0.8324,
233
- "step": 320
234
- },
235
- {
236
- "epoch": 0.8955223880597015,
237
- "grad_norm": 1.0728198289871216,
238
- "learning_rate": 5.3301304017194135e-06,
239
- "loss": 0.7964,
240
- "step": 330
241
  },
242
  {
243
- "epoch": 0.9226594301221167,
244
- "grad_norm": 1.1030397415161133,
245
  "learning_rate": 2.905818257394799e-06,
246
- "loss": 0.8166,
247
- "step": 340
248
- },
249
- {
250
- "epoch": 0.9497964721845319,
251
- "grad_norm": 0.974777102470398,
252
- "learning_rate": 1.2043101671253554e-06,
253
- "loss": 0.7952,
254
- "step": 350
255
  },
256
  {
257
- "epoch": 0.9769335142469471,
258
- "grad_norm": 1.078715443611145,
259
  "learning_rate": 2.382727698752474e-07,
260
- "loss": 0.8726,
261
- "step": 360
262
  }
263
  ],
264
  "logging_steps": 10,
265
- "max_steps": 368,
266
  "num_input_tokens_seen": 0,
267
  "num_train_epochs": 1,
268
  "save_steps": 500,
@@ -278,8 +152,8 @@
278
  "attributes": {}
279
  }
280
  },
281
- "total_flos": 1620605933346816.0,
282
- "train_batch_size": 2,
283
  "trial_name": null,
284
  "trial_params": null
285
  }
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 0.9993211133740665,
5
  "eval_steps": 500,
6
+ "global_step": 184,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
+ "epoch": 0.05431093007467753,
13
+ "grad_norm": 1.5025734901428223,
 
 
 
 
 
 
 
14
  "learning_rate": 0.00019904804439875633,
15
+ "loss": 3.2617,
16
+ "step": 10
 
 
 
 
 
 
 
17
  },
18
  {
19
+ "epoch": 0.10862186014935506,
20
+ "grad_norm": 1.5902663469314575,
21
  "learning_rate": 0.00019521176659107142,
22
+ "loss": 1.7882,
23
+ "step": 20
 
 
 
 
 
 
 
24
  },
25
  {
26
+ "epoch": 0.1629327902240326,
27
+ "grad_norm": 0.7277324795722961,
28
  "learning_rate": 0.000188545602565321,
29
+ "loss": 1.1819,
30
+ "step": 30
 
 
 
 
 
 
 
31
  },
32
  {
33
+ "epoch": 0.2172437202987101,
34
+ "grad_norm": 0.6814110279083252,
35
  "learning_rate": 0.00017924768419510904,
36
+ "loss": 1.0805,
37
+ "step": 40
 
 
 
 
 
 
 
38
  },
39
  {
40
+ "epoch": 0.27155465037338766,
41
+ "grad_norm": 0.6621173620223999,
42
  "learning_rate": 0.00016759436441447545,
43
+ "loss": 1.0806,
44
+ "step": 50
 
 
 
 
 
 
 
45
  },
46
  {
47
+ "epoch": 0.3258655804480652,
48
+ "grad_norm": 0.717393696308136,
49
  "learning_rate": 0.00015393200344991995,
50
+ "loss": 1.0157,
51
+ "step": 60
 
 
 
 
 
 
 
52
  },
53
  {
54
+ "epoch": 0.3801765105227427,
55
+ "grad_norm": 0.6353682279586792,
56
  "learning_rate": 0.0001386666742941419,
57
+ "loss": 1.0388,
58
+ "step": 70
 
 
 
 
 
 
 
59
  },
60
  {
61
+ "epoch": 0.4344874405974202,
62
+ "grad_norm": 0.7220941185951233,
63
  "learning_rate": 0.00012225209339563145,
64
+ "loss": 0.9547,
65
+ "step": 80
 
 
 
 
 
 
 
66
  },
67
  {
68
+ "epoch": 0.48879837067209775,
69
+ "grad_norm": 0.7532466650009155,
70
  "learning_rate": 0.00010517613528842097,
71
+ "loss": 1.0116,
72
+ "step": 90
 
 
 
 
 
 
 
73
  },
74
  {
75
+ "epoch": 0.5431093007467753,
76
+ "grad_norm": 0.8198474645614624,
77
  "learning_rate": 8.79463319744677e-05,
78
+ "loss": 0.9763,
79
+ "step": 100
 
 
 
 
 
 
 
80
  },
81
  {
82
+ "epoch": 0.5974202308214528,
83
+ "grad_norm": 0.79359370470047,
84
  "learning_rate": 7.107478804634325e-05,
85
+ "loss": 1.0034,
86
+ "step": 110
 
 
 
 
 
 
 
87
  },
88
  {
89
+ "epoch": 0.6517311608961304,
90
+ "grad_norm": 1.1620718240737915,
91
  "learning_rate": 5.506295990328385e-05,
92
+ "loss": 0.9125,
93
+ "step": 120
 
 
 
 
 
 
 
94
  },
95
  {
96
+ "epoch": 0.7060420909708078,
97
+ "grad_norm": 1.5092897415161133,
98
  "learning_rate": 4.038675145307747e-05,
99
+ "loss": 0.8748,
100
+ "step": 130
 
 
 
 
 
 
 
101
  },
102
  {
103
+ "epoch": 0.7603530210454854,
104
+ "grad_norm": 1.0216153860092163,
105
  "learning_rate": 2.7482369285662378e-05,
106
+ "loss": 0.9131,
107
+ "step": 140
 
 
 
 
 
 
 
108
  },
109
  {
110
+ "epoch": 0.814663951120163,
111
+ "grad_norm": 1.5323857069015503,
112
  "learning_rate": 1.6733357731279377e-05,
113
+ "loss": 0.8702,
114
+ "step": 150
 
 
 
 
 
 
 
115
  },
116
  {
117
+ "epoch": 0.8689748811948405,
118
+ "grad_norm": 0.962648332118988,
119
  "learning_rate": 8.45919914746337e-06,
120
+ "loss": 0.8396,
121
+ "step": 160
 
 
 
 
 
 
 
122
  },
123
  {
124
+ "epoch": 0.923285811269518,
125
+ "grad_norm": 0.7791914939880371,
126
  "learning_rate": 2.905818257394799e-06,
127
+ "loss": 0.8633,
128
+ "step": 170
 
 
 
 
 
 
 
129
  },
130
  {
131
+ "epoch": 0.9775967413441955,
132
+ "grad_norm": 0.8390964865684509,
133
  "learning_rate": 2.382727698752474e-07,
134
+ "loss": 0.8817,
135
+ "step": 180
136
  }
137
  ],
138
  "logging_steps": 10,
139
+ "max_steps": 184,
140
  "num_input_tokens_seen": 0,
141
  "num_train_epochs": 1,
142
  "save_steps": 500,
 
152
  "attributes": {}
153
  }
154
  },
155
+ "total_flos": 1445464568266752.0,
156
+ "train_batch_size": 1,
157
  "trial_name": null,
158
  "trial_params": null
159
  }
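The `trainer_state.json` change halves `global_step` (368 to 184) and `train_batch_size` (2 to 1) while still covering about one epoch, which suggests that more examples are consumed per optimizer step in the new run. The sketch below recovers the steps per epoch from the logged `epoch` values and checks one configuration that would fit. The dataset size of 1473 and the gradient accumulation of 8 are guesses that happen to match the logs; neither is stated in the file, and a single training device is assumed.

```python
import math


def steps_per_epoch_from_log(step, epoch):
    """Optimizer steps in one full epoch, implied by a (step, epoch) log entry."""
    return step / epoch


def expected_steps_per_epoch(num_examples, batch_size, grad_accum):
    """Batches per epoch divided by the accumulation factor (single device)."""
    return math.ceil(num_examples / batch_size) / grad_accum


# First log entries of the old and new runs, copied from trainer_state.json.
old_observed = steps_per_epoch_from_log(10, 0.027137042062415198)
new_observed = steps_per_epoch_from_log(10, 0.05431093007467753)

NUM_EXAMPLES = 1473  # hypothetical training-set size that fits both runs

# Old README: per_device_train_batch_size=2, gradient_accumulation_steps=2.
old_expected = expected_steps_per_epoch(NUM_EXAMPLES, batch_size=2, grad_accum=2)
# New run: train_batch_size=1; an accumulation factor of 8 is a guess.
new_expected = expected_steps_per_epoch(NUM_EXAMPLES, batch_size=1, grad_accum=8)

print(old_observed, old_expected)
print(new_observed, new_expected)
```

Under these assumptions the old run takes 368.5 steps per epoch and the new one 184.125, consistent with the logged `max_steps` of 368 and 184 once the fractional step is dropped.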
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3ad9d3a4e9a5aecdee049eba346c401401fe901b8db3ce260bbc237b3b3e9243
3
  size 5624
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0d4a4dcbea433b28b388dcc17c84657d36933b48455101d1610f7855b760e64f
3
  size 5624