OsamaMo committed on
Commit cbddbd5 · verified · 1 Parent(s): ffdc083

Training in progress, step 2000

LLaMA-Factory/wandb/run-20250305_233246-9ct1o6yk/files/output.log CHANGED
@@ -220,4 +220,226 @@ It seems you are trying to upload a large folder at once. This might take some t
  [INFO|tokenization_utils_base.py:2491] 2025-03-06 04:42:32,803 >> tokenizer config file saved in /kaggle/working/tokenizer_config.json
  [INFO|tokenization_utils_base.py:2500] 2025-03-06 04:42:32,803 >> Special tokens file saved in /kaggle/working/special_tokens_map.json
  It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.
- 24%|███████▋ | 1003/4197 [5:10:16<139:42:10, 157.46s/it]
+ 26%|█████████▏ | 1100/4197 [5:24:13<7:41:45, 8.95s/it][INFO|trainer.py:4226] 2025-03-06 04:57:00,517 >>
+ {'loss': 0.268, 'grad_norm': 1.5076738595962524, 'learning_rate': 9.409912607418172e-05, 'epoch': 0.72}
+ {'loss': 0.3038, 'grad_norm': 3.3230276107788086, 'learning_rate': 9.390160386775895e-05, 'epoch': 0.73}
+ {'loss': 0.2869, 'grad_norm': 1.699854850769043, 'learning_rate': 9.370104438953125e-05, 'epoch': 0.74}
+ {'loss': 0.289, 'grad_norm': 0.904507577419281, 'learning_rate': 9.349746151492902e-05, 'epoch': 0.74}
+ {'loss': 0.3729, 'grad_norm': 0.9463105201721191, 'learning_rate': 9.329086932855215e-05, 'epoch': 0.75}
+ {'loss': 0.2282, 'grad_norm': 1.4746607542037964, 'learning_rate': 9.30812821231956e-05, 'epoch': 0.76}
+ {'loss': 0.3029, 'grad_norm': 1.0270076990127563, 'learning_rate': 9.286871439886058e-05, 'epoch': 0.76}
+ {'loss': 0.3268, 'grad_norm': 2.0656538009643555, 'learning_rate': 9.265318086175143e-05, 'epoch': 0.77}
+ {'loss': 0.2942, 'grad_norm': 0.9798826575279236, 'learning_rate': 9.243469642325805e-05, 'epoch': 0.78}
+ {'loss': 0.3266, 'grad_norm': 1.1419672966003418, 'learning_rate': 9.221327619892452e-05, 'epoch': 0.79}
+ ***** Running Evaluation *****
+ [INFO|trainer.py:4228] 2025-03-06 04:57:00,517 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 04:57:00,517 >> Batch size = 1
+ 29%|██████████ | 1200/4197 [5:55:20<7:24:16, 8.89s/it][INFO|trainer.py:4226] 2025-03-06 05:28:07,623 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.307956337928772, 'eval_news_finetune_val_runtime': 1003.1873, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 0.79}
+ {'loss': 0.3596, 'grad_norm': 0.6810228228569031, 'learning_rate': 9.198893550740306e-05, 'epoch': 0.79}
+ {'loss': 0.3106, 'grad_norm': 1.6553049087524414, 'learning_rate': 9.176168986939446e-05, 'epoch': 0.8}
+ {'loss': 0.3298, 'grad_norm': 0.7749443650245667, 'learning_rate': 9.153155500657422e-05, 'epoch': 0.81}
+ {'loss': 0.279, 'grad_norm': 0.8693751096725464, 'learning_rate': 9.129854684050481e-05, 'epoch': 0.81}
+ {'loss': 0.3195, 'grad_norm': 1.1013332605361938, 'learning_rate': 9.10626814915343e-05, 'epoch': 0.82}
+ {'loss': 0.3027, 'grad_norm': 1.2278695106506348, 'learning_rate': 9.082397527768092e-05, 'epoch': 0.83}
+ {'loss': 0.2238, 'grad_norm': 2.173530101776123, 'learning_rate': 9.058244471350428e-05, 'epoch': 0.84}
+ {'loss': 0.2399, 'grad_norm': 1.125986933708191, 'learning_rate': 9.033810650896274e-05, 'epoch': 0.84}
+ {'loss': 0.2736, 'grad_norm': 0.6611151099205017, 'learning_rate': 9.009097756825737e-05, 'epoch': 0.85}
+ {'loss': 0.2949, 'grad_norm': 1.9068485498428345, 'learning_rate': 8.98410749886625e-05, 'epoch': 0.86}
+ [INFO|trainer.py:4228] 2025-03-06 05:28:07,623 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 05:28:07,623 >> Batch size = 1
+ 31%|██████████▊ | 1300/4197 [6:26:37<8:29:22, 10.55s/it][INFO|trainer.py:4226] 2025-03-06 05:59:25,291 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.31006094813346863, 'eval_news_finetune_val_runtime': 1002.7866, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 0.86}
+ {'loss': 0.3657, 'grad_norm': 1.192031979560852, 'learning_rate': 8.958841605934278e-05, 'epoch': 0.86}
+ {'loss': 0.3068, 'grad_norm': 1.2596725225448608, 'learning_rate': 8.933301826015715e-05, 'epoch': 0.87}
+ {'loss': 0.3122, 'grad_norm': 1.4713683128356934, 'learning_rate': 8.907489926044945e-05, 'epoch': 0.88}
+ {'loss': 0.2989, 'grad_norm': 1.3583886623382568, 'learning_rate': 8.881407691782608e-05, 'epoch': 0.89}
+ {'loss': 0.2549, 'grad_norm': 0.9863426089286804, 'learning_rate': 8.855056927692037e-05, 'epoch': 0.89}
+ {'loss': 0.2809, 'grad_norm': 1.0579396486282349, 'learning_rate': 8.828439456814442e-05, 'epoch': 0.9}
+ {'loss': 0.2933, 'grad_norm': 2.847482681274414, 'learning_rate': 8.801557120642766e-05, 'epoch': 0.91}
+ {'loss': 0.2866, 'grad_norm': 0.8942415118217468, 'learning_rate': 8.774411778994295e-05, 'epoch': 0.91}
+ {'loss': 0.2939, 'grad_norm': 1.297845721244812, 'learning_rate': 8.747005309881984e-05, 'epoch': 0.92}
+ {'loss': 0.3018, 'grad_norm': 1.2745181322097778, 'learning_rate': 8.719339609384531e-05, 'epoch': 0.93}
+ [INFO|trainer.py:4228] 2025-03-06 05:59:25,291 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 05:59:25,291 >> Batch size = 1
+ 33%|███████████▋ | 1400/4197 [6:57:13<5:07:20, 6.59s/it][INFO|trainer.py:4226] 2025-03-06 06:30:00,820 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.29822030663490295, 'eval_news_finetune_val_runtime': 1002.5672, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 0.93}
+ {'loss': 0.295, 'grad_norm': 1.3898978233337402, 'learning_rate': 8.691416591515198e-05, 'epoch': 0.94}
+ {'loss': 0.209, 'grad_norm': 1.1516591310501099, 'learning_rate': 8.663238188089398e-05, 'epoch': 0.94}
+ {'loss': 0.2904, 'grad_norm': 0.9356768131256104, 'learning_rate': 8.634806348591036e-05, 'epoch': 0.95}
+ {'loss': 0.2607, 'grad_norm': 1.884950876235962, 'learning_rate': 8.606123040037643e-05, 'epoch': 0.96}
+ {'loss': 0.3279, 'grad_norm': 1.2719082832336426, 'learning_rate': 8.577190246844291e-05, 'epoch': 0.96}
+ {'loss': 0.3011, 'grad_norm': 0.935297429561615, 'learning_rate': 8.548009970686302e-05, 'epoch': 0.97}
+ {'loss': 0.2379, 'grad_norm': 1.6732884645462036, 'learning_rate': 8.51858423036076e-05, 'epoch': 0.98}
+ {'loss': 0.2599, 'grad_norm': 0.6651692390441895, 'learning_rate': 8.488915061646856e-05, 'epoch': 0.99}
+ {'loss': 0.2265, 'grad_norm': 1.121752381324768, 'learning_rate': 8.459004517165032e-05, 'epoch': 0.99}
+ {'loss': 0.3301, 'grad_norm': 0.5099928379058838, 'learning_rate': 8.428854666234978e-05, 'epoch': 1.0}
+ [INFO|trainer.py:4228] 2025-03-06 06:30:00,821 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 06:30:00,821 >> Batch size = 1
+ 36%|████████████▌ | 1500/4197 [7:28:12<6:45:08, 9.01s/it][INFO|trainer.py:4226] 2025-03-06 07:00:59,933 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.28762951493263245, 'eval_news_finetune_val_runtime': 1002.7793, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.0}
+ {'loss': 0.2021, 'grad_norm': 0.9986103177070618, 'learning_rate': 8.398467594732478e-05, 'epoch': 1.01}
+ {'loss': 0.2228, 'grad_norm': 1.2675282955169678, 'learning_rate': 8.367845404945084e-05, 'epoch': 1.01}
+ {'loss': 0.1947, 'grad_norm': 0.8156709671020508, 'learning_rate': 8.336990215426688e-05, 'epoch': 1.02}
+ {'loss': 0.2344, 'grad_norm': 0.5374387502670288, 'learning_rate': 8.305904160850941e-05, 'epoch': 1.03}
+ {'loss': 0.1919, 'grad_norm': 0.6672261357307434, 'learning_rate': 8.274589391863583e-05, 'epoch': 1.04}
+ {'loss': 0.2218, 'grad_norm': 0.9803467988967896, 'learning_rate': 8.243048074933634e-05, 'epoch': 1.04}
+ {'loss': 0.2556, 'grad_norm': 1.482840657234192, 'learning_rate': 8.21128239220353e-05, 'epoch': 1.05}
+ {'loss': 0.2052, 'grad_norm': 1.0589625835418701, 'learning_rate': 8.179294541338135e-05, 'epoch': 1.06}
+ {'loss': 0.2386, 'grad_norm': 0.8332052230834961, 'learning_rate': 8.147086735372716e-05, 'epoch': 1.06}
+ {'loss': 0.1426, 'grad_norm': 0.6018723845481873, 'learning_rate': 8.114661202559828e-05, 'epoch': 1.07}
+ [INFO|trainer.py:4228] 2025-03-06 07:00:59,933 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 07:00:59,934 >> Batch size = 1
+ 36%|████████████▌ | 1500/4197 [7:44:55<6:45:08, 9.01s/it][INFO|trainer.py:3910] 2025-03-06 07:17:42,683 >> Saving model checkpoint to /kaggle/working/checkpoint-1500
+ [INFO|configuration_utils.py:696] 2025-03-06 07:17:43,155 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
+ {'eval_news_finetune_val_loss': 0.30121028423309326, 'eval_news_finetune_val_runtime': 1002.7457, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.07}
+ [INFO|configuration_utils.py:768] 2025-03-06 07:17:43,156 >> Model config Qwen2Config {
+ "architectures": [
+ "Qwen2ForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151645,
+ "hidden_act": "silu",
+ "hidden_size": 1536,
+ "initializer_range": 0.02,
+ "intermediate_size": 8960,
+ "max_position_embeddings": 32768,
+ "max_window_layers": 21,
+ "model_type": "qwen2",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000.0,
+ "sliding_window": null,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.48.3",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 151936
+ }
+
+ [INFO|tokenization_utils_base.py:2491] 2025-03-06 07:17:43,813 >> tokenizer config file saved in /kaggle/working/checkpoint-1500/tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2500] 2025-03-06 07:17:43,814 >> Special tokens file saved in /kaggle/working/checkpoint-1500/special_tokens_map.json
+ [INFO|tokenization_utils_base.py:2491] 2025-03-06 07:17:45,300 >> tokenizer config file saved in /kaggle/working/tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2500] 2025-03-06 07:17:45,301 >> Special tokens file saved in /kaggle/working/special_tokens_map.json
+ It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.
+ 38%|█████████████▎ | 1600/4197 [7:58:53<5:45:00, 7.97s/it][INFO|trainer.py:4226] 2025-03-06 07:31:41,016 >>
+ {'loss': 0.2407, 'grad_norm': 1.7663507461547852, 'learning_rate': 8.082020186215156e-05, 'epoch': 1.08}
+ {'loss': 0.2483, 'grad_norm': 1.2081632614135742, 'learning_rate': 8.049165944562316e-05, 'epoch': 1.09}
+ {'loss': 0.2013, 'grad_norm': 0.5045826435089111, 'learning_rate': 8.016100750576621e-05, 'epoch': 1.09}
+ {'loss': 0.2034, 'grad_norm': 1.4456278085708618, 'learning_rate': 7.98282689182783e-05, 'epoch': 1.1}
+ {'loss': 0.2386, 'grad_norm': 1.1558668613433838, 'learning_rate': 7.949346670321891e-05, 'epoch': 1.11}
+ {'loss': 0.2299, 'grad_norm': 1.4196126461029053, 'learning_rate': 7.915662402341664e-05, 'epoch': 1.11}
+ {'loss': 0.2105, 'grad_norm': 0.9341222047805786, 'learning_rate': 7.88177641828669e-05, 'epoch': 1.12}
+ {'loss': 0.1925, 'grad_norm': 1.066001296043396, 'learning_rate': 7.847691062511957e-05, 'epoch': 1.13}
+ {'loss': 0.2425, 'grad_norm': 0.7840182781219482, 'learning_rate': 7.813408693165704e-05, 'epoch': 1.14}
+ {'loss': 0.2014, 'grad_norm': 0.983668327331543, 'learning_rate': 7.778931682026293e-05, 'epoch': 1.14}
+ ***** Running Evaluation *****
+ [INFO|trainer.py:4228] 2025-03-06 07:31:41,016 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 07:31:41,017 >> Batch size = 1
+ 41%|██████████████▏ | 1700/4197 [8:29:32<5:22:45, 7.76s/it][INFO|trainer.py:4226] 2025-03-06 08:02:19,546 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.29564452171325684, 'eval_news_finetune_val_runtime': 1003.001, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.14}
+ {'loss': 0.2863, 'grad_norm': 1.63984215259552, 'learning_rate': 7.744262414338099e-05, 'epoch': 1.15}
+ {'loss': 0.2175, 'grad_norm': 0.9211621284484863, 'learning_rate': 7.709403288646507e-05, 'epoch': 1.16}
+ {'loss': 0.1893, 'grad_norm': 1.3369996547698975, 'learning_rate': 7.67435671663196e-05, 'epoch': 1.16}
+ {'loss': 0.2483, 'grad_norm': 0.7532891631126404, 'learning_rate': 7.63912512294312e-05, 'epoch': 1.17}
+ {'loss': 0.1888, 'grad_norm': 1.0959442853927612, 'learning_rate': 7.603710945029119e-05, 'epoch': 1.18}
+ {'loss': 0.2144, 'grad_norm': 0.9019472599029541, 'learning_rate': 7.568116632970922e-05, 'epoch': 1.19}
+ {'loss': 0.191, 'grad_norm': 1.1219818592071533, 'learning_rate': 7.532344649311829e-05, 'epoch': 1.19}
+ {'loss': 0.2762, 'grad_norm': 1.0829100608825684, 'learning_rate': 7.496397468887106e-05, 'epoch': 1.2}
+ {'loss': 0.157, 'grad_norm': 0.7855832576751709, 'learning_rate': 7.460277578652759e-05, 'epoch': 1.21}
+ {'loss': 0.2627, 'grad_norm': 2.407999038696289, 'learning_rate': 7.423987477513488e-05, 'epoch': 1.21}
+ [INFO|trainer.py:4228] 2025-03-06 08:02:19,546 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 08:02:19,546 >> Batch size = 1
+ 43%|███████████████ | 1800/4197 [9:00:34<6:16:10, 9.42s/it][INFO|trainer.py:4226] 2025-03-06 08:33:22,380 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.28248873353004456, 'eval_news_finetune_val_runtime': 1003.1081, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.21}
+ {'loss': 0.1477, 'grad_norm': 1.5500895977020264, 'learning_rate': 7.387529676149799e-05, 'epoch': 1.22}
+ {'loss': 0.1942, 'grad_norm': 1.5599130392074585, 'learning_rate': 7.350906696844307e-05, 'epoch': 1.23}
+ {'loss': 0.2, 'grad_norm': 1.6327091455459595, 'learning_rate': 7.314121073307229e-05, 'epoch': 1.24}
+ {'loss': 0.185, 'grad_norm': 0.6044666767120361, 'learning_rate': 7.277175350501111e-05, 'epoch': 1.24}
+ {'loss': 0.196, 'grad_norm': 1.317089319229126, 'learning_rate': 7.240072084464729e-05, 'epoch': 1.25}
+ {'loss': 0.1322, 'grad_norm': 1.089105486869812, 'learning_rate': 7.202813842136283e-05, 'epoch': 1.26}
+ {'loss': 0.2176, 'grad_norm': 1.4972888231277466, 'learning_rate': 7.165403201175787e-05, 'epoch': 1.26}
+ {'loss': 0.218, 'grad_norm': 1.4998830556869507, 'learning_rate': 7.127842749786747e-05, 'epoch': 1.27}
+ {'loss': 0.1653, 'grad_norm': 0.9759517908096313, 'learning_rate': 7.090135086537095e-05, 'epoch': 1.28}
+ {'loss': 0.175, 'grad_norm': 0.9713583588600159, 'learning_rate': 7.052282820179412e-05, 'epoch': 1.29}
+ [INFO|trainer.py:4228] 2025-03-06 08:33:22,381 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 08:33:22,381 >> Batch size = 1
+ 45%|███████████████▊ | 1900/4197 [9:32:25<5:38:20, 8.84s/it][INFO|trainer.py:4226] 2025-03-06 09:05:13,019 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.2936909794807434, 'eval_news_finetune_val_runtime': 1003.12, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.29}
+ {'loss': 0.1727, 'grad_norm': 0.6328814625740051, 'learning_rate': 7.014288569470446e-05, 'epoch': 1.29}
+ {'loss': 0.2363, 'grad_norm': 1.622104525566101, 'learning_rate': 6.976154962989934e-05, 'epoch': 1.3}
+ {'loss': 0.1897, 'grad_norm': 1.8254674673080444, 'learning_rate': 6.937884638958757e-05, 'epoch': 1.31}
+ {'loss': 0.2029, 'grad_norm': 0.8813793063163757, 'learning_rate': 6.899480245056396e-05, 'epoch': 1.31}
+ {'loss': 0.2025, 'grad_norm': 0.7675999999046326, 'learning_rate': 6.860944438237788e-05, 'epoch': 1.32}
+ {'loss': 0.2317, 'grad_norm': 1.1973013877868652, 'learning_rate': 6.82227988454948e-05, 'epoch': 1.33}
+ {'loss': 0.2318, 'grad_norm': 0.7864009737968445, 'learning_rate': 6.783489258945195e-05, 'epoch': 1.34}
+ {'loss': 0.1871, 'grad_norm': 1.0866330862045288, 'learning_rate': 6.74457524510077e-05, 'epoch': 1.34}
+ {'loss': 0.211, 'grad_norm': 0.8745126724243164, 'learning_rate': 6.705540535228485e-05, 'epoch': 1.35}
+ {'loss': 0.2307, 'grad_norm': 1.3401581048965454, 'learning_rate': 6.66638782989081e-05, 'epoch': 1.36}
+ [INFO|trainer.py:4228] 2025-03-06 09:05:13,019 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 09:05:13,019 >> Batch size = 1
+ 48%|████████████████▏ | 2000/4197 [10:03:27<6:06:16, 10.00s/it][INFO|trainer.py:4226] 2025-03-06 09:36:15,070 >>
+ ***** Running Evaluation *****
+ {'eval_news_finetune_val_loss': 0.2787444591522217, 'eval_news_finetune_val_runtime': 1002.9344, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.36}
+ {'loss': 0.2128, 'grad_norm': 0.6149284839630127, 'learning_rate': 6.627119837813564e-05, 'epoch': 1.36}
+ {'loss': 0.1551, 'grad_norm': 1.7847625017166138, 'learning_rate': 6.587739275698525e-05, 'epoch': 1.37}
+ {'loss': 0.2335, 'grad_norm': 1.1973716020584106, 'learning_rate': 6.54824886803547e-05, 'epoch': 1.38}
+ {'loss': 0.1504, 'grad_norm': 1.5757859945297241, 'learning_rate': 6.508651346913687e-05, 'epoch': 1.39}
+ {'loss': 0.2679, 'grad_norm': 1.7269341945648193, 'learning_rate': 6.468949451832968e-05, 'epoch': 1.39}
+ {'loss': 0.1942, 'grad_norm': 1.6860129833221436, 'learning_rate': 6.429145929514063e-05, 'epoch': 1.4}
+ {'loss': 0.2025, 'grad_norm': 1.1732631921768188, 'learning_rate': 6.389243533708671e-05, 'epoch': 1.41}
+ {'loss': 0.1836, 'grad_norm': 0.9073033332824707, 'learning_rate': 6.349245025008912e-05, 'epoch': 1.41}
+ {'loss': 0.1526, 'grad_norm': 1.133843183517456, 'learning_rate': 6.309153170656342e-05, 'epoch': 1.42}
+ {'loss': 0.1939, 'grad_norm': 2.656296968460083, 'learning_rate': 6.268970744350515e-05, 'epoch': 1.43}
+ [INFO|trainer.py:4228] 2025-03-06 09:36:15,070 >> Num examples = 1400
+ [INFO|trainer.py:4231] 2025-03-06 09:36:15,070 >> Batch size = 1
+ 48%|████████████████▏ | 2000/4197 [10:20:10<6:06:16, 10.00s/it][INFO|trainer.py:3910] 2025-03-06 09:52:58,168 >> Saving model checkpoint to /kaggle/working/checkpoint-2000
+ [INFO|configuration_utils.py:696] 2025-03-06 09:52:58,703 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
+ {'eval_news_finetune_val_loss': 0.27414408326148987, 'eval_news_finetune_val_runtime': 1003.0949, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.43}
+ [INFO|configuration_utils.py:768] 2025-03-06 09:52:58,704 >> Model config Qwen2Config {
+ "architectures": [
+ "Qwen2ForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "eos_token_id": 151645,
+ "hidden_act": "silu",
+ "hidden_size": 1536,
+ "initializer_range": 0.02,
+ "intermediate_size": 8960,
+ "max_position_embeddings": 32768,
+ "max_window_layers": 21,
+ "model_type": "qwen2",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 28,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000.0,
+ "sliding_window": null,
+ "tie_word_embeddings": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.48.3",
+ "use_cache": true,
+ "use_sliding_window": false,
+ "vocab_size": 151936
+ }
+
+ [INFO|tokenization_utils_base.py:2491] 2025-03-06 09:52:59,374 >> tokenizer config file saved in /kaggle/working/checkpoint-2000/tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2500] 2025-03-06 09:52:59,375 >> Special tokens file saved in /kaggle/working/checkpoint-2000/special_tokens_map.json
+ [INFO|tokenization_utils_base.py:2491] 2025-03-06 09:53:00,849 >> tokenizer config file saved in /kaggle/working/tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2500] 2025-03-06 09:53:00,849 >> Special tokens file saved in /kaggle/working/special_tokens_map.json
+ It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.
+ 48%|██████████████▊ | 2002/4197 [10:20:29<134:16:13, 220.22s/it]
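Note: the warning repeated in the log above comes from `huggingface_hub` when a whole working directory is pushed in one call. A minimal sketch of the resumable path it recommends (the repo id below is a placeholder, not taken from this commit; the folder path is the one the trainer saves into per the logs):

```python
from huggingface_hub import HfApi

api = HfApi()  # uses the token from `huggingface-cli login` or the HF_TOKEN env var

# upload_large_folder() splits the upload into many small commits and can
# resume after a failure, which is why the warning suggests it for
# checkpoint folders of this size.
api.upload_large_folder(
    repo_id="username/my-finetune",  # placeholder target repo
    folder_path="/kaggle/working",   # output dir from the training logs above
    repo_type="model",
)
```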
LLaMA-Factory/wandb/run-20250305_233246-9ct1o6yk/run-9ct1o6yk.wandb CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6713a73557d0cd69067d7a9ad7748c6768fdc0007b478c226343d9c67f7f86f2
- size 5668864
+ oid sha256:d17c88e6de89554302307bd771e7c76121683db7c2c09bbb4defbbd053560555
+ size 11370496
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:17ba270b888a201fead48ad37c2c2e228e832cc5e2304c9d48ddcc2a4ab95b9d
+ oid sha256:dd91de723bbd7ea7b7dfe87942ef4a89726bd5bdcfdd6abb72301f7a8513b562
  size 295488936
trainer_log.jsonl CHANGED
@@ -108,3 +108,113 @@
  {"current_steps": 990, "total_steps": 4197, "loss": 0.3005, "lr": 9.448500422148364e-05, "epoch": 0.707395498392283, "percentage": 23.59, "elapsed_time": "4:51:35", "remaining_time": "15:44:34"}
  {"current_steps": 1000, "total_steps": 4197, "loss": 0.294, "lr": 9.429359734349863e-05, "epoch": 0.7145409074669525, "percentage": 23.83, "elapsed_time": "4:52:59", "remaining_time": "15:36:41"}
  {"current_steps": 1000, "total_steps": 4197, "epoch": 0.7145409074669525, "percentage": 23.83, "elapsed_time": "5:09:42", "remaining_time": "16:30:08"}
+ {"current_steps": 1010, "total_steps": 4197, "loss": 0.268, "lr": 9.409912607418172e-05, "epoch": 0.721686316541622, "percentage": 24.06, "elapsed_time": "5:11:21", "remaining_time": "16:22:26"}
+ {"current_steps": 1020, "total_steps": 4197, "loss": 0.3038, "lr": 9.390160386775895e-05, "epoch": 0.7288317256162915, "percentage": 24.3, "elapsed_time": "5:12:42", "remaining_time": "16:14:01"}
+ {"current_steps": 1030, "total_steps": 4197, "loss": 0.2869, "lr": 9.370104438953125e-05, "epoch": 0.735977134690961, "percentage": 24.54, "elapsed_time": "5:14:16", "remaining_time": "16:06:20"}
+ {"current_steps": 1040, "total_steps": 4197, "loss": 0.289, "lr": 9.349746151492902e-05, "epoch": 0.7431225437656306, "percentage": 24.78, "elapsed_time": "5:15:40", "remaining_time": "15:58:14"}
+ {"current_steps": 1050, "total_steps": 4197, "loss": 0.3729, "lr": 9.329086932855215e-05, "epoch": 0.7502679528403001, "percentage": 25.02, "elapsed_time": "5:16:59", "remaining_time": "15:50:05"}
+ {"current_steps": 1060, "total_steps": 4197, "loss": 0.2282, "lr": 9.30812821231956e-05, "epoch": 0.7574133619149697, "percentage": 25.26, "elapsed_time": "5:18:20", "remaining_time": "15:42:05"}
+ {"current_steps": 1070, "total_steps": 4197, "loss": 0.3029, "lr": 9.286871439886058e-05, "epoch": 0.7645587709896392, "percentage": 25.49, "elapsed_time": "5:19:52", "remaining_time": "15:34:47"}
+ {"current_steps": 1080, "total_steps": 4197, "loss": 0.3268, "lr": 9.265318086175143e-05, "epoch": 0.7717041800643086, "percentage": 25.73, "elapsed_time": "5:21:19", "remaining_time": "15:27:22"}
+ {"current_steps": 1090, "total_steps": 4197, "loss": 0.2942, "lr": 9.243469642325805e-05, "epoch": 0.7788495891389782, "percentage": 25.97, "elapsed_time": "5:22:53", "remaining_time": "15:20:24"}
+ {"current_steps": 1100, "total_steps": 4197, "loss": 0.3266, "lr": 9.221327619892452e-05, "epoch": 0.7859949982136477, "percentage": 26.21, "elapsed_time": "5:24:13", "remaining_time": "15:12:49"}
+ {"current_steps": 1100, "total_steps": 4197, "epoch": 0.7859949982136477, "percentage": 26.21, "elapsed_time": "5:40:56", "remaining_time": "15:59:53"}
+ {"current_steps": 1110, "total_steps": 4197, "loss": 0.3596, "lr": 9.198893550740306e-05, "epoch": 0.7931404072883173, "percentage": 26.45, "elapsed_time": "5:42:26", "remaining_time": "15:52:22"}
+ {"current_steps": 1120, "total_steps": 4197, "loss": 0.3106, "lr": 9.176168986939446e-05, "epoch": 0.8002858163629868, "percentage": 26.69, "elapsed_time": "5:43:55", "remaining_time": "15:44:53"}
+ {"current_steps": 1130, "total_steps": 4197, "loss": 0.3298, "lr": 9.153155500657422e-05, "epoch": 0.8074312254376563, "percentage": 26.92, "elapsed_time": "5:45:16", "remaining_time": "15:37:09"}
+ {"current_steps": 1140, "total_steps": 4197, "loss": 0.279, "lr": 9.129854684050481e-05, "epoch": 0.8145766345123259, "percentage": 27.16, "elapsed_time": "5:46:42", "remaining_time": "15:29:42"}
+ {"current_steps": 1150, "total_steps": 4197, "loss": 0.3195, "lr": 9.10626814915343e-05, "epoch": 0.8217220435869954, "percentage": 27.4, "elapsed_time": "5:48:13", "remaining_time": "15:22:37"}
+ {"current_steps": 1160, "total_steps": 4197, "loss": 0.3027, "lr": 9.082397527768092e-05, "epoch": 0.8288674526616648, "percentage": 27.64, "elapsed_time": "5:49:37", "remaining_time": "15:15:21"}
+ {"current_steps": 1170, "total_steps": 4197, "loss": 0.2238, "lr": 9.058244471350428e-05, "epoch": 0.8360128617363344, "percentage": 27.88, "elapsed_time": "5:50:55", "remaining_time": "15:07:53"}
+ {"current_steps": 1180, "total_steps": 4197, "loss": 0.2399, "lr": 9.033810650896274e-05, "epoch": 0.8431582708110039, "percentage": 28.12, "elapsed_time": "5:52:18", "remaining_time": "15:00:47"}
+ {"current_steps": 1190, "total_steps": 4197, "loss": 0.2736, "lr": 9.009097756825737e-05, "epoch": 0.8503036798856735, "percentage": 28.35, "elapsed_time": "5:53:52", "remaining_time": "14:54:11"}
+ {"current_steps": 1200, "total_steps": 4197, "loss": 0.2949, "lr": 8.98410749886625e-05, "epoch": 0.857449088960343, "percentage": 28.59, "elapsed_time": "5:55:20", "remaining_time": "14:47:27"}
+ {"current_steps": 1200, "total_steps": 4197, "epoch": 0.857449088960343, "percentage": 28.59, "elapsed_time": "6:12:02", "remaining_time": "15:29:11"}
+ {"current_steps": 1210, "total_steps": 4197, "loss": 0.3657, "lr": 8.958841605934278e-05, "epoch": 0.8645944980350125, "percentage": 28.83, "elapsed_time": "6:13:27", "remaining_time": "15:21:54"}
+ {"current_steps": 1220, "total_steps": 4197, "loss": 0.3068, "lr": 8.933301826015715e-05, "epoch": 0.8717399071096821, "percentage": 29.07, "elapsed_time": "6:14:56", "remaining_time": "15:14:54"}
+ {"current_steps": 1230, "total_steps": 4197, "loss": 0.3122, "lr": 8.907489926044945e-05, "epoch": 0.8788853161843515, "percentage": 29.31, "elapsed_time": "6:16:27", "remaining_time": "15:08:06"}
+ {"current_steps": 1240, "total_steps": 4197, "loss": 0.2989, "lr": 8.881407691782608e-05, "epoch": 0.886030725259021, "percentage": 29.54, "elapsed_time": "6:17:59", "remaining_time": "15:01:22"}
+ {"current_steps": 1250, "total_steps": 4197, "loss": 0.2549, "lr": 8.855056927692037e-05, "epoch": 0.8931761343336906, "percentage": 29.78, "elapsed_time": "6:19:26", "remaining_time": "14:54:33"}
+ {"current_steps": 1260, "total_steps": 4197, "loss": 0.2809, "lr": 8.828439456814442e-05, "epoch": 0.9003215434083601, "percentage": 30.02, "elapsed_time": "6:20:44", "remaining_time": "14:47:29"}
+ {"current_steps": 1270, "total_steps": 4197, "loss": 0.2933, "lr": 8.801557120642766e-05, "epoch": 0.9074669524830297, "percentage": 30.26, "elapsed_time": "6:22:09", "remaining_time": "14:40:45"}
+ {"current_steps": 1280, "total_steps": 4197, "loss": 0.2866, "lr": 8.774411778994295e-05, "epoch": 0.9146123615576992, "percentage": 30.5, "elapsed_time": "6:23:32", "remaining_time": "14:34:03"}
+ {"current_steps": 1290, "total_steps": 4197, "loss": 0.2939, "lr": 8.747005309881984e-05, "epoch": 0.9217577706323687, "percentage": 30.74, "elapsed_time": "6:24:58", "remaining_time": "14:27:32"}
+ {"current_steps": 1300, "total_steps": 4197, "loss": 0.3018, "lr": 8.719339609384531e-05, "epoch": 0.9289031797070382, "percentage": 30.97, "elapsed_time": "6:26:37", "remaining_time": "14:21:35"}
+ {"current_steps": 1300, "total_steps": 4197, "epoch": 0.9289031797070382, "percentage": 30.97, "elapsed_time": "6:43:20", "remaining_time": "14:58:49"}
+ {"current_steps": 1310, "total_steps": 4197, "loss": 0.295, "lr": 8.691416591515198e-05, "epoch": 0.9360485887817077, "percentage": 31.21, "elapsed_time": "6:44:53", "remaining_time": "14:52:17"}
+ {"current_steps": 1320, "total_steps": 4197, "loss": 0.209, "lr": 8.663238188089398e-05, "epoch": 0.9431939978563773, "percentage": 31.45, "elapsed_time": "6:46:13", "remaining_time": "14:45:23"}
+ {"current_steps": 1330, "total_steps": 4197, "loss": 0.2904, "lr": 8.634806348591036e-05, "epoch": 0.9503394069310468, "percentage": 31.69, "elapsed_time": "6:47:41", "remaining_time": "14:38:50"}
+ {"current_steps": 1340, "total_steps": 4197, "loss": 0.2607, "lr": 8.606123040037643e-05, "epoch": 0.9574848160057163, "percentage": 31.93, "elapsed_time": "6:49:04", "remaining_time": "14:32:11"}
+ {"current_steps": 1350, "total_steps": 4197, "loss": 0.3279, "lr": 8.577190246844291e-05, "epoch": 0.9646302250803859, "percentage": 32.17, "elapsed_time": "6:50:30", "remaining_time": "14:25:43"}
+ {"current_steps": 1360, "total_steps": 4197, "loss": 0.3011, "lr": 8.548009970686302e-05, "epoch": 0.9717756341550554, "percentage": 32.4, "elapsed_time": "6:51:49", "remaining_time": "14:19:05"}
+ {"current_steps": 1370, "total_steps": 4197, "loss": 0.2379, "lr": 8.51858423036076e-05, "epoch": 0.978921043229725, "percentage": 32.64, "elapsed_time": "6:53:12", "remaining_time": "14:12:39"}
+ {"current_steps": 1380, "total_steps": 4197, "loss": 0.2599, "lr": 8.488915061646856e-05, "epoch": 0.9860664523043944, "percentage": 32.88, "elapsed_time": "6:54:31", "remaining_time": "14:06:09"}
+ {"current_steps": 1390, "total_steps": 4197, "loss": 0.2265, "lr": 8.459004517165032e-05, "epoch": 0.9932118613790639, "percentage": 33.12, "elapsed_time": "6:55:53", "remaining_time": "13:59:51"}
+ {"current_steps": 1400, "total_steps": 4197, "loss": 0.3301, "lr": 8.428854666234978e-05, "epoch": 1.0, "percentage": 33.36, "elapsed_time": "6:57:13", "remaining_time": "13:53:33"}
+ {"current_steps": 1400, "total_steps": 4197, "epoch": 1.0, "percentage": 33.36, "elapsed_time": "7:13:56", "remaining_time": "14:26:56"}
+ {"current_steps": 1410, "total_steps": 4197, "loss": 0.2021, "lr": 8.398467594732478e-05, "epoch": 1.0071454090746694, "percentage": 33.6, "elapsed_time": "7:15:28", "remaining_time": "14:20:45"}
+ {"current_steps": 1420, "total_steps": 4197, "loss": 0.2228, "lr": 8.367845404945084e-05, "epoch": 1.014290818149339, "percentage": 33.83, "elapsed_time": "7:16:52", "remaining_time": "14:14:22"}
+ {"current_steps": 1430, "total_steps": 4197, "loss": 0.1947, "lr": 8.336990215426688e-05, "epoch": 1.0214362272240085, "percentage": 34.07, "elapsed_time": "7:18:12", "remaining_time": "14:07:54"}
+ {"current_steps": 1440, "total_steps": 4197, "loss": 0.2344, "lr": 8.305904160850941e-05, "epoch": 1.0285816362986782, "percentage": 34.31, "elapsed_time": "7:19:35", "remaining_time": "14:01:37"}
+ {"current_steps": 1450, "total_steps": 4197, "loss": 0.1919, "lr": 8.274589391863583e-05, "epoch": 1.0357270453733476, "percentage": 34.55, "elapsed_time": "7:20:55", "remaining_time": "13:55:18"}
+ {"current_steps": 1460, "total_steps": 4197, "loss": 0.2218, "lr": 8.243048074933634e-05, "epoch": 1.0428724544480172, "percentage": 34.79, "elapsed_time": "7:22:32", "remaining_time": "13:49:37"}
+ {"current_steps": 1470, "total_steps": 4197, "loss": 0.2556, "lr": 8.21128239220353e-05, "epoch": 1.0500178635226867, "percentage": 35.03, "elapsed_time": "7:23:59", "remaining_time": "13:43:39"}
+ {"current_steps": 1480, "total_steps": 4197, "loss": 0.2052, "lr": 8.179294541338135e-05, "epoch": 1.057163272597356, "percentage": 35.26, "elapsed_time": "7:25:16", "remaining_time": "13:37:26"}
+ {"current_steps": 1490, "total_steps": 4197, "loss": 0.2386, "lr": 8.147086735372716e-05, "epoch": 1.0643086816720257, "percentage": 35.5, "elapsed_time": "7:26:40", "remaining_time": "13:31:31"}
+ {"current_steps": 1500, "total_steps": 4197, "loss": 0.1426, "lr": 8.114661202559828e-05, "epoch": 1.0714540907466952, "percentage": 35.74, "elapsed_time": "7:28:12", "remaining_time": "13:25:52"}
+ {"current_steps": 1500, "total_steps": 4197, "epoch": 1.0714540907466952, "percentage": 35.74, "elapsed_time": "7:44:55", "remaining_time": "13:55:55"}
+ {"current_steps": 1510, "total_steps": 4197, "loss": 0.2407, "lr": 8.082020186215156e-05, "epoch": 1.0785994998213648, "percentage": 35.98, "elapsed_time": "7:46:22", "remaining_time": "13:49:54"}
+ {"current_steps": 1520, "total_steps": 4197, "loss": 0.2483, "lr": 8.049165944562316e-05, "epoch": 1.0857449088960343, "percentage": 36.22, "elapsed_time": "7:47:50", "remaining_time": "13:43:56"}
+ {"current_steps": 1530, "total_steps": 4197, "loss": 0.2013, "lr": 8.016100750576621e-05, "epoch": 1.092890317970704, "percentage": 36.45, "elapsed_time": "7:49:06", "remaining_time": "13:37:42"}
+ {"current_steps": 1540, "total_steps": 4197, "loss": 0.2034, "lr": 7.98282689182783e-05, "epoch": 1.1000357270453733, "percentage": 36.69, "elapsed_time": "7:50:26", "remaining_time": "13:31:39"}
+ {"current_steps": 1550, "total_steps": 4197, "loss": 0.2386, "lr": 7.949346670321891e-05, "epoch": 1.107181136120043, "percentage": 36.93, "elapsed_time": "7:51:47", "remaining_time": "13:25:42"}
+ {"current_steps": 1560, "total_steps": 4197, "loss": 0.2299, "lr": 7.915662402341664e-05, "epoch": 1.1143265451947124, "percentage": 37.17, "elapsed_time": "7:53:14", "remaining_time": "13:19:57"}
+ {"current_steps": 1570, "total_steps": 4197, "loss": 0.2105, "lr": 7.88177641828669e-05, "epoch": 1.1214719542693818, "percentage": 37.41, "elapsed_time": "7:54:47", "remaining_time": "13:14:26"}
+ {"current_steps": 1580, "total_steps": 4197, "loss": 0.1925, "lr": 7.847691062511957e-05, "epoch": 1.1286173633440515, "percentage": 37.65, "elapsed_time": "7:56:16", "remaining_time": "13:08:51"}
+ {"current_steps": 1590, "total_steps": 4197, "loss": 0.2425, "lr": 7.813408693165704e-05, "epoch": 1.135762772418721, "percentage": 37.88, "elapsed_time": "7:57:40", "remaining_time": "13:03:11"}
+ {"current_steps": 1600, "total_steps": 4197, "loss": 0.2014, "lr": 7.778931682026293e-05, "epoch": 1.1429081814933906, "percentage": 38.12, "elapsed_time": "7:58:53", "remaining_time": "12:57:18"}
+ {"current_steps": 1600, "total_steps": 4197, "epoch": 1.1429081814933906, "percentage": 38.12, "elapsed_time": "8:15:36", "remaining_time": "13:24:26"}
+ {"current_steps": 1610, "total_steps": 4197, "loss": 0.2863, "lr": 7.744262414338099e-05, "epoch": 1.15005359056806, "percentage": 38.36, "elapsed_time": "8:16:53", "remaining_time": "13:18:25"}
+ {"current_steps": 1620, "total_steps": 4197, "loss": 0.2175, "lr": 7.709403288646507e-05, "epoch": 1.1571989996427297, "percentage": 38.6, "elapsed_time": "8:18:18", "remaining_time": "13:12:41"}
+ {"current_steps": 1630, "total_steps": 4197, "loss": 0.1893, "lr": 7.67435671663196e-05, "epoch": 1.164344408717399, "percentage": 38.84, "elapsed_time": "8:19:54", "remaining_time": "13:07:16"}
+ {"current_steps": 1640, "total_steps": 4197, "loss": 0.2483, "lr": 7.63912512294312e-05, "epoch": 1.1714898177920685, "percentage": 39.08, "elapsed_time": "8:21:19", "remaining_time": "13:01:37"}
+ {"current_steps": 1650, "total_steps": 4197, "loss": 0.1888, "lr": 7.603710945029119e-05, "epoch": 1.1786352268667382, "percentage": 39.31, "elapsed_time": "8:22:38", "remaining_time": "12:55:54"}
+ {"current_steps": 1660, "total_steps": 4197, "loss": 0.2144, "lr": 7.568116632970922e-05, "epoch": 1.1857806359414076, "percentage": 39.55, "elapsed_time": "8:24:07", "remaining_time": "12:50:27"}
+ {"current_steps": 1670, "total_steps": 4197, "loss": 0.191, "lr": 7.532344649311829e-05, "epoch": 1.1929260450160772, "percentage": 39.79, "elapsed_time": "8:25:33", "remaining_time": "12:45:00"}
+ {"current_steps": 1680, "total_steps": 4197, "loss": 0.2762, "lr": 7.496397468887106e-05, "epoch": 1.2000714540907467, "percentage": 40.03, "elapsed_time": "8:26:55", "remaining_time": "12:39:28"}
+ {"current_steps": 1690, "total_steps": 4197, "loss": 0.157, "lr": 7.460277578652759e-05, "epoch": 1.2072168631654163, "percentage": 40.27, "elapsed_time": "8:28:09", "remaining_time": "12:33:48"}
+ {"current_steps": 1700, "total_steps": 4197, "loss": 0.2627, "lr": 7.423987477513488e-05, "epoch": 1.2143622722400857, "percentage": 40.51, "elapsed_time": "8:29:32", "remaining_time": "12:28:24"}
+ {"current_steps": 1700, "total_steps": 4197, "epoch": 1.2143622722400857, "percentage": 40.51, "elapsed_time": "8:46:15", "remaining_time": "12:52:58"}
+ {"current_steps": 1710, "total_steps": 4197, "loss": 0.1477, "lr": 7.387529676149799e-05, "epoch": 1.2215076813147552, "percentage": 40.74, "elapsed_time": "8:47:23", "remaining_time": "12:47:02"}
+ {"current_steps": 1720, "total_steps": 4197, "loss": 0.1942, "lr": 7.350906696844307e-05, "epoch": 1.2286530903894248, "percentage": 40.98, "elapsed_time": "8:49:00", "remaining_time": "12:41:50"}
+ {"current_steps": 1730, "total_steps": 4197, "loss": 0.2, "lr": 7.314121073307229e-05, "epoch": 1.2357984994640943, "percentage": 41.22, "elapsed_time": "8:50:18", "remaining_time": "12:36:13"}
+ {"current_steps": 1740, "total_steps": 4197, "loss": 0.185, "lr": 7.277175350501111e-05, "epoch": 1.242943908538764, "percentage": 41.46, "elapsed_time": "8:51:48", "remaining_time": "12:30:56"}
+ {"current_steps": 1750, "total_steps": 4197, "loss": 0.196, "lr": 7.240072084464729e-05, "epoch": 1.2500893176134333, "percentage": 41.7, "elapsed_time": "8:53:14", "remaining_time": "12:25:37"}
+ {"current_steps": 1760, "total_steps": 4197, "loss": 0.1322, "lr": 7.202813842136283e-05, "epoch": 1.257234726688103, "percentage": 41.93, "elapsed_time": "8:54:37", "remaining_time": "12:20:16"}
+ {"current_steps": 1770, "total_steps": 4197, "loss": 0.2176, "lr": 7.165403201175787e-05, "epoch": 1.2643801357627724, "percentage": 42.17, "elapsed_time": "8:56:13", "remaining_time": "12:15:15"}
+ {"current_steps": 1780, "total_steps": 4197, "loss": 0.218, "lr": 7.127842749786747e-05, "epoch": 1.2715255448374418, "percentage": 42.41, "elapsed_time": "8:57:43", "remaining_time": "12:10:09"}
+ {"current_steps": 1790, "total_steps": 4197, "loss": 0.1653, "lr": 7.090135086537095e-05, "epoch": 1.2786709539121115, "percentage": 42.65, "elapsed_time": "8:59:05", "remaining_time": "12:04:55"}
+ {"current_steps": 1800, "total_steps": 4197, "loss": 0.175, "lr": 7.052282820179412e-05, "epoch": 1.285816362986781, "percentage": 42.89, "elapsed_time": "9:00:34", "remaining_time": "11:59:52"}
+ {"current_steps": 1800, "total_steps": 4197, "epoch": 1.285816362986781, "percentage": 42.89, "elapsed_time": "9:17:18", "remaining_time": "12:22:08"}
+ {"current_steps": 1810, "total_steps": 4197, "loss": 0.1727, "lr": 7.014288569470446e-05, "epoch": 1.2929617720614506, "percentage": 43.13, "elapsed_time": "9:18:50", "remaining_time": "12:16:59"}
+ {"current_steps": 1820, "total_steps": 4197, "loss": 0.2363, "lr": 6.976154962989934e-05, "epoch": 1.30010718113612, "percentage": 43.36, "elapsed_time": "9:20:07", "remaining_time": "12:11:32"}
+ {"current_steps": 1830, "total_steps": 4197, "loss": 0.1897, "lr": 6.937884638958757e-05, "epoch": 1.3072525902107897, "percentage": 43.6, "elapsed_time": "9:21:33", "remaining_time": "12:06:20"}
+ {"current_steps": 1840, "total_steps": 4197, "loss": 0.2029, "lr": 6.899480245056396e-05, "epoch": 1.314397999285459, "percentage": 43.84, "elapsed_time": "9:23:03", "remaining_time": "12:01:15"}
+ {"current_steps": 1850, "total_steps": 4197, "loss": 0.2025, "lr": 6.860944438237788e-05, "epoch": 1.3215434083601285, "percentage": 44.08, "elapsed_time": "9:24:43", "remaining_time": "11:56:26"}
+ {"current_steps": 1860, "total_steps": 4197, "loss": 0.2317, "lr": 6.82227988454948e-05, "epoch": 1.3286888174347982, "percentage": 44.32, "elapsed_time": "9:26:23", "remaining_time": "11:51:38"}
+ {"current_steps": 1870, "total_steps": 4197, "loss": 0.2318, "lr": 6.783489258945195e-05, "epoch": 1.3358342265094676, "percentage": 44.56, "elapsed_time": "9:27:50", "remaining_time": "11:46:36"}
+ {"current_steps": 1880, "total_steps": 4197, "loss": 0.1871, "lr": 6.74457524510077e-05, "epoch": 1.3429796355841372, "percentage": 44.79, "elapsed_time": "9:29:28", "remaining_time": "11:41:51"}
+ {"current_steps": 1890, "total_steps": 4197, "loss": 0.211, "lr": 6.705540535228485e-05, "epoch": 1.3501250446588067, "percentage": 45.03, "elapsed_time": "9:31:01", "remaining_time": "11:37:00"}
+ {"current_steps": 1900, "total_steps": 4197, "loss": 0.2307, "lr": 6.66638782989081e-05, "epoch": 1.3572704537334763, "percentage": 45.27, "elapsed_time": "9:32:25", "remaining_time": "11:32:01"}
+ {"current_steps": 1900, "total_steps": 4197, "epoch": 1.3572704537334763, "percentage": 45.27, "elapsed_time": "9:49:08", "remaining_time": "11:52:14"}
+ {"current_steps": 1910, "total_steps": 4197, "loss": 0.2128, "lr": 6.627119837813564e-05, "epoch": 1.3644158628081458, "percentage": 45.51, "elapsed_time": "9:50:27", "remaining_time": "11:46:59"}
+ {"current_steps": 1920, "total_steps": 4197, "loss": 0.1551, "lr": 6.587739275698525e-05, "epoch": 1.3715612718828152, "percentage": 45.75, "elapsed_time": "9:51:53", "remaining_time": "11:41:56"}
+ {"current_steps": 1930, "total_steps": 4197, "loss": 0.2335, "lr": 6.54824886803547e-05, "epoch": 1.3787066809574848, "percentage": 45.99, "elapsed_time": "9:53:22", "remaining_time": "11:36:58"}
+ {"current_steps": 1940, "total_steps": 4197, "loss": 0.1504, "lr": 6.508651346913687e-05, "epoch": 1.3858520900321543, "percentage": 46.22, "elapsed_time": "9:54:40", "remaining_time": "11:31:50"}
+ {"current_steps": 1950, "total_steps": 4197, "loss": 0.2679, "lr": 6.468949451832968e-05, "epoch": 1.392997499106824, "percentage": 46.46, "elapsed_time": "9:56:14", "remaining_time": "11:27:03"}
+ {"current_steps": 1960, "total_steps": 4197, "loss": 0.1942, "lr": 6.429145929514063e-05, "epoch": 1.4001429081814933, "percentage": 46.7, "elapsed_time": "9:57:45", "remaining_time": "11:22:13"}
+ {"current_steps": 1970, "total_steps": 4197, "loss": 0.2025, "lr": 6.389243533708671e-05, "epoch": 1.407288317256163, "percentage": 46.94, "elapsed_time": "9:59:06", "remaining_time": "11:17:16"}
+ {"current_steps": 1980, "total_steps": 4197, "loss": 0.1836, "lr": 6.349245025008912e-05, "epoch": 1.4144337263308324, "percentage": 47.18, "elapsed_time": "10:00:31", "remaining_time": "11:12:24"}
+ {"current_steps": 1990, "total_steps": 4197, "loss": 0.1526, "lr": 6.309153170656342e-05, "epoch": 1.4215791354055018, "percentage": 47.41, "elapsed_time": "10:01:53", "remaining_time": "11:07:31"}
+ {"current_steps": 2000, "total_steps": 4197, "loss": 0.1939, "lr": 6.268970744350515e-05, "epoch": 1.4287245444801715, "percentage": 47.65, "elapsed_time": "10:03:27", "remaining_time": "11:02:54"}
+ {"current_steps": 2000, "total_steps": 4197, "epoch": 1.4287245444801715, "percentage": 47.65, "elapsed_time": "10:20:10", "remaining_time": "11:21:15"}