Training in progress, step 2000
LLaMA-Factory/wandb/run-20250305_233246-9ct1o6yk/files/output.log
CHANGED
@@ -220,4 +220,226 @@ It seems you are trying to upload a large folder at once. This might take some t
 [INFO|tokenization_utils_base.py:2491] 2025-03-06 04:42:32,803 >> tokenizer config file saved in /kaggle/working/tokenizer_config.json
 [INFO|tokenization_utils_base.py:2500] 2025-03-06 04:42:32,803 >> Special tokens file saved in /kaggle/working/special_tokens_map.json
 It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.
-
+26%|██████████ | 1100/4197 [5:24:13<7:41:45, 8.95s/it][INFO|trainer.py:4226] 2025-03-06 04:57:00,517 >>
+{'loss': 0.268, 'grad_norm': 1.5076738595962524, 'learning_rate': 9.409912607418172e-05, 'epoch': 0.72}
+{'loss': 0.3038, 'grad_norm': 3.3230276107788086, 'learning_rate': 9.390160386775895e-05, 'epoch': 0.73}
+{'loss': 0.2869, 'grad_norm': 1.699854850769043, 'learning_rate': 9.370104438953125e-05, 'epoch': 0.74}
+{'loss': 0.289, 'grad_norm': 0.904507577419281, 'learning_rate': 9.349746151492902e-05, 'epoch': 0.74}
+{'loss': 0.3729, 'grad_norm': 0.9463105201721191, 'learning_rate': 9.329086932855215e-05, 'epoch': 0.75}
+{'loss': 0.2282, 'grad_norm': 1.4746607542037964, 'learning_rate': 9.30812821231956e-05, 'epoch': 0.76}
+{'loss': 0.3029, 'grad_norm': 1.0270076990127563, 'learning_rate': 9.286871439886058e-05, 'epoch': 0.76}
+{'loss': 0.3268, 'grad_norm': 2.0656538009643555, 'learning_rate': 9.265318086175143e-05, 'epoch': 0.77}
+{'loss': 0.2942, 'grad_norm': 0.9798826575279236, 'learning_rate': 9.243469642325805e-05, 'epoch': 0.78}
+{'loss': 0.3266, 'grad_norm': 1.1419672966003418, 'learning_rate': 9.221327619892452e-05, 'epoch': 0.79}
+***** Running Evaluation *****
+[INFO|trainer.py:4228] 2025-03-06 04:57:00,517 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 04:57:00,517 >> Batch size = 1
+29%|██████████ | 1200/4197 [5:55:20<7:24:16, 8.89s/it][INFO|trainer.py:4226] 2025-03-06 05:28:07,623 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.307956337928772, 'eval_news_finetune_val_runtime': 1003.1873, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 0.79}
+{'loss': 0.3596, 'grad_norm': 0.6810228228569031, 'learning_rate': 9.198893550740306e-05, 'epoch': 0.79}
+{'loss': 0.3106, 'grad_norm': 1.6553049087524414, 'learning_rate': 9.176168986939446e-05, 'epoch': 0.8}
+{'loss': 0.3298, 'grad_norm': 0.7749443650245667, 'learning_rate': 9.153155500657422e-05, 'epoch': 0.81}
+{'loss': 0.279, 'grad_norm': 0.8693751096725464, 'learning_rate': 9.129854684050481e-05, 'epoch': 0.81}
+{'loss': 0.3195, 'grad_norm': 1.1013332605361938, 'learning_rate': 9.10626814915343e-05, 'epoch': 0.82}
+{'loss': 0.3027, 'grad_norm': 1.2278695106506348, 'learning_rate': 9.082397527768092e-05, 'epoch': 0.83}
+{'loss': 0.2238, 'grad_norm': 2.173530101776123, 'learning_rate': 9.058244471350428e-05, 'epoch': 0.84}
+{'loss': 0.2399, 'grad_norm': 1.125986933708191, 'learning_rate': 9.033810650896274e-05, 'epoch': 0.84}
+{'loss': 0.2736, 'grad_norm': 0.6611151099205017, 'learning_rate': 9.009097756825737e-05, 'epoch': 0.85}
+{'loss': 0.2949, 'grad_norm': 1.9068485498428345, 'learning_rate': 8.98410749886625e-05, 'epoch': 0.86}
+[INFO|trainer.py:4228] 2025-03-06 05:28:07,623 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 05:28:07,623 >> Batch size = 1
+31%|███████████ | 1300/4197 [6:26:37<8:29:22, 10.55s/it][INFO|trainer.py:4226] 2025-03-06 05:59:25,291 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.31006094813346863, 'eval_news_finetune_val_runtime': 1002.7866, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 0.86}
+{'loss': 0.3657, 'grad_norm': 1.192031979560852, 'learning_rate': 8.958841605934278e-05, 'epoch': 0.86}
+{'loss': 0.3068, 'grad_norm': 1.2596725225448608, 'learning_rate': 8.933301826015715e-05, 'epoch': 0.87}
+{'loss': 0.3122, 'grad_norm': 1.4713683128356934, 'learning_rate': 8.907489926044945e-05, 'epoch': 0.88}
+{'loss': 0.2989, 'grad_norm': 1.3583886623382568, 'learning_rate': 8.881407691782608e-05, 'epoch': 0.89}
+{'loss': 0.2549, 'grad_norm': 0.9863426089286804, 'learning_rate': 8.855056927692037e-05, 'epoch': 0.89}
+{'loss': 0.2809, 'grad_norm': 1.0579396486282349, 'learning_rate': 8.828439456814442e-05, 'epoch': 0.9}
+{'loss': 0.2933, 'grad_norm': 2.847482681274414, 'learning_rate': 8.801557120642766e-05, 'epoch': 0.91}
+{'loss': 0.2866, 'grad_norm': 0.8942415118217468, 'learning_rate': 8.774411778994295e-05, 'epoch': 0.91}
+{'loss': 0.2939, 'grad_norm': 1.297845721244812, 'learning_rate': 8.747005309881984e-05, 'epoch': 0.92}
+{'loss': 0.3018, 'grad_norm': 1.2745181322097778, 'learning_rate': 8.719339609384531e-05, 'epoch': 0.93}
+[INFO|trainer.py:4228] 2025-03-06 05:59:25,291 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 05:59:25,291 >> Batch size = 1
+33%|████████████ | 1400/4197 [6:57:13<5:07:20, 6.59s/it][INFO|trainer.py:4226] 2025-03-06 06:30:00,820 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.29822030663490295, 'eval_news_finetune_val_runtime': 1002.5672, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 0.93}
+{'loss': 0.295, 'grad_norm': 1.3898978233337402, 'learning_rate': 8.691416591515198e-05, 'epoch': 0.94}
+{'loss': 0.209, 'grad_norm': 1.1516591310501099, 'learning_rate': 8.663238188089398e-05, 'epoch': 0.94}
+{'loss': 0.2904, 'grad_norm': 0.9356768131256104, 'learning_rate': 8.634806348591036e-05, 'epoch': 0.95}
+{'loss': 0.2607, 'grad_norm': 1.884950876235962, 'learning_rate': 8.606123040037643e-05, 'epoch': 0.96}
+{'loss': 0.3279, 'grad_norm': 1.2719082832336426, 'learning_rate': 8.577190246844291e-05, 'epoch': 0.96}
+{'loss': 0.3011, 'grad_norm': 0.935297429561615, 'learning_rate': 8.548009970686302e-05, 'epoch': 0.97}
+{'loss': 0.2379, 'grad_norm': 1.6732884645462036, 'learning_rate': 8.51858423036076e-05, 'epoch': 0.98}
+{'loss': 0.2599, 'grad_norm': 0.6651692390441895, 'learning_rate': 8.488915061646856e-05, 'epoch': 0.99}
+{'loss': 0.2265, 'grad_norm': 1.121752381324768, 'learning_rate': 8.459004517165032e-05, 'epoch': 0.99}
+{'loss': 0.3301, 'grad_norm': 0.5099928379058838, 'learning_rate': 8.428854666234978e-05, 'epoch': 1.0}
+[INFO|trainer.py:4228] 2025-03-06 06:30:00,821 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 06:30:00,821 >> Batch size = 1
+36%|█████████████ | 1500/4197 [7:28:12<6:45:08, 9.01s/it][INFO|trainer.py:4226] 2025-03-06 07:00:59,933 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.28762951493263245, 'eval_news_finetune_val_runtime': 1002.7793, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.0}
+{'loss': 0.2021, 'grad_norm': 0.9986103177070618, 'learning_rate': 8.398467594732478e-05, 'epoch': 1.01}
+{'loss': 0.2228, 'grad_norm': 1.2675282955169678, 'learning_rate': 8.367845404945084e-05, 'epoch': 1.01}
+{'loss': 0.1947, 'grad_norm': 0.8156709671020508, 'learning_rate': 8.336990215426688e-05, 'epoch': 1.02}
+{'loss': 0.2344, 'grad_norm': 0.5374387502670288, 'learning_rate': 8.305904160850941e-05, 'epoch': 1.03}
+{'loss': 0.1919, 'grad_norm': 0.6672261357307434, 'learning_rate': 8.274589391863583e-05, 'epoch': 1.04}
+{'loss': 0.2218, 'grad_norm': 0.9803467988967896, 'learning_rate': 8.243048074933634e-05, 'epoch': 1.04}
+{'loss': 0.2556, 'grad_norm': 1.482840657234192, 'learning_rate': 8.21128239220353e-05, 'epoch': 1.05}
+{'loss': 0.2052, 'grad_norm': 1.0589625835418701, 'learning_rate': 8.179294541338135e-05, 'epoch': 1.06}
+{'loss': 0.2386, 'grad_norm': 0.8332052230834961, 'learning_rate': 8.147086735372716e-05, 'epoch': 1.06}
+{'loss': 0.1426, 'grad_norm': 0.6018723845481873, 'learning_rate': 8.114661202559828e-05, 'epoch': 1.07}
+[INFO|trainer.py:4228] 2025-03-06 07:00:59,933 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 07:00:59,934 >> Batch size = 1
+36%|█████████████ | 1500/4197 [7:44:55<6:45:08, 9.01s/it][INFO|trainer.py:3910] 2025-03-06 07:17:42,683 >> Saving model checkpoint to /kaggle/working/checkpoint-1500
+[INFO|configuration_utils.py:696] 2025-03-06 07:17:43,155 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
+{'eval_news_finetune_val_loss': 0.30121028423309326, 'eval_news_finetune_val_runtime': 1002.7457, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.07}
+[INFO|configuration_utils.py:768] 2025-03-06 07:17:43,156 >> Model config Qwen2Config {
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 1536,
+  "initializer_range": 0.02,
+  "intermediate_size": 8960,
+  "max_position_embeddings": 32768,
+  "max_window_layers": 21,
+  "model_type": "qwen2",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 2,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.48.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
+
+[INFO|tokenization_utils_base.py:2491] 2025-03-06 07:17:43,813 >> tokenizer config file saved in /kaggle/working/checkpoint-1500/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2500] 2025-03-06 07:17:43,814 >> Special tokens file saved in /kaggle/working/checkpoint-1500/special_tokens_map.json
+[INFO|tokenization_utils_base.py:2491] 2025-03-06 07:17:45,300 >> tokenizer config file saved in /kaggle/working/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2500] 2025-03-06 07:17:45,301 >> Special tokens file saved in /kaggle/working/special_tokens_map.json
+It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.
+38%|██████████████ | 1600/4197 [7:58:53<5:45:00, 7.97s/it][INFO|trainer.py:4226] 2025-03-06 07:31:41,016 >>
+{'loss': 0.2407, 'grad_norm': 1.7663507461547852, 'learning_rate': 8.082020186215156e-05, 'epoch': 1.08}
+{'loss': 0.2483, 'grad_norm': 1.2081632614135742, 'learning_rate': 8.049165944562316e-05, 'epoch': 1.09}
+{'loss': 0.2013, 'grad_norm': 0.5045826435089111, 'learning_rate': 8.016100750576621e-05, 'epoch': 1.09}
+{'loss': 0.2034, 'grad_norm': 1.4456278085708618, 'learning_rate': 7.98282689182783e-05, 'epoch': 1.1}
+{'loss': 0.2386, 'grad_norm': 1.1558668613433838, 'learning_rate': 7.949346670321891e-05, 'epoch': 1.11}
+{'loss': 0.2299, 'grad_norm': 1.4196126461029053, 'learning_rate': 7.915662402341664e-05, 'epoch': 1.11}
+{'loss': 0.2105, 'grad_norm': 0.9341222047805786, 'learning_rate': 7.88177641828669e-05, 'epoch': 1.12}
+{'loss': 0.1925, 'grad_norm': 1.066001296043396, 'learning_rate': 7.847691062511957e-05, 'epoch': 1.13}
+{'loss': 0.2425, 'grad_norm': 0.7840182781219482, 'learning_rate': 7.813408693165704e-05, 'epoch': 1.14}
+{'loss': 0.2014, 'grad_norm': 0.983668327331543, 'learning_rate': 7.778931682026293e-05, 'epoch': 1.14}
+***** Running Evaluation *****
+[INFO|trainer.py:4228] 2025-03-06 07:31:41,016 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 07:31:41,017 >> Batch size = 1
+41%|███████████████ | 1700/4197 [8:29:32<5:22:45, 7.76s/it][INFO|trainer.py:4226] 2025-03-06 08:02:19,546 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.29564452171325684, 'eval_news_finetune_val_runtime': 1003.001, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.14}
+{'loss': 0.2863, 'grad_norm': 1.63984215259552, 'learning_rate': 7.744262414338099e-05, 'epoch': 1.15}
+{'loss': 0.2175, 'grad_norm': 0.9211621284484863, 'learning_rate': 7.709403288646507e-05, 'epoch': 1.16}
+{'loss': 0.1893, 'grad_norm': 1.3369996547698975, 'learning_rate': 7.67435671663196e-05, 'epoch': 1.16}
+{'loss': 0.2483, 'grad_norm': 0.7532891631126404, 'learning_rate': 7.63912512294312e-05, 'epoch': 1.17}
+{'loss': 0.1888, 'grad_norm': 1.0959442853927612, 'learning_rate': 7.603710945029119e-05, 'epoch': 1.18}
+{'loss': 0.2144, 'grad_norm': 0.9019472599029541, 'learning_rate': 7.568116632970922e-05, 'epoch': 1.19}
+{'loss': 0.191, 'grad_norm': 1.1219818592071533, 'learning_rate': 7.532344649311829e-05, 'epoch': 1.19}
+{'loss': 0.2762, 'grad_norm': 1.0829100608825684, 'learning_rate': 7.496397468887106e-05, 'epoch': 1.2}
+{'loss': 0.157, 'grad_norm': 0.7855832576751709, 'learning_rate': 7.460277578652759e-05, 'epoch': 1.21}
+{'loss': 0.2627, 'grad_norm': 2.407999038696289, 'learning_rate': 7.423987477513488e-05, 'epoch': 1.21}
+[INFO|trainer.py:4228] 2025-03-06 08:02:19,546 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 08:02:19,546 >> Batch size = 1
+43%|███████████████ | 1800/4197 [9:00:34<6:16:10, 9.42s/it][INFO|trainer.py:4226] 2025-03-06 08:33:22,380 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.28248873353004456, 'eval_news_finetune_val_runtime': 1003.1081, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.21}
+{'loss': 0.1477, 'grad_norm': 1.5500895977020264, 'learning_rate': 7.387529676149799e-05, 'epoch': 1.22}
+{'loss': 0.1942, 'grad_norm': 1.5599130392074585, 'learning_rate': 7.350906696844307e-05, 'epoch': 1.23}
+{'loss': 0.2, 'grad_norm': 1.6327091455459595, 'learning_rate': 7.314121073307229e-05, 'epoch': 1.24}
+{'loss': 0.185, 'grad_norm': 0.6044666767120361, 'learning_rate': 7.277175350501111e-05, 'epoch': 1.24}
+{'loss': 0.196, 'grad_norm': 1.317089319229126, 'learning_rate': 7.240072084464729e-05, 'epoch': 1.25}
+{'loss': 0.1322, 'grad_norm': 1.089105486869812, 'learning_rate': 7.202813842136283e-05, 'epoch': 1.26}
+{'loss': 0.2176, 'grad_norm': 1.4972888231277466, 'learning_rate': 7.165403201175787e-05, 'epoch': 1.26}
+{'loss': 0.218, 'grad_norm': 1.4998830556869507, 'learning_rate': 7.127842749786747e-05, 'epoch': 1.27}
+{'loss': 0.1653, 'grad_norm': 0.9759517908096313, 'learning_rate': 7.090135086537095e-05, 'epoch': 1.28}
+{'loss': 0.175, 'grad_norm': 0.9713583588600159, 'learning_rate': 7.052282820179412e-05, 'epoch': 1.29}
+[INFO|trainer.py:4228] 2025-03-06 08:33:22,381 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 08:33:22,381 >> Batch size = 1
+45%|████████████████ | 1900/4197 [9:32:25<5:38:20, 8.84s/it][INFO|trainer.py:4226] 2025-03-06 09:05:13,019 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.2936909794807434, 'eval_news_finetune_val_runtime': 1003.12, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.29}
+{'loss': 0.1727, 'grad_norm': 0.6328814625740051, 'learning_rate': 7.014288569470446e-05, 'epoch': 1.29}
+{'loss': 0.2363, 'grad_norm': 1.622104525566101, 'learning_rate': 6.976154962989934e-05, 'epoch': 1.3}
+{'loss': 0.1897, 'grad_norm': 1.8254674673080444, 'learning_rate': 6.937884638958757e-05, 'epoch': 1.31}
+{'loss': 0.2029, 'grad_norm': 0.8813793063163757, 'learning_rate': 6.899480245056396e-05, 'epoch': 1.31}
+{'loss': 0.2025, 'grad_norm': 0.7675999999046326, 'learning_rate': 6.860944438237788e-05, 'epoch': 1.32}
+{'loss': 0.2317, 'grad_norm': 1.1973013877868652, 'learning_rate': 6.82227988454948e-05, 'epoch': 1.33}
+{'loss': 0.2318, 'grad_norm': 0.7864009737968445, 'learning_rate': 6.783489258945195e-05, 'epoch': 1.34}
+{'loss': 0.1871, 'grad_norm': 1.0866330862045288, 'learning_rate': 6.74457524510077e-05, 'epoch': 1.34}
+{'loss': 0.211, 'grad_norm': 0.8745126724243164, 'learning_rate': 6.705540535228485e-05, 'epoch': 1.35}
+{'loss': 0.2307, 'grad_norm': 1.3401581048965454, 'learning_rate': 6.66638782989081e-05, 'epoch': 1.36}
+[INFO|trainer.py:4228] 2025-03-06 09:05:13,019 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 09:05:13,019 >> Batch size = 1
+48%|█████████████████ | 2000/4197 [10:03:27<6:06:16, 10.00s/it][INFO|trainer.py:4226] 2025-03-06 09:36:15,070 >>
+***** Running Evaluation *****
+{'eval_news_finetune_val_loss': 0.2787444591522217, 'eval_news_finetune_val_runtime': 1002.9344, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.36}
+{'loss': 0.2128, 'grad_norm': 0.6149284839630127, 'learning_rate': 6.627119837813564e-05, 'epoch': 1.36}
+{'loss': 0.1551, 'grad_norm': 1.7847625017166138, 'learning_rate': 6.587739275698525e-05, 'epoch': 1.37}
+{'loss': 0.2335, 'grad_norm': 1.1973716020584106, 'learning_rate': 6.54824886803547e-05, 'epoch': 1.38}
+{'loss': 0.1504, 'grad_norm': 1.5757859945297241, 'learning_rate': 6.508651346913687e-05, 'epoch': 1.39}
+{'loss': 0.2679, 'grad_norm': 1.7269341945648193, 'learning_rate': 6.468949451832968e-05, 'epoch': 1.39}
+{'loss': 0.1942, 'grad_norm': 1.6860129833221436, 'learning_rate': 6.429145929514063e-05, 'epoch': 1.4}
+{'loss': 0.2025, 'grad_norm': 1.1732631921768188, 'learning_rate': 6.389243533708671e-05, 'epoch': 1.41}
+{'loss': 0.1836, 'grad_norm': 0.9073033332824707, 'learning_rate': 6.349245025008912e-05, 'epoch': 1.41}
+{'loss': 0.1526, 'grad_norm': 1.133843183517456, 'learning_rate': 6.309153170656342e-05, 'epoch': 1.42}
+{'loss': 0.1939, 'grad_norm': 2.656296968460083, 'learning_rate': 6.268970744350515e-05, 'epoch': 1.43}
+[INFO|trainer.py:4228] 2025-03-06 09:36:15,070 >> Num examples = 1400
+[INFO|trainer.py:4231] 2025-03-06 09:36:15,070 >> Batch size = 1
+48%|█████████████████ | 2000/4197 [10:20:10<6:06:16, 10.00s/it][INFO|trainer.py:3910] 2025-03-06 09:52:58,168 >> Saving model checkpoint to /kaggle/working/checkpoint-2000
+[INFO|configuration_utils.py:696] 2025-03-06 09:52:58,703 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2.5-1.5B-Instruct/snapshots/989aa7980e4cf806f80c7fef2b1adb7bc71aa306/config.json
+{'eval_news_finetune_val_loss': 0.27414408326148987, 'eval_news_finetune_val_runtime': 1003.0949, 'eval_news_finetune_val_samples_per_second': 1.396, 'eval_news_finetune_val_steps_per_second': 1.396, 'epoch': 1.43}
+[INFO|configuration_utils.py:768] 2025-03-06 09:52:58,704 >> Model config Qwen2Config {
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 1536,
+  "initializer_range": 0.02,
+  "intermediate_size": 8960,
+  "max_position_embeddings": 32768,
+  "max_window_layers": 21,
+  "model_type": "qwen2",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 2,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": true,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.48.3",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 151936
+}
+
+[INFO|tokenization_utils_base.py:2491] 2025-03-06 09:52:59,374 >> tokenizer config file saved in /kaggle/working/checkpoint-2000/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2500] 2025-03-06 09:52:59,375 >> Special tokens file saved in /kaggle/working/checkpoint-2000/special_tokens_map.json
+[INFO|tokenization_utils_base.py:2491] 2025-03-06 09:53:00,849 >> tokenizer config file saved in /kaggle/working/tokenizer_config.json
+[INFO|tokenization_utils_base.py:2500] 2025-03-06 09:53:00,849 >> Special tokens file saved in /kaggle/working/special_tokens_map.json
+It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`huggingface-cli upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.
+48%|███████████████ | 2002/4197 [10:20:29<134:16:13, 220.22s/it]
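The "upload a large folder at once" warning above is emitted by `huggingface_hub` each time the trainer pushes `/kaggle/working` to the Hub. A minimal sketch of the resumable path the warning recommends, assuming a `huggingface_hub` version that ships `upload_large_folder`; the repo id is a placeholder, since the log does not show which repo this run pushes to:

```python
# Sketch of the upload path the warning recommends; "your-username/your-model"
# is a hypothetical target repo, not the one this run actually used.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from HF_TOKEN or the cached CLI login
api.upload_large_folder(
    repo_id="your-username/your-model",  # placeholder repo id
    repo_type="model",                   # required by upload_large_folder
    folder_path="/kaggle/working",       # the output dir this run saves to
)
```

The CLI equivalent would be something like `huggingface-cli upload-large-folder your-username/your-model /kaggle/working --repo-type=model`; unlike a one-shot folder upload, this path is designed to chunk the work and resume after partial failures.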
LLaMA-Factory/wandb/run-20250305_233246-9ct1o6yk/run-9ct1o6yk.wandb
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d17c88e6de89554302307bd771e7c76121683db7c2c09bbb4defbbd053560555
+size 11370496
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:dd91de723bbd7ea7b7dfe87942ef4a89726bd5bdcfdd6abb72301f7a8513b562
 size 295488936
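Both binary files above are stored through Git LFS, so the diff shows only the pointer text (a `version` line, a sha256 `oid`, and a byte `size`), not the payload. A small sketch, assuming the real file has been downloaded locally, that checks an artifact against its pointer:

```python
# Sketch: check a downloaded artifact against the oid/size in its LFS pointer.
# The digest and size below are the ones recorded for adapter_model.safetensors.
import hashlib

def matches_lfs_pointer(path: str, oid: str, size: int) -> bool:
    h = hashlib.sha256()
    n = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash 1 MiB at a time
            h.update(chunk)
            n += len(chunk)
    return n == size and h.hexdigest() == oid

print(matches_lfs_pointer(
    "adapter_model.safetensors",
    "dd91de723bbd7ea7b7dfe87942ef4a89726bd5bdcfdd6abb72301f7a8513b562",
    295488936,
))
```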
trainer_log.jsonl
CHANGED
@@ -108,3 +108,113 @@
 {"current_steps": 990, "total_steps": 4197, "loss": 0.3005, "lr": 9.448500422148364e-05, "epoch": 0.707395498392283, "percentage": 23.59, "elapsed_time": "4:51:35", "remaining_time": "15:44:34"}
 {"current_steps": 1000, "total_steps": 4197, "loss": 0.294, "lr": 9.429359734349863e-05, "epoch": 0.7145409074669525, "percentage": 23.83, "elapsed_time": "4:52:59", "remaining_time": "15:36:41"}
 {"current_steps": 1000, "total_steps": 4197, "epoch": 0.7145409074669525, "percentage": 23.83, "elapsed_time": "5:09:42", "remaining_time": "16:30:08"}
+{"current_steps": 1010, "total_steps": 4197, "loss": 0.268, "lr": 9.409912607418172e-05, "epoch": 0.721686316541622, "percentage": 24.06, "elapsed_time": "5:11:21", "remaining_time": "16:22:26"}
+{"current_steps": 1020, "total_steps": 4197, "loss": 0.3038, "lr": 9.390160386775895e-05, "epoch": 0.7288317256162915, "percentage": 24.3, "elapsed_time": "5:12:42", "remaining_time": "16:14:01"}
+{"current_steps": 1030, "total_steps": 4197, "loss": 0.2869, "lr": 9.370104438953125e-05, "epoch": 0.735977134690961, "percentage": 24.54, "elapsed_time": "5:14:16", "remaining_time": "16:06:20"}
+{"current_steps": 1040, "total_steps": 4197, "loss": 0.289, "lr": 9.349746151492902e-05, "epoch": 0.7431225437656306, "percentage": 24.78, "elapsed_time": "5:15:40", "remaining_time": "15:58:14"}
+{"current_steps": 1050, "total_steps": 4197, "loss": 0.3729, "lr": 9.329086932855215e-05, "epoch": 0.7502679528403001, "percentage": 25.02, "elapsed_time": "5:16:59", "remaining_time": "15:50:05"}
+{"current_steps": 1060, "total_steps": 4197, "loss": 0.2282, "lr": 9.30812821231956e-05, "epoch": 0.7574133619149697, "percentage": 25.26, "elapsed_time": "5:18:20", "remaining_time": "15:42:05"}
+{"current_steps": 1070, "total_steps": 4197, "loss": 0.3029, "lr": 9.286871439886058e-05, "epoch": 0.7645587709896392, "percentage": 25.49, "elapsed_time": "5:19:52", "remaining_time": "15:34:47"}
+{"current_steps": 1080, "total_steps": 4197, "loss": 0.3268, "lr": 9.265318086175143e-05, "epoch": 0.7717041800643086, "percentage": 25.73, "elapsed_time": "5:21:19", "remaining_time": "15:27:22"}
+{"current_steps": 1090, "total_steps": 4197, "loss": 0.2942, "lr": 9.243469642325805e-05, "epoch": 0.7788495891389782, "percentage": 25.97, "elapsed_time": "5:22:53", "remaining_time": "15:20:24"}
+{"current_steps": 1100, "total_steps": 4197, "loss": 0.3266, "lr": 9.221327619892452e-05, "epoch": 0.7859949982136477, "percentage": 26.21, "elapsed_time": "5:24:13", "remaining_time": "15:12:49"}
+{"current_steps": 1100, "total_steps": 4197, "epoch": 0.7859949982136477, "percentage": 26.21, "elapsed_time": "5:40:56", "remaining_time": "15:59:53"}
+{"current_steps": 1110, "total_steps": 4197, "loss": 0.3596, "lr": 9.198893550740306e-05, "epoch": 0.7931404072883173, "percentage": 26.45, "elapsed_time": "5:42:26", "remaining_time": "15:52:22"}
+{"current_steps": 1120, "total_steps": 4197, "loss": 0.3106, "lr": 9.176168986939446e-05, "epoch": 0.8002858163629868, "percentage": 26.69, "elapsed_time": "5:43:55", "remaining_time": "15:44:53"}
+{"current_steps": 1130, "total_steps": 4197, "loss": 0.3298, "lr": 9.153155500657422e-05, "epoch": 0.8074312254376563, "percentage": 26.92, "elapsed_time": "5:45:16", "remaining_time": "15:37:09"}
+{"current_steps": 1140, "total_steps": 4197, "loss": 0.279, "lr": 9.129854684050481e-05, "epoch": 0.8145766345123259, "percentage": 27.16, "elapsed_time": "5:46:42", "remaining_time": "15:29:42"}
+{"current_steps": 1150, "total_steps": 4197, "loss": 0.3195, "lr": 9.10626814915343e-05, "epoch": 0.8217220435869954, "percentage": 27.4, "elapsed_time": "5:48:13", "remaining_time": "15:22:37"}
+{"current_steps": 1160, "total_steps": 4197, "loss": 0.3027, "lr": 9.082397527768092e-05, "epoch": 0.8288674526616648, "percentage": 27.64, "elapsed_time": "5:49:37", "remaining_time": "15:15:21"}
+{"current_steps": 1170, "total_steps": 4197, "loss": 0.2238, "lr": 9.058244471350428e-05, "epoch": 0.8360128617363344, "percentage": 27.88, "elapsed_time": "5:50:55", "remaining_time": "15:07:53"}
+{"current_steps": 1180, "total_steps": 4197, "loss": 0.2399, "lr": 9.033810650896274e-05, "epoch": 0.8431582708110039, "percentage": 28.12, "elapsed_time": "5:52:18", "remaining_time": "15:00:47"}
+{"current_steps": 1190, "total_steps": 4197, "loss": 0.2736, "lr": 9.009097756825737e-05, "epoch": 0.8503036798856735, "percentage": 28.35, "elapsed_time": "5:53:52", "remaining_time": "14:54:11"}
+{"current_steps": 1200, "total_steps": 4197, "loss": 0.2949, "lr": 8.98410749886625e-05, "epoch": 0.857449088960343, "percentage": 28.59, "elapsed_time": "5:55:20", "remaining_time": "14:47:27"}
+{"current_steps": 1200, "total_steps": 4197, "epoch": 0.857449088960343, "percentage": 28.59, "elapsed_time": "6:12:02", "remaining_time": "15:29:11"}
+{"current_steps": 1210, "total_steps": 4197, "loss": 0.3657, "lr": 8.958841605934278e-05, "epoch": 0.8645944980350125, "percentage": 28.83, "elapsed_time": "6:13:27", "remaining_time": "15:21:54"}
+{"current_steps": 1220, "total_steps": 4197, "loss": 0.3068, "lr": 8.933301826015715e-05, "epoch": 0.8717399071096821, "percentage": 29.07, "elapsed_time": "6:14:56", "remaining_time": "15:14:54"}
+{"current_steps": 1230, "total_steps": 4197, "loss": 0.3122, "lr": 8.907489926044945e-05, "epoch": 0.8788853161843515, "percentage": 29.31, "elapsed_time": "6:16:27", "remaining_time": "15:08:06"}
+{"current_steps": 1240, "total_steps": 4197, "loss": 0.2989, "lr": 8.881407691782608e-05, "epoch": 0.886030725259021, "percentage": 29.54, "elapsed_time": "6:17:59", "remaining_time": "15:01:22"}
+{"current_steps": 1250, "total_steps": 4197, "loss": 0.2549, "lr": 8.855056927692037e-05, "epoch": 0.8931761343336906, "percentage": 29.78, "elapsed_time": "6:19:26", "remaining_time": "14:54:33"}
+{"current_steps": 1260, "total_steps": 4197, "loss": 0.2809, "lr": 8.828439456814442e-05, "epoch": 0.9003215434083601, "percentage": 30.02, "elapsed_time": "6:20:44", "remaining_time": "14:47:29"}
+{"current_steps": 1270, "total_steps": 4197, "loss": 0.2933, "lr": 8.801557120642766e-05, "epoch": 0.9074669524830297, "percentage": 30.26, "elapsed_time": "6:22:09", "remaining_time": "14:40:45"}
+{"current_steps": 1280, "total_steps": 4197, "loss": 0.2866, "lr": 8.774411778994295e-05, "epoch": 0.9146123615576992, "percentage": 30.5, "elapsed_time": "6:23:32", "remaining_time": "14:34:03"}
+{"current_steps": 1290, "total_steps": 4197, "loss": 0.2939, "lr": 8.747005309881984e-05, "epoch": 0.9217577706323687, "percentage": 30.74, "elapsed_time": "6:24:58", "remaining_time": "14:27:32"}
+{"current_steps": 1300, "total_steps": 4197, "loss": 0.3018, "lr": 8.719339609384531e-05, "epoch": 0.9289031797070382, "percentage": 30.97, "elapsed_time": "6:26:37", "remaining_time": "14:21:35"}
+{"current_steps": 1300, "total_steps": 4197, "epoch": 0.9289031797070382, "percentage": 30.97, "elapsed_time": "6:43:20", "remaining_time": "14:58:49"}
+{"current_steps": 1310, "total_steps": 4197, "loss": 0.295, "lr": 8.691416591515198e-05, "epoch": 0.9360485887817077, "percentage": 31.21, "elapsed_time": "6:44:53", "remaining_time": "14:52:17"}
+{"current_steps": 1320, "total_steps": 4197, "loss": 0.209, "lr": 8.663238188089398e-05, "epoch": 0.9431939978563773, "percentage": 31.45, "elapsed_time": "6:46:13", "remaining_time": "14:45:23"}
+{"current_steps": 1330, "total_steps": 4197, "loss": 0.2904, "lr": 8.634806348591036e-05, "epoch": 0.9503394069310468, "percentage": 31.69, "elapsed_time": "6:47:41", "remaining_time": "14:38:50"}
+{"current_steps": 1340, "total_steps": 4197, "loss": 0.2607, "lr": 8.606123040037643e-05, "epoch": 0.9574848160057163, "percentage": 31.93, "elapsed_time": "6:49:04", "remaining_time": "14:32:11"}
+{"current_steps": 1350, "total_steps": 4197, "loss": 0.3279, "lr": 8.577190246844291e-05, "epoch": 0.9646302250803859, "percentage": 32.17, "elapsed_time": "6:50:30", "remaining_time": "14:25:43"}
+{"current_steps": 1360, "total_steps": 4197, "loss": 0.3011, "lr": 8.548009970686302e-05, "epoch": 0.9717756341550554, "percentage": 32.4, "elapsed_time": "6:51:49", "remaining_time": "14:19:05"}
+{"current_steps": 1370, "total_steps": 4197, "loss": 0.2379, "lr": 8.51858423036076e-05, "epoch": 0.978921043229725, "percentage": 32.64, "elapsed_time": "6:53:12", "remaining_time": "14:12:39"}
+{"current_steps": 1380, "total_steps": 4197, "loss": 0.2599, "lr": 8.488915061646856e-05, "epoch": 0.9860664523043944, "percentage": 32.88, "elapsed_time": "6:54:31", "remaining_time": "14:06:09"}
+{"current_steps": 1390, "total_steps": 4197, "loss": 0.2265, "lr": 8.459004517165032e-05, "epoch": 0.9932118613790639, "percentage": 33.12, "elapsed_time": "6:55:53", "remaining_time": "13:59:51"}
+{"current_steps": 1400, "total_steps": 4197, "loss": 0.3301, "lr": 8.428854666234978e-05, "epoch": 1.0, "percentage": 33.36, "elapsed_time": "6:57:13", "remaining_time": "13:53:33"}
+{"current_steps": 1400, "total_steps": 4197, "epoch": 1.0, "percentage": 33.36, "elapsed_time": "7:13:56", "remaining_time": "14:26:56"}
+{"current_steps": 1410, "total_steps": 4197, "loss": 0.2021, "lr": 8.398467594732478e-05, "epoch": 1.0071454090746694, "percentage": 33.6, "elapsed_time": "7:15:28", "remaining_time": "14:20:45"}
+{"current_steps": 1420, "total_steps": 4197, "loss": 0.2228, "lr": 8.367845404945084e-05, "epoch": 1.014290818149339, "percentage": 33.83, "elapsed_time": "7:16:52", "remaining_time": "14:14:22"}
+{"current_steps": 1430, "total_steps": 4197, "loss": 0.1947, "lr": 8.336990215426688e-05, "epoch": 1.0214362272240085, "percentage": 34.07, "elapsed_time": "7:18:12", "remaining_time": "14:07:54"}
+{"current_steps": 1440, "total_steps": 4197, "loss": 0.2344, "lr": 8.305904160850941e-05, "epoch": 1.0285816362986782, "percentage": 34.31, "elapsed_time": "7:19:35", "remaining_time": "14:01:37"}
+{"current_steps": 1450, "total_steps": 4197, "loss": 0.1919, "lr": 8.274589391863583e-05, "epoch": 1.0357270453733476, "percentage": 34.55, "elapsed_time": "7:20:55", "remaining_time": "13:55:18"}
+{"current_steps": 1460, "total_steps": 4197, "loss": 0.2218, "lr": 8.243048074933634e-05, "epoch": 1.0428724544480172, "percentage": 34.79, "elapsed_time": "7:22:32", "remaining_time": "13:49:37"}
+{"current_steps": 1470, "total_steps": 4197, "loss": 0.2556, "lr": 8.21128239220353e-05, "epoch": 1.0500178635226867, "percentage": 35.03, "elapsed_time": "7:23:59", "remaining_time": "13:43:39"}
+{"current_steps": 1480, "total_steps": 4197, "loss": 0.2052, "lr": 8.179294541338135e-05, "epoch": 1.057163272597356, "percentage": 35.26, "elapsed_time": "7:25:16", "remaining_time": "13:37:26"}
+{"current_steps": 1490, "total_steps": 4197, "loss": 0.2386, "lr": 8.147086735372716e-05, "epoch": 1.0643086816720257, "percentage": 35.5, "elapsed_time": "7:26:40", "remaining_time": "13:31:31"}
+{"current_steps": 1500, "total_steps": 4197, "loss": 0.1426, "lr": 8.114661202559828e-05, "epoch": 1.0714540907466952, "percentage": 35.74, "elapsed_time": "7:28:12", "remaining_time": "13:25:52"}
+{"current_steps": 1500, "total_steps": 4197, "epoch": 1.0714540907466952, "percentage": 35.74, "elapsed_time": "7:44:55", "remaining_time": "13:55:55"}
+{"current_steps": 1510, "total_steps": 4197, "loss": 0.2407, "lr": 8.082020186215156e-05, "epoch": 1.0785994998213648, "percentage": 35.98, "elapsed_time": "7:46:22", "remaining_time": "13:49:54"}
+{"current_steps": 1520, "total_steps": 4197, "loss": 0.2483, "lr": 8.049165944562316e-05, "epoch": 1.0857449088960343, "percentage": 36.22, "elapsed_time": "7:47:50", "remaining_time": "13:43:56"}
+{"current_steps": 1530, "total_steps": 4197, "loss": 0.2013, "lr": 8.016100750576621e-05, "epoch": 1.092890317970704, "percentage": 36.45, "elapsed_time": "7:49:06", "remaining_time": "13:37:42"}
+{"current_steps": 1540, "total_steps": 4197, "loss": 0.2034, "lr": 7.98282689182783e-05, "epoch": 1.1000357270453733, "percentage": 36.69, "elapsed_time": "7:50:26", "remaining_time": "13:31:39"}
+{"current_steps": 1550, "total_steps": 4197, "loss": 0.2386, "lr": 7.949346670321891e-05, "epoch": 1.107181136120043, "percentage": 36.93, "elapsed_time": "7:51:47", "remaining_time": "13:25:42"}
+{"current_steps": 1560, "total_steps": 4197, "loss": 0.2299, "lr": 7.915662402341664e-05, "epoch": 1.1143265451947124, "percentage": 37.17, "elapsed_time": "7:53:14", "remaining_time": "13:19:57"}
+{"current_steps": 1570, "total_steps": 4197, "loss": 0.2105, "lr": 7.88177641828669e-05, "epoch": 1.1214719542693818, "percentage": 37.41, "elapsed_time": "7:54:47", "remaining_time": "13:14:26"}
+{"current_steps": 1580, "total_steps": 4197, "loss": 0.1925, "lr": 7.847691062511957e-05, "epoch": 1.1286173633440515, "percentage": 37.65, "elapsed_time": "7:56:16", "remaining_time": "13:08:51"}
+{"current_steps": 1590, "total_steps": 4197, "loss": 0.2425, "lr": 7.813408693165704e-05, "epoch": 1.135762772418721, "percentage": 37.88, "elapsed_time": "7:57:40", "remaining_time": "13:03:11"}
+{"current_steps": 1600, "total_steps": 4197, "loss": 0.2014, "lr": 7.778931682026293e-05, "epoch": 1.1429081814933906, "percentage": 38.12, "elapsed_time": "7:58:53", "remaining_time": "12:57:18"}
+{"current_steps": 1600, "total_steps": 4197, "epoch": 1.1429081814933906, "percentage": 38.12, "elapsed_time": "8:15:36", "remaining_time": "13:24:26"}
+{"current_steps": 1610, "total_steps": 4197, "loss": 0.2863, "lr": 7.744262414338099e-05, "epoch": 1.15005359056806, "percentage": 38.36, "elapsed_time": "8:16:53", "remaining_time": "13:18:25"}
+{"current_steps": 1620, "total_steps": 4197, "loss": 0.2175, "lr": 7.709403288646507e-05, "epoch": 1.1571989996427297, "percentage": 38.6, "elapsed_time": "8:18:18", "remaining_time": "13:12:41"}
+{"current_steps": 1630, "total_steps": 4197, "loss": 0.1893, "lr": 7.67435671663196e-05, "epoch": 1.164344408717399, "percentage": 38.84, "elapsed_time": "8:19:54", "remaining_time": "13:07:16"}
+{"current_steps": 1640, "total_steps": 4197, "loss": 0.2483, "lr": 7.63912512294312e-05, "epoch": 1.1714898177920685, "percentage": 39.08, "elapsed_time": "8:21:19", "remaining_time": "13:01:37"}
+{"current_steps": 1650, "total_steps": 4197, "loss": 0.1888, "lr": 7.603710945029119e-05, "epoch": 1.1786352268667382, "percentage": 39.31, "elapsed_time": "8:22:38", "remaining_time": "12:55:54"}
+{"current_steps": 1660, "total_steps": 4197, "loss": 0.2144, "lr": 7.568116632970922e-05, "epoch": 1.1857806359414076, "percentage": 39.55, "elapsed_time": "8:24:07", "remaining_time": "12:50:27"}
+{"current_steps": 1670, "total_steps": 4197, "loss": 0.191, "lr": 7.532344649311829e-05, "epoch": 1.1929260450160772, "percentage": 39.79, "elapsed_time": "8:25:33", "remaining_time": "12:45:00"}
+{"current_steps": 1680, "total_steps": 4197, "loss": 0.2762, "lr": 7.496397468887106e-05, "epoch": 1.2000714540907467, "percentage": 40.03, "elapsed_time": "8:26:55", "remaining_time": "12:39:28"}
+{"current_steps": 1690, "total_steps": 4197, "loss": 0.157, "lr": 7.460277578652759e-05, "epoch": 1.2072168631654163, "percentage": 40.27, "elapsed_time": "8:28:09", "remaining_time": "12:33:48"}
+{"current_steps": 1700, "total_steps": 4197, "loss": 0.2627, "lr": 7.423987477513488e-05, "epoch": 1.2143622722400857, "percentage": 40.51, "elapsed_time": "8:29:32", "remaining_time": "12:28:24"}
+{"current_steps": 1700, "total_steps": 4197, "epoch": 1.2143622722400857, "percentage": 40.51, "elapsed_time": "8:46:15", "remaining_time": "12:52:58"}
+{"current_steps": 1710, "total_steps": 4197, "loss": 0.1477, "lr": 7.387529676149799e-05, "epoch": 1.2215076813147552, "percentage": 40.74, "elapsed_time": "8:47:23", "remaining_time": "12:47:02"}
+{"current_steps": 1720, "total_steps": 4197, "loss": 0.1942, "lr": 7.350906696844307e-05, "epoch": 1.2286530903894248, "percentage": 40.98, "elapsed_time": "8:49:00", "remaining_time": "12:41:50"}
+{"current_steps": 1730, "total_steps": 4197, "loss": 0.2, "lr": 7.314121073307229e-05, "epoch": 1.2357984994640943, "percentage": 41.22, "elapsed_time": "8:50:18", "remaining_time": "12:36:13"}
+{"current_steps": 1740, "total_steps": 4197, "loss": 0.185, "lr": 7.277175350501111e-05, "epoch": 1.242943908538764, "percentage": 41.46, "elapsed_time": "8:51:48", "remaining_time": "12:30:56"}
+{"current_steps": 1750, "total_steps": 4197, "loss": 0.196, "lr": 7.240072084464729e-05, "epoch": 1.2500893176134333, "percentage": 41.7, "elapsed_time": "8:53:14", "remaining_time": "12:25:37"}
+{"current_steps": 1760, "total_steps": 4197, "loss": 0.1322, "lr": 7.202813842136283e-05, "epoch": 1.257234726688103, "percentage": 41.93, "elapsed_time": "8:54:37", "remaining_time": "12:20:16"}
+{"current_steps": 1770, "total_steps": 4197, "loss": 0.2176, "lr": 7.165403201175787e-05, "epoch": 1.2643801357627724, "percentage": 42.17, "elapsed_time": "8:56:13", "remaining_time": "12:15:15"}
+{"current_steps": 1780, "total_steps": 4197, "loss": 0.218, "lr": 7.127842749786747e-05, "epoch": 1.2715255448374418, "percentage": 42.41, "elapsed_time": "8:57:43", "remaining_time": "12:10:09"}
+{"current_steps": 1790, "total_steps": 4197, "loss": 0.1653, "lr": 7.090135086537095e-05, "epoch": 1.2786709539121115, "percentage": 42.65, "elapsed_time": "8:59:05", "remaining_time": "12:04:55"}
+{"current_steps": 1800, "total_steps": 4197, "loss": 0.175, "lr": 7.052282820179412e-05, "epoch": 1.285816362986781, "percentage": 42.89, "elapsed_time": "9:00:34", "remaining_time": "11:59:52"}
+{"current_steps": 1800, "total_steps": 4197, "epoch": 1.285816362986781, "percentage": 42.89, "elapsed_time": "9:17:18", "remaining_time": "12:22:08"}
+{"current_steps": 1810, "total_steps": 4197, "loss": 0.1727, "lr": 7.014288569470446e-05, "epoch": 1.2929617720614506, "percentage": 43.13, "elapsed_time": "9:18:50", "remaining_time": "12:16:59"}
+{"current_steps": 1820, "total_steps": 4197, "loss": 0.2363, "lr": 6.976154962989934e-05, "epoch": 1.30010718113612, "percentage": 43.36, "elapsed_time": "9:20:07", "remaining_time": "12:11:32"}
+{"current_steps": 1830, "total_steps": 4197, "loss": 0.1897, "lr": 6.937884638958757e-05, "epoch": 1.3072525902107897, "percentage": 43.6, "elapsed_time": "9:21:33", "remaining_time": "12:06:20"}
+{"current_steps": 1840, "total_steps": 4197, "loss": 0.2029, "lr": 6.899480245056396e-05, "epoch": 1.314397999285459, "percentage": 43.84, "elapsed_time": "9:23:03", "remaining_time": "12:01:15"}
+{"current_steps": 1850, "total_steps": 4197, "loss": 0.2025, "lr": 6.860944438237788e-05, "epoch": 1.3215434083601285, "percentage": 44.08, "elapsed_time": "9:24:43", "remaining_time": "11:56:26"}
+{"current_steps": 1860, "total_steps": 4197, "loss": 0.2317, "lr": 6.82227988454948e-05, "epoch": 1.3286888174347982, "percentage": 44.32, "elapsed_time": "9:26:23", "remaining_time": "11:51:38"}
+{"current_steps": 1870, "total_steps": 4197, "loss": 0.2318, "lr": 6.783489258945195e-05, "epoch": 1.3358342265094676, "percentage": 44.56, "elapsed_time": "9:27:50", "remaining_time": "11:46:36"}
+{"current_steps": 1880, "total_steps": 4197, "loss": 0.1871, "lr": 6.74457524510077e-05, "epoch": 1.3429796355841372, "percentage": 44.79, "elapsed_time": "9:29:28", "remaining_time": "11:41:51"}
+{"current_steps": 1890, "total_steps": 4197, "loss": 0.211, "lr": 6.705540535228485e-05, "epoch": 1.3501250446588067, "percentage": 45.03, "elapsed_time": "9:31:01", "remaining_time": "11:37:00"}
+{"current_steps": 1900, "total_steps": 4197, "loss": 0.2307, "lr": 6.66638782989081e-05, "epoch": 1.3572704537334763, "percentage": 45.27, "elapsed_time": "9:32:25", "remaining_time": "11:32:01"}
+{"current_steps": 1900, "total_steps": 4197, "epoch": 1.3572704537334763, "percentage": 45.27, "elapsed_time": "9:49:08", "remaining_time": "11:52:14"}
+{"current_steps": 1910, "total_steps": 4197, "loss": 0.2128, "lr": 6.627119837813564e-05, "epoch": 1.3644158628081458, "percentage": 45.51, "elapsed_time": "9:50:27", "remaining_time": "11:46:59"}
+{"current_steps": 1920, "total_steps": 4197, "loss": 0.1551, "lr": 6.587739275698525e-05, "epoch": 1.3715612718828152, "percentage": 45.75, "elapsed_time": "9:51:53", "remaining_time": "11:41:56"}
+{"current_steps": 1930, "total_steps": 4197, "loss": 0.2335, "lr": 6.54824886803547e-05, "epoch": 1.3787066809574848, "percentage": 45.99, "elapsed_time": "9:53:22", "remaining_time": "11:36:58"}
+{"current_steps": 1940, "total_steps": 4197, "loss": 0.1504, "lr": 6.508651346913687e-05, "epoch": 1.3858520900321543, "percentage": 46.22, "elapsed_time": "9:54:40", "remaining_time": "11:31:50"}
+{"current_steps": 1950, "total_steps": 4197, "loss": 0.2679, "lr": 6.468949451832968e-05, "epoch": 1.392997499106824, "percentage": 46.46, "elapsed_time": "9:56:14", "remaining_time": "11:27:03"}
+{"current_steps": 1960, "total_steps": 4197, "loss": 0.1942, "lr": 6.429145929514063e-05, "epoch": 1.4001429081814933, "percentage": 46.7, "elapsed_time": "9:57:45", "remaining_time": "11:22:13"}
+{"current_steps": 1970, "total_steps": 4197, "loss": 0.2025, "lr": 6.389243533708671e-05, "epoch": 1.407288317256163, "percentage": 46.94, "elapsed_time": "9:59:06", "remaining_time": "11:17:16"}
+{"current_steps": 1980, "total_steps": 4197, "loss": 0.1836, "lr": 6.349245025008912e-05, "epoch": 1.4144337263308324, "percentage": 47.18, "elapsed_time": "10:00:31", "remaining_time": "11:12:24"}
+{"current_steps": 1990, "total_steps": 4197, "loss": 0.1526, "lr": 6.309153170656342e-05, "epoch": 1.4215791354055018, "percentage": 47.41, "elapsed_time": "10:01:53", "remaining_time": "11:07:31"}
+{"current_steps": 2000, "total_steps": 4197, "loss": 0.1939, "lr": 6.268970744350515e-05, "epoch": 1.4287245444801715, "percentage": 47.65, "elapsed_time": "10:03:27", "remaining_time": "11:02:54"}
+{"current_steps": 2000, "total_steps": 4197, "epoch": 1.4287245444801715, "percentage": 47.65, "elapsed_time": "10:20:10", "remaining_time": "11:21:15"}
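`trainer_log.jsonl` is LLaMA-Factory's machine-readable mirror of the console log: one JSON object per line, where training records carry `loss` and `lr`, and the records without a `loss` key (one at each 100-step boundary here) correspond to the evaluation/checkpoint passes. A minimal sketch for pulling the loss curve back out of the file:

```python
# Sketch: extract the training-loss curve from LLaMA-Factory's trainer_log.jsonl.
import json

steps, losses = [], []
with open("trainer_log.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "loss" in record:  # eval/checkpoint records omit the "loss" key
            steps.append(record["current_steps"])
            losses.append(record["loss"])

for step, loss in zip(steps, losses):
    print(f"step {step:>4}: loss {loss:.4f}")
```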