[INFO|2025-03-19 00:22:24] tokenization_utils_base.py:2050 >> loading file special_tokens_map.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/special_tokens_map.json [INFO|2025-03-19 00:22:24] tokenization_utils_base.py:2050 >> loading file tokenizer_config.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/tokenizer_config.json [INFO|2025-03-19 00:22:24] tokenization_utils_base.py:2050 >> loading file chat_template.jinja from cache at None [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2313 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|2025-03-19 00:22:25] configuration_utils.py:699 >> loading configuration file config.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/config.json [INFO|2025-03-19 00:22:25] configuration_utils.py:771 >> Model config LlamaConfig { "_name_or_path": "meta-llama/Llama-3.2-3B-Instruct", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": [ 128001, 128008, 128009 ], "head_dim": 128, "hidden_act": "silu", "hidden_size": 3072, "initializer_range": 0.02, "intermediate_size": 8192, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 24, "num_hidden_layers": 28, "num_key_value_heads": 8, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 32.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.49.0", "use_cache": true, "vocab_size": 128256 } 
[INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2050 >> loading file tokenizer.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/tokenizer.json [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2050 >> loading file tokenizer.model from cache at None [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2050 >> loading file added_tokens.json from cache at None [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2050 >> loading file special_tokens_map.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/special_tokens_map.json [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2050 >> loading file tokenizer_config.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/tokenizer_config.json [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2050 >> loading file chat_template.jinja from cache at None [INFO|2025-03-19 00:22:25] tokenization_utils_base.py:2313 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|2025-03-19 00:22:25] logging.py:143 >> Add pad token: <|eot_id|> [INFO|2025-03-19 00:22:25] logging.py:143 >> Add <|eot_id|>,<|eom_id|> to stop words. [INFO|2025-03-19 00:22:25] logging.py:143 >> Loading dataset MAIR-GEN-mini-v2-50000.json... 
[INFO|2025-03-19 00:24:05] configuration_utils.py:699 >> loading configuration file config.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/config.json [INFO|2025-03-19 00:24:05] configuration_utils.py:771 >> Model config LlamaConfig { "_name_or_path": "meta-llama/Llama-3.2-3B-Instruct", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": [ 128001, 128008, 128009 ], "head_dim": 128, "hidden_act": "silu", "hidden_size": 3072, "initializer_range": 0.02, "intermediate_size": 8192, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 24, "num_hidden_layers": 28, "num_key_value_heads": 8, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 32.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "transformers_version": "4.49.0", "use_cache": true, "vocab_size": 128256 } [INFO|2025-03-19 00:24:05] modeling_utils.py:3982 >> loading weights file model.safetensors from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/model.safetensors.index.json [INFO|2025-03-19 00:24:05] modeling_utils.py:1633 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16. [INFO|2025-03-19 00:24:05] configuration_utils.py:1140 >> Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": [ 128001, 128008, 128009 ] } [INFO|2025-03-19 00:24:07] modeling_utils.py:4970 >> All model checkpoint weights were used when initializing LlamaForCausalLM. 
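Note on the LlamaConfig printed above: its fields are enough to reproduce the parameter count that this log reports as "trainable params: 3,212,749,824". The sketch below is a back-of-the-envelope check, not code from the training framework. It assumes the usual Llama layout: bias-free q/k/v/o projections (consistent with "attention_bias": false), a gated MLP with gate/up/down projections ("mlp_bias": false), two RMSNorm weight vectors per layer plus one final norm, and the embedding matrix counted once because "tie_word_embeddings" is true. The function name `llama_param_count` is illustrative only.

```python
def llama_param_count(hidden_size, intermediate_size, num_layers,
                      num_heads, num_kv_heads, head_dim, vocab_size,
                      tie_word_embeddings=True):
    """Count weights for a Llama-style decoder (assumed layout, no biases)."""
    embed = vocab_size * hidden_size                 # token embedding matrix
    q = hidden_size * num_heads * head_dim           # q_proj
    kv = 2 * hidden_size * num_kv_heads * head_dim   # k_proj + v_proj (grouped-query attention)
    o = num_heads * head_dim * hidden_size           # o_proj
    mlp = 3 * hidden_size * intermediate_size        # gate_proj + up_proj + down_proj
    norms = 2 * hidden_size                          # two RMSNorm weights per layer
    per_layer = q + kv + o + mlp + norms
    total = embed + num_layers * per_layer + hidden_size  # + final RMSNorm
    if not tie_word_embeddings:
        total += vocab_size * hidden_size            # separate lm_head matrix
    return total


# Values copied from the LlamaConfig dump in this log.
total = llama_param_count(
    hidden_size=3072, intermediate_size=8192, num_layers=28,
    num_heads=24, num_kv_heads=8, head_dim=128, vocab_size=128256,
    tie_word_embeddings=True,
)
print(f"{total:,}")
```

Under these assumptions the result agrees with the count the trainer prints, which suggests the tied output head is not counted separately.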
[INFO|2025-03-19 00:24:07] modeling_utils.py:4978 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-3.2-3B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [INFO|2025-03-19 00:24:07] configuration_utils.py:1095 >> loading configuration file generation_config.json from cache at /usr1/data/weiweis/transformers/hub/models--meta-llama--Llama-3.2-3B-Instruct/snapshots/0cb88a4f764b7a12671c53f0838cd831a0843b95/generation_config.json [INFO|2025-03-19 00:24:07] configuration_utils.py:1140 >> Generate config GenerationConfig { "bos_token_id": 128000, "do_sample": true, "eos_token_id": [ 128001, 128008, 128009 ], "temperature": 0.6, "top_p": 0.9 } [INFO|2025-03-19 00:24:07] logging.py:143 >> Gradient checkpointing enabled. [INFO|2025-03-19 00:24:07] logging.py:143 >> Using torch SDPA for faster training and inference. [INFO|2025-03-19 00:24:07] logging.py:143 >> Upcasting trainable params to float32. [INFO|2025-03-19 00:24:07] logging.py:143 >> Fine-tuning method: Full [INFO|2025-03-19 00:24:07] logging.py:143 >> trainable params: 3,212,749,824 || all params: 3,212,749,824 || trainable%: 100.0000 [INFO|2025-03-19 00:24:07] trainer.py:746 >> Using auto half precision backend [INFO|2025-03-19 00:24:20] trainer.py:2405 >> ***** Running training ***** [INFO|2025-03-19 00:24:20] trainer.py:2406 >> Num examples = 1,205,874 [INFO|2025-03-19 00:24:20] trainer.py:2407 >> Num Epochs = 3 [INFO|2025-03-19 00:24:20] trainer.py:2408 >> Instantaneous batch size per device = 4 [INFO|2025-03-19 00:24:20] trainer.py:2411 >> Total train batch size (w. 
parallel, distributed & accumulation) = 128 [INFO|2025-03-19 00:24:20] trainer.py:2412 >> Gradient Accumulation steps = 4 [INFO|2025-03-19 00:24:20] trainer.py:2413 >> Total optimization steps = 28,263 [INFO|2025-03-19 00:24:20] trainer.py:2414 >> Number of trainable parameters = 3,212,749,824 [INFO|2025-03-19 00:25:05] logging.py:143 >> {'loss': 3.0716, 'learning_rate': 1.2500e-06, 'epoch': 0.00, 'throughput': 9171.28} [INFO|2025-03-19 00:25:46] logging.py:143 >> {'loss': 2.9137, 'learning_rate': 2.5000e-06, 'epoch': 0.00, 'throughput': 9580.59} [INFO|2025-03-19 00:26:26] logging.py:143 >> {'loss': 2.2880, 'learning_rate': 3.7500e-06, 'epoch': 0.00, 'throughput': 9721.59} [INFO|2025-03-19 00:27:06] logging.py:143 >> {'loss': 1.5454, 'learning_rate': 5.0000e-06, 'epoch': 0.00, 'throughput': 9825.13} [INFO|2025-03-19 00:27:46] logging.py:143 >> {'loss': 1.2933, 'learning_rate': 6.2500e-06, 'epoch': 0.00, 'throughput': 9912.18} [INFO|2025-03-19 00:28:27] logging.py:143 >> {'loss': 1.0597, 'learning_rate': 7.5000e-06, 'epoch': 0.00, 'throughput': 9925.85} [INFO|2025-03-19 00:29:08] logging.py:143 >> {'loss': 0.9685, 'learning_rate': 8.7500e-06, 'epoch': 0.00, 'throughput': 9950.16} [INFO|2025-03-19 00:29:50] logging.py:143 >> {'loss': 0.9100, 'learning_rate': 1.0000e-05, 'epoch': 0.00, 'throughput': 9918.92} [INFO|2025-03-19 00:30:31] logging.py:143 >> {'loss': 0.9364, 'learning_rate': 1.1250e-05, 'epoch': 0.00, 'throughput': 9914.85} [INFO|2025-03-19 00:31:12] logging.py:143 >> {'loss': 0.8846, 'learning_rate': 1.2500e-05, 'epoch': 0.01, 'throughput': 9939.83} [INFO|2025-03-19 00:31:54] logging.py:143 >> {'loss': 0.9108, 'learning_rate': 1.3750e-05, 'epoch': 0.01, 'throughput': 9933.13} [INFO|2025-03-19 00:32:35] logging.py:143 >> {'loss': 0.8355, 'learning_rate': 1.5000e-05, 'epoch': 0.01, 'throughput': 9946.10} [INFO|2025-03-19 00:33:15] logging.py:143 >> {'loss': 0.8772, 'learning_rate': 1.6250e-05, 'epoch': 0.01, 'throughput': 9962.56} [INFO|2025-03-19 00:33:56] 
logging.py:143 >> {'loss': 0.8683, 'learning_rate': 1.7500e-05, 'epoch': 0.01, 'throughput': 9961.55} [INFO|2025-03-19 00:34:36] logging.py:143 >> {'loss': 0.8377, 'learning_rate': 1.8750e-05, 'epoch': 0.01, 'throughput': 9955.47} [INFO|2025-03-19 00:35:17] logging.py:143 >> {'loss': 0.8368, 'learning_rate': 2.0000e-05, 'epoch': 0.01, 'throughput': 9940.36} [INFO|2025-03-19 00:35:57] logging.py:143 >> {'loss': 0.8790, 'learning_rate': 2.1250e-05, 'epoch': 0.01, 'throughput': 9959.35} [INFO|2025-03-19 00:36:37] logging.py:143 >> {'loss': 0.8479, 'learning_rate': 2.2500e-05, 'epoch': 0.01, 'throughput': 9962.66} [INFO|2025-03-19 00:37:18] logging.py:143 >> {'loss': 0.8490, 'learning_rate': 2.3750e-05, 'epoch': 0.01, 'throughput': 9954.19} [INFO|2025-03-19 00:37:58] logging.py:143 >> {'loss': 0.8444, 'learning_rate': 2.5000e-05, 'epoch': 0.01, 'throughput': 9968.63} [INFO|2025-03-19 00:38:39] logging.py:143 >> {'loss': 0.8518, 'learning_rate': 2.6250e-05, 'epoch': 0.01, 'throughput': 9965.09} [INFO|2025-03-19 00:39:20] logging.py:143 >> {'loss': 0.8492, 'learning_rate': 2.7500e-05, 'epoch': 0.01, 'throughput': 9967.34} [INFO|2025-03-19 00:40:01] logging.py:143 >> {'loss': 0.8605, 'learning_rate': 2.8750e-05, 'epoch': 0.01, 'throughput': 9965.68} [INFO|2025-03-19 00:40:40] logging.py:143 >> {'loss': 0.8566, 'learning_rate': 3.0000e-05, 'epoch': 0.01, 'throughput': 9982.18} [INFO|2025-03-19 00:41:19] logging.py:143 >> {'loss': 0.8848, 'learning_rate': 3.1250e-05, 'epoch': 0.01, 'throughput': 10003.72} [INFO|2025-03-19 00:42:01] logging.py:143 >> {'loss': 0.8291, 'learning_rate': 3.2500e-05, 'epoch': 0.01, 'throughput': 9994.19} [INFO|2025-03-19 00:42:42] logging.py:143 >> {'loss': 0.8577, 'learning_rate': 3.3750e-05, 'epoch': 0.01, 'throughput': 9988.34} [INFO|2025-03-19 00:43:22] logging.py:143 >> {'loss': 0.8428, 'learning_rate': 3.5000e-05, 'epoch': 0.01, 'throughput': 9998.77} [INFO|2025-03-19 00:44:02] logging.py:143 >> {'loss': 0.8493, 'learning_rate': 3.6250e-05, 
'epoch': 0.02, 'throughput': 10001.26} [INFO|2025-03-19 00:44:43] logging.py:143 >> {'loss': 0.8388, 'learning_rate': 3.7500e-05, 'epoch': 0.02, 'throughput': 10005.97} [INFO|2025-03-19 00:45:24] logging.py:143 >> {'loss': 0.8422, 'learning_rate': 3.8750e-05, 'epoch': 0.02, 'throughput': 9993.53} [INFO|2025-03-19 00:46:03] logging.py:143 >> {'loss': 0.8458, 'learning_rate': 4.0000e-05, 'epoch': 0.02, 'throughput': 9996.73} [INFO|2025-03-19 00:46:42] logging.py:143 >> {'loss': 0.8608, 'learning_rate': 4.1250e-05, 'epoch': 0.02, 'throughput': 9999.33} [INFO|2025-03-19 00:47:22] logging.py:143 >> {'loss': 0.8400, 'learning_rate': 4.2500e-05, 'epoch': 0.02, 'throughput': 10000.27} [INFO|2025-03-19 00:48:03] logging.py:143 >> {'loss': 0.8518, 'learning_rate': 4.3750e-05, 'epoch': 0.02, 'throughput': 10000.17} [INFO|2025-03-19 00:48:43] logging.py:143 >> {'loss': 0.8246, 'learning_rate': 4.5000e-05, 'epoch': 0.02, 'throughput': 10007.34} [INFO|2025-03-19 00:49:23] logging.py:143 >> {'loss': 0.8350, 'learning_rate': 4.6250e-05, 'epoch': 0.02, 'throughput': 10009.77} [INFO|2025-03-19 00:50:03] logging.py:143 >> {'loss': 0.8333, 'learning_rate': 4.7500e-05, 'epoch': 0.02, 'throughput': 10015.64} [INFO|2025-03-19 00:50:43] logging.py:143 >> {'loss': 0.8611, 'learning_rate': 4.8750e-05, 'epoch': 0.02, 'throughput': 10015.65} [INFO|2025-03-19 00:51:22] logging.py:143 >> {'loss': 0.8656, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10022.05} [INFO|2025-03-19 00:52:04] logging.py:143 >> {'loss': 0.8592, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10019.48} [INFO|2025-03-19 00:52:44] logging.py:143 >> {'loss': 0.8463, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10015.05} [INFO|2025-03-19 00:53:25] logging.py:143 >> {'loss': 0.8704, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10012.65} [INFO|2025-03-19 00:54:05] logging.py:143 >> {'loss': 0.8640, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10014.71} 
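Note on the run header and the entries up to this point: the reported batch sizes, step count, and learning-rate ramp are mutually consistent, as the arithmetic sketch below shows. All inputs are copied from this log. Two things are inferences rather than logged facts: the number of data-parallel workers (derived from the total batch size), and the step formula (a plain ceiling division, which happens to reproduce the logged total). The logging interval in optimizer steps is not printed, so the ramp is measured in logged entries, not steps.

```python
import math

# Values reported under "***** Running training *****" in this log.
num_examples = 1_205_874
num_epochs = 3
per_device_batch = 4
grad_accum = 4
total_batch = 128

# Inferred, not logged: workers implied by the total batch size.
world_size = total_batch // (per_device_batch * grad_accum)

# Assumed formula: one optimizer step per total batch, last partial batch kept.
steps_per_epoch = math.ceil(num_examples / total_batch)
total_steps = steps_per_epoch * num_epochs

# Learning-rate ramp as observed in the metric entries above:
# each entry adds 1.25e-06 until the rate reaches 5.0e-05.
peak_lr = 5.0e-5
lr_increment = 1.25e-6
warmup_entries = round(peak_lr / lr_increment)

print(world_size, steps_per_epoch, total_steps, warmup_entries)
```

The computed step total matches the "Total optimization steps = 28,263" line, and the ramp length matches the number of entries logged before the rate first reads 5.0000e-05.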
[INFO|2025-03-19 00:54:44] logging.py:143 >> {'loss': 0.8597, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10017.53} [INFO|2025-03-19 00:55:23] logging.py:143 >> {'loss': 0.7890, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10018.64} [INFO|2025-03-19 00:56:04] logging.py:143 >> {'loss': 0.8779, 'learning_rate': 5.0000e-05, 'epoch': 0.02, 'throughput': 10027.67} [INFO|2025-03-19 00:56:45] logging.py:143 >> {'loss': 0.8795, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 10032.07} [INFO|2025-03-19 00:57:25] logging.py:143 >> {'loss': 0.8766, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 10033.85} [INFO|2025-03-19 00:58:07] logging.py:143 >> {'loss': 0.8632, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 10033.99} [INFO|2025-03-19 00:58:47] logging.py:143 >> {'loss': 0.8455, 'learning_rate': 5.0000e-05, 'epoch': 0.03, 'throughput': 10036.56} [INFO|2025-03-19 00:59:27] logging.py:143 >> {'loss': 0.8838, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10040.98} [INFO|2025-03-19 01:00:07] logging.py:143 >> {'loss': 0.8568, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10040.23} [INFO|2025-03-19 01:00:49] logging.py:143 >> {'loss': 0.8399, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10039.06} [INFO|2025-03-19 01:01:30] logging.py:143 >> {'loss': 0.8887, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10045.44} [INFO|2025-03-19 01:02:11] logging.py:143 >> {'loss': 0.8588, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10045.84} [INFO|2025-03-19 01:02:53] logging.py:143 >> {'loss': 0.8510, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10043.36} [INFO|2025-03-19 01:03:32] logging.py:143 >> {'loss': 0.8447, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10044.66} [INFO|2025-03-19 01:04:12] logging.py:143 >> {'loss': 0.8521, 'learning_rate': 4.9999e-05, 'epoch': 0.03, 'throughput': 10044.32} [INFO|2025-03-19 01:04:52] logging.py:143 >> 
{'loss': 0.8563, 'learning_rate': 4.9998e-05, 'epoch': 0.03, 'throughput': 10043.69} [INFO|2025-03-19 01:05:32] logging.py:143 >> {'loss': 0.8414, 'learning_rate': 4.9998e-05, 'epoch': 0.03, 'throughput': 10044.28} [INFO|2025-03-19 01:06:12] logging.py:143 >> {'loss': 0.8737, 'learning_rate': 4.9998e-05, 'epoch': 0.03, 'throughput': 10049.03} [INFO|2025-03-19 01:06:52] logging.py:143 >> {'loss': 0.8326, 'learning_rate': 4.9998e-05, 'epoch': 0.03, 'throughput': 10047.52} [INFO|2025-03-19 01:07:32] logging.py:143 >> {'loss': 0.8053, 'learning_rate': 4.9998e-05, 'epoch': 0.03, 'throughput': 10050.06} [INFO|2025-03-19 01:08:14] logging.py:143 >> {'loss': 0.8250, 'learning_rate': 4.9998e-05, 'epoch': 0.03, 'throughput': 10045.65} [INFO|2025-03-19 01:08:55] logging.py:143 >> {'loss': 0.8453, 'learning_rate': 4.9997e-05, 'epoch': 0.04, 'throughput': 10045.72} [INFO|2025-03-19 01:09:37] logging.py:143 >> {'loss': 0.8178, 'learning_rate': 4.9997e-05, 'epoch': 0.04, 'throughput': 10042.60} [INFO|2025-03-19 01:10:17] logging.py:143 >> {'loss': 0.8244, 'learning_rate': 4.9997e-05, 'epoch': 0.04, 'throughput': 10042.65} [INFO|2025-03-19 01:10:57] logging.py:143 >> {'loss': 0.8595, 'learning_rate': 4.9997e-05, 'epoch': 0.04, 'throughput': 10046.91} [INFO|2025-03-19 01:11:37] logging.py:143 >> {'loss': 0.8493, 'learning_rate': 4.9996e-05, 'epoch': 0.04, 'throughput': 10048.70} [INFO|2025-03-19 01:12:17] logging.py:143 >> {'loss': 0.8504, 'learning_rate': 4.9996e-05, 'epoch': 0.04, 'throughput': 10046.57} [INFO|2025-03-19 01:12:58] logging.py:143 >> {'loss': 0.8463, 'learning_rate': 4.9996e-05, 'epoch': 0.04, 'throughput': 10045.28} [INFO|2025-03-19 01:13:39] logging.py:143 >> {'loss': 0.8713, 'learning_rate': 4.9996e-05, 'epoch': 0.04, 'throughput': 10042.54} [INFO|2025-03-19 01:14:20] logging.py:143 >> {'loss': 0.8288, 'learning_rate': 4.9995e-05, 'epoch': 0.04, 'throughput': 10043.73} [INFO|2025-03-19 01:15:00] logging.py:143 >> {'loss': 0.8314, 'learning_rate': 4.9995e-05, 
'epoch': 0.04, 'throughput': 10039.97} [INFO|2025-03-19 01:15:41] logging.py:143 >> {'loss': 0.8594, 'learning_rate': 4.9995e-05, 'epoch': 0.04, 'throughput': 10039.76} [INFO|2025-03-19 01:16:20] logging.py:143 >> {'loss': 0.8342, 'learning_rate': 4.9995e-05, 'epoch': 0.04, 'throughput': 10039.07} [INFO|2025-03-19 01:17:01] logging.py:143 >> {'loss': 0.8407, 'learning_rate': 4.9994e-05, 'epoch': 0.04, 'throughput': 10038.03} [INFO|2025-03-19 01:17:41] logging.py:143 >> {'loss': 0.8514, 'learning_rate': 4.9994e-05, 'epoch': 0.04, 'throughput': 10037.00} [INFO|2025-03-19 01:18:21] logging.py:143 >> {'loss': 0.8415, 'learning_rate': 4.9994e-05, 'epoch': 0.04, 'throughput': 10037.00} [INFO|2025-03-19 01:19:03] logging.py:143 >> {'loss': 0.8533, 'learning_rate': 4.9993e-05, 'epoch': 0.04, 'throughput': 10040.07} [INFO|2025-03-19 01:19:43] logging.py:143 >> {'loss': 0.8469, 'learning_rate': 4.9993e-05, 'epoch': 0.04, 'throughput': 10043.58} [INFO|2025-03-19 01:20:23] logging.py:143 >> {'loss': 0.8829, 'learning_rate': 4.9993e-05, 'epoch': 0.04, 'throughput': 10045.51} [INFO|2025-03-19 01:21:03] logging.py:143 >> {'loss': 0.8433, 'learning_rate': 4.9992e-05, 'epoch': 0.04, 'throughput': 10044.45} [INFO|2025-03-19 01:21:45] logging.py:143 >> {'loss': 0.8491, 'learning_rate': 4.9992e-05, 'epoch': 0.05, 'throughput': 10039.05} [INFO|2025-03-19 01:22:25] logging.py:143 >> {'loss': 0.8202, 'learning_rate': 4.9992e-05, 'epoch': 0.05, 'throughput': 10033.89} [INFO|2025-03-19 01:23:05] logging.py:143 >> {'loss': 0.8860, 'learning_rate': 4.9991e-05, 'epoch': 0.05, 'throughput': 10037.82} [INFO|2025-03-19 01:23:46] logging.py:143 >> {'loss': 0.8675, 'learning_rate': 4.9991e-05, 'epoch': 0.05, 'throughput': 10035.43} [INFO|2025-03-19 01:24:26] logging.py:143 >> {'loss': 0.8302, 'learning_rate': 4.9991e-05, 'epoch': 0.05, 'throughput': 10035.74} [INFO|2025-03-19 01:25:07] logging.py:143 >> {'loss': 0.8478, 'learning_rate': 4.9990e-05, 'epoch': 0.05, 'throughput': 10035.67} 
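Note on reading the metric entries: each one is a timestamp followed by a Python-style dict literal, so they can be pulled out of the wrapped text with a regular expression and `ast.literal_eval`. The sketch below is one possible approach, not part of the training framework; `ENTRY_RE` and `parse_metrics` are hypothetical names, and the sample string is two entries copied from this log. Whitespace is matched loosely so that entries split across wrapped lines still parse.

```python
import ast
import re
from datetime import datetime

# Matches "[INFO|<timestamp>] logging.py:<line> >> {'loss': ...}" entries.
# \s+ and [^}]* also span newlines, so soft-wrapped entries are handled.
ENTRY_RE = re.compile(
    r"\[INFO\|(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+"
    r"logging\.py:\d+\s+>>\s+(?P<metrics>\{'loss':[^}]*\})"
)


def parse_metrics(text):
    """Return one dict per metric entry, with the timestamp added as 'time'."""
    rows = []
    for match in ENTRY_RE.finditer(text):
        metrics = ast.literal_eval(match.group("metrics"))  # safe literal parsing
        metrics["time"] = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        rows.append(metrics)
    return rows


# Two entries copied verbatim from this log.
sample = (
    "[INFO|2025-03-19 01:24:26] logging.py:143 >> {'loss': 0.8302, "
    "'learning_rate': 4.9991e-05, 'epoch': 0.05, 'throughput': 10035.74} "
    "[INFO|2025-03-19 01:25:07] logging.py:143 >> {'loss': 0.8478, "
    "'learning_rate': 4.9990e-05, 'epoch': 0.05, 'throughput': 10035.67}"
)
rows = parse_metrics(sample)
gap_seconds = (rows[1]["time"] - rows[0]["time"]).total_seconds()
print(len(rows), rows[0]["loss"], gap_seconds)
```

Non-metric entries emitted from the same `logging.py:143` call site (for example "Add pad token") are skipped because the pattern requires the dict to start with `'loss'`.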
[INFO|2025-03-19 01:25:47] logging.py:143 >> {'loss': 0.8423, 'learning_rate': 4.9990e-05, 'epoch': 0.05, 'throughput': 10035.80} [INFO|2025-03-19 01:26:28] logging.py:143 >> {'loss': 0.8300, 'learning_rate': 4.9989e-05, 'epoch': 0.05, 'throughput': 10035.29} [INFO|2025-03-19 01:27:08] logging.py:143 >> {'loss': 0.8479, 'learning_rate': 4.9989e-05, 'epoch': 0.05, 'throughput': 10034.99} [INFO|2025-03-19 01:27:48] logging.py:143 >> {'loss': 0.8451, 'learning_rate': 4.9989e-05, 'epoch': 0.05, 'throughput': 10031.79} [INFO|2025-03-19 01:28:30] logging.py:143 >> {'loss': 0.8365, 'learning_rate': 4.9988e-05, 'epoch': 0.05, 'throughput': 10029.03} [INFO|2025-03-19 01:29:11] logging.py:143 >> {'loss': 0.8710, 'learning_rate': 4.9988e-05, 'epoch': 0.05, 'throughput': 10028.36} [INFO|2025-03-19 01:29:50] logging.py:143 >> {'loss': 0.8681, 'learning_rate': 4.9987e-05, 'epoch': 0.05, 'throughput': 10031.53} [INFO|2025-03-19 01:30:30] logging.py:143 >> {'loss': 0.8023, 'learning_rate': 4.9987e-05, 'epoch': 0.05, 'throughput': 10031.81} [INFO|2025-03-19 01:31:11] logging.py:143 >> {'loss': 0.8360, 'learning_rate': 4.9986e-05, 'epoch': 0.05, 'throughput': 10032.02} [INFO|2025-03-19 01:31:52] logging.py:143 >> {'loss': 0.8163, 'learning_rate': 4.9986e-05, 'epoch': 0.05, 'throughput': 10027.91} [INFO|2025-03-19 01:32:32] logging.py:143 >> {'loss': 0.8295, 'learning_rate': 4.9985e-05, 'epoch': 0.05, 'throughput': 10028.34} [INFO|2025-03-19 01:33:14] logging.py:143 >> {'loss': 0.8192, 'learning_rate': 4.9985e-05, 'epoch': 0.05, 'throughput': 10023.26} [INFO|2025-03-19 01:33:55] logging.py:143 >> {'loss': 0.8379, 'learning_rate': 4.9984e-05, 'epoch': 0.05, 'throughput': 10023.20} [INFO|2025-03-19 01:34:36] logging.py:143 >> {'loss': 0.8347, 'learning_rate': 4.9984e-05, 'epoch': 0.06, 'throughput': 10022.37} [INFO|2025-03-19 01:35:17] logging.py:143 >> {'loss': 0.8516, 'learning_rate': 4.9983e-05, 'epoch': 0.06, 'throughput': 10024.64} [INFO|2025-03-19 01:35:57] logging.py:143 >> 
{'loss': 0.8429, 'learning_rate': 4.9983e-05, 'epoch': 0.06, 'throughput': 10025.80} [INFO|2025-03-19 01:36:36] logging.py:143 >> {'loss': 0.8168, 'learning_rate': 4.9982e-05, 'epoch': 0.06, 'throughput': 10028.12} [INFO|2025-03-19 01:37:18] logging.py:143 >> {'loss': 0.8251, 'learning_rate': 4.9982e-05, 'epoch': 0.06, 'throughput': 10023.80} [INFO|2025-03-19 01:37:58] logging.py:143 >> {'loss': 0.8807, 'learning_rate': 4.9981e-05, 'epoch': 0.06, 'throughput': 10027.72} [INFO|2025-03-19 01:38:39] logging.py:143 >> {'loss': 0.8367, 'learning_rate': 4.9981e-05, 'epoch': 0.06, 'throughput': 10027.96} [INFO|2025-03-19 01:39:20] logging.py:143 >> {'loss': 0.8692, 'learning_rate': 4.9980e-05, 'epoch': 0.06, 'throughput': 10026.76} [INFO|2025-03-19 01:40:02] logging.py:143 >> {'loss': 0.8153, 'learning_rate': 4.9980e-05, 'epoch': 0.06, 'throughput': 10025.67} [INFO|2025-03-19 01:40:43] logging.py:143 >> {'loss': 0.8159, 'learning_rate': 4.9979e-05, 'epoch': 0.06, 'throughput': 10024.51} [INFO|2025-03-19 01:41:25] logging.py:143 >> {'loss': 0.8006, 'learning_rate': 4.9979e-05, 'epoch': 0.06, 'throughput': 10022.36} [INFO|2025-03-19 01:42:06] logging.py:143 >> {'loss': 0.8045, 'learning_rate': 4.9978e-05, 'epoch': 0.06, 'throughput': 10018.77} [INFO|2025-03-19 01:42:48] logging.py:143 >> {'loss': 0.8386, 'learning_rate': 4.9977e-05, 'epoch': 0.06, 'throughput': 10016.02} [INFO|2025-03-19 01:43:31] logging.py:143 >> {'loss': 0.8204, 'learning_rate': 4.9977e-05, 'epoch': 0.06, 'throughput': 10009.92} [INFO|2025-03-19 01:44:13] logging.py:143 >> {'loss': 0.8000, 'learning_rate': 4.9976e-05, 'epoch': 0.06, 'throughput': 10004.83} [INFO|2025-03-19 01:44:56] logging.py:143 >> {'loss': 0.8536, 'learning_rate': 4.9976e-05, 'epoch': 0.06, 'throughput': 10001.20} [INFO|2025-03-19 01:45:37] logging.py:143 >> {'loss': 0.8026, 'learning_rate': 4.9975e-05, 'epoch': 0.06, 'throughput': 9999.11} [INFO|2025-03-19 01:46:19] logging.py:143 >> {'loss': 0.7993, 'learning_rate': 4.9974e-05, 
'epoch': 0.06, 'throughput': 9996.67} [INFO|2025-03-19 01:47:04] logging.py:143 >> {'loss': 0.8504, 'learning_rate': 4.9974e-05, 'epoch': 0.06, 'throughput': 9988.96} [INFO|2025-03-19 01:47:49] logging.py:143 >> {'loss': 0.7697, 'learning_rate': 4.9973e-05, 'epoch': 0.07, 'throughput': 9978.96} [INFO|2025-03-19 01:48:34] logging.py:143 >> {'loss': 0.8281, 'learning_rate': 4.9972e-05, 'epoch': 0.07, 'throughput': 9972.69} [INFO|2025-03-19 01:49:17] logging.py:143 >> {'loss': 0.8304, 'learning_rate': 4.9972e-05, 'epoch': 0.07, 'throughput': 9966.60} [INFO|2025-03-19 01:50:02] logging.py:143 >> {'loss': 0.8566, 'learning_rate': 4.9971e-05, 'epoch': 0.07, 'throughput': 9962.92} [INFO|2025-03-19 01:50:46] logging.py:143 >> {'loss': 0.7975, 'learning_rate': 4.9970e-05, 'epoch': 0.07, 'throughput': 9955.45} [INFO|2025-03-19 01:51:31] logging.py:143 >> {'loss': 0.8279, 'learning_rate': 4.9970e-05, 'epoch': 0.07, 'throughput': 9945.77} [INFO|2025-03-19 01:52:15] logging.py:143 >> {'loss': 0.7910, 'learning_rate': 4.9969e-05, 'epoch': 0.07, 'throughput': 9937.71} [INFO|2025-03-19 01:52:59] logging.py:143 >> {'loss': 0.8335, 'learning_rate': 4.9968e-05, 'epoch': 0.07, 'throughput': 9931.80} [INFO|2025-03-19 01:53:42] logging.py:143 >> {'loss': 0.8207, 'learning_rate': 4.9968e-05, 'epoch': 0.07, 'throughput': 9926.91} [INFO|2025-03-19 01:54:28] logging.py:143 >> {'loss': 0.7999, 'learning_rate': 4.9967e-05, 'epoch': 0.07, 'throughput': 9916.15} [INFO|2025-03-19 01:55:15] logging.py:143 >> {'loss': 0.8460, 'learning_rate': 4.9966e-05, 'epoch': 0.07, 'throughput': 9905.62} [INFO|2025-03-19 01:56:03] logging.py:143 >> {'loss': 0.8362, 'learning_rate': 4.9965e-05, 'epoch': 0.07, 'throughput': 9892.93} [INFO|2025-03-19 01:56:47] logging.py:143 >> {'loss': 0.8272, 'learning_rate': 4.9965e-05, 'epoch': 0.07, 'throughput': 9887.28} [INFO|2025-03-19 01:57:30] logging.py:143 >> {'loss': 0.7929, 'learning_rate': 4.9964e-05, 'epoch': 0.07, 'throughput': 9882.50} [INFO|2025-03-19 01:58:16] 
logging.py:143 >> {'loss': 0.8217, 'learning_rate': 4.9963e-05, 'epoch': 0.07, 'throughput': 9875.05} [INFO|2025-03-19 01:59:00] logging.py:143 >> {'loss': 0.8189, 'learning_rate': 4.9962e-05, 'epoch': 0.07, 'throughput': 9869.85} [INFO|2025-03-19 01:59:45] logging.py:143 >> {'loss': 0.8427, 'learning_rate': 4.9962e-05, 'epoch': 0.07, 'throughput': 9862.28} [INFO|2025-03-19 02:00:31] logging.py:143 >> {'loss': 0.8364, 'learning_rate': 4.9961e-05, 'epoch': 0.07, 'throughput': 9856.05} [INFO|2025-03-19 02:01:14] logging.py:143 >> {'loss': 0.8287, 'learning_rate': 4.9960e-05, 'epoch': 0.07, 'throughput': 9852.34} [INFO|2025-03-19 02:01:55] logging.py:143 >> {'loss': 0.8316, 'learning_rate': 4.9959e-05, 'epoch': 0.08, 'throughput': 9850.72} [INFO|2025-03-19 02:02:35] logging.py:143 >> {'loss': 0.8186, 'learning_rate': 4.9958e-05, 'epoch': 0.08, 'throughput': 9852.77} [INFO|2025-03-19 02:03:14] logging.py:143 >> {'loss': 0.8468, 'learning_rate': 4.9958e-05, 'epoch': 0.08, 'throughput': 9856.24} [INFO|2025-03-19 02:03:55] logging.py:143 >> {'loss': 0.7969, 'learning_rate': 4.9957e-05, 'epoch': 0.08, 'throughput': 9857.49} [INFO|2025-03-19 02:04:35] logging.py:143 >> {'loss': 0.8074, 'learning_rate': 4.9956e-05, 'epoch': 0.08, 'throughput': 9858.97} [INFO|2025-03-19 02:05:14] logging.py:143 >> {'loss': 0.8382, 'learning_rate': 4.9955e-05, 'epoch': 0.08, 'throughput': 9861.56} [INFO|2025-03-19 02:05:54] logging.py:143 >> {'loss': 0.7877, 'learning_rate': 4.9954e-05, 'epoch': 0.08, 'throughput': 9861.95} [INFO|2025-03-19 02:06:35] logging.py:143 >> {'loss': 0.8230, 'learning_rate': 4.9953e-05, 'epoch': 0.08, 'throughput': 9864.01} [INFO|2025-03-19 02:07:18] logging.py:143 >> {'loss': 0.8282, 'learning_rate': 4.9953e-05, 'epoch': 0.08, 'throughput': 9861.65} [INFO|2025-03-19 02:07:58] logging.py:143 >> {'loss': 0.8126, 'learning_rate': 4.9952e-05, 'epoch': 0.08, 'throughput': 9862.86} [INFO|2025-03-19 02:08:39] logging.py:143 >> {'loss': 0.8298, 'learning_rate': 4.9951e-05, 
'epoch': 0.08, 'throughput': 9861.89} [INFO|2025-03-19 02:09:18] logging.py:143 >> {'loss': 0.8118, 'learning_rate': 4.9950e-05, 'epoch': 0.08, 'throughput': 9864.70} [INFO|2025-03-19 02:09:58] logging.py:143 >> {'loss': 0.8508, 'learning_rate': 4.9949e-05, 'epoch': 0.08, 'throughput': 9865.71} [INFO|2025-03-19 02:10:39] logging.py:143 >> {'loss': 0.8129, 'learning_rate': 4.9948e-05, 'epoch': 0.08, 'throughput': 9865.80} [INFO|2025-03-19 02:11:20] logging.py:143 >> {'loss': 0.8462, 'learning_rate': 4.9947e-05, 'epoch': 0.08, 'throughput': 9867.80} [INFO|2025-03-19 02:12:01] logging.py:143 >> {'loss': 0.7879, 'learning_rate': 4.9946e-05, 'epoch': 0.08, 'throughput': 9868.11} [INFO|2025-03-19 02:12:42] logging.py:143 >> {'loss': 0.8443, 'learning_rate': 4.9945e-05, 'epoch': 0.08, 'throughput': 9869.20} [INFO|2025-03-19 02:13:23] logging.py:143 >> {'loss': 0.8051, 'learning_rate': 4.9945e-05, 'epoch': 0.08, 'throughput': 9870.28} [INFO|2025-03-19 02:14:03] logging.py:143 >> {'loss': 0.8066, 'learning_rate': 4.9944e-05, 'epoch': 0.08, 'throughput': 9871.62} [INFO|2025-03-19 02:14:44] logging.py:143 >> {'loss': 0.8267, 'learning_rate': 4.9943e-05, 'epoch': 0.09, 'throughput': 9872.75} [INFO|2025-03-19 02:15:25] logging.py:143 >> {'loss': 0.8166, 'learning_rate': 4.9942e-05, 'epoch': 0.09, 'throughput': 9873.96} [INFO|2025-03-19 02:16:06] logging.py:143 >> {'loss': 0.8373, 'learning_rate': 4.9941e-05, 'epoch': 0.09, 'throughput': 9873.84} [INFO|2025-03-19 02:16:45] logging.py:143 >> {'loss': 0.8730, 'learning_rate': 4.9940e-05, 'epoch': 0.09, 'throughput': 9876.86} [INFO|2025-03-19 02:17:25] logging.py:143 >> {'loss': 0.8016, 'learning_rate': 4.9939e-05, 'epoch': 0.09, 'throughput': 9878.51} [INFO|2025-03-19 02:18:05] logging.py:143 >> {'loss': 0.8311, 'learning_rate': 4.9938e-05, 'epoch': 0.09, 'throughput': 9880.91} [INFO|2025-03-19 02:18:45] logging.py:143 >> {'loss': 0.8095, 'learning_rate': 4.9937e-05, 'epoch': 0.09, 'throughput': 9881.54} [INFO|2025-03-19 02:19:24] 
logging.py:143 >> {'loss': 0.8160, 'learning_rate': 4.9936e-05, 'epoch': 0.09, 'throughput': 9882.90} [INFO|2025-03-19 02:20:06] logging.py:143 >> {'loss': 0.8139, 'learning_rate': 4.9935e-05, 'epoch': 0.09, 'throughput': 9882.75} [INFO|2025-03-19 02:20:46] logging.py:143 >> {'loss': 0.7980, 'learning_rate': 4.9934e-05, 'epoch': 0.09, 'throughput': 9883.52} [INFO|2025-03-19 02:21:26] logging.py:143 >> {'loss': 0.8347, 'learning_rate': 4.9933e-05, 'epoch': 0.09, 'throughput': 9884.43} [INFO|2025-03-19 02:22:07] logging.py:143 >> {'loss': 0.8198, 'learning_rate': 4.9932e-05, 'epoch': 0.09, 'throughput': 9885.10} [INFO|2025-03-19 02:22:47] logging.py:143 >> {'loss': 0.8345, 'learning_rate': 4.9931e-05, 'epoch': 0.09, 'throughput': 9885.89} [INFO|2025-03-19 02:23:27] logging.py:143 >> {'loss': 0.7980, 'learning_rate': 4.9930e-05, 'epoch': 0.09, 'throughput': 9886.98} [INFO|2025-03-19 02:24:08] logging.py:143 >> {'loss': 0.8194, 'learning_rate': 4.9929e-05, 'epoch': 0.09, 'throughput': 9887.44} [INFO|2025-03-19 02:24:48] logging.py:143 >> {'loss': 0.7951, 'learning_rate': 4.9928e-05, 'epoch': 0.09, 'throughput': 9887.44} [INFO|2025-03-19 02:25:29] logging.py:143 >> {'loss': 0.8179, 'learning_rate': 4.9927e-05, 'epoch': 0.09, 'throughput': 9887.86} [INFO|2025-03-19 02:26:09] logging.py:143 >> {'loss': 0.8188, 'learning_rate': 4.9925e-05, 'epoch': 0.09, 'throughput': 9888.64} [INFO|2025-03-19 02:26:51] logging.py:143 >> {'loss': 0.8407, 'learning_rate': 4.9924e-05, 'epoch': 0.10, 'throughput': 9890.54} [INFO|2025-03-19 02:27:30] logging.py:143 >> {'loss': 0.7612, 'learning_rate': 4.9923e-05, 'epoch': 0.10, 'throughput': 9891.38} [INFO|2025-03-19 02:28:10] logging.py:143 >> {'loss': 0.7774, 'learning_rate': 4.9922e-05, 'epoch': 0.10, 'throughput': 9891.05} [INFO|2025-03-19 02:28:52] logging.py:143 >> {'loss': 0.8161, 'learning_rate': 4.9921e-05, 'epoch': 0.10, 'throughput': 9890.94} [INFO|2025-03-19 02:29:33] logging.py:143 >> {'loss': 0.7973, 'learning_rate': 4.9920e-05, 
'epoch': 0.10, 'throughput': 9892.72} [INFO|2025-03-19 02:30:13] logging.py:143 >> {'loss': 0.8169, 'learning_rate': 4.9919e-05, 'epoch': 0.10, 'throughput': 9893.97} [INFO|2025-03-19 02:30:55] logging.py:143 >> {'loss': 0.8000, 'learning_rate': 4.9918e-05, 'epoch': 0.10, 'throughput': 9893.40} [INFO|2025-03-19 02:31:36] logging.py:143 >> {'loss': 0.8217, 'learning_rate': 4.9917e-05, 'epoch': 0.10, 'throughput': 9893.63} [INFO|2025-03-19 02:32:15] logging.py:143 >> {'loss': 0.8281, 'learning_rate': 4.9915e-05, 'epoch': 0.10, 'throughput': 9894.81} [INFO|2025-03-19 02:32:55] logging.py:143 >> {'loss': 0.8145, 'learning_rate': 4.9914e-05, 'epoch': 0.10, 'throughput': 9895.38} [INFO|2025-03-19 02:33:36] logging.py:143 >> {'loss': 0.7955, 'learning_rate': 4.9913e-05, 'epoch': 0.10, 'throughput': 9895.85} [INFO|2025-03-19 02:34:18] logging.py:143 >> {'loss': 0.7833, 'learning_rate': 4.9912e-05, 'epoch': 0.10, 'throughput': 9894.43} [INFO|2025-03-19 02:34:59] logging.py:143 >> {'loss': 0.8008, 'learning_rate': 4.9911e-05, 'epoch': 0.10, 'throughput': 9894.96} [INFO|2025-03-19 02:35:38] logging.py:143 >> {'loss': 0.8488, 'learning_rate': 4.9910e-05, 'epoch': 0.10, 'throughput': 9895.76} [INFO|2025-03-19 02:36:19] logging.py:143 >> {'loss': 0.7952, 'learning_rate': 4.9908e-05, 'epoch': 0.10, 'throughput': 9895.04} [INFO|2025-03-19 02:37:01] logging.py:143 >> {'loss': 0.8035, 'learning_rate': 4.9907e-05, 'epoch': 0.10, 'throughput': 9897.09} [INFO|2025-03-19 02:37:40] logging.py:143 >> {'loss': 0.7752, 'learning_rate': 4.9906e-05, 'epoch': 0.10, 'throughput': 9897.32} [INFO|2025-03-19 02:38:20] logging.py:143 >> {'loss': 0.8430, 'learning_rate': 4.9905e-05, 'epoch': 0.10, 'throughput': 9898.16} [INFO|2025-03-19 02:39:02] logging.py:143 >> {'loss': 0.8244, 'learning_rate': 4.9904e-05, 'epoch': 0.10, 'throughput': 9897.21} [INFO|2025-03-19 02:39:42] logging.py:143 >> {'loss': 0.8288, 'learning_rate': 4.9902e-05, 'epoch': 0.11, 'throughput': 9898.62} [INFO|2025-03-19 02:40:23] 
logging.py:143 >> {'loss': 0.7685, 'learning_rate': 4.9901e-05, 'epoch': 0.11, 'throughput': 9897.47} [INFO|2025-03-19 02:41:03] logging.py:143 >> {'loss': 0.8303, 'learning_rate': 4.9900e-05, 'epoch': 0.11, 'throughput': 9898.32} [INFO|2025-03-19 02:41:44] logging.py:143 >> {'loss': 0.7918, 'learning_rate': 4.9899e-05, 'epoch': 0.11, 'throughput': 9897.96} [INFO|2025-03-19 02:42:24] logging.py:143 >> {'loss': 0.8125, 'learning_rate': 4.9897e-05, 'epoch': 0.11, 'throughput': 9898.94} [INFO|2025-03-19 02:43:04] logging.py:143 >> {'loss': 0.7751, 'learning_rate': 4.9896e-05, 'epoch': 0.11, 'throughput': 9898.63} [INFO|2025-03-19 02:43:43] logging.py:143 >> {'loss': 0.8265, 'learning_rate': 4.9895e-05, 'epoch': 0.11, 'throughput': 9900.65} [INFO|2025-03-19 02:44:24] logging.py:143 >> {'loss': 0.8047, 'learning_rate': 4.9893e-05, 'epoch': 0.11, 'throughput': 9901.04} [INFO|2025-03-19 02:45:05] logging.py:143 >> {'loss': 0.8138, 'learning_rate': 4.9892e-05, 'epoch': 0.11, 'throughput': 9901.82} [INFO|2025-03-19 02:45:47] logging.py:143 >> {'loss': 0.8068, 'learning_rate': 4.9891e-05, 'epoch': 0.11, 'throughput': 9900.90} [INFO|2025-03-19 02:46:28] logging.py:143 >> {'loss': 0.8132, 'learning_rate': 4.9890e-05, 'epoch': 0.11, 'throughput': 9903.10} [INFO|2025-03-19 02:47:08] logging.py:143 >> {'loss': 0.8043, 'learning_rate': 4.9888e-05, 'epoch': 0.11, 'throughput': 9904.67} [INFO|2025-03-19 02:47:48] logging.py:143 >> {'loss': 0.8137, 'learning_rate': 4.9887e-05, 'epoch': 0.11, 'throughput': 9905.56} [INFO|2025-03-19 02:48:28] logging.py:143 >> {'loss': 0.8339, 'learning_rate': 4.9886e-05, 'epoch': 0.11, 'throughput': 9906.21} [INFO|2025-03-19 02:49:07] logging.py:143 >> {'loss': 0.7938, 'learning_rate': 4.9884e-05, 'epoch': 0.11, 'throughput': 9907.80} [INFO|2025-03-19 02:49:50] logging.py:143 >> {'loss': 0.8062, 'learning_rate': 4.9883e-05, 'epoch': 0.11, 'throughput': 9906.49} [INFO|2025-03-19 02:50:30] logging.py:143 >> {'loss': 0.8097, 'learning_rate': 4.9882e-05, 
'epoch': 0.11, 'throughput': 9906.26} [INFO|2025-03-19 02:51:12] logging.py:143 >> {'loss': 0.8028, 'learning_rate': 4.9880e-05, 'epoch': 0.11, 'throughput': 9905.65} [INFO|2025-03-19 02:51:51] logging.py:143 >> {'loss': 0.8024, 'learning_rate': 4.9879e-05, 'epoch': 0.11, 'throughput': 9905.32} [INFO|2025-03-19 02:52:32] logging.py:143 >> {'loss': 0.8247, 'learning_rate': 4.9877e-05, 'epoch': 0.12, 'throughput': 9906.03} [INFO|2025-03-19 02:53:13] logging.py:143 >> {'loss': 0.7674, 'learning_rate': 4.9876e-05, 'epoch': 0.12, 'throughput': 9907.34} [INFO|2025-03-19 02:53:54] logging.py:143 >> {'loss': 0.8016, 'learning_rate': 4.9875e-05, 'epoch': 0.12, 'throughput': 9907.49} [INFO|2025-03-19 02:54:34] logging.py:143 >> {'loss': 0.7950, 'learning_rate': 4.9873e-05, 'epoch': 0.12, 'throughput': 9907.26} [INFO|2025-03-19 02:55:13] logging.py:143 >> {'loss': 0.8522, 'learning_rate': 4.9872e-05, 'epoch': 0.12, 'throughput': 9909.29} [INFO|2025-03-19 02:55:54] logging.py:143 >> {'loss': 0.7740, 'learning_rate': 4.9870e-05, 'epoch': 0.12, 'throughput': 9909.13} [INFO|2025-03-19 02:56:34] logging.py:143 >> {'loss': 0.7934, 'learning_rate': 4.9869e-05, 'epoch': 0.12, 'throughput': 9910.12} [INFO|2025-03-19 02:57:16] logging.py:143 >> {'loss': 0.8117, 'learning_rate': 4.9868e-05, 'epoch': 0.12, 'throughput': 9910.88} [INFO|2025-03-19 02:57:55] logging.py:143 >> {'loss': 0.8059, 'learning_rate': 4.9866e-05, 'epoch': 0.12, 'throughput': 9912.28} [INFO|2025-03-19 02:58:35] logging.py:143 >> {'loss': 0.8021, 'learning_rate': 4.9865e-05, 'epoch': 0.12, 'throughput': 9913.53} [INFO|2025-03-19 02:59:15] logging.py:143 >> {'loss': 0.8192, 'learning_rate': 4.9863e-05, 'epoch': 0.12, 'throughput': 9914.45} [INFO|2025-03-19 02:59:57] logging.py:143 >> {'loss': 0.8060, 'learning_rate': 4.9862e-05, 'epoch': 0.12, 'throughput': 9914.15} [INFO|2025-03-19 03:00:37] logging.py:143 >> {'loss': 0.8230, 'learning_rate': 4.9860e-05, 'epoch': 0.12, 'throughput': 9915.53} [INFO|2025-03-19 03:01:17] 
logging.py:143 >> {'loss': 0.7670, 'learning_rate': 4.9859e-05, 'epoch': 0.12, 'throughput': 9916.08} [INFO|2025-03-19 03:01:56] logging.py:143 >> {'loss': 0.8110, 'learning_rate': 4.9857e-05, 'epoch': 0.12, 'throughput': 9917.80} [INFO|2025-03-19 03:02:36] logging.py:143 >> {'loss': 0.7915, 'learning_rate': 4.9856e-05, 'epoch': 0.12, 'throughput': 9917.99} [INFO|2025-03-19 03:03:18] logging.py:143 >> {'loss': 0.7845, 'learning_rate': 4.9854e-05, 'epoch': 0.12, 'throughput': 9916.41} [INFO|2025-03-19 03:03:59] logging.py:143 >> {'loss': 0.7751, 'learning_rate': 4.9853e-05, 'epoch': 0.12, 'throughput': 9916.17} [INFO|2025-03-19 03:04:40] logging.py:143 >> {'loss': 0.8355, 'learning_rate': 4.9851e-05, 'epoch': 0.12, 'throughput': 9916.95} [INFO|2025-03-19 03:05:20] logging.py:143 >> {'loss': 0.7900, 'learning_rate': 4.9850e-05, 'epoch': 0.13, 'throughput': 9917.86} [INFO|2025-03-19 03:06:01] logging.py:143 >> {'loss': 0.7924, 'learning_rate': 4.9848e-05, 'epoch': 0.13, 'throughput': 9917.32} [INFO|2025-03-19 03:06:42] logging.py:143 >> {'loss': 0.8011, 'learning_rate': 4.9847e-05, 'epoch': 0.13, 'throughput': 9917.50} [INFO|2025-03-19 03:07:22] logging.py:143 >> {'loss': 0.8201, 'learning_rate': 4.9845e-05, 'epoch': 0.13, 'throughput': 9917.64} [INFO|2025-03-19 03:08:01] logging.py:143 >> {'loss': 0.7982, 'learning_rate': 4.9844e-05, 'epoch': 0.13, 'throughput': 9918.12} [INFO|2025-03-19 03:08:41] logging.py:143 >> {'loss': 0.7770, 'learning_rate': 4.9842e-05, 'epoch': 0.13, 'throughput': 9918.75} [INFO|2025-03-19 03:09:20] logging.py:143 >> {'loss': 0.8362, 'learning_rate': 4.9840e-05, 'epoch': 0.13, 'throughput': 9920.16} [INFO|2025-03-19 03:10:00] logging.py:143 >> {'loss': 0.7740, 'learning_rate': 4.9839e-05, 'epoch': 0.13, 'throughput': 9919.98} [INFO|2025-03-19 03:10:40] logging.py:143 >> {'loss': 0.8012, 'learning_rate': 4.9837e-05, 'epoch': 0.13, 'throughput': 9921.41} [INFO|2025-03-19 03:11:18] logging.py:143 >> {'loss': 0.7752, 'learning_rate': 4.9836e-05, 
'epoch': 0.13, 'throughput': 9921.82} [INFO|2025-03-19 03:11:58] logging.py:143 >> {'loss': 0.7924, 'learning_rate': 4.9834e-05, 'epoch': 0.13, 'throughput': 9921.61} [INFO|2025-03-19 03:12:39] logging.py:143 >> {'loss': 0.8042, 'learning_rate': 4.9832e-05, 'epoch': 0.13, 'throughput': 9921.03} [INFO|2025-03-19 03:13:19] logging.py:143 >> {'loss': 0.8004, 'learning_rate': 4.9831e-05, 'epoch': 0.13, 'throughput': 9921.45} [INFO|2025-03-19 03:14:01] logging.py:143 >> {'loss': 0.8123, 'learning_rate': 4.9829e-05, 'epoch': 0.13, 'throughput': 9920.78} [INFO|2025-03-19 03:14:41] logging.py:143 >> {'loss': 0.8221, 'learning_rate': 4.9827e-05, 'epoch': 0.13, 'throughput': 9921.07} [INFO|2025-03-19 03:15:22] logging.py:143 >> {'loss': 0.7378, 'learning_rate': 4.9826e-05, 'epoch': 0.13, 'throughput': 9920.91} [INFO|2025-03-19 03:16:01] logging.py:143 >> {'loss': 0.7706, 'learning_rate': 4.9824e-05, 'epoch': 0.13, 'throughput': 9921.73} [INFO|2025-03-19 03:16:41] logging.py:143 >> {'loss': 0.7940, 'learning_rate': 4.9823e-05, 'epoch': 0.13, 'throughput': 9923.38} [INFO|2025-03-19 03:17:21] logging.py:143 >> {'loss': 0.7781, 'learning_rate': 4.9821e-05, 'epoch': 0.13, 'throughput': 9924.79} [INFO|2025-03-19 03:18:03] logging.py:143 >> {'loss': 0.7953, 'learning_rate': 4.9819e-05, 'epoch': 0.14, 'throughput': 9924.99} [INFO|2025-03-19 03:18:43] logging.py:143 >> {'loss': 0.7869, 'learning_rate': 4.9818e-05, 'epoch': 0.14, 'throughput': 9924.84} [INFO|2025-03-19 03:19:23] logging.py:143 >> {'loss': 0.7930, 'learning_rate': 4.9816e-05, 'epoch': 0.14, 'throughput': 9925.39} [INFO|2025-03-19 03:20:03] logging.py:143 >> {'loss': 0.7727, 'learning_rate': 4.9814e-05, 'epoch': 0.14, 'throughput': 9926.09} [INFO|2025-03-19 03:20:42] logging.py:143 >> {'loss': 0.7747, 'learning_rate': 4.9812e-05, 'epoch': 0.14, 'throughput': 9927.12} [INFO|2025-03-19 03:21:23] logging.py:143 >> {'loss': 0.8148, 'learning_rate': 4.9811e-05, 'epoch': 0.14, 'throughput': 9927.85} [INFO|2025-03-19 03:22:02] 
logging.py:143 >> {'loss': 0.8015, 'learning_rate': 4.9809e-05, 'epoch': 0.14, 'throughput': 9928.95} [INFO|2025-03-19 03:22:43] logging.py:143 >> {'loss': 0.7558, 'learning_rate': 4.9807e-05, 'epoch': 0.14, 'throughput': 9928.55} [INFO|2025-03-19 03:23:24] logging.py:143 >> {'loss': 0.7499, 'learning_rate': 4.9805e-05, 'epoch': 0.14, 'throughput': 9927.68} [INFO|2025-03-19 03:24:04] logging.py:143 >> {'loss': 0.7967, 'learning_rate': 4.9804e-05, 'epoch': 0.14, 'throughput': 9927.93} [INFO|2025-03-19 03:24:46] logging.py:143 >> {'loss': 0.7805, 'learning_rate': 4.9802e-05, 'epoch': 0.14, 'throughput': 9927.26} [INFO|2025-03-19 03:25:25] logging.py:143 >> {'loss': 0.7740, 'learning_rate': 4.9800e-05, 'epoch': 0.14, 'throughput': 9928.98} [INFO|2025-03-19 03:26:06] logging.py:143 >> {'loss': 0.7644, 'learning_rate': 4.9798e-05, 'epoch': 0.14, 'throughput': 9929.00} [INFO|2025-03-19 03:26:46] logging.py:143 >> {'loss': 0.7842, 'learning_rate': 4.9797e-05, 'epoch': 0.14, 'throughput': 9928.70} [INFO|2025-03-19 03:27:27] logging.py:143 >> {'loss': 0.8054, 'learning_rate': 4.9795e-05, 'epoch': 0.14, 'throughput': 9929.34} [INFO|2025-03-19 03:28:07] logging.py:143 >> {'loss': 0.7885, 'learning_rate': 4.9793e-05, 'epoch': 0.14, 'throughput': 9929.84} [INFO|2025-03-19 03:28:47] logging.py:143 >> {'loss': 0.7982, 'learning_rate': 4.9791e-05, 'epoch': 0.14, 'throughput': 9929.90} [INFO|2025-03-19 03:29:27] logging.py:143 >> {'loss': 0.8051, 'learning_rate': 4.9790e-05, 'epoch': 0.14, 'throughput': 9929.87} [INFO|2025-03-19 03:30:06] logging.py:143 >> {'loss': 0.7617, 'learning_rate': 4.9788e-05, 'epoch': 0.14, 'throughput': 9930.90} [INFO|2025-03-19 03:30:48] logging.py:143 >> {'loss': 0.7614, 'learning_rate': 4.9786e-05, 'epoch': 0.15, 'throughput': 9930.22} [INFO|2025-03-19 03:31:28] logging.py:143 >> {'loss': 0.7919, 'learning_rate': 4.9784e-05, 'epoch': 0.15, 'throughput': 9931.55} [INFO|2025-03-19 03:32:08] logging.py:143 >> {'loss': 0.7598, 'learning_rate': 4.9782e-05, 
'epoch': 0.15, 'throughput': 9932.14} [INFO|2025-03-19 03:32:46] logging.py:143 >> {'loss': 0.7718, 'learning_rate': 4.9780e-05, 'epoch': 0.15, 'throughput': 9932.86} [INFO|2025-03-19 03:33:27] logging.py:143 >> {'loss': 0.7980, 'learning_rate': 4.9778e-05, 'epoch': 0.15, 'throughput': 9932.72} [INFO|2025-03-19 03:34:07] logging.py:143 >> {'loss': 0.7911, 'learning_rate': 4.9777e-05, 'epoch': 0.15, 'throughput': 9934.29} [INFO|2025-03-19 03:34:48] logging.py:143 >> {'loss': 0.8202, 'learning_rate': 4.9775e-05, 'epoch': 0.15, 'throughput': 9934.64} [INFO|2025-03-19 03:35:28] logging.py:143 >> {'loss': 0.7933, 'learning_rate': 4.9773e-05, 'epoch': 0.15, 'throughput': 9935.77} [INFO|2025-03-19 03:36:08] logging.py:143 >> {'loss': 0.7939, 'learning_rate': 4.9771e-05, 'epoch': 0.15, 'throughput': 9936.62} [INFO|2025-03-19 03:36:47] logging.py:143 >> {'loss': 0.7674, 'learning_rate': 4.9769e-05, 'epoch': 0.15, 'throughput': 9937.40} [INFO|2025-03-19 03:37:27] logging.py:143 >> {'loss': 0.7938, 'learning_rate': 4.9767e-05, 'epoch': 0.15, 'throughput': 9938.05} [INFO|2025-03-19 03:38:07] logging.py:143 >> {'loss': 0.7955, 'learning_rate': 4.9765e-05, 'epoch': 0.15, 'throughput': 9939.27} [INFO|2025-03-19 03:38:46] logging.py:143 >> {'loss': 0.7706, 'learning_rate': 4.9763e-05, 'epoch': 0.15, 'throughput': 9939.69} [INFO|2025-03-19 03:39:26] logging.py:143 >> {'loss': 0.8182, 'learning_rate': 4.9761e-05, 'epoch': 0.15, 'throughput': 9940.70} [INFO|2025-03-19 03:40:06] logging.py:143 >> {'loss': 0.8045, 'learning_rate': 4.9760e-05, 'epoch': 0.15, 'throughput': 9941.56} [INFO|2025-03-19 03:40:48] logging.py:143 >> {'loss': 0.8086, 'learning_rate': 4.9758e-05, 'epoch': 0.15, 'throughput': 9941.18} [INFO|2025-03-19 03:41:26] logging.py:143 >> {'loss': 0.7919, 'learning_rate': 4.9756e-05, 'epoch': 0.15, 'throughput': 9941.90} [INFO|2025-03-19 03:42:07] logging.py:143 >> {'loss': 0.7939, 'learning_rate': 4.9754e-05, 'epoch': 0.15, 'throughput': 9942.42} [INFO|2025-03-19 03:42:46] 
logging.py:143 >> {'loss': 0.7698, 'learning_rate': 4.9752e-05, 'epoch': 0.15, 'throughput': 9943.49} [INFO|2025-03-19 03:43:26] logging.py:143 >> {'loss': 0.8064, 'learning_rate': 4.9750e-05, 'epoch': 0.16, 'throughput': 9944.37} [INFO|2025-03-19 03:44:07] logging.py:143 >> {'loss': 0.7881, 'learning_rate': 4.9748e-05, 'epoch': 0.16, 'throughput': 9944.44} [INFO|2025-03-19 03:44:49] logging.py:143 >> {'loss': 0.7981, 'learning_rate': 4.9746e-05, 'epoch': 0.16, 'throughput': 9943.21} [INFO|2025-03-19 03:45:27] logging.py:143 >> {'loss': 0.7570, 'learning_rate': 4.9744e-05, 'epoch': 0.16, 'throughput': 9944.14} [INFO|2025-03-19 03:46:08] logging.py:143 >> {'loss': 0.7567, 'learning_rate': 4.9742e-05, 'epoch': 0.16, 'throughput': 9943.57} [INFO|2025-03-19 03:46:48] logging.py:143 >> {'loss': 0.7819, 'learning_rate': 4.9740e-05, 'epoch': 0.16, 'throughput': 9943.94} [INFO|2025-03-19 03:47:29] logging.py:143 >> {'loss': 0.7538, 'learning_rate': 4.9738e-05, 'epoch': 0.16, 'throughput': 9943.73} [INFO|2025-03-19 03:48:11] logging.py:143 >> {'loss': 0.7644, 'learning_rate': 4.9736e-05, 'epoch': 0.16, 'throughput': 9942.37} [INFO|2025-03-19 03:48:51] logging.py:143 >> {'loss': 0.7425, 'learning_rate': 4.9734e-05, 'epoch': 0.16, 'throughput': 9942.16} [INFO|2025-03-19 03:49:31] logging.py:143 >> {'loss': 0.7531, 'learning_rate': 4.9732e-05, 'epoch': 0.16, 'throughput': 9941.97} [INFO|2025-03-19 03:50:11] logging.py:143 >> {'loss': 0.8029, 'learning_rate': 4.9730e-05, 'epoch': 0.16, 'throughput': 9941.86} [INFO|2025-03-19 03:50:51] logging.py:143 >> {'loss': 0.7511, 'learning_rate': 4.9728e-05, 'epoch': 0.16, 'throughput': 9942.33} [INFO|2025-03-19 03:51:31] logging.py:143 >> {'loss': 0.7788, 'learning_rate': 4.9725e-05, 'epoch': 0.16, 'throughput': 9943.43} [INFO|2025-03-19 03:52:11] logging.py:143 >> {'loss': 0.7737, 'learning_rate': 4.9723e-05, 'epoch': 0.16, 'throughput': 9944.09} [INFO|2025-03-19 03:52:52] logging.py:143 >> {'loss': 0.7767, 'learning_rate': 4.9721e-05, 
'epoch': 0.16, 'throughput': 9944.45} [INFO|2025-03-19 03:53:31] logging.py:143 >> {'loss': 0.7526, 'learning_rate': 4.9719e-05, 'epoch': 0.16, 'throughput': 9945.21} [INFO|2025-03-19 03:54:12] logging.py:143 >> {'loss': 0.8107, 'learning_rate': 4.9717e-05, 'epoch': 0.16, 'throughput': 9945.98} [INFO|2025-03-19 03:54:51] logging.py:143 >> {'loss': 0.7684, 'learning_rate': 4.9715e-05, 'epoch': 0.16, 'throughput': 9946.35} [INFO|2025-03-19 03:55:30] logging.py:143 >> {'loss': 0.7798, 'learning_rate': 4.9713e-05, 'epoch': 0.17, 'throughput': 9946.47} [INFO|2025-03-19 03:56:13] logging.py:143 >> {'loss': 0.7830, 'learning_rate': 4.9711e-05, 'epoch': 0.17, 'throughput': 9945.52} [INFO|2025-03-19 03:56:53] logging.py:143 >> {'loss': 0.8025, 'learning_rate': 4.9709e-05, 'epoch': 0.17, 'throughput': 9946.28} [INFO|2025-03-19 03:57:32] logging.py:143 >> {'loss': 0.7783, 'learning_rate': 4.9707e-05, 'epoch': 0.17, 'throughput': 9946.85} [INFO|2025-03-19 03:58:14] logging.py:143 >> {'loss': 0.7723, 'learning_rate': 4.9704e-05, 'epoch': 0.17, 'throughput': 9945.95} [INFO|2025-03-19 03:58:54] logging.py:143 >> {'loss': 0.8070, 'learning_rate': 4.9702e-05, 'epoch': 0.17, 'throughput': 9945.67} [INFO|2025-03-19 03:59:35] logging.py:143 >> {'loss': 0.7859, 'learning_rate': 4.9700e-05, 'epoch': 0.17, 'throughput': 9944.86} [INFO|2025-03-19 04:00:15] logging.py:143 >> {'loss': 0.7866, 'learning_rate': 4.9698e-05, 'epoch': 0.17, 'throughput': 9944.58} [INFO|2025-03-19 04:00:56] logging.py:143 >> {'loss': 0.7997, 'learning_rate': 4.9696e-05, 'epoch': 0.17, 'throughput': 9944.55} [INFO|2025-03-19 04:01:36] logging.py:143 >> {'loss': 0.7718, 'learning_rate': 4.9694e-05, 'epoch': 0.17, 'throughput': 9944.56} [INFO|2025-03-19 04:02:16] logging.py:143 >> {'loss': 0.8029, 'learning_rate': 4.9691e-05, 'epoch': 0.17, 'throughput': 9944.46} [INFO|2025-03-19 04:02:56] logging.py:143 >> {'loss': 0.7916, 'learning_rate': 4.9689e-05, 'epoch': 0.17, 'throughput': 9944.71} [INFO|2025-03-19 04:03:36] 
logging.py:143 >> {'loss': 0.7937, 'learning_rate': 4.9687e-05, 'epoch': 0.17, 'throughput': 9944.76} [INFO|2025-03-19 04:04:16] logging.py:143 >> {'loss': 0.8057, 'learning_rate': 4.9685e-05, 'epoch': 0.17, 'throughput': 9945.92} [INFO|2025-03-19 04:04:55] logging.py:143 >> {'loss': 0.7626, 'learning_rate': 4.9683e-05, 'epoch': 0.17, 'throughput': 9946.45} [INFO|2025-03-19 04:05:35] logging.py:143 >> {'loss': 0.8059, 'learning_rate': 4.9680e-05, 'epoch': 0.17, 'throughput': 9946.87} [INFO|2025-03-19 04:06:15] logging.py:143 >> {'loss': 0.7890, 'learning_rate': 4.9678e-05, 'epoch': 0.17, 'throughput': 9947.13} [INFO|2025-03-19 04:06:58] logging.py:143 >> {'loss': 0.7588, 'learning_rate': 4.9676e-05, 'epoch': 0.17, 'throughput': 9946.41} [INFO|2025-03-19 04:07:37] logging.py:143 >> {'loss': 0.7891, 'learning_rate': 4.9674e-05, 'epoch': 0.17, 'throughput': 9947.59} [INFO|2025-03-19 04:08:17] logging.py:143 >> {'loss': 0.8460, 'learning_rate': 4.9671e-05, 'epoch': 0.18, 'throughput': 9948.33} [INFO|2025-03-19 04:08:57] logging.py:143 >> {'loss': 0.7853, 'learning_rate': 4.9669e-05, 'epoch': 0.18, 'throughput': 9949.08} [INFO|2025-03-19 04:09:38] logging.py:143 >> {'loss': 0.7737, 'learning_rate': 4.9667e-05, 'epoch': 0.18, 'throughput': 9948.72} [INFO|2025-03-19 04:10:19] logging.py:143 >> {'loss': 0.7746, 'learning_rate': 4.9665e-05, 'epoch': 0.18, 'throughput': 9949.62} [INFO|2025-03-19 04:10:59] logging.py:143 >> {'loss': 0.7704, 'learning_rate': 4.9662e-05, 'epoch': 0.18, 'throughput': 9948.90} [INFO|2025-03-19 04:11:40] logging.py:143 >> {'loss': 0.8028, 'learning_rate': 4.9660e-05, 'epoch': 0.18, 'throughput': 9948.95} [INFO|2025-03-19 04:12:21] logging.py:143 >> {'loss': 0.7752, 'learning_rate': 4.9658e-05, 'epoch': 0.18, 'throughput': 9948.52} [INFO|2025-03-19 04:13:02] logging.py:143 >> {'loss': 0.7984, 'learning_rate': 4.9655e-05, 'epoch': 0.18, 'throughput': 9948.83} [INFO|2025-03-19 04:13:41] logging.py:143 >> {'loss': 0.7829, 'learning_rate': 4.9653e-05, 
'epoch': 0.18, 'throughput': 9948.46} [INFO|2025-03-19 04:14:21] logging.py:143 >> {'loss': 0.7871, 'learning_rate': 4.9651e-05, 'epoch': 0.18, 'throughput': 9948.95} [INFO|2025-03-19 04:15:03] logging.py:143 >> {'loss': 0.7775, 'learning_rate': 4.9648e-05, 'epoch': 0.18, 'throughput': 9948.76} [INFO|2025-03-19 04:15:44] logging.py:143 >> {'loss': 0.7338, 'learning_rate': 4.9646e-05, 'epoch': 0.18, 'throughput': 9949.02} [INFO|2025-03-19 04:16:24] logging.py:143 >> {'loss': 0.7835, 'learning_rate': 4.9644e-05, 'epoch': 0.18, 'throughput': 9949.51} [INFO|2025-03-19 04:17:03] logging.py:143 >> {'loss': 0.7759, 'learning_rate': 4.9641e-05, 'epoch': 0.18, 'throughput': 9950.40} [INFO|2025-03-19 04:17:43] logging.py:143 >> {'loss': 0.7591, 'learning_rate': 4.9639e-05, 'epoch': 0.18, 'throughput': 9950.73} [INFO|2025-03-19 04:18:23] logging.py:143 >> {'loss': 0.7637, 'learning_rate': 4.9637e-05, 'epoch': 0.18, 'throughput': 9950.30} [INFO|2025-03-19 04:19:04] logging.py:143 >> {'loss': 0.7870, 'learning_rate': 4.9634e-05, 'epoch': 0.18, 'throughput': 9950.16} [INFO|2025-03-19 04:19:44] logging.py:143 >> {'loss': 0.7374, 'learning_rate': 4.9632e-05, 'epoch': 0.18, 'throughput': 9950.38} [INFO|2025-03-19 04:20:26] logging.py:143 >> {'loss': 0.7631, 'learning_rate': 4.9629e-05, 'epoch': 0.18, 'throughput': 9950.80} [INFO|2025-03-19 04:21:07] logging.py:143 >> {'loss': 0.7589, 'learning_rate': 4.9627e-05, 'epoch': 0.19, 'throughput': 9950.61} [INFO|2025-03-19 04:21:47] logging.py:143 >> {'loss': 0.7206, 'learning_rate': 4.9625e-05, 'epoch': 0.19, 'throughput': 9950.44} [INFO|2025-03-19 04:22:27] logging.py:143 >> {'loss': 0.7469, 'learning_rate': 4.9622e-05, 'epoch': 0.19, 'throughput': 9950.85} [INFO|2025-03-19 04:23:07] logging.py:143 >> {'loss': 0.7820, 'learning_rate': 4.9620e-05, 'epoch': 0.19, 'throughput': 9951.77} [INFO|2025-03-19 04:23:49] logging.py:143 >> {'loss': 0.7703, 'learning_rate': 4.9617e-05, 'epoch': 0.19, 'throughput': 9951.26} [INFO|2025-03-19 04:24:30] 
logging.py:143 >> {'loss': 0.7581, 'learning_rate': 4.9615e-05, 'epoch': 0.19, 'throughput': 9950.96} [INFO|2025-03-19 04:25:11] logging.py:143 >> {'loss': 0.7953, 'learning_rate': 4.9612e-05, 'epoch': 0.19, 'throughput': 9950.67} [INFO|2025-03-19 04:25:52] logging.py:143 >> {'loss': 0.7862, 'learning_rate': 4.9610e-05, 'epoch': 0.19, 'throughput': 9951.21} [INFO|2025-03-19 04:26:32] logging.py:143 >> {'loss': 0.7561, 'learning_rate': 4.9607e-05, 'epoch': 0.19, 'throughput': 9951.26} [INFO|2025-03-19 04:27:12] logging.py:143 >> {'loss': 0.7688, 'learning_rate': 4.9605e-05, 'epoch': 0.19, 'throughput': 9951.74} [INFO|2025-03-19 04:27:52] logging.py:143 >> {'loss': 0.7756, 'learning_rate': 4.9603e-05, 'epoch': 0.19, 'throughput': 9951.60} [INFO|2025-03-19 04:28:32] logging.py:143 >> {'loss': 0.7603, 'learning_rate': 4.9600e-05, 'epoch': 0.19, 'throughput': 9951.74} [INFO|2025-03-19 04:29:13] logging.py:143 >> {'loss': 0.7300, 'learning_rate': 4.9598e-05, 'epoch': 0.19, 'throughput': 9951.37} [INFO|2025-03-19 04:29:53] logging.py:143 >> {'loss': 0.7547, 'learning_rate': 4.9595e-05, 'epoch': 0.19, 'throughput': 9952.05} [INFO|2025-03-19 04:30:35] logging.py:143 >> {'loss': 0.7751, 'learning_rate': 4.9593e-05, 'epoch': 0.19, 'throughput': 9952.04} [INFO|2025-03-19 04:31:15] logging.py:143 >> {'loss': 0.7608, 'learning_rate': 4.9590e-05, 'epoch': 0.19, 'throughput': 9953.07} [INFO|2025-03-19 04:31:54] logging.py:143 >> {'loss': 0.7810, 'learning_rate': 4.9587e-05, 'epoch': 0.19, 'throughput': 9953.46} [INFO|2025-03-19 04:32:33] logging.py:143 >> {'loss': 0.7680, 'learning_rate': 4.9585e-05, 'epoch': 0.19, 'throughput': 9954.46} [INFO|2025-03-19 04:33:13] logging.py:143 >> {'loss': 0.7737, 'learning_rate': 4.9582e-05, 'epoch': 0.19, 'throughput': 9954.70} [INFO|2025-03-19 04:33:53] logging.py:143 >> {'loss': 0.7837, 'learning_rate': 4.9580e-05, 'epoch': 0.20, 'throughput': 9955.08} [INFO|2025-03-19 04:34:34] logging.py:143 >> {'loss': 0.7561, 'learning_rate': 4.9577e-05, 
'epoch': 0.20, 'throughput': 9954.59} [INFO|2025-03-19 04:35:14] logging.py:143 >> {'loss': 0.7469, 'learning_rate': 4.9575e-05, 'epoch': 0.20, 'throughput': 9955.20} [INFO|2025-03-19 04:35:54] logging.py:143 >> {'loss': 0.7500, 'learning_rate': 4.9572e-05, 'epoch': 0.20, 'throughput': 9955.61} [INFO|2025-03-19 04:36:34] logging.py:143 >> {'loss': 0.7566, 'learning_rate': 4.9570e-05, 'epoch': 0.20, 'throughput': 9955.53} [INFO|2025-03-19 04:37:15] logging.py:143 >> {'loss': 0.7843, 'learning_rate': 4.9567e-05, 'epoch': 0.20, 'throughput': 9955.98} [INFO|2025-03-19 04:37:56] logging.py:143 >> {'loss': 0.7468, 'learning_rate': 4.9564e-05, 'epoch': 0.20, 'throughput': 9955.51} [INFO|2025-03-19 04:38:36] logging.py:143 >> {'loss': 0.7768, 'learning_rate': 4.9562e-05, 'epoch': 0.20, 'throughput': 9955.70} [INFO|2025-03-19 04:39:18] logging.py:143 >> {'loss': 0.7989, 'learning_rate': 4.9559e-05, 'epoch': 0.20, 'throughput': 9955.41} [INFO|2025-03-19 04:39:58] logging.py:143 >> {'loss': 0.7680, 'learning_rate': 4.9557e-05, 'epoch': 0.20, 'throughput': 9955.63} [INFO|2025-03-19 04:40:38] logging.py:143 >> {'loss': 0.7704, 'learning_rate': 4.9554e-05, 'epoch': 0.20, 'throughput': 9956.73} [INFO|2025-03-19 04:41:19] logging.py:143 >> {'loss': 0.7893, 'learning_rate': 4.9551e-05, 'epoch': 0.20, 'throughput': 9956.46} [INFO|2025-03-19 04:41:58] logging.py:143 >> {'loss': 0.7458, 'learning_rate': 4.9549e-05, 'epoch': 0.20, 'throughput': 9956.78} [INFO|2025-03-19 04:42:38] logging.py:143 >> {'loss': 0.7357, 'learning_rate': 4.9546e-05, 'epoch': 0.20, 'throughput': 9957.55} [INFO|2025-03-19 04:43:20] logging.py:143 >> {'loss': 0.7766, 'learning_rate': 4.9543e-05, 'epoch': 0.20, 'throughput': 9957.29} [INFO|2025-03-19 04:44:00] logging.py:143 >> {'loss': 0.7274, 'learning_rate': 4.9541e-05, 'epoch': 0.20, 'throughput': 9957.27} [INFO|2025-03-19 04:44:41] logging.py:143 >> {'loss': 0.7572, 'learning_rate': 4.9538e-05, 'epoch': 0.20, 'throughput': 9957.46} [INFO|2025-03-19 04:45:21] 
logging.py:143 >> {'loss': 0.7379, 'learning_rate': 4.9535e-05, 'epoch': 0.20, 'throughput': 9957.37} [INFO|2025-03-19 04:46:03] logging.py:143 >> {'loss': 0.7913, 'learning_rate': 4.9533e-05, 'epoch': 0.20, 'throughput': 9957.16} [INFO|2025-03-19 04:46:44] logging.py:143 >> {'loss': 0.7376, 'learning_rate': 4.9530e-05, 'epoch': 0.21, 'throughput': 9956.89} [INFO|2025-03-19 04:47:24] logging.py:143 >> {'loss': 0.8050, 'learning_rate': 4.9527e-05, 'epoch': 0.21, 'throughput': 9957.81} [INFO|2025-03-19 04:48:04] logging.py:143 >> {'loss': 0.7355, 'learning_rate': 4.9524e-05, 'epoch': 0.21, 'throughput': 9957.94} [INFO|2025-03-19 04:48:43] logging.py:143 >> {'loss': 0.7438, 'learning_rate': 4.9522e-05, 'epoch': 0.21, 'throughput': 9957.80} [INFO|2025-03-19 04:49:24] logging.py:143 >> {'loss': 0.8121, 'learning_rate': 4.9519e-05, 'epoch': 0.21, 'throughput': 9958.00} [INFO|2025-03-19 04:50:05] logging.py:143 >> {'loss': 0.7230, 'learning_rate': 4.9516e-05, 'epoch': 0.21, 'throughput': 9958.07} [INFO|2025-03-19 04:50:43] logging.py:143 >> {'loss': 0.7841, 'learning_rate': 4.9514e-05, 'epoch': 0.21, 'throughput': 9958.81} [INFO|2025-03-19 04:51:23] logging.py:143 >> {'loss': 0.7551, 'learning_rate': 4.9511e-05, 'epoch': 0.21, 'throughput': 9959.23} [INFO|2025-03-19 04:52:05] logging.py:143 >> {'loss': 0.7995, 'learning_rate': 4.9508e-05, 'epoch': 0.21, 'throughput': 9959.25} [INFO|2025-03-19 04:52:45] logging.py:143 >> {'loss': 0.7944, 'learning_rate': 4.9505e-05, 'epoch': 0.21, 'throughput': 9959.52} [INFO|2025-03-19 04:53:25] logging.py:143 >> {'loss': 0.7658, 'learning_rate': 4.9503e-05, 'epoch': 0.21, 'throughput': 9959.88} [INFO|2025-03-19 04:54:04] logging.py:143 >> {'loss': 0.8040, 'learning_rate': 4.9500e-05, 'epoch': 0.21, 'throughput': 9960.63} [INFO|2025-03-19 04:54:44] logging.py:143 >> {'loss': 0.7956, 'learning_rate': 4.9497e-05, 'epoch': 0.21, 'throughput': 9960.35} [INFO|2025-03-19 04:55:24] logging.py:143 >> {'loss': 0.7920, 'learning_rate': 4.9494e-05, 
'epoch': 0.21, 'throughput': 9961.05} [INFO|2025-03-19 04:56:05] logging.py:143 >> {'loss': 0.7656, 'learning_rate': 4.9491e-05, 'epoch': 0.21, 'throughput': 9960.83} [INFO|2025-03-19 04:56:47] logging.py:143 >> {'loss': 0.7452, 'learning_rate': 4.9489e-05, 'epoch': 0.21, 'throughput': 9960.81} [INFO|2025-03-19 04:57:28] logging.py:143 >> {'loss': 0.7722, 'learning_rate': 4.9486e-05, 'epoch': 0.21, 'throughput': 9960.59} [INFO|2025-03-19 04:58:08] logging.py:143 >> {'loss': 0.7844, 'learning_rate': 4.9483e-05, 'epoch': 0.21, 'throughput': 9961.32} [INFO|2025-03-19 04:58:49] logging.py:143 >> {'loss': 0.8173, 'learning_rate': 4.9480e-05, 'epoch': 0.21, 'throughput': 9961.30} [INFO|2025-03-19 04:59:29] logging.py:143 >> {'loss': 0.7476, 'learning_rate': 4.9477e-05, 'epoch': 0.22, 'throughput': 9961.58} [INFO|2025-03-19 05:00:12] logging.py:143 >> {'loss': 0.7768, 'learning_rate': 4.9474e-05, 'epoch': 0.22, 'throughput': 9961.50} [INFO|2025-03-19 05:00:53] logging.py:143 >> {'loss': 0.7529, 'learning_rate': 4.9472e-05, 'epoch': 0.22, 'throughput': 9961.64} [INFO|2025-03-19 05:01:32] logging.py:143 >> {'loss': 0.7337, 'learning_rate': 4.9469e-05, 'epoch': 0.22, 'throughput': 9961.59} [INFO|2025-03-19 05:02:14] logging.py:143 >> {'loss': 0.7461, 'learning_rate': 4.9466e-05, 'epoch': 0.22, 'throughput': 9962.16} [INFO|2025-03-19 05:02:54] logging.py:143 >> {'loss': 0.8059, 'learning_rate': 4.9463e-05, 'epoch': 0.22, 'throughput': 9963.30} [INFO|2025-03-19 05:03:34] logging.py:143 >> {'loss': 0.7669, 'learning_rate': 4.9460e-05, 'epoch': 0.22, 'throughput': 9964.12} [INFO|2025-03-19 05:04:14] logging.py:143 >> {'loss': 0.7844, 'learning_rate': 4.9457e-05, 'epoch': 0.22, 'throughput': 9964.77} [INFO|2025-03-19 05:04:55] logging.py:143 >> {'loss': 0.7515, 'learning_rate': 4.9454e-05, 'epoch': 0.22, 'throughput': 9965.41} [INFO|2025-03-19 05:05:37] logging.py:143 >> {'loss': 0.7494, 'learning_rate': 4.9451e-05, 'epoch': 0.22, 'throughput': 9965.05} [INFO|2025-03-19 05:06:17] 
logging.py:143 >> {'loss': 0.7760, 'learning_rate': 4.9448e-05, 'epoch': 0.22, 'throughput': 9965.27} [INFO|2025-03-19 05:06:57] logging.py:143 >> {'loss': 0.7570, 'learning_rate': 4.9445e-05, 'epoch': 0.22, 'throughput': 9965.62} [INFO|2025-03-19 05:07:37] logging.py:143 >> {'loss': 0.7929, 'learning_rate': 4.9443e-05, 'epoch': 0.22, 'throughput': 9965.76} [INFO|2025-03-19 05:08:18] logging.py:143 >> {'loss': 0.8036, 'learning_rate': 4.9440e-05, 'epoch': 0.22, 'throughput': 9965.63} [INFO|2025-03-19 05:08:58] logging.py:143 >> {'loss': 0.7675, 'learning_rate': 4.9437e-05, 'epoch': 0.22, 'throughput': 9965.98} [INFO|2025-03-19 05:09:39] logging.py:143 >> {'loss': 0.7402, 'learning_rate': 4.9434e-05, 'epoch': 0.22, 'throughput': 9965.97} [INFO|2025-03-19 05:10:20] logging.py:143 >> {'loss': 0.7679, 'learning_rate': 4.9431e-05, 'epoch': 0.22, 'throughput': 9965.97} [INFO|2025-03-19 05:11:02] logging.py:143 >> {'loss': 0.7384, 'learning_rate': 4.9428e-05, 'epoch': 0.22, 'throughput': 9965.29} [INFO|2025-03-19 05:11:42] logging.py:143 >> {'loss': 0.7803, 'learning_rate': 4.9425e-05, 'epoch': 0.23, 'throughput': 9965.96} [INFO|2025-03-19 05:12:22] logging.py:143 >> {'loss': 0.7923, 'learning_rate': 4.9422e-05, 'epoch': 0.23, 'throughput': 9966.32} [INFO|2025-03-19 05:13:02] logging.py:143 >> {'loss': 0.7502, 'learning_rate': 4.9419e-05, 'epoch': 0.23, 'throughput': 9966.84} [INFO|2025-03-19 05:13:42] logging.py:143 >> {'loss': 0.7800, 'learning_rate': 4.9416e-05, 'epoch': 0.23, 'throughput': 9966.74} [INFO|2025-03-19 05:14:23] logging.py:143 >> {'loss': 0.8046, 'learning_rate': 4.9413e-05, 'epoch': 0.23, 'throughput': 9967.06} [INFO|2025-03-19 05:15:03] logging.py:143 >> {'loss': 0.7565, 'learning_rate': 4.9410e-05, 'epoch': 0.23, 'throughput': 9966.95} [INFO|2025-03-19 05:15:43] logging.py:143 >> {'loss': 0.7603, 'learning_rate': 4.9407e-05, 'epoch': 0.23, 'throughput': 9967.56} [INFO|2025-03-19 05:16:23] logging.py:143 >> {'loss': 0.7582, 'learning_rate': 4.9404e-05, 
'epoch': 0.23, 'throughput': 9968.45} [INFO|2025-03-19 05:17:04] logging.py:143 >> {'loss': 0.7440, 'learning_rate': 4.9401e-05, 'epoch': 0.23, 'throughput': 9968.92} [INFO|2025-03-19 05:17:44] logging.py:143 >> {'loss': 0.7661, 'learning_rate': 4.9398e-05, 'epoch': 0.23, 'throughput': 9968.87} [INFO|2025-03-19 05:18:25] logging.py:143 >> {'loss': 0.7380, 'learning_rate': 4.9395e-05, 'epoch': 0.23, 'throughput': 9968.50} [INFO|2025-03-19 05:19:06] logging.py:143 >> {'loss': 0.7723, 'learning_rate': 4.9391e-05, 'epoch': 0.23, 'throughput': 9968.61} [INFO|2025-03-19 05:19:46] logging.py:143 >> {'loss': 0.7688, 'learning_rate': 4.9388e-05, 'epoch': 0.23, 'throughput': 9968.74} [INFO|2025-03-19 05:20:27] logging.py:143 >> {'loss': 0.7559, 'learning_rate': 4.9385e-05, 'epoch': 0.23, 'throughput': 9968.81} [INFO|2025-03-19 05:21:08] logging.py:143 >> {'loss': 0.7684, 'learning_rate': 4.9382e-05, 'epoch': 0.23, 'throughput': 9968.65} [INFO|2025-03-19 05:21:48] logging.py:143 >> {'loss': 0.7654, 'learning_rate': 4.9379e-05, 'epoch': 0.23, 'throughput': 9968.53} [INFO|2025-03-19 05:22:29] logging.py:143 >> {'loss': 0.7541, 'learning_rate': 4.9376e-05, 'epoch': 0.23, 'throughput': 9968.33} [INFO|2025-03-19 05:23:09] logging.py:143 >> {'loss': 0.7662, 'learning_rate': 4.9373e-05, 'epoch': 0.23, 'throughput': 9968.71} [INFO|2025-03-19 05:23:50] logging.py:143 >> {'loss': 0.7554, 'learning_rate': 4.9370e-05, 'epoch': 0.23, 'throughput': 9968.80} [INFO|2025-03-19 05:24:30] logging.py:143 >> {'loss': 0.7557, 'learning_rate': 4.9367e-05, 'epoch': 0.24, 'throughput': 9969.18} [INFO|2025-03-19 05:25:11] logging.py:143 >> {'loss': 0.7288, 'learning_rate': 4.9364e-05, 'epoch': 0.24, 'throughput': 9968.92} [INFO|2025-03-19 05:25:51] logging.py:143 >> {'loss': 0.7781, 'learning_rate': 4.9360e-05, 'epoch': 0.24, 'throughput': 9968.99} [INFO|2025-03-19 05:26:32] logging.py:143 >> {'loss': 0.7925, 'learning_rate': 4.9357e-05, 'epoch': 0.24, 'throughput': 9969.19} [INFO|2025-03-19 05:27:11] 
logging.py:143 >> {'loss': 0.7886, 'learning_rate': 4.9354e-05, 'epoch': 0.24, 'throughput': 9970.05} [INFO|2025-03-19 05:27:50] logging.py:143 >> {'loss': 0.7696, 'learning_rate': 4.9351e-05, 'epoch': 0.24, 'throughput': 9970.44} [INFO|2025-03-19 05:28:30] logging.py:143 >> {'loss': 0.7463, 'learning_rate': 4.9348e-05, 'epoch': 0.24, 'throughput': 9970.16} [INFO|2025-03-19 05:29:10] logging.py:143 >> {'loss': 0.7632, 'learning_rate': 4.9345e-05, 'epoch': 0.24, 'throughput': 9970.30} [INFO|2025-03-19 05:29:51] logging.py:143 >> {'loss': 0.7506, 'learning_rate': 4.9341e-05, 'epoch': 0.24, 'throughput': 9969.80} [INFO|2025-03-19 05:30:31] logging.py:143 >> {'loss': 0.7576, 'learning_rate': 4.9338e-05, 'epoch': 0.24, 'throughput': 9970.01} [INFO|2025-03-19 05:31:09] logging.py:143 >> {'loss': 0.7151, 'learning_rate': 4.9335e-05, 'epoch': 0.24, 'throughput': 9970.97} [INFO|2025-03-19 05:31:50] logging.py:143 >> {'loss': 0.7668, 'learning_rate': 4.9332e-05, 'epoch': 0.24, 'throughput': 9970.82} [INFO|2025-03-19 05:32:30] logging.py:143 >> {'loss': 0.7950, 'learning_rate': 4.9329e-05, 'epoch': 0.24, 'throughput': 9971.31} [INFO|2025-03-19 05:33:10] logging.py:143 >> {'loss': 0.7513, 'learning_rate': 4.9325e-05, 'epoch': 0.24, 'throughput': 9971.44} [INFO|2025-03-19 05:33:50] logging.py:143 >> {'loss': 0.7522, 'learning_rate': 4.9322e-05, 'epoch': 0.24, 'throughput': 9971.91} [INFO|2025-03-19 05:34:29] logging.py:143 >> {'loss': 0.7568, 'learning_rate': 4.9319e-05, 'epoch': 0.24, 'throughput': 9972.73} [INFO|2025-03-19 05:35:08] logging.py:143 >> {'loss': 0.7767, 'learning_rate': 4.9316e-05, 'epoch': 0.24, 'throughput': 9972.93} [INFO|2025-03-19 05:35:48] logging.py:143 >> {'loss': 0.7443, 'learning_rate': 4.9312e-05, 'epoch': 0.24, 'throughput': 9972.36} [INFO|2025-03-19 05:36:30] logging.py:143 >> {'loss': 0.7222, 'learning_rate': 4.9309e-05, 'epoch': 0.24, 'throughput': 9971.72} [INFO|2025-03-19 05:37:11] logging.py:143 >> {'loss': 0.7521, 'learning_rate': 4.9306e-05, 
'epoch': 0.25, 'throughput': 9971.56} [INFO|2025-03-19 05:37:51] logging.py:143 >> {'loss': 0.7241, 'learning_rate': 4.9303e-05, 'epoch': 0.25, 'throughput': 9971.77} [INFO|2025-03-19 05:38:30] logging.py:143 >> {'loss': 0.7520, 'learning_rate': 4.9299e-05, 'epoch': 0.25, 'throughput': 9972.25} [INFO|2025-03-19 05:39:11] logging.py:143 >> {'loss': 0.7554, 'learning_rate': 4.9296e-05, 'epoch': 0.25, 'throughput': 9972.33} [INFO|2025-03-19 05:39:50] logging.py:143 >> {'loss': 0.7570, 'learning_rate': 4.9293e-05, 'epoch': 0.25, 'throughput': 9972.68} [INFO|2025-03-19 05:40:30] logging.py:143 >> {'loss': 0.8125, 'learning_rate': 4.9289e-05, 'epoch': 0.25, 'throughput': 9972.64} [INFO|2025-03-19 05:41:12] logging.py:143 >> {'loss': 0.7241, 'learning_rate': 4.9286e-05, 'epoch': 0.25, 'throughput': 9972.77} [INFO|2025-03-19 05:41:52] logging.py:143 >> {'loss': 0.7346, 'learning_rate': 4.9283e-05, 'epoch': 0.25, 'throughput': 9972.76} [INFO|2025-03-19 05:42:34] logging.py:143 >> {'loss': 0.7595, 'learning_rate': 4.9279e-05, 'epoch': 0.25, 'throughput': 9972.90} [INFO|2025-03-19 05:43:14] logging.py:143 >> {'loss': 0.7873, 'learning_rate': 4.9276e-05, 'epoch': 0.25, 'throughput': 9973.69} [INFO|2025-03-19 05:43:55] logging.py:143 >> {'loss': 0.7230, 'learning_rate': 4.9273e-05, 'epoch': 0.25, 'throughput': 9973.00} [INFO|2025-03-19 05:44:34] logging.py:143 >> {'loss': 0.7646, 'learning_rate': 4.9269e-05, 'epoch': 0.25, 'throughput': 9973.29} [INFO|2025-03-19 05:45:14] logging.py:143 >> {'loss': 0.7143, 'learning_rate': 4.9266e-05, 'epoch': 0.25, 'throughput': 9973.22} [INFO|2025-03-19 05:45:53] logging.py:143 >> {'loss': 0.7787, 'learning_rate': 4.9263e-05, 'epoch': 0.25, 'throughput': 9974.16} [INFO|2025-03-19 05:46:33] logging.py:143 >> {'loss': 0.7815, 'learning_rate': 4.9259e-05, 'epoch': 0.25, 'throughput': 9974.37} [INFO|2025-03-19 05:47:14] logging.py:143 >> {'loss': 0.7424, 'learning_rate': 4.9256e-05, 'epoch': 0.25, 'throughput': 9973.72} [INFO|2025-03-19 05:47:55] 
logging.py:143 >> {'loss': 0.7656, 'learning_rate': 4.9252e-05, 'epoch': 0.25, 'throughput': 9973.65} [INFO|2025-03-19 05:48:35] logging.py:143 >> {'loss': 0.7983, 'learning_rate': 4.9249e-05, 'epoch': 0.25, 'throughput': 9973.90} [INFO|2025-03-19 05:49:17] logging.py:143 >> {'loss': 0.7581, 'learning_rate': 4.9246e-05, 'epoch': 0.25, 'throughput': 9973.49} [INFO|2025-03-19 05:49:56] logging.py:143 >> {'loss': 0.7683, 'learning_rate': 4.9242e-05, 'epoch': 0.26, 'throughput': 9974.34} [INFO|2025-03-19 05:50:35] logging.py:143 >> {'loss': 0.7648, 'learning_rate': 4.9239e-05, 'epoch': 0.26, 'throughput': 9975.33} [INFO|2025-03-19 05:51:15] logging.py:143 >> {'loss': 0.7524, 'learning_rate': 4.9235e-05, 'epoch': 0.26, 'throughput': 9975.21} [INFO|2025-03-19 05:51:56] logging.py:143 >> {'loss': 0.7702, 'learning_rate': 4.9232e-05, 'epoch': 0.26, 'throughput': 9975.73} [INFO|2025-03-19 05:52:36] logging.py:143 >> {'loss': 0.7691, 'learning_rate': 4.9228e-05, 'epoch': 0.26, 'throughput': 9976.03} [INFO|2025-03-19 05:53:16] logging.py:143 >> {'loss': 0.7382, 'learning_rate': 4.9225e-05, 'epoch': 0.26, 'throughput': 9976.58} [INFO|2025-03-19 05:53:55] logging.py:143 >> {'loss': 0.7509, 'learning_rate': 4.9222e-05, 'epoch': 0.26, 'throughput': 9977.28} [INFO|2025-03-19 05:54:35] logging.py:143 >> {'loss': 0.7352, 'learning_rate': 4.9218e-05, 'epoch': 0.26, 'throughput': 9976.80} [INFO|2025-03-19 05:55:14] logging.py:143 >> {'loss': 0.7763, 'learning_rate': 4.9215e-05, 'epoch': 0.26, 'throughput': 9977.30} [INFO|2025-03-19 05:55:56] logging.py:143 >> {'loss': 0.7399, 'learning_rate': 4.9211e-05, 'epoch': 0.26, 'throughput': 9976.53} [INFO|2025-03-19 05:56:36] logging.py:143 >> {'loss': 0.7196, 'learning_rate': 4.9208e-05, 'epoch': 0.26, 'throughput': 9977.03} [INFO|2025-03-19 05:57:16] logging.py:143 >> {'loss': 0.7617, 'learning_rate': 4.9204e-05, 'epoch': 0.26, 'throughput': 9977.18} [INFO|2025-03-19 05:57:57] logging.py:143 >> {'loss': 0.7685, 'learning_rate': 4.9201e-05, 
'epoch': 0.26, 'throughput': 9977.79} [INFO|2025-03-19 05:58:36] logging.py:143 >> {'loss': 0.7507, 'learning_rate': 4.9197e-05, 'epoch': 0.26, 'throughput': 9977.95} [INFO|2025-03-19 05:59:17] logging.py:143 >> {'loss': 0.7691, 'learning_rate': 4.9194e-05, 'epoch': 0.26, 'throughput': 9978.59} [INFO|2025-03-19 05:59:57] logging.py:143 >> {'loss': 0.7544, 'learning_rate': 4.9190e-05, 'epoch': 0.26, 'throughput': 9978.56} [INFO|2025-03-19 06:00:37] logging.py:143 >> {'loss': 0.7545, 'learning_rate': 4.9187e-05, 'epoch': 0.26, 'throughput': 9978.66} [INFO|2025-03-19 06:01:18] logging.py:143 >> {'loss': 0.7489, 'learning_rate': 4.9183e-05, 'epoch': 0.26, 'throughput': 9978.78} [INFO|2025-03-19 06:01:56] logging.py:143 >> {'loss': 0.7790, 'learning_rate': 4.9179e-05, 'epoch': 0.26, 'throughput': 9979.73} [INFO|2025-03-19 06:02:36] logging.py:143 >> {'loss': 0.7420, 'learning_rate': 4.9176e-05, 'epoch': 0.27, 'throughput': 9979.71} [INFO|2025-03-19 06:03:18] logging.py:143 >> {'loss': 0.7323, 'learning_rate': 4.9172e-05, 'epoch': 0.27, 'throughput': 9979.60} [INFO|2025-03-19 06:03:59] logging.py:143 >> {'loss': 0.7522, 'learning_rate': 4.9169e-05, 'epoch': 0.27, 'throughput': 9979.60} [INFO|2025-03-19 06:04:39] logging.py:143 >> {'loss': 0.7715, 'learning_rate': 4.9165e-05, 'epoch': 0.27, 'throughput': 9979.40} [INFO|2025-03-19 06:05:19] logging.py:143 >> {'loss': 0.7390, 'learning_rate': 4.9162e-05, 'epoch': 0.27, 'throughput': 9979.57} [INFO|2025-03-19 06:06:00] logging.py:143 >> {'loss': 0.7793, 'learning_rate': 4.9158e-05, 'epoch': 0.27, 'throughput': 9979.95} [INFO|2025-03-19 06:06:41] logging.py:143 >> {'loss': 0.7217, 'learning_rate': 4.9154e-05, 'epoch': 0.27, 'throughput': 9980.12} [INFO|2025-03-19 06:07:21] logging.py:143 >> {'loss': 0.7677, 'learning_rate': 4.9151e-05, 'epoch': 0.27, 'throughput': 9980.46} [INFO|2025-03-19 06:08:00] logging.py:143 >> {'loss': 0.7451, 'learning_rate': 4.9147e-05, 'epoch': 0.27, 'throughput': 9980.91} [INFO|2025-03-19 06:08:41] 
logging.py:143 >> {'loss': 0.7505, 'learning_rate': 4.9143e-05, 'epoch': 0.27, 'throughput': 9980.51} [INFO|2025-03-19 06:09:22] logging.py:143 >> {'loss': 0.7583, 'learning_rate': 4.9140e-05, 'epoch': 0.27, 'throughput': 9980.94} [INFO|2025-03-19 06:10:02] logging.py:143 >> {'loss': 0.7572, 'learning_rate': 4.9136e-05, 'epoch': 0.27, 'throughput': 9981.09} [INFO|2025-03-19 06:10:42] logging.py:143 >> {'loss': 0.7450, 'learning_rate': 4.9133e-05, 'epoch': 0.27, 'throughput': 9981.25} [INFO|2025-03-19 06:11:21] logging.py:143 >> {'loss': 0.7780, 'learning_rate': 4.9129e-05, 'epoch': 0.27, 'throughput': 9981.65} [INFO|2025-03-19 06:12:01] logging.py:143 >> {'loss': 0.7358, 'learning_rate': 4.9125e-05, 'epoch': 0.27, 'throughput': 9981.42} [INFO|2025-03-19 06:12:41] logging.py:143 >> {'loss': 0.7602, 'learning_rate': 4.9122e-05, 'epoch': 0.27, 'throughput': 9981.56} [INFO|2025-03-19 06:13:22] logging.py:143 >> {'loss': 0.7479, 'learning_rate': 4.9118e-05, 'epoch': 0.27, 'throughput': 9981.31} [INFO|2025-03-19 06:14:02] logging.py:143 >> {'loss': 0.7223, 'learning_rate': 4.9114e-05, 'epoch': 0.27, 'throughput': 9981.16} [INFO|2025-03-19 06:14:42] logging.py:143 >> {'loss': 0.7069, 'learning_rate': 4.9111e-05, 'epoch': 0.27, 'throughput': 9981.17} [INFO|2025-03-19 06:15:22] logging.py:143 >> {'loss': 0.7298, 'learning_rate': 4.9107e-05, 'epoch': 0.28, 'throughput': 9980.93} [INFO|2025-03-19 06:16:02] logging.py:143 >> {'loss': 0.7327, 'learning_rate': 4.9103e-05, 'epoch': 0.28, 'throughput': 9981.25} [INFO|2025-03-19 06:16:41] logging.py:143 >> {'loss': 0.7781, 'learning_rate': 4.9099e-05, 'epoch': 0.28, 'throughput': 9982.03} [INFO|2025-03-19 06:17:23] logging.py:143 >> {'loss': 0.7498, 'learning_rate': 4.9096e-05, 'epoch': 0.28, 'throughput': 9982.03} [INFO|2025-03-19 06:18:03] logging.py:143 >> {'loss': 0.7526, 'learning_rate': 4.9092e-05, 'epoch': 0.28, 'throughput': 9983.33} [INFO|2025-03-19 06:18:42] logging.py:143 >> {'loss': 0.7666, 'learning_rate': 4.9088e-05, 
'epoch': 0.28, 'throughput': 9983.75} [INFO|2025-03-19 06:19:24] logging.py:143 >> {'loss': 0.7501, 'learning_rate': 4.9084e-05, 'epoch': 0.28, 'throughput': 9982.90} [INFO|2025-03-19 06:20:05] logging.py:143 >> {'loss': 0.7184, 'learning_rate': 4.9081e-05, 'epoch': 0.28, 'throughput': 9982.92} [INFO|2025-03-19 06:20:45] logging.py:143 >> {'loss': 0.7495, 'learning_rate': 4.9077e-05, 'epoch': 0.28, 'throughput': 9982.86} [INFO|2025-03-19 06:21:25] logging.py:143 >> {'loss': 0.7782, 'learning_rate': 4.9073e-05, 'epoch': 0.28, 'throughput': 9983.13} [INFO|2025-03-19 06:22:06] logging.py:143 >> {'loss': 0.7250, 'learning_rate': 4.9069e-05, 'epoch': 0.28, 'throughput': 9983.28} [INFO|2025-03-19 06:22:47] logging.py:143 >> {'loss': 0.7634, 'learning_rate': 4.9066e-05, 'epoch': 0.28, 'throughput': 9983.52} [INFO|2025-03-19 06:23:27] logging.py:143 >> {'loss': 0.7575, 'learning_rate': 4.9062e-05, 'epoch': 0.28, 'throughput': 9983.77} [INFO|2025-03-19 06:24:06] logging.py:143 >> {'loss': 0.7597, 'learning_rate': 4.9058e-05, 'epoch': 0.28, 'throughput': 9983.90} [INFO|2025-03-19 06:24:47] logging.py:143 >> {'loss': 0.7735, 'learning_rate': 4.9054e-05, 'epoch': 0.28, 'throughput': 9984.31} [INFO|2025-03-19 06:25:29] logging.py:143 >> {'loss': 0.7774, 'learning_rate': 4.9050e-05, 'epoch': 0.28, 'throughput': 9984.14} [INFO|2025-03-19 06:26:09] logging.py:143 >> {'loss': 0.7391, 'learning_rate': 4.9047e-05, 'epoch': 0.28, 'throughput': 9983.83} [INFO|2025-03-19 06:26:51] logging.py:143 >> {'loss': 0.7284, 'learning_rate': 4.9043e-05, 'epoch': 0.28, 'throughput': 9983.32} [INFO|2025-03-19 06:27:32] logging.py:143 >> {'loss': 0.7659, 'learning_rate': 4.9039e-05, 'epoch': 0.29, 'throughput': 9983.22} [INFO|2025-03-19 06:28:13] logging.py:143 >> {'loss': 0.7695, 'learning_rate': 4.9035e-05, 'epoch': 0.29, 'throughput': 9983.30} [INFO|2025-03-19 06:28:54] logging.py:143 >> {'loss': 0.7334, 'learning_rate': 4.9031e-05, 'epoch': 0.29, 'throughput': 9983.31} [INFO|2025-03-19 06:29:35] 
logging.py:143 >> {'loss': 0.7191, 'learning_rate': 4.9027e-05, 'epoch': 0.29, 'throughput': 9983.23} [INFO|2025-03-19 06:30:14] logging.py:143 >> {'loss': 0.7549, 'learning_rate': 4.9023e-05, 'epoch': 0.29, 'throughput': 9983.12} [INFO|2025-03-19 06:30:56] logging.py:143 >> {'loss': 0.7553, 'learning_rate': 4.9020e-05, 'epoch': 0.29, 'throughput': 9983.26} [INFO|2025-03-19 06:31:36] logging.py:143 >> {'loss': 0.7179, 'learning_rate': 4.9016e-05, 'epoch': 0.29, 'throughput': 9983.01} [INFO|2025-03-19 06:32:17] logging.py:143 >> {'loss': 0.7338, 'learning_rate': 4.9012e-05, 'epoch': 0.29, 'throughput': 9982.92} [INFO|2025-03-19 06:32:56] logging.py:143 >> {'loss': 0.7555, 'learning_rate': 4.9008e-05, 'epoch': 0.29, 'throughput': 9983.26} [INFO|2025-03-19 06:33:38] logging.py:143 >> {'loss': 0.7678, 'learning_rate': 4.9004e-05, 'epoch': 0.29, 'throughput': 9982.75} [INFO|2025-03-19 06:34:18] logging.py:143 >> {'loss': 0.7517, 'learning_rate': 4.9000e-05, 'epoch': 0.29, 'throughput': 9983.29} [INFO|2025-03-19 06:34:58] logging.py:143 >> {'loss': 0.7481, 'learning_rate': 4.8996e-05, 'epoch': 0.29, 'throughput': 9982.84} [INFO|2025-03-19 06:35:37] logging.py:143 >> {'loss': 0.7134, 'learning_rate': 4.8992e-05, 'epoch': 0.29, 'throughput': 9983.34} [INFO|2025-03-19 06:36:17] logging.py:143 >> {'loss': 0.7502, 'learning_rate': 4.8988e-05, 'epoch': 0.29, 'throughput': 9982.98} [INFO|2025-03-19 06:36:59] logging.py:143 >> {'loss': 0.7333, 'learning_rate': 4.8984e-05, 'epoch': 0.29, 'throughput': 9982.55} [INFO|2025-03-19 06:37:40] logging.py:143 >> {'loss': 0.7791, 'learning_rate': 4.8980e-05, 'epoch': 0.29, 'throughput': 9982.54} [INFO|2025-03-19 06:38:20] logging.py:143 >> {'loss': 0.7560, 'learning_rate': 4.8976e-05, 'epoch': 0.29, 'throughput': 9982.40} [INFO|2025-03-19 06:39:00] logging.py:143 >> {'loss': 0.7423, 'learning_rate': 4.8972e-05, 'epoch': 0.29, 'throughput': 9982.81} [INFO|2025-03-19 06:39:41] logging.py:143 >> {'loss': 0.7409, 'learning_rate': 4.8968e-05, 
'epoch': 0.29, 'throughput': 9982.81} [INFO|2025-03-19 06:40:21] logging.py:143 >> {'loss': 0.7948, 'learning_rate': 4.8964e-05, 'epoch': 0.30, 'throughput': 9983.23} [INFO|2025-03-19 06:41:02] logging.py:143 >> {'loss': 0.7451, 'learning_rate': 4.8960e-05, 'epoch': 0.30, 'throughput': 9982.94} [INFO|2025-03-19 06:41:41] logging.py:143 >> {'loss': 0.7198, 'learning_rate': 4.8956e-05, 'epoch': 0.30, 'throughput': 9983.16} [INFO|2025-03-19 06:42:22] logging.py:143 >> {'loss': 0.7602, 'learning_rate': 4.8952e-05, 'epoch': 0.30, 'throughput': 9983.53} [INFO|2025-03-19 06:43:01] logging.py:143 >> {'loss': 0.7232, 'learning_rate': 4.8948e-05, 'epoch': 0.30, 'throughput': 9983.89} [INFO|2025-03-19 06:43:42] logging.py:143 >> {'loss': 0.7639, 'learning_rate': 4.8944e-05, 'epoch': 0.30, 'throughput': 9983.66} [INFO|2025-03-19 06:44:23] logging.py:143 >> {'loss': 0.7403, 'learning_rate': 4.8940e-05, 'epoch': 0.30, 'throughput': 9983.55} [INFO|2025-03-19 06:45:04] logging.py:143 >> {'loss': 0.7794, 'learning_rate': 4.8936e-05, 'epoch': 0.30, 'throughput': 9983.62} [INFO|2025-03-19 06:45:44] logging.py:143 >> {'loss': 0.7397, 'learning_rate': 4.8932e-05, 'epoch': 0.30, 'throughput': 9983.67} [INFO|2025-03-19 06:46:24] logging.py:143 >> {'loss': 0.7486, 'learning_rate': 4.8928e-05, 'epoch': 0.30, 'throughput': 9984.11} [INFO|2025-03-19 06:47:05] logging.py:143 >> {'loss': 0.7420, 'learning_rate': 4.8924e-05, 'epoch': 0.30, 'throughput': 9984.12} [INFO|2025-03-19 06:47:44] logging.py:143 >> {'loss': 0.7626, 'learning_rate': 4.8920e-05, 'epoch': 0.30, 'throughput': 9984.54} [INFO|2025-03-19 06:48:25] logging.py:143 >> {'loss': 0.7844, 'learning_rate': 4.8916e-05, 'epoch': 0.30, 'throughput': 9984.98} [INFO|2025-03-19 06:49:05] logging.py:143 >> {'loss': 0.7603, 'learning_rate': 4.8912e-05, 'epoch': 0.30, 'throughput': 9984.92} [INFO|2025-03-19 06:49:47] logging.py:143 >> {'loss': 0.7097, 'learning_rate': 4.8908e-05, 'epoch': 0.30, 'throughput': 9984.30} [INFO|2025-03-19 06:50:27] 
logging.py:143 >> {'loss': 0.7567, 'learning_rate': 4.8904e-05, 'epoch': 0.30, 'throughput': 9984.49} [INFO|2025-03-19 06:51:08] logging.py:143 >> {'loss': 0.7394, 'learning_rate': 4.8900e-05, 'epoch': 0.30, 'throughput': 9984.55} [INFO|2025-03-19 06:51:50] logging.py:143 >> {'loss': 0.7517, 'learning_rate': 4.8896e-05, 'epoch': 0.30, 'throughput': 9984.51} [INFO|2025-03-19 06:52:30] logging.py:143 >> {'loss': 0.7602, 'learning_rate': 4.8892e-05, 'epoch': 0.30, 'throughput': 9984.82} [INFO|2025-03-19 06:53:10] logging.py:143 >> {'loss': 0.7436, 'learning_rate': 4.8887e-05, 'epoch': 0.31, 'throughput': 9984.56} [INFO|2025-03-19 06:53:50] logging.py:143 >> {'loss': 0.7394, 'learning_rate': 4.8883e-05, 'epoch': 0.31, 'throughput': 9984.90} [INFO|2025-03-19 06:54:31] logging.py:143 >> {'loss': 0.7427, 'learning_rate': 4.8879e-05, 'epoch': 0.31, 'throughput': 9984.69} [INFO|2025-03-19 06:55:11] logging.py:143 >> {'loss': 0.7207, 'learning_rate': 4.8875e-05, 'epoch': 0.31, 'throughput': 9984.56} [INFO|2025-03-19 06:55:52] logging.py:143 >> {'loss': 0.7658, 'learning_rate': 4.8871e-05, 'epoch': 0.31, 'throughput': 9984.34} [INFO|2025-03-19 06:56:32] logging.py:143 >> {'loss': 0.7599, 'learning_rate': 4.8867e-05, 'epoch': 0.31, 'throughput': 9984.51} [INFO|2025-03-19 06:57:12] logging.py:143 >> {'loss': 0.7274, 'learning_rate': 4.8862e-05, 'epoch': 0.31, 'throughput': 9985.33} [INFO|2025-03-19 06:57:53] logging.py:143 >> {'loss': 0.7662, 'learning_rate': 4.8858e-05, 'epoch': 0.31, 'throughput': 9985.41} [INFO|2025-03-19 06:58:33] logging.py:143 >> {'loss': 0.7430, 'learning_rate': 4.8854e-05, 'epoch': 0.31, 'throughput': 9985.41} [INFO|2025-03-19 06:59:14] logging.py:143 >> {'loss': 0.7590, 'learning_rate': 4.8850e-05, 'epoch': 0.31, 'throughput': 9985.51} [INFO|2025-03-19 06:59:54] logging.py:143 >> {'loss': 0.7390, 'learning_rate': 4.8846e-05, 'epoch': 0.31, 'throughput': 9985.58} [INFO|2025-03-19 07:00:33] logging.py:143 >> {'loss': 0.7205, 'learning_rate': 4.8842e-05, 
'epoch': 0.31, 'throughput': 9985.63} [INFO|2025-03-19 07:01:13] logging.py:143 >> {'loss': 0.7465, 'learning_rate': 4.8837e-05, 'epoch': 0.31, 'throughput': 9985.71} [INFO|2025-03-19 07:01:53] logging.py:143 >> {'loss': 0.7209, 'learning_rate': 4.8833e-05, 'epoch': 0.31, 'throughput': 9985.85} [INFO|2025-03-19 07:02:34] logging.py:143 >> {'loss': 0.7620, 'learning_rate': 4.8829e-05, 'epoch': 0.31, 'throughput': 9985.71} [INFO|2025-03-19 07:03:15] logging.py:143 >> {'loss': 0.7010, 'learning_rate': 4.8825e-05, 'epoch': 0.31, 'throughput': 9985.67} [INFO|2025-03-19 07:03:58] logging.py:143 >> {'loss': 0.7110, 'learning_rate': 4.8820e-05, 'epoch': 0.31, 'throughput': 9985.58} [INFO|2025-03-19 07:04:39] logging.py:143 >> {'loss': 0.7056, 'learning_rate': 4.8816e-05, 'epoch': 0.31, 'throughput': 9985.14} [INFO|2025-03-19 07:05:19] logging.py:143 >> {'loss': 0.7591, 'learning_rate': 4.8812e-05, 'epoch': 0.31, 'throughput': 9984.89} [INFO|2025-03-19 07:05:59] logging.py:143 >> {'loss': 0.7687, 'learning_rate': 4.8808e-05, 'epoch': 0.32, 'throughput': 9985.21} [INFO|2025-03-19 07:06:39] logging.py:143 >> {'loss': 0.7369, 'learning_rate': 4.8803e-05, 'epoch': 0.32, 'throughput': 9984.93} [INFO|2025-03-19 07:07:19] logging.py:143 >> {'loss': 0.7615, 'learning_rate': 4.8799e-05, 'epoch': 0.32, 'throughput': 9985.45} [INFO|2025-03-19 07:07:59] logging.py:143 >> {'loss': 0.7439, 'learning_rate': 4.8795e-05, 'epoch': 0.32, 'throughput': 9985.74} [INFO|2025-03-19 07:08:38] logging.py:143 >> {'loss': 0.7267, 'learning_rate': 4.8790e-05, 'epoch': 0.32, 'throughput': 9986.03} [INFO|2025-03-19 07:09:18] logging.py:143 >> {'loss': 0.7354, 'learning_rate': 4.8786e-05, 'epoch': 0.32, 'throughput': 9986.37} [INFO|2025-03-19 07:10:00] logging.py:143 >> {'loss': 0.7434, 'learning_rate': 4.8782e-05, 'epoch': 0.32, 'throughput': 9985.74} [INFO|2025-03-19 07:10:40] logging.py:143 >> {'loss': 0.7622, 'learning_rate': 4.8778e-05, 'epoch': 0.32, 'throughput': 9986.01} [INFO|2025-03-19 07:11:20] 
logging.py:143 >> {'loss': 0.7725, 'learning_rate': 4.8773e-05, 'epoch': 0.32, 'throughput': 9986.31} [INFO|2025-03-19 07:11:59] logging.py:143 >> {'loss': 0.7246, 'learning_rate': 4.8769e-05, 'epoch': 0.32, 'throughput': 9986.48} [INFO|2025-03-19 07:12:40] logging.py:143 >> {'loss': 0.7205, 'learning_rate': 4.8765e-05, 'epoch': 0.32, 'throughput': 9986.38} [INFO|2025-03-19 07:13:22] logging.py:143 >> {'loss': 0.7570, 'learning_rate': 4.8760e-05, 'epoch': 0.32, 'throughput': 9986.13} [INFO|2025-03-19 07:14:04] logging.py:143 >> {'loss': 0.7309, 'learning_rate': 4.8756e-05, 'epoch': 0.32, 'throughput': 9985.70} [INFO|2025-03-19 07:14:45] logging.py:143 >> {'loss': 0.7469, 'learning_rate': 4.8751e-05, 'epoch': 0.32, 'throughput': 9985.65} [INFO|2025-03-19 07:15:26] logging.py:143 >> {'loss': 0.7557, 'learning_rate': 4.8747e-05, 'epoch': 0.32, 'throughput': 9985.58} [INFO|2025-03-19 07:16:06] logging.py:143 >> {'loss': 0.7268, 'learning_rate': 4.8743e-05, 'epoch': 0.32, 'throughput': 9985.52} [INFO|2025-03-19 07:16:47] logging.py:143 >> {'loss': 0.7352, 'learning_rate': 4.8738e-05, 'epoch': 0.32, 'throughput': 9985.19} [INFO|2025-03-19 07:17:29] logging.py:143 >> {'loss': 0.7336, 'learning_rate': 4.8734e-05, 'epoch': 0.32, 'throughput': 9985.39} [INFO|2025-03-19 07:18:11] logging.py:143 >> {'loss': 0.7140, 'learning_rate': 4.8730e-05, 'epoch': 0.32, 'throughput': 9985.04} [INFO|2025-03-19 07:18:51] logging.py:143 >> {'loss': 0.6865, 'learning_rate': 4.8725e-05, 'epoch': 0.33, 'throughput': 9985.13} [INFO|2025-03-19 07:19:30] logging.py:143 >> {'loss': 0.7179, 'learning_rate': 4.8721e-05, 'epoch': 0.33, 'throughput': 9985.45} [INFO|2025-03-19 07:20:11] logging.py:143 >> {'loss': 0.7381, 'learning_rate': 4.8716e-05, 'epoch': 0.33, 'throughput': 9985.95} [INFO|2025-03-19 07:20:51] logging.py:143 >> {'loss': 0.7215, 'learning_rate': 4.8712e-05, 'epoch': 0.33, 'throughput': 9985.78} [INFO|2025-03-19 07:21:31] logging.py:143 >> {'loss': 0.7174, 'learning_rate': 4.8707e-05, 
'epoch': 0.33, 'throughput': 9985.87} [INFO|2025-03-19 07:22:10] logging.py:143 >> {'loss': 0.7189, 'learning_rate': 4.8703e-05, 'epoch': 0.33, 'throughput': 9985.94} [INFO|2025-03-19 07:22:51] logging.py:143 >> {'loss': 0.7474, 'learning_rate': 4.8699e-05, 'epoch': 0.33, 'throughput': 9985.60} [INFO|2025-03-19 07:23:31] logging.py:143 >> {'loss': 0.7195, 'learning_rate': 4.8694e-05, 'epoch': 0.33, 'throughput': 9985.67} [INFO|2025-03-19 07:24:11] logging.py:143 >> {'loss': 0.7511, 'learning_rate': 4.8690e-05, 'epoch': 0.33, 'throughput': 9985.84} [INFO|2025-03-19 07:24:50] logging.py:143 >> {'loss': 0.7279, 'learning_rate': 4.8685e-05, 'epoch': 0.33, 'throughput': 9985.89} [INFO|2025-03-19 07:25:31] logging.py:143 >> {'loss': 0.7070, 'learning_rate': 4.8681e-05, 'epoch': 0.33, 'throughput': 9985.22} [INFO|2025-03-19 07:26:11] logging.py:143 >> {'loss': 0.7541, 'learning_rate': 4.8676e-05, 'epoch': 0.33, 'throughput': 9985.08} [INFO|2025-03-19 07:26:52] logging.py:143 >> {'loss': 0.7447, 'learning_rate': 4.8672e-05, 'epoch': 0.33, 'throughput': 9985.07} [INFO|2025-03-19 07:27:33] logging.py:143 >> {'loss': 0.7422, 'learning_rate': 4.8667e-05, 'epoch': 0.33, 'throughput': 9985.20} [INFO|2025-03-19 07:28:13] logging.py:143 >> {'loss': 0.7335, 'learning_rate': 4.8663e-05, 'epoch': 0.33, 'throughput': 9985.06} [INFO|2025-03-19 07:28:54] logging.py:143 >> {'loss': 0.7629, 'learning_rate': 4.8658e-05, 'epoch': 0.33, 'throughput': 9984.99} [INFO|2025-03-19 07:29:34] logging.py:143 >> {'loss': 0.7248, 'learning_rate': 4.8654e-05, 'epoch': 0.33, 'throughput': 9985.14} [INFO|2025-03-19 07:30:15] logging.py:143 >> {'loss': 0.7613, 'learning_rate': 4.8649e-05, 'epoch': 0.33, 'throughput': 9984.92} [INFO|2025-03-19 07:30:55] logging.py:143 >> {'loss': 0.7120, 'learning_rate': 4.8645e-05, 'epoch': 0.33, 'throughput': 9984.86} [INFO|2025-03-19 07:31:35] logging.py:143 >> {'loss': 0.7274, 'learning_rate': 4.8640e-05, 'epoch': 0.34, 'throughput': 9985.25} [INFO|2025-03-19 07:32:14] 
logging.py:143 >> {'loss': 0.7589, 'learning_rate': 4.8635e-05, 'epoch': 0.34, 'throughput': 9985.45} [INFO|2025-03-19 07:32:54] logging.py:143 >> {'loss': 0.7316, 'learning_rate': 4.8631e-05, 'epoch': 0.34, 'throughput': 9985.42} [INFO|2025-03-19 07:33:34] logging.py:143 >> {'loss': 0.7439, 'learning_rate': 4.8626e-05, 'epoch': 0.34, 'throughput': 9986.07} [INFO|2025-03-19 07:34:14] logging.py:143 >> {'loss': 0.7709, 'learning_rate': 4.8622e-05, 'epoch': 0.34, 'throughput': 9986.26} [INFO|2025-03-19 07:34:54] logging.py:143 >> {'loss': 0.7183, 'learning_rate': 4.8617e-05, 'epoch': 0.34, 'throughput': 9986.17} [INFO|2025-03-19 07:35:33] logging.py:143 >> {'loss': 0.7139, 'learning_rate': 4.8613e-05, 'epoch': 0.34, 'throughput': 9986.32} [INFO|2025-03-19 07:36:14] logging.py:143 >> {'loss': 0.7410, 'learning_rate': 4.8608e-05, 'epoch': 0.34, 'throughput': 9986.13} [INFO|2025-03-19 07:36:55] logging.py:143 >> {'loss': 0.7325, 'learning_rate': 4.8603e-05, 'epoch': 0.34, 'throughput': 9985.98} [INFO|2025-03-19 07:37:34] logging.py:143 >> {'loss': 0.7125, 'learning_rate': 4.8599e-05, 'epoch': 0.34, 'throughput': 9985.70} [INFO|2025-03-19 07:38:15] logging.py:143 >> {'loss': 0.7096, 'learning_rate': 4.8594e-05, 'epoch': 0.34, 'throughput': 9985.94} [INFO|2025-03-19 07:38:55] logging.py:143 >> {'loss': 0.7202, 'learning_rate': 4.8589e-05, 'epoch': 0.34, 'throughput': 9985.64} [INFO|2025-03-19 07:39:37] logging.py:143 >> {'loss': 0.7440, 'learning_rate': 4.8585e-05, 'epoch': 0.34, 'throughput': 9985.51} [INFO|2025-03-19 07:40:17] logging.py:143 >> {'loss': 0.7597, 'learning_rate': 4.8580e-05, 'epoch': 0.34, 'throughput': 9985.64} [INFO|2025-03-19 07:40:58] logging.py:143 >> {'loss': 0.7228, 'learning_rate': 4.8576e-05, 'epoch': 0.34, 'throughput': 9985.89} [INFO|2025-03-19 07:41:39] logging.py:143 >> {'loss': 0.7621, 'learning_rate': 4.8571e-05, 'epoch': 0.34, 'throughput': 9985.85} [INFO|2025-03-19 07:42:19] logging.py:143 >> {'loss': 0.7316, 'learning_rate': 4.8566e-05, 
'epoch': 0.34, 'throughput': 9985.96} [INFO|2025-03-19 07:43:01] logging.py:143 >> {'loss': 0.7477, 'learning_rate': 4.8562e-05, 'epoch': 0.34, 'throughput': 9985.98} [INFO|2025-03-19 07:43:41] logging.py:143 >> {'loss': 0.7487, 'learning_rate': 4.8557e-05, 'epoch': 0.34, 'throughput': 9986.38} [INFO|2025-03-19 07:44:23] logging.py:143 >> {'loss': 0.7440, 'learning_rate': 4.8552e-05, 'epoch': 0.35, 'throughput': 9986.48} [INFO|2025-03-19 07:45:05] logging.py:143 >> {'loss': 0.7263, 'learning_rate': 4.8547e-05, 'epoch': 0.35, 'throughput': 9986.07} [INFO|2025-03-19 07:45:45] logging.py:143 >> {'loss': 0.7298, 'learning_rate': 4.8543e-05, 'epoch': 0.35, 'throughput': 9986.29} [INFO|2025-03-19 07:46:26] logging.py:143 >> {'loss': 0.7068, 'learning_rate': 4.8538e-05, 'epoch': 0.35, 'throughput': 9985.78} [INFO|2025-03-19 07:47:07] logging.py:143 >> {'loss': 0.7679, 'learning_rate': 4.8533e-05, 'epoch': 0.35, 'throughput': 9985.85} [INFO|2025-03-19 07:47:47] logging.py:143 >> {'loss': 0.7181, 'learning_rate': 4.8529e-05, 'epoch': 0.35, 'throughput': 9986.01} [INFO|2025-03-19 07:48:28] logging.py:143 >> {'loss': 0.7214, 'learning_rate': 4.8524e-05, 'epoch': 0.35, 'throughput': 9986.05} [INFO|2025-03-19 07:49:09] logging.py:143 >> {'loss': 0.7402, 'learning_rate': 4.8519e-05, 'epoch': 0.35, 'throughput': 9986.02} [INFO|2025-03-19 07:49:48] logging.py:143 >> {'loss': 0.7464, 'learning_rate': 4.8514e-05, 'epoch': 0.35, 'throughput': 9986.11} [INFO|2025-03-19 07:50:29] logging.py:143 >> {'loss': 0.7400, 'learning_rate': 4.8510e-05, 'epoch': 0.35, 'throughput': 9985.96} [INFO|2025-03-19 07:51:09] logging.py:143 >> {'loss': 0.7422, 'learning_rate': 4.8505e-05, 'epoch': 0.35, 'throughput': 9985.86} [INFO|2025-03-19 07:51:48] logging.py:143 >> {'loss': 0.7020, 'learning_rate': 4.8500e-05, 'epoch': 0.35, 'throughput': 9986.24} [INFO|2025-03-19 07:52:28] logging.py:143 >> {'loss': 0.7305, 'learning_rate': 4.8495e-05, 'epoch': 0.35, 'throughput': 9986.55} [INFO|2025-03-19 07:53:09] 
logging.py:143 >> {'loss': 0.7381, 'learning_rate': 4.8491e-05, 'epoch': 0.35, 'throughput': 9986.65} [INFO|2025-03-19 07:53:49] logging.py:143 >> {'loss': 0.7534, 'learning_rate': 4.8486e-05, 'epoch': 0.35, 'throughput': 9986.60} [INFO|2025-03-19 07:54:29] logging.py:143 >> {'loss': 0.7236, 'learning_rate': 4.8481e-05, 'epoch': 0.35, 'throughput': 9987.04} [INFO|2025-03-19 07:55:09] logging.py:143 >> {'loss': 0.7339, 'learning_rate': 4.8476e-05, 'epoch': 0.35, 'throughput': 9986.90} [INFO|2025-03-19 07:55:49] logging.py:143 >> {'loss': 0.7447, 'learning_rate': 4.8471e-05, 'epoch': 0.35, 'throughput': 9987.10} [INFO|2025-03-19 07:56:31] logging.py:143 >> {'loss': 0.7397, 'learning_rate': 4.8466e-05, 'epoch': 0.36, 'throughput': 9987.22} [INFO|2025-03-19 07:57:13] logging.py:143 >> {'loss': 0.7408, 'learning_rate': 4.8462e-05, 'epoch': 0.36, 'throughput': 9986.80} [INFO|2025-03-19 07:57:53] logging.py:143 >> {'loss': 0.7209, 'learning_rate': 4.8457e-05, 'epoch': 0.36, 'throughput': 9986.80} [INFO|2025-03-19 07:58:34] logging.py:143 >> {'loss': 0.7312, 'learning_rate': 4.8452e-05, 'epoch': 0.36, 'throughput': 9986.05} [INFO|2025-03-19 07:59:14] logging.py:143 >> {'loss': 0.7075, 'learning_rate': 4.8447e-05, 'epoch': 0.36, 'throughput': 9986.04} [INFO|2025-03-19 07:59:56] logging.py:143 >> {'loss': 0.7771, 'learning_rate': 4.8442e-05, 'epoch': 0.36, 'throughput': 9985.99} [INFO|2025-03-19 08:00:37] logging.py:143 >> {'loss': 0.6869, 'learning_rate': 4.8437e-05, 'epoch': 0.36, 'throughput': 9985.93} [INFO|2025-03-19 08:01:17] logging.py:143 >> {'loss': 0.7584, 'learning_rate': 4.8433e-05, 'epoch': 0.36, 'throughput': 9985.84} [INFO|2025-03-19 08:01:56] logging.py:143 >> {'loss': 0.7338, 'learning_rate': 4.8428e-05, 'epoch': 0.36, 'throughput': 9986.21} [INFO|2025-03-19 08:02:36] logging.py:143 >> {'loss': 0.7403, 'learning_rate': 4.8423e-05, 'epoch': 0.36, 'throughput': 9986.55} [INFO|2025-03-19 08:03:17] logging.py:143 >> {'loss': 0.6913, 'learning_rate': 4.8418e-05, 
'epoch': 0.36, 'throughput': 9986.46} [INFO|2025-03-19 08:03:56] logging.py:143 >> {'loss': 0.7487, 'learning_rate': 4.8413e-05, 'epoch': 0.36, 'throughput': 9986.81} [INFO|2025-03-19 08:04:36] logging.py:143 >> {'loss': 0.7568, 'learning_rate': 4.8408e-05, 'epoch': 0.36, 'throughput': 9986.93} [INFO|2025-03-19 08:05:16] logging.py:143 >> {'loss': 0.7117, 'learning_rate': 4.8403e-05, 'epoch': 0.36, 'throughput': 9986.81} [INFO|2025-03-19 08:05:58] logging.py:143 >> {'loss': 0.6837, 'learning_rate': 4.8398e-05, 'epoch': 0.36, 'throughput': 9986.93} [INFO|2025-03-19 08:06:38] logging.py:143 >> {'loss': 0.7539, 'learning_rate': 4.8393e-05, 'epoch': 0.36, 'throughput': 9987.17} [INFO|2025-03-19 08:07:19] logging.py:143 >> {'loss': 0.6829, 'learning_rate': 4.8388e-05, 'epoch': 0.36, 'throughput': 9987.26} [INFO|2025-03-19 08:07:59] logging.py:143 >> {'loss': 0.7029, 'learning_rate': 4.8383e-05, 'epoch': 0.36, 'throughput': 9987.38} [INFO|2025-03-19 08:08:39] logging.py:143 >> {'loss': 0.7381, 'learning_rate': 4.8378e-05, 'epoch': 0.36, 'throughput': 9987.22} [INFO|2025-03-19 08:09:19] logging.py:143 >> {'loss': 0.7109, 'learning_rate': 4.8373e-05, 'epoch': 0.37, 'throughput': 9987.53} [INFO|2025-03-19 08:10:00] logging.py:143 >> {'loss': 0.7187, 'learning_rate': 4.8368e-05, 'epoch': 0.37, 'throughput': 9987.17} [INFO|2025-03-19 08:10:39] logging.py:143 >> {'loss': 0.7319, 'learning_rate': 4.8364e-05, 'epoch': 0.37, 'throughput': 9987.35} [INFO|2025-03-19 08:11:19] logging.py:143 >> {'loss': 0.7603, 'learning_rate': 4.8359e-05, 'epoch': 0.37, 'throughput': 9987.71} [INFO|2025-03-19 08:11:59] logging.py:143 >> {'loss': 0.7366, 'learning_rate': 4.8354e-05, 'epoch': 0.37, 'throughput': 9988.36} [INFO|2025-03-19 08:12:38] logging.py:143 >> {'loss': 0.7112, 'learning_rate': 4.8349e-05, 'epoch': 0.37, 'throughput': 9988.60} [INFO|2025-03-19 08:13:19] logging.py:143 >> {'loss': 0.7211, 'learning_rate': 4.8344e-05, 'epoch': 0.37, 'throughput': 9988.45} [INFO|2025-03-19 08:14:00] 
logging.py:143 >> {'loss': 0.7061, 'learning_rate': 4.8339e-05, 'epoch': 0.37, 'throughput': 9988.63} [INFO|2025-03-19 08:14:41] logging.py:143 >> {'loss': 0.7491, 'learning_rate': 4.8334e-05, 'epoch': 0.37, 'throughput': 9988.61} [INFO|2025-03-19 08:15:21] logging.py:143 >> {'loss': 0.7303, 'learning_rate': 4.8328e-05, 'epoch': 0.37, 'throughput': 9988.69} [INFO|2025-03-19 08:16:00] logging.py:143 >> {'loss': 0.7540, 'learning_rate': 4.8323e-05, 'epoch': 0.37, 'throughput': 9989.06} [INFO|2025-03-19 08:16:42] logging.py:143 >> {'loss': 0.7170, 'learning_rate': 4.8318e-05, 'epoch': 0.37, 'throughput': 9988.73} [INFO|2025-03-19 08:17:21] logging.py:143 >> {'loss': 0.7412, 'learning_rate': 4.8313e-05, 'epoch': 0.37, 'throughput': 9989.10} [INFO|2025-03-19 08:18:01] logging.py:143 >> {'loss': 0.7471, 'learning_rate': 4.8308e-05, 'epoch': 0.37, 'throughput': 9989.38} [INFO|2025-03-19 08:18:41] logging.py:143 >> {'loss': 0.7150, 'learning_rate': 4.8303e-05, 'epoch': 0.37, 'throughput': 9989.95} [INFO|2025-03-19 08:19:20] logging.py:143 >> {'loss': 0.7271, 'learning_rate': 4.8298e-05, 'epoch': 0.37, 'throughput': 9990.10} [INFO|2025-03-19 08:20:01] logging.py:143 >> {'loss': 0.7280, 'learning_rate': 4.8293e-05, 'epoch': 0.37, 'throughput': 9990.00} [INFO|2025-03-19 08:20:42] logging.py:143 >> {'loss': 0.7802, 'learning_rate': 4.8288e-05, 'epoch': 0.37, 'throughput': 9989.99} [INFO|2025-03-19 08:21:23] logging.py:143 >> {'loss': 0.7168, 'learning_rate': 4.8283e-05, 'epoch': 0.37, 'throughput': 9989.90} [INFO|2025-03-19 08:22:04] logging.py:143 >> {'loss': 0.7521, 'learning_rate': 4.8278e-05, 'epoch': 0.38, 'throughput': 9990.02} [INFO|2025-03-19 08:22:44] logging.py:143 >> {'loss': 0.7288, 'learning_rate': 4.8273e-05, 'epoch': 0.38, 'throughput': 9990.40} [INFO|2025-03-19 08:23:26] logging.py:143 >> {'loss': 0.7414, 'learning_rate': 4.8268e-05, 'epoch': 0.38, 'throughput': 9990.27} [INFO|2025-03-19 08:24:07] logging.py:143 >> {'loss': 0.6920, 'learning_rate': 4.8262e-05, 
'epoch': 0.38, 'throughput': 9990.15} [INFO|2025-03-19 08:24:47] logging.py:143 >> {'loss': 0.7412, 'learning_rate': 4.8257e-05, 'epoch': 0.38, 'throughput': 9990.25} [INFO|2025-03-19 08:25:27] logging.py:143 >> {'loss': 0.7261, 'learning_rate': 4.8252e-05, 'epoch': 0.38, 'throughput': 9990.20} [INFO|2025-03-19 08:26:07] logging.py:143 >> {'loss': 0.7241, 'learning_rate': 4.8247e-05, 'epoch': 0.38, 'throughput': 9990.02} [INFO|2025-03-19 08:26:47] logging.py:143 >> {'loss': 0.7395, 'learning_rate': 4.8242e-05, 'epoch': 0.38, 'throughput': 9990.07} [INFO|2025-03-19 08:27:29] logging.py:143 >> {'loss': 0.6999, 'learning_rate': 4.8237e-05, 'epoch': 0.38, 'throughput': 9989.85} [INFO|2025-03-19 08:28:09] logging.py:143 >> {'loss': 0.7238, 'learning_rate': 4.8232e-05, 'epoch': 0.38, 'throughput': 9990.12} [INFO|2025-03-19 08:28:49] logging.py:143 >> {'loss': 0.7323, 'learning_rate': 4.8226e-05, 'epoch': 0.38, 'throughput': 9990.46} [INFO|2025-03-19 08:29:29] logging.py:143 >> {'loss': 0.7220, 'learning_rate': 4.8221e-05, 'epoch': 0.38, 'throughput': 9990.47} [INFO|2025-03-19 08:30:08] logging.py:143 >> {'loss': 0.7221, 'learning_rate': 4.8216e-05, 'epoch': 0.38, 'throughput': 9990.70} [INFO|2025-03-19 08:30:48] logging.py:143 >> {'loss': 0.7326, 'learning_rate': 4.8211e-05, 'epoch': 0.38, 'throughput': 9990.48} [INFO|2025-03-19 08:31:30] logging.py:143 >> {'loss': 0.7080, 'learning_rate': 4.8206e-05, 'epoch': 0.38, 'throughput': 9989.91} [INFO|2025-03-19 08:32:10] logging.py:143 >> {'loss': 0.7644, 'learning_rate': 4.8200e-05, 'epoch': 0.38, 'throughput': 9990.33} [INFO|2025-03-19 08:32:51] logging.py:143 >> {'loss': 0.7466, 'learning_rate': 4.8195e-05, 'epoch': 0.38, 'throughput': 9990.39} [INFO|2025-03-19 08:33:31] logging.py:143 >> {'loss': 0.7313, 'learning_rate': 4.8190e-05, 'epoch': 0.38, 'throughput': 9990.66} [INFO|2025-03-19 08:34:12] logging.py:143 >> {'loss': 0.6993, 'learning_rate': 4.8185e-05, 'epoch': 0.38, 'throughput': 9990.37} [INFO|2025-03-19 08:34:52] 
logging.py:143 >> {'loss': 0.7072, 'learning_rate': 4.8180e-05, 'epoch': 0.39, 'throughput': 9990.27} [INFO|2025-03-19 08:35:33] logging.py:143 >> {'loss': 0.7565, 'learning_rate': 4.8174e-05, 'epoch': 0.39, 'throughput': 9990.23} [INFO|2025-03-19 08:36:13] logging.py:143 >> {'loss': 0.7336, 'learning_rate': 4.8169e-05, 'epoch': 0.39, 'throughput': 9990.33} [INFO|2025-03-19 08:36:53] logging.py:143 >> {'loss': 0.7486, 'learning_rate': 4.8164e-05, 'epoch': 0.39, 'throughput': 9990.28} [INFO|2025-03-19 08:37:33] logging.py:143 >> {'loss': 0.7329, 'learning_rate': 4.8158e-05, 'epoch': 0.39, 'throughput': 9990.13} [INFO|2025-03-19 08:38:12] logging.py:143 >> {'loss': 0.7242, 'learning_rate': 4.8153e-05, 'epoch': 0.39, 'throughput': 9990.14} [INFO|2025-03-19 08:38:53] logging.py:143 >> {'loss': 0.7355, 'learning_rate': 4.8148e-05, 'epoch': 0.39, 'throughput': 9989.86} [INFO|2025-03-19 08:39:34] logging.py:143 >> {'loss': 0.7004, 'learning_rate': 4.8143e-05, 'epoch': 0.39, 'throughput': 9989.80} [INFO|2025-03-19 08:40:14] logging.py:143 >> {'loss': 0.7285, 'learning_rate': 4.8137e-05, 'epoch': 0.39, 'throughput': 9990.20} [INFO|2025-03-19 08:40:54] logging.py:143 >> {'loss': 0.7443, 'learning_rate': 4.8132e-05, 'epoch': 0.39, 'throughput': 9990.48} [INFO|2025-03-19 08:41:34] logging.py:143 >> {'loss': 0.7678, 'learning_rate': 4.8127e-05, 'epoch': 0.39, 'throughput': 9990.76} [INFO|2025-03-19 08:42:16] logging.py:143 >> {'loss': 0.7350, 'learning_rate': 4.8121e-05, 'epoch': 0.39, 'throughput': 9990.78} [INFO|2025-03-19 08:42:58] logging.py:143 >> {'loss': 0.7084, 'learning_rate': 4.8116e-05, 'epoch': 0.39, 'throughput': 9990.15} [INFO|2025-03-19 08:43:38] logging.py:143 >> {'loss': 0.7176, 'learning_rate': 4.8111e-05, 'epoch': 0.39, 'throughput': 9989.88} [INFO|2025-03-19 08:44:19] logging.py:143 >> {'loss': 0.7076, 'learning_rate': 4.8105e-05, 'epoch': 0.39, 'throughput': 9989.92} [INFO|2025-03-19 08:45:01] logging.py:143 >> {'loss': 0.7349, 'learning_rate': 4.8100e-05, 
'epoch': 0.39, 'throughput': 9989.82} [INFO|2025-03-19 08:45:41] logging.py:143 >> {'loss': 0.6931, 'learning_rate': 4.8095e-05, 'epoch': 0.39, 'throughput': 9989.91} [INFO|2025-03-19 08:46:22] logging.py:143 >> {'loss': 0.6963, 'learning_rate': 4.8089e-05, 'epoch': 0.39, 'throughput': 9989.77} [INFO|2025-03-19 08:47:02] logging.py:143 >> {'loss': 0.7469, 'learning_rate': 4.8084e-05, 'epoch': 0.39, 'throughput': 9989.65} [INFO|2025-03-19 08:47:42] logging.py:143 >> {'loss': 0.7126, 'learning_rate': 4.8079e-05, 'epoch': 0.40, 'throughput': 9990.11} [INFO|2025-03-19 08:48:21] logging.py:143 >> {'loss': 0.7372, 'learning_rate': 4.8073e-05, 'epoch': 0.40, 'throughput': 9990.42} [INFO|2025-03-19 08:49:01] logging.py:143 >> {'loss': 0.7436, 'learning_rate': 4.8068e-05, 'epoch': 0.40, 'throughput': 9990.29} [INFO|2025-03-19 08:49:40] logging.py:143 >> {'loss': 0.6776, 'learning_rate': 4.8062e-05, 'epoch': 0.40, 'throughput': 9990.52} [INFO|2025-03-19 08:50:20] logging.py:143 >> {'loss': 0.6945, 'learning_rate': 4.8057e-05, 'epoch': 0.40, 'throughput': 9990.02} [INFO|2025-03-19 08:51:00] logging.py:143 >> {'loss': 0.6888, 'learning_rate': 4.8052e-05, 'epoch': 0.40, 'throughput': 9990.05} [INFO|2025-03-19 08:51:41] logging.py:143 >> {'loss': 0.6803, 'learning_rate': 4.8046e-05, 'epoch': 0.40, 'throughput': 9989.78} [INFO|2025-03-19 08:52:22] logging.py:143 >> {'loss': 0.7163, 'learning_rate': 4.8041e-05, 'epoch': 0.40, 'throughput': 9989.61} [INFO|2025-03-19 08:53:02] logging.py:143 >> {'loss': 0.7210, 'learning_rate': 4.8035e-05, 'epoch': 0.40, 'throughput': 9989.45} [INFO|2025-03-19 08:53:42] logging.py:143 >> {'loss': 0.7182, 'learning_rate': 4.8030e-05, 'epoch': 0.40, 'throughput': 9989.68} [INFO|2025-03-19 08:54:22] logging.py:143 >> {'loss': 0.7068, 'learning_rate': 4.8024e-05, 'epoch': 0.40, 'throughput': 9989.72} [INFO|2025-03-19 08:55:02] logging.py:143 >> {'loss': 0.7533, 'learning_rate': 4.8019e-05, 'epoch': 0.40, 'throughput': 9990.16} [INFO|2025-03-19 08:55:43] 
logging.py:143 >> {'loss': 0.7041, 'learning_rate': 4.8014e-05, 'epoch': 0.40, 'throughput': 9990.37} [INFO|2025-03-19 08:56:23] logging.py:143 >> {'loss': 0.7483, 'learning_rate': 4.8008e-05, 'epoch': 0.40, 'throughput': 9990.46} [INFO|2025-03-19 08:57:04] logging.py:143 >> {'loss': 0.7342, 'learning_rate': 4.8003e-05, 'epoch': 0.40, 'throughput': 9990.61} [INFO|2025-03-19 08:57:45] logging.py:143 >> {'loss': 0.7219, 'learning_rate': 4.7997e-05, 'epoch': 0.40, 'throughput': 9990.79} [INFO|2025-03-19 08:58:25] logging.py:143 >> {'loss': 0.7175, 'learning_rate': 4.7992e-05, 'epoch': 0.40, 'throughput': 9991.06} [INFO|2025-03-19 08:59:06] logging.py:143 >> {'loss': 0.7265, 'learning_rate': 4.7986e-05, 'epoch': 0.40, 'throughput': 9990.92} [INFO|2025-03-19 08:59:49] logging.py:143 >> {'loss': 0.7400, 'learning_rate': 4.7981e-05, 'epoch': 0.40, 'throughput': 9990.73} [INFO|2025-03-19 09:00:29] logging.py:143 >> {'loss': 0.7379, 'learning_rate': 4.7975e-05, 'epoch': 0.41, 'throughput': 9990.96} [INFO|2025-03-19 09:01:08] logging.py:143 >> {'loss': 0.7242, 'learning_rate': 4.7970e-05, 'epoch': 0.41, 'throughput': 9991.44} [INFO|2025-03-19 09:01:48] logging.py:143 >> {'loss': 0.7061, 'learning_rate': 4.7964e-05, 'epoch': 0.41, 'throughput': 9991.64} [INFO|2025-03-19 09:02:27] logging.py:143 >> {'loss': 0.7336, 'learning_rate': 4.7959e-05, 'epoch': 0.41, 'throughput': 9991.93} [INFO|2025-03-19 09:03:08] logging.py:143 >> {'loss': 0.7178, 'learning_rate': 4.7953e-05, 'epoch': 0.41, 'throughput': 9991.70} [INFO|2025-03-19 09:03:48] logging.py:143 >> {'loss': 0.7103, 'learning_rate': 4.7947e-05, 'epoch': 0.41, 'throughput': 9992.12} [INFO|2025-03-19 09:04:30] logging.py:143 >> {'loss': 0.7104, 'learning_rate': 4.7942e-05, 'epoch': 0.41, 'throughput': 9992.10} [INFO|2025-03-19 09:05:10] logging.py:143 >> {'loss': 0.7114, 'learning_rate': 4.7936e-05, 'epoch': 0.41, 'throughput': 9992.01} [INFO|2025-03-19 09:05:50] logging.py:143 >> {'loss': 0.7183, 'learning_rate': 4.7931e-05, 
'epoch': 0.41, 'throughput': 9992.03} [INFO|2025-03-19 09:06:30] logging.py:143 >> {'loss': 0.6936, 'learning_rate': 4.7925e-05, 'epoch': 0.41, 'throughput': 9992.02} [INFO|2025-03-19 09:07:10] logging.py:143 >> {'loss': 0.7314, 'learning_rate': 4.7920e-05, 'epoch': 0.41, 'throughput': 9992.20} [INFO|2025-03-19 09:07:49] logging.py:143 >> {'loss': 0.6933, 'learning_rate': 4.7914e-05, 'epoch': 0.41, 'throughput': 9992.08} [INFO|2025-03-19 09:08:30] logging.py:143 >> {'loss': 0.7445, 'learning_rate': 4.7908e-05, 'epoch': 0.41, 'throughput': 9991.99} [INFO|2025-03-19 09:09:09] logging.py:143 >> {'loss': 0.7151, 'learning_rate': 4.7903e-05, 'epoch': 0.41, 'throughput': 9992.18} [INFO|2025-03-19 09:09:51] logging.py:143 >> {'loss': 0.7151, 'learning_rate': 4.7897e-05, 'epoch': 0.41, 'throughput': 9992.06} [INFO|2025-03-19 09:10:32] logging.py:143 >> {'loss': 0.6764, 'learning_rate': 4.7892e-05, 'epoch': 0.41, 'throughput': 9991.90} [INFO|2025-03-19 09:11:12] logging.py:143 >> {'loss': 0.7157, 'learning_rate': 4.7886e-05, 'epoch': 0.41, 'throughput': 9992.26} [INFO|2025-03-19 09:11:53] logging.py:143 >> {'loss': 0.6835, 'learning_rate': 4.7880e-05, 'epoch': 0.41, 'throughput': 9992.10} [INFO|2025-03-19 09:12:33] logging.py:143 >> {'loss': 0.7123, 'learning_rate': 4.7875e-05, 'epoch': 0.42, 'throughput': 9992.18} [INFO|2025-03-19 09:13:14] logging.py:143 >> {'loss': 0.7503, 'learning_rate': 4.7869e-05, 'epoch': 0.42, 'throughput': 9992.39} [INFO|2025-03-19 09:13:56] logging.py:143 >> {'loss': 0.7332, 'learning_rate': 4.7863e-05, 'epoch': 0.42, 'throughput': 9992.43} [INFO|2025-03-19 09:14:35] logging.py:143 >> {'loss': 0.7109, 'learning_rate': 4.7858e-05, 'epoch': 0.42, 'throughput': 9992.78} [INFO|2025-03-19 09:15:15] logging.py:143 >> {'loss': 0.6957, 'learning_rate': 4.7852e-05, 'epoch': 0.42, 'throughput': 9992.62} [INFO|2025-03-19 09:15:56] logging.py:143 >> {'loss': 0.7135, 'learning_rate': 4.7846e-05, 'epoch': 0.42, 'throughput': 9992.38} [INFO|2025-03-19 09:16:37] 
logging.py:143 >> {'loss': 0.7199, 'learning_rate': 4.7841e-05, 'epoch': 0.42, 'throughput': 9992.38} [INFO|2025-03-19 09:17:18] logging.py:143 >> {'loss': 0.7049, 'learning_rate': 4.7835e-05, 'epoch': 0.42, 'throughput': 9992.22} [INFO|2025-03-19 09:17:59] logging.py:143 >> {'loss': 0.7181, 'learning_rate': 4.7829e-05, 'epoch': 0.42, 'throughput': 9991.88} [INFO|2025-03-19 09:18:39] logging.py:143 >> {'loss': 0.7251, 'learning_rate': 4.7824e-05, 'epoch': 0.42, 'throughput': 9991.99} [INFO|2025-03-19 09:19:21] logging.py:143 >> {'loss': 0.7115, 'learning_rate': 4.7818e-05, 'epoch': 0.42, 'throughput': 9991.69} [INFO|2025-03-19 09:20:01] logging.py:143 >> {'loss': 0.7394, 'learning_rate': 4.7812e-05, 'epoch': 0.42, 'throughput': 9991.57} [INFO|2025-03-19 09:20:42] logging.py:143 >> {'loss': 0.7149, 'learning_rate': 4.7806e-05, 'epoch': 0.42, 'throughput': 9991.82} [INFO|2025-03-19 09:21:21] logging.py:143 >> {'loss': 0.7285, 'learning_rate': 4.7801e-05, 'epoch': 0.42, 'throughput': 9991.93} [INFO|2025-03-19 09:22:01] logging.py:143 >> {'loss': 0.7166, 'learning_rate': 4.7795e-05, 'epoch': 0.42, 'throughput': 9992.21} [INFO|2025-03-19 09:22:40] logging.py:143 >> {'loss': 0.6893, 'learning_rate': 4.7789e-05, 'epoch': 0.42, 'throughput': 9992.05} [INFO|2025-03-19 09:23:20] logging.py:143 >> {'loss': 0.7560, 'learning_rate': 4.7783e-05, 'epoch': 0.42, 'throughput': 9992.55} [INFO|2025-03-19 09:24:00] logging.py:143 >> {'loss': 0.6952, 'learning_rate': 4.7778e-05, 'epoch': 0.42, 'throughput': 9992.79} [INFO|2025-03-19 09:24:40] logging.py:143 >> {'loss': 0.6999, 'learning_rate': 4.7772e-05, 'epoch': 0.42, 'throughput': 9992.66} [INFO|2025-03-19 09:25:20] logging.py:143 >> {'loss': 0.7088, 'learning_rate': 4.7766e-05, 'epoch': 0.43, 'throughput': 9992.78} [INFO|2025-03-19 09:25:59] logging.py:143 >> {'loss': 0.7259, 'learning_rate': 4.7760e-05, 'epoch': 0.43, 'throughput': 9993.32} [INFO|2025-03-19 09:26:40] logging.py:143 >> {'loss': 0.7141, 'learning_rate': 4.7754e-05, 
'epoch': 0.43, 'throughput': 9993.23} [INFO|2025-03-19 09:27:22] logging.py:143 >> {'loss': 0.7080, 'learning_rate': 4.7749e-05, 'epoch': 0.43, 'throughput': 9993.21} [INFO|2025-03-19 09:28:02] logging.py:143 >> {'loss': 0.7356, 'learning_rate': 4.7743e-05, 'epoch': 0.43, 'throughput': 9992.97} [INFO|2025-03-19 09:28:44] logging.py:143 >> {'loss': 0.7502, 'learning_rate': 4.7737e-05, 'epoch': 0.43, 'throughput': 9992.73} [INFO|2025-03-19 09:29:25] logging.py:143 >> {'loss': 0.7179, 'learning_rate': 4.7731e-05, 'epoch': 0.43, 'throughput': 9992.66} [INFO|2025-03-19 09:30:04] logging.py:143 >> {'loss': 0.7242, 'learning_rate': 4.7725e-05, 'epoch': 0.43, 'throughput': 9992.85} [INFO|2025-03-19 09:30:44] logging.py:143 >> {'loss': 0.7133, 'learning_rate': 4.7720e-05, 'epoch': 0.43, 'throughput': 9993.25} [INFO|2025-03-19 09:31:24] logging.py:143 >> {'loss': 0.6971, 'learning_rate': 4.7714e-05, 'epoch': 0.43, 'throughput': 9993.18} [INFO|2025-03-19 09:32:04] logging.py:143 >> {'loss': 0.7181, 'learning_rate': 4.7708e-05, 'epoch': 0.43, 'throughput': 9993.12} [INFO|2025-03-19 09:32:43] logging.py:143 >> {'loss': 0.6904, 'learning_rate': 4.7702e-05, 'epoch': 0.43, 'throughput': 9993.24} [INFO|2025-03-19 09:33:24] logging.py:143 >> {'loss': 0.7355, 'learning_rate': 4.7696e-05, 'epoch': 0.43, 'throughput': 9993.37} [INFO|2025-03-19 09:34:04] logging.py:143 >> {'loss': 0.6855, 'learning_rate': 4.7690e-05, 'epoch': 0.43, 'throughput': 9993.71} [INFO|2025-03-19 09:34:45] logging.py:143 >> {'loss': 0.7060, 'learning_rate': 4.7684e-05, 'epoch': 0.43, 'throughput': 9993.18} [INFO|2025-03-19 09:35:25] logging.py:143 >> {'loss': 0.6703, 'learning_rate': 4.7679e-05, 'epoch': 0.43, 'throughput': 9993.27} [INFO|2025-03-19 09:36:04] logging.py:143 >> {'loss': 0.7445, 'learning_rate': 4.7673e-05, 'epoch': 0.43, 'throughput': 9993.83} [INFO|2025-03-19 09:36:43] logging.py:143 >> {'loss': 0.7067, 'learning_rate': 4.7667e-05, 'epoch': 0.43, 'throughput': 9993.99} [INFO|2025-03-19 09:37:23] 
logging.py:143 >> {'loss': 0.7020, 'learning_rate': 4.7661e-05, 'epoch': 0.43, 'throughput': 9993.82} [INFO|2025-03-19 09:38:04] logging.py:143 >> {'loss': 0.7332, 'learning_rate': 4.7655e-05, 'epoch': 0.44, 'throughput': 9993.46} [INFO|2025-03-19 09:38:45] logging.py:143 >> {'loss': 0.7201, 'learning_rate': 4.7649e-05, 'epoch': 0.44, 'throughput': 9993.26} [INFO|2025-03-19 09:39:26] logging.py:143 >> {'loss': 0.6808, 'learning_rate': 4.7643e-05, 'epoch': 0.44, 'throughput': 9993.09} [INFO|2025-03-19 09:40:07] logging.py:143 >> {'loss': 0.7363, 'learning_rate': 4.7637e-05, 'epoch': 0.44, 'throughput': 9993.19} [INFO|2025-03-19 09:40:47] logging.py:143 >> {'loss': 0.7210, 'learning_rate': 4.7631e-05, 'epoch': 0.44, 'throughput': 9993.19} [INFO|2025-03-19 09:41:27] logging.py:143 >> {'loss': 0.7433, 'learning_rate': 4.7625e-05, 'epoch': 0.44, 'throughput': 9993.27} [INFO|2025-03-19 09:42:07] logging.py:143 >> {'loss': 0.7036, 'learning_rate': 4.7619e-05, 'epoch': 0.44, 'throughput': 9993.42} [INFO|2025-03-19 09:42:48] logging.py:143 >> {'loss': 0.7017, 'learning_rate': 4.7613e-05, 'epoch': 0.44, 'throughput': 9993.23} [INFO|2025-03-19 09:43:28] logging.py:143 >> {'loss': 0.6965, 'learning_rate': 4.7607e-05, 'epoch': 0.44, 'throughput': 9993.42} [INFO|2025-03-19 09:44:07] logging.py:143 >> {'loss': 0.7150, 'learning_rate': 4.7601e-05, 'epoch': 0.44, 'throughput': 9993.87} [INFO|2025-03-19 09:44:47] logging.py:143 >> {'loss': 0.7344, 'learning_rate': 4.7595e-05, 'epoch': 0.44, 'throughput': 9993.94} [INFO|2025-03-19 09:45:27] logging.py:143 >> {'loss': 0.7403, 'learning_rate': 4.7589e-05, 'epoch': 0.44, 'throughput': 9994.15} [INFO|2025-03-19 09:46:08] logging.py:143 >> {'loss': 0.7162, 'learning_rate': 4.7583e-05, 'epoch': 0.44, 'throughput': 9994.26} [INFO|2025-03-19 09:46:48] logging.py:143 >> {'loss': 0.7305, 'learning_rate': 4.7577e-05, 'epoch': 0.44, 'throughput': 9994.10} [INFO|2025-03-19 09:47:28] logging.py:143 >> {'loss': 0.6905, 'learning_rate': 4.7571e-05, 
'epoch': 0.44, 'throughput': 9994.17} [INFO|2025-03-19 09:48:10] logging.py:143 >> {'loss': 0.7401, 'learning_rate': 4.7565e-05, 'epoch': 0.44, 'throughput': 9994.14} [INFO|2025-03-19 09:48:51] logging.py:143 >> {'loss': 0.7108, 'learning_rate': 4.7559e-05, 'epoch': 0.44, 'throughput': 9994.25} [INFO|2025-03-19 09:49:31] logging.py:143 >> {'loss': 0.7265, 'learning_rate': 4.7553e-05, 'epoch': 0.44, 'throughput': 9994.24} [INFO|2025-03-19 09:50:11] logging.py:143 >> {'loss': 0.6825, 'learning_rate': 4.7547e-05, 'epoch': 0.44, 'throughput': 9994.40} [INFO|2025-03-19 09:50:51] logging.py:143 >> {'loss': 0.7132, 'learning_rate': 4.7541e-05, 'epoch': 0.45, 'throughput': 9994.27} [INFO|2025-03-19 09:51:32] logging.py:143 >> {'loss': 0.7288, 'learning_rate': 4.7535e-05, 'epoch': 0.45, 'throughput': 9994.52} [INFO|2025-03-19 09:52:13] logging.py:143 >> {'loss': 0.7016, 'learning_rate': 4.7529e-05, 'epoch': 0.45, 'throughput': 9994.32} [INFO|2025-03-19 09:52:53] logging.py:143 >> {'loss': 0.7001, 'learning_rate': 4.7523e-05, 'epoch': 0.45, 'throughput': 9994.12} [INFO|2025-03-19 09:53:32] logging.py:143 >> {'loss': 0.7599, 'learning_rate': 4.7517e-05, 'epoch': 0.45, 'throughput': 9994.45} [INFO|2025-03-19 09:54:14] logging.py:143 >> {'loss': 0.6999, 'learning_rate': 4.7511e-05, 'epoch': 0.45, 'throughput': 9994.75} [INFO|2025-03-19 09:54:53] logging.py:143 >> {'loss': 0.7111, 'learning_rate': 4.7505e-05, 'epoch': 0.45, 'throughput': 9994.90} [INFO|2025-03-19 09:55:33] logging.py:143 >> {'loss': 0.7124, 'learning_rate': 4.7499e-05, 'epoch': 0.45, 'throughput': 9994.96} [INFO|2025-03-19 09:56:11] logging.py:143 >> {'loss': 0.6896, 'learning_rate': 4.7493e-05, 'epoch': 0.45, 'throughput': 9995.04} [INFO|2025-03-19 09:56:51] logging.py:143 >> {'loss': 0.7126, 'learning_rate': 4.7486e-05, 'epoch': 0.45, 'throughput': 9994.55} [INFO|2025-03-19 09:57:31] logging.py:143 >> {'loss': 0.7244, 'learning_rate': 4.7480e-05, 'epoch': 0.45, 'throughput': 9995.00} [INFO|2025-03-19 09:58:10] 
logging.py:143 >> {'loss': 0.7241, 'learning_rate': 4.7474e-05, 'epoch': 0.45, 'throughput': 9995.20} [INFO|2025-03-19 09:58:50] logging.py:143 >> {'loss': 0.6897, 'learning_rate': 4.7468e-05, 'epoch': 0.45, 'throughput': 9995.35} [INFO|2025-03-19 09:59:31] logging.py:143 >> {'loss': 0.7367, 'learning_rate': 4.7462e-05, 'epoch': 0.45, 'throughput': 9995.38} [INFO|2025-03-19 10:00:14] logging.py:143 >> {'loss': 0.7188, 'learning_rate': 4.7456e-05, 'epoch': 0.45, 'throughput': 9994.57} [INFO|2025-03-19 10:00:55] logging.py:143 >> {'loss': 0.7544, 'learning_rate': 4.7450e-05, 'epoch': 0.45, 'throughput': 9994.71} [INFO|2025-03-19 10:01:35] logging.py:143 >> {'loss': 0.7558, 'learning_rate': 4.7443e-05, 'epoch': 0.45, 'throughput': 9995.00} [INFO|2025-03-19 10:02:17] logging.py:143 >> {'loss': 0.7322, 'learning_rate': 4.7437e-05, 'epoch': 0.45, 'throughput': 9995.12} [INFO|2025-03-19 10:02:57] logging.py:143 >> {'loss': 0.7086, 'learning_rate': 4.7431e-05, 'epoch': 0.45, 'throughput': 9994.99} [INFO|2025-03-19 10:03:38] logging.py:143 >> {'loss': 0.7386, 'learning_rate': 4.7425e-05, 'epoch': 0.46, 'throughput': 9994.94} [INFO|2025-03-19 10:04:19] logging.py:143 >> {'loss': 0.7181, 'learning_rate': 4.7419e-05, 'epoch': 0.46, 'throughput': 9995.53} [INFO|2025-03-19 10:04:59] logging.py:143 >> {'loss': 0.7082, 'learning_rate': 4.7413e-05, 'epoch': 0.46, 'throughput': 9995.87} [INFO|2025-03-19 10:05:39] logging.py:143 >> {'loss': 0.7013, 'learning_rate': 4.7406e-05, 'epoch': 0.46, 'throughput': 9996.13} [INFO|2025-03-19 10:06:20] logging.py:143 >> {'loss': 0.7488, 'learning_rate': 4.7400e-05, 'epoch': 0.46, 'throughput': 9996.22} [INFO|2025-03-19 10:07:01] logging.py:143 >> {'loss': 0.7146, 'learning_rate': 4.7394e-05, 'epoch': 0.46, 'throughput': 9996.30} [INFO|2025-03-19 10:07:41] logging.py:143 >> {'loss': 0.7031, 'learning_rate': 4.7388e-05, 'epoch': 0.46, 'throughput': 9996.48} [INFO|2025-03-19 10:08:19] logging.py:143 >> {'loss': 0.7000, 'learning_rate': 4.7381e-05, 
'epoch': 0.46, 'throughput': 9996.58} [INFO|2025-03-19 10:09:02] logging.py:143 >> {'loss': 0.7492, 'learning_rate': 4.7375e-05, 'epoch': 0.46, 'throughput': 9996.42} [INFO|2025-03-19 10:09:43] logging.py:143 >> {'loss': 0.6952, 'learning_rate': 4.7369e-05, 'epoch': 0.46, 'throughput': 9996.40} [INFO|2025-03-19 10:10:23] logging.py:143 >> {'loss': 0.7322, 'learning_rate': 4.7363e-05, 'epoch': 0.46, 'throughput': 9996.79} [INFO|2025-03-19 10:11:03] logging.py:143 >> {'loss': 0.6980, 'learning_rate': 4.7356e-05, 'epoch': 0.46, 'throughput': 9996.92} [INFO|2025-03-19 10:11:43] logging.py:143 >> {'loss': 0.7128, 'learning_rate': 4.7350e-05, 'epoch': 0.46, 'throughput': 9997.35} [INFO|2025-03-19 10:12:22] logging.py:143 >> {'loss': 0.7154, 'learning_rate': 4.7344e-05, 'epoch': 0.46, 'throughput': 9997.65} [INFO|2025-03-19 10:13:02] logging.py:143 >> {'loss': 0.6837, 'learning_rate': 4.7338e-05, 'epoch': 0.46, 'throughput': 9997.57} [INFO|2025-03-19 10:13:43] logging.py:143 >> {'loss': 0.6833, 'learning_rate': 4.7331e-05, 'epoch': 0.46, 'throughput': 9997.30} [INFO|2025-03-19 10:14:22] logging.py:143 >> {'loss': 0.7140, 'learning_rate': 4.7325e-05, 'epoch': 0.46, 'throughput': 9997.68} [INFO|2025-03-19 10:15:04] logging.py:143 >> {'loss': 0.7338, 'learning_rate': 4.7319e-05, 'epoch': 0.46, 'throughput': 9997.59} [INFO|2025-03-19 10:15:44] logging.py:143 >> {'loss': 0.7229, 'learning_rate': 4.7312e-05, 'epoch': 0.46, 'throughput': 9997.54} [INFO|2025-03-19 10:16:25] logging.py:143 >> {'loss': 0.7007, 'learning_rate': 4.7306e-05, 'epoch': 0.47, 'throughput': 9997.71} [INFO|2025-03-19 10:17:06] logging.py:143 >> {'loss': 0.7137, 'learning_rate': 4.7300e-05, 'epoch': 0.47, 'throughput': 9997.58} [INFO|2025-03-19 10:17:46] logging.py:143 >> {'loss': 0.7416, 'learning_rate': 4.7293e-05, 'epoch': 0.47, 'throughput': 9998.03} [INFO|2025-03-19 10:18:26] logging.py:143 >> {'loss': 0.6830, 'learning_rate': 4.7287e-05, 'epoch': 0.47, 'throughput': 9997.99} [INFO|2025-03-19 10:19:05] 
logging.py:143 >> {'loss': 0.7062, 'learning_rate': 4.7281e-05, 'epoch': 0.47, 'throughput': 9998.03} [INFO|2025-03-19 10:19:46] logging.py:143 >> {'loss': 0.7269, 'learning_rate': 4.7274e-05, 'epoch': 0.47, 'throughput': 9997.91} [INFO|2025-03-19 10:20:26] logging.py:143 >> {'loss': 0.6996, 'learning_rate': 4.7268e-05, 'epoch': 0.47, 'throughput': 9998.09} [INFO|2025-03-19 10:21:06] logging.py:143 >> {'loss': 0.7286, 'learning_rate': 4.7262e-05, 'epoch': 0.47, 'throughput': 9998.38} [INFO|2025-03-19 10:21:47] logging.py:143 >> {'loss': 0.7230, 'learning_rate': 4.7255e-05, 'epoch': 0.47, 'throughput': 9998.21} [INFO|2025-03-19 10:22:29] logging.py:143 >> {'loss': 0.6676, 'learning_rate': 4.7249e-05, 'epoch': 0.47, 'throughput': 9998.41} [INFO|2025-03-19 10:23:09] logging.py:143 >> {'loss': 0.6742, 'learning_rate': 4.7243e-05, 'epoch': 0.47, 'throughput': 9998.10} [INFO|2025-03-19 10:23:49] logging.py:143 >> {'loss': 0.7263, 'learning_rate': 4.7236e-05, 'epoch': 0.47, 'throughput': 9998.35} [INFO|2025-03-19 10:24:30] logging.py:143 >> {'loss': 0.6742, 'learning_rate': 4.7230e-05, 'epoch': 0.47, 'throughput': 9998.50} [INFO|2025-03-19 10:25:10] logging.py:143 >> {'loss': 0.6858, 'learning_rate': 4.7223e-05, 'epoch': 0.47, 'throughput': 9998.57} [INFO|2025-03-19 10:25:51] logging.py:143 >> {'loss': 0.7205, 'learning_rate': 4.7217e-05, 'epoch': 0.47, 'throughput': 9998.42} [INFO|2025-03-19 10:26:32] logging.py:143 >> {'loss': 0.7201, 'learning_rate': 4.7211e-05, 'epoch': 0.47, 'throughput': 9998.30} [INFO|2025-03-19 10:27:13] logging.py:143 >> {'loss': 0.6870, 'learning_rate': 4.7204e-05, 'epoch': 0.47, 'throughput': 9998.10} [INFO|2025-03-19 10:27:53] logging.py:143 >> {'loss': 0.7114, 'learning_rate': 4.7198e-05, 'epoch': 0.47, 'throughput': 9998.26} [INFO|2025-03-19 10:28:34] logging.py:143 >> {'loss': 0.7068, 'learning_rate': 4.7191e-05, 'epoch': 0.48, 'throughput': 9998.29} [INFO|2025-03-19 10:29:14] logging.py:143 >> {'loss': 0.6771, 'learning_rate': 4.7185e-05, 
'epoch': 0.48, 'throughput': 9998.29} [INFO|2025-03-19 10:29:53] logging.py:143 >> {'loss': 0.6712, 'learning_rate': 4.7178e-05, 'epoch': 0.48, 'throughput': 9998.47} [INFO|2025-03-19 10:30:35] logging.py:143 >> {'loss': 0.6856, 'learning_rate': 4.7172e-05, 'epoch': 0.48, 'throughput': 9998.39} [INFO|2025-03-19 10:31:15] logging.py:143 >> {'loss': 0.6856, 'learning_rate': 4.7165e-05, 'epoch': 0.48, 'throughput': 9998.55} [INFO|2025-03-19 10:31:57] logging.py:143 >> {'loss': 0.7209, 'learning_rate': 4.7159e-05, 'epoch': 0.48, 'throughput': 9998.25} [INFO|2025-03-19 10:32:38] logging.py:143 >> {'loss': 0.6972, 'learning_rate': 4.7152e-05, 'epoch': 0.48, 'throughput': 9998.25} [INFO|2025-03-19 10:33:17] logging.py:143 >> {'loss': 0.6955, 'learning_rate': 4.7146e-05, 'epoch': 0.48, 'throughput': 9998.26} [INFO|2025-03-19 10:33:56] logging.py:143 >> {'loss': 0.7297, 'learning_rate': 4.7140e-05, 'epoch': 0.48, 'throughput': 9998.63} [INFO|2025-03-19 10:34:37] logging.py:143 >> {'loss': 0.6769, 'learning_rate': 4.7133e-05, 'epoch': 0.48, 'throughput': 9998.66} [INFO|2025-03-19 10:35:17] logging.py:143 >> {'loss': 0.6907, 'learning_rate': 4.7126e-05, 'epoch': 0.48, 'throughput': 9998.78} [INFO|2025-03-19 10:35:57] logging.py:143 >> {'loss': 0.6645, 'learning_rate': 4.7120e-05, 'epoch': 0.48, 'throughput': 9998.98} [INFO|2025-03-19 10:36:37] logging.py:143 >> {'loss': 0.7090, 'learning_rate': 4.7113e-05, 'epoch': 0.48, 'throughput': 9999.22} [INFO|2025-03-19 10:37:16] logging.py:143 >> {'loss': 0.7176, 'learning_rate': 4.7107e-05, 'epoch': 0.48, 'throughput': 9999.15} [INFO|2025-03-19 10:37:57] logging.py:143 >> {'loss': 0.7436, 'learning_rate': 4.7100e-05, 'epoch': 0.48, 'throughput': 9998.91} [INFO|2025-03-19 10:38:36] logging.py:143 >> {'loss': 0.6583, 'learning_rate': 4.7094e-05, 'epoch': 0.48, 'throughput': 9999.02} [INFO|2025-03-19 10:39:17] logging.py:143 >> {'loss': 0.7367, 'learning_rate': 4.7087e-05, 'epoch': 0.48, 'throughput': 9999.38} [INFO|2025-03-19 10:39:57] 
logging.py:143 >> {'loss': 0.7271, 'learning_rate': 4.7081e-05, 'epoch': 0.48, 'throughput': 9999.30} [INFO|2025-03-19 10:40:38] logging.py:143 >> {'loss': 0.7274, 'learning_rate': 4.7074e-05, 'epoch': 0.48, 'throughput': 9999.46} [INFO|2025-03-19 10:41:18] logging.py:143 >> {'loss': 0.6975, 'learning_rate': 4.7068e-05, 'epoch': 0.49, 'throughput': 9999.61} [INFO|2025-03-19 10:41:58] logging.py:143 >> {'loss': 0.6910, 'learning_rate': 4.7061e-05, 'epoch': 0.49, 'throughput': 9999.37} [INFO|2025-03-19 10:42:40] logging.py:143 >> {'loss': 0.6864, 'learning_rate': 4.7054e-05, 'epoch': 0.49, 'throughput': 9999.01} [INFO|2025-03-19 10:43:21] logging.py:143 >> {'loss': 0.6977, 'learning_rate': 4.7048e-05, 'epoch': 0.49, 'throughput': 9999.14} [INFO|2025-03-19 10:44:01] logging.py:143 >> {'loss': 0.6655, 'learning_rate': 4.7041e-05, 'epoch': 0.49, 'throughput': 9999.09} [INFO|2025-03-19 10:44:40] logging.py:143 >> {'loss': 0.6972, 'learning_rate': 4.7035e-05, 'epoch': 0.49, 'throughput': 9999.51} [INFO|2025-03-19 10:45:20] logging.py:143 >> {'loss': 0.7262, 'learning_rate': 4.7028e-05, 'epoch': 0.49, 'throughput': 9999.81} [INFO|2025-03-19 10:46:01] logging.py:143 >> {'loss': 0.6997, 'learning_rate': 4.7021e-05, 'epoch': 0.49, 'throughput': 9999.78} [INFO|2025-03-19 10:46:41] logging.py:143 >> {'loss': 0.7199, 'learning_rate': 4.7015e-05, 'epoch': 0.49, 'throughput': 9999.80} [INFO|2025-03-19 10:47:21] logging.py:143 >> {'loss': 0.7308, 'learning_rate': 4.7008e-05, 'epoch': 0.49, 'throughput': 10000.11} [INFO|2025-03-19 10:48:00] logging.py:143 >> {'loss': 0.6919, 'learning_rate': 4.7001e-05, 'epoch': 0.49, 'throughput': 10000.10} [INFO|2025-03-19 10:48:41] logging.py:143 >> {'loss': 0.7281, 'learning_rate': 4.6995e-05, 'epoch': 0.49, 'throughput': 9999.95} [INFO|2025-03-19 10:49:22] logging.py:143 >> {'loss': 0.7227, 'learning_rate': 4.6988e-05, 'epoch': 0.49, 'throughput': 9999.76} [INFO|2025-03-19 10:50:03] logging.py:143 >> {'loss': 0.7300, 'learning_rate': 
4.6982e-05, 'epoch': 0.49, 'throughput': 9999.66} [INFO|2025-03-19 10:50:44] logging.py:143 >> {'loss': 0.6972, 'learning_rate': 4.6975e-05, 'epoch': 0.49, 'throughput': 9999.44} [INFO|2025-03-19 10:51:24] logging.py:143 >> {'loss': 0.7460, 'learning_rate': 4.6968e-05, 'epoch': 0.49, 'throughput': 9999.59} [INFO|2025-03-19 10:52:05] logging.py:143 >> {'loss': 0.6793, 'learning_rate': 4.6961e-05, 'epoch': 0.49, 'throughput': 9999.56} [INFO|2025-03-19 10:52:45] logging.py:143 >> {'loss': 0.6738, 'learning_rate': 4.6955e-05, 'epoch': 0.49, 'throughput': 9999.60} [INFO|2025-03-19 10:53:26] logging.py:143 >> {'loss': 0.7534, 'learning_rate': 4.6948e-05, 'epoch': 0.49, 'throughput': 9999.54} [INFO|2025-03-19 10:54:07] logging.py:143 >> {'loss': 0.7120, 'learning_rate': 4.6941e-05, 'epoch': 0.50, 'throughput': 9999.38} [INFO|2025-03-19 10:54:46] logging.py:143 >> {'loss': 0.6842, 'learning_rate': 4.6935e-05, 'epoch': 0.50, 'throughput': 9999.44} [INFO|2025-03-19 10:55:26] logging.py:143 >> {'loss': 0.7098, 'learning_rate': 4.6928e-05, 'epoch': 0.50, 'throughput': 9999.42} [INFO|2025-03-19 10:56:06] logging.py:143 >> {'loss': 0.7031, 'learning_rate': 4.6921e-05, 'epoch': 0.50, 'throughput': 9999.78} [INFO|2025-03-19 10:56:46] logging.py:143 >> {'loss': 0.6955, 'learning_rate': 4.6915e-05, 'epoch': 0.50, 'throughput': 10000.00} [INFO|2025-03-19 10:57:24] logging.py:143 >> {'loss': 0.7168, 'learning_rate': 4.6908e-05, 'epoch': 0.50, 'throughput': 10000.37} [INFO|2025-03-19 10:58:04] logging.py:143 >> {'loss': 0.6956, 'learning_rate': 4.6901e-05, 'epoch': 0.50, 'throughput': 10000.28} [INFO|2025-03-19 10:58:44] logging.py:143 >> {'loss': 0.7243, 'learning_rate': 4.6894e-05, 'epoch': 0.50, 'throughput': 10000.52} [INFO|2025-03-19 10:59:25] logging.py:143 >> {'loss': 0.7165, 'learning_rate': 4.6888e-05, 'epoch': 0.50, 'throughput': 10000.57} [INFO|2025-03-19 11:00:06] logging.py:143 >> {'loss': 0.7088, 'learning_rate': 4.6881e-05, 'epoch': 0.50, 'throughput': 10000.57} 
[INFO|2025-03-19 11:00:47] logging.py:143 >> {'loss': 0.7244, 'learning_rate': 4.6874e-05, 'epoch': 0.50, 'throughput': 10000.53} [INFO|2025-03-19 11:01:28] logging.py:143 >> {'loss': 0.6925, 'learning_rate': 4.6867e-05, 'epoch': 0.50, 'throughput': 10000.65} [INFO|2025-03-19 11:02:07] logging.py:143 >> {'loss': 0.6663, 'learning_rate': 4.6860e-05, 'epoch': 0.50, 'throughput': 10000.72} [INFO|2025-03-19 11:02:49] logging.py:143 >> {'loss': 0.7028, 'learning_rate': 4.6854e-05, 'epoch': 0.50, 'throughput': 10000.37} [INFO|2025-03-19 11:03:29] logging.py:143 >> {'loss': 0.7117, 'learning_rate': 4.6847e-05, 'epoch': 0.50, 'throughput': 10000.41} [INFO|2025-03-19 11:04:10] logging.py:143 >> {'loss': 0.7525, 'learning_rate': 4.6840e-05, 'epoch': 0.50, 'throughput': 10000.43} [INFO|2025-03-19 11:04:49] logging.py:143 >> {'loss': 0.6841, 'learning_rate': 4.6833e-05, 'epoch': 0.50, 'throughput': 10000.54} [INFO|2025-03-19 11:05:29] logging.py:143 >> {'loss': 0.6654, 'learning_rate': 4.6826e-05, 'epoch': 0.50, 'throughput': 10000.45} [INFO|2025-03-19 11:06:10] logging.py:143 >> {'loss': 0.7175, 'learning_rate': 4.6820e-05, 'epoch': 0.50, 'throughput': 10000.33} [INFO|2025-03-19 11:06:51] logging.py:143 >> {'loss': 0.7078, 'learning_rate': 4.6813e-05, 'epoch': 0.51, 'throughput': 10000.41} [INFO|2025-03-19 11:07:33] logging.py:143 >> {'loss': 0.6907, 'learning_rate': 4.6806e-05, 'epoch': 0.51, 'throughput': 10000.07} [INFO|2025-03-19 11:08:14] logging.py:143 >> {'loss': 0.7446, 'learning_rate': 4.6799e-05, 'epoch': 0.51, 'throughput': 10000.17} [INFO|2025-03-19 11:08:54] logging.py:143 >> {'loss': 0.7069, 'learning_rate': 4.6792e-05, 'epoch': 0.51, 'throughput': 10000.24} [INFO|2025-03-19 11:09:34] logging.py:143 >> {'loss': 0.7325, 'learning_rate': 4.6785e-05, 'epoch': 0.51, 'throughput': 10000.52} [INFO|2025-03-19 11:10:15] logging.py:143 >> {'loss': 0.7053, 'learning_rate': 4.6778e-05, 'epoch': 0.51, 'throughput': 10000.19} [INFO|2025-03-19 11:10:56] logging.py:143 >> 
{'loss': 0.7137, 'learning_rate': 4.6772e-05, 'epoch': 0.51, 'throughput': 10000.21} [INFO|2025-03-19 11:11:36] logging.py:143 >> {'loss': 0.7056, 'learning_rate': 4.6765e-05, 'epoch': 0.51, 'throughput': 10000.32} [INFO|2025-03-19 11:12:17] logging.py:143 >> {'loss': 0.7354, 'learning_rate': 4.6758e-05, 'epoch': 0.51, 'throughput': 10000.28} [INFO|2025-03-19 11:12:57] logging.py:143 >> {'loss': 0.7054, 'learning_rate': 4.6751e-05, 'epoch': 0.51, 'throughput': 10000.16} [INFO|2025-03-19 11:13:37] logging.py:143 >> {'loss': 0.7068, 'learning_rate': 4.6744e-05, 'epoch': 0.51, 'throughput': 10000.16} [INFO|2025-03-19 11:14:18] logging.py:143 >> {'loss': 0.6978, 'learning_rate': 4.6737e-05, 'epoch': 0.51, 'throughput': 9999.92} [INFO|2025-03-19 11:15:00] logging.py:143 >> {'loss': 0.7071, 'learning_rate': 4.6730e-05, 'epoch': 0.51, 'throughput': 9999.86} [INFO|2025-03-19 11:15:40] logging.py:143 >> {'loss': 0.7043, 'learning_rate': 4.6723e-05, 'epoch': 0.51, 'throughput': 9999.66} [INFO|2025-03-19 11:16:20] logging.py:143 >> {'loss': 0.7031, 'learning_rate': 4.6716e-05, 'epoch': 0.51, 'throughput': 9999.96} [INFO|2025-03-19 11:17:01] logging.py:143 >> {'loss': 0.7132, 'learning_rate': 4.6709e-05, 'epoch': 0.51, 'throughput': 10000.13} [INFO|2025-03-19 11:17:41] logging.py:143 >> {'loss': 0.7381, 'learning_rate': 4.6702e-05, 'epoch': 0.51, 'throughput': 10000.15} [INFO|2025-03-19 11:18:20] logging.py:143 >> {'loss': 0.6980, 'learning_rate': 4.6696e-05, 'epoch': 0.51, 'throughput': 10000.44} [INFO|2025-03-19 11:19:01] logging.py:143 >> {'loss': 0.7229, 'learning_rate': 4.6689e-05, 'epoch': 0.51, 'throughput': 10000.47} [INFO|2025-03-19 11:19:41] logging.py:143 >> {'loss': 0.6958, 'learning_rate': 4.6682e-05, 'epoch': 0.52, 'throughput': 10000.54} [INFO|2025-03-19 11:20:20] logging.py:143 >> {'loss': 0.7031, 'learning_rate': 4.6675e-05, 'epoch': 0.52, 'throughput': 10000.76} [INFO|2025-03-19 11:21:01] logging.py:143 >> {'loss': 0.6924, 'learning_rate': 4.6668e-05, 
'epoch': 0.52, 'throughput': 10000.73} [INFO|2025-03-19 11:21:40] logging.py:143 >> {'loss': 0.6743, 'learning_rate': 4.6661e-05, 'epoch': 0.52, 'throughput': 10000.84} [INFO|2025-03-19 11:22:23] logging.py:143 >> {'loss': 0.6960, 'learning_rate': 4.6654e-05, 'epoch': 0.52, 'throughput': 10000.66} [INFO|2025-03-19 11:23:03] logging.py:143 >> {'loss': 0.6874, 'learning_rate': 4.6647e-05, 'epoch': 0.52, 'throughput': 10000.66} [INFO|2025-03-19 11:23:43] logging.py:143 >> {'loss': 0.7081, 'learning_rate': 4.6640e-05, 'epoch': 0.52, 'throughput': 10000.98} [INFO|2025-03-19 11:24:25] logging.py:143 >> {'loss': 0.7200, 'learning_rate': 4.6633e-05, 'epoch': 0.52, 'throughput': 10000.81} [INFO|2025-03-19 11:25:05] logging.py:143 >> {'loss': 0.7240, 'learning_rate': 4.6626e-05, 'epoch': 0.52, 'throughput': 10000.82} [INFO|2025-03-19 11:25:45] logging.py:143 >> {'loss': 0.6707, 'learning_rate': 4.6619e-05, 'epoch': 0.52, 'throughput': 10000.53} [INFO|2025-03-19 11:26:26] logging.py:143 >> {'loss': 0.7208, 'learning_rate': 4.6612e-05, 'epoch': 0.52, 'throughput': 10000.47} [INFO|2025-03-19 11:27:06] logging.py:143 >> {'loss': 0.7275, 'learning_rate': 4.6605e-05, 'epoch': 0.52, 'throughput': 10000.64} [INFO|2025-03-19 11:27:45] logging.py:143 >> {'loss': 0.7008, 'learning_rate': 4.6597e-05, 'epoch': 0.52, 'throughput': 10000.83} [INFO|2025-03-19 11:28:25] logging.py:143 >> {'loss': 0.7415, 'learning_rate': 4.6590e-05, 'epoch': 0.52, 'throughput': 10001.18} [INFO|2025-03-19 11:29:03] logging.py:143 >> {'loss': 0.6883, 'learning_rate': 4.6583e-05, 'epoch': 0.52, 'throughput': 10001.56} [INFO|2025-03-19 11:29:44] logging.py:143 >> {'loss': 0.6580, 'learning_rate': 4.6576e-05, 'epoch': 0.52, 'throughput': 10001.38} [INFO|2025-03-19 11:30:25] logging.py:143 >> {'loss': 0.7240, 'learning_rate': 4.6569e-05, 'epoch': 0.52, 'throughput': 10001.40} [INFO|2025-03-19 11:31:05] logging.py:143 >> {'loss': 0.6701, 'learning_rate': 4.6562e-05, 'epoch': 0.52, 'throughput': 10001.47} 
[INFO|2025-03-19 11:31:45] logging.py:143 >> {'loss': 0.6910, 'learning_rate': 4.6555e-05, 'epoch': 0.52, 'throughput': 10001.57} [INFO|2025-03-19 11:32:25] logging.py:143 >> {'loss': 0.6516, 'learning_rate': 4.6548e-05, 'epoch': 0.53, 'throughput': 10001.43} [INFO|2025-03-19 11:33:06] logging.py:143 >> {'loss': 0.6585, 'learning_rate': 4.6541e-05, 'epoch': 0.53, 'throughput': 10001.44} [INFO|2025-03-19 11:33:48] logging.py:143 >> {'loss': 0.6890, 'learning_rate': 4.6534e-05, 'epoch': 0.53, 'throughput': 10001.11} [INFO|2025-03-19 11:34:29] logging.py:143 >> {'loss': 0.6642, 'learning_rate': 4.6527e-05, 'epoch': 0.53, 'throughput': 10000.92} [INFO|2025-03-19 11:35:08] logging.py:143 >> {'loss': 0.7137, 'learning_rate': 4.6520e-05, 'epoch': 0.53, 'throughput': 10000.97} [INFO|2025-03-19 11:35:48] logging.py:143 >> {'loss': 0.7200, 'learning_rate': 4.6512e-05, 'epoch': 0.53, 'throughput': 10001.18} [INFO|2025-03-19 11:36:27] logging.py:143 >> {'loss': 0.7011, 'learning_rate': 4.6505e-05, 'epoch': 0.53, 'throughput': 10001.21} [INFO|2025-03-19 11:37:08] logging.py:143 >> {'loss': 0.6760, 'learning_rate': 4.6498e-05, 'epoch': 0.53, 'throughput': 10001.17} [INFO|2025-03-19 11:37:49] logging.py:143 >> {'loss': 0.6900, 'learning_rate': 4.6491e-05, 'epoch': 0.53, 'throughput': 10000.96} [INFO|2025-03-19 11:38:28] logging.py:143 >> {'loss': 0.7162, 'learning_rate': 4.6484e-05, 'epoch': 0.53, 'throughput': 10001.32} [INFO|2025-03-19 11:39:08] logging.py:143 >> {'loss': 0.7213, 'learning_rate': 4.6477e-05, 'epoch': 0.53, 'throughput': 10001.64} [INFO|2025-03-19 11:39:12] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-5000 [INFO|2025-03-19 11:39:12] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-5000/config.json [INFO|2025-03-19 11:39:12] configuration_utils.py:909 >> Configuration saved in 
/usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-5000/generation_config.json [INFO|2025-03-19 11:39:23] modeling_utils.py:3048 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-5000/model.safetensors.index.json. [INFO|2025-03-19 11:39:23] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-5000/tokenizer_config.json [INFO|2025-03-19 11:39:23] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-5000/special_tokens_map.json [INFO|2025-03-19 11:40:27] logging.py:143 >> {'loss': 0.6952, 'learning_rate': 4.6470e-05, 'epoch': 0.53, 'throughput': 9992.09} [INFO|2025-03-19 11:41:06] logging.py:143 >> {'loss': 0.6974, 'learning_rate': 4.6462e-05, 'epoch': 0.53, 'throughput': 9992.10} [INFO|2025-03-19 11:41:45] logging.py:143 >> {'loss': 0.6848, 'learning_rate': 4.6455e-05, 'epoch': 0.53, 'throughput': 9992.34} [INFO|2025-03-19 11:42:25] logging.py:143 >> {'loss': 0.6687, 'learning_rate': 4.6448e-05, 'epoch': 0.53, 'throughput': 9992.18} [INFO|2025-03-19 11:43:07] logging.py:143 >> {'loss': 0.6754, 'learning_rate': 4.6441e-05, 'epoch': 0.53, 'throughput': 9991.85} [INFO|2025-03-19 11:43:46] logging.py:143 >> {'loss': 0.6736, 'learning_rate': 4.6434e-05, 'epoch': 0.53, 'throughput': 9991.89} [INFO|2025-03-19 11:44:25] logging.py:143 >> {'loss': 0.6737, 'learning_rate': 4.6426e-05, 'epoch': 0.53, 'throughput': 9992.18} [INFO|2025-03-19 11:45:04] logging.py:143 >> {'loss': 0.6705, 'learning_rate': 4.6419e-05, 'epoch': 0.53, 'throughput': 9992.48} [INFO|2025-03-19 11:45:46] logging.py:143 >> {'loss': 0.7040, 'learning_rate': 4.6412e-05, 'epoch': 0.54, 'throughput': 9992.59} [INFO|2025-03-19 11:46:26] 
logging.py:143 >> {'loss': 0.7123, 'learning_rate': 4.6405e-05, 'epoch': 0.54, 'throughput': 9992.88} [INFO|2025-03-19 11:47:07] logging.py:143 >> {'loss': 0.6851, 'learning_rate': 4.6398e-05, 'epoch': 0.54, 'throughput': 9992.89} [INFO|2025-03-19 11:47:48] logging.py:143 >> {'loss': 0.6888, 'learning_rate': 4.6390e-05, 'epoch': 0.54, 'throughput': 9992.95} [INFO|2025-03-19 11:48:27] logging.py:143 >> {'loss': 0.6996, 'learning_rate': 4.6383e-05, 'epoch': 0.54, 'throughput': 9993.28} [INFO|2025-03-19 11:49:07] logging.py:143 >> {'loss': 0.7145, 'learning_rate': 4.6376e-05, 'epoch': 0.54, 'throughput': 9993.44} [INFO|2025-03-19 11:49:48] logging.py:143 >> {'loss': 0.6496, 'learning_rate': 4.6369e-05, 'epoch': 0.54, 'throughput': 9993.08} [INFO|2025-03-19 11:50:29] logging.py:143 >> {'loss': 0.6791, 'learning_rate': 4.6361e-05, 'epoch': 0.54, 'throughput': 9993.24} [INFO|2025-03-19 11:51:08] logging.py:143 >> {'loss': 0.7134, 'learning_rate': 4.6354e-05, 'epoch': 0.54, 'throughput': 9993.29} [INFO|2025-03-19 11:51:50] logging.py:143 >> {'loss': 0.7026, 'learning_rate': 4.6347e-05, 'epoch': 0.54, 'throughput': 9993.20} [INFO|2025-03-19 11:52:30] logging.py:143 >> {'loss': 0.7036, 'learning_rate': 4.6339e-05, 'epoch': 0.54, 'throughput': 9993.27} [INFO|2025-03-19 11:53:10] logging.py:143 >> {'loss': 0.6819, 'learning_rate': 4.6332e-05, 'epoch': 0.54, 'throughput': 9993.47} [INFO|2025-03-19 11:53:50] logging.py:143 >> {'loss': 0.6677, 'learning_rate': 4.6325e-05, 'epoch': 0.54, 'throughput': 9993.37} [INFO|2025-03-19 11:54:30] logging.py:143 >> {'loss': 0.6882, 'learning_rate': 4.6318e-05, 'epoch': 0.54, 'throughput': 9993.55} [INFO|2025-03-19 11:55:11] logging.py:143 >> {'loss': 0.7397, 'learning_rate': 4.6310e-05, 'epoch': 0.54, 'throughput': 9993.46} [INFO|2025-03-19 11:55:52] logging.py:143 >> {'loss': 0.6587, 'learning_rate': 4.6303e-05, 'epoch': 0.54, 'throughput': 9993.19} [INFO|2025-03-19 11:56:32] logging.py:143 >> {'loss': 0.6927, 'learning_rate': 4.6296e-05, 
'epoch': 0.54, 'throughput': 9993.35} [INFO|2025-03-19 11:57:13] logging.py:143 >> {'loss': 0.6858, 'learning_rate': 4.6288e-05, 'epoch': 0.54, 'throughput': 9993.47} [INFO|2025-03-19 11:57:53] logging.py:143 >> {'loss': 0.7117, 'learning_rate': 4.6281e-05, 'epoch': 0.55, 'throughput': 9993.64} [INFO|2025-03-19 11:58:33] logging.py:143 >> {'loss': 0.7005, 'learning_rate': 4.6274e-05, 'epoch': 0.55, 'throughput': 9993.69} [INFO|2025-03-19 11:59:16] logging.py:143 >> {'loss': 0.7025, 'learning_rate': 4.6266e-05, 'epoch': 0.55, 'throughput': 9993.53} [INFO|2025-03-19 11:59:58] logging.py:143 >> {'loss': 0.6722, 'learning_rate': 4.6259e-05, 'epoch': 0.55, 'throughput': 9993.55} [INFO|2025-03-19 12:00:38] logging.py:143 >> {'loss': 0.7266, 'learning_rate': 4.6251e-05, 'epoch': 0.55, 'throughput': 9993.62} [INFO|2025-03-19 12:01:17] logging.py:143 >> {'loss': 0.7105, 'learning_rate': 4.6244e-05, 'epoch': 0.55, 'throughput': 9994.08} [INFO|2025-03-19 12:01:58] logging.py:143 >> {'loss': 0.6867, 'learning_rate': 4.6237e-05, 'epoch': 0.55, 'throughput': 9994.10} [INFO|2025-03-19 12:02:38] logging.py:143 >> {'loss': 0.7070, 'learning_rate': 4.6229e-05, 'epoch': 0.55, 'throughput': 9994.00} [INFO|2025-03-19 12:03:19] logging.py:143 >> {'loss': 0.6726, 'learning_rate': 4.6222e-05, 'epoch': 0.55, 'throughput': 9993.76} [INFO|2025-03-19 12:04:00] logging.py:143 >> {'loss': 0.7100, 'learning_rate': 4.6215e-05, 'epoch': 0.55, 'throughput': 9993.53} [INFO|2025-03-19 12:04:42] logging.py:143 >> {'loss': 0.7107, 'learning_rate': 4.6207e-05, 'epoch': 0.55, 'throughput': 9993.55} [INFO|2025-03-19 12:05:22] logging.py:143 >> {'loss': 0.6731, 'learning_rate': 4.6200e-05, 'epoch': 0.55, 'throughput': 9993.29} [INFO|2025-03-19 12:06:04] logging.py:143 >> {'loss': 0.6810, 'learning_rate': 4.6192e-05, 'epoch': 0.55, 'throughput': 9993.36} [INFO|2025-03-19 12:06:42] logging.py:143 >> {'loss': 0.7058, 'learning_rate': 4.6185e-05, 'epoch': 0.55, 'throughput': 9993.77} [INFO|2025-03-19 12:07:21] 
logging.py:143 >> {'loss': 0.7049, 'learning_rate': 4.6177e-05, 'epoch': 0.55, 'throughput': 9993.87} [INFO|2025-03-19 12:08:01] logging.py:143 >> {'loss': 0.6709, 'learning_rate': 4.6170e-05, 'epoch': 0.55, 'throughput': 9993.75} [INFO|2025-03-19 12:08:41] logging.py:143 >> {'loss': 0.6788, 'learning_rate': 4.6163e-05, 'epoch': 0.55, 'throughput': 9994.02} [INFO|2025-03-19 12:09:22] logging.py:143 >> {'loss': 0.6621, 'learning_rate': 4.6155e-05, 'epoch': 0.55, 'throughput': 9993.94} [INFO|2025-03-19 12:10:03] logging.py:143 >> {'loss': 0.6675, 'learning_rate': 4.6148e-05, 'epoch': 0.55, 'throughput': 9993.88} [INFO|2025-03-19 12:10:44] logging.py:143 >> {'loss': 0.7198, 'learning_rate': 4.6140e-05, 'epoch': 0.56, 'throughput': 9993.80} [INFO|2025-03-19 12:11:25] logging.py:143 >> {'loss': 0.7163, 'learning_rate': 4.6133e-05, 'epoch': 0.56, 'throughput': 9993.98} [INFO|2025-03-19 12:12:05] logging.py:143 >> {'loss': 0.6864, 'learning_rate': 4.6125e-05, 'epoch': 0.56, 'throughput': 9994.16} [INFO|2025-03-19 12:12:45] logging.py:143 >> {'loss': 0.6795, 'learning_rate': 4.6118e-05, 'epoch': 0.56, 'throughput': 9994.34} [INFO|2025-03-19 12:13:24] logging.py:143 >> {'loss': 0.6539, 'learning_rate': 4.6110e-05, 'epoch': 0.56, 'throughput': 9994.63} [INFO|2025-03-19 12:14:04] logging.py:143 >> {'loss': 0.7013, 'learning_rate': 4.6103e-05, 'epoch': 0.56, 'throughput': 9994.68} [INFO|2025-03-19 12:14:44] logging.py:143 >> {'loss': 0.7047, 'learning_rate': 4.6095e-05, 'epoch': 0.56, 'throughput': 9994.84} [INFO|2025-03-19 12:15:25] logging.py:143 >> {'loss': 0.7209, 'learning_rate': 4.6088e-05, 'epoch': 0.56, 'throughput': 9994.70} [INFO|2025-03-19 12:16:06] logging.py:143 >> {'loss': 0.7307, 'learning_rate': 4.6080e-05, 'epoch': 0.56, 'throughput': 9995.10} [INFO|2025-03-19 12:16:47] logging.py:143 >> {'loss': 0.7081, 'learning_rate': 4.6073e-05, 'epoch': 0.56, 'throughput': 9995.03} [INFO|2025-03-19 12:17:28] logging.py:143 >> {'loss': 0.7059, 'learning_rate': 4.6065e-05, 
'epoch': 0.56, 'throughput': 9994.76} [INFO|2025-03-19 12:18:07] logging.py:143 >> {'loss': 0.7216, 'learning_rate': 4.6058e-05, 'epoch': 0.56, 'throughput': 9995.13} [INFO|2025-03-19 12:18:47] logging.py:143 >> {'loss': 0.6297, 'learning_rate': 4.6050e-05, 'epoch': 0.56, 'throughput': 9995.19} [INFO|2025-03-19 12:19:27] logging.py:143 >> {'loss': 0.6903, 'learning_rate': 4.6042e-05, 'epoch': 0.56, 'throughput': 9995.28} [INFO|2025-03-19 12:20:07] logging.py:143 >> {'loss': 0.7051, 'learning_rate': 4.6035e-05, 'epoch': 0.56, 'throughput': 9995.31} [INFO|2025-03-19 12:20:47] logging.py:143 >> {'loss': 0.7039, 'learning_rate': 4.6027e-05, 'epoch': 0.56, 'throughput': 9995.31} [INFO|2025-03-19 12:21:29] logging.py:143 >> {'loss': 0.7249, 'learning_rate': 4.6020e-05, 'epoch': 0.56, 'throughput': 9995.14} [INFO|2025-03-19 12:22:09] logging.py:143 >> {'loss': 0.6415, 'learning_rate': 4.6012e-05, 'epoch': 0.56, 'throughput': 9995.06} [INFO|2025-03-19 12:22:49] logging.py:143 >> {'loss': 0.6890, 'learning_rate': 4.6005e-05, 'epoch': 0.56, 'throughput': 9994.85} [INFO|2025-03-19 12:23:29] logging.py:143 >> {'loss': 0.6909, 'learning_rate': 4.5997e-05, 'epoch': 0.57, 'throughput': 9994.74} [INFO|2025-03-19 12:24:10] logging.py:143 >> {'loss': 0.7141, 'learning_rate': 4.5989e-05, 'epoch': 0.57, 'throughput': 9994.90} [INFO|2025-03-19 12:24:51] logging.py:143 >> {'loss': 0.6757, 'learning_rate': 4.5982e-05, 'epoch': 0.57, 'throughput': 9994.78} [INFO|2025-03-19 12:25:33] logging.py:143 >> {'loss': 0.7039, 'learning_rate': 4.5974e-05, 'epoch': 0.57, 'throughput': 9994.68} [INFO|2025-03-19 12:26:13] logging.py:143 >> {'loss': 0.7307, 'learning_rate': 4.5967e-05, 'epoch': 0.57, 'throughput': 9994.91} [INFO|2025-03-19 12:26:55] logging.py:143 >> {'loss': 0.6997, 'learning_rate': 4.5959e-05, 'epoch': 0.57, 'throughput': 9994.92} [INFO|2025-03-19 12:27:37] logging.py:143 >> {'loss': 0.6768, 'learning_rate': 4.5951e-05, 'epoch': 0.57, 'throughput': 9994.52} [INFO|2025-03-19 12:28:17] 
logging.py:143 >> {'loss': 0.6844, 'learning_rate': 4.5944e-05, 'epoch': 0.57, 'throughput': 9994.55} [INFO|2025-03-19 12:28:59] logging.py:143 >> {'loss': 0.6705, 'learning_rate': 4.5936e-05, 'epoch': 0.57, 'throughput': 9994.18} [INFO|2025-03-19 12:29:39] logging.py:143 >> {'loss': 0.6983, 'learning_rate': 4.5928e-05, 'epoch': 0.57, 'throughput': 9994.04} [INFO|2025-03-19 12:30:19] logging.py:143 >> {'loss': 0.7282, 'learning_rate': 4.5921e-05, 'epoch': 0.57, 'throughput': 9994.20} [INFO|2025-03-19 12:30:59] logging.py:143 >> {'loss': 0.6970, 'learning_rate': 4.5913e-05, 'epoch': 0.57, 'throughput': 9994.21} [INFO|2025-03-19 12:31:39] logging.py:143 >> {'loss': 0.6783, 'learning_rate': 4.5905e-05, 'epoch': 0.57, 'throughput': 9994.52} [INFO|2025-03-19 12:32:17] logging.py:143 >> {'loss': 0.6874, 'learning_rate': 4.5898e-05, 'epoch': 0.57, 'throughput': 9994.82} [INFO|2025-03-19 12:32:58] logging.py:143 >> {'loss': 0.7158, 'learning_rate': 4.5890e-05, 'epoch': 0.57, 'throughput': 9995.14} [INFO|2025-03-19 12:33:38] logging.py:143 >> {'loss': 0.6667, 'learning_rate': 4.5882e-05, 'epoch': 0.57, 'throughput': 9995.20} [INFO|2025-03-19 12:34:18] logging.py:143 >> {'loss': 0.6902, 'learning_rate': 4.5875e-05, 'epoch': 0.57, 'throughput': 9995.11} [INFO|2025-03-19 12:34:57] logging.py:143 >> {'loss': 0.6428, 'learning_rate': 4.5867e-05, 'epoch': 0.57, 'throughput': 9995.14} [INFO|2025-03-19 12:35:37] logging.py:143 >> {'loss': 0.6794, 'learning_rate': 4.5859e-05, 'epoch': 0.57, 'throughput': 9995.21} [INFO|2025-03-19 12:36:18] logging.py:143 >> {'loss': 0.6814, 'learning_rate': 4.5852e-05, 'epoch': 0.58, 'throughput': 9995.19} [INFO|2025-03-19 12:36:57] logging.py:143 >> {'loss': 0.6606, 'learning_rate': 4.5844e-05, 'epoch': 0.58, 'throughput': 9995.29} [INFO|2025-03-19 12:37:37] logging.py:143 >> {'loss': 0.7034, 'learning_rate': 4.5836e-05, 'epoch': 0.58, 'throughput': 9995.43} [INFO|2025-03-19 12:38:18] logging.py:143 >> {'loss': 0.6890, 'learning_rate': 4.5828e-05, 
'epoch': 0.58, 'throughput': 9995.44} [INFO|2025-03-19 12:39:00] logging.py:143 >> {'loss': 0.6991, 'learning_rate': 4.5821e-05, 'epoch': 0.58, 'throughput': 9995.22} [INFO|2025-03-19 12:39:40] logging.py:143 >> {'loss': 0.6955, 'learning_rate': 4.5813e-05, 'epoch': 0.58, 'throughput': 9995.24} [INFO|2025-03-19 12:40:22] logging.py:143 >> {'loss': 0.6960, 'learning_rate': 4.5805e-05, 'epoch': 0.58, 'throughput': 9995.24} [INFO|2025-03-19 12:41:01] logging.py:143 >> {'loss': 0.6992, 'learning_rate': 4.5797e-05, 'epoch': 0.58, 'throughput': 9995.67} [INFO|2025-03-19 12:41:43] logging.py:143 >> {'loss': 0.6873, 'learning_rate': 4.5790e-05, 'epoch': 0.58, 'throughput': 9995.75} [INFO|2025-03-19 12:42:23] logging.py:143 >> {'loss': 0.6923, 'learning_rate': 4.5782e-05, 'epoch': 0.58, 'throughput': 9995.85} [INFO|2025-03-19 12:43:03] logging.py:143 >> {'loss': 0.6871, 'learning_rate': 4.5774e-05, 'epoch': 0.58, 'throughput': 9995.98} [INFO|2025-03-19 12:43:44] logging.py:143 >> {'loss': 0.6775, 'learning_rate': 4.5766e-05, 'epoch': 0.58, 'throughput': 9995.99} [INFO|2025-03-19 12:44:24] logging.py:143 >> {'loss': 0.6957, 'learning_rate': 4.5758e-05, 'epoch': 0.58, 'throughput': 9996.17} [INFO|2025-03-19 12:45:05] logging.py:143 >> {'loss': 0.6886, 'learning_rate': 4.5751e-05, 'epoch': 0.58, 'throughput': 9996.21} [INFO|2025-03-19 12:45:45] logging.py:143 >> {'loss': 0.6912, 'learning_rate': 4.5743e-05, 'epoch': 0.58, 'throughput': 9996.34} [INFO|2025-03-19 12:46:25] logging.py:143 >> {'loss': 0.6771, 'learning_rate': 4.5735e-05, 'epoch': 0.58, 'throughput': 9996.55} [INFO|2025-03-19 12:47:05] logging.py:143 >> {'loss': 0.6975, 'learning_rate': 4.5727e-05, 'epoch': 0.58, 'throughput': 9996.59} [INFO|2025-03-19 12:47:45] logging.py:143 >> {'loss': 0.6726, 'learning_rate': 4.5719e-05, 'epoch': 0.58, 'throughput': 9996.59} [INFO|2025-03-19 12:48:24] logging.py:143 >> {'loss': 0.6671, 'learning_rate': 4.5712e-05, 'epoch': 0.58, 'throughput': 9996.59} [INFO|2025-03-19 12:49:05] 
logging.py:143 >> {'loss': 0.6792, 'learning_rate': 4.5704e-05, 'epoch': 0.59, 'throughput': 9996.44} [INFO|2025-03-19 12:49:44] logging.py:143 >> {'loss': 0.6876, 'learning_rate': 4.5696e-05, 'epoch': 0.59, 'throughput': 9996.69} [INFO|2025-03-19 12:50:25] logging.py:143 >> {'loss': 0.6764, 'learning_rate': 4.5688e-05, 'epoch': 0.59, 'throughput': 9996.83} [INFO|2025-03-19 12:51:05] logging.py:143 >> {'loss': 0.6866, 'learning_rate': 4.5680e-05, 'epoch': 0.59, 'throughput': 9996.99} [INFO|2025-03-19 12:51:46] logging.py:143 >> {'loss': 0.6954, 'learning_rate': 4.5672e-05, 'epoch': 0.59, 'throughput': 9996.94} [INFO|2025-03-19 12:52:27] logging.py:143 >> {'loss': 0.6755, 'learning_rate': 4.5664e-05, 'epoch': 0.59, 'throughput': 9996.82} [INFO|2025-03-19 12:53:06] logging.py:143 >> {'loss': 0.7207, 'learning_rate': 4.5656e-05, 'epoch': 0.59, 'throughput': 9996.96} [INFO|2025-03-19 12:53:47] logging.py:143 >> {'loss': 0.6938, 'learning_rate': 4.5649e-05, 'epoch': 0.59, 'throughput': 9996.92} [INFO|2025-03-19 12:54:28] logging.py:143 >> {'loss': 0.6539, 'learning_rate': 4.5641e-05, 'epoch': 0.59, 'throughput': 9997.02} [INFO|2025-03-19 12:55:08] logging.py:143 >> {'loss': 0.6929, 'learning_rate': 4.5633e-05, 'epoch': 0.59, 'throughput': 9997.00} [INFO|2025-03-19 12:55:46] logging.py:143 >> {'loss': 0.7147, 'learning_rate': 4.5625e-05, 'epoch': 0.59, 'throughput': 9997.40} [INFO|2025-03-19 12:56:26] logging.py:143 >> {'loss': 0.6609, 'learning_rate': 4.5617e-05, 'epoch': 0.59, 'throughput': 9997.66} [INFO|2025-03-19 12:57:06] logging.py:143 >> {'loss': 0.6513, 'learning_rate': 4.5609e-05, 'epoch': 0.59, 'throughput': 9997.48} [INFO|2025-03-19 12:57:47] logging.py:143 >> {'loss': 0.6935, 'learning_rate': 4.5601e-05, 'epoch': 0.59, 'throughput': 9997.66} [INFO|2025-03-19 12:58:28] logging.py:143 >> {'loss': 0.6691, 'learning_rate': 4.5593e-05, 'epoch': 0.59, 'throughput': 9997.36} [INFO|2025-03-19 12:59:09] logging.py:143 >> {'loss': 0.6364, 'learning_rate': 4.5585e-05, 
'epoch': 0.59, 'throughput': 9997.03} [INFO|2025-03-19 12:59:51] logging.py:143 >> {'loss': 0.6628, 'learning_rate': 4.5577e-05, 'epoch': 0.59, 'throughput': 9996.71} [INFO|2025-03-19 13:00:33] logging.py:143 >> {'loss': 0.6994, 'learning_rate': 4.5569e-05, 'epoch': 0.59, 'throughput': 9996.43} [INFO|2025-03-19 13:01:15] logging.py:143 >> {'loss': 0.6400, 'learning_rate': 4.5561e-05, 'epoch': 0.59, 'throughput': 9996.12} [INFO|2025-03-19 13:01:56] logging.py:143 >> {'loss': 0.6942, 'learning_rate': 4.5553e-05, 'epoch': 0.60, 'throughput': 9996.15} [INFO|2025-03-19 13:02:36] logging.py:143 >> {'loss': 0.6865, 'learning_rate': 4.5546e-05, 'epoch': 0.60, 'throughput': 9995.93} [INFO|2025-03-19 13:03:17] logging.py:143 >> {'loss': 0.7094, 'learning_rate': 4.5538e-05, 'epoch': 0.60, 'throughput': 9996.00} [INFO|2025-03-19 13:03:56] logging.py:143 >> {'loss': 0.6645, 'learning_rate': 4.5530e-05, 'epoch': 0.60, 'throughput': 9996.08} [INFO|2025-03-19 13:04:38] logging.py:143 >> {'loss': 0.6957, 'learning_rate': 4.5522e-05, 'epoch': 0.60, 'throughput': 9996.00} [INFO|2025-03-19 13:05:18] logging.py:143 >> {'loss': 0.7102, 'learning_rate': 4.5514e-05, 'epoch': 0.60, 'throughput': 9996.17} [INFO|2025-03-19 13:05:59] logging.py:143 >> {'loss': 0.7133, 'learning_rate': 4.5506e-05, 'epoch': 0.60, 'throughput': 9996.08} [INFO|2025-03-19 13:06:37] logging.py:143 >> {'loss': 0.7056, 'learning_rate': 4.5498e-05, 'epoch': 0.60, 'throughput': 9996.39} [INFO|2025-03-19 13:07:17] logging.py:143 >> {'loss': 0.6675, 'learning_rate': 4.5490e-05, 'epoch': 0.60, 'throughput': 9996.15} [INFO|2025-03-19 13:07:57] logging.py:143 >> {'loss': 0.6994, 'learning_rate': 4.5482e-05, 'epoch': 0.60, 'throughput': 9996.24} [INFO|2025-03-19 13:08:37] logging.py:143 >> {'loss': 0.6443, 'learning_rate': 4.5473e-05, 'epoch': 0.60, 'throughput': 9996.29} [INFO|2025-03-19 13:09:19] logging.py:143 >> {'loss': 0.7034, 'learning_rate': 4.5465e-05, 'epoch': 0.60, 'throughput': 9996.08} [INFO|2025-03-19 13:10:00] 
logging.py:143 >> {'loss': 0.6780, 'learning_rate': 4.5457e-05, 'epoch': 0.60, 'throughput': 9996.13} [INFO|2025-03-19 13:10:40] logging.py:143 >> {'loss': 0.7566, 'learning_rate': 4.5449e-05, 'epoch': 0.60, 'throughput': 9996.20} [INFO|2025-03-19 13:11:19] logging.py:143 >> {'loss': 0.6577, 'learning_rate': 4.5441e-05, 'epoch': 0.60, 'throughput': 9996.24} [INFO|2025-03-19 13:12:01] logging.py:143 >> {'loss': 0.7026, 'learning_rate': 4.5433e-05, 'epoch': 0.60, 'throughput': 9996.01} [INFO|2025-03-19 13:12:42] logging.py:143 >> {'loss': 0.6879, 'learning_rate': 4.5425e-05, 'epoch': 0.60, 'throughput': 9995.98} [INFO|2025-03-19 13:13:23] logging.py:143 >> {'loss': 0.6781, 'learning_rate': 4.5417e-05, 'epoch': 0.60, 'throughput': 9995.86} [INFO|2025-03-19 13:14:03] logging.py:143 >> {'loss': 0.6582, 'learning_rate': 4.5409e-05, 'epoch': 0.61, 'throughput': 9995.83} [INFO|2025-03-19 13:14:44] logging.py:143 >> {'loss': 0.6822, 'learning_rate': 4.5401e-05, 'epoch': 0.61, 'throughput': 9995.75} [INFO|2025-03-19 13:15:24] logging.py:143 >> {'loss': 0.6886, 'learning_rate': 4.5393e-05, 'epoch': 0.61, 'throughput': 9995.66} [INFO|2025-03-19 13:16:04] logging.py:143 >> {'loss': 0.7199, 'learning_rate': 4.5385e-05, 'epoch': 0.61, 'throughput': 9995.83} [INFO|2025-03-19 13:16:43] logging.py:143 >> {'loss': 0.6727, 'learning_rate': 4.5377e-05, 'epoch': 0.61, 'throughput': 9995.79} [INFO|2025-03-19 13:17:24] logging.py:143 >> {'loss': 0.7440, 'learning_rate': 4.5369e-05, 'epoch': 0.61, 'throughput': 9996.14} [INFO|2025-03-19 13:18:03] logging.py:143 >> {'loss': 0.6974, 'learning_rate': 4.5360e-05, 'epoch': 0.61, 'throughput': 9996.42} [INFO|2025-03-19 13:18:44] logging.py:143 >> {'loss': 0.7076, 'learning_rate': 4.5352e-05, 'epoch': 0.61, 'throughput': 9996.29} [INFO|2025-03-19 13:19:24] logging.py:143 >> {'loss': 0.6887, 'learning_rate': 4.5344e-05, 'epoch': 0.61, 'throughput': 9996.42} [INFO|2025-03-19 13:20:04] logging.py:143 >> {'loss': 0.6836, 'learning_rate': 4.5336e-05, 
'epoch': 0.61, 'throughput': 9996.37} [INFO|2025-03-19 13:20:45] logging.py:143 >> {'loss': 0.6835, 'learning_rate': 4.5328e-05, 'epoch': 0.61, 'throughput': 9996.46} [INFO|2025-03-19 13:21:27] logging.py:143 >> {'loss': 0.6405, 'learning_rate': 4.5320e-05, 'epoch': 0.61, 'throughput': 9996.38} [INFO|2025-03-19 13:22:08] logging.py:143 >> {'loss': 0.6778, 'learning_rate': 4.5312e-05, 'epoch': 0.61, 'throughput': 9996.51} [INFO|2025-03-19 13:22:47] logging.py:143 >> {'loss': 0.7232, 'learning_rate': 4.5303e-05, 'epoch': 0.61, 'throughput': 9996.64} [INFO|2025-03-19 13:23:27] logging.py:143 >> {'loss': 0.6705, 'learning_rate': 4.5295e-05, 'epoch': 0.61, 'throughput': 9996.82} [INFO|2025-03-19 13:24:09] logging.py:143 >> {'loss': 0.6443, 'learning_rate': 4.5287e-05, 'epoch': 0.61, 'throughput': 9996.91} [INFO|2025-03-19 13:24:49] logging.py:143 >> {'loss': 0.6701, 'learning_rate': 4.5279e-05, 'epoch': 0.61, 'throughput': 9996.70} [INFO|2025-03-19 13:25:30] logging.py:143 >> {'loss': 0.7047, 'learning_rate': 4.5271e-05, 'epoch': 0.61, 'throughput': 9996.51} [INFO|2025-03-19 13:26:10] logging.py:143 >> {'loss': 0.6811, 'learning_rate': 4.5263e-05, 'epoch': 0.61, 'throughput': 9996.61} [INFO|2025-03-19 13:26:50] logging.py:143 >> {'loss': 0.7161, 'learning_rate': 4.5254e-05, 'epoch': 0.62, 'throughput': 9996.88} [INFO|2025-03-19 13:27:31] logging.py:143 >> {'loss': 0.6897, 'learning_rate': 4.5246e-05, 'epoch': 0.62, 'throughput': 9997.09} [INFO|2025-03-19 13:28:10] logging.py:143 >> {'loss': 0.6702, 'learning_rate': 4.5238e-05, 'epoch': 0.62, 'throughput': 9997.32} [INFO|2025-03-19 13:28:49] logging.py:143 >> {'loss': 0.6420, 'learning_rate': 4.5230e-05, 'epoch': 0.62, 'throughput': 9997.28} [INFO|2025-03-19 13:29:30] logging.py:143 >> {'loss': 0.6659, 'learning_rate': 4.5221e-05, 'epoch': 0.62, 'throughput': 9997.35} [INFO|2025-03-19 13:30:11] logging.py:143 >> {'loss': 0.6814, 'learning_rate': 4.5213e-05, 'epoch': 0.62, 'throughput': 9997.48} [INFO|2025-03-19 13:30:53] 
logging.py:143 >> {'loss': 0.6932, 'learning_rate': 4.5205e-05, 'epoch': 0.62, 'throughput': 9997.33} [INFO|2025-03-19 13:31:33] logging.py:143 >> {'loss': 0.6737, 'learning_rate': 4.5197e-05, 'epoch': 0.62, 'throughput': 9997.28} [INFO|2025-03-19 13:32:14] logging.py:143 >> {'loss': 0.6968, 'learning_rate': 4.5189e-05, 'epoch': 0.62, 'throughput': 9997.37} [INFO|2025-03-19 13:32:55] logging.py:143 >> {'loss': 0.6795, 'learning_rate': 4.5180e-05, 'epoch': 0.62, 'throughput': 9997.12} [INFO|2025-03-19 13:33:36] logging.py:143 >> {'loss': 0.6578, 'learning_rate': 4.5172e-05, 'epoch': 0.62, 'throughput': 9996.89} [INFO|2025-03-19 13:34:16] logging.py:143 >> {'loss': 0.6620, 'learning_rate': 4.5164e-05, 'epoch': 0.62, 'throughput': 9997.00} [INFO|2025-03-19 13:34:56] logging.py:143 >> {'loss': 0.6946, 'learning_rate': 4.5155e-05, 'epoch': 0.62, 'throughput': 9997.16} [INFO|2025-03-19 13:35:37] logging.py:143 >> {'loss': 0.6538, 'learning_rate': 4.5147e-05, 'epoch': 0.62, 'throughput': 9997.06} [INFO|2025-03-19 13:36:17] logging.py:143 >> {'loss': 0.6954, 'learning_rate': 4.5139e-05, 'epoch': 0.62, 'throughput': 9997.23} [INFO|2025-03-19 13:36:58] logging.py:143 >> {'loss': 0.6720, 'learning_rate': 4.5131e-05, 'epoch': 0.62, 'throughput': 9997.20} [INFO|2025-03-19 13:37:40] logging.py:143 >> {'loss': 0.6626, 'learning_rate': 4.5122e-05, 'epoch': 0.62, 'throughput': 9997.07} [INFO|2025-03-19 13:38:22] logging.py:143 >> {'loss': 0.6694, 'learning_rate': 4.5114e-05, 'epoch': 0.62, 'throughput': 9996.92} [INFO|2025-03-19 13:39:02] logging.py:143 >> {'loss': 0.6862, 'learning_rate': 4.5106e-05, 'epoch': 0.62, 'throughput': 9996.91} [INFO|2025-03-19 13:39:42] logging.py:143 >> {'loss': 0.6729, 'learning_rate': 4.5097e-05, 'epoch': 0.63, 'throughput': 9996.95} [INFO|2025-03-19 13:40:23] logging.py:143 >> {'loss': 0.6507, 'learning_rate': 4.5089e-05, 'epoch': 0.63, 'throughput': 9996.75} [INFO|2025-03-19 13:41:03] logging.py:143 >> {'loss': 0.6744, 'learning_rate': 4.5081e-05, 
'epoch': 0.63, 'throughput': 9996.73} [INFO|2025-03-19 13:41:44] logging.py:143 >> {'loss': 0.7022, 'learning_rate': 4.5072e-05, 'epoch': 0.63, 'throughput': 9996.73} [INFO|2025-03-19 13:42:24] logging.py:143 >> {'loss': 0.6976, 'learning_rate': 4.5064e-05, 'epoch': 0.63, 'throughput': 9996.79} [INFO|2025-03-19 13:43:04] logging.py:143 >> {'loss': 0.6959, 'learning_rate': 4.5056e-05, 'epoch': 0.63, 'throughput': 9996.78} [INFO|2025-03-19 13:43:46] logging.py:143 >> {'loss': 0.6672, 'learning_rate': 4.5047e-05, 'epoch': 0.63, 'throughput': 9996.59} [INFO|2025-03-19 13:44:26] logging.py:143 >> {'loss': 0.6653, 'learning_rate': 4.5039e-05, 'epoch': 0.63, 'throughput': 9996.73} [INFO|2025-03-19 13:45:08] logging.py:143 >> {'loss': 0.6830, 'learning_rate': 4.5031e-05, 'epoch': 0.63, 'throughput': 9996.65} [INFO|2025-03-19 13:45:47] logging.py:143 >> {'loss': 0.6689, 'learning_rate': 4.5022e-05, 'epoch': 0.63, 'throughput': 9996.74} [INFO|2025-03-19 13:46:28] logging.py:143 >> {'loss': 0.6983, 'learning_rate': 4.5014e-05, 'epoch': 0.63, 'throughput': 9996.72} [INFO|2025-03-19 13:47:08] logging.py:143 >> {'loss': 0.6623, 'learning_rate': 4.5005e-05, 'epoch': 0.63, 'throughput': 9996.64} [INFO|2025-03-19 13:47:48] logging.py:143 >> {'loss': 0.6634, 'learning_rate': 4.4997e-05, 'epoch': 0.63, 'throughput': 9996.52} [INFO|2025-03-19 13:48:28] logging.py:143 >> {'loss': 0.7305, 'learning_rate': 4.4989e-05, 'epoch': 0.63, 'throughput': 9996.83} [INFO|2025-03-19 13:49:08] logging.py:143 >> {'loss': 0.6672, 'learning_rate': 4.4980e-05, 'epoch': 0.63, 'throughput': 9996.83} [INFO|2025-03-19 13:49:49] logging.py:143 >> {'loss': 0.6762, 'learning_rate': 4.4972e-05, 'epoch': 0.63, 'throughput': 9997.02} [INFO|2025-03-19 13:50:31] logging.py:143 >> {'loss': 0.6672, 'learning_rate': 4.4963e-05, 'epoch': 0.63, 'throughput': 9996.94} [INFO|2025-03-19 13:51:11] logging.py:143 >> {'loss': 0.6883, 'learning_rate': 4.4955e-05, 'epoch': 0.63, 'throughput': 9997.11} [INFO|2025-03-19 13:51:52] 
logging.py:143 >> {'loss': 0.6483, 'learning_rate': 4.4947e-05, 'epoch': 0.63, 'throughput': 9997.25} [INFO|2025-03-19 13:52:31] logging.py:143 >> {'loss': 0.6656, 'learning_rate': 4.4938e-05, 'epoch': 0.64, 'throughput': 9997.26} [INFO|2025-03-19 13:53:10] logging.py:143 >> {'loss': 0.6878, 'learning_rate': 4.4930e-05, 'epoch': 0.64, 'throughput': 9997.65} [INFO|2025-03-19 13:53:51] logging.py:143 >> {'loss': 0.6910, 'learning_rate': 4.4921e-05, 'epoch': 0.64, 'throughput': 9997.63} [INFO|2025-03-19 13:54:32] logging.py:143 >> {'loss': 0.6857, 'learning_rate': 4.4913e-05, 'epoch': 0.64, 'throughput': 9997.47} [INFO|2025-03-19 13:55:13] logging.py:143 >> {'loss': 0.6637, 'learning_rate': 4.4904e-05, 'epoch': 0.64, 'throughput': 9997.51} [INFO|2025-03-19 13:55:54] logging.py:143 >> {'loss': 0.6711, 'learning_rate': 4.4896e-05, 'epoch': 0.64, 'throughput': 9997.48} [INFO|2025-03-19 13:56:33] logging.py:143 >> {'loss': 0.6946, 'learning_rate': 4.4887e-05, 'epoch': 0.64, 'throughput': 9997.53} [INFO|2025-03-19 13:57:13] logging.py:143 >> {'loss': 0.6673, 'learning_rate': 4.4879e-05, 'epoch': 0.64, 'throughput': 9997.44} [INFO|2025-03-19 13:57:56] logging.py:143 >> {'loss': 0.6850, 'learning_rate': 4.4870e-05, 'epoch': 0.64, 'throughput': 9997.18} [INFO|2025-03-19 13:58:36] logging.py:143 >> {'loss': 0.6969, 'learning_rate': 4.4862e-05, 'epoch': 0.64, 'throughput': 9997.44} [INFO|2025-03-19 13:59:16] logging.py:143 >> {'loss': 0.6803, 'learning_rate': 4.4853e-05, 'epoch': 0.64, 'throughput': 9997.76} [INFO|2025-03-19 13:59:55] logging.py:143 >> {'loss': 0.6760, 'learning_rate': 4.4845e-05, 'epoch': 0.64, 'throughput': 9997.77} [INFO|2025-03-19 14:00:37] logging.py:143 >> {'loss': 0.6893, 'learning_rate': 4.4836e-05, 'epoch': 0.64, 'throughput': 9997.66} [INFO|2025-03-19 14:01:17] logging.py:143 >> {'loss': 0.6761, 'learning_rate': 4.4828e-05, 'epoch': 0.64, 'throughput': 9997.64} [INFO|2025-03-19 14:01:57] logging.py:143 >> {'loss': 0.6748, 'learning_rate': 4.4819e-05, 
'epoch': 0.64, 'throughput': 9997.44} [INFO|2025-03-19 14:02:38] logging.py:143 >> {'loss': 0.7301, 'learning_rate': 4.4811e-05, 'epoch': 0.64, 'throughput': 9997.42} [INFO|2025-03-19 14:03:18] logging.py:143 >> {'loss': 0.6641, 'learning_rate': 4.4802e-05, 'epoch': 0.64, 'throughput': 9997.52} [INFO|2025-03-19 14:03:59] logging.py:143 >> {'loss': 0.6741, 'learning_rate': 4.4794e-05, 'epoch': 0.64, 'throughput': 9997.29} [INFO|2025-03-19 14:04:40] logging.py:143 >> {'loss': 0.6779, 'learning_rate': 4.4785e-05, 'epoch': 0.64, 'throughput': 9997.45} [INFO|2025-03-19 14:05:21] logging.py:143 >> {'loss': 0.6943, 'learning_rate': 4.4777e-05, 'epoch': 0.65, 'throughput': 9997.40} [INFO|2025-03-19 14:06:01] logging.py:143 >> {'loss': 0.7021, 'learning_rate': 4.4768e-05, 'epoch': 0.65, 'throughput': 9997.39} [INFO|2025-03-19 14:06:42] logging.py:143 >> {'loss': 0.7256, 'learning_rate': 4.4759e-05, 'epoch': 0.65, 'throughput': 9997.41} [INFO|2025-03-19 14:07:23] logging.py:143 >> {'loss': 0.6846, 'learning_rate': 4.4751e-05, 'epoch': 0.65, 'throughput': 9997.45} [INFO|2025-03-19 14:08:04] logging.py:143 >> {'loss': 0.6812, 'learning_rate': 4.4742e-05, 'epoch': 0.65, 'throughput': 9997.62} [INFO|2025-03-19 14:08:44] logging.py:143 >> {'loss': 0.6711, 'learning_rate': 4.4734e-05, 'epoch': 0.65, 'throughput': 9997.53} [INFO|2025-03-19 14:09:26] logging.py:143 >> {'loss': 0.6652, 'learning_rate': 4.4725e-05, 'epoch': 0.65, 'throughput': 9997.40} [INFO|2025-03-19 14:10:07] logging.py:143 >> {'loss': 0.6402, 'learning_rate': 4.4716e-05, 'epoch': 0.65, 'throughput': 9997.40} [INFO|2025-03-19 14:10:47] logging.py:143 >> {'loss': 0.6914, 'learning_rate': 4.4708e-05, 'epoch': 0.65, 'throughput': 9997.29} [INFO|2025-03-19 14:11:26] logging.py:143 >> {'loss': 0.6733, 'learning_rate': 4.4699e-05, 'epoch': 0.65, 'throughput': 9997.43} [INFO|2025-03-19 14:12:07] logging.py:143 >> {'loss': 0.6660, 'learning_rate': 4.4691e-05, 'epoch': 0.65, 'throughput': 9997.57} [INFO|2025-03-19 14:12:48] 
logging.py:143 >> {'loss': 0.6556, 'learning_rate': 4.4682e-05, 'epoch': 0.65, 'throughput': 9997.56} [INFO|2025-03-19 14:13:28] logging.py:143 >> {'loss': 0.6916, 'learning_rate': 4.4673e-05, 'epoch': 0.65, 'throughput': 9997.70} [INFO|2025-03-19 14:14:09] logging.py:143 >> {'loss': 0.6743, 'learning_rate': 4.4665e-05, 'epoch': 0.65, 'throughput': 9997.60} [INFO|2025-03-19 14:14:50] logging.py:143 >> {'loss': 0.7112, 'learning_rate': 4.4656e-05, 'epoch': 0.65, 'throughput': 9997.52} [INFO|2025-03-19 14:15:30] logging.py:143 >> {'loss': 0.6636, 'learning_rate': 4.4647e-05, 'epoch': 0.65, 'throughput': 9997.40} [INFO|2025-03-19 14:16:11] logging.py:143 >> {'loss': 0.6893, 'learning_rate': 4.4639e-05, 'epoch': 0.65, 'throughput': 9997.39} [INFO|2025-03-19 14:16:50] logging.py:143 >> {'loss': 0.7056, 'learning_rate': 4.4630e-05, 'epoch': 0.65, 'throughput': 9997.53} [INFO|2025-03-19 14:17:31] logging.py:143 >> {'loss': 0.6781, 'learning_rate': 4.4621e-05, 'epoch': 0.65, 'throughput': 9997.60} [INFO|2025-03-19 14:18:12] logging.py:143 >> {'loss': 0.6749, 'learning_rate': 4.4613e-05, 'epoch': 0.66, 'throughput': 9997.58} [INFO|2025-03-19 14:18:51] logging.py:143 >> {'loss': 0.6619, 'learning_rate': 4.4604e-05, 'epoch': 0.66, 'throughput': 9997.69} [INFO|2025-03-19 14:19:31] logging.py:143 >> {'loss': 0.6909, 'learning_rate': 4.4595e-05, 'epoch': 0.66, 'throughput': 9997.90} [INFO|2025-03-19 14:20:12] logging.py:143 >> {'loss': 0.6502, 'learning_rate': 4.4587e-05, 'epoch': 0.66, 'throughput': 9997.94} [INFO|2025-03-19 14:20:53] logging.py:143 >> {'loss': 0.6957, 'learning_rate': 4.4578e-05, 'epoch': 0.66, 'throughput': 9997.91} [INFO|2025-03-19 14:21:32] logging.py:143 >> {'loss': 0.6730, 'learning_rate': 4.4569e-05, 'epoch': 0.66, 'throughput': 9997.93} [INFO|2025-03-19 14:22:13] logging.py:143 >> {'loss': 0.6934, 'learning_rate': 4.4561e-05, 'epoch': 0.66, 'throughput': 9997.96} [INFO|2025-03-19 14:22:53] logging.py:143 >> {'loss': 0.6952, 'learning_rate': 4.4552e-05, 
'epoch': 0.66, 'throughput': 9998.02} [INFO|2025-03-19 14:23:33] logging.py:143 >> {'loss': 0.6767, 'learning_rate': 4.4543e-05, 'epoch': 0.66, 'throughput': 9998.10} [INFO|2025-03-19 14:24:13] logging.py:143 >> {'loss': 0.6479, 'learning_rate': 4.4534e-05, 'epoch': 0.66, 'throughput': 9998.26} [INFO|2025-03-19 14:24:54] logging.py:143 >> {'loss': 0.6579, 'learning_rate': 4.4526e-05, 'epoch': 0.66, 'throughput': 9997.97} [INFO|2025-03-19 14:25:35] logging.py:143 >> {'loss': 0.6796, 'learning_rate': 4.4517e-05, 'epoch': 0.66, 'throughput': 9997.87} [INFO|2025-03-19 14:26:16] logging.py:143 >> {'loss': 0.7079, 'learning_rate': 4.4508e-05, 'epoch': 0.66, 'throughput': 9997.91} [INFO|2025-03-19 14:26:57] logging.py:143 >> {'loss': 0.6972, 'learning_rate': 4.4499e-05, 'epoch': 0.66, 'throughput': 9997.91} [INFO|2025-03-19 14:27:36] logging.py:143 >> {'loss': 0.6825, 'learning_rate': 4.4491e-05, 'epoch': 0.66, 'throughput': 9998.08} [INFO|2025-03-19 14:28:16] logging.py:143 >> {'loss': 0.6645, 'learning_rate': 4.4482e-05, 'epoch': 0.66, 'throughput': 9998.12} [INFO|2025-03-19 14:28:57] logging.py:143 >> {'loss': 0.6646, 'learning_rate': 4.4473e-05, 'epoch': 0.66, 'throughput': 9998.17} [INFO|2025-03-19 14:29:37] logging.py:143 >> {'loss': 0.6742, 'learning_rate': 4.4464e-05, 'epoch': 0.66, 'throughput': 9998.23} [INFO|2025-03-19 14:30:16] logging.py:143 >> {'loss': 0.6747, 'learning_rate': 4.4456e-05, 'epoch': 0.67, 'throughput': 9998.43} [INFO|2025-03-19 14:30:56] logging.py:143 >> {'loss': 0.6547, 'learning_rate': 4.4447e-05, 'epoch': 0.67, 'throughput': 9998.56} [INFO|2025-03-19 14:31:37] logging.py:143 >> {'loss': 0.6920, 'learning_rate': 4.4438e-05, 'epoch': 0.67, 'throughput': 9998.72} [INFO|2025-03-19 14:32:16] logging.py:143 >> {'loss': 0.6777, 'learning_rate': 4.4429e-05, 'epoch': 0.67, 'throughput': 9998.98} [INFO|2025-03-19 14:32:57] logging.py:143 >> {'loss': 0.6907, 'learning_rate': 4.4420e-05, 'epoch': 0.67, 'throughput': 9998.91} [INFO|2025-03-19 14:33:38] 
logging.py:143 >> {'loss': 0.6466, 'learning_rate': 4.4412e-05, 'epoch': 0.67, 'throughput': 9998.92} [INFO|2025-03-19 14:34:19] logging.py:143 >> {'loss': 0.6415, 'learning_rate': 4.4403e-05, 'epoch': 0.67, 'throughput': 9998.73} [INFO|2025-03-19 14:35:00] logging.py:143 >> {'loss': 0.6494, 'learning_rate': 4.4394e-05, 'epoch': 0.67, 'throughput': 9998.66} [INFO|2025-03-19 14:35:40] logging.py:143 >> {'loss': 0.6888, 'learning_rate': 4.4385e-05, 'epoch': 0.67, 'throughput': 9998.61} [INFO|2025-03-19 14:36:20] logging.py:143 >> {'loss': 0.6549, 'learning_rate': 4.4376e-05, 'epoch': 0.67, 'throughput': 9998.62} [INFO|2025-03-19 14:37:01] logging.py:143 >> {'loss': 0.6758, 'learning_rate': 4.4367e-05, 'epoch': 0.67, 'throughput': 9998.63} [INFO|2025-03-19 14:37:40] logging.py:143 >> {'loss': 0.7012, 'learning_rate': 4.4359e-05, 'epoch': 0.67, 'throughput': 9998.63} [INFO|2025-03-19 14:38:23] logging.py:143 >> {'loss': 0.6550, 'learning_rate': 4.4350e-05, 'epoch': 0.67, 'throughput': 9998.44} [INFO|2025-03-19 14:39:04] logging.py:143 >> {'loss': 0.6634, 'learning_rate': 4.4341e-05, 'epoch': 0.67, 'throughput': 9998.19} [INFO|2025-03-19 14:39:44] logging.py:143 >> {'loss': 0.6660, 'learning_rate': 4.4332e-05, 'epoch': 0.67, 'throughput': 9998.19} [INFO|2025-03-19 14:40:24] logging.py:143 >> {'loss': 0.6925, 'learning_rate': 4.4323e-05, 'epoch': 0.67, 'throughput': 9998.44} [INFO|2025-03-19 14:41:05] logging.py:143 >> {'loss': 0.6589, 'learning_rate': 4.4314e-05, 'epoch': 0.67, 'throughput': 9998.37} [INFO|2025-03-19 14:41:44] logging.py:143 >> {'loss': 0.6957, 'learning_rate': 4.4305e-05, 'epoch': 0.67, 'throughput': 9998.66} [INFO|2025-03-19 14:42:24] logging.py:143 >> {'loss': 0.6490, 'learning_rate': 4.4296e-05, 'epoch': 0.67, 'throughput': 9998.81} [INFO|2025-03-19 14:43:05] logging.py:143 >> {'loss': 0.6375, 'learning_rate': 4.4288e-05, 'epoch': 0.68, 'throughput': 9998.94} [INFO|2025-03-19 14:43:44] logging.py:143 >> {'loss': 0.6383, 'learning_rate': 4.4279e-05, 
'epoch': 0.68, 'throughput': 9999.13} [INFO|2025-03-19 14:44:23] logging.py:143 >> {'loss': 0.6486, 'learning_rate': 4.4270e-05, 'epoch': 0.68, 'throughput': 9999.18} [INFO|2025-03-19 14:45:03] logging.py:143 >> {'loss': 0.6728, 'learning_rate': 4.4261e-05, 'epoch': 0.68, 'throughput': 9999.19} [INFO|2025-03-19 14:45:42] logging.py:143 >> {'loss': 0.6710, 'learning_rate': 4.4252e-05, 'epoch': 0.68, 'throughput': 9999.17} [INFO|2025-03-19 14:46:22] logging.py:143 >> {'loss': 0.6636, 'learning_rate': 4.4243e-05, 'epoch': 0.68, 'throughput': 9999.16} [INFO|2025-03-19 14:47:02] logging.py:143 >> {'loss': 0.6459, 'learning_rate': 4.4234e-05, 'epoch': 0.68, 'throughput': 9999.06} [INFO|2025-03-19 14:47:44] logging.py:143 >> {'loss': 0.7087, 'learning_rate': 4.4225e-05, 'epoch': 0.68, 'throughput': 9999.06} [INFO|2025-03-19 14:48:25] logging.py:143 >> {'loss': 0.6879, 'learning_rate': 4.4216e-05, 'epoch': 0.68, 'throughput': 9999.02} [INFO|2025-03-19 14:49:05] logging.py:143 >> {'loss': 0.6678, 'learning_rate': 4.4207e-05, 'epoch': 0.68, 'throughput': 9999.22} [INFO|2025-03-19 14:49:46] logging.py:143 >> {'loss': 0.6638, 'learning_rate': 4.4198e-05, 'epoch': 0.68, 'throughput': 9998.97} [INFO|2025-03-19 14:50:27] logging.py:143 >> {'loss': 0.6548, 'learning_rate': 4.4189e-05, 'epoch': 0.68, 'throughput': 9998.93} [INFO|2025-03-19 14:51:07] logging.py:143 >> {'loss': 0.6811, 'learning_rate': 4.4180e-05, 'epoch': 0.68, 'throughput': 9998.90} [INFO|2025-03-19 14:51:49] logging.py:143 >> {'loss': 0.6539, 'learning_rate': 4.4171e-05, 'epoch': 0.68, 'throughput': 9998.75} [INFO|2025-03-19 14:52:29] logging.py:143 >> {'loss': 0.6700, 'learning_rate': 4.4162e-05, 'epoch': 0.68, 'throughput': 9998.74} [INFO|2025-03-19 14:53:09] logging.py:143 >> {'loss': 0.7083, 'learning_rate': 4.4153e-05, 'epoch': 0.68, 'throughput': 9998.85} [INFO|2025-03-19 14:53:49] logging.py:143 >> {'loss': 0.6721, 'learning_rate': 4.4144e-05, 'epoch': 0.68, 'throughput': 9998.93} [INFO|2025-03-19 14:54:30] 
logging.py:143 >> {'loss': 0.6809, 'learning_rate': 4.4135e-05, 'epoch': 0.68, 'throughput': 9998.82} [INFO|2025-03-19 14:55:10] logging.py:143 >> {'loss': 0.7154, 'learning_rate': 4.4126e-05, 'epoch': 0.68, 'throughput': 9998.94} [INFO|2025-03-19 14:55:49] logging.py:143 >> {'loss': 0.6435, 'learning_rate': 4.4117e-05, 'epoch': 0.69, 'throughput': 9998.93} [INFO|2025-03-19 14:56:31] logging.py:143 >> {'loss': 0.6684, 'learning_rate': 4.4108e-05, 'epoch': 0.69, 'throughput': 9998.60} [INFO|2025-03-19 14:57:10] logging.py:143 >> {'loss': 0.7176, 'learning_rate': 4.4099e-05, 'epoch': 0.69, 'throughput': 9998.69} [INFO|2025-03-19 14:57:51] logging.py:143 >> {'loss': 0.6795, 'learning_rate': 4.4090e-05, 'epoch': 0.69, 'throughput': 9998.80} [INFO|2025-03-19 14:58:31] logging.py:143 >> {'loss': 0.6484, 'learning_rate': 4.4081e-05, 'epoch': 0.69, 'throughput': 9998.49} [INFO|2025-03-19 14:59:10] logging.py:143 >> {'loss': 0.6656, 'learning_rate': 4.4072e-05, 'epoch': 0.69, 'throughput': 9998.65} [INFO|2025-03-19 14:59:51] logging.py:143 >> {'loss': 0.6096, 'learning_rate': 4.4063e-05, 'epoch': 0.69, 'throughput': 9998.46} [INFO|2025-03-19 15:00:33] logging.py:143 >> {'loss': 0.6570, 'learning_rate': 4.4054e-05, 'epoch': 0.69, 'throughput': 9998.33} [INFO|2025-03-19 15:01:13] logging.py:143 >> {'loss': 0.6886, 'learning_rate': 4.4045e-05, 'epoch': 0.69, 'throughput': 9998.44} [INFO|2025-03-19 15:01:52] logging.py:143 >> {'loss': 0.6535, 'learning_rate': 4.4036e-05, 'epoch': 0.69, 'throughput': 9998.35} [INFO|2025-03-19 15:02:34] logging.py:143 >> {'loss': 0.6871, 'learning_rate': 4.4027e-05, 'epoch': 0.69, 'throughput': 9998.25} [INFO|2025-03-19 15:03:14] logging.py:143 >> {'loss': 0.6277, 'learning_rate': 4.4018e-05, 'epoch': 0.69, 'throughput': 9998.25} [INFO|2025-03-19 15:03:55] logging.py:143 >> {'loss': 0.6736, 'learning_rate': 4.4009e-05, 'epoch': 0.69, 'throughput': 9998.15} [INFO|2025-03-19 15:04:35] logging.py:143 >> {'loss': 0.6848, 'learning_rate': 4.4000e-05, 
'epoch': 0.69, 'throughput': 9998.17} [INFO|2025-03-19 15:05:15] logging.py:143 >> {'loss': 0.6911, 'learning_rate': 4.3990e-05, 'epoch': 0.69, 'throughput': 9998.42} [INFO|2025-03-19 15:05:56] logging.py:143 >> {'loss': 0.6988, 'learning_rate': 4.3981e-05, 'epoch': 0.69, 'throughput': 9998.66} [INFO|2025-03-19 15:06:35] logging.py:143 >> {'loss': 0.6713, 'learning_rate': 4.3972e-05, 'epoch': 0.69, 'throughput': 9998.77} [INFO|2025-03-19 15:07:17] logging.py:143 >> {'loss': 0.6487, 'learning_rate': 4.3963e-05, 'epoch': 0.69, 'throughput': 9998.67} [INFO|2025-03-19 15:07:57] logging.py:143 >> {'loss': 0.6813, 'learning_rate': 4.3954e-05, 'epoch': 0.69, 'throughput': 9998.92} [INFO|2025-03-19 15:08:36] logging.py:143 >> {'loss': 0.6774, 'learning_rate': 4.3945e-05, 'epoch': 0.70, 'throughput': 9998.90} [INFO|2025-03-19 15:09:17] logging.py:143 >> {'loss': 0.6709, 'learning_rate': 4.3936e-05, 'epoch': 0.70, 'throughput': 9998.96} [INFO|2025-03-19 15:09:57] logging.py:143 >> {'loss': 0.6726, 'learning_rate': 4.3927e-05, 'epoch': 0.70, 'throughput': 9998.85} [INFO|2025-03-19 15:10:39] logging.py:143 >> {'loss': 0.6852, 'learning_rate': 4.3917e-05, 'epoch': 0.70, 'throughput': 9998.77} [INFO|2025-03-19 15:11:19] logging.py:143 >> {'loss': 0.6582, 'learning_rate': 4.3908e-05, 'epoch': 0.70, 'throughput': 9998.77} [INFO|2025-03-19 15:11:58] logging.py:143 >> {'loss': 0.6845, 'learning_rate': 4.3899e-05, 'epoch': 0.70, 'throughput': 9998.93} [INFO|2025-03-19 15:12:38] logging.py:143 >> {'loss': 0.6667, 'learning_rate': 4.3890e-05, 'epoch': 0.70, 'throughput': 9999.10} [INFO|2025-03-19 15:13:20] logging.py:143 >> {'loss': 0.6393, 'learning_rate': 4.3881e-05, 'epoch': 0.70, 'throughput': 9998.99} [INFO|2025-03-19 15:14:01] logging.py:143 >> {'loss': 0.6988, 'learning_rate': 4.3872e-05, 'epoch': 0.70, 'throughput': 9999.23} [INFO|2025-03-19 15:14:41] logging.py:143 >> {'loss': 0.6776, 'learning_rate': 4.3862e-05, 'epoch': 0.70, 'throughput': 9999.13} [INFO|2025-03-19 15:15:21] 
logging.py:143 >> {'loss': 0.6817, 'learning_rate': 4.3853e-05, 'epoch': 0.70, 'throughput': 9999.09} [INFO|2025-03-19 15:16:01] logging.py:143 >> {'loss': 0.6729, 'learning_rate': 4.3844e-05, 'epoch': 0.70, 'throughput': 9999.39} [INFO|2025-03-19 15:16:42] logging.py:143 >> {'loss': 0.6694, 'learning_rate': 4.3835e-05, 'epoch': 0.70, 'throughput': 9999.30} [INFO|2025-03-19 15:17:23] logging.py:143 >> {'loss': 0.6665, 'learning_rate': 4.3826e-05, 'epoch': 0.70, 'throughput': 9999.38} [INFO|2025-03-19 15:18:02] logging.py:143 >> {'loss': 0.6993, 'learning_rate': 4.3816e-05, 'epoch': 0.70, 'throughput': 9999.42} [INFO|2025-03-19 15:18:42] logging.py:143 >> {'loss': 0.6585, 'learning_rate': 4.3807e-05, 'epoch': 0.70, 'throughput': 9999.49} [INFO|2025-03-19 15:19:22] logging.py:143 >> {'loss': 0.6417, 'learning_rate': 4.3798e-05, 'epoch': 0.70, 'throughput': 9999.41} [INFO|2025-03-19 15:20:03] logging.py:143 >> {'loss': 0.6412, 'learning_rate': 4.3789e-05, 'epoch': 0.70, 'throughput': 9999.37} [INFO|2025-03-19 15:20:43] logging.py:143 >> {'loss': 0.6881, 'learning_rate': 4.3780e-05, 'epoch': 0.70, 'throughput': 9999.47} [INFO|2025-03-19 15:21:24] logging.py:143 >> {'loss': 0.6953, 'learning_rate': 4.3770e-05, 'epoch': 0.71, 'throughput': 9999.42} [INFO|2025-03-19 15:22:02] logging.py:143 >> {'loss': 0.6462, 'learning_rate': 4.3761e-05, 'epoch': 0.71, 'throughput': 9999.66} [INFO|2025-03-19 15:22:43] logging.py:143 >> {'loss': 0.6922, 'learning_rate': 4.3752e-05, 'epoch': 0.71, 'throughput': 9999.67} [INFO|2025-03-19 15:23:23] logging.py:143 >> {'loss': 0.6449, 'learning_rate': 4.3743e-05, 'epoch': 0.71, 'throughput': 9999.72} [INFO|2025-03-19 15:24:03] logging.py:143 >> {'loss': 0.6559, 'learning_rate': 4.3733e-05, 'epoch': 0.71, 'throughput': 9999.76} [INFO|2025-03-19 15:24:43] logging.py:143 >> {'loss': 0.6927, 'learning_rate': 4.3724e-05, 'epoch': 0.71, 'throughput': 9999.87} [INFO|2025-03-19 15:25:22] logging.py:143 >> {'loss': 0.6165, 'learning_rate': 4.3715e-05, 
'epoch': 0.71, 'throughput': 9999.82} [INFO|2025-03-19 15:26:02] logging.py:143 >> {'loss': 0.6937, 'learning_rate': 4.3705e-05, 'epoch': 0.71, 'throughput': 9999.97} [INFO|2025-03-19 15:26:42] logging.py:143 >> {'loss': 0.6727, 'learning_rate': 4.3696e-05, 'epoch': 0.71, 'throughput': 10000.26} [INFO|2025-03-19 15:27:22] logging.py:143 >> {'loss': 0.6265, 'learning_rate': 4.3687e-05, 'epoch': 0.71, 'throughput': 10000.21} [INFO|2025-03-19 15:28:03] logging.py:143 >> {'loss': 0.6756, 'learning_rate': 4.3678e-05, 'epoch': 0.71, 'throughput': 10000.20} [INFO|2025-03-19 15:28:44] logging.py:143 >> {'loss': 0.6476, 'learning_rate': 4.3668e-05, 'epoch': 0.71, 'throughput': 9999.99} [INFO|2025-03-19 15:29:26] logging.py:143 >> {'loss': 0.6859, 'learning_rate': 4.3659e-05, 'epoch': 0.71, 'throughput': 10000.13} [INFO|2025-03-19 15:30:07] logging.py:143 >> {'loss': 0.6428, 'learning_rate': 4.3650e-05, 'epoch': 0.71, 'throughput': 10000.01} [INFO|2025-03-19 15:30:46] logging.py:143 >> {'loss': 0.6300, 'learning_rate': 4.3640e-05, 'epoch': 0.71, 'throughput': 10000.18} [INFO|2025-03-19 15:31:28] logging.py:143 >> {'loss': 0.6690, 'learning_rate': 4.3631e-05, 'epoch': 0.71, 'throughput': 10000.19} [INFO|2025-03-19 15:32:08] logging.py:143 >> {'loss': 0.6732, 'learning_rate': 4.3622e-05, 'epoch': 0.71, 'throughput': 10000.29} [INFO|2025-03-19 15:32:48] logging.py:143 >> {'loss': 0.6757, 'learning_rate': 4.3612e-05, 'epoch': 0.71, 'throughput': 10000.34} [INFO|2025-03-19 15:33:28] logging.py:143 >> {'loss': 0.6357, 'learning_rate': 4.3603e-05, 'epoch': 0.71, 'throughput': 10000.32} [INFO|2025-03-19 15:34:07] logging.py:143 >> {'loss': 0.6643, 'learning_rate': 4.3594e-05, 'epoch': 0.72, 'throughput': 10000.52} [INFO|2025-03-19 15:34:48] logging.py:143 >> {'loss': 0.6566, 'learning_rate': 4.3584e-05, 'epoch': 0.72, 'throughput': 10000.26} [INFO|2025-03-19 15:35:29] logging.py:143 >> {'loss': 0.6877, 'learning_rate': 4.3575e-05, 'epoch': 0.72, 'throughput': 10000.26} 
[INFO|2025-03-19 15:36:09] logging.py:143 >> {'loss': 0.6675, 'learning_rate': 4.3566e-05, 'epoch': 0.72, 'throughput': 10000.24} [INFO|2025-03-19 15:36:49] logging.py:143 >> {'loss': 0.6862, 'learning_rate': 4.3556e-05, 'epoch': 0.72, 'throughput': 10000.23} [INFO|2025-03-19 15:37:30] logging.py:143 >> {'loss': 0.6361, 'learning_rate': 4.3547e-05, 'epoch': 0.72, 'throughput': 10000.17} [INFO|2025-03-19 15:38:10] logging.py:143 >> {'loss': 0.6779, 'learning_rate': 4.3537e-05, 'epoch': 0.72, 'throughput': 10000.35} [INFO|2025-03-19 15:38:52] logging.py:143 >> {'loss': 0.6449, 'learning_rate': 4.3528e-05, 'epoch': 0.72, 'throughput': 10000.03} [INFO|2025-03-19 15:39:32] logging.py:143 >> {'loss': 0.6813, 'learning_rate': 4.3519e-05, 'epoch': 0.72, 'throughput': 10000.11} [INFO|2025-03-19 15:40:14] logging.py:143 >> {'loss': 0.6815, 'learning_rate': 4.3509e-05, 'epoch': 0.72, 'throughput': 10000.24} [INFO|2025-03-19 15:40:54] logging.py:143 >> {'loss': 0.6687, 'learning_rate': 4.3500e-05, 'epoch': 0.72, 'throughput': 10000.13} [INFO|2025-03-19 15:41:35] logging.py:143 >> {'loss': 0.6949, 'learning_rate': 4.3490e-05, 'epoch': 0.72, 'throughput': 10000.30} [INFO|2025-03-19 15:42:16] logging.py:143 >> {'loss': 0.6620, 'learning_rate': 4.3481e-05, 'epoch': 0.72, 'throughput': 9999.96} [INFO|2025-03-19 15:42:57] logging.py:143 >> {'loss': 0.6334, 'learning_rate': 4.3472e-05, 'epoch': 0.72, 'throughput': 10000.00} [INFO|2025-03-19 15:43:37] logging.py:143 >> {'loss': 0.6894, 'learning_rate': 4.3462e-05, 'epoch': 0.72, 'throughput': 10000.15} [INFO|2025-03-19 15:44:17] logging.py:143 >> {'loss': 0.6448, 'learning_rate': 4.3453e-05, 'epoch': 0.72, 'throughput': 10000.29} [INFO|2025-03-19 15:44:57] logging.py:143 >> {'loss': 0.6459, 'learning_rate': 4.3443e-05, 'epoch': 0.72, 'throughput': 10000.32} [INFO|2025-03-19 15:45:37] logging.py:143 >> {'loss': 0.6546, 'learning_rate': 4.3434e-05, 'epoch': 0.72, 'throughput': 10000.53} [INFO|2025-03-19 15:46:17] logging.py:143 >> 
{'loss': 0.6569, 'learning_rate': 4.3424e-05, 'epoch': 0.72, 'throughput': 10000.60} [INFO|2025-03-19 15:46:57] logging.py:143 >> {'loss': 0.6606, 'learning_rate': 4.3415e-05, 'epoch': 0.73, 'throughput': 10000.62} [INFO|2025-03-19 15:47:37] logging.py:143 >> {'loss': 0.6869, 'learning_rate': 4.3405e-05, 'epoch': 0.73, 'throughput': 10000.65} [INFO|2025-03-19 15:48:19] logging.py:143 >> {'loss': 0.6597, 'learning_rate': 4.3396e-05, 'epoch': 0.73, 'throughput': 10000.42} [INFO|2025-03-19 15:48:59] logging.py:143 >> {'loss': 0.6697, 'learning_rate': 4.3386e-05, 'epoch': 0.73, 'throughput': 10000.60} [INFO|2025-03-19 15:49:40] logging.py:143 >> {'loss': 0.6462, 'learning_rate': 4.3377e-05, 'epoch': 0.73, 'throughput': 10000.63} [INFO|2025-03-19 15:50:20] logging.py:143 >> {'loss': 0.6677, 'learning_rate': 4.3367e-05, 'epoch': 0.73, 'throughput': 10000.58} [INFO|2025-03-19 15:51:02] logging.py:143 >> {'loss': 0.6773, 'learning_rate': 4.3358e-05, 'epoch': 0.73, 'throughput': 10000.48} [INFO|2025-03-19 15:51:42] logging.py:143 >> {'loss': 0.6759, 'learning_rate': 4.3348e-05, 'epoch': 0.73, 'throughput': 10000.48} [INFO|2025-03-19 15:52:23] logging.py:143 >> {'loss': 0.6918, 'learning_rate': 4.3339e-05, 'epoch': 0.73, 'throughput': 10000.48} [INFO|2025-03-19 15:53:04] logging.py:143 >> {'loss': 0.6322, 'learning_rate': 4.3329e-05, 'epoch': 0.73, 'throughput': 10000.42} [INFO|2025-03-19 15:53:44] logging.py:143 >> {'loss': 0.6560, 'learning_rate': 4.3320e-05, 'epoch': 0.73, 'throughput': 10000.49} [INFO|2025-03-19 15:54:25] logging.py:143 >> {'loss': 0.6493, 'learning_rate': 4.3310e-05, 'epoch': 0.73, 'throughput': 10000.47} [INFO|2025-03-19 15:55:08] logging.py:143 >> {'loss': 0.6636, 'learning_rate': 4.3301e-05, 'epoch': 0.73, 'throughput': 10000.26} [INFO|2025-03-19 15:55:48] logging.py:143 >> {'loss': 0.6506, 'learning_rate': 4.3291e-05, 'epoch': 0.73, 'throughput': 10000.32} [INFO|2025-03-19 15:56:28] logging.py:143 >> {'loss': 0.6774, 'learning_rate': 4.3282e-05, 
'epoch': 0.73, 'throughput': 10000.50} [INFO|2025-03-19 15:57:09] logging.py:143 >> {'loss': 0.6430, 'learning_rate': 4.3272e-05, 'epoch': 0.73, 'throughput': 10000.33} [INFO|2025-03-19 15:57:49] logging.py:143 >> {'loss': 0.6287, 'learning_rate': 4.3263e-05, 'epoch': 0.73, 'throughput': 10000.28} [INFO|2025-03-19 15:58:29] logging.py:143 >> {'loss': 0.6867, 'learning_rate': 4.3253e-05, 'epoch': 0.73, 'throughput': 10000.39} [INFO|2025-03-19 15:59:10] logging.py:143 >> {'loss': 0.6733, 'learning_rate': 4.3244e-05, 'epoch': 0.74, 'throughput': 10000.37} [INFO|2025-03-19 15:59:50] logging.py:143 >> {'loss': 0.6497, 'learning_rate': 4.3234e-05, 'epoch': 0.74, 'throughput': 10000.26} [INFO|2025-03-19 16:00:30] logging.py:143 >> {'loss': 0.6668, 'learning_rate': 4.3224e-05, 'epoch': 0.74, 'throughput': 10000.20} [INFO|2025-03-19 16:01:11] logging.py:143 >> {'loss': 0.6552, 'learning_rate': 4.3215e-05, 'epoch': 0.74, 'throughput': 10000.48} [INFO|2025-03-19 16:01:52] logging.py:143 >> {'loss': 0.6674, 'learning_rate': 4.3205e-05, 'epoch': 0.74, 'throughput': 10000.50} [INFO|2025-03-19 16:02:31] logging.py:143 >> {'loss': 0.6532, 'learning_rate': 4.3196e-05, 'epoch': 0.74, 'throughput': 10000.59} [INFO|2025-03-19 16:03:13] logging.py:143 >> {'loss': 0.6407, 'learning_rate': 4.3186e-05, 'epoch': 0.74, 'throughput': 10000.43} [INFO|2025-03-19 16:03:54] logging.py:143 >> {'loss': 0.6352, 'learning_rate': 4.3176e-05, 'epoch': 0.74, 'throughput': 10000.21} [INFO|2025-03-19 16:04:35] logging.py:143 >> {'loss': 0.6777, 'learning_rate': 4.3167e-05, 'epoch': 0.74, 'throughput': 10000.09} [INFO|2025-03-19 16:05:14] logging.py:143 >> {'loss': 0.6670, 'learning_rate': 4.3157e-05, 'epoch': 0.74, 'throughput': 10000.35} [INFO|2025-03-19 16:05:57] logging.py:143 >> {'loss': 0.6542, 'learning_rate': 4.3148e-05, 'epoch': 0.74, 'throughput': 10000.16} [INFO|2025-03-19 16:06:37] logging.py:143 >> {'loss': 0.6599, 'learning_rate': 4.3138e-05, 'epoch': 0.74, 'throughput': 10000.13} 
[INFO|2025-03-19 16:07:16] logging.py:143 >> {'loss': 0.6633, 'learning_rate': 4.3128e-05, 'epoch': 0.74, 'throughput': 10000.51} [INFO|2025-03-19 16:07:58] logging.py:143 >> {'loss': 0.6302, 'learning_rate': 4.3119e-05, 'epoch': 0.74, 'throughput': 10000.41} [INFO|2025-03-19 16:08:40] logging.py:143 >> {'loss': 0.6414, 'learning_rate': 4.3109e-05, 'epoch': 0.74, 'throughput': 10000.50} [INFO|2025-03-19 16:09:20] logging.py:143 >> {'loss': 0.6463, 'learning_rate': 4.3099e-05, 'epoch': 0.74, 'throughput': 10000.44} [INFO|2025-03-19 16:10:02] logging.py:143 >> {'loss': 0.6406, 'learning_rate': 4.3090e-05, 'epoch': 0.74, 'throughput': 10000.37} [INFO|2025-03-19 16:10:42] logging.py:143 >> {'loss': 0.6828, 'learning_rate': 4.3080e-05, 'epoch': 0.74, 'throughput': 10000.33} [INFO|2025-03-19 16:11:24] logging.py:143 >> {'loss': 0.6724, 'learning_rate': 4.3070e-05, 'epoch': 0.74, 'throughput': 10000.15} [INFO|2025-03-19 16:12:03] logging.py:143 >> {'loss': 0.6386, 'learning_rate': 4.3061e-05, 'epoch': 0.75, 'throughput': 10000.04} [INFO|2025-03-19 16:12:42] logging.py:143 >> {'loss': 0.6671, 'learning_rate': 4.3051e-05, 'epoch': 0.75, 'throughput': 10000.32} [INFO|2025-03-19 16:13:24] logging.py:143 >> {'loss': 0.6846, 'learning_rate': 4.3041e-05, 'epoch': 0.75, 'throughput': 10000.36} [INFO|2025-03-19 16:14:04] logging.py:143 >> {'loss': 0.6433, 'learning_rate': 4.3032e-05, 'epoch': 0.75, 'throughput': 10000.19} [INFO|2025-03-19 16:14:45] logging.py:143 >> {'loss': 0.6797, 'learning_rate': 4.3022e-05, 'epoch': 0.75, 'throughput': 10000.21} [INFO|2025-03-19 16:15:25] logging.py:143 >> {'loss': 0.6912, 'learning_rate': 4.3012e-05, 'epoch': 0.75, 'throughput': 10000.34} [INFO|2025-03-19 16:16:06] logging.py:143 >> {'loss': 0.6551, 'learning_rate': 4.3003e-05, 'epoch': 0.75, 'throughput': 10000.34} [INFO|2025-03-19 16:16:47] logging.py:143 >> {'loss': 0.6838, 'learning_rate': 4.2993e-05, 'epoch': 0.75, 'throughput': 10000.26} [INFO|2025-03-19 16:17:27] logging.py:143 >> 
{'loss': 0.6802, 'learning_rate': 4.2983e-05, 'epoch': 0.75, 'throughput': 10000.45} [INFO|2025-03-19 16:18:07] logging.py:143 >> {'loss': 0.6521, 'learning_rate': 4.2973e-05, 'epoch': 0.75, 'throughput': 10000.38} [INFO|2025-03-19 16:18:47] logging.py:143 >> {'loss': 0.6754, 'learning_rate': 4.2964e-05, 'epoch': 0.75, 'throughput': 10000.51} [INFO|2025-03-19 16:19:28] logging.py:143 >> {'loss': 0.6402, 'learning_rate': 4.2954e-05, 'epoch': 0.75, 'throughput': 10000.69} [INFO|2025-03-19 16:20:08] logging.py:143 >> {'loss': 0.6624, 'learning_rate': 4.2944e-05, 'epoch': 0.75, 'throughput': 10000.55} [INFO|2025-03-19 16:20:50] logging.py:143 >> {'loss': 0.6706, 'learning_rate': 4.2935e-05, 'epoch': 0.75, 'throughput': 10000.61} [INFO|2025-03-19 16:21:30] logging.py:143 >> {'loss': 0.6361, 'learning_rate': 4.2925e-05, 'epoch': 0.75, 'throughput': 10000.37} [INFO|2025-03-19 16:22:11] logging.py:143 >> {'loss': 0.6467, 'learning_rate': 4.2915e-05, 'epoch': 0.75, 'throughput': 10000.34} [INFO|2025-03-19 16:22:52] logging.py:143 >> {'loss': 0.6787, 'learning_rate': 4.2905e-05, 'epoch': 0.75, 'throughput': 10000.33} [INFO|2025-03-19 16:23:32] logging.py:143 >> {'loss': 0.6201, 'learning_rate': 4.2895e-05, 'epoch': 0.75, 'throughput': 10000.24} [INFO|2025-03-19 16:24:11] logging.py:143 >> {'loss': 0.6472, 'learning_rate': 4.2886e-05, 'epoch': 0.75, 'throughput': 10000.22} [INFO|2025-03-19 16:24:51] logging.py:143 >> {'loss': 0.6490, 'learning_rate': 4.2876e-05, 'epoch': 0.76, 'throughput': 10000.31} [INFO|2025-03-19 16:25:31] logging.py:143 >> {'loss': 0.6751, 'learning_rate': 4.2866e-05, 'epoch': 0.76, 'throughput': 10000.46} [INFO|2025-03-19 16:26:12] logging.py:143 >> {'loss': 0.6301, 'learning_rate': 4.2856e-05, 'epoch': 0.76, 'throughput': 10000.61} [INFO|2025-03-19 16:26:53] logging.py:143 >> {'loss': 0.6834, 'learning_rate': 4.2847e-05, 'epoch': 0.76, 'throughput': 10000.58} [INFO|2025-03-19 16:27:32] logging.py:143 >> {'loss': 0.6313, 'learning_rate': 4.2837e-05, 
'epoch': 0.76, 'throughput': 10000.72} [INFO|2025-03-19 16:28:13] logging.py:143 >> {'loss': 0.6643, 'learning_rate': 4.2827e-05, 'epoch': 0.76, 'throughput': 10000.66} [INFO|2025-03-19 16:28:54] logging.py:143 >> {'loss': 0.6771, 'learning_rate': 4.2817e-05, 'epoch': 0.76, 'throughput': 10000.68} [INFO|2025-03-19 16:29:33] logging.py:143 >> {'loss': 0.6593, 'learning_rate': 4.2807e-05, 'epoch': 0.76, 'throughput': 10000.71} [INFO|2025-03-19 16:30:14] logging.py:143 >> {'loss': 0.6453, 'learning_rate': 4.2797e-05, 'epoch': 0.76, 'throughput': 10000.79} [INFO|2025-03-19 16:30:54] logging.py:143 >> {'loss': 0.6697, 'learning_rate': 4.2788e-05, 'epoch': 0.76, 'throughput': 10000.86} [INFO|2025-03-19 16:31:34] logging.py:143 >> {'loss': 0.7302, 'learning_rate': 4.2778e-05, 'epoch': 0.76, 'throughput': 10001.10} [INFO|2025-03-19 16:32:14] logging.py:143 >> {'loss': 0.6723, 'learning_rate': 4.2768e-05, 'epoch': 0.76, 'throughput': 10001.27} [INFO|2025-03-19 16:32:54] logging.py:143 >> {'loss': 0.6475, 'learning_rate': 4.2758e-05, 'epoch': 0.76, 'throughput': 10001.44} [INFO|2025-03-19 16:33:35] logging.py:143 >> {'loss': 0.6440, 'learning_rate': 4.2748e-05, 'epoch': 0.76, 'throughput': 10001.45} [INFO|2025-03-19 16:34:15] logging.py:143 >> {'loss': 0.6855, 'learning_rate': 4.2738e-05, 'epoch': 0.76, 'throughput': 10001.57} [INFO|2025-03-19 16:34:55] logging.py:143 >> {'loss': 0.6297, 'learning_rate': 4.2729e-05, 'epoch': 0.76, 'throughput': 10001.75} [INFO|2025-03-19 16:35:35] logging.py:143 >> {'loss': 0.6468, 'learning_rate': 4.2719e-05, 'epoch': 0.76, 'throughput': 10001.73} [INFO|2025-03-19 16:36:16] logging.py:143 >> {'loss': 0.6764, 'learning_rate': 4.2709e-05, 'epoch': 0.76, 'throughput': 10001.70} [INFO|2025-03-19 16:36:56] logging.py:143 >> {'loss': 0.6782, 'learning_rate': 4.2699e-05, 'epoch': 0.76, 'throughput': 10001.69} [INFO|2025-03-19 16:37:38] logging.py:143 >> {'loss': 0.6420, 'learning_rate': 4.2689e-05, 'epoch': 0.77, 'throughput': 10001.62} 
[INFO|2025-03-19 16:38:17] logging.py:143 >> {'loss': 0.6523, 'learning_rate': 4.2679e-05, 'epoch': 0.77, 'throughput': 10001.68} [INFO|2025-03-19 16:38:58] logging.py:143 >> {'loss': 0.6270, 'learning_rate': 4.2669e-05, 'epoch': 0.77, 'throughput': 10001.59} [INFO|2025-03-19 16:39:39] logging.py:143 >> {'loss': 0.6886, 'learning_rate': 4.2659e-05, 'epoch': 0.77, 'throughput': 10001.49} [INFO|2025-03-19 16:40:20] logging.py:143 >> {'loss': 0.6910, 'learning_rate': 4.2649e-05, 'epoch': 0.77, 'throughput': 10001.45} [INFO|2025-03-19 16:40:59] logging.py:143 >> {'loss': 0.6482, 'learning_rate': 4.2640e-05, 'epoch': 0.77, 'throughput': 10001.46} [INFO|2025-03-19 16:41:39] logging.py:143 >> {'loss': 0.6971, 'learning_rate': 4.2630e-05, 'epoch': 0.77, 'throughput': 10001.68} [INFO|2025-03-19 16:42:21] logging.py:143 >> {'loss': 0.6842, 'learning_rate': 4.2620e-05, 'epoch': 0.77, 'throughput': 10001.85} [INFO|2025-03-19 16:43:02] logging.py:143 >> {'loss': 0.6456, 'learning_rate': 4.2610e-05, 'epoch': 0.77, 'throughput': 10001.57} [INFO|2025-03-19 16:43:43] logging.py:143 >> {'loss': 0.6529, 'learning_rate': 4.2600e-05, 'epoch': 0.77, 'throughput': 10001.39} [INFO|2025-03-19 16:44:24] logging.py:143 >> {'loss': 0.6552, 'learning_rate': 4.2590e-05, 'epoch': 0.77, 'throughput': 10001.35} [INFO|2025-03-19 16:45:03] logging.py:143 >> {'loss': 0.6635, 'learning_rate': 4.2580e-05, 'epoch': 0.77, 'throughput': 10001.58} [INFO|2025-03-19 16:45:45] logging.py:143 >> {'loss': 0.6587, 'learning_rate': 4.2570e-05, 'epoch': 0.77, 'throughput': 10001.62} [INFO|2025-03-19 16:46:25] logging.py:143 >> {'loss': 0.6384, 'learning_rate': 4.2560e-05, 'epoch': 0.77, 'throughput': 10001.77} [INFO|2025-03-19 16:47:05] logging.py:143 >> {'loss': 0.6754, 'learning_rate': 4.2550e-05, 'epoch': 0.77, 'throughput': 10002.04} [INFO|2025-03-19 16:47:46] logging.py:143 >> {'loss': 0.6752, 'learning_rate': 4.2540e-05, 'epoch': 0.77, 'throughput': 10002.01} [INFO|2025-03-19 16:48:26] logging.py:143 >> 
{'loss': 0.6572, 'learning_rate': 4.2530e-05, 'epoch': 0.77, 'throughput': 10002.06} [INFO|2025-03-19 16:49:06] logging.py:143 >> {'loss': 0.6335, 'learning_rate': 4.2520e-05, 'epoch': 0.77, 'throughput': 10002.05} [INFO|2025-03-19 16:49:47] logging.py:143 >> {'loss': 0.6270, 'learning_rate': 4.2510e-05, 'epoch': 0.77, 'throughput': 10002.10} [INFO|2025-03-19 16:50:26] logging.py:143 >> {'loss': 0.6181, 'learning_rate': 4.2500e-05, 'epoch': 0.78, 'throughput': 10002.27} [INFO|2025-03-19 16:51:07] logging.py:143 >> {'loss': 0.6304, 'learning_rate': 4.2490e-05, 'epoch': 0.78, 'throughput': 10002.32} [INFO|2025-03-19 16:51:47] logging.py:143 >> {'loss': 0.6737, 'learning_rate': 4.2480e-05, 'epoch': 0.78, 'throughput': 10002.17} [INFO|2025-03-19 16:52:27] logging.py:143 >> {'loss': 0.6868, 'learning_rate': 4.2470e-05, 'epoch': 0.78, 'throughput': 10002.22} [INFO|2025-03-19 16:53:08] logging.py:143 >> {'loss': 0.6625, 'learning_rate': 4.2460e-05, 'epoch': 0.78, 'throughput': 10002.12} [INFO|2025-03-19 16:53:49] logging.py:143 >> {'loss': 0.6767, 'learning_rate': 4.2450e-05, 'epoch': 0.78, 'throughput': 10002.22} [INFO|2025-03-19 16:54:30] logging.py:143 >> {'loss': 0.6828, 'learning_rate': 4.2440e-05, 'epoch': 0.78, 'throughput': 10002.31} [INFO|2025-03-19 16:55:10] logging.py:143 >> {'loss': 0.6274, 'learning_rate': 4.2430e-05, 'epoch': 0.78, 'throughput': 10002.27} [INFO|2025-03-19 16:55:50] logging.py:143 >> {'loss': 0.6866, 'learning_rate': 4.2420e-05, 'epoch': 0.78, 'throughput': 10002.30} [INFO|2025-03-19 16:56:30] logging.py:143 >> {'loss': 0.6377, 'learning_rate': 4.2410e-05, 'epoch': 0.78, 'throughput': 10002.43} [INFO|2025-03-19 16:57:10] logging.py:143 >> {'loss': 0.6646, 'learning_rate': 4.2400e-05, 'epoch': 0.78, 'throughput': 10002.53} [INFO|2025-03-19 16:57:50] logging.py:143 >> {'loss': 0.6909, 'learning_rate': 4.2390e-05, 'epoch': 0.78, 'throughput': 10002.36} [INFO|2025-03-19 16:58:31] logging.py:143 >> {'loss': 0.6545, 'learning_rate': 4.2380e-05, 
'epoch': 0.78, 'throughput': 10002.35} [INFO|2025-03-19 16:59:10] logging.py:143 >> {'loss': 0.6938, 'learning_rate': 4.2370e-05, 'epoch': 0.78, 'throughput': 10002.41} [INFO|2025-03-19 16:59:51] logging.py:143 >> {'loss': 0.6275, 'learning_rate': 4.2360e-05, 'epoch': 0.78, 'throughput': 10002.28} [INFO|2025-03-19 17:00:31] logging.py:143 >> {'loss': 0.6185, 'learning_rate': 4.2350e-05, 'epoch': 0.78, 'throughput': 10002.24} [INFO|2025-03-19 17:01:10] logging.py:143 >> {'loss': 0.6495, 'learning_rate': 4.2340e-05, 'epoch': 0.78, 'throughput': 10002.40} [INFO|2025-03-19 17:01:51] logging.py:143 >> {'loss': 0.6585, 'learning_rate': 4.2329e-05, 'epoch': 0.78, 'throughput': 10002.39} [INFO|2025-03-19 17:02:30] logging.py:143 >> {'loss': 0.6762, 'learning_rate': 4.2319e-05, 'epoch': 0.78, 'throughput': 10002.71} [INFO|2025-03-19 17:03:10] logging.py:143 >> {'loss': 0.6474, 'learning_rate': 4.2309e-05, 'epoch': 0.79, 'throughput': 10002.74} [INFO|2025-03-19 17:03:49] logging.py:143 >> {'loss': 0.6438, 'learning_rate': 4.2299e-05, 'epoch': 0.79, 'throughput': 10002.99} [INFO|2025-03-19 17:04:30] logging.py:143 >> {'loss': 0.6384, 'learning_rate': 4.2289e-05, 'epoch': 0.79, 'throughput': 10002.92} [INFO|2025-03-19 17:05:10] logging.py:143 >> {'loss': 0.6367, 'learning_rate': 4.2279e-05, 'epoch': 0.79, 'throughput': 10003.08} [INFO|2025-03-19 17:05:51] logging.py:143 >> {'loss': 0.6623, 'learning_rate': 4.2269e-05, 'epoch': 0.79, 'throughput': 10002.92} [INFO|2025-03-19 17:06:29] logging.py:143 >> {'loss': 0.6364, 'learning_rate': 4.2259e-05, 'epoch': 0.79, 'throughput': 10003.33} [INFO|2025-03-19 17:07:10] logging.py:143 >> {'loss': 0.6035, 'learning_rate': 4.2249e-05, 'epoch': 0.79, 'throughput': 10003.32} [INFO|2025-03-19 17:07:50] logging.py:143 >> {'loss': 0.6936, 'learning_rate': 4.2238e-05, 'epoch': 0.79, 'throughput': 10003.33} [INFO|2025-03-19 17:08:31] logging.py:143 >> {'loss': 0.6525, 'learning_rate': 4.2228e-05, 'epoch': 0.79, 'throughput': 10003.33} 
[INFO|2025-03-19 17:09:12] logging.py:143 >> {'loss': 0.6428, 'learning_rate': 4.2218e-05, 'epoch': 0.79, 'throughput': 10003.20} [INFO|2025-03-19 17:09:52] logging.py:143 >> {'loss': 0.6436, 'learning_rate': 4.2208e-05, 'epoch': 0.79, 'throughput': 10003.16} [INFO|2025-03-19 17:10:33] logging.py:143 >> {'loss': 0.6425, 'learning_rate': 4.2198e-05, 'epoch': 0.79, 'throughput': 10003.15} [INFO|2025-03-19 17:11:14] logging.py:143 >> {'loss': 0.6647, 'learning_rate': 4.2188e-05, 'epoch': 0.79, 'throughput': 10003.27} [INFO|2025-03-19 17:11:54] logging.py:143 >> {'loss': 0.6724, 'learning_rate': 4.2178e-05, 'epoch': 0.79, 'throughput': 10003.14} [INFO|2025-03-19 17:12:34] logging.py:143 >> {'loss': 0.6607, 'learning_rate': 4.2167e-05, 'epoch': 0.79, 'throughput': 10003.38} [INFO|2025-03-19 17:13:14] logging.py:143 >> {'loss': 0.6750, 'learning_rate': 4.2157e-05, 'epoch': 0.79, 'throughput': 10003.53} [INFO|2025-03-19 17:13:54] logging.py:143 >> {'loss': 0.6561, 'learning_rate': 4.2147e-05, 'epoch': 0.79, 'throughput': 10003.59} [INFO|2025-03-19 17:14:35] logging.py:143 >> {'loss': 0.6626, 'learning_rate': 4.2137e-05, 'epoch': 0.79, 'throughput': 10003.66} [INFO|2025-03-19 17:15:15] logging.py:143 >> {'loss': 0.6591, 'learning_rate': 4.2127e-05, 'epoch': 0.80, 'throughput': 10003.73} [INFO|2025-03-19 17:15:56] logging.py:143 >> {'loss': 0.6188, 'learning_rate': 4.2116e-05, 'epoch': 0.80, 'throughput': 10003.73} [INFO|2025-03-19 17:16:37] logging.py:143 >> {'loss': 0.6422, 'learning_rate': 4.2106e-05, 'epoch': 0.80, 'throughput': 10003.55} [INFO|2025-03-19 17:17:18] logging.py:143 >> {'loss': 0.6598, 'learning_rate': 4.2096e-05, 'epoch': 0.80, 'throughput': 10003.72} [INFO|2025-03-19 17:17:58] logging.py:143 >> {'loss': 0.6641, 'learning_rate': 4.2086e-05, 'epoch': 0.80, 'throughput': 10003.78} [INFO|2025-03-19 17:18:38] logging.py:143 >> {'loss': 0.6572, 'learning_rate': 4.2076e-05, 'epoch': 0.80, 'throughput': 10003.66} [INFO|2025-03-19 17:19:19] logging.py:143 >> 
{'loss': 0.6968, 'learning_rate': 4.2065e-05, 'epoch': 0.80, 'throughput': 10003.63} [INFO|2025-03-19 17:20:00] logging.py:143 >> {'loss': 0.6452, 'learning_rate': 4.2055e-05, 'epoch': 0.80, 'throughput': 10003.54} [INFO|2025-03-19 17:20:41] logging.py:143 >> {'loss': 0.6672, 'learning_rate': 4.2045e-05, 'epoch': 0.80, 'throughput': 10003.29} [INFO|2025-03-19 17:21:20] logging.py:143 >> {'loss': 0.6604, 'learning_rate': 4.2035e-05, 'epoch': 0.80, 'throughput': 10003.50} [INFO|2025-03-19 17:22:00] logging.py:143 >> {'loss': 0.6641, 'learning_rate': 4.2024e-05, 'epoch': 0.80, 'throughput': 10003.61} [INFO|2025-03-19 17:22:42] logging.py:143 >> {'loss': 0.6222, 'learning_rate': 4.2014e-05, 'epoch': 0.80, 'throughput': 10003.51} [INFO|2025-03-19 17:23:24] logging.py:143 >> {'loss': 0.6666, 'learning_rate': 4.2004e-05, 'epoch': 0.80, 'throughput': 10003.23} [INFO|2025-03-19 17:24:05] logging.py:143 >> {'loss': 0.6555, 'learning_rate': 4.1994e-05, 'epoch': 0.80, 'throughput': 10002.99} [INFO|2025-03-19 17:24:47] logging.py:143 >> {'loss': 0.6795, 'learning_rate': 4.1983e-05, 'epoch': 0.80, 'throughput': 10002.87} [INFO|2025-03-19 17:25:27] logging.py:143 >> {'loss': 0.6673, 'learning_rate': 4.1973e-05, 'epoch': 0.80, 'throughput': 10003.08} [INFO|2025-03-19 17:26:08] logging.py:143 >> {'loss': 0.6151, 'learning_rate': 4.1963e-05, 'epoch': 0.80, 'throughput': 10003.06} [INFO|2025-03-19 17:26:48] logging.py:143 >> {'loss': 0.6411, 'learning_rate': 4.1953e-05, 'epoch': 0.80, 'throughput': 10003.23} [INFO|2025-03-19 17:27:28] logging.py:143 >> {'loss': 0.6745, 'learning_rate': 4.1942e-05, 'epoch': 0.80, 'throughput': 10003.24} [INFO|2025-03-19 17:28:09] logging.py:143 >> {'loss': 0.6214, 'learning_rate': 4.1932e-05, 'epoch': 0.81, 'throughput': 10003.12} [INFO|2025-03-19 17:28:51] logging.py:143 >> {'loss': 0.6255, 'learning_rate': 4.1922e-05, 'epoch': 0.81, 'throughput': 10002.87} [INFO|2025-03-19 17:29:33] logging.py:143 >> {'loss': 0.6598, 'learning_rate': 4.1911e-05, 
'epoch': 0.81, 'throughput': 10002.45} [INFO|2025-03-19 17:30:15] logging.py:143 >> {'loss': 0.6967, 'learning_rate': 4.1901e-05, 'epoch': 0.81, 'throughput': 10002.19} [INFO|2025-03-19 17:30:55] logging.py:143 >> {'loss': 0.6430, 'learning_rate': 4.1891e-05, 'epoch': 0.81, 'throughput': 10002.22} [INFO|2025-03-19 17:31:36] logging.py:143 >> {'loss': 0.6326, 'learning_rate': 4.1880e-05, 'epoch': 0.81, 'throughput': 10002.12} [INFO|2025-03-19 17:32:16] logging.py:143 >> {'loss': 0.6453, 'learning_rate': 4.1870e-05, 'epoch': 0.81, 'throughput': 10002.18} [INFO|2025-03-19 17:32:57] logging.py:143 >> {'loss': 0.6650, 'learning_rate': 4.1860e-05, 'epoch': 0.81, 'throughput': 10002.21} [INFO|2025-03-19 17:33:37] logging.py:143 >> {'loss': 0.6444, 'learning_rate': 4.1850e-05, 'epoch': 0.81, 'throughput': 10002.44} [INFO|2025-03-19 17:34:18] logging.py:143 >> {'loss': 0.6372, 'learning_rate': 4.1839e-05, 'epoch': 0.81, 'throughput': 10002.46} [INFO|2025-03-19 17:34:59] logging.py:143 >> {'loss': 0.6494, 'learning_rate': 4.1829e-05, 'epoch': 0.81, 'throughput': 10002.47} [INFO|2025-03-19 17:35:39] logging.py:143 >> {'loss': 0.6336, 'learning_rate': 4.1818e-05, 'epoch': 0.81, 'throughput': 10002.38} [INFO|2025-03-19 17:36:18] logging.py:143 >> {'loss': 0.6328, 'learning_rate': 4.1808e-05, 'epoch': 0.81, 'throughput': 10002.50} [INFO|2025-03-19 17:36:59] logging.py:143 >> {'loss': 0.6432, 'learning_rate': 4.1798e-05, 'epoch': 0.81, 'throughput': 10002.39} [INFO|2025-03-19 17:37:41] logging.py:143 >> {'loss': 0.6342, 'learning_rate': 4.1787e-05, 'epoch': 0.81, 'throughput': 10002.33} [INFO|2025-03-19 17:38:20] logging.py:143 >> {'loss': 0.6778, 'learning_rate': 4.1777e-05, 'epoch': 0.81, 'throughput': 10002.43} [INFO|2025-03-19 17:39:01] logging.py:143 >> {'loss': 0.6783, 'learning_rate': 4.1767e-05, 'epoch': 0.81, 'throughput': 10002.43} [INFO|2025-03-19 17:39:43] logging.py:143 >> {'loss': 0.6779, 'learning_rate': 4.1756e-05, 'epoch': 0.81, 'throughput': 10002.47} 
[INFO|2025-03-19 17:40:23] logging.py:143 >> {'loss': 0.6706, 'learning_rate': 4.1746e-05, 'epoch': 0.81, 'throughput': 10002.67} [INFO|2025-03-19 17:41:04] logging.py:143 >> {'loss': 0.6584, 'learning_rate': 4.1735e-05, 'epoch': 0.82, 'throughput': 10002.69} [INFO|2025-03-19 17:41:45] logging.py:143 >> {'loss': 0.6632, 'learning_rate': 4.1725e-05, 'epoch': 0.82, 'throughput': 10002.73} [INFO|2025-03-19 17:42:28] logging.py:143 >> {'loss': 0.6156, 'learning_rate': 4.1715e-05, 'epoch': 0.82, 'throughput': 10002.41} [INFO|2025-03-19 17:43:08] logging.py:143 >> {'loss': 0.6615, 'learning_rate': 4.1704e-05, 'epoch': 0.82, 'throughput': 10002.37} [INFO|2025-03-19 17:43:48] logging.py:143 >> {'loss': 0.6645, 'learning_rate': 4.1694e-05, 'epoch': 0.82, 'throughput': 10002.37} [INFO|2025-03-19 17:44:30] logging.py:143 >> {'loss': 0.6594, 'learning_rate': 4.1683e-05, 'epoch': 0.82, 'throughput': 10002.20} [INFO|2025-03-19 17:45:11] logging.py:143 >> {'loss': 0.6067, 'learning_rate': 4.1673e-05, 'epoch': 0.82, 'throughput': 10002.11} [INFO|2025-03-19 17:45:52] logging.py:143 >> {'loss': 0.6401, 'learning_rate': 4.1663e-05, 'epoch': 0.82, 'throughput': 10002.14} [INFO|2025-03-19 17:46:32] logging.py:143 >> {'loss': 0.6620, 'learning_rate': 4.1652e-05, 'epoch': 0.82, 'throughput': 10002.19} [INFO|2025-03-19 17:47:13] logging.py:143 >> {'loss': 0.6279, 'learning_rate': 4.1642e-05, 'epoch': 0.82, 'throughput': 10002.15} [INFO|2025-03-19 17:47:55] logging.py:143 >> {'loss': 0.6770, 'learning_rate': 4.1631e-05, 'epoch': 0.82, 'throughput': 10002.19} [INFO|2025-03-19 17:48:35] logging.py:143 >> {'loss': 0.6780, 'learning_rate': 4.1621e-05, 'epoch': 0.82, 'throughput': 10002.34} [INFO|2025-03-19 17:49:16] logging.py:143 >> {'loss': 0.6716, 'learning_rate': 4.1610e-05, 'epoch': 0.82, 'throughput': 10002.32} [INFO|2025-03-19 17:49:56] logging.py:143 >> {'loss': 0.6587, 'learning_rate': 4.1600e-05, 'epoch': 0.82, 'throughput': 10002.34} [INFO|2025-03-19 17:50:38] logging.py:143 >> 
{'loss': 0.6477, 'learning_rate': 4.1589e-05, 'epoch': 0.82, 'throughput': 10002.22} [INFO|2025-03-19 17:51:18] logging.py:143 >> {'loss': 0.6141, 'learning_rate': 4.1579e-05, 'epoch': 0.82, 'throughput': 10002.10} [INFO|2025-03-19 17:51:59] logging.py:143 >> {'loss': 0.6785, 'learning_rate': 4.1568e-05, 'epoch': 0.82, 'throughput': 10001.91} [INFO|2025-03-19 17:52:40] logging.py:143 >> {'loss': 0.6390, 'learning_rate': 4.1558e-05, 'epoch': 0.82, 'throughput': 10001.91} [INFO|2025-03-19 17:53:20] logging.py:143 >> {'loss': 0.6156, 'learning_rate': 4.1548e-05, 'epoch': 0.82, 'throughput': 10002.00} [INFO|2025-03-19 17:54:02] logging.py:143 >> {'loss': 0.6478, 'learning_rate': 4.1537e-05, 'epoch': 0.83, 'throughput': 10001.92} [INFO|2025-03-19 17:54:42] logging.py:143 >> {'loss': 0.6914, 'learning_rate': 4.1527e-05, 'epoch': 0.83, 'throughput': 10001.87} [INFO|2025-03-19 17:55:23] logging.py:143 >> {'loss': 0.6375, 'learning_rate': 4.1516e-05, 'epoch': 0.83, 'throughput': 10001.88} [INFO|2025-03-19 17:56:02] logging.py:143 >> {'loss': 0.6760, 'learning_rate': 4.1506e-05, 'epoch': 0.83, 'throughput': 10002.00} [INFO|2025-03-19 17:56:43] logging.py:143 >> {'loss': 0.6750, 'learning_rate': 4.1495e-05, 'epoch': 0.83, 'throughput': 10001.94} [INFO|2025-03-19 17:57:20] logging.py:143 >> {'loss': 0.6192, 'learning_rate': 4.1484e-05, 'epoch': 0.83, 'throughput': 10002.18} [INFO|2025-03-19 17:58:00] logging.py:143 >> {'loss': 0.6726, 'learning_rate': 4.1474e-05, 'epoch': 0.83, 'throughput': 10002.36} [INFO|2025-03-19 17:58:41] logging.py:143 >> {'loss': 0.6349, 'learning_rate': 4.1463e-05, 'epoch': 0.83, 'throughput': 10002.19} [INFO|2025-03-19 17:59:21] logging.py:143 >> {'loss': 0.6465, 'learning_rate': 4.1453e-05, 'epoch': 0.83, 'throughput': 10002.32} [INFO|2025-03-19 18:00:00] logging.py:143 >> {'loss': 0.6430, 'learning_rate': 4.1442e-05, 'epoch': 0.83, 'throughput': 10002.44} [INFO|2025-03-19 18:00:40] logging.py:143 >> {'loss': 0.6603, 'learning_rate': 4.1432e-05, 
'epoch': 0.83, 'throughput': 10002.37} [INFO|2025-03-19 18:01:21] logging.py:143 >> {'loss': 0.6080, 'learning_rate': 4.1421e-05, 'epoch': 0.83, 'throughput': 10002.38} [INFO|2025-03-19 18:02:02] logging.py:143 >> {'loss': 0.6113, 'learning_rate': 4.1411e-05, 'epoch': 0.83, 'throughput': 10002.38} [INFO|2025-03-19 18:02:45] logging.py:143 >> {'loss': 0.6509, 'learning_rate': 4.1400e-05, 'epoch': 0.83, 'throughput': 10002.21} [INFO|2025-03-19 18:03:26] logging.py:143 >> {'loss': 0.6452, 'learning_rate': 4.1390e-05, 'epoch': 0.83, 'throughput': 10002.05} [INFO|2025-03-19 18:04:06] logging.py:143 >> {'loss': 0.6365, 'learning_rate': 4.1379e-05, 'epoch': 0.83, 'throughput': 10002.19} [INFO|2025-03-19 18:04:47] logging.py:143 >> {'loss': 0.6463, 'learning_rate': 4.1368e-05, 'epoch': 0.83, 'throughput': 10002.24} [INFO|2025-03-19 18:05:27] logging.py:143 >> {'loss': 0.6557, 'learning_rate': 4.1358e-05, 'epoch': 0.83, 'throughput': 10002.37} [INFO|2025-03-19 18:06:09] logging.py:143 >> {'loss': 0.6738, 'learning_rate': 4.1347e-05, 'epoch': 0.83, 'throughput': 10002.24} [INFO|2025-03-19 18:06:49] logging.py:143 >> {'loss': 0.6505, 'learning_rate': 4.1337e-05, 'epoch': 0.84, 'throughput': 10002.38} [INFO|2025-03-19 18:07:28] logging.py:143 >> {'loss': 0.6265, 'learning_rate': 4.1326e-05, 'epoch': 0.84, 'throughput': 10002.60} [INFO|2025-03-19 18:08:07] logging.py:143 >> {'loss': 0.6653, 'learning_rate': 4.1315e-05, 'epoch': 0.84, 'throughput': 10002.73} [INFO|2025-03-19 18:08:48] logging.py:143 >> {'loss': 0.6399, 'learning_rate': 4.1305e-05, 'epoch': 0.84, 'throughput': 10002.59} [INFO|2025-03-19 18:09:30] logging.py:143 >> {'loss': 0.6480, 'learning_rate': 4.1294e-05, 'epoch': 0.84, 'throughput': 10002.53} [INFO|2025-03-19 18:10:11] logging.py:143 >> {'loss': 0.6525, 'learning_rate': 4.1284e-05, 'epoch': 0.84, 'throughput': 10002.64} [INFO|2025-03-19 18:10:52] logging.py:143 >> {'loss': 0.6603, 'learning_rate': 4.1273e-05, 'epoch': 0.84, 'throughput': 10002.58} 
[INFO|2025-03-19 18:11:33] logging.py:143 >> {'loss': 0.6516, 'learning_rate': 4.1262e-05, 'epoch': 0.84, 'throughput': 10002.30} [INFO|2025-03-19 18:12:13] logging.py:143 >> {'loss': 0.6559, 'learning_rate': 4.1252e-05, 'epoch': 0.84, 'throughput': 10002.27} [INFO|2025-03-19 18:12:55] logging.py:143 >> {'loss': 0.6580, 'learning_rate': 4.1241e-05, 'epoch': 0.84, 'throughput': 10001.99} [INFO|2025-03-19 18:13:36] logging.py:143 >> {'loss': 0.6916, 'learning_rate': 4.1231e-05, 'epoch': 0.84, 'throughput': 10001.96} [INFO|2025-03-19 18:14:17] logging.py:143 >> {'loss': 0.6468, 'learning_rate': 4.1220e-05, 'epoch': 0.84, 'throughput': 10002.18} [INFO|2025-03-19 18:14:58] logging.py:143 >> {'loss': 0.6615, 'learning_rate': 4.1209e-05, 'epoch': 0.84, 'throughput': 10002.09} [INFO|2025-03-19 18:15:38] logging.py:143 >> {'loss': 0.6701, 'learning_rate': 4.1199e-05, 'epoch': 0.84, 'throughput': 10002.20} [INFO|2025-03-19 18:16:18] logging.py:143 >> {'loss': 0.6684, 'learning_rate': 4.1188e-05, 'epoch': 0.84, 'throughput': 10002.24} [INFO|2025-03-19 18:16:58] logging.py:143 >> {'loss': 0.6619, 'learning_rate': 4.1177e-05, 'epoch': 0.84, 'throughput': 10002.24} [INFO|2025-03-19 18:17:40] logging.py:143 >> {'loss': 0.6217, 'learning_rate': 4.1167e-05, 'epoch': 0.84, 'throughput': 10002.08} [INFO|2025-03-19 18:18:20] logging.py:143 >> {'loss': 0.6836, 'learning_rate': 4.1156e-05, 'epoch': 0.84, 'throughput': 10002.18} [INFO|2025-03-19 18:19:01] logging.py:143 >> {'loss': 0.6725, 'learning_rate': 4.1145e-05, 'epoch': 0.84, 'throughput': 10002.16} [INFO|2025-03-19 18:19:43] logging.py:143 >> {'loss': 0.6247, 'learning_rate': 4.1135e-05, 'epoch': 0.85, 'throughput': 10002.02} [INFO|2025-03-19 18:20:24] logging.py:143 >> {'loss': 0.6447, 'learning_rate': 4.1124e-05, 'epoch': 0.85, 'throughput': 10002.05} [INFO|2025-03-19 18:21:05] logging.py:143 >> {'loss': 0.6602, 'learning_rate': 4.1113e-05, 'epoch': 0.85, 'throughput': 10002.14} [INFO|2025-03-19 18:21:45] logging.py:143 >> 
{'loss': 0.6429, 'learning_rate': 4.1102e-05, 'epoch': 0.85, 'throughput': 10002.20} [INFO|2025-03-19 18:22:27] logging.py:143 >> {'loss': 0.6251, 'learning_rate': 4.1092e-05, 'epoch': 0.85, 'throughput': 10002.15} [INFO|2025-03-19 18:23:07] logging.py:143 >> {'loss': 0.6651, 'learning_rate': 4.1081e-05, 'epoch': 0.85, 'throughput': 10002.05} [INFO|2025-03-19 18:23:47] logging.py:143 >> {'loss': 0.6570, 'learning_rate': 4.1070e-05, 'epoch': 0.85, 'throughput': 10002.17} [INFO|2025-03-19 18:24:28] logging.py:143 >> {'loss': 0.6693, 'learning_rate': 4.1060e-05, 'epoch': 0.85, 'throughput': 10002.30} [INFO|2025-03-19 18:25:08] logging.py:143 >> {'loss': 0.6298, 'learning_rate': 4.1049e-05, 'epoch': 0.85, 'throughput': 10002.25} [INFO|2025-03-19 18:25:50] logging.py:143 >> {'loss': 0.6384, 'learning_rate': 4.1038e-05, 'epoch': 0.85, 'throughput': 10002.11} [INFO|2025-03-19 18:26:33] logging.py:143 >> {'loss': 0.6605, 'learning_rate': 4.1027e-05, 'epoch': 0.85, 'throughput': 10001.88} [INFO|2025-03-19 18:27:14] logging.py:143 >> {'loss': 0.6369, 'learning_rate': 4.1017e-05, 'epoch': 0.85, 'throughput': 10001.84} [INFO|2025-03-19 18:27:56] logging.py:143 >> {'loss': 0.6581, 'learning_rate': 4.1006e-05, 'epoch': 0.85, 'throughput': 10001.90} [INFO|2025-03-19 18:28:37] logging.py:143 >> {'loss': 0.6386, 'learning_rate': 4.0995e-05, 'epoch': 0.85, 'throughput': 10001.90} [INFO|2025-03-19 18:29:17] logging.py:143 >> {'loss': 0.6502, 'learning_rate': 4.0984e-05, 'epoch': 0.85, 'throughput': 10002.01} [INFO|2025-03-19 18:29:56] logging.py:143 >> {'loss': 0.6093, 'learning_rate': 4.0974e-05, 'epoch': 0.85, 'throughput': 10002.07} [INFO|2025-03-19 18:30:35] logging.py:143 >> {'loss': 0.6409, 'learning_rate': 4.0963e-05, 'epoch': 0.85, 'throughput': 10002.22} [INFO|2025-03-19 18:31:16] logging.py:143 >> {'loss': 0.6558, 'learning_rate': 4.0952e-05, 'epoch': 0.85, 'throughput': 10002.19} [INFO|2025-03-19 18:31:56] logging.py:143 >> {'loss': 0.6279, 'learning_rate': 4.0941e-05, 
'epoch': 0.86, 'throughput': 10002.35} [INFO|2025-03-19 18:32:38] logging.py:143 >> {'loss': 0.6332, 'learning_rate': 4.0931e-05, 'epoch': 0.86, 'throughput': 10002.21} [INFO|2025-03-19 18:33:19] logging.py:143 >> {'loss': 0.6482, 'learning_rate': 4.0920e-05, 'epoch': 0.86, 'throughput': 10002.15} [INFO|2025-03-19 18:33:59] logging.py:143 >> {'loss': 0.6233, 'learning_rate': 4.0909e-05, 'epoch': 0.86, 'throughput': 10002.16} [INFO|2025-03-19 18:34:38] logging.py:143 >> {'loss': 0.6766, 'learning_rate': 4.0898e-05, 'epoch': 0.86, 'throughput': 10002.30} [INFO|2025-03-19 18:35:19] logging.py:143 >> {'loss': 0.6129, 'learning_rate': 4.0887e-05, 'epoch': 0.86, 'throughput': 10002.28} [INFO|2025-03-19 18:35:58] logging.py:143 >> {'loss': 0.6542, 'learning_rate': 4.0877e-05, 'epoch': 0.86, 'throughput': 10002.52} [INFO|2025-03-19 18:36:37] logging.py:143 >> {'loss': 0.6150, 'learning_rate': 4.0866e-05, 'epoch': 0.86, 'throughput': 10002.65} [INFO|2025-03-19 18:37:17] logging.py:143 >> {'loss': 0.6245, 'learning_rate': 4.0855e-05, 'epoch': 0.86, 'throughput': 10002.65} [INFO|2025-03-19 18:38:00] logging.py:143 >> {'loss': 0.6285, 'learning_rate': 4.0844e-05, 'epoch': 0.86, 'throughput': 10002.31} [INFO|2025-03-19 18:38:40] logging.py:143 >> {'loss': 0.6642, 'learning_rate': 4.0833e-05, 'epoch': 0.86, 'throughput': 10002.43} [INFO|2025-03-19 18:39:20] logging.py:143 >> {'loss': 0.6759, 'learning_rate': 4.0822e-05, 'epoch': 0.86, 'throughput': 10002.47} [INFO|2025-03-19 18:40:00] logging.py:143 >> {'loss': 0.6414, 'learning_rate': 4.0812e-05, 'epoch': 0.86, 'throughput': 10002.44} [INFO|2025-03-19 18:40:40] logging.py:143 >> {'loss': 0.6205, 'learning_rate': 4.0801e-05, 'epoch': 0.86, 'throughput': 10002.51} [INFO|2025-03-19 18:41:22] logging.py:143 >> {'loss': 0.6423, 'learning_rate': 4.0790e-05, 'epoch': 0.86, 'throughput': 10002.52} [INFO|2025-03-19 18:42:01] logging.py:143 >> {'loss': 0.6408, 'learning_rate': 4.0779e-05, 'epoch': 0.86, 'throughput': 10002.59} 
[INFO|2025-03-19 18:42:42] logging.py:143 >> {'loss': 0.6595, 'learning_rate': 4.0768e-05, 'epoch': 0.86, 'throughput': 10002.67} [INFO|2025-03-19 18:43:21] logging.py:143 >> {'loss': 0.6421, 'learning_rate': 4.0757e-05, 'epoch': 0.86, 'throughput': 10002.77} [INFO|2025-03-19 18:44:00] logging.py:143 >> {'loss': 0.6234, 'learning_rate': 4.0746e-05, 'epoch': 0.86, 'throughput': 10002.83} [INFO|2025-03-19 18:44:39] logging.py:143 >> {'loss': 0.6691, 'learning_rate': 4.0736e-05, 'epoch': 0.87, 'throughput': 10003.09} [INFO|2025-03-19 18:45:20] logging.py:143 >> {'loss': 0.6351, 'learning_rate': 4.0725e-05, 'epoch': 0.87, 'throughput': 10002.95} [INFO|2025-03-19 18:46:01] logging.py:143 >> {'loss': 0.6179, 'learning_rate': 4.0714e-05, 'epoch': 0.87, 'throughput': 10002.86} [INFO|2025-03-19 18:46:41] logging.py:143 >> {'loss': 0.6355, 'learning_rate': 4.0703e-05, 'epoch': 0.87, 'throughput': 10002.79} [INFO|2025-03-19 18:47:21] logging.py:143 >> {'loss': 0.6881, 'learning_rate': 4.0692e-05, 'epoch': 0.87, 'throughput': 10002.96} [INFO|2025-03-19 18:48:00] logging.py:143 >> {'loss': 0.6767, 'learning_rate': 4.0681e-05, 'epoch': 0.87, 'throughput': 10003.14} [INFO|2025-03-19 18:48:40] logging.py:143 >> {'loss': 0.6543, 'learning_rate': 4.0670e-05, 'epoch': 0.87, 'throughput': 10003.13} [INFO|2025-03-19 18:49:20] logging.py:143 >> {'loss': 0.6258, 'learning_rate': 4.0659e-05, 'epoch': 0.87, 'throughput': 10003.07} [INFO|2025-03-19 18:50:01] logging.py:143 >> {'loss': 0.6091, 'learning_rate': 4.0648e-05, 'epoch': 0.87, 'throughput': 10002.96} [INFO|2025-03-19 18:50:41] logging.py:143 >> {'loss': 0.6660, 'learning_rate': 4.0638e-05, 'epoch': 0.87, 'throughput': 10003.03} [INFO|2025-03-19 18:51:21] logging.py:143 >> {'loss': 0.6498, 'learning_rate': 4.0627e-05, 'epoch': 0.87, 'throughput': 10003.19} [INFO|2025-03-19 18:52:02] logging.py:143 >> {'loss': 0.6531, 'learning_rate': 4.0616e-05, 'epoch': 0.87, 'throughput': 10003.15} [INFO|2025-03-19 18:52:43] logging.py:143 >> 
{'loss': 0.6316, 'learning_rate': 4.0605e-05, 'epoch': 0.87, 'throughput': 10003.01} [INFO|2025-03-19 18:53:24] logging.py:143 >> {'loss': 0.6547, 'learning_rate': 4.0594e-05, 'epoch': 0.87, 'throughput': 10003.07} [INFO|2025-03-19 18:54:03] logging.py:143 >> {'loss': 0.6666, 'learning_rate': 4.0583e-05, 'epoch': 0.87, 'throughput': 10003.11} [INFO|2025-03-19 18:54:44] logging.py:143 >> {'loss': 0.6520, 'learning_rate': 4.0572e-05, 'epoch': 0.87, 'throughput': 10003.22} [INFO|2025-03-19 18:55:25] logging.py:143 >> {'loss': 0.6260, 'learning_rate': 4.0561e-05, 'epoch': 0.87, 'throughput': 10003.15} [INFO|2025-03-19 18:56:05] logging.py:143 >> {'loss': 0.6597, 'learning_rate': 4.0550e-05, 'epoch': 0.87, 'throughput': 10003.33} [INFO|2025-03-19 18:56:44] logging.py:143 >> {'loss': 0.6561, 'learning_rate': 4.0539e-05, 'epoch': 0.87, 'throughput': 10003.40} [INFO|2025-03-19 18:57:25] logging.py:143 >> {'loss': 0.6422, 'learning_rate': 4.0528e-05, 'epoch': 0.88, 'throughput': 10003.32} [INFO|2025-03-19 18:58:05] logging.py:143 >> {'loss': 0.6609, 'learning_rate': 4.0517e-05, 'epoch': 0.88, 'throughput': 10003.35} [INFO|2025-03-19 18:58:45] logging.py:143 >> {'loss': 0.6370, 'learning_rate': 4.0506e-05, 'epoch': 0.88, 'throughput': 10003.31} [INFO|2025-03-19 18:59:25] logging.py:143 >> {'loss': 0.6456, 'learning_rate': 4.0495e-05, 'epoch': 0.88, 'throughput': 10003.30} [INFO|2025-03-19 19:00:04] logging.py:143 >> {'loss': 0.6380, 'learning_rate': 4.0484e-05, 'epoch': 0.88, 'throughput': 10003.42} [INFO|2025-03-19 19:00:44] logging.py:143 >> {'loss': 0.6656, 'learning_rate': 4.0473e-05, 'epoch': 0.88, 'throughput': 10003.42} [INFO|2025-03-19 19:01:25] logging.py:143 >> {'loss': 0.6032, 'learning_rate': 4.0462e-05, 'epoch': 0.88, 'throughput': 10003.28} [INFO|2025-03-19 19:02:05] logging.py:143 >> {'loss': 0.6473, 'learning_rate': 4.0451e-05, 'epoch': 0.88, 'throughput': 10003.25} [INFO|2025-03-19 19:02:46] logging.py:143 >> {'loss': 0.6413, 'learning_rate': 4.0440e-05, 
'epoch': 0.88, 'throughput': 10003.24} [INFO|2025-03-19 19:03:28] logging.py:143 >> {'loss': 0.6295, 'learning_rate': 4.0429e-05, 'epoch': 0.88, 'throughput': 10003.13} [INFO|2025-03-19 19:04:08] logging.py:143 >> {'loss': 0.6613, 'learning_rate': 4.0418e-05, 'epoch': 0.88, 'throughput': 10003.12} [INFO|2025-03-19 19:04:48] logging.py:143 >> {'loss': 0.6208, 'learning_rate': 4.0407e-05, 'epoch': 0.88, 'throughput': 10003.11} [INFO|2025-03-19 19:05:29] logging.py:143 >> {'loss': 0.6422, 'learning_rate': 4.0396e-05, 'epoch': 0.88, 'throughput': 10003.17} [INFO|2025-03-19 19:06:10] logging.py:143 >> {'loss': 0.6484, 'learning_rate': 4.0385e-05, 'epoch': 0.88, 'throughput': 10003.32} [INFO|2025-03-19 19:06:50] logging.py:143 >> {'loss': 0.6113, 'learning_rate': 4.0374e-05, 'epoch': 0.88, 'throughput': 10003.22} [INFO|2025-03-19 19:07:30] logging.py:143 >> {'loss': 0.6340, 'learning_rate': 4.0363e-05, 'epoch': 0.88, 'throughput': 10003.28} [INFO|2025-03-19 19:08:09] logging.py:143 >> {'loss': 0.6357, 'learning_rate': 4.0352e-05, 'epoch': 0.88, 'throughput': 10003.44} [INFO|2025-03-19 19:08:50] logging.py:143 >> {'loss': 0.6503, 'learning_rate': 4.0341e-05, 'epoch': 0.88, 'throughput': 10003.52} [INFO|2025-03-19 19:09:32] logging.py:143 >> {'loss': 0.6355, 'learning_rate': 4.0330e-05, 'epoch': 0.88, 'throughput': 10003.28} [INFO|2025-03-19 19:10:13] logging.py:143 >> {'loss': 0.6300, 'learning_rate': 4.0319e-05, 'epoch': 0.89, 'throughput': 10003.33} [INFO|2025-03-19 19:10:53] logging.py:143 >> {'loss': 0.6641, 'learning_rate': 4.0308e-05, 'epoch': 0.89, 'throughput': 10003.10} [INFO|2025-03-19 19:11:34] logging.py:143 >> {'loss': 0.6960, 'learning_rate': 4.0297e-05, 'epoch': 0.89, 'throughput': 10003.11} [INFO|2025-03-19 19:12:14] logging.py:143 >> {'loss': 0.6137, 'learning_rate': 4.0286e-05, 'epoch': 0.89, 'throughput': 10003.14} [INFO|2025-03-19 19:12:55] logging.py:143 >> {'loss': 0.6886, 'learning_rate': 4.0275e-05, 'epoch': 0.89, 'throughput': 10003.04} 
[INFO|2025-03-19 19:13:37] logging.py:143 >> {'loss': 0.6125, 'learning_rate': 4.0263e-05, 'epoch': 0.89, 'throughput': 10002.78} [INFO|2025-03-19 19:14:17] logging.py:143 >> {'loss': 0.6612, 'learning_rate': 4.0252e-05, 'epoch': 0.89, 'throughput': 10002.84} [INFO|2025-03-19 19:15:00] logging.py:143 >> {'loss': 0.5822, 'learning_rate': 4.0241e-05, 'epoch': 0.89, 'throughput': 10002.75} [INFO|2025-03-19 19:15:41] logging.py:143 >> {'loss': 0.6312, 'learning_rate': 4.0230e-05, 'epoch': 0.89, 'throughput': 10002.68} [INFO|2025-03-19 19:16:22] logging.py:143 >> {'loss': 0.6210, 'learning_rate': 4.0219e-05, 'epoch': 0.89, 'throughput': 10002.65} [INFO|2025-03-19 19:17:02] logging.py:143 >> {'loss': 0.6579, 'learning_rate': 4.0208e-05, 'epoch': 0.89, 'throughput': 10002.76} [INFO|2025-03-19 19:17:42] logging.py:143 >> {'loss': 0.6203, 'learning_rate': 4.0197e-05, 'epoch': 0.89, 'throughput': 10002.71} [INFO|2025-03-19 19:18:21] logging.py:143 >> {'loss': 0.6319, 'learning_rate': 4.0186e-05, 'epoch': 0.89, 'throughput': 10002.72} [INFO|2025-03-19 19:19:02] logging.py:143 >> {'loss': 0.6425, 'learning_rate': 4.0175e-05, 'epoch': 0.89, 'throughput': 10002.85} [INFO|2025-03-19 19:19:41] logging.py:143 >> {'loss': 0.6225, 'learning_rate': 4.0164e-05, 'epoch': 0.89, 'throughput': 10002.98} [INFO|2025-03-19 19:20:21] logging.py:143 >> {'loss': 0.6335, 'learning_rate': 4.0152e-05, 'epoch': 0.89, 'throughput': 10003.14} [INFO|2025-03-19 19:21:02] logging.py:143 >> {'loss': 0.6403, 'learning_rate': 4.0141e-05, 'epoch': 0.89, 'throughput': 10003.08} [INFO|2025-03-19 19:21:43] logging.py:143 >> {'loss': 0.6059, 'learning_rate': 4.0130e-05, 'epoch': 0.89, 'throughput': 10002.99} [INFO|2025-03-19 19:22:23] logging.py:143 >> {'loss': 0.6350, 'learning_rate': 4.0119e-05, 'epoch': 0.89, 'throughput': 10003.03} [INFO|2025-03-19 19:23:04] logging.py:143 >> {'loss': 0.6646, 'learning_rate': 4.0108e-05, 'epoch': 0.90, 'throughput': 10003.17} [INFO|2025-03-19 19:23:45] logging.py:143 >> 
{'loss': 0.6690, 'learning_rate': 4.0097e-05, 'epoch': 0.90, 'throughput': 10003.14} [INFO|2025-03-19 19:24:27] logging.py:143 >> {'loss': 0.6178, 'learning_rate': 4.0086e-05, 'epoch': 0.90, 'throughput': 10003.12} [INFO|2025-03-19 19:25:08] logging.py:143 >> {'loss': 0.6292, 'learning_rate': 4.0074e-05, 'epoch': 0.90, 'throughput': 10002.96} [INFO|2025-03-19 19:25:47] logging.py:143 >> {'loss': 0.6334, 'learning_rate': 4.0063e-05, 'epoch': 0.90, 'throughput': 10003.05} [INFO|2025-03-19 19:26:29] logging.py:143 >> {'loss': 0.5980, 'learning_rate': 4.0052e-05, 'epoch': 0.90, 'throughput': 10003.10} [INFO|2025-03-19 19:27:10] logging.py:143 >> {'loss': 0.6343, 'learning_rate': 4.0041e-05, 'epoch': 0.90, 'throughput': 10003.10} [INFO|2025-03-19 19:27:49] logging.py:143 >> {'loss': 0.5982, 'learning_rate': 4.0030e-05, 'epoch': 0.90, 'throughput': 10003.26} [INFO|2025-03-19 19:28:29] logging.py:143 >> {'loss': 0.6380, 'learning_rate': 4.0019e-05, 'epoch': 0.90, 'throughput': 10003.30} [INFO|2025-03-19 19:29:09] logging.py:143 >> {'loss': 0.6482, 'learning_rate': 4.0007e-05, 'epoch': 0.90, 'throughput': 10003.35} [INFO|2025-03-19 19:29:50] logging.py:143 >> {'loss': 0.6456, 'learning_rate': 3.9996e-05, 'epoch': 0.90, 'throughput': 10003.38} [INFO|2025-03-19 19:30:30] logging.py:143 >> {'loss': 0.6134, 'learning_rate': 3.9985e-05, 'epoch': 0.90, 'throughput': 10003.48} [INFO|2025-03-19 19:31:11] logging.py:143 >> {'loss': 0.6394, 'learning_rate': 3.9974e-05, 'epoch': 0.90, 'throughput': 10003.51} [INFO|2025-03-19 19:31:51] logging.py:143 >> {'loss': 0.6483, 'learning_rate': 3.9963e-05, 'epoch': 0.90, 'throughput': 10003.79} [INFO|2025-03-19 19:32:32] logging.py:143 >> {'loss': 0.5834, 'learning_rate': 3.9951e-05, 'epoch': 0.90, 'throughput': 10003.57} [INFO|2025-03-19 19:33:13] logging.py:143 >> {'loss': 0.5958, 'learning_rate': 3.9940e-05, 'epoch': 0.90, 'throughput': 10003.51} [INFO|2025-03-19 19:33:53] logging.py:143 >> {'loss': 0.6478, 'learning_rate': 3.9929e-05, 
'epoch': 0.90, 'throughput': 10003.46} [INFO|2025-03-19 19:34:34] logging.py:143 >> {'loss': 0.6337, 'learning_rate': 3.9918e-05, 'epoch': 0.90, 'throughput': 10003.41} [INFO|2025-03-19 19:35:13] logging.py:143 >> {'loss': 0.6512, 'learning_rate': 3.9906e-05, 'epoch': 0.90, 'throughput': 10003.59} [INFO|2025-03-19 19:35:54] logging.py:143 >> {'loss': 0.6623, 'learning_rate': 3.9895e-05, 'epoch': 0.91, 'throughput': 10003.61} [INFO|2025-03-19 19:36:33] logging.py:143 >> {'loss': 0.6529, 'learning_rate': 3.9884e-05, 'epoch': 0.91, 'throughput': 10003.77} [INFO|2025-03-19 19:37:13] logging.py:143 >> {'loss': 0.5990, 'learning_rate': 3.9873e-05, 'epoch': 0.91, 'throughput': 10003.83} [INFO|2025-03-19 19:37:52] logging.py:143 >> {'loss': 0.6321, 'learning_rate': 3.9861e-05, 'epoch': 0.91, 'throughput': 10003.95} [INFO|2025-03-19 19:38:33] logging.py:143 >> {'loss': 0.6491, 'learning_rate': 3.9850e-05, 'epoch': 0.91, 'throughput': 10004.00} [INFO|2025-03-19 19:39:12] logging.py:143 >> {'loss': 0.6701, 'learning_rate': 3.9839e-05, 'epoch': 0.91, 'throughput': 10004.11} [INFO|2025-03-19 19:39:53] logging.py:143 >> {'loss': 0.6618, 'learning_rate': 3.9828e-05, 'epoch': 0.91, 'throughput': 10004.12} [INFO|2025-03-19 19:40:33] logging.py:143 >> {'loss': 0.6008, 'learning_rate': 3.9816e-05, 'epoch': 0.91, 'throughput': 10003.97} [INFO|2025-03-19 19:41:16] logging.py:143 >> {'loss': 0.6561, 'learning_rate': 3.9805e-05, 'epoch': 0.91, 'throughput': 10003.70} [INFO|2025-03-19 19:41:56] logging.py:143 >> {'loss': 0.6184, 'learning_rate': 3.9794e-05, 'epoch': 0.91, 'throughput': 10003.79} [INFO|2025-03-19 19:42:37] logging.py:143 >> {'loss': 0.6678, 'learning_rate': 3.9783e-05, 'epoch': 0.91, 'throughput': 10003.84} [INFO|2025-03-19 19:43:17] logging.py:143 >> {'loss': 0.6674, 'learning_rate': 3.9771e-05, 'epoch': 0.91, 'throughput': 10003.84} [INFO|2025-03-19 19:43:57] logging.py:143 >> {'loss': 0.6085, 'learning_rate': 3.9760e-05, 'epoch': 0.91, 'throughput': 10003.57} 
[INFO|2025-03-19 19:44:39] logging.py:143 >> {'loss': 0.5880, 'learning_rate': 3.9749e-05, 'epoch': 0.91, 'throughput': 10003.41} [INFO|2025-03-19 19:45:18] logging.py:143 >> {'loss': 0.6404, 'learning_rate': 3.9737e-05, 'epoch': 0.91, 'throughput': 10003.53} [INFO|2025-03-19 19:45:58] logging.py:143 >> {'loss': 0.6785, 'learning_rate': 3.9726e-05, 'epoch': 0.91, 'throughput': 10003.50} [INFO|2025-03-19 19:46:37] logging.py:143 >> {'loss': 0.6455, 'learning_rate': 3.9715e-05, 'epoch': 0.91, 'throughput': 10003.66} [INFO|2025-03-19 19:47:18] logging.py:143 >> {'loss': 0.6669, 'learning_rate': 3.9703e-05, 'epoch': 0.91, 'throughput': 10003.85} [INFO|2025-03-19 19:48:01] logging.py:143 >> {'loss': 0.6533, 'learning_rate': 3.9692e-05, 'epoch': 0.91, 'throughput': 10003.62} [INFO|2025-03-19 19:48:42] logging.py:143 >> {'loss': 0.6232, 'learning_rate': 3.9681e-05, 'epoch': 0.92, 'throughput': 10003.60} [INFO|2025-03-19 19:49:21] logging.py:143 >> {'loss': 0.7017, 'learning_rate': 3.9669e-05, 'epoch': 0.92, 'throughput': 10003.73} [INFO|2025-03-19 19:50:01] logging.py:143 >> {'loss': 0.6409, 'learning_rate': 3.9658e-05, 'epoch': 0.92, 'throughput': 10003.83} [INFO|2025-03-19 19:50:42] logging.py:143 >> {'loss': 0.6615, 'learning_rate': 3.9647e-05, 'epoch': 0.92, 'throughput': 10004.00} [INFO|2025-03-19 19:51:23] logging.py:143 >> {'loss': 0.6331, 'learning_rate': 3.9635e-05, 'epoch': 0.92, 'throughput': 10003.93} [INFO|2025-03-19 19:52:02] logging.py:143 >> {'loss': 0.6508, 'learning_rate': 3.9624e-05, 'epoch': 0.92, 'throughput': 10004.11} [INFO|2025-03-19 19:52:42] logging.py:143 >> {'loss': 0.6440, 'learning_rate': 3.9613e-05, 'epoch': 0.92, 'throughput': 10004.19} [INFO|2025-03-19 19:53:22] logging.py:143 >> {'loss': 0.6783, 'learning_rate': 3.9601e-05, 'epoch': 0.92, 'throughput': 10004.23} [INFO|2025-03-19 19:54:02] logging.py:143 >> {'loss': 0.6447, 'learning_rate': 3.9590e-05, 'epoch': 0.92, 'throughput': 10004.22} [INFO|2025-03-19 19:54:42] logging.py:143 >> 
{'loss': 0.6085, 'learning_rate': 3.9579e-05, 'epoch': 0.92, 'throughput': 10004.11} [INFO|2025-03-19 19:55:21] logging.py:143 >> {'loss': 0.6098, 'learning_rate': 3.9567e-05, 'epoch': 0.92, 'throughput': 10004.31} [INFO|2025-03-19 19:56:01] logging.py:143 >> {'loss': 0.6662, 'learning_rate': 3.9556e-05, 'epoch': 0.92, 'throughput': 10004.28} [INFO|2025-03-19 19:56:44] logging.py:143 >> {'loss': 0.6450, 'learning_rate': 3.9545e-05, 'epoch': 0.92, 'throughput': 10004.14} [INFO|2025-03-19 19:57:24] logging.py:143 >> {'loss': 0.6708, 'learning_rate': 3.9533e-05, 'epoch': 0.92, 'throughput': 10004.03} [INFO|2025-03-19 19:58:04] logging.py:143 >> {'loss': 0.5942, 'learning_rate': 3.9522e-05, 'epoch': 0.92, 'throughput': 10004.32} [INFO|2025-03-19 19:58:44] logging.py:143 >> {'loss': 0.6191, 'learning_rate': 3.9510e-05, 'epoch': 0.92, 'throughput': 10004.34} [INFO|2025-03-19 19:59:23] logging.py:143 >> {'loss': 0.6122, 'learning_rate': 3.9499e-05, 'epoch': 0.92, 'throughput': 10004.42} [INFO|2025-03-19 20:00:05] logging.py:143 >> {'loss': 0.6572, 'learning_rate': 3.9488e-05, 'epoch': 0.92, 'throughput': 10004.26} [INFO|2025-03-19 20:00:45] logging.py:143 >> {'loss': 0.5703, 'learning_rate': 3.9476e-05, 'epoch': 0.93, 'throughput': 10004.38} [INFO|2025-03-19 20:01:25] logging.py:143 >> {'loss': 0.6534, 'learning_rate': 3.9465e-05, 'epoch': 0.93, 'throughput': 10004.49} [INFO|2025-03-19 20:02:06] logging.py:143 >> {'loss': 0.6258, 'learning_rate': 3.9453e-05, 'epoch': 0.93, 'throughput': 10004.53} [INFO|2025-03-19 20:02:48] logging.py:143 >> {'loss': 0.6471, 'learning_rate': 3.9442e-05, 'epoch': 0.93, 'throughput': 10004.40} [INFO|2025-03-19 20:03:28] logging.py:143 >> {'loss': 0.6424, 'learning_rate': 3.9431e-05, 'epoch': 0.93, 'throughput': 10004.40} [INFO|2025-03-19 20:04:09] logging.py:143 >> {'loss': 0.6280, 'learning_rate': 3.9419e-05, 'epoch': 0.93, 'throughput': 10004.30} [INFO|2025-03-19 20:04:50] logging.py:143 >> {'loss': 0.6263, 'learning_rate': 3.9408e-05, 
'epoch': 0.93, 'throughput': 10004.39} [INFO|2025-03-19 20:05:32] logging.py:143 >> {'loss': 0.6048, 'learning_rate': 3.9396e-05, 'epoch': 0.93, 'throughput': 10004.38} [INFO|2025-03-19 20:06:13] logging.py:143 >> {'loss': 0.6440, 'learning_rate': 3.9385e-05, 'epoch': 0.93, 'throughput': 10004.32} [INFO|2025-03-19 20:06:54] logging.py:143 >> {'loss': 0.6469, 'learning_rate': 3.9373e-05, 'epoch': 0.93, 'throughput': 10004.41} [INFO|2025-03-19 20:07:33] logging.py:143 >> {'loss': 0.6374, 'learning_rate': 3.9362e-05, 'epoch': 0.93, 'throughput': 10004.57} [INFO|2025-03-19 20:08:15] logging.py:143 >> {'loss': 0.6373, 'learning_rate': 3.9350e-05, 'epoch': 0.93, 'throughput': 10004.41} [INFO|2025-03-19 20:08:54] logging.py:143 >> {'loss': 0.6273, 'learning_rate': 3.9339e-05, 'epoch': 0.93, 'throughput': 10004.49} [INFO|2025-03-19 20:09:34] logging.py:143 >> {'loss': 0.6067, 'learning_rate': 3.9327e-05, 'epoch': 0.93, 'throughput': 10004.35} [INFO|2025-03-19 20:10:15] logging.py:143 >> {'loss': 0.6379, 'learning_rate': 3.9316e-05, 'epoch': 0.93, 'throughput': 10004.26} [INFO|2025-03-19 20:10:55] logging.py:143 >> {'loss': 0.6494, 'learning_rate': 3.9305e-05, 'epoch': 0.93, 'throughput': 10004.23} [INFO|2025-03-19 20:11:35] logging.py:143 >> {'loss': 0.6339, 'learning_rate': 3.9293e-05, 'epoch': 0.93, 'throughput': 10004.37} [INFO|2025-03-19 20:12:17] logging.py:143 >> {'loss': 0.6389, 'learning_rate': 3.9282e-05, 'epoch': 0.93, 'throughput': 10004.27} [INFO|2025-03-19 20:12:57] logging.py:143 >> {'loss': 0.6441, 'learning_rate': 3.9270e-05, 'epoch': 0.93, 'throughput': 10004.24} [INFO|2025-03-19 20:13:37] logging.py:143 >> {'loss': 0.6308, 'learning_rate': 3.9259e-05, 'epoch': 0.94, 'throughput': 10004.41} [INFO|2025-03-19 20:14:15] logging.py:143 >> {'loss': 0.6520, 'learning_rate': 3.9247e-05, 'epoch': 0.94, 'throughput': 10004.56} [INFO|2025-03-19 20:14:54] logging.py:143 >> {'loss': 0.6414, 'learning_rate': 3.9236e-05, 'epoch': 0.94, 'throughput': 10004.67} 
[INFO|2025-03-19 20:15:36] logging.py:143 >> {'loss': 0.6399, 'learning_rate': 3.9224e-05, 'epoch': 0.94, 'throughput': 10004.58} [INFO|2025-03-19 20:16:16] logging.py:143 >> {'loss': 0.6272, 'learning_rate': 3.9213e-05, 'epoch': 0.94, 'throughput': 10004.59} [INFO|2025-03-19 20:16:58] logging.py:143 >> {'loss': 0.6238, 'learning_rate': 3.9201e-05, 'epoch': 0.94, 'throughput': 10004.52} [INFO|2025-03-19 20:17:40] logging.py:143 >> {'loss': 0.6399, 'learning_rate': 3.9190e-05, 'epoch': 0.94, 'throughput': 10004.41} [INFO|2025-03-19 20:18:20] logging.py:143 >> {'loss': 0.6288, 'learning_rate': 3.9178e-05, 'epoch': 0.94, 'throughput': 10004.48} [INFO|2025-03-19 20:19:00] logging.py:143 >> {'loss': 0.6409, 'learning_rate': 3.9167e-05, 'epoch': 0.94, 'throughput': 10004.55} [INFO|2025-03-19 20:19:40] logging.py:143 >> {'loss': 0.5994, 'learning_rate': 3.9155e-05, 'epoch': 0.94, 'throughput': 10004.60} [INFO|2025-03-19 20:20:19] logging.py:143 >> {'loss': 0.6415, 'learning_rate': 3.9143e-05, 'epoch': 0.94, 'throughput': 10004.65} [INFO|2025-03-19 20:20:58] logging.py:143 >> {'loss': 0.6261, 'learning_rate': 3.9132e-05, 'epoch': 0.94, 'throughput': 10004.80} [INFO|2025-03-19 20:21:39] logging.py:143 >> {'loss': 0.6149, 'learning_rate': 3.9120e-05, 'epoch': 0.94, 'throughput': 10004.89} [INFO|2025-03-19 20:22:20] logging.py:143 >> {'loss': 0.6321, 'learning_rate': 3.9109e-05, 'epoch': 0.94, 'throughput': 10004.92} [INFO|2025-03-19 20:22:59] logging.py:143 >> {'loss': 0.6564, 'learning_rate': 3.9097e-05, 'epoch': 0.94, 'throughput': 10005.03} [INFO|2025-03-19 20:23:40] logging.py:143 >> {'loss': 0.6604, 'learning_rate': 3.9086e-05, 'epoch': 0.94, 'throughput': 10005.19} [INFO|2025-03-19 20:24:20] logging.py:143 >> {'loss': 0.6543, 'learning_rate': 3.9074e-05, 'epoch': 0.94, 'throughput': 10005.29} [INFO|2025-03-19 20:25:01] logging.py:143 >> {'loss': 0.6270, 'learning_rate': 3.9063e-05, 'epoch': 0.94, 'throughput': 10005.14} [INFO|2025-03-19 20:25:41] logging.py:143 >> 
{'loss': 0.6420, 'learning_rate': 3.9051e-05, 'epoch': 0.94, 'throughput': 10005.24} [INFO|2025-03-19 20:26:22] logging.py:143 >> {'loss': 0.6289, 'learning_rate': 3.9039e-05, 'epoch': 0.95, 'throughput': 10005.10} [INFO|2025-03-19 20:27:02] logging.py:143 >> {'loss': 0.6436, 'learning_rate': 3.9028e-05, 'epoch': 0.95, 'throughput': 10005.23} [INFO|2025-03-19 20:27:43] logging.py:143 >> {'loss': 0.5876, 'learning_rate': 3.9016e-05, 'epoch': 0.95, 'throughput': 10005.26} [INFO|2025-03-19 20:28:24] logging.py:143 >> {'loss': 0.6178, 'learning_rate': 3.9005e-05, 'epoch': 0.95, 'throughput': 10005.26} [INFO|2025-03-19 20:29:06] logging.py:143 >> {'loss': 0.5824, 'learning_rate': 3.8993e-05, 'epoch': 0.95, 'throughput': 10005.17} [INFO|2025-03-19 20:29:48] logging.py:143 >> {'loss': 0.6238, 'learning_rate': 3.8981e-05, 'epoch': 0.95, 'throughput': 10005.17} [INFO|2025-03-19 20:30:27] logging.py:143 >> {'loss': 0.6475, 'learning_rate': 3.8970e-05, 'epoch': 0.95, 'throughput': 10005.34} [INFO|2025-03-19 20:31:07] logging.py:143 >> {'loss': 0.6278, 'learning_rate': 3.8958e-05, 'epoch': 0.95, 'throughput': 10005.46} [INFO|2025-03-19 20:31:48] logging.py:143 >> {'loss': 0.6186, 'learning_rate': 3.8947e-05, 'epoch': 0.95, 'throughput': 10005.48} [INFO|2025-03-19 20:32:28] logging.py:143 >> {'loss': 0.6280, 'learning_rate': 3.8935e-05, 'epoch': 0.95, 'throughput': 10005.51} [INFO|2025-03-19 20:33:08] logging.py:143 >> {'loss': 0.6503, 'learning_rate': 3.8923e-05, 'epoch': 0.95, 'throughput': 10005.68} [INFO|2025-03-19 20:33:48] logging.py:143 >> {'loss': 0.6195, 'learning_rate': 3.8912e-05, 'epoch': 0.95, 'throughput': 10005.69} [INFO|2025-03-19 20:34:27] logging.py:143 >> {'loss': 0.6092, 'learning_rate': 3.8900e-05, 'epoch': 0.95, 'throughput': 10005.86} [INFO|2025-03-19 20:35:08] logging.py:143 >> {'loss': 0.6122, 'learning_rate': 3.8889e-05, 'epoch': 0.95, 'throughput': 10005.70} [INFO|2025-03-19 20:35:50] logging.py:143 >> {'loss': 0.6616, 'learning_rate': 3.8877e-05, 
'epoch': 0.95, 'throughput': 10005.61} [INFO|2025-03-19 20:36:31] logging.py:143 >> {'loss': 0.6117, 'learning_rate': 3.8865e-05, 'epoch': 0.95, 'throughput': 10005.62} [INFO|2025-03-19 20:37:10] logging.py:143 >> {'loss': 0.6058, 'learning_rate': 3.8854e-05, 'epoch': 0.95, 'throughput': 10005.69} [INFO|2025-03-19 20:37:51] logging.py:143 >> {'loss': 0.6264, 'learning_rate': 3.8842e-05, 'epoch': 0.95, 'throughput': 10005.80} [INFO|2025-03-19 20:38:31] logging.py:143 >> {'loss': 0.6495, 'learning_rate': 3.8830e-05, 'epoch': 0.95, 'throughput': 10005.76} [INFO|2025-03-19 20:39:12] logging.py:143 >> {'loss': 0.6175, 'learning_rate': 3.8819e-05, 'epoch': 0.96, 'throughput': 10005.72} [INFO|2025-03-19 20:39:53] logging.py:143 >> {'loss': 0.6225, 'learning_rate': 3.8807e-05, 'epoch': 0.96, 'throughput': 10005.76} [INFO|2025-03-19 20:40:32] logging.py:143 >> {'loss': 0.6478, 'learning_rate': 3.8795e-05, 'epoch': 0.96, 'throughput': 10005.97} [INFO|2025-03-19 20:41:12] logging.py:143 >> {'loss': 0.6089, 'learning_rate': 3.8784e-05, 'epoch': 0.96, 'throughput': 10005.96} [INFO|2025-03-19 20:41:54] logging.py:143 >> {'loss': 0.6432, 'learning_rate': 3.8772e-05, 'epoch': 0.96, 'throughput': 10005.70} [INFO|2025-03-19 20:42:33] logging.py:143 >> {'loss': 0.6102, 'learning_rate': 3.8760e-05, 'epoch': 0.96, 'throughput': 10005.72} [INFO|2025-03-19 20:43:13] logging.py:143 >> {'loss': 0.6400, 'learning_rate': 3.8749e-05, 'epoch': 0.96, 'throughput': 10005.77} [INFO|2025-03-19 20:43:55] logging.py:143 >> {'loss': 0.6028, 'learning_rate': 3.8737e-05, 'epoch': 0.96, 'throughput': 10005.54} [INFO|2025-03-19 20:44:35] logging.py:143 >> {'loss': 0.6136, 'learning_rate': 3.8725e-05, 'epoch': 0.96, 'throughput': 10005.54} [INFO|2025-03-19 20:45:15] logging.py:143 >> {'loss': 0.6225, 'learning_rate': 3.8714e-05, 'epoch': 0.96, 'throughput': 10005.54} [INFO|2025-03-19 20:45:56] logging.py:143 >> {'loss': 0.6285, 'learning_rate': 3.8702e-05, 'epoch': 0.96, 'throughput': 10005.50} 
[INFO|2025-03-19 20:46:37] logging.py:143 >> {'loss': 0.6403, 'learning_rate': 3.8690e-05, 'epoch': 0.96, 'throughput': 10005.67} [INFO|2025-03-19 20:47:18] logging.py:143 >> {'loss': 0.6156, 'learning_rate': 3.8678e-05, 'epoch': 0.96, 'throughput': 10005.61} [INFO|2025-03-19 20:47:59] logging.py:143 >> {'loss': 0.6029, 'learning_rate': 3.8667e-05, 'epoch': 0.96, 'throughput': 10005.50} [INFO|2025-03-19 20:48:39] logging.py:143 >> {'loss': 0.6270, 'learning_rate': 3.8655e-05, 'epoch': 0.96, 'throughput': 10005.60} [INFO|2025-03-19 20:49:20] logging.py:143 >> {'loss': 0.6492, 'learning_rate': 3.8643e-05, 'epoch': 0.96, 'throughput': 10005.58} [INFO|2025-03-19 20:50:02] logging.py:143 >> {'loss': 0.6289, 'learning_rate': 3.8632e-05, 'epoch': 0.96, 'throughput': 10005.48} [INFO|2025-03-19 20:50:43] logging.py:143 >> {'loss': 0.6642, 'learning_rate': 3.8620e-05, 'epoch': 0.96, 'throughput': 10005.40} [INFO|2025-03-19 20:51:24] logging.py:143 >> {'loss': 0.6063, 'learning_rate': 3.8608e-05, 'epoch': 0.96, 'throughput': 10005.36} [INFO|2025-03-19 20:52:03] logging.py:143 >> {'loss': 0.6393, 'learning_rate': 3.8596e-05, 'epoch': 0.97, 'throughput': 10005.45} [INFO|2025-03-19 20:52:43] logging.py:143 >> {'loss': 0.5885, 'learning_rate': 3.8585e-05, 'epoch': 0.97, 'throughput': 10005.25} [INFO|2025-03-19 20:53:23] logging.py:143 >> {'loss': 0.6160, 'learning_rate': 3.8573e-05, 'epoch': 0.97, 'throughput': 10005.26} [INFO|2025-03-19 20:54:05] logging.py:143 >> {'loss': 0.6161, 'learning_rate': 3.8561e-05, 'epoch': 0.97, 'throughput': 10005.24} [INFO|2025-03-19 20:54:45] logging.py:143 >> {'loss': 0.6205, 'learning_rate': 3.8549e-05, 'epoch': 0.97, 'throughput': 10005.08} [INFO|2025-03-19 20:55:26] logging.py:143 >> {'loss': 0.6413, 'learning_rate': 3.8538e-05, 'epoch': 0.97, 'throughput': 10005.01} [INFO|2025-03-19 20:56:06] logging.py:143 >> {'loss': 0.6171, 'learning_rate': 3.8526e-05, 'epoch': 0.97, 'throughput': 10005.07} [INFO|2025-03-19 20:56:48] logging.py:143 >> 
{'loss': 0.6345, 'learning_rate': 3.8514e-05, 'epoch': 0.97, 'throughput': 10005.05} [INFO|2025-03-19 20:57:28] logging.py:143 >> {'loss': 0.6508, 'learning_rate': 3.8502e-05, 'epoch': 0.97, 'throughput': 10005.11} [INFO|2025-03-19 20:58:08] logging.py:143 >> {'loss': 0.6293, 'learning_rate': 3.8490e-05, 'epoch': 0.97, 'throughput': 10005.32} [INFO|2025-03-19 20:58:49] logging.py:143 >> {'loss': 0.6275, 'learning_rate': 3.8479e-05, 'epoch': 0.97, 'throughput': 10005.37} [INFO|2025-03-19 20:59:29] logging.py:143 >> {'loss': 0.6537, 'learning_rate': 3.8467e-05, 'epoch': 0.97, 'throughput': 10005.48} [INFO|2025-03-19 21:00:10] logging.py:143 >> {'loss': 0.6080, 'learning_rate': 3.8455e-05, 'epoch': 0.97, 'throughput': 10005.40} [INFO|2025-03-19 21:00:51] logging.py:143 >> {'loss': 0.6166, 'learning_rate': 3.8443e-05, 'epoch': 0.97, 'throughput': 10005.38} [INFO|2025-03-19 21:01:32] logging.py:143 >> {'loss': 0.6086, 'learning_rate': 3.8431e-05, 'epoch': 0.97, 'throughput': 10005.32} [INFO|2025-03-19 21:02:13] logging.py:143 >> {'loss': 0.6320, 'learning_rate': 3.8420e-05, 'epoch': 0.97, 'throughput': 10005.49} [INFO|2025-03-19 21:02:52] logging.py:143 >> {'loss': 0.6488, 'learning_rate': 3.8408e-05, 'epoch': 0.97, 'throughput': 10005.59} [INFO|2025-03-19 21:03:33] logging.py:143 >> {'loss': 0.6321, 'learning_rate': 3.8396e-05, 'epoch': 0.97, 'throughput': 10005.65} [INFO|2025-03-19 21:04:14] logging.py:143 >> {'loss': 0.6172, 'learning_rate': 3.8384e-05, 'epoch': 0.97, 'throughput': 10005.49} [INFO|2025-03-19 21:04:53] logging.py:143 >> {'loss': 0.6123, 'learning_rate': 3.8372e-05, 'epoch': 0.98, 'throughput': 10005.50} [INFO|2025-03-19 21:05:34] logging.py:143 >> {'loss': 0.6373, 'learning_rate': 3.8361e-05, 'epoch': 0.98, 'throughput': 10005.64} [INFO|2025-03-19 21:06:13] logging.py:143 >> {'loss': 0.6242, 'learning_rate': 3.8349e-05, 'epoch': 0.98, 'throughput': 10005.82} [INFO|2025-03-19 21:06:54] logging.py:143 >> {'loss': 0.6038, 'learning_rate': 3.8337e-05, 
'epoch': 0.98, 'throughput': 10005.69} [INFO|2025-03-19 21:07:34] logging.py:143 >> {'loss': 0.6328, 'learning_rate': 3.8325e-05, 'epoch': 0.98, 'throughput': 10005.77} [INFO|2025-03-19 21:08:15] logging.py:143 >> {'loss': 0.6338, 'learning_rate': 3.8313e-05, 'epoch': 0.98, 'throughput': 10005.55} [INFO|2025-03-19 21:08:55] logging.py:143 >> {'loss': 0.6296, 'learning_rate': 3.8301e-05, 'epoch': 0.98, 'throughput': 10005.73} [INFO|2025-03-19 21:09:35] logging.py:143 >> {'loss': 0.5921, 'learning_rate': 3.8290e-05, 'epoch': 0.98, 'throughput': 10005.77} [INFO|2025-03-19 21:10:15] logging.py:143 >> {'loss': 0.6345, 'learning_rate': 3.8278e-05, 'epoch': 0.98, 'throughput': 10005.87} [INFO|2025-03-19 21:10:55] logging.py:143 >> {'loss': 0.6334, 'learning_rate': 3.8266e-05, 'epoch': 0.98, 'throughput': 10006.07} [INFO|2025-03-19 21:11:37] logging.py:143 >> {'loss': 0.6444, 'learning_rate': 3.8254e-05, 'epoch': 0.98, 'throughput': 10005.92} [INFO|2025-03-19 21:12:17] logging.py:143 >> {'loss': 0.6254, 'learning_rate': 3.8242e-05, 'epoch': 0.98, 'throughput': 10005.98} [INFO|2025-03-19 21:12:56] logging.py:143 >> {'loss': 0.6419, 'learning_rate': 3.8230e-05, 'epoch': 0.98, 'throughput': 10006.13} [INFO|2025-03-19 21:13:36] logging.py:143 >> {'loss': 0.6284, 'learning_rate': 3.8218e-05, 'epoch': 0.98, 'throughput': 10006.33} [INFO|2025-03-19 21:14:15] logging.py:143 >> {'loss': 0.6244, 'learning_rate': 3.8206e-05, 'epoch': 0.98, 'throughput': 10006.43} [INFO|2025-03-19 21:14:54] logging.py:143 >> {'loss': 0.6523, 'learning_rate': 3.8195e-05, 'epoch': 0.98, 'throughput': 10006.42} [INFO|2025-03-19 21:15:36] logging.py:143 >> {'loss': 0.6455, 'learning_rate': 3.8183e-05, 'epoch': 0.98, 'throughput': 10006.46} [INFO|2025-03-19 21:16:14] logging.py:143 >> {'loss': 0.6336, 'learning_rate': 3.8171e-05, 'epoch': 0.98, 'throughput': 10006.66} [INFO|2025-03-19 21:16:54] logging.py:143 >> {'loss': 0.6495, 'learning_rate': 3.8159e-05, 'epoch': 0.99, 'throughput': 10006.63} 
[INFO|2025-03-19 21:17:34] logging.py:143 >> {'loss': 0.6265, 'learning_rate': 3.8147e-05, 'epoch': 0.99, 'throughput': 10006.74} [INFO|2025-03-19 21:18:14] logging.py:143 >> {'loss': 0.6473, 'learning_rate': 3.8135e-05, 'epoch': 0.99, 'throughput': 10006.95} [INFO|2025-03-19 21:18:53] logging.py:143 >> {'loss': 0.6184, 'learning_rate': 3.8123e-05, 'epoch': 0.99, 'throughput': 10007.14} [INFO|2025-03-19 21:19:33] logging.py:143 >> {'loss': 0.6212, 'learning_rate': 3.8111e-05, 'epoch': 0.99, 'throughput': 10007.27} [INFO|2025-03-19 21:20:15] logging.py:143 >> {'loss': 0.6296, 'learning_rate': 3.8099e-05, 'epoch': 0.99, 'throughput': 10007.42} [INFO|2025-03-19 21:20:54] logging.py:143 >> {'loss': 0.5899, 'learning_rate': 3.8087e-05, 'epoch': 0.99, 'throughput': 10007.40} [INFO|2025-03-19 21:21:35] logging.py:143 >> {'loss': 0.6492, 'learning_rate': 3.8076e-05, 'epoch': 0.99, 'throughput': 10007.30} [INFO|2025-03-19 21:22:14] logging.py:143 >> {'loss': 0.6519, 'learning_rate': 3.8064e-05, 'epoch': 0.99, 'throughput': 10007.38} [INFO|2025-03-19 21:22:54] logging.py:143 >> {'loss': 0.6395, 'learning_rate': 3.8052e-05, 'epoch': 0.99, 'throughput': 10007.37} [INFO|2025-03-19 21:23:34] logging.py:143 >> {'loss': 0.6124, 'learning_rate': 3.8040e-05, 'epoch': 0.99, 'throughput': 10007.24} [INFO|2025-03-19 21:24:14] logging.py:143 >> {'loss': 0.6350, 'learning_rate': 3.8028e-05, 'epoch': 0.99, 'throughput': 10007.23} [INFO|2025-03-19 21:24:53] logging.py:143 >> {'loss': 0.6093, 'learning_rate': 3.8016e-05, 'epoch': 0.99, 'throughput': 10007.20} [INFO|2025-03-19 21:25:33] logging.py:143 >> {'loss': 0.6297, 'learning_rate': 3.8004e-05, 'epoch': 0.99, 'throughput': 10007.33} [INFO|2025-03-19 21:26:14] logging.py:143 >> {'loss': 0.6094, 'learning_rate': 3.7992e-05, 'epoch': 0.99, 'throughput': 10007.37} [INFO|2025-03-19 21:26:55] logging.py:143 >> {'loss': 0.6117, 'learning_rate': 3.7980e-05, 'epoch': 0.99, 'throughput': 10007.28} [INFO|2025-03-19 21:27:35] logging.py:143 >> 
{'loss': 0.6299, 'learning_rate': 3.7968e-05, 'epoch': 0.99, 'throughput': 10007.29} [INFO|2025-03-19 21:28:14] logging.py:143 >> {'loss': 0.6229, 'learning_rate': 3.7956e-05, 'epoch': 0.99, 'throughput': 10007.27} [INFO|2025-03-19 21:28:54] logging.py:143 >> {'loss': 0.6406, 'learning_rate': 3.7944e-05, 'epoch': 0.99, 'throughput': 10007.40} [INFO|2025-03-19 21:29:34] logging.py:143 >> {'loss': 0.6479, 'learning_rate': 3.7932e-05, 'epoch': 1.00, 'throughput': 10007.38} [INFO|2025-03-19 21:30:14] logging.py:143 >> {'loss': 0.6400, 'learning_rate': 3.7920e-05, 'epoch': 1.00, 'throughput': 10007.31} [INFO|2025-03-19 21:30:54] logging.py:143 >> {'loss': 0.6575, 'learning_rate': 3.7908e-05, 'epoch': 1.00, 'throughput': 10007.43} [INFO|2025-03-19 21:31:38] logging.py:143 >> {'loss': 0.5763, 'learning_rate': 3.7896e-05, 'epoch': 1.00, 'throughput': 10007.08} [INFO|2025-03-19 21:32:18] logging.py:143 >> {'loss': 0.6044, 'learning_rate': 3.7884e-05, 'epoch': 1.00, 'throughput': 10007.10} [INFO|2025-03-19 21:32:57] logging.py:143 >> {'loss': 0.6060, 'learning_rate': 3.7872e-05, 'epoch': 1.00, 'throughput': 10007.26} [INFO|2025-03-19 21:33:38] logging.py:143 >> {'loss': 0.6374, 'learning_rate': 3.7860e-05, 'epoch': 1.00, 'throughput': 10007.33} [INFO|2025-03-19 21:34:19] logging.py:143 >> {'loss': 0.6220, 'learning_rate': 3.7848e-05, 'epoch': 1.00, 'throughput': 10007.25} [INFO|2025-03-19 21:34:59] logging.py:143 >> {'loss': 0.6075, 'learning_rate': 3.7836e-05, 'epoch': 1.00, 'throughput': 10007.25} [INFO|2025-03-19 21:35:39] logging.py:143 >> {'loss': 0.6012, 'learning_rate': 3.7824e-05, 'epoch': 1.00, 'throughput': 10007.39} [INFO|2025-03-19 21:36:18] logging.py:143 >> {'loss': 0.5125, 'learning_rate': 3.7812e-05, 'epoch': 1.00, 'throughput': 10007.44} [INFO|2025-03-19 21:37:01] logging.py:143 >> {'loss': 0.4443, 'learning_rate': 3.7800e-05, 'epoch': 1.00, 'throughput': 10007.33} [INFO|2025-03-19 21:37:42] logging.py:143 >> {'loss': 0.4888, 'learning_rate': 3.7788e-05, 
'epoch': 1.00, 'throughput': 10007.38} [INFO|2025-03-19 21:38:23] logging.py:143 >> {'loss': 0.4994, 'learning_rate': 3.7776e-05, 'epoch': 1.00, 'throughput': 10007.45} [INFO|2025-03-19 21:39:02] logging.py:143 >> {'loss': 0.4860, 'learning_rate': 3.7764e-05, 'epoch': 1.00, 'throughput': 10007.48} [INFO|2025-03-19 21:39:42] logging.py:143 >> {'loss': 0.4778, 'learning_rate': 3.7752e-05, 'epoch': 1.00, 'throughput': 10007.52} [INFO|2025-03-19 21:40:23] logging.py:143 >> {'loss': 0.5125, 'learning_rate': 3.7740e-05, 'epoch': 1.00, 'throughput': 10007.52} [INFO|2025-03-19 21:41:03] logging.py:143 >> {'loss': 0.4612, 'learning_rate': 3.7728e-05, 'epoch': 1.00, 'throughput': 10007.41} [INFO|2025-03-19 21:41:45] logging.py:143 >> {'loss': 0.4812, 'learning_rate': 3.7716e-05, 'epoch': 1.00, 'throughput': 10007.39} [INFO|2025-03-19 21:42:24] logging.py:143 >> {'loss': 0.4765, 'learning_rate': 3.7704e-05, 'epoch': 1.01, 'throughput': 10007.50} [INFO|2025-03-19 21:43:04] logging.py:143 >> {'loss': 0.4858, 'learning_rate': 3.7692e-05, 'epoch': 1.01, 'throughput': 10007.45} [INFO|2025-03-19 21:43:44] logging.py:143 >> {'loss': 0.4754, 'learning_rate': 3.7680e-05, 'epoch': 1.01, 'throughput': 10007.65} [INFO|2025-03-19 21:44:24] logging.py:143 >> {'loss': 0.4641, 'learning_rate': 3.7668e-05, 'epoch': 1.01, 'throughput': 10007.55} [INFO|2025-03-19 21:45:04] logging.py:143 >> {'loss': 0.4958, 'learning_rate': 3.7656e-05, 'epoch': 1.01, 'throughput': 10007.67} [INFO|2025-03-19 21:45:45] logging.py:143 >> {'loss': 0.4801, 'learning_rate': 3.7644e-05, 'epoch': 1.01, 'throughput': 10007.66} [INFO|2025-03-19 21:46:25] logging.py:143 >> {'loss': 0.4822, 'learning_rate': 3.7631e-05, 'epoch': 1.01, 'throughput': 10007.64} [INFO|2025-03-19 21:47:07] logging.py:143 >> {'loss': 0.4887, 'learning_rate': 3.7619e-05, 'epoch': 1.01, 'throughput': 10007.69} [INFO|2025-03-19 21:47:47] logging.py:143 >> {'loss': 0.4702, 'learning_rate': 3.7607e-05, 'epoch': 1.01, 'throughput': 10007.73} 
[INFO|2025-03-19 21:48:27] logging.py:143 >> {'loss': 0.4581, 'learning_rate': 3.7595e-05, 'epoch': 1.01, 'throughput': 10007.68} [INFO|2025-03-19 21:49:08] logging.py:143 >> {'loss': 0.4788, 'learning_rate': 3.7583e-05, 'epoch': 1.01, 'throughput': 10007.71} [INFO|2025-03-19 21:49:49] logging.py:143 >> {'loss': 0.4919, 'learning_rate': 3.7571e-05, 'epoch': 1.01, 'throughput': 10007.77} [INFO|2025-03-19 21:50:29] logging.py:143 >> {'loss': 0.4546, 'learning_rate': 3.7559e-05, 'epoch': 1.01, 'throughput': 10007.91} [INFO|2025-03-19 21:51:08] logging.py:143 >> {'loss': 0.4631, 'learning_rate': 3.7547e-05, 'epoch': 1.01, 'throughput': 10007.86} [INFO|2025-03-19 21:51:50] logging.py:143 >> {'loss': 0.4801, 'learning_rate': 3.7535e-05, 'epoch': 1.01, 'throughput': 10007.72} [INFO|2025-03-19 21:52:32] logging.py:143 >> {'loss': 0.4905, 'learning_rate': 3.7523e-05, 'epoch': 1.01, 'throughput': 10007.64} [INFO|2025-03-19 21:53:13] logging.py:143 >> {'loss': 0.4985, 'learning_rate': 3.7511e-05, 'epoch': 1.01, 'throughput': 10007.60} [INFO|2025-03-19 21:53:53] logging.py:143 >> {'loss': 0.4919, 'learning_rate': 3.7498e-05, 'epoch': 1.01, 'throughput': 10007.70} [INFO|2025-03-19 21:54:32] logging.py:143 >> {'loss': 0.4923, 'learning_rate': 3.7486e-05, 'epoch': 1.01, 'throughput': 10007.72} [INFO|2025-03-19 21:55:13] logging.py:143 >> {'loss': 0.5024, 'learning_rate': 3.7474e-05, 'epoch': 1.02, 'throughput': 10007.65} [INFO|2025-03-19 21:55:53] logging.py:143 >> {'loss': 0.4976, 'learning_rate': 3.7462e-05, 'epoch': 1.02, 'throughput': 10007.78} [INFO|2025-03-19 21:56:35] logging.py:143 >> {'loss': 0.4469, 'learning_rate': 3.7450e-05, 'epoch': 1.02, 'throughput': 10007.60} [INFO|2025-03-19 21:57:17] logging.py:143 >> {'loss': 0.5009, 'learning_rate': 3.7438e-05, 'epoch': 1.02, 'throughput': 10007.51} [INFO|2025-03-19 21:57:58] logging.py:143 >> {'loss': 0.4858, 'learning_rate': 3.7426e-05, 'epoch': 1.02, 'throughput': 10007.60} [INFO|2025-03-19 21:58:37] logging.py:143 >> 
{'loss': 0.4737, 'learning_rate': 3.7413e-05, 'epoch': 1.02, 'throughput': 10007.83} [INFO|2025-03-19 21:59:17] logging.py:143 >> {'loss': 0.4827, 'learning_rate': 3.7401e-05, 'epoch': 1.02, 'throughput': 10007.93} [INFO|2025-03-19 21:59:57] logging.py:143 >> {'loss': 0.4738, 'learning_rate': 3.7389e-05, 'epoch': 1.02, 'throughput': 10007.97} [INFO|2025-03-19 22:00:37] logging.py:143 >> {'loss': 0.4565, 'learning_rate': 3.7377e-05, 'epoch': 1.02, 'throughput': 10008.02} [INFO|2025-03-19 22:01:17] logging.py:143 >> {'loss': 0.4711, 'learning_rate': 3.7365e-05, 'epoch': 1.02, 'throughput': 10008.11} [INFO|2025-03-19 22:01:59] logging.py:143 >> {'loss': 0.4602, 'learning_rate': 3.7353e-05, 'epoch': 1.02, 'throughput': 10008.01} [INFO|2025-03-19 22:02:41] logging.py:143 >> {'loss': 0.4624, 'learning_rate': 3.7341e-05, 'epoch': 1.02, 'throughput': 10007.83} [INFO|2025-03-19 22:03:22] logging.py:143 >> {'loss': 0.4605, 'learning_rate': 3.7328e-05, 'epoch': 1.02, 'throughput': 10007.87} [INFO|2025-03-19 22:04:02] logging.py:143 >> {'loss': 0.4831, 'learning_rate': 3.7316e-05, 'epoch': 1.02, 'throughput': 10007.84} [INFO|2025-03-19 22:04:41] logging.py:143 >> {'loss': 0.4607, 'learning_rate': 3.7304e-05, 'epoch': 1.02, 'throughput': 10007.95} [INFO|2025-03-19 22:05:21] logging.py:143 >> {'loss': 0.4730, 'learning_rate': 3.7292e-05, 'epoch': 1.02, 'throughput': 10007.94} [INFO|2025-03-19 22:06:03] logging.py:143 >> {'loss': 0.4553, 'learning_rate': 3.7280e-05, 'epoch': 1.02, 'throughput': 10007.97} [INFO|2025-03-19 22:06:43] logging.py:143 >> {'loss': 0.4799, 'learning_rate': 3.7267e-05, 'epoch': 1.02, 'throughput': 10008.04} [INFO|2025-03-19 22:07:22] logging.py:143 >> {'loss': 0.4660, 'learning_rate': 3.7255e-05, 'epoch': 1.02, 'throughput': 10008.09} [INFO|2025-03-19 22:08:01] logging.py:143 >> {'loss': 0.4557, 'learning_rate': 3.7243e-05, 'epoch': 1.03, 'throughput': 10008.30} [INFO|2025-03-19 22:08:42] logging.py:143 >> {'loss': 0.4818, 'learning_rate': 3.7231e-05, 
'epoch': 1.03, 'throughput': 10008.34} [INFO|2025-03-19 22:09:24] logging.py:143 >> {'loss': 0.4917, 'learning_rate': 3.7219e-05, 'epoch': 1.03, 'throughput': 10008.43} [INFO|2025-03-19 22:10:03] logging.py:143 >> {'loss': 0.4473, 'learning_rate': 3.7206e-05, 'epoch': 1.03, 'throughput': 10008.49} [INFO|2025-03-19 22:10:44] logging.py:143 >> {'loss': 0.4751, 'learning_rate': 3.7194e-05, 'epoch': 1.03, 'throughput': 10008.46} [INFO|2025-03-19 22:11:23] logging.py:143 >> {'loss': 0.4499, 'learning_rate': 3.7182e-05, 'epoch': 1.03, 'throughput': 10008.49} [INFO|2025-03-19 22:12:03] logging.py:143 >> {'loss': 0.4888, 'learning_rate': 3.7170e-05, 'epoch': 1.03, 'throughput': 10008.59} [INFO|2025-03-19 22:12:42] logging.py:143 >> {'loss': 0.4630, 'learning_rate': 3.7158e-05, 'epoch': 1.03, 'throughput': 10008.69} [INFO|2025-03-19 22:13:21] logging.py:143 >> {'loss': 0.4743, 'learning_rate': 3.7145e-05, 'epoch': 1.03, 'throughput': 10008.90} [INFO|2025-03-19 22:14:01] logging.py:143 >> {'loss': 0.4819, 'learning_rate': 3.7133e-05, 'epoch': 1.03, 'throughput': 10008.86} [INFO|2025-03-19 22:14:42] logging.py:143 >> {'loss': 0.4961, 'learning_rate': 3.7121e-05, 'epoch': 1.03, 'throughput': 10008.85} [INFO|2025-03-19 22:15:24] logging.py:143 >> {'loss': 0.4612, 'learning_rate': 3.7109e-05, 'epoch': 1.03, 'throughput': 10008.61} [INFO|2025-03-19 22:16:05] logging.py:143 >> {'loss': 0.4845, 'learning_rate': 3.7096e-05, 'epoch': 1.03, 'throughput': 10008.46} [INFO|2025-03-19 22:16:46] logging.py:143 >> {'loss': 0.4543, 'learning_rate': 3.7084e-05, 'epoch': 1.03, 'throughput': 10008.50} [INFO|2025-03-19 22:17:26] logging.py:143 >> {'loss': 0.4570, 'learning_rate': 3.7072e-05, 'epoch': 1.03, 'throughput': 10008.55} [INFO|2025-03-19 22:18:06] logging.py:143 >> {'loss': 0.4755, 'learning_rate': 3.7060e-05, 'epoch': 1.03, 'throughput': 10008.72} [INFO|2025-03-19 22:18:47] logging.py:143 >> {'loss': 0.4723, 'learning_rate': 3.7047e-05, 'epoch': 1.03, 'throughput': 10008.58} 
[INFO|2025-03-19 22:19:30] logging.py:143 >> {'loss': 0.4804, 'learning_rate': 3.7035e-05, 'epoch': 1.03, 'throughput': 10008.40} [INFO|2025-03-19 22:20:10] logging.py:143 >> {'loss': 0.4722, 'learning_rate': 3.7023e-05, 'epoch': 1.03, 'throughput': 10008.36} [INFO|2025-03-19 22:20:49] logging.py:143 >> {'loss': 0.4867, 'learning_rate': 3.7011e-05, 'epoch': 1.04, 'throughput': 10008.38} [INFO|2025-03-19 22:21:29] logging.py:143 >> {'loss': 0.4701, 'learning_rate': 3.6998e-05, 'epoch': 1.04, 'throughput': 10008.30} [INFO|2025-03-19 22:22:09] logging.py:143 >> {'loss': 0.4579, 'learning_rate': 3.6986e-05, 'epoch': 1.04, 'throughput': 10008.47} [INFO|2025-03-19 22:22:50] logging.py:143 >> {'loss': 0.4794, 'learning_rate': 3.6974e-05, 'epoch': 1.04, 'throughput': 10008.33} [INFO|2025-03-19 22:23:30] logging.py:143 >> {'loss': 0.4647, 'learning_rate': 3.6961e-05, 'epoch': 1.04, 'throughput': 10008.40} [INFO|2025-03-19 22:24:11] logging.py:143 >> {'loss': 0.4755, 'learning_rate': 3.6949e-05, 'epoch': 1.04, 'throughput': 10008.37} [INFO|2025-03-19 22:24:51] logging.py:143 >> {'loss': 0.4604, 'learning_rate': 3.6937e-05, 'epoch': 1.04, 'throughput': 10008.33} [INFO|2025-03-19 22:25:32] logging.py:143 >> {'loss': 0.4766, 'learning_rate': 3.6925e-05, 'epoch': 1.04, 'throughput': 10008.27} [INFO|2025-03-19 22:26:12] logging.py:143 >> {'loss': 0.4719, 'learning_rate': 3.6912e-05, 'epoch': 1.04, 'throughput': 10008.33} [INFO|2025-03-19 22:26:54] logging.py:143 >> {'loss': 0.4637, 'learning_rate': 3.6900e-05, 'epoch': 1.04, 'throughput': 10008.18} [INFO|2025-03-19 22:27:36] logging.py:143 >> {'loss': 0.4687, 'learning_rate': 3.6888e-05, 'epoch': 1.04, 'throughput': 10008.12} [INFO|2025-03-19 22:28:17] logging.py:143 >> {'loss': 0.4785, 'learning_rate': 3.6875e-05, 'epoch': 1.04, 'throughput': 10008.08} [INFO|2025-03-19 22:28:58] logging.py:143 >> {'loss': 0.5129, 'learning_rate': 3.6863e-05, 'epoch': 1.04, 'throughput': 10008.13} [INFO|2025-03-19 22:29:39] logging.py:143 >> 
{'loss': 0.4734, 'learning_rate': 3.6851e-05, 'epoch': 1.04, 'throughput': 10008.13} [INFO|2025-03-19 22:30:20] logging.py:143 >> {'loss': 0.4815, 'learning_rate': 3.6838e-05, 'epoch': 1.04, 'throughput': 10008.11} [INFO|2025-03-19 22:31:00] logging.py:143 >> {'loss': 0.4864, 'learning_rate': 3.6826e-05, 'epoch': 1.04, 'throughput': 10008.12} [INFO|2025-03-19 22:31:40] logging.py:143 >> {'loss': 0.4412, 'learning_rate': 3.6814e-05, 'epoch': 1.04, 'throughput': 10008.19} [INFO|2025-03-19 22:32:21] logging.py:143 >> {'loss': 0.5074, 'learning_rate': 3.6801e-05, 'epoch': 1.04, 'throughput': 10008.28} [INFO|2025-03-19 22:33:02] logging.py:143 >> {'loss': 0.4459, 'learning_rate': 3.6789e-05, 'epoch': 1.05, 'throughput': 10008.11} [INFO|2025-03-19 22:33:42] logging.py:143 >> {'loss': 0.4923, 'learning_rate': 3.6777e-05, 'epoch': 1.05, 'throughput': 10008.08} [INFO|2025-03-19 22:34:22] logging.py:143 >> {'loss': 0.4631, 'learning_rate': 3.6764e-05, 'epoch': 1.05, 'throughput': 10008.11} [INFO|2025-03-19 22:35:02] logging.py:143 >> {'loss': 0.5078, 'learning_rate': 3.6752e-05, 'epoch': 1.05, 'throughput': 10008.29} [INFO|2025-03-19 22:35:43] logging.py:143 >> {'loss': 0.4789, 'learning_rate': 3.6740e-05, 'epoch': 1.05, 'throughput': 10008.32} [INFO|2025-03-19 22:36:23] logging.py:143 >> {'loss': 0.4769, 'learning_rate': 3.6727e-05, 'epoch': 1.05, 'throughput': 10008.38} [INFO|2025-03-19 22:37:05] logging.py:143 >> {'loss': 0.4803, 'learning_rate': 3.6715e-05, 'epoch': 1.05, 'throughput': 10008.29} [INFO|2025-03-19 22:37:44] logging.py:143 >> {'loss': 0.4817, 'learning_rate': 3.6703e-05, 'epoch': 1.05, 'throughput': 10008.36} [INFO|2025-03-19 22:38:23] logging.py:143 >> {'loss': 0.4596, 'learning_rate': 3.6690e-05, 'epoch': 1.05, 'throughput': 10008.49} [INFO|2025-03-19 22:39:03] logging.py:143 >> {'loss': 0.4851, 'learning_rate': 3.6678e-05, 'epoch': 1.05, 'throughput': 10008.64} [INFO|2025-03-19 22:39:43] logging.py:143 >> {'loss': 0.4733, 'learning_rate': 3.6665e-05, 
'epoch': 1.05, 'throughput': 10008.64} [INFO|2025-03-19 22:40:24] logging.py:143 >> {'loss': 0.4881, 'learning_rate': 3.6653e-05, 'epoch': 1.05, 'throughput': 10008.74} [INFO|2025-03-19 22:41:04] logging.py:143 >> {'loss': 0.4855, 'learning_rate': 3.6641e-05, 'epoch': 1.05, 'throughput': 10008.72} [INFO|2025-03-19 22:41:44] logging.py:143 >> {'loss': 0.4971, 'learning_rate': 3.6628e-05, 'epoch': 1.05, 'throughput': 10008.80} [INFO|2025-03-19 22:42:25] logging.py:143 >> {'loss': 0.4669, 'learning_rate': 3.6616e-05, 'epoch': 1.05, 'throughput': 10008.72} [INFO|2025-03-19 22:43:06] logging.py:143 >> {'loss': 0.4933, 'learning_rate': 3.6603e-05, 'epoch': 1.05, 'throughput': 10008.87} [INFO|2025-03-19 22:43:48] logging.py:143 >> {'loss': 0.4779, 'learning_rate': 3.6591e-05, 'epoch': 1.05, 'throughput': 10008.77} [INFO|2025-03-19 22:44:27] logging.py:143 >> {'loss': 0.4706, 'learning_rate': 3.6579e-05, 'epoch': 1.05, 'throughput': 10008.91} [INFO|2025-03-19 22:45:06] logging.py:143 >> {'loss': 0.4844, 'learning_rate': 3.6566e-05, 'epoch': 1.05, 'throughput': 10009.13} [INFO|2025-03-19 22:45:46] logging.py:143 >> {'loss': 0.4639, 'learning_rate': 3.6554e-05, 'epoch': 1.06, 'throughput': 10009.18} [INFO|2025-03-19 22:46:26] logging.py:143 >> {'loss': 0.4983, 'learning_rate': 3.6541e-05, 'epoch': 1.06, 'throughput': 10009.26} [INFO|2025-03-19 22:47:05] logging.py:143 >> {'loss': 0.4637, 'learning_rate': 3.6529e-05, 'epoch': 1.06, 'throughput': 10009.38} [INFO|2025-03-19 22:47:46] logging.py:143 >> {'loss': 0.4869, 'learning_rate': 3.6517e-05, 'epoch': 1.06, 'throughput': 10009.32} [INFO|2025-03-19 22:48:26] logging.py:143 >> {'loss': 0.4713, 'learning_rate': 3.6504e-05, 'epoch': 1.06, 'throughput': 10009.36} [INFO|2025-03-19 22:49:06] logging.py:143 >> {'loss': 0.4495, 'learning_rate': 3.6492e-05, 'epoch': 1.06, 'throughput': 10009.46} [INFO|2025-03-19 22:49:47] logging.py:143 >> {'loss': 0.4583, 'learning_rate': 3.6479e-05, 'epoch': 1.06, 'throughput': 10009.46} 
[INFO|2025-03-19 22:50:29] logging.py:143 >> {'loss': 0.4995, 'learning_rate': 3.6467e-05, 'epoch': 1.06, 'throughput': 10009.42} [INFO|2025-03-19 22:51:08] logging.py:143 >> {'loss': 0.4799, 'learning_rate': 3.6454e-05, 'epoch': 1.06, 'throughput': 10009.57} [INFO|2025-03-19 22:51:49] logging.py:143 >> {'loss': 0.4696, 'learning_rate': 3.6442e-05, 'epoch': 1.06, 'throughput': 10009.50} [INFO|2025-03-19 22:52:28] logging.py:143 >> {'loss': 0.4801, 'learning_rate': 3.6430e-05, 'epoch': 1.06, 'throughput': 10009.71} [INFO|2025-03-19 22:53:08] logging.py:143 >> {'loss': 0.4838, 'learning_rate': 3.6417e-05, 'epoch': 1.06, 'throughput': 10009.83} [INFO|2025-03-19 22:53:49] logging.py:143 >> {'loss': 0.4622, 'learning_rate': 3.6405e-05, 'epoch': 1.06, 'throughput': 10009.85} [INFO|2025-03-19 22:53:53] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-10000 [INFO|2025-03-19 22:53:53] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-10000/config.json [INFO|2025-03-19 22:53:53] configuration_utils.py:909 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-10000/generation_config.json [INFO|2025-03-19 22:54:05] modeling_utils.py:3048 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-10000/model.safetensors.index.json. 
[INFO|2025-03-19 22:54:05] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-10000/tokenizer_config.json [INFO|2025-03-19 22:54:05] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-10000/special_tokens_map.json [INFO|2025-03-19 22:55:10] logging.py:143 >> {'loss': 0.4612, 'learning_rate': 3.6392e-05, 'epoch': 1.06, 'throughput': 10004.98} [INFO|2025-03-19 22:55:50] logging.py:143 >> {'loss': 0.4663, 'learning_rate': 3.6380e-05, 'epoch': 1.06, 'throughput': 10004.95} [INFO|2025-03-19 22:56:30] logging.py:143 >> {'loss': 0.4687, 'learning_rate': 3.6367e-05, 'epoch': 1.06, 'throughput': 10005.10} [INFO|2025-03-19 22:57:11] logging.py:143 >> {'loss': 0.4841, 'learning_rate': 3.6355e-05, 'epoch': 1.06, 'throughput': 10005.11} [INFO|2025-03-19 22:57:51] logging.py:143 >> {'loss': 0.4813, 'learning_rate': 3.6342e-05, 'epoch': 1.06, 'throughput': 10005.16} [INFO|2025-03-19 22:58:32] logging.py:143 >> {'loss': 0.4874, 'learning_rate': 3.6330e-05, 'epoch': 1.06, 'throughput': 10005.32} [INFO|2025-03-19 22:59:12] logging.py:143 >> {'loss': 0.4568, 'learning_rate': 3.6317e-05, 'epoch': 1.07, 'throughput': 10005.34} [INFO|2025-03-19 22:59:53] logging.py:143 >> {'loss': 0.4795, 'learning_rate': 3.6305e-05, 'epoch': 1.07, 'throughput': 10005.46} [INFO|2025-03-19 23:00:33] logging.py:143 >> {'loss': 0.4644, 'learning_rate': 3.6292e-05, 'epoch': 1.07, 'throughput': 10005.51} [INFO|2025-03-19 23:01:14] logging.py:143 >> {'loss': 0.5008, 'learning_rate': 3.6280e-05, 'epoch': 1.07, 'throughput': 10005.46} [INFO|2025-03-19 23:01:55] logging.py:143 >> {'loss': 0.5034, 'learning_rate': 3.6268e-05, 'epoch': 1.07, 'throughput': 10005.45} [INFO|2025-03-19 23:02:33] logging.py:143 >> {'loss': 0.4790, 'learning_rate': 3.6255e-05, 'epoch': 1.07, 'throughput': 10005.59} [INFO|2025-03-19 23:03:15] logging.py:143 >> {'loss': 0.4646, 
'learning_rate': 3.6243e-05, 'epoch': 1.07, 'throughput': 10005.61} [INFO|2025-03-19 23:03:55] logging.py:143 >> {'loss': 0.5314, 'learning_rate': 3.6230e-05, 'epoch': 1.07, 'throughput': 10005.69} [INFO|2025-03-19 23:04:37] logging.py:143 >> {'loss': 0.4577, 'learning_rate': 3.6218e-05, 'epoch': 1.07, 'throughput': 10005.49} [INFO|2025-03-19 23:05:17] logging.py:143 >> {'loss': 0.4771, 'learning_rate': 3.6205e-05, 'epoch': 1.07, 'throughput': 10005.54} [INFO|2025-03-19 23:05:57] logging.py:143 >> {'loss': 0.4920, 'learning_rate': 3.6193e-05, 'epoch': 1.07, 'throughput': 10005.66} [INFO|2025-03-19 23:06:38] logging.py:143 >> {'loss': 0.4766, 'learning_rate': 3.6180e-05, 'epoch': 1.07, 'throughput': 10005.62} [INFO|2025-03-19 23:07:19] logging.py:143 >> {'loss': 0.4876, 'learning_rate': 3.6167e-05, 'epoch': 1.07, 'throughput': 10005.62} [INFO|2025-03-19 23:08:00] logging.py:143 >> {'loss': 0.4777, 'learning_rate': 3.6155e-05, 'epoch': 1.07, 'throughput': 10005.66} [INFO|2025-03-19 23:08:40] logging.py:143 >> {'loss': 0.4894, 'learning_rate': 3.6142e-05, 'epoch': 1.07, 'throughput': 10005.70} [INFO|2025-03-19 23:09:19] logging.py:143 >> {'loss': 0.4631, 'learning_rate': 3.6130e-05, 'epoch': 1.07, 'throughput': 10005.72} [INFO|2025-03-19 23:09:59] logging.py:143 >> {'loss': 0.4355, 'learning_rate': 3.6117e-05, 'epoch': 1.07, 'throughput': 10005.72} [INFO|2025-03-19 23:10:40] logging.py:143 >> {'loss': 0.4628, 'learning_rate': 3.6105e-05, 'epoch': 1.07, 'throughput': 10005.64} [INFO|2025-03-19 23:11:19] logging.py:143 >> {'loss': 0.4901, 'learning_rate': 3.6092e-05, 'epoch': 1.07, 'throughput': 10005.75} [INFO|2025-03-19 23:12:00] logging.py:143 >> {'loss': 0.4903, 'learning_rate': 3.6080e-05, 'epoch': 1.08, 'throughput': 10005.64} [INFO|2025-03-19 23:12:40] logging.py:143 >> {'loss': 0.4703, 'learning_rate': 3.6067e-05, 'epoch': 1.08, 'throughput': 10005.64} [INFO|2025-03-19 23:13:21] logging.py:143 >> {'loss': 0.4681, 'learning_rate': 3.6055e-05, 'epoch': 1.08, 
'throughput': 10005.64} [INFO|2025-03-19 23:14:00] logging.py:143 >> {'loss': 0.4839, 'learning_rate': 3.6042e-05, 'epoch': 1.08, 'throughput': 10005.72} [INFO|2025-03-19 23:14:43] logging.py:143 >> {'loss': 0.4739, 'learning_rate': 3.6030e-05, 'epoch': 1.08, 'throughput': 10005.58} [INFO|2025-03-19 23:15:23] logging.py:143 >> {'loss': 0.4883, 'learning_rate': 3.6017e-05, 'epoch': 1.08, 'throughput': 10005.71} [INFO|2025-03-19 23:16:04] logging.py:143 >> {'loss': 0.4737, 'learning_rate': 3.6004e-05, 'epoch': 1.08, 'throughput': 10005.63} [INFO|2025-03-19 23:16:43] logging.py:143 >> {'loss': 0.4837, 'learning_rate': 3.5992e-05, 'epoch': 1.08, 'throughput': 10005.67} [INFO|2025-03-19 23:17:24] logging.py:143 >> {'loss': 0.4673, 'learning_rate': 3.5979e-05, 'epoch': 1.08, 'throughput': 10005.56} [INFO|2025-03-19 23:18:06] logging.py:143 >> {'loss': 0.4850, 'learning_rate': 3.5967e-05, 'epoch': 1.08, 'throughput': 10005.54} [INFO|2025-03-19 23:18:46] logging.py:143 >> {'loss': 0.4674, 'learning_rate': 3.5954e-05, 'epoch': 1.08, 'throughput': 10005.56} [INFO|2025-03-19 23:19:26] logging.py:143 >> {'loss': 0.4796, 'learning_rate': 3.5942e-05, 'epoch': 1.08, 'throughput': 10005.58} [INFO|2025-03-19 23:20:08] logging.py:143 >> {'loss': 0.4827, 'learning_rate': 3.5929e-05, 'epoch': 1.08, 'throughput': 10005.40} [INFO|2025-03-19 23:20:49] logging.py:143 >> {'loss': 0.4630, 'learning_rate': 3.5916e-05, 'epoch': 1.08, 'throughput': 10005.34} [INFO|2025-03-19 23:21:29] logging.py:143 >> {'loss': 0.4855, 'learning_rate': 3.5904e-05, 'epoch': 1.08, 'throughput': 10005.28} [INFO|2025-03-19 23:22:09] logging.py:143 >> {'loss': 0.4593, 'learning_rate': 3.5891e-05, 'epoch': 1.08, 'throughput': 10005.33} [INFO|2025-03-19 23:22:49] logging.py:143 >> {'loss': 0.4985, 'learning_rate': 3.5879e-05, 'epoch': 1.08, 'throughput': 10005.35} [INFO|2025-03-19 23:23:27] logging.py:143 >> {'loss': 0.4702, 'learning_rate': 3.5866e-05, 'epoch': 1.08, 'throughput': 10005.44} [INFO|2025-03-19 
23:24:07] logging.py:143 >> {'loss': 0.4531, 'learning_rate': 3.5853e-05, 'epoch': 1.08, 'throughput': 10005.52} [INFO|2025-03-19 23:24:47] logging.py:143 >> {'loss': 0.5010, 'learning_rate': 3.5841e-05, 'epoch': 1.09, 'throughput': 10005.65} [INFO|2025-03-19 23:25:27] logging.py:143 >> {'loss': 0.4690, 'learning_rate': 3.5828e-05, 'epoch': 1.09, 'throughput': 10005.69} [INFO|2025-03-19 23:26:09] logging.py:143 >> {'loss': 0.4597, 'learning_rate': 3.5816e-05, 'epoch': 1.09, 'throughput': 10005.56} [INFO|2025-03-19 23:26:49] logging.py:143 >> {'loss': 0.4561, 'learning_rate': 3.5803e-05, 'epoch': 1.09, 'throughput': 10005.59} [INFO|2025-03-19 23:27:29] logging.py:143 >> {'loss': 0.4602, 'learning_rate': 3.5790e-05, 'epoch': 1.09, 'throughput': 10005.52} [INFO|2025-03-19 23:28:09] logging.py:143 >> {'loss': 0.4576, 'learning_rate': 3.5778e-05, 'epoch': 1.09, 'throughput': 10005.50} [INFO|2025-03-19 23:28:51] logging.py:143 >> {'loss': 0.4651, 'learning_rate': 3.5765e-05, 'epoch': 1.09, 'throughput': 10005.46} [INFO|2025-03-19 23:29:32] logging.py:143 >> {'loss': 0.4759, 'learning_rate': 3.5752e-05, 'epoch': 1.09, 'throughput': 10005.56} [INFO|2025-03-19 23:30:12] logging.py:143 >> {'loss': 0.4639, 'learning_rate': 3.5740e-05, 'epoch': 1.09, 'throughput': 10005.57} [INFO|2025-03-19 23:30:52] logging.py:143 >> {'loss': 0.4638, 'learning_rate': 3.5727e-05, 'epoch': 1.09, 'throughput': 10005.57} [INFO|2025-03-19 23:31:32] logging.py:143 >> {'loss': 0.4698, 'learning_rate': 3.5715e-05, 'epoch': 1.09, 'throughput': 10005.52} [INFO|2025-03-19 23:32:13] logging.py:143 >> {'loss': 0.4595, 'learning_rate': 3.5702e-05, 'epoch': 1.09, 'throughput': 10005.58} [INFO|2025-03-19 23:32:53] logging.py:143 >> {'loss': 0.4716, 'learning_rate': 3.5689e-05, 'epoch': 1.09, 'throughput': 10005.50} [INFO|2025-03-19 23:33:33] logging.py:143 >> {'loss': 0.4746, 'learning_rate': 3.5677e-05, 'epoch': 1.09, 'throughput': 10005.73} [INFO|2025-03-19 23:34:15] logging.py:143 >> {'loss': 0.4828, 
'learning_rate': 3.5664e-05, 'epoch': 1.09, 'throughput': 10005.65} [INFO|2025-03-19 23:34:56] logging.py:143 >> {'loss': 0.5027, 'learning_rate': 3.5651e-05, 'epoch': 1.09, 'throughput': 10005.71} [INFO|2025-03-19 23:35:37] logging.py:143 >> {'loss': 0.4866, 'learning_rate': 3.5639e-05, 'epoch': 1.09, 'throughput': 10005.55} [INFO|2025-03-19 23:36:19] logging.py:143 >> {'loss': 0.4652, 'learning_rate': 3.5626e-05, 'epoch': 1.09, 'throughput': 10005.57} [INFO|2025-03-19 23:36:59] logging.py:143 >> {'loss': 0.4961, 'learning_rate': 3.5613e-05, 'epoch': 1.09, 'throughput': 10005.60} [INFO|2025-03-19 23:37:41] logging.py:143 >> {'loss': 0.4674, 'learning_rate': 3.5601e-05, 'epoch': 1.10, 'throughput': 10005.56} [INFO|2025-03-19 23:38:22] logging.py:143 >> {'loss': 0.4920, 'learning_rate': 3.5588e-05, 'epoch': 1.10, 'throughput': 10005.56} [INFO|2025-03-19 23:39:03] logging.py:143 >> {'loss': 0.4825, 'learning_rate': 3.5575e-05, 'epoch': 1.10, 'throughput': 10005.51} [INFO|2025-03-19 23:39:43] logging.py:143 >> {'loss': 0.4846, 'learning_rate': 3.5563e-05, 'epoch': 1.10, 'throughput': 10005.62} [INFO|2025-03-19 23:40:23] logging.py:143 >> {'loss': 0.4784, 'learning_rate': 3.5550e-05, 'epoch': 1.10, 'throughput': 10005.62} [INFO|2025-03-19 23:41:04] logging.py:143 >> {'loss': 0.5007, 'learning_rate': 3.5537e-05, 'epoch': 1.10, 'throughput': 10005.62} [INFO|2025-03-19 23:41:44] logging.py:143 >> {'loss': 0.4626, 'learning_rate': 3.5525e-05, 'epoch': 1.10, 'throughput': 10005.58} [INFO|2025-03-19 23:42:24] logging.py:143 >> {'loss': 0.4344, 'learning_rate': 3.5512e-05, 'epoch': 1.10, 'throughput': 10005.53} [INFO|2025-03-19 23:43:04] logging.py:143 >> {'loss': 0.4745, 'learning_rate': 3.5499e-05, 'epoch': 1.10, 'throughput': 10005.49} [INFO|2025-03-19 23:43:43] logging.py:143 >> {'loss': 0.4596, 'learning_rate': 3.5486e-05, 'epoch': 1.10, 'throughput': 10005.62} [INFO|2025-03-19 23:44:24] logging.py:143 >> {'loss': 0.4554, 'learning_rate': 3.5474e-05, 'epoch': 1.10, 
'throughput': 10005.57} [INFO|2025-03-19 23:45:05] logging.py:143 >> {'loss': 0.4703, 'learning_rate': 3.5461e-05, 'epoch': 1.10, 'throughput': 10005.62} [INFO|2025-03-19 23:45:43] logging.py:143 >> {'loss': 0.4953, 'learning_rate': 3.5448e-05, 'epoch': 1.10, 'throughput': 10005.73} [INFO|2025-03-19 23:46:23] logging.py:143 >> {'loss': 0.4820, 'learning_rate': 3.5436e-05, 'epoch': 1.10, 'throughput': 10005.89} [INFO|2025-03-19 23:47:04] logging.py:143 >> {'loss': 0.5021, 'learning_rate': 3.5423e-05, 'epoch': 1.10, 'throughput': 10005.92} [INFO|2025-03-19 23:47:46] logging.py:143 >> {'loss': 0.4778, 'learning_rate': 3.5410e-05, 'epoch': 1.10, 'throughput': 10005.75} [INFO|2025-03-19 23:48:27] logging.py:143 >> {'loss': 0.4738, 'learning_rate': 3.5397e-05, 'epoch': 1.10, 'throughput': 10005.76} [INFO|2025-03-19 23:49:07] logging.py:143 >> {'loss': 0.5070, 'learning_rate': 3.5385e-05, 'epoch': 1.10, 'throughput': 10005.81} [INFO|2025-03-19 23:49:46] logging.py:143 >> {'loss': 0.4502, 'learning_rate': 3.5372e-05, 'epoch': 1.10, 'throughput': 10005.83} [INFO|2025-03-19 23:50:27] logging.py:143 >> {'loss': 0.4840, 'learning_rate': 3.5359e-05, 'epoch': 1.11, 'throughput': 10005.74} [INFO|2025-03-19 23:51:07] logging.py:143 >> {'loss': 0.4735, 'learning_rate': 3.5346e-05, 'epoch': 1.11, 'throughput': 10005.77} [INFO|2025-03-19 23:51:47] logging.py:143 >> {'loss': 0.4641, 'learning_rate': 3.5334e-05, 'epoch': 1.11, 'throughput': 10005.85} [INFO|2025-03-19 23:52:28] logging.py:143 >> {'loss': 0.4825, 'learning_rate': 3.5321e-05, 'epoch': 1.11, 'throughput': 10005.75} [INFO|2025-03-19 23:53:10] logging.py:143 >> {'loss': 0.4963, 'learning_rate': 3.5308e-05, 'epoch': 1.11, 'throughput': 10005.86} [INFO|2025-03-19 23:53:50] logging.py:143 >> {'loss': 0.4812, 'learning_rate': 3.5295e-05, 'epoch': 1.11, 'throughput': 10005.88} [INFO|2025-03-19 23:54:30] logging.py:143 >> {'loss': 0.4776, 'learning_rate': 3.5283e-05, 'epoch': 1.11, 'throughput': 10005.92} [INFO|2025-03-19 
23:55:11] logging.py:143 >> {'loss': 0.4585, 'learning_rate': 3.5270e-05, 'epoch': 1.11, 'throughput': 10005.78} [INFO|2025-03-19 23:55:53] logging.py:143 >> {'loss': 0.4612, 'learning_rate': 3.5257e-05, 'epoch': 1.11, 'throughput': 10005.67} [INFO|2025-03-19 23:56:33] logging.py:143 >> {'loss': 0.4668, 'learning_rate': 3.5244e-05, 'epoch': 1.11, 'throughput': 10005.62} [INFO|2025-03-19 23:57:13] logging.py:143 >> {'loss': 0.4939, 'learning_rate': 3.5232e-05, 'epoch': 1.11, 'throughput': 10005.64} [INFO|2025-03-19 23:57:53] logging.py:143 >> {'loss': 0.4850, 'learning_rate': 3.5219e-05, 'epoch': 1.11, 'throughput': 10005.80} [INFO|2025-03-19 23:58:33] logging.py:143 >> {'loss': 0.4805, 'learning_rate': 3.5206e-05, 'epoch': 1.11, 'throughput': 10005.81} [INFO|2025-03-19 23:59:14] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 3.5193e-05, 'epoch': 1.11, 'throughput': 10005.70} [INFO|2025-03-19 23:59:55] logging.py:143 >> {'loss': 0.4893, 'learning_rate': 3.5181e-05, 'epoch': 1.11, 'throughput': 10005.67} [INFO|2025-03-20 00:00:35] logging.py:143 >> {'loss': 0.4997, 'learning_rate': 3.5168e-05, 'epoch': 1.11, 'throughput': 10005.67} [INFO|2025-03-20 00:01:14] logging.py:143 >> {'loss': 0.4740, 'learning_rate': 3.5155e-05, 'epoch': 1.11, 'throughput': 10005.82} [INFO|2025-03-20 00:01:55] logging.py:143 >> {'loss': 0.4606, 'learning_rate': 3.5142e-05, 'epoch': 1.11, 'throughput': 10005.67} [INFO|2025-03-20 00:02:36] logging.py:143 >> {'loss': 0.4694, 'learning_rate': 3.5129e-05, 'epoch': 1.12, 'throughput': 10005.64} [INFO|2025-03-20 00:03:17] logging.py:143 >> {'loss': 0.4965, 'learning_rate': 3.5117e-05, 'epoch': 1.12, 'throughput': 10005.64} [INFO|2025-03-20 00:03:57] logging.py:143 >> {'loss': 0.4843, 'learning_rate': 3.5104e-05, 'epoch': 1.12, 'throughput': 10005.71} [INFO|2025-03-20 00:04:39] logging.py:143 >> {'loss': 0.4573, 'learning_rate': 3.5091e-05, 'epoch': 1.12, 'throughput': 10005.55} [INFO|2025-03-20 00:05:20] logging.py:143 >> {'loss': 0.4775, 
'learning_rate': 3.5078e-05, 'epoch': 1.12, 'throughput': 10005.53} [INFO|2025-03-20 00:06:01] logging.py:143 >> {'loss': 0.4371, 'learning_rate': 3.5065e-05, 'epoch': 1.12, 'throughput': 10005.50} [INFO|2025-03-20 00:06:41] logging.py:143 >> {'loss': 0.4593, 'learning_rate': 3.5053e-05, 'epoch': 1.12, 'throughput': 10005.49} [INFO|2025-03-20 00:07:22] logging.py:143 >> {'loss': 0.4628, 'learning_rate': 3.5040e-05, 'epoch': 1.12, 'throughput': 10005.55} [INFO|2025-03-20 00:08:03] logging.py:143 >> {'loss': 0.5135, 'learning_rate': 3.5027e-05, 'epoch': 1.12, 'throughput': 10005.60} [INFO|2025-03-20 00:08:41] logging.py:143 >> {'loss': 0.4748, 'learning_rate': 3.5014e-05, 'epoch': 1.12, 'throughput': 10005.70} [INFO|2025-03-20 00:09:22] logging.py:143 >> {'loss': 0.4793, 'learning_rate': 3.5001e-05, 'epoch': 1.12, 'throughput': 10005.70} [INFO|2025-03-20 00:10:02] logging.py:143 >> {'loss': 0.4722, 'learning_rate': 3.4989e-05, 'epoch': 1.12, 'throughput': 10005.74} [INFO|2025-03-20 00:10:43] logging.py:143 >> {'loss': 0.4614, 'learning_rate': 3.4976e-05, 'epoch': 1.12, 'throughput': 10005.71} [INFO|2025-03-20 00:11:23] logging.py:143 >> {'loss': 0.4701, 'learning_rate': 3.4963e-05, 'epoch': 1.12, 'throughput': 10005.75} [INFO|2025-03-20 00:12:03] logging.py:143 >> {'loss': 0.4647, 'learning_rate': 3.4950e-05, 'epoch': 1.12, 'throughput': 10005.81} [INFO|2025-03-20 00:12:43] logging.py:143 >> {'loss': 0.4624, 'learning_rate': 3.4937e-05, 'epoch': 1.12, 'throughput': 10005.80} [INFO|2025-03-20 00:13:22] logging.py:143 >> {'loss': 0.4300, 'learning_rate': 3.4924e-05, 'epoch': 1.12, 'throughput': 10005.93} [INFO|2025-03-20 00:14:03] logging.py:143 >> {'loss': 0.4733, 'learning_rate': 3.4912e-05, 'epoch': 1.12, 'throughput': 10005.86} [INFO|2025-03-20 00:14:42] logging.py:143 >> {'loss': 0.4540, 'learning_rate': 3.4899e-05, 'epoch': 1.12, 'throughput': 10005.89} [INFO|2025-03-20 00:15:23] logging.py:143 >> {'loss': 0.5184, 'learning_rate': 3.4886e-05, 'epoch': 1.13, 
'throughput': 10005.86} [INFO|2025-03-20 00:16:03] logging.py:143 >> {'loss': 0.4688, 'learning_rate': 3.4873e-05, 'epoch': 1.13, 'throughput': 10005.91} [INFO|2025-03-20 00:16:43] logging.py:143 >> {'loss': 0.4759, 'learning_rate': 3.4860e-05, 'epoch': 1.13, 'throughput': 10006.02} [INFO|2025-03-20 00:17:23] logging.py:143 >> {'loss': 0.4592, 'learning_rate': 3.4847e-05, 'epoch': 1.13, 'throughput': 10006.05} [INFO|2025-03-20 00:18:04] logging.py:143 >> {'loss': 0.4644, 'learning_rate': 3.4834e-05, 'epoch': 1.13, 'throughput': 10005.96} [INFO|2025-03-20 00:18:44] logging.py:143 >> {'loss': 0.4694, 'learning_rate': 3.4822e-05, 'epoch': 1.13, 'throughput': 10005.97} [INFO|2025-03-20 00:19:27] logging.py:143 >> {'loss': 0.4885, 'learning_rate': 3.4809e-05, 'epoch': 1.13, 'throughput': 10005.91} [INFO|2025-03-20 00:20:08] logging.py:143 >> {'loss': 0.4595, 'learning_rate': 3.4796e-05, 'epoch': 1.13, 'throughput': 10005.81} [INFO|2025-03-20 00:20:49] logging.py:143 >> {'loss': 0.4431, 'learning_rate': 3.4783e-05, 'epoch': 1.13, 'throughput': 10005.70} [INFO|2025-03-20 00:21:29] logging.py:143 >> {'loss': 0.4835, 'learning_rate': 3.4770e-05, 'epoch': 1.13, 'throughput': 10005.67} [INFO|2025-03-20 00:22:11] logging.py:143 >> {'loss': 0.4699, 'learning_rate': 3.4757e-05, 'epoch': 1.13, 'throughput': 10005.55} [INFO|2025-03-20 00:22:51] logging.py:143 >> {'loss': 0.4744, 'learning_rate': 3.4744e-05, 'epoch': 1.13, 'throughput': 10005.64} [INFO|2025-03-20 00:23:31] logging.py:143 >> {'loss': 0.4716, 'learning_rate': 3.4731e-05, 'epoch': 1.13, 'throughput': 10005.47} [INFO|2025-03-20 00:24:13] logging.py:143 >> {'loss': 0.4834, 'learning_rate': 3.4718e-05, 'epoch': 1.13, 'throughput': 10005.38} [INFO|2025-03-20 00:24:52] logging.py:143 >> {'loss': 0.4430, 'learning_rate': 3.4706e-05, 'epoch': 1.13, 'throughput': 10005.50} [INFO|2025-03-20 00:25:33] logging.py:143 >> {'loss': 0.4655, 'learning_rate': 3.4693e-05, 'epoch': 1.13, 'throughput': 10005.58} [INFO|2025-03-20 
00:26:14] logging.py:143 >> {'loss': 0.4712, 'learning_rate': 3.4680e-05, 'epoch': 1.13, 'throughput': 10005.47} [INFO|2025-03-20 00:26:54] logging.py:143 >> {'loss': 0.4846, 'learning_rate': 3.4667e-05, 'epoch': 1.13, 'throughput': 10005.36} [INFO|2025-03-20 00:27:33] logging.py:143 >> {'loss': 0.4682, 'learning_rate': 3.4654e-05, 'epoch': 1.13, 'throughput': 10005.48} [INFO|2025-03-20 00:28:15] logging.py:143 >> {'loss': 0.4868, 'learning_rate': 3.4641e-05, 'epoch': 1.14, 'throughput': 10005.57} [INFO|2025-03-20 00:28:55] logging.py:143 >> {'loss': 0.4574, 'learning_rate': 3.4628e-05, 'epoch': 1.14, 'throughput': 10005.69} [INFO|2025-03-20 00:29:35] logging.py:143 >> {'loss': 0.4627, 'learning_rate': 3.4615e-05, 'epoch': 1.14, 'throughput': 10005.81} [INFO|2025-03-20 00:30:14] logging.py:143 >> {'loss': 0.4634, 'learning_rate': 3.4602e-05, 'epoch': 1.14, 'throughput': 10005.90} [INFO|2025-03-20 00:30:55] logging.py:143 >> {'loss': 0.4812, 'learning_rate': 3.4589e-05, 'epoch': 1.14, 'throughput': 10005.90} [INFO|2025-03-20 00:31:36] logging.py:143 >> {'loss': 0.4821, 'learning_rate': 3.4576e-05, 'epoch': 1.14, 'throughput': 10005.95} [INFO|2025-03-20 00:32:15] logging.py:143 >> {'loss': 0.4742, 'learning_rate': 3.4564e-05, 'epoch': 1.14, 'throughput': 10006.14} [INFO|2025-03-20 00:32:54] logging.py:143 >> {'loss': 0.4537, 'learning_rate': 3.4551e-05, 'epoch': 1.14, 'throughput': 10006.04} [INFO|2025-03-20 00:33:35] logging.py:143 >> {'loss': 0.4352, 'learning_rate': 3.4538e-05, 'epoch': 1.14, 'throughput': 10005.81} [INFO|2025-03-20 00:34:16] logging.py:143 >> {'loss': 0.4880, 'learning_rate': 3.4525e-05, 'epoch': 1.14, 'throughput': 10005.85} [INFO|2025-03-20 00:34:55] logging.py:143 >> {'loss': 0.4562, 'learning_rate': 3.4512e-05, 'epoch': 1.14, 'throughput': 10005.86} [INFO|2025-03-20 00:35:36] logging.py:143 >> {'loss': 0.4893, 'learning_rate': 3.4499e-05, 'epoch': 1.14, 'throughput': 10005.84} [INFO|2025-03-20 00:36:17] logging.py:143 >> {'loss': 0.4788, 
'learning_rate': 3.4486e-05, 'epoch': 1.14, 'throughput': 10005.66} [INFO|2025-03-20 00:36:56] logging.py:143 >> {'loss': 0.4594, 'learning_rate': 3.4473e-05, 'epoch': 1.14, 'throughput': 10005.71} [INFO|2025-03-20 00:37:37] logging.py:143 >> {'loss': 0.4713, 'learning_rate': 3.4460e-05, 'epoch': 1.14, 'throughput': 10005.68} [INFO|2025-03-20 00:38:18] logging.py:143 >> {'loss': 0.4833, 'learning_rate': 3.4447e-05, 'epoch': 1.14, 'throughput': 10005.62} [INFO|2025-03-20 00:38:59] logging.py:143 >> {'loss': 0.4926, 'learning_rate': 3.4434e-05, 'epoch': 1.14, 'throughput': 10005.63} [INFO|2025-03-20 00:39:39] logging.py:143 >> {'loss': 0.4878, 'learning_rate': 3.4421e-05, 'epoch': 1.14, 'throughput': 10005.76} [INFO|2025-03-20 00:40:19] logging.py:143 >> {'loss': 0.4717, 'learning_rate': 3.4408e-05, 'epoch': 1.14, 'throughput': 10005.78} [INFO|2025-03-20 00:40:59] logging.py:143 >> {'loss': 0.4718, 'learning_rate': 3.4395e-05, 'epoch': 1.15, 'throughput': 10005.75} [INFO|2025-03-20 00:41:39] logging.py:143 >> {'loss': 0.4866, 'learning_rate': 3.4382e-05, 'epoch': 1.15, 'throughput': 10005.77} [INFO|2025-03-20 00:42:21] logging.py:143 >> {'loss': 0.4470, 'learning_rate': 3.4369e-05, 'epoch': 1.15, 'throughput': 10005.65} [INFO|2025-03-20 00:43:01] logging.py:143 >> {'loss': 0.4626, 'learning_rate': 3.4356e-05, 'epoch': 1.15, 'throughput': 10005.64} [INFO|2025-03-20 00:43:41] logging.py:143 >> {'loss': 0.4437, 'learning_rate': 3.4343e-05, 'epoch': 1.15, 'throughput': 10005.74} [INFO|2025-03-20 00:44:21] logging.py:143 >> {'loss': 0.4584, 'learning_rate': 3.4330e-05, 'epoch': 1.15, 'throughput': 10005.68} [INFO|2025-03-20 00:45:02] logging.py:143 >> {'loss': 0.4774, 'learning_rate': 3.4317e-05, 'epoch': 1.15, 'throughput': 10005.58} [INFO|2025-03-20 00:45:43] logging.py:143 >> {'loss': 0.4665, 'learning_rate': 3.4304e-05, 'epoch': 1.15, 'throughput': 10005.72} [INFO|2025-03-20 00:46:23] logging.py:143 >> {'loss': 0.4545, 'learning_rate': 3.4291e-05, 'epoch': 1.15, 
'throughput': 10005.81} [INFO|2025-03-20 00:47:04] logging.py:143 >> {'loss': 0.4831, 'learning_rate': 3.4278e-05, 'epoch': 1.15, 'throughput': 10005.76} [INFO|2025-03-20 00:47:45] logging.py:143 >> {'loss': 0.4820, 'learning_rate': 3.4265e-05, 'epoch': 1.15, 'throughput': 10005.70} [INFO|2025-03-20 00:48:25] logging.py:143 >> {'loss': 0.4598, 'learning_rate': 3.4252e-05, 'epoch': 1.15, 'throughput': 10005.70} [INFO|2025-03-20 00:49:06] logging.py:143 >> {'loss': 0.4507, 'learning_rate': 3.4239e-05, 'epoch': 1.15, 'throughput': 10005.69} [INFO|2025-03-20 00:49:46] logging.py:143 >> {'loss': 0.4959, 'learning_rate': 3.4226e-05, 'epoch': 1.15, 'throughput': 10005.79} [INFO|2025-03-20 00:50:26] logging.py:143 >> {'loss': 0.4690, 'learning_rate': 3.4213e-05, 'epoch': 1.15, 'throughput': 10005.82} [INFO|2025-03-20 00:51:05] logging.py:143 >> {'loss': 0.4690, 'learning_rate': 3.4200e-05, 'epoch': 1.15, 'throughput': 10005.92} [INFO|2025-03-20 00:51:47] logging.py:143 >> {'loss': 0.4816, 'learning_rate': 3.4187e-05, 'epoch': 1.15, 'throughput': 10005.86} [INFO|2025-03-20 00:52:27] logging.py:143 >> {'loss': 0.4556, 'learning_rate': 3.4174e-05, 'epoch': 1.15, 'throughput': 10005.70} [INFO|2025-03-20 00:53:09] logging.py:143 >> {'loss': 0.4503, 'learning_rate': 3.4161e-05, 'epoch': 1.15, 'throughput': 10005.68} [INFO|2025-03-20 00:53:49] logging.py:143 >> {'loss': 0.4711, 'learning_rate': 3.4148e-05, 'epoch': 1.16, 'throughput': 10005.75} [INFO|2025-03-20 00:54:30] logging.py:143 >> {'loss': 0.4799, 'learning_rate': 3.4135e-05, 'epoch': 1.16, 'throughput': 10005.77} [INFO|2025-03-20 00:55:09] logging.py:143 >> {'loss': 0.4660, 'learning_rate': 3.4122e-05, 'epoch': 1.16, 'throughput': 10005.80} [INFO|2025-03-20 00:55:50] logging.py:143 >> {'loss': 0.4354, 'learning_rate': 3.4109e-05, 'epoch': 1.16, 'throughput': 10005.85} [INFO|2025-03-20 00:56:31] logging.py:143 >> {'loss': 0.4784, 'learning_rate': 3.4096e-05, 'epoch': 1.16, 'throughput': 10005.80} [INFO|2025-03-20 
00:57:11] logging.py:143 >> {'loss': 0.4923, 'learning_rate': 3.4083e-05, 'epoch': 1.16, 'throughput': 10005.94} [INFO|2025-03-20 00:57:50] logging.py:143 >> {'loss': 0.4846, 'learning_rate': 3.4070e-05, 'epoch': 1.16, 'throughput': 10006.00} [INFO|2025-03-20 00:58:32] logging.py:143 >> {'loss': 0.4567, 'learning_rate': 3.4057e-05, 'epoch': 1.16, 'throughput': 10005.79} [INFO|2025-03-20 00:59:13] logging.py:143 >> {'loss': 0.4569, 'learning_rate': 3.4044e-05, 'epoch': 1.16, 'throughput': 10005.70} [INFO|2025-03-20 00:59:54] logging.py:143 >> {'loss': 0.4915, 'learning_rate': 3.4031e-05, 'epoch': 1.16, 'throughput': 10005.66} [INFO|2025-03-20 01:00:33] logging.py:143 >> {'loss': 0.4710, 'learning_rate': 3.4018e-05, 'epoch': 1.16, 'throughput': 10005.77} [INFO|2025-03-20 01:01:13] logging.py:143 >> {'loss': 0.4696, 'learning_rate': 3.4005e-05, 'epoch': 1.16, 'throughput': 10005.80} [INFO|2025-03-20 01:01:55] logging.py:143 >> {'loss': 0.4527, 'learning_rate': 3.3992e-05, 'epoch': 1.16, 'throughput': 10005.63} [INFO|2025-03-20 01:02:35] logging.py:143 >> {'loss': 0.4480, 'learning_rate': 3.3979e-05, 'epoch': 1.16, 'throughput': 10005.55} [INFO|2025-03-20 01:03:15] logging.py:143 >> {'loss': 0.4533, 'learning_rate': 3.3966e-05, 'epoch': 1.16, 'throughput': 10005.64} [INFO|2025-03-20 01:03:56] logging.py:143 >> {'loss': 0.4744, 'learning_rate': 3.3953e-05, 'epoch': 1.16, 'throughput': 10005.74} [INFO|2025-03-20 01:04:36] logging.py:143 >> {'loss': 0.4549, 'learning_rate': 3.3940e-05, 'epoch': 1.16, 'throughput': 10005.73} [INFO|2025-03-20 01:05:16] logging.py:143 >> {'loss': 0.4649, 'learning_rate': 3.3926e-05, 'epoch': 1.16, 'throughput': 10005.74} [INFO|2025-03-20 01:05:56] logging.py:143 >> {'loss': 0.4598, 'learning_rate': 3.3913e-05, 'epoch': 1.16, 'throughput': 10005.68} [INFO|2025-03-20 01:06:36] logging.py:143 >> {'loss': 0.4670, 'learning_rate': 3.3900e-05, 'epoch': 1.17, 'throughput': 10005.75} [INFO|2025-03-20 01:07:16] logging.py:143 >> {'loss': 0.4559, 
'learning_rate': 3.3887e-05, 'epoch': 1.17, 'throughput': 10005.68} [INFO|2025-03-20 01:07:56] logging.py:143 >> {'loss': 0.4798, 'learning_rate': 3.3874e-05, 'epoch': 1.17, 'throughput': 10005.73} [INFO|2025-03-20 01:08:36] logging.py:143 >> {'loss': 0.4588, 'learning_rate': 3.3861e-05, 'epoch': 1.17, 'throughput': 10005.77} [INFO|2025-03-20 01:09:17] logging.py:143 >> {'loss': 0.4292, 'learning_rate': 3.3848e-05, 'epoch': 1.17, 'throughput': 10005.72} [INFO|2025-03-20 01:09:59] logging.py:143 >> {'loss': 0.4732, 'learning_rate': 3.3835e-05, 'epoch': 1.17, 'throughput': 10005.62} [INFO|2025-03-20 01:10:39] logging.py:143 >> {'loss': 0.4701, 'learning_rate': 3.3822e-05, 'epoch': 1.17, 'throughput': 10005.59} [INFO|2025-03-20 01:11:20] logging.py:143 >> {'loss': 0.4854, 'learning_rate': 3.3809e-05, 'epoch': 1.17, 'throughput': 10005.64} [INFO|2025-03-20 01:12:00] logging.py:143 >> {'loss': 0.4970, 'learning_rate': 3.3796e-05, 'epoch': 1.17, 'throughput': 10005.69} [INFO|2025-03-20 01:12:40] logging.py:143 >> {'loss': 0.4502, 'learning_rate': 3.3783e-05, 'epoch': 1.17, 'throughput': 10005.76} [INFO|2025-03-20 01:13:19] logging.py:143 >> {'loss': 0.4634, 'learning_rate': 3.3769e-05, 'epoch': 1.17, 'throughput': 10005.91} [INFO|2025-03-20 01:14:00] logging.py:143 >> {'loss': 0.4862, 'learning_rate': 3.3756e-05, 'epoch': 1.17, 'throughput': 10005.68} [INFO|2025-03-20 01:14:41] logging.py:143 >> {'loss': 0.4573, 'learning_rate': 3.3743e-05, 'epoch': 1.17, 'throughput': 10005.58} [INFO|2025-03-20 01:15:21] logging.py:143 >> {'loss': 0.4584, 'learning_rate': 3.3730e-05, 'epoch': 1.17, 'throughput': 10005.52} [INFO|2025-03-20 01:16:01] logging.py:143 >> {'loss': 0.4545, 'learning_rate': 3.3717e-05, 'epoch': 1.17, 'throughput': 10005.56} [INFO|2025-03-20 01:16:42] logging.py:143 >> {'loss': 0.4780, 'learning_rate': 3.3704e-05, 'epoch': 1.17, 'throughput': 10005.64} [INFO|2025-03-20 01:17:23] logging.py:143 >> {'loss': 0.4635, 'learning_rate': 3.3691e-05, 'epoch': 1.17, 
'throughput': 10005.50} [INFO|2025-03-20 01:18:04] logging.py:143 >> {'loss': 0.4749, 'learning_rate': 3.3678e-05, 'epoch': 1.17, 'throughput': 10005.42} [INFO|2025-03-20 01:18:44] logging.py:143 >> {'loss': 0.4562, 'learning_rate': 3.3665e-05, 'epoch': 1.18, 'throughput': 10005.51} [INFO|2025-03-20 01:19:25] logging.py:143 >> {'loss': 0.4544, 'learning_rate': 3.3651e-05, 'epoch': 1.18, 'throughput': 10005.36} [INFO|2025-03-20 01:20:07] logging.py:143 >> {'loss': 0.4656, 'learning_rate': 3.3638e-05, 'epoch': 1.18, 'throughput': 10005.38} [INFO|2025-03-20 01:20:47] logging.py:143 >> {'loss': 0.4539, 'learning_rate': 3.3625e-05, 'epoch': 1.18, 'throughput': 10005.29} [INFO|2025-03-20 01:21:27] logging.py:143 >> {'loss': 0.4408, 'learning_rate': 3.3612e-05, 'epoch': 1.18, 'throughput': 10005.19} [INFO|2025-03-20 01:22:07] logging.py:143 >> {'loss': 0.4468, 'learning_rate': 3.3599e-05, 'epoch': 1.18, 'throughput': 10005.15} [INFO|2025-03-20 01:22:48] logging.py:143 >> {'loss': 0.4988, 'learning_rate': 3.3586e-05, 'epoch': 1.18, 'throughput': 10005.16} [INFO|2025-03-20 01:23:28] logging.py:143 >> {'loss': 0.4599, 'learning_rate': 3.3573e-05, 'epoch': 1.18, 'throughput': 10005.17} [INFO|2025-03-20 01:24:08] logging.py:143 >> {'loss': 0.4804, 'learning_rate': 3.3559e-05, 'epoch': 1.18, 'throughput': 10005.20} [INFO|2025-03-20 01:24:47] logging.py:143 >> {'loss': 0.4670, 'learning_rate': 3.3546e-05, 'epoch': 1.18, 'throughput': 10005.21} [INFO|2025-03-20 01:25:28] logging.py:143 >> {'loss': 0.4811, 'learning_rate': 3.3533e-05, 'epoch': 1.18, 'throughput': 10005.30} [INFO|2025-03-20 01:26:09] logging.py:143 >> {'loss': 0.4874, 'learning_rate': 3.3520e-05, 'epoch': 1.18, 'throughput': 10005.34} [INFO|2025-03-20 01:26:50] logging.py:143 >> {'loss': 0.4581, 'learning_rate': 3.3507e-05, 'epoch': 1.18, 'throughput': 10005.27} [INFO|2025-03-20 01:27:31] logging.py:143 >> {'loss': 0.4774, 'learning_rate': 3.3494e-05, 'epoch': 1.18, 'throughput': 10005.34} [INFO|2025-03-20 
01:28:11] logging.py:143 >> {'loss': 0.4756, 'learning_rate': 3.3480e-05, 'epoch': 1.18, 'throughput': 10005.35} [INFO|2025-03-20 01:28:51] logging.py:143 >> {'loss': 0.4891, 'learning_rate': 3.3467e-05, 'epoch': 1.18, 'throughput': 10005.51} [INFO|2025-03-20 01:29:31] logging.py:143 >> {'loss': 0.4828, 'learning_rate': 3.3454e-05, 'epoch': 1.18, 'throughput': 10005.60} [INFO|2025-03-20 01:30:11] logging.py:143 >> {'loss': 0.4662, 'learning_rate': 3.3441e-05, 'epoch': 1.18, 'throughput': 10005.57} [INFO|2025-03-20 01:30:51] logging.py:143 >> {'loss': 0.4548, 'learning_rate': 3.3428e-05, 'epoch': 1.18, 'throughput': 10005.67} [INFO|2025-03-20 01:31:31] logging.py:143 >> {'loss': 0.4806, 'learning_rate': 3.3415e-05, 'epoch': 1.19, 'throughput': 10005.73} [INFO|2025-03-20 01:32:11] logging.py:143 >> {'loss': 0.4639, 'learning_rate': 3.3401e-05, 'epoch': 1.19, 'throughput': 10005.58} [INFO|2025-03-20 01:32:50] logging.py:143 >> {'loss': 0.4753, 'learning_rate': 3.3388e-05, 'epoch': 1.19, 'throughput': 10005.60} [INFO|2025-03-20 01:33:31] logging.py:143 >> {'loss': 0.4589, 'learning_rate': 3.3375e-05, 'epoch': 1.19, 'throughput': 10005.52} [INFO|2025-03-20 01:34:13] logging.py:143 >> {'loss': 0.4577, 'learning_rate': 3.3362e-05, 'epoch': 1.19, 'throughput': 10005.47} [INFO|2025-03-20 01:34:55] logging.py:143 >> {'loss': 0.4783, 'learning_rate': 3.3349e-05, 'epoch': 1.19, 'throughput': 10005.47} [INFO|2025-03-20 01:35:35] logging.py:143 >> {'loss': 0.4713, 'learning_rate': 3.3336e-05, 'epoch': 1.19, 'throughput': 10005.45} [INFO|2025-03-20 01:36:15] logging.py:143 >> {'loss': 0.4586, 'learning_rate': 3.3322e-05, 'epoch': 1.19, 'throughput': 10005.57} [INFO|2025-03-20 01:36:56] logging.py:143 >> {'loss': 0.4624, 'learning_rate': 3.3309e-05, 'epoch': 1.19, 'throughput': 10005.56} [INFO|2025-03-20 01:37:36] logging.py:143 >> {'loss': 0.4981, 'learning_rate': 3.3296e-05, 'epoch': 1.19, 'throughput': 10005.60} [INFO|2025-03-20 01:38:16] logging.py:143 >> {'loss': 0.4979, 
'learning_rate': 3.3283e-05, 'epoch': 1.19, 'throughput': 10005.74} [INFO|2025-03-20 01:38:57] logging.py:143 >> {'loss': 0.4657, 'learning_rate': 3.3270e-05, 'epoch': 1.19, 'throughput': 10005.76} [INFO|2025-03-20 01:39:37] logging.py:143 >> {'loss': 0.5011, 'learning_rate': 3.3256e-05, 'epoch': 1.19, 'throughput': 10005.79} [INFO|2025-03-20 01:40:17] logging.py:143 >> {'loss': 0.4640, 'learning_rate': 3.3243e-05, 'epoch': 1.19, 'throughput': 10005.71} [INFO|2025-03-20 01:41:00] logging.py:143 >> {'loss': 0.4889, 'learning_rate': 3.3230e-05, 'epoch': 1.19, 'throughput': 10005.49} [INFO|2025-03-20 01:41:40] logging.py:143 >> {'loss': 0.5032, 'learning_rate': 3.3217e-05, 'epoch': 1.19, 'throughput': 10005.61} [INFO|2025-03-20 01:42:21] logging.py:143 >> {'loss': 0.4952, 'learning_rate': 3.3203e-05, 'epoch': 1.19, 'throughput': 10005.58} [INFO|2025-03-20 01:43:02] logging.py:143 >> {'loss': 0.4653, 'learning_rate': 3.3190e-05, 'epoch': 1.19, 'throughput': 10005.57} [INFO|2025-03-20 01:43:42] logging.py:143 >> {'loss': 0.4707, 'learning_rate': 3.3177e-05, 'epoch': 1.19, 'throughput': 10005.63} [INFO|2025-03-20 01:44:23] logging.py:143 >> {'loss': 0.4861, 'learning_rate': 3.3164e-05, 'epoch': 1.20, 'throughput': 10005.64} [INFO|2025-03-20 01:45:02] logging.py:143 >> {'loss': 0.4708, 'learning_rate': 3.3151e-05, 'epoch': 1.20, 'throughput': 10005.70} [INFO|2025-03-20 01:45:42] logging.py:143 >> {'loss': 0.4790, 'learning_rate': 3.3137e-05, 'epoch': 1.20, 'throughput': 10005.80} [INFO|2025-03-20 01:46:24] logging.py:143 >> {'loss': 0.4822, 'learning_rate': 3.3124e-05, 'epoch': 1.20, 'throughput': 10005.70} [INFO|2025-03-20 01:47:04] logging.py:143 >> {'loss': 0.4560, 'learning_rate': 3.3111e-05, 'epoch': 1.20, 'throughput': 10005.73} [INFO|2025-03-20 01:47:45] logging.py:143 >> {'loss': 0.4452, 'learning_rate': 3.3098e-05, 'epoch': 1.20, 'throughput': 10005.66} [INFO|2025-03-20 01:48:25] logging.py:143 >> {'loss': 0.4705, 'learning_rate': 3.3084e-05, 'epoch': 1.20, 
'throughput': 10005.80} [INFO|2025-03-20 01:49:05] logging.py:143 >> {'loss': 0.4775, 'learning_rate': 3.3071e-05, 'epoch': 1.20, 'throughput': 10005.85} [INFO|2025-03-20 01:49:48] logging.py:143 >> {'loss': 0.4875, 'learning_rate': 3.3058e-05, 'epoch': 1.20, 'throughput': 10005.66} [INFO|2025-03-20 01:50:30] logging.py:143 >> {'loss': 0.4823, 'learning_rate': 3.3045e-05, 'epoch': 1.20, 'throughput': 10005.56} [INFO|2025-03-20 01:51:12] logging.py:143 >> {'loss': 0.4578, 'learning_rate': 3.3031e-05, 'epoch': 1.20, 'throughput': 10005.52} [INFO|2025-03-20 01:51:51] logging.py:143 >> {'loss': 0.4697, 'learning_rate': 3.3018e-05, 'epoch': 1.20, 'throughput': 10005.48} [INFO|2025-03-20 01:52:35] logging.py:143 >> {'loss': 0.4577, 'learning_rate': 3.3005e-05, 'epoch': 1.20, 'throughput': 10005.38} [INFO|2025-03-20 01:53:14] logging.py:143 >> {'loss': 0.4988, 'learning_rate': 3.2992e-05, 'epoch': 1.20, 'throughput': 10005.36} [INFO|2025-03-20 01:53:56] logging.py:143 >> {'loss': 0.4908, 'learning_rate': 3.2978e-05, 'epoch': 1.20, 'throughput': 10005.26} [INFO|2025-03-20 01:54:39] logging.py:143 >> {'loss': 0.4639, 'learning_rate': 3.2965e-05, 'epoch': 1.20, 'throughput': 10005.11} [INFO|2025-03-20 01:55:20] logging.py:143 >> {'loss': 0.4942, 'learning_rate': 3.2952e-05, 'epoch': 1.20, 'throughput': 10005.13} [INFO|2025-03-20 01:56:01] logging.py:143 >> {'loss': 0.4924, 'learning_rate': 3.2939e-05, 'epoch': 1.20, 'throughput': 10005.10} [INFO|2025-03-20 01:56:42] logging.py:143 >> {'loss': 0.4821, 'learning_rate': 3.2925e-05, 'epoch': 1.20, 'throughput': 10005.10} [INFO|2025-03-20 01:57:24] logging.py:143 >> {'loss': 0.4859, 'learning_rate': 3.2912e-05, 'epoch': 1.21, 'throughput': 10004.89} [INFO|2025-03-20 01:58:05] logging.py:143 >> {'loss': 0.4685, 'learning_rate': 3.2899e-05, 'epoch': 1.21, 'throughput': 10004.89} [INFO|2025-03-20 01:58:47] logging.py:143 >> {'loss': 0.4365, 'learning_rate': 3.2885e-05, 'epoch': 1.21, 'throughput': 10004.72} [INFO|2025-03-20 
01:59:28] logging.py:143 >> {'loss': 0.4337, 'learning_rate': 3.2872e-05, 'epoch': 1.21, 'throughput': 10004.54} [INFO|2025-03-20 02:00:09] logging.py:143 >> {'loss': 0.4362, 'learning_rate': 3.2859e-05, 'epoch': 1.21, 'throughput': 10004.46} [INFO|2025-03-20 02:00:48] logging.py:143 >> {'loss': 0.4717, 'learning_rate': 3.2846e-05, 'epoch': 1.21, 'throughput': 10004.52} [INFO|2025-03-20 02:01:28] logging.py:143 >> {'loss': 0.4771, 'learning_rate': 3.2832e-05, 'epoch': 1.21, 'throughput': 10004.60} [INFO|2025-03-20 02:02:08] logging.py:143 >> {'loss': 0.4862, 'learning_rate': 3.2819e-05, 'epoch': 1.21, 'throughput': 10004.57} [INFO|2025-03-20 02:02:48] logging.py:143 >> {'loss': 0.4604, 'learning_rate': 3.2806e-05, 'epoch': 1.21, 'throughput': 10004.65} [INFO|2025-03-20 02:03:29] logging.py:143 >> {'loss': 0.4533, 'learning_rate': 3.2792e-05, 'epoch': 1.21, 'throughput': 10004.67} [INFO|2025-03-20 02:04:09] logging.py:143 >> {'loss': 0.4942, 'learning_rate': 3.2779e-05, 'epoch': 1.21, 'throughput': 10004.61} [INFO|2025-03-20 02:04:48] logging.py:143 >> {'loss': 0.4724, 'learning_rate': 3.2766e-05, 'epoch': 1.21, 'throughput': 10004.78} [INFO|2025-03-20 02:05:29] logging.py:143 >> {'loss': 0.4810, 'learning_rate': 3.2753e-05, 'epoch': 1.21, 'throughput': 10004.84} [INFO|2025-03-20 02:06:10] logging.py:143 >> {'loss': 0.4851, 'learning_rate': 3.2739e-05, 'epoch': 1.21, 'throughput': 10004.89} [INFO|2025-03-20 02:06:49] logging.py:143 >> {'loss': 0.4729, 'learning_rate': 3.2726e-05, 'epoch': 1.21, 'throughput': 10004.80} [INFO|2025-03-20 02:07:29] logging.py:143 >> {'loss': 0.4849, 'learning_rate': 3.2713e-05, 'epoch': 1.21, 'throughput': 10004.81} [INFO|2025-03-20 02:08:11] logging.py:143 >> {'loss': 0.4623, 'learning_rate': 3.2699e-05, 'epoch': 1.21, 'throughput': 10004.77} [INFO|2025-03-20 02:08:51] logging.py:143 >> {'loss': 0.4754, 'learning_rate': 3.2686e-05, 'epoch': 1.21, 'throughput': 10004.83} [INFO|2025-03-20 02:09:32] logging.py:143 >> {'loss': 0.4548, 
'learning_rate': 3.2673e-05, 'epoch': 1.21, 'throughput': 10004.83} [INFO|2025-03-20 02:10:12] logging.py:143 >> {'loss': 0.4520, 'learning_rate': 3.2659e-05, 'epoch': 1.22, 'throughput': 10004.80} [INFO|2025-03-20 02:10:54] logging.py:143 >> {'loss': 0.4803, 'learning_rate': 3.2646e-05, 'epoch': 1.22, 'throughput': 10004.73} [INFO|2025-03-20 02:11:37] logging.py:143 >> {'loss': 0.4728, 'learning_rate': 3.2633e-05, 'epoch': 1.22, 'throughput': 10004.51} [INFO|2025-03-20 02:12:18] logging.py:143 >> {'loss': 0.4804, 'learning_rate': 3.2619e-05, 'epoch': 1.22, 'throughput': 10004.53} [INFO|2025-03-20 02:12:58] logging.py:143 >> {'loss': 0.4425, 'learning_rate': 3.2606e-05, 'epoch': 1.22, 'throughput': 10004.48} [INFO|2025-03-20 02:13:38] logging.py:143 >> {'loss': 0.4902, 'learning_rate': 3.2593e-05, 'epoch': 1.22, 'throughput': 10004.66} [INFO|2025-03-20 02:14:19] logging.py:143 >> {'loss': 0.4787, 'learning_rate': 3.2579e-05, 'epoch': 1.22, 'throughput': 10004.65} [INFO|2025-03-20 02:14:59] logging.py:143 >> {'loss': 0.4568, 'learning_rate': 3.2566e-05, 'epoch': 1.22, 'throughput': 10004.55} [INFO|2025-03-20 02:15:41] logging.py:143 >> {'loss': 0.4426, 'learning_rate': 3.2553e-05, 'epoch': 1.22, 'throughput': 10004.41} [INFO|2025-03-20 02:16:22] logging.py:143 >> {'loss': 0.4568, 'learning_rate': 3.2539e-05, 'epoch': 1.22, 'throughput': 10004.50} [INFO|2025-03-20 02:17:02] logging.py:143 >> {'loss': 0.4454, 'learning_rate': 3.2526e-05, 'epoch': 1.22, 'throughput': 10004.63} [INFO|2025-03-20 02:17:41] logging.py:143 >> {'loss': 0.4559, 'learning_rate': 3.2513e-05, 'epoch': 1.22, 'throughput': 10004.72} [INFO|2025-03-20 02:18:22] logging.py:143 >> {'loss': 0.4711, 'learning_rate': 3.2499e-05, 'epoch': 1.22, 'throughput': 10004.55} [INFO|2025-03-20 02:19:03] logging.py:143 >> {'loss': 0.4615, 'learning_rate': 3.2486e-05, 'epoch': 1.22, 'throughput': 10004.53} [INFO|2025-03-20 02:19:42] logging.py:143 >> {'loss': 0.4556, 'learning_rate': 3.2473e-05, 'epoch': 1.22, 
'throughput': 10004.61} [INFO|2025-03-20 02:20:24] logging.py:143 >> {'loss': 0.4660, 'learning_rate': 3.2459e-05, 'epoch': 1.22, 'throughput': 10004.63} [INFO|2025-03-20 02:21:04] logging.py:143 >> {'loss': 0.4621, 'learning_rate': 3.2446e-05, 'epoch': 1.22, 'throughput': 10004.68} [INFO|2025-03-20 02:21:46] logging.py:143 >> {'loss': 0.4803, 'learning_rate': 3.2433e-05, 'epoch': 1.22, 'throughput': 10004.52} [INFO|2025-03-20 02:22:28] logging.py:143 >> {'loss': 0.4700, 'learning_rate': 3.2419e-05, 'epoch': 1.22, 'throughput': 10004.36} [INFO|2025-03-20 02:23:10] logging.py:143 >> {'loss': 0.4341, 'learning_rate': 3.2406e-05, 'epoch': 1.23, 'throughput': 10004.22} [INFO|2025-03-20 02:23:51] logging.py:143 >> {'loss': 0.4798, 'learning_rate': 3.2392e-05, 'epoch': 1.23, 'throughput': 10004.15} [INFO|2025-03-20 02:24:32] logging.py:143 >> {'loss': 0.4735, 'learning_rate': 3.2379e-05, 'epoch': 1.23, 'throughput': 10004.15} [INFO|2025-03-20 02:25:13] logging.py:143 >> {'loss': 0.4651, 'learning_rate': 3.2366e-05, 'epoch': 1.23, 'throughput': 10004.18} [INFO|2025-03-20 02:25:55] logging.py:143 >> {'loss': 0.4777, 'learning_rate': 3.2352e-05, 'epoch': 1.23, 'throughput': 10004.03} [INFO|2025-03-20 02:26:37] logging.py:143 >> {'loss': 0.4723, 'learning_rate': 3.2339e-05, 'epoch': 1.23, 'throughput': 10004.03} [INFO|2025-03-20 02:27:16] logging.py:143 >> {'loss': 0.4642, 'learning_rate': 3.2326e-05, 'epoch': 1.23, 'throughput': 10004.15} [INFO|2025-03-20 02:27:57] logging.py:143 >> {'loss': 0.4564, 'learning_rate': 3.2312e-05, 'epoch': 1.23, 'throughput': 10004.19} [INFO|2025-03-20 02:28:37] logging.py:143 >> {'loss': 0.4892, 'learning_rate': 3.2299e-05, 'epoch': 1.23, 'throughput': 10004.22} [INFO|2025-03-20 02:29:18] logging.py:143 >> {'loss': 0.4792, 'learning_rate': 3.2285e-05, 'epoch': 1.23, 'throughput': 10004.25} [INFO|2025-03-20 02:29:58] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 3.2272e-05, 'epoch': 1.23, 'throughput': 10004.28} [INFO|2025-03-20 
02:30:39] logging.py:143 >> {'loss': 0.4782, 'learning_rate': 3.2259e-05, 'epoch': 1.23, 'throughput': 10004.29} [INFO|2025-03-20 02:31:21] logging.py:143 >> {'loss': 0.4829, 'learning_rate': 3.2245e-05, 'epoch': 1.23, 'throughput': 10004.28} [INFO|2025-03-20 02:32:03] logging.py:143 >> {'loss': 0.4871, 'learning_rate': 3.2232e-05, 'epoch': 1.23, 'throughput': 10004.15} [INFO|2025-03-20 02:32:43] logging.py:143 >> {'loss': 0.4655, 'learning_rate': 3.2219e-05, 'epoch': 1.23, 'throughput': 10004.18} [INFO|2025-03-20 02:33:24] logging.py:143 >> {'loss': 0.4797, 'learning_rate': 3.2205e-05, 'epoch': 1.23, 'throughput': 10004.01} [INFO|2025-03-20 02:34:05] logging.py:143 >> {'loss': 0.4535, 'learning_rate': 3.2192e-05, 'epoch': 1.23, 'throughput': 10004.03} [INFO|2025-03-20 02:34:46] logging.py:143 >> {'loss': 0.4632, 'learning_rate': 3.2178e-05, 'epoch': 1.23, 'throughput': 10003.96} [INFO|2025-03-20 02:35:26] logging.py:143 >> {'loss': 0.4857, 'learning_rate': 3.2165e-05, 'epoch': 1.24, 'throughput': 10003.99} [INFO|2025-03-20 02:36:06] logging.py:143 >> {'loss': 0.4769, 'learning_rate': 3.2152e-05, 'epoch': 1.24, 'throughput': 10004.07} [INFO|2025-03-20 02:36:47] logging.py:143 >> {'loss': 0.4858, 'learning_rate': 3.2138e-05, 'epoch': 1.24, 'throughput': 10004.23} [INFO|2025-03-20 02:37:27] logging.py:143 >> {'loss': 0.4622, 'learning_rate': 3.2125e-05, 'epoch': 1.24, 'throughput': 10004.08} [INFO|2025-03-20 02:38:09] logging.py:143 >> {'loss': 0.4581, 'learning_rate': 3.2111e-05, 'epoch': 1.24, 'throughput': 10003.99} [INFO|2025-03-20 02:38:51] logging.py:143 >> {'loss': 0.4711, 'learning_rate': 3.2098e-05, 'epoch': 1.24, 'throughput': 10004.01} [INFO|2025-03-20 02:39:31] logging.py:143 >> {'loss': 0.4618, 'learning_rate': 3.2084e-05, 'epoch': 1.24, 'throughput': 10004.00} [INFO|2025-03-20 02:40:12] logging.py:143 >> {'loss': 0.4943, 'learning_rate': 3.2071e-05, 'epoch': 1.24, 'throughput': 10003.91} [INFO|2025-03-20 02:40:53] logging.py:143 >> {'loss': 0.4591, 
'learning_rate': 3.2058e-05, 'epoch': 1.24, 'throughput': 10003.99} [INFO|2025-03-20 02:41:34] logging.py:143 >> {'loss': 0.4820, 'learning_rate': 3.2044e-05, 'epoch': 1.24, 'throughput': 10004.07} [INFO|2025-03-20 02:42:14] logging.py:143 >> {'loss': 0.4611, 'learning_rate': 3.2031e-05, 'epoch': 1.24, 'throughput': 10004.06} [INFO|2025-03-20 02:42:56] logging.py:143 >> {'loss': 0.4557, 'learning_rate': 3.2017e-05, 'epoch': 1.24, 'throughput': 10003.89} [INFO|2025-03-20 02:43:39] logging.py:143 >> {'loss': 0.4770, 'learning_rate': 3.2004e-05, 'epoch': 1.24, 'throughput': 10003.65} [INFO|2025-03-20 02:44:18] logging.py:143 >> {'loss': 0.4286, 'learning_rate': 3.1990e-05, 'epoch': 1.24, 'throughput': 10003.69} [INFO|2025-03-20 02:45:00] logging.py:143 >> {'loss': 0.4714, 'learning_rate': 3.1977e-05, 'epoch': 1.24, 'throughput': 10003.64} [INFO|2025-03-20 02:45:40] logging.py:143 >> {'loss': 0.4406, 'learning_rate': 3.1964e-05, 'epoch': 1.24, 'throughput': 10003.62} [INFO|2025-03-20 02:46:21] logging.py:143 >> {'loss': 0.4367, 'learning_rate': 3.1950e-05, 'epoch': 1.24, 'throughput': 10003.71} [INFO|2025-03-20 02:47:02] logging.py:143 >> {'loss': 0.4773, 'learning_rate': 3.1937e-05, 'epoch': 1.24, 'throughput': 10003.58} [INFO|2025-03-20 02:47:43] logging.py:143 >> {'loss': 0.4533, 'learning_rate': 3.1923e-05, 'epoch': 1.24, 'throughput': 10003.66} [INFO|2025-03-20 02:48:22] logging.py:143 >> {'loss': 0.4588, 'learning_rate': 3.1910e-05, 'epoch': 1.25, 'throughput': 10003.77} [INFO|2025-03-20 02:49:02] logging.py:143 >> {'loss': 0.4506, 'learning_rate': 3.1896e-05, 'epoch': 1.25, 'throughput': 10003.83} [INFO|2025-03-20 02:49:43] logging.py:143 >> {'loss': 0.4727, 'learning_rate': 3.1883e-05, 'epoch': 1.25, 'throughput': 10003.79} [INFO|2025-03-20 02:50:23] logging.py:143 >> {'loss': 0.4766, 'learning_rate': 3.1869e-05, 'epoch': 1.25, 'throughput': 10003.78} [INFO|2025-03-20 02:51:02] logging.py:143 >> {'loss': 0.4646, 'learning_rate': 3.1856e-05, 'epoch': 1.25, 
'throughput': 10003.84} [INFO|2025-03-20 02:51:44] logging.py:143 >> {'loss': 0.4751, 'learning_rate': 3.1843e-05, 'epoch': 1.25, 'throughput': 10003.69} [INFO|2025-03-20 02:52:25] logging.py:143 >> {'loss': 0.4650, 'learning_rate': 3.1829e-05, 'epoch': 1.25, 'throughput': 10003.63} [INFO|2025-03-20 02:53:06] logging.py:143 >> {'loss': 0.4346, 'learning_rate': 3.1816e-05, 'epoch': 1.25, 'throughput': 10003.44} [INFO|2025-03-20 02:53:47] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 3.1802e-05, 'epoch': 1.25, 'throughput': 10003.50} [INFO|2025-03-20 02:54:28] logging.py:143 >> {'loss': 0.4445, 'learning_rate': 3.1789e-05, 'epoch': 1.25, 'throughput': 10003.61} [INFO|2025-03-20 02:55:08] logging.py:143 >> {'loss': 0.4404, 'learning_rate': 3.1775e-05, 'epoch': 1.25, 'throughput': 10003.64} [INFO|2025-03-20 02:55:47] logging.py:143 >> {'loss': 0.4619, 'learning_rate': 3.1762e-05, 'epoch': 1.25, 'throughput': 10003.60} [INFO|2025-03-20 02:56:29] logging.py:143 >> {'loss': 0.4418, 'learning_rate': 3.1748e-05, 'epoch': 1.25, 'throughput': 10003.45} [INFO|2025-03-20 02:57:10] logging.py:143 >> {'loss': 0.4735, 'learning_rate': 3.1735e-05, 'epoch': 1.25, 'throughput': 10003.32} [INFO|2025-03-20 02:57:50] logging.py:143 >> {'loss': 0.4448, 'learning_rate': 3.1721e-05, 'epoch': 1.25, 'throughput': 10003.51} [INFO|2025-03-20 02:58:32] logging.py:143 >> {'loss': 0.4688, 'learning_rate': 3.1708e-05, 'epoch': 1.25, 'throughput': 10003.40} [INFO|2025-03-20 02:59:15] logging.py:143 >> {'loss': 0.4992, 'learning_rate': 3.1694e-05, 'epoch': 1.25, 'throughput': 10003.31} [INFO|2025-03-20 02:59:56] logging.py:143 >> {'loss': 0.4673, 'learning_rate': 3.1681e-05, 'epoch': 1.25, 'throughput': 10003.19} [INFO|2025-03-20 03:00:36] logging.py:143 >> {'loss': 0.4799, 'learning_rate': 3.1667e-05, 'epoch': 1.25, 'throughput': 10003.21} [INFO|2025-03-20 03:01:16] logging.py:143 >> {'loss': 0.4705, 'learning_rate': 3.1654e-05, 'epoch': 1.26, 'throughput': 10003.29} [INFO|2025-03-20 
03:01:57] logging.py:143 >> {'loss': 0.4610, 'learning_rate': 3.1640e-05, 'epoch': 1.26, 'throughput': 10003.20} [INFO|2025-03-20 03:02:38] logging.py:143 >> {'loss': 0.4761, 'learning_rate': 3.1627e-05, 'epoch': 1.26, 'throughput': 10003.26} [INFO|2025-03-20 03:03:17] logging.py:143 >> {'loss': 0.4496, 'learning_rate': 3.1613e-05, 'epoch': 1.26, 'throughput': 10003.23} [INFO|2025-03-20 03:04:00] logging.py:143 >> {'loss': 0.4545, 'learning_rate': 3.1600e-05, 'epoch': 1.26, 'throughput': 10003.12} [INFO|2025-03-20 03:04:43] logging.py:143 >> {'loss': 0.4812, 'learning_rate': 3.1586e-05, 'epoch': 1.26, 'throughput': 10002.95} [INFO|2025-03-20 03:05:24] logging.py:143 >> {'loss': 0.4833, 'learning_rate': 3.1573e-05, 'epoch': 1.26, 'throughput': 10002.96} [INFO|2025-03-20 03:06:06] logging.py:143 >> {'loss': 0.4642, 'learning_rate': 3.1559e-05, 'epoch': 1.26, 'throughput': 10002.86} [INFO|2025-03-20 03:06:46] logging.py:143 >> {'loss': 0.4978, 'learning_rate': 3.1546e-05, 'epoch': 1.26, 'throughput': 10002.92} [INFO|2025-03-20 03:07:28] logging.py:143 >> {'loss': 0.4623, 'learning_rate': 3.1532e-05, 'epoch': 1.26, 'throughput': 10002.75} [INFO|2025-03-20 03:08:08] logging.py:143 >> {'loss': 0.4421, 'learning_rate': 3.1519e-05, 'epoch': 1.26, 'throughput': 10002.76} [INFO|2025-03-20 03:08:48] logging.py:143 >> {'loss': 0.4580, 'learning_rate': 3.1505e-05, 'epoch': 1.26, 'throughput': 10002.89} [INFO|2025-03-20 03:09:28] logging.py:143 >> {'loss': 0.4677, 'learning_rate': 3.1492e-05, 'epoch': 1.26, 'throughput': 10002.98} [INFO|2025-03-20 03:10:09] logging.py:143 >> {'loss': 0.4690, 'learning_rate': 3.1478e-05, 'epoch': 1.26, 'throughput': 10002.95} [INFO|2025-03-20 03:10:50] logging.py:143 >> {'loss': 0.4605, 'learning_rate': 3.1465e-05, 'epoch': 1.26, 'throughput': 10003.07} [INFO|2025-03-20 03:11:30] logging.py:143 >> {'loss': 0.4888, 'learning_rate': 3.1451e-05, 'epoch': 1.26, 'throughput': 10003.04} [INFO|2025-03-20 03:12:12] logging.py:143 >> {'loss': 0.4472, 
'learning_rate': 3.1438e-05, 'epoch': 1.26, 'throughput': 10002.93} [INFO|2025-03-20 03:12:53] logging.py:143 >> {'loss': 0.4578, 'learning_rate': 3.1424e-05, 'epoch': 1.26, 'throughput': 10002.80} [INFO|2025-03-20 03:13:34] logging.py:143 >> {'loss': 0.4584, 'learning_rate': 3.1411e-05, 'epoch': 1.26, 'throughput': 10002.69} [INFO|2025-03-20 03:14:16] logging.py:143 >> {'loss': 0.4948, 'learning_rate': 3.1397e-05, 'epoch': 1.27, 'throughput': 10002.64} [INFO|2025-03-20 03:14:57] logging.py:143 >> {'loss': 0.4550, 'learning_rate': 3.1384e-05, 'epoch': 1.27, 'throughput': 10002.71} [INFO|2025-03-20 03:15:36] logging.py:143 >> {'loss': 0.4658, 'learning_rate': 3.1370e-05, 'epoch': 1.27, 'throughput': 10002.81} [INFO|2025-03-20 03:16:18] logging.py:143 >> {'loss': 0.4573, 'learning_rate': 3.1357e-05, 'epoch': 1.27, 'throughput': 10002.79} [INFO|2025-03-20 03:17:00] logging.py:143 >> {'loss': 0.4834, 'learning_rate': 3.1343e-05, 'epoch': 1.27, 'throughput': 10002.70} [INFO|2025-03-20 03:17:40] logging.py:143 >> {'loss': 0.4300, 'learning_rate': 3.1330e-05, 'epoch': 1.27, 'throughput': 10002.69} [INFO|2025-03-20 03:18:21] logging.py:143 >> {'loss': 0.4611, 'learning_rate': 3.1316e-05, 'epoch': 1.27, 'throughput': 10002.65} [INFO|2025-03-20 03:19:01] logging.py:143 >> {'loss': 0.4502, 'learning_rate': 3.1302e-05, 'epoch': 1.27, 'throughput': 10002.56} [INFO|2025-03-20 03:19:43] logging.py:143 >> {'loss': 0.4308, 'learning_rate': 3.1289e-05, 'epoch': 1.27, 'throughput': 10002.53} [INFO|2025-03-20 03:20:22] logging.py:143 >> {'loss': 0.4573, 'learning_rate': 3.1275e-05, 'epoch': 1.27, 'throughput': 10002.62} [INFO|2025-03-20 03:21:02] logging.py:143 >> {'loss': 0.4322, 'learning_rate': 3.1262e-05, 'epoch': 1.27, 'throughput': 10002.59} [INFO|2025-03-20 03:21:42] logging.py:143 >> {'loss': 0.4623, 'learning_rate': 3.1248e-05, 'epoch': 1.27, 'throughput': 10002.60} [INFO|2025-03-20 03:22:23] logging.py:143 >> {'loss': 0.4430, 'learning_rate': 3.1235e-05, 'epoch': 1.27, 
'throughput': 10002.54} [INFO|2025-03-20 03:23:04] logging.py:143 >> {'loss': 0.4838, 'learning_rate': 3.1221e-05, 'epoch': 1.27, 'throughput': 10002.39} [INFO|2025-03-20 03:23:46] logging.py:143 >> {'loss': 0.4646, 'learning_rate': 3.1208e-05, 'epoch': 1.27, 'throughput': 10002.26} [INFO|2025-03-20 03:24:26] logging.py:143 >> {'loss': 0.4419, 'learning_rate': 3.1194e-05, 'epoch': 1.27, 'throughput': 10002.15} [INFO|2025-03-20 03:25:06] logging.py:143 >> {'loss': 0.4814, 'learning_rate': 3.1181e-05, 'epoch': 1.27, 'throughput': 10002.25} [INFO|2025-03-20 03:25:47] logging.py:143 >> {'loss': 0.4464, 'learning_rate': 3.1167e-05, 'epoch': 1.27, 'throughput': 10002.22} [INFO|2025-03-20 03:26:27] logging.py:143 >> {'loss': 0.4695, 'learning_rate': 3.1153e-05, 'epoch': 1.27, 'throughput': 10002.21} [INFO|2025-03-20 03:27:08] logging.py:143 >> {'loss': 0.4775, 'learning_rate': 3.1140e-05, 'epoch': 1.28, 'throughput': 10002.31} [INFO|2025-03-20 03:27:48] logging.py:143 >> {'loss': 0.4708, 'learning_rate': 3.1126e-05, 'epoch': 1.28, 'throughput': 10002.29} [INFO|2025-03-20 03:28:29] logging.py:143 >> {'loss': 0.4497, 'learning_rate': 3.1113e-05, 'epoch': 1.28, 'throughput': 10002.11} [INFO|2025-03-20 03:29:09] logging.py:143 >> {'loss': 0.4879, 'learning_rate': 3.1099e-05, 'epoch': 1.28, 'throughput': 10002.15} [INFO|2025-03-20 03:29:49] logging.py:143 >> {'loss': 0.4519, 'learning_rate': 3.1086e-05, 'epoch': 1.28, 'throughput': 10002.08} [INFO|2025-03-20 03:30:29] logging.py:143 >> {'loss': 0.4714, 'learning_rate': 3.1072e-05, 'epoch': 1.28, 'throughput': 10001.98} [INFO|2025-03-20 03:31:10] logging.py:143 >> {'loss': 0.4773, 'learning_rate': 3.1058e-05, 'epoch': 1.28, 'throughput': 10002.07} [INFO|2025-03-20 03:31:50] logging.py:143 >> {'loss': 0.4565, 'learning_rate': 3.1045e-05, 'epoch': 1.28, 'throughput': 10002.07} [INFO|2025-03-20 03:32:30] logging.py:143 >> {'loss': 0.4250, 'learning_rate': 3.1031e-05, 'epoch': 1.28, 'throughput': 10002.02} [INFO|2025-03-20 
03:33:10] logging.py:143 >> {'loss': 0.4455, 'learning_rate': 3.1018e-05, 'epoch': 1.28, 'throughput': 10002.05} [INFO|2025-03-20 03:33:52] logging.py:143 >> {'loss': 0.4567, 'learning_rate': 3.1004e-05, 'epoch': 1.28, 'throughput': 10001.98} [INFO|2025-03-20 03:34:33] logging.py:143 >> {'loss': 0.4812, 'learning_rate': 3.0991e-05, 'epoch': 1.28, 'throughput': 10001.90} [INFO|2025-03-20 03:35:13] logging.py:143 >> {'loss': 0.4681, 'learning_rate': 3.0977e-05, 'epoch': 1.28, 'throughput': 10001.86} [INFO|2025-03-20 03:35:54] logging.py:143 >> {'loss': 0.4538, 'learning_rate': 3.0963e-05, 'epoch': 1.28, 'throughput': 10001.87} [INFO|2025-03-20 03:36:34] logging.py:143 >> {'loss': 0.4867, 'learning_rate': 3.0950e-05, 'epoch': 1.28, 'throughput': 10001.86} [INFO|2025-03-20 03:37:17] logging.py:143 >> {'loss': 0.4665, 'learning_rate': 3.0936e-05, 'epoch': 1.28, 'throughput': 10001.72} [INFO|2025-03-20 03:37:56] logging.py:143 >> {'loss': 0.4574, 'learning_rate': 3.0923e-05, 'epoch': 1.28, 'throughput': 10001.75} [INFO|2025-03-20 03:38:35] logging.py:143 >> {'loss': 0.4696, 'learning_rate': 3.0909e-05, 'epoch': 1.28, 'throughput': 10001.85} [INFO|2025-03-20 03:39:15] logging.py:143 >> {'loss': 0.4511, 'learning_rate': 3.0895e-05, 'epoch': 1.28, 'throughput': 10001.87} [INFO|2025-03-20 03:39:56] logging.py:143 >> {'loss': 0.4599, 'learning_rate': 3.0882e-05, 'epoch': 1.29, 'throughput': 10001.91} [INFO|2025-03-20 03:40:37] logging.py:143 >> {'loss': 0.4354, 'learning_rate': 3.0868e-05, 'epoch': 1.29, 'throughput': 10001.78} [INFO|2025-03-20 03:41:18] logging.py:143 >> {'loss': 0.4397, 'learning_rate': 3.0855e-05, 'epoch': 1.29, 'throughput': 10001.62} [INFO|2025-03-20 03:41:59] logging.py:143 >> {'loss': 0.4532, 'learning_rate': 3.0841e-05, 'epoch': 1.29, 'throughput': 10001.55} [INFO|2025-03-20 03:42:40] logging.py:143 >> {'loss': 0.4362, 'learning_rate': 3.0827e-05, 'epoch': 1.29, 'throughput': 10001.53} [INFO|2025-03-20 03:43:21] logging.py:143 >> {'loss': 0.4727, 
'learning_rate': 3.0814e-05, 'epoch': 1.29, 'throughput': 10001.54} [INFO|2025-03-20 03:44:02] logging.py:143 >> {'loss': 0.4593, 'learning_rate': 3.0800e-05, 'epoch': 1.29, 'throughput': 10001.48} [INFO|2025-03-20 03:44:43] logging.py:143 >> {'loss': 0.4490, 'learning_rate': 3.0787e-05, 'epoch': 1.29, 'throughput': 10001.45} [INFO|2025-03-20 03:45:24] logging.py:143 >> {'loss': 0.4590, 'learning_rate': 3.0773e-05, 'epoch': 1.29, 'throughput': 10001.50} [INFO|2025-03-20 03:46:05] logging.py:143 >> {'loss': 0.4472, 'learning_rate': 3.0759e-05, 'epoch': 1.29, 'throughput': 10001.49} [INFO|2025-03-20 03:46:44] logging.py:143 >> {'loss': 0.4604, 'learning_rate': 3.0746e-05, 'epoch': 1.29, 'throughput': 10001.53} [INFO|2025-03-20 03:47:24] logging.py:143 >> {'loss': 0.4571, 'learning_rate': 3.0732e-05, 'epoch': 1.29, 'throughput': 10001.58} [INFO|2025-03-20 03:48:04] logging.py:143 >> {'loss': 0.4681, 'learning_rate': 3.0718e-05, 'epoch': 1.29, 'throughput': 10001.66} [INFO|2025-03-20 03:48:44] logging.py:143 >> {'loss': 0.4802, 'learning_rate': 3.0705e-05, 'epoch': 1.29, 'throughput': 10001.72} [INFO|2025-03-20 03:49:24] logging.py:143 >> {'loss': 0.4410, 'learning_rate': 3.0691e-05, 'epoch': 1.29, 'throughput': 10001.73} [INFO|2025-03-20 03:50:04] logging.py:143 >> {'loss': 0.4602, 'learning_rate': 3.0678e-05, 'epoch': 1.29, 'throughput': 10001.83} [INFO|2025-03-20 03:50:44] logging.py:143 >> {'loss': 0.4701, 'learning_rate': 3.0664e-05, 'epoch': 1.29, 'throughput': 10001.78} [INFO|2025-03-20 03:51:25] logging.py:143 >> {'loss': 0.4718, 'learning_rate': 3.0650e-05, 'epoch': 1.29, 'throughput': 10001.85} [INFO|2025-03-20 03:52:06] logging.py:143 >> {'loss': 0.4408, 'learning_rate': 3.0637e-05, 'epoch': 1.29, 'throughput': 10001.70} [INFO|2025-03-20 03:52:48] logging.py:143 >> {'loss': 0.4249, 'learning_rate': 3.0623e-05, 'epoch': 1.30, 'throughput': 10001.67} [INFO|2025-03-20 03:53:28] logging.py:143 >> {'loss': 0.4355, 'learning_rate': 3.0609e-05, 'epoch': 1.30, 
'throughput': 10001.66} [INFO|2025-03-20 03:54:08] logging.py:143 >> {'loss': 0.4668, 'learning_rate': 3.0596e-05, 'epoch': 1.30, 'throughput': 10001.64} [INFO|2025-03-20 03:54:50] logging.py:143 >> {'loss': 0.4436, 'learning_rate': 3.0582e-05, 'epoch': 1.30, 'throughput': 10001.67} [INFO|2025-03-20 03:55:30] logging.py:143 >> {'loss': 0.4364, 'learning_rate': 3.0568e-05, 'epoch': 1.30, 'throughput': 10001.69} [INFO|2025-03-20 03:56:10] logging.py:143 >> {'loss': 0.4517, 'learning_rate': 3.0555e-05, 'epoch': 1.30, 'throughput': 10001.69} [INFO|2025-03-20 03:56:50] logging.py:143 >> {'loss': 0.4468, 'learning_rate': 3.0541e-05, 'epoch': 1.30, 'throughput': 10001.64} [INFO|2025-03-20 03:57:29] logging.py:143 >> {'loss': 0.4837, 'learning_rate': 3.0528e-05, 'epoch': 1.30, 'throughput': 10001.73} [INFO|2025-03-20 03:58:09] logging.py:143 >> {'loss': 0.4527, 'learning_rate': 3.0514e-05, 'epoch': 1.30, 'throughput': 10001.85} [INFO|2025-03-20 03:58:51] logging.py:143 >> {'loss': 0.4902, 'learning_rate': 3.0500e-05, 'epoch': 1.30, 'throughput': 10001.75} [INFO|2025-03-20 03:59:30] logging.py:143 >> {'loss': 0.4367, 'learning_rate': 3.0487e-05, 'epoch': 1.30, 'throughput': 10001.73} [INFO|2025-03-20 04:00:10] logging.py:143 >> {'loss': 0.4739, 'learning_rate': 3.0473e-05, 'epoch': 1.30, 'throughput': 10001.83} [INFO|2025-03-20 04:00:52] logging.py:143 >> {'loss': 0.4878, 'learning_rate': 3.0459e-05, 'epoch': 1.30, 'throughput': 10001.77} [INFO|2025-03-20 04:01:33] logging.py:143 >> {'loss': 0.4472, 'learning_rate': 3.0446e-05, 'epoch': 1.30, 'throughput': 10001.72} [INFO|2025-03-20 04:02:14] logging.py:143 >> {'loss': 0.4398, 'learning_rate': 3.0432e-05, 'epoch': 1.30, 'throughput': 10001.66} [INFO|2025-03-20 04:02:53] logging.py:143 >> {'loss': 0.4648, 'learning_rate': 3.0418e-05, 'epoch': 1.30, 'throughput': 10001.63} [INFO|2025-03-20 04:03:33] logging.py:143 >> {'loss': 0.4580, 'learning_rate': 3.0405e-05, 'epoch': 1.30, 'throughput': 10001.69} [INFO|2025-03-20 
04:04:15] logging.py:143 >> {'loss': 0.4482, 'learning_rate': 3.0391e-05, 'epoch': 1.30, 'throughput': 10001.63} [INFO|2025-03-20 04:04:57] logging.py:143 >> {'loss': 0.4785, 'learning_rate': 3.0377e-05, 'epoch': 1.31, 'throughput': 10001.47} [INFO|2025-03-20 04:05:36] logging.py:143 >> {'loss': 0.4476, 'learning_rate': 3.0364e-05, 'epoch': 1.31, 'throughput': 10001.58} [INFO|2025-03-20 04:06:18] logging.py:143 >> {'loss': 0.4824, 'learning_rate': 3.0350e-05, 'epoch': 1.31, 'throughput': 10001.49} [INFO|2025-03-20 04:06:59] logging.py:143 >> {'loss': 0.4620, 'learning_rate': 3.0336e-05, 'epoch': 1.31, 'throughput': 10001.45} [INFO|2025-03-20 04:07:39] logging.py:143 >> {'loss': 0.4635, 'learning_rate': 3.0323e-05, 'epoch': 1.31, 'throughput': 10001.42} [INFO|2025-03-20 04:08:18] logging.py:143 >> {'loss': 0.4726, 'learning_rate': 3.0309e-05, 'epoch': 1.31, 'throughput': 10001.44} [INFO|2025-03-20 04:08:59] logging.py:143 >> {'loss': 0.4317, 'learning_rate': 3.0295e-05, 'epoch': 1.31, 'throughput': 10001.45} [INFO|2025-03-20 04:09:39] logging.py:143 >> {'loss': 0.4494, 'learning_rate': 3.0282e-05, 'epoch': 1.31, 'throughput': 10001.46} [INFO|2025-03-20 04:10:19] logging.py:143 >> {'loss': 0.4566, 'learning_rate': 3.0268e-05, 'epoch': 1.31, 'throughput': 10001.47} [INFO|2025-03-20 04:10:59] logging.py:143 >> {'loss': 0.4566, 'learning_rate': 3.0254e-05, 'epoch': 1.31, 'throughput': 10001.44} [INFO|2025-03-20 04:11:40] logging.py:143 >> {'loss': 0.4509, 'learning_rate': 3.0241e-05, 'epoch': 1.31, 'throughput': 10001.37} [INFO|2025-03-20 04:12:20] logging.py:143 >> {'loss': 0.4678, 'learning_rate': 3.0227e-05, 'epoch': 1.31, 'throughput': 10001.37} [INFO|2025-03-20 04:13:02] logging.py:143 >> {'loss': 0.4437, 'learning_rate': 3.0213e-05, 'epoch': 1.31, 'throughput': 10001.45} [INFO|2025-03-20 04:13:43] logging.py:143 >> {'loss': 0.4715, 'learning_rate': 3.0200e-05, 'epoch': 1.31, 'throughput': 10001.40} [INFO|2025-03-20 04:14:24] logging.py:143 >> {'loss': 0.4583, 
'learning_rate': 3.0186e-05, 'epoch': 1.31, 'throughput': 10001.49} [INFO|2025-03-20 04:15:05] logging.py:143 >> {'loss': 0.4198, 'learning_rate': 3.0172e-05, 'epoch': 1.31, 'throughput': 10001.44} [INFO|2025-03-20 04:15:45] logging.py:143 >> {'loss': 0.4585, 'learning_rate': 3.0158e-05, 'epoch': 1.31, 'throughput': 10001.52} [INFO|2025-03-20 04:16:26] logging.py:143 >> {'loss': 0.4474, 'learning_rate': 3.0145e-05, 'epoch': 1.31, 'throughput': 10001.45} [INFO|2025-03-20 04:17:07] logging.py:143 >> {'loss': 0.4474, 'learning_rate': 3.0131e-05, 'epoch': 1.31, 'throughput': 10001.41} [INFO|2025-03-20 04:17:47] logging.py:143 >> {'loss': 0.4427, 'learning_rate': 3.0117e-05, 'epoch': 1.32, 'throughput': 10001.41} [INFO|2025-03-20 04:18:28] logging.py:143 >> {'loss': 0.4524, 'learning_rate': 3.0104e-05, 'epoch': 1.32, 'throughput': 10001.33} [INFO|2025-03-20 04:19:09] logging.py:143 >> {'loss': 0.4544, 'learning_rate': 3.0090e-05, 'epoch': 1.32, 'throughput': 10001.38} [INFO|2025-03-20 04:19:49] logging.py:143 >> {'loss': 0.4277, 'learning_rate': 3.0076e-05, 'epoch': 1.32, 'throughput': 10001.43} [INFO|2025-03-20 04:20:29] logging.py:143 >> {'loss': 0.4282, 'learning_rate': 3.0063e-05, 'epoch': 1.32, 'throughput': 10001.57} [INFO|2025-03-20 04:21:09] logging.py:143 >> {'loss': 0.4578, 'learning_rate': 3.0049e-05, 'epoch': 1.32, 'throughput': 10001.57} [INFO|2025-03-20 04:21:51] logging.py:143 >> {'loss': 0.4367, 'learning_rate': 3.0035e-05, 'epoch': 1.32, 'throughput': 10001.53} [INFO|2025-03-20 04:22:30] logging.py:143 >> {'loss': 0.4657, 'learning_rate': 3.0021e-05, 'epoch': 1.32, 'throughput': 10001.61} [INFO|2025-03-20 04:23:12] logging.py:143 >> {'loss': 0.4757, 'learning_rate': 3.0008e-05, 'epoch': 1.32, 'throughput': 10001.47} [INFO|2025-03-20 04:23:53] logging.py:143 >> {'loss': 0.4683, 'learning_rate': 2.9994e-05, 'epoch': 1.32, 'throughput': 10001.41} [INFO|2025-03-20 04:24:33] logging.py:143 >> {'loss': 0.4821, 'learning_rate': 2.9980e-05, 'epoch': 1.32, 
'throughput': 10001.43} [INFO|2025-03-20 04:25:13] logging.py:143 >> {'loss': 0.4512, 'learning_rate': 2.9967e-05, 'epoch': 1.32, 'throughput': 10001.40} [INFO|2025-03-20 04:25:53] logging.py:143 >> {'loss': 0.4800, 'learning_rate': 2.9953e-05, 'epoch': 1.32, 'throughput': 10001.41} [INFO|2025-03-20 04:26:35] logging.py:143 >> {'loss': 0.4353, 'learning_rate': 2.9939e-05, 'epoch': 1.32, 'throughput': 10001.31} [INFO|2025-03-20 04:27:14] logging.py:143 >> {'loss': 0.4327, 'learning_rate': 2.9925e-05, 'epoch': 1.32, 'throughput': 10001.27} [INFO|2025-03-20 04:27:55] logging.py:143 >> {'loss': 0.4548, 'learning_rate': 2.9912e-05, 'epoch': 1.32, 'throughput': 10001.24} [INFO|2025-03-20 04:28:34] logging.py:143 >> {'loss': 0.4484, 'learning_rate': 2.9898e-05, 'epoch': 1.32, 'throughput': 10001.23} [INFO|2025-03-20 04:29:15] logging.py:143 >> {'loss': 0.4468, 'learning_rate': 2.9884e-05, 'epoch': 1.32, 'throughput': 10001.20} [INFO|2025-03-20 04:29:56] logging.py:143 >> {'loss': 0.4678, 'learning_rate': 2.9871e-05, 'epoch': 1.32, 'throughput': 10001.24} [INFO|2025-03-20 04:30:35] logging.py:143 >> {'loss': 0.4793, 'learning_rate': 2.9857e-05, 'epoch': 1.33, 'throughput': 10001.13} [INFO|2025-03-20 04:31:16] logging.py:143 >> {'loss': 0.4439, 'learning_rate': 2.9843e-05, 'epoch': 1.33, 'throughput': 10001.08} [INFO|2025-03-20 04:31:57] logging.py:143 >> {'loss': 0.4502, 'learning_rate': 2.9829e-05, 'epoch': 1.33, 'throughput': 10000.99} [INFO|2025-03-20 04:32:38] logging.py:143 >> {'loss': 0.4240, 'learning_rate': 2.9816e-05, 'epoch': 1.33, 'throughput': 10000.84} [INFO|2025-03-20 04:33:17] logging.py:143 >> {'loss': 0.4875, 'learning_rate': 2.9802e-05, 'epoch': 1.33, 'throughput': 10000.89} [INFO|2025-03-20 04:33:59] logging.py:143 >> {'loss': 0.4919, 'learning_rate': 2.9788e-05, 'epoch': 1.33, 'throughput': 10000.91} [INFO|2025-03-20 04:34:39] logging.py:143 >> {'loss': 0.4784, 'learning_rate': 2.9774e-05, 'epoch': 1.33, 'throughput': 10000.94} [INFO|2025-03-20 
04:35:20] logging.py:143 >> {'loss': 0.4535, 'learning_rate': 2.9761e-05, 'epoch': 1.33, 'throughput': 10000.88} [INFO|2025-03-20 04:36:02] logging.py:143 >> {'loss': 0.4252, 'learning_rate': 2.9747e-05, 'epoch': 1.33, 'throughput': 10000.92} [INFO|2025-03-20 04:36:42] logging.py:143 >> {'loss': 0.4827, 'learning_rate': 2.9733e-05, 'epoch': 1.33, 'throughput': 10000.81} [INFO|2025-03-20 04:37:23] logging.py:143 >> {'loss': 0.4728, 'learning_rate': 2.9719e-05, 'epoch': 1.33, 'throughput': 10000.70} [INFO|2025-03-20 04:38:03] logging.py:143 >> {'loss': 0.4881, 'learning_rate': 2.9706e-05, 'epoch': 1.33, 'throughput': 10000.81} [INFO|2025-03-20 04:38:43] logging.py:143 >> {'loss': 0.4584, 'learning_rate': 2.9692e-05, 'epoch': 1.33, 'throughput': 10000.86} [INFO|2025-03-20 04:39:24] logging.py:143 >> {'loss': 0.4591, 'learning_rate': 2.9678e-05, 'epoch': 1.33, 'throughput': 10000.72} [INFO|2025-03-20 04:40:04] logging.py:143 >> {'loss': 0.4412, 'learning_rate': 2.9665e-05, 'epoch': 1.33, 'throughput': 10000.70} [INFO|2025-03-20 04:40:45] logging.py:143 >> {'loss': 0.4586, 'learning_rate': 2.9651e-05, 'epoch': 1.33, 'throughput': 10000.71} [INFO|2025-03-20 04:41:26] logging.py:143 >> {'loss': 0.4705, 'learning_rate': 2.9637e-05, 'epoch': 1.33, 'throughput': 10000.73} [INFO|2025-03-20 04:42:07] logging.py:143 >> {'loss': 0.4706, 'learning_rate': 2.9623e-05, 'epoch': 1.33, 'throughput': 10000.66} [INFO|2025-03-20 04:42:46] logging.py:143 >> {'loss': 0.4531, 'learning_rate': 2.9610e-05, 'epoch': 1.33, 'throughput': 10000.70} [INFO|2025-03-20 04:43:26] logging.py:143 >> {'loss': 0.4203, 'learning_rate': 2.9596e-05, 'epoch': 1.34, 'throughput': 10000.69} [INFO|2025-03-20 04:44:07] logging.py:143 >> {'loss': 0.4635, 'learning_rate': 2.9582e-05, 'epoch': 1.34, 'throughput': 10000.72} [INFO|2025-03-20 04:44:47] logging.py:143 >> {'loss': 0.4627, 'learning_rate': 2.9568e-05, 'epoch': 1.34, 'throughput': 10000.75} [INFO|2025-03-20 04:45:28] logging.py:143 >> {'loss': 0.4545, 
'learning_rate': 2.9554e-05, 'epoch': 1.34, 'throughput': 10000.75} [INFO|2025-03-20 04:46:08] logging.py:143 >> {'loss': 0.4333, 'learning_rate': 2.9541e-05, 'epoch': 1.34, 'throughput': 10000.79} [INFO|2025-03-20 04:46:48] logging.py:143 >> {'loss': 0.4864, 'learning_rate': 2.9527e-05, 'epoch': 1.34, 'throughput': 10000.91} [INFO|2025-03-20 04:47:29] logging.py:143 >> {'loss': 0.4515, 'learning_rate': 2.9513e-05, 'epoch': 1.34, 'throughput': 10000.81} [INFO|2025-03-20 04:48:09] logging.py:143 >> {'loss': 0.4746, 'learning_rate': 2.9499e-05, 'epoch': 1.34, 'throughput': 10000.84} [INFO|2025-03-20 04:48:49] logging.py:143 >> {'loss': 0.4491, 'learning_rate': 2.9486e-05, 'epoch': 1.34, 'throughput': 10000.88} [INFO|2025-03-20 04:49:29] logging.py:143 >> {'loss': 0.4539, 'learning_rate': 2.9472e-05, 'epoch': 1.34, 'throughput': 10000.82} [INFO|2025-03-20 04:50:09] logging.py:143 >> {'loss': 0.4422, 'learning_rate': 2.9458e-05, 'epoch': 1.34, 'throughput': 10000.76} [INFO|2025-03-20 04:50:49] logging.py:143 >> {'loss': 0.4713, 'learning_rate': 2.9444e-05, 'epoch': 1.34, 'throughput': 10000.73} [INFO|2025-03-20 04:51:30] logging.py:143 >> {'loss': 0.4274, 'learning_rate': 2.9431e-05, 'epoch': 1.34, 'throughput': 10000.68} [INFO|2025-03-20 04:52:12] logging.py:143 >> {'loss': 0.4554, 'learning_rate': 2.9417e-05, 'epoch': 1.34, 'throughput': 10000.70} [INFO|2025-03-20 04:52:51] logging.py:143 >> {'loss': 0.4808, 'learning_rate': 2.9403e-05, 'epoch': 1.34, 'throughput': 10000.72} [INFO|2025-03-20 04:53:31] logging.py:143 >> {'loss': 0.4610, 'learning_rate': 2.9389e-05, 'epoch': 1.34, 'throughput': 10000.71} [INFO|2025-03-20 04:54:12] logging.py:143 >> {'loss': 0.4905, 'learning_rate': 2.9375e-05, 'epoch': 1.34, 'throughput': 10000.80} [INFO|2025-03-20 04:54:52] logging.py:143 >> {'loss': 0.4524, 'learning_rate': 2.9362e-05, 'epoch': 1.34, 'throughput': 10000.78} [INFO|2025-03-20 04:55:33] logging.py:143 >> {'loss': 0.4689, 'learning_rate': 2.9348e-05, 'epoch': 1.34, 
'throughput': 10000.83} [INFO|2025-03-20 04:56:15] logging.py:143 >> {'loss': 0.4464, 'learning_rate': 2.9334e-05, 'epoch': 1.35, 'throughput': 10000.80} [INFO|2025-03-20 04:56:56] logging.py:143 >> {'loss': 0.4516, 'learning_rate': 2.9320e-05, 'epoch': 1.35, 'throughput': 10000.78} [INFO|2025-03-20 04:57:36] logging.py:143 >> {'loss': 0.4527, 'learning_rate': 2.9307e-05, 'epoch': 1.35, 'throughput': 10000.66} [INFO|2025-03-20 04:58:17] logging.py:143 >> {'loss': 0.4847, 'learning_rate': 2.9293e-05, 'epoch': 1.35, 'throughput': 10000.66} [INFO|2025-03-20 04:58:56] logging.py:143 >> {'loss': 0.4535, 'learning_rate': 2.9279e-05, 'epoch': 1.35, 'throughput': 10000.63} [INFO|2025-03-20 04:59:36] logging.py:143 >> {'loss': 0.4696, 'learning_rate': 2.9265e-05, 'epoch': 1.35, 'throughput': 10000.62} [INFO|2025-03-20 05:00:18] logging.py:143 >> {'loss': 0.4552, 'learning_rate': 2.9251e-05, 'epoch': 1.35, 'throughput': 10000.39} [INFO|2025-03-20 05:00:59] logging.py:143 >> {'loss': 0.4250, 'learning_rate': 2.9238e-05, 'epoch': 1.35, 'throughput': 10000.43} [INFO|2025-03-20 05:01:40] logging.py:143 >> {'loss': 0.5002, 'learning_rate': 2.9224e-05, 'epoch': 1.35, 'throughput': 10000.44} [INFO|2025-03-20 05:02:23] logging.py:143 >> {'loss': 0.4463, 'learning_rate': 2.9210e-05, 'epoch': 1.35, 'throughput': 10000.21} [INFO|2025-03-20 05:03:01] logging.py:143 >> {'loss': 0.4672, 'learning_rate': 2.9196e-05, 'epoch': 1.35, 'throughput': 10000.39} [INFO|2025-03-20 05:03:43] logging.py:143 >> {'loss': 0.4377, 'learning_rate': 2.9182e-05, 'epoch': 1.35, 'throughput': 10000.26} [INFO|2025-03-20 05:04:25] logging.py:143 >> {'loss': 0.4486, 'learning_rate': 2.9169e-05, 'epoch': 1.35, 'throughput': 10000.20} [INFO|2025-03-20 05:05:06] logging.py:143 >> {'loss': 0.4825, 'learning_rate': 2.9155e-05, 'epoch': 1.35, 'throughput': 10000.13} [INFO|2025-03-20 05:05:46] logging.py:143 >> {'loss': 0.4462, 'learning_rate': 2.9141e-05, 'epoch': 1.35, 'throughput': 10000.27} [INFO|2025-03-20 
05:06:27] logging.py:143 >> {'loss': 0.4517, 'learning_rate': 2.9127e-05, 'epoch': 1.35, 'throughput': 10000.25} [INFO|2025-03-20 05:07:08] logging.py:143 >> {'loss': 0.4707, 'learning_rate': 2.9113e-05, 'epoch': 1.35, 'throughput': 10000.23} [INFO|2025-03-20 05:07:48] logging.py:143 >> {'loss': 0.4616, 'learning_rate': 2.9100e-05, 'epoch': 1.35, 'throughput': 10000.27} [INFO|2025-03-20 05:08:28] logging.py:143 >> {'loss': 0.4427, 'learning_rate': 2.9086e-05, 'epoch': 1.35, 'throughput': 10000.23} [INFO|2025-03-20 05:09:09] logging.py:143 >> {'loss': 0.4456, 'learning_rate': 2.9072e-05, 'epoch': 1.36, 'throughput': 10000.31} [INFO|2025-03-20 05:09:48] logging.py:143 >> {'loss': 0.4576, 'learning_rate': 2.9058e-05, 'epoch': 1.36, 'throughput': 10000.28} [INFO|2025-03-20 05:10:29] logging.py:143 >> {'loss': 0.4580, 'learning_rate': 2.9044e-05, 'epoch': 1.36, 'throughput': 10000.32} [INFO|2025-03-20 05:11:09] logging.py:143 >> {'loss': 0.4699, 'learning_rate': 2.9031e-05, 'epoch': 1.36, 'throughput': 10000.32} [INFO|2025-03-20 05:11:50] logging.py:143 >> {'loss': 0.4677, 'learning_rate': 2.9017e-05, 'epoch': 1.36, 'throughput': 10000.35} [INFO|2025-03-20 05:12:30] logging.py:143 >> {'loss': 0.4506, 'learning_rate': 2.9003e-05, 'epoch': 1.36, 'throughput': 10000.25} [INFO|2025-03-20 05:13:12] logging.py:143 >> {'loss': 0.4498, 'learning_rate': 2.8989e-05, 'epoch': 1.36, 'throughput': 10000.18} [INFO|2025-03-20 05:13:53] logging.py:143 >> {'loss': 0.4528, 'learning_rate': 2.8975e-05, 'epoch': 1.36, 'throughput': 10000.17} [INFO|2025-03-20 05:14:32] logging.py:143 >> {'loss': 0.4668, 'learning_rate': 2.8962e-05, 'epoch': 1.36, 'throughput': 10000.22} [INFO|2025-03-20 05:15:15] logging.py:143 >> {'loss': 0.4409, 'learning_rate': 2.8948e-05, 'epoch': 1.36, 'throughput': 10000.06} [INFO|2025-03-20 05:15:55] logging.py:143 >> {'loss': 0.4604, 'learning_rate': 2.8934e-05, 'epoch': 1.36, 'throughput': 10000.14} [INFO|2025-03-20 05:16:35] logging.py:143 >> {'loss': 0.4235, 
'learning_rate': 2.8920e-05, 'epoch': 1.36, 'throughput': 10000.13} [INFO|2025-03-20 05:17:16] logging.py:143 >> {'loss': 0.4533, 'learning_rate': 2.8906e-05, 'epoch': 1.36, 'throughput': 10000.14} [INFO|2025-03-20 05:17:57] logging.py:143 >> {'loss': 0.4621, 'learning_rate': 2.8892e-05, 'epoch': 1.36, 'throughput': 10000.07} [INFO|2025-03-20 05:18:36] logging.py:143 >> {'loss': 0.4665, 'learning_rate': 2.8879e-05, 'epoch': 1.36, 'throughput': 10000.20} [INFO|2025-03-20 05:19:17] logging.py:143 >> {'loss': 0.4336, 'learning_rate': 2.8865e-05, 'epoch': 1.36, 'throughput': 10000.16} [INFO|2025-03-20 05:19:58] logging.py:143 >> {'loss': 0.4413, 'learning_rate': 2.8851e-05, 'epoch': 1.36, 'throughput': 10000.06} [INFO|2025-03-20 05:20:36] logging.py:143 >> {'loss': 0.4571, 'learning_rate': 2.8837e-05, 'epoch': 1.36, 'throughput': 10000.13} [INFO|2025-03-20 05:21:16] logging.py:143 >> {'loss': 0.4605, 'learning_rate': 2.8823e-05, 'epoch': 1.37, 'throughput': 10000.19} [INFO|2025-03-20 05:21:57] logging.py:143 >> {'loss': 0.4871, 'learning_rate': 2.8810e-05, 'epoch': 1.37, 'throughput': 10000.16} [INFO|2025-03-20 05:22:37] logging.py:143 >> {'loss': 0.4699, 'learning_rate': 2.8796e-05, 'epoch': 1.37, 'throughput': 10000.15} [INFO|2025-03-20 05:23:16] logging.py:143 >> {'loss': 0.4830, 'learning_rate': 2.8782e-05, 'epoch': 1.37, 'throughput': 10000.25} [INFO|2025-03-20 05:23:57] logging.py:143 >> {'loss': 0.4641, 'learning_rate': 2.8768e-05, 'epoch': 1.37, 'throughput': 10000.38} [INFO|2025-03-20 05:24:37] logging.py:143 >> {'loss': 0.4379, 'learning_rate': 2.8754e-05, 'epoch': 1.37, 'throughput': 10000.37} [INFO|2025-03-20 05:25:17] logging.py:143 >> {'loss': 0.4738, 'learning_rate': 2.8740e-05, 'epoch': 1.37, 'throughput': 10000.46} [INFO|2025-03-20 05:25:57] logging.py:143 >> {'loss': 0.4677, 'learning_rate': 2.8727e-05, 'epoch': 1.37, 'throughput': 10000.39} [INFO|2025-03-20 05:26:39] logging.py:143 >> {'loss': 0.4565, 'learning_rate': 2.8713e-05, 'epoch': 1.37, 
'throughput': 10000.35} [INFO|2025-03-20 05:27:19] logging.py:143 >> {'loss': 0.4293, 'learning_rate': 2.8699e-05, 'epoch': 1.37, 'throughput': 10000.33} [INFO|2025-03-20 05:28:00] logging.py:143 >> {'loss': 0.4904, 'learning_rate': 2.8685e-05, 'epoch': 1.37, 'throughput': 10000.35} [INFO|2025-03-20 05:28:42] logging.py:143 >> {'loss': 0.4288, 'learning_rate': 2.8671e-05, 'epoch': 1.37, 'throughput': 10000.36} [INFO|2025-03-20 05:29:23] logging.py:143 >> {'loss': 0.4472, 'learning_rate': 2.8657e-05, 'epoch': 1.37, 'throughput': 10000.28} [INFO|2025-03-20 05:30:02] logging.py:143 >> {'loss': 0.4475, 'learning_rate': 2.8643e-05, 'epoch': 1.37, 'throughput': 10000.39} [INFO|2025-03-20 05:30:43] logging.py:143 >> {'loss': 0.4325, 'learning_rate': 2.8630e-05, 'epoch': 1.37, 'throughput': 10000.29} [INFO|2025-03-20 05:31:24] logging.py:143 >> {'loss': 0.4463, 'learning_rate': 2.8616e-05, 'epoch': 1.37, 'throughput': 10000.24} [INFO|2025-03-20 05:32:05] logging.py:143 >> {'loss': 0.4236, 'learning_rate': 2.8602e-05, 'epoch': 1.37, 'throughput': 10000.23} [INFO|2025-03-20 05:32:44] logging.py:143 >> {'loss': 0.4431, 'learning_rate': 2.8588e-05, 'epoch': 1.37, 'throughput': 10000.31} [INFO|2025-03-20 05:33:25] logging.py:143 >> {'loss': 0.4371, 'learning_rate': 2.8574e-05, 'epoch': 1.37, 'throughput': 10000.29} [INFO|2025-03-20 05:34:04] logging.py:143 >> {'loss': 0.4414, 'learning_rate': 2.8560e-05, 'epoch': 1.38, 'throughput': 10000.41} [INFO|2025-03-20 05:34:44] logging.py:143 >> {'loss': 0.4346, 'learning_rate': 2.8547e-05, 'epoch': 1.38, 'throughput': 10000.31} [INFO|2025-03-20 05:35:25] logging.py:143 >> {'loss': 0.4493, 'learning_rate': 2.8533e-05, 'epoch': 1.38, 'throughput': 10000.38} [INFO|2025-03-20 05:36:06] logging.py:143 >> {'loss': 0.4536, 'learning_rate': 2.8519e-05, 'epoch': 1.38, 'throughput': 10000.34} [INFO|2025-03-20 05:36:44] logging.py:143 >> {'loss': 0.4788, 'learning_rate': 2.8505e-05, 'epoch': 1.38, 'throughput': 10000.47} [INFO|2025-03-20 
05:37:24] logging.py:143 >> {'loss': 0.4601, 'learning_rate': 2.8491e-05, 'epoch': 1.38, 'throughput': 10000.50} [INFO|2025-03-20 05:38:06] logging.py:143 >> {'loss': 0.4465, 'learning_rate': 2.8477e-05, 'epoch': 1.38, 'throughput': 10000.55} [INFO|2025-03-20 05:38:45] logging.py:143 >> {'loss': 0.4523, 'learning_rate': 2.8463e-05, 'epoch': 1.38, 'throughput': 10000.58} [INFO|2025-03-20 05:39:27] logging.py:143 >> {'loss': 0.4658, 'learning_rate': 2.8450e-05, 'epoch': 1.38, 'throughput': 10000.53} [INFO|2025-03-20 05:40:06] logging.py:143 >> {'loss': 0.4823, 'learning_rate': 2.8436e-05, 'epoch': 1.38, 'throughput': 10000.62} [INFO|2025-03-20 05:40:47] logging.py:143 >> {'loss': 0.4444, 'learning_rate': 2.8422e-05, 'epoch': 1.38, 'throughput': 10000.62} [INFO|2025-03-20 05:41:27] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 2.8408e-05, 'epoch': 1.38, 'throughput': 10000.73} [INFO|2025-03-20 05:42:08] logging.py:143 >> {'loss': 0.4454, 'learning_rate': 2.8394e-05, 'epoch': 1.38, 'throughput': 10000.65} [INFO|2025-03-20 05:42:51] logging.py:143 >> {'loss': 0.4507, 'learning_rate': 2.8380e-05, 'epoch': 1.38, 'throughput': 10000.54} [INFO|2025-03-20 05:43:31] logging.py:143 >> {'loss': 0.4468, 'learning_rate': 2.8366e-05, 'epoch': 1.38, 'throughput': 10000.57} [INFO|2025-03-20 05:44:11] logging.py:143 >> {'loss': 0.4778, 'learning_rate': 2.8353e-05, 'epoch': 1.38, 'throughput': 10000.68} [INFO|2025-03-20 05:44:52] logging.py:143 >> {'loss': 0.4399, 'learning_rate': 2.8339e-05, 'epoch': 1.38, 'throughput': 10000.62} [INFO|2025-03-20 05:45:31] logging.py:143 >> {'loss': 0.4358, 'learning_rate': 2.8325e-05, 'epoch': 1.38, 'throughput': 10000.67} [INFO|2025-03-20 05:46:11] logging.py:143 >> {'loss': 0.4897, 'learning_rate': 2.8311e-05, 'epoch': 1.38, 'throughput': 10000.68} [INFO|2025-03-20 05:46:53] logging.py:143 >> {'loss': 0.4873, 'learning_rate': 2.8297e-05, 'epoch': 1.39, 'throughput': 10000.57} [INFO|2025-03-20 05:47:33] logging.py:143 >> {'loss': 0.4732, 
'learning_rate': 2.8283e-05, 'epoch': 1.39, 'throughput': 10000.52} [INFO|2025-03-20 05:48:13] logging.py:143 >> {'loss': 0.4481, 'learning_rate': 2.8269e-05, 'epoch': 1.39, 'throughput': 10000.66} [INFO|2025-03-20 05:48:54] logging.py:143 >> {'loss': 0.4358, 'learning_rate': 2.8255e-05, 'epoch': 1.39, 'throughput': 10000.53} [INFO|2025-03-20 05:49:35] logging.py:143 >> {'loss': 0.4725, 'learning_rate': 2.8242e-05, 'epoch': 1.39, 'throughput': 10000.56} [INFO|2025-03-20 05:50:15] logging.py:143 >> {'loss': 0.4483, 'learning_rate': 2.8228e-05, 'epoch': 1.39, 'throughput': 10000.63} [INFO|2025-03-20 05:50:55] logging.py:143 >> {'loss': 0.4704, 'learning_rate': 2.8214e-05, 'epoch': 1.39, 'throughput': 10000.68} [INFO|2025-03-20 05:51:37] logging.py:143 >> {'loss': 0.4416, 'learning_rate': 2.8200e-05, 'epoch': 1.39, 'throughput': 10000.64} [INFO|2025-03-20 05:52:19] logging.py:143 >> {'loss': 0.4392, 'learning_rate': 2.8186e-05, 'epoch': 1.39, 'throughput': 10000.58} [INFO|2025-03-20 05:53:00] logging.py:143 >> {'loss': 0.4519, 'learning_rate': 2.8172e-05, 'epoch': 1.39, 'throughput': 10000.51} [INFO|2025-03-20 05:53:40] logging.py:143 >> {'loss': 0.4514, 'learning_rate': 2.8158e-05, 'epoch': 1.39, 'throughput': 10000.57} [INFO|2025-03-20 05:54:23] logging.py:143 >> {'loss': 0.4774, 'learning_rate': 2.8144e-05, 'epoch': 1.39, 'throughput': 10000.44} [INFO|2025-03-20 05:55:02] logging.py:143 >> {'loss': 0.4985, 'learning_rate': 2.8130e-05, 'epoch': 1.39, 'throughput': 10000.60} [INFO|2025-03-20 05:55:42] logging.py:143 >> {'loss': 0.4458, 'learning_rate': 2.8117e-05, 'epoch': 1.39, 'throughput': 10000.60} [INFO|2025-03-20 05:56:23] logging.py:143 >> {'loss': 0.4410, 'learning_rate': 2.8103e-05, 'epoch': 1.39, 'throughput': 10000.60} [INFO|2025-03-20 05:57:03] logging.py:143 >> {'loss': 0.4613, 'learning_rate': 2.8089e-05, 'epoch': 1.39, 'throughput': 10000.57} [INFO|2025-03-20 05:57:44] logging.py:143 >> {'loss': 0.4313, 'learning_rate': 2.8075e-05, 'epoch': 1.39, 
'throughput': 10000.50} [INFO|2025-03-20 05:58:25] logging.py:143 >> {'loss': 0.4622, 'learning_rate': 2.8061e-05, 'epoch': 1.39, 'throughput': 10000.55} [INFO|2025-03-20 05:59:07] logging.py:143 >> {'loss': 0.4536, 'learning_rate': 2.8047e-05, 'epoch': 1.39, 'throughput': 10000.37} [INFO|2025-03-20 05:59:47] logging.py:143 >> {'loss': 0.4742, 'learning_rate': 2.8033e-05, 'epoch': 1.40, 'throughput': 10000.31} [INFO|2025-03-20 06:00:27] logging.py:143 >> {'loss': 0.4557, 'learning_rate': 2.8019e-05, 'epoch': 1.40, 'throughput': 10000.33} [INFO|2025-03-20 06:01:07] logging.py:143 >> {'loss': 0.4702, 'learning_rate': 2.8006e-05, 'epoch': 1.40, 'throughput': 10000.38} [INFO|2025-03-20 06:01:48] logging.py:143 >> {'loss': 0.4571, 'learning_rate': 2.7992e-05, 'epoch': 1.40, 'throughput': 10000.38} [INFO|2025-03-20 06:02:28] logging.py:143 >> {'loss': 0.4553, 'learning_rate': 2.7978e-05, 'epoch': 1.40, 'throughput': 10000.49} [INFO|2025-03-20 06:03:08] logging.py:143 >> {'loss': 0.4390, 'learning_rate': 2.7964e-05, 'epoch': 1.40, 'throughput': 10000.36} [INFO|2025-03-20 06:03:50] logging.py:143 >> {'loss': 0.4411, 'learning_rate': 2.7950e-05, 'epoch': 1.40, 'throughput': 10000.28} [INFO|2025-03-20 06:04:30] logging.py:143 >> {'loss': 0.4348, 'learning_rate': 2.7936e-05, 'epoch': 1.40, 'throughput': 10000.32} [INFO|2025-03-20 06:05:11] logging.py:143 >> {'loss': 0.4318, 'learning_rate': 2.7922e-05, 'epoch': 1.40, 'throughput': 10000.27} [INFO|2025-03-20 06:05:53] logging.py:143 >> {'loss': 0.4247, 'learning_rate': 2.7908e-05, 'epoch': 1.40, 'throughput': 10000.18} [INFO|2025-03-20 06:06:34] logging.py:143 >> {'loss': 0.4505, 'learning_rate': 2.7894e-05, 'epoch': 1.40, 'throughput': 10000.14} [INFO|2025-03-20 06:07:15] logging.py:143 >> {'loss': 0.4266, 'learning_rate': 2.7880e-05, 'epoch': 1.40, 'throughput': 10000.00} [INFO|2025-03-20 06:07:56] logging.py:143 >> {'loss': 0.4524, 'learning_rate': 2.7867e-05, 'epoch': 1.40, 'throughput': 9999.91} [INFO|2025-03-20 06:08:36] 
logging.py:143 >> {'loss': 0.4460, 'learning_rate': 2.7853e-05, 'epoch': 1.40, 'throughput': 9999.93} [INFO|2025-03-20 06:09:16] logging.py:143 >> {'loss': 0.4800, 'learning_rate': 2.7839e-05, 'epoch': 1.40, 'throughput': 9999.97} [INFO|2025-03-20 06:09:56] logging.py:143 >> {'loss': 0.4523, 'learning_rate': 2.7825e-05, 'epoch': 1.40, 'throughput': 10000.07} [INFO|2025-03-20 06:10:38] logging.py:143 >> {'loss': 0.4499, 'learning_rate': 2.7811e-05, 'epoch': 1.40, 'throughput': 9999.99} [INFO|2025-03-20 06:11:18] logging.py:143 >> {'loss': 0.4496, 'learning_rate': 2.7797e-05, 'epoch': 1.40, 'throughput': 10000.06} [INFO|2025-03-20 06:11:59] logging.py:143 >> {'loss': 0.4772, 'learning_rate': 2.7783e-05, 'epoch': 1.40, 'throughput': 10000.08} [INFO|2025-03-20 06:12:40] logging.py:143 >> {'loss': 0.4695, 'learning_rate': 2.7769e-05, 'epoch': 1.41, 'throughput': 10000.02} [INFO|2025-03-20 06:13:20] logging.py:143 >> {'loss': 0.4566, 'learning_rate': 2.7755e-05, 'epoch': 1.41, 'throughput': 10000.10} [INFO|2025-03-20 06:14:02] logging.py:143 >> {'loss': 0.4427, 'learning_rate': 2.7741e-05, 'epoch': 1.41, 'throughput': 10000.02} [INFO|2025-03-20 06:14:42] logging.py:143 >> {'loss': 0.4703, 'learning_rate': 2.7727e-05, 'epoch': 1.41, 'throughput': 10000.01} [INFO|2025-03-20 06:15:23] logging.py:143 >> {'loss': 0.4335, 'learning_rate': 2.7714e-05, 'epoch': 1.41, 'throughput': 9999.99} [INFO|2025-03-20 06:16:04] logging.py:143 >> {'loss': 0.4678, 'learning_rate': 2.7700e-05, 'epoch': 1.41, 'throughput': 9999.99} [INFO|2025-03-20 06:16:45] logging.py:143 >> {'loss': 0.4525, 'learning_rate': 2.7686e-05, 'epoch': 1.41, 'throughput': 10000.00} [INFO|2025-03-20 06:17:25] logging.py:143 >> {'loss': 0.4491, 'learning_rate': 2.7672e-05, 'epoch': 1.41, 'throughput': 10000.15} [INFO|2025-03-20 06:18:05] logging.py:143 >> {'loss': 0.4263, 'learning_rate': 2.7658e-05, 'epoch': 1.41, 'throughput': 10000.18} [INFO|2025-03-20 06:18:47] logging.py:143 >> {'loss': 0.4308, 'learning_rate': 
2.7644e-05, 'epoch': 1.41, 'throughput': 10000.06} [INFO|2025-03-20 06:19:27] logging.py:143 >> {'loss': 0.4511, 'learning_rate': 2.7630e-05, 'epoch': 1.41, 'throughput': 10000.17} [INFO|2025-03-20 06:20:07] logging.py:143 >> {'loss': 0.4631, 'learning_rate': 2.7616e-05, 'epoch': 1.41, 'throughput': 10000.17} [INFO|2025-03-20 06:20:49] logging.py:143 >> {'loss': 0.4557, 'learning_rate': 2.7602e-05, 'epoch': 1.41, 'throughput': 10000.13} [INFO|2025-03-20 06:21:29] logging.py:143 >> {'loss': 0.4627, 'learning_rate': 2.7588e-05, 'epoch': 1.41, 'throughput': 10000.08} [INFO|2025-03-20 06:22:08] logging.py:143 >> {'loss': 0.4360, 'learning_rate': 2.7574e-05, 'epoch': 1.41, 'throughput': 10000.07} [INFO|2025-03-20 06:22:49] logging.py:143 >> {'loss': 0.4654, 'learning_rate': 2.7561e-05, 'epoch': 1.41, 'throughput': 10000.06} [INFO|2025-03-20 06:23:29] logging.py:143 >> {'loss': 0.4532, 'learning_rate': 2.7547e-05, 'epoch': 1.41, 'throughput': 10000.01} [INFO|2025-03-20 06:24:10] logging.py:143 >> {'loss': 0.4498, 'learning_rate': 2.7533e-05, 'epoch': 1.41, 'throughput': 9999.99} [INFO|2025-03-20 06:24:49] logging.py:143 >> {'loss': 0.4345, 'learning_rate': 2.7519e-05, 'epoch': 1.41, 'throughput': 9999.96} [INFO|2025-03-20 06:25:30] logging.py:143 >> {'loss': 0.4400, 'learning_rate': 2.7505e-05, 'epoch': 1.42, 'throughput': 9999.88} [INFO|2025-03-20 06:26:11] logging.py:143 >> {'loss': 0.4849, 'learning_rate': 2.7491e-05, 'epoch': 1.42, 'throughput': 9999.89} [INFO|2025-03-20 06:26:52] logging.py:143 >> {'loss': 0.4285, 'learning_rate': 2.7477e-05, 'epoch': 1.42, 'throughput': 9999.83} [INFO|2025-03-20 06:27:32] logging.py:143 >> {'loss': 0.4738, 'learning_rate': 2.7463e-05, 'epoch': 1.42, 'throughput': 9999.90} [INFO|2025-03-20 06:28:13] logging.py:143 >> {'loss': 0.4453, 'learning_rate': 2.7449e-05, 'epoch': 1.42, 'throughput': 9999.88} [INFO|2025-03-20 06:28:53] logging.py:143 >> {'loss': 0.4569, 'learning_rate': 2.7435e-05, 'epoch': 1.42, 'throughput': 9999.97} 
[INFO|2025-03-20 06:29:34] logging.py:143 >> {'loss': 0.4563, 'learning_rate': 2.7421e-05, 'epoch': 1.42, 'throughput': 10000.01} [INFO|2025-03-20 06:30:15] logging.py:143 >> {'loss': 0.4756, 'learning_rate': 2.7407e-05, 'epoch': 1.42, 'throughput': 10000.08} [INFO|2025-03-20 06:30:56] logging.py:143 >> {'loss': 0.4343, 'learning_rate': 2.7393e-05, 'epoch': 1.42, 'throughput': 10000.08} [INFO|2025-03-20 06:31:36] logging.py:143 >> {'loss': 0.4512, 'learning_rate': 2.7379e-05, 'epoch': 1.42, 'throughput': 10000.17} [INFO|2025-03-20 06:32:15] logging.py:143 >> {'loss': 0.4625, 'learning_rate': 2.7366e-05, 'epoch': 1.42, 'throughput': 10000.24} [INFO|2025-03-20 06:32:56] logging.py:143 >> {'loss': 0.4515, 'learning_rate': 2.7352e-05, 'epoch': 1.42, 'throughput': 10000.33} [INFO|2025-03-20 06:33:35] logging.py:143 >> {'loss': 0.4832, 'learning_rate': 2.7338e-05, 'epoch': 1.42, 'throughput': 10000.34} [INFO|2025-03-20 06:34:14] logging.py:143 >> {'loss': 0.4494, 'learning_rate': 2.7324e-05, 'epoch': 1.42, 'throughput': 10000.34} [INFO|2025-03-20 06:34:55] logging.py:143 >> {'loss': 0.4595, 'learning_rate': 2.7310e-05, 'epoch': 1.42, 'throughput': 10000.34} [INFO|2025-03-20 06:35:39] logging.py:143 >> {'loss': 0.4907, 'learning_rate': 2.7296e-05, 'epoch': 1.42, 'throughput': 10000.16} [INFO|2025-03-20 06:36:21] logging.py:143 >> {'loss': 0.4499, 'learning_rate': 2.7282e-05, 'epoch': 1.42, 'throughput': 10000.04} [INFO|2025-03-20 06:37:02] logging.py:143 >> {'loss': 0.4497, 'learning_rate': 2.7268e-05, 'epoch': 1.42, 'throughput': 10000.03} [INFO|2025-03-20 06:37:41] logging.py:143 >> {'loss': 0.4679, 'learning_rate': 2.7254e-05, 'epoch': 1.43, 'throughput': 10000.09} [INFO|2025-03-20 06:38:22] logging.py:143 >> {'loss': 0.4588, 'learning_rate': 2.7240e-05, 'epoch': 1.43, 'throughput': 10000.09} [INFO|2025-03-20 06:39:03] logging.py:143 >> {'loss': 0.4365, 'learning_rate': 2.7226e-05, 'epoch': 1.43, 'throughput': 10000.03} [INFO|2025-03-20 06:39:43] logging.py:143 >> 
{'loss': 0.4678, 'learning_rate': 2.7212e-05, 'epoch': 1.43, 'throughput': 10000.10} [INFO|2025-03-20 06:40:26] logging.py:143 >> {'loss': 0.4460, 'learning_rate': 2.7198e-05, 'epoch': 1.43, 'throughput': 10000.09} [INFO|2025-03-20 06:41:07] logging.py:143 >> {'loss': 0.4502, 'learning_rate': 2.7184e-05, 'epoch': 1.43, 'throughput': 10000.11} [INFO|2025-03-20 06:41:46] logging.py:143 >> {'loss': 0.4361, 'learning_rate': 2.7170e-05, 'epoch': 1.43, 'throughput': 10000.11} [INFO|2025-03-20 06:42:26] logging.py:143 >> {'loss': 0.4642, 'learning_rate': 2.7157e-05, 'epoch': 1.43, 'throughput': 10000.13} [INFO|2025-03-20 06:43:07] logging.py:143 >> {'loss': 0.4430, 'learning_rate': 2.7143e-05, 'epoch': 1.43, 'throughput': 10000.12} [INFO|2025-03-20 06:43:47] logging.py:143 >> {'loss': 0.4511, 'learning_rate': 2.7129e-05, 'epoch': 1.43, 'throughput': 10000.10} [INFO|2025-03-20 06:44:26] logging.py:143 >> {'loss': 0.4401, 'learning_rate': 2.7115e-05, 'epoch': 1.43, 'throughput': 10000.19} [INFO|2025-03-20 06:45:08] logging.py:143 >> {'loss': 0.4797, 'learning_rate': 2.7101e-05, 'epoch': 1.43, 'throughput': 10000.14} [INFO|2025-03-20 06:45:48] logging.py:143 >> {'loss': 0.4409, 'learning_rate': 2.7087e-05, 'epoch': 1.43, 'throughput': 10000.18} [INFO|2025-03-20 06:46:29] logging.py:143 >> {'loss': 0.4481, 'learning_rate': 2.7073e-05, 'epoch': 1.43, 'throughput': 10000.18} [INFO|2025-03-20 06:47:11] logging.py:143 >> {'loss': 0.4448, 'learning_rate': 2.7059e-05, 'epoch': 1.43, 'throughput': 10000.13} [INFO|2025-03-20 06:47:50] logging.py:143 >> {'loss': 0.4609, 'learning_rate': 2.7045e-05, 'epoch': 1.43, 'throughput': 10000.23} [INFO|2025-03-20 06:48:31] logging.py:143 >> {'loss': 0.4136, 'learning_rate': 2.7031e-05, 'epoch': 1.43, 'throughput': 10000.13} [INFO|2025-03-20 06:49:12] logging.py:143 >> {'loss': 0.4316, 'learning_rate': 2.7017e-05, 'epoch': 1.43, 'throughput': 10000.18} [INFO|2025-03-20 06:49:52] logging.py:143 >> {'loss': 0.4345, 'learning_rate': 2.7003e-05, 
'epoch': 1.43, 'throughput': 10000.13} [INFO|2025-03-20 06:50:34] logging.py:143 >> {'loss': 0.4376, 'learning_rate': 2.6989e-05, 'epoch': 1.44, 'throughput': 10000.02} [INFO|2025-03-20 06:51:14] logging.py:143 >> {'loss': 0.4187, 'learning_rate': 2.6975e-05, 'epoch': 1.44, 'throughput': 10000.09} [INFO|2025-03-20 06:51:56] logging.py:143 >> {'loss': 0.4727, 'learning_rate': 2.6961e-05, 'epoch': 1.44, 'throughput': 10000.09} [INFO|2025-03-20 06:52:36] logging.py:143 >> {'loss': 0.4409, 'learning_rate': 2.6947e-05, 'epoch': 1.44, 'throughput': 10000.17} [INFO|2025-03-20 06:53:15] logging.py:143 >> {'loss': 0.4484, 'learning_rate': 2.6933e-05, 'epoch': 1.44, 'throughput': 10000.21} [INFO|2025-03-20 06:53:56] logging.py:143 >> {'loss': 0.4291, 'learning_rate': 2.6919e-05, 'epoch': 1.44, 'throughput': 10000.23} [INFO|2025-03-20 06:54:36] logging.py:143 >> {'loss': 0.4736, 'learning_rate': 2.6905e-05, 'epoch': 1.44, 'throughput': 10000.21} [INFO|2025-03-20 06:55:17] logging.py:143 >> {'loss': 0.4768, 'learning_rate': 2.6892e-05, 'epoch': 1.44, 'throughput': 10000.19} [INFO|2025-03-20 06:55:59] logging.py:143 >> {'loss': 0.4336, 'learning_rate': 2.6878e-05, 'epoch': 1.44, 'throughput': 10000.16} [INFO|2025-03-20 06:56:39] logging.py:143 >> {'loss': 0.4104, 'learning_rate': 2.6864e-05, 'epoch': 1.44, 'throughput': 10000.31} [INFO|2025-03-20 06:57:18] logging.py:143 >> {'loss': 0.4781, 'learning_rate': 2.6850e-05, 'epoch': 1.44, 'throughput': 10000.41} [INFO|2025-03-20 06:58:00] logging.py:143 >> {'loss': 0.4421, 'learning_rate': 2.6836e-05, 'epoch': 1.44, 'throughput': 10000.32} [INFO|2025-03-20 06:58:40] logging.py:143 >> {'loss': 0.4444, 'learning_rate': 2.6822e-05, 'epoch': 1.44, 'throughput': 10000.42} [INFO|2025-03-20 06:59:20] logging.py:143 >> {'loss': 0.4746, 'learning_rate': 2.6808e-05, 'epoch': 1.44, 'throughput': 10000.42} [INFO|2025-03-20 07:00:00] logging.py:143 >> {'loss': 0.4368, 'learning_rate': 2.6794e-05, 'epoch': 1.44, 'throughput': 10000.33} 
[INFO|2025-03-20 07:00:40] logging.py:143 >> {'loss': 0.4342, 'learning_rate': 2.6780e-05, 'epoch': 1.44, 'throughput': 10000.33} [INFO|2025-03-20 07:01:19] logging.py:143 >> {'loss': 0.4617, 'learning_rate': 2.6766e-05, 'epoch': 1.44, 'throughput': 10000.38} [INFO|2025-03-20 07:02:00] logging.py:143 >> {'loss': 0.4633, 'learning_rate': 2.6752e-05, 'epoch': 1.44, 'throughput': 10000.40} [INFO|2025-03-20 07:02:41] logging.py:143 >> {'loss': 0.4686, 'learning_rate': 2.6738e-05, 'epoch': 1.44, 'throughput': 10000.37} [INFO|2025-03-20 07:03:22] logging.py:143 >> {'loss': 0.4507, 'learning_rate': 2.6724e-05, 'epoch': 1.45, 'throughput': 10000.37} [INFO|2025-03-20 07:04:02] logging.py:143 >> {'loss': 0.4287, 'learning_rate': 2.6710e-05, 'epoch': 1.45, 'throughput': 10000.33} [INFO|2025-03-20 07:04:43] logging.py:143 >> {'loss': 0.4844, 'learning_rate': 2.6696e-05, 'epoch': 1.45, 'throughput': 10000.27} [INFO|2025-03-20 07:05:24] logging.py:143 >> {'loss': 0.4395, 'learning_rate': 2.6682e-05, 'epoch': 1.45, 'throughput': 10000.20} [INFO|2025-03-20 07:06:06] logging.py:143 >> {'loss': 0.4569, 'learning_rate': 2.6668e-05, 'epoch': 1.45, 'throughput': 10000.03} [INFO|2025-03-20 07:06:47] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 2.6654e-05, 'epoch': 1.45, 'throughput': 10000.04} [INFO|2025-03-20 07:07:26] logging.py:143 >> {'loss': 0.4502, 'learning_rate': 2.6640e-05, 'epoch': 1.45, 'throughput': 10000.17} [INFO|2025-03-20 07:08:06] logging.py:143 >> {'loss': 0.4582, 'learning_rate': 2.6626e-05, 'epoch': 1.45, 'throughput': 10000.25} [INFO|2025-03-20 07:08:46] logging.py:143 >> {'loss': 0.4277, 'learning_rate': 2.6612e-05, 'epoch': 1.45, 'throughput': 10000.42} [INFO|2025-03-20 07:09:26] logging.py:143 >> {'loss': 0.4669, 'learning_rate': 2.6598e-05, 'epoch': 1.45, 'throughput': 10000.44} [INFO|2025-03-20 07:10:07] logging.py:143 >> {'loss': 0.4498, 'learning_rate': 2.6584e-05, 'epoch': 1.45, 'throughput': 10000.42} [INFO|2025-03-20 07:10:49] logging.py:143 >> 
{'loss': 0.4593, 'learning_rate': 2.6570e-05, 'epoch': 1.45, 'throughput': 10000.35} [INFO|2025-03-20 07:11:30] logging.py:143 >> {'loss': 0.4707, 'learning_rate': 2.6556e-05, 'epoch': 1.45, 'throughput': 10000.46} [INFO|2025-03-20 07:12:11] logging.py:143 >> {'loss': 0.4639, 'learning_rate': 2.6543e-05, 'epoch': 1.45, 'throughput': 10000.37} [INFO|2025-03-20 07:12:51] logging.py:143 >> {'loss': 0.4661, 'learning_rate': 2.6529e-05, 'epoch': 1.45, 'throughput': 10000.36} [INFO|2025-03-20 07:13:32] logging.py:143 >> {'loss': 0.4307, 'learning_rate': 2.6515e-05, 'epoch': 1.45, 'throughput': 10000.35} [INFO|2025-03-20 07:14:14] logging.py:143 >> {'loss': 0.4736, 'learning_rate': 2.6501e-05, 'epoch': 1.45, 'throughput': 10000.29} [INFO|2025-03-20 07:14:54] logging.py:143 >> {'loss': 0.4628, 'learning_rate': 2.6487e-05, 'epoch': 1.45, 'throughput': 10000.29} [INFO|2025-03-20 07:15:34] logging.py:143 >> {'loss': 0.4719, 'learning_rate': 2.6473e-05, 'epoch': 1.45, 'throughput': 10000.32} [INFO|2025-03-20 07:16:15] logging.py:143 >> {'loss': 0.4718, 'learning_rate': 2.6459e-05, 'epoch': 1.46, 'throughput': 10000.29} [INFO|2025-03-20 07:16:54] logging.py:143 >> {'loss': 0.4535, 'learning_rate': 2.6445e-05, 'epoch': 1.46, 'throughput': 10000.37} [INFO|2025-03-20 07:17:35] logging.py:143 >> {'loss': 0.4352, 'learning_rate': 2.6431e-05, 'epoch': 1.46, 'throughput': 10000.41} [INFO|2025-03-20 07:18:16] logging.py:143 >> {'loss': 0.4465, 'learning_rate': 2.6417e-05, 'epoch': 1.46, 'throughput': 10000.36} [INFO|2025-03-20 07:18:57] logging.py:143 >> {'loss': 0.4269, 'learning_rate': 2.6403e-05, 'epoch': 1.46, 'throughput': 10000.26} [INFO|2025-03-20 07:19:39] logging.py:143 >> {'loss': 0.4536, 'learning_rate': 2.6389e-05, 'epoch': 1.46, 'throughput': 10000.20} [INFO|2025-03-20 07:20:21] logging.py:143 >> {'loss': 0.4188, 'learning_rate': 2.6375e-05, 'epoch': 1.46, 'throughput': 10000.10} [INFO|2025-03-20 07:20:59] logging.py:143 >> {'loss': 0.4388, 'learning_rate': 2.6361e-05, 
'epoch': 1.46, 'throughput': 10000.19} [INFO|2025-03-20 07:21:40] logging.py:143 >> {'loss': 0.4387, 'learning_rate': 2.6347e-05, 'epoch': 1.46, 'throughput': 10000.18} [INFO|2025-03-20 07:22:20] logging.py:143 >> {'loss': 0.4593, 'learning_rate': 2.6333e-05, 'epoch': 1.46, 'throughput': 10000.26} [INFO|2025-03-20 07:23:00] logging.py:143 >> {'loss': 0.4394, 'learning_rate': 2.6319e-05, 'epoch': 1.46, 'throughput': 10000.29} [INFO|2025-03-20 07:23:39] logging.py:143 >> {'loss': 0.4504, 'learning_rate': 2.6305e-05, 'epoch': 1.46, 'throughput': 10000.35} [INFO|2025-03-20 07:24:20] logging.py:143 >> {'loss': 0.4477, 'learning_rate': 2.6291e-05, 'epoch': 1.46, 'throughput': 10000.26} [INFO|2025-03-20 07:25:02] logging.py:143 >> {'loss': 0.4693, 'learning_rate': 2.6277e-05, 'epoch': 1.46, 'throughput': 10000.23} [INFO|2025-03-20 07:25:42] logging.py:143 >> {'loss': 0.4379, 'learning_rate': 2.6263e-05, 'epoch': 1.46, 'throughput': 10000.27} [INFO|2025-03-20 07:26:22] logging.py:143 >> {'loss': 0.4482, 'learning_rate': 2.6249e-05, 'epoch': 1.46, 'throughput': 10000.32} [INFO|2025-03-20 07:27:02] logging.py:143 >> {'loss': 0.4418, 'learning_rate': 2.6235e-05, 'epoch': 1.46, 'throughput': 10000.31} [INFO|2025-03-20 07:27:42] logging.py:143 >> {'loss': 0.4469, 'learning_rate': 2.6221e-05, 'epoch': 1.46, 'throughput': 10000.32} [INFO|2025-03-20 07:28:23] logging.py:143 >> {'loss': 0.4463, 'learning_rate': 2.6207e-05, 'epoch': 1.46, 'throughput': 10000.35} [INFO|2025-03-20 07:29:03] logging.py:143 >> {'loss': 0.4523, 'learning_rate': 2.6193e-05, 'epoch': 1.47, 'throughput': 10000.26} [INFO|2025-03-20 07:29:45] logging.py:143 >> {'loss': 0.4070, 'learning_rate': 2.6179e-05, 'epoch': 1.47, 'throughput': 10000.08} [INFO|2025-03-20 07:30:24] logging.py:143 >> {'loss': 0.4242, 'learning_rate': 2.6165e-05, 'epoch': 1.47, 'throughput': 10000.10} [INFO|2025-03-20 07:31:05] logging.py:143 >> {'loss': 0.4734, 'learning_rate': 2.6151e-05, 'epoch': 1.47, 'throughput': 10000.09} 
[INFO|2025-03-20 07:31:44] logging.py:143 >> {'loss': 0.4440, 'learning_rate': 2.6137e-05, 'epoch': 1.47, 'throughput': 10000.10} [INFO|2025-03-20 07:32:25] logging.py:143 >> {'loss': 0.4365, 'learning_rate': 2.6123e-05, 'epoch': 1.47, 'throughput': 10000.21} [INFO|2025-03-20 07:33:04] logging.py:143 >> {'loss': 0.4630, 'learning_rate': 2.6109e-05, 'epoch': 1.47, 'throughput': 10000.24} [INFO|2025-03-20 07:33:44] logging.py:143 >> {'loss': 0.4428, 'learning_rate': 2.6095e-05, 'epoch': 1.47, 'throughput': 10000.30} [INFO|2025-03-20 07:34:25] logging.py:143 >> {'loss': 0.4445, 'learning_rate': 2.6081e-05, 'epoch': 1.47, 'throughput': 10000.31} [INFO|2025-03-20 07:35:06] logging.py:143 >> {'loss': 0.4679, 'learning_rate': 2.6067e-05, 'epoch': 1.47, 'throughput': 10000.26} [INFO|2025-03-20 07:35:46] logging.py:143 >> {'loss': 0.4796, 'learning_rate': 2.6053e-05, 'epoch': 1.47, 'throughput': 10000.37} [INFO|2025-03-20 07:36:26] logging.py:143 >> {'loss': 0.4351, 'learning_rate': 2.6039e-05, 'epoch': 1.47, 'throughput': 10000.42} [INFO|2025-03-20 07:37:07] logging.py:143 >> {'loss': 0.4442, 'learning_rate': 2.6025e-05, 'epoch': 1.47, 'throughput': 10000.39} [INFO|2025-03-20 07:37:47] logging.py:143 >> {'loss': 0.4602, 'learning_rate': 2.6011e-05, 'epoch': 1.47, 'throughput': 10000.39} [INFO|2025-03-20 07:38:29] logging.py:143 >> {'loss': 0.4460, 'learning_rate': 2.5997e-05, 'epoch': 1.47, 'throughput': 10000.35} [INFO|2025-03-20 07:39:09] logging.py:143 >> {'loss': 0.4455, 'learning_rate': 2.5983e-05, 'epoch': 1.47, 'throughput': 10000.30} [INFO|2025-03-20 07:39:49] logging.py:143 >> {'loss': 0.4591, 'learning_rate': 2.5970e-05, 'epoch': 1.47, 'throughput': 10000.35} [INFO|2025-03-20 07:40:30] logging.py:143 >> {'loss': 0.4518, 'learning_rate': 2.5956e-05, 'epoch': 1.47, 'throughput': 10000.34} [INFO|2025-03-20 07:41:11] logging.py:143 >> {'loss': 0.4506, 'learning_rate': 2.5942e-05, 'epoch': 1.47, 'throughput': 10000.45} [INFO|2025-03-20 07:41:50] logging.py:143 >> 
{'loss': 0.4321, 'learning_rate': 2.5928e-05, 'epoch': 1.48, 'throughput': 10000.43} [INFO|2025-03-20 07:42:31] logging.py:143 >> {'loss': 0.4395, 'learning_rate': 2.5914e-05, 'epoch': 1.48, 'throughput': 10000.41} [INFO|2025-03-20 07:43:11] logging.py:143 >> {'loss': 0.4428, 'learning_rate': 2.5900e-05, 'epoch': 1.48, 'throughput': 10000.46} [INFO|2025-03-20 07:43:51] logging.py:143 >> {'loss': 0.4542, 'learning_rate': 2.5886e-05, 'epoch': 1.48, 'throughput': 10000.33} [INFO|2025-03-20 07:44:31] logging.py:143 >> {'loss': 0.4242, 'learning_rate': 2.5872e-05, 'epoch': 1.48, 'throughput': 10000.33} [INFO|2025-03-20 07:45:10] logging.py:143 >> {'loss': 0.4323, 'learning_rate': 2.5858e-05, 'epoch': 1.48, 'throughput': 10000.33} [INFO|2025-03-20 07:45:51] logging.py:143 >> {'loss': 0.4325, 'learning_rate': 2.5844e-05, 'epoch': 1.48, 'throughput': 10000.39} [INFO|2025-03-20 07:46:31] logging.py:143 >> {'loss': 0.4243, 'learning_rate': 2.5830e-05, 'epoch': 1.48, 'throughput': 10000.43} [INFO|2025-03-20 07:47:11] logging.py:143 >> {'loss': 0.4351, 'learning_rate': 2.5816e-05, 'epoch': 1.48, 'throughput': 10000.42} [INFO|2025-03-20 07:47:53] logging.py:143 >> {'loss': 0.4413, 'learning_rate': 2.5802e-05, 'epoch': 1.48, 'throughput': 10000.41} [INFO|2025-03-20 07:48:32] logging.py:143 >> {'loss': 0.4503, 'learning_rate': 2.5788e-05, 'epoch': 1.48, 'throughput': 10000.45} [INFO|2025-03-20 07:49:13] logging.py:143 >> {'loss': 0.4724, 'learning_rate': 2.5774e-05, 'epoch': 1.48, 'throughput': 10000.48} [INFO|2025-03-20 07:49:52] logging.py:143 >> {'loss': 0.4462, 'learning_rate': 2.5760e-05, 'epoch': 1.48, 'throughput': 10000.53} [INFO|2025-03-20 07:50:33] logging.py:143 >> {'loss': 0.4406, 'learning_rate': 2.5746e-05, 'epoch': 1.48, 'throughput': 10000.56} [INFO|2025-03-20 07:51:14] logging.py:143 >> {'loss': 0.4290, 'learning_rate': 2.5732e-05, 'epoch': 1.48, 'throughput': 10000.66} [INFO|2025-03-20 07:51:55] logging.py:143 >> {'loss': 0.4510, 'learning_rate': 2.5718e-05, 
'epoch': 1.48, 'throughput': 10000.64} [INFO|2025-03-20 07:52:35] logging.py:143 >> {'loss': 0.4355, 'learning_rate': 2.5704e-05, 'epoch': 1.48, 'throughput': 10000.62} [INFO|2025-03-20 07:53:14] logging.py:143 >> {'loss': 0.4437, 'learning_rate': 2.5690e-05, 'epoch': 1.48, 'throughput': 10000.60} [INFO|2025-03-20 07:53:53] logging.py:143 >> {'loss': 0.4570, 'learning_rate': 2.5676e-05, 'epoch': 1.48, 'throughput': 10000.69} [INFO|2025-03-20 07:54:33] logging.py:143 >> {'loss': 0.4428, 'learning_rate': 2.5662e-05, 'epoch': 1.49, 'throughput': 10000.67} [INFO|2025-03-20 07:55:12] logging.py:143 >> {'loss': 0.4284, 'learning_rate': 2.5648e-05, 'epoch': 1.49, 'throughput': 10000.70} [INFO|2025-03-20 07:55:52] logging.py:143 >> {'loss': 0.4399, 'learning_rate': 2.5634e-05, 'epoch': 1.49, 'throughput': 10000.73} [INFO|2025-03-20 07:56:33] logging.py:143 >> {'loss': 0.4007, 'learning_rate': 2.5620e-05, 'epoch': 1.49, 'throughput': 10000.74} [INFO|2025-03-20 07:57:13] logging.py:143 >> {'loss': 0.4438, 'learning_rate': 2.5606e-05, 'epoch': 1.49, 'throughput': 10000.89} [INFO|2025-03-20 07:57:53] logging.py:143 >> {'loss': 0.4607, 'learning_rate': 2.5592e-05, 'epoch': 1.49, 'throughput': 10000.90} [INFO|2025-03-20 07:58:32] logging.py:143 >> {'loss': 0.4504, 'learning_rate': 2.5578e-05, 'epoch': 1.49, 'throughput': 10000.98} [INFO|2025-03-20 07:59:12] logging.py:143 >> {'loss': 0.4614, 'learning_rate': 2.5564e-05, 'epoch': 1.49, 'throughput': 10000.93} [INFO|2025-03-20 07:59:53] logging.py:143 >> {'loss': 0.4508, 'learning_rate': 2.5550e-05, 'epoch': 1.49, 'throughput': 10000.98} [INFO|2025-03-20 08:00:34] logging.py:143 >> {'loss': 0.4635, 'learning_rate': 2.5536e-05, 'epoch': 1.49, 'throughput': 10001.00} [INFO|2025-03-20 08:01:15] logging.py:143 >> {'loss': 0.4197, 'learning_rate': 2.5522e-05, 'epoch': 1.49, 'throughput': 10000.99} [INFO|2025-03-20 08:01:55] logging.py:143 >> {'loss': 0.4281, 'learning_rate': 2.5508e-05, 'epoch': 1.49, 'throughput': 10000.97} 
[INFO|2025-03-20 08:02:36] logging.py:143 >> {'loss': 0.4539, 'learning_rate': 2.5494e-05, 'epoch': 1.49, 'throughput': 10000.94} [INFO|2025-03-20 08:03:16] logging.py:143 >> {'loss': 0.4667, 'learning_rate': 2.5480e-05, 'epoch': 1.49, 'throughput': 10000.95} [INFO|2025-03-20 08:03:57] logging.py:143 >> {'loss': 0.4345, 'learning_rate': 2.5466e-05, 'epoch': 1.49, 'throughput': 10000.88} [INFO|2025-03-20 08:04:38] logging.py:143 >> {'loss': 0.4634, 'learning_rate': 2.5452e-05, 'epoch': 1.49, 'throughput': 10000.85} [INFO|2025-03-20 08:05:19] logging.py:143 >> {'loss': 0.4102, 'learning_rate': 2.5438e-05, 'epoch': 1.49, 'throughput': 10000.84} [INFO|2025-03-20 08:06:00] logging.py:143 >> {'loss': 0.4084, 'learning_rate': 2.5424e-05, 'epoch': 1.49, 'throughput': 10000.81} [INFO|2025-03-20 08:06:41] logging.py:143 >> {'loss': 0.4445, 'learning_rate': 2.5410e-05, 'epoch': 1.50, 'throughput': 10000.74} [INFO|2025-03-20 08:07:21] logging.py:143 >> {'loss': 0.4451, 'learning_rate': 2.5396e-05, 'epoch': 1.50, 'throughput': 10000.78} [INFO|2025-03-20 08:08:02] logging.py:143 >> {'loss': 0.4569, 'learning_rate': 2.5382e-05, 'epoch': 1.50, 'throughput': 10000.72} [INFO|2025-03-20 08:08:42] logging.py:143 >> {'loss': 0.4346, 'learning_rate': 2.5368e-05, 'epoch': 1.50, 'throughput': 10000.74} [INFO|2025-03-20 08:09:23] logging.py:143 >> {'loss': 0.3871, 'learning_rate': 2.5354e-05, 'epoch': 1.50, 'throughput': 10000.63} [INFO|2025-03-20 08:10:04] logging.py:143 >> {'loss': 0.4505, 'learning_rate': 2.5340e-05, 'epoch': 1.50, 'throughput': 10000.65} [INFO|2025-03-20 08:10:43] logging.py:143 >> {'loss': 0.4338, 'learning_rate': 2.5326e-05, 'epoch': 1.50, 'throughput': 10000.71} [INFO|2025-03-20 08:11:23] logging.py:143 >> {'loss': 0.4407, 'learning_rate': 2.5312e-05, 'epoch': 1.50, 'throughput': 10000.79} [INFO|2025-03-20 08:12:03] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 2.5298e-05, 'epoch': 1.50, 'throughput': 10000.78} [INFO|2025-03-20 08:12:44] logging.py:143 >> 
{'loss': 0.4529, 'learning_rate': 2.5284e-05, 'epoch': 1.50, 'throughput': 10000.75} [INFO|2025-03-20 08:13:24] logging.py:143 >> {'loss': 0.4544, 'learning_rate': 2.5270e-05, 'epoch': 1.50, 'throughput': 10000.78} [INFO|2025-03-20 08:14:05] logging.py:143 >> {'loss': 0.4504, 'learning_rate': 2.5256e-05, 'epoch': 1.50, 'throughput': 10000.76} [INFO|2025-03-20 08:14:44] logging.py:143 >> {'loss': 0.4475, 'learning_rate': 2.5242e-05, 'epoch': 1.50, 'throughput': 10000.82} [INFO|2025-03-20 08:15:27] logging.py:143 >> {'loss': 0.4272, 'learning_rate': 2.5228e-05, 'epoch': 1.50, 'throughput': 10000.74} [INFO|2025-03-20 08:16:07] logging.py:143 >> {'loss': 0.3918, 'learning_rate': 2.5214e-05, 'epoch': 1.50, 'throughput': 10000.75} [INFO|2025-03-20 08:16:48] logging.py:143 >> {'loss': 0.4256, 'learning_rate': 2.5200e-05, 'epoch': 1.50, 'throughput': 10000.62} [INFO|2025-03-20 08:17:28] logging.py:143 >> {'loss': 0.4419, 'learning_rate': 2.5186e-05, 'epoch': 1.50, 'throughput': 10000.72} [INFO|2025-03-20 08:18:07] logging.py:143 >> {'loss': 0.4752, 'learning_rate': 2.5172e-05, 'epoch': 1.50, 'throughput': 10000.76} [INFO|2025-03-20 08:18:48] logging.py:143 >> {'loss': 0.4241, 'learning_rate': 2.5158e-05, 'epoch': 1.50, 'throughput': 10000.72} [INFO|2025-03-20 08:19:29] logging.py:143 >> {'loss': 0.4555, 'learning_rate': 2.5144e-05, 'epoch': 1.51, 'throughput': 10000.66} [INFO|2025-03-20 08:20:09] logging.py:143 >> {'loss': 0.4375, 'learning_rate': 2.5130e-05, 'epoch': 1.51, 'throughput': 10000.57} [INFO|2025-03-20 08:20:50] logging.py:143 >> {'loss': 0.4411, 'learning_rate': 2.5116e-05, 'epoch': 1.51, 'throughput': 10000.56} [INFO|2025-03-20 08:21:30] logging.py:143 >> {'loss': 0.4195, 'learning_rate': 2.5102e-05, 'epoch': 1.51, 'throughput': 10000.50} [INFO|2025-03-20 08:22:12] logging.py:143 >> {'loss': 0.4465, 'learning_rate': 2.5088e-05, 'epoch': 1.51, 'throughput': 10000.41} [INFO|2025-03-20 08:22:52] logging.py:143 >> {'loss': 0.4164, 'learning_rate': 2.5074e-05, 
'epoch': 1.51, 'throughput': 10000.46} [INFO|2025-03-20 08:23:32] logging.py:143 >> {'loss': 0.4620, 'learning_rate': 2.5060e-05, 'epoch': 1.51, 'throughput': 10000.48} [INFO|2025-03-20 08:24:12] logging.py:143 >> {'loss': 0.4618, 'learning_rate': 2.5046e-05, 'epoch': 1.51, 'throughput': 10000.46} [INFO|2025-03-20 08:24:52] logging.py:143 >> {'loss': 0.4530, 'learning_rate': 2.5032e-05, 'epoch': 1.51, 'throughput': 10000.45} [INFO|2025-03-20 08:25:31] logging.py:143 >> {'loss': 0.4314, 'learning_rate': 2.5018e-05, 'epoch': 1.51, 'throughput': 10000.54} [INFO|2025-03-20 08:26:09] logging.py:143 >> {'loss': 0.4394, 'learning_rate': 2.5004e-05, 'epoch': 1.51, 'throughput': 10000.57} [INFO|2025-03-20 08:26:49] logging.py:143 >> {'loss': 0.4102, 'learning_rate': 2.4990e-05, 'epoch': 1.51, 'throughput': 10000.64} [INFO|2025-03-20 08:27:30] logging.py:143 >> {'loss': 0.4290, 'learning_rate': 2.4976e-05, 'epoch': 1.51, 'throughput': 10000.66} [INFO|2025-03-20 08:28:10] logging.py:143 >> {'loss': 0.4500, 'learning_rate': 2.4962e-05, 'epoch': 1.51, 'throughput': 10000.76} [INFO|2025-03-20 08:28:51] logging.py:143 >> {'loss': 0.4468, 'learning_rate': 2.4948e-05, 'epoch': 1.51, 'throughput': 10000.77} [INFO|2025-03-20 08:29:31] logging.py:143 >> {'loss': 0.4226, 'learning_rate': 2.4934e-05, 'epoch': 1.51, 'throughput': 10000.90} [INFO|2025-03-20 08:30:12] logging.py:143 >> {'loss': 0.4403, 'learning_rate': 2.4920e-05, 'epoch': 1.51, 'throughput': 10000.84} [INFO|2025-03-20 08:30:51] logging.py:143 >> {'loss': 0.4467, 'learning_rate': 2.4906e-05, 'epoch': 1.51, 'throughput': 10000.99} [INFO|2025-03-20 08:31:32] logging.py:143 >> {'loss': 0.4560, 'learning_rate': 2.4892e-05, 'epoch': 1.51, 'throughput': 10000.91} [INFO|2025-03-20 08:32:12] logging.py:143 >> {'loss': 0.4639, 'learning_rate': 2.4878e-05, 'epoch': 1.52, 'throughput': 10000.89} [INFO|2025-03-20 08:32:51] logging.py:143 >> {'loss': 0.4315, 'learning_rate': 2.4864e-05, 'epoch': 1.52, 'throughput': 10000.85} 
[INFO|2025-03-20 08:33:33] logging.py:143 >> {'loss': 0.4352, 'learning_rate': 2.4850e-05, 'epoch': 1.52, 'throughput': 10000.83} [INFO|2025-03-20 08:34:13] logging.py:143 >> {'loss': 0.4415, 'learning_rate': 2.4836e-05, 'epoch': 1.52, 'throughput': 10000.81} [INFO|2025-03-20 08:34:53] logging.py:143 >> {'loss': 0.4337, 'learning_rate': 2.4822e-05, 'epoch': 1.52, 'throughput': 10000.96} [INFO|2025-03-20 08:35:32] logging.py:143 >> {'loss': 0.4492, 'learning_rate': 2.4808e-05, 'epoch': 1.52, 'throughput': 10001.03} [INFO|2025-03-20 08:36:13] logging.py:143 >> {'loss': 0.4663, 'learning_rate': 2.4794e-05, 'epoch': 1.52, 'throughput': 10001.02} [INFO|2025-03-20 08:36:53] logging.py:143 >> {'loss': 0.4383, 'learning_rate': 2.4780e-05, 'epoch': 1.52, 'throughput': 10001.04} [INFO|2025-03-20 08:37:34] logging.py:143 >> {'loss': 0.4479, 'learning_rate': 2.4766e-05, 'epoch': 1.52, 'throughput': 10001.09} [INFO|2025-03-20 08:38:14] logging.py:143 >> {'loss': 0.4238, 'learning_rate': 2.4752e-05, 'epoch': 1.52, 'throughput': 10001.16} [INFO|2025-03-20 08:38:54] logging.py:143 >> {'loss': 0.4511, 'learning_rate': 2.4738e-05, 'epoch': 1.52, 'throughput': 10001.29} [INFO|2025-03-20 08:39:34] logging.py:143 >> {'loss': 0.4349, 'learning_rate': 2.4724e-05, 'epoch': 1.52, 'throughput': 10001.34} [INFO|2025-03-20 08:40:14] logging.py:143 >> {'loss': 0.4199, 'learning_rate': 2.4710e-05, 'epoch': 1.52, 'throughput': 10001.39} [INFO|2025-03-20 08:40:54] logging.py:143 >> {'loss': 0.4394, 'learning_rate': 2.4696e-05, 'epoch': 1.52, 'throughput': 10001.35} [INFO|2025-03-20 08:41:34] logging.py:143 >> {'loss': 0.4455, 'learning_rate': 2.4682e-05, 'epoch': 1.52, 'throughput': 10001.45} [INFO|2025-03-20 08:42:16] logging.py:143 >> {'loss': 0.4336, 'learning_rate': 2.4668e-05, 'epoch': 1.52, 'throughput': 10001.41} [INFO|2025-03-20 08:42:56] logging.py:143 >> {'loss': 0.4482, 'learning_rate': 2.4654e-05, 'epoch': 1.52, 'throughput': 10001.46} [INFO|2025-03-20 08:43:35] logging.py:143 >> 
{'loss': 0.4533, 'learning_rate': 2.4640e-05, 'epoch': 1.52, 'throughput': 10001.45} [INFO|2025-03-20 08:44:16] logging.py:143 >> {'loss': 0.4457, 'learning_rate': 2.4626e-05, 'epoch': 1.52, 'throughput': 10001.40} [INFO|2025-03-20 08:44:56] logging.py:143 >> {'loss': 0.4344, 'learning_rate': 2.4612e-05, 'epoch': 1.53, 'throughput': 10001.47} [INFO|2025-03-20 08:45:37] logging.py:143 >> {'loss': 0.4100, 'learning_rate': 2.4598e-05, 'epoch': 1.53, 'throughput': 10001.36} [INFO|2025-03-20 08:46:17] logging.py:143 >> {'loss': 0.4383, 'learning_rate': 2.4584e-05, 'epoch': 1.53, 'throughput': 10001.34} [INFO|2025-03-20 08:46:58] logging.py:143 >> {'loss': 0.4230, 'learning_rate': 2.4570e-05, 'epoch': 1.53, 'throughput': 10001.34} [INFO|2025-03-20 08:47:40] logging.py:143 >> {'loss': 0.4548, 'learning_rate': 2.4556e-05, 'epoch': 1.53, 'throughput': 10001.30} [INFO|2025-03-20 08:48:22] logging.py:143 >> {'loss': 0.4330, 'learning_rate': 2.4542e-05, 'epoch': 1.53, 'throughput': 10001.19} [INFO|2025-03-20 08:49:03] logging.py:143 >> {'loss': 0.4482, 'learning_rate': 2.4528e-05, 'epoch': 1.53, 'throughput': 10001.23} [INFO|2025-03-20 08:49:44] logging.py:143 >> {'loss': 0.4275, 'learning_rate': 2.4514e-05, 'epoch': 1.53, 'throughput': 10001.20} [INFO|2025-03-20 08:50:23] logging.py:143 >> {'loss': 0.4616, 'learning_rate': 2.4500e-05, 'epoch': 1.53, 'throughput': 10001.22} [INFO|2025-03-20 08:51:04] logging.py:143 >> {'loss': 0.4449, 'learning_rate': 2.4486e-05, 'epoch': 1.53, 'throughput': 10001.21} [INFO|2025-03-20 08:51:44] logging.py:143 >> {'loss': 0.4536, 'learning_rate': 2.4472e-05, 'epoch': 1.53, 'throughput': 10001.17} [INFO|2025-03-20 08:52:23] logging.py:143 >> {'loss': 0.4394, 'learning_rate': 2.4458e-05, 'epoch': 1.53, 'throughput': 10001.26} [INFO|2025-03-20 08:53:04] logging.py:143 >> {'loss': 0.4575, 'learning_rate': 2.4445e-05, 'epoch': 1.53, 'throughput': 10001.30} [INFO|2025-03-20 08:53:43] logging.py:143 >> {'loss': 0.4442, 'learning_rate': 2.4431e-05, 
'epoch': 1.53, 'throughput': 10001.37} [INFO|2025-03-20 08:54:23] logging.py:143 >> {'loss': 0.4350, 'learning_rate': 2.4417e-05, 'epoch': 1.53, 'throughput': 10001.32} [INFO|2025-03-20 08:55:03] logging.py:143 >> {'loss': 0.4517, 'learning_rate': 2.4403e-05, 'epoch': 1.53, 'throughput': 10001.35} [INFO|2025-03-20 08:55:45] logging.py:143 >> {'loss': 0.4498, 'learning_rate': 2.4389e-05, 'epoch': 1.53, 'throughput': 10001.37} [INFO|2025-03-20 08:56:25] logging.py:143 >> {'loss': 0.4446, 'learning_rate': 2.4375e-05, 'epoch': 1.53, 'throughput': 10001.33} [INFO|2025-03-20 08:57:06] logging.py:143 >> {'loss': 0.4538, 'learning_rate': 2.4361e-05, 'epoch': 1.53, 'throughput': 10001.34} [INFO|2025-03-20 08:57:47] logging.py:143 >> {'loss': 0.4307, 'learning_rate': 2.4347e-05, 'epoch': 1.54, 'throughput': 10001.33} [INFO|2025-03-20 08:58:27] logging.py:143 >> {'loss': 0.4545, 'learning_rate': 2.4333e-05, 'epoch': 1.54, 'throughput': 10001.30} [INFO|2025-03-20 08:59:06] logging.py:143 >> {'loss': 0.4678, 'learning_rate': 2.4319e-05, 'epoch': 1.54, 'throughput': 10001.43} [INFO|2025-03-20 08:59:47] logging.py:143 >> {'loss': 0.4316, 'learning_rate': 2.4305e-05, 'epoch': 1.54, 'throughput': 10001.42} [INFO|2025-03-20 09:00:28] logging.py:143 >> {'loss': 0.4292, 'learning_rate': 2.4291e-05, 'epoch': 1.54, 'throughput': 10001.41} [INFO|2025-03-20 09:01:09] logging.py:143 >> {'loss': 0.4426, 'learning_rate': 2.4277e-05, 'epoch': 1.54, 'throughput': 10001.48} [INFO|2025-03-20 09:01:48] logging.py:143 >> {'loss': 0.4559, 'learning_rate': 2.4263e-05, 'epoch': 1.54, 'throughput': 10001.58} [INFO|2025-03-20 09:02:30] logging.py:143 >> {'loss': 0.4450, 'learning_rate': 2.4249e-05, 'epoch': 1.54, 'throughput': 10001.46} [INFO|2025-03-20 09:03:12] logging.py:143 >> {'loss': 0.4475, 'learning_rate': 2.4235e-05, 'epoch': 1.54, 'throughput': 10001.43} [INFO|2025-03-20 09:03:51] logging.py:143 >> {'loss': 0.4771, 'learning_rate': 2.4221e-05, 'epoch': 1.54, 'throughput': 10001.56} 
[INFO|2025-03-20 09:04:31] logging.py:143 >> {'loss': 0.4542, 'learning_rate': 2.4207e-05, 'epoch': 1.54, 'throughput': 10001.57} [INFO|2025-03-20 09:05:12] logging.py:143 >> {'loss': 0.4442, 'learning_rate': 2.4193e-05, 'epoch': 1.54, 'throughput': 10001.46} [INFO|2025-03-20 09:05:54] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 2.4179e-05, 'epoch': 1.54, 'throughput': 10001.46} [INFO|2025-03-20 09:06:32] logging.py:143 >> {'loss': 0.4164, 'learning_rate': 2.4165e-05, 'epoch': 1.54, 'throughput': 10001.61} [INFO|2025-03-20 09:07:13] logging.py:143 >> {'loss': 0.4841, 'learning_rate': 2.4151e-05, 'epoch': 1.54, 'throughput': 10001.59} [INFO|2025-03-20 09:07:53] logging.py:143 >> {'loss': 0.4413, 'learning_rate': 2.4137e-05, 'epoch': 1.54, 'throughput': 10001.54} [INFO|2025-03-20 09:08:33] logging.py:143 >> {'loss': 0.4285, 'learning_rate': 2.4123e-05, 'epoch': 1.54, 'throughput': 10001.49} [INFO|2025-03-20 09:09:13] logging.py:143 >> {'loss': 0.4207, 'learning_rate': 2.4109e-05, 'epoch': 1.54, 'throughput': 10001.48} [INFO|2025-03-20 09:09:52] logging.py:143 >> {'loss': 0.4720, 'learning_rate': 2.4095e-05, 'epoch': 1.54, 'throughput': 10001.63} [INFO|2025-03-20 09:10:33] logging.py:143 >> {'loss': 0.4675, 'learning_rate': 2.4081e-05, 'epoch': 1.55, 'throughput': 10001.60} [INFO|2025-03-20 09:11:14] logging.py:143 >> {'loss': 0.4551, 'learning_rate': 2.4067e-05, 'epoch': 1.55, 'throughput': 10001.61} [INFO|2025-03-20 09:11:54] logging.py:143 >> {'loss': 0.4053, 'learning_rate': 2.4053e-05, 'epoch': 1.55, 'throughput': 10001.56} [INFO|2025-03-20 09:12:34] logging.py:143 >> {'loss': 0.4564, 'learning_rate': 2.4039e-05, 'epoch': 1.55, 'throughput': 10001.66} [INFO|2025-03-20 09:13:15] logging.py:143 >> {'loss': 0.4440, 'learning_rate': 2.4025e-05, 'epoch': 1.55, 'throughput': 10001.67} [INFO|2025-03-20 09:13:56] logging.py:143 >> {'loss': 0.4184, 'learning_rate': 2.4011e-05, 'epoch': 1.55, 'throughput': 10001.57} [INFO|2025-03-20 09:14:35] logging.py:143 >> 
{'loss': 0.4411, 'learning_rate': 2.3997e-05, 'epoch': 1.55, 'throughput': 10001.51} [INFO|2025-03-20 09:15:16] logging.py:143 >> {'loss': 0.4098, 'learning_rate': 2.3983e-05, 'epoch': 1.55, 'throughput': 10001.41} [INFO|2025-03-20 09:15:58] logging.py:143 >> {'loss': 0.4623, 'learning_rate': 2.3969e-05, 'epoch': 1.55, 'throughput': 10001.29} [INFO|2025-03-20 09:16:39] logging.py:143 >> {'loss': 0.4863, 'learning_rate': 2.3955e-05, 'epoch': 1.55, 'throughput': 10001.28} [INFO|2025-03-20 09:17:18] logging.py:143 >> {'loss': 0.4276, 'learning_rate': 2.3941e-05, 'epoch': 1.55, 'throughput': 10001.37} [INFO|2025-03-20 09:17:57] logging.py:143 >> {'loss': 0.4567, 'learning_rate': 2.3927e-05, 'epoch': 1.55, 'throughput': 10001.44} [INFO|2025-03-20 09:18:37] logging.py:143 >> {'loss': 0.4401, 'learning_rate': 2.3913e-05, 'epoch': 1.55, 'throughput': 10001.40} [INFO|2025-03-20 09:19:17] logging.py:143 >> {'loss': 0.4350, 'learning_rate': 2.3899e-05, 'epoch': 1.55, 'throughput': 10001.37} [INFO|2025-03-20 09:19:58] logging.py:143 >> {'loss': 0.4453, 'learning_rate': 2.3885e-05, 'epoch': 1.55, 'throughput': 10001.28} [INFO|2025-03-20 09:20:38] logging.py:143 >> {'loss': 0.4077, 'learning_rate': 2.3871e-05, 'epoch': 1.55, 'throughput': 10001.32} [INFO|2025-03-20 09:21:19] logging.py:143 >> {'loss': 0.4277, 'learning_rate': 2.3857e-05, 'epoch': 1.55, 'throughput': 10001.39} [INFO|2025-03-20 09:22:01] logging.py:143 >> {'loss': 0.4317, 'learning_rate': 2.3843e-05, 'epoch': 1.55, 'throughput': 10001.32} [INFO|2025-03-20 09:22:41] logging.py:143 >> {'loss': 0.4341, 'learning_rate': 2.3829e-05, 'epoch': 1.56, 'throughput': 10001.37} [INFO|2025-03-20 09:23:21] logging.py:143 >> {'loss': 0.4531, 'learning_rate': 2.3815e-05, 'epoch': 1.56, 'throughput': 10001.28} [INFO|2025-03-20 09:24:01] logging.py:143 >> {'loss': 0.4412, 'learning_rate': 2.3801e-05, 'epoch': 1.56, 'throughput': 10001.28} [INFO|2025-03-20 09:24:40] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 2.3787e-05, 
'epoch': 1.56, 'throughput': 10001.33} [INFO|2025-03-20 09:25:20] logging.py:143 >> {'loss': 0.4359, 'learning_rate': 2.3773e-05, 'epoch': 1.56, 'throughput': 10001.40} [INFO|2025-03-20 09:25:59] logging.py:143 >> {'loss': 0.4190, 'learning_rate': 2.3759e-05, 'epoch': 1.56, 'throughput': 10001.36} [INFO|2025-03-20 09:26:40] logging.py:143 >> {'loss': 0.4344, 'learning_rate': 2.3745e-05, 'epoch': 1.56, 'throughput': 10001.37} [INFO|2025-03-20 09:27:19] logging.py:143 >> {'loss': 0.4405, 'learning_rate': 2.3731e-05, 'epoch': 1.56, 'throughput': 10001.40} [INFO|2025-03-20 09:28:00] logging.py:143 >> {'loss': 0.4369, 'learning_rate': 2.3717e-05, 'epoch': 1.56, 'throughput': 10001.42} [INFO|2025-03-20 09:28:40] logging.py:143 >> {'loss': 0.4496, 'learning_rate': 2.3703e-05, 'epoch': 1.56, 'throughput': 10001.38} [INFO|2025-03-20 09:29:21] logging.py:143 >> {'loss': 0.4445, 'learning_rate': 2.3689e-05, 'epoch': 1.56, 'throughput': 10001.35} [INFO|2025-03-20 09:30:02] logging.py:143 >> {'loss': 0.4592, 'learning_rate': 2.3675e-05, 'epoch': 1.56, 'throughput': 10001.40} [INFO|2025-03-20 09:30:42] logging.py:143 >> {'loss': 0.4256, 'learning_rate': 2.3661e-05, 'epoch': 1.56, 'throughput': 10001.31} [INFO|2025-03-20 09:31:24] logging.py:143 >> {'loss': 0.4621, 'learning_rate': 2.3647e-05, 'epoch': 1.56, 'throughput': 10001.28} [INFO|2025-03-20 09:32:03] logging.py:143 >> {'loss': 0.4422, 'learning_rate': 2.3634e-05, 'epoch': 1.56, 'throughput': 10001.33} [INFO|2025-03-20 09:32:44] logging.py:143 >> {'loss': 0.4356, 'learning_rate': 2.3620e-05, 'epoch': 1.56, 'throughput': 10001.27} [INFO|2025-03-20 09:33:25] logging.py:143 >> {'loss': 0.4336, 'learning_rate': 2.3606e-05, 'epoch': 1.56, 'throughput': 10001.26} [INFO|2025-03-20 09:34:04] logging.py:143 >> {'loss': 0.4086, 'learning_rate': 2.3592e-05, 'epoch': 1.56, 'throughput': 10001.31} [INFO|2025-03-20 09:34:47] logging.py:143 >> {'loss': 0.4224, 'learning_rate': 2.3578e-05, 'epoch': 1.56, 'throughput': 10001.20} 
[INFO|2025-03-20 09:35:28] logging.py:143 >> {'loss': 0.4391, 'learning_rate': 2.3564e-05, 'epoch': 1.57, 'throughput': 10001.24} [INFO|2025-03-20 09:36:10] logging.py:143 >> {'loss': 0.4448, 'learning_rate': 2.3550e-05, 'epoch': 1.57, 'throughput': 10001.21} [INFO|2025-03-20 09:36:51] logging.py:143 >> {'loss': 0.4179, 'learning_rate': 2.3536e-05, 'epoch': 1.57, 'throughput': 10001.32} [INFO|2025-03-20 09:37:32] logging.py:143 >> {'loss': 0.4522, 'learning_rate': 2.3522e-05, 'epoch': 1.57, 'throughput': 10001.31} [INFO|2025-03-20 09:38:13] logging.py:143 >> {'loss': 0.4448, 'learning_rate': 2.3508e-05, 'epoch': 1.57, 'throughput': 10001.25} [INFO|2025-03-20 09:38:53] logging.py:143 >> {'loss': 0.4602, 'learning_rate': 2.3494e-05, 'epoch': 1.57, 'throughput': 10001.34} [INFO|2025-03-20 09:39:35] logging.py:143 >> {'loss': 0.4539, 'learning_rate': 2.3480e-05, 'epoch': 1.57, 'throughput': 10001.28} [INFO|2025-03-20 09:40:17] logging.py:143 >> {'loss': 0.4313, 'learning_rate': 2.3466e-05, 'epoch': 1.57, 'throughput': 10001.21} [INFO|2025-03-20 09:40:58] logging.py:143 >> {'loss': 0.4432, 'learning_rate': 2.3452e-05, 'epoch': 1.57, 'throughput': 10001.17} [INFO|2025-03-20 09:41:37] logging.py:143 >> {'loss': 0.4354, 'learning_rate': 2.3438e-05, 'epoch': 1.57, 'throughput': 10001.34} [INFO|2025-03-20 09:42:18] logging.py:143 >> {'loss': 0.4398, 'learning_rate': 2.3424e-05, 'epoch': 1.57, 'throughput': 10001.25} [INFO|2025-03-20 09:43:00] logging.py:143 >> {'loss': 0.4645, 'learning_rate': 2.3410e-05, 'epoch': 1.57, 'throughput': 10001.15} [INFO|2025-03-20 09:43:40] logging.py:143 >> {'loss': 0.4485, 'learning_rate': 2.3396e-05, 'epoch': 1.57, 'throughput': 10001.23} [INFO|2025-03-20 09:44:20] logging.py:143 >> {'loss': 0.4449, 'learning_rate': 2.3382e-05, 'epoch': 1.57, 'throughput': 10001.25} [INFO|2025-03-20 09:45:00] logging.py:143 >> {'loss': 0.4330, 'learning_rate': 2.3368e-05, 'epoch': 1.57, 'throughput': 10001.14} [INFO|2025-03-20 09:45:40] logging.py:143 >> 
{'loss': 0.4463, 'learning_rate': 2.3354e-05, 'epoch': 1.57, 'throughput': 10001.14} [INFO|2025-03-20 09:46:21] logging.py:143 >> {'loss': 0.4666, 'learning_rate': 2.3340e-05, 'epoch': 1.57, 'throughput': 10001.13} [INFO|2025-03-20 09:47:00] logging.py:143 >> {'loss': 0.4159, 'learning_rate': 2.3326e-05, 'epoch': 1.57, 'throughput': 10001.12} [INFO|2025-03-20 09:47:40] logging.py:143 >> {'loss': 0.4299, 'learning_rate': 2.3312e-05, 'epoch': 1.57, 'throughput': 10001.19} [INFO|2025-03-20 09:48:22] logging.py:143 >> {'loss': 0.4036, 'learning_rate': 2.3298e-05, 'epoch': 1.58, 'throughput': 10001.10} [INFO|2025-03-20 09:49:04] logging.py:143 >> {'loss': 0.4440, 'learning_rate': 2.3284e-05, 'epoch': 1.58, 'throughput': 10001.02} [INFO|2025-03-20 09:49:44] logging.py:143 >> {'loss': 0.4213, 'learning_rate': 2.3270e-05, 'epoch': 1.58, 'throughput': 10001.00} [INFO|2025-03-20 09:50:26] logging.py:143 >> {'loss': 0.4323, 'learning_rate': 2.3256e-05, 'epoch': 1.58, 'throughput': 10000.92} [INFO|2025-03-20 09:51:07] logging.py:143 >> {'loss': 0.4120, 'learning_rate': 2.3242e-05, 'epoch': 1.58, 'throughput': 10000.91} [INFO|2025-03-20 09:51:46] logging.py:143 >> {'loss': 0.4266, 'learning_rate': 2.3229e-05, 'epoch': 1.58, 'throughput': 10000.99} [INFO|2025-03-20 09:52:26] logging.py:143 >> {'loss': 0.4271, 'learning_rate': 2.3215e-05, 'epoch': 1.58, 'throughput': 10001.05} [INFO|2025-03-20 09:53:06] logging.py:143 >> {'loss': 0.4715, 'learning_rate': 2.3201e-05, 'epoch': 1.58, 'throughput': 10001.09} [INFO|2025-03-20 09:53:47] logging.py:143 >> {'loss': 0.4185, 'learning_rate': 2.3187e-05, 'epoch': 1.58, 'throughput': 10001.01} [INFO|2025-03-20 09:54:28] logging.py:143 >> {'loss': 0.4225, 'learning_rate': 2.3173e-05, 'epoch': 1.58, 'throughput': 10001.01} [INFO|2025-03-20 09:55:09] logging.py:143 >> {'loss': 0.4600, 'learning_rate': 2.3159e-05, 'epoch': 1.58, 'throughput': 10001.03} [INFO|2025-03-20 09:55:49] logging.py:143 >> {'loss': 0.4321, 'learning_rate': 2.3145e-05, 
'epoch': 1.58, 'throughput': 10000.99} [INFO|2025-03-20 09:56:28] logging.py:143 >> {'loss': 0.4416, 'learning_rate': 2.3131e-05, 'epoch': 1.58, 'throughput': 10001.00} [INFO|2025-03-20 09:57:09] logging.py:143 >> {'loss': 0.4245, 'learning_rate': 2.3117e-05, 'epoch': 1.58, 'throughput': 10000.96} [INFO|2025-03-20 09:57:52] logging.py:143 >> {'loss': 0.4310, 'learning_rate': 2.3103e-05, 'epoch': 1.58, 'throughput': 10000.91} [INFO|2025-03-20 09:58:32] logging.py:143 >> {'loss': 0.4615, 'learning_rate': 2.3089e-05, 'epoch': 1.58, 'throughput': 10000.90} [INFO|2025-03-20 09:59:11] logging.py:143 >> {'loss': 0.4489, 'learning_rate': 2.3075e-05, 'epoch': 1.58, 'throughput': 10000.99} [INFO|2025-03-20 09:59:52] logging.py:143 >> {'loss': 0.4366, 'learning_rate': 2.3061e-05, 'epoch': 1.58, 'throughput': 10000.97} [INFO|2025-03-20 10:00:32] logging.py:143 >> {'loss': 0.4571, 'learning_rate': 2.3047e-05, 'epoch': 1.58, 'throughput': 10001.01} [INFO|2025-03-20 10:01:14] logging.py:143 >> {'loss': 0.4452, 'learning_rate': 2.3033e-05, 'epoch': 1.59, 'throughput': 10000.95} [INFO|2025-03-20 10:01:56] logging.py:143 >> {'loss': 0.4355, 'learning_rate': 2.3019e-05, 'epoch': 1.59, 'throughput': 10000.77} [INFO|2025-03-20 10:02:37] logging.py:143 >> {'loss': 0.4686, 'learning_rate': 2.3005e-05, 'epoch': 1.59, 'throughput': 10000.86} [INFO|2025-03-20 10:03:18] logging.py:143 >> {'loss': 0.4321, 'learning_rate': 2.2991e-05, 'epoch': 1.59, 'throughput': 10000.82} [INFO|2025-03-20 10:04:01] logging.py:143 >> {'loss': 0.4223, 'learning_rate': 2.2977e-05, 'epoch': 1.59, 'throughput': 10000.67} [INFO|2025-03-20 10:04:41] logging.py:143 >> {'loss': 0.3854, 'learning_rate': 2.2963e-05, 'epoch': 1.59, 'throughput': 10000.61} [INFO|2025-03-20 10:05:21] logging.py:143 >> {'loss': 0.4163, 'learning_rate': 2.2949e-05, 'epoch': 1.59, 'throughput': 10000.58} [INFO|2025-03-20 10:06:01] logging.py:143 >> {'loss': 0.4662, 'learning_rate': 2.2936e-05, 'epoch': 1.59, 'throughput': 10000.57} 
[INFO|2025-03-20 10:06:41] logging.py:143 >> {'loss': 0.4501, 'learning_rate': 2.2922e-05, 'epoch': 1.59, 'throughput': 10000.58} [INFO|2025-03-20 10:07:22] logging.py:143 >> {'loss': 0.4287, 'learning_rate': 2.2908e-05, 'epoch': 1.59, 'throughput': 10000.54} [INFO|2025-03-20 10:08:03] logging.py:143 >> {'loss': 0.4253, 'learning_rate': 2.2894e-05, 'epoch': 1.59, 'throughput': 10000.53} [INFO|2025-03-20 10:08:43] logging.py:143 >> {'loss': 0.4358, 'learning_rate': 2.2880e-05, 'epoch': 1.59, 'throughput': 10000.56} [INFO|2025-03-20 10:09:22] logging.py:143 >> {'loss': 0.4325, 'learning_rate': 2.2866e-05, 'epoch': 1.59, 'throughput': 10000.73} [INFO|2025-03-20 10:10:02] logging.py:143 >> {'loss': 0.4506, 'learning_rate': 2.2852e-05, 'epoch': 1.59, 'throughput': 10000.83} [INFO|2025-03-20 10:10:06] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-15000 [INFO|2025-03-20 10:10:06] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-15000/config.json [INFO|2025-03-20 10:10:06] configuration_utils.py:909 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-15000/generation_config.json [INFO|2025-03-20 10:10:22] modeling_utils.py:3048 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-15000/model.safetensors.index.json. 
[INFO|2025-03-20 10:10:22] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-15000/tokenizer_config.json [INFO|2025-03-20 10:10:22] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-15000/special_tokens_map.json [INFO|2025-03-20 10:11:29] logging.py:143 >> {'loss': 0.4375, 'learning_rate': 2.2838e-05, 'epoch': 1.59, 'throughput': 9997.11} [INFO|2025-03-20 10:12:11] logging.py:143 >> {'loss': 0.4405, 'learning_rate': 2.2824e-05, 'epoch': 1.59, 'throughput': 9997.02} [INFO|2025-03-20 10:12:51] logging.py:143 >> {'loss': 0.4524, 'learning_rate': 2.2810e-05, 'epoch': 1.59, 'throughput': 9997.03} [INFO|2025-03-20 10:13:31] logging.py:143 >> {'loss': 0.4223, 'learning_rate': 2.2796e-05, 'epoch': 1.59, 'throughput': 9997.00} [INFO|2025-03-20 10:14:13] logging.py:143 >> {'loss': 0.4418, 'learning_rate': 2.2782e-05, 'epoch': 1.59, 'throughput': 9996.90} [INFO|2025-03-20 10:14:53] logging.py:143 >> {'loss': 0.4309, 'learning_rate': 2.2768e-05, 'epoch': 1.60, 'throughput': 9996.97} [INFO|2025-03-20 10:15:33] logging.py:143 >> {'loss': 0.3805, 'learning_rate': 2.2754e-05, 'epoch': 1.60, 'throughput': 9996.96} [INFO|2025-03-20 10:16:13] logging.py:143 >> {'loss': 0.4223, 'learning_rate': 2.2740e-05, 'epoch': 1.60, 'throughput': 9996.96} [INFO|2025-03-20 10:16:55] logging.py:143 >> {'loss': 0.4242, 'learning_rate': 2.2726e-05, 'epoch': 1.60, 'throughput': 9996.88} [INFO|2025-03-20 10:17:36] logging.py:143 >> {'loss': 0.4323, 'learning_rate': 2.2712e-05, 'epoch': 1.60, 'throughput': 9996.92} [INFO|2025-03-20 10:18:17] logging.py:143 >> {'loss': 0.4186, 'learning_rate': 2.2699e-05, 'epoch': 1.60, 'throughput': 9996.88} [INFO|2025-03-20 10:18:57] logging.py:143 >> {'loss': 0.4589, 'learning_rate': 2.2685e-05, 'epoch': 1.60, 'throughput': 9996.91} [INFO|2025-03-20 10:19:38] logging.py:143 >> {'loss': 0.4473, 
'learning_rate': 2.2671e-05, 'epoch': 1.60, 'throughput': 9996.94} [INFO|2025-03-20 10:20:18] logging.py:143 >> {'loss': 0.4517, 'learning_rate': 2.2657e-05, 'epoch': 1.60, 'throughput': 9997.04} [INFO|2025-03-20 10:21:00] logging.py:143 >> {'loss': 0.4474, 'learning_rate': 2.2643e-05, 'epoch': 1.60, 'throughput': 9996.97} [INFO|2025-03-20 10:21:41] logging.py:143 >> {'loss': 0.4366, 'learning_rate': 2.2629e-05, 'epoch': 1.60, 'throughput': 9997.01} [INFO|2025-03-20 10:22:22] logging.py:143 >> {'loss': 0.4214, 'learning_rate': 2.2615e-05, 'epoch': 1.60, 'throughput': 9996.84} [INFO|2025-03-20 10:23:02] logging.py:143 >> {'loss': 0.4559, 'learning_rate': 2.2601e-05, 'epoch': 1.60, 'throughput': 9996.82} [INFO|2025-03-20 10:23:43] logging.py:143 >> {'loss': 0.4234, 'learning_rate': 2.2587e-05, 'epoch': 1.60, 'throughput': 9996.81} [INFO|2025-03-20 10:24:22] logging.py:143 >> {'loss': 0.4471, 'learning_rate': 2.2573e-05, 'epoch': 1.60, 'throughput': 9996.88} [INFO|2025-03-20 10:25:03] logging.py:143 >> {'loss': 0.4226, 'learning_rate': 2.2559e-05, 'epoch': 1.60, 'throughput': 9996.81} [INFO|2025-03-20 10:25:44] logging.py:143 >> {'loss': 0.4441, 'learning_rate': 2.2545e-05, 'epoch': 1.60, 'throughput': 9996.80} [INFO|2025-03-20 10:26:24] logging.py:143 >> {'loss': 0.4155, 'learning_rate': 2.2531e-05, 'epoch': 1.60, 'throughput': 9996.85} [INFO|2025-03-20 10:27:03] logging.py:143 >> {'loss': 0.4602, 'learning_rate': 2.2517e-05, 'epoch': 1.60, 'throughput': 9996.95} [INFO|2025-03-20 10:27:44] logging.py:143 >> {'loss': 0.4357, 'learning_rate': 2.2504e-05, 'epoch': 1.61, 'throughput': 9996.99} [INFO|2025-03-20 10:28:27] logging.py:143 >> {'loss': 0.4571, 'learning_rate': 2.2490e-05, 'epoch': 1.61, 'throughput': 9997.03} [INFO|2025-03-20 10:29:08] logging.py:143 >> {'loss': 0.4356, 'learning_rate': 2.2476e-05, 'epoch': 1.61, 'throughput': 9997.06} [INFO|2025-03-20 10:29:48] logging.py:143 >> {'loss': 0.4597, 'learning_rate': 2.2462e-05, 'epoch': 1.61, 'throughput': 
9997.03} [INFO|2025-03-20 10:30:30] logging.py:143 >> {'loss': 0.3979, 'learning_rate': 2.2448e-05, 'epoch': 1.61, 'throughput': 9997.02} [INFO|2025-03-20 10:31:10] logging.py:143 >> {'loss': 0.4337, 'learning_rate': 2.2434e-05, 'epoch': 1.61, 'throughput': 9997.04} [INFO|2025-03-20 10:31:50] logging.py:143 >> {'loss': 0.4295, 'learning_rate': 2.2420e-05, 'epoch': 1.61, 'throughput': 9997.10} [INFO|2025-03-20 10:32:30] logging.py:143 >> {'loss': 0.4184, 'learning_rate': 2.2406e-05, 'epoch': 1.61, 'throughput': 9997.19} [INFO|2025-03-20 10:33:11] logging.py:143 >> {'loss': 0.4540, 'learning_rate': 2.2392e-05, 'epoch': 1.61, 'throughput': 9997.16} [INFO|2025-03-20 10:33:52] logging.py:143 >> {'loss': 0.4115, 'learning_rate': 2.2378e-05, 'epoch': 1.61, 'throughput': 9997.01} [INFO|2025-03-20 10:34:32] logging.py:143 >> {'loss': 0.4152, 'learning_rate': 2.2364e-05, 'epoch': 1.61, 'throughput': 9997.02} [INFO|2025-03-20 10:35:12] logging.py:143 >> {'loss': 0.4094, 'learning_rate': 2.2350e-05, 'epoch': 1.61, 'throughput': 9997.01} [INFO|2025-03-20 10:35:54] logging.py:143 >> {'loss': 0.4040, 'learning_rate': 2.2337e-05, 'epoch': 1.61, 'throughput': 9996.97} [INFO|2025-03-20 10:36:34] logging.py:143 >> {'loss': 0.4166, 'learning_rate': 2.2323e-05, 'epoch': 1.61, 'throughput': 9997.03} [INFO|2025-03-20 10:37:14] logging.py:143 >> {'loss': 0.4564, 'learning_rate': 2.2309e-05, 'epoch': 1.61, 'throughput': 9997.07} [INFO|2025-03-20 10:37:55] logging.py:143 >> {'loss': 0.4314, 'learning_rate': 2.2295e-05, 'epoch': 1.61, 'throughput': 9997.03} [INFO|2025-03-20 10:38:36] logging.py:143 >> {'loss': 0.4409, 'learning_rate': 2.2281e-05, 'epoch': 1.61, 'throughput': 9997.06} [INFO|2025-03-20 10:39:17] logging.py:143 >> {'loss': 0.4269, 'learning_rate': 2.2267e-05, 'epoch': 1.61, 'throughput': 9996.99} [INFO|2025-03-20 10:39:58] logging.py:143 >> {'loss': 0.4664, 'learning_rate': 2.2253e-05, 'epoch': 1.62, 'throughput': 9997.03} [INFO|2025-03-20 10:40:37] logging.py:143 >> {'loss': 
0.4420, 'learning_rate': 2.2239e-05, 'epoch': 1.62, 'throughput': 9997.10} [INFO|2025-03-20 10:41:18] logging.py:143 >> {'loss': 0.3959, 'learning_rate': 2.2225e-05, 'epoch': 1.62, 'throughput': 9997.06} [INFO|2025-03-20 10:41:58] logging.py:143 >> {'loss': 0.4271, 'learning_rate': 2.2211e-05, 'epoch': 1.62, 'throughput': 9997.07} [INFO|2025-03-20 10:42:38] logging.py:143 >> {'loss': 0.4610, 'learning_rate': 2.2197e-05, 'epoch': 1.62, 'throughput': 9997.12} [INFO|2025-03-20 10:43:20] logging.py:143 >> {'loss': 0.4604, 'learning_rate': 2.2184e-05, 'epoch': 1.62, 'throughput': 9997.15} [INFO|2025-03-20 10:44:00] logging.py:143 >> {'loss': 0.4134, 'learning_rate': 2.2170e-05, 'epoch': 1.62, 'throughput': 9997.18} [INFO|2025-03-20 10:44:40] logging.py:143 >> {'loss': 0.4364, 'learning_rate': 2.2156e-05, 'epoch': 1.62, 'throughput': 9997.21} [INFO|2025-03-20 10:45:20] logging.py:143 >> {'loss': 0.4272, 'learning_rate': 2.2142e-05, 'epoch': 1.62, 'throughput': 9997.26} [INFO|2025-03-20 10:46:01] logging.py:143 >> {'loss': 0.4133, 'learning_rate': 2.2128e-05, 'epoch': 1.62, 'throughput': 9997.19} [INFO|2025-03-20 10:46:41] logging.py:143 >> {'loss': 0.4389, 'learning_rate': 2.2114e-05, 'epoch': 1.62, 'throughput': 9997.30} [INFO|2025-03-20 10:47:21] logging.py:143 >> {'loss': 0.4221, 'learning_rate': 2.2100e-05, 'epoch': 1.62, 'throughput': 9997.27} [INFO|2025-03-20 10:48:01] logging.py:143 >> {'loss': 0.4049, 'learning_rate': 2.2086e-05, 'epoch': 1.62, 'throughput': 9997.34} [INFO|2025-03-20 10:48:39] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 2.2072e-05, 'epoch': 1.62, 'throughput': 9997.40} [INFO|2025-03-20 10:49:19] logging.py:143 >> {'loss': 0.4276, 'learning_rate': 2.2058e-05, 'epoch': 1.62, 'throughput': 9997.41} [INFO|2025-03-20 10:49:59] logging.py:143 >> {'loss': 0.4324, 'learning_rate': 2.2045e-05, 'epoch': 1.62, 'throughput': 9997.44} [INFO|2025-03-20 10:50:39] logging.py:143 >> {'loss': 0.3951, 'learning_rate': 2.2031e-05, 'epoch': 1.62, 
'throughput': 9997.44} [INFO|2025-03-20 10:51:19] logging.py:143 >> {'loss': 0.4505, 'learning_rate': 2.2017e-05, 'epoch': 1.62, 'throughput': 9997.42} [INFO|2025-03-20 10:52:00] logging.py:143 >> {'loss': 0.4683, 'learning_rate': 2.2003e-05, 'epoch': 1.62, 'throughput': 9997.38} [INFO|2025-03-20 10:52:40] logging.py:143 >> {'loss': 0.4469, 'learning_rate': 2.1989e-05, 'epoch': 1.63, 'throughput': 9997.47} [INFO|2025-03-20 10:53:20] logging.py:143 >> {'loss': 0.4364, 'learning_rate': 2.1975e-05, 'epoch': 1.63, 'throughput': 9997.48} [INFO|2025-03-20 10:54:00] logging.py:143 >> {'loss': 0.4590, 'learning_rate': 2.1961e-05, 'epoch': 1.63, 'throughput': 9997.48} [INFO|2025-03-20 10:54:40] logging.py:143 >> {'loss': 0.4376, 'learning_rate': 2.1947e-05, 'epoch': 1.63, 'throughput': 9997.52} [INFO|2025-03-20 10:55:20] logging.py:143 >> {'loss': 0.4359, 'learning_rate': 2.1933e-05, 'epoch': 1.63, 'throughput': 9997.53} [INFO|2025-03-20 10:55:59] logging.py:143 >> {'loss': 0.4269, 'learning_rate': 2.1919e-05, 'epoch': 1.63, 'throughput': 9997.56} [INFO|2025-03-20 10:56:38] logging.py:143 >> {'loss': 0.4180, 'learning_rate': 2.1906e-05, 'epoch': 1.63, 'throughput': 9997.62} [INFO|2025-03-20 10:57:19] logging.py:143 >> {'loss': 0.4252, 'learning_rate': 2.1892e-05, 'epoch': 1.63, 'throughput': 9997.69} [INFO|2025-03-20 10:57:57] logging.py:143 >> {'loss': 0.3980, 'learning_rate': 2.1878e-05, 'epoch': 1.63, 'throughput': 9997.89} [INFO|2025-03-20 10:58:38] logging.py:143 >> {'loss': 0.4274, 'learning_rate': 2.1864e-05, 'epoch': 1.63, 'throughput': 9997.87} [INFO|2025-03-20 10:59:18] logging.py:143 >> {'loss': 0.4423, 'learning_rate': 2.1850e-05, 'epoch': 1.63, 'throughput': 9997.89} [INFO|2025-03-20 10:59:58] logging.py:143 >> {'loss': 0.4437, 'learning_rate': 2.1836e-05, 'epoch': 1.63, 'throughput': 9997.93} [INFO|2025-03-20 11:00:39] logging.py:143 >> {'loss': 0.4511, 'learning_rate': 2.1822e-05, 'epoch': 1.63, 'throughput': 9997.89} [INFO|2025-03-20 11:01:19] logging.py:143 
>> {'loss': 0.4123, 'learning_rate': 2.1808e-05, 'epoch': 1.63, 'throughput': 9997.89} [INFO|2025-03-20 11:01:59] logging.py:143 >> {'loss': 0.4226, 'learning_rate': 2.1795e-05, 'epoch': 1.63, 'throughput': 9997.91} [INFO|2025-03-20 11:02:40] logging.py:143 >> {'loss': 0.4315, 'learning_rate': 2.1781e-05, 'epoch': 1.63, 'throughput': 9997.90} [INFO|2025-03-20 11:03:21] logging.py:143 >> {'loss': 0.4085, 'learning_rate': 2.1767e-05, 'epoch': 1.63, 'throughput': 9997.89} [INFO|2025-03-20 11:04:02] logging.py:143 >> {'loss': 0.4471, 'learning_rate': 2.1753e-05, 'epoch': 1.63, 'throughput': 9997.90} [INFO|2025-03-20 11:04:41] logging.py:143 >> {'loss': 0.4647, 'learning_rate': 2.1739e-05, 'epoch': 1.63, 'throughput': 9997.97} [INFO|2025-03-20 11:05:22] logging.py:143 >> {'loss': 0.4291, 'learning_rate': 2.1725e-05, 'epoch': 1.64, 'throughput': 9997.99} [INFO|2025-03-20 11:06:02] logging.py:143 >> {'loss': 0.4359, 'learning_rate': 2.1711e-05, 'epoch': 1.64, 'throughput': 9997.97} [INFO|2025-03-20 11:06:44] logging.py:143 >> {'loss': 0.4522, 'learning_rate': 2.1697e-05, 'epoch': 1.64, 'throughput': 9997.99} [INFO|2025-03-20 11:07:24] logging.py:143 >> {'loss': 0.4251, 'learning_rate': 2.1684e-05, 'epoch': 1.64, 'throughput': 9998.06} [INFO|2025-03-20 11:08:05] logging.py:143 >> {'loss': 0.3948, 'learning_rate': 2.1670e-05, 'epoch': 1.64, 'throughput': 9998.02} [INFO|2025-03-20 11:08:47] logging.py:143 >> {'loss': 0.4275, 'learning_rate': 2.1656e-05, 'epoch': 1.64, 'throughput': 9997.90} [INFO|2025-03-20 11:09:26] logging.py:143 >> {'loss': 0.4321, 'learning_rate': 2.1642e-05, 'epoch': 1.64, 'throughput': 9997.89} [INFO|2025-03-20 11:10:06] logging.py:143 >> {'loss': 0.4507, 'learning_rate': 2.1628e-05, 'epoch': 1.64, 'throughput': 9997.84} [INFO|2025-03-20 11:10:46] logging.py:143 >> {'loss': 0.4201, 'learning_rate': 2.1614e-05, 'epoch': 1.64, 'throughput': 9997.93} [INFO|2025-03-20 11:11:26] logging.py:143 >> {'loss': 0.4403, 'learning_rate': 2.1600e-05, 'epoch': 1.64, 
'throughput': 9997.92} [INFO|2025-03-20 11:12:06] logging.py:143 >> {'loss': 0.4320, 'learning_rate': 2.1586e-05, 'epoch': 1.64, 'throughput': 9997.94} [INFO|2025-03-20 11:12:46] logging.py:143 >> {'loss': 0.4492, 'learning_rate': 2.1573e-05, 'epoch': 1.64, 'throughput': 9998.05} [INFO|2025-03-20 11:13:27] logging.py:143 >> {'loss': 0.4520, 'learning_rate': 2.1559e-05, 'epoch': 1.64, 'throughput': 9998.03} [INFO|2025-03-20 11:14:07] logging.py:143 >> {'loss': 0.3945, 'learning_rate': 2.1545e-05, 'epoch': 1.64, 'throughput': 9998.04} [INFO|2025-03-20 11:14:48] logging.py:143 >> {'loss': 0.4297, 'learning_rate': 2.1531e-05, 'epoch': 1.64, 'throughput': 9998.02} [INFO|2025-03-20 11:15:29] logging.py:143 >> {'loss': 0.4128, 'learning_rate': 2.1517e-05, 'epoch': 1.64, 'throughput': 9998.02} [INFO|2025-03-20 11:16:10] logging.py:143 >> {'loss': 0.4535, 'learning_rate': 2.1503e-05, 'epoch': 1.64, 'throughput': 9997.96} [INFO|2025-03-20 11:16:51] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 2.1489e-05, 'epoch': 1.64, 'throughput': 9997.93} [INFO|2025-03-20 11:17:32] logging.py:143 >> {'loss': 0.4296, 'learning_rate': 2.1476e-05, 'epoch': 1.64, 'throughput': 9997.84} [INFO|2025-03-20 11:18:13] logging.py:143 >> {'loss': 0.4059, 'learning_rate': 2.1462e-05, 'epoch': 1.65, 'throughput': 9997.91} [INFO|2025-03-20 11:18:54] logging.py:143 >> {'loss': 0.4175, 'learning_rate': 2.1448e-05, 'epoch': 1.65, 'throughput': 9997.85} [INFO|2025-03-20 11:19:34] logging.py:143 >> {'loss': 0.4419, 'learning_rate': 2.1434e-05, 'epoch': 1.65, 'throughput': 9997.84} [INFO|2025-03-20 11:20:15] logging.py:143 >> {'loss': 0.4307, 'learning_rate': 2.1420e-05, 'epoch': 1.65, 'throughput': 9997.80} [INFO|2025-03-20 11:20:57] logging.py:143 >> {'loss': 0.4248, 'learning_rate': 2.1406e-05, 'epoch': 1.65, 'throughput': 9997.67} [INFO|2025-03-20 11:21:36] logging.py:143 >> {'loss': 0.3990, 'learning_rate': 2.1393e-05, 'epoch': 1.65, 'throughput': 9997.69} [INFO|2025-03-20 11:22:16] logging.py:143 
>> {'loss': 0.4427, 'learning_rate': 2.1379e-05, 'epoch': 1.65, 'throughput': 9997.77} [INFO|2025-03-20 11:22:57] logging.py:143 >> {'loss': 0.4328, 'learning_rate': 2.1365e-05, 'epoch': 1.65, 'throughput': 9997.75} [INFO|2025-03-20 11:23:37] logging.py:143 >> {'loss': 0.4242, 'learning_rate': 2.1351e-05, 'epoch': 1.65, 'throughput': 9997.83} [INFO|2025-03-20 11:24:17] logging.py:143 >> {'loss': 0.4447, 'learning_rate': 2.1337e-05, 'epoch': 1.65, 'throughput': 9997.86} [INFO|2025-03-20 11:24:58] logging.py:143 >> {'loss': 0.4536, 'learning_rate': 2.1323e-05, 'epoch': 1.65, 'throughput': 9997.86} [INFO|2025-03-20 11:25:38] logging.py:143 >> {'loss': 0.4545, 'learning_rate': 2.1309e-05, 'epoch': 1.65, 'throughput': 9998.00} [INFO|2025-03-20 11:26:18] logging.py:143 >> {'loss': 0.4512, 'learning_rate': 2.1296e-05, 'epoch': 1.65, 'throughput': 9998.07} [INFO|2025-03-20 11:26:58] logging.py:143 >> {'loss': 0.4398, 'learning_rate': 2.1282e-05, 'epoch': 1.65, 'throughput': 9998.06} [INFO|2025-03-20 11:27:38] logging.py:143 >> {'loss': 0.4142, 'learning_rate': 2.1268e-05, 'epoch': 1.65, 'throughput': 9998.10} [INFO|2025-03-20 11:28:19] logging.py:143 >> {'loss': 0.3980, 'learning_rate': 2.1254e-05, 'epoch': 1.65, 'throughput': 9998.09} [INFO|2025-03-20 11:29:00] logging.py:143 >> {'loss': 0.4363, 'learning_rate': 2.1240e-05, 'epoch': 1.65, 'throughput': 9998.11} [INFO|2025-03-20 11:29:40] logging.py:143 >> {'loss': 0.4380, 'learning_rate': 2.1226e-05, 'epoch': 1.65, 'throughput': 9998.15} [INFO|2025-03-20 11:30:19] logging.py:143 >> {'loss': 0.4089, 'learning_rate': 2.1213e-05, 'epoch': 1.65, 'throughput': 9998.26} [INFO|2025-03-20 11:30:58] logging.py:143 >> {'loss': 0.4254, 'learning_rate': 2.1199e-05, 'epoch': 1.66, 'throughput': 9998.33} [INFO|2025-03-20 11:31:37] logging.py:143 >> {'loss': 0.3946, 'learning_rate': 2.1185e-05, 'epoch': 1.66, 'throughput': 9998.46} [INFO|2025-03-20 11:32:18] logging.py:143 >> {'loss': 0.4416, 'learning_rate': 2.1171e-05, 'epoch': 1.66, 
'throughput': 9998.51} [INFO|2025-03-20 11:32:58] logging.py:143 >> {'loss': 0.4472, 'learning_rate': 2.1157e-05, 'epoch': 1.66, 'throughput': 9998.47} [INFO|2025-03-20 11:33:38] logging.py:143 >> {'loss': 0.4392, 'learning_rate': 2.1143e-05, 'epoch': 1.66, 'throughput': 9998.49} [INFO|2025-03-20 11:34:20] logging.py:143 >> {'loss': 0.4281, 'learning_rate': 2.1130e-05, 'epoch': 1.66, 'throughput': 9998.43} [INFO|2025-03-20 11:35:00] logging.py:143 >> {'loss': 0.4277, 'learning_rate': 2.1116e-05, 'epoch': 1.66, 'throughput': 9998.45} [INFO|2025-03-20 11:35:39] logging.py:143 >> {'loss': 0.4373, 'learning_rate': 2.1102e-05, 'epoch': 1.66, 'throughput': 9998.48} [INFO|2025-03-20 11:36:19] logging.py:143 >> {'loss': 0.4209, 'learning_rate': 2.1088e-05, 'epoch': 1.66, 'throughput': 9998.44} [INFO|2025-03-20 11:36:59] logging.py:143 >> {'loss': 0.4250, 'learning_rate': 2.1074e-05, 'epoch': 1.66, 'throughput': 9998.46} [INFO|2025-03-20 11:37:39] logging.py:143 >> {'loss': 0.4208, 'learning_rate': 2.1061e-05, 'epoch': 1.66, 'throughput': 9998.52} [INFO|2025-03-20 11:38:19] logging.py:143 >> {'loss': 0.4198, 'learning_rate': 2.1047e-05, 'epoch': 1.66, 'throughput': 9998.47} [INFO|2025-03-20 11:39:01] logging.py:143 >> {'loss': 0.3988, 'learning_rate': 2.1033e-05, 'epoch': 1.66, 'throughput': 9998.44} [INFO|2025-03-20 11:39:41] logging.py:143 >> {'loss': 0.4293, 'learning_rate': 2.1019e-05, 'epoch': 1.66, 'throughput': 9998.48} [INFO|2025-03-20 11:40:22] logging.py:143 >> {'loss': 0.4227, 'learning_rate': 2.1005e-05, 'epoch': 1.66, 'throughput': 9998.47} [INFO|2025-03-20 11:41:03] logging.py:143 >> {'loss': 0.4463, 'learning_rate': 2.0991e-05, 'epoch': 1.66, 'throughput': 9998.53} [INFO|2025-03-20 11:41:43] logging.py:143 >> {'loss': 0.4115, 'learning_rate': 2.0978e-05, 'epoch': 1.66, 'throughput': 9998.48} [INFO|2025-03-20 11:42:25] logging.py:143 >> {'loss': 0.4376, 'learning_rate': 2.0964e-05, 'epoch': 1.66, 'throughput': 9998.45} [INFO|2025-03-20 11:43:06] logging.py:143 
>> {'loss': 0.4397, 'learning_rate': 2.0950e-05, 'epoch': 1.66, 'throughput': 9998.43} [INFO|2025-03-20 11:43:47] logging.py:143 >> {'loss': 0.4390, 'learning_rate': 2.0936e-05, 'epoch': 1.67, 'throughput': 9998.39} [INFO|2025-03-20 11:44:26] logging.py:143 >> {'loss': 0.4163, 'learning_rate': 2.0922e-05, 'epoch': 1.67, 'throughput': 9998.40} [INFO|2025-03-20 11:45:06] logging.py:143 >> {'loss': 0.4264, 'learning_rate': 2.0909e-05, 'epoch': 1.67, 'throughput': 9998.47} [INFO|2025-03-20 11:45:47] logging.py:143 >> {'loss': 0.4184, 'learning_rate': 2.0895e-05, 'epoch': 1.67, 'throughput': 9998.43} [INFO|2025-03-20 11:46:29] logging.py:143 >> {'loss': 0.4516, 'learning_rate': 2.0881e-05, 'epoch': 1.67, 'throughput': 9998.47} [INFO|2025-03-20 11:47:09] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 2.0867e-05, 'epoch': 1.67, 'throughput': 9998.52} [INFO|2025-03-20 11:47:49] logging.py:143 >> {'loss': 0.4156, 'learning_rate': 2.0853e-05, 'epoch': 1.67, 'throughput': 9998.56} [INFO|2025-03-20 11:48:30] logging.py:143 >> {'loss': 0.4532, 'learning_rate': 2.0840e-05, 'epoch': 1.67, 'throughput': 9998.70} [INFO|2025-03-20 11:49:10] logging.py:143 >> {'loss': 0.4116, 'learning_rate': 2.0826e-05, 'epoch': 1.67, 'throughput': 9998.74} [INFO|2025-03-20 11:49:50] logging.py:143 >> {'loss': 0.4290, 'learning_rate': 2.0812e-05, 'epoch': 1.67, 'throughput': 9998.75} [INFO|2025-03-20 11:50:31] logging.py:143 >> {'loss': 0.4127, 'learning_rate': 2.0798e-05, 'epoch': 1.67, 'throughput': 9998.71} [INFO|2025-03-20 11:51:12] logging.py:143 >> {'loss': 0.4157, 'learning_rate': 2.0784e-05, 'epoch': 1.67, 'throughput': 9998.72} [INFO|2025-03-20 11:51:52] logging.py:143 >> {'loss': 0.4143, 'learning_rate': 2.0771e-05, 'epoch': 1.67, 'throughput': 9998.81} [INFO|2025-03-20 11:52:32] logging.py:143 >> {'loss': 0.4206, 'learning_rate': 2.0757e-05, 'epoch': 1.67, 'throughput': 9998.83} [INFO|2025-03-20 11:53:12] logging.py:143 >> {'loss': 0.4480, 'learning_rate': 2.0743e-05, 'epoch': 1.67, 
'throughput': 9998.87} [INFO|2025-03-20 11:53:52] logging.py:143 >> {'loss': 0.4356, 'learning_rate': 2.0729e-05, 'epoch': 1.67, 'throughput': 9998.81} [INFO|2025-03-20 11:54:33] logging.py:143 >> {'loss': 0.4128, 'learning_rate': 2.0715e-05, 'epoch': 1.67, 'throughput': 9998.79} [INFO|2025-03-20 11:55:13] logging.py:143 >> {'loss': 0.4109, 'learning_rate': 2.0702e-05, 'epoch': 1.67, 'throughput': 9998.75} [INFO|2025-03-20 11:55:54] logging.py:143 >> {'loss': 0.3991, 'learning_rate': 2.0688e-05, 'epoch': 1.67, 'throughput': 9998.76} [INFO|2025-03-20 11:56:32] logging.py:143 >> {'loss': 0.4107, 'learning_rate': 2.0674e-05, 'epoch': 1.68, 'throughput': 9998.78} [INFO|2025-03-20 11:57:11] logging.py:143 >> {'loss': 0.4346, 'learning_rate': 2.0660e-05, 'epoch': 1.68, 'throughput': 9998.88} [INFO|2025-03-20 11:57:52] logging.py:143 >> {'loss': 0.4590, 'learning_rate': 2.0647e-05, 'epoch': 1.68, 'throughput': 9998.92} [INFO|2025-03-20 11:58:32] logging.py:143 >> {'loss': 0.4136, 'learning_rate': 2.0633e-05, 'epoch': 1.68, 'throughput': 9998.97} [INFO|2025-03-20 11:59:14] logging.py:143 >> {'loss': 0.4326, 'learning_rate': 2.0619e-05, 'epoch': 1.68, 'throughput': 9998.89} [INFO|2025-03-20 11:59:54] logging.py:143 >> {'loss': 0.4162, 'learning_rate': 2.0605e-05, 'epoch': 1.68, 'throughput': 9998.94} [INFO|2025-03-20 12:00:36] logging.py:143 >> {'loss': 0.4378, 'learning_rate': 2.0591e-05, 'epoch': 1.68, 'throughput': 9998.96} [INFO|2025-03-20 12:01:14] logging.py:143 >> {'loss': 0.4359, 'learning_rate': 2.0578e-05, 'epoch': 1.68, 'throughput': 9999.05} [INFO|2025-03-20 12:01:55] logging.py:143 >> {'loss': 0.4195, 'learning_rate': 2.0564e-05, 'epoch': 1.68, 'throughput': 9999.09} [INFO|2025-03-20 12:02:36] logging.py:143 >> {'loss': 0.4030, 'learning_rate': 2.0550e-05, 'epoch': 1.68, 'throughput': 9999.02} [INFO|2025-03-20 12:03:15] logging.py:143 >> {'loss': 0.4235, 'learning_rate': 2.0536e-05, 'epoch': 1.68, 'throughput': 9999.02} [INFO|2025-03-20 12:03:55] logging.py:143 
>> {'loss': 0.4249, 'learning_rate': 2.0523e-05, 'epoch': 1.68, 'throughput': 9999.01} [INFO|2025-03-20 12:04:36] logging.py:143 >> {'loss': 0.4393, 'learning_rate': 2.0509e-05, 'epoch': 1.68, 'throughput': 9999.07} [INFO|2025-03-20 12:05:17] logging.py:143 >> {'loss': 0.4392, 'learning_rate': 2.0495e-05, 'epoch': 1.68, 'throughput': 9999.06} [INFO|2025-03-20 12:05:57] logging.py:143 >> {'loss': 0.3983, 'learning_rate': 2.0481e-05, 'epoch': 1.68, 'throughput': 9999.05} [INFO|2025-03-20 12:06:37] logging.py:143 >> {'loss': 0.3984, 'learning_rate': 2.0468e-05, 'epoch': 1.68, 'throughput': 9999.11} [INFO|2025-03-20 12:07:17] logging.py:143 >> {'loss': 0.4311, 'learning_rate': 2.0454e-05, 'epoch': 1.68, 'throughput': 9999.11} [INFO|2025-03-20 12:07:57] logging.py:143 >> {'loss': 0.4250, 'learning_rate': 2.0440e-05, 'epoch': 1.68, 'throughput': 9999.12} [INFO|2025-03-20 12:08:37] logging.py:143 >> {'loss': 0.4382, 'learning_rate': 2.0426e-05, 'epoch': 1.69, 'throughput': 9999.08} [INFO|2025-03-20 12:09:16] logging.py:143 >> {'loss': 0.4072, 'learning_rate': 2.0412e-05, 'epoch': 1.69, 'throughput': 9999.14} [INFO|2025-03-20 12:09:56] logging.py:143 >> {'loss': 0.4454, 'learning_rate': 2.0399e-05, 'epoch': 1.69, 'throughput': 9999.23} [INFO|2025-03-20 12:10:37] logging.py:143 >> {'loss': 0.4382, 'learning_rate': 2.0385e-05, 'epoch': 1.69, 'throughput': 9999.21} [INFO|2025-03-20 12:11:18] logging.py:143 >> {'loss': 0.4382, 'learning_rate': 2.0371e-05, 'epoch': 1.69, 'throughput': 9999.20} [INFO|2025-03-20 12:11:59] logging.py:143 >> {'loss': 0.3827, 'learning_rate': 2.0357e-05, 'epoch': 1.69, 'throughput': 9999.17} [INFO|2025-03-20 12:12:39] logging.py:143 >> {'loss': 0.4258, 'learning_rate': 2.0344e-05, 'epoch': 1.69, 'throughput': 9999.12} [INFO|2025-03-20 12:13:20] logging.py:143 >> {'loss': 0.4431, 'learning_rate': 2.0330e-05, 'epoch': 1.69, 'throughput': 9999.12} [INFO|2025-03-20 12:14:02] logging.py:143 >> {'loss': 0.4480, 'learning_rate': 2.0316e-05, 'epoch': 1.69, 
'throughput': 9999.01} [INFO|2025-03-20 12:14:43] logging.py:143 >> {'loss': 0.4293, 'learning_rate': 2.0302e-05, 'epoch': 1.69, 'throughput': 9999.09} [INFO|2025-03-20 12:15:23] logging.py:143 >> {'loss': 0.4198, 'learning_rate': 2.0289e-05, 'epoch': 1.69, 'throughput': 9999.08} [INFO|2025-03-20 12:16:03] logging.py:143 >> {'loss': 0.4164, 'learning_rate': 2.0275e-05, 'epoch': 1.69, 'throughput': 9999.11} [INFO|2025-03-20 12:16:43] logging.py:143 >> {'loss': 0.4011, 'learning_rate': 2.0261e-05, 'epoch': 1.69, 'throughput': 9999.09} [INFO|2025-03-20 12:17:24] logging.py:143 >> {'loss': 0.4237, 'learning_rate': 2.0248e-05, 'epoch': 1.69, 'throughput': 9999.16} [INFO|2025-03-20 12:18:04] logging.py:143 >> {'loss': 0.4204, 'learning_rate': 2.0234e-05, 'epoch': 1.69, 'throughput': 9999.18} [INFO|2025-03-20 12:18:44] logging.py:143 >> {'loss': 0.4165, 'learning_rate': 2.0220e-05, 'epoch': 1.69, 'throughput': 9999.24} [INFO|2025-03-20 12:19:23] logging.py:143 >> {'loss': 0.4254, 'learning_rate': 2.0206e-05, 'epoch': 1.69, 'throughput': 9999.35} [INFO|2025-03-20 12:20:03] logging.py:143 >> {'loss': 0.4075, 'learning_rate': 2.0193e-05, 'epoch': 1.69, 'throughput': 9999.39} [INFO|2025-03-20 12:20:43] logging.py:143 >> {'loss': 0.4327, 'learning_rate': 2.0179e-05, 'epoch': 1.69, 'throughput': 9999.38} [INFO|2025-03-20 12:21:25] logging.py:143 >> {'loss': 0.4226, 'learning_rate': 2.0165e-05, 'epoch': 1.70, 'throughput': 9999.34} [INFO|2025-03-20 12:22:04] logging.py:143 >> {'loss': 0.4252, 'learning_rate': 2.0151e-05, 'epoch': 1.70, 'throughput': 9999.32} [INFO|2025-03-20 12:22:46] logging.py:143 >> {'loss': 0.3867, 'learning_rate': 2.0138e-05, 'epoch': 1.70, 'throughput': 9999.34} [INFO|2025-03-20 12:23:25] logging.py:143 >> {'loss': 0.4260, 'learning_rate': 2.0124e-05, 'epoch': 1.70, 'throughput': 9999.41} [INFO|2025-03-20 12:24:06] logging.py:143 >> {'loss': 0.4010, 'learning_rate': 2.0110e-05, 'epoch': 1.70, 'throughput': 9999.42} [INFO|2025-03-20 12:24:47] logging.py:143 
>> {'loss': 0.4232, 'learning_rate': 2.0096e-05, 'epoch': 1.70, 'throughput': 9999.53} [INFO|2025-03-20 12:25:27] logging.py:143 >> {'loss': 0.4401, 'learning_rate': 2.0083e-05, 'epoch': 1.70, 'throughput': 9999.54} [INFO|2025-03-20 12:26:07] logging.py:143 >> {'loss': 0.4335, 'learning_rate': 2.0069e-05, 'epoch': 1.70, 'throughput': 9999.55} [INFO|2025-03-20 12:26:48] logging.py:143 >> {'loss': 0.4064, 'learning_rate': 2.0055e-05, 'epoch': 1.70, 'throughput': 9999.48} [INFO|2025-03-20 12:27:29] logging.py:143 >> {'loss': 0.4036, 'learning_rate': 2.0042e-05, 'epoch': 1.70, 'throughput': 9999.44} [INFO|2025-03-20 12:28:10] logging.py:143 >> {'loss': 0.4386, 'learning_rate': 2.0028e-05, 'epoch': 1.70, 'throughput': 9999.49} [INFO|2025-03-20 12:28:51] logging.py:143 >> {'loss': 0.4142, 'learning_rate': 2.0014e-05, 'epoch': 1.70, 'throughput': 9999.41} [INFO|2025-03-20 12:29:31] logging.py:143 >> {'loss': 0.4285, 'learning_rate': 2.0000e-05, 'epoch': 1.70, 'throughput': 9999.44} [INFO|2025-03-20 12:30:12] logging.py:143 >> {'loss': 0.4069, 'learning_rate': 1.9987e-05, 'epoch': 1.70, 'throughput': 9999.43} [INFO|2025-03-20 12:30:54] logging.py:143 >> {'loss': 0.4400, 'learning_rate': 1.9973e-05, 'epoch': 1.70, 'throughput': 9999.35} [INFO|2025-03-20 12:31:34] logging.py:143 >> {'loss': 0.4137, 'learning_rate': 1.9959e-05, 'epoch': 1.70, 'throughput': 9999.33} [INFO|2025-03-20 12:32:16] logging.py:143 >> {'loss': 0.4065, 'learning_rate': 1.9946e-05, 'epoch': 1.70, 'throughput': 9999.24} [INFO|2025-03-20 12:32:56] logging.py:143 >> {'loss': 0.4040, 'learning_rate': 1.9932e-05, 'epoch': 1.70, 'throughput': 9999.22} [INFO|2025-03-20 12:33:36] logging.py:143 >> {'loss': 0.3978, 'learning_rate': 1.9918e-05, 'epoch': 1.70, 'throughput': 9999.27} [INFO|2025-03-20 12:34:17] logging.py:143 >> {'loss': 0.4113, 'learning_rate': 1.9905e-05, 'epoch': 1.71, 'throughput': 9999.30} [INFO|2025-03-20 12:34:56] logging.py:143 >> {'loss': 0.3927, 'learning_rate': 1.9891e-05, 'epoch': 1.71, 
'throughput': 9999.40} [INFO|2025-03-20 12:35:36] logging.py:143 >> {'loss': 0.4178, 'learning_rate': 1.9877e-05, 'epoch': 1.71, 'throughput': 9999.45} [INFO|2025-03-20 12:36:16] logging.py:143 >> {'loss': 0.4501, 'learning_rate': 1.9863e-05, 'epoch': 1.71, 'throughput': 9999.51} [INFO|2025-03-20 12:36:56] logging.py:143 >> {'loss': 0.4224, 'learning_rate': 1.9850e-05, 'epoch': 1.71, 'throughput': 9999.56} [INFO|2025-03-20 12:37:37] logging.py:143 >> {'loss': 0.4356, 'learning_rate': 1.9836e-05, 'epoch': 1.71, 'throughput': 9999.54} [INFO|2025-03-20 12:38:19] logging.py:143 >> {'loss': 0.4157, 'learning_rate': 1.9822e-05, 'epoch': 1.71, 'throughput': 9999.50} [INFO|2025-03-20 12:39:01] logging.py:143 >> {'loss': 0.4252, 'learning_rate': 1.9809e-05, 'epoch': 1.71, 'throughput': 9999.44} [INFO|2025-03-20 12:39:42] logging.py:143 >> {'loss': 0.4111, 'learning_rate': 1.9795e-05, 'epoch': 1.71, 'throughput': 9999.39} [INFO|2025-03-20 12:40:24] logging.py:143 >> {'loss': 0.4222, 'learning_rate': 1.9781e-05, 'epoch': 1.71, 'throughput': 9999.28} [INFO|2025-03-20 12:41:05] logging.py:143 >> {'loss': 0.4144, 'learning_rate': 1.9768e-05, 'epoch': 1.71, 'throughput': 9999.28} [INFO|2025-03-20 12:41:45] logging.py:143 >> {'loss': 0.3989, 'learning_rate': 1.9754e-05, 'epoch': 1.71, 'throughput': 9999.32} [INFO|2025-03-20 12:42:25] logging.py:143 >> {'loss': 0.4280, 'learning_rate': 1.9740e-05, 'epoch': 1.71, 'throughput': 9999.39} [INFO|2025-03-20 12:43:05] logging.py:143 >> {'loss': 0.4298, 'learning_rate': 1.9727e-05, 'epoch': 1.71, 'throughput': 9999.40} [INFO|2025-03-20 12:43:45] logging.py:143 >> {'loss': 0.4415, 'learning_rate': 1.9713e-05, 'epoch': 1.71, 'throughput': 9999.48} [INFO|2025-03-20 12:44:25] logging.py:143 >> {'loss': 0.4328, 'learning_rate': 1.9699e-05, 'epoch': 1.71, 'throughput': 9999.51} [INFO|2025-03-20 12:45:06] logging.py:143 >> {'loss': 0.4563, 'learning_rate': 1.9686e-05, 'epoch': 1.71, 'throughput': 9999.49} [INFO|2025-03-20 12:45:47] logging.py:143 
>> {'loss': 0.4196, 'learning_rate': 1.9672e-05, 'epoch': 1.71, 'throughput': 9999.53} [INFO|2025-03-20 12:46:27] logging.py:143 >> {'loss': 0.4408, 'learning_rate': 1.9658e-05, 'epoch': 1.71, 'throughput': 9999.55} [INFO|2025-03-20 12:47:08] logging.py:143 >> {'loss': 0.4322, 'learning_rate': 1.9645e-05, 'epoch': 1.72, 'throughput': 9999.47} [INFO|2025-03-20 12:47:47] logging.py:143 >> {'loss': 0.4554, 'learning_rate': 1.9631e-05, 'epoch': 1.72, 'throughput': 9999.48} [INFO|2025-03-20 12:48:27] logging.py:143 >> {'loss': 0.4228, 'learning_rate': 1.9617e-05, 'epoch': 1.72, 'throughput': 9999.49} [INFO|2025-03-20 12:49:08] logging.py:143 >> {'loss': 0.4358, 'learning_rate': 1.9604e-05, 'epoch': 1.72, 'throughput': 9999.50} [INFO|2025-03-20 12:49:49] logging.py:143 >> {'loss': 0.3992, 'learning_rate': 1.9590e-05, 'epoch': 1.72, 'throughput': 9999.43} [INFO|2025-03-20 12:50:28] logging.py:143 >> {'loss': 0.4076, 'learning_rate': 1.9576e-05, 'epoch': 1.72, 'throughput': 9999.50} [INFO|2025-03-20 12:51:08] logging.py:143 >> {'loss': 0.4207, 'learning_rate': 1.9563e-05, 'epoch': 1.72, 'throughput': 9999.53} [INFO|2025-03-20 12:51:48] logging.py:143 >> {'loss': 0.3954, 'learning_rate': 1.9549e-05, 'epoch': 1.72, 'throughput': 9999.53} [INFO|2025-03-20 12:52:28] logging.py:143 >> {'loss': 0.4426, 'learning_rate': 1.9535e-05, 'epoch': 1.72, 'throughput': 9999.62} [INFO|2025-03-20 12:53:09] logging.py:143 >> {'loss': 0.4387, 'learning_rate': 1.9522e-05, 'epoch': 1.72, 'throughput': 9999.65} [INFO|2025-03-20 12:53:49] logging.py:143 >> {'loss': 0.4291, 'learning_rate': 1.9508e-05, 'epoch': 1.72, 'throughput': 9999.71} [INFO|2025-03-20 12:54:28] logging.py:143 >> {'loss': 0.4337, 'learning_rate': 1.9494e-05, 'epoch': 1.72, 'throughput': 9999.78} [INFO|2025-03-20 12:55:08] logging.py:143 >> {'loss': 0.4014, 'learning_rate': 1.9481e-05, 'epoch': 1.72, 'throughput': 9999.84} [INFO|2025-03-20 12:55:48] logging.py:143 >> {'loss': 0.4398, 'learning_rate': 1.9467e-05, 'epoch': 1.72, 
'throughput': 9999.84} [INFO|2025-03-20 12:56:28] logging.py:143 >> {'loss': 0.4392, 'learning_rate': 1.9453e-05, 'epoch': 1.72, 'throughput': 9999.87} [INFO|2025-03-20 12:57:08] logging.py:143 >> {'loss': 0.4111, 'learning_rate': 1.9440e-05, 'epoch': 1.72, 'throughput': 9999.85} [INFO|2025-03-20 12:57:48] logging.py:143 >> {'loss': 0.4358, 'learning_rate': 1.9426e-05, 'epoch': 1.72, 'throughput': 9999.89} [INFO|2025-03-20 12:58:29] logging.py:143 >> {'loss': 0.4185, 'learning_rate': 1.9412e-05, 'epoch': 1.72, 'throughput': 9999.82} [INFO|2025-03-20 12:59:10] logging.py:143 >> {'loss': 0.4241, 'learning_rate': 1.9399e-05, 'epoch': 1.72, 'throughput': 9999.79} [INFO|2025-03-20 12:59:51] logging.py:143 >> {'loss': 0.4343, 'learning_rate': 1.9385e-05, 'epoch': 1.73, 'throughput': 9999.73} [INFO|2025-03-20 13:00:32] logging.py:143 >> {'loss': 0.4301, 'learning_rate': 1.9372e-05, 'epoch': 1.73, 'throughput': 9999.75} [INFO|2025-03-20 13:01:12] logging.py:143 >> {'loss': 0.4535, 'learning_rate': 1.9358e-05, 'epoch': 1.73, 'throughput': 9999.74} [INFO|2025-03-20 13:01:52] logging.py:143 >> {'loss': 0.4188, 'learning_rate': 1.9344e-05, 'epoch': 1.73, 'throughput': 9999.79} [INFO|2025-03-20 13:02:32] logging.py:143 >> {'loss': 0.4190, 'learning_rate': 1.9331e-05, 'epoch': 1.73, 'throughput': 9999.78} [INFO|2025-03-20 13:03:13] logging.py:143 >> {'loss': 0.4046, 'learning_rate': 1.9317e-05, 'epoch': 1.73, 'throughput': 9999.81} [INFO|2025-03-20 13:03:53] logging.py:143 >> {'loss': 0.3942, 'learning_rate': 1.9303e-05, 'epoch': 1.73, 'throughput': 9999.86} [INFO|2025-03-20 13:04:33] logging.py:143 >> {'loss': 0.4031, 'learning_rate': 1.9290e-05, 'epoch': 1.73, 'throughput': 9999.90} [INFO|2025-03-20 13:05:13] logging.py:143 >> {'loss': 0.4571, 'learning_rate': 1.9276e-05, 'epoch': 1.73, 'throughput': 9999.96} [INFO|2025-03-20 13:05:54] logging.py:143 >> {'loss': 0.4115, 'learning_rate': 1.9262e-05, 'epoch': 1.73, 'throughput': 9999.94} [INFO|2025-03-20 13:06:35] logging.py:143 
>> {'loss': 0.4067, 'learning_rate': 1.9249e-05, 'epoch': 1.73, 'throughput': 9999.91} [INFO|2025-03-20 13:07:16] logging.py:143 >> {'loss': 0.4399, 'learning_rate': 1.9235e-05, 'epoch': 1.73, 'throughput': 9999.95} [INFO|2025-03-20 13:07:56] logging.py:143 >> {'loss': 0.4326, 'learning_rate': 1.9222e-05, 'epoch': 1.73, 'throughput': 10000.01} [INFO|2025-03-20 13:08:36] logging.py:143 >> {'loss': 0.4158, 'learning_rate': 1.9208e-05, 'epoch': 1.73, 'throughput': 10000.03} [INFO|2025-03-20 13:09:17] logging.py:143 >> {'loss': 0.4410, 'learning_rate': 1.9194e-05, 'epoch': 1.73, 'throughput': 10000.09} [INFO|2025-03-20 13:09:57] logging.py:143 >> {'loss': 0.4174, 'learning_rate': 1.9181e-05, 'epoch': 1.73, 'throughput': 10000.18} [INFO|2025-03-20 13:10:39] logging.py:143 >> {'loss': 0.4497, 'learning_rate': 1.9167e-05, 'epoch': 1.73, 'throughput': 10000.15} [INFO|2025-03-20 13:11:18] logging.py:143 >> {'loss': 0.4290, 'learning_rate': 1.9154e-05, 'epoch': 1.73, 'throughput': 10000.19} [INFO|2025-03-20 13:11:59] logging.py:143 >> {'loss': 0.3919, 'learning_rate': 1.9140e-05, 'epoch': 1.73, 'throughput': 10000.22} [INFO|2025-03-20 13:12:40] logging.py:143 >> {'loss': 0.4091, 'learning_rate': 1.9126e-05, 'epoch': 1.74, 'throughput': 10000.15} [INFO|2025-03-20 13:13:21] logging.py:143 >> {'loss': 0.4209, 'learning_rate': 1.9113e-05, 'epoch': 1.74, 'throughput': 10000.21} [INFO|2025-03-20 13:14:00] logging.py:143 >> {'loss': 0.4402, 'learning_rate': 1.9099e-05, 'epoch': 1.74, 'throughput': 10000.33} [INFO|2025-03-20 13:14:41] logging.py:143 >> {'loss': 0.4031, 'learning_rate': 1.9086e-05, 'epoch': 1.74, 'throughput': 10000.30} [INFO|2025-03-20 13:15:21] logging.py:143 >> {'loss': 0.4274, 'learning_rate': 1.9072e-05, 'epoch': 1.74, 'throughput': 10000.31} [INFO|2025-03-20 13:16:01] logging.py:143 >> {'loss': 0.4200, 'learning_rate': 1.9058e-05, 'epoch': 1.74, 'throughput': 10000.26} [INFO|2025-03-20 13:16:42] logging.py:143 >> {'loss': 0.4305, 'learning_rate': 1.9045e-05, 
'epoch': 1.74, 'throughput': 10000.25} [INFO|2025-03-20 13:17:22] logging.py:143 >> {'loss': 0.4039, 'learning_rate': 1.9031e-05, 'epoch': 1.74, 'throughput': 10000.32} [INFO|2025-03-20 13:18:01] logging.py:143 >> {'loss': 0.3999, 'learning_rate': 1.9018e-05, 'epoch': 1.74, 'throughput': 10000.39} [INFO|2025-03-20 13:18:42] logging.py:143 >> {'loss': 0.4449, 'learning_rate': 1.9004e-05, 'epoch': 1.74, 'throughput': 10000.36} [INFO|2025-03-20 13:19:24] logging.py:143 >> {'loss': 0.4364, 'learning_rate': 1.8990e-05, 'epoch': 1.74, 'throughput': 10000.30} [INFO|2025-03-20 13:20:03] logging.py:143 >> {'loss': 0.3913, 'learning_rate': 1.8977e-05, 'epoch': 1.74, 'throughput': 10000.29} [INFO|2025-03-20 13:20:44] logging.py:143 >> {'loss': 0.4173, 'learning_rate': 1.8963e-05, 'epoch': 1.74, 'throughput': 10000.26} [INFO|2025-03-20 13:21:25] logging.py:143 >> {'loss': 0.4261, 'learning_rate': 1.8950e-05, 'epoch': 1.74, 'throughput': 10000.29} [INFO|2025-03-20 13:22:07] logging.py:143 >> {'loss': 0.4303, 'learning_rate': 1.8936e-05, 'epoch': 1.74, 'throughput': 10000.23} [INFO|2025-03-20 13:22:46] logging.py:143 >> {'loss': 0.4300, 'learning_rate': 1.8923e-05, 'epoch': 1.74, 'throughput': 10000.27} [INFO|2025-03-20 13:23:26] logging.py:143 >> {'loss': 0.4344, 'learning_rate': 1.8909e-05, 'epoch': 1.74, 'throughput': 10000.33} [INFO|2025-03-20 13:24:07] logging.py:143 >> {'loss': 0.4109, 'learning_rate': 1.8895e-05, 'epoch': 1.74, 'throughput': 10000.27} [INFO|2025-03-20 13:24:47] logging.py:143 >> {'loss': 0.4097, 'learning_rate': 1.8882e-05, 'epoch': 1.75, 'throughput': 10000.20} [INFO|2025-03-20 13:25:28] logging.py:143 >> {'loss': 0.4341, 'learning_rate': 1.8868e-05, 'epoch': 1.75, 'throughput': 10000.22} [INFO|2025-03-20 13:26:08] logging.py:143 >> {'loss': 0.4503, 'learning_rate': 1.8855e-05, 'epoch': 1.75, 'throughput': 10000.24} [INFO|2025-03-20 13:26:48] logging.py:143 >> {'loss': 0.4169, 'learning_rate': 1.8841e-05, 'epoch': 1.75, 'throughput': 10000.36} 
[INFO|2025-03-20 13:27:29] logging.py:143 >> {'loss': 0.4186, 'learning_rate': 1.8828e-05, 'epoch': 1.75, 'throughput': 10000.32} [INFO|2025-03-20 13:28:11] logging.py:143 >> {'loss': 0.4310, 'learning_rate': 1.8814e-05, 'epoch': 1.75, 'throughput': 10000.25} [INFO|2025-03-20 13:28:50] logging.py:143 >> {'loss': 0.4438, 'learning_rate': 1.8800e-05, 'epoch': 1.75, 'throughput': 10000.26} [INFO|2025-03-20 13:29:30] logging.py:143 >> {'loss': 0.4272, 'learning_rate': 1.8787e-05, 'epoch': 1.75, 'throughput': 10000.32} [INFO|2025-03-20 13:30:12] logging.py:143 >> {'loss': 0.4044, 'learning_rate': 1.8773e-05, 'epoch': 1.75, 'throughput': 10000.25} [INFO|2025-03-20 13:30:51] logging.py:143 >> {'loss': 0.4412, 'learning_rate': 1.8760e-05, 'epoch': 1.75, 'throughput': 10000.33} [INFO|2025-03-20 13:31:32] logging.py:143 >> {'loss': 0.4180, 'learning_rate': 1.8746e-05, 'epoch': 1.75, 'throughput': 10000.26} [INFO|2025-03-20 13:32:13] logging.py:143 >> {'loss': 0.4439, 'learning_rate': 1.8733e-05, 'epoch': 1.75, 'throughput': 10000.26} [INFO|2025-03-20 13:32:55] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 1.8719e-05, 'epoch': 1.75, 'throughput': 10000.22} [INFO|2025-03-20 13:33:34] logging.py:143 >> {'loss': 0.4024, 'learning_rate': 1.8706e-05, 'epoch': 1.75, 'throughput': 10000.22} [INFO|2025-03-20 13:34:15] logging.py:143 >> {'loss': 0.4052, 'learning_rate': 1.8692e-05, 'epoch': 1.75, 'throughput': 10000.14} [INFO|2025-03-20 13:34:55] logging.py:143 >> {'loss': 0.4337, 'learning_rate': 1.8679e-05, 'epoch': 1.75, 'throughput': 10000.07} [INFO|2025-03-20 13:35:34] logging.py:143 >> {'loss': 0.4361, 'learning_rate': 1.8665e-05, 'epoch': 1.75, 'throughput': 10000.18} [INFO|2025-03-20 13:36:14] logging.py:143 >> {'loss': 0.4360, 'learning_rate': 1.8651e-05, 'epoch': 1.75, 'throughput': 10000.23} [INFO|2025-03-20 13:36:56] logging.py:143 >> {'loss': 0.4180, 'learning_rate': 1.8638e-05, 'epoch': 1.75, 'throughput': 10000.19} [INFO|2025-03-20 13:37:37] logging.py:143 >> 
{'loss': 0.4105, 'learning_rate': 1.8624e-05, 'epoch': 1.76, 'throughput': 10000.15} [INFO|2025-03-20 13:38:16] logging.py:143 >> {'loss': 0.4271, 'learning_rate': 1.8611e-05, 'epoch': 1.76, 'throughput': 10000.16} [INFO|2025-03-20 13:38:58] logging.py:143 >> {'loss': 0.4327, 'learning_rate': 1.8597e-05, 'epoch': 1.76, 'throughput': 10000.10} [INFO|2025-03-20 13:39:38] logging.py:143 >> {'loss': 0.4269, 'learning_rate': 1.8584e-05, 'epoch': 1.76, 'throughput': 10000.14} [INFO|2025-03-20 13:40:18] logging.py:143 >> {'loss': 0.4210, 'learning_rate': 1.8570e-05, 'epoch': 1.76, 'throughput': 10000.12} [INFO|2025-03-20 13:40:59] logging.py:143 >> {'loss': 0.3896, 'learning_rate': 1.8557e-05, 'epoch': 1.76, 'throughput': 10000.10} [INFO|2025-03-20 13:41:41] logging.py:143 >> {'loss': 0.3913, 'learning_rate': 1.8543e-05, 'epoch': 1.76, 'throughput': 10000.04} [INFO|2025-03-20 13:42:23] logging.py:143 >> {'loss': 0.4115, 'learning_rate': 1.8530e-05, 'epoch': 1.76, 'throughput': 10000.03} [INFO|2025-03-20 13:43:03] logging.py:143 >> {'loss': 0.4220, 'learning_rate': 1.8516e-05, 'epoch': 1.76, 'throughput': 10000.07} [INFO|2025-03-20 13:43:43] logging.py:143 >> {'loss': 0.3836, 'learning_rate': 1.8503e-05, 'epoch': 1.76, 'throughput': 10000.06} [INFO|2025-03-20 13:44:23] logging.py:143 >> {'loss': 0.4318, 'learning_rate': 1.8489e-05, 'epoch': 1.76, 'throughput': 10000.00} [INFO|2025-03-20 13:45:04] logging.py:143 >> {'loss': 0.4121, 'learning_rate': 1.8476e-05, 'epoch': 1.76, 'throughput': 9999.92} [INFO|2025-03-20 13:45:46] logging.py:143 >> {'loss': 0.3983, 'learning_rate': 1.8462e-05, 'epoch': 1.76, 'throughput': 9999.83} [INFO|2025-03-20 13:46:27] logging.py:143 >> {'loss': 0.4410, 'learning_rate': 1.8449e-05, 'epoch': 1.76, 'throughput': 9999.84} [INFO|2025-03-20 13:47:07] logging.py:143 >> {'loss': 0.4434, 'learning_rate': 1.8435e-05, 'epoch': 1.76, 'throughput': 9999.87} [INFO|2025-03-20 13:47:48] logging.py:143 >> {'loss': 0.4369, 'learning_rate': 1.8422e-05, 
'epoch': 1.76, 'throughput': 9999.87} [INFO|2025-03-20 13:48:28] logging.py:143 >> {'loss': 0.4340, 'learning_rate': 1.8408e-05, 'epoch': 1.76, 'throughput': 9999.94} [INFO|2025-03-20 13:49:08] logging.py:143 >> {'loss': 0.4264, 'learning_rate': 1.8395e-05, 'epoch': 1.76, 'throughput': 10000.01} [INFO|2025-03-20 13:49:49] logging.py:143 >> {'loss': 0.4190, 'learning_rate': 1.8381e-05, 'epoch': 1.76, 'throughput': 9999.99} [INFO|2025-03-20 13:50:29] logging.py:143 >> {'loss': 0.4173, 'learning_rate': 1.8368e-05, 'epoch': 1.77, 'throughput': 10000.01} [INFO|2025-03-20 13:51:09] logging.py:143 >> {'loss': 0.4172, 'learning_rate': 1.8354e-05, 'epoch': 1.77, 'throughput': 10000.04} [INFO|2025-03-20 13:51:51] logging.py:143 >> {'loss': 0.4024, 'learning_rate': 1.8341e-05, 'epoch': 1.77, 'throughput': 10000.03} [INFO|2025-03-20 13:52:31] logging.py:143 >> {'loss': 0.4008, 'learning_rate': 1.8327e-05, 'epoch': 1.77, 'throughput': 9999.96} [INFO|2025-03-20 13:53:12] logging.py:143 >> {'loss': 0.4329, 'learning_rate': 1.8314e-05, 'epoch': 1.77, 'throughput': 9999.96} [INFO|2025-03-20 13:53:52] logging.py:143 >> {'loss': 0.4155, 'learning_rate': 1.8300e-05, 'epoch': 1.77, 'throughput': 10000.05} [INFO|2025-03-20 13:54:34] logging.py:143 >> {'loss': 0.4098, 'learning_rate': 1.8287e-05, 'epoch': 1.77, 'throughput': 9999.95} [INFO|2025-03-20 13:55:14] logging.py:143 >> {'loss': 0.4303, 'learning_rate': 1.8273e-05, 'epoch': 1.77, 'throughput': 9999.96} [INFO|2025-03-20 13:55:54] logging.py:143 >> {'loss': 0.4343, 'learning_rate': 1.8260e-05, 'epoch': 1.77, 'throughput': 10000.04} [INFO|2025-03-20 13:56:34] logging.py:143 >> {'loss': 0.4309, 'learning_rate': 1.8246e-05, 'epoch': 1.77, 'throughput': 10000.06} [INFO|2025-03-20 13:57:13] logging.py:143 >> {'loss': 0.4280, 'learning_rate': 1.8233e-05, 'epoch': 1.77, 'throughput': 10000.17} [INFO|2025-03-20 13:57:54] logging.py:143 >> {'loss': 0.4419, 'learning_rate': 1.8219e-05, 'epoch': 1.77, 'throughput': 10000.16} [INFO|2025-03-20 
13:58:34] logging.py:143 >> {'loss': 0.4434, 'learning_rate': 1.8206e-05, 'epoch': 1.77, 'throughput': 10000.16} [INFO|2025-03-20 13:59:16] logging.py:143 >> {'loss': 0.4462, 'learning_rate': 1.8192e-05, 'epoch': 1.77, 'throughput': 10000.18} [INFO|2025-03-20 13:59:56] logging.py:143 >> {'loss': 0.3879, 'learning_rate': 1.8179e-05, 'epoch': 1.77, 'throughput': 10000.22} [INFO|2025-03-20 14:00:36] logging.py:143 >> {'loss': 0.4329, 'learning_rate': 1.8166e-05, 'epoch': 1.77, 'throughput': 10000.23} [INFO|2025-03-20 14:01:17] logging.py:143 >> {'loss': 0.4326, 'learning_rate': 1.8152e-05, 'epoch': 1.77, 'throughput': 10000.16} [INFO|2025-03-20 14:01:58] logging.py:143 >> {'loss': 0.4456, 'learning_rate': 1.8139e-05, 'epoch': 1.77, 'throughput': 10000.17} [INFO|2025-03-20 14:02:39] logging.py:143 >> {'loss': 0.4073, 'learning_rate': 1.8125e-05, 'epoch': 1.77, 'throughput': 10000.18} [INFO|2025-03-20 14:03:20] logging.py:143 >> {'loss': 0.4429, 'learning_rate': 1.8112e-05, 'epoch': 1.78, 'throughput': 10000.15} [INFO|2025-03-20 14:04:00] logging.py:143 >> {'loss': 0.4405, 'learning_rate': 1.8098e-05, 'epoch': 1.78, 'throughput': 10000.18} [INFO|2025-03-20 14:04:41] logging.py:143 >> {'loss': 0.4077, 'learning_rate': 1.8085e-05, 'epoch': 1.78, 'throughput': 10000.17} [INFO|2025-03-20 14:05:22] logging.py:143 >> {'loss': 0.4125, 'learning_rate': 1.8071e-05, 'epoch': 1.78, 'throughput': 10000.26} [INFO|2025-03-20 14:06:02] logging.py:143 >> {'loss': 0.4258, 'learning_rate': 1.8058e-05, 'epoch': 1.78, 'throughput': 10000.26} [INFO|2025-03-20 14:06:42] logging.py:143 >> {'loss': 0.3969, 'learning_rate': 1.8044e-05, 'epoch': 1.78, 'throughput': 10000.17} [INFO|2025-03-20 14:07:24] logging.py:143 >> {'loss': 0.4577, 'learning_rate': 1.8031e-05, 'epoch': 1.78, 'throughput': 10000.10} [INFO|2025-03-20 14:08:03] logging.py:143 >> {'loss': 0.3946, 'learning_rate': 1.8018e-05, 'epoch': 1.78, 'throughput': 10000.15} [INFO|2025-03-20 14:08:44] logging.py:143 >> {'loss': 0.4151, 
'learning_rate': 1.8004e-05, 'epoch': 1.78, 'throughput': 10000.20} [INFO|2025-03-20 14:09:24] logging.py:143 >> {'loss': 0.4118, 'learning_rate': 1.7991e-05, 'epoch': 1.78, 'throughput': 10000.19} [INFO|2025-03-20 14:10:04] logging.py:143 >> {'loss': 0.4154, 'learning_rate': 1.7977e-05, 'epoch': 1.78, 'throughput': 10000.29} [INFO|2025-03-20 14:10:44] logging.py:143 >> {'loss': 0.4146, 'learning_rate': 1.7964e-05, 'epoch': 1.78, 'throughput': 10000.29} [INFO|2025-03-20 14:11:23] logging.py:143 >> {'loss': 0.4048, 'learning_rate': 1.7950e-05, 'epoch': 1.78, 'throughput': 10000.26} [INFO|2025-03-20 14:12:03] logging.py:143 >> {'loss': 0.4295, 'learning_rate': 1.7937e-05, 'epoch': 1.78, 'throughput': 10000.33} [INFO|2025-03-20 14:12:44] logging.py:143 >> {'loss': 0.4190, 'learning_rate': 1.7924e-05, 'epoch': 1.78, 'throughput': 10000.31} [INFO|2025-03-20 14:13:25] logging.py:143 >> {'loss': 0.3948, 'learning_rate': 1.7910e-05, 'epoch': 1.78, 'throughput': 10000.22} [INFO|2025-03-20 14:14:04] logging.py:143 >> {'loss': 0.3778, 'learning_rate': 1.7897e-05, 'epoch': 1.78, 'throughput': 10000.21} [INFO|2025-03-20 14:14:45] logging.py:143 >> {'loss': 0.3996, 'learning_rate': 1.7883e-05, 'epoch': 1.78, 'throughput': 10000.12} [INFO|2025-03-20 14:15:26] logging.py:143 >> {'loss': 0.4058, 'learning_rate': 1.7870e-05, 'epoch': 1.78, 'throughput': 10000.14} [INFO|2025-03-20 14:16:05] logging.py:143 >> {'loss': 0.3936, 'learning_rate': 1.7857e-05, 'epoch': 1.79, 'throughput': 10000.25} [INFO|2025-03-20 14:16:44] logging.py:143 >> {'loss': 0.4361, 'learning_rate': 1.7843e-05, 'epoch': 1.79, 'throughput': 10000.32} [INFO|2025-03-20 14:17:26] logging.py:143 >> {'loss': 0.4078, 'learning_rate': 1.7830e-05, 'epoch': 1.79, 'throughput': 10000.31} [INFO|2025-03-20 14:18:06] logging.py:143 >> {'loss': 0.4224, 'learning_rate': 1.7816e-05, 'epoch': 1.79, 'throughput': 10000.33} [INFO|2025-03-20 14:18:45] logging.py:143 >> {'loss': 0.4284, 'learning_rate': 1.7803e-05, 'epoch': 1.79, 
'throughput': 10000.40} [INFO|2025-03-20 14:19:25] logging.py:143 >> {'loss': 0.4136, 'learning_rate': 1.7790e-05, 'epoch': 1.79, 'throughput': 10000.45} [INFO|2025-03-20 14:20:05] logging.py:143 >> {'loss': 0.4443, 'learning_rate': 1.7776e-05, 'epoch': 1.79, 'throughput': 10000.53} [INFO|2025-03-20 14:20:46] logging.py:143 >> {'loss': 0.3998, 'learning_rate': 1.7763e-05, 'epoch': 1.79, 'throughput': 10000.54} [INFO|2025-03-20 14:21:27] logging.py:143 >> {'loss': 0.4126, 'learning_rate': 1.7749e-05, 'epoch': 1.79, 'throughput': 10000.49} [INFO|2025-03-20 14:22:07] logging.py:143 >> {'loss': 0.4225, 'learning_rate': 1.7736e-05, 'epoch': 1.79, 'throughput': 10000.49} [INFO|2025-03-20 14:22:47] logging.py:143 >> {'loss': 0.3883, 'learning_rate': 1.7723e-05, 'epoch': 1.79, 'throughput': 10000.43} [INFO|2025-03-20 14:23:27] logging.py:143 >> {'loss': 0.4278, 'learning_rate': 1.7709e-05, 'epoch': 1.79, 'throughput': 10000.47} [INFO|2025-03-20 14:24:06] logging.py:143 >> {'loss': 0.4127, 'learning_rate': 1.7696e-05, 'epoch': 1.79, 'throughput': 10000.61} [INFO|2025-03-20 14:24:46] logging.py:143 >> {'loss': 0.4022, 'learning_rate': 1.7682e-05, 'epoch': 1.79, 'throughput': 10000.64} [INFO|2025-03-20 14:25:25] logging.py:143 >> {'loss': 0.4218, 'learning_rate': 1.7669e-05, 'epoch': 1.79, 'throughput': 10000.72} [INFO|2025-03-20 14:26:05] logging.py:143 >> {'loss': 0.4022, 'learning_rate': 1.7656e-05, 'epoch': 1.79, 'throughput': 10000.71} [INFO|2025-03-20 14:26:46] logging.py:143 >> {'loss': 0.4217, 'learning_rate': 1.7642e-05, 'epoch': 1.79, 'throughput': 10000.80} [INFO|2025-03-20 14:27:27] logging.py:143 >> {'loss': 0.4411, 'learning_rate': 1.7629e-05, 'epoch': 1.79, 'throughput': 10000.81} [INFO|2025-03-20 14:28:07] logging.py:143 >> {'loss': 0.4185, 'learning_rate': 1.7616e-05, 'epoch': 1.79, 'throughput': 10000.80} [INFO|2025-03-20 14:28:49] logging.py:143 >> {'loss': 0.4328, 'learning_rate': 1.7602e-05, 'epoch': 1.80, 'throughput': 10000.75} [INFO|2025-03-20 
14:29:29] logging.py:143 >> {'loss': 0.4309, 'learning_rate': 1.7589e-05, 'epoch': 1.80, 'throughput': 10000.84} [INFO|2025-03-20 14:30:10] logging.py:143 >> {'loss': 0.4234, 'learning_rate': 1.7575e-05, 'epoch': 1.80, 'throughput': 10000.84} [INFO|2025-03-20 14:30:51] logging.py:143 >> {'loss': 0.4259, 'learning_rate': 1.7562e-05, 'epoch': 1.80, 'throughput': 10000.87} [INFO|2025-03-20 14:31:31] logging.py:143 >> {'loss': 0.4294, 'learning_rate': 1.7549e-05, 'epoch': 1.80, 'throughput': 10000.88} [INFO|2025-03-20 14:32:09] logging.py:143 >> {'loss': 0.4349, 'learning_rate': 1.7535e-05, 'epoch': 1.80, 'throughput': 10000.96} [INFO|2025-03-20 14:32:51] logging.py:143 >> {'loss': 0.4143, 'learning_rate': 1.7522e-05, 'epoch': 1.80, 'throughput': 10000.95} [INFO|2025-03-20 14:33:30] logging.py:143 >> {'loss': 0.3921, 'learning_rate': 1.7509e-05, 'epoch': 1.80, 'throughput': 10000.99} [INFO|2025-03-20 14:34:12] logging.py:143 >> {'loss': 0.4082, 'learning_rate': 1.7495e-05, 'epoch': 1.80, 'throughput': 10001.00} [INFO|2025-03-20 14:34:53] logging.py:143 >> {'loss': 0.4471, 'learning_rate': 1.7482e-05, 'epoch': 1.80, 'throughput': 10000.92} [INFO|2025-03-20 14:35:33] logging.py:143 >> {'loss': 0.4055, 'learning_rate': 1.7469e-05, 'epoch': 1.80, 'throughput': 10000.89} [INFO|2025-03-20 14:36:14] logging.py:143 >> {'loss': 0.4388, 'learning_rate': 1.7455e-05, 'epoch': 1.80, 'throughput': 10000.89} [INFO|2025-03-20 14:36:55] logging.py:143 >> {'loss': 0.4102, 'learning_rate': 1.7442e-05, 'epoch': 1.80, 'throughput': 10000.81} [INFO|2025-03-20 14:37:35] logging.py:143 >> {'loss': 0.3981, 'learning_rate': 1.7429e-05, 'epoch': 1.80, 'throughput': 10000.90} [INFO|2025-03-20 14:38:15] logging.py:143 >> {'loss': 0.4002, 'learning_rate': 1.7415e-05, 'epoch': 1.80, 'throughput': 10000.90} [INFO|2025-03-20 14:38:55] logging.py:143 >> {'loss': 0.4106, 'learning_rate': 1.7402e-05, 'epoch': 1.80, 'throughput': 10000.86} [INFO|2025-03-20 14:39:33] logging.py:143 >> {'loss': 0.4175, 
'learning_rate': 1.7389e-05, 'epoch': 1.80, 'throughput': 10000.94} [INFO|2025-03-20 14:40:14] logging.py:143 >> {'loss': 0.4163, 'learning_rate': 1.7375e-05, 'epoch': 1.80, 'throughput': 10001.03} [INFO|2025-03-20 14:40:53] logging.py:143 >> {'loss': 0.4155, 'learning_rate': 1.7362e-05, 'epoch': 1.81, 'throughput': 10001.05} [INFO|2025-03-20 14:41:34] logging.py:143 >> {'loss': 0.4187, 'learning_rate': 1.7349e-05, 'epoch': 1.81, 'throughput': 10001.09} [INFO|2025-03-20 14:42:14] logging.py:143 >> {'loss': 0.4221, 'learning_rate': 1.7335e-05, 'epoch': 1.81, 'throughput': 10001.09} [INFO|2025-03-20 14:42:54] logging.py:143 >> {'loss': 0.4237, 'learning_rate': 1.7322e-05, 'epoch': 1.81, 'throughput': 10001.12} [INFO|2025-03-20 14:43:33] logging.py:143 >> {'loss': 0.4277, 'learning_rate': 1.7309e-05, 'epoch': 1.81, 'throughput': 10001.19} [INFO|2025-03-20 14:44:15] logging.py:143 >> {'loss': 0.4217, 'learning_rate': 1.7295e-05, 'epoch': 1.81, 'throughput': 10001.10} [INFO|2025-03-20 14:44:56] logging.py:143 >> {'loss': 0.3899, 'learning_rate': 1.7282e-05, 'epoch': 1.81, 'throughput': 10001.04} [INFO|2025-03-20 14:45:36] logging.py:143 >> {'loss': 0.4118, 'learning_rate': 1.7269e-05, 'epoch': 1.81, 'throughput': 10001.03} [INFO|2025-03-20 14:46:16] logging.py:143 >> {'loss': 0.4466, 'learning_rate': 1.7255e-05, 'epoch': 1.81, 'throughput': 10000.98} [INFO|2025-03-20 14:46:56] logging.py:143 >> {'loss': 0.4282, 'learning_rate': 1.7242e-05, 'epoch': 1.81, 'throughput': 10000.98} [INFO|2025-03-20 14:47:36] logging.py:143 >> {'loss': 0.3978, 'learning_rate': 1.7229e-05, 'epoch': 1.81, 'throughput': 10000.96} [INFO|2025-03-20 14:48:17] logging.py:143 >> {'loss': 0.4365, 'learning_rate': 1.7216e-05, 'epoch': 1.81, 'throughput': 10000.97} [INFO|2025-03-20 14:48:57] logging.py:143 >> {'loss': 0.3927, 'learning_rate': 1.7202e-05, 'epoch': 1.81, 'throughput': 10000.89} [INFO|2025-03-20 14:49:37] logging.py:143 >> {'loss': 0.3970, 'learning_rate': 1.7189e-05, 'epoch': 1.81, 
'throughput': 10000.81} [INFO|2025-03-20 14:50:17] logging.py:143 >> {'loss': 0.4269, 'learning_rate': 1.7176e-05, 'epoch': 1.81, 'throughput': 10000.83} [INFO|2025-03-20 14:50:56] logging.py:143 >> {'loss': 0.4156, 'learning_rate': 1.7162e-05, 'epoch': 1.81, 'throughput': 10000.92} [INFO|2025-03-20 14:51:39] logging.py:143 >> {'loss': 0.4482, 'learning_rate': 1.7149e-05, 'epoch': 1.81, 'throughput': 10000.87} [INFO|2025-03-20 14:52:19] logging.py:143 >> {'loss': 0.4156, 'learning_rate': 1.7136e-05, 'epoch': 1.81, 'throughput': 10000.85} [INFO|2025-03-20 14:53:01] logging.py:143 >> {'loss': 0.4426, 'learning_rate': 1.7122e-05, 'epoch': 1.81, 'throughput': 10000.88} [INFO|2025-03-20 14:53:42] logging.py:143 >> {'loss': 0.4060, 'learning_rate': 1.7109e-05, 'epoch': 1.82, 'throughput': 10000.77} [INFO|2025-03-20 14:54:22] logging.py:143 >> {'loss': 0.4071, 'learning_rate': 1.7096e-05, 'epoch': 1.82, 'throughput': 10000.74} [INFO|2025-03-20 14:55:03] logging.py:143 >> {'loss': 0.4302, 'learning_rate': 1.7083e-05, 'epoch': 1.82, 'throughput': 10000.72} [INFO|2025-03-20 14:55:43] logging.py:143 >> {'loss': 0.3977, 'learning_rate': 1.7069e-05, 'epoch': 1.82, 'throughput': 10000.74} [INFO|2025-03-20 14:56:23] logging.py:143 >> {'loss': 0.3878, 'learning_rate': 1.7056e-05, 'epoch': 1.82, 'throughput': 10000.73} [INFO|2025-03-20 14:57:03] logging.py:143 >> {'loss': 0.4454, 'learning_rate': 1.7043e-05, 'epoch': 1.82, 'throughput': 10000.65} [INFO|2025-03-20 14:57:42] logging.py:143 >> {'loss': 0.4261, 'learning_rate': 1.7030e-05, 'epoch': 1.82, 'throughput': 10000.71} [INFO|2025-03-20 14:58:22] logging.py:143 >> {'loss': 0.4125, 'learning_rate': 1.7016e-05, 'epoch': 1.82, 'throughput': 10000.71} [INFO|2025-03-20 14:59:03] logging.py:143 >> {'loss': 0.4334, 'learning_rate': 1.7003e-05, 'epoch': 1.82, 'throughput': 10000.67} [INFO|2025-03-20 14:59:45] logging.py:143 >> {'loss': 0.4283, 'learning_rate': 1.6990e-05, 'epoch': 1.82, 'throughput': 10000.67} [INFO|2025-03-20 
15:00:26] logging.py:143 >> {'loss': 0.4087, 'learning_rate': 1.6977e-05, 'epoch': 1.82, 'throughput': 10000.62} [INFO|2025-03-20 15:01:06] logging.py:143 >> {'loss': 0.3999, 'learning_rate': 1.6963e-05, 'epoch': 1.82, 'throughput': 10000.68} [INFO|2025-03-20 15:01:47] logging.py:143 >> {'loss': 0.4041, 'learning_rate': 1.6950e-05, 'epoch': 1.82, 'throughput': 10000.65} [INFO|2025-03-20 15:02:28] logging.py:143 >> {'loss': 0.4137, 'learning_rate': 1.6937e-05, 'epoch': 1.82, 'throughput': 10000.64} [INFO|2025-03-20 15:03:08] logging.py:143 >> {'loss': 0.4199, 'learning_rate': 1.6924e-05, 'epoch': 1.82, 'throughput': 10000.63} [INFO|2025-03-20 15:03:48] logging.py:143 >> {'loss': 0.4157, 'learning_rate': 1.6910e-05, 'epoch': 1.82, 'throughput': 10000.64} [INFO|2025-03-20 15:04:28] logging.py:143 >> {'loss': 0.4379, 'learning_rate': 1.6897e-05, 'epoch': 1.82, 'throughput': 10000.60} [INFO|2025-03-20 15:05:07] logging.py:143 >> {'loss': 0.4108, 'learning_rate': 1.6884e-05, 'epoch': 1.82, 'throughput': 10000.71} [INFO|2025-03-20 15:05:48] logging.py:143 >> {'loss': 0.4005, 'learning_rate': 1.6871e-05, 'epoch': 1.82, 'throughput': 10000.71} [INFO|2025-03-20 15:06:28] logging.py:143 >> {'loss': 0.4156, 'learning_rate': 1.6857e-05, 'epoch': 1.83, 'throughput': 10000.80} [INFO|2025-03-20 15:07:08] logging.py:143 >> {'loss': 0.4374, 'learning_rate': 1.6844e-05, 'epoch': 1.83, 'throughput': 10000.82} [INFO|2025-03-20 15:07:49] logging.py:143 >> {'loss': 0.4016, 'learning_rate': 1.6831e-05, 'epoch': 1.83, 'throughput': 10000.80} [INFO|2025-03-20 15:08:28] logging.py:143 >> {'loss': 0.4355, 'learning_rate': 1.6818e-05, 'epoch': 1.83, 'throughput': 10000.86} [INFO|2025-03-20 15:09:09] logging.py:143 >> {'loss': 0.4332, 'learning_rate': 1.6804e-05, 'epoch': 1.83, 'throughput': 10000.89} [INFO|2025-03-20 15:09:50] logging.py:143 >> {'loss': 0.4222, 'learning_rate': 1.6791e-05, 'epoch': 1.83, 'throughput': 10000.89} [INFO|2025-03-20 15:10:29] logging.py:143 >> {'loss': 0.4260, 
'learning_rate': 1.6778e-05, 'epoch': 1.83, 'throughput': 10000.95} [INFO|2025-03-20 15:11:09] logging.py:143 >> {'loss': 0.4038, 'learning_rate': 1.6765e-05, 'epoch': 1.83, 'throughput': 10000.95} [INFO|2025-03-20 15:11:50] logging.py:143 >> {'loss': 0.3994, 'learning_rate': 1.6752e-05, 'epoch': 1.83, 'throughput': 10000.90} [INFO|2025-03-20 15:12:32] logging.py:143 >> {'loss': 0.4326, 'learning_rate': 1.6738e-05, 'epoch': 1.83, 'throughput': 10000.76} [INFO|2025-03-20 15:13:12] logging.py:143 >> {'loss': 0.4261, 'learning_rate': 1.6725e-05, 'epoch': 1.83, 'throughput': 10000.83} [INFO|2025-03-20 15:13:51] logging.py:143 >> {'loss': 0.3913, 'learning_rate': 1.6712e-05, 'epoch': 1.83, 'throughput': 10000.89} [INFO|2025-03-20 15:14:30] logging.py:143 >> {'loss': 0.4053, 'learning_rate': 1.6699e-05, 'epoch': 1.83, 'throughput': 10000.93} [INFO|2025-03-20 15:15:09] logging.py:143 >> {'loss': 0.4242, 'learning_rate': 1.6686e-05, 'epoch': 1.83, 'throughput': 10000.97} [INFO|2025-03-20 15:15:49] logging.py:143 >> {'loss': 0.4062, 'learning_rate': 1.6672e-05, 'epoch': 1.83, 'throughput': 10001.00} [INFO|2025-03-20 15:16:28] logging.py:143 >> {'loss': 0.4159, 'learning_rate': 1.6659e-05, 'epoch': 1.83, 'throughput': 10001.04} [INFO|2025-03-20 15:17:09] logging.py:143 >> {'loss': 0.4414, 'learning_rate': 1.6646e-05, 'epoch': 1.83, 'throughput': 10001.03} [INFO|2025-03-20 15:17:49] logging.py:143 >> {'loss': 0.3844, 'learning_rate': 1.6633e-05, 'epoch': 1.83, 'throughput': 10001.03} [INFO|2025-03-20 15:18:31] logging.py:143 >> {'loss': 0.3968, 'learning_rate': 1.6620e-05, 'epoch': 1.83, 'throughput': 10000.97} [INFO|2025-03-20 15:19:11] logging.py:143 >> {'loss': 0.4126, 'learning_rate': 1.6606e-05, 'epoch': 1.84, 'throughput': 10001.00} [INFO|2025-03-20 15:19:51] logging.py:143 >> {'loss': 0.4352, 'learning_rate': 1.6593e-05, 'epoch': 1.84, 'throughput': 10001.04} [INFO|2025-03-20 15:20:32] logging.py:143 >> {'loss': 0.4095, 'learning_rate': 1.6580e-05, 'epoch': 1.84, 
'throughput': 10001.01} [INFO|2025-03-20 15:21:13] logging.py:143 >> {'loss': 0.3878, 'learning_rate': 1.6567e-05, 'epoch': 1.84, 'throughput': 10001.06} [INFO|2025-03-20 15:21:52] logging.py:143 >> {'loss': 0.3953, 'learning_rate': 1.6554e-05, 'epoch': 1.84, 'throughput': 10001.07} [INFO|2025-03-20 15:22:33] logging.py:143 >> {'loss': 0.4281, 'learning_rate': 1.6541e-05, 'epoch': 1.84, 'throughput': 10001.08} [INFO|2025-03-20 15:23:14] logging.py:143 >> {'loss': 0.3987, 'learning_rate': 1.6527e-05, 'epoch': 1.84, 'throughput': 10001.10} [INFO|2025-03-20 15:23:54] logging.py:143 >> {'loss': 0.4033, 'learning_rate': 1.6514e-05, 'epoch': 1.84, 'throughput': 10001.06} [INFO|2025-03-20 15:24:35] logging.py:143 >> {'loss': 0.4210, 'learning_rate': 1.6501e-05, 'epoch': 1.84, 'throughput': 10001.02} [INFO|2025-03-20 15:25:15] logging.py:143 >> {'loss': 0.4144, 'learning_rate': 1.6488e-05, 'epoch': 1.84, 'throughput': 10001.05} [INFO|2025-03-20 15:25:54] logging.py:143 >> {'loss': 0.4258, 'learning_rate': 1.6475e-05, 'epoch': 1.84, 'throughput': 10001.11} [INFO|2025-03-20 15:26:34] logging.py:143 >> {'loss': 0.4084, 'learning_rate': 1.6462e-05, 'epoch': 1.84, 'throughput': 10001.09} [INFO|2025-03-20 15:27:14] logging.py:143 >> {'loss': 0.3779, 'learning_rate': 1.6448e-05, 'epoch': 1.84, 'throughput': 10001.13} [INFO|2025-03-20 15:27:55] logging.py:143 >> {'loss': 0.3857, 'learning_rate': 1.6435e-05, 'epoch': 1.84, 'throughput': 10001.08} [INFO|2025-03-20 15:28:36] logging.py:143 >> {'loss': 0.3984, 'learning_rate': 1.6422e-05, 'epoch': 1.84, 'throughput': 10001.09} [INFO|2025-03-20 15:29:18] logging.py:143 >> {'loss': 0.4136, 'learning_rate': 1.6409e-05, 'epoch': 1.84, 'throughput': 10001.09} [INFO|2025-03-20 15:30:00] logging.py:143 >> {'loss': 0.3741, 'learning_rate': 1.6396e-05, 'epoch': 1.84, 'throughput': 10001.01} [INFO|2025-03-20 15:30:40] logging.py:143 >> {'loss': 0.4354, 'learning_rate': 1.6383e-05, 'epoch': 1.84, 'throughput': 10001.06} [INFO|2025-03-20 
15:31:21] logging.py:143 >> {'loss': 0.4019, 'learning_rate': 1.6370e-05, 'epoch': 1.84, 'throughput': 10001.07} [INFO|2025-03-20 15:32:03] logging.py:143 >> {'loss': 0.4042, 'learning_rate': 1.6356e-05, 'epoch': 1.85, 'throughput': 10001.08} [INFO|2025-03-20 15:32:43] logging.py:143 >> {'loss': 0.3888, 'learning_rate': 1.6343e-05, 'epoch': 1.85, 'throughput': 10001.02} [INFO|2025-03-20 15:33:22] logging.py:143 >> {'loss': 0.3965, 'learning_rate': 1.6330e-05, 'epoch': 1.85, 'throughput': 10001.11} [INFO|2025-03-20 15:34:03] logging.py:143 >> {'loss': 0.3993, 'learning_rate': 1.6317e-05, 'epoch': 1.85, 'throughput': 10001.09} [INFO|2025-03-20 15:34:43] logging.py:143 >> {'loss': 0.4040, 'learning_rate': 1.6304e-05, 'epoch': 1.85, 'throughput': 10001.06} [INFO|2025-03-20 15:35:25] logging.py:143 >> {'loss': 0.3842, 'learning_rate': 1.6291e-05, 'epoch': 1.85, 'throughput': 10000.99} [INFO|2025-03-20 15:36:04] logging.py:143 >> {'loss': 0.4032, 'learning_rate': 1.6278e-05, 'epoch': 1.85, 'throughput': 10000.98} [INFO|2025-03-20 15:36:44] logging.py:143 >> {'loss': 0.4242, 'learning_rate': 1.6265e-05, 'epoch': 1.85, 'throughput': 10000.97} [INFO|2025-03-20 15:37:24] logging.py:143 >> {'loss': 0.4019, 'learning_rate': 1.6252e-05, 'epoch': 1.85, 'throughput': 10001.07} [INFO|2025-03-20 15:38:05] logging.py:143 >> {'loss': 0.3972, 'learning_rate': 1.6238e-05, 'epoch': 1.85, 'throughput': 10001.07} [INFO|2025-03-20 15:38:46] logging.py:143 >> {'loss': 0.4144, 'learning_rate': 1.6225e-05, 'epoch': 1.85, 'throughput': 10001.02} [INFO|2025-03-20 15:39:27] logging.py:143 >> {'loss': 0.4261, 'learning_rate': 1.6212e-05, 'epoch': 1.85, 'throughput': 10001.00} [INFO|2025-03-20 15:40:07] logging.py:143 >> {'loss': 0.3855, 'learning_rate': 1.6199e-05, 'epoch': 1.85, 'throughput': 10001.02} [INFO|2025-03-20 15:40:46] logging.py:143 >> {'loss': 0.4208, 'learning_rate': 1.6186e-05, 'epoch': 1.85, 'throughput': 10001.12} [INFO|2025-03-20 15:41:26] logging.py:143 >> {'loss': 0.4136, 
'learning_rate': 1.6173e-05, 'epoch': 1.85, 'throughput': 10001.17} [INFO|2025-03-20 15:42:07] logging.py:143 >> {'loss': 0.4171, 'learning_rate': 1.6160e-05, 'epoch': 1.85, 'throughput': 10001.13} [INFO|2025-03-20 15:42:47] logging.py:143 >> {'loss': 0.4035, 'learning_rate': 1.6147e-05, 'epoch': 1.85, 'throughput': 10001.19} [INFO|2025-03-20 15:43:29] logging.py:143 >> {'loss': 0.4170, 'learning_rate': 1.6134e-05, 'epoch': 1.85, 'throughput': 10001.20} [INFO|2025-03-20 15:44:09] logging.py:143 >> {'loss': 0.4004, 'learning_rate': 1.6121e-05, 'epoch': 1.85, 'throughput': 10001.22} [INFO|2025-03-20 15:44:48] logging.py:143 >> {'loss': 0.4039, 'learning_rate': 1.6108e-05, 'epoch': 1.86, 'throughput': 10001.29} [INFO|2025-03-20 15:45:28] logging.py:143 >> {'loss': 0.4158, 'learning_rate': 1.6094e-05, 'epoch': 1.86, 'throughput': 10001.34} [INFO|2025-03-20 15:46:07] logging.py:143 >> {'loss': 0.4169, 'learning_rate': 1.6081e-05, 'epoch': 1.86, 'throughput': 10001.39} [INFO|2025-03-20 15:46:50] logging.py:143 >> {'loss': 0.4044, 'learning_rate': 1.6068e-05, 'epoch': 1.86, 'throughput': 10001.28} [INFO|2025-03-20 15:47:30] logging.py:143 >> {'loss': 0.4029, 'learning_rate': 1.6055e-05, 'epoch': 1.86, 'throughput': 10001.21} [INFO|2025-03-20 15:48:12] logging.py:143 >> {'loss': 0.4358, 'learning_rate': 1.6042e-05, 'epoch': 1.86, 'throughput': 10001.21} [INFO|2025-03-20 15:48:52] logging.py:143 >> {'loss': 0.4527, 'learning_rate': 1.6029e-05, 'epoch': 1.86, 'throughput': 10001.28} [INFO|2025-03-20 15:49:31] logging.py:143 >> {'loss': 0.3995, 'learning_rate': 1.6016e-05, 'epoch': 1.86, 'throughput': 10001.26} [INFO|2025-03-20 15:50:11] logging.py:143 >> {'loss': 0.4372, 'learning_rate': 1.6003e-05, 'epoch': 1.86, 'throughput': 10001.27} [INFO|2025-03-20 15:50:52] logging.py:143 >> {'loss': 0.3972, 'learning_rate': 1.5990e-05, 'epoch': 1.86, 'throughput': 10001.28} [INFO|2025-03-20 15:51:32] logging.py:143 >> {'loss': 0.4080, 'learning_rate': 1.5977e-05, 'epoch': 1.86, 
'throughput': 10001.30} [INFO|2025-03-20 15:52:14] logging.py:143 >> {'loss': 0.4229, 'learning_rate': 1.5964e-05, 'epoch': 1.86, 'throughput': 10001.32} [INFO|2025-03-20 15:52:55] logging.py:143 >> {'loss': 0.4161, 'learning_rate': 1.5951e-05, 'epoch': 1.86, 'throughput': 10001.33} [INFO|2025-03-20 15:53:37] logging.py:143 >> {'loss': 0.3902, 'learning_rate': 1.5938e-05, 'epoch': 1.86, 'throughput': 10001.33} [INFO|2025-03-20 15:54:17] logging.py:143 >> {'loss': 0.4129, 'learning_rate': 1.5925e-05, 'epoch': 1.86, 'throughput': 10001.38} [INFO|2025-03-20 15:54:56] logging.py:143 >> {'loss': 0.3910, 'learning_rate': 1.5912e-05, 'epoch': 1.86, 'throughput': 10001.38} [INFO|2025-03-20 15:55:37] logging.py:143 >> {'loss': 0.4211, 'learning_rate': 1.5899e-05, 'epoch': 1.86, 'throughput': 10001.33} [INFO|2025-03-20 15:56:17] logging.py:143 >> {'loss': 0.4288, 'learning_rate': 1.5886e-05, 'epoch': 1.86, 'throughput': 10001.31} [INFO|2025-03-20 15:56:58] logging.py:143 >> {'loss': 0.4347, 'learning_rate': 1.5873e-05, 'epoch': 1.86, 'throughput': 10001.27} [INFO|2025-03-20 15:57:40] logging.py:143 >> {'loss': 0.4060, 'learning_rate': 1.5860e-05, 'epoch': 1.87, 'throughput': 10001.25} [INFO|2025-03-20 15:58:20] logging.py:143 >> {'loss': 0.3917, 'learning_rate': 1.5847e-05, 'epoch': 1.87, 'throughput': 10001.29} [INFO|2025-03-20 15:58:59] logging.py:143 >> {'loss': 0.4308, 'learning_rate': 1.5833e-05, 'epoch': 1.87, 'throughput': 10001.35} [INFO|2025-03-20 15:59:41] logging.py:143 >> {'loss': 0.4062, 'learning_rate': 1.5820e-05, 'epoch': 1.87, 'throughput': 10001.32} [INFO|2025-03-20 16:00:22] logging.py:143 >> {'loss': 0.4256, 'learning_rate': 1.5807e-05, 'epoch': 1.87, 'throughput': 10001.29} [INFO|2025-03-20 16:01:01] logging.py:143 >> {'loss': 0.3961, 'learning_rate': 1.5794e-05, 'epoch': 1.87, 'throughput': 10001.26} [INFO|2025-03-20 16:01:42] logging.py:143 >> {'loss': 0.4017, 'learning_rate': 1.5781e-05, 'epoch': 1.87, 'throughput': 10001.33} [INFO|2025-03-20 
16:02:22] logging.py:143 >> {'loss': 0.3945, 'learning_rate': 1.5768e-05, 'epoch': 1.87, 'throughput': 10001.38} [INFO|2025-03-20 16:03:02] logging.py:143 >> {'loss': 0.4212, 'learning_rate': 1.5755e-05, 'epoch': 1.87, 'throughput': 10001.41} [INFO|2025-03-20 16:03:43] logging.py:143 >> {'loss': 0.4351, 'learning_rate': 1.5742e-05, 'epoch': 1.87, 'throughput': 10001.41} [INFO|2025-03-20 16:04:23] logging.py:143 >> {'loss': 0.3975, 'learning_rate': 1.5729e-05, 'epoch': 1.87, 'throughput': 10001.43} [INFO|2025-03-20 16:05:02] logging.py:143 >> {'loss': 0.4277, 'learning_rate': 1.5716e-05, 'epoch': 1.87, 'throughput': 10001.54} [INFO|2025-03-20 16:05:44] logging.py:143 >> {'loss': 0.4270, 'learning_rate': 1.5703e-05, 'epoch': 1.87, 'throughput': 10001.50} [INFO|2025-03-20 16:06:23] logging.py:143 >> {'loss': 0.4318, 'learning_rate': 1.5690e-05, 'epoch': 1.87, 'throughput': 10001.52} [INFO|2025-03-20 16:07:04] logging.py:143 >> {'loss': 0.3984, 'learning_rate': 1.5677e-05, 'epoch': 1.87, 'throughput': 10001.52} [INFO|2025-03-20 16:07:43] logging.py:143 >> {'loss': 0.4141, 'learning_rate': 1.5664e-05, 'epoch': 1.87, 'throughput': 10001.59} [INFO|2025-03-20 16:08:23] logging.py:143 >> {'loss': 0.4106, 'learning_rate': 1.5652e-05, 'epoch': 1.87, 'throughput': 10001.66} [INFO|2025-03-20 16:09:04] logging.py:143 >> {'loss': 0.4182, 'learning_rate': 1.5639e-05, 'epoch': 1.87, 'throughput': 10001.63} [INFO|2025-03-20 16:09:45] logging.py:143 >> {'loss': 0.4386, 'learning_rate': 1.5626e-05, 'epoch': 1.88, 'throughput': 10001.66} [INFO|2025-03-20 16:10:25] logging.py:143 >> {'loss': 0.4293, 'learning_rate': 1.5613e-05, 'epoch': 1.88, 'throughput': 10001.71} [INFO|2025-03-20 16:11:05] logging.py:143 >> {'loss': 0.3858, 'learning_rate': 1.5600e-05, 'epoch': 1.88, 'throughput': 10001.64} [INFO|2025-03-20 16:11:47] logging.py:143 >> {'loss': 0.4330, 'learning_rate': 1.5587e-05, 'epoch': 1.88, 'throughput': 10001.63} [INFO|2025-03-20 16:12:29] logging.py:143 >> {'loss': 0.3938, 
'learning_rate': 1.5574e-05, 'epoch': 1.88, 'throughput': 10001.64} [INFO|2025-03-20 16:13:10] logging.py:143 >> {'loss': 0.4174, 'learning_rate': 1.5561e-05, 'epoch': 1.88, 'throughput': 10001.65} [INFO|2025-03-20 16:13:51] logging.py:143 >> {'loss': 0.4309, 'learning_rate': 1.5548e-05, 'epoch': 1.88, 'throughput': 10001.56} [INFO|2025-03-20 16:14:33] logging.py:143 >> {'loss': 0.4000, 'learning_rate': 1.5535e-05, 'epoch': 1.88, 'throughput': 10001.54} [INFO|2025-03-20 16:15:15] logging.py:143 >> {'loss': 0.4022, 'learning_rate': 1.5522e-05, 'epoch': 1.88, 'throughput': 10001.43} [INFO|2025-03-20 16:15:56] logging.py:143 >> {'loss': 0.4352, 'learning_rate': 1.5509e-05, 'epoch': 1.88, 'throughput': 10001.43} [INFO|2025-03-20 16:16:37] logging.py:143 >> {'loss': 0.3952, 'learning_rate': 1.5496e-05, 'epoch': 1.88, 'throughput': 10001.34} [INFO|2025-03-20 16:17:18] logging.py:143 >> {'loss': 0.4457, 'learning_rate': 1.5483e-05, 'epoch': 1.88, 'throughput': 10001.41} [INFO|2025-03-20 16:17:58] logging.py:143 >> {'loss': 0.4325, 'learning_rate': 1.5470e-05, 'epoch': 1.88, 'throughput': 10001.43} [INFO|2025-03-20 16:18:38] logging.py:143 >> {'loss': 0.4216, 'learning_rate': 1.5457e-05, 'epoch': 1.88, 'throughput': 10001.43} [INFO|2025-03-20 16:19:18] logging.py:143 >> {'loss': 0.3927, 'learning_rate': 1.5444e-05, 'epoch': 1.88, 'throughput': 10001.43} [INFO|2025-03-20 16:20:00] logging.py:143 >> {'loss': 0.4128, 'learning_rate': 1.5431e-05, 'epoch': 1.88, 'throughput': 10001.41} [INFO|2025-03-20 16:20:41] logging.py:143 >> {'loss': 0.4090, 'learning_rate': 1.5418e-05, 'epoch': 1.88, 'throughput': 10001.39} [INFO|2025-03-20 16:21:23] logging.py:143 >> {'loss': 0.4412, 'learning_rate': 1.5405e-05, 'epoch': 1.88, 'throughput': 10001.26} [INFO|2025-03-20 16:22:04] logging.py:143 >> {'loss': 0.4330, 'learning_rate': 1.5393e-05, 'epoch': 1.88, 'throughput': 10001.29} [INFO|2025-03-20 16:22:45] logging.py:143 >> {'loss': 0.4148, 'learning_rate': 1.5380e-05, 'epoch': 1.89, 
'throughput': 10001.25} [INFO|2025-03-20 16:23:27] logging.py:143 >> {'loss': 0.4194, 'learning_rate': 1.5367e-05, 'epoch': 1.89, 'throughput': 10001.24} [INFO|2025-03-20 16:24:07] logging.py:143 >> {'loss': 0.4121, 'learning_rate': 1.5354e-05, 'epoch': 1.89, 'throughput': 10001.26} [INFO|2025-03-20 16:24:48] logging.py:143 >> {'loss': 0.3994, 'learning_rate': 1.5341e-05, 'epoch': 1.89, 'throughput': 10001.27} [INFO|2025-03-20 16:25:28] logging.py:143 >> {'loss': 0.4084, 'learning_rate': 1.5328e-05, 'epoch': 1.89, 'throughput': 10001.31} [INFO|2025-03-20 16:26:08] logging.py:143 >> {'loss': 0.4316, 'learning_rate': 1.5315e-05, 'epoch': 1.89, 'throughput': 10001.41} [INFO|2025-03-20 16:26:48] logging.py:143 >> {'loss': 0.3990, 'learning_rate': 1.5302e-05, 'epoch': 1.89, 'throughput': 10001.36} [INFO|2025-03-20 16:27:28] logging.py:143 >> {'loss': 0.3858, 'learning_rate': 1.5289e-05, 'epoch': 1.89, 'throughput': 10001.47} [INFO|2025-03-20 16:28:09] logging.py:143 >> {'loss': 0.4064, 'learning_rate': 1.5276e-05, 'epoch': 1.89, 'throughput': 10001.49} [INFO|2025-03-20 16:28:51] logging.py:143 >> {'loss': 0.4245, 'learning_rate': 1.5263e-05, 'epoch': 1.89, 'throughput': 10001.47} [INFO|2025-03-20 16:29:32] logging.py:143 >> {'loss': 0.3969, 'learning_rate': 1.5251e-05, 'epoch': 1.89, 'throughput': 10001.43} [INFO|2025-03-20 16:30:13] logging.py:143 >> {'loss': 0.3861, 'learning_rate': 1.5238e-05, 'epoch': 1.89, 'throughput': 10001.41} [INFO|2025-03-20 16:30:53] logging.py:143 >> {'loss': 0.3879, 'learning_rate': 1.5225e-05, 'epoch': 1.89, 'throughput': 10001.45} [INFO|2025-03-20 16:31:34] logging.py:143 >> {'loss': 0.4057, 'learning_rate': 1.5212e-05, 'epoch': 1.89, 'throughput': 10001.42} [INFO|2025-03-20 16:32:13] logging.py:143 >> {'loss': 0.3917, 'learning_rate': 1.5199e-05, 'epoch': 1.89, 'throughput': 10001.39} [INFO|2025-03-20 16:32:52] logging.py:143 >> {'loss': 0.4377, 'learning_rate': 1.5186e-05, 'epoch': 1.89, 'throughput': 10001.43} [INFO|2025-03-20 
16:33:31] logging.py:143 >> {'loss': 0.3906, 'learning_rate': 1.5173e-05, 'epoch': 1.89, 'throughput': 10001.51} [INFO|2025-03-20 16:34:12] logging.py:143 >> {'loss': 0.3987, 'learning_rate': 1.5160e-05, 'epoch': 1.89, 'throughput': 10001.57} [INFO|2025-03-20 16:34:53] logging.py:143 >> {'loss': 0.4077, 'learning_rate': 1.5148e-05, 'epoch': 1.89, 'throughput': 10001.49} [INFO|2025-03-20 16:35:32] logging.py:143 >> {'loss': 0.3988, 'learning_rate': 1.5135e-05, 'epoch': 1.90, 'throughput': 10001.55} [INFO|2025-03-20 16:36:12] logging.py:143 >> {'loss': 0.4396, 'learning_rate': 1.5122e-05, 'epoch': 1.90, 'throughput': 10001.60} [INFO|2025-03-20 16:36:51] logging.py:143 >> {'loss': 0.4156, 'learning_rate': 1.5109e-05, 'epoch': 1.90, 'throughput': 10001.66} [INFO|2025-03-20 16:37:32] logging.py:143 >> {'loss': 0.4172, 'learning_rate': 1.5096e-05, 'epoch': 1.90, 'throughput': 10001.70} [INFO|2025-03-20 16:38:13] logging.py:143 >> {'loss': 0.3887, 'learning_rate': 1.5083e-05, 'epoch': 1.90, 'throughput': 10001.67} [INFO|2025-03-20 16:38:53] logging.py:143 >> {'loss': 0.4223, 'learning_rate': 1.5071e-05, 'epoch': 1.90, 'throughput': 10001.76} [INFO|2025-03-20 16:39:35] logging.py:143 >> {'loss': 0.3916, 'learning_rate': 1.5058e-05, 'epoch': 1.90, 'throughput': 10001.68} [INFO|2025-03-20 16:40:14] logging.py:143 >> {'loss': 0.4133, 'learning_rate': 1.5045e-05, 'epoch': 1.90, 'throughput': 10001.71} [INFO|2025-03-20 16:40:56] logging.py:143 >> {'loss': 0.4104, 'learning_rate': 1.5032e-05, 'epoch': 1.90, 'throughput': 10001.67} [INFO|2025-03-20 16:41:38] logging.py:143 >> {'loss': 0.4084, 'learning_rate': 1.5019e-05, 'epoch': 1.90, 'throughput': 10001.66} [INFO|2025-03-20 16:42:18] logging.py:143 >> {'loss': 0.4023, 'learning_rate': 1.5006e-05, 'epoch': 1.90, 'throughput': 10001.70} [INFO|2025-03-20 16:42:58] logging.py:143 >> {'loss': 0.4134, 'learning_rate': 1.4994e-05, 'epoch': 1.90, 'throughput': 10001.70} [INFO|2025-03-20 16:43:39] logging.py:143 >> {'loss': 0.3871, 
'learning_rate': 1.4981e-05, 'epoch': 1.90, 'throughput': 10001.67} [INFO|2025-03-20 16:44:18] logging.py:143 >> {'loss': 0.4027, 'learning_rate': 1.4968e-05, 'epoch': 1.90, 'throughput': 10001.66} [INFO|2025-03-20 16:44:57] logging.py:143 >> {'loss': 0.4013, 'learning_rate': 1.4955e-05, 'epoch': 1.90, 'throughput': 10001.73} [INFO|2025-03-20 16:45:37] logging.py:143 >> {'loss': 0.3976, 'learning_rate': 1.4942e-05, 'epoch': 1.90, 'throughput': 10001.81} [INFO|2025-03-20 16:46:17] logging.py:143 >> {'loss': 0.4231, 'learning_rate': 1.4929e-05, 'epoch': 1.90, 'throughput': 10001.80} [INFO|2025-03-20 16:46:58] logging.py:143 >> {'loss': 0.4268, 'learning_rate': 1.4917e-05, 'epoch': 1.90, 'throughput': 10001.77} [INFO|2025-03-20 16:47:38] logging.py:143 >> {'loss': 0.4101, 'learning_rate': 1.4904e-05, 'epoch': 1.90, 'throughput': 10001.77} [INFO|2025-03-20 16:48:18] logging.py:143 >> {'loss': 0.3914, 'learning_rate': 1.4891e-05, 'epoch': 1.91, 'throughput': 10001.80} [INFO|2025-03-20 16:49:00] logging.py:143 >> {'loss': 0.4050, 'learning_rate': 1.4878e-05, 'epoch': 1.91, 'throughput': 10001.74} [INFO|2025-03-20 16:49:39] logging.py:143 >> {'loss': 0.4204, 'learning_rate': 1.4865e-05, 'epoch': 1.91, 'throughput': 10001.85} [INFO|2025-03-20 16:50:19] logging.py:143 >> {'loss': 0.4294, 'learning_rate': 1.4853e-05, 'epoch': 1.91, 'throughput': 10001.88} [INFO|2025-03-20 16:51:00] logging.py:143 >> {'loss': 0.3716, 'learning_rate': 1.4840e-05, 'epoch': 1.91, 'throughput': 10001.83} [INFO|2025-03-20 16:51:41] logging.py:143 >> {'loss': 0.3954, 'learning_rate': 1.4827e-05, 'epoch': 1.91, 'throughput': 10001.78} [INFO|2025-03-20 16:52:21] logging.py:143 >> {'loss': 0.4132, 'learning_rate': 1.4814e-05, 'epoch': 1.91, 'throughput': 10001.87} [INFO|2025-03-20 16:53:01] logging.py:143 >> {'loss': 0.4213, 'learning_rate': 1.4802e-05, 'epoch': 1.91, 'throughput': 10001.81} [INFO|2025-03-20 16:53:41] logging.py:143 >> {'loss': 0.4220, 'learning_rate': 1.4789e-05, 'epoch': 1.91, 
'throughput': 10001.85} [INFO|2025-03-20 16:54:22] logging.py:143 >> {'loss': 0.3894, 'learning_rate': 1.4776e-05, 'epoch': 1.91, 'throughput': 10001.82} [INFO|2025-03-20 16:55:02] logging.py:143 >> {'loss': 0.4221, 'learning_rate': 1.4763e-05, 'epoch': 1.91, 'throughput': 10001.84} [INFO|2025-03-20 16:55:42] logging.py:143 >> {'loss': 0.4032, 'learning_rate': 1.4750e-05, 'epoch': 1.91, 'throughput': 10001.82} [INFO|2025-03-20 16:56:24] logging.py:143 >> {'loss': 0.4009, 'learning_rate': 1.4738e-05, 'epoch': 1.91, 'throughput': 10001.71} [INFO|2025-03-20 16:57:05] logging.py:143 >> {'loss': 0.4071, 'learning_rate': 1.4725e-05, 'epoch': 1.91, 'throughput': 10001.73} [INFO|2025-03-20 16:57:47] logging.py:143 >> {'loss': 0.4212, 'learning_rate': 1.4712e-05, 'epoch': 1.91, 'throughput': 10001.75} [INFO|2025-03-20 16:58:26] logging.py:143 >> {'loss': 0.3875, 'learning_rate': 1.4699e-05, 'epoch': 1.91, 'throughput': 10001.82} [INFO|2025-03-20 16:59:06] logging.py:143 >> {'loss': 0.4075, 'learning_rate': 1.4687e-05, 'epoch': 1.91, 'throughput': 10001.85} [INFO|2025-03-20 16:59:45] logging.py:143 >> {'loss': 0.4089, 'learning_rate': 1.4674e-05, 'epoch': 1.91, 'throughput': 10001.93} [INFO|2025-03-20 17:00:26] logging.py:143 >> {'loss': 0.4106, 'learning_rate': 1.4661e-05, 'epoch': 1.91, 'throughput': 10001.89} [INFO|2025-03-20 17:01:07] logging.py:143 >> {'loss': 0.3886, 'learning_rate': 1.4648e-05, 'epoch': 1.92, 'throughput': 10001.82} [INFO|2025-03-20 17:01:49] logging.py:143 >> {'loss': 0.4131, 'learning_rate': 1.4636e-05, 'epoch': 1.92, 'throughput': 10001.79} [INFO|2025-03-20 17:02:28] logging.py:143 >> {'loss': 0.4093, 'learning_rate': 1.4623e-05, 'epoch': 1.92, 'throughput': 10001.86} [INFO|2025-03-20 17:03:09] logging.py:143 >> {'loss': 0.3854, 'learning_rate': 1.4610e-05, 'epoch': 1.92, 'throughput': 10001.83} [INFO|2025-03-20 17:03:50] logging.py:143 >> {'loss': 0.4271, 'learning_rate': 1.4598e-05, 'epoch': 1.92, 'throughput': 10001.79} [INFO|2025-03-20 
17:04:29] logging.py:143 >> {'loss': 0.3991, 'learning_rate': 1.4585e-05, 'epoch': 1.92, 'throughput': 10001.80} [INFO|2025-03-20 17:05:11] logging.py:143 >> {'loss': 0.4211, 'learning_rate': 1.4572e-05, 'epoch': 1.92, 'throughput': 10001.69} [INFO|2025-03-20 17:05:51] logging.py:143 >> {'loss': 0.3785, 'learning_rate': 1.4559e-05, 'epoch': 1.92, 'throughput': 10001.69} [INFO|2025-03-20 17:06:33] logging.py:143 >> {'loss': 0.4116, 'learning_rate': 1.4547e-05, 'epoch': 1.92, 'throughput': 10001.75} [INFO|2025-03-20 17:07:13] logging.py:143 >> {'loss': 0.4089, 'learning_rate': 1.4534e-05, 'epoch': 1.92, 'throughput': 10001.74} [INFO|2025-03-20 17:07:52] logging.py:143 >> {'loss': 0.4031, 'learning_rate': 1.4521e-05, 'epoch': 1.92, 'throughput': 10001.80} [INFO|2025-03-20 17:08:32] logging.py:143 >> {'loss': 0.4102, 'learning_rate': 1.4509e-05, 'epoch': 1.92, 'throughput': 10001.82} [INFO|2025-03-20 17:09:12] logging.py:143 >> {'loss': 0.4103, 'learning_rate': 1.4496e-05, 'epoch': 1.92, 'throughput': 10001.76} [INFO|2025-03-20 17:09:54] logging.py:143 >> {'loss': 0.3863, 'learning_rate': 1.4483e-05, 'epoch': 1.92, 'throughput': 10001.79} [INFO|2025-03-20 17:10:34] logging.py:143 >> {'loss': 0.4214, 'learning_rate': 1.4470e-05, 'epoch': 1.92, 'throughput': 10001.81} [INFO|2025-03-20 17:11:15] logging.py:143 >> {'loss': 0.3988, 'learning_rate': 1.4458e-05, 'epoch': 1.92, 'throughput': 10001.82} [INFO|2025-03-20 17:11:55] logging.py:143 >> {'loss': 0.4072, 'learning_rate': 1.4445e-05, 'epoch': 1.92, 'throughput': 10001.84} [INFO|2025-03-20 17:12:37] logging.py:143 >> {'loss': 0.3894, 'learning_rate': 1.4432e-05, 'epoch': 1.92, 'throughput': 10001.77} [INFO|2025-03-20 17:13:17] logging.py:143 >> {'loss': 0.4112, 'learning_rate': 1.4420e-05, 'epoch': 1.92, 'throughput': 10001.76} [INFO|2025-03-20 17:13:58] logging.py:143 >> {'loss': 0.3915, 'learning_rate': 1.4407e-05, 'epoch': 1.93, 'throughput': 10001.69} [INFO|2025-03-20 17:14:38] logging.py:143 >> {'loss': 0.4024, 
'learning_rate': 1.4394e-05, 'epoch': 1.93, 'throughput': 10001.70} [INFO|2025-03-20 17:15:18] logging.py:143 >> {'loss': 0.4303, 'learning_rate': 1.4382e-05, 'epoch': 1.93, 'throughput': 10001.73} [INFO|2025-03-20 17:15:59] logging.py:143 >> {'loss': 0.4180, 'learning_rate': 1.4369e-05, 'epoch': 1.93, 'throughput': 10001.77} [INFO|2025-03-20 17:16:40] logging.py:143 >> {'loss': 0.3863, 'learning_rate': 1.4356e-05, 'epoch': 1.93, 'throughput': 10001.72} [INFO|2025-03-20 17:17:20] logging.py:143 >> {'loss': 0.4142, 'learning_rate': 1.4344e-05, 'epoch': 1.93, 'throughput': 10001.74} [INFO|2025-03-20 17:18:01] logging.py:143 >> {'loss': 0.4202, 'learning_rate': 1.4331e-05, 'epoch': 1.93, 'throughput': 10001.78} [INFO|2025-03-20 17:18:43] logging.py:143 >> {'loss': 0.4174, 'learning_rate': 1.4318e-05, 'epoch': 1.93, 'throughput': 10001.67} [INFO|2025-03-20 17:19:22] logging.py:143 >> {'loss': 0.4135, 'learning_rate': 1.4306e-05, 'epoch': 1.93, 'throughput': 10001.73} [INFO|2025-03-20 17:20:02] logging.py:143 >> {'loss': 0.4703, 'learning_rate': 1.4293e-05, 'epoch': 1.93, 'throughput': 10001.80} [INFO|2025-03-20 17:20:42] logging.py:143 >> {'loss': 0.3711, 'learning_rate': 1.4280e-05, 'epoch': 1.93, 'throughput': 10001.73} [INFO|2025-03-20 17:21:22] logging.py:143 >> {'loss': 0.3861, 'learning_rate': 1.4268e-05, 'epoch': 1.93, 'throughput': 10001.77} [INFO|2025-03-20 17:22:01] logging.py:143 >> {'loss': 0.4025, 'learning_rate': 1.4255e-05, 'epoch': 1.93, 'throughput': 10001.84} [INFO|2025-03-20 17:22:41] logging.py:143 >> {'loss': 0.3897, 'learning_rate': 1.4243e-05, 'epoch': 1.93, 'throughput': 10001.97} [INFO|2025-03-20 17:23:22] logging.py:143 >> {'loss': 0.3934, 'learning_rate': 1.4230e-05, 'epoch': 1.93, 'throughput': 10001.86} [INFO|2025-03-20 17:24:02] logging.py:143 >> {'loss': 0.4134, 'learning_rate': 1.4217e-05, 'epoch': 1.93, 'throughput': 10001.82} [INFO|2025-03-20 17:24:41] logging.py:143 >> {'loss': 0.3900, 'learning_rate': 1.4205e-05, 'epoch': 1.93, 
'throughput': 10001.86} [INFO|2025-03-20 17:25:22] logging.py:143 >> {'loss': 0.3855, 'learning_rate': 1.4192e-05, 'epoch': 1.93, 'throughput': 10001.77} [INFO|2025-03-20 17:26:02] logging.py:143 >> {'loss': 0.3921, 'learning_rate': 1.4179e-05, 'epoch': 1.94, 'throughput': 10001.73} [INFO|2025-03-20 17:26:42] logging.py:143 >> {'loss': 0.4233, 'learning_rate': 1.4167e-05, 'epoch': 1.94, 'throughput': 10001.80} [INFO|2025-03-20 17:27:23] logging.py:143 >> {'loss': 0.4143, 'learning_rate': 1.4154e-05, 'epoch': 1.94, 'throughput': 10001.84} [INFO|2025-03-20 17:28:05] logging.py:143 >> {'loss': 0.3810, 'learning_rate': 1.4142e-05, 'epoch': 1.94, 'throughput': 10001.76} [INFO|2025-03-20 17:28:46] logging.py:143 >> {'loss': 0.4210, 'learning_rate': 1.4129e-05, 'epoch': 1.94, 'throughput': 10001.76} [INFO|2025-03-20 17:29:28] logging.py:143 >> {'loss': 0.3695, 'learning_rate': 1.4116e-05, 'epoch': 1.94, 'throughput': 10001.69} [INFO|2025-03-20 17:30:08] logging.py:143 >> {'loss': 0.3834, 'learning_rate': 1.4104e-05, 'epoch': 1.94, 'throughput': 10001.73} [INFO|2025-03-20 17:30:48] logging.py:143 >> {'loss': 0.3955, 'learning_rate': 1.4091e-05, 'epoch': 1.94, 'throughput': 10001.75} [INFO|2025-03-20 17:31:29] logging.py:143 >> {'loss': 0.4054, 'learning_rate': 1.4079e-05, 'epoch': 1.94, 'throughput': 10001.76} [INFO|2025-03-20 17:32:09] logging.py:143 >> {'loss': 0.3966, 'learning_rate': 1.4066e-05, 'epoch': 1.94, 'throughput': 10001.82} [INFO|2025-03-20 17:32:49] logging.py:143 >> {'loss': 0.4125, 'learning_rate': 1.4053e-05, 'epoch': 1.94, 'throughput': 10001.82} [INFO|2025-03-20 17:33:29] logging.py:143 >> {'loss': 0.3686, 'learning_rate': 1.4041e-05, 'epoch': 1.94, 'throughput': 10001.84} [INFO|2025-03-20 17:34:11] logging.py:143 >> {'loss': 0.3820, 'learning_rate': 1.4028e-05, 'epoch': 1.94, 'throughput': 10001.77} [INFO|2025-03-20 17:34:51] logging.py:143 >> {'loss': 0.4115, 'learning_rate': 1.4016e-05, 'epoch': 1.94, 'throughput': 10001.83} [INFO|2025-03-20 
17:35:33] logging.py:143 >> {'loss': 0.3887, 'learning_rate': 1.4003e-05, 'epoch': 1.94, 'throughput': 10001.77} [INFO|2025-03-20 17:36:13] logging.py:143 >> {'loss': 0.3896, 'learning_rate': 1.3991e-05, 'epoch': 1.94, 'throughput': 10001.78} [INFO|2025-03-20 17:36:53] logging.py:143 >> {'loss': 0.4113, 'learning_rate': 1.3978e-05, 'epoch': 1.94, 'throughput': 10001.78} [INFO|2025-03-20 17:37:34] logging.py:143 >> {'loss': 0.3967, 'learning_rate': 1.3965e-05, 'epoch': 1.94, 'throughput': 10001.78} [INFO|2025-03-20 17:38:15] logging.py:143 >> {'loss': 0.3868, 'learning_rate': 1.3953e-05, 'epoch': 1.94, 'throughput': 10001.72} [INFO|2025-03-20 17:38:54] logging.py:143 >> {'loss': 0.3781, 'learning_rate': 1.3940e-05, 'epoch': 1.95, 'throughput': 10001.78} [INFO|2025-03-20 17:39:34] logging.py:143 >> {'loss': 0.4272, 'learning_rate': 1.3928e-05, 'epoch': 1.95, 'throughput': 10001.82} [INFO|2025-03-20 17:40:14] logging.py:143 >> {'loss': 0.4028, 'learning_rate': 1.3915e-05, 'epoch': 1.95, 'throughput': 10001.78} [INFO|2025-03-20 17:40:55] logging.py:143 >> {'loss': 0.4341, 'learning_rate': 1.3903e-05, 'epoch': 1.95, 'throughput': 10001.83} [INFO|2025-03-20 17:41:36] logging.py:143 >> {'loss': 0.3998, 'learning_rate': 1.3890e-05, 'epoch': 1.95, 'throughput': 10001.76} [INFO|2025-03-20 17:42:17] logging.py:143 >> {'loss': 0.3938, 'learning_rate': 1.3878e-05, 'epoch': 1.95, 'throughput': 10001.74} [INFO|2025-03-20 17:42:59] logging.py:143 >> {'loss': 0.4108, 'learning_rate': 1.3865e-05, 'epoch': 1.95, 'throughput': 10001.77} [INFO|2025-03-20 17:43:40] logging.py:143 >> {'loss': 0.4030, 'learning_rate': 1.3853e-05, 'epoch': 1.95, 'throughput': 10001.70} [INFO|2025-03-20 17:44:21] logging.py:143 >> {'loss': 0.4054, 'learning_rate': 1.3840e-05, 'epoch': 1.95, 'throughput': 10001.66} [INFO|2025-03-20 17:45:02] logging.py:143 >> {'loss': 0.4279, 'learning_rate': 1.3828e-05, 'epoch': 1.95, 'throughput': 10001.62} [INFO|2025-03-20 17:45:41] logging.py:143 >> {'loss': 0.3946, 
'learning_rate': 1.3815e-05, 'epoch': 1.95, 'throughput': 10001.67} [INFO|2025-03-20 17:46:21] logging.py:143 >> {'loss': 0.3666, 'learning_rate': 1.3802e-05, 'epoch': 1.95, 'throughput': 10001.64} [INFO|2025-03-20 17:47:01] logging.py:143 >> {'loss': 0.3761, 'learning_rate': 1.3790e-05, 'epoch': 1.95, 'throughput': 10001.62} [INFO|2025-03-20 17:47:41] logging.py:143 >> {'loss': 0.3928, 'learning_rate': 1.3777e-05, 'epoch': 1.95, 'throughput': 10001.65} [INFO|2025-03-20 17:48:23] logging.py:143 >> {'loss': 0.4211, 'learning_rate': 1.3765e-05, 'epoch': 1.95, 'throughput': 10001.57} [INFO|2025-03-20 17:49:04] logging.py:143 >> {'loss': 0.4256, 'learning_rate': 1.3752e-05, 'epoch': 1.95, 'throughput': 10001.57} [INFO|2025-03-20 17:49:45] logging.py:143 >> {'loss': 0.4113, 'learning_rate': 1.3740e-05, 'epoch': 1.95, 'throughput': 10001.64} [INFO|2025-03-20 17:50:26] logging.py:143 >> {'loss': 0.4020, 'learning_rate': 1.3727e-05, 'epoch': 1.95, 'throughput': 10001.60} [INFO|2025-03-20 17:51:07] logging.py:143 >> {'loss': 0.3946, 'learning_rate': 1.3715e-05, 'epoch': 1.95, 'throughput': 10001.56} [INFO|2025-03-20 17:51:48] logging.py:143 >> {'loss': 0.4331, 'learning_rate': 1.3703e-05, 'epoch': 1.96, 'throughput': 10001.57} [INFO|2025-03-20 17:52:30] logging.py:143 >> {'loss': 0.3904, 'learning_rate': 1.3690e-05, 'epoch': 1.96, 'throughput': 10001.51} [INFO|2025-03-20 17:53:10] logging.py:143 >> {'loss': 0.4123, 'learning_rate': 1.3678e-05, 'epoch': 1.96, 'throughput': 10001.62} [INFO|2025-03-20 17:53:49] logging.py:143 >> {'loss': 0.4214, 'learning_rate': 1.3665e-05, 'epoch': 1.96, 'throughput': 10001.74} [INFO|2025-03-20 17:54:29] logging.py:143 >> {'loss': 0.3976, 'learning_rate': 1.3653e-05, 'epoch': 1.96, 'throughput': 10001.72} [INFO|2025-03-20 17:55:10] logging.py:143 >> {'loss': 0.4162, 'learning_rate': 1.3640e-05, 'epoch': 1.96, 'throughput': 10001.73} [INFO|2025-03-20 17:55:50] logging.py:143 >> {'loss': 0.4032, 'learning_rate': 1.3628e-05, 'epoch': 1.96, 
'throughput': 10001.79} [INFO|2025-03-20 17:56:31] logging.py:143 >> {'loss': 0.4133, 'learning_rate': 1.3615e-05, 'epoch': 1.96, 'throughput': 10001.73} [INFO|2025-03-20 17:57:12] logging.py:143 >> {'loss': 0.4065, 'learning_rate': 1.3603e-05, 'epoch': 1.96, 'throughput': 10001.80} [INFO|2025-03-20 17:57:52] logging.py:143 >> {'loss': 0.3924, 'learning_rate': 1.3590e-05, 'epoch': 1.96, 'throughput': 10001.84} [INFO|2025-03-20 17:58:32] logging.py:143 >> {'loss': 0.4071, 'learning_rate': 1.3578e-05, 'epoch': 1.96, 'throughput': 10001.91} [INFO|2025-03-20 17:59:12] logging.py:143 >> {'loss': 0.3885, 'learning_rate': 1.3565e-05, 'epoch': 1.96, 'throughput': 10001.87} [INFO|2025-03-20 17:59:52] logging.py:143 >> {'loss': 0.4120, 'learning_rate': 1.3553e-05, 'epoch': 1.96, 'throughput': 10001.93} [INFO|2025-03-20 18:00:31] logging.py:143 >> {'loss': 0.3626, 'learning_rate': 1.3541e-05, 'epoch': 1.96, 'throughput': 10002.03} [INFO|2025-03-20 18:01:12] logging.py:143 >> {'loss': 0.4191, 'learning_rate': 1.3528e-05, 'epoch': 1.96, 'throughput': 10002.07} [INFO|2025-03-20 18:01:53] logging.py:143 >> {'loss': 0.4235, 'learning_rate': 1.3516e-05, 'epoch': 1.96, 'throughput': 10002.07} [INFO|2025-03-20 18:02:33] logging.py:143 >> {'loss': 0.3987, 'learning_rate': 1.3503e-05, 'epoch': 1.96, 'throughput': 10002.10} [INFO|2025-03-20 18:03:15] logging.py:143 >> {'loss': 0.4090, 'learning_rate': 1.3491e-05, 'epoch': 1.96, 'throughput': 10002.09} [INFO|2025-03-20 18:03:55] logging.py:143 >> {'loss': 0.4162, 'learning_rate': 1.3478e-05, 'epoch': 1.96, 'throughput': 10002.09} [INFO|2025-03-20 18:04:37] logging.py:143 >> {'loss': 0.4136, 'learning_rate': 1.3466e-05, 'epoch': 1.97, 'throughput': 10002.09} [INFO|2025-03-20 18:05:19] logging.py:143 >> {'loss': 0.4002, 'learning_rate': 1.3454e-05, 'epoch': 1.97, 'throughput': 10001.95} [INFO|2025-03-20 18:05:59] logging.py:143 >> {'loss': 0.3896, 'learning_rate': 1.3441e-05, 'epoch': 1.97, 'throughput': 10001.99} [INFO|2025-03-20 
18:06:40] logging.py:143 >> {'loss': 0.4077, 'learning_rate': 1.3429e-05, 'epoch': 1.97, 'throughput': 10001.99} [INFO|2025-03-20 18:07:19] logging.py:143 >> {'loss': 0.4023, 'learning_rate': 1.3416e-05, 'epoch': 1.97, 'throughput': 10002.06} [INFO|2025-03-20 18:08:01] logging.py:143 >> {'loss': 0.3738, 'learning_rate': 1.3404e-05, 'epoch': 1.97, 'throughput': 10002.05} [INFO|2025-03-20 18:08:42] logging.py:143 >> {'loss': 0.3958, 'learning_rate': 1.3392e-05, 'epoch': 1.97, 'throughput': 10001.96} [INFO|2025-03-20 18:09:22] logging.py:143 >> {'loss': 0.4158, 'learning_rate': 1.3379e-05, 'epoch': 1.97, 'throughput': 10001.94} [INFO|2025-03-20 18:10:02] logging.py:143 >> {'loss': 0.4164, 'learning_rate': 1.3367e-05, 'epoch': 1.97, 'throughput': 10001.98} [INFO|2025-03-20 18:10:44] logging.py:143 >> {'loss': 0.4006, 'learning_rate': 1.3354e-05, 'epoch': 1.97, 'throughput': 10001.95} [INFO|2025-03-20 18:11:25] logging.py:143 >> {'loss': 0.3694, 'learning_rate': 1.3342e-05, 'epoch': 1.97, 'throughput': 10001.90} [INFO|2025-03-20 18:12:08] logging.py:143 >> {'loss': 0.3444, 'learning_rate': 1.3330e-05, 'epoch': 1.97, 'throughput': 10001.82} [INFO|2025-03-20 18:12:48] logging.py:143 >> {'loss': 0.4095, 'learning_rate': 1.3317e-05, 'epoch': 1.97, 'throughput': 10001.78} [INFO|2025-03-20 18:13:31] logging.py:143 >> {'loss': 0.3989, 'learning_rate': 1.3305e-05, 'epoch': 1.97, 'throughput': 10001.66} [INFO|2025-03-20 18:14:11] logging.py:143 >> {'loss': 0.3828, 'learning_rate': 1.3293e-05, 'epoch': 1.97, 'throughput': 10001.66} [INFO|2025-03-20 18:14:52] logging.py:143 >> {'loss': 0.3940, 'learning_rate': 1.3280e-05, 'epoch': 1.97, 'throughput': 10001.62} [INFO|2025-03-20 18:15:34] logging.py:143 >> {'loss': 0.4124, 'learning_rate': 1.3268e-05, 'epoch': 1.97, 'throughput': 10001.51} [INFO|2025-03-20 18:16:14] logging.py:143 >> {'loss': 0.3866, 'learning_rate': 1.3255e-05, 'epoch': 1.97, 'throughput': 10001.46} [INFO|2025-03-20 18:16:54] logging.py:143 >> {'loss': 0.4206, 
'learning_rate': 1.3243e-05, 'epoch': 1.97, 'throughput': 10001.48} [INFO|2025-03-20 18:17:33] logging.py:143 >> {'loss': 0.4027, 'learning_rate': 1.3231e-05, 'epoch': 1.98, 'throughput': 10001.53} [INFO|2025-03-20 18:18:13] logging.py:143 >> {'loss': 0.3730, 'learning_rate': 1.3218e-05, 'epoch': 1.98, 'throughput': 10001.56} [INFO|2025-03-20 18:18:55] logging.py:143 >> {'loss': 0.4032, 'learning_rate': 1.3206e-05, 'epoch': 1.98, 'throughput': 10001.49} [INFO|2025-03-20 18:19:36] logging.py:143 >> {'loss': 0.4227, 'learning_rate': 1.3194e-05, 'epoch': 1.98, 'throughput': 10001.44} [INFO|2025-03-20 18:20:15] logging.py:143 >> {'loss': 0.3794, 'learning_rate': 1.3181e-05, 'epoch': 1.98, 'throughput': 10001.47} [INFO|2025-03-20 18:20:55] logging.py:143 >> {'loss': 0.3996, 'learning_rate': 1.3169e-05, 'epoch': 1.98, 'throughput': 10001.43} [INFO|2025-03-20 18:21:35] logging.py:143 >> {'loss': 0.4173, 'learning_rate': 1.3157e-05, 'epoch': 1.98, 'throughput': 10001.39} [INFO|2025-03-20 18:22:17] logging.py:143 >> {'loss': 0.3841, 'learning_rate': 1.3144e-05, 'epoch': 1.98, 'throughput': 10001.34} [INFO|2025-03-20 18:22:57] logging.py:143 >> {'loss': 0.3854, 'learning_rate': 1.3132e-05, 'epoch': 1.98, 'throughput': 10001.36} [INFO|2025-03-20 18:23:36] logging.py:143 >> {'loss': 0.3799, 'learning_rate': 1.3120e-05, 'epoch': 1.98, 'throughput': 10001.44} [INFO|2025-03-20 18:24:18] logging.py:143 >> {'loss': 0.3897, 'learning_rate': 1.3107e-05, 'epoch': 1.98, 'throughput': 10001.47} [INFO|2025-03-20 18:25:01] logging.py:143 >> {'loss': 0.4249, 'learning_rate': 1.3095e-05, 'epoch': 1.98, 'throughput': 10001.32} [INFO|2025-03-20 18:25:42] logging.py:143 >> {'loss': 0.4079, 'learning_rate': 1.3083e-05, 'epoch': 1.98, 'throughput': 10001.33} [INFO|2025-03-20 18:26:23] logging.py:143 >> {'loss': 0.4353, 'learning_rate': 1.3071e-05, 'epoch': 1.98, 'throughput': 10001.28} [INFO|2025-03-20 18:27:05] logging.py:143 >> {'loss': 0.4037, 'learning_rate': 1.3058e-05, 'epoch': 1.98, 
'throughput': 10001.17} [INFO|2025-03-20 18:27:46] logging.py:143 >> {'loss': 0.3909, 'learning_rate': 1.3046e-05, 'epoch': 1.98, 'throughput': 10001.14} [INFO|2025-03-20 18:28:25] logging.py:143 >> {'loss': 0.4071, 'learning_rate': 1.3034e-05, 'epoch': 1.98, 'throughput': 10001.17} [INFO|2025-03-20 18:29:05] logging.py:143 >> {'loss': 0.4110, 'learning_rate': 1.3021e-05, 'epoch': 1.98, 'throughput': 10001.20} [INFO|2025-03-20 18:29:46] logging.py:143 >> {'loss': 0.3866, 'learning_rate': 1.3009e-05, 'epoch': 1.98, 'throughput': 10001.22} [INFO|2025-03-20 18:30:26] logging.py:143 >> {'loss': 0.4152, 'learning_rate': 1.2997e-05, 'epoch': 1.99, 'throughput': 10001.25} [INFO|2025-03-20 18:31:06] logging.py:143 >> {'loss': 0.3881, 'learning_rate': 1.2985e-05, 'epoch': 1.99, 'throughput': 10001.25} [INFO|2025-03-20 18:31:46] logging.py:143 >> {'loss': 0.3928, 'learning_rate': 1.2972e-05, 'epoch': 1.99, 'throughput': 10001.23} [INFO|2025-03-20 18:32:27] logging.py:143 >> {'loss': 0.3746, 'learning_rate': 1.2960e-05, 'epoch': 1.99, 'throughput': 10001.13} [INFO|2025-03-20 18:33:07] logging.py:143 >> {'loss': 0.4010, 'learning_rate': 1.2948e-05, 'epoch': 1.99, 'throughput': 10001.20} [INFO|2025-03-20 18:33:47] logging.py:143 >> {'loss': 0.3943, 'learning_rate': 1.2936e-05, 'epoch': 1.99, 'throughput': 10001.20} [INFO|2025-03-20 18:34:27] logging.py:143 >> {'loss': 0.3934, 'learning_rate': 1.2923e-05, 'epoch': 1.99, 'throughput': 10001.24} [INFO|2025-03-20 18:35:08] logging.py:143 >> {'loss': 0.4158, 'learning_rate': 1.2911e-05, 'epoch': 1.99, 'throughput': 10001.20} [INFO|2025-03-20 18:35:49] logging.py:143 >> {'loss': 0.4067, 'learning_rate': 1.2899e-05, 'epoch': 1.99, 'throughput': 10001.15} [INFO|2025-03-20 18:36:29] logging.py:143 >> {'loss': 0.3995, 'learning_rate': 1.2887e-05, 'epoch': 1.99, 'throughput': 10001.22} [INFO|2025-03-20 18:37:10] logging.py:143 >> {'loss': 0.3756, 'learning_rate': 1.2874e-05, 'epoch': 1.99, 'throughput': 10001.21} [INFO|2025-03-20 
18:37:51] logging.py:143 >> {'loss': 0.4067, 'learning_rate': 1.2862e-05, 'epoch': 1.99, 'throughput': 10001.15} [INFO|2025-03-20 18:38:31] logging.py:143 >> {'loss': 0.4048, 'learning_rate': 1.2850e-05, 'epoch': 1.99, 'throughput': 10001.15} [INFO|2025-03-20 18:39:12] logging.py:143 >> {'loss': 0.4084, 'learning_rate': 1.2838e-05, 'epoch': 1.99, 'throughput': 10001.18} [INFO|2025-03-20 18:39:53] logging.py:143 >> {'loss': 0.3754, 'learning_rate': 1.2825e-05, 'epoch': 1.99, 'throughput': 10001.20} [INFO|2025-03-20 18:40:33] logging.py:143 >> {'loss': 0.4229, 'learning_rate': 1.2813e-05, 'epoch': 1.99, 'throughput': 10001.19} [INFO|2025-03-20 18:41:13] logging.py:143 >> {'loss': 0.3780, 'learning_rate': 1.2801e-05, 'epoch': 1.99, 'throughput': 10001.20} [INFO|2025-03-20 18:41:54] logging.py:143 >> {'loss': 0.4419, 'learning_rate': 1.2789e-05, 'epoch': 1.99, 'throughput': 10001.18} [INFO|2025-03-20 18:42:34] logging.py:143 >> {'loss': 0.3993, 'learning_rate': 1.2777e-05, 'epoch': 2.00, 'throughput': 10001.16} [INFO|2025-03-20 18:43:15] logging.py:143 >> {'loss': 0.3999, 'learning_rate': 1.2764e-05, 'epoch': 2.00, 'throughput': 10001.17} [INFO|2025-03-20 18:43:56] logging.py:143 >> {'loss': 0.3822, 'learning_rate': 1.2752e-05, 'epoch': 2.00, 'throughput': 10001.15} [INFO|2025-03-20 18:44:36] logging.py:143 >> {'loss': 0.4043, 'learning_rate': 1.2740e-05, 'epoch': 2.00, 'throughput': 10001.15} [INFO|2025-03-20 18:45:17] logging.py:143 >> {'loss': 0.3928, 'learning_rate': 1.2728e-05, 'epoch': 2.00, 'throughput': 10001.22} [INFO|2025-03-20 18:45:56] logging.py:143 >> {'loss': 0.3891, 'learning_rate': 1.2716e-05, 'epoch': 2.00, 'throughput': 10001.16} [INFO|2025-03-20 18:46:37] logging.py:143 >> {'loss': 0.4001, 'learning_rate': 1.2703e-05, 'epoch': 2.00, 'throughput': 10001.10} [INFO|2025-03-20 18:47:17] logging.py:143 >> {'loss': 0.4310, 'learning_rate': 1.2691e-05, 'epoch': 2.00, 'throughput': 10001.13} [INFO|2025-03-20 18:47:58] logging.py:143 >> {'loss': 0.4187, 
'learning_rate': 1.2679e-05, 'epoch': 2.00, 'throughput': 10001.16} [INFO|2025-03-20 18:48:39] logging.py:143 >> {'loss': 0.4041, 'learning_rate': 1.2667e-05, 'epoch': 2.00, 'throughput': 10001.07} [INFO|2025-03-20 18:49:20] logging.py:143 >> {'loss': 0.3029, 'learning_rate': 1.2655e-05, 'epoch': 2.00, 'throughput': 10001.04} [INFO|2025-03-20 18:49:59] logging.py:143 >> {'loss': 0.2555, 'learning_rate': 1.2642e-05, 'epoch': 2.00, 'throughput': 10001.06} [INFO|2025-03-20 18:50:39] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 1.2630e-05, 'epoch': 2.00, 'throughput': 10001.09} [INFO|2025-03-20 18:51:20] logging.py:143 >> {'loss': 0.2583, 'learning_rate': 1.2618e-05, 'epoch': 2.00, 'throughput': 10001.14} [INFO|2025-03-20 18:52:02] logging.py:143 >> {'loss': 0.2484, 'learning_rate': 1.2606e-05, 'epoch': 2.00, 'throughput': 10001.11} [INFO|2025-03-20 18:52:43] logging.py:143 >> {'loss': 0.2619, 'learning_rate': 1.2594e-05, 'epoch': 2.00, 'throughput': 10001.09} [INFO|2025-03-20 18:53:23] logging.py:143 >> {'loss': 0.2555, 'learning_rate': 1.2582e-05, 'epoch': 2.00, 'throughput': 10001.07} [INFO|2025-03-20 18:54:03] logging.py:143 >> {'loss': 0.2429, 'learning_rate': 1.2570e-05, 'epoch': 2.00, 'throughput': 10001.08} [INFO|2025-03-20 18:54:45] logging.py:143 >> {'loss': 0.2561, 'learning_rate': 1.2557e-05, 'epoch': 2.00, 'throughput': 10001.00} [INFO|2025-03-20 18:55:24] logging.py:143 >> {'loss': 0.2542, 'learning_rate': 1.2545e-05, 'epoch': 2.01, 'throughput': 10001.00} [INFO|2025-03-20 18:56:05] logging.py:143 >> {'loss': 0.2534, 'learning_rate': 1.2533e-05, 'epoch': 2.01, 'throughput': 10001.01} [INFO|2025-03-20 18:56:46] logging.py:143 >> {'loss': 0.2507, 'learning_rate': 1.2521e-05, 'epoch': 2.01, 'throughput': 10001.01} [INFO|2025-03-20 18:57:27] logging.py:143 >> {'loss': 0.2507, 'learning_rate': 1.2509e-05, 'epoch': 2.01, 'throughput': 10000.97} [INFO|2025-03-20 18:58:08] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 1.2497e-05, 'epoch': 2.01, 
'throughput': 10000.98} [INFO|2025-03-20 18:58:50] logging.py:143 >> {'loss': 0.2460, 'learning_rate': 1.2485e-05, 'epoch': 2.01, 'throughput': 10000.91} [INFO|2025-03-20 18:59:30] logging.py:143 >> {'loss': 0.2479, 'learning_rate': 1.2473e-05, 'epoch': 2.01, 'throughput': 10000.95} [INFO|2025-03-20 19:00:12] logging.py:143 >> {'loss': 0.2537, 'learning_rate': 1.2460e-05, 'epoch': 2.01, 'throughput': 10000.89} [INFO|2025-03-20 19:00:53] logging.py:143 >> {'loss': 0.2488, 'learning_rate': 1.2448e-05, 'epoch': 2.01, 'throughput': 10000.83} [INFO|2025-03-20 19:01:33] logging.py:143 >> {'loss': 0.2601, 'learning_rate': 1.2436e-05, 'epoch': 2.01, 'throughput': 10000.85} [INFO|2025-03-20 19:02:12] logging.py:143 >> {'loss': 0.2466, 'learning_rate': 1.2424e-05, 'epoch': 2.01, 'throughput': 10000.87} [INFO|2025-03-20 19:02:53] logging.py:143 >> {'loss': 0.2449, 'learning_rate': 1.2412e-05, 'epoch': 2.01, 'throughput': 10000.88} [INFO|2025-03-20 19:03:33] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 1.2400e-05, 'epoch': 2.01, 'throughput': 10000.86} [INFO|2025-03-20 19:04:14] logging.py:143 >> {'loss': 0.2417, 'learning_rate': 1.2388e-05, 'epoch': 2.01, 'throughput': 10000.83} [INFO|2025-03-20 19:04:55] logging.py:143 >> {'loss': 0.2510, 'learning_rate': 1.2376e-05, 'epoch': 2.01, 'throughput': 10000.81} [INFO|2025-03-20 19:05:35] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 1.2364e-05, 'epoch': 2.01, 'throughput': 10000.82} [INFO|2025-03-20 19:06:17] logging.py:143 >> {'loss': 0.2634, 'learning_rate': 1.2352e-05, 'epoch': 2.01, 'throughput': 10000.72} [INFO|2025-03-20 19:06:56] logging.py:143 >> {'loss': 0.2691, 'learning_rate': 1.2340e-05, 'epoch': 2.01, 'throughput': 10000.72} [INFO|2025-03-20 19:07:37] logging.py:143 >> {'loss': 0.2535, 'learning_rate': 1.2328e-05, 'epoch': 2.01, 'throughput': 10000.67} [INFO|2025-03-20 19:08:18] logging.py:143 >> {'loss': 0.2391, 'learning_rate': 1.2315e-05, 'epoch': 2.02, 'throughput': 10000.64} [INFO|2025-03-20 
19:08:58] logging.py:143 >> {'loss': 0.2513, 'learning_rate': 1.2303e-05, 'epoch': 2.02, 'throughput': 10000.65} [INFO|2025-03-20 19:09:39] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 1.2291e-05, 'epoch': 2.02, 'throughput': 10000.62} [INFO|2025-03-20 19:10:21] logging.py:143 >> {'loss': 0.2632, 'learning_rate': 1.2279e-05, 'epoch': 2.02, 'throughput': 10000.54} [INFO|2025-03-20 19:11:02] logging.py:143 >> {'loss': 0.2449, 'learning_rate': 1.2267e-05, 'epoch': 2.02, 'throughput': 10000.49} [INFO|2025-03-20 19:11:42] logging.py:143 >> {'loss': 0.2487, 'learning_rate': 1.2255e-05, 'epoch': 2.02, 'throughput': 10000.49} [INFO|2025-03-20 19:12:22] logging.py:143 >> {'loss': 0.2283, 'learning_rate': 1.2243e-05, 'epoch': 2.02, 'throughput': 10000.47} [INFO|2025-03-20 19:13:04] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 1.2231e-05, 'epoch': 2.02, 'throughput': 10000.41} [INFO|2025-03-20 19:13:44] logging.py:143 >> {'loss': 0.2536, 'learning_rate': 1.2219e-05, 'epoch': 2.02, 'throughput': 10000.46} [INFO|2025-03-20 19:14:25] logging.py:143 >> {'loss': 0.2535, 'learning_rate': 1.2207e-05, 'epoch': 2.02, 'throughput': 10000.48} [INFO|2025-03-20 19:15:06] logging.py:143 >> {'loss': 0.2418, 'learning_rate': 1.2195e-05, 'epoch': 2.02, 'throughput': 10000.41} [INFO|2025-03-20 19:15:45] logging.py:143 >> {'loss': 0.2916, 'learning_rate': 1.2183e-05, 'epoch': 2.02, 'throughput': 10000.47} [INFO|2025-03-20 19:16:26] logging.py:143 >> {'loss': 0.2594, 'learning_rate': 1.2171e-05, 'epoch': 2.02, 'throughput': 10000.51} [INFO|2025-03-20 19:17:07] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 1.2159e-05, 'epoch': 2.02, 'throughput': 10000.47} [INFO|2025-03-20 19:17:47] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 1.2147e-05, 'epoch': 2.02, 'throughput': 10000.43} [INFO|2025-03-20 19:18:27] logging.py:143 >> {'loss': 0.2433, 'learning_rate': 1.2135e-05, 'epoch': 2.02, 'throughput': 10000.48} [INFO|2025-03-20 19:19:08] logging.py:143 >> {'loss': 0.2403, 
'learning_rate': 1.2123e-05, 'epoch': 2.02, 'throughput': 10000.44} [INFO|2025-03-20 19:19:48] logging.py:143 >> {'loss': 0.2493, 'learning_rate': 1.2111e-05, 'epoch': 2.02, 'throughput': 10000.34} [INFO|2025-03-20 19:20:28] logging.py:143 >> {'loss': 0.2421, 'learning_rate': 1.2099e-05, 'epoch': 2.02, 'throughput': 10000.43} [INFO|2025-03-20 19:21:08] logging.py:143 >> {'loss': 0.2541, 'learning_rate': 1.2087e-05, 'epoch': 2.03, 'throughput': 10000.42} [INFO|2025-03-20 19:21:48] logging.py:143 >> {'loss': 0.2473, 'learning_rate': 1.2075e-05, 'epoch': 2.03, 'throughput': 10000.44} [INFO|2025-03-20 19:22:28] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 1.2063e-05, 'epoch': 2.03, 'throughput': 10000.41} [INFO|2025-03-20 19:23:08] logging.py:143 >> {'loss': 0.2491, 'learning_rate': 1.2051e-05, 'epoch': 2.03, 'throughput': 10000.52} [INFO|2025-03-20 19:23:48] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 1.2039e-05, 'epoch': 2.03, 'throughput': 10000.53} [INFO|2025-03-20 19:24:28] logging.py:143 >> {'loss': 0.2483, 'learning_rate': 1.2027e-05, 'epoch': 2.03, 'throughput': 10000.52} [INFO|2025-03-20 19:25:10] logging.py:143 >> {'loss': 0.2333, 'learning_rate': 1.2015e-05, 'epoch': 2.03, 'throughput': 10000.47} [INFO|2025-03-20 19:25:51] logging.py:143 >> {'loss': 0.2272, 'learning_rate': 1.2003e-05, 'epoch': 2.03, 'throughput': 10000.46} [INFO|2025-03-20 19:26:33] logging.py:143 >> {'loss': 0.2573, 'learning_rate': 1.1991e-05, 'epoch': 2.03, 'throughput': 10000.43} [INFO|2025-03-20 19:27:13] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 1.1979e-05, 'epoch': 2.03, 'throughput': 10000.44} [INFO|2025-03-20 19:27:53] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 1.1967e-05, 'epoch': 2.03, 'throughput': 10000.34} [INFO|2025-03-20 19:28:34] logging.py:143 >> {'loss': 0.2289, 'learning_rate': 1.1955e-05, 'epoch': 2.03, 'throughput': 10000.36} [INFO|2025-03-20 19:29:14] logging.py:143 >> {'loss': 0.2459, 'learning_rate': 1.1944e-05, 'epoch': 2.03, 
'throughput': 10000.37} [INFO|2025-03-20 19:29:55] logging.py:143 >> {'loss': 0.2463, 'learning_rate': 1.1932e-05, 'epoch': 2.03, 'throughput': 10000.36} [INFO|2025-03-20 19:30:35] logging.py:143 >> {'loss': 0.2580, 'learning_rate': 1.1920e-05, 'epoch': 2.03, 'throughput': 10000.38} [INFO|2025-03-20 19:31:15] logging.py:143 >> {'loss': 0.2468, 'learning_rate': 1.1908e-05, 'epoch': 2.03, 'throughput': 10000.41} [INFO|2025-03-20 19:31:55] logging.py:143 >> {'loss': 0.2400, 'learning_rate': 1.1896e-05, 'epoch': 2.03, 'throughput': 10000.45} [INFO|2025-03-20 19:32:35] logging.py:143 >> {'loss': 0.2651, 'learning_rate': 1.1884e-05, 'epoch': 2.03, 'throughput': 10000.44} [INFO|2025-03-20 19:33:16] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 1.1872e-05, 'epoch': 2.03, 'throughput': 10000.48} [INFO|2025-03-20 19:33:57] logging.py:143 >> {'loss': 0.2392, 'learning_rate': 1.1860e-05, 'epoch': 2.04, 'throughput': 10000.52} [INFO|2025-03-20 19:34:38] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 1.1848e-05, 'epoch': 2.04, 'throughput': 10000.49} [INFO|2025-03-20 19:35:18] logging.py:143 >> {'loss': 0.2226, 'learning_rate': 1.1836e-05, 'epoch': 2.04, 'throughput': 10000.56} [INFO|2025-03-20 19:35:59] logging.py:143 >> {'loss': 0.2391, 'learning_rate': 1.1824e-05, 'epoch': 2.04, 'throughput': 10000.56} [INFO|2025-03-20 19:36:42] logging.py:143 >> {'loss': 0.2292, 'learning_rate': 1.1813e-05, 'epoch': 2.04, 'throughput': 10000.48} [INFO|2025-03-20 19:37:23] logging.py:143 >> {'loss': 0.2511, 'learning_rate': 1.1801e-05, 'epoch': 2.04, 'throughput': 10000.51} [INFO|2025-03-20 19:38:03] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 1.1789e-05, 'epoch': 2.04, 'throughput': 10000.52} [INFO|2025-03-20 19:38:43] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 1.1777e-05, 'epoch': 2.04, 'throughput': 10000.55} [INFO|2025-03-20 19:39:23] logging.py:143 >> {'loss': 0.2537, 'learning_rate': 1.1765e-05, 'epoch': 2.04, 'throughput': 10000.48} [INFO|2025-03-20 
19:40:04] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 1.1753e-05, 'epoch': 2.04, 'throughput': 10000.49} [INFO|2025-03-20 19:40:42] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 1.1741e-05, 'epoch': 2.04, 'throughput': 10000.54} [INFO|2025-03-20 19:41:23] logging.py:143 >> {'loss': 0.2598, 'learning_rate': 1.1729e-05, 'epoch': 2.04, 'throughput': 10000.57} [INFO|2025-03-20 19:42:03] logging.py:143 >> {'loss': 0.2511, 'learning_rate': 1.1718e-05, 'epoch': 2.04, 'throughput': 10000.56} [INFO|2025-03-20 19:42:42] logging.py:143 >> {'loss': 0.2505, 'learning_rate': 1.1706e-05, 'epoch': 2.04, 'throughput': 10000.63} [INFO|2025-03-20 19:43:21] logging.py:143 >> {'loss': 0.2514, 'learning_rate': 1.1694e-05, 'epoch': 2.04, 'throughput': 10000.69} [INFO|2025-03-20 19:44:02] logging.py:143 >> {'loss': 0.2446, 'learning_rate': 1.1682e-05, 'epoch': 2.04, 'throughput': 10000.62} [INFO|2025-03-20 19:44:42] logging.py:143 >> {'loss': 0.2495, 'learning_rate': 1.1670e-05, 'epoch': 2.04, 'throughput': 10000.65} [INFO|2025-03-20 19:45:21] logging.py:143 >> {'loss': 0.2378, 'learning_rate': 1.1658e-05, 'epoch': 2.04, 'throughput': 10000.76} [INFO|2025-03-20 19:46:02] logging.py:143 >> {'loss': 0.2333, 'learning_rate': 1.1647e-05, 'epoch': 2.04, 'throughput': 10000.81} [INFO|2025-03-20 19:46:44] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 1.1635e-05, 'epoch': 2.05, 'throughput': 10000.77} [INFO|2025-03-20 19:47:26] logging.py:143 >> {'loss': 0.2489, 'learning_rate': 1.1623e-05, 'epoch': 2.05, 'throughput': 10000.77} [INFO|2025-03-20 19:48:07] logging.py:143 >> {'loss': 0.2470, 'learning_rate': 1.1611e-05, 'epoch': 2.05, 'throughput': 10000.81} [INFO|2025-03-20 19:48:50] logging.py:143 >> {'loss': 0.2371, 'learning_rate': 1.1599e-05, 'epoch': 2.05, 'throughput': 10000.68} [INFO|2025-03-20 19:49:31] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 1.1587e-05, 'epoch': 2.05, 'throughput': 10000.67} [INFO|2025-03-20 19:50:10] logging.py:143 >> {'loss': 0.2460, 
'learning_rate': 1.1576e-05, 'epoch': 2.05, 'throughput': 10000.71} [INFO|2025-03-20 19:50:51] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 1.1564e-05, 'epoch': 2.05, 'throughput': 10000.73} [INFO|2025-03-20 19:51:31] logging.py:143 >> {'loss': 0.2295, 'learning_rate': 1.1552e-05, 'epoch': 2.05, 'throughput': 10000.69} [INFO|2025-03-20 19:52:13] logging.py:143 >> {'loss': 0.2519, 'learning_rate': 1.1540e-05, 'epoch': 2.05, 'throughput': 10000.68} [INFO|2025-03-20 19:52:54] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 1.1528e-05, 'epoch': 2.05, 'throughput': 10000.68} [INFO|2025-03-20 19:53:34] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 1.1517e-05, 'epoch': 2.05, 'throughput': 10000.61} [INFO|2025-03-20 19:54:16] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 1.1505e-05, 'epoch': 2.05, 'throughput': 10000.53} [INFO|2025-03-20 19:54:56] logging.py:143 >> {'loss': 0.2588, 'learning_rate': 1.1493e-05, 'epoch': 2.05, 'throughput': 10000.55} [INFO|2025-03-20 19:55:36] logging.py:143 >> {'loss': 0.2554, 'learning_rate': 1.1481e-05, 'epoch': 2.05, 'throughput': 10000.58} [INFO|2025-03-20 19:56:17] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 1.1470e-05, 'epoch': 2.05, 'throughput': 10000.51} [INFO|2025-03-20 19:56:57] logging.py:143 >> {'loss': 0.2647, 'learning_rate': 1.1458e-05, 'epoch': 2.05, 'throughput': 10000.52} [INFO|2025-03-20 19:57:38] logging.py:143 >> {'loss': 0.2486, 'learning_rate': 1.1446e-05, 'epoch': 2.05, 'throughput': 10000.51} [INFO|2025-03-20 19:58:20] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 1.1434e-05, 'epoch': 2.05, 'throughput': 10000.37} [INFO|2025-03-20 19:59:00] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 1.1423e-05, 'epoch': 2.05, 'throughput': 10000.39} [INFO|2025-03-20 19:59:42] logging.py:143 >> {'loss': 0.2258, 'learning_rate': 1.1411e-05, 'epoch': 2.06, 'throughput': 10000.35} [INFO|2025-03-20 20:00:24] logging.py:143 >> {'loss': 0.2527, 'learning_rate': 1.1399e-05, 'epoch': 2.06, 
'throughput': 10000.31} [INFO|2025-03-20 20:01:05] logging.py:143 >> {'loss': 0.2200, 'learning_rate': 1.1387e-05, 'epoch': 2.06, 'throughput': 10000.13} [INFO|2025-03-20 20:01:44] logging.py:143 >> {'loss': 0.2440, 'learning_rate': 1.1376e-05, 'epoch': 2.06, 'throughput': 10000.18} [INFO|2025-03-20 20:02:24] logging.py:143 >> {'loss': 0.2418, 'learning_rate': 1.1364e-05, 'epoch': 2.06, 'throughput': 10000.17} [INFO|2025-03-20 20:03:04] logging.py:143 >> {'loss': 0.2435, 'learning_rate': 1.1352e-05, 'epoch': 2.06, 'throughput': 10000.19} [INFO|2025-03-20 20:03:45] logging.py:143 >> {'loss': 0.2638, 'learning_rate': 1.1340e-05, 'epoch': 2.06, 'throughput': 10000.19} [INFO|2025-03-20 20:04:25] logging.py:143 >> {'loss': 0.2361, 'learning_rate': 1.1329e-05, 'epoch': 2.06, 'throughput': 10000.18} [INFO|2025-03-20 20:05:07] logging.py:143 >> {'loss': 0.2469, 'learning_rate': 1.1317e-05, 'epoch': 2.06, 'throughput': 10000.08} [INFO|2025-03-20 20:05:49] logging.py:143 >> {'loss': 0.2510, 'learning_rate': 1.1305e-05, 'epoch': 2.06, 'throughput': 10000.01} [INFO|2025-03-20 20:06:29] logging.py:143 >> {'loss': 0.2315, 'learning_rate': 1.1294e-05, 'epoch': 2.06, 'throughput': 10000.04} [INFO|2025-03-20 20:07:11] logging.py:143 >> {'loss': 0.2443, 'learning_rate': 1.1282e-05, 'epoch': 2.06, 'throughput': 9999.95} [INFO|2025-03-20 20:07:52] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 1.1270e-05, 'epoch': 2.06, 'throughput': 9999.92} [INFO|2025-03-20 20:08:33] logging.py:143 >> {'loss': 0.2597, 'learning_rate': 1.1258e-05, 'epoch': 2.06, 'throughput': 9999.89} [INFO|2025-03-20 20:09:13] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 1.1247e-05, 'epoch': 2.06, 'throughput': 9999.88} [INFO|2025-03-20 20:09:52] logging.py:143 >> {'loss': 0.2589, 'learning_rate': 1.1235e-05, 'epoch': 2.06, 'throughput': 9999.91} [INFO|2025-03-20 20:10:33] logging.py:143 >> {'loss': 0.2512, 'learning_rate': 1.1223e-05, 'epoch': 2.06, 'throughput': 9999.92} [INFO|2025-03-20 20:11:14] 
logging.py:143 >> {'loss': 0.2345, 'learning_rate': 1.1212e-05, 'epoch': 2.06, 'throughput': 9999.81} [INFO|2025-03-20 20:11:55] logging.py:143 >> {'loss': 0.2339, 'learning_rate': 1.1200e-05, 'epoch': 2.07, 'throughput': 9999.85} [INFO|2025-03-20 20:12:36] logging.py:143 >> {'loss': 0.2546, 'learning_rate': 1.1188e-05, 'epoch': 2.07, 'throughput': 9999.85} [INFO|2025-03-20 20:13:18] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 1.1177e-05, 'epoch': 2.07, 'throughput': 9999.78} [INFO|2025-03-20 20:13:58] logging.py:143 >> {'loss': 0.2441, 'learning_rate': 1.1165e-05, 'epoch': 2.07, 'throughput': 9999.76} [INFO|2025-03-20 20:14:38] logging.py:143 >> {'loss': 0.2549, 'learning_rate': 1.1153e-05, 'epoch': 2.07, 'throughput': 9999.85} [INFO|2025-03-20 20:15:17] logging.py:143 >> {'loss': 0.2443, 'learning_rate': 1.1142e-05, 'epoch': 2.07, 'throughput': 9999.90} [INFO|2025-03-20 20:15:57] logging.py:143 >> {'loss': 0.2427, 'learning_rate': 1.1130e-05, 'epoch': 2.07, 'throughput': 9999.89} [INFO|2025-03-20 20:16:39] logging.py:143 >> {'loss': 0.2414, 'learning_rate': 1.1118e-05, 'epoch': 2.07, 'throughput': 9999.87} [INFO|2025-03-20 20:17:18] logging.py:143 >> {'loss': 0.2499, 'learning_rate': 1.1107e-05, 'epoch': 2.07, 'throughput': 9999.91} [INFO|2025-03-20 20:17:59] logging.py:143 >> {'loss': 0.2399, 'learning_rate': 1.1095e-05, 'epoch': 2.07, 'throughput': 9999.88} [INFO|2025-03-20 20:18:40] logging.py:143 >> {'loss': 0.2499, 'learning_rate': 1.1084e-05, 'epoch': 2.07, 'throughput': 9999.87} [INFO|2025-03-20 20:19:22] logging.py:143 >> {'loss': 0.2467, 'learning_rate': 1.1072e-05, 'epoch': 2.07, 'throughput': 9999.80} [INFO|2025-03-20 20:20:01] logging.py:143 >> {'loss': 0.2361, 'learning_rate': 1.1060e-05, 'epoch': 2.07, 'throughput': 9999.86} [INFO|2025-03-20 20:20:41] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 1.1049e-05, 'epoch': 2.07, 'throughput': 9999.88} [INFO|2025-03-20 20:21:23] logging.py:143 >> {'loss': 0.2305, 'learning_rate': 1.1037e-05, 
'epoch': 2.07, 'throughput': 9999.78} [INFO|2025-03-20 20:22:03] logging.py:143 >> {'loss': 0.2566, 'learning_rate': 1.1025e-05, 'epoch': 2.07, 'throughput': 9999.75} [INFO|2025-03-20 20:22:44] logging.py:143 >> {'loss': 0.2461, 'learning_rate': 1.1014e-05, 'epoch': 2.07, 'throughput': 9999.79} [INFO|2025-03-20 20:23:25] logging.py:143 >> {'loss': 0.2443, 'learning_rate': 1.1002e-05, 'epoch': 2.07, 'throughput': 9999.78} [INFO|2025-03-20 20:24:06] logging.py:143 >> {'loss': 0.2528, 'learning_rate': 1.0991e-05, 'epoch': 2.07, 'throughput': 9999.72} [INFO|2025-03-20 20:24:48] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 1.0979e-05, 'epoch': 2.08, 'throughput': 9999.67} [INFO|2025-03-20 20:25:29] logging.py:143 >> {'loss': 0.2175, 'learning_rate': 1.0968e-05, 'epoch': 2.08, 'throughput': 9999.61} [INFO|2025-03-20 20:26:09] logging.py:143 >> {'loss': 0.2234, 'learning_rate': 1.0956e-05, 'epoch': 2.08, 'throughput': 9999.54} [INFO|2025-03-20 20:26:49] logging.py:143 >> {'loss': 0.2430, 'learning_rate': 1.0944e-05, 'epoch': 2.08, 'throughput': 9999.55} [INFO|2025-03-20 20:27:30] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 1.0933e-05, 'epoch': 2.08, 'throughput': 9999.54} [INFO|2025-03-20 20:28:11] logging.py:143 >> {'loss': 0.2327, 'learning_rate': 1.0921e-05, 'epoch': 2.08, 'throughput': 9999.55} [INFO|2025-03-20 20:28:52] logging.py:143 >> {'loss': 0.2583, 'learning_rate': 1.0910e-05, 'epoch': 2.08, 'throughput': 9999.58} [INFO|2025-03-20 20:29:32] logging.py:143 >> {'loss': 0.2467, 'learning_rate': 1.0898e-05, 'epoch': 2.08, 'throughput': 9999.61} [INFO|2025-03-20 20:30:13] logging.py:143 >> {'loss': 0.2436, 'learning_rate': 1.0887e-05, 'epoch': 2.08, 'throughput': 9999.65} [INFO|2025-03-20 20:30:55] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 1.0875e-05, 'epoch': 2.08, 'throughput': 9999.63} [INFO|2025-03-20 20:31:35] logging.py:143 >> {'loss': 0.2236, 'learning_rate': 1.0863e-05, 'epoch': 2.08, 'throughput': 9999.59} [INFO|2025-03-20 20:32:16] 
logging.py:143 >> {'loss': 0.2426, 'learning_rate': 1.0852e-05, 'epoch': 2.08, 'throughput': 9999.65} [INFO|2025-03-20 20:32:57] logging.py:143 >> {'loss': 0.2597, 'learning_rate': 1.0840e-05, 'epoch': 2.08, 'throughput': 9999.66} [INFO|2025-03-20 20:33:36] logging.py:143 >> {'loss': 0.2605, 'learning_rate': 1.0829e-05, 'epoch': 2.08, 'throughput': 9999.73} [INFO|2025-03-20 20:34:17] logging.py:143 >> {'loss': 0.2536, 'learning_rate': 1.0817e-05, 'epoch': 2.08, 'throughput': 9999.81} [INFO|2025-03-20 20:34:58] logging.py:143 >> {'loss': 0.2523, 'learning_rate': 1.0806e-05, 'epoch': 2.08, 'throughput': 9999.71} [INFO|2025-03-20 20:35:39] logging.py:143 >> {'loss': 0.2394, 'learning_rate': 1.0794e-05, 'epoch': 2.08, 'throughput': 9999.71} [INFO|2025-03-20 20:36:18] logging.py:143 >> {'loss': 0.2502, 'learning_rate': 1.0783e-05, 'epoch': 2.08, 'throughput': 9999.76} [INFO|2025-03-20 20:36:58] logging.py:143 >> {'loss': 0.2508, 'learning_rate': 1.0771e-05, 'epoch': 2.08, 'throughput': 9999.83} [INFO|2025-03-20 20:37:40] logging.py:143 >> {'loss': 0.2432, 'learning_rate': 1.0760e-05, 'epoch': 2.09, 'throughput': 9999.86} [INFO|2025-03-20 20:38:20] logging.py:143 >> {'loss': 0.2499, 'learning_rate': 1.0748e-05, 'epoch': 2.09, 'throughput': 9999.86} [INFO|2025-03-20 20:39:00] logging.py:143 >> {'loss': 0.2410, 'learning_rate': 1.0737e-05, 'epoch': 2.09, 'throughput': 9999.92} [INFO|2025-03-20 20:39:40] logging.py:143 >> {'loss': 0.2289, 'learning_rate': 1.0725e-05, 'epoch': 2.09, 'throughput': 9999.98} [INFO|2025-03-20 20:40:20] logging.py:143 >> {'loss': 0.2239, 'learning_rate': 1.0714e-05, 'epoch': 2.09, 'throughput': 9999.98} [INFO|2025-03-20 20:41:01] logging.py:143 >> {'loss': 0.2593, 'learning_rate': 1.0702e-05, 'epoch': 2.09, 'throughput': 10000.00} [INFO|2025-03-20 20:41:42] logging.py:143 >> {'loss': 0.2478, 'learning_rate': 1.0691e-05, 'epoch': 2.09, 'throughput': 9999.98} [INFO|2025-03-20 20:42:23] logging.py:143 >> {'loss': 0.2439, 'learning_rate': 1.0679e-05, 
'epoch': 2.09, 'throughput': 9999.92} [INFO|2025-03-20 20:43:04] logging.py:143 >> {'loss': 0.2617, 'learning_rate': 1.0668e-05, 'epoch': 2.09, 'throughput': 9999.93} [INFO|2025-03-20 20:43:44] logging.py:143 >> {'loss': 0.2381, 'learning_rate': 1.0656e-05, 'epoch': 2.09, 'throughput': 9999.97} [INFO|2025-03-20 20:44:25] logging.py:143 >> {'loss': 0.2514, 'learning_rate': 1.0645e-05, 'epoch': 2.09, 'throughput': 9999.89} [INFO|2025-03-20 20:45:05] logging.py:143 >> {'loss': 0.2553, 'learning_rate': 1.0634e-05, 'epoch': 2.09, 'throughput': 9999.89} [INFO|2025-03-20 20:45:45] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 1.0622e-05, 'epoch': 2.09, 'throughput': 9999.88} [INFO|2025-03-20 20:46:25] logging.py:143 >> {'loss': 0.2556, 'learning_rate': 1.0611e-05, 'epoch': 2.09, 'throughput': 9999.82} [INFO|2025-03-20 20:47:05] logging.py:143 >> {'loss': 0.2483, 'learning_rate': 1.0599e-05, 'epoch': 2.09, 'throughput': 9999.80} [INFO|2025-03-20 20:47:44] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 1.0588e-05, 'epoch': 2.09, 'throughput': 9999.77} [INFO|2025-03-20 20:48:24] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 1.0576e-05, 'epoch': 2.09, 'throughput': 9999.77} [INFO|2025-03-20 20:49:05] logging.py:143 >> {'loss': 0.2498, 'learning_rate': 1.0565e-05, 'epoch': 2.09, 'throughput': 9999.76} [INFO|2025-03-20 20:49:45] logging.py:143 >> {'loss': 0.2435, 'learning_rate': 1.0553e-05, 'epoch': 2.09, 'throughput': 9999.74} [INFO|2025-03-20 20:50:25] logging.py:143 >> {'loss': 0.2632, 'learning_rate': 1.0542e-05, 'epoch': 2.10, 'throughput': 9999.77} [INFO|2025-03-20 20:51:04] logging.py:143 >> {'loss': 0.2550, 'learning_rate': 1.0531e-05, 'epoch': 2.10, 'throughput': 9999.77} [INFO|2025-03-20 20:51:45] logging.py:143 >> {'loss': 0.2461, 'learning_rate': 1.0519e-05, 'epoch': 2.10, 'throughput': 9999.81} [INFO|2025-03-20 20:52:26] logging.py:143 >> {'loss': 0.2686, 'learning_rate': 1.0508e-05, 'epoch': 2.10, 'throughput': 9999.87} [INFO|2025-03-20 20:53:08] 
logging.py:143 >> {'loss': 0.2309, 'learning_rate': 1.0496e-05, 'epoch': 2.10, 'throughput': 9999.82} [INFO|2025-03-20 20:53:49] logging.py:143 >> {'loss': 0.2484, 'learning_rate': 1.0485e-05, 'epoch': 2.10, 'throughput': 9999.83} [INFO|2025-03-20 20:54:30] logging.py:143 >> {'loss': 0.2424, 'learning_rate': 1.0474e-05, 'epoch': 2.10, 'throughput': 9999.82} [INFO|2025-03-20 20:55:11] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 1.0462e-05, 'epoch': 2.10, 'throughput': 9999.85} [INFO|2025-03-20 20:55:51] logging.py:143 >> {'loss': 0.2615, 'learning_rate': 1.0451e-05, 'epoch': 2.10, 'throughput': 9999.87} [INFO|2025-03-20 20:56:32] logging.py:143 >> {'loss': 0.2253, 'learning_rate': 1.0440e-05, 'epoch': 2.10, 'throughput': 9999.85} [INFO|2025-03-20 20:57:13] logging.py:143 >> {'loss': 0.2589, 'learning_rate': 1.0428e-05, 'epoch': 2.10, 'throughput': 9999.76} [INFO|2025-03-20 20:57:53] logging.py:143 >> {'loss': 0.2537, 'learning_rate': 1.0417e-05, 'epoch': 2.10, 'throughput': 9999.81} [INFO|2025-03-20 20:58:33] logging.py:143 >> {'loss': 0.2465, 'learning_rate': 1.0405e-05, 'epoch': 2.10, 'throughput': 9999.80} [INFO|2025-03-20 20:59:13] logging.py:143 >> {'loss': 0.2448, 'learning_rate': 1.0394e-05, 'epoch': 2.10, 'throughput': 9999.86} [INFO|2025-03-20 20:59:52] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 1.0383e-05, 'epoch': 2.10, 'throughput': 9999.84} [INFO|2025-03-20 21:00:32] logging.py:143 >> {'loss': 0.2457, 'learning_rate': 1.0371e-05, 'epoch': 2.10, 'throughput': 9999.83} [INFO|2025-03-20 21:01:13] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 1.0360e-05, 'epoch': 2.10, 'throughput': 9999.88} [INFO|2025-03-20 21:01:54] logging.py:143 >> {'loss': 0.2569, 'learning_rate': 1.0349e-05, 'epoch': 2.10, 'throughput': 9999.86} [INFO|2025-03-20 21:02:35] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 1.0337e-05, 'epoch': 2.10, 'throughput': 9999.88} [INFO|2025-03-20 21:03:15] logging.py:143 >> {'loss': 0.2486, 'learning_rate': 1.0326e-05, 
'epoch': 2.11, 'throughput': 9999.92} [INFO|2025-03-20 21:03:56] logging.py:143 >> {'loss': 0.2337, 'learning_rate': 1.0315e-05, 'epoch': 2.11, 'throughput': 9999.88} [INFO|2025-03-20 21:04:36] logging.py:143 >> {'loss': 0.2488, 'learning_rate': 1.0303e-05, 'epoch': 2.11, 'throughput': 9999.94} [INFO|2025-03-20 21:05:16] logging.py:143 >> {'loss': 0.2190, 'learning_rate': 1.0292e-05, 'epoch': 2.11, 'throughput': 9999.94} [INFO|2025-03-20 21:05:56] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 1.0281e-05, 'epoch': 2.11, 'throughput': 9999.92} [INFO|2025-03-20 21:06:36] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 1.0269e-05, 'epoch': 2.11, 'throughput': 9999.86} [INFO|2025-03-20 21:07:16] logging.py:143 >> {'loss': 0.2240, 'learning_rate': 1.0258e-05, 'epoch': 2.11, 'throughput': 9999.87} [INFO|2025-03-20 21:07:56] logging.py:143 >> {'loss': 0.2511, 'learning_rate': 1.0247e-05, 'epoch': 2.11, 'throughput': 9999.96} [INFO|2025-03-20 21:08:37] logging.py:143 >> {'loss': 0.2526, 'learning_rate': 1.0235e-05, 'epoch': 2.11, 'throughput': 9999.97} [INFO|2025-03-20 21:09:18] logging.py:143 >> {'loss': 0.2587, 'learning_rate': 1.0224e-05, 'epoch': 2.11, 'throughput': 9999.99} [INFO|2025-03-20 21:09:58] logging.py:143 >> {'loss': 0.2406, 'learning_rate': 1.0213e-05, 'epoch': 2.11, 'throughput': 9999.98} [INFO|2025-03-20 21:10:38] logging.py:143 >> {'loss': 0.2552, 'learning_rate': 1.0202e-05, 'epoch': 2.11, 'throughput': 9999.95} [INFO|2025-03-20 21:11:18] logging.py:143 >> {'loss': 0.2609, 'learning_rate': 1.0190e-05, 'epoch': 2.11, 'throughput': 9999.89} [INFO|2025-03-20 21:12:00] logging.py:143 >> {'loss': 0.2450, 'learning_rate': 1.0179e-05, 'epoch': 2.11, 'throughput': 9999.81} [INFO|2025-03-20 21:12:41] logging.py:143 >> {'loss': 0.2559, 'learning_rate': 1.0168e-05, 'epoch': 2.11, 'throughput': 9999.88} [INFO|2025-03-20 21:13:23] logging.py:143 >> {'loss': 0.2443, 'learning_rate': 1.0157e-05, 'epoch': 2.11, 'throughput': 9999.80} [INFO|2025-03-20 21:14:03] 
logging.py:143 >> {'loss': 0.2581, 'learning_rate': 1.0145e-05, 'epoch': 2.11, 'throughput': 9999.73} [INFO|2025-03-20 21:14:44] logging.py:143 >> {'loss': 0.2493, 'learning_rate': 1.0134e-05, 'epoch': 2.11, 'throughput': 9999.76} [INFO|2025-03-20 21:15:25] logging.py:143 >> {'loss': 0.2316, 'learning_rate': 1.0123e-05, 'epoch': 2.11, 'throughput': 9999.73} [INFO|2025-03-20 21:16:05] logging.py:143 >> {'loss': 0.2436, 'learning_rate': 1.0112e-05, 'epoch': 2.12, 'throughput': 9999.76} [INFO|2025-03-20 21:16:46] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 1.0100e-05, 'epoch': 2.12, 'throughput': 9999.75} [INFO|2025-03-20 21:17:26] logging.py:143 >> {'loss': 0.2509, 'learning_rate': 1.0089e-05, 'epoch': 2.12, 'throughput': 9999.82} [INFO|2025-03-20 21:18:07] logging.py:143 >> {'loss': 0.2455, 'learning_rate': 1.0078e-05, 'epoch': 2.12, 'throughput': 9999.76} [INFO|2025-03-20 21:18:46] logging.py:143 >> {'loss': 0.2402, 'learning_rate': 1.0067e-05, 'epoch': 2.12, 'throughput': 9999.76} [INFO|2025-03-20 21:19:27] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 1.0055e-05, 'epoch': 2.12, 'throughput': 9999.70} [INFO|2025-03-20 21:20:08] logging.py:143 >> {'loss': 0.2526, 'learning_rate': 1.0044e-05, 'epoch': 2.12, 'throughput': 9999.71} [INFO|2025-03-20 21:20:50] logging.py:143 >> {'loss': 0.2585, 'learning_rate': 1.0033e-05, 'epoch': 2.12, 'throughput': 9999.66} [INFO|2025-03-20 21:21:31] logging.py:143 >> {'loss': 0.2430, 'learning_rate': 1.0022e-05, 'epoch': 2.12, 'throughput': 9999.63} [INFO|2025-03-20 21:22:11] logging.py:143 >> {'loss': 0.2444, 'learning_rate': 1.0011e-05, 'epoch': 2.12, 'throughput': 9999.69} [INFO|2025-03-20 21:22:52] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 9.9994e-06, 'epoch': 2.12, 'throughput': 9999.65} [INFO|2025-03-20 21:23:33] logging.py:143 >> {'loss': 0.2410, 'learning_rate': 9.9882e-06, 'epoch': 2.12, 'throughput': 9999.67} [INFO|2025-03-20 21:24:13] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 9.9770e-06, 
'epoch': 2.12, 'throughput': 9999.64} [INFO|2025-03-20 21:24:54] logging.py:143 >> {'loss': 0.2536, 'learning_rate': 9.9658e-06, 'epoch': 2.12, 'throughput': 9999.61} [INFO|2025-03-20 21:25:35] logging.py:143 >> {'loss': 0.2387, 'learning_rate': 9.9546e-06, 'epoch': 2.12, 'throughput': 9999.66} [INFO|2025-03-20 21:25:40] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-20000 [INFO|2025-03-20 21:25:40] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-20000/config.json [INFO|2025-03-20 21:25:40] configuration_utils.py:909 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-20000/generation_config.json [INFO|2025-03-20 21:26:02] modeling_utils.py:3048 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-20000/model.safetensors.index.json. 
[INFO|2025-03-20 21:26:02] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-20000/tokenizer_config.json [INFO|2025-03-20 21:26:02] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-20000/special_tokens_map.json [INFO|2025-03-20 21:27:11] logging.py:143 >> {'loss': 0.2352, 'learning_rate': 9.9435e-06, 'epoch': 2.12, 'throughput': 9996.32} [INFO|2025-03-20 21:27:51] logging.py:143 >> {'loss': 0.2485, 'learning_rate': 9.9323e-06, 'epoch': 2.12, 'throughput': 9996.42} [INFO|2025-03-20 21:28:33] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 9.9211e-06, 'epoch': 2.12, 'throughput': 9996.31} [INFO|2025-03-20 21:29:13] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 9.9100e-06, 'epoch': 2.13, 'throughput': 9996.31} [INFO|2025-03-20 21:29:53] logging.py:143 >> {'loss': 0.2497, 'learning_rate': 9.8988e-06, 'epoch': 2.13, 'throughput': 9996.30} [INFO|2025-03-20 21:30:35] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 9.8877e-06, 'epoch': 2.13, 'throughput': 9996.31} [INFO|2025-03-20 21:31:16] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 9.8765e-06, 'epoch': 2.13, 'throughput': 9996.23} [INFO|2025-03-20 21:31:59] logging.py:143 >> {'loss': 0.2397, 'learning_rate': 9.8654e-06, 'epoch': 2.13, 'throughput': 9996.16} [INFO|2025-03-20 21:32:38] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 9.8542e-06, 'epoch': 2.13, 'throughput': 9996.18} [INFO|2025-03-20 21:33:19] logging.py:143 >> {'loss': 0.2376, 'learning_rate': 9.8431e-06, 'epoch': 2.13, 'throughput': 9996.15} [INFO|2025-03-20 21:33:59] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 9.8320e-06, 'epoch': 2.13, 'throughput': 9996.15} [INFO|2025-03-20 21:34:39] logging.py:143 >> {'loss': 0.2323, 'learning_rate': 9.8209e-06, 'epoch': 2.13, 'throughput': 9996.17} [INFO|2025-03-20 21:35:20] logging.py:143 >> {'loss': 0.2382, 
'learning_rate': 9.8098e-06, 'epoch': 2.13, 'throughput': 9996.15} [INFO|2025-03-20 21:36:00] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 9.7986e-06, 'epoch': 2.13, 'throughput': 9996.16} [INFO|2025-03-20 21:36:40] logging.py:143 >> {'loss': 0.2459, 'learning_rate': 9.7875e-06, 'epoch': 2.13, 'throughput': 9996.26} [INFO|2025-03-20 21:37:19] logging.py:143 >> {'loss': 0.2473, 'learning_rate': 9.7764e-06, 'epoch': 2.13, 'throughput': 9996.31} [INFO|2025-03-20 21:38:00] logging.py:143 >> {'loss': 0.2654, 'learning_rate': 9.7653e-06, 'epoch': 2.13, 'throughput': 9996.26} [INFO|2025-03-20 21:38:39] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 9.7542e-06, 'epoch': 2.13, 'throughput': 9996.26} [INFO|2025-03-20 21:39:20] logging.py:143 >> {'loss': 0.2621, 'learning_rate': 9.7432e-06, 'epoch': 2.13, 'throughput': 9996.22} [INFO|2025-03-20 21:40:00] logging.py:143 >> {'loss': 0.2503, 'learning_rate': 9.7321e-06, 'epoch': 2.13, 'throughput': 9996.18} [INFO|2025-03-20 21:40:42] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 9.7210e-06, 'epoch': 2.13, 'throughput': 9996.08} [INFO|2025-03-20 21:41:23] logging.py:143 >> {'loss': 0.2376, 'learning_rate': 9.7099e-06, 'epoch': 2.13, 'throughput': 9996.08} [INFO|2025-03-20 21:42:03] logging.py:143 >> {'loss': 0.2447, 'learning_rate': 9.6988e-06, 'epoch': 2.14, 'throughput': 9996.14} [INFO|2025-03-20 21:42:44] logging.py:143 >> {'loss': 0.2558, 'learning_rate': 9.6878e-06, 'epoch': 2.14, 'throughput': 9996.06} [INFO|2025-03-20 21:43:27] logging.py:143 >> {'loss': 0.2348, 'learning_rate': 9.6767e-06, 'epoch': 2.14, 'throughput': 9995.99} [INFO|2025-03-20 21:44:06] logging.py:143 >> {'loss': 0.2437, 'learning_rate': 9.6657e-06, 'epoch': 2.14, 'throughput': 9995.97} [INFO|2025-03-20 21:44:46] logging.py:143 >> {'loss': 0.2411, 'learning_rate': 9.6546e-06, 'epoch': 2.14, 'throughput': 9996.05} [INFO|2025-03-20 21:45:26] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 9.6436e-06, 'epoch': 2.14, 'throughput': 
9996.11} [INFO|2025-03-20 21:46:05] logging.py:143 >> {'loss': 0.2438, 'learning_rate': 9.6325e-06, 'epoch': 2.14, 'throughput': 9996.17} [INFO|2025-03-20 21:46:46] logging.py:143 >> {'loss': 0.2501, 'learning_rate': 9.6215e-06, 'epoch': 2.14, 'throughput': 9996.22} [INFO|2025-03-20 21:47:26] logging.py:143 >> {'loss': 0.2467, 'learning_rate': 9.6105e-06, 'epoch': 2.14, 'throughput': 9996.22} [INFO|2025-03-20 21:48:06] logging.py:143 >> {'loss': 0.2614, 'learning_rate': 9.5994e-06, 'epoch': 2.14, 'throughput': 9996.22} [INFO|2025-03-20 21:48:47] logging.py:143 >> {'loss': 0.2283, 'learning_rate': 9.5884e-06, 'epoch': 2.14, 'throughput': 9996.16} [INFO|2025-03-20 21:49:27] logging.py:143 >> {'loss': 0.2605, 'learning_rate': 9.5774e-06, 'epoch': 2.14, 'throughput': 9996.18} [INFO|2025-03-20 21:50:08] logging.py:143 >> {'loss': 0.2440, 'learning_rate': 9.5664e-06, 'epoch': 2.14, 'throughput': 9996.23} [INFO|2025-03-20 21:50:48] logging.py:143 >> {'loss': 0.2463, 'learning_rate': 9.5554e-06, 'epoch': 2.14, 'throughput': 9996.30} [INFO|2025-03-20 21:51:29] logging.py:143 >> {'loss': 0.2213, 'learning_rate': 9.5444e-06, 'epoch': 2.14, 'throughput': 9996.26} [INFO|2025-03-20 21:52:09] logging.py:143 >> {'loss': 0.2543, 'learning_rate': 9.5334e-06, 'epoch': 2.14, 'throughput': 9996.29} [INFO|2025-03-20 21:52:48] logging.py:143 >> {'loss': 0.2549, 'learning_rate': 9.5224e-06, 'epoch': 2.14, 'throughput': 9996.32} [INFO|2025-03-20 21:53:30] logging.py:143 >> {'loss': 0.2433, 'learning_rate': 9.5114e-06, 'epoch': 2.14, 'throughput': 9996.30} [INFO|2025-03-20 21:54:11] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 9.5004e-06, 'epoch': 2.14, 'throughput': 9996.25} [INFO|2025-03-20 21:54:51] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 9.4895e-06, 'epoch': 2.15, 'throughput': 9996.21} [INFO|2025-03-20 21:55:33] logging.py:143 >> {'loss': 0.2569, 'learning_rate': 9.4785e-06, 'epoch': 2.15, 'throughput': 9996.09} [INFO|2025-03-20 21:56:13] logging.py:143 >> {'loss': 
0.2376, 'learning_rate': 9.4675e-06, 'epoch': 2.15, 'throughput': 9996.09} [INFO|2025-03-20 21:56:54] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 9.4566e-06, 'epoch': 2.15, 'throughput': 9996.03} [INFO|2025-03-20 21:57:35] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 9.4456e-06, 'epoch': 2.15, 'throughput': 9995.97} [INFO|2025-03-20 21:58:16] logging.py:143 >> {'loss': 0.2417, 'learning_rate': 9.4346e-06, 'epoch': 2.15, 'throughput': 9995.95} [INFO|2025-03-20 21:58:55] logging.py:143 >> {'loss': 0.2456, 'learning_rate': 9.4237e-06, 'epoch': 2.15, 'throughput': 9996.04} [INFO|2025-03-20 21:59:36] logging.py:143 >> {'loss': 0.2412, 'learning_rate': 9.4128e-06, 'epoch': 2.15, 'throughput': 9996.04} [INFO|2025-03-20 22:00:17] logging.py:143 >> {'loss': 0.2549, 'learning_rate': 9.4018e-06, 'epoch': 2.15, 'throughput': 9996.05} [INFO|2025-03-20 22:00:58] logging.py:143 >> {'loss': 0.2541, 'learning_rate': 9.3909e-06, 'epoch': 2.15, 'throughput': 9996.03} [INFO|2025-03-20 22:01:39] logging.py:143 >> {'loss': 0.2227, 'learning_rate': 9.3800e-06, 'epoch': 2.15, 'throughput': 9995.97} [INFO|2025-03-20 22:02:20] logging.py:143 >> {'loss': 0.2258, 'learning_rate': 9.3690e-06, 'epoch': 2.15, 'throughput': 9996.00} [INFO|2025-03-20 22:03:01] logging.py:143 >> {'loss': 0.2428, 'learning_rate': 9.3581e-06, 'epoch': 2.15, 'throughput': 9996.01} [INFO|2025-03-20 22:03:42] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 9.3472e-06, 'epoch': 2.15, 'throughput': 9995.98} [INFO|2025-03-20 22:04:23] logging.py:143 >> {'loss': 0.2496, 'learning_rate': 9.3363e-06, 'epoch': 2.15, 'throughput': 9995.88} [INFO|2025-03-20 22:05:04] logging.py:143 >> {'loss': 0.2552, 'learning_rate': 9.3254e-06, 'epoch': 2.15, 'throughput': 9995.87} [INFO|2025-03-20 22:05:45] logging.py:143 >> {'loss': 0.2236, 'learning_rate': 9.3145e-06, 'epoch': 2.15, 'throughput': 9995.81} [INFO|2025-03-20 22:06:27] logging.py:143 >> {'loss': 0.2578, 'learning_rate': 9.3036e-06, 'epoch': 2.15, 
'throughput': 9995.73} [INFO|2025-03-20 22:07:07] logging.py:143 >> {'loss': 0.2476, 'learning_rate': 9.2927e-06, 'epoch': 2.15, 'throughput': 9995.75} [INFO|2025-03-20 22:07:47] logging.py:143 >> {'loss': 0.2516, 'learning_rate': 9.2818e-06, 'epoch': 2.16, 'throughput': 9995.75} [INFO|2025-03-20 22:08:27] logging.py:143 >> {'loss': 0.2662, 'learning_rate': 9.2709e-06, 'epoch': 2.16, 'throughput': 9995.88} [INFO|2025-03-20 22:09:08] logging.py:143 >> {'loss': 0.2562, 'learning_rate': 9.2601e-06, 'epoch': 2.16, 'throughput': 9995.88} [INFO|2025-03-20 22:09:47] logging.py:143 >> {'loss': 0.2359, 'learning_rate': 9.2492e-06, 'epoch': 2.16, 'throughput': 9995.99} [INFO|2025-03-20 22:10:27] logging.py:143 >> {'loss': 0.2355, 'learning_rate': 9.2383e-06, 'epoch': 2.16, 'throughput': 9996.01} [INFO|2025-03-20 22:11:07] logging.py:143 >> {'loss': 0.2408, 'learning_rate': 9.2275e-06, 'epoch': 2.16, 'throughput': 9996.02} [INFO|2025-03-20 22:11:50] logging.py:143 >> {'loss': 0.2506, 'learning_rate': 9.2166e-06, 'epoch': 2.16, 'throughput': 9995.92} [INFO|2025-03-20 22:12:30] logging.py:143 >> {'loss': 0.2413, 'learning_rate': 9.2058e-06, 'epoch': 2.16, 'throughput': 9995.96} [INFO|2025-03-20 22:13:11] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 9.1949e-06, 'epoch': 2.16, 'throughput': 9995.91} [INFO|2025-03-20 22:13:52] logging.py:143 >> {'loss': 0.2397, 'learning_rate': 9.1841e-06, 'epoch': 2.16, 'throughput': 9995.91} [INFO|2025-03-20 22:14:35] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 9.1733e-06, 'epoch': 2.16, 'throughput': 9995.82} [INFO|2025-03-20 22:15:16] logging.py:143 >> {'loss': 0.2396, 'learning_rate': 9.1624e-06, 'epoch': 2.16, 'throughput': 9995.78} [INFO|2025-03-20 22:15:57] logging.py:143 >> {'loss': 0.2263, 'learning_rate': 9.1516e-06, 'epoch': 2.16, 'throughput': 9995.73} [INFO|2025-03-20 22:16:37] logging.py:143 >> {'loss': 0.2553, 'learning_rate': 9.1408e-06, 'epoch': 2.16, 'throughput': 9995.78} [INFO|2025-03-20 22:17:19] logging.py:143 
>> {'loss': 0.2293, 'learning_rate': 9.1300e-06, 'epoch': 2.16, 'throughput': 9995.70} [INFO|2025-03-20 22:18:00] logging.py:143 >> {'loss': 0.2547, 'learning_rate': 9.1192e-06, 'epoch': 2.16, 'throughput': 9995.77} [INFO|2025-03-20 22:18:40] logging.py:143 >> {'loss': 0.2246, 'learning_rate': 9.1083e-06, 'epoch': 2.16, 'throughput': 9995.79} [INFO|2025-03-20 22:19:21] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 9.0975e-06, 'epoch': 2.16, 'throughput': 9995.75} [INFO|2025-03-20 22:20:02] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 9.0868e-06, 'epoch': 2.16, 'throughput': 9995.68} [INFO|2025-03-20 22:20:42] logging.py:143 >> {'loss': 0.2521, 'learning_rate': 9.0760e-06, 'epoch': 2.17, 'throughput': 9995.66} [INFO|2025-03-20 22:21:22] logging.py:143 >> {'loss': 0.2303, 'learning_rate': 9.0652e-06, 'epoch': 2.17, 'throughput': 9995.65} [INFO|2025-03-20 22:22:02] logging.py:143 >> {'loss': 0.2545, 'learning_rate': 9.0544e-06, 'epoch': 2.17, 'throughput': 9995.66} [INFO|2025-03-20 22:22:43] logging.py:143 >> {'loss': 0.2466, 'learning_rate': 9.0436e-06, 'epoch': 2.17, 'throughput': 9995.67} [INFO|2025-03-20 22:23:23] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 9.0329e-06, 'epoch': 2.17, 'throughput': 9995.77} [INFO|2025-03-20 22:24:04] logging.py:143 >> {'loss': 0.2454, 'learning_rate': 9.0221e-06, 'epoch': 2.17, 'throughput': 9995.77} [INFO|2025-03-20 22:24:44] logging.py:143 >> {'loss': 0.2323, 'learning_rate': 9.0113e-06, 'epoch': 2.17, 'throughput': 9995.83} [INFO|2025-03-20 22:25:22] logging.py:143 >> {'loss': 0.2489, 'learning_rate': 9.0006e-06, 'epoch': 2.17, 'throughput': 9995.90} [INFO|2025-03-20 22:26:02] logging.py:143 >> {'loss': 0.2486, 'learning_rate': 8.9898e-06, 'epoch': 2.17, 'throughput': 9995.89} [INFO|2025-03-20 22:26:43] logging.py:143 >> {'loss': 0.2480, 'learning_rate': 8.9791e-06, 'epoch': 2.17, 'throughput': 9995.85} [INFO|2025-03-20 22:27:23] logging.py:143 >> {'loss': 0.2546, 'learning_rate': 8.9683e-06, 'epoch': 2.17, 
'throughput': 9995.89} [INFO|2025-03-20 22:28:04] logging.py:143 >> {'loss': 0.2679, 'learning_rate': 8.9576e-06, 'epoch': 2.17, 'throughput': 9995.90} [INFO|2025-03-20 22:28:45] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 8.9469e-06, 'epoch': 2.17, 'throughput': 9995.97} [INFO|2025-03-20 22:29:25] logging.py:143 >> {'loss': 0.2221, 'learning_rate': 8.9361e-06, 'epoch': 2.17, 'throughput': 9995.95} [INFO|2025-03-20 22:30:05] logging.py:143 >> {'loss': 0.2313, 'learning_rate': 8.9254e-06, 'epoch': 2.17, 'throughput': 9995.99} [INFO|2025-03-20 22:30:45] logging.py:143 >> {'loss': 0.2429, 'learning_rate': 8.9147e-06, 'epoch': 2.17, 'throughput': 9995.97} [INFO|2025-03-20 22:31:24] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 8.9040e-06, 'epoch': 2.17, 'throughput': 9996.05} [INFO|2025-03-20 22:32:06] logging.py:143 >> {'loss': 0.2553, 'learning_rate': 8.8933e-06, 'epoch': 2.17, 'throughput': 9996.06} [INFO|2025-03-20 22:32:47] logging.py:143 >> {'loss': 0.2582, 'learning_rate': 8.8826e-06, 'epoch': 2.17, 'throughput': 9996.09} [INFO|2025-03-20 22:33:27] logging.py:143 >> {'loss': 0.2484, 'learning_rate': 8.8719e-06, 'epoch': 2.18, 'throughput': 9996.11} [INFO|2025-03-20 22:34:08] logging.py:143 >> {'loss': 0.2643, 'learning_rate': 8.8612e-06, 'epoch': 2.18, 'throughput': 9996.09} [INFO|2025-03-20 22:34:48] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 8.8505e-06, 'epoch': 2.18, 'throughput': 9996.09} [INFO|2025-03-20 22:35:29] logging.py:143 >> {'loss': 0.2478, 'learning_rate': 8.8398e-06, 'epoch': 2.18, 'throughput': 9996.07} [INFO|2025-03-20 22:36:10] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 8.8292e-06, 'epoch': 2.18, 'throughput': 9996.00} [INFO|2025-03-20 22:36:50] logging.py:143 >> {'loss': 0.2559, 'learning_rate': 8.8185e-06, 'epoch': 2.18, 'throughput': 9996.02} [INFO|2025-03-20 22:37:30] logging.py:143 >> {'loss': 0.2670, 'learning_rate': 8.8078e-06, 'epoch': 2.18, 'throughput': 9996.02} [INFO|2025-03-20 22:38:11] logging.py:143 
>> {'loss': 0.2487, 'learning_rate': 8.7972e-06, 'epoch': 2.18, 'throughput': 9996.00} [INFO|2025-03-20 22:38:51] logging.py:143 >> {'loss': 0.2550, 'learning_rate': 8.7865e-06, 'epoch': 2.18, 'throughput': 9996.03} [INFO|2025-03-20 22:39:31] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 8.7759e-06, 'epoch': 2.18, 'throughput': 9996.04} [INFO|2025-03-20 22:40:10] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 8.7652e-06, 'epoch': 2.18, 'throughput': 9996.01} [INFO|2025-03-20 22:40:50] logging.py:143 >> {'loss': 0.2516, 'learning_rate': 8.7546e-06, 'epoch': 2.18, 'throughput': 9995.99} [INFO|2025-03-20 22:41:31] logging.py:143 >> {'loss': 0.2691, 'learning_rate': 8.7440e-06, 'epoch': 2.18, 'throughput': 9996.00} [INFO|2025-03-20 22:42:12] logging.py:143 >> {'loss': 0.2770, 'learning_rate': 8.7333e-06, 'epoch': 2.18, 'throughput': 9996.07} [INFO|2025-03-20 22:42:53] logging.py:143 >> {'loss': 0.2399, 'learning_rate': 8.7227e-06, 'epoch': 2.18, 'throughput': 9996.06} [INFO|2025-03-20 22:43:33] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 8.7121e-06, 'epoch': 2.18, 'throughput': 9996.03} [INFO|2025-03-20 22:44:13] logging.py:143 >> {'loss': 0.2410, 'learning_rate': 8.7015e-06, 'epoch': 2.18, 'throughput': 9996.04} [INFO|2025-03-20 22:44:53] logging.py:143 >> {'loss': 0.2436, 'learning_rate': 8.6909e-06, 'epoch': 2.18, 'throughput': 9996.04} [INFO|2025-03-20 22:45:34] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 8.6803e-06, 'epoch': 2.19, 'throughput': 9996.04} [INFO|2025-03-20 22:46:15] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 8.6697e-06, 'epoch': 2.19, 'throughput': 9996.03} [INFO|2025-03-20 22:46:55] logging.py:143 >> {'loss': 0.2544, 'learning_rate': 8.6591e-06, 'epoch': 2.19, 'throughput': 9996.01} [INFO|2025-03-20 22:47:36] logging.py:143 >> {'loss': 0.2646, 'learning_rate': 8.6485e-06, 'epoch': 2.19, 'throughput': 9996.08} [INFO|2025-03-20 22:48:16] logging.py:143 >> {'loss': 0.2659, 'learning_rate': 8.6379e-06, 'epoch': 2.19, 
'throughput': 9996.14} [INFO|2025-03-20 22:48:57] logging.py:143 >> {'loss': 0.2407, 'learning_rate': 8.6273e-06, 'epoch': 2.19, 'throughput': 9996.17} [INFO|2025-03-20 22:49:37] logging.py:143 >> {'loss': 0.2480, 'learning_rate': 8.6168e-06, 'epoch': 2.19, 'throughput': 9996.24} [INFO|2025-03-20 22:50:18] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 8.6062e-06, 'epoch': 2.19, 'throughput': 9996.27} [INFO|2025-03-20 22:50:59] logging.py:143 >> {'loss': 0.2297, 'learning_rate': 8.5956e-06, 'epoch': 2.19, 'throughput': 9996.28} [INFO|2025-03-20 22:51:40] logging.py:143 >> {'loss': 0.2605, 'learning_rate': 8.5851e-06, 'epoch': 2.19, 'throughput': 9996.20} [INFO|2025-03-20 22:52:21] logging.py:143 >> {'loss': 0.2404, 'learning_rate': 8.5745e-06, 'epoch': 2.19, 'throughput': 9996.19} [INFO|2025-03-20 22:53:02] logging.py:143 >> {'loss': 0.2494, 'learning_rate': 8.5640e-06, 'epoch': 2.19, 'throughput': 9996.17} [INFO|2025-03-20 22:53:43] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 8.5534e-06, 'epoch': 2.19, 'throughput': 9996.14} [INFO|2025-03-20 22:54:23] logging.py:143 >> {'loss': 0.2460, 'learning_rate': 8.5429e-06, 'epoch': 2.19, 'throughput': 9996.14} [INFO|2025-03-20 22:55:04] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 8.5324e-06, 'epoch': 2.19, 'throughput': 9996.08} [INFO|2025-03-20 22:55:43] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 8.5218e-06, 'epoch': 2.19, 'throughput': 9996.07} [INFO|2025-03-20 22:56:23] logging.py:143 >> {'loss': 0.2330, 'learning_rate': 8.5113e-06, 'epoch': 2.19, 'throughput': 9996.07} [INFO|2025-03-20 22:57:02] logging.py:143 >> {'loss': 0.2464, 'learning_rate': 8.5008e-06, 'epoch': 2.19, 'throughput': 9996.14} [INFO|2025-03-20 22:57:44] logging.py:143 >> {'loss': 0.2459, 'learning_rate': 8.4903e-06, 'epoch': 2.19, 'throughput': 9996.14} [INFO|2025-03-20 22:58:25] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 8.4798e-06, 'epoch': 2.20, 'throughput': 9996.11} [INFO|2025-03-20 22:59:06] logging.py:143 
>> {'loss': 0.2707, 'learning_rate': 8.4693e-06, 'epoch': 2.20, 'throughput': 9996.15} [INFO|2025-03-20 22:59:48] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 8.4588e-06, 'epoch': 2.20, 'throughput': 9996.12} [INFO|2025-03-20 23:00:29] logging.py:143 >> {'loss': 0.2434, 'learning_rate': 8.4483e-06, 'epoch': 2.20, 'throughput': 9996.11} [INFO|2025-03-20 23:01:08] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 8.4378e-06, 'epoch': 2.20, 'throughput': 9996.14} [INFO|2025-03-20 23:01:47] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 8.4273e-06, 'epoch': 2.20, 'throughput': 9996.18} [INFO|2025-03-20 23:02:28] logging.py:143 >> {'loss': 0.2400, 'learning_rate': 8.4169e-06, 'epoch': 2.20, 'throughput': 9996.11} [INFO|2025-03-20 23:03:08] logging.py:143 >> {'loss': 0.2543, 'learning_rate': 8.4064e-06, 'epoch': 2.20, 'throughput': 9996.17} [INFO|2025-03-20 23:03:48] logging.py:143 >> {'loss': 0.2394, 'learning_rate': 8.3959e-06, 'epoch': 2.20, 'throughput': 9996.14} [INFO|2025-03-20 23:04:29] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 8.3855e-06, 'epoch': 2.20, 'throughput': 9996.12} [INFO|2025-03-20 23:05:11] logging.py:143 >> {'loss': 0.2652, 'learning_rate': 8.3750e-06, 'epoch': 2.20, 'throughput': 9996.13} [INFO|2025-03-20 23:05:51] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 8.3646e-06, 'epoch': 2.20, 'throughput': 9996.11} [INFO|2025-03-20 23:06:32] logging.py:143 >> {'loss': 0.2409, 'learning_rate': 8.3541e-06, 'epoch': 2.20, 'throughput': 9996.13} [INFO|2025-03-20 23:07:11] logging.py:143 >> {'loss': 0.2388, 'learning_rate': 8.3437e-06, 'epoch': 2.20, 'throughput': 9996.19} [INFO|2025-03-20 23:07:51] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 8.3333e-06, 'epoch': 2.20, 'throughput': 9996.25} [INFO|2025-03-20 23:08:32] logging.py:143 >> {'loss': 0.2478, 'learning_rate': 8.3228e-06, 'epoch': 2.20, 'throughput': 9996.23} [INFO|2025-03-20 23:09:15] logging.py:143 >> {'loss': 0.2609, 'learning_rate': 8.3124e-06, 'epoch': 2.20, 
'throughput': 9996.18} [INFO|2025-03-20 23:09:56] logging.py:143 >> {'loss': 0.2588, 'learning_rate': 8.3020e-06, 'epoch': 2.20, 'throughput': 9996.15} [INFO|2025-03-20 23:10:35] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 8.2916e-06, 'epoch': 2.20, 'throughput': 9996.18} [INFO|2025-03-20 23:11:16] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 8.2812e-06, 'epoch': 2.21, 'throughput': 9996.21} [INFO|2025-03-20 23:11:55] logging.py:143 >> {'loss': 0.2436, 'learning_rate': 8.2708e-06, 'epoch': 2.21, 'throughput': 9996.28} [INFO|2025-03-20 23:12:37] logging.py:143 >> {'loss': 0.2441, 'learning_rate': 8.2604e-06, 'epoch': 2.21, 'throughput': 9996.24} [INFO|2025-03-20 23:13:17] logging.py:143 >> {'loss': 0.2341, 'learning_rate': 8.2500e-06, 'epoch': 2.21, 'throughput': 9996.24} [INFO|2025-03-20 23:13:57] logging.py:143 >> {'loss': 0.2274, 'learning_rate': 8.2396e-06, 'epoch': 2.21, 'throughput': 9996.36} [INFO|2025-03-20 23:14:38] logging.py:143 >> {'loss': 0.2461, 'learning_rate': 8.2292e-06, 'epoch': 2.21, 'throughput': 9996.39} [INFO|2025-03-20 23:15:18] logging.py:143 >> {'loss': 0.2265, 'learning_rate': 8.2188e-06, 'epoch': 2.21, 'throughput': 9996.37} [INFO|2025-03-20 23:15:58] logging.py:143 >> {'loss': 0.2464, 'learning_rate': 8.2085e-06, 'epoch': 2.21, 'throughput': 9996.36} [INFO|2025-03-20 23:16:39] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 8.1981e-06, 'epoch': 2.21, 'throughput': 9996.36} [INFO|2025-03-20 23:17:20] logging.py:143 >> {'loss': 0.2519, 'learning_rate': 8.1877e-06, 'epoch': 2.21, 'throughput': 9996.31} [INFO|2025-03-20 23:18:02] logging.py:143 >> {'loss': 0.2362, 'learning_rate': 8.1774e-06, 'epoch': 2.21, 'throughput': 9996.30} [INFO|2025-03-20 23:18:42] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 8.1670e-06, 'epoch': 2.21, 'throughput': 9996.22} [INFO|2025-03-20 23:19:23] logging.py:143 >> {'loss': 0.2410, 'learning_rate': 8.1567e-06, 'epoch': 2.21, 'throughput': 9996.23} [INFO|2025-03-20 23:20:03] logging.py:143 
>> {'loss': 0.2309, 'learning_rate': 8.1464e-06, 'epoch': 2.21, 'throughput': 9996.29} [INFO|2025-03-20 23:20:43] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 8.1360e-06, 'epoch': 2.21, 'throughput': 9996.31} [INFO|2025-03-20 23:21:23] logging.py:143 >> {'loss': 0.2322, 'learning_rate': 8.1257e-06, 'epoch': 2.21, 'throughput': 9996.28} [INFO|2025-03-20 23:22:03] logging.py:143 >> {'loss': 0.2342, 'learning_rate': 8.1154e-06, 'epoch': 2.21, 'throughput': 9996.32} [INFO|2025-03-20 23:22:44] logging.py:143 >> {'loss': 0.2512, 'learning_rate': 8.1051e-06, 'epoch': 2.21, 'throughput': 9996.28} [INFO|2025-03-20 23:23:27] logging.py:143 >> {'loss': 0.2195, 'learning_rate': 8.0947e-06, 'epoch': 2.21, 'throughput': 9996.21} [INFO|2025-03-20 23:24:07] logging.py:143 >> {'loss': 0.2587, 'learning_rate': 8.0844e-06, 'epoch': 2.22, 'throughput': 9996.28} [INFO|2025-03-20 23:24:47] logging.py:143 >> {'loss': 0.2455, 'learning_rate': 8.0741e-06, 'epoch': 2.22, 'throughput': 9996.34} [INFO|2025-03-20 23:25:28] logging.py:143 >> {'loss': 0.2387, 'learning_rate': 8.0638e-06, 'epoch': 2.22, 'throughput': 9996.33} [INFO|2025-03-20 23:26:09] logging.py:143 >> {'loss': 0.2234, 'learning_rate': 8.0536e-06, 'epoch': 2.22, 'throughput': 9996.34} [INFO|2025-03-20 23:26:50] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 8.0433e-06, 'epoch': 2.22, 'throughput': 9996.36} [INFO|2025-03-20 23:27:30] logging.py:143 >> {'loss': 0.2431, 'learning_rate': 8.0330e-06, 'epoch': 2.22, 'throughput': 9996.36} [INFO|2025-03-20 23:28:10] logging.py:143 >> {'loss': 0.2537, 'learning_rate': 8.0227e-06, 'epoch': 2.22, 'throughput': 9996.32} [INFO|2025-03-20 23:28:50] logging.py:143 >> {'loss': 0.2532, 'learning_rate': 8.0124e-06, 'epoch': 2.22, 'throughput': 9996.33} [INFO|2025-03-20 23:29:31] logging.py:143 >> {'loss': 0.2461, 'learning_rate': 8.0022e-06, 'epoch': 2.22, 'throughput': 9996.38} [INFO|2025-03-20 23:30:10] logging.py:143 >> {'loss': 0.2496, 'learning_rate': 7.9919e-06, 'epoch': 2.22, 
'throughput': 9996.38} [INFO|2025-03-20 23:30:51] logging.py:143 >> {'loss': 0.2403, 'learning_rate': 7.9817e-06, 'epoch': 2.22, 'throughput': 9996.40} [INFO|2025-03-20 23:31:32] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 7.9714e-06, 'epoch': 2.22, 'throughput': 9996.32} [INFO|2025-03-20 23:32:14] logging.py:143 >> {'loss': 0.2506, 'learning_rate': 7.9612e-06, 'epoch': 2.22, 'throughput': 9996.28} [INFO|2025-03-20 23:32:54] logging.py:143 >> {'loss': 0.2320, 'learning_rate': 7.9509e-06, 'epoch': 2.22, 'throughput': 9996.35} [INFO|2025-03-20 23:33:33] logging.py:143 >> {'loss': 0.2422, 'learning_rate': 7.9407e-06, 'epoch': 2.22, 'throughput': 9996.39} [INFO|2025-03-20 23:34:12] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 7.9305e-06, 'epoch': 2.22, 'throughput': 9996.42} [INFO|2025-03-20 23:34:53] logging.py:143 >> {'loss': 0.2314, 'learning_rate': 7.9203e-06, 'epoch': 2.22, 'throughput': 9996.35} [INFO|2025-03-20 23:35:32] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 7.9100e-06, 'epoch': 2.22, 'throughput': 9996.38} [INFO|2025-03-20 23:36:12] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 7.8998e-06, 'epoch': 2.22, 'throughput': 9996.40} [INFO|2025-03-20 23:36:53] logging.py:143 >> {'loss': 0.2206, 'learning_rate': 7.8896e-06, 'epoch': 2.23, 'throughput': 9996.46} [INFO|2025-03-20 23:37:35] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 7.8794e-06, 'epoch': 2.23, 'throughput': 9996.49} [INFO|2025-03-20 23:38:14] logging.py:143 >> {'loss': 0.2586, 'learning_rate': 7.8692e-06, 'epoch': 2.23, 'throughput': 9996.58} [INFO|2025-03-20 23:38:55] logging.py:143 >> {'loss': 0.2622, 'learning_rate': 7.8590e-06, 'epoch': 2.23, 'throughput': 9996.50} [INFO|2025-03-20 23:39:36] logging.py:143 >> {'loss': 0.2495, 'learning_rate': 7.8489e-06, 'epoch': 2.23, 'throughput': 9996.48} [INFO|2025-03-20 23:40:16] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 7.8387e-06, 'epoch': 2.23, 'throughput': 9996.49} [INFO|2025-03-20 23:40:58] logging.py:143 
>> {'loss': 0.2407, 'learning_rate': 7.8285e-06, 'epoch': 2.23, 'throughput': 9996.48} [INFO|2025-03-20 23:41:39] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 7.8183e-06, 'epoch': 2.23, 'throughput': 9996.43} [INFO|2025-03-20 23:42:20] logging.py:143 >> {'loss': 0.2317, 'learning_rate': 7.8082e-06, 'epoch': 2.23, 'throughput': 9996.36} [INFO|2025-03-20 23:42:58] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 7.7980e-06, 'epoch': 2.23, 'throughput': 9996.43} [INFO|2025-03-20 23:43:38] logging.py:143 >> {'loss': 0.2333, 'learning_rate': 7.7879e-06, 'epoch': 2.23, 'throughput': 9996.44} [INFO|2025-03-20 23:44:18] logging.py:143 >> {'loss': 0.2491, 'learning_rate': 7.7777e-06, 'epoch': 2.23, 'throughput': 9996.42} [INFO|2025-03-20 23:44:58] logging.py:143 >> {'loss': 0.2497, 'learning_rate': 7.7676e-06, 'epoch': 2.23, 'throughput': 9996.42} [INFO|2025-03-20 23:45:39] logging.py:143 >> {'loss': 0.2484, 'learning_rate': 7.7574e-06, 'epoch': 2.23, 'throughput': 9996.39} [INFO|2025-03-20 23:46:20] logging.py:143 >> {'loss': 0.2220, 'learning_rate': 7.7473e-06, 'epoch': 2.23, 'throughput': 9996.42} [INFO|2025-03-20 23:47:00] logging.py:143 >> {'loss': 0.2414, 'learning_rate': 7.7372e-06, 'epoch': 2.23, 'throughput': 9996.49} [INFO|2025-03-20 23:47:40] logging.py:143 >> {'loss': 0.2180, 'learning_rate': 7.7271e-06, 'epoch': 2.23, 'throughput': 9996.49} [INFO|2025-03-20 23:48:18] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 7.7170e-06, 'epoch': 2.23, 'throughput': 9996.55} [INFO|2025-03-20 23:49:00] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 7.7069e-06, 'epoch': 2.23, 'throughput': 9996.46} [INFO|2025-03-20 23:49:40] logging.py:143 >> {'loss': 0.2633, 'learning_rate': 7.6968e-06, 'epoch': 2.24, 'throughput': 9996.49} [INFO|2025-03-20 23:50:19] logging.py:143 >> {'loss': 0.2333, 'learning_rate': 7.6867e-06, 'epoch': 2.24, 'throughput': 9996.52} [INFO|2025-03-20 23:51:01] logging.py:143 >> {'loss': 0.2536, 'learning_rate': 7.6766e-06, 'epoch': 2.24, 
'throughput': 9996.52} [INFO|2025-03-20 23:51:41] logging.py:143 >> {'loss': 0.2447, 'learning_rate': 7.6665e-06, 'epoch': 2.24, 'throughput': 9996.59} [INFO|2025-03-20 23:52:22] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 7.6564e-06, 'epoch': 2.24, 'throughput': 9996.60} [INFO|2025-03-20 23:53:03] logging.py:143 >> {'loss': 0.2488, 'learning_rate': 7.6463e-06, 'epoch': 2.24, 'throughput': 9996.63} [INFO|2025-03-20 23:53:44] logging.py:143 >> {'loss': 0.2406, 'learning_rate': 7.6362e-06, 'epoch': 2.24, 'throughput': 9996.55} [INFO|2025-03-20 23:54:27] logging.py:143 >> {'loss': 0.2437, 'learning_rate': 7.6262e-06, 'epoch': 2.24, 'throughput': 9996.47} [INFO|2025-03-20 23:55:08] logging.py:143 >> {'loss': 0.2235, 'learning_rate': 7.6161e-06, 'epoch': 2.24, 'throughput': 9996.41} [INFO|2025-03-20 23:55:48] logging.py:143 >> {'loss': 0.2240, 'learning_rate': 7.6061e-06, 'epoch': 2.24, 'throughput': 9996.40} [INFO|2025-03-20 23:56:30] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 7.5960e-06, 'epoch': 2.24, 'throughput': 9996.35} [INFO|2025-03-20 23:57:10] logging.py:143 >> {'loss': 0.2579, 'learning_rate': 7.5860e-06, 'epoch': 2.24, 'throughput': 9996.36} [INFO|2025-03-20 23:57:51] logging.py:143 >> {'loss': 0.2152, 'learning_rate': 7.5759e-06, 'epoch': 2.24, 'throughput': 9996.30} [INFO|2025-03-20 23:58:32] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 7.5659e-06, 'epoch': 2.24, 'throughput': 9996.24} [INFO|2025-03-20 23:59:12] logging.py:143 >> {'loss': 0.2484, 'learning_rate': 7.5559e-06, 'epoch': 2.24, 'throughput': 9996.31} [INFO|2025-03-20 23:59:52] logging.py:143 >> {'loss': 0.2350, 'learning_rate': 7.5459e-06, 'epoch': 2.24, 'throughput': 9996.33} [INFO|2025-03-21 00:00:32] logging.py:143 >> {'loss': 0.2614, 'learning_rate': 7.5358e-06, 'epoch': 2.24, 'throughput': 9996.35} [INFO|2025-03-21 00:01:12] logging.py:143 >> {'loss': 0.2240, 'learning_rate': 7.5258e-06, 'epoch': 2.24, 'throughput': 9996.32} [INFO|2025-03-21 00:01:54] logging.py:143 
>> {'loss': 0.2427, 'learning_rate': 7.5158e-06, 'epoch': 2.24, 'throughput': 9996.25} [INFO|2025-03-21 00:02:33] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 7.5058e-06, 'epoch': 2.25, 'throughput': 9996.21} [INFO|2025-03-21 00:03:13] logging.py:143 >> {'loss': 0.2494, 'learning_rate': 7.4958e-06, 'epoch': 2.25, 'throughput': 9996.28} [INFO|2025-03-21 00:03:55] logging.py:143 >> {'loss': 0.2520, 'learning_rate': 7.4858e-06, 'epoch': 2.25, 'throughput': 9996.23} [INFO|2025-03-21 00:04:35] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 7.4759e-06, 'epoch': 2.25, 'throughput': 9996.16} [INFO|2025-03-21 00:05:15] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 7.4659e-06, 'epoch': 2.25, 'throughput': 9996.12} [INFO|2025-03-21 00:05:55] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 7.4559e-06, 'epoch': 2.25, 'throughput': 9996.19} [INFO|2025-03-21 00:06:35] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 7.4460e-06, 'epoch': 2.25, 'throughput': 9996.24} [INFO|2025-03-21 00:07:16] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 7.4360e-06, 'epoch': 2.25, 'throughput': 9996.18} [INFO|2025-03-21 00:07:59] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 7.4260e-06, 'epoch': 2.25, 'throughput': 9996.12} [INFO|2025-03-21 00:08:38] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 7.4161e-06, 'epoch': 2.25, 'throughput': 9996.22} [INFO|2025-03-21 00:09:18] logging.py:143 >> {'loss': 0.2236, 'learning_rate': 7.4061e-06, 'epoch': 2.25, 'throughput': 9996.26} [INFO|2025-03-21 00:09:59] logging.py:143 >> {'loss': 0.2464, 'learning_rate': 7.3962e-06, 'epoch': 2.25, 'throughput': 9996.18} [INFO|2025-03-21 00:10:39] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 7.3863e-06, 'epoch': 2.25, 'throughput': 9996.15} [INFO|2025-03-21 00:11:18] logging.py:143 >> {'loss': 0.2404, 'learning_rate': 7.3763e-06, 'epoch': 2.25, 'throughput': 9996.22} [INFO|2025-03-21 00:11:58] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 7.3664e-06, 'epoch': 2.25, 
'throughput': 9996.29} [INFO|2025-03-21 00:12:39] logging.py:143 >> {'loss': 0.2290, 'learning_rate': 7.3565e-06, 'epoch': 2.25, 'throughput': 9996.37} [INFO|2025-03-21 00:13:19] logging.py:143 >> {'loss': 0.2607, 'learning_rate': 7.3466e-06, 'epoch': 2.25, 'throughput': 9996.42} [INFO|2025-03-21 00:13:58] logging.py:143 >> {'loss': 0.2474, 'learning_rate': 7.3367e-06, 'epoch': 2.25, 'throughput': 9996.47} [INFO|2025-03-21 00:14:40] logging.py:143 >> {'loss': 0.2488, 'learning_rate': 7.3268e-06, 'epoch': 2.26, 'throughput': 9996.38} [INFO|2025-03-21 00:15:19] logging.py:143 >> {'loss': 0.2407, 'learning_rate': 7.3169e-06, 'epoch': 2.26, 'throughput': 9996.33} [INFO|2025-03-21 00:15:58] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 7.3070e-06, 'epoch': 2.26, 'throughput': 9996.38} [INFO|2025-03-21 00:16:39] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 7.2971e-06, 'epoch': 2.26, 'throughput': 9996.33} [INFO|2025-03-21 00:17:19] logging.py:143 >> {'loss': 0.2544, 'learning_rate': 7.2872e-06, 'epoch': 2.26, 'throughput': 9996.35} [INFO|2025-03-21 00:18:00] logging.py:143 >> {'loss': 0.2488, 'learning_rate': 7.2774e-06, 'epoch': 2.26, 'throughput': 9996.37} [INFO|2025-03-21 00:18:40] logging.py:143 >> {'loss': 0.2457, 'learning_rate': 7.2675e-06, 'epoch': 2.26, 'throughput': 9996.43} [INFO|2025-03-21 00:19:19] logging.py:143 >> {'loss': 0.2530, 'learning_rate': 7.2576e-06, 'epoch': 2.26, 'throughput': 9996.52} [INFO|2025-03-21 00:19:59] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 7.2478e-06, 'epoch': 2.26, 'throughput': 9996.56} [INFO|2025-03-21 00:20:38] logging.py:143 >> {'loss': 0.2537, 'learning_rate': 7.2379e-06, 'epoch': 2.26, 'throughput': 9996.63} [INFO|2025-03-21 00:21:18] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 7.2281e-06, 'epoch': 2.26, 'throughput': 9996.69} [INFO|2025-03-21 00:21:57] logging.py:143 >> {'loss': 0.2734, 'learning_rate': 7.2182e-06, 'epoch': 2.26, 'throughput': 9996.78} [INFO|2025-03-21 00:22:37] logging.py:143 
>> {'loss': 0.2397, 'learning_rate': 7.2084e-06, 'epoch': 2.26, 'throughput': 9996.77} [INFO|2025-03-21 00:23:17] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 7.1986e-06, 'epoch': 2.26, 'throughput': 9996.77} [INFO|2025-03-21 00:23:58] logging.py:143 >> {'loss': 0.2454, 'learning_rate': 7.1888e-06, 'epoch': 2.26, 'throughput': 9996.80} [INFO|2025-03-21 00:24:39] logging.py:143 >> {'loss': 0.2194, 'learning_rate': 7.1789e-06, 'epoch': 2.26, 'throughput': 9996.77} [INFO|2025-03-21 00:25:19] logging.py:143 >> {'loss': 0.2422, 'learning_rate': 7.1691e-06, 'epoch': 2.26, 'throughput': 9996.83} [INFO|2025-03-21 00:25:59] logging.py:143 >> {'loss': 0.2456, 'learning_rate': 7.1593e-06, 'epoch': 2.26, 'throughput': 9996.74} [INFO|2025-03-21 00:26:39] logging.py:143 >> {'loss': 0.2281, 'learning_rate': 7.1495e-06, 'epoch': 2.26, 'throughput': 9996.80} [INFO|2025-03-21 00:27:21] logging.py:143 >> {'loss': 0.2418, 'learning_rate': 7.1397e-06, 'epoch': 2.27, 'throughput': 9996.71} [INFO|2025-03-21 00:28:02] logging.py:143 >> {'loss': 0.2709, 'learning_rate': 7.1299e-06, 'epoch': 2.27, 'throughput': 9996.72} [INFO|2025-03-21 00:28:42] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 7.1202e-06, 'epoch': 2.27, 'throughput': 9996.76} [INFO|2025-03-21 00:29:23] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 7.1104e-06, 'epoch': 2.27, 'throughput': 9996.77} [INFO|2025-03-21 00:30:01] logging.py:143 >> {'loss': 0.2396, 'learning_rate': 7.1006e-06, 'epoch': 2.27, 'throughput': 9996.88} [INFO|2025-03-21 00:30:41] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 7.0908e-06, 'epoch': 2.27, 'throughput': 9996.87} [INFO|2025-03-21 00:31:22] logging.py:143 >> {'loss': 0.2436, 'learning_rate': 7.0811e-06, 'epoch': 2.27, 'throughput': 9996.90} [INFO|2025-03-21 00:32:03] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 7.0713e-06, 'epoch': 2.27, 'throughput': 9996.87} [INFO|2025-03-21 00:32:44] logging.py:143 >> {'loss': 0.2648, 'learning_rate': 7.0616e-06, 'epoch': 2.27, 
'throughput': 9996.84} [INFO|2025-03-21 00:33:25] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 7.0518e-06, 'epoch': 2.27, 'throughput': 9996.81} [INFO|2025-03-21 00:34:08] logging.py:143 >> {'loss': 0.2257, 'learning_rate': 7.0421e-06, 'epoch': 2.27, 'throughput': 9996.71} [INFO|2025-03-21 00:34:48] logging.py:143 >> {'loss': 0.2295, 'learning_rate': 7.0324e-06, 'epoch': 2.27, 'throughput': 9996.68} [INFO|2025-03-21 00:35:29] logging.py:143 >> {'loss': 0.2575, 'learning_rate': 7.0226e-06, 'epoch': 2.27, 'throughput': 9996.67} [INFO|2025-03-21 00:36:10] logging.py:143 >> {'loss': 0.2329, 'learning_rate': 7.0129e-06, 'epoch': 2.27, 'throughput': 9996.70} [INFO|2025-03-21 00:36:52] logging.py:143 >> {'loss': 0.2367, 'learning_rate': 7.0032e-06, 'epoch': 2.27, 'throughput': 9996.61} [INFO|2025-03-21 00:37:33] logging.py:143 >> {'loss': 0.2264, 'learning_rate': 6.9935e-06, 'epoch': 2.27, 'throughput': 9996.58} [INFO|2025-03-21 00:38:13] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 6.9838e-06, 'epoch': 2.27, 'throughput': 9996.58} [INFO|2025-03-21 00:38:53] logging.py:143 >> {'loss': 0.2229, 'learning_rate': 6.9741e-06, 'epoch': 2.27, 'throughput': 9996.58} [INFO|2025-03-21 00:39:34] logging.py:143 >> {'loss': 0.2491, 'learning_rate': 6.9644e-06, 'epoch': 2.27, 'throughput': 9996.58} [INFO|2025-03-21 00:40:13] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 6.9547e-06, 'epoch': 2.28, 'throughput': 9996.67} [INFO|2025-03-21 00:40:53] logging.py:143 >> {'loss': 0.2545, 'learning_rate': 6.9450e-06, 'epoch': 2.28, 'throughput': 9996.71} [INFO|2025-03-21 00:41:33] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 6.9354e-06, 'epoch': 2.28, 'throughput': 9996.70} [INFO|2025-03-21 00:42:14] logging.py:143 >> {'loss': 0.2142, 'learning_rate': 6.9257e-06, 'epoch': 2.28, 'throughput': 9996.66} [INFO|2025-03-21 00:42:54] logging.py:143 >> {'loss': 0.2419, 'learning_rate': 6.9160e-06, 'epoch': 2.28, 'throughput': 9996.64} [INFO|2025-03-21 00:43:36] logging.py:143 
>> {'loss': 0.2502, 'learning_rate': 6.9064e-06, 'epoch': 2.28, 'throughput': 9996.65} [INFO|2025-03-21 00:44:16] logging.py:143 >> {'loss': 0.2242, 'learning_rate': 6.8967e-06, 'epoch': 2.28, 'throughput': 9996.67} [INFO|2025-03-21 00:44:57] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 6.8871e-06, 'epoch': 2.28, 'throughput': 9996.70} [INFO|2025-03-21 00:45:38] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 6.8774e-06, 'epoch': 2.28, 'throughput': 9996.68} [INFO|2025-03-21 00:46:18] logging.py:143 >> {'loss': 0.2458, 'learning_rate': 6.8678e-06, 'epoch': 2.28, 'throughput': 9996.70} [INFO|2025-03-21 00:46:59] logging.py:143 >> {'loss': 0.2503, 'learning_rate': 6.8581e-06, 'epoch': 2.28, 'throughput': 9996.69} [INFO|2025-03-21 00:47:39] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 6.8485e-06, 'epoch': 2.28, 'throughput': 9996.64} [INFO|2025-03-21 00:48:19] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 6.8389e-06, 'epoch': 2.28, 'throughput': 9996.58} [INFO|2025-03-21 00:48:59] logging.py:143 >> {'loss': 0.2430, 'learning_rate': 6.8293e-06, 'epoch': 2.28, 'throughput': 9996.57} [INFO|2025-03-21 00:49:40] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 6.8197e-06, 'epoch': 2.28, 'throughput': 9996.56} [INFO|2025-03-21 00:50:20] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 6.8101e-06, 'epoch': 2.28, 'throughput': 9996.63} [INFO|2025-03-21 00:51:00] logging.py:143 >> {'loss': 0.2419, 'learning_rate': 6.8005e-06, 'epoch': 2.28, 'throughput': 9996.64} [INFO|2025-03-21 00:51:39] logging.py:143 >> {'loss': 0.2394, 'learning_rate': 6.7909e-06, 'epoch': 2.28, 'throughput': 9996.67} [INFO|2025-03-21 00:52:19] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 6.7813e-06, 'epoch': 2.28, 'throughput': 9996.70} [INFO|2025-03-21 00:53:00] logging.py:143 >> {'loss': 0.2432, 'learning_rate': 6.7717e-06, 'epoch': 2.29, 'throughput': 9996.68} [INFO|2025-03-21 00:53:40] logging.py:143 >> {'loss': 0.2292, 'learning_rate': 6.7621e-06, 'epoch': 2.29, 
'throughput': 9996.69} [INFO|2025-03-21 00:54:23] logging.py:143 >> {'loss': 0.2133, 'learning_rate': 6.7526e-06, 'epoch': 2.29, 'throughput': 9996.57} [INFO|2025-03-21 00:55:03] logging.py:143 >> {'loss': 0.2238, 'learning_rate': 6.7430e-06, 'epoch': 2.29, 'throughput': 9996.60} [INFO|2025-03-21 00:55:44] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 6.7335e-06, 'epoch': 2.29, 'throughput': 9996.54} [INFO|2025-03-21 00:56:24] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 6.7239e-06, 'epoch': 2.29, 'throughput': 9996.58} [INFO|2025-03-21 00:57:04] logging.py:143 >> {'loss': 0.2235, 'learning_rate': 6.7144e-06, 'epoch': 2.29, 'throughput': 9996.56} [INFO|2025-03-21 00:57:45] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 6.7048e-06, 'epoch': 2.29, 'throughput': 9996.61} [INFO|2025-03-21 00:58:25] logging.py:143 >> {'loss': 0.2253, 'learning_rate': 6.6953e-06, 'epoch': 2.29, 'throughput': 9996.63} [INFO|2025-03-21 00:59:06] logging.py:143 >> {'loss': 0.2437, 'learning_rate': 6.6858e-06, 'epoch': 2.29, 'throughput': 9996.56} [INFO|2025-03-21 00:59:47] logging.py:143 >> {'loss': 0.2493, 'learning_rate': 6.6762e-06, 'epoch': 2.29, 'throughput': 9996.60} [INFO|2025-03-21 01:00:27] logging.py:143 >> {'loss': 0.2191, 'learning_rate': 6.6667e-06, 'epoch': 2.29, 'throughput': 9996.59} [INFO|2025-03-21 01:01:07] logging.py:143 >> {'loss': 0.2352, 'learning_rate': 6.6572e-06, 'epoch': 2.29, 'throughput': 9996.59} [INFO|2025-03-21 01:01:48] logging.py:143 >> {'loss': 0.2584, 'learning_rate': 6.6477e-06, 'epoch': 2.29, 'throughput': 9996.64} [INFO|2025-03-21 01:02:27] logging.py:143 >> {'loss': 0.2476, 'learning_rate': 6.6382e-06, 'epoch': 2.29, 'throughput': 9996.68} [INFO|2025-03-21 01:03:10] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 6.6287e-06, 'epoch': 2.29, 'throughput': 9996.63} [INFO|2025-03-21 01:03:52] logging.py:143 >> {'loss': 0.2475, 'learning_rate': 6.6192e-06, 'epoch': 2.29, 'throughput': 9996.57} [INFO|2025-03-21 01:04:32] logging.py:143 
>> {'loss': 0.2299, 'learning_rate': 6.6097e-06, 'epoch': 2.29, 'throughput': 9996.60} [INFO|2025-03-21 01:05:11] logging.py:143 >> {'loss': 0.2527, 'learning_rate': 6.6003e-06, 'epoch': 2.29, 'throughput': 9996.66} [INFO|2025-03-21 01:05:50] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 6.5908e-06, 'epoch': 2.30, 'throughput': 9996.64} [INFO|2025-03-21 01:06:31] logging.py:143 >> {'loss': 0.2479, 'learning_rate': 6.5813e-06, 'epoch': 2.30, 'throughput': 9996.70} [INFO|2025-03-21 01:07:11] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 6.5719e-06, 'epoch': 2.30, 'throughput': 9996.61} [INFO|2025-03-21 01:07:52] logging.py:143 >> {'loss': 0.2406, 'learning_rate': 6.5624e-06, 'epoch': 2.30, 'throughput': 9996.65} [INFO|2025-03-21 01:08:31] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 6.5530e-06, 'epoch': 2.30, 'throughput': 9996.65} [INFO|2025-03-21 01:09:11] logging.py:143 >> {'loss': 0.2462, 'learning_rate': 6.5435e-06, 'epoch': 2.30, 'throughput': 9996.69} [INFO|2025-03-21 01:09:52] logging.py:143 >> {'loss': 0.2376, 'learning_rate': 6.5341e-06, 'epoch': 2.30, 'throughput': 9996.70} [INFO|2025-03-21 01:10:33] logging.py:143 >> {'loss': 0.2345, 'learning_rate': 6.5247e-06, 'epoch': 2.30, 'throughput': 9996.72} [INFO|2025-03-21 01:11:15] logging.py:143 >> {'loss': 0.2511, 'learning_rate': 6.5152e-06, 'epoch': 2.30, 'throughput': 9996.71} [INFO|2025-03-21 01:11:56] logging.py:143 >> {'loss': 0.2609, 'learning_rate': 6.5058e-06, 'epoch': 2.30, 'throughput': 9996.73} [INFO|2025-03-21 01:12:37] logging.py:143 >> {'loss': 0.2156, 'learning_rate': 6.4964e-06, 'epoch': 2.30, 'throughput': 9996.68} [INFO|2025-03-21 01:13:17] logging.py:143 >> {'loss': 0.2282, 'learning_rate': 6.4870e-06, 'epoch': 2.30, 'throughput': 9996.68} [INFO|2025-03-21 01:13:57] logging.py:143 >> {'loss': 0.2467, 'learning_rate': 6.4776e-06, 'epoch': 2.30, 'throughput': 9996.71} [INFO|2025-03-21 01:14:39] logging.py:143 >> {'loss': 0.2382, 'learning_rate': 6.4682e-06, 'epoch': 2.30, 
'throughput': 9996.69} [INFO|2025-03-21 01:15:19] logging.py:143 >> {'loss': 0.2305, 'learning_rate': 6.4588e-06, 'epoch': 2.30, 'throughput': 9996.71} [INFO|2025-03-21 01:15:59] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 6.4494e-06, 'epoch': 2.30, 'throughput': 9996.78} [INFO|2025-03-21 01:16:39] logging.py:143 >> {'loss': 0.2468, 'learning_rate': 6.4401e-06, 'epoch': 2.30, 'throughput': 9996.81} [INFO|2025-03-21 01:17:19] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 6.4307e-06, 'epoch': 2.30, 'throughput': 9996.82} [INFO|2025-03-21 01:17:58] logging.py:143 >> {'loss': 0.2473, 'learning_rate': 6.4213e-06, 'epoch': 2.30, 'throughput': 9996.87} [INFO|2025-03-21 01:18:38] logging.py:143 >> {'loss': 0.2614, 'learning_rate': 6.4120e-06, 'epoch': 2.31, 'throughput': 9996.88} [INFO|2025-03-21 01:19:18] logging.py:143 >> {'loss': 0.2520, 'learning_rate': 6.4026e-06, 'epoch': 2.31, 'throughput': 9996.93} [INFO|2025-03-21 01:19:58] logging.py:143 >> {'loss': 0.2418, 'learning_rate': 6.3933e-06, 'epoch': 2.31, 'throughput': 9996.89} [INFO|2025-03-21 01:20:38] logging.py:143 >> {'loss': 0.2258, 'learning_rate': 6.3839e-06, 'epoch': 2.31, 'throughput': 9996.92} [INFO|2025-03-21 01:21:18] logging.py:143 >> {'loss': 0.2458, 'learning_rate': 6.3746e-06, 'epoch': 2.31, 'throughput': 9996.94} [INFO|2025-03-21 01:21:58] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 6.3652e-06, 'epoch': 2.31, 'throughput': 9996.92} [INFO|2025-03-21 01:22:40] logging.py:143 >> {'loss': 0.2504, 'learning_rate': 6.3559e-06, 'epoch': 2.31, 'throughput': 9996.92} [INFO|2025-03-21 01:23:20] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 6.3466e-06, 'epoch': 2.31, 'throughput': 9996.92} [INFO|2025-03-21 01:24:00] logging.py:143 >> {'loss': 0.2566, 'learning_rate': 6.3373e-06, 'epoch': 2.31, 'throughput': 9996.85} [INFO|2025-03-21 01:24:39] logging.py:143 >> {'loss': 0.2431, 'learning_rate': 6.3280e-06, 'epoch': 2.31, 'throughput': 9996.86} [INFO|2025-03-21 01:25:21] logging.py:143 
>> {'loss': 0.2433, 'learning_rate': 6.3187e-06, 'epoch': 2.31, 'throughput': 9996.88} [INFO|2025-03-21 01:26:01] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 6.3094e-06, 'epoch': 2.31, 'throughput': 9996.83} [INFO|2025-03-21 01:26:42] logging.py:143 >> {'loss': 0.2505, 'learning_rate': 6.3001e-06, 'epoch': 2.31, 'throughput': 9996.84} [INFO|2025-03-21 01:27:22] logging.py:143 >> {'loss': 0.2477, 'learning_rate': 6.2908e-06, 'epoch': 2.31, 'throughput': 9996.85} [INFO|2025-03-21 01:28:03] logging.py:143 >> {'loss': 0.2252, 'learning_rate': 6.2815e-06, 'epoch': 2.31, 'throughput': 9996.83} [INFO|2025-03-21 01:28:44] logging.py:143 >> {'loss': 0.2508, 'learning_rate': 6.2722e-06, 'epoch': 2.31, 'throughput': 9996.82} [INFO|2025-03-21 01:29:26] logging.py:143 >> {'loss': 0.2203, 'learning_rate': 6.2630e-06, 'epoch': 2.31, 'throughput': 9996.76} [INFO|2025-03-21 01:30:05] logging.py:143 >> {'loss': 0.2314, 'learning_rate': 6.2537e-06, 'epoch': 2.31, 'throughput': 9996.75} [INFO|2025-03-21 01:30:46] logging.py:143 >> {'loss': 0.2361, 'learning_rate': 6.2445e-06, 'epoch': 2.32, 'throughput': 9996.76} [INFO|2025-03-21 01:31:25] logging.py:143 >> {'loss': 0.2373, 'learning_rate': 6.2352e-06, 'epoch': 2.32, 'throughput': 9996.84} [INFO|2025-03-21 01:32:07] logging.py:143 >> {'loss': 0.2598, 'learning_rate': 6.2260e-06, 'epoch': 2.32, 'throughput': 9996.80} [INFO|2025-03-21 01:32:47] logging.py:143 >> {'loss': 0.2150, 'learning_rate': 6.2167e-06, 'epoch': 2.32, 'throughput': 9996.84} [INFO|2025-03-21 01:33:27] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 6.2075e-06, 'epoch': 2.32, 'throughput': 9996.83} [INFO|2025-03-21 01:34:08] logging.py:143 >> {'loss': 0.2291, 'learning_rate': 6.1983e-06, 'epoch': 2.32, 'throughput': 9996.87} [INFO|2025-03-21 01:34:47] logging.py:143 >> {'loss': 0.2491, 'learning_rate': 6.1891e-06, 'epoch': 2.32, 'throughput': 9996.96} [INFO|2025-03-21 01:35:26] logging.py:143 >> {'loss': 0.2405, 'learning_rate': 6.1798e-06, 'epoch': 2.32, 
'throughput': 9996.94} [INFO|2025-03-21 01:36:06] logging.py:143 >> {'loss': 0.2313, 'learning_rate': 6.1706e-06, 'epoch': 2.32, 'throughput': 9996.99} [INFO|2025-03-21 01:36:46] logging.py:143 >> {'loss': 0.2517, 'learning_rate': 6.1614e-06, 'epoch': 2.32, 'throughput': 9996.96} [INFO|2025-03-21 01:37:27] logging.py:143 >> {'loss': 0.2585, 'learning_rate': 6.1522e-06, 'epoch': 2.32, 'throughput': 9996.98} [INFO|2025-03-21 01:38:08] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 6.1430e-06, 'epoch': 2.32, 'throughput': 9996.91} [INFO|2025-03-21 01:38:50] logging.py:143 >> {'loss': 0.2190, 'learning_rate': 6.1339e-06, 'epoch': 2.32, 'throughput': 9996.80} [INFO|2025-03-21 01:39:31] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 6.1247e-06, 'epoch': 2.32, 'throughput': 9996.81} [INFO|2025-03-21 01:40:09] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 6.1155e-06, 'epoch': 2.32, 'throughput': 9996.88} [INFO|2025-03-21 01:40:50] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 6.1063e-06, 'epoch': 2.32, 'throughput': 9996.86} [INFO|2025-03-21 01:41:29] logging.py:143 >> {'loss': 0.2334, 'learning_rate': 6.0972e-06, 'epoch': 2.32, 'throughput': 9996.96} [INFO|2025-03-21 01:42:09] logging.py:143 >> {'loss': 0.2405, 'learning_rate': 6.0880e-06, 'epoch': 2.32, 'throughput': 9997.01} [INFO|2025-03-21 01:42:49] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 6.0789e-06, 'epoch': 2.32, 'throughput': 9996.99} [INFO|2025-03-21 01:43:29] logging.py:143 >> {'loss': 0.2278, 'learning_rate': 6.0697e-06, 'epoch': 2.33, 'throughput': 9997.00} [INFO|2025-03-21 01:44:10] logging.py:143 >> {'loss': 0.2515, 'learning_rate': 6.0606e-06, 'epoch': 2.33, 'throughput': 9996.98} [INFO|2025-03-21 01:44:50] logging.py:143 >> {'loss': 0.2339, 'learning_rate': 6.0515e-06, 'epoch': 2.33, 'throughput': 9996.96} [INFO|2025-03-21 01:45:31] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 6.0423e-06, 'epoch': 2.33, 'throughput': 9996.99} [INFO|2025-03-21 01:46:12] logging.py:143 
>> {'loss': 0.2472, 'learning_rate': 6.0332e-06, 'epoch': 2.33, 'throughput': 9996.95} [INFO|2025-03-21 01:46:51] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 6.0241e-06, 'epoch': 2.33, 'throughput': 9996.96} [INFO|2025-03-21 01:47:31] logging.py:143 >> {'loss': 0.2457, 'learning_rate': 6.0150e-06, 'epoch': 2.33, 'throughput': 9996.93} [INFO|2025-03-21 01:48:12] logging.py:143 >> {'loss': 0.2419, 'learning_rate': 6.0059e-06, 'epoch': 2.33, 'throughput': 9996.95} [INFO|2025-03-21 01:48:53] logging.py:143 >> {'loss': 0.2489, 'learning_rate': 5.9968e-06, 'epoch': 2.33, 'throughput': 9996.93} [INFO|2025-03-21 01:49:34] logging.py:143 >> {'loss': 0.2485, 'learning_rate': 5.9877e-06, 'epoch': 2.33, 'throughput': 9996.90} [INFO|2025-03-21 01:50:15] logging.py:143 >> {'loss': 0.2267, 'learning_rate': 5.9786e-06, 'epoch': 2.33, 'throughput': 9996.92} [INFO|2025-03-21 01:50:55] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 5.9696e-06, 'epoch': 2.33, 'throughput': 9996.94} [INFO|2025-03-21 01:51:35] logging.py:143 >> {'loss': 0.2423, 'learning_rate': 5.9605e-06, 'epoch': 2.33, 'throughput': 9996.94} [INFO|2025-03-21 01:52:15] logging.py:143 >> {'loss': 0.2406, 'learning_rate': 5.9514e-06, 'epoch': 2.33, 'throughput': 9997.00} [INFO|2025-03-21 01:52:56] logging.py:143 >> {'loss': 0.2446, 'learning_rate': 5.9424e-06, 'epoch': 2.33, 'throughput': 9996.96} [INFO|2025-03-21 01:53:36] logging.py:143 >> {'loss': 0.2507, 'learning_rate': 5.9333e-06, 'epoch': 2.33, 'throughput': 9997.02} [INFO|2025-03-21 01:54:18] logging.py:143 >> {'loss': 0.2408, 'learning_rate': 5.9243e-06, 'epoch': 2.33, 'throughput': 9996.96} [INFO|2025-03-21 01:54:58] logging.py:143 >> {'loss': 0.2123, 'learning_rate': 5.9152e-06, 'epoch': 2.33, 'throughput': 9996.89} [INFO|2025-03-21 01:55:40] logging.py:143 >> {'loss': 0.2382, 'learning_rate': 5.9062e-06, 'epoch': 2.33, 'throughput': 9996.90} [INFO|2025-03-21 01:56:20] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 5.8971e-06, 'epoch': 2.34, 
'throughput': 9996.93} [INFO|2025-03-21 01:57:01] logging.py:143 >> {'loss': 0.2498, 'learning_rate': 5.8881e-06, 'epoch': 2.34, 'throughput': 9996.90} [INFO|2025-03-21 01:57:41] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 5.8791e-06, 'epoch': 2.34, 'throughput': 9996.94} [INFO|2025-03-21 01:58:21] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 5.8701e-06, 'epoch': 2.34, 'throughput': 9996.99} [INFO|2025-03-21 01:59:02] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 5.8611e-06, 'epoch': 2.34, 'throughput': 9996.97} [INFO|2025-03-21 01:59:42] logging.py:143 >> {'loss': 0.2560, 'learning_rate': 5.8521e-06, 'epoch': 2.34, 'throughput': 9996.98} [INFO|2025-03-21 02:00:22] logging.py:143 >> {'loss': 0.2457, 'learning_rate': 5.8431e-06, 'epoch': 2.34, 'throughput': 9996.97} [INFO|2025-03-21 02:01:02] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 5.8341e-06, 'epoch': 2.34, 'throughput': 9996.98} [INFO|2025-03-21 02:01:42] logging.py:143 >> {'loss': 0.2188, 'learning_rate': 5.8251e-06, 'epoch': 2.34, 'throughput': 9996.97} [INFO|2025-03-21 02:02:21] logging.py:143 >> {'loss': 0.2481, 'learning_rate': 5.8161e-06, 'epoch': 2.34, 'throughput': 9997.00} [INFO|2025-03-21 02:03:03] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 5.8072e-06, 'epoch': 2.34, 'throughput': 9997.00} [INFO|2025-03-21 02:03:43] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 5.7982e-06, 'epoch': 2.34, 'throughput': 9997.03} [INFO|2025-03-21 02:04:23] logging.py:143 >> {'loss': 0.2390, 'learning_rate': 5.7893e-06, 'epoch': 2.34, 'throughput': 9996.98} [INFO|2025-03-21 02:05:03] logging.py:143 >> {'loss': 0.2422, 'learning_rate': 5.7803e-06, 'epoch': 2.34, 'throughput': 9996.96} [INFO|2025-03-21 02:05:45] logging.py:143 >> {'loss': 0.2548, 'learning_rate': 5.7714e-06, 'epoch': 2.34, 'throughput': 9996.97} [INFO|2025-03-21 02:06:25] logging.py:143 >> {'loss': 0.2323, 'learning_rate': 5.7624e-06, 'epoch': 2.34, 'throughput': 9996.99} [INFO|2025-03-21 02:07:05] logging.py:143 
>> {'loss': 0.2295, 'learning_rate': 5.7535e-06, 'epoch': 2.34, 'throughput': 9996.96} [INFO|2025-03-21 02:07:46] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 5.7446e-06, 'epoch': 2.34, 'throughput': 9996.90} [INFO|2025-03-21 02:08:26] logging.py:143 >> {'loss': 0.2385, 'learning_rate': 5.7356e-06, 'epoch': 2.34, 'throughput': 9996.95} [INFO|2025-03-21 02:09:07] logging.py:143 >> {'loss': 0.2561, 'learning_rate': 5.7267e-06, 'epoch': 2.35, 'throughput': 9996.94} [INFO|2025-03-21 02:09:48] logging.py:143 >> {'loss': 0.2603, 'learning_rate': 5.7178e-06, 'epoch': 2.35, 'throughput': 9996.93} [INFO|2025-03-21 02:10:28] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 5.7089e-06, 'epoch': 2.35, 'throughput': 9996.95} [INFO|2025-03-21 02:11:09] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 5.7000e-06, 'epoch': 2.35, 'throughput': 9996.99} [INFO|2025-03-21 02:11:49] logging.py:143 >> {'loss': 0.2632, 'learning_rate': 5.6911e-06, 'epoch': 2.35, 'throughput': 9996.94} [INFO|2025-03-21 02:12:31] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 5.6822e-06, 'epoch': 2.35, 'throughput': 9996.88} [INFO|2025-03-21 02:13:11] logging.py:143 >> {'loss': 0.2427, 'learning_rate': 5.6734e-06, 'epoch': 2.35, 'throughput': 9996.92} [INFO|2025-03-21 02:13:50] logging.py:143 >> {'loss': 0.2489, 'learning_rate': 5.6645e-06, 'epoch': 2.35, 'throughput': 9996.95} [INFO|2025-03-21 02:14:30] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 5.6556e-06, 'epoch': 2.35, 'throughput': 9996.97} [INFO|2025-03-21 02:15:12] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 5.6467e-06, 'epoch': 2.35, 'throughput': 9996.90} [INFO|2025-03-21 02:15:51] logging.py:143 >> {'loss': 0.2485, 'learning_rate': 5.6379e-06, 'epoch': 2.35, 'throughput': 9996.93} [INFO|2025-03-21 02:16:31] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 5.6290e-06, 'epoch': 2.35, 'throughput': 9996.88} [INFO|2025-03-21 02:17:12] logging.py:143 >> {'loss': 0.2235, 'learning_rate': 5.6202e-06, 'epoch': 2.35, 
'throughput': 9996.83} [INFO|2025-03-21 02:17:52] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 5.6114e-06, 'epoch': 2.35, 'throughput': 9996.86} [INFO|2025-03-21 02:18:32] logging.py:143 >> {'loss': 0.2261, 'learning_rate': 5.6025e-06, 'epoch': 2.35, 'throughput': 9996.93} [INFO|2025-03-21 02:19:12] logging.py:143 >> {'loss': 0.2166, 'learning_rate': 5.5937e-06, 'epoch': 2.35, 'throughput': 9996.82} [INFO|2025-03-21 02:19:53] logging.py:143 >> {'loss': 0.2445, 'learning_rate': 5.5849e-06, 'epoch': 2.35, 'throughput': 9996.76} [INFO|2025-03-21 02:20:33] logging.py:143 >> {'loss': 0.2262, 'learning_rate': 5.5761e-06, 'epoch': 2.35, 'throughput': 9996.79} [INFO|2025-03-21 02:21:13] logging.py:143 >> {'loss': 0.2549, 'learning_rate': 5.5673e-06, 'epoch': 2.35, 'throughput': 9996.80} [INFO|2025-03-21 02:21:55] logging.py:143 >> {'loss': 0.2297, 'learning_rate': 5.5585e-06, 'epoch': 2.36, 'throughput': 9996.81} [INFO|2025-03-21 02:22:35] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 5.5497e-06, 'epoch': 2.36, 'throughput': 9996.84} [INFO|2025-03-21 02:23:15] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 5.5409e-06, 'epoch': 2.36, 'throughput': 9996.81} [INFO|2025-03-21 02:23:58] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 5.5321e-06, 'epoch': 2.36, 'throughput': 9996.72} [INFO|2025-03-21 02:24:39] logging.py:143 >> {'loss': 0.2405, 'learning_rate': 5.5233e-06, 'epoch': 2.36, 'throughput': 9996.74} [INFO|2025-03-21 02:25:18] logging.py:143 >> {'loss': 0.2465, 'learning_rate': 5.5146e-06, 'epoch': 2.36, 'throughput': 9996.76} [INFO|2025-03-21 02:26:00] logging.py:143 >> {'loss': 0.2410, 'learning_rate': 5.5058e-06, 'epoch': 2.36, 'throughput': 9996.71} [INFO|2025-03-21 02:26:40] logging.py:143 >> {'loss': 0.2291, 'learning_rate': 5.4970e-06, 'epoch': 2.36, 'throughput': 9996.71} [INFO|2025-03-21 02:27:20] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 5.4883e-06, 'epoch': 2.36, 'throughput': 9996.78} [INFO|2025-03-21 02:27:59] logging.py:143 
>> {'loss': 0.2372, 'learning_rate': 5.4795e-06, 'epoch': 2.36, 'throughput': 9996.75} [INFO|2025-03-21 02:28:41] logging.py:143 >> {'loss': 0.2435, 'learning_rate': 5.4708e-06, 'epoch': 2.36, 'throughput': 9996.64} [INFO|2025-03-21 02:29:23] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 5.4621e-06, 'epoch': 2.36, 'throughput': 9996.59} [INFO|2025-03-21 02:30:05] logging.py:143 >> {'loss': 0.2381, 'learning_rate': 5.4533e-06, 'epoch': 2.36, 'throughput': 9996.59} [INFO|2025-03-21 02:30:46] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 5.4446e-06, 'epoch': 2.36, 'throughput': 9996.59} [INFO|2025-03-21 02:31:25] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 5.4359e-06, 'epoch': 2.36, 'throughput': 9996.65} [INFO|2025-03-21 02:32:08] logging.py:143 >> {'loss': 0.2575, 'learning_rate': 5.4272e-06, 'epoch': 2.36, 'throughput': 9996.56} [INFO|2025-03-21 02:32:49] logging.py:143 >> {'loss': 0.2426, 'learning_rate': 5.4185e-06, 'epoch': 2.36, 'throughput': 9996.57} [INFO|2025-03-21 02:33:29] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 5.4098e-06, 'epoch': 2.36, 'throughput': 9996.55} [INFO|2025-03-21 02:34:10] logging.py:143 >> {'loss': 0.2222, 'learning_rate': 5.4011e-06, 'epoch': 2.36, 'throughput': 9996.52} [INFO|2025-03-21 02:34:51] logging.py:143 >> {'loss': 0.2249, 'learning_rate': 5.3924e-06, 'epoch': 2.37, 'throughput': 9996.49} [INFO|2025-03-21 02:35:32] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 5.3837e-06, 'epoch': 2.37, 'throughput': 9996.48} [INFO|2025-03-21 02:36:12] logging.py:143 >> {'loss': 0.2462, 'learning_rate': 5.3751e-06, 'epoch': 2.37, 'throughput': 9996.50} [INFO|2025-03-21 02:36:54] logging.py:143 >> {'loss': 0.2508, 'learning_rate': 5.3664e-06, 'epoch': 2.37, 'throughput': 9996.42} [INFO|2025-03-21 02:37:33] logging.py:143 >> {'loss': 0.2473, 'learning_rate': 5.3577e-06, 'epoch': 2.37, 'throughput': 9996.46} [INFO|2025-03-21 02:38:14] logging.py:143 >> {'loss': 0.2525, 'learning_rate': 5.3491e-06, 'epoch': 2.37, 
'throughput': 9996.44} [INFO|2025-03-21 02:38:54] logging.py:143 >> {'loss': 0.2350, 'learning_rate': 5.3404e-06, 'epoch': 2.37, 'throughput': 9996.49} [INFO|2025-03-21 02:39:34] logging.py:143 >> {'loss': 0.2295, 'learning_rate': 5.3318e-06, 'epoch': 2.37, 'throughput': 9996.47} [INFO|2025-03-21 02:40:16] logging.py:143 >> {'loss': 0.2533, 'learning_rate': 5.3232e-06, 'epoch': 2.37, 'throughput': 9996.41} [INFO|2025-03-21 02:40:56] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 5.3145e-06, 'epoch': 2.37, 'throughput': 9996.34} [INFO|2025-03-21 02:41:37] logging.py:143 >> {'loss': 0.2264, 'learning_rate': 5.3059e-06, 'epoch': 2.37, 'throughput': 9996.36} [INFO|2025-03-21 02:42:16] logging.py:143 >> {'loss': 0.2328, 'learning_rate': 5.2973e-06, 'epoch': 2.37, 'throughput': 9996.44} [INFO|2025-03-21 02:43:00] logging.py:143 >> {'loss': 0.2153, 'learning_rate': 5.2887e-06, 'epoch': 2.37, 'throughput': 9996.38} [INFO|2025-03-21 02:43:40] logging.py:143 >> {'loss': 0.2417, 'learning_rate': 5.2801e-06, 'epoch': 2.37, 'throughput': 9996.41} [INFO|2025-03-21 02:44:22] logging.py:143 >> {'loss': 0.2371, 'learning_rate': 5.2715e-06, 'epoch': 2.37, 'throughput': 9996.31} [INFO|2025-03-21 02:45:04] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 5.2629e-06, 'epoch': 2.37, 'throughput': 9996.27} [INFO|2025-03-21 02:45:45] logging.py:143 >> {'loss': 0.2309, 'learning_rate': 5.2543e-06, 'epoch': 2.37, 'throughput': 9996.28} [INFO|2025-03-21 02:46:28] logging.py:143 >> {'loss': 0.2572, 'learning_rate': 5.2457e-06, 'epoch': 2.37, 'throughput': 9996.14} [INFO|2025-03-21 02:47:09] logging.py:143 >> {'loss': 0.2463, 'learning_rate': 5.2372e-06, 'epoch': 2.38, 'throughput': 9996.16} [INFO|2025-03-21 02:47:51] logging.py:143 >> {'loss': 0.2348, 'learning_rate': 5.2286e-06, 'epoch': 2.38, 'throughput': 9996.09} [INFO|2025-03-21 02:48:31] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 5.2200e-06, 'epoch': 2.38, 'throughput': 9996.02} [INFO|2025-03-21 02:49:11] logging.py:143 
>> {'loss': 0.2470, 'learning_rate': 5.2115e-06, 'epoch': 2.38, 'throughput': 9995.93} [INFO|2025-03-21 02:49:53] logging.py:143 >> {'loss': 0.2562, 'learning_rate': 5.2029e-06, 'epoch': 2.38, 'throughput': 9995.94} [INFO|2025-03-21 02:50:33] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 5.1944e-06, 'epoch': 2.38, 'throughput': 9995.95} [INFO|2025-03-21 02:51:13] logging.py:143 >> {'loss': 0.2457, 'learning_rate': 5.1858e-06, 'epoch': 2.38, 'throughput': 9995.96} [INFO|2025-03-21 02:51:54] logging.py:143 >> {'loss': 0.2151, 'learning_rate': 5.1773e-06, 'epoch': 2.38, 'throughput': 9995.97} [INFO|2025-03-21 02:52:36] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 5.1688e-06, 'epoch': 2.38, 'throughput': 9995.94} [INFO|2025-03-21 02:53:16] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 5.1603e-06, 'epoch': 2.38, 'throughput': 9995.94} [INFO|2025-03-21 02:53:58] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 5.1518e-06, 'epoch': 2.38, 'throughput': 9995.93} [INFO|2025-03-21 02:54:39] logging.py:143 >> {'loss': 0.2210, 'learning_rate': 5.1433e-06, 'epoch': 2.38, 'throughput': 9995.85} [INFO|2025-03-21 02:55:21] logging.py:143 >> {'loss': 0.2419, 'learning_rate': 5.1348e-06, 'epoch': 2.38, 'throughput': 9995.81} [INFO|2025-03-21 02:56:02] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 5.1263e-06, 'epoch': 2.38, 'throughput': 9995.70} [INFO|2025-03-21 02:56:42] logging.py:143 >> {'loss': 0.2295, 'learning_rate': 5.1178e-06, 'epoch': 2.38, 'throughput': 9995.64} [INFO|2025-03-21 02:57:24] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 5.1093e-06, 'epoch': 2.38, 'throughput': 9995.52} [INFO|2025-03-21 02:58:06] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 5.1008e-06, 'epoch': 2.38, 'throughput': 9995.47} [INFO|2025-03-21 02:58:47] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 5.0924e-06, 'epoch': 2.38, 'throughput': 9995.48} [INFO|2025-03-21 02:59:28] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 5.0839e-06, 'epoch': 2.38, 
'throughput': 9995.47} [INFO|2025-03-21 03:00:08] logging.py:143 >> {'loss': 0.2499, 'learning_rate': 5.0754e-06, 'epoch': 2.39, 'throughput': 9995.46} [INFO|2025-03-21 03:00:49] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 5.0670e-06, 'epoch': 2.39, 'throughput': 9995.45} [INFO|2025-03-21 03:01:30] logging.py:143 >> {'loss': 0.2402, 'learning_rate': 5.0586e-06, 'epoch': 2.39, 'throughput': 9995.45} [INFO|2025-03-21 03:02:10] logging.py:143 >> {'loss': 0.2138, 'learning_rate': 5.0501e-06, 'epoch': 2.39, 'throughput': 9995.42} [INFO|2025-03-21 03:02:50] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 5.0417e-06, 'epoch': 2.39, 'throughput': 9995.44} [INFO|2025-03-21 03:03:31] logging.py:143 >> {'loss': 0.2274, 'learning_rate': 5.0333e-06, 'epoch': 2.39, 'throughput': 9995.46} [INFO|2025-03-21 03:04:13] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 5.0248e-06, 'epoch': 2.39, 'throughput': 9995.43} [INFO|2025-03-21 03:04:52] logging.py:143 >> {'loss': 0.2141, 'learning_rate': 5.0164e-06, 'epoch': 2.39, 'throughput': 9995.49} [INFO|2025-03-21 03:05:33] logging.py:143 >> {'loss': 0.2429, 'learning_rate': 5.0080e-06, 'epoch': 2.39, 'throughput': 9995.40} [INFO|2025-03-21 03:06:14] logging.py:143 >> {'loss': 0.2443, 'learning_rate': 4.9996e-06, 'epoch': 2.39, 'throughput': 9995.38} [INFO|2025-03-21 03:06:55] logging.py:143 >> {'loss': 0.2408, 'learning_rate': 4.9912e-06, 'epoch': 2.39, 'throughput': 9995.35} [INFO|2025-03-21 03:07:36] logging.py:143 >> {'loss': 0.2396, 'learning_rate': 4.9828e-06, 'epoch': 2.39, 'throughput': 9995.29} [INFO|2025-03-21 03:08:18] logging.py:143 >> {'loss': 0.2486, 'learning_rate': 4.9745e-06, 'epoch': 2.39, 'throughput': 9995.23} [INFO|2025-03-21 03:09:00] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 4.9661e-06, 'epoch': 2.39, 'throughput': 9995.21} [INFO|2025-03-21 03:09:40] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 4.9577e-06, 'epoch': 2.39, 'throughput': 9995.27} [INFO|2025-03-21 03:10:20] logging.py:143 
>> {'loss': 0.2294, 'learning_rate': 4.9494e-06, 'epoch': 2.39, 'throughput': 9995.28} [INFO|2025-03-21 03:11:01] logging.py:143 >> {'loss': 0.2476, 'learning_rate': 4.9410e-06, 'epoch': 2.39, 'throughput': 9995.28} [INFO|2025-03-21 03:11:42] logging.py:143 >> {'loss': 0.2431, 'learning_rate': 4.9327e-06, 'epoch': 2.39, 'throughput': 9995.25} [INFO|2025-03-21 03:12:23] logging.py:143 >> {'loss': 0.2464, 'learning_rate': 4.9243e-06, 'epoch': 2.39, 'throughput': 9995.21} [INFO|2025-03-21 03:13:03] logging.py:143 >> {'loss': 0.2542, 'learning_rate': 4.9160e-06, 'epoch': 2.40, 'throughput': 9995.25} [INFO|2025-03-21 03:13:43] logging.py:143 >> {'loss': 0.2418, 'learning_rate': 4.9077e-06, 'epoch': 2.40, 'throughput': 9995.29} [INFO|2025-03-21 03:14:25] logging.py:143 >> {'loss': 0.2444, 'learning_rate': 4.8993e-06, 'epoch': 2.40, 'throughput': 9995.27} [INFO|2025-03-21 03:15:05] logging.py:143 >> {'loss': 0.2564, 'learning_rate': 4.8910e-06, 'epoch': 2.40, 'throughput': 9995.31} [INFO|2025-03-21 03:15:47] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 4.8827e-06, 'epoch': 2.40, 'throughput': 9995.28} [INFO|2025-03-21 03:16:28] logging.py:143 >> {'loss': 0.2652, 'learning_rate': 4.8744e-06, 'epoch': 2.40, 'throughput': 9995.24} [INFO|2025-03-21 03:17:07] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 4.8661e-06, 'epoch': 2.40, 'throughput': 9995.28} [INFO|2025-03-21 03:17:48] logging.py:143 >> {'loss': 0.2388, 'learning_rate': 4.8578e-06, 'epoch': 2.40, 'throughput': 9995.20} [INFO|2025-03-21 03:18:29] logging.py:143 >> {'loss': 0.2209, 'learning_rate': 4.8495e-06, 'epoch': 2.40, 'throughput': 9995.12} [INFO|2025-03-21 03:19:10] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 4.8412e-06, 'epoch': 2.40, 'throughput': 9995.09} [INFO|2025-03-21 03:19:51] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 4.8330e-06, 'epoch': 2.40, 'throughput': 9995.04} [INFO|2025-03-21 03:20:31] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 4.8247e-06, 'epoch': 2.40, 
'throughput': 9995.10} [INFO|2025-03-21 03:21:11] logging.py:143 >> {'loss': 0.2049, 'learning_rate': 4.8164e-06, 'epoch': 2.40, 'throughput': 9995.11} [INFO|2025-03-21 03:21:50] logging.py:143 >> {'loss': 0.2465, 'learning_rate': 4.8082e-06, 'epoch': 2.40, 'throughput': 9995.10} [INFO|2025-03-21 03:22:32] logging.py:143 >> {'loss': 0.2519, 'learning_rate': 4.7999e-06, 'epoch': 2.40, 'throughput': 9995.01} [INFO|2025-03-21 03:23:10] logging.py:143 >> {'loss': 0.2399, 'learning_rate': 4.7917e-06, 'epoch': 2.40, 'throughput': 9995.06} [INFO|2025-03-21 03:23:51] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 4.7835e-06, 'epoch': 2.40, 'throughput': 9995.05} [INFO|2025-03-21 03:24:31] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 4.7752e-06, 'epoch': 2.40, 'throughput': 9995.04} [INFO|2025-03-21 03:25:12] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 4.7670e-06, 'epoch': 2.40, 'throughput': 9995.07} [INFO|2025-03-21 03:25:53] logging.py:143 >> {'loss': 0.2286, 'learning_rate': 4.7588e-06, 'epoch': 2.41, 'throughput': 9995.01} [INFO|2025-03-21 03:26:35] logging.py:143 >> {'loss': 0.2265, 'learning_rate': 4.7506e-06, 'epoch': 2.41, 'throughput': 9995.02} [INFO|2025-03-21 03:27:15] logging.py:143 >> {'loss': 0.2003, 'learning_rate': 4.7424e-06, 'epoch': 2.41, 'throughput': 9995.01} [INFO|2025-03-21 03:27:56] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 4.7342e-06, 'epoch': 2.41, 'throughput': 9994.98} [INFO|2025-03-21 03:28:37] logging.py:143 >> {'loss': 0.2329, 'learning_rate': 4.7260e-06, 'epoch': 2.41, 'throughput': 9995.00} [INFO|2025-03-21 03:29:19] logging.py:143 >> {'loss': 0.2522, 'learning_rate': 4.7178e-06, 'epoch': 2.41, 'throughput': 9994.92} [INFO|2025-03-21 03:30:01] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 4.7096e-06, 'epoch': 2.41, 'throughput': 9994.94} [INFO|2025-03-21 03:30:41] logging.py:143 >> {'loss': 0.2329, 'learning_rate': 4.7015e-06, 'epoch': 2.41, 'throughput': 9994.98} [INFO|2025-03-21 03:31:21] logging.py:143 
>> {'loss': 0.2525, 'learning_rate': 4.6933e-06, 'epoch': 2.41, 'throughput': 9995.02} [INFO|2025-03-21 03:32:01] logging.py:143 >> {'loss': 0.2410, 'learning_rate': 4.6851e-06, 'epoch': 2.41, 'throughput': 9995.07} [INFO|2025-03-21 03:32:42] logging.py:143 >> {'loss': 0.2178, 'learning_rate': 4.6770e-06, 'epoch': 2.41, 'throughput': 9995.00} [INFO|2025-03-21 03:33:24] logging.py:143 >> {'loss': 0.2243, 'learning_rate': 4.6688e-06, 'epoch': 2.41, 'throughput': 9994.99} [INFO|2025-03-21 03:34:04] logging.py:143 >> {'loss': 0.2375, 'learning_rate': 4.6607e-06, 'epoch': 2.41, 'throughput': 9995.03} [INFO|2025-03-21 03:34:45] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 4.6526e-06, 'epoch': 2.41, 'throughput': 9995.00} [INFO|2025-03-21 03:35:26] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 4.6444e-06, 'epoch': 2.41, 'throughput': 9995.00} [INFO|2025-03-21 03:36:07] logging.py:143 >> {'loss': 0.2575, 'learning_rate': 4.6363e-06, 'epoch': 2.41, 'throughput': 9994.99} [INFO|2025-03-21 03:36:49] logging.py:143 >> {'loss': 0.2305, 'learning_rate': 4.6282e-06, 'epoch': 2.41, 'throughput': 9994.85} [INFO|2025-03-21 03:37:30] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 4.6201e-06, 'epoch': 2.41, 'throughput': 9994.80} [INFO|2025-03-21 03:38:10] logging.py:143 >> {'loss': 0.2530, 'learning_rate': 4.6120e-06, 'epoch': 2.41, 'throughput': 9994.82} [INFO|2025-03-21 03:38:52] logging.py:143 >> {'loss': 0.2499, 'learning_rate': 4.6039e-06, 'epoch': 2.42, 'throughput': 9994.79} [INFO|2025-03-21 03:39:32] logging.py:143 >> {'loss': 0.2315, 'learning_rate': 4.5958e-06, 'epoch': 2.42, 'throughput': 9994.82} [INFO|2025-03-21 03:40:13] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 4.5877e-06, 'epoch': 2.42, 'throughput': 9994.84} [INFO|2025-03-21 03:40:54] logging.py:143 >> {'loss': 0.2406, 'learning_rate': 4.5796e-06, 'epoch': 2.42, 'throughput': 9994.80} [INFO|2025-03-21 03:41:36] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 4.5716e-06, 'epoch': 2.42, 
'throughput': 9994.75} [INFO|2025-03-21 03:42:17] logging.py:143 >> {'loss': 0.2536, 'learning_rate': 4.5635e-06, 'epoch': 2.42, 'throughput': 9994.77} [INFO|2025-03-21 03:42:59] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 4.5555e-06, 'epoch': 2.42, 'throughput': 9994.69} [INFO|2025-03-21 03:43:41] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 4.5474e-06, 'epoch': 2.42, 'throughput': 9994.58} [INFO|2025-03-21 03:44:21] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 4.5394e-06, 'epoch': 2.42, 'throughput': 9994.60} [INFO|2025-03-21 03:45:00] logging.py:143 >> {'loss': 0.2407, 'learning_rate': 4.5313e-06, 'epoch': 2.42, 'throughput': 9994.65} [INFO|2025-03-21 03:45:41] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 4.5233e-06, 'epoch': 2.42, 'throughput': 9994.64} [INFO|2025-03-21 03:46:21] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 4.5153e-06, 'epoch': 2.42, 'throughput': 9994.68} [INFO|2025-03-21 03:47:00] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 4.5073e-06, 'epoch': 2.42, 'throughput': 9994.71} [INFO|2025-03-21 03:47:40] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 4.4992e-06, 'epoch': 2.42, 'throughput': 9994.71} [INFO|2025-03-21 03:48:21] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 4.4912e-06, 'epoch': 2.42, 'throughput': 9994.75} [INFO|2025-03-21 03:49:01] logging.py:143 >> {'loss': 0.2350, 'learning_rate': 4.4832e-06, 'epoch': 2.42, 'throughput': 9994.75} [INFO|2025-03-21 03:49:41] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 4.4752e-06, 'epoch': 2.42, 'throughput': 9994.78} [INFO|2025-03-21 03:50:21] logging.py:143 >> {'loss': 0.2161, 'learning_rate': 4.4673e-06, 'epoch': 2.42, 'throughput': 9994.76} [INFO|2025-03-21 03:51:02] logging.py:143 >> {'loss': 0.2560, 'learning_rate': 4.4593e-06, 'epoch': 2.42, 'throughput': 9994.79} [INFO|2025-03-21 03:51:42] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 4.4513e-06, 'epoch': 2.43, 'throughput': 9994.80} [INFO|2025-03-21 03:52:23] logging.py:143 
>> {'loss': 0.2389, 'learning_rate': 4.4433e-06, 'epoch': 2.43, 'throughput': 9994.82} [INFO|2025-03-21 03:53:04] logging.py:143 >> {'loss': 0.2328, 'learning_rate': 4.4354e-06, 'epoch': 2.43, 'throughput': 9994.82} [INFO|2025-03-21 03:53:45] logging.py:143 >> {'loss': 0.2322, 'learning_rate': 4.4274e-06, 'epoch': 2.43, 'throughput': 9994.81} [INFO|2025-03-21 03:54:27] logging.py:143 >> {'loss': 0.2173, 'learning_rate': 4.4195e-06, 'epoch': 2.43, 'throughput': 9994.77} [INFO|2025-03-21 03:55:07] logging.py:143 >> {'loss': 0.2441, 'learning_rate': 4.4115e-06, 'epoch': 2.43, 'throughput': 9994.74} [INFO|2025-03-21 03:55:48] logging.py:143 >> {'loss': 0.2459, 'learning_rate': 4.4036e-06, 'epoch': 2.43, 'throughput': 9994.72} [INFO|2025-03-21 03:56:30] logging.py:143 >> {'loss': 0.2550, 'learning_rate': 4.3957e-06, 'epoch': 2.43, 'throughput': 9994.74} [INFO|2025-03-21 03:57:11] logging.py:143 >> {'loss': 0.2460, 'learning_rate': 4.3877e-06, 'epoch': 2.43, 'throughput': 9994.71} [INFO|2025-03-21 03:57:51] logging.py:143 >> {'loss': 0.2485, 'learning_rate': 4.3798e-06, 'epoch': 2.43, 'throughput': 9994.74} [INFO|2025-03-21 03:58:31] logging.py:143 >> {'loss': 0.2346, 'learning_rate': 4.3719e-06, 'epoch': 2.43, 'throughput': 9994.73} [INFO|2025-03-21 03:59:12] logging.py:143 >> {'loss': 0.2404, 'learning_rate': 4.3640e-06, 'epoch': 2.43, 'throughput': 9994.69} [INFO|2025-03-21 03:59:54] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 4.3561e-06, 'epoch': 2.43, 'throughput': 9994.67} [INFO|2025-03-21 04:00:34] logging.py:143 >> {'loss': 0.2376, 'learning_rate': 4.3482e-06, 'epoch': 2.43, 'throughput': 9994.67} [INFO|2025-03-21 04:01:15] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 4.3404e-06, 'epoch': 2.43, 'throughput': 9994.65} [INFO|2025-03-21 04:01:55] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 4.3325e-06, 'epoch': 2.43, 'throughput': 9994.60} [INFO|2025-03-21 04:02:35] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 4.3246e-06, 'epoch': 2.43, 
'throughput': 9994.56} [INFO|2025-03-21 04:03:17] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 4.3167e-06, 'epoch': 2.43, 'throughput': 9994.56} [INFO|2025-03-21 04:03:58] logging.py:143 >> {'loss': 0.2483, 'learning_rate': 4.3089e-06, 'epoch': 2.43, 'throughput': 9994.49} [INFO|2025-03-21 04:04:38] logging.py:143 >> {'loss': 0.2289, 'learning_rate': 4.3010e-06, 'epoch': 2.44, 'throughput': 9994.48} [INFO|2025-03-21 04:05:19] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 4.2932e-06, 'epoch': 2.44, 'throughput': 9994.48} [INFO|2025-03-21 04:05:59] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 4.2854e-06, 'epoch': 2.44, 'throughput': 9994.49} [INFO|2025-03-21 04:06:40] logging.py:143 >> {'loss': 0.2206, 'learning_rate': 4.2775e-06, 'epoch': 2.44, 'throughput': 9994.45} [INFO|2025-03-21 04:07:21] logging.py:143 >> {'loss': 0.2387, 'learning_rate': 4.2697e-06, 'epoch': 2.44, 'throughput': 9994.42} [INFO|2025-03-21 04:08:02] logging.py:143 >> {'loss': 0.2262, 'learning_rate': 4.2619e-06, 'epoch': 2.44, 'throughput': 9994.43} [INFO|2025-03-21 04:08:42] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 4.2541e-06, 'epoch': 2.44, 'throughput': 9994.49} [INFO|2025-03-21 04:09:22] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 4.2463e-06, 'epoch': 2.44, 'throughput': 9994.50} [INFO|2025-03-21 04:10:04] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 4.2385e-06, 'epoch': 2.44, 'throughput': 9994.45} [INFO|2025-03-21 04:10:44] logging.py:143 >> {'loss': 0.2320, 'learning_rate': 4.2307e-06, 'epoch': 2.44, 'throughput': 9994.47} [INFO|2025-03-21 04:11:25] logging.py:143 >> {'loss': 0.2462, 'learning_rate': 4.2229e-06, 'epoch': 2.44, 'throughput': 9994.53} [INFO|2025-03-21 04:12:05] logging.py:143 >> {'loss': 0.2296, 'learning_rate': 4.2151e-06, 'epoch': 2.44, 'throughput': 9994.58} [INFO|2025-03-21 04:12:45] logging.py:143 >> {'loss': 0.2145, 'learning_rate': 4.2073e-06, 'epoch': 2.44, 'throughput': 9994.62} [INFO|2025-03-21 04:13:26] logging.py:143 
>> {'loss': 0.2581, 'learning_rate': 4.1996e-06, 'epoch': 2.44, 'throughput': 9994.60} [INFO|2025-03-21 04:14:07] logging.py:143 >> {'loss': 0.2403, 'learning_rate': 4.1918e-06, 'epoch': 2.44, 'throughput': 9994.50} [INFO|2025-03-21 04:14:48] logging.py:143 >> {'loss': 0.2475, 'learning_rate': 4.1841e-06, 'epoch': 2.44, 'throughput': 9994.48} [INFO|2025-03-21 04:15:30] logging.py:143 >> {'loss': 0.2441, 'learning_rate': 4.1763e-06, 'epoch': 2.44, 'throughput': 9994.52} [INFO|2025-03-21 04:16:11] logging.py:143 >> {'loss': 0.2177, 'learning_rate': 4.1686e-06, 'epoch': 2.44, 'throughput': 9994.49} [INFO|2025-03-21 04:16:52] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 4.1608e-06, 'epoch': 2.45, 'throughput': 9994.47} [INFO|2025-03-21 04:17:32] logging.py:143 >> {'loss': 0.2474, 'learning_rate': 4.1531e-06, 'epoch': 2.45, 'throughput': 9994.52} [INFO|2025-03-21 04:18:13] logging.py:143 >> {'loss': 0.2447, 'learning_rate': 4.1454e-06, 'epoch': 2.45, 'throughput': 9994.52} [INFO|2025-03-21 04:18:53] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 4.1377e-06, 'epoch': 2.45, 'throughput': 9994.57} [INFO|2025-03-21 04:19:35] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 4.1300e-06, 'epoch': 2.45, 'throughput': 9994.54} [INFO|2025-03-21 04:20:15] logging.py:143 >> {'loss': 0.2483, 'learning_rate': 4.1223e-06, 'epoch': 2.45, 'throughput': 9994.59} [INFO|2025-03-21 04:20:54] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 4.1146e-06, 'epoch': 2.45, 'throughput': 9994.61} [INFO|2025-03-21 04:21:35] logging.py:143 >> {'loss': 0.2380, 'learning_rate': 4.1069e-06, 'epoch': 2.45, 'throughput': 9994.59} [INFO|2025-03-21 04:22:15] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 4.0992e-06, 'epoch': 2.45, 'throughput': 9994.57} [INFO|2025-03-21 04:22:55] logging.py:143 >> {'loss': 0.2229, 'learning_rate': 4.0915e-06, 'epoch': 2.45, 'throughput': 9994.57} [INFO|2025-03-21 04:23:35] logging.py:143 >> {'loss': 0.2450, 'learning_rate': 4.0839e-06, 'epoch': 2.45, 
'throughput': 9994.58} [INFO|2025-03-21 04:24:16] logging.py:143 >> {'loss': 0.2232, 'learning_rate': 4.0762e-06, 'epoch': 2.45, 'throughput': 9994.57} [INFO|2025-03-21 04:24:57] logging.py:143 >> {'loss': 0.2189, 'learning_rate': 4.0685e-06, 'epoch': 2.45, 'throughput': 9994.58} [INFO|2025-03-21 04:25:39] logging.py:143 >> {'loss': 0.2126, 'learning_rate': 4.0609e-06, 'epoch': 2.45, 'throughput': 9994.54} [INFO|2025-03-21 04:26:19] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 4.0533e-06, 'epoch': 2.45, 'throughput': 9994.57} [INFO|2025-03-21 04:26:59] logging.py:143 >> {'loss': 0.2402, 'learning_rate': 4.0456e-06, 'epoch': 2.45, 'throughput': 9994.60} [INFO|2025-03-21 04:27:39] logging.py:143 >> {'loss': 0.2393, 'learning_rate': 4.0380e-06, 'epoch': 2.45, 'throughput': 9994.65} [INFO|2025-03-21 04:28:18] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 4.0304e-06, 'epoch': 2.45, 'throughput': 9994.68} [INFO|2025-03-21 04:28:58] logging.py:143 >> {'loss': 0.2502, 'learning_rate': 4.0227e-06, 'epoch': 2.45, 'throughput': 9994.71} [INFO|2025-03-21 04:29:39] logging.py:143 >> {'loss': 0.2265, 'learning_rate': 4.0151e-06, 'epoch': 2.46, 'throughput': 9994.76} [INFO|2025-03-21 04:30:20] logging.py:143 >> {'loss': 0.2419, 'learning_rate': 4.0075e-06, 'epoch': 2.46, 'throughput': 9994.74} [INFO|2025-03-21 04:30:59] logging.py:143 >> {'loss': 0.2143, 'learning_rate': 3.9999e-06, 'epoch': 2.46, 'throughput': 9994.81} [INFO|2025-03-21 04:31:40] logging.py:143 >> {'loss': 0.2436, 'learning_rate': 3.9924e-06, 'epoch': 2.46, 'throughput': 9994.86} [INFO|2025-03-21 04:32:20] logging.py:143 >> {'loss': 0.2480, 'learning_rate': 3.9848e-06, 'epoch': 2.46, 'throughput': 9994.90} [INFO|2025-03-21 04:33:01] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 3.9772e-06, 'epoch': 2.46, 'throughput': 9994.84} [INFO|2025-03-21 04:33:42] logging.py:143 >> {'loss': 0.2247, 'learning_rate': 3.9696e-06, 'epoch': 2.46, 'throughput': 9994.78} [INFO|2025-03-21 04:34:24] logging.py:143 
>> {'loss': 0.2227, 'learning_rate': 3.9621e-06, 'epoch': 2.46, 'throughput': 9994.74} [INFO|2025-03-21 04:35:04] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 3.9545e-06, 'epoch': 2.46, 'throughput': 9994.72} [INFO|2025-03-21 04:35:45] logging.py:143 >> {'loss': 0.2315, 'learning_rate': 3.9470e-06, 'epoch': 2.46, 'throughput': 9994.67} [INFO|2025-03-21 04:36:25] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 3.9394e-06, 'epoch': 2.46, 'throughput': 9994.68} [INFO|2025-03-21 04:37:04] logging.py:143 >> {'loss': 0.2309, 'learning_rate': 3.9319e-06, 'epoch': 2.46, 'throughput': 9994.67} [INFO|2025-03-21 04:37:46] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 3.9243e-06, 'epoch': 2.46, 'throughput': 9994.64} [INFO|2025-03-21 04:38:26] logging.py:143 >> {'loss': 0.2247, 'learning_rate': 3.9168e-06, 'epoch': 2.46, 'throughput': 9994.71} [INFO|2025-03-21 04:39:07] logging.py:143 >> {'loss': 0.2351, 'learning_rate': 3.9093e-06, 'epoch': 2.46, 'throughput': 9994.66} [INFO|2025-03-21 04:39:47] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 3.9018e-06, 'epoch': 2.46, 'throughput': 9994.69} [INFO|2025-03-21 04:40:27] logging.py:143 >> {'loss': 0.2434, 'learning_rate': 3.8943e-06, 'epoch': 2.46, 'throughput': 9994.70} [INFO|2025-03-21 04:41:08] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 3.8868e-06, 'epoch': 2.46, 'throughput': 9994.74} [INFO|2025-03-21 04:41:47] logging.py:143 >> {'loss': 0.2471, 'learning_rate': 3.8793e-06, 'epoch': 2.46, 'throughput': 9994.81} [INFO|2025-03-21 04:42:28] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 3.8718e-06, 'epoch': 2.47, 'throughput': 9994.82} [INFO|2025-03-21 04:43:10] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 3.8643e-06, 'epoch': 2.47, 'throughput': 9994.84} [INFO|2025-03-21 04:43:51] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 3.8569e-06, 'epoch': 2.47, 'throughput': 9994.83} [INFO|2025-03-21 04:44:32] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 3.8494e-06, 'epoch': 2.47, 
'throughput': 9994.73} [INFO|2025-03-21 04:45:12] logging.py:143 >> {'loss': 0.2156, 'learning_rate': 3.8420e-06, 'epoch': 2.47, 'throughput': 9994.76} [INFO|2025-03-21 04:45:54] logging.py:143 >> {'loss': 0.2336, 'learning_rate': 3.8345e-06, 'epoch': 2.47, 'throughput': 9994.72} [INFO|2025-03-21 04:46:34] logging.py:143 >> {'loss': 0.2423, 'learning_rate': 3.8271e-06, 'epoch': 2.47, 'throughput': 9994.76} [INFO|2025-03-21 04:47:15] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 3.8196e-06, 'epoch': 2.47, 'throughput': 9994.81} [INFO|2025-03-21 04:47:56] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 3.8122e-06, 'epoch': 2.47, 'throughput': 9994.74} [INFO|2025-03-21 04:48:36] logging.py:143 >> {'loss': 0.2309, 'learning_rate': 3.8048e-06, 'epoch': 2.47, 'throughput': 9994.74} [INFO|2025-03-21 04:49:17] logging.py:143 >> {'loss': 0.2448, 'learning_rate': 3.7973e-06, 'epoch': 2.47, 'throughput': 9994.67} [INFO|2025-03-21 04:49:59] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 3.7899e-06, 'epoch': 2.47, 'throughput': 9994.66} [INFO|2025-03-21 04:50:41] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 3.7825e-06, 'epoch': 2.47, 'throughput': 9994.60} [INFO|2025-03-21 04:51:21] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 3.7751e-06, 'epoch': 2.47, 'throughput': 9994.62} [INFO|2025-03-21 04:52:02] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 3.7677e-06, 'epoch': 2.47, 'throughput': 9994.61} [INFO|2025-03-21 04:52:43] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 3.7604e-06, 'epoch': 2.47, 'throughput': 9994.60} [INFO|2025-03-21 04:53:23] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 3.7530e-06, 'epoch': 2.47, 'throughput': 9994.57} [INFO|2025-03-21 04:54:03] logging.py:143 >> {'loss': 0.2188, 'learning_rate': 3.7456e-06, 'epoch': 2.47, 'throughput': 9994.57} [INFO|2025-03-21 04:54:44] logging.py:143 >> {'loss': 0.2181, 'learning_rate': 3.7382e-06, 'epoch': 2.47, 'throughput': 9994.57} [INFO|2025-03-21 04:55:25] logging.py:143 
>> {'loss': 0.2215, 'learning_rate': 3.7309e-06, 'epoch': 2.48, 'throughput': 9994.59} [INFO|2025-03-21 04:56:06] logging.py:143 >> {'loss': 0.2361, 'learning_rate': 3.7235e-06, 'epoch': 2.48, 'throughput': 9994.62} [INFO|2025-03-21 04:56:47] logging.py:143 >> {'loss': 0.2604, 'learning_rate': 3.7162e-06, 'epoch': 2.48, 'throughput': 9994.60} [INFO|2025-03-21 04:57:28] logging.py:143 >> {'loss': 0.2355, 'learning_rate': 3.7089e-06, 'epoch': 2.48, 'throughput': 9994.58} [INFO|2025-03-21 04:58:10] logging.py:143 >> {'loss': 0.2467, 'learning_rate': 3.7015e-06, 'epoch': 2.48, 'throughput': 9994.55} [INFO|2025-03-21 04:58:50] logging.py:143 >> {'loss': 0.2429, 'learning_rate': 3.6942e-06, 'epoch': 2.48, 'throughput': 9994.57} [INFO|2025-03-21 04:59:30] logging.py:143 >> {'loss': 0.2590, 'learning_rate': 3.6869e-06, 'epoch': 2.48, 'throughput': 9994.60} [INFO|2025-03-21 05:00:11] logging.py:143 >> {'loss': 0.2523, 'learning_rate': 3.6796e-06, 'epoch': 2.48, 'throughput': 9994.56} [INFO|2025-03-21 05:00:51] logging.py:143 >> {'loss': 0.2439, 'learning_rate': 3.6723e-06, 'epoch': 2.48, 'throughput': 9994.55} [INFO|2025-03-21 05:01:30] logging.py:143 >> {'loss': 0.2378, 'learning_rate': 3.6650e-06, 'epoch': 2.48, 'throughput': 9994.58} [INFO|2025-03-21 05:02:11] logging.py:143 >> {'loss': 0.2171, 'learning_rate': 3.6577e-06, 'epoch': 2.48, 'throughput': 9994.56} [INFO|2025-03-21 05:02:51] logging.py:143 >> {'loss': 0.2441, 'learning_rate': 3.6504e-06, 'epoch': 2.48, 'throughput': 9994.58} [INFO|2025-03-21 05:03:32] logging.py:143 >> {'loss': 0.2172, 'learning_rate': 3.6431e-06, 'epoch': 2.48, 'throughput': 9994.60} [INFO|2025-03-21 05:04:12] logging.py:143 >> {'loss': 0.2392, 'learning_rate': 3.6358e-06, 'epoch': 2.48, 'throughput': 9994.62} [INFO|2025-03-21 05:04:51] logging.py:143 >> {'loss': 0.2234, 'learning_rate': 3.6286e-06, 'epoch': 2.48, 'throughput': 9994.61} [INFO|2025-03-21 05:05:33] logging.py:143 >> {'loss': 0.2536, 'learning_rate': 3.6213e-06, 'epoch': 2.48, 
'throughput': 9994.57} [INFO|2025-03-21 05:06:13] logging.py:143 >> {'loss': 0.2140, 'learning_rate': 3.6141e-06, 'epoch': 2.48, 'throughput': 9994.55} [INFO|2025-03-21 05:06:54] logging.py:143 >> {'loss': 0.2066, 'learning_rate': 3.6068e-06, 'epoch': 2.48, 'throughput': 9994.58} [INFO|2025-03-21 05:07:34] logging.py:143 >> {'loss': 0.2464, 'learning_rate': 3.5996e-06, 'epoch': 2.48, 'throughput': 9994.68} [INFO|2025-03-21 05:08:15] logging.py:143 >> {'loss': 0.2220, 'learning_rate': 3.5924e-06, 'epoch': 2.49, 'throughput': 9994.69} [INFO|2025-03-21 05:08:55] logging.py:143 >> {'loss': 0.2455, 'learning_rate': 3.5851e-06, 'epoch': 2.49, 'throughput': 9994.79} [INFO|2025-03-21 05:09:34] logging.py:143 >> {'loss': 0.2330, 'learning_rate': 3.5779e-06, 'epoch': 2.49, 'throughput': 9994.76} [INFO|2025-03-21 05:10:14] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 3.5707e-06, 'epoch': 2.49, 'throughput': 9994.79} [INFO|2025-03-21 05:10:54] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 3.5635e-06, 'epoch': 2.49, 'throughput': 9994.78} [INFO|2025-03-21 05:11:36] logging.py:143 >> {'loss': 0.2188, 'learning_rate': 3.5563e-06, 'epoch': 2.49, 'throughput': 9994.69} [INFO|2025-03-21 05:12:16] logging.py:143 >> {'loss': 0.2371, 'learning_rate': 3.5491e-06, 'epoch': 2.49, 'throughput': 9994.72} [INFO|2025-03-21 05:12:56] logging.py:143 >> {'loss': 0.2196, 'learning_rate': 3.5419e-06, 'epoch': 2.49, 'throughput': 9994.77} [INFO|2025-03-21 05:13:38] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 3.5348e-06, 'epoch': 2.49, 'throughput': 9994.77} [INFO|2025-03-21 05:14:18] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 3.5276e-06, 'epoch': 2.49, 'throughput': 9994.75} [INFO|2025-03-21 05:14:59] logging.py:143 >> {'loss': 0.2277, 'learning_rate': 3.5204e-06, 'epoch': 2.49, 'throughput': 9994.71} [INFO|2025-03-21 05:15:39] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 3.5133e-06, 'epoch': 2.49, 'throughput': 9994.80} [INFO|2025-03-21 05:16:19] logging.py:143 
>> {'loss': 0.2328, 'learning_rate': 3.5061e-06, 'epoch': 2.49, 'throughput': 9994.78} [INFO|2025-03-21 05:17:00] logging.py:143 >> {'loss': 0.2388, 'learning_rate': 3.4990e-06, 'epoch': 2.49, 'throughput': 9994.79} [INFO|2025-03-21 05:17:40] logging.py:143 >> {'loss': 0.2425, 'learning_rate': 3.4918e-06, 'epoch': 2.49, 'throughput': 9994.75} [INFO|2025-03-21 05:18:20] logging.py:143 >> {'loss': 0.2345, 'learning_rate': 3.4847e-06, 'epoch': 2.49, 'throughput': 9994.76} [INFO|2025-03-21 05:19:00] logging.py:143 >> {'loss': 0.2202, 'learning_rate': 3.4776e-06, 'epoch': 2.49, 'throughput': 9994.77} [INFO|2025-03-21 05:19:39] logging.py:143 >> {'loss': 0.2107, 'learning_rate': 3.4705e-06, 'epoch': 2.49, 'throughput': 9994.80} [INFO|2025-03-21 05:20:20] logging.py:143 >> {'loss': 0.2214, 'learning_rate': 3.4634e-06, 'epoch': 2.49, 'throughput': 9994.80} [INFO|2025-03-21 05:21:01] logging.py:143 >> {'loss': 0.2417, 'learning_rate': 3.4563e-06, 'epoch': 2.50, 'throughput': 9994.80} [INFO|2025-03-21 05:21:43] logging.py:143 >> {'loss': 0.2405, 'learning_rate': 3.4492e-06, 'epoch': 2.50, 'throughput': 9994.76} [INFO|2025-03-21 05:22:23] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 3.4421e-06, 'epoch': 2.50, 'throughput': 9994.76} [INFO|2025-03-21 05:23:02] logging.py:143 >> {'loss': 0.2469, 'learning_rate': 3.4350e-06, 'epoch': 2.50, 'throughput': 9994.80} [INFO|2025-03-21 05:23:43] logging.py:143 >> {'loss': 0.2603, 'learning_rate': 3.4279e-06, 'epoch': 2.50, 'throughput': 9994.87} [INFO|2025-03-21 05:24:24] logging.py:143 >> {'loss': 0.2291, 'learning_rate': 3.4208e-06, 'epoch': 2.50, 'throughput': 9994.85} [INFO|2025-03-21 05:25:05] logging.py:143 >> {'loss': 0.2476, 'learning_rate': 3.4138e-06, 'epoch': 2.50, 'throughput': 9994.83} [INFO|2025-03-21 05:25:45] logging.py:143 >> {'loss': 0.2329, 'learning_rate': 3.4067e-06, 'epoch': 2.50, 'throughput': 9994.84} [INFO|2025-03-21 05:26:25] logging.py:143 >> {'loss': 0.2352, 'learning_rate': 3.3997e-06, 'epoch': 2.50, 
'throughput': 9994.83} [INFO|2025-03-21 05:27:05] logging.py:143 >> {'loss': 0.2310, 'learning_rate': 3.3926e-06, 'epoch': 2.50, 'throughput': 9994.81} [INFO|2025-03-21 05:27:46] logging.py:143 >> {'loss': 0.2299, 'learning_rate': 3.3856e-06, 'epoch': 2.50, 'throughput': 9994.87} [INFO|2025-03-21 05:28:26] logging.py:143 >> {'loss': 0.2163, 'learning_rate': 3.3786e-06, 'epoch': 2.50, 'throughput': 9994.90} [INFO|2025-03-21 05:29:05] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 3.3716e-06, 'epoch': 2.50, 'throughput': 9994.93} [INFO|2025-03-21 05:29:47] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 3.3645e-06, 'epoch': 2.50, 'throughput': 9994.88} [INFO|2025-03-21 05:30:28] logging.py:143 >> {'loss': 0.2090, 'learning_rate': 3.3575e-06, 'epoch': 2.50, 'throughput': 9994.85} [INFO|2025-03-21 05:31:09] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 3.3505e-06, 'epoch': 2.50, 'throughput': 9994.86} [INFO|2025-03-21 05:31:50] logging.py:143 >> {'loss': 0.2423, 'learning_rate': 3.3435e-06, 'epoch': 2.50, 'throughput': 9994.86} [INFO|2025-03-21 05:32:32] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 3.3365e-06, 'epoch': 2.50, 'throughput': 9994.79} [INFO|2025-03-21 05:33:12] logging.py:143 >> {'loss': 0.2540, 'learning_rate': 3.3296e-06, 'epoch': 2.51, 'throughput': 9994.88} [INFO|2025-03-21 05:33:52] logging.py:143 >> {'loss': 0.2187, 'learning_rate': 3.3226e-06, 'epoch': 2.51, 'throughput': 9994.91} [INFO|2025-03-21 05:34:31] logging.py:143 >> {'loss': 0.2220, 'learning_rate': 3.3156e-06, 'epoch': 2.51, 'throughput': 9994.94} [INFO|2025-03-21 05:35:11] logging.py:143 >> {'loss': 0.2264, 'learning_rate': 3.3087e-06, 'epoch': 2.51, 'throughput': 9994.91} [INFO|2025-03-21 05:35:52] logging.py:143 >> {'loss': 0.2459, 'learning_rate': 3.3017e-06, 'epoch': 2.51, 'throughput': 9994.94} [INFO|2025-03-21 05:36:32] logging.py:143 >> {'loss': 0.2494, 'learning_rate': 3.2948e-06, 'epoch': 2.51, 'throughput': 9994.99} [INFO|2025-03-21 05:37:12] logging.py:143 
>> {'loss': 0.2303, 'learning_rate': 3.2878e-06, 'epoch': 2.51, 'throughput': 9994.96} [INFO|2025-03-21 05:37:54] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 3.2809e-06, 'epoch': 2.51, 'throughput': 9994.88} [INFO|2025-03-21 05:38:35] logging.py:143 >> {'loss': 0.2375, 'learning_rate': 3.2740e-06, 'epoch': 2.51, 'throughput': 9994.85} [INFO|2025-03-21 05:39:16] logging.py:143 >> {'loss': 0.2273, 'learning_rate': 3.2670e-06, 'epoch': 2.51, 'throughput': 9994.84} [INFO|2025-03-21 05:39:56] logging.py:143 >> {'loss': 0.2242, 'learning_rate': 3.2601e-06, 'epoch': 2.51, 'throughput': 9994.89} [INFO|2025-03-21 05:40:35] logging.py:143 >> {'loss': 0.2106, 'learning_rate': 3.2532e-06, 'epoch': 2.51, 'throughput': 9994.94} [INFO|2025-03-21 05:41:17] logging.py:143 >> {'loss': 0.2310, 'learning_rate': 3.2463e-06, 'epoch': 2.51, 'throughput': 9994.97} [INFO|2025-03-21 05:41:57] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 3.2394e-06, 'epoch': 2.51, 'throughput': 9995.01} [INFO|2025-03-21 05:42:37] logging.py:143 >> {'loss': 0.2299, 'learning_rate': 3.2325e-06, 'epoch': 2.51, 'throughput': 9994.99} [INFO|2025-03-21 05:43:17] logging.py:143 >> {'loss': 0.2391, 'learning_rate': 3.2257e-06, 'epoch': 2.51, 'throughput': 9994.97} [INFO|2025-03-21 05:44:00] logging.py:143 >> {'loss': 0.2613, 'learning_rate': 3.2188e-06, 'epoch': 2.51, 'throughput': 9994.92} [INFO|2025-03-21 05:44:39] logging.py:143 >> {'loss': 0.2472, 'learning_rate': 3.2119e-06, 'epoch': 2.51, 'throughput': 9994.90} [INFO|2025-03-21 05:45:21] logging.py:143 >> {'loss': 0.2355, 'learning_rate': 3.2051e-06, 'epoch': 2.51, 'throughput': 9994.88} [INFO|2025-03-21 05:46:01] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 3.1982e-06, 'epoch': 2.52, 'throughput': 9994.83} [INFO|2025-03-21 05:46:42] logging.py:143 >> {'loss': 0.2305, 'learning_rate': 3.1914e-06, 'epoch': 2.52, 'throughput': 9994.84} [INFO|2025-03-21 05:47:23] logging.py:143 >> {'loss': 0.2434, 'learning_rate': 3.1845e-06, 'epoch': 2.52, 
'throughput': 9994.85} [INFO|2025-03-21 05:48:03] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 3.1777e-06, 'epoch': 2.52, 'throughput': 9994.82} [INFO|2025-03-21 05:48:45] logging.py:143 >> {'loss': 0.2273, 'learning_rate': 3.1709e-06, 'epoch': 2.52, 'throughput': 9994.76} [INFO|2025-03-21 05:49:27] logging.py:143 >> {'loss': 0.2315, 'learning_rate': 3.1641e-06, 'epoch': 2.52, 'throughput': 9994.74} [INFO|2025-03-21 05:50:06] logging.py:143 >> {'loss': 0.2291, 'learning_rate': 3.1573e-06, 'epoch': 2.52, 'throughput': 9994.73} [INFO|2025-03-21 05:50:45] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 3.1505e-06, 'epoch': 2.52, 'throughput': 9994.83} [INFO|2025-03-21 05:51:24] logging.py:143 >> {'loss': 0.2471, 'learning_rate': 3.1437e-06, 'epoch': 2.52, 'throughput': 9994.79} [INFO|2025-03-21 05:52:04] logging.py:143 >> {'loss': 0.2380, 'learning_rate': 3.1369e-06, 'epoch': 2.52, 'throughput': 9994.84} [INFO|2025-03-21 05:52:44] logging.py:143 >> {'loss': 0.2142, 'learning_rate': 3.1301e-06, 'epoch': 2.52, 'throughput': 9994.84} [INFO|2025-03-21 05:53:24] logging.py:143 >> {'loss': 0.2407, 'learning_rate': 3.1233e-06, 'epoch': 2.52, 'throughput': 9994.88} [INFO|2025-03-21 05:54:05] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 3.1165e-06, 'epoch': 2.52, 'throughput': 9994.90} [INFO|2025-03-21 05:54:46] logging.py:143 >> {'loss': 0.2111, 'learning_rate': 3.1098e-06, 'epoch': 2.52, 'throughput': 9994.85} [INFO|2025-03-21 05:55:27] logging.py:143 >> {'loss': 0.2252, 'learning_rate': 3.1030e-06, 'epoch': 2.52, 'throughput': 9994.86} [INFO|2025-03-21 05:56:07] logging.py:143 >> {'loss': 0.2475, 'learning_rate': 3.0963e-06, 'epoch': 2.52, 'throughput': 9994.94} [INFO|2025-03-21 05:56:47] logging.py:143 >> {'loss': 0.2278, 'learning_rate': 3.0895e-06, 'epoch': 2.52, 'throughput': 9995.00} [INFO|2025-03-21 05:57:26] logging.py:143 >> {'loss': 0.2286, 'learning_rate': 3.0828e-06, 'epoch': 2.52, 'throughput': 9995.02} [INFO|2025-03-21 05:58:06] logging.py:143 
>> {'loss': 0.2377, 'learning_rate': 3.0761e-06, 'epoch': 2.52, 'throughput': 9995.08} [INFO|2025-03-21 05:58:46] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 3.0693e-06, 'epoch': 2.53, 'throughput': 9995.04} [INFO|2025-03-21 05:59:28] logging.py:143 >> {'loss': 0.2219, 'learning_rate': 3.0626e-06, 'epoch': 2.53, 'throughput': 9994.93} [INFO|2025-03-21 06:00:08] logging.py:143 >> {'loss': 0.2558, 'learning_rate': 3.0559e-06, 'epoch': 2.53, 'throughput': 9995.02} [INFO|2025-03-21 06:00:48] logging.py:143 >> {'loss': 0.2286, 'learning_rate': 3.0492e-06, 'epoch': 2.53, 'throughput': 9995.05} [INFO|2025-03-21 06:01:27] logging.py:143 >> {'loss': 0.2502, 'learning_rate': 3.0425e-06, 'epoch': 2.53, 'throughput': 9995.10} [INFO|2025-03-21 06:02:08] logging.py:143 >> {'loss': 0.2339, 'learning_rate': 3.0358e-06, 'epoch': 2.53, 'throughput': 9995.12} [INFO|2025-03-21 06:02:48] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 3.0292e-06, 'epoch': 2.53, 'throughput': 9995.16} [INFO|2025-03-21 06:03:29] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 3.0225e-06, 'epoch': 2.53, 'throughput': 9995.18} [INFO|2025-03-21 06:04:10] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 3.0158e-06, 'epoch': 2.53, 'throughput': 9995.19} [INFO|2025-03-21 06:04:51] logging.py:143 >> {'loss': 0.2125, 'learning_rate': 3.0092e-06, 'epoch': 2.53, 'throughput': 9995.08} [INFO|2025-03-21 06:05:32] logging.py:143 >> {'loss': 0.2476, 'learning_rate': 3.0025e-06, 'epoch': 2.53, 'throughput': 9995.17} [INFO|2025-03-21 06:06:13] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 2.9959e-06, 'epoch': 2.53, 'throughput': 9995.21} [INFO|2025-03-21 06:06:53] logging.py:143 >> {'loss': 0.2393, 'learning_rate': 2.9892e-06, 'epoch': 2.53, 'throughput': 9995.26} [INFO|2025-03-21 06:07:33] logging.py:143 >> {'loss': 0.2099, 'learning_rate': 2.9826e-06, 'epoch': 2.53, 'throughput': 9995.23} [INFO|2025-03-21 06:08:13] logging.py:143 >> {'loss': 0.2316, 'learning_rate': 2.9760e-06, 'epoch': 2.53, 
'throughput': 9995.18} [INFO|2025-03-21 06:08:55] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 2.9693e-06, 'epoch': 2.53, 'throughput': 9995.11} [INFO|2025-03-21 06:09:36] logging.py:143 >> {'loss': 0.2376, 'learning_rate': 2.9627e-06, 'epoch': 2.53, 'throughput': 9995.06} [INFO|2025-03-21 06:10:16] logging.py:143 >> {'loss': 0.2292, 'learning_rate': 2.9561e-06, 'epoch': 2.53, 'throughput': 9995.07} [INFO|2025-03-21 06:10:57] logging.py:143 >> {'loss': 0.2460, 'learning_rate': 2.9495e-06, 'epoch': 2.53, 'throughput': 9995.06} [INFO|2025-03-21 06:11:37] logging.py:143 >> {'loss': 0.2333, 'learning_rate': 2.9429e-06, 'epoch': 2.54, 'throughput': 9995.10} [INFO|2025-03-21 06:12:18] logging.py:143 >> {'loss': 0.2314, 'learning_rate': 2.9364e-06, 'epoch': 2.54, 'throughput': 9995.09} [INFO|2025-03-21 06:13:00] logging.py:143 >> {'loss': 0.2296, 'learning_rate': 2.9298e-06, 'epoch': 2.54, 'throughput': 9994.95} [INFO|2025-03-21 06:13:39] logging.py:143 >> {'loss': 0.2354, 'learning_rate': 2.9232e-06, 'epoch': 2.54, 'throughput': 9995.02} [INFO|2025-03-21 06:14:22] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 2.9167e-06, 'epoch': 2.54, 'throughput': 9995.00} [INFO|2025-03-21 06:15:02] logging.py:143 >> {'loss': 0.2095, 'learning_rate': 2.9101e-06, 'epoch': 2.54, 'throughput': 9994.98} [INFO|2025-03-21 06:15:42] logging.py:143 >> {'loss': 0.2481, 'learning_rate': 2.9035e-06, 'epoch': 2.54, 'throughput': 9994.97} [INFO|2025-03-21 06:16:22] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 2.8970e-06, 'epoch': 2.54, 'throughput': 9995.03} [INFO|2025-03-21 06:17:03] logging.py:143 >> {'loss': 0.2352, 'learning_rate': 2.8905e-06, 'epoch': 2.54, 'throughput': 9995.07} [INFO|2025-03-21 06:17:42] logging.py:143 >> {'loss': 0.2210, 'learning_rate': 2.8839e-06, 'epoch': 2.54, 'throughput': 9995.13} [INFO|2025-03-21 06:18:22] logging.py:143 >> {'loss': 0.2296, 'learning_rate': 2.8774e-06, 'epoch': 2.54, 'throughput': 9995.16} [INFO|2025-03-21 06:19:02] logging.py:143 
>> {'loss': 0.2473, 'learning_rate': 2.8709e-06, 'epoch': 2.54, 'throughput': 9995.15} [INFO|2025-03-21 06:19:41] logging.py:143 >> {'loss': 0.2445, 'learning_rate': 2.8644e-06, 'epoch': 2.54, 'throughput': 9995.19} [INFO|2025-03-21 06:20:22] logging.py:143 >> {'loss': 0.2423, 'learning_rate': 2.8579e-06, 'epoch': 2.54, 'throughput': 9995.26} [INFO|2025-03-21 06:21:01] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 2.8514e-06, 'epoch': 2.54, 'throughput': 9995.33} [INFO|2025-03-21 06:21:42] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 2.8449e-06, 'epoch': 2.54, 'throughput': 9995.32} [INFO|2025-03-21 06:22:22] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 2.8384e-06, 'epoch': 2.54, 'throughput': 9995.32} [INFO|2025-03-21 06:23:01] logging.py:143 >> {'loss': 0.2255, 'learning_rate': 2.8320e-06, 'epoch': 2.54, 'throughput': 9995.34} [INFO|2025-03-21 06:23:40] logging.py:143 >> {'loss': 0.2292, 'learning_rate': 2.8255e-06, 'epoch': 2.54, 'throughput': 9995.35} [INFO|2025-03-21 06:24:20] logging.py:143 >> {'loss': 0.2434, 'learning_rate': 2.8190e-06, 'epoch': 2.55, 'throughput': 9995.41} [INFO|2025-03-21 06:25:01] logging.py:143 >> {'loss': 0.2442, 'learning_rate': 2.8126e-06, 'epoch': 2.55, 'throughput': 9995.44} [INFO|2025-03-21 06:25:42] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 2.8061e-06, 'epoch': 2.55, 'throughput': 9995.36} [INFO|2025-03-21 06:26:23] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 2.7997e-06, 'epoch': 2.55, 'throughput': 9995.34} [INFO|2025-03-21 06:27:04] logging.py:143 >> {'loss': 0.2148, 'learning_rate': 2.7933e-06, 'epoch': 2.55, 'throughput': 9995.30} [INFO|2025-03-21 06:27:44] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 2.7869e-06, 'epoch': 2.55, 'throughput': 9995.34} [INFO|2025-03-21 06:28:25] logging.py:143 >> {'loss': 0.2385, 'learning_rate': 2.7804e-06, 'epoch': 2.55, 'throughput': 9995.32} [INFO|2025-03-21 06:29:06] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 2.7740e-06, 'epoch': 2.55, 
'throughput': 9995.25} [INFO|2025-03-21 06:29:44] logging.py:143 >> {'loss': 0.2256, 'learning_rate': 2.7676e-06, 'epoch': 2.55, 'throughput': 9995.34} [INFO|2025-03-21 06:30:25] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 2.7612e-06, 'epoch': 2.55, 'throughput': 9995.33} [INFO|2025-03-21 06:31:06] logging.py:143 >> {'loss': 0.2209, 'learning_rate': 2.7548e-06, 'epoch': 2.55, 'throughput': 9995.35} [INFO|2025-03-21 06:31:46] logging.py:143 >> {'loss': 0.2388, 'learning_rate': 2.7485e-06, 'epoch': 2.55, 'throughput': 9995.32} [INFO|2025-03-21 06:32:28] logging.py:143 >> {'loss': 0.2316, 'learning_rate': 2.7421e-06, 'epoch': 2.55, 'throughput': 9995.27} [INFO|2025-03-21 06:33:07] logging.py:143 >> {'loss': 0.2310, 'learning_rate': 2.7357e-06, 'epoch': 2.55, 'throughput': 9995.31} [INFO|2025-03-21 06:33:47] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 2.7293e-06, 'epoch': 2.55, 'throughput': 9995.30} [INFO|2025-03-21 06:34:29] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 2.7230e-06, 'epoch': 2.55, 'throughput': 9995.25} [INFO|2025-03-21 06:35:10] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 2.7166e-06, 'epoch': 2.55, 'throughput': 9995.23} [INFO|2025-03-21 06:35:50] logging.py:143 >> {'loss': 0.2510, 'learning_rate': 2.7103e-06, 'epoch': 2.55, 'throughput': 9995.26} [INFO|2025-03-21 06:36:30] logging.py:143 >> {'loss': 0.2252, 'learning_rate': 2.7040e-06, 'epoch': 2.55, 'throughput': 9995.25} [INFO|2025-03-21 06:37:11] logging.py:143 >> {'loss': 0.2390, 'learning_rate': 2.6976e-06, 'epoch': 2.56, 'throughput': 9995.25} [INFO|2025-03-21 06:37:53] logging.py:143 >> {'loss': 0.2271, 'learning_rate': 2.6913e-06, 'epoch': 2.56, 'throughput': 9995.19} [INFO|2025-03-21 06:38:33] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 2.6850e-06, 'epoch': 2.56, 'throughput': 9995.20} [INFO|2025-03-21 06:39:12] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 2.6787e-06, 'epoch': 2.56, 'throughput': 9995.28} [INFO|2025-03-21 06:39:52] logging.py:143 
>> {'loss': 0.2192, 'learning_rate': 2.6724e-06, 'epoch': 2.56, 'throughput': 9995.26} [INFO|2025-03-21 06:40:33] logging.py:143 >> {'loss': 0.2132, 'learning_rate': 2.6661e-06, 'epoch': 2.56, 'throughput': 9995.24} [INFO|2025-03-21 06:41:14] logging.py:143 >> {'loss': 0.2375, 'learning_rate': 2.6598e-06, 'epoch': 2.56, 'throughput': 9995.22} [INFO|2025-03-21 06:41:55] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 2.6536e-06, 'epoch': 2.56, 'throughput': 9995.20} [INFO|2025-03-21 06:42:35] logging.py:143 >> {'loss': 0.2320, 'learning_rate': 2.6473e-06, 'epoch': 2.56, 'throughput': 9995.23} [INFO|2025-03-21 06:43:16] logging.py:143 >> {'loss': 0.2459, 'learning_rate': 2.6410e-06, 'epoch': 2.56, 'throughput': 9995.24} [INFO|2025-03-21 06:43:56] logging.py:143 >> {'loss': 0.2286, 'learning_rate': 2.6348e-06, 'epoch': 2.56, 'throughput': 9995.29} [INFO|2025-03-21 06:44:35] logging.py:143 >> {'loss': 0.2449, 'learning_rate': 2.6285e-06, 'epoch': 2.56, 'throughput': 9995.39} [INFO|2025-03-21 06:45:14] logging.py:143 >> {'loss': 0.2562, 'learning_rate': 2.6223e-06, 'epoch': 2.56, 'throughput': 9995.41} [INFO|2025-03-21 06:45:55] logging.py:143 >> {'loss': 0.2354, 'learning_rate': 2.6160e-06, 'epoch': 2.56, 'throughput': 9995.41} [INFO|2025-03-21 06:46:35] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 2.6098e-06, 'epoch': 2.56, 'throughput': 9995.42} [INFO|2025-03-21 06:47:15] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 2.6036e-06, 'epoch': 2.56, 'throughput': 9995.45} [INFO|2025-03-21 06:47:55] logging.py:143 >> {'loss': 0.2243, 'learning_rate': 2.5974e-06, 'epoch': 2.56, 'throughput': 9995.45} [INFO|2025-03-21 06:48:35] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 2.5912e-06, 'epoch': 2.56, 'throughput': 9995.47} [INFO|2025-03-21 06:49:16] logging.py:143 >> {'loss': 0.2589, 'learning_rate': 2.5850e-06, 'epoch': 2.57, 'throughput': 9995.46} [INFO|2025-03-21 06:49:57] logging.py:143 >> {'loss': 0.2287, 'learning_rate': 2.5788e-06, 'epoch': 2.57, 
'throughput': 9995.47} [INFO|2025-03-21 06:50:37] logging.py:143 >> {'loss': 0.2471, 'learning_rate': 2.5726e-06, 'epoch': 2.57, 'throughput': 9995.49} [INFO|2025-03-21 06:51:18] logging.py:143 >> {'loss': 0.2264, 'learning_rate': 2.5664e-06, 'epoch': 2.57, 'throughput': 9995.49} [INFO|2025-03-21 06:51:59] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 2.5602e-06, 'epoch': 2.57, 'throughput': 9995.55} [INFO|2025-03-21 06:52:39] logging.py:143 >> {'loss': 0.2176, 'learning_rate': 2.5541e-06, 'epoch': 2.57, 'throughput': 9995.57} [INFO|2025-03-21 06:53:19] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 2.5479e-06, 'epoch': 2.57, 'throughput': 9995.57} [INFO|2025-03-21 06:54:02] logging.py:143 >> {'loss': 0.2393, 'learning_rate': 2.5418e-06, 'epoch': 2.57, 'throughput': 9995.55} [INFO|2025-03-21 06:54:41] logging.py:143 >> {'loss': 0.2323, 'learning_rate': 2.5356e-06, 'epoch': 2.57, 'throughput': 9995.58} [INFO|2025-03-21 06:55:23] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 2.5295e-06, 'epoch': 2.57, 'throughput': 9995.58} [INFO|2025-03-21 06:56:02] logging.py:143 >> {'loss': 0.2037, 'learning_rate': 2.5234e-06, 'epoch': 2.57, 'throughput': 9995.63} [INFO|2025-03-21 06:56:43] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 2.5172e-06, 'epoch': 2.57, 'throughput': 9995.63} [INFO|2025-03-21 06:57:24] logging.py:143 >> {'loss': 0.2373, 'learning_rate': 2.5111e-06, 'epoch': 2.57, 'throughput': 9995.64} [INFO|2025-03-21 06:58:03] logging.py:143 >> {'loss': 0.2305, 'learning_rate': 2.5050e-06, 'epoch': 2.57, 'throughput': 9995.67} [INFO|2025-03-21 06:58:42] logging.py:143 >> {'loss': 0.2393, 'learning_rate': 2.4989e-06, 'epoch': 2.57, 'throughput': 9995.69} [INFO|2025-03-21 06:59:21] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 2.4928e-06, 'epoch': 2.57, 'throughput': 9995.75} [INFO|2025-03-21 07:00:03] logging.py:143 >> {'loss': 0.2142, 'learning_rate': 2.4867e-06, 'epoch': 2.57, 'throughput': 9995.68} [INFO|2025-03-21 07:00:43] logging.py:143 
>> {'loss': 0.2269, 'learning_rate': 2.4806e-06, 'epoch': 2.57, 'throughput': 9995.69} [INFO|2025-03-21 07:01:22] logging.py:143 >> {'loss': 0.2346, 'learning_rate': 2.4746e-06, 'epoch': 2.57, 'throughput': 9995.72} [INFO|2025-03-21 07:02:02] logging.py:143 >> {'loss': 0.2085, 'learning_rate': 2.4685e-06, 'epoch': 2.58, 'throughput': 9995.74} [INFO|2025-03-21 07:02:41] logging.py:143 >> {'loss': 0.2267, 'learning_rate': 2.4624e-06, 'epoch': 2.58, 'throughput': 9995.81} [INFO|2025-03-21 07:03:22] logging.py:143 >> {'loss': 0.2236, 'learning_rate': 2.4564e-06, 'epoch': 2.58, 'throughput': 9995.83} [INFO|2025-03-21 07:04:02] logging.py:143 >> {'loss': 0.2469, 'learning_rate': 2.4503e-06, 'epoch': 2.58, 'throughput': 9995.87} [INFO|2025-03-21 07:04:43] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 2.4443e-06, 'epoch': 2.58, 'throughput': 9995.86} [INFO|2025-03-21 07:05:24] logging.py:143 >> {'loss': 0.2218, 'learning_rate': 2.4383e-06, 'epoch': 2.58, 'throughput': 9995.90} [INFO|2025-03-21 07:06:03] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 2.4322e-06, 'epoch': 2.58, 'throughput': 9995.89} [INFO|2025-03-21 07:06:44] logging.py:143 >> {'loss': 0.2432, 'learning_rate': 2.4262e-06, 'epoch': 2.58, 'throughput': 9995.87} [INFO|2025-03-21 07:07:23] logging.py:143 >> {'loss': 0.2111, 'learning_rate': 2.4202e-06, 'epoch': 2.58, 'throughput': 9995.89} [INFO|2025-03-21 07:08:05] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 2.4142e-06, 'epoch': 2.58, 'throughput': 9995.90} [INFO|2025-03-21 07:08:45] logging.py:143 >> {'loss': 0.2217, 'learning_rate': 2.4082e-06, 'epoch': 2.58, 'throughput': 9995.90} [INFO|2025-03-21 07:09:25] logging.py:143 >> {'loss': 0.2375, 'learning_rate': 2.4022e-06, 'epoch': 2.58, 'throughput': 9995.91} [INFO|2025-03-21 07:10:04] logging.py:143 >> {'loss': 0.2204, 'learning_rate': 2.3962e-06, 'epoch': 2.58, 'throughput': 9995.93} [INFO|2025-03-21 07:10:43] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 2.3903e-06, 'epoch': 2.58, 
'throughput': 9996.04} [INFO|2025-03-21 07:11:23] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 2.3843e-06, 'epoch': 2.58, 'throughput': 9996.06} [INFO|2025-03-21 07:12:02] logging.py:143 >> {'loss': 0.2320, 'learning_rate': 2.3783e-06, 'epoch': 2.58, 'throughput': 9996.08} [INFO|2025-03-21 07:12:42] logging.py:143 >> {'loss': 0.2460, 'learning_rate': 2.3724e-06, 'epoch': 2.58, 'throughput': 9996.08} [INFO|2025-03-21 07:13:22] logging.py:143 >> {'loss': 0.2160, 'learning_rate': 2.3664e-06, 'epoch': 2.58, 'throughput': 9996.06} [INFO|2025-03-21 07:14:02] logging.py:143 >> {'loss': 0.2209, 'learning_rate': 2.3605e-06, 'epoch': 2.58, 'throughput': 9996.03} [INFO|2025-03-21 07:14:44] logging.py:143 >> {'loss': 0.2478, 'learning_rate': 2.3546e-06, 'epoch': 2.59, 'throughput': 9995.97} [INFO|2025-03-21 07:15:25] logging.py:143 >> {'loss': 0.2299, 'learning_rate': 2.3487e-06, 'epoch': 2.59, 'throughput': 9995.96} [INFO|2025-03-21 07:16:05] logging.py:143 >> {'loss': 0.2418, 'learning_rate': 2.3427e-06, 'epoch': 2.59, 'throughput': 9995.94} [INFO|2025-03-21 07:16:44] logging.py:143 >> {'loss': 0.2415, 'learning_rate': 2.3368e-06, 'epoch': 2.59, 'throughput': 9996.02} [INFO|2025-03-21 07:17:24] logging.py:143 >> {'loss': 0.2272, 'learning_rate': 2.3309e-06, 'epoch': 2.59, 'throughput': 9996.06} [INFO|2025-03-21 07:18:03] logging.py:143 >> {'loss': 0.2529, 'learning_rate': 2.3250e-06, 'epoch': 2.59, 'throughput': 9996.14} [INFO|2025-03-21 07:18:44] logging.py:143 >> {'loss': 0.2253, 'learning_rate': 2.3191e-06, 'epoch': 2.59, 'throughput': 9996.12} [INFO|2025-03-21 07:19:25] logging.py:143 >> {'loss': 0.2179, 'learning_rate': 2.3132e-06, 'epoch': 2.59, 'throughput': 9996.10} [INFO|2025-03-21 07:20:05] logging.py:143 >> {'loss': 0.2465, 'learning_rate': 2.3074e-06, 'epoch': 2.59, 'throughput': 9996.12} [INFO|2025-03-21 07:20:44] logging.py:143 >> {'loss': 0.2399, 'learning_rate': 2.3015e-06, 'epoch': 2.59, 'throughput': 9996.21} [INFO|2025-03-21 07:21:25] logging.py:143 
>> {'loss': 0.2339, 'learning_rate': 2.2956e-06, 'epoch': 2.59, 'throughput': 9996.22} [INFO|2025-03-21 07:22:06] logging.py:143 >> {'loss': 0.2267, 'learning_rate': 2.2898e-06, 'epoch': 2.59, 'throughput': 9996.14} [INFO|2025-03-21 07:22:46] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 2.2839e-06, 'epoch': 2.59, 'throughput': 9996.11} [INFO|2025-03-21 07:23:26] logging.py:143 >> {'loss': 0.2165, 'learning_rate': 2.2781e-06, 'epoch': 2.59, 'throughput': 9996.15} [INFO|2025-03-21 07:24:07] logging.py:143 >> {'loss': 0.2111, 'learning_rate': 2.2723e-06, 'epoch': 2.59, 'throughput': 9996.18} [INFO|2025-03-21 07:24:47] logging.py:143 >> {'loss': 0.2402, 'learning_rate': 2.2664e-06, 'epoch': 2.59, 'throughput': 9996.23} [INFO|2025-03-21 07:25:28] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 2.2606e-06, 'epoch': 2.59, 'throughput': 9996.20} [INFO|2025-03-21 07:26:08] logging.py:143 >> {'loss': 0.2441, 'learning_rate': 2.2548e-06, 'epoch': 2.59, 'throughput': 9996.21} [INFO|2025-03-21 07:26:47] logging.py:143 >> {'loss': 0.2290, 'learning_rate': 2.2490e-06, 'epoch': 2.59, 'throughput': 9996.27} [INFO|2025-03-21 07:27:29] logging.py:143 >> {'loss': 0.2393, 'learning_rate': 2.2432e-06, 'epoch': 2.60, 'throughput': 9996.23} [INFO|2025-03-21 07:28:10] logging.py:143 >> {'loss': 0.2480, 'learning_rate': 2.2374e-06, 'epoch': 2.60, 'throughput': 9996.16} [INFO|2025-03-21 07:28:51] logging.py:143 >> {'loss': 0.2208, 'learning_rate': 2.2316e-06, 'epoch': 2.60, 'throughput': 9996.15} [INFO|2025-03-21 07:29:31] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 2.2259e-06, 'epoch': 2.60, 'throughput': 9996.17} [INFO|2025-03-21 07:30:11] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 2.2201e-06, 'epoch': 2.60, 'throughput': 9996.18} [INFO|2025-03-21 07:30:51] logging.py:143 >> {'loss': 0.2362, 'learning_rate': 2.2143e-06, 'epoch': 2.60, 'throughput': 9996.22} [INFO|2025-03-21 07:31:30] logging.py:143 >> {'loss': 0.2208, 'learning_rate': 2.2086e-06, 'epoch': 2.60, 
'throughput': 9996.23} [INFO|2025-03-21 07:32:12] logging.py:143 >> {'loss': 0.2449, 'learning_rate': 2.2028e-06, 'epoch': 2.60, 'throughput': 9996.20} [INFO|2025-03-21 07:32:53] logging.py:143 >> {'loss': 0.2221, 'learning_rate': 2.1971e-06, 'epoch': 2.60, 'throughput': 9996.14} [INFO|2025-03-21 07:33:33] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 2.1914e-06, 'epoch': 2.60, 'throughput': 9996.14} [INFO|2025-03-21 07:34:13] logging.py:143 >> {'loss': 0.2289, 'learning_rate': 2.1856e-06, 'epoch': 2.60, 'throughput': 9996.16} [INFO|2025-03-21 07:34:54] logging.py:143 >> {'loss': 0.2257, 'learning_rate': 2.1799e-06, 'epoch': 2.60, 'throughput': 9996.14} [INFO|2025-03-21 07:35:34] logging.py:143 >> {'loss': 0.2115, 'learning_rate': 2.1742e-06, 'epoch': 2.60, 'throughput': 9996.11} [INFO|2025-03-21 07:36:15] logging.py:143 >> {'loss': 0.2468, 'learning_rate': 2.1685e-06, 'epoch': 2.60, 'throughput': 9996.12} [INFO|2025-03-21 07:36:55] logging.py:143 >> {'loss': 0.2309, 'learning_rate': 2.1628e-06, 'epoch': 2.60, 'throughput': 9996.13} [INFO|2025-03-21 07:37:36] logging.py:143 >> {'loss': 0.2142, 'learning_rate': 2.1571e-06, 'epoch': 2.60, 'throughput': 9996.09} [INFO|2025-03-21 07:38:18] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 2.1514e-06, 'epoch': 2.60, 'throughput': 9996.07} [INFO|2025-03-21 07:38:58] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 2.1458e-06, 'epoch': 2.60, 'throughput': 9996.10} [INFO|2025-03-21 07:39:38] logging.py:143 >> {'loss': 0.2509, 'learning_rate': 2.1401e-06, 'epoch': 2.60, 'throughput': 9996.06} [INFO|2025-03-21 07:40:19] logging.py:143 >> {'loss': 0.2310, 'learning_rate': 2.1344e-06, 'epoch': 2.61, 'throughput': 9996.08} [INFO|2025-03-21 07:40:59] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 2.1288e-06, 'epoch': 2.61, 'throughput': 9996.07} [INFO|2025-03-21 07:41:40] logging.py:143 >> {'loss': 0.2187, 'learning_rate': 2.1231e-06, 'epoch': 2.61, 'throughput': 9996.02} [INFO|2025-03-21 07:42:21] logging.py:143 
>> {'loss': 0.2248, 'learning_rate': 2.1175e-06, 'epoch': 2.61, 'throughput': 9996.01} [INFO|2025-03-21 07:43:01] logging.py:143 >> {'loss': 0.2252, 'learning_rate': 2.1119e-06, 'epoch': 2.61, 'throughput': 9996.05} [INFO|2025-03-21 07:43:41] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 2.1062e-06, 'epoch': 2.61, 'throughput': 9996.04} [INFO|2025-03-21 07:44:22] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 2.1006e-06, 'epoch': 2.61, 'throughput': 9996.00} [INFO|2025-03-21 07:45:02] logging.py:143 >> {'loss': 0.2238, 'learning_rate': 2.0950e-06, 'epoch': 2.61, 'throughput': 9996.00} [INFO|2025-03-21 07:45:42] logging.py:143 >> {'loss': 0.2209, 'learning_rate': 2.0894e-06, 'epoch': 2.61, 'throughput': 9996.03} [INFO|2025-03-21 07:46:24] logging.py:143 >> {'loss': 0.2229, 'learning_rate': 2.0838e-06, 'epoch': 2.61, 'throughput': 9996.07} [INFO|2025-03-21 07:47:05] logging.py:143 >> {'loss': 0.2254, 'learning_rate': 2.0782e-06, 'epoch': 2.61, 'throughput': 9996.02} [INFO|2025-03-21 07:47:45] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 2.0726e-06, 'epoch': 2.61, 'throughput': 9996.00} [INFO|2025-03-21 07:48:26] logging.py:143 >> {'loss': 0.2357, 'learning_rate': 2.0671e-06, 'epoch': 2.61, 'throughput': 9996.04} [INFO|2025-03-21 07:49:05] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 2.0615e-06, 'epoch': 2.61, 'throughput': 9996.07} [INFO|2025-03-21 07:49:45] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 2.0559e-06, 'epoch': 2.61, 'throughput': 9996.11} [INFO|2025-03-21 07:50:25] logging.py:143 >> {'loss': 0.2068, 'learning_rate': 2.0504e-06, 'epoch': 2.61, 'throughput': 9996.12} [INFO|2025-03-21 07:51:06] logging.py:143 >> {'loss': 0.2378, 'learning_rate': 2.0448e-06, 'epoch': 2.61, 'throughput': 9996.09} [INFO|2025-03-21 07:51:48] logging.py:143 >> {'loss': 0.2394, 'learning_rate': 2.0393e-06, 'epoch': 2.61, 'throughput': 9996.07} [INFO|2025-03-21 07:52:28] logging.py:143 >> {'loss': 0.2286, 'learning_rate': 2.0338e-06, 'epoch': 2.61, 
'throughput': 9996.11} [INFO|2025-03-21 07:53:08] logging.py:143 >> {'loss': 0.2234, 'learning_rate': 2.0282e-06, 'epoch': 2.62, 'throughput': 9996.13} [INFO|2025-03-21 07:53:49] logging.py:143 >> {'loss': 0.2345, 'learning_rate': 2.0227e-06, 'epoch': 2.62, 'throughput': 9996.12} [INFO|2025-03-21 07:54:30] logging.py:143 >> {'loss': 0.2195, 'learning_rate': 2.0172e-06, 'epoch': 2.62, 'throughput': 9996.14} [INFO|2025-03-21 07:55:11] logging.py:143 >> {'loss': 0.2281, 'learning_rate': 2.0117e-06, 'epoch': 2.62, 'throughput': 9996.11} [INFO|2025-03-21 07:55:51] logging.py:143 >> {'loss': 0.2144, 'learning_rate': 2.0062e-06, 'epoch': 2.62, 'throughput': 9996.09} [INFO|2025-03-21 07:56:32] logging.py:143 >> {'loss': 0.2513, 'learning_rate': 2.0007e-06, 'epoch': 2.62, 'throughput': 9996.11} [INFO|2025-03-21 07:57:12] logging.py:143 >> {'loss': 0.2338, 'learning_rate': 1.9952e-06, 'epoch': 2.62, 'throughput': 9996.16} [INFO|2025-03-21 07:57:53] logging.py:143 >> {'loss': 0.2316, 'learning_rate': 1.9898e-06, 'epoch': 2.62, 'throughput': 9996.11} [INFO|2025-03-21 07:58:33] logging.py:143 >> {'loss': 0.2399, 'learning_rate': 1.9843e-06, 'epoch': 2.62, 'throughput': 9996.13} [INFO|2025-03-21 07:59:14] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 1.9788e-06, 'epoch': 2.62, 'throughput': 9996.11} [INFO|2025-03-21 07:59:55] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 1.9734e-06, 'epoch': 2.62, 'throughput': 9996.09} [INFO|2025-03-21 08:00:35] logging.py:143 >> {'loss': 0.2227, 'learning_rate': 1.9679e-06, 'epoch': 2.62, 'throughput': 9996.08} [INFO|2025-03-21 08:01:15] logging.py:143 >> {'loss': 0.2375, 'learning_rate': 1.9625e-06, 'epoch': 2.62, 'throughput': 9996.06} [INFO|2025-03-21 08:01:55] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 1.9571e-06, 'epoch': 2.62, 'throughput': 9996.05} [INFO|2025-03-21 08:02:36] logging.py:143 >> {'loss': 0.2295, 'learning_rate': 1.9516e-06, 'epoch': 2.62, 'throughput': 9996.05} [INFO|2025-03-21 08:03:16] logging.py:143 
>> {'loss': 0.2283, 'learning_rate': 1.9462e-06, 'epoch': 2.62, 'throughput': 9996.08} [INFO|2025-03-21 08:03:57] logging.py:143 >> {'loss': 0.2456, 'learning_rate': 1.9408e-06, 'epoch': 2.62, 'throughput': 9996.06} [INFO|2025-03-21 08:04:36] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 1.9354e-06, 'epoch': 2.62, 'throughput': 9996.11} [INFO|2025-03-21 08:05:16] logging.py:143 >> {'loss': 0.2413, 'learning_rate': 1.9300e-06, 'epoch': 2.62, 'throughput': 9996.19} [INFO|2025-03-21 08:05:57] logging.py:143 >> {'loss': 0.2399, 'learning_rate': 1.9246e-06, 'epoch': 2.63, 'throughput': 9996.20} [INFO|2025-03-21 08:06:38] logging.py:143 >> {'loss': 0.2422, 'learning_rate': 1.9192e-06, 'epoch': 2.63, 'throughput': 9996.19} [INFO|2025-03-21 08:07:18] logging.py:143 >> {'loss': 0.2376, 'learning_rate': 1.9139e-06, 'epoch': 2.63, 'throughput': 9996.12} [INFO|2025-03-21 08:08:00] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 1.9085e-06, 'epoch': 2.63, 'throughput': 9996.07} [INFO|2025-03-21 08:08:40] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 1.9031e-06, 'epoch': 2.63, 'throughput': 9996.03} [INFO|2025-03-21 08:09:21] logging.py:143 >> {'loss': 0.2271, 'learning_rate': 1.8978e-06, 'epoch': 2.63, 'throughput': 9996.03} [INFO|2025-03-21 08:10:02] logging.py:143 >> {'loss': 0.2411, 'learning_rate': 1.8924e-06, 'epoch': 2.63, 'throughput': 9996.00} [INFO|2025-03-21 08:10:41] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 1.8871e-06, 'epoch': 2.63, 'throughput': 9996.03} [INFO|2025-03-21 08:11:20] logging.py:143 >> {'loss': 0.2392, 'learning_rate': 1.8818e-06, 'epoch': 2.63, 'throughput': 9996.08} [INFO|2025-03-21 08:12:01] logging.py:143 >> {'loss': 0.2392, 'learning_rate': 1.8765e-06, 'epoch': 2.63, 'throughput': 9996.14} [INFO|2025-03-21 08:12:42] logging.py:143 >> {'loss': 0.2417, 'learning_rate': 1.8711e-06, 'epoch': 2.63, 'throughput': 9996.15} [INFO|2025-03-21 08:13:24] logging.py:143 >> {'loss': 0.2235, 'learning_rate': 1.8658e-06, 'epoch': 2.63, 
'throughput': 9996.09} [INFO|2025-03-21 08:14:05] logging.py:143 >> {'loss': 0.2236, 'learning_rate': 1.8605e-06, 'epoch': 2.63, 'throughput': 9996.09} [INFO|2025-03-21 08:14:43] logging.py:143 >> {'loss': 0.2396, 'learning_rate': 1.8552e-06, 'epoch': 2.63, 'throughput': 9996.09} [INFO|2025-03-21 08:15:23] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 1.8500e-06, 'epoch': 2.63, 'throughput': 9996.12} [INFO|2025-03-21 08:16:05] logging.py:143 >> {'loss': 0.2406, 'learning_rate': 1.8447e-06, 'epoch': 2.63, 'throughput': 9996.10} [INFO|2025-03-21 08:16:45] logging.py:143 >> {'loss': 0.2227, 'learning_rate': 1.8394e-06, 'epoch': 2.63, 'throughput': 9996.07} [INFO|2025-03-21 08:17:26] logging.py:143 >> {'loss': 0.2345, 'learning_rate': 1.8341e-06, 'epoch': 2.63, 'throughput': 9996.03} [INFO|2025-03-21 08:18:06] logging.py:143 >> {'loss': 0.2458, 'learning_rate': 1.8289e-06, 'epoch': 2.64, 'throughput': 9996.01} [INFO|2025-03-21 08:18:46] logging.py:143 >> {'loss': 0.2530, 'learning_rate': 1.8236e-06, 'epoch': 2.64, 'throughput': 9996.06} [INFO|2025-03-21 08:19:27] logging.py:143 >> {'loss': 0.2273, 'learning_rate': 1.8184e-06, 'epoch': 2.64, 'throughput': 9996.07} [INFO|2025-03-21 08:20:09] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 1.8132e-06, 'epoch': 2.64, 'throughput': 9996.10} [INFO|2025-03-21 08:20:51] logging.py:143 >> {'loss': 0.2127, 'learning_rate': 1.8079e-06, 'epoch': 2.64, 'throughput': 9996.04} [INFO|2025-03-21 08:21:31] logging.py:143 >> {'loss': 0.2265, 'learning_rate': 1.8027e-06, 'epoch': 2.64, 'throughput': 9996.09} [INFO|2025-03-21 08:22:11] logging.py:143 >> {'loss': 0.2135, 'learning_rate': 1.7975e-06, 'epoch': 2.64, 'throughput': 9996.13} [INFO|2025-03-21 08:22:52] logging.py:143 >> {'loss': 0.2345, 'learning_rate': 1.7923e-06, 'epoch': 2.64, 'throughput': 9996.13} [INFO|2025-03-21 08:23:33] logging.py:143 >> {'loss': 0.2421, 'learning_rate': 1.7871e-06, 'epoch': 2.64, 'throughput': 9996.12} [INFO|2025-03-21 08:24:14] logging.py:143 
>> {'loss': 0.2321, 'learning_rate': 1.7819e-06, 'epoch': 2.64, 'throughput': 9996.17} [INFO|2025-03-21 08:24:54] logging.py:143 >> {'loss': 0.2540, 'learning_rate': 1.7767e-06, 'epoch': 2.64, 'throughput': 9996.25} [INFO|2025-03-21 08:25:33] logging.py:143 >> {'loss': 0.2131, 'learning_rate': 1.7715e-06, 'epoch': 2.64, 'throughput': 9996.22} [INFO|2025-03-21 08:26:13] logging.py:143 >> {'loss': 0.2337, 'learning_rate': 1.7664e-06, 'epoch': 2.64, 'throughput': 9996.27} [INFO|2025-03-21 08:26:54] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 1.7612e-06, 'epoch': 2.64, 'throughput': 9996.27} [INFO|2025-03-21 08:27:34] logging.py:143 >> {'loss': 0.2287, 'learning_rate': 1.7560e-06, 'epoch': 2.64, 'throughput': 9996.28} [INFO|2025-03-21 08:28:15] logging.py:143 >> {'loss': 0.2688, 'learning_rate': 1.7509e-06, 'epoch': 2.64, 'throughput': 9996.31} [INFO|2025-03-21 08:28:56] logging.py:143 >> {'loss': 0.2473, 'learning_rate': 1.7458e-06, 'epoch': 2.64, 'throughput': 9996.25} [INFO|2025-03-21 08:29:37] logging.py:143 >> {'loss': 0.2503, 'learning_rate': 1.7406e-06, 'epoch': 2.64, 'throughput': 9996.31} [INFO|2025-03-21 08:30:18] logging.py:143 >> {'loss': 0.2458, 'learning_rate': 1.7355e-06, 'epoch': 2.64, 'throughput': 9996.31} [INFO|2025-03-21 08:30:57] logging.py:143 >> {'loss': 0.2291, 'learning_rate': 1.7304e-06, 'epoch': 2.65, 'throughput': 9996.32} [INFO|2025-03-21 08:31:38] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 1.7253e-06, 'epoch': 2.65, 'throughput': 9996.27} [INFO|2025-03-21 08:32:19] logging.py:143 >> {'loss': 0.2182, 'learning_rate': 1.7202e-06, 'epoch': 2.65, 'throughput': 9996.29} [INFO|2025-03-21 08:32:59] logging.py:143 >> {'loss': 0.2096, 'learning_rate': 1.7151e-06, 'epoch': 2.65, 'throughput': 9996.34} [INFO|2025-03-21 08:33:39] logging.py:143 >> {'loss': 0.2278, 'learning_rate': 1.7100e-06, 'epoch': 2.65, 'throughput': 9996.35} [INFO|2025-03-21 08:34:18] logging.py:143 >> {'loss': 0.2115, 'learning_rate': 1.7049e-06, 'epoch': 2.65, 
'throughput': 9996.36} [INFO|2025-03-21 08:34:59] logging.py:143 >> {'loss': 0.2329, 'learning_rate': 1.6998e-06, 'epoch': 2.65, 'throughput': 9996.39} [INFO|2025-03-21 08:35:42] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 1.6947e-06, 'epoch': 2.65, 'throughput': 9996.34} [INFO|2025-03-21 08:36:23] logging.py:143 >> {'loss': 0.2464, 'learning_rate': 1.6897e-06, 'epoch': 2.65, 'throughput': 9996.36} [INFO|2025-03-21 08:37:03] logging.py:143 >> {'loss': 0.2397, 'learning_rate': 1.6846e-06, 'epoch': 2.65, 'throughput': 9996.34} [INFO|2025-03-21 08:37:43] logging.py:143 >> {'loss': 0.2430, 'learning_rate': 1.6796e-06, 'epoch': 2.65, 'throughput': 9996.35} [INFO|2025-03-21 08:38:23] logging.py:143 >> {'loss': 0.2585, 'learning_rate': 1.6745e-06, 'epoch': 2.65, 'throughput': 9996.40} [INFO|2025-03-21 08:39:04] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 1.6695e-06, 'epoch': 2.65, 'throughput': 9996.38} [INFO|2025-03-21 08:39:44] logging.py:143 >> {'loss': 0.2628, 'learning_rate': 1.6645e-06, 'epoch': 2.65, 'throughput': 9996.41} [INFO|2025-03-21 08:40:24] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 1.6595e-06, 'epoch': 2.65, 'throughput': 9996.36} [INFO|2025-03-21 08:41:04] logging.py:143 >> {'loss': 0.2513, 'learning_rate': 1.6545e-06, 'epoch': 2.65, 'throughput': 9996.33} [INFO|2025-03-21 08:41:44] logging.py:143 >> {'loss': 0.2447, 'learning_rate': 1.6495e-06, 'epoch': 2.65, 'throughput': 9996.33} [INFO|2025-03-21 08:41:48] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-25000 [INFO|2025-03-21 08:41:48] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-25000/config.json [INFO|2025-03-21 08:41:48] configuration_utils.py:909 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-25000/generation_config.json [INFO|2025-03-21 08:42:00] modeling_utils.py:3048 >> The model is bigger 
than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-25000/model.safetensors.index.json. [INFO|2025-03-21 08:42:00] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-25000/tokenizer_config.json [INFO|2025-03-21 08:42:00] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-25000/special_tokens_map.json [INFO|2025-03-21 08:43:06] logging.py:143 >> {'loss': 0.2447, 'learning_rate': 1.6445e-06, 'epoch': 2.65, 'throughput': 9994.36} [INFO|2025-03-21 08:43:49] logging.py:143 >> {'loss': 0.2443, 'learning_rate': 1.6395e-06, 'epoch': 2.65, 'throughput': 9994.25} [INFO|2025-03-21 08:44:29] logging.py:143 >> {'loss': 0.2401, 'learning_rate': 1.6345e-06, 'epoch': 2.66, 'throughput': 9994.28} [INFO|2025-03-21 08:45:10] logging.py:143 >> {'loss': 0.2163, 'learning_rate': 1.6295e-06, 'epoch': 2.66, 'throughput': 9994.25} [INFO|2025-03-21 08:45:51] logging.py:143 >> {'loss': 0.2426, 'learning_rate': 1.6246e-06, 'epoch': 2.66, 'throughput': 9994.25} [INFO|2025-03-21 08:46:33] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 1.6196e-06, 'epoch': 2.66, 'throughput': 9994.24} [INFO|2025-03-21 08:47:13] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 1.6146e-06, 'epoch': 2.66, 'throughput': 9994.23} [INFO|2025-03-21 08:47:53] logging.py:143 >> {'loss': 0.2349, 'learning_rate': 1.6097e-06, 'epoch': 2.66, 'throughput': 9994.27} [INFO|2025-03-21 08:48:34] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 1.6048e-06, 'epoch': 2.66, 'throughput': 9994.28} [INFO|2025-03-21 08:49:15] logging.py:143 >> {'loss': 0.2231, 'learning_rate': 1.5998e-06, 'epoch': 2.66, 'throughput': 9994.28} [INFO|2025-03-21 08:49:55] logging.py:143 >> {'loss': 0.2400, 
'learning_rate': 1.5949e-06, 'epoch': 2.66, 'throughput': 9994.30} [INFO|2025-03-21 08:50:35] logging.py:143 >> {'loss': 0.2450, 'learning_rate': 1.5900e-06, 'epoch': 2.66, 'throughput': 9994.27} [INFO|2025-03-21 08:51:17] logging.py:143 >> {'loss': 0.2342, 'learning_rate': 1.5851e-06, 'epoch': 2.66, 'throughput': 9994.27} [INFO|2025-03-21 08:51:57] logging.py:143 >> {'loss': 0.2573, 'learning_rate': 1.5802e-06, 'epoch': 2.66, 'throughput': 9994.32} [INFO|2025-03-21 08:52:38] logging.py:143 >> {'loss': 0.2209, 'learning_rate': 1.5753e-06, 'epoch': 2.66, 'throughput': 9994.28} [INFO|2025-03-21 08:53:18] logging.py:143 >> {'loss': 0.2342, 'learning_rate': 1.5704e-06, 'epoch': 2.66, 'throughput': 9994.31} [INFO|2025-03-21 08:53:57] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 1.5655e-06, 'epoch': 2.66, 'throughput': 9994.37} [INFO|2025-03-21 08:54:37] logging.py:143 >> {'loss': 0.2374, 'learning_rate': 1.5607e-06, 'epoch': 2.66, 'throughput': 9994.43} [INFO|2025-03-21 08:55:17] logging.py:143 >> {'loss': 0.2204, 'learning_rate': 1.5558e-06, 'epoch': 2.66, 'throughput': 9994.41} [INFO|2025-03-21 08:55:58] logging.py:143 >> {'loss': 0.2267, 'learning_rate': 1.5509e-06, 'epoch': 2.66, 'throughput': 9994.44} [INFO|2025-03-21 08:56:39] logging.py:143 >> {'loss': 0.2281, 'learning_rate': 1.5461e-06, 'epoch': 2.66, 'throughput': 9994.52} [INFO|2025-03-21 08:57:19] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 1.5413e-06, 'epoch': 2.67, 'throughput': 9994.54} [INFO|2025-03-21 08:58:00] logging.py:143 >> {'loss': 0.2324, 'learning_rate': 1.5364e-06, 'epoch': 2.67, 'throughput': 9994.58} [INFO|2025-03-21 08:58:40] logging.py:143 >> {'loss': 0.2228, 'learning_rate': 1.5316e-06, 'epoch': 2.67, 'throughput': 9994.58} [INFO|2025-03-21 08:59:19] logging.py:143 >> {'loss': 0.2456, 'learning_rate': 1.5268e-06, 'epoch': 2.67, 'throughput': 9994.65} [INFO|2025-03-21 09:00:00] logging.py:143 >> {'loss': 0.2317, 'learning_rate': 1.5220e-06, 'epoch': 2.67, 'throughput': 
9994.64} [INFO|2025-03-21 09:00:41] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 1.5172e-06, 'epoch': 2.67, 'throughput': 9994.55} [INFO|2025-03-21 09:01:22] logging.py:143 >> {'loss': 0.2117, 'learning_rate': 1.5124e-06, 'epoch': 2.67, 'throughput': 9994.54} [INFO|2025-03-21 09:02:01] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 1.5076e-06, 'epoch': 2.67, 'throughput': 9994.60} [INFO|2025-03-21 09:02:41] logging.py:143 >> {'loss': 0.2282, 'learning_rate': 1.5028e-06, 'epoch': 2.67, 'throughput': 9994.65} [INFO|2025-03-21 09:03:21] logging.py:143 >> {'loss': 0.2100, 'learning_rate': 1.4980e-06, 'epoch': 2.67, 'throughput': 9994.65} [INFO|2025-03-21 09:04:02] logging.py:143 >> {'loss': 0.2257, 'learning_rate': 1.4933e-06, 'epoch': 2.67, 'throughput': 9994.62} [INFO|2025-03-21 09:04:43] logging.py:143 >> {'loss': 0.2312, 'learning_rate': 1.4885e-06, 'epoch': 2.67, 'throughput': 9994.60} [INFO|2025-03-21 09:05:25] logging.py:143 >> {'loss': 0.2115, 'learning_rate': 1.4837e-06, 'epoch': 2.67, 'throughput': 9994.60} [INFO|2025-03-21 09:06:07] logging.py:143 >> {'loss': 0.2125, 'learning_rate': 1.4790e-06, 'epoch': 2.67, 'throughput': 9994.55} [INFO|2025-03-21 09:06:46] logging.py:143 >> {'loss': 0.2081, 'learning_rate': 1.4743e-06, 'epoch': 2.67, 'throughput': 9994.57} [INFO|2025-03-21 09:07:27] logging.py:143 >> {'loss': 0.2152, 'learning_rate': 1.4695e-06, 'epoch': 2.67, 'throughput': 9994.58} [INFO|2025-03-21 09:08:07] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 1.4648e-06, 'epoch': 2.67, 'throughput': 9994.54} [INFO|2025-03-21 09:08:48] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 1.4601e-06, 'epoch': 2.67, 'throughput': 9994.57} [INFO|2025-03-21 09:09:29] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 1.4554e-06, 'epoch': 2.67, 'throughput': 9994.58} [INFO|2025-03-21 09:10:11] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 1.4507e-06, 'epoch': 2.68, 'throughput': 9994.49} [INFO|2025-03-21 09:10:53] logging.py:143 >> {'loss': 
0.2304, 'learning_rate': 1.4460e-06, 'epoch': 2.68, 'throughput': 9994.44} [INFO|2025-03-21 09:11:32] logging.py:143 >> {'loss': 0.2351, 'learning_rate': 1.4413e-06, 'epoch': 2.68, 'throughput': 9994.52} [INFO|2025-03-21 09:12:12] logging.py:143 >> {'loss': 0.2404, 'learning_rate': 1.4366e-06, 'epoch': 2.68, 'throughput': 9994.52} [INFO|2025-03-21 09:12:53] logging.py:143 >> {'loss': 0.2186, 'learning_rate': 1.4319e-06, 'epoch': 2.68, 'throughput': 9994.53} [INFO|2025-03-21 09:13:31] logging.py:143 >> {'loss': 0.2119, 'learning_rate': 1.4273e-06, 'epoch': 2.68, 'throughput': 9994.56} [INFO|2025-03-21 09:14:11] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 1.4226e-06, 'epoch': 2.68, 'throughput': 9994.56} [INFO|2025-03-21 09:14:51] logging.py:143 >> {'loss': 0.2140, 'learning_rate': 1.4180e-06, 'epoch': 2.68, 'throughput': 9994.59} [INFO|2025-03-21 09:15:32] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 1.4133e-06, 'epoch': 2.68, 'throughput': 9994.60} [INFO|2025-03-21 09:16:14] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 1.4087e-06, 'epoch': 2.68, 'throughput': 9994.62} [INFO|2025-03-21 09:16:55] logging.py:143 >> {'loss': 0.2358, 'learning_rate': 1.4041e-06, 'epoch': 2.68, 'throughput': 9994.58} [INFO|2025-03-21 09:17:35] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 1.3995e-06, 'epoch': 2.68, 'throughput': 9994.59} [INFO|2025-03-21 09:18:15] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 1.3948e-06, 'epoch': 2.68, 'throughput': 9994.66} [INFO|2025-03-21 09:18:54] logging.py:143 >> {'loss': 0.2043, 'learning_rate': 1.3902e-06, 'epoch': 2.68, 'throughput': 9994.68} [INFO|2025-03-21 09:19:36] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 1.3856e-06, 'epoch': 2.68, 'throughput': 9994.71} [INFO|2025-03-21 09:20:18] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 1.3810e-06, 'epoch': 2.68, 'throughput': 9994.62} [INFO|2025-03-21 09:20:58] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 1.3765e-06, 'epoch': 2.68, 
'throughput': 9994.63} [INFO|2025-03-21 09:21:39] logging.py:143 >> {'loss': 0.2176, 'learning_rate': 1.3719e-06, 'epoch': 2.68, 'throughput': 9994.57} [INFO|2025-03-21 09:22:19] logging.py:143 >> {'loss': 0.2369, 'learning_rate': 1.3673e-06, 'epoch': 2.68, 'throughput': 9994.58} [INFO|2025-03-21 09:23:00] logging.py:143 >> {'loss': 0.2264, 'learning_rate': 1.3628e-06, 'epoch': 2.69, 'throughput': 9994.56} [INFO|2025-03-21 09:23:42] logging.py:143 >> {'loss': 0.2219, 'learning_rate': 1.3582e-06, 'epoch': 2.69, 'throughput': 9994.56} [INFO|2025-03-21 09:24:22] logging.py:143 >> {'loss': 0.2213, 'learning_rate': 1.3537e-06, 'epoch': 2.69, 'throughput': 9994.54} [INFO|2025-03-21 09:25:04] logging.py:143 >> {'loss': 0.2330, 'learning_rate': 1.3491e-06, 'epoch': 2.69, 'throughput': 9994.47} [INFO|2025-03-21 09:25:45] logging.py:143 >> {'loss': 0.2204, 'learning_rate': 1.3446e-06, 'epoch': 2.69, 'throughput': 9994.47} [INFO|2025-03-21 09:26:25] logging.py:143 >> {'loss': 0.2188, 'learning_rate': 1.3401e-06, 'epoch': 2.69, 'throughput': 9994.49} [INFO|2025-03-21 09:27:06] logging.py:143 >> {'loss': 0.2129, 'learning_rate': 1.3356e-06, 'epoch': 2.69, 'throughput': 9994.52} [INFO|2025-03-21 09:27:45] logging.py:143 >> {'loss': 0.2138, 'learning_rate': 1.3310e-06, 'epoch': 2.69, 'throughput': 9994.58} [INFO|2025-03-21 09:28:24] logging.py:143 >> {'loss': 0.2397, 'learning_rate': 1.3265e-06, 'epoch': 2.69, 'throughput': 9994.66} [INFO|2025-03-21 09:29:06] logging.py:143 >> {'loss': 0.2185, 'learning_rate': 1.3220e-06, 'epoch': 2.69, 'throughput': 9994.67} [INFO|2025-03-21 09:29:47] logging.py:143 >> {'loss': 0.2280, 'learning_rate': 1.3176e-06, 'epoch': 2.69, 'throughput': 9994.62} [INFO|2025-03-21 09:30:28] logging.py:143 >> {'loss': 0.2277, 'learning_rate': 1.3131e-06, 'epoch': 2.69, 'throughput': 9994.62} [INFO|2025-03-21 09:31:10] logging.py:143 >> {'loss': 0.2604, 'learning_rate': 1.3086e-06, 'epoch': 2.69, 'throughput': 9994.58} [INFO|2025-03-21 09:31:51] logging.py:143 
>> {'loss': 0.2235, 'learning_rate': 1.3041e-06, 'epoch': 2.69, 'throughput': 9994.57} [INFO|2025-03-21 09:32:31] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 1.2997e-06, 'epoch': 2.69, 'throughput': 9994.59} [INFO|2025-03-21 09:33:12] logging.py:143 >> {'loss': 0.2387, 'learning_rate': 1.2952e-06, 'epoch': 2.69, 'throughput': 9994.55} [INFO|2025-03-21 09:33:54] logging.py:143 >> {'loss': 0.2325, 'learning_rate': 1.2908e-06, 'epoch': 2.69, 'throughput': 9994.49} [INFO|2025-03-21 09:34:34] logging.py:143 >> {'loss': 0.2105, 'learning_rate': 1.2864e-06, 'epoch': 2.69, 'throughput': 9994.47} [INFO|2025-03-21 09:35:14] logging.py:143 >> {'loss': 0.2210, 'learning_rate': 1.2819e-06, 'epoch': 2.70, 'throughput': 9994.49} [INFO|2025-03-21 09:35:54] logging.py:143 >> {'loss': 0.2474, 'learning_rate': 1.2775e-06, 'epoch': 2.70, 'throughput': 9994.49} [INFO|2025-03-21 09:36:34] logging.py:143 >> {'loss': 0.2317, 'learning_rate': 1.2731e-06, 'epoch': 2.70, 'throughput': 9994.53} [INFO|2025-03-21 09:37:14] logging.py:143 >> {'loss': 0.2264, 'learning_rate': 1.2687e-06, 'epoch': 2.70, 'throughput': 9994.53} [INFO|2025-03-21 09:37:56] logging.py:143 >> {'loss': 0.2515, 'learning_rate': 1.2643e-06, 'epoch': 2.70, 'throughput': 9994.53} [INFO|2025-03-21 09:38:36] logging.py:143 >> {'loss': 0.2220, 'learning_rate': 1.2599e-06, 'epoch': 2.70, 'throughput': 9994.55} [INFO|2025-03-21 09:39:17] logging.py:143 >> {'loss': 0.2352, 'learning_rate': 1.2555e-06, 'epoch': 2.70, 'throughput': 9994.55} [INFO|2025-03-21 09:39:57] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 1.2512e-06, 'epoch': 2.70, 'throughput': 9994.53} [INFO|2025-03-21 09:40:37] logging.py:143 >> {'loss': 0.2271, 'learning_rate': 1.2468e-06, 'epoch': 2.70, 'throughput': 9994.54} [INFO|2025-03-21 09:41:19] logging.py:143 >> {'loss': 0.2351, 'learning_rate': 1.2424e-06, 'epoch': 2.70, 'throughput': 9994.49} [INFO|2025-03-21 09:42:00] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 1.2381e-06, 'epoch': 2.70, 
'throughput': 9994.50} [INFO|2025-03-21 09:42:39] logging.py:143 >> {'loss': 0.2434, 'learning_rate': 1.2337e-06, 'epoch': 2.70, 'throughput': 9994.53} [INFO|2025-03-21 09:43:19] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 1.2294e-06, 'epoch': 2.70, 'throughput': 9994.57} [INFO|2025-03-21 09:44:00] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 1.2251e-06, 'epoch': 2.70, 'throughput': 9994.57} [INFO|2025-03-21 09:44:39] logging.py:143 >> {'loss': 0.2494, 'learning_rate': 1.2207e-06, 'epoch': 2.70, 'throughput': 9994.58} [INFO|2025-03-21 09:45:20] logging.py:143 >> {'loss': 0.2357, 'learning_rate': 1.2164e-06, 'epoch': 2.70, 'throughput': 9994.59} [INFO|2025-03-21 09:45:59] logging.py:143 >> {'loss': 0.2133, 'learning_rate': 1.2121e-06, 'epoch': 2.70, 'throughput': 9994.62} [INFO|2025-03-21 09:46:39] logging.py:143 >> {'loss': 0.2198, 'learning_rate': 1.2078e-06, 'epoch': 2.70, 'throughput': 9994.59} [INFO|2025-03-21 09:47:20] logging.py:143 >> {'loss': 0.2198, 'learning_rate': 1.2035e-06, 'epoch': 2.70, 'throughput': 9994.58} [INFO|2025-03-21 09:47:59] logging.py:143 >> {'loss': 0.2552, 'learning_rate': 1.1992e-06, 'epoch': 2.71, 'throughput': 9994.55} [INFO|2025-03-21 09:48:40] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 1.1950e-06, 'epoch': 2.71, 'throughput': 9994.54} [INFO|2025-03-21 09:49:21] logging.py:143 >> {'loss': 0.2122, 'learning_rate': 1.1907e-06, 'epoch': 2.71, 'throughput': 9994.55} [INFO|2025-03-21 09:50:00] logging.py:143 >> {'loss': 0.2129, 'learning_rate': 1.1864e-06, 'epoch': 2.71, 'throughput': 9994.57} [INFO|2025-03-21 09:50:42] logging.py:143 >> {'loss': 0.2435, 'learning_rate': 1.1822e-06, 'epoch': 2.71, 'throughput': 9994.60} [INFO|2025-03-21 09:51:23] logging.py:143 >> {'loss': 0.2145, 'learning_rate': 1.1779e-06, 'epoch': 2.71, 'throughput': 9994.60} [INFO|2025-03-21 09:52:05] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 1.1737e-06, 'epoch': 2.71, 'throughput': 9994.53} [INFO|2025-03-21 09:52:45] logging.py:143 
>> {'loss': 0.2501, 'learning_rate': 1.1694e-06, 'epoch': 2.71, 'throughput': 9994.55} [INFO|2025-03-21 09:53:25] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 1.1652e-06, 'epoch': 2.71, 'throughput': 9994.59} [INFO|2025-03-21 09:54:05] logging.py:143 >> {'loss': 0.2465, 'learning_rate': 1.1610e-06, 'epoch': 2.71, 'throughput': 9994.63} [INFO|2025-03-21 09:54:45] logging.py:143 >> {'loss': 0.2177, 'learning_rate': 1.1568e-06, 'epoch': 2.71, 'throughput': 9994.69} [INFO|2025-03-21 09:55:27] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 1.1526e-06, 'epoch': 2.71, 'throughput': 9994.64} [INFO|2025-03-21 09:56:10] logging.py:143 >> {'loss': 0.2163, 'learning_rate': 1.1484e-06, 'epoch': 2.71, 'throughput': 9994.59} [INFO|2025-03-21 09:56:50] logging.py:143 >> {'loss': 0.2175, 'learning_rate': 1.1442e-06, 'epoch': 2.71, 'throughput': 9994.55} [INFO|2025-03-21 09:57:30] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 1.1400e-06, 'epoch': 2.71, 'throughput': 9994.59} [INFO|2025-03-21 09:58:11] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 1.1358e-06, 'epoch': 2.71, 'throughput': 9994.63} [INFO|2025-03-21 09:58:51] logging.py:143 >> {'loss': 0.2238, 'learning_rate': 1.1317e-06, 'epoch': 2.71, 'throughput': 9994.68} [INFO|2025-03-21 09:59:32] logging.py:143 >> {'loss': 0.2421, 'learning_rate': 1.1275e-06, 'epoch': 2.71, 'throughput': 9994.66} [INFO|2025-03-21 10:00:14] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 1.1234e-06, 'epoch': 2.71, 'throughput': 9994.63} [INFO|2025-03-21 10:00:54] logging.py:143 >> {'loss': 0.2176, 'learning_rate': 1.1192e-06, 'epoch': 2.72, 'throughput': 9994.63} [INFO|2025-03-21 10:01:34] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 1.1151e-06, 'epoch': 2.72, 'throughput': 9994.60} [INFO|2025-03-21 10:02:15] logging.py:143 >> {'loss': 0.2357, 'learning_rate': 1.1110e-06, 'epoch': 2.72, 'throughput': 9994.61} [INFO|2025-03-21 10:02:54] logging.py:143 >> {'loss': 0.2385, 'learning_rate': 1.1068e-06, 'epoch': 2.72, 
'throughput': 9994.61} [INFO|2025-03-21 10:03:34] logging.py:143 >> {'loss': 0.2341, 'learning_rate': 1.1027e-06, 'epoch': 2.72, 'throughput': 9994.68} [INFO|2025-03-21 10:04:14] logging.py:143 >> {'loss': 0.2353, 'learning_rate': 1.0986e-06, 'epoch': 2.72, 'throughput': 9994.71} [INFO|2025-03-21 10:04:55] logging.py:143 >> {'loss': 0.2192, 'learning_rate': 1.0945e-06, 'epoch': 2.72, 'throughput': 9994.74} [INFO|2025-03-21 10:05:36] logging.py:143 >> {'loss': 0.2270, 'learning_rate': 1.0904e-06, 'epoch': 2.72, 'throughput': 9994.73} [INFO|2025-03-21 10:06:16] logging.py:143 >> {'loss': 0.2388, 'learning_rate': 1.0863e-06, 'epoch': 2.72, 'throughput': 9994.69} [INFO|2025-03-21 10:06:56] logging.py:143 >> {'loss': 0.2350, 'learning_rate': 1.0823e-06, 'epoch': 2.72, 'throughput': 9994.69} [INFO|2025-03-21 10:07:36] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 1.0782e-06, 'epoch': 2.72, 'throughput': 9994.70} [INFO|2025-03-21 10:08:17] logging.py:143 >> {'loss': 0.2425, 'learning_rate': 1.0741e-06, 'epoch': 2.72, 'throughput': 9994.70} [INFO|2025-03-21 10:08:56] logging.py:143 >> {'loss': 0.2227, 'learning_rate': 1.0701e-06, 'epoch': 2.72, 'throughput': 9994.72} [INFO|2025-03-21 10:09:37] logging.py:143 >> {'loss': 0.2143, 'learning_rate': 1.0660e-06, 'epoch': 2.72, 'throughput': 9994.72} [INFO|2025-03-21 10:10:16] logging.py:143 >> {'loss': 0.2313, 'learning_rate': 1.0620e-06, 'epoch': 2.72, 'throughput': 9994.75} [INFO|2025-03-21 10:10:55] logging.py:143 >> {'loss': 0.2480, 'learning_rate': 1.0580e-06, 'epoch': 2.72, 'throughput': 9994.84} [INFO|2025-03-21 10:11:37] logging.py:143 >> {'loss': 0.2241, 'learning_rate': 1.0539e-06, 'epoch': 2.72, 'throughput': 9994.74} [INFO|2025-03-21 10:12:17] logging.py:143 >> {'loss': 0.2202, 'learning_rate': 1.0499e-06, 'epoch': 2.72, 'throughput': 9994.69} [INFO|2025-03-21 10:12:57] logging.py:143 >> {'loss': 0.2260, 'learning_rate': 1.0459e-06, 'epoch': 2.72, 'throughput': 9994.69} [INFO|2025-03-21 10:13:37] logging.py:143 
>> {'loss': 0.2232, 'learning_rate': 1.0419e-06, 'epoch': 2.73, 'throughput': 9994.70} [INFO|2025-03-21 10:14:18] logging.py:143 >> {'loss': 0.2387, 'learning_rate': 1.0379e-06, 'epoch': 2.73, 'throughput': 9994.70} [INFO|2025-03-21 10:15:00] logging.py:143 >> {'loss': 0.2322, 'learning_rate': 1.0339e-06, 'epoch': 2.73, 'throughput': 9994.65} [INFO|2025-03-21 10:15:40] logging.py:143 >> {'loss': 0.2319, 'learning_rate': 1.0300e-06, 'epoch': 2.73, 'throughput': 9994.66} [INFO|2025-03-21 10:16:21] logging.py:143 >> {'loss': 0.2524, 'learning_rate': 1.0260e-06, 'epoch': 2.73, 'throughput': 9994.62} [INFO|2025-03-21 10:17:03] logging.py:143 >> {'loss': 0.2255, 'learning_rate': 1.0220e-06, 'epoch': 2.73, 'throughput': 9994.57} [INFO|2025-03-21 10:17:44] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 1.0181e-06, 'epoch': 2.73, 'throughput': 9994.61} [INFO|2025-03-21 10:18:24] logging.py:143 >> {'loss': 0.2533, 'learning_rate': 1.0141e-06, 'epoch': 2.73, 'throughput': 9994.59} [INFO|2025-03-21 10:19:04] logging.py:143 >> {'loss': 0.2262, 'learning_rate': 1.0102e-06, 'epoch': 2.73, 'throughput': 9994.62} [INFO|2025-03-21 10:19:45] logging.py:143 >> {'loss': 0.2129, 'learning_rate': 1.0062e-06, 'epoch': 2.73, 'throughput': 9994.65} [INFO|2025-03-21 10:20:24] logging.py:143 >> {'loss': 0.2165, 'learning_rate': 1.0023e-06, 'epoch': 2.73, 'throughput': 9994.71} [INFO|2025-03-21 10:21:06] logging.py:143 >> {'loss': 0.2483, 'learning_rate': 9.9839e-07, 'epoch': 2.73, 'throughput': 9994.66} [INFO|2025-03-21 10:21:48] logging.py:143 >> {'loss': 0.2327, 'learning_rate': 9.9448e-07, 'epoch': 2.73, 'throughput': 9994.56} [INFO|2025-03-21 10:22:30] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 9.9057e-07, 'epoch': 2.73, 'throughput': 9994.54} [INFO|2025-03-21 10:23:09] logging.py:143 >> {'loss': 0.2115, 'learning_rate': 9.8668e-07, 'epoch': 2.73, 'throughput': 9994.52} [INFO|2025-03-21 10:23:52] logging.py:143 >> {'loss': 0.2076, 'learning_rate': 9.8279e-07, 'epoch': 2.73, 
'throughput': 9994.40} [INFO|2025-03-21 10:24:33] logging.py:143 >> {'loss': 0.2169, 'learning_rate': 9.7891e-07, 'epoch': 2.73, 'throughput': 9994.32} [INFO|2025-03-21 10:25:14] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 9.7503e-07, 'epoch': 2.73, 'throughput': 9994.35} [INFO|2025-03-21 10:25:55] logging.py:143 >> {'loss': 0.2295, 'learning_rate': 9.7117e-07, 'epoch': 2.73, 'throughput': 9994.28} [INFO|2025-03-21 10:26:35] logging.py:143 >> {'loss': 0.2500, 'learning_rate': 9.6731e-07, 'epoch': 2.74, 'throughput': 9994.27} [INFO|2025-03-21 10:27:15] logging.py:143 >> {'loss': 0.2242, 'learning_rate': 9.6346e-07, 'epoch': 2.74, 'throughput': 9994.30} [INFO|2025-03-21 10:27:55] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 9.5961e-07, 'epoch': 2.74, 'throughput': 9994.33} [INFO|2025-03-21 10:28:35] logging.py:143 >> {'loss': 0.2350, 'learning_rate': 9.5578e-07, 'epoch': 2.74, 'throughput': 9994.35} [INFO|2025-03-21 10:29:16] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 9.5195e-07, 'epoch': 2.74, 'throughput': 9994.34} [INFO|2025-03-21 10:29:56] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 9.4813e-07, 'epoch': 2.74, 'throughput': 9994.40} [INFO|2025-03-21 10:30:36] logging.py:143 >> {'loss': 0.2362, 'learning_rate': 9.4432e-07, 'epoch': 2.74, 'throughput': 9994.37} [INFO|2025-03-21 10:31:17] logging.py:143 >> {'loss': 0.2196, 'learning_rate': 9.4051e-07, 'epoch': 2.74, 'throughput': 9994.29} [INFO|2025-03-21 10:31:59] logging.py:143 >> {'loss': 0.2413, 'learning_rate': 9.3671e-07, 'epoch': 2.74, 'throughput': 9994.27} [INFO|2025-03-21 10:32:40] logging.py:143 >> {'loss': 0.2285, 'learning_rate': 9.3292e-07, 'epoch': 2.74, 'throughput': 9994.27} [INFO|2025-03-21 10:33:20] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 9.2914e-07, 'epoch': 2.74, 'throughput': 9994.28} [INFO|2025-03-21 10:34:01] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 9.2536e-07, 'epoch': 2.74, 'throughput': 9994.31} [INFO|2025-03-21 10:34:40] logging.py:143 
>> {'loss': 0.2335, 'learning_rate': 9.2159e-07, 'epoch': 2.74, 'throughput': 9994.28} [INFO|2025-03-21 10:35:21] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 9.1783e-07, 'epoch': 2.74, 'throughput': 9994.27} [INFO|2025-03-21 10:36:02] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 9.1408e-07, 'epoch': 2.74, 'throughput': 9994.25} [INFO|2025-03-21 10:36:44] logging.py:143 >> {'loss': 0.2179, 'learning_rate': 9.1033e-07, 'epoch': 2.74, 'throughput': 9994.25} [INFO|2025-03-21 10:37:22] logging.py:143 >> {'loss': 0.2364, 'learning_rate': 9.0660e-07, 'epoch': 2.74, 'throughput': 9994.29} [INFO|2025-03-21 10:38:02] logging.py:143 >> {'loss': 0.2400, 'learning_rate': 9.0287e-07, 'epoch': 2.74, 'throughput': 9994.28} [INFO|2025-03-21 10:38:42] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 8.9914e-07, 'epoch': 2.74, 'throughput': 9994.28} [INFO|2025-03-21 10:39:21] logging.py:143 >> {'loss': 0.2342, 'learning_rate': 8.9543e-07, 'epoch': 2.75, 'throughput': 9994.33} [INFO|2025-03-21 10:40:02] logging.py:143 >> {'loss': 0.2144, 'learning_rate': 8.9172e-07, 'epoch': 2.75, 'throughput': 9994.34} [INFO|2025-03-21 10:40:43] logging.py:143 >> {'loss': 0.2104, 'learning_rate': 8.8802e-07, 'epoch': 2.75, 'throughput': 9994.31} [INFO|2025-03-21 10:41:23] logging.py:143 >> {'loss': 0.2217, 'learning_rate': 8.8433e-07, 'epoch': 2.75, 'throughput': 9994.38} [INFO|2025-03-21 10:42:03] logging.py:143 >> {'loss': 0.2206, 'learning_rate': 8.8064e-07, 'epoch': 2.75, 'throughput': 9994.37} [INFO|2025-03-21 10:42:45] logging.py:143 >> {'loss': 0.2070, 'learning_rate': 8.7696e-07, 'epoch': 2.75, 'throughput': 9994.36} [INFO|2025-03-21 10:43:25] logging.py:143 >> {'loss': 0.2048, 'learning_rate': 8.7329e-07, 'epoch': 2.75, 'throughput': 9994.39} [INFO|2025-03-21 10:44:05] logging.py:143 >> {'loss': 0.2317, 'learning_rate': 8.6963e-07, 'epoch': 2.75, 'throughput': 9994.42} [INFO|2025-03-21 10:44:45] logging.py:143 >> {'loss': 0.2532, 'learning_rate': 8.6598e-07, 'epoch': 2.75, 
'throughput': 9994.43} [INFO|2025-03-21 10:45:26] logging.py:143 >> {'loss': 0.2409, 'learning_rate': 8.6233e-07, 'epoch': 2.75, 'throughput': 9994.40} [INFO|2025-03-21 10:46:09] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 8.5869e-07, 'epoch': 2.75, 'throughput': 9994.31} [INFO|2025-03-21 10:46:49] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 8.5506e-07, 'epoch': 2.75, 'throughput': 9994.33} [INFO|2025-03-21 10:47:30] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 8.5143e-07, 'epoch': 2.75, 'throughput': 9994.31} [INFO|2025-03-21 10:48:12] logging.py:143 >> {'loss': 0.2478, 'learning_rate': 8.4781e-07, 'epoch': 2.75, 'throughput': 9994.29} [INFO|2025-03-21 10:48:53] logging.py:143 >> {'loss': 0.2461, 'learning_rate': 8.4421e-07, 'epoch': 2.75, 'throughput': 9994.33} [INFO|2025-03-21 10:49:34] logging.py:143 >> {'loss': 0.2318, 'learning_rate': 8.4060e-07, 'epoch': 2.75, 'throughput': 9994.36} [INFO|2025-03-21 10:50:14] logging.py:143 >> {'loss': 0.2192, 'learning_rate': 8.3701e-07, 'epoch': 2.75, 'throughput': 9994.40} [INFO|2025-03-21 10:50:54] logging.py:143 >> {'loss': 0.2014, 'learning_rate': 8.3342e-07, 'epoch': 2.75, 'throughput': 9994.43} [INFO|2025-03-21 10:51:34] logging.py:143 >> {'loss': 0.2126, 'learning_rate': 8.2984e-07, 'epoch': 2.76, 'throughput': 9994.48} [INFO|2025-03-21 10:52:12] logging.py:143 >> {'loss': 0.2303, 'learning_rate': 8.2627e-07, 'epoch': 2.76, 'throughput': 9994.56} [INFO|2025-03-21 10:52:52] logging.py:143 >> {'loss': 0.2330, 'learning_rate': 8.2271e-07, 'epoch': 2.76, 'throughput': 9994.59} [INFO|2025-03-21 10:53:31] logging.py:143 >> {'loss': 0.2322, 'learning_rate': 8.1915e-07, 'epoch': 2.76, 'throughput': 9994.63} [INFO|2025-03-21 10:54:11] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 8.1560e-07, 'epoch': 2.76, 'throughput': 9994.67} [INFO|2025-03-21 10:54:51] logging.py:143 >> {'loss': 0.2425, 'learning_rate': 8.1206e-07, 'epoch': 2.76, 'throughput': 9994.69} [INFO|2025-03-21 10:55:32] logging.py:143 
>> {'loss': 0.2288, 'learning_rate': 8.0853e-07, 'epoch': 2.76, 'throughput': 9994.68} [INFO|2025-03-21 10:56:13] logging.py:143 >> {'loss': 0.2189, 'learning_rate': 8.0500e-07, 'epoch': 2.76, 'throughput': 9994.68} [INFO|2025-03-21 10:56:53] logging.py:143 >> {'loss': 0.2112, 'learning_rate': 8.0148e-07, 'epoch': 2.76, 'throughput': 9994.75} [INFO|2025-03-21 10:57:32] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 7.9797e-07, 'epoch': 2.76, 'throughput': 9994.79} [INFO|2025-03-21 10:58:13] logging.py:143 >> {'loss': 0.2191, 'learning_rate': 7.9447e-07, 'epoch': 2.76, 'throughput': 9994.80} [INFO|2025-03-21 10:58:53] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 7.9097e-07, 'epoch': 2.76, 'throughput': 9994.86} [INFO|2025-03-21 10:59:34] logging.py:143 >> {'loss': 0.2195, 'learning_rate': 7.8748e-07, 'epoch': 2.76, 'throughput': 9994.86} [INFO|2025-03-21 11:00:15] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 7.8400e-07, 'epoch': 2.76, 'throughput': 9994.83} [INFO|2025-03-21 11:00:54] logging.py:143 >> {'loss': 0.2380, 'learning_rate': 7.8053e-07, 'epoch': 2.76, 'throughput': 9994.87} [INFO|2025-03-21 11:01:35] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 7.7706e-07, 'epoch': 2.76, 'throughput': 9994.88} [INFO|2025-03-21 11:02:16] logging.py:143 >> {'loss': 0.2178, 'learning_rate': 7.7361e-07, 'epoch': 2.76, 'throughput': 9994.88} [INFO|2025-03-21 11:02:56] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 7.7015e-07, 'epoch': 2.76, 'throughput': 9994.88} [INFO|2025-03-21 11:03:35] logging.py:143 >> {'loss': 0.2181, 'learning_rate': 7.6671e-07, 'epoch': 2.76, 'throughput': 9994.91} [INFO|2025-03-21 11:04:17] logging.py:143 >> {'loss': 0.2161, 'learning_rate': 7.6328e-07, 'epoch': 2.77, 'throughput': 9994.82} [INFO|2025-03-21 11:04:57] logging.py:143 >> {'loss': 0.2157, 'learning_rate': 7.5985e-07, 'epoch': 2.77, 'throughput': 9994.81} [INFO|2025-03-21 11:05:38] logging.py:143 >> {'loss': 0.2249, 'learning_rate': 7.5643e-07, 'epoch': 2.77, 
'throughput': 9994.80} [INFO|2025-03-21 11:06:18] logging.py:143 >> {'loss': 0.2394, 'learning_rate': 7.5302e-07, 'epoch': 2.77, 'throughput': 9994.84} [INFO|2025-03-21 11:07:00] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 7.4961e-07, 'epoch': 2.77, 'throughput': 9994.80} [INFO|2025-03-21 11:07:40] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 7.4621e-07, 'epoch': 2.77, 'throughput': 9994.81} [INFO|2025-03-21 11:08:21] logging.py:143 >> {'loss': 0.1882, 'learning_rate': 7.4283e-07, 'epoch': 2.77, 'throughput': 9994.81} [INFO|2025-03-21 11:09:03] logging.py:143 >> {'loss': 0.2148, 'learning_rate': 7.3944e-07, 'epoch': 2.77, 'throughput': 9994.70} [INFO|2025-03-21 11:09:44] logging.py:143 >> {'loss': 0.2455, 'learning_rate': 7.3607e-07, 'epoch': 2.77, 'throughput': 9994.69} [INFO|2025-03-21 11:10:25] logging.py:143 >> {'loss': 0.2253, 'learning_rate': 7.3270e-07, 'epoch': 2.77, 'throughput': 9994.73} [INFO|2025-03-21 11:11:04] logging.py:143 >> {'loss': 0.2233, 'learning_rate': 7.2934e-07, 'epoch': 2.77, 'throughput': 9994.78} [INFO|2025-03-21 11:11:45] logging.py:143 >> {'loss': 0.2330, 'learning_rate': 7.2599e-07, 'epoch': 2.77, 'throughput': 9994.76} [INFO|2025-03-21 11:12:26] logging.py:143 >> {'loss': 0.2362, 'learning_rate': 7.2265e-07, 'epoch': 2.77, 'throughput': 9994.76} [INFO|2025-03-21 11:13:08] logging.py:143 >> {'loss': 0.2462, 'learning_rate': 7.1931e-07, 'epoch': 2.77, 'throughput': 9994.75} [INFO|2025-03-21 11:13:48] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 7.1598e-07, 'epoch': 2.77, 'throughput': 9994.78} [INFO|2025-03-21 11:14:28] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 7.1266e-07, 'epoch': 2.77, 'throughput': 9994.81} [INFO|2025-03-21 11:15:08] logging.py:143 >> {'loss': 0.2165, 'learning_rate': 7.0935e-07, 'epoch': 2.77, 'throughput': 9994.79} [INFO|2025-03-21 11:15:49] logging.py:143 >> {'loss': 0.2370, 'learning_rate': 7.0604e-07, 'epoch': 2.77, 'throughput': 9994.82} [INFO|2025-03-21 11:16:28] logging.py:143 
>> {'loss': 0.2289, 'learning_rate': 7.0274e-07, 'epoch': 2.77, 'throughput': 9994.86} [INFO|2025-03-21 11:17:08] logging.py:143 >> {'loss': 0.2453, 'learning_rate': 6.9945e-07, 'epoch': 2.78, 'throughput': 9994.89} [INFO|2025-03-21 11:17:48] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 6.9617e-07, 'epoch': 2.78, 'throughput': 9994.91} [INFO|2025-03-21 11:18:28] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 6.9289e-07, 'epoch': 2.78, 'throughput': 9994.92} [INFO|2025-03-21 11:19:08] logging.py:143 >> {'loss': 0.2401, 'learning_rate': 6.8963e-07, 'epoch': 2.78, 'throughput': 9994.90} [INFO|2025-03-21 11:19:48] logging.py:143 >> {'loss': 0.2044, 'learning_rate': 6.8637e-07, 'epoch': 2.78, 'throughput': 9994.92} [INFO|2025-03-21 11:20:28] logging.py:143 >> {'loss': 0.2143, 'learning_rate': 6.8311e-07, 'epoch': 2.78, 'throughput': 9994.88} [INFO|2025-03-21 11:21:10] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 6.7987e-07, 'epoch': 2.78, 'throughput': 9994.85} [INFO|2025-03-21 11:21:51] logging.py:143 >> {'loss': 0.2256, 'learning_rate': 6.7663e-07, 'epoch': 2.78, 'throughput': 9994.88} [INFO|2025-03-21 11:22:30] logging.py:143 >> {'loss': 0.2132, 'learning_rate': 6.7340e-07, 'epoch': 2.78, 'throughput': 9994.88} [INFO|2025-03-21 11:23:11] logging.py:143 >> {'loss': 0.2519, 'learning_rate': 6.7018e-07, 'epoch': 2.78, 'throughput': 9994.93} [INFO|2025-03-21 11:23:52] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 6.6696e-07, 'epoch': 2.78, 'throughput': 9994.93} [INFO|2025-03-21 11:24:33] logging.py:143 >> {'loss': 0.2257, 'learning_rate': 6.6376e-07, 'epoch': 2.78, 'throughput': 9994.94} [INFO|2025-03-21 11:25:13] logging.py:143 >> {'loss': 0.2400, 'learning_rate': 6.6056e-07, 'epoch': 2.78, 'throughput': 9995.01} [INFO|2025-03-21 11:25:55] logging.py:143 >> {'loss': 0.2385, 'learning_rate': 6.5737e-07, 'epoch': 2.78, 'throughput': 9995.02} [INFO|2025-03-21 11:26:34] logging.py:143 >> {'loss': 0.2251, 'learning_rate': 6.5418e-07, 'epoch': 2.78, 
'throughput': 9995.06} [INFO|2025-03-21 11:27:14] logging.py:143 >> {'loss': 0.2297, 'learning_rate': 6.5101e-07, 'epoch': 2.78, 'throughput': 9995.10} [INFO|2025-03-21 11:27:55] logging.py:143 >> {'loss': 0.2253, 'learning_rate': 6.4784e-07, 'epoch': 2.78, 'throughput': 9995.10} [INFO|2025-03-21 11:28:35] logging.py:143 >> {'loss': 0.2555, 'learning_rate': 6.4468e-07, 'epoch': 2.78, 'throughput': 9995.09} [INFO|2025-03-21 11:29:16] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 6.4152e-07, 'epoch': 2.78, 'throughput': 9995.08} [INFO|2025-03-21 11:29:57] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 6.3838e-07, 'epoch': 2.79, 'throughput': 9995.03} [INFO|2025-03-21 11:30:36] logging.py:143 >> {'loss': 0.2431, 'learning_rate': 6.3524e-07, 'epoch': 2.79, 'throughput': 9995.10} [INFO|2025-03-21 11:31:18] logging.py:143 >> {'loss': 0.2382, 'learning_rate': 6.3211e-07, 'epoch': 2.79, 'throughput': 9995.06} [INFO|2025-03-21 11:31:57] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 6.2898e-07, 'epoch': 2.79, 'throughput': 9995.11} [INFO|2025-03-21 11:32:37] logging.py:143 >> {'loss': 0.2721, 'learning_rate': 6.2587e-07, 'epoch': 2.79, 'throughput': 9995.13} [INFO|2025-03-21 11:33:18] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 6.2276e-07, 'epoch': 2.79, 'throughput': 9995.10} [INFO|2025-03-21 11:33:58] logging.py:143 >> {'loss': 0.2268, 'learning_rate': 6.1966e-07, 'epoch': 2.79, 'throughput': 9995.11} [INFO|2025-03-21 11:34:39] logging.py:143 >> {'loss': 0.2235, 'learning_rate': 6.1657e-07, 'epoch': 2.79, 'throughput': 9995.09} [INFO|2025-03-21 11:35:19] logging.py:143 >> {'loss': 0.2372, 'learning_rate': 6.1348e-07, 'epoch': 2.79, 'throughput': 9995.11} [INFO|2025-03-21 11:35:59] logging.py:143 >> {'loss': 0.2323, 'learning_rate': 6.1041e-07, 'epoch': 2.79, 'throughput': 9995.07} [INFO|2025-03-21 11:36:40] logging.py:143 >> {'loss': 0.2319, 'learning_rate': 6.0734e-07, 'epoch': 2.79, 'throughput': 9995.08} [INFO|2025-03-21 11:37:21] logging.py:143 
>> {'loss': 0.2471, 'learning_rate': 6.0427e-07, 'epoch': 2.79, 'throughput': 9995.06} [INFO|2025-03-21 11:38:00] logging.py:143 >> {'loss': 0.2299, 'learning_rate': 6.0122e-07, 'epoch': 2.79, 'throughput': 9995.11} [INFO|2025-03-21 11:38:41] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 5.9817e-07, 'epoch': 2.79, 'throughput': 9995.12} [INFO|2025-03-21 11:39:21] logging.py:143 >> {'loss': 0.2381, 'learning_rate': 5.9513e-07, 'epoch': 2.79, 'throughput': 9995.13} [INFO|2025-03-21 11:40:02] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 5.9210e-07, 'epoch': 2.79, 'throughput': 9995.16} [INFO|2025-03-21 11:40:42] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 5.8908e-07, 'epoch': 2.79, 'throughput': 9995.17} [INFO|2025-03-21 11:41:22] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 5.8606e-07, 'epoch': 2.79, 'throughput': 9995.18} [INFO|2025-03-21 11:42:02] logging.py:143 >> {'loss': 0.2045, 'learning_rate': 5.8306e-07, 'epoch': 2.79, 'throughput': 9995.13} [INFO|2025-03-21 11:42:42] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 5.8005e-07, 'epoch': 2.80, 'throughput': 9995.16} [INFO|2025-03-21 11:43:23] logging.py:143 >> {'loss': 0.2202, 'learning_rate': 5.7706e-07, 'epoch': 2.80, 'throughput': 9995.20} [INFO|2025-03-21 11:44:04] logging.py:143 >> {'loss': 0.2185, 'learning_rate': 5.7408e-07, 'epoch': 2.80, 'throughput': 9995.22} [INFO|2025-03-21 11:44:45] logging.py:143 >> {'loss': 0.2344, 'learning_rate': 5.7110e-07, 'epoch': 2.80, 'throughput': 9995.19} [INFO|2025-03-21 11:45:24] logging.py:143 >> {'loss': 0.2388, 'learning_rate': 5.6813e-07, 'epoch': 2.80, 'throughput': 9995.20} [INFO|2025-03-21 11:46:04] logging.py:143 >> {'loss': 0.2003, 'learning_rate': 5.6517e-07, 'epoch': 2.80, 'throughput': 9995.15} [INFO|2025-03-21 11:46:45] logging.py:143 >> {'loss': 0.2282, 'learning_rate': 5.6221e-07, 'epoch': 2.80, 'throughput': 9995.14} [INFO|2025-03-21 11:47:28] logging.py:143 >> {'loss': 0.2297, 'learning_rate': 5.5926e-07, 'epoch': 2.80, 
'throughput': 9995.10} [INFO|2025-03-21 11:48:08] logging.py:143 >> {'loss': 0.2035, 'learning_rate': 5.5632e-07, 'epoch': 2.80, 'throughput': 9995.11} [INFO|2025-03-21 11:48:50] logging.py:143 >> {'loss': 0.2251, 'learning_rate': 5.5339e-07, 'epoch': 2.80, 'throughput': 9995.13} [INFO|2025-03-21 11:49:30] logging.py:143 >> {'loss': 0.2262, 'learning_rate': 5.5047e-07, 'epoch': 2.80, 'throughput': 9995.17} [INFO|2025-03-21 11:50:10] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 5.4755e-07, 'epoch': 2.80, 'throughput': 9995.19} [INFO|2025-03-21 11:50:49] logging.py:143 >> {'loss': 0.2167, 'learning_rate': 5.4464e-07, 'epoch': 2.80, 'throughput': 9995.19} [INFO|2025-03-21 11:51:30] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 5.4174e-07, 'epoch': 2.80, 'throughput': 9995.22} [INFO|2025-03-21 11:52:11] logging.py:143 >> {'loss': 0.2395, 'learning_rate': 5.3885e-07, 'epoch': 2.80, 'throughput': 9995.22} [INFO|2025-03-21 11:52:50] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 5.3596e-07, 'epoch': 2.80, 'throughput': 9995.30} [INFO|2025-03-21 11:53:32] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 5.3308e-07, 'epoch': 2.80, 'throughput': 9995.28} [INFO|2025-03-21 11:54:12] logging.py:143 >> {'loss': 0.2175, 'learning_rate': 5.3021e-07, 'epoch': 2.80, 'throughput': 9995.31} [INFO|2025-03-21 11:54:52] logging.py:143 >> {'loss': 0.2327, 'learning_rate': 5.2735e-07, 'epoch': 2.80, 'throughput': 9995.32} [INFO|2025-03-21 11:55:33] logging.py:143 >> {'loss': 0.2273, 'learning_rate': 5.2450e-07, 'epoch': 2.81, 'throughput': 9995.34} [INFO|2025-03-21 11:56:12] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 5.2165e-07, 'epoch': 2.81, 'throughput': 9995.38} [INFO|2025-03-21 11:56:53] logging.py:143 >> {'loss': 0.2249, 'learning_rate': 5.1881e-07, 'epoch': 2.81, 'throughput': 9995.42} [INFO|2025-03-21 11:57:34] logging.py:143 >> {'loss': 0.2192, 'learning_rate': 5.1598e-07, 'epoch': 2.81, 'throughput': 9995.39} [INFO|2025-03-21 11:58:14] logging.py:143 
>> {'loss': 0.2470, 'learning_rate': 5.1315e-07, 'epoch': 2.81, 'throughput': 9995.42} [INFO|2025-03-21 11:58:54] logging.py:143 >> {'loss': 0.2275, 'learning_rate': 5.1033e-07, 'epoch': 2.81, 'throughput': 9995.40} [INFO|2025-03-21 11:59:33] logging.py:143 >> {'loss': 0.2482, 'learning_rate': 5.0753e-07, 'epoch': 2.81, 'throughput': 9995.43} [INFO|2025-03-21 12:00:14] logging.py:143 >> {'loss': 0.2149, 'learning_rate': 5.0472e-07, 'epoch': 2.81, 'throughput': 9995.42} [INFO|2025-03-21 12:00:54] logging.py:143 >> {'loss': 0.2213, 'learning_rate': 5.0193e-07, 'epoch': 2.81, 'throughput': 9995.45} [INFO|2025-03-21 12:01:35] logging.py:143 >> {'loss': 0.2270, 'learning_rate': 4.9914e-07, 'epoch': 2.81, 'throughput': 9995.45} [INFO|2025-03-21 12:02:16] logging.py:143 >> {'loss': 0.2219, 'learning_rate': 4.9637e-07, 'epoch': 2.81, 'throughput': 9995.41} [INFO|2025-03-21 12:02:57] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 4.9360e-07, 'epoch': 2.81, 'throughput': 9995.41} [INFO|2025-03-21 12:03:38] logging.py:143 >> {'loss': 0.2451, 'learning_rate': 4.9083e-07, 'epoch': 2.81, 'throughput': 9995.44} [INFO|2025-03-21 12:04:18] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 4.8808e-07, 'epoch': 2.81, 'throughput': 9995.48} [INFO|2025-03-21 12:04:59] logging.py:143 >> {'loss': 0.2113, 'learning_rate': 4.8533e-07, 'epoch': 2.81, 'throughput': 9995.49} [INFO|2025-03-21 12:05:40] logging.py:143 >> {'loss': 0.2093, 'learning_rate': 4.8259e-07, 'epoch': 2.81, 'throughput': 9995.48} [INFO|2025-03-21 12:06:21] logging.py:143 >> {'loss': 0.2089, 'learning_rate': 4.7986e-07, 'epoch': 2.81, 'throughput': 9995.48} [INFO|2025-03-21 12:07:00] logging.py:143 >> {'loss': 0.2115, 'learning_rate': 4.7713e-07, 'epoch': 2.81, 'throughput': 9995.48} [INFO|2025-03-21 12:07:40] logging.py:143 >> {'loss': 0.2297, 'learning_rate': 4.7441e-07, 'epoch': 2.81, 'throughput': 9995.47} [INFO|2025-03-21 12:08:21] logging.py:143 >> {'loss': 0.2400, 'learning_rate': 4.7171e-07, 'epoch': 2.82, 
'throughput': 9995.47} [INFO|2025-03-21 12:09:01] logging.py:143 >> {'loss': 0.2520, 'learning_rate': 4.6900e-07, 'epoch': 2.82, 'throughput': 9995.50} [INFO|2025-03-21 12:09:42] logging.py:143 >> {'loss': 0.2208, 'learning_rate': 4.6631e-07, 'epoch': 2.82, 'throughput': 9995.48} [INFO|2025-03-21 12:10:22] logging.py:143 >> {'loss': 0.2284, 'learning_rate': 4.6362e-07, 'epoch': 2.82, 'throughput': 9995.50} [INFO|2025-03-21 12:11:04] logging.py:143 >> {'loss': 0.2162, 'learning_rate': 4.6094e-07, 'epoch': 2.82, 'throughput': 9995.47} [INFO|2025-03-21 12:11:44] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 4.5827e-07, 'epoch': 2.82, 'throughput': 9995.46} [INFO|2025-03-21 12:12:25] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 4.5561e-07, 'epoch': 2.82, 'throughput': 9995.49} [INFO|2025-03-21 12:13:04] logging.py:143 >> {'loss': 0.2415, 'learning_rate': 4.5295e-07, 'epoch': 2.82, 'throughput': 9995.54} [INFO|2025-03-21 12:13:45] logging.py:143 >> {'loss': 0.2281, 'learning_rate': 4.5031e-07, 'epoch': 2.82, 'throughput': 9995.52} [INFO|2025-03-21 12:14:26] logging.py:143 >> {'loss': 0.2523, 'learning_rate': 4.4767e-07, 'epoch': 2.82, 'throughput': 9995.49} [INFO|2025-03-21 12:15:06] logging.py:143 >> {'loss': 0.2220, 'learning_rate': 4.4503e-07, 'epoch': 2.82, 'throughput': 9995.51} [INFO|2025-03-21 12:15:46] logging.py:143 >> {'loss': 0.2390, 'learning_rate': 4.4241e-07, 'epoch': 2.82, 'throughput': 9995.50} [INFO|2025-03-21 12:16:25] logging.py:143 >> {'loss': 0.2233, 'learning_rate': 4.3979e-07, 'epoch': 2.82, 'throughput': 9995.55} [INFO|2025-03-21 12:17:07] logging.py:143 >> {'loss': 0.2292, 'learning_rate': 4.3718e-07, 'epoch': 2.82, 'throughput': 9995.52} [INFO|2025-03-21 12:17:47] logging.py:143 >> {'loss': 0.2292, 'learning_rate': 4.3458e-07, 'epoch': 2.82, 'throughput': 9995.52} [INFO|2025-03-21 12:18:26] logging.py:143 >> {'loss': 0.1983, 'learning_rate': 4.3199e-07, 'epoch': 2.82, 'throughput': 9995.52} [INFO|2025-03-21 12:19:06] logging.py:143 
>> {'loss': 0.2161, 'learning_rate': 4.2940e-07, 'epoch': 2.82, 'throughput': 9995.58} [INFO|2025-03-21 12:19:47] logging.py:143 >> {'loss': 0.2407, 'learning_rate': 4.2682e-07, 'epoch': 2.82, 'throughput': 9995.61} [INFO|2025-03-21 12:20:27] logging.py:143 >> {'loss': 0.2175, 'learning_rate': 4.2425e-07, 'epoch': 2.83, 'throughput': 9995.68} [INFO|2025-03-21 12:21:06] logging.py:143 >> {'loss': 0.2412, 'learning_rate': 4.2169e-07, 'epoch': 2.83, 'throughput': 9995.72} [INFO|2025-03-21 12:21:46] logging.py:143 >> {'loss': 0.2526, 'learning_rate': 4.1913e-07, 'epoch': 2.83, 'throughput': 9995.71} [INFO|2025-03-21 12:22:26] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 4.1659e-07, 'epoch': 2.83, 'throughput': 9995.73} [INFO|2025-03-21 12:23:07] logging.py:143 >> {'loss': 0.2363, 'learning_rate': 4.1405e-07, 'epoch': 2.83, 'throughput': 9995.77} [INFO|2025-03-21 12:23:47] logging.py:143 >> {'loss': 0.2228, 'learning_rate': 4.1151e-07, 'epoch': 2.83, 'throughput': 9995.78} [INFO|2025-03-21 12:24:27] logging.py:143 >> {'loss': 0.2138, 'learning_rate': 4.0899e-07, 'epoch': 2.83, 'throughput': 9995.78} [INFO|2025-03-21 12:25:06] logging.py:143 >> {'loss': 0.2226, 'learning_rate': 4.0647e-07, 'epoch': 2.83, 'throughput': 9995.82} [INFO|2025-03-21 12:25:48] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 4.0396e-07, 'epoch': 2.83, 'throughput': 9995.82} [INFO|2025-03-21 12:26:28] logging.py:143 >> {'loss': 0.2290, 'learning_rate': 4.0146e-07, 'epoch': 2.83, 'throughput': 9995.79} [INFO|2025-03-21 12:27:07] logging.py:143 >> {'loss': 0.2359, 'learning_rate': 3.9897e-07, 'epoch': 2.83, 'throughput': 9995.83} [INFO|2025-03-21 12:27:48] logging.py:143 >> {'loss': 0.2018, 'learning_rate': 3.9648e-07, 'epoch': 2.83, 'throughput': 9995.80} [INFO|2025-03-21 12:28:28] logging.py:143 >> {'loss': 0.2463, 'learning_rate': 3.9400e-07, 'epoch': 2.83, 'throughput': 9995.84} [INFO|2025-03-21 12:29:09] logging.py:143 >> {'loss': 0.2314, 'learning_rate': 3.9153e-07, 'epoch': 2.83, 
'throughput': 9995.85} [INFO|2025-03-21 12:29:51] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 3.8907e-07, 'epoch': 2.83, 'throughput': 9995.83} [INFO|2025-03-21 12:30:31] logging.py:143 >> {'loss': 0.2106, 'learning_rate': 3.8661e-07, 'epoch': 2.83, 'throughput': 9995.83} [INFO|2025-03-21 12:31:11] logging.py:143 >> {'loss': 0.2097, 'learning_rate': 3.8417e-07, 'epoch': 2.83, 'throughput': 9995.74} [INFO|2025-03-21 12:31:52] logging.py:143 >> {'loss': 0.2373, 'learning_rate': 3.8173e-07, 'epoch': 2.83, 'throughput': 9995.73} [INFO|2025-03-21 12:32:33] logging.py:143 >> {'loss': 0.2104, 'learning_rate': 3.7929e-07, 'epoch': 2.83, 'throughput': 9995.71} [INFO|2025-03-21 12:33:14] logging.py:143 >> {'loss': 0.2277, 'learning_rate': 3.7687e-07, 'epoch': 2.84, 'throughput': 9995.70} [INFO|2025-03-21 12:33:53] logging.py:143 >> {'loss': 0.2197, 'learning_rate': 3.7445e-07, 'epoch': 2.84, 'throughput': 9995.74} [INFO|2025-03-21 12:34:34] logging.py:143 >> {'loss': 0.2251, 'learning_rate': 3.7204e-07, 'epoch': 2.84, 'throughput': 9995.75} [INFO|2025-03-21 12:35:15] logging.py:143 >> {'loss': 0.2328, 'learning_rate': 3.6964e-07, 'epoch': 2.84, 'throughput': 9995.73} [INFO|2025-03-21 12:35:54] logging.py:143 >> {'loss': 0.2278, 'learning_rate': 3.6725e-07, 'epoch': 2.84, 'throughput': 9995.77} [INFO|2025-03-21 12:36:35] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 3.6486e-07, 'epoch': 2.84, 'throughput': 9995.80} [INFO|2025-03-21 12:37:16] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 3.6248e-07, 'epoch': 2.84, 'throughput': 9995.81} [INFO|2025-03-21 12:37:56] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 3.6011e-07, 'epoch': 2.84, 'throughput': 9995.81} [INFO|2025-03-21 12:38:35] logging.py:143 >> {'loss': 0.2199, 'learning_rate': 3.5775e-07, 'epoch': 2.84, 'throughput': 9995.82} [INFO|2025-03-21 12:39:15] logging.py:143 >> {'loss': 0.2351, 'learning_rate': 3.5540e-07, 'epoch': 2.84, 'throughput': 9995.83} [INFO|2025-03-21 12:39:55] logging.py:143 
>> {'loss': 0.2233, 'learning_rate': 3.5305e-07, 'epoch': 2.84, 'throughput': 9995.82} [INFO|2025-03-21 12:40:36] logging.py:143 >> {'loss': 0.2402, 'learning_rate': 3.5071e-07, 'epoch': 2.84, 'throughput': 9995.78} [INFO|2025-03-21 12:41:17] logging.py:143 >> {'loss': 0.2000, 'learning_rate': 3.4838e-07, 'epoch': 2.84, 'throughput': 9995.76} [INFO|2025-03-21 12:41:57] logging.py:143 >> {'loss': 0.2260, 'learning_rate': 3.4605e-07, 'epoch': 2.84, 'throughput': 9995.78} [INFO|2025-03-21 12:42:37] logging.py:143 >> {'loss': 0.2248, 'learning_rate': 3.4374e-07, 'epoch': 2.84, 'throughput': 9995.79} [INFO|2025-03-21 12:43:17] logging.py:143 >> {'loss': 0.2233, 'learning_rate': 3.4143e-07, 'epoch': 2.84, 'throughput': 9995.83} [INFO|2025-03-21 12:43:58] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 3.3913e-07, 'epoch': 2.84, 'throughput': 9995.84} [INFO|2025-03-21 12:44:39] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 3.3683e-07, 'epoch': 2.84, 'throughput': 9995.84} [INFO|2025-03-21 12:45:19] logging.py:143 >> {'loss': 0.2134, 'learning_rate': 3.3455e-07, 'epoch': 2.84, 'throughput': 9995.83} [INFO|2025-03-21 12:46:01] logging.py:143 >> {'loss': 0.2277, 'learning_rate': 3.3227e-07, 'epoch': 2.85, 'throughput': 9995.79} [INFO|2025-03-21 12:46:41] logging.py:143 >> {'loss': 0.2475, 'learning_rate': 3.3000e-07, 'epoch': 2.85, 'throughput': 9995.77} [INFO|2025-03-21 12:47:21] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 3.2774e-07, 'epoch': 2.85, 'throughput': 9995.79} [INFO|2025-03-21 12:48:01] logging.py:143 >> {'loss': 0.2457, 'learning_rate': 3.2548e-07, 'epoch': 2.85, 'throughput': 9995.83} [INFO|2025-03-21 12:48:43] logging.py:143 >> {'loss': 0.2190, 'learning_rate': 3.2324e-07, 'epoch': 2.85, 'throughput': 9995.78} [INFO|2025-03-21 12:49:24] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 3.2100e-07, 'epoch': 2.85, 'throughput': 9995.75} [INFO|2025-03-21 12:50:04] logging.py:143 >> {'loss': 0.2177, 'learning_rate': 3.1877e-07, 'epoch': 2.85, 
'throughput': 9995.78} [INFO|2025-03-21 12:50:45] logging.py:143 >> {'loss': 0.2212, 'learning_rate': 3.1654e-07, 'epoch': 2.85, 'throughput': 9995.79} [INFO|2025-03-21 12:51:26] logging.py:143 >> {'loss': 0.2297, 'learning_rate': 3.1433e-07, 'epoch': 2.85, 'throughput': 9995.76} [INFO|2025-03-21 12:52:06] logging.py:143 >> {'loss': 0.2078, 'learning_rate': 3.1212e-07, 'epoch': 2.85, 'throughput': 9995.76} [INFO|2025-03-21 12:52:46] logging.py:143 >> {'loss': 0.2427, 'learning_rate': 3.0992e-07, 'epoch': 2.85, 'throughput': 9995.78} [INFO|2025-03-21 12:53:27] logging.py:143 >> {'loss': 0.2215, 'learning_rate': 3.0773e-07, 'epoch': 2.85, 'throughput': 9995.74} [INFO|2025-03-21 12:54:07] logging.py:143 >> {'loss': 0.2319, 'learning_rate': 3.0554e-07, 'epoch': 2.85, 'throughput': 9995.78} [INFO|2025-03-21 12:54:45] logging.py:143 >> {'loss': 0.2465, 'learning_rate': 3.0336e-07, 'epoch': 2.85, 'throughput': 9995.86} [INFO|2025-03-21 12:55:27] logging.py:143 >> {'loss': 0.2144, 'learning_rate': 3.0119e-07, 'epoch': 2.85, 'throughput': 9995.78} [INFO|2025-03-21 12:56:08] logging.py:143 >> {'loss': 0.2530, 'learning_rate': 2.9903e-07, 'epoch': 2.85, 'throughput': 9995.78} [INFO|2025-03-21 12:56:50] logging.py:143 >> {'loss': 0.2309, 'learning_rate': 2.9688e-07, 'epoch': 2.85, 'throughput': 9995.74} [INFO|2025-03-21 12:57:30] logging.py:143 >> {'loss': 0.2150, 'learning_rate': 2.9473e-07, 'epoch': 2.85, 'throughput': 9995.80} [INFO|2025-03-21 12:58:10] logging.py:143 >> {'loss': 0.2490, 'learning_rate': 2.9259e-07, 'epoch': 2.85, 'throughput': 9995.82} [INFO|2025-03-21 12:58:50] logging.py:143 >> {'loss': 0.2277, 'learning_rate': 2.9046e-07, 'epoch': 2.86, 'throughput': 9995.84} [INFO|2025-03-21 12:59:30] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 2.8834e-07, 'epoch': 2.86, 'throughput': 9995.81} [INFO|2025-03-21 13:00:09] logging.py:143 >> {'loss': 0.2329, 'learning_rate': 2.8622e-07, 'epoch': 2.86, 'throughput': 9995.83} [INFO|2025-03-21 13:00:49] logging.py:143 
>> {'loss': 0.2438, 'learning_rate': 2.8412e-07, 'epoch': 2.86, 'throughput': 9995.85} [INFO|2025-03-21 13:01:28] logging.py:143 >> {'loss': 0.2456, 'learning_rate': 2.8202e-07, 'epoch': 2.86, 'throughput': 9995.88} [INFO|2025-03-21 13:02:11] logging.py:143 >> {'loss': 0.2136, 'learning_rate': 2.7992e-07, 'epoch': 2.86, 'throughput': 9995.82} [INFO|2025-03-21 13:02:52] logging.py:143 >> {'loss': 0.2328, 'learning_rate': 2.7784e-07, 'epoch': 2.86, 'throughput': 9995.79} [INFO|2025-03-21 13:03:32] logging.py:143 >> {'loss': 0.2279, 'learning_rate': 2.7576e-07, 'epoch': 2.86, 'throughput': 9995.81} [INFO|2025-03-21 13:04:12] logging.py:143 >> {'loss': 0.2223, 'learning_rate': 2.7370e-07, 'epoch': 2.86, 'throughput': 9995.86} [INFO|2025-03-21 13:04:51] logging.py:143 >> {'loss': 0.2449, 'learning_rate': 2.7163e-07, 'epoch': 2.86, 'throughput': 9995.90} [INFO|2025-03-21 13:05:30] logging.py:143 >> {'loss': 0.2128, 'learning_rate': 2.6958e-07, 'epoch': 2.86, 'throughput': 9995.88} [INFO|2025-03-21 13:06:11] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 2.6754e-07, 'epoch': 2.86, 'throughput': 9995.91} [INFO|2025-03-21 13:06:51] logging.py:143 >> {'loss': 0.2261, 'learning_rate': 2.6550e-07, 'epoch': 2.86, 'throughput': 9995.93} [INFO|2025-03-21 13:07:31] logging.py:143 >> {'loss': 0.2495, 'learning_rate': 2.6347e-07, 'epoch': 2.86, 'throughput': 9995.93} [INFO|2025-03-21 13:08:11] logging.py:143 >> {'loss': 0.2384, 'learning_rate': 2.6144e-07, 'epoch': 2.86, 'throughput': 9995.99} [INFO|2025-03-21 13:08:51] logging.py:143 >> {'loss': 0.2260, 'learning_rate': 2.5943e-07, 'epoch': 2.86, 'throughput': 9995.98} [INFO|2025-03-21 13:09:30] logging.py:143 >> {'loss': 0.2219, 'learning_rate': 2.5742e-07, 'epoch': 2.86, 'throughput': 9996.02} [INFO|2025-03-21 13:10:12] logging.py:143 >> {'loss': 0.2581, 'learning_rate': 2.5542e-07, 'epoch': 2.86, 'throughput': 9996.00} [INFO|2025-03-21 13:10:52] logging.py:143 >> {'loss': 0.2435, 'learning_rate': 2.5343e-07, 'epoch': 2.86, 
'throughput': 9995.98} [INFO|2025-03-21 13:11:33] logging.py:143 >> {'loss': 0.2155, 'learning_rate': 2.5145e-07, 'epoch': 2.87, 'throughput': 9995.98} [INFO|2025-03-21 13:12:13] logging.py:143 >> {'loss': 0.2317, 'learning_rate': 2.4947e-07, 'epoch': 2.87, 'throughput': 9996.01} [INFO|2025-03-21 13:12:54] logging.py:143 >> {'loss': 0.2192, 'learning_rate': 2.4751e-07, 'epoch': 2.87, 'throughput': 9996.02} [INFO|2025-03-21 13:13:35] logging.py:143 >> {'loss': 0.2313, 'learning_rate': 2.4555e-07, 'epoch': 2.87, 'throughput': 9996.05} [INFO|2025-03-21 13:14:15] logging.py:143 >> {'loss': 0.2321, 'learning_rate': 2.4359e-07, 'epoch': 2.87, 'throughput': 9996.09} [INFO|2025-03-21 13:14:56] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 2.4165e-07, 'epoch': 2.87, 'throughput': 9996.07} [INFO|2025-03-21 13:15:37] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 2.3971e-07, 'epoch': 2.87, 'throughput': 9996.09} [INFO|2025-03-21 13:16:16] logging.py:143 >> {'loss': 0.2217, 'learning_rate': 2.3778e-07, 'epoch': 2.87, 'throughput': 9996.10} [INFO|2025-03-21 13:16:56] logging.py:143 >> {'loss': 0.2403, 'learning_rate': 2.3586e-07, 'epoch': 2.87, 'throughput': 9996.18} [INFO|2025-03-21 13:17:36] logging.py:143 >> {'loss': 0.2257, 'learning_rate': 2.3395e-07, 'epoch': 2.87, 'throughput': 9996.18} [INFO|2025-03-21 13:18:17] logging.py:143 >> {'loss': 0.2310, 'learning_rate': 2.3204e-07, 'epoch': 2.87, 'throughput': 9996.18} [INFO|2025-03-21 13:18:56] logging.py:143 >> {'loss': 0.2229, 'learning_rate': 2.3014e-07, 'epoch': 2.87, 'throughput': 9996.18} [INFO|2025-03-21 13:19:36] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 2.2825e-07, 'epoch': 2.87, 'throughput': 9996.23} [INFO|2025-03-21 13:20:16] logging.py:143 >> {'loss': 0.2199, 'learning_rate': 2.2637e-07, 'epoch': 2.87, 'throughput': 9996.23} [INFO|2025-03-21 13:20:55] logging.py:143 >> {'loss': 0.2245, 'learning_rate': 2.2449e-07, 'epoch': 2.87, 'throughput': 9996.22} [INFO|2025-03-21 13:21:36] logging.py:143 
>> {'loss': 0.2303, 'learning_rate': 2.2263e-07, 'epoch': 2.87, 'throughput': 9996.25} [INFO|2025-03-21 13:22:16] logging.py:143 >> {'loss': 0.2346, 'learning_rate': 2.2077e-07, 'epoch': 2.87, 'throughput': 9996.25} [INFO|2025-03-21 13:22:57] logging.py:143 >> {'loss': 0.2354, 'learning_rate': 2.1892e-07, 'epoch': 2.87, 'throughput': 9996.21} [INFO|2025-03-21 13:23:38] logging.py:143 >> {'loss': 0.2274, 'learning_rate': 2.1707e-07, 'epoch': 2.87, 'throughput': 9996.20} [INFO|2025-03-21 13:24:19] logging.py:143 >> {'loss': 0.2346, 'learning_rate': 2.1524e-07, 'epoch': 2.88, 'throughput': 9996.20} [INFO|2025-03-21 13:24:59] logging.py:143 >> {'loss': 0.2337, 'learning_rate': 2.1341e-07, 'epoch': 2.88, 'throughput': 9996.29} [INFO|2025-03-21 13:25:41] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 2.1159e-07, 'epoch': 2.88, 'throughput': 9996.30} [INFO|2025-03-21 13:26:21] logging.py:143 >> {'loss': 0.2190, 'learning_rate': 2.0977e-07, 'epoch': 2.88, 'throughput': 9996.32} [INFO|2025-03-21 13:27:02] logging.py:143 >> {'loss': 0.2231, 'learning_rate': 2.0797e-07, 'epoch': 2.88, 'throughput': 9996.28} [INFO|2025-03-21 13:27:42] logging.py:143 >> {'loss': 0.2140, 'learning_rate': 2.0617e-07, 'epoch': 2.88, 'throughput': 9996.32} [INFO|2025-03-21 13:28:23] logging.py:143 >> {'loss': 0.2136, 'learning_rate': 2.0438e-07, 'epoch': 2.88, 'throughput': 9996.34} [INFO|2025-03-21 13:29:02] logging.py:143 >> {'loss': 0.2369, 'learning_rate': 2.0260e-07, 'epoch': 2.88, 'throughput': 9996.35} [INFO|2025-03-21 13:29:43] logging.py:143 >> {'loss': 0.2282, 'learning_rate': 2.0083e-07, 'epoch': 2.88, 'throughput': 9996.39} [INFO|2025-03-21 13:30:23] logging.py:143 >> {'loss': 0.2486, 'learning_rate': 1.9906e-07, 'epoch': 2.88, 'throughput': 9996.37} [INFO|2025-03-21 13:31:03] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 1.9730e-07, 'epoch': 2.88, 'throughput': 9996.37} [INFO|2025-03-21 13:31:43] logging.py:143 >> {'loss': 0.2194, 'learning_rate': 1.9555e-07, 'epoch': 2.88, 
'throughput': 9996.41} [INFO|2025-03-21 13:32:24] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 1.9381e-07, 'epoch': 2.88, 'throughput': 9996.44} [INFO|2025-03-21 13:33:06] logging.py:143 >> {'loss': 0.2277, 'learning_rate': 1.9207e-07, 'epoch': 2.88, 'throughput': 9996.39} [INFO|2025-03-21 13:33:47] logging.py:143 >> {'loss': 0.2251, 'learning_rate': 1.9034e-07, 'epoch': 2.88, 'throughput': 9996.38} [INFO|2025-03-21 13:34:27] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 1.8863e-07, 'epoch': 2.88, 'throughput': 9996.35} [INFO|2025-03-21 13:35:07] logging.py:143 >> {'loss': 0.2284, 'learning_rate': 1.8691e-07, 'epoch': 2.88, 'throughput': 9996.36} [INFO|2025-03-21 13:35:47] logging.py:143 >> {'loss': 0.2382, 'learning_rate': 1.8521e-07, 'epoch': 2.88, 'throughput': 9996.33} [INFO|2025-03-21 13:36:27] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 1.8351e-07, 'epoch': 2.89, 'throughput': 9996.38} [INFO|2025-03-21 13:37:08] logging.py:143 >> {'loss': 0.2433, 'learning_rate': 1.8182e-07, 'epoch': 2.89, 'throughput': 9996.43} [INFO|2025-03-21 13:37:48] logging.py:143 >> {'loss': 0.2084, 'learning_rate': 1.8014e-07, 'epoch': 2.89, 'throughput': 9996.39} [INFO|2025-03-21 13:38:29] logging.py:143 >> {'loss': 0.2242, 'learning_rate': 1.7847e-07, 'epoch': 2.89, 'throughput': 9996.42} [INFO|2025-03-21 13:39:09] logging.py:143 >> {'loss': 0.2377, 'learning_rate': 1.7681e-07, 'epoch': 2.89, 'throughput': 9996.45} [INFO|2025-03-21 13:39:50] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 1.7515e-07, 'epoch': 2.89, 'throughput': 9996.39} [INFO|2025-03-21 13:40:32] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 1.7350e-07, 'epoch': 2.89, 'throughput': 9996.41} [INFO|2025-03-21 13:41:12] logging.py:143 >> {'loss': 0.2291, 'learning_rate': 1.7186e-07, 'epoch': 2.89, 'throughput': 9996.47} [INFO|2025-03-21 13:41:51] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 1.7022e-07, 'epoch': 2.89, 'throughput': 9996.47} [INFO|2025-03-21 13:42:32] logging.py:143 
>> {'loss': 0.2135, 'learning_rate': 1.6860e-07, 'epoch': 2.89, 'throughput': 9996.49} [INFO|2025-03-21 13:43:13] logging.py:143 >> {'loss': 0.2424, 'learning_rate': 1.6698e-07, 'epoch': 2.89, 'throughput': 9996.47} [INFO|2025-03-21 13:43:54] logging.py:143 >> {'loss': 0.2308, 'learning_rate': 1.6537e-07, 'epoch': 2.89, 'throughput': 9996.49} [INFO|2025-03-21 13:44:35] logging.py:143 >> {'loss': 0.2300, 'learning_rate': 1.6376e-07, 'epoch': 2.89, 'throughput': 9996.41} [INFO|2025-03-21 13:45:15] logging.py:143 >> {'loss': 0.2113, 'learning_rate': 1.6217e-07, 'epoch': 2.89, 'throughput': 9996.43} [INFO|2025-03-21 13:45:54] logging.py:143 >> {'loss': 0.2139, 'learning_rate': 1.6058e-07, 'epoch': 2.89, 'throughput': 9996.45} [INFO|2025-03-21 13:46:35] logging.py:143 >> {'loss': 0.2336, 'learning_rate': 1.5900e-07, 'epoch': 2.89, 'throughput': 9996.46} [INFO|2025-03-21 13:47:16] logging.py:143 >> {'loss': 0.2148, 'learning_rate': 1.5743e-07, 'epoch': 2.89, 'throughput': 9996.44} [INFO|2025-03-21 13:47:56] logging.py:143 >> {'loss': 0.2426, 'learning_rate': 1.5587e-07, 'epoch': 2.89, 'throughput': 9996.42} [INFO|2025-03-21 13:48:36] logging.py:143 >> {'loss': 0.2626, 'learning_rate': 1.5431e-07, 'epoch': 2.89, 'throughput': 9996.38} [INFO|2025-03-21 13:49:16] logging.py:143 >> {'loss': 0.2445, 'learning_rate': 1.5276e-07, 'epoch': 2.90, 'throughput': 9996.37} [INFO|2025-03-21 13:49:57] logging.py:143 >> {'loss': 0.2056, 'learning_rate': 1.5122e-07, 'epoch': 2.90, 'throughput': 9996.28} [INFO|2025-03-21 13:50:39] logging.py:143 >> {'loss': 0.2334, 'learning_rate': 1.4969e-07, 'epoch': 2.90, 'throughput': 9996.28} [INFO|2025-03-21 13:51:19] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 1.4816e-07, 'epoch': 2.90, 'throughput': 9996.29} [INFO|2025-03-21 13:51:59] logging.py:143 >> {'loss': 0.2266, 'learning_rate': 1.4664e-07, 'epoch': 2.90, 'throughput': 9996.30} [INFO|2025-03-21 13:52:39] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 1.4514e-07, 'epoch': 2.90, 
'throughput': 9996.34} [INFO|2025-03-21 13:53:19] logging.py:143 >> {'loss': 0.2204, 'learning_rate': 1.4363e-07, 'epoch': 2.90, 'throughput': 9996.36} [INFO|2025-03-21 13:54:01] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 1.4214e-07, 'epoch': 2.90, 'throughput': 9996.31} [INFO|2025-03-21 13:54:41] logging.py:143 >> {'loss': 0.2489, 'learning_rate': 1.4065e-07, 'epoch': 2.90, 'throughput': 9996.37} [INFO|2025-03-21 13:55:22] logging.py:143 >> {'loss': 0.2360, 'learning_rate': 1.3918e-07, 'epoch': 2.90, 'throughput': 9996.40} [INFO|2025-03-21 13:56:02] logging.py:143 >> {'loss': 0.2205, 'learning_rate': 1.3770e-07, 'epoch': 2.90, 'throughput': 9996.41} [INFO|2025-03-21 13:56:43] logging.py:143 >> {'loss': 0.2434, 'learning_rate': 1.3624e-07, 'epoch': 2.90, 'throughput': 9996.39} [INFO|2025-03-21 13:57:23] logging.py:143 >> {'loss': 0.2089, 'learning_rate': 1.3479e-07, 'epoch': 2.90, 'throughput': 9996.39} [INFO|2025-03-21 13:58:04] logging.py:143 >> {'loss': 0.2161, 'learning_rate': 1.3334e-07, 'epoch': 2.90, 'throughput': 9996.42} [INFO|2025-03-21 13:58:45] logging.py:143 >> {'loss': 0.2244, 'learning_rate': 1.3190e-07, 'epoch': 2.90, 'throughput': 9996.43} [INFO|2025-03-21 13:59:25] logging.py:143 >> {'loss': 0.2462, 'learning_rate': 1.3047e-07, 'epoch': 2.90, 'throughput': 9996.43} [INFO|2025-03-21 14:00:05] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 1.2904e-07, 'epoch': 2.90, 'throughput': 9996.47} [INFO|2025-03-21 14:00:46] logging.py:143 >> {'loss': 0.2153, 'learning_rate': 1.2763e-07, 'epoch': 2.90, 'throughput': 9996.42} [INFO|2025-03-21 14:01:26] logging.py:143 >> {'loss': 0.2454, 'learning_rate': 1.2622e-07, 'epoch': 2.90, 'throughput': 9996.43} [INFO|2025-03-21 14:02:07] logging.py:143 >> {'loss': 0.2259, 'learning_rate': 1.2482e-07, 'epoch': 2.91, 'throughput': 9996.47} [INFO|2025-03-21 14:02:47] logging.py:143 >> {'loss': 0.2383, 'learning_rate': 1.2343e-07, 'epoch': 2.91, 'throughput': 9996.50} [INFO|2025-03-21 14:03:27] logging.py:143 
>> {'loss': 0.2265, 'learning_rate': 1.2204e-07, 'epoch': 2.91, 'throughput': 9996.54} [INFO|2025-03-21 14:04:09] logging.py:143 >> {'loss': 0.2251, 'learning_rate': 1.2066e-07, 'epoch': 2.91, 'throughput': 9996.50} [INFO|2025-03-21 14:04:50] logging.py:143 >> {'loss': 0.2422, 'learning_rate': 1.1930e-07, 'epoch': 2.91, 'throughput': 9996.47} [INFO|2025-03-21 14:05:29] logging.py:143 >> {'loss': 0.2381, 'learning_rate': 1.1793e-07, 'epoch': 2.91, 'throughput': 9996.53} [INFO|2025-03-21 14:06:10] logging.py:143 >> {'loss': 0.2225, 'learning_rate': 1.1658e-07, 'epoch': 2.91, 'throughput': 9996.51} [INFO|2025-03-21 14:06:50] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 1.1523e-07, 'epoch': 2.91, 'throughput': 9996.55} [INFO|2025-03-21 14:07:30] logging.py:143 >> {'loss': 0.2282, 'learning_rate': 1.1390e-07, 'epoch': 2.91, 'throughput': 9996.57} [INFO|2025-03-21 14:08:11] logging.py:143 >> {'loss': 0.2148, 'learning_rate': 1.1257e-07, 'epoch': 2.91, 'throughput': 9996.57} [INFO|2025-03-21 14:08:52] logging.py:143 >> {'loss': 0.2238, 'learning_rate': 1.1124e-07, 'epoch': 2.91, 'throughput': 9996.57} [INFO|2025-03-21 14:09:32] logging.py:143 >> {'loss': 0.2202, 'learning_rate': 1.0993e-07, 'epoch': 2.91, 'throughput': 9996.62} [INFO|2025-03-21 14:10:13] logging.py:143 >> {'loss': 0.2172, 'learning_rate': 1.0862e-07, 'epoch': 2.91, 'throughput': 9996.63} [INFO|2025-03-21 14:10:53] logging.py:143 >> {'loss': 0.2303, 'learning_rate': 1.0732e-07, 'epoch': 2.91, 'throughput': 9996.59} [INFO|2025-03-21 14:11:34] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 1.0603e-07, 'epoch': 2.91, 'throughput': 9996.59} [INFO|2025-03-21 14:12:14] logging.py:143 >> {'loss': 0.2097, 'learning_rate': 1.0475e-07, 'epoch': 2.91, 'throughput': 9996.60} [INFO|2025-03-21 14:12:55] logging.py:143 >> {'loss': 0.2160, 'learning_rate': 1.0347e-07, 'epoch': 2.91, 'throughput': 9996.58} [INFO|2025-03-21 14:13:38] logging.py:143 >> {'loss': 0.2201, 'learning_rate': 1.0220e-07, 'epoch': 2.91, 
'throughput': 9996.51} [INFO|2025-03-21 14:14:18] logging.py:143 >> {'loss': 0.2371, 'learning_rate': 1.0094e-07, 'epoch': 2.91, 'throughput': 9996.53} [INFO|2025-03-21 14:14:58] logging.py:143 >> {'loss': 0.2369, 'learning_rate': 9.9692e-08, 'epoch': 2.92, 'throughput': 9996.52} [INFO|2025-03-21 14:15:37] logging.py:143 >> {'loss': 0.2222, 'learning_rate': 9.8447e-08, 'epoch': 2.92, 'throughput': 9996.55} [INFO|2025-03-21 14:16:18] logging.py:143 >> {'loss': 0.2306, 'learning_rate': 9.7210e-08, 'epoch': 2.92, 'throughput': 9996.54} [INFO|2025-03-21 14:16:59] logging.py:143 >> {'loss': 0.2194, 'learning_rate': 9.5981e-08, 'epoch': 2.92, 'throughput': 9996.56} [INFO|2025-03-21 14:17:39] logging.py:143 >> {'loss': 0.2212, 'learning_rate': 9.4760e-08, 'epoch': 2.92, 'throughput': 9996.56} [INFO|2025-03-21 14:18:20] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 9.3547e-08, 'epoch': 2.92, 'throughput': 9996.52} [INFO|2025-03-21 14:19:00] logging.py:143 >> {'loss': 0.2270, 'learning_rate': 9.2341e-08, 'epoch': 2.92, 'throughput': 9996.54} [INFO|2025-03-21 14:19:41] logging.py:143 >> {'loss': 0.2236, 'learning_rate': 9.1144e-08, 'epoch': 2.92, 'throughput': 9996.55} [INFO|2025-03-21 14:20:20] logging.py:143 >> {'loss': 0.2218, 'learning_rate': 8.9954e-08, 'epoch': 2.92, 'throughput': 9996.58} [INFO|2025-03-21 14:21:00] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 8.8772e-08, 'epoch': 2.92, 'throughput': 9996.60} [INFO|2025-03-21 14:21:40] logging.py:143 >> {'loss': 0.2416, 'learning_rate': 8.7597e-08, 'epoch': 2.92, 'throughput': 9996.61} [INFO|2025-03-21 14:22:22] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 8.6431e-08, 'epoch': 2.92, 'throughput': 9996.55} [INFO|2025-03-21 14:23:03] logging.py:143 >> {'loss': 0.1983, 'learning_rate': 8.5272e-08, 'epoch': 2.92, 'throughput': 9996.56} [INFO|2025-03-21 14:23:43] logging.py:143 >> {'loss': 0.2284, 'learning_rate': 8.4121e-08, 'epoch': 2.92, 'throughput': 9996.57} [INFO|2025-03-21 14:24:22] logging.py:143 
>> {'loss': 0.2580, 'learning_rate': 8.2978e-08, 'epoch': 2.92, 'throughput': 9996.60} [INFO|2025-03-21 14:25:01] logging.py:143 >> {'loss': 0.2303, 'learning_rate': 8.1843e-08, 'epoch': 2.92, 'throughput': 9996.64} [INFO|2025-03-21 14:25:42] logging.py:143 >> {'loss': 0.2284, 'learning_rate': 8.0715e-08, 'epoch': 2.92, 'throughput': 9996.65} [INFO|2025-03-21 14:26:23] logging.py:143 >> {'loss': 0.2234, 'learning_rate': 7.9596e-08, 'epoch': 2.92, 'throughput': 9996.67} [INFO|2025-03-21 14:27:04] logging.py:143 >> {'loss': 0.2011, 'learning_rate': 7.8484e-08, 'epoch': 2.92, 'throughput': 9996.69} [INFO|2025-03-21 14:27:46] logging.py:143 >> {'loss': 0.2213, 'learning_rate': 7.7380e-08, 'epoch': 2.93, 'throughput': 9996.63} [INFO|2025-03-21 14:28:26] logging.py:143 >> {'loss': 0.2051, 'learning_rate': 7.6284e-08, 'epoch': 2.93, 'throughput': 9996.63} [INFO|2025-03-21 14:29:06] logging.py:143 >> {'loss': 0.2132, 'learning_rate': 7.5195e-08, 'epoch': 2.93, 'throughput': 9996.69} [INFO|2025-03-21 14:29:47] logging.py:143 >> {'loss': 0.2319, 'learning_rate': 7.4115e-08, 'epoch': 2.93, 'throughput': 9996.69} [INFO|2025-03-21 14:30:27] logging.py:143 >> {'loss': 0.2253, 'learning_rate': 7.3042e-08, 'epoch': 2.93, 'throughput': 9996.76} [INFO|2025-03-21 14:31:07] logging.py:143 >> {'loss': 0.2178, 'learning_rate': 7.1977e-08, 'epoch': 2.93, 'throughput': 9996.81} [INFO|2025-03-21 14:31:50] logging.py:143 >> {'loss': 0.2396, 'learning_rate': 7.0920e-08, 'epoch': 2.93, 'throughput': 9996.73} [INFO|2025-03-21 14:32:31] logging.py:143 >> {'loss': 0.2221, 'learning_rate': 6.9870e-08, 'epoch': 2.93, 'throughput': 9996.72} [INFO|2025-03-21 14:33:10] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 6.8829e-08, 'epoch': 2.93, 'throughput': 9996.78} [INFO|2025-03-21 14:33:50] logging.py:143 >> {'loss': 0.2307, 'learning_rate': 6.7795e-08, 'epoch': 2.93, 'throughput': 9996.81} [INFO|2025-03-21 14:34:30] logging.py:143 >> {'loss': 0.2276, 'learning_rate': 6.6769e-08, 'epoch': 2.93, 
'throughput': 9996.85} [INFO|2025-03-21 14:35:11] logging.py:143 >> {'loss': 0.2151, 'learning_rate': 6.5751e-08, 'epoch': 2.93, 'throughput': 9996.83} [INFO|2025-03-21 14:35:50] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 6.4740e-08, 'epoch': 2.93, 'throughput': 9996.85} [INFO|2025-03-21 14:36:30] logging.py:143 >> {'loss': 0.2270, 'learning_rate': 6.3738e-08, 'epoch': 2.93, 'throughput': 9996.85} [INFO|2025-03-21 14:37:11] logging.py:143 >> {'loss': 0.2096, 'learning_rate': 6.2743e-08, 'epoch': 2.93, 'throughput': 9996.81} [INFO|2025-03-21 14:37:52] logging.py:143 >> {'loss': 0.2138, 'learning_rate': 6.1756e-08, 'epoch': 2.93, 'throughput': 9996.79} [INFO|2025-03-21 14:38:33] logging.py:143 >> {'loss': 0.2205, 'learning_rate': 6.0777e-08, 'epoch': 2.93, 'throughput': 9996.81} [INFO|2025-03-21 14:39:15] logging.py:143 >> {'loss': 0.2389, 'learning_rate': 5.9806e-08, 'epoch': 2.93, 'throughput': 9996.82} [INFO|2025-03-21 14:39:55] logging.py:143 >> {'loss': 0.2233, 'learning_rate': 5.8843e-08, 'epoch': 2.93, 'throughput': 9996.83} [INFO|2025-03-21 14:40:36] logging.py:143 >> {'loss': 0.2192, 'learning_rate': 5.7887e-08, 'epoch': 2.94, 'throughput': 9996.84} [INFO|2025-03-21 14:41:16] logging.py:143 >> {'loss': 0.2456, 'learning_rate': 5.6939e-08, 'epoch': 2.94, 'throughput': 9996.86} [INFO|2025-03-21 14:41:57] logging.py:143 >> {'loss': 0.2179, 'learning_rate': 5.5999e-08, 'epoch': 2.94, 'throughput': 9996.89} [INFO|2025-03-21 14:42:38] logging.py:143 >> {'loss': 0.2317, 'learning_rate': 5.5067e-08, 'epoch': 2.94, 'throughput': 9996.94} [INFO|2025-03-21 14:43:19] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 5.4143e-08, 'epoch': 2.94, 'throughput': 9996.96} [INFO|2025-03-21 14:43:59] logging.py:143 >> {'loss': 0.2398, 'learning_rate': 5.3226e-08, 'epoch': 2.94, 'throughput': 9997.01} [INFO|2025-03-21 14:44:38] logging.py:143 >> {'loss': 0.2041, 'learning_rate': 5.2317e-08, 'epoch': 2.94, 'throughput': 9997.05} [INFO|2025-03-21 14:45:19] logging.py:143 
>> {'loss': 0.2111, 'learning_rate': 5.1416e-08, 'epoch': 2.94, 'throughput': 9996.98} [INFO|2025-03-21 14:46:00] logging.py:143 >> {'loss': 0.2216, 'learning_rate': 5.0523e-08, 'epoch': 2.94, 'throughput': 9997.02} [INFO|2025-03-21 14:46:41] logging.py:143 >> {'loss': 0.2540, 'learning_rate': 4.9638e-08, 'epoch': 2.94, 'throughput': 9997.08} [INFO|2025-03-21 14:47:22] logging.py:143 >> {'loss': 0.2158, 'learning_rate': 4.8761e-08, 'epoch': 2.94, 'throughput': 9997.06} [INFO|2025-03-21 14:48:03] logging.py:143 >> {'loss': 0.2540, 'learning_rate': 4.7891e-08, 'epoch': 2.94, 'throughput': 9997.09} [INFO|2025-03-21 14:48:43] logging.py:143 >> {'loss': 0.2298, 'learning_rate': 4.7029e-08, 'epoch': 2.94, 'throughput': 9997.14} [INFO|2025-03-21 14:49:24] logging.py:143 >> {'loss': 0.2386, 'learning_rate': 4.6175e-08, 'epoch': 2.94, 'throughput': 9997.13} [INFO|2025-03-21 14:50:04] logging.py:143 >> {'loss': 0.2092, 'learning_rate': 4.5329e-08, 'epoch': 2.94, 'throughput': 9997.15} [INFO|2025-03-21 14:50:45] logging.py:143 >> {'loss': 0.2163, 'learning_rate': 4.4490e-08, 'epoch': 2.94, 'throughput': 9997.14} [INFO|2025-03-21 14:51:25] logging.py:143 >> {'loss': 0.2365, 'learning_rate': 4.3660e-08, 'epoch': 2.94, 'throughput': 9997.18} [INFO|2025-03-21 14:52:05] logging.py:143 >> {'loss': 0.2130, 'learning_rate': 4.2837e-08, 'epoch': 2.94, 'throughput': 9997.21} [INFO|2025-03-21 14:52:46] logging.py:143 >> {'loss': 0.2337, 'learning_rate': 4.2022e-08, 'epoch': 2.95, 'throughput': 9997.22} [INFO|2025-03-21 14:53:26] logging.py:143 >> {'loss': 0.2239, 'learning_rate': 4.1215e-08, 'epoch': 2.95, 'throughput': 9997.25} [INFO|2025-03-21 14:54:04] logging.py:143 >> {'loss': 0.2141, 'learning_rate': 4.0416e-08, 'epoch': 2.95, 'throughput': 9997.31} [INFO|2025-03-21 14:54:45] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 3.9624e-08, 'epoch': 2.95, 'throughput': 9997.33} [INFO|2025-03-21 14:55:25] logging.py:143 >> {'loss': 0.2166, 'learning_rate': 3.8841e-08, 'epoch': 2.95, 
'throughput': 9997.36} [INFO|2025-03-21 14:56:04] logging.py:143 >> {'loss': 0.2476, 'learning_rate': 3.8065e-08, 'epoch': 2.95, 'throughput': 9997.36} [INFO|2025-03-21 14:56:46] logging.py:143 >> {'loss': 0.2155, 'learning_rate': 3.7297e-08, 'epoch': 2.95, 'throughput': 9997.27} [INFO|2025-03-21 14:57:27] logging.py:143 >> {'loss': 0.2094, 'learning_rate': 3.6537e-08, 'epoch': 2.95, 'throughput': 9997.27} [INFO|2025-03-21 14:58:07] logging.py:143 >> {'loss': 0.2215, 'learning_rate': 3.5784e-08, 'epoch': 2.95, 'throughput': 9997.28} [INFO|2025-03-21 14:58:48] logging.py:143 >> {'loss': 0.2520, 'learning_rate': 3.5040e-08, 'epoch': 2.95, 'throughput': 9997.29} [INFO|2025-03-21 14:59:27] logging.py:143 >> {'loss': 0.2392, 'learning_rate': 3.4303e-08, 'epoch': 2.95, 'throughput': 9997.31} [INFO|2025-03-21 15:00:07] logging.py:143 >> {'loss': 0.2504, 'learning_rate': 3.3574e-08, 'epoch': 2.95, 'throughput': 9997.29} [INFO|2025-03-21 15:00:47] logging.py:143 >> {'loss': 0.2408, 'learning_rate': 3.2853e-08, 'epoch': 2.95, 'throughput': 9997.26} [INFO|2025-03-21 15:01:28] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 3.2140e-08, 'epoch': 2.95, 'throughput': 9997.28} [INFO|2025-03-21 15:02:09] logging.py:143 >> {'loss': 0.2449, 'learning_rate': 3.1435e-08, 'epoch': 2.95, 'throughput': 9997.25} [INFO|2025-03-21 15:02:50] logging.py:143 >> {'loss': 0.2294, 'learning_rate': 3.0737e-08, 'epoch': 2.95, 'throughput': 9997.30} [INFO|2025-03-21 15:03:30] logging.py:143 >> {'loss': 0.2304, 'learning_rate': 3.0047e-08, 'epoch': 2.95, 'throughput': 9997.27} [INFO|2025-03-21 15:04:12] logging.py:143 >> {'loss': 0.2426, 'learning_rate': 2.9365e-08, 'epoch': 2.95, 'throughput': 9997.27} [INFO|2025-03-21 15:04:53] logging.py:143 >> {'loss': 0.2239, 'learning_rate': 2.8691e-08, 'epoch': 2.95, 'throughput': 9997.26} [INFO|2025-03-21 15:05:33] logging.py:143 >> {'loss': 0.2393, 'learning_rate': 2.8025e-08, 'epoch': 2.96, 'throughput': 9997.24} [INFO|2025-03-21 15:06:13] logging.py:143 
>> {'loss': 0.2326, 'learning_rate': 2.7366e-08, 'epoch': 2.96, 'throughput': 9997.21} [INFO|2025-03-21 15:06:53] logging.py:143 >> {'loss': 0.2351, 'learning_rate': 2.6716e-08, 'epoch': 2.96, 'throughput': 9997.22} [INFO|2025-03-21 15:07:33] logging.py:143 >> {'loss': 0.2200, 'learning_rate': 2.6073e-08, 'epoch': 2.96, 'throughput': 9997.20} [INFO|2025-03-21 15:08:15] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 2.5438e-08, 'epoch': 2.96, 'throughput': 9997.13} [INFO|2025-03-21 15:08:57] logging.py:143 >> {'loss': 0.2448, 'learning_rate': 2.4811e-08, 'epoch': 2.96, 'throughput': 9997.12} [INFO|2025-03-21 15:09:37] logging.py:143 >> {'loss': 0.2356, 'learning_rate': 2.4191e-08, 'epoch': 2.96, 'throughput': 9997.19} [INFO|2025-03-21 15:10:16] logging.py:143 >> {'loss': 0.2379, 'learning_rate': 2.3580e-08, 'epoch': 2.96, 'throughput': 9997.22} [INFO|2025-03-21 15:10:58] logging.py:143 >> {'loss': 0.2290, 'learning_rate': 2.2976e-08, 'epoch': 2.96, 'throughput': 9997.21} [INFO|2025-03-21 15:11:39] logging.py:143 >> {'loss': 0.2290, 'learning_rate': 2.2380e-08, 'epoch': 2.96, 'throughput': 9997.19} [INFO|2025-03-21 15:12:19] logging.py:143 >> {'loss': 0.2254, 'learning_rate': 2.1792e-08, 'epoch': 2.96, 'throughput': 9997.22} [INFO|2025-03-21 15:13:02] logging.py:143 >> {'loss': 0.2340, 'learning_rate': 2.1212e-08, 'epoch': 2.96, 'throughput': 9997.19} [INFO|2025-03-21 15:13:42] logging.py:143 >> {'loss': 0.2342, 'learning_rate': 2.0639e-08, 'epoch': 2.96, 'throughput': 9997.17} [INFO|2025-03-21 15:14:22] logging.py:143 >> {'loss': 0.2270, 'learning_rate': 2.0075e-08, 'epoch': 2.96, 'throughput': 9997.21} [INFO|2025-03-21 15:15:02] logging.py:143 >> {'loss': 0.2302, 'learning_rate': 1.9518e-08, 'epoch': 2.96, 'throughput': 9997.27} [INFO|2025-03-21 15:15:43] logging.py:143 >> {'loss': 0.2343, 'learning_rate': 1.8969e-08, 'epoch': 2.96, 'throughput': 9997.29} [INFO|2025-03-21 15:16:23] logging.py:143 >> {'loss': 0.2181, 'learning_rate': 1.8428e-08, 'epoch': 2.96, 
'throughput': 9997.34} [INFO|2025-03-21 15:17:03] logging.py:143 >> {'loss': 0.2100, 'learning_rate': 1.7895e-08, 'epoch': 2.96, 'throughput': 9997.29} [INFO|2025-03-21 15:17:42] logging.py:143 >> {'loss': 0.2347, 'learning_rate': 1.7369e-08, 'epoch': 2.96, 'throughput': 9997.33} [INFO|2025-03-21 15:18:23] logging.py:143 >> {'loss': 0.2326, 'learning_rate': 1.6852e-08, 'epoch': 2.97, 'throughput': 9997.36} [INFO|2025-03-21 15:19:03] logging.py:143 >> {'loss': 0.2201, 'learning_rate': 1.6342e-08, 'epoch': 2.97, 'throughput': 9997.38} [INFO|2025-03-21 15:19:44] logging.py:143 >> {'loss': 0.2452, 'learning_rate': 1.5840e-08, 'epoch': 2.97, 'throughput': 9997.36} [INFO|2025-03-21 15:20:24] logging.py:143 >> {'loss': 0.2402, 'learning_rate': 1.5346e-08, 'epoch': 2.97, 'throughput': 9997.35} [INFO|2025-03-21 15:21:05] logging.py:143 >> {'loss': 0.2538, 'learning_rate': 1.4859e-08, 'epoch': 2.97, 'throughput': 9997.34} [INFO|2025-03-21 15:21:47] logging.py:143 >> {'loss': 0.2445, 'learning_rate': 1.4381e-08, 'epoch': 2.97, 'throughput': 9997.35} [INFO|2025-03-21 15:22:28] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 1.3910e-08, 'epoch': 2.97, 'throughput': 9997.38} [INFO|2025-03-21 15:23:09] logging.py:143 >> {'loss': 0.2262, 'learning_rate': 1.3447e-08, 'epoch': 2.97, 'throughput': 9997.41} [INFO|2025-03-21 15:23:50] logging.py:143 >> {'loss': 0.2221, 'learning_rate': 1.2992e-08, 'epoch': 2.97, 'throughput': 9997.41} [INFO|2025-03-21 15:24:30] logging.py:143 >> {'loss': 0.2140, 'learning_rate': 1.2545e-08, 'epoch': 2.97, 'throughput': 9997.38} [INFO|2025-03-21 15:25:10] logging.py:143 >> {'loss': 0.2214, 'learning_rate': 1.2106e-08, 'epoch': 2.97, 'throughput': 9997.39} [INFO|2025-03-21 15:25:49] logging.py:143 >> {'loss': 0.2239, 'learning_rate': 1.1674e-08, 'epoch': 2.97, 'throughput': 9997.43} [INFO|2025-03-21 15:26:30] logging.py:143 >> {'loss': 0.2151, 'learning_rate': 1.1251e-08, 'epoch': 2.97, 'throughput': 9997.43} [INFO|2025-03-21 15:27:09] logging.py:143 
>> {'loss': 0.2234, 'learning_rate': 1.0835e-08, 'epoch': 2.97, 'throughput': 9997.46} [INFO|2025-03-21 15:27:49] logging.py:143 >> {'loss': 0.2337, 'learning_rate': 1.0427e-08, 'epoch': 2.97, 'throughput': 9997.41} [INFO|2025-03-21 15:28:31] logging.py:143 >> {'loss': 0.2220, 'learning_rate': 1.0027e-08, 'epoch': 2.97, 'throughput': 9997.39} [INFO|2025-03-21 15:29:12] logging.py:143 >> {'loss': 0.2390, 'learning_rate': 9.6342e-09, 'epoch': 2.97, 'throughput': 9997.35} [INFO|2025-03-21 15:29:52] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 9.2497e-09, 'epoch': 2.97, 'throughput': 9997.37} [INFO|2025-03-21 15:30:32] logging.py:143 >> {'loss': 0.2486, 'learning_rate': 8.8730e-09, 'epoch': 2.97, 'throughput': 9997.34} [INFO|2025-03-21 15:31:12] logging.py:143 >> {'loss': 0.2313, 'learning_rate': 8.5041e-09, 'epoch': 2.98, 'throughput': 9997.39} [INFO|2025-03-21 15:31:53] logging.py:143 >> {'loss': 0.2314, 'learning_rate': 8.1431e-09, 'epoch': 2.98, 'throughput': 9997.39} [INFO|2025-03-21 15:32:33] logging.py:143 >> {'loss': 0.2211, 'learning_rate': 7.7898e-09, 'epoch': 2.98, 'throughput': 9997.38} [INFO|2025-03-21 15:33:12] logging.py:143 >> {'loss': 0.2269, 'learning_rate': 7.4445e-09, 'epoch': 2.98, 'throughput': 9997.42} [INFO|2025-03-21 15:33:52] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 7.1069e-09, 'epoch': 2.98, 'throughput': 9997.48} [INFO|2025-03-21 15:34:32] logging.py:143 >> {'loss': 0.2371, 'learning_rate': 6.7772e-09, 'epoch': 2.98, 'throughput': 9997.46} [INFO|2025-03-21 15:35:13] logging.py:143 >> {'loss': 0.2420, 'learning_rate': 6.4553e-09, 'epoch': 2.98, 'throughput': 9997.43} [INFO|2025-03-21 15:35:54] logging.py:143 >> {'loss': 0.2426, 'learning_rate': 6.1412e-09, 'epoch': 2.98, 'throughput': 9997.42} [INFO|2025-03-21 15:36:34] logging.py:143 >> {'loss': 0.2496, 'learning_rate': 5.8350e-09, 'epoch': 2.98, 'throughput': 9997.44} [INFO|2025-03-21 15:37:15] logging.py:143 >> {'loss': 0.2141, 'learning_rate': 5.5366e-09, 'epoch': 2.98, 
'throughput': 9997.44} [INFO|2025-03-21 15:37:55] logging.py:143 >> {'loss': 0.2301, 'learning_rate': 5.2460e-09, 'epoch': 2.98, 'throughput': 9997.43} [INFO|2025-03-21 15:38:35] logging.py:143 >> {'loss': 0.2268, 'learning_rate': 4.9633e-09, 'epoch': 2.98, 'throughput': 9997.45} [INFO|2025-03-21 15:39:16] logging.py:143 >> {'loss': 0.2332, 'learning_rate': 4.6884e-09, 'epoch': 2.98, 'throughput': 9997.49} [INFO|2025-03-21 15:39:56] logging.py:143 >> {'loss': 0.2566, 'learning_rate': 4.4213e-09, 'epoch': 2.98, 'throughput': 9997.53} [INFO|2025-03-21 15:40:37] logging.py:143 >> {'loss': 0.2497, 'learning_rate': 4.1620e-09, 'epoch': 2.98, 'throughput': 9997.56} [INFO|2025-03-21 15:41:19] logging.py:143 >> {'loss': 0.2283, 'learning_rate': 3.9106e-09, 'epoch': 2.98, 'throughput': 9997.52} [INFO|2025-03-21 15:41:59] logging.py:143 >> {'loss': 0.2315, 'learning_rate': 3.6670e-09, 'epoch': 2.98, 'throughput': 9997.56} [INFO|2025-03-21 15:42:38] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 3.4313e-09, 'epoch': 2.98, 'throughput': 9997.56} [INFO|2025-03-21 15:43:20] logging.py:143 >> {'loss': 0.2213, 'learning_rate': 3.2033e-09, 'epoch': 2.98, 'throughput': 9997.57} [INFO|2025-03-21 15:43:59] logging.py:143 >> {'loss': 0.2238, 'learning_rate': 2.9833e-09, 'epoch': 2.99, 'throughput': 9997.60} [INFO|2025-03-21 15:44:39] logging.py:143 >> {'loss': 0.2175, 'learning_rate': 2.7710e-09, 'epoch': 2.99, 'throughput': 9997.58} [INFO|2025-03-21 15:45:20] logging.py:143 >> {'loss': 0.2204, 'learning_rate': 2.5666e-09, 'epoch': 2.99, 'throughput': 9997.56} [INFO|2025-03-21 15:45:59] logging.py:143 >> {'loss': 0.2466, 'learning_rate': 2.3700e-09, 'epoch': 2.99, 'throughput': 9997.59} [INFO|2025-03-21 15:46:39] logging.py:143 >> {'loss': 0.2435, 'learning_rate': 2.1812e-09, 'epoch': 2.99, 'throughput': 9997.58} [INFO|2025-03-21 15:47:20] logging.py:143 >> {'loss': 0.2368, 'learning_rate': 2.0003e-09, 'epoch': 2.99, 'throughput': 9997.55} [INFO|2025-03-21 15:48:00] logging.py:143 
>> {'loss': 0.2157, 'learning_rate': 1.8272e-09, 'epoch': 2.99, 'throughput': 9997.56} [INFO|2025-03-21 15:48:39] logging.py:143 >> {'loss': 0.2228, 'learning_rate': 1.6619e-09, 'epoch': 2.99, 'throughput': 9997.53} [INFO|2025-03-21 15:49:19] logging.py:143 >> {'loss': 0.2518, 'learning_rate': 1.5045e-09, 'epoch': 2.99, 'throughput': 9997.59} [INFO|2025-03-21 15:50:00] logging.py:143 >> {'loss': 0.2250, 'learning_rate': 1.3549e-09, 'epoch': 2.99, 'throughput': 9997.66} [INFO|2025-03-21 15:50:42] logging.py:143 >> {'loss': 0.2481, 'learning_rate': 1.2131e-09, 'epoch': 2.99, 'throughput': 9997.61} [INFO|2025-03-21 15:51:23] logging.py:143 >> {'loss': 0.2437, 'learning_rate': 1.0792e-09, 'epoch': 2.99, 'throughput': 9997.59} [INFO|2025-03-21 15:52:03] logging.py:143 >> {'loss': 0.2090, 'learning_rate': 9.5308e-10, 'epoch': 2.99, 'throughput': 9997.61} [INFO|2025-03-21 15:52:44] logging.py:143 >> {'loss': 0.2224, 'learning_rate': 8.3480e-10, 'epoch': 2.99, 'throughput': 9997.59} [INFO|2025-03-21 15:53:25] logging.py:143 >> {'loss': 0.2258, 'learning_rate': 7.2436e-10, 'epoch': 2.99, 'throughput': 9997.55} [INFO|2025-03-21 15:54:04] logging.py:143 >> {'loss': 0.2330, 'learning_rate': 6.2176e-10, 'epoch': 2.99, 'throughput': 9997.55} [INFO|2025-03-21 15:54:45] logging.py:143 >> {'loss': 0.2417, 'learning_rate': 5.2698e-10, 'epoch': 2.99, 'throughput': 9997.55} [INFO|2025-03-21 15:55:25] logging.py:143 >> {'loss': 0.2426, 'learning_rate': 4.4004e-10, 'epoch': 2.99, 'throughput': 9997.55} [INFO|2025-03-21 15:56:06] logging.py:143 >> {'loss': 0.2366, 'learning_rate': 3.6093e-10, 'epoch': 2.99, 'throughput': 9997.61} [INFO|2025-03-21 15:56:45] logging.py:143 >> {'loss': 0.2206, 'learning_rate': 2.8965e-10, 'epoch': 3.00, 'throughput': 9997.62} [INFO|2025-03-21 15:57:26] logging.py:143 >> {'loss': 0.2288, 'learning_rate': 2.2621e-10, 'epoch': 3.00, 'throughput': 9997.62} [INFO|2025-03-21 15:58:09] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 1.7060e-10, 'epoch': 3.00, 
'throughput': 9997.55} [INFO|2025-03-21 15:58:49] logging.py:143 >> {'loss': 0.2198, 'learning_rate': 1.2282e-10, 'epoch': 3.00, 'throughput': 9997.59} [INFO|2025-03-21 15:59:31] logging.py:143 >> {'loss': 0.2331, 'learning_rate': 8.2870e-11, 'epoch': 3.00, 'throughput': 9997.60} [INFO|2025-03-21 16:00:12] logging.py:143 >> {'loss': 0.2241, 'learning_rate': 5.0756e-11, 'epoch': 3.00, 'throughput': 9997.61} [INFO|2025-03-21 16:00:53] logging.py:143 >> {'loss': 0.2335, 'learning_rate': 2.6475e-11, 'epoch': 3.00, 'throughput': 9997.59} [INFO|2025-03-21 16:01:32] logging.py:143 >> {'loss': 0.2293, 'learning_rate': 1.0026e-11, 'epoch': 3.00, 'throughput': 9997.68} [INFO|2025-03-21 16:02:12] logging.py:143 >> {'loss': 0.2142, 'learning_rate': 1.4099e-12, 'epoch': 3.00, 'throughput': 9997.69} [INFO|2025-03-21 16:02:40] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-28263 [INFO|2025-03-21 16:02:40] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-28263/config.json [INFO|2025-03-21 16:02:40] configuration_utils.py:909 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-28263/generation_config.json [INFO|2025-03-21 16:02:56] modeling_utils.py:3048 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-28263/model.safetensors.index.json. 
[INFO|2025-03-21 16:02:56] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-28263/tokenizer_config.json [INFO|2025-03-21 16:02:56] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/checkpoint-28263/special_tokens_map.json [INFO|2025-03-21 16:03:20] trainer.py:2657 >> Training completed. Do not forget to share your model on huggingface.co/models =) [INFO|2025-03-21 16:03:24] trainer.py:3942 >> Saving model checkpoint to /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09 [INFO|2025-03-21 16:03:24] configuration_utils.py:423 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/config.json [INFO|2025-03-21 16:03:24] configuration_utils.py:909 >> Configuration saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/generation_config.json [INFO|2025-03-21 16:03:39] modeling_utils.py:3048 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/model.safetensors.index.json. [INFO|2025-03-21 16:03:39] tokenization_utils_base.py:2500 >> tokenizer config file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/tokenizer_config.json [INFO|2025-03-21 16:03:39] tokenization_utils_base.py:2509 >> Special tokens file saved in /usr1/data/weiweis/QG/models/train_2025-03-19-00-21-09/special_tokens_map.json [WARNING|2025-03-21 16:03:40] logging.py:148 >> No metric eval_loss to plot. [WARNING|2025-03-21 16:03:40] logging.py:148 >> No metric eval_accuracy to plot. [INFO|2025-03-21 16:03:40] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}