[2024-07-08 19:15:52,470] torch.distributed.run: [WARNING]
[2024-07-08 19:15:52,470] torch.distributed.run: [WARNING] *****************************************
[2024-07-08 19:15:52,470] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-07-08 19:15:52,470] torch.distributed.run: [WARNING] *****************************************
07/08/2024 19:15:59 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:15:59 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:16:00 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:16:00 - INFO - __main__ - Training parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
analysis_dataset=bbh,
analysis_mode=False,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=3,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[, ],
fsdp_config={'fsdp_transformer_layer_cls_to_wrap': ['LlamaDecoderLayer'], 'fsdp_backward_prefetch': 'backward_pre', 'limit_all_gathers': 'true', 'use_orig_params': 'true', 'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=16,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=../out/llama3-8b-inst-p0.05-lora-seed3/runs/Jul08_19-15-55_dlc1apybk6l37ai7-master-0,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=4.0,
optim=adamw_torch,
optim_args=None,
output_dir=../out/llama3-8b-inst-p0.05-lora-seed3,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=../out/llama3-8b-inst-p0.05-lora-seed3,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=epoch,
save_total_limit=None,
seed=0,
skip_memory_metrics=True,
split_batches=False,
tf32=False,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
train_dataset_names=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.03,
warmup_steps=0,
weight_decay=0.0,
)
07/08/2024 19:16:00 - INFO - __main__ - Model parameters ModelArguments(model_name_or_path='/mnt/data_large/ccy/tmp/tmp2/Meta-Llama-3-8B-Instruct', config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=False, torch_dtype=None, lora=True, lora_r=128, lora_alpha=512.0, lora_dropout=0.1, lora_target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'])
07/08/2024 19:16:00 - INFO - __main__ - Dataset parameters DataArguments(train_files=['./data/train/processed/hermes.jsonl'], overwrite_cache=False, preprocessing_num_workers=None, max_seq_length=2048, sample_data_seed=42, percentage=0.05)
[INFO|tokenization_utils_base.py:2024] 2024-07-08 19:16:00,014 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2024] 2024-07-08 19:16:00,014 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2024-07-08 19:16:00,014 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2024-07-08 19:16:00,014 >> loading file tokenizer_config.json
07/08/2024 19:16:00 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:16:00 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:16:00 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:16:00 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1distributed training: True, 16-bits training: False
07/08/2024 19:16:00 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
[WARNING|logging.py:314] 2024-07-08 19:16:01,232 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,232 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,237 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,244 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,250 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,253 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,254 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[WARNING|logging.py:314] 2024-07-08 19:16:01,271 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
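The TrainingArguments and ModelArguments dumps above determine two quantities that the log never prints: the number of sequences per optimizer step and the LoRA update scale. The sketch below computes both under stated assumptions. It assumes the job has exactly 8 processes, since ranks 0 through 7 each report `n_gpu: 1`. It also assumes data-parallel batching, where FSDP shards parameters but each rank still consumes its own micro-batch. Finally, it assumes PEFT's default scaling `lora_alpha / r` (no rsLoRA). The helper names are hypothetical.

```python
# Values copied from the TrainingArguments / ModelArguments dumps in this log.
PER_DEVICE_TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 16
WORLD_SIZE = 8  # assumption: ranks 0-7 are the only processes in the job
LORA_R = 128
LORA_ALPHA = 512.0


def effective_batch_size(per_device: int, grad_accum: int, world_size: int) -> int:
    """Sequences consumed per optimizer step when every rank sees its own micro-batches."""
    return per_device * grad_accum * world_size


def lora_scaling(alpha: float, r: int) -> float:
    """Multiplier on the low-rank update B @ A (assumes the plain alpha / r rule)."""
    return alpha / r


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE,
                               GRADIENT_ACCUMULATION_STEPS, WORLD_SIZE))  # 128
    print(lora_scaling(LORA_ALPHA, LORA_R))  # 4.0
```

Under these assumptions, each optimizer step averages gradients over 128 sequences. The LoRA delta is multiplied by 4.0 before it is added to the frozen projection weights.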
['./data/train/processed/hermes.jsonl'] hermes ['context', 'prefix', 'suffix', 'sft_index', 'reward', 'dataset', 'category', 'source', 'messages', 'id']
['./data/train/processed/hermes.jsonl'] hermes ['context', 'prefix', 'suffix', 'sft_index', 'reward', 'dataset', 'category', 'source', 'messages', 'id']
['./data/train/processed/hermes.jsonl'] hermes ['context', 'prefix', 'suffix', 'sft_index', 'reward', 'dataset', 'category', 'source', 'messages', 'id']
Using custom data configuration default-1b97820c651401bc
07/08/2024 19:16:02 - INFO - datasets.builder - Using custom data configuration default-1b97820c651401bc
Loading Dataset Infos from /home/pai/envs/less/lib/python3.10/site-packages/datasets/packaged_modules/json
07/08/2024 19:16:02 - INFO - datasets.info - Loading Dataset Infos from /home/pai/envs/less/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
07/08/2024 19:16:02 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7
07/08/2024 19:16:02 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7)
07/08/2024 19:16:02 - INFO - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7)
Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7
07/08/2024 19:16:02 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7
['./data/train/processed/hermes.jsonl'] hermes ['context', 'prefix', 'suffix', 'sft_index', 'reward', 'dataset', 'category', 'source', 'messages', 'id']
Tokenizing and reformatting instruction data (num_proc=10): 0%| | 0/50413 [00:00> loading configuration file /mnt/data_large/ccy/tmp/tmp2/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:802] 2024-07-08 19:16:33,744 >> Model config LlamaConfig {
  "_name_or_path": "/mnt/data_large/ccy/tmp/tmp2/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|modeling_utils.py:3341] 2024-07-08 19:16:33,773 >> loading weights file /mnt/data_large/ccy/tmp/tmp2/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|configuration_utils.py:826] 2024-07-08 19:16:33,775 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}
Tokenizing and reformatting instruction data (num_proc=10): 79%|███████▉ | 39841/50413 [00:20<00:02, 4759.42 examples/s]
Tokenizing and reformatting instruction data (num_proc=10): 80%|████████ | 40376/50413 [00:20<00:02, 4869.08 examples/s]
Loading checkpoint shards: 0%| | 0/4 [00:00> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4193] 2024-07-08 19:19:10,578 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /mnt/data_large/ccy/tmp/tmp2/Meta-Llama-3-8B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
Loading checkpoint shards: 100%|██████████| 4/4 [02:37<00:00, 32.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:37<00:00, 39.31s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:36<00:00, 32.79s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:36<00:00, 39.22s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:32<00:00, 32.28s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:32<00:00, 38.25s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:37<00:00, 32.88s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:37<00:00, 39.39s/it]
[INFO|configuration_utils.py:779] 2024-07-08 19:19:10,581 >> loading configuration file /mnt/data_large/ccy/tmp/tmp2/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:826] 2024-07-08 19:19:10,581 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}
Loading checkpoint shards: 100%|██████████| 4/4 [02:37<00:00, 33.01s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [02:37<00:00, 39.47s/it]
[INFO|modeling_utils.py:1813] 2024-07-08 19:19:10,622 >> You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 128257. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
07/08/2024 19:19:25 - INFO - __main__ - Applied LoRA to model.
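The resize warning above appears because one special token was added. That grows the vocabulary from 128256 rows (the `vocab_size` in the LlamaConfig dump) to 128257. As the warning itself notes, `resize_token_embeddings` accepts a `pad_to_multiple_of` argument, which rounds the new row count up. The sketch below shows that rounding. It assumes a multiple of 64, a common Tensor Core-friendly choice; the log does not specify one. The helper name is hypothetical.

```python
def padded_vocab_size(num_tokens: int, multiple: int) -> int:
    """Round num_tokens up to the next multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((num_tokens + multiple - 1) // multiple) * multiple


BASE_VOCAB = 128256         # vocab_size in the LlamaConfig dump (already 2004 * 64)
NEW_VOCAB = BASE_VOCAB + 1  # the warning reports a new embedding dimension of 128257

if __name__ == "__main__":
    print(padded_vocab_size(NEW_VOCAB, 64))  # 128320
    # In the training script this would correspond to something like
    #   model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
    # where `model` and `tokenizer` are the objects the script already builds.
```

The padding adds 63 unused rows to both the input embedding and the untied `lm_head`. In exchange, the matrix dimensions stay multiples of 64.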
trainable params: 109,051,904 || all params: 8,139,321,344 || trainable%: 1.3398156847608547
Caching processed dataset at /root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7/cache-2da034c65ea96ef2.arrow
07/08/2024 19:19:25 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/json/default-1b97820c651401bc/0.0.0/c8d2d9508a2a2067ab02cd118834ecef34c3700d143b31835ec4235bf10109f7/cache-2da034c65ea96ef2.arrow
/mnt/data_local/tmp/llama/LESS/less/train/data_arguments.py:50: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  com_len = (torch.tensor(labels) > -1).sum()
Map: 81%|████████▏ | 41000/50413 [00:07<00:01, 5914.30 examples/s]
81%|████████▏ | 41000/50413 [00:07<00:01, 5936.24 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 6040.69 examples/s] Map: 81%|████████▏ | 41000/50413 [00:07<00:01, 6067.72 examples/s] Map: 75%|███████▌ | 38000/50413 [00:06<00:02, 5889.18 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 5863.52 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 5988.60 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 5995.81 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 5947.61 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 5946.51 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 6076.60 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 6082.92 examples/s] Map: 77%|███████▋ | 39000/50413 [00:06<00:01, 5903.32 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 5896.25 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 6011.63 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 6024.84 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 5982.40 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 5971.42 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 6060.65 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 6114.24 examples/s] Map: 79%|███████▉ | 40000/50413 [00:06<00:01, 5923.96 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 5906.76 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 5979.42 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 6045.22 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 5991.19 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 5969.91 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 6055.03 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 6099.40 examples/s] Map: 81%|████████▏ | 41000/50413 [00:07<00:01, 5917.72 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 5893.73 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 5947.79 
examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 6024.16 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 5975.64 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 5820.76 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 6058.85 examples/s] Map: 83%|████████▎ | 42000/50413 [00:07<00:01, 5925.77 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4897.79 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4891.96 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4850.53 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4902.16 examples/s] Map: 85%|████████▌ | 43000/50413 [00:07<00:01, 5922.90 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4866.34 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4785.84 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5164.25 examples/s] Map: 91%|█████████ | 46000/50413 [00:07<00:00, 4966.28 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5169.47 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5147.85 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5190.02 examples/s] Map: 87%|████████▋ | 44000/50413 [00:07<00:01, 5930.39 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5162.89 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5393.24 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5005.46 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5264.56 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5367.09 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5381.59 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5424.92 examples/s] Map: 89%|████████▉ | 45000/50413 [00:07<00:00, 5934.78 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5387.09 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5565.59 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5242.19 examples/s] Map: 95%|█████████▌| 48000/50413 
[00:08<00:00, 5486.30 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5513.66 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5545.63 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5574.48 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5549.54 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5684.55 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5406.71 examples/s] Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5589.20 examples/s] Map: 91%|█████████ | 46000/50413 [00:08<00:00, 4922.75 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5759.72 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5614.48 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5719.39 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5664.85 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5690.27 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5648.58 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5737.52 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5539.19 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5708.04 examples/s] Map: 99%|█████████▉| 50000/50413 [00:08<00:00, 5665.15 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5736.35 examples/s] Map: 93%|█████████▎| 47000/50413 [00:08<00:00, 5165.19 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5684.47 examples/s] Map: 100%|██████████| 50413/50413 [00:08<00:00, 5782.69 examples/s] Map: 95%|█████████▌| 48000/50413 [00:08<00:00, 5357.34 examples/s][train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 Map: 97%|█████████▋| 49000/50413 [00:08<00:00, 5485.79 examples/s]/home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: 
dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 07/08/2024 19:19:35 - INFO - __main__ - Sample 25247 of the training set: {'context': [{'role': 'user', 'content': 'Given the below context: Following the termination of their contract with RCA, the Kinks signed with Arista Records in 1976. With the encouragement of Arista\'s management they stripped back down to a five-man core group and were reborn as an arena rock band. John Dalton left the band before finishing the sessions for the debut Arista album. Andy Pyle was brought in to complete the sessions and to play on the subsequent tour. Sleepwalker, released in 1977, marked a return to success for the group as it peaked at number 21 on the Billboard chart. After its release and the recording of the follow-up, Misfits, Andy Pyle and keyboardist John Gosling left the group to work together on a separate project. Dalton returned to complete the tour and ex–Pretty Things keyboardist Gordon John Edwards joined the band. In May 1978, Misfits, the Kinks\' second Arista album, was released. It included the US Top 40 hit "A Rock \'n\' Roll Fantasy", which helped make the record another success for the band. The non-album single "Father Christmas" has remained a popular track. 
Driven by session drummer Henry Spinetti\'s drumming and Dave Davies\' heavy guitar the song "Father Christmas" has become a classic seasonal favorite on mainstream radio. Dalton left the band permanently at the end of their UK tour, and Gordon John Edwards followed. Ex-Argent bassist Jim Rodford joined the band before the recording of Low Budget, on which Ray Davies played the keyboard sections. Keyboardist Ian Gibbons was recruited for the subsequent tour, and became a permanent member of the group. Despite the personnel changes, the popularity of the band\'s records and live shows continued to grow. Beginning in the late 1970s, bands such as the Jam ("David Watts"), the Pretenders ("Stop Your Sobbing", "I Go to Sleep") and the Knack ("The Hard Way") recorded covers of Kinks songs, which helped bring attention to the group\'s new releases. In 1978, Van Halen covered "You Really Got Me" for their debut single, a Top 40 US hit, helping boost the band\'s commercial resurgence (Van Halen... Guess a valid title for it!\nAnswer:'}], 'prefix': ['<|im_start|>user\nGiven the below context: Following the termination of their contract with RCA, the Kinks signed with Arista Records in 1976. With the encouragement of Arista\'s management they stripped back down to a five-man core group and were reborn as an arena rock band. John Dalton left the band before finishing the sessions for the debut Arista album. Andy Pyle was brought in to complete the sessions and to play on the subsequent tour. Sleepwalker, released in 1977, marked a return to success for the group as it peaked at number 21 on the Billboard chart. After its release and the recording of the follow-up, Misfits, Andy Pyle and keyboardist John Gosling left the group to work together on a separate project. Dalton returned to complete the tour and ex–Pretty Things keyboardist Gordon John Edwards joined the band. In May 1978, Misfits, the Kinks\' second Arista album, was released. 
It included the US Top 40 hit "A Rock \'n\' Roll Fantasy", which helped make the record another success for the band. The non-album single "Father Christmas" has remained a popular track. Driven by session drummer Henry Spinetti\'s drumming and Dave Davies\' heavy guitar the song "Father Christmas" has become a classic seasonal favorite on mainstream radio. Dalton left the band permanently at the end of their UK tour, and Gordon John Edwards followed. Ex-Argent bassist Jim Rodford joined the band before the recording of Low Budget, on which Ray Davies played the keyboard sections. Keyboardist Ian Gibbons was recruited for the subsequent tour, and became a permanent member of the group. Despite the personnel changes, the popularity of the band\'s records and live shows continued to grow. Beginning in the late 1970s, bands such as the Jam ("David Watts"), the Pretenders ("Stop Your Sobbing", "I Go to Sleep") and the Knack ("The Hard Way") recorded covers of Kinks songs, which helped bring attention to the group\'s new releases. In 1978, Van Halen covered "You Really Got Me" for their debut single, a Top 40 US hit, helping boost the band\'s commercial resurgence (Van Halen... 
Guess a valid title for it!\nAnswer:<|im_end|>\n<|im_start|>assistant\n'], 'suffix': ['"Rebirth of The Kinks: The Arista Years and Beyond"'], 'sft_index': tensor(0), 'reward': tensor([1]), 'category': None, 'source': None, 'input_ids': tensor([128000, 128000, 128006, 882, 128007, 271, 22818, 279, 3770, 2317, 25, 220, 23548, 279, 35508, 315, 872, 5226, 449, 99431, 11, 279, 735, 15872, 8667, 449, 1676, 9265, 22293, 304, 220, 4468, 21, 13, 3161, 279, 51475, 315, 1676, 9265, 596, 6373, 814, 37779, 1203, 1523, 311, 264, 4330, 21110, 6332, 1912, 323, 1051, 12646, 1540, 439, 459, 25946, 7091, 7200, 13, 3842, 72554, 2163, 279, 7200, 1603, 25270, 279, 16079, 369, 279, 17755, 1676, 9265, 8176, 13, 25871, 393, 982, 574, 7263, 304, 311, 4686, 279, 16079, 323, 311, 1514, 389, 279, 17876, 7364, 13, 24708, 45352, 11, 6004, 304, 220, 4468, 22, 11, 13160, 264, 471, 311, 2450, 369, 279, 1912, 439, 433, 78292, 520, 1396, 220, 1691, 389, 279, 67293, 9676, 13, 4740, 1202, 4984, 323, 279, 14975, 315, 279, 1833, 5352, 11, 33659, 30322, 11, 25871, 393, 982, 323, 13939, 380, 3842, 63481, 2785, 2163, 279, 1912, 311, 990, 3871, 389, 264, 8821, 2447, 13, 72554, 6052, 311, 4686, 279, 7364, 323, 506, 4235, 53040, 20695, 13939, 380, 26952, 3842, 37863, 11096, 279, 7200, 13, 763, 3297, 220, 4468, 23, 11, 33659, 30322, 11, 279, 735, 15872, 6, 2132, 1676, 9265, 8176, 11, 574, 6004, 13, 1102, 5343, 279, 2326, 7054, 220, 1272, 4295, 330, 32, 9305, 364, 77, 6, 15028, 27582, 498, 902, 9087, 1304, 279, 3335, 2500, 2450, 369, 279, 7200, 13, 578, 2536, 19308, 5490, 3254, 330, 62416, 10280, 1, 706, 14958, 264, 5526, 3839, 13, 2999, 2116, 555, 3882, 69046, 18063, 41785, 29037, 596, 24074, 5424, 323, 20851, 56872, 6, 8987, 17418, 279, 5609, 330, 62416, 10280, 1, 706, 3719, 264, 11670, 36899, 7075, 389, 21391, 9063, 13, 72554, 2163, 279, 7200, 31859, 520, 279, 842, 315, 872, 6560, 7364, 11, 323, 26952, 3842, 37863, 8272, 13, 1398, 12, 2803, 306, 22253, 380, 11641, 13611, 8350, 11096, 279, 7200, 1603, 279, 
14975, 315, 12310, 28368, 11, 389, 902, 13558, 56872, 6476, 279, 13939, 14491, 13, 26698, 380, 29335, 29479, 47620, 574, 45425, 369, 279, 17876, 7364, 11, 323, 6244, 264, 15690, 4562, 315, 279, 1912, 13, 18185, 279, 17274, 4442, 11, 279, 23354, 315, 279, 7200, 596, 7576, 323, 3974, 5039, 8738, 311, 3139, 13, 52950, 304, 279, 3389, 220, 4468, 15, 82, 11, 21562, 1778, 439, 279, 20614, 3573, 23083, 59336, 4063, 279, 63039, 14846, 3573, 10903, 4718, 67537, 7278, 498, 330, 40, 6122, 311, 24708, 909, 323, 279, 13934, 474, 3573, 791, 11481, 12424, 909, 12715, 14861, 315, 735, 15872, 11936, 11, 902, 9087, 4546, 6666, 311, 279, 1912, 596, 502, 19786, 13, 763, 220, 4468, 23, 11, 13000, 20442, 268, 9960, 330, 2675, 29308, 25545, 2206, 1, 369, 872, 17755, 3254, 11, 264, 7054, 220, 1272, 2326, 4295, 11, 10695, 7916, 279, 7200, 596, 8518, 91590, 320, 46324, 20442, 268, 1131, 220, 55379, 264, 2764, 2316, 369, 433, 4999, 16533, 25, 128009, 128006, 78191, 128007, 271, 1, 697, 28813, 315, 578, 735, 15872, 25, 578, 1676, 9265, 23116, 323, 31886, 1, 128009]), 'labels': tensor([ -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 1, 697, 28813, 315, 578, 735, 15872, 25, 578, 1676, 9265, 
23116, 323, 31886, 1, 128009]), 'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])}. 
07/08/2024 19:19:35 - INFO - __main__ - trainable model_params: 109051904 PeftModelForCausalLM( (base_model): LoraModel( (model): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(128257, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): lora.Linear( (base_layer): Linear(in_features=4096, out_features=4096, bias=False) (lora_dropout): ModuleDict( (default): Dropout(p=0.1, inplace=False) ) (lora_A): ModuleDict( (default): Linear(in_features=4096, out_features=128, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=128, out_features=4096, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() ) (k_proj): lora.Linear( (base_layer): Linear(in_features=4096, out_features=1024, bias=False) (lora_dropout): ModuleDict( (default): Dropout(p=0.1, inplace=False) ) (lora_A): ModuleDict( (default): Linear(in_features=4096, out_features=128, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=128, out_features=1024, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() ) (v_proj): lora.Linear( (base_layer): Linear(in_features=4096, out_features=1024, bias=False) (lora_dropout): ModuleDict( (default): Dropout(p=0.1, inplace=False) ) (lora_A): ModuleDict( (default): Linear(in_features=4096, out_features=128, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=128, out_features=1024, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() ) (o_proj): lora.Linear( (base_layer): Linear(in_features=4096, out_features=4096, bias=False) (lora_dropout): ModuleDict( (default): Dropout(p=0.1, inplace=False) ) (lora_A): ModuleDict( (default): Linear(in_features=4096, out_features=128, bias=False) ) (lora_B): ModuleDict( (default): Linear(in_features=128, out_features=4096, bias=False) ) (lora_embedding_A): ParameterDict() (lora_embedding_B): ParameterDict() ) (rotary_emb): 
LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): Linear(in_features=4096, out_features=14336, bias=False) (up_proj): Linear(in_features=4096, out_features=14336, bias=False) (down_proj): Linear(in_features=14336, out_features=4096, bias=False) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() ) (lm_head): Linear(in_features=4096, out_features=128257, bias=False) ) ) ) /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( 07/08/2024 19:19:35 - WARNING - accelerate.utils.other - Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher. /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). 
Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). 
Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( Map: 100%|██████████| 50413/50413 [00:08<00:00, 5662.87 examples/s] [INFO|trainer.py:568] 2024-07-08 19:19:35,377 >> Using auto half precision backend [INFO|trainer.py:712] 2024-07-08 19:19:35,528 >> The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: sft_index, prefix, source, context, reward, category, suffix. If sft_index, prefix, source, context, reward, category, suffix are not expected by `PeftModelForCausalLM.forward`, you can safely ignore this message. [train set] examples: 50413; # avg tokens: 369.3782958984375 [train set] examples: 50413; # avg completion tokens: 209.04800415039062 /home/pai/envs/less/lib/python3.10/site-packages/accelerate/accelerator.py:446: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). 
Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False) warnings.warn( dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0> dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO Plugin name set by env to libnccl-net-none.so dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO NET/Plugin : No plugin found, using internal implementation dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO cudaDriverVersion 12020 NCCL version 2.18.6+cuda11.8 dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO cudaDriverVersion 12020 dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO cudaDriverVersion 12020 dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO cudaDriverVersion 12020 dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO cudaDriverVersion 12020 dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0> dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0> dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0> dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO Plugin name set by env to libnccl-net-none.so dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO 
Plugin name set by env to libnccl-net-none.so
dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO cudaDriverVersion 12020
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO cudaDriverVersion 12020
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO cudaDriverVersion 12020
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO Bootstrap : Using eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO Plugin name set by env to libnccl-net-none.so
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO NET/Plugin : Plugin load (libnccl-net-none.so) returned 2 : libnccl-net-none.so: cannot open shared object file: No such file or directory
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO NET/Plugin : No plugin found, using internal implementation
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO NCCL_IB_HCA set to mlx5
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:22.5.97.1<0>
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Using network IB
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO comm 0x68858fe0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO comm 0x2a244470 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO comm 0x2775e5c0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO comm 0x5942be30 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO comm 0x5a835df0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO comm 0x2a9ede60 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO comm 0x2a5517e0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO comm 0x68055110 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x86c8fe3988ed8ac1 - Init START
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 00/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 01/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 02/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 03/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 04/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 05/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 06/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 07/08 : 0 1 2 3 4 5 6 7
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 8.
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 4.
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO P2P Chunksize set to 524288
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Connected all rings
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/IPC/read
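At this point every rank has reported `Connected all rings`. The `Channel NN/08 : 0 1 2 3 4 5 6 7` and `Trees` records describe how the eight ranks on this node are wired: each of the eight rings visits ranks 0 through 7 and closes with the `7[7] -> 0[0]` hop, and each tree is a chain rooted at rank 0. The minimal Python sketch below parses those two record shapes so the topology can be checked mechanically. It assumes each `Trees` entry reads `[channel] down0/down1/down2->rank->up`, with -1 meaning "no peer", which is consistent with these records (rank 0 reports parent -1, rank 7 reports no children). The regexes and helper names are illustrative, not part of NCCL.

```python
import re

# Record shapes assumed from the log above (illustrative, not an NCCL API):
#   "... NCCL INFO Channel 00/08 : 0 1 2 3 4 5 6 7"  -> ring order for channel 0
#   "... NCCL INFO Trees [0] 2/-1/-1->1->0 [1] ..."  -> per-channel tree links
RING_RE = re.compile(r"Channel (\d+)/(\d+) : ((?:\d+ )*\d+)$")
TREE_RE = re.compile(r"\[(\d+)\] (-?\d+)/(-?\d+)/(-?\d+)->(\d+)->(-?\d+)")


def parse_ring(record):
    """Return (channel, [ranks in ring order]), or None for other records."""
    match = RING_RE.search(record.strip())
    if match is None:
        return None
    return int(match.group(1)), [int(r) for r in match.group(3).split()]


def ring_edges(order):
    """Directed send edges of one ring, including the wrap-around edge."""
    return [(order[i], order[(i + 1) % len(order)]) for i in range(len(order))]


def parse_trees(record):
    """Return {channel: (rank, parent, children)}; -1 means 'no peer'."""
    trees = {}
    for ch, d0, d1, d2, rank, up in TREE_RE.findall(record):
        children = [int(d) for d in (d0, d1, d2) if d != "-1"]
        trees[int(ch)] = (int(rank), int(up), children)
    return trees


if __name__ == "__main__":
    ring = parse_ring("[0] NCCL INFO Channel 00/08 : 0 1 2 3 4 5 6 7")
    print(ring)                 # (0, [0, 1, 2, 3, 4, 5, 6, 7])
    print(ring_edges(ring[1]))  # last edge is (7, 0), the 7[7] -> 0[0] hop
    print(parse_trees("[1] NCCL INFO Trees [0] 2/-1/-1->1->0")[0])   # (1, 0, [2])
    print(parse_trees("[0] NCCL INFO Trees [0] 1/-1/-1->0->-1")[0])  # (0, -1, [1])
    print(parse_trees("[7] NCCL INFO Trees [0] -1/-1/-1->7->6")[0])  # (7, 6, [])
```

Run over the eight `Trees` records in this log, `parse_trees` should give each rank r from 1 to 6 parent r-1 and child r+1 on all eight channels, with rank 0 as the root and rank 7 as the leaf. The per-hop `Channel NN/0 : a[a] -> b[b] via P2P/IPC/read` records are not ring-order records, so `parse_ring` deliberately returns None for them.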
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/IPC/read
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO Connected all trees
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 8 p2p channels per peer
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO NCCL_LAUNCH_MODE set by environment to PARALLEL
dlc1apybk6l37ai7-master-0:429460:430029 [4] NCCL INFO comm 0x68055110 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 50 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429458:430026 [2] NCCL INFO comm 0x2a244470 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 30 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429462:430025 [6] NCCL INFO comm 0x2a5517e0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId 70 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429456:430024 [0] NCCL INFO comm 0x5942be30 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429463:430028 [7] NCCL INFO comm 0x5a835df0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId 80 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429459:430027 [3] NCCL INFO comm 0x68858fe0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 40 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429461:430030 [5] NCCL INFO comm 0x2a9ede60 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
dlc1apybk6l37ai7-master-0:429457:430031 [1] NCCL INFO comm 0x2775e5c0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 20 commId 0x86c8fe3988ed8ac1 - Init COMPLETE
[INFO|trainer.py:1706] 2024-07-08 19:19:45,415 >> ***** Running training *****
[INFO|trainer.py:1707] 2024-07-08 19:19:45,415 >> Num examples = 50,413
[INFO|trainer.py:1708] 2024-07-08 19:19:45,415 >> Num Epochs = 4
[INFO|trainer.py:1709] 2024-07-08 19:19:45,415 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1712] 2024-07-08 19:19:45,415 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:1713] 2024-07-08 19:19:45,415 >> Gradient Accumulation steps = 16
[INFO|trainer.py:1714] 2024-07-08 19:19:45,415 >> Total optimization steps = 1,572
[INFO|trainer.py:1715] 2024-07-08 19:19:45,418 >> Number of trainable parameters = 75,497,472
[INFO|integration_utils.py:722] 2024-07-08 19:19:45,563 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
0%| | 0/1572 [00:00> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[WARNING|logging.py:314] 2024-07-08 19:19:49,234 >> You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
0%| | 1/1572 [00:10<4:29:10, 10.28s/it] {'loss': 1.3058, 'learning_rate': 4.1666666666666667e-07, 'epoch': 0.0}
0%| | 2/1572 [00:17<3:39:20, 8.38s/it] {'loss': 1.192, 'learning_rate': 8.333333333333333e-07, 'epoch': 0.01}
0%| | 3/1572 [00:25<3:34:50, 8.22s/it] {'loss': 1.2973, 'learning_rate': 1.25e-06, 'epoch': 0.01}
0%| | 4/1572 [00:32<3:25:01, 7.85s/it] {'loss': 1.2629, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.01}
0%| | 5/1572 [00:40<3:21:00, 7.70s/it] {'loss': 1.2073, 'learning_rate': 2.0833333333333334e-06, 'epoch': 0.01}
0%| | 6/1572 [00:46<3:11:05, 7.32s/it] {'loss': 1.1229, 'learning_rate': 2.5e-06, 'epoch': 0.02}
0%| | 7/1572 [00:54<3:14:04, 7.44s/it] {'loss': 1.2677, 'learning_rate': 2.916666666666667e-06, 'epoch': 0.02}
1%| | 8/1572 [01:01<3:08:56, 7.25s/it] {'loss': 1.1353, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.02}
1%| | 9/1572 [01:08<3:08:43, 7.24s/it] {'loss': 1.0524, 'learning_rate': 3.7500000000000005e-06, 'epoch': 0.02}
1%| | 10/1572 [01:15<3:08:11, 7.23s/it] {'loss': 1.0016, 'learning_rate': 4.166666666666667e-06, 'epoch': 0.03}
1%| | 11/1572 [01:23<3:10:17, 7.31s/it] {'loss': 1.1865, 'learning_rate': 4.583333333333333e-06, 'epoch': 0.03}
1%| | 12/1572 [01:30<3:08:18, 7.24s/it] {'loss': 1.0415, 'learning_rate': 5e-06, 'epoch': 0.03}
1%| | 13/1572 [01:37<3:09:22, 7.29s/it] {'loss': 0.9764, 'learning_rate': 5.416666666666667e-06, 'epoch': 0.03}
1%| | 14/1572 [01:45<3:15:41, 7.54s/it] {'loss': 1.0045, 'learning_rate': 5.833333333333334e-06, 'epoch': 0.04}
1%| | 15/1572 [01:53<3:20:27, 7.73s/it] {'loss': 0.8317, 'learning_rate': 6.25e-06, 'epoch': 0.04}
1%| | 16/1572 [02:01<3:21:32, 7.77s/it] {'loss': 0.9207, 'learning_rate': 6.666666666666667e-06, 'epoch': 0.04}
1%| | 17/1572 [02:08<3:14:13, 7.49s/it] {'loss': 0.8949, 'learning_rate': 7.083333333333335e-06, 'epoch': 0.04}
1%| | 18/1572 [02:16<3:14:55, 7.53s/it] {'loss': 0.8629, 'learning_rate': 7.500000000000001e-06, 'epoch': 0.05}
1%| | 19/1572 [02:23<3:15:49, 7.57s/it] {'loss': 0.9026, 'learning_rate': 7.916666666666667e-06, 'epoch': 0.05}
1%|▏ | 20/1572 [02:31<3:16:50, 7.61s/it] {'loss': 0.8833, 'learning_rate': 8.333333333333334e-06, 'epoch': 0.05}
1%|▏ | 21/1572 [02:39<3:17:03, 7.62s/it] {'loss': 0.8436, 'learning_rate': 8.750000000000001e-06, 'epoch': 0.05}
1%|▏ | 22/1572 [02:46<3:14:09, 7.52s/it] {'loss': 0.868, 'learning_rate': 9.166666666666666e-06, 'epoch': 0.06}
1%|▏ | 23/1572 [02:53<3:09:50, 7.35s/it] {'loss': 0.8943, 'learning_rate': 9.583333333333335e-06, 'epoch': 0.06}
2%|▏ | 24/1572 [03:00<3:05:41, 7.20s/it] {'loss': 0.835, 'learning_rate': 1e-05, 'epoch': 0.06}
2%|▏ | 25/1572 [03:07<3:05:33, 7.20s/it] {'loss': 0.8719, 'learning_rate': 1.0416666666666668e-05, 'epoch': 0.06}
2%|▏ | 26/1572 [03:15<3:13:49, 7.52s/it] {'loss': 0.8289, 'learning_rate': 1.0833333333333334e-05, 'epoch': 0.07}
2%|▏ | 27/1572 [03:23<3:12:06, 7.46s/it] {'loss': 0.8167, 'learning_rate': 1.125e-05, 'epoch': 0.07}
2%|▏ | 28/1572 [03:29<3:05:32, 7.21s/it] {'loss': 0.8897, 'learning_rate': 1.1666666666666668e-05, 'epoch': 0.07}
2%|▏ | 29/1572 [03:37<3:09:01, 7.35s/it] {'loss': 0.895, 'learning_rate': 1.2083333333333333e-05, 'epoch': 0.07}
2%|▏ | 30/1572 [03:44<3:03:57, 7.16s/it] {'loss': 0.8243, 'learning_rate': 1.25e-05, 'epoch': 0.08}
2%|▏ | 31/1572 [03:51<3:02:02, 7.09s/it] {'loss': 0.8314, 'learning_rate': 1.2916666666666668e-05, 'epoch': 0.08}
2%|▏ | 32/1572 [03:57<3:00:10, 7.02s/it] {'loss': 0.8049, 'learning_rate': 1.3333333333333333e-05, 'epoch': 0.08}
2%|▏ | 33/1572 [04:05<3:02:19, 7.11s/it] {'loss': 0.8087, 'learning_rate': 1.375e-05, 'epoch': 0.08}
2%|▏ | 34/1572 [04:12<3:02:36, 7.12s/it] {'loss': 0.7565, 'learning_rate': 1.416666666666667e-05, 'epoch': 0.09}
2%|▏ | 35/1572 [04:19<3:02:50, 7.14s/it] {'loss': 0.8179, 'learning_rate': 1.4583333333333333e-05, 'epoch': 0.09}
2%|▏ | 36/1572 [04:26<3:01:59, 7.11s/it] {'loss': 0.8248, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.09}
2%|▏ | 37/1572 [04:33<3:01:57, 7.11s/it] {'loss': 0.8214, 'learning_rate': 1.5416666666666668e-05, 'epoch': 0.09}
2%|▏ | 38/1572 [04:41<3:05:02, 7.24s/it] {'loss': 0.8612, 'learning_rate': 1.5833333333333333e-05, 'epoch': 0.1}
2%|▏ | 39/1572 [04:48<3:08:32, 7.38s/it] {'loss': 0.8087, 'learning_rate': 1.6250000000000002e-05, 'epoch': 0.1}
3%|▎ | 40/1572 [04:55<3:03:39, 7.19s/it] {'loss': 0.8338, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.1}
3%|▎ | 41/1572 [05:02<3:01:11, 7.10s/it] {'loss': 0.8538, 'learning_rate': 1.7083333333333333e-05, 'epoch': 0.1}
3%|▎ | 42/1572 [05:09<2:59:54, 7.06s/it] {'loss': 0.736, 'learning_rate': 1.7500000000000002e-05, 'epoch': 0.11}
3%|▎ | 43/1572 [05:16<2:58:40, 7.01s/it] {'loss': 0.7752, 'learning_rate': 1.7916666666666667e-05, 'epoch': 0.11}
3%|▎ | 44/1572 [05:23<3:02:21, 7.16s/it] {'loss': 0.8186, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.11}
3%|▎ | 45/1572 [05:31<3:02:38, 7.18s/it] {'loss': 0.9004, 'learning_rate': 1.8750000000000002e-05, 'epoch': 0.11}
3%|▎ | 46/1572 [05:38<3:04:20, 7.25s/it] {'loss': 0.7464, 'learning_rate': 1.916666666666667e-05, 'epoch': 0.12}
3%|▎ | 47/1572 [05:46<3:07:24, 7.37s/it] {'loss': 0.8744, 'learning_rate': 1.9583333333333333e-05, 'epoch': 0.12}
3%|▎ | 48/1572 [05:53<3:09:45, 7.47s/it] {'loss': 0.7909, 'learning_rate': 2e-05, 'epoch': 0.12}
3%|▎ | 49/1572
[06:01<3:10:40, 7.51s/it] {'loss': 0.7354, 'learning_rate': 1.998687664041995e-05, 'epoch': 0.12} 3%|▎ | 49/1572 [06:01<3:10:40, 7.51s/it] 3%|▎ | 50/1572 [06:08<3:09:09, 7.46s/it] {'loss': 0.7691, 'learning_rate': 1.9973753280839896e-05, 'epoch': 0.13} 3%|▎ | 50/1572 [06:08<3:09:09, 7.46s/it] 3%|▎ | 51/1572 [06:16<3:07:23, 7.39s/it] {'loss': 0.7398, 'learning_rate': 1.9960629921259843e-05, 'epoch': 0.13} 3%|▎ | 51/1572 [06:16<3:07:23, 7.39s/it] 3%|▎ | 52/1572 [06:23<3:06:13, 7.35s/it] {'loss': 0.814, 'learning_rate': 1.9947506561679793e-05, 'epoch': 0.13} 3%|▎ | 52/1572 [06:23<3:06:13, 7.35s/it] 3%|▎ | 53/1572 [06:30<3:02:55, 7.23s/it] {'loss': 0.8055, 'learning_rate': 1.9934383202099737e-05, 'epoch': 0.13} 3%|▎ | 53/1572 [06:30<3:02:55, 7.23s/it] 3%|▎ | 54/1572 [06:37<2:59:02, 7.08s/it] {'loss': 0.785, 'learning_rate': 1.9921259842519688e-05, 'epoch': 0.14} 3%|▎ | 54/1572 [06:37<2:59:02, 7.08s/it] 3%|▎ | 55/1572 [06:44<3:04:10, 7.28s/it] {'loss': 1.0027, 'learning_rate': 1.9908136482939635e-05, 'epoch': 0.14} 3%|▎ | 55/1572 [06:44<3:04:10, 7.28s/it] 4%|▎ | 56/1572 [06:52<3:07:20, 7.41s/it] {'loss': 0.8046, 'learning_rate': 1.9895013123359582e-05, 'epoch': 0.14} 4%|▎ | 56/1572 [06:52<3:07:20, 7.41s/it] 4%|▎ | 57/1572 [06:59<3:06:07, 7.37s/it] {'loss': 0.6848, 'learning_rate': 1.988188976377953e-05, 'epoch': 0.14} 4%|▎ | 57/1572 [06:59<3:06:07, 7.37s/it] 4%|▎ | 58/1572 [07:07<3:08:16, 7.46s/it] {'loss': 0.786, 'learning_rate': 1.9868766404199476e-05, 'epoch': 0.15} 4%|▎ | 58/1572 [07:07<3:08:16, 7.46s/it] 4%|▍ | 59/1572 [07:14<3:06:11, 7.38s/it] {'loss': 0.8834, 'learning_rate': 1.9855643044619423e-05, 'epoch': 0.15} 4%|▍ | 59/1572 [07:14<3:06:11, 7.38s/it] 4%|▍ | 60/1572 [07:22<3:06:00, 7.38s/it] {'loss': 0.752, 'learning_rate': 1.984251968503937e-05, 'epoch': 0.15} 4%|▍ | 60/1572 [07:22<3:06:00, 7.38s/it] 4%|▍ | 61/1572 [07:29<3:05:06, 7.35s/it] {'loss': 0.6875, 'learning_rate': 1.982939632545932e-05, 'epoch': 0.15} 4%|▍ | 61/1572 [07:29<3:05:06, 7.35s/it] 4%|▍ | 
62/1572 [07:36<3:06:09, 7.40s/it] {'loss': 0.7118, 'learning_rate': 1.9816272965879265e-05, 'epoch': 0.16} 4%|▍ | 62/1572 [07:36<3:06:09, 7.40s/it] 4%|▍ | 63/1572 [07:44<3:06:21, 7.41s/it] {'loss': 0.6849, 'learning_rate': 1.9803149606299215e-05, 'epoch': 0.16} 4%|▍ | 63/1572 [07:44<3:06:21, 7.41s/it] 4%|▍ | 64/1572 [07:51<3:04:54, 7.36s/it] {'loss': 0.8545, 'learning_rate': 1.9790026246719162e-05, 'epoch': 0.16} 4%|▍ | 64/1572 [07:51<3:04:54, 7.36s/it] 4%|▍ | 65/1572 [07:58<3:01:53, 7.24s/it] {'loss': 0.6823, 'learning_rate': 1.977690288713911e-05, 'epoch': 0.17} 4%|▍ | 65/1572 [07:58<3:01:53, 7.24s/it] 4%|▍ | 66/1572 [08:05<3:03:22, 7.31s/it] {'loss': 0.7772, 'learning_rate': 1.9763779527559057e-05, 'epoch': 0.17} 4%|▍ | 66/1572 [08:05<3:03:22, 7.31s/it] 4%|▍ | 67/1572 [08:13<3:02:40, 7.28s/it] {'loss': 0.789, 'learning_rate': 1.9750656167979004e-05, 'epoch': 0.17} 4%|▍ | 67/1572 [08:13<3:02:40, 7.28s/it] 4%|▍ | 68/1572 [08:20<3:00:07, 7.19s/it] {'loss': 0.7375, 'learning_rate': 1.973753280839895e-05, 'epoch': 0.17} 4%|▍ | 68/1572 [08:20<3:00:07, 7.19s/it] 4%|▍ | 69/1572 [08:27<3:02:05, 7.27s/it] {'loss': 0.8808, 'learning_rate': 1.97244094488189e-05, 'epoch': 0.18} 4%|▍ | 69/1572 [08:27<3:02:05, 7.27s/it] 4%|▍ | 70/1572 [08:34<3:00:22, 7.21s/it] {'loss': 0.7628, 'learning_rate': 1.9711286089238845e-05, 'epoch': 0.18} 4%|▍ | 70/1572 [08:34<3:00:22, 7.21s/it] 5%|▍ | 71/1572 [08:41<2:57:18, 7.09s/it] {'loss': 0.732, 'learning_rate': 1.9698162729658795e-05, 'epoch': 0.18} 5%|▍ | 71/1572 [08:41<2:57:18, 7.09s/it] 5%|▍ | 72/1572 [08:48<2:57:00, 7.08s/it] {'loss': 0.7512, 'learning_rate': 1.9685039370078743e-05, 'epoch': 0.18} 5%|▍ | 72/1572 [08:48<2:57:00, 7.08s/it] 5%|▍ | 73/1572 [08:55<2:59:00, 7.17s/it] {'loss': 0.7593, 'learning_rate': 1.967191601049869e-05, 'epoch': 0.19} 5%|▍ | 73/1572 [08:55<2:59:00, 7.17s/it] 5%|▍ | 74/1572 [09:02<2:56:46, 7.08s/it] {'loss': 0.7135, 'learning_rate': 1.9658792650918637e-05, 'epoch': 0.19} 5%|▍ | 74/1572 [09:02<2:56:46, 
7.08s/it] 5%|▍ | 75/1572 [09:10<3:02:53, 7.33s/it] {'loss': 0.8312, 'learning_rate': 1.9645669291338584e-05, 'epoch': 0.19} 5%|▍ | 75/1572 [09:10<3:02:53, 7.33s/it] 5%|▍ | 76/1572 [09:17<3:02:30, 7.32s/it] {'loss': 0.7569, 'learning_rate': 1.963254593175853e-05, 'epoch': 0.19} 5%|▍ | 76/1572 [09:17<3:02:30, 7.32s/it] 5%|▍ | 77/1572 [09:25<3:01:26, 7.28s/it] {'loss': 0.7697, 'learning_rate': 1.9619422572178478e-05, 'epoch': 0.2} 5%|▍ | 77/1572 [09:25<3:01:26, 7.28s/it] 5%|▍ | 78/1572 [09:32<2:59:27, 7.21s/it] {'loss': 0.7605, 'learning_rate': 1.960629921259843e-05, 'epoch': 0.2} 5%|▍ | 78/1572 [09:32<2:59:27, 7.21s/it] 5%|▌ | 79/1572 [09:39<3:02:09, 7.32s/it] {'loss': 0.6993, 'learning_rate': 1.9593175853018372e-05, 'epoch': 0.2} 5%|▌ | 79/1572 [09:39<3:02:09, 7.32s/it] 5%|▌ | 80/1572 [09:47<3:08:21, 7.57s/it] {'loss': 0.7379, 'learning_rate': 1.9580052493438323e-05, 'epoch': 0.2} 5%|▌ | 80/1572 [09:47<3:08:21, 7.57s/it] 5%|▌ | 81/1572 [09:55<3:05:58, 7.48s/it] {'loss': 0.7636, 'learning_rate': 1.956692913385827e-05, 'epoch': 0.21} 5%|▌ | 81/1572 [09:55<3:05:58, 7.48s/it] 5%|▌ | 82/1572 [10:02<3:05:21, 7.46s/it] {'loss': 0.7483, 'learning_rate': 1.9553805774278217e-05, 'epoch': 0.21} 5%|▌ | 82/1572 [10:02<3:05:21, 7.46s/it] 5%|▌ | 83/1572 [10:09<2:59:29, 7.23s/it] {'loss': 0.6927, 'learning_rate': 1.9540682414698164e-05, 'epoch': 0.21} 5%|▌ | 83/1572 [10:09<2:59:29, 7.23s/it] 5%|▌ | 84/1572 [10:16<2:59:12, 7.23s/it] {'loss': 0.7517, 'learning_rate': 1.952755905511811e-05, 'epoch': 0.21} 5%|▌ | 84/1572 [10:16<2:59:12, 7.23s/it] 5%|▌ | 85/1572 [10:24<3:05:28, 7.48s/it] {'loss': 0.7909, 'learning_rate': 1.951443569553806e-05, 'epoch': 0.22} 5%|▌ | 85/1572 [10:24<3:05:28, 7.48s/it] 5%|▌ | 86/1572 [10:31<3:01:27, 7.33s/it] {'loss': 0.8087, 'learning_rate': 1.9501312335958006e-05, 'epoch': 0.22} 5%|▌ | 86/1572 [10:31<3:01:27, 7.33s/it] 6%|▌ | 87/1572 [10:39<3:02:57, 7.39s/it] {'loss': 0.6628, 'learning_rate': 1.9488188976377956e-05, 'epoch': 0.22} 6%|▌ | 87/1572 
[10:39<3:02:57, 7.39s/it] 6%|▌ | 88/1572 [10:46<3:02:57, 7.40s/it] {'loss': 0.7129, 'learning_rate': 1.94750656167979e-05, 'epoch': 0.22} 6%|▌ | 88/1572 [10:46<3:02:57, 7.40s/it] 6%|▌ | 89/1572 [10:53<3:01:14, 7.33s/it] {'loss': 0.6922, 'learning_rate': 1.946194225721785e-05, 'epoch': 0.23} 6%|▌ | 89/1572 [10:53<3:01:14, 7.33s/it] 6%|▌ | 90/1572 [11:01<3:02:12, 7.38s/it] {'loss': 0.7997, 'learning_rate': 1.9448818897637797e-05, 'epoch': 0.23} 6%|▌ | 90/1572 [11:01<3:02:12, 7.38s/it] 6%|▌ | 91/1572 [11:08<3:02:02, 7.37s/it] {'loss': 0.6905, 'learning_rate': 1.9435695538057745e-05, 'epoch': 0.23} 6%|▌ | 91/1572 [11:08<3:02:02, 7.37s/it] 6%|▌ | 92/1572 [11:16<3:03:45, 7.45s/it] {'loss': 0.8932, 'learning_rate': 1.9422572178477692e-05, 'epoch': 0.23} 6%|▌ | 92/1572 [11:16<3:03:45, 7.45s/it] 6%|▌ | 93/1572 [11:23<3:02:13, 7.39s/it] {'loss': 0.7211, 'learning_rate': 1.940944881889764e-05, 'epoch': 0.24} 6%|▌ | 93/1572 [11:23<3:02:13, 7.39s/it] 6%|▌ | 94/1572 [11:31<3:03:45, 7.46s/it] {'loss': 0.7291, 'learning_rate': 1.9396325459317586e-05, 'epoch': 0.24} 6%|▌ | 94/1572 [11:31<3:03:45, 7.46s/it] 6%|▌ | 95/1572 [11:37<2:58:16, 7.24s/it] {'loss': 0.7578, 'learning_rate': 1.9383202099737536e-05, 'epoch': 0.24} 6%|▌ | 95/1572 [11:37<2:58:16, 7.24s/it] 6%|▌ | 96/1572 [11:45<2:58:28, 7.26s/it] {'loss': 0.8398, 'learning_rate': 1.937007874015748e-05, 'epoch': 0.24} 6%|▌ | 96/1572 [11:45<2:58:28, 7.26s/it] 6%|▌ | 97/1572 [11:52<2:58:04, 7.24s/it] {'loss': 0.7768, 'learning_rate': 1.935695538057743e-05, 'epoch': 0.25} 6%|▌ | 97/1572 [11:52<2:58:04, 7.24s/it] 6%|▌ | 98/1572 [11:59<2:57:02, 7.21s/it] {'loss': 0.8122, 'learning_rate': 1.9343832020997378e-05, 'epoch': 0.25} 6%|▌ | 98/1572 [11:59<2:57:02, 7.21s/it] 6%|▋ | 99/1572 [12:06<2:55:06, 7.13s/it] {'loss': 0.7434, 'learning_rate': 1.9330708661417325e-05, 'epoch': 0.25} 6%|▋ | 99/1572 [12:06<2:55:06, 7.13s/it] 6%|▋ | 100/1572 [12:13<2:54:26, 7.11s/it] {'loss': 0.7193, 'learning_rate': 1.9317585301837272e-05, 'epoch': 0.25} 6%|▋ 
| 100/1572 [12:13<2:54:26, 7.11s/it] 6%|▋ | 101/1572 [12:20<2:53:08, 7.06s/it] {'loss': 0.7587, 'learning_rate': 1.930446194225722e-05, 'epoch': 0.26} 6%|▋ | 101/1572 [12:20<2:53:08, 7.06s/it] 6%|▋ | 102/1572 [12:27<2:55:01, 7.14s/it] {'loss': 0.7144, 'learning_rate': 1.9291338582677166e-05, 'epoch': 0.26} 6%|▋ | 102/1572 [12:27<2:55:01, 7.14s/it] 7%|▋ | 103/1572 [12:35<2:56:04, 7.19s/it] {'loss': 0.629, 'learning_rate': 1.9278215223097113e-05, 'epoch': 0.26} 7%|▋ | 103/1572 [12:35<2:56:04, 7.19s/it] 7%|▋ | 104/1572 [12:42<2:58:10, 7.28s/it] {'loss': 0.7723, 'learning_rate': 1.9265091863517064e-05, 'epoch': 0.26} 7%|▋ | 104/1572 [12:42<2:58:10, 7.28s/it] 7%|▋ | 105/1572 [12:50<3:00:44, 7.39s/it] {'loss': 0.7523, 'learning_rate': 1.9251968503937008e-05, 'epoch': 0.27} 7%|▋ | 105/1572 [12:50<3:00:44, 7.39s/it] 7%|▋ | 106/1572 [12:57<3:01:01, 7.41s/it] {'loss': 0.7475, 'learning_rate': 1.9238845144356958e-05, 'epoch': 0.27} 7%|▋ | 106/1572 [12:57<3:01:01, 7.41s/it] 7%|▋ | 107/1572 [13:04<3:00:20, 7.39s/it] {'loss': 0.789, 'learning_rate': 1.9225721784776905e-05, 'epoch': 0.27} 7%|▋ | 107/1572 [13:04<3:00:20, 7.39s/it] 7%|▋ | 108/1572 [13:11<2:56:12, 7.22s/it] {'loss': 0.7289, 'learning_rate': 1.9212598425196852e-05, 'epoch': 0.27} 7%|▋ | 108/1572 [13:11<2:56:12, 7.22s/it] 7%|▋ | 109/1572 [13:19<3:00:05, 7.39s/it] {'loss': 0.7498, 'learning_rate': 1.91994750656168e-05, 'epoch': 0.28} 7%|▋ | 109/1572 [13:19<3:00:05, 7.39s/it] 7%|▋ | 110/1572 [13:26<2:59:39, 7.37s/it] {'loss': 0.9547, 'learning_rate': 1.9186351706036747e-05, 'epoch': 0.28} 7%|▋ | 110/1572 [13:26<2:59:39, 7.37s/it] 7%|▋ | 111/1572 [13:34<3:01:37, 7.46s/it] {'loss': 0.8021, 'learning_rate': 1.9173228346456694e-05, 'epoch': 0.28} 7%|▋ | 111/1572 [13:34<3:01:37, 7.46s/it] 7%|▋ | 112/1572 [13:41<2:59:48, 7.39s/it] {'loss': 0.7983, 'learning_rate': 1.916010498687664e-05, 'epoch': 0.28} 7%|▋ | 112/1572 [13:41<2:59:48, 7.39s/it] 7%|▋ | 113/1572 [13:49<2:58:57, 7.36s/it] {'loss': 0.6735, 'learning_rate': 
1.914698162729659e-05, 'epoch': 0.29} 7%|▋ | 113/1572 [13:49<2:58:57, 7.36s/it] 7%|▋ | 114/1572 [13:56<2:59:06, 7.37s/it] {'loss': 0.6658, 'learning_rate': 1.9133858267716535e-05, 'epoch': 0.29} 7%|▋ | 114/1572 [13:56<2:59:06, 7.37s/it] 7%|▋ | 115/1572 [14:03<2:57:34, 7.31s/it] {'loss': 0.7631, 'learning_rate': 1.9120734908136486e-05, 'epoch': 0.29} 7%|▋ | 115/1572 [14:03<2:57:34, 7.31s/it] 7%|▋ | 116/1572 [14:11<2:59:07, 7.38s/it] {'loss': 0.7936, 'learning_rate': 1.9107611548556433e-05, 'epoch': 0.29} 7%|▋ | 116/1572 [14:11<2:59:07, 7.38s/it] 7%|▋ | 117/1572 [14:18<2:59:18, 7.39s/it] {'loss': 0.7189, 'learning_rate': 1.909448818897638e-05, 'epoch': 0.3} 7%|▋ | 117/1572 [14:18<2:59:18, 7.39s/it] 8%|▊ | 118/1572 [14:25<2:55:58, 7.26s/it] {'loss': 0.7468, 'learning_rate': 1.9081364829396327e-05, 'epoch': 0.3} 8%|▊ | 118/1572 [14:25<2:55:58, 7.26s/it] 8%|▊ | 119/1572 [14:32<2:53:54, 7.18s/it] {'loss': 0.6605, 'learning_rate': 1.9068241469816274e-05, 'epoch': 0.3} 8%|▊ | 119/1572 [14:32<2:53:54, 7.18s/it] 8%|▊ | 120/1572 [14:40<2:58:39, 7.38s/it] {'loss': 0.7652, 'learning_rate': 1.905511811023622e-05, 'epoch': 0.3} 8%|▊ | 120/1572 [14:40<2:58:39, 7.38s/it] 8%|▊ | 121/1572 [14:48<3:05:46, 7.68s/it] {'loss': 0.7574, 'learning_rate': 1.9041994750656168e-05, 'epoch': 0.31} 8%|▊ | 121/1572 [14:48<3:05:46, 7.68s/it] 8%|▊ | 122/1572 [14:56<3:02:22, 7.55s/it] {'loss': 0.7762, 'learning_rate': 1.902887139107612e-05, 'epoch': 0.31} 8%|▊ | 122/1572 [14:56<3:02:22, 7.55s/it] 8%|▊ | 123/1572 [15:03<3:00:34, 7.48s/it] {'loss': 0.8641, 'learning_rate': 1.9015748031496062e-05, 'epoch': 0.31} 8%|▊ | 123/1572 [15:03<3:00:34, 7.48s/it] 8%|▊ | 124/1572 [15:11<3:04:02, 7.63s/it] {'loss': 0.8135, 'learning_rate': 1.9002624671916013e-05, 'epoch': 0.31} 8%|▊ | 124/1572 [15:11<3:04:02, 7.63s/it] 8%|▊ | 125/1572 [15:18<2:58:39, 7.41s/it] {'loss': 0.7567, 'learning_rate': 1.898950131233596e-05, 'epoch': 0.32} 8%|▊ | 125/1572 [15:18<2:58:39, 7.41s/it] 8%|▊ | 126/1572 [15:25<2:55:47, 7.29s/it] 
{'loss': 0.7335, 'learning_rate': 1.8976377952755907e-05, 'epoch': 0.32} 8%|▊ | 126/1572 [15:25<2:55:47, 7.29s/it] 8%|▊ | 127/1572 [15:32<2:55:47, 7.30s/it] {'loss': 0.7627, 'learning_rate': 1.8963254593175854e-05, 'epoch': 0.32} 8%|▊ | 127/1572 [15:32<2:55:47, 7.30s/it] 8%|▊ | 128/1572 [15:39<2:56:16, 7.32s/it] {'loss': 0.6773, 'learning_rate': 1.89501312335958e-05, 'epoch': 0.32} 8%|▊ | 128/1572 [15:39<2:56:16, 7.32s/it] 8%|▊ | 129/1572 [15:46<2:53:23, 7.21s/it] {'loss': 0.729, 'learning_rate': 1.893700787401575e-05, 'epoch': 0.33} 8%|▊ | 129/1572 [15:46<2:53:23, 7.21s/it] 8%|▊ | 130/1572 [15:54<2:56:19, 7.34s/it] {'loss': 0.7523, 'learning_rate': 1.89238845144357e-05, 'epoch': 0.33} 8%|▊ | 130/1572 [15:54<2:56:19, 7.34s/it] 8%|▊ | 131/1572 [16:02<3:00:10, 7.50s/it] {'loss': 0.8078, 'learning_rate': 1.8910761154855643e-05, 'epoch': 0.33} 8%|▊ | 131/1572 [16:02<3:00:10, 7.50s/it] 8%|▊ | 132/1572 [16:09<2:57:20, 7.39s/it] {'loss': 0.6969, 'learning_rate': 1.8897637795275593e-05, 'epoch': 0.34} 8%|▊ | 132/1572 [16:09<2:57:20, 7.39s/it] 8%|▊ | 133/1572 [16:17<2:58:24, 7.44s/it] {'loss': 0.6729, 'learning_rate': 1.888451443569554e-05, 'epoch': 0.34} 8%|▊ | 133/1572 [16:17<2:58:24, 7.44s/it] 9%|▊ | 134/1572 [16:23<2:53:28, 7.24s/it] {'loss': 0.6691, 'learning_rate': 1.8871391076115488e-05, 'epoch': 0.34} 9%|▊ | 134/1572 [16:23<2:53:28, 7.24s/it] 9%|▊ | 135/1572 [16:30<2:51:53, 7.18s/it] {'loss': 0.7116, 'learning_rate': 1.8858267716535435e-05, 'epoch': 0.34} 9%|▊ | 135/1572 [16:30<2:51:53, 7.18s/it] 9%|▊ | 136/1572 [16:38<2:54:00, 7.27s/it] {'loss': 0.7435, 'learning_rate': 1.8845144356955382e-05, 'epoch': 0.35} 9%|▊ | 136/1572 [16:38<2:54:00, 7.27s/it] 9%|▊ | 137/1572 [16:45<2:52:00, 7.19s/it] {'loss': 0.7491, 'learning_rate': 1.883202099737533e-05, 'epoch': 0.35} 9%|▊ | 137/1572 [16:45<2:52:00, 7.19s/it] 9%|▉ | 138/1572 [16:53<2:56:10, 7.37s/it] {'loss': 0.7642, 'learning_rate': 1.8818897637795276e-05, 'epoch': 0.35} 9%|▉ | 138/1572 [16:53<2:56:10, 7.37s/it] 9%|▉ | 
139/1572 [17:00<2:53:50, 7.28s/it] {'loss': 0.7199, 'learning_rate': 1.8805774278215227e-05, 'epoch': 0.35} 9%|▉ | 139/1572 [17:00<2:53:50, 7.28s/it] 9%|▉ | 140/1572 [17:08<3:01:00, 7.58s/it] {'loss': 0.8203, 'learning_rate': 1.879265091863517e-05, 'epoch': 0.36} 9%|▉ | 140/1572 [17:08<3:01:00, 7.58s/it] 9%|▉ | 141/1572 [17:15<2:56:02, 7.38s/it] {'loss': 0.7748, 'learning_rate': 1.877952755905512e-05, 'epoch': 0.36} 9%|▉ | 141/1572 [17:15<2:56:02, 7.38s/it] 9%|▉ | 142/1572 [17:22<2:55:16, 7.35s/it] {'loss': 0.7709, 'learning_rate': 1.8766404199475068e-05, 'epoch': 0.36} 9%|▉ | 142/1572 [17:22<2:55:16, 7.35s/it] 9%|▉ | 143/1572 [17:29<2:53:34, 7.29s/it] {'loss': 0.8527, 'learning_rate': 1.8753280839895015e-05, 'epoch': 0.36} 9%|▉ | 143/1572 [17:29<2:53:34, 7.29s/it] 9%|▉ | 144/1572 [17:36<2:51:13, 7.19s/it] {'loss': 0.6678, 'learning_rate': 1.8740157480314962e-05, 'epoch': 0.37} 9%|▉ | 144/1572 [17:36<2:51:13, 7.19s/it] 9%|▉ | 145/1572 [17:44<2:54:59, 7.36s/it] {'loss': 0.8097, 'learning_rate': 1.872703412073491e-05, 'epoch': 0.37} 9%|▉ | 145/1572 [17:44<2:54:59, 7.36s/it] 9%|▉ | 146/1572 [17:52<2:57:27, 7.47s/it] {'loss': 0.7239, 'learning_rate': 1.8713910761154856e-05, 'epoch': 0.37} 9%|▉ | 146/1572 [17:52<2:57:27, 7.47s/it] 9%|▉ | 147/1572 [17:59<2:58:27, 7.51s/it] {'loss': 0.6475, 'learning_rate': 1.8700787401574803e-05, 'epoch': 0.37} 9%|▉ | 147/1572 [17:59<2:58:27, 7.51s/it] 9%|▉ | 148/1572 [18:06<2:53:14, 7.30s/it] {'loss': 0.6625, 'learning_rate': 1.8687664041994754e-05, 'epoch': 0.38} 9%|▉ | 148/1572 [18:06<2:53:14, 7.30s/it] 9%|▉ | 149/1572 [18:14<2:54:31, 7.36s/it] {'loss': 0.7769, 'learning_rate': 1.8674540682414698e-05, 'epoch': 0.38} 9%|▉ | 149/1572 [18:14<2:54:31, 7.36s/it] 10%|▉ | 150/1572 [18:21<2:54:24, 7.36s/it] {'loss': 0.7377, 'learning_rate': 1.8661417322834648e-05, 'epoch': 0.38} 10%|▉ | 150/1572 [18:21<2:54:24, 7.36s/it] 10%|▉ | 151/1572 [18:28<2:52:13, 7.27s/it] {'loss': 0.7023, 'learning_rate': 1.8648293963254595e-05, 'epoch': 0.38} 10%|▉ | 
151/1572 [18:28<2:52:13, 7.27s/it] 10%|▉ | 152/1572 [18:35<2:51:37, 7.25s/it] {'loss': 0.7637, 'learning_rate': 1.8635170603674542e-05, 'epoch': 0.39} 10%|▉ | 152/1572 [18:35<2:51:37, 7.25s/it] 10%|▉ | 153/1572 [18:43<2:56:19, 7.46s/it] {'loss': 0.7102, 'learning_rate': 1.862204724409449e-05, 'epoch': 0.39} 10%|▉ | 153/1572 [18:43<2:56:19, 7.46s/it] 10%|▉ | 154/1572 [18:51<2:54:56, 7.40s/it] {'loss': 0.7238, 'learning_rate': 1.8608923884514437e-05, 'epoch': 0.39} 10%|▉ | 154/1572 [18:51<2:54:56, 7.40s/it] 10%|▉ | 155/1572 [18:58<2:52:12, 7.29s/it] {'loss': 0.7406, 'learning_rate': 1.8595800524934384e-05, 'epoch': 0.39} 10%|▉ | 155/1572 [18:58<2:52:12, 7.29s/it] 10%|▉ | 156/1572 [19:05<2:55:22, 7.43s/it] {'loss': 0.834, 'learning_rate': 1.858267716535433e-05, 'epoch': 0.4} 10%|▉ | 156/1572 [19:05<2:55:22, 7.43s/it] 10%|▉ | 157/1572 [19:12<2:52:31, 7.32s/it] {'loss': 0.7231, 'learning_rate': 1.856955380577428e-05, 'epoch': 0.4} 10%|▉ | 157/1572 [19:12<2:52:31, 7.32s/it] 10%|█ | 158/1572 [19:20<2:54:29, 7.40s/it] {'loss': 0.6133, 'learning_rate': 1.855643044619423e-05, 'epoch': 0.4} 10%|█ | 158/1572 [19:20<2:54:29, 7.40s/it] 10%|█ | 159/1572 [19:27<2:49:54, 7.21s/it] {'loss': 0.7238, 'learning_rate': 1.8543307086614176e-05, 'epoch': 0.4} 10%|█ | 159/1572 [19:27<2:49:54, 7.21s/it] 10%|█ | 160/1572 [19:34<2:48:09, 7.15s/it] {'loss': 0.6676, 'learning_rate': 1.8530183727034123e-05, 'epoch': 0.41} 10%|█ | 160/1572 [19:34<2:48:09, 7.15s/it] 10%|█ | 161/1572 [19:41<2:48:26, 7.16s/it] {'loss': 0.7014, 'learning_rate': 1.851706036745407e-05, 'epoch': 0.41} 10%|█ | 161/1572 [19:41<2:48:26, 7.16s/it] 10%|█ | 162/1572 [19:48<2:49:09, 7.20s/it] {'loss': 0.7926, 'learning_rate': 1.8503937007874017e-05, 'epoch': 0.41} 10%|█ | 162/1572 [19:48<2:49:09, 7.20s/it] 10%|█ | 163/1572 [19:56<2:50:48, 7.27s/it] {'loss': 0.678, 'learning_rate': 1.8490813648293964e-05, 'epoch': 0.41} 10%|█ | 163/1572 [19:56<2:50:48, 7.27s/it] 10%|█ | 164/1572 [20:03<2:48:38, 7.19s/it] {'loss': 0.7271, 
'learning_rate': 1.847769028871391e-05, 'epoch': 0.42} 10%|█ | 164/1572 [20:03<2:48:38, 7.19s/it] 10%|█ | 165/1572 [20:10<2:51:35, 7.32s/it] {'loss': 0.7583, 'learning_rate': 1.846456692913386e-05, 'epoch': 0.42} 10%|█ | 165/1572 [20:10<2:51:35, 7.32s/it] 11%|█ | 166/1572 [20:18<2:56:33, 7.53s/it] {'loss': 0.7569, 'learning_rate': 1.8451443569553805e-05, 'epoch': 0.42} 11%|█ | 166/1572 [20:18<2:56:33, 7.53s/it] 11%|█ | 167/1572 [20:26<2:55:57, 7.51s/it] {'loss': 0.6899, 'learning_rate': 1.8438320209973756e-05, 'epoch': 0.42} 11%|█ | 167/1572 [20:26<2:55:57, 7.51s/it] 11%|█ | 168/1572 [20:33<2:51:21, 7.32s/it] {'loss': 0.6988, 'learning_rate': 1.8425196850393703e-05, 'epoch': 0.43} 11%|█ | 168/1572 [20:33<2:51:21, 7.32s/it] 11%|█ | 169/1572 [20:40<2:52:01, 7.36s/it] {'loss': 0.7114, 'learning_rate': 1.841207349081365e-05, 'epoch': 0.43} 11%|█ | 169/1572 [20:40<2:52:01, 7.36s/it] 11%|█ | 170/1572 [20:47<2:51:42, 7.35s/it] {'loss': 0.6809, 'learning_rate': 1.8398950131233597e-05, 'epoch': 0.43} 11%|█ | 170/1572 [20:47<2:51:42, 7.35s/it] 11%|█ | 171/1572 [20:55<2:52:50, 7.40s/it] {'loss': 0.8045, 'learning_rate': 1.8385826771653544e-05, 'epoch': 0.43} 11%|█ | 171/1572 [20:55<2:52:50, 7.40s/it] 11%|█ | 172/1572 [21:02<2:53:24, 7.43s/it] {'loss': 0.7361, 'learning_rate': 1.837270341207349e-05, 'epoch': 0.44} 11%|█ | 172/1572 [21:02<2:53:24, 7.43s/it] 11%|█ | 173/1572 [21:09<2:49:34, 7.27s/it] {'loss': 0.7402, 'learning_rate': 1.835958005249344e-05, 'epoch': 0.44} 11%|█ | 173/1572 [21:09<2:49:34, 7.27s/it] 11%|█ | 174/1572 [21:17<2:49:59, 7.30s/it] {'loss': 0.7559, 'learning_rate': 1.834645669291339e-05, 'epoch': 0.44} 11%|█ | 174/1572 [21:17<2:49:59, 7.30s/it] 11%|█ | 175/1572 [21:24<2:48:57, 7.26s/it] {'loss': 0.7329, 'learning_rate': 1.8333333333333333e-05, 'epoch': 0.44} 11%|█ | 175/1572 [21:24<2:48:57, 7.26s/it] 11%|█ | 176/1572 [21:32<2:52:05, 7.40s/it] {'loss': 0.735, 'learning_rate': 1.8320209973753283e-05, 'epoch': 0.45} 11%|█ | 176/1572 [21:32<2:52:05, 7.40s/it] 
11%|█▏ | 177/1572 [21:38<2:48:13, 7.24s/it] {'loss': 0.6951, 'learning_rate': 1.830708661417323e-05, 'epoch': 0.45} 11%|█▏ | 177/1572 [21:38<2:48:13, 7.24s/it] 11%|█▏ | 178/1572 [21:45<2:46:11, 7.15s/it] {'loss': 0.7412, 'learning_rate': 1.8293963254593178e-05, 'epoch': 0.45} 11%|█▏ | 178/1572 [21:45<2:46:11, 7.15s/it] 11%|█▏ | 179/1572 [21:52<2:44:39, 7.09s/it] {'loss': 0.8119, 'learning_rate': 1.8280839895013125e-05, 'epoch': 0.45} 11%|█▏ | 179/1572 [21:52<2:44:39, 7.09s/it] 11%|█▏ | 180/1572 [22:00<2:44:44, 7.10s/it] {'loss': 0.7192, 'learning_rate': 1.8267716535433072e-05, 'epoch': 0.46} 11%|█▏ | 180/1572 [22:00<2:44:44, 7.10s/it] 12%|█▏ | 181/1572 [22:07<2:45:10, 7.13s/it] {'loss': 0.7069, 'learning_rate': 1.825459317585302e-05, 'epoch': 0.46} 12%|█▏ | 181/1572 [22:07<2:45:10, 7.13s/it] 12%|█▏ | 182/1572 [22:14<2:44:43, 7.11s/it] {'loss': 0.7952, 'learning_rate': 1.8241469816272966e-05, 'epoch': 0.46} 12%|█▏ | 182/1572 [22:14<2:44:43, 7.11s/it] 12%|█▏ | 183/1572 [22:21<2:45:21, 7.14s/it] {'loss': 0.7064, 'learning_rate': 1.8228346456692917e-05, 'epoch': 0.46} 12%|█▏ | 183/1572 [22:21<2:45:21, 7.14s/it] 12%|█▏ | 184/1572 [22:28<2:45:20, 7.15s/it] {'loss': 0.7563, 'learning_rate': 1.821522309711286e-05, 'epoch': 0.47} 12%|█▏ | 184/1572 [22:28<2:45:20, 7.15s/it] 12%|█▏ | 185/1572 [22:36<2:51:10, 7.41s/it] {'loss': 0.6829, 'learning_rate': 1.820209973753281e-05, 'epoch': 0.47} 12%|█▏ | 185/1572 [22:36<2:51:10, 7.41s/it] 12%|█▏ | 186/1572 [22:44<2:53:16, 7.50s/it] {'loss': 0.6525, 'learning_rate': 1.8188976377952758e-05, 'epoch': 0.47} 12%|█▏ | 186/1572 [22:44<2:53:16, 7.50s/it] 12%|█▏ | 187/1572 [22:52<2:54:45, 7.57s/it] {'loss': 0.7344, 'learning_rate': 1.8175853018372705e-05, 'epoch': 0.47} 12%|█▏ | 187/1572 [22:52<2:54:45, 7.57s/it] 12%|█▏ | 188/1572 [22:59<2:51:08, 7.42s/it] {'loss': 0.7181, 'learning_rate': 1.8162729658792652e-05, 'epoch': 0.48} 12%|█▏ | 188/1572 [22:59<2:51:08, 7.42s/it] 12%|█▏ | 189/1572 [23:06<2:47:45, 7.28s/it] {'loss': 0.8006, 
'learning_rate': 1.8149606299212603e-05, 'epoch': 0.48} 12%|█▏ | 189/1572 [23:06<2:47:45, 7.28s/it] 12%|█▏ | 190/1572 [23:13<2:47:18, 7.26s/it] {'loss': 0.6388, 'learning_rate': 1.8136482939632546e-05, 'epoch': 0.48} 12%|█▏ | 190/1572 [23:13<2:47:18, 7.26s/it] 12%|█▏ | 191/1572 [23:20<2:44:48, 7.16s/it] {'loss': 0.8009, 'learning_rate': 1.8123359580052497e-05, 'epoch': 0.48} 12%|█▏ | 191/1572 [23:20<2:44:48, 7.16s/it] 12%|█▏ | 192/1572 [23:28<2:52:40, 7.51s/it] {'loss': 0.7908, 'learning_rate': 1.811023622047244e-05, 'epoch': 0.49} 12%|█▏ | 192/1572 [23:28<2:52:40, 7.51s/it] 12%|█▏ | 193/1572 [23:36<2:53:13, 7.54s/it] {'loss': 0.6814, 'learning_rate': 1.809711286089239e-05, 'epoch': 0.49} 12%|█▏ | 193/1572 [23:36<2:53:13, 7.54s/it] 12%|█▏ | 194/1572 [23:43<2:49:57, 7.40s/it] {'loss': 0.7376, 'learning_rate': 1.8083989501312338e-05, 'epoch': 0.49} 12%|█▏ | 194/1572 [23:43<2:49:57, 7.40s/it] 12%|█▏ | 195/1572 [23:50<2:48:36, 7.35s/it] {'loss': 0.7501, 'learning_rate': 1.8070866141732285e-05, 'epoch': 0.5} 12%|█▏ | 195/1572 [23:50<2:48:36, 7.35s/it] 12%|█▏ | 196/1572 [23:57<2:48:55, 7.37s/it] {'loss': 0.7356, 'learning_rate': 1.8057742782152232e-05, 'epoch': 0.5} 12%|█▏ | 196/1572 [23:57<2:48:55, 7.37s/it] 13%|█▎ | 197/1572 [24:04<2:45:11, 7.21s/it] {'loss': 0.7889, 'learning_rate': 1.804461942257218e-05, 'epoch': 0.5} 13%|█▎ | 197/1572 [24:04<2:45:11, 7.21s/it] 13%|█▎ | 198/1572 [24:12<2:46:47, 7.28s/it] {'loss': 0.7564, 'learning_rate': 1.8031496062992127e-05, 'epoch': 0.5} 13%|█▎ | 198/1572 [24:12<2:46:47, 7.28s/it] 13%|█▎ | 199/1572 [24:19<2:47:54, 7.34s/it] {'loss': 0.759, 'learning_rate': 1.8018372703412074e-05, 'epoch': 0.51} 13%|█▎ | 199/1572 [24:19<2:47:54, 7.34s/it] 13%|█▎ | 200/1572 [24:27<2:47:57, 7.34s/it] {'loss': 0.7271, 'learning_rate': 1.8005249343832024e-05, 'epoch': 0.51} 13%|█▎ | 200/1572 [24:27<2:47:57, 7.34s/it] 13%|█▎ | 201/1572 [24:34<2:47:32, 7.33s/it] {'loss': 0.7391, 'learning_rate': 1.7992125984251968e-05, 'epoch': 0.51} 13%|█▎ | 201/1572 
[24:34<2:47:32, 7.33s/it] 13%|█▎ | 202/1572 [24:41<2:48:33, 7.38s/it] {'loss': 0.7416, 'learning_rate': 1.797900262467192e-05, 'epoch': 0.51} 13%|█▎ | 202/1572 [24:41<2:48:33, 7.38s/it] 13%|█▎ | 203/1572 [24:48<2:45:11, 7.24s/it] {'loss': 0.816, 'learning_rate': 1.7965879265091866e-05, 'epoch': 0.52} 13%|█▎ | 203/1572 [24:48<2:45:11, 7.24s/it] 13%|█▎ | 204/1572 [24:56<2:50:48, 7.49s/it] {'loss': 0.8431, 'learning_rate': 1.7952755905511813e-05, 'epoch': 0.52} 13%|█▎ | 204/1572 [24:56<2:50:48, 7.49s/it] 13%|█▎ | 205/1572 [25:04<2:49:04, 7.42s/it] {'loss': 0.7536, 'learning_rate': 1.793963254593176e-05, 'epoch': 0.52} 13%|█▎ | 205/1572 [25:04<2:49:04, 7.42s/it] 13%|█▎ | 206/1572 [25:11<2:49:14, 7.43s/it] {'loss': 0.8151, 'learning_rate': 1.7926509186351707e-05, 'epoch': 0.52} 13%|█▎ | 206/1572 [25:11<2:49:14, 7.43s/it] 13%|█▎ | 207/1572 [25:18<2:46:54, 7.34s/it] {'loss': 0.7686, 'learning_rate': 1.7913385826771654e-05, 'epoch': 0.53} 13%|█▎ | 207/1572 [25:18<2:46:54, 7.34s/it] 13%|█▎ | 208/1572 [25:26<2:52:25, 7.58s/it] {'loss': 0.8059, 'learning_rate': 1.79002624671916e-05, 'epoch': 0.53} 13%|█▎ | 208/1572 [25:26<2:52:25, 7.58s/it] 13%|█▎ | 209/1572 [25:34<2:50:52, 7.52s/it] {'loss': 0.7326, 'learning_rate': 1.7887139107611552e-05, 'epoch': 0.53} 13%|█▎ | 209/1572 [25:34<2:50:52, 7.52s/it] 13%|█▎ | 210/1572 [25:41<2:48:58, 7.44s/it] {'loss': 0.7793, 'learning_rate': 1.7874015748031495e-05, 'epoch': 0.53} 13%|█▎ | 210/1572 [25:41<2:48:58, 7.44s/it] 13%|█▎ | 211/1572 [25:49<2:49:59, 7.49s/it] {'loss': 0.7138, 'learning_rate': 1.7860892388451446e-05, 'epoch': 0.54} 13%|█▎ | 211/1572 [25:49<2:49:59, 7.49s/it] 13%|█▎ | 212/1572 [25:55<2:45:01, 7.28s/it] {'loss': 0.7229, 'learning_rate': 1.7847769028871393e-05, 'epoch': 0.54} 13%|█▎ | 212/1572 [25:55<2:45:01, 7.28s/it] 14%|█▎ | 213/1572 [26:03<2:44:21, 7.26s/it] {'loss': 0.6943, 'learning_rate': 1.783464566929134e-05, 'epoch': 0.54} 14%|█▎ | 213/1572 [26:03<2:44:21, 7.26s/it] 14%|█▎ | 214/1572 [26:10<2:46:11, 7.34s/it] 
{'loss': 0.7195, 'learning_rate': 1.7821522309711287e-05, 'epoch': 0.54} 14%|█▎ | 214/1572 [26:10<2:46:11, 7.34s/it] 14%|█▎ | 215/1572 [26:17<2:44:00, 7.25s/it] {'loss': 0.7308, 'learning_rate': 1.7808398950131234e-05, 'epoch': 0.55} 14%|█▎ | 215/1572 [26:17<2:44:00, 7.25s/it] 14%|█▎ | 216/1572 [26:25<2:46:08, 7.35s/it] {'loss': 0.7402, 'learning_rate': 1.779527559055118e-05, 'epoch': 0.55} 14%|█▎ | 216/1572 [26:25<2:46:08, 7.35s/it] 14%|█▍ | 217/1572 [26:32<2:43:21, 7.23s/it] {'loss': 0.744, 'learning_rate': 1.778215223097113e-05, 'epoch': 0.55} 14%|█▍ | 217/1572 [26:32<2:43:21, 7.23s/it] 14%|█▍ | 218/1572 [26:39<2:43:00, 7.22s/it] {'loss': 0.7673, 'learning_rate': 1.776902887139108e-05, 'epoch': 0.55} 14%|█▍ | 218/1572 [26:39<2:43:00, 7.22s/it] 14%|█▍ | 219/1572 [26:46<2:43:25, 7.25s/it] {'loss': 0.7006, 'learning_rate': 1.7755905511811026e-05, 'epoch': 0.56} 14%|█▍ | 219/1572 [26:46<2:43:25, 7.25s/it] 14%|█▍ | 220/1572 [26:53<2:43:15, 7.25s/it] {'loss': 0.6871, 'learning_rate': 1.7742782152230973e-05, 'epoch': 0.56} 14%|█▍ | 220/1572 [26:53<2:43:15, 7.25s/it] 14%|█▍ | 221/1572 [27:00<2:41:35, 7.18s/it] {'loss': 0.7029, 'learning_rate': 1.772965879265092e-05, 'epoch': 0.56} 14%|█▍ | 221/1572 [27:00<2:41:35, 7.18s/it] 14%|█▍ | 222/1572 [27:08<2:42:57, 7.24s/it] {'loss': 0.7153, 'learning_rate': 1.7716535433070868e-05, 'epoch': 0.56} 14%|█▍ | 222/1572 [27:08<2:42:57, 7.24s/it] 14%|█▍ | 223/1572 [27:16<2:45:44, 7.37s/it] {'loss': 0.7385, 'learning_rate': 1.7703412073490815e-05, 'epoch': 0.57} 14%|█▍ | 223/1572 [27:16<2:45:44, 7.37s/it] 14%|█▍ | 224/1572 [27:23<2:47:47, 7.47s/it] {'loss': 0.712, 'learning_rate': 1.7690288713910762e-05, 'epoch': 0.57} 14%|█▍ | 224/1572 [27:23<2:47:47, 7.47s/it] 14%|█▍ | 225/1572 [27:31<2:47:19, 7.45s/it] {'loss': 0.7185, 'learning_rate': 1.767716535433071e-05, 'epoch': 0.57} 14%|█▍ | 225/1572 [27:31<2:47:19, 7.45s/it] 14%|█▍ | 226/1572 [27:38<2:46:40, 7.43s/it] {'loss': 0.7657, 'learning_rate': 1.766404199475066e-05, 'epoch': 0.57} 
14%|█▍ | 227/1572 [27:46<2:47:34, 7.48s/it] {'loss': 0.7549, 'learning_rate': 1.7650918635170603e-05, 'epoch': 0.58}
15%|█▍ | 228/1572 [27:53<2:44:31, 7.34s/it] {'loss': 0.7118, 'learning_rate': 1.7637795275590554e-05, 'epoch': 0.58}
15%|█▍ | 229/1572 [28:00<2:43:43, 7.31s/it] {'loss': 0.6845, 'learning_rate': 1.76246719160105e-05, 'epoch': 0.58}
15%|█▍ | 230/1572 [28:07<2:43:52, 7.33s/it] {'loss': 0.689, 'learning_rate': 1.7611548556430448e-05, 'epoch': 0.58}
15%|█▍ | 231/1572 [28:15<2:46:13, 7.44s/it] {'loss': 0.7365, 'learning_rate': 1.7598425196850395e-05, 'epoch': 0.59}
15%|█▍ | 232/1572 [28:23<2:51:48, 7.69s/it] {'loss': 0.7297, 'learning_rate': 1.7585301837270342e-05, 'epoch': 0.59}
15%|█▍ | 233/1572 [28:31<2:49:13, 7.58s/it] {'loss': 0.7449, 'learning_rate': 1.757217847769029e-05, 'epoch': 0.59}
15%|█▍ | 234/1572 [28:38<2:45:55, 7.44s/it] {'loss': 0.7652, 'learning_rate': 1.7559055118110236e-05, 'epoch': 0.59}
15%|█▍ | 235/1572 [28:45<2:47:02, 7.50s/it] {'loss': 0.719, 'learning_rate': 1.7545931758530187e-05, 'epoch': 0.6}
15%|█▌ | 236/1572 [28:53<2:46:19, 7.47s/it] {'loss': 0.6519, 'learning_rate': 1.753280839895013e-05, 'epoch': 0.6}
15%|█▌ | 237/1572 [29:00<2:45:15, 7.43s/it] {'loss': 0.7801, 'learning_rate': 1.751968503937008e-05, 'epoch': 0.6}
15%|█▌ | 238/1572 [29:08<2:47:32, 7.54s/it] {'loss': 0.7334, 'learning_rate': 1.7506561679790028e-05, 'epoch': 0.6}
15%|█▌ | 239/1572 [29:15<2:42:55, 7.33s/it] {'loss': 0.689, 'learning_rate': 1.7493438320209975e-05, 'epoch': 0.61}
15%|█▌ | 240/1572 [29:23<2:47:22, 7.54s/it] {'loss': 0.8085, 'learning_rate': 1.7480314960629923e-05, 'epoch': 0.61}
15%|█▌ | 241/1572 [29:30<2:45:51, 7.48s/it] {'loss': 0.8845, 'learning_rate': 1.746719160104987e-05, 'epoch': 0.61}
15%|█▌ | 242/1572 [29:38<2:48:45, 7.61s/it] {'loss': 0.6735, 'learning_rate': 1.7454068241469817e-05, 'epoch': 0.61}
15%|█▌ | 243/1572 [29:45<2:46:32, 7.52s/it] {'loss': 0.7002, 'learning_rate': 1.7440944881889764e-05, 'epoch': 0.62}
16%|█▌ | 244/1572 [29:53<2:45:33, 7.48s/it] {'loss': 0.6246, 'learning_rate': 1.7427821522309714e-05, 'epoch': 0.62}
16%|█▌ | 245/1572 [30:00<2:46:41, 7.54s/it] {'loss': 0.6858, 'learning_rate': 1.7414698162729658e-05, 'epoch': 0.62}
16%|█▌ | 246/1572 [30:08<2:45:51, 7.50s/it] {'loss': 0.6897, 'learning_rate': 1.740157480314961e-05, 'epoch': 0.62}
16%|█▌ | 247/1572 [30:15<2:42:25, 7.35s/it] {'loss': 0.6294, 'learning_rate': 1.7388451443569556e-05, 'epoch': 0.63}
16%|█▌ | 248/1572 [30:22<2:43:00, 7.39s/it] {'loss': 0.6608, 'learning_rate': 1.7375328083989503e-05, 'epoch': 0.63}
16%|█▌ | 249/1572 [30:29<2:40:18, 7.27s/it] {'loss': 0.6644, 'learning_rate': 1.736220472440945e-05, 'epoch': 0.63}
16%|█▌ | 250/1572 [30:36<2:38:15, 7.18s/it] {'loss': 0.6894, 'learning_rate': 1.7349081364829397e-05, 'epoch': 0.63}
16%|█▌ | 251/1572 [30:44<2:41:22, 7.33s/it] {'loss': 0.6826, 'learning_rate': 1.7335958005249344e-05, 'epoch': 0.64}
16%|█▌ | 252/1572 [30:51<2:39:29, 7.25s/it] {'loss': 0.719, 'learning_rate': 1.7322834645669295e-05, 'epoch': 0.64}
16%|█▌ | 253/1572 [30:58<2:39:14, 7.24s/it] {'loss': 0.7302, 'learning_rate': 1.7309711286089242e-05, 'epoch': 0.64}
16%|█▌ | 254/1572 [31:06<2:42:55, 7.42s/it] {'loss': 0.7835, 'learning_rate': 1.729658792650919e-05, 'epoch': 0.64}
16%|█▌ | 255/1572 [31:13<2:41:33, 7.36s/it] {'loss': 0.8102, 'learning_rate': 1.7283464566929136e-05, 'epoch': 0.65}
16%|█▋ | 256/1572 [31:21<2:43:38, 7.46s/it] {'loss': 0.7183, 'learning_rate': 1.7270341207349083e-05, 'epoch': 0.65}
16%|█▋ | 257/1572 [31:28<2:38:06, 7.21s/it] {'loss': 0.7524, 'learning_rate': 1.725721784776903e-05, 'epoch': 0.65}
16%|█▋ | 258/1572 [31:35<2:39:33, 7.29s/it] {'loss': 0.7385, 'learning_rate': 1.7244094488188977e-05, 'epoch': 0.66}
16%|█▋ | 259/1572 [31:43<2:41:39, 7.39s/it] {'loss': 0.7174, 'learning_rate': 1.7230971128608925e-05, 'epoch': 0.66}
17%|█▋ | 260/1572 [31:50<2:41:29, 7.39s/it] {'loss': 0.6565, 'learning_rate': 1.721784776902887e-05, 'epoch': 0.66}
17%|█▋ | 261/1572 [31:57<2:40:51, 7.36s/it] {'loss': 0.7028, 'learning_rate': 1.7204724409448822e-05, 'epoch': 0.66}
17%|█▋ | 262/1572 [32:04<2:37:42, 7.22s/it] {'loss': 0.6655, 'learning_rate': 1.7191601049868766e-05, 'epoch': 0.67}
17%|█▋ | 263/1572 [32:12<2:39:02, 7.29s/it] {'loss': 0.6893, 'learning_rate': 1.7178477690288716e-05, 'epoch': 0.67}
17%|█▋ | 264/1572 [32:19<2:36:15, 7.17s/it] {'loss': 0.6681, 'learning_rate': 1.7165354330708663e-05, 'epoch': 0.67}
17%|█▋ | 265/1572 [32:26<2:36:40, 7.19s/it] {'loss': 0.6808, 'learning_rate': 1.715223097112861e-05, 'epoch': 0.67}
17%|█▋ | 266/1572 [32:33<2:35:29, 7.14s/it] {'loss': 0.6594, 'learning_rate': 1.7139107611548558e-05, 'epoch': 0.68}
17%|█▋ | 267/1572 [32:40<2:35:56, 7.17s/it] {'loss': 0.7971, 'learning_rate': 1.7125984251968505e-05, 'epoch': 0.68}
17%|█▋ | 268/1572 [32:47<2:36:09, 7.19s/it] {'loss': 0.7395, 'learning_rate': 1.7112860892388452e-05, 'epoch': 0.68}
17%|█▋ | 269/1572 [32:54<2:35:24, 7.16s/it] {'loss': 0.6938, 'learning_rate': 1.70997375328084e-05, 'epoch': 0.68}
17%|█▋ | 270/1572 [33:02<2:35:53, 7.18s/it] {'loss': 0.7269, 'learning_rate': 1.708661417322835e-05, 'epoch': 0.69}
17%|█▋ | 271/1572 [33:09<2:39:01, 7.33s/it] {'loss': 0.6377, 'learning_rate': 1.7073490813648293e-05, 'epoch': 0.69}
17%|█▋ | 272/1572 [33:16<2:36:54, 7.24s/it] {'loss': 0.6876, 'learning_rate': 1.7060367454068244e-05, 'epoch': 0.69}
17%|█▋ | 273/1572 [33:24<2:37:26, 7.27s/it] {'loss': 0.7468, 'learning_rate': 1.704724409448819e-05, 'epoch': 0.69}
17%|█▋ | 274/1572 [33:31<2:37:03, 7.26s/it] {'loss': 0.7081, 'learning_rate': 1.7034120734908138e-05, 'epoch': 0.7}
17%|█▋ | 275/1572 [33:38<2:36:01, 7.22s/it] {'loss': 0.7155, 'learning_rate': 1.7020997375328085e-05, 'epoch': 0.7}
18%|█▊ | 276/1572 [33:46<2:39:44, 7.40s/it] {'loss': 0.7403, 'learning_rate': 1.7007874015748032e-05, 'epoch': 0.7}
18%|█▊ | 277/1572 [33:54<2:43:22, 7.57s/it] {'loss': 0.7509, 'learning_rate': 1.699475065616798e-05, 'epoch': 0.7}
18%|█▊ | 278/1572 [34:02<2:44:58, 7.65s/it] {'loss': 0.7691, 'learning_rate': 1.6981627296587927e-05, 'epoch': 0.71}
18%|█▊ | 279/1572 [34:09<2:40:05, 7.43s/it] {'loss': 0.6769, 'learning_rate': 1.6968503937007877e-05, 'epoch': 0.71}
18%|█▊ | 280/1572 [34:16<2:42:26, 7.54s/it] {'loss': 0.7203, 'learning_rate': 1.695538057742782e-05, 'epoch': 0.71}
18%|█▊ | 281/1572 [34:23<2:38:51, 7.38s/it] {'loss': 0.6667, 'learning_rate': 1.694225721784777e-05, 'epoch': 0.71}
18%|█▊ | 282/1572 [34:30<2:36:16, 7.27s/it] {'loss': 0.6891, 'learning_rate': 1.692913385826772e-05, 'epoch': 0.72}
18%|█▊ | 283/1572 [34:38<2:38:08, 7.36s/it] {'loss': 0.6798, 'learning_rate': 1.6916010498687665e-05, 'epoch': 0.72}
18%|█▊ | 284/1572 [34:45<2:36:52, 7.31s/it] {'loss': 0.6851, 'learning_rate': 1.6902887139107613e-05, 'epoch': 0.72}
18%|█▊ | 285/1572 [34:52<2:36:15, 7.28s/it] {'loss': 0.6759, 'learning_rate': 1.6889763779527563e-05, 'epoch': 0.72}
18%|█▊ | 286/1572 [34:59<2:33:17, 7.15s/it] {'loss': 0.704, 'learning_rate': 1.6876640419947507e-05, 'epoch': 0.73}
18%|█▊ | 287/1572 [35:07<2:38:22, 7.39s/it] {'loss': 0.6572, 'learning_rate': 1.6863517060367457e-05, 'epoch': 0.73}
18%|█▊ | 288/1572 [35:14<2:35:02, 7.24s/it] {'loss': 0.7341, 'learning_rate': 1.68503937007874e-05, 'epoch': 0.73}
18%|█▊ | 289/1572 [35:22<2:37:12, 7.35s/it] {'loss': 0.7115, 'learning_rate': 1.683727034120735e-05, 'epoch': 0.73}
18%|█▊ | 290/1572 [35:28<2:32:59, 7.16s/it] {'loss': 0.7271, 'learning_rate': 1.68241469816273e-05, 'epoch': 0.74}
19%|█▊ | 291/1572 [35:35<2:31:34, 7.10s/it] {'loss': 0.703, 'learning_rate': 1.6811023622047246e-05, 'epoch': 0.74}
19%|█▊ | 292/1572 [35:42<2:29:52, 7.03s/it] {'loss': 0.7439, 'learning_rate': 1.6797900262467193e-05, 'epoch': 0.74}
19%|█▊ | 293/1572 [35:49<2:28:53, 6.98s/it] {'loss': 0.6744, 'learning_rate': 1.678477690288714e-05, 'epoch': 0.74}
19%|█▊ | 294/1572 [35:56<2:29:02, 7.00s/it] {'loss': 0.7919, 'learning_rate': 1.6771653543307087e-05, 'epoch': 0.75}
19%|█▉ | 295/1572 [36:04<2:31:51, 7.14s/it] {'loss': 0.773, 'learning_rate': 1.6758530183727034e-05, 'epoch': 0.75}
19%|█▉ | 296/1572 [36:11<2:34:01, 7.24s/it] {'loss': 0.7258, 'learning_rate': 1.6745406824146985e-05, 'epoch': 0.75}
19%|█▉ | 297/1572 [36:18<2:30:21, 7.08s/it] {'loss': 0.7358, 'learning_rate': 1.673228346456693e-05, 'epoch': 0.75}
19%|█▉ | 298/1572 [36:26<2:36:03, 7.35s/it] {'loss': 0.7336, 'learning_rate': 1.671916010498688e-05, 'epoch': 0.76}
19%|█▉ | 299/1572 [36:33<2:32:46, 7.20s/it] {'loss': 0.6757, 'learning_rate': 1.6706036745406826e-05, 'epoch': 0.76}
19%|█▉ | 300/1572 [36:40<2:31:38, 7.15s/it] {'loss': 0.7411, 'learning_rate': 1.6692913385826773e-05, 'epoch': 0.76}
19%|█▉ | 301/1572 [36:47<2:32:40, 7.21s/it] {'loss': 0.8021, 'learning_rate': 1.667979002624672e-05, 'epoch': 0.76}
19%|█▉ | 302/1572 [36:55<2:35:28, 7.35s/it] {'loss': 0.7287, 'learning_rate': 1.6666666666666667e-05, 'epoch': 0.77}
19%|█▉ | 303/1572 [37:01<2:32:21, 7.20s/it] {'loss': 0.7613, 'learning_rate': 1.6653543307086615e-05, 'epoch': 0.77}
19%|█▉ | 304/1572 [37:09<2:35:55, 7.38s/it] {'loss': 0.8577, 'learning_rate': 1.6640419947506562e-05, 'epoch': 0.77}
19%|█▉ | 305/1572 [37:16<2:33:33, 7.27s/it] {'loss': 0.7202, 'learning_rate': 1.6627296587926512e-05, 'epoch': 0.77}
19%|█▉ | 306/1572 [37:24<2:33:34, 7.28s/it] {'loss': 0.7679, 'learning_rate': 1.6614173228346456e-05, 'epoch': 0.78}
20%|█▉ | 307/1572 [37:31<2:32:13, 7.22s/it] {'loss': 0.6743, 'learning_rate': 1.6601049868766406e-05, 'epoch': 0.78}
20%|█▉ | 308/1572 [37:38<2:34:14, 7.32s/it] {'loss': 0.7753, 'learning_rate': 1.6587926509186354e-05, 'epoch': 0.78}
20%|█▉ | 309/1572 [37:45<2:33:45, 7.30s/it] {'loss': 0.7147, 'learning_rate': 1.65748031496063e-05, 'epoch': 0.78}
20%|█▉ | 310/1572 [37:53<2:32:25, 7.25s/it] {'loss': 0.6653, 'learning_rate': 1.6561679790026248e-05, 'epoch': 0.79}
20%|█▉ | 311/1572 [38:00<2:33:22, 7.30s/it] {'loss': 0.6641, 'learning_rate': 1.6548556430446195e-05, 'epoch': 0.79}
20%|█▉ | 312/1572 [38:07<2:30:07, 7.15s/it] {'loss': 0.7154, 'learning_rate': 1.6535433070866142e-05, 'epoch': 0.79}
20%|█▉ | 313/1572 [38:14<2:33:04, 7.30s/it] {'loss': 0.7679, 'learning_rate': 1.6522309711286093e-05, 'epoch': 0.79}
20%|█▉ | 314/1572 [38:22<2:32:23, 7.27s/it] {'loss': 0.6796, 'learning_rate': 1.650918635170604e-05, 'epoch': 0.8}
20%|██ | 315/1572 [38:29<2:32:48, 7.29s/it] {'loss': 0.7906, 'learning_rate': 1.6496062992125987e-05, 'epoch': 0.8}
20%|██ | 316/1572 [38:36<2:28:45, 7.11s/it] {'loss': 0.6748, 'learning_rate': 1.6482939632545934e-05, 'epoch': 0.8}
20%|██ | 317/1572 [38:43<2:30:59, 7.22s/it] {'loss': 0.6933, 'learning_rate': 1.646981627296588e-05, 'epoch': 0.8}
20%|██ | 318/1572 [38:50<2:31:00, 7.22s/it] {'loss': 0.7913, 'learning_rate': 1.6456692913385828e-05, 'epoch': 0.81}
20%|██ | 319/1572 [38:58<2:30:39, 7.21s/it] {'loss': 0.7508, 'learning_rate': 1.6443569553805775e-05, 'epoch': 0.81}
20%|██ | 320/1572 [39:05<2:29:01, 7.14s/it] {'loss': 0.6814, 'learning_rate': 1.6430446194225722e-05, 'epoch': 0.81}
20%|██ | 321/1572 [39:12<2:32:34, 7.32s/it] {'loss': 0.7016, 'learning_rate': 1.641732283464567e-05, 'epoch': 0.81}
20%|██ | 322/1572 [39:19<2:31:26, 7.27s/it] {'loss': 0.7073, 'learning_rate': 1.640419947506562e-05, 'epoch': 0.82}
21%|██ | 323/1572 [39:27<2:33:42, 7.38s/it] {'loss': 0.7956, 'learning_rate': 1.6391076115485564e-05, 'epoch': 0.82}
21%|██ | 324/1572 [39:35<2:33:58, 7.40s/it] {'loss': 0.6889, 'learning_rate': 1.6377952755905514e-05, 'epoch': 0.82}
21%|██ | 325/1572 [39:42<2:33:26, 7.38s/it] {'loss': 0.6842, 'learning_rate': 1.636482939632546e-05, 'epoch': 0.83}
21%|██ | 326/1572 [39:49<2:32:20, 7.34s/it] {'loss': 0.7083, 'learning_rate': 1.635170603674541e-05, 'epoch': 0.83}
21%|██ | 327/1572 [39:56<2:30:44, 7.26s/it] {'loss': 0.695, 'learning_rate': 1.6338582677165356e-05, 'epoch': 0.83}
21%|██ | 328/1572 [40:03<2:27:19, 7.11s/it] {'loss': 0.6817, 'learning_rate': 1.6325459317585303e-05, 'epoch': 0.83}
21%|██ | 329/1572 [40:10<2:25:58, 7.05s/it] {'loss': 0.6597, 'learning_rate': 1.631233595800525e-05, 'epoch': 0.84}
21%|██ | 330/1572 [40:17<2:26:34, 7.08s/it] {'loss': 0.8122, 'learning_rate': 1.6299212598425197e-05, 'epoch': 0.84}
21%|██ | 331/1572 [40:24<2:28:37, 7.19s/it] {'loss': 0.7084, 'learning_rate': 1.6286089238845147e-05, 'epoch': 0.84}
21%|██ | 332/1572 [40:31<2:25:31, 7.04s/it] {'loss': 0.6669, 'learning_rate': 1.627296587926509e-05, 'epoch': 0.84}
21%|██ | 333/1572 [40:38<2:26:33, 7.10s/it] {'loss': 0.7016, 'learning_rate': 1.625984251968504e-05, 'epoch': 0.85}
21%|██ | 334/1572 [40:46<2:30:13, 7.28s/it] {'loss': 0.6762, 'learning_rate': 1.624671916010499e-05, 'epoch': 0.85}
21%|██▏ | 335/1572 [40:53<2:28:20, 7.20s/it] {'loss': 0.707, 'learning_rate': 1.6233595800524936e-05, 'epoch': 0.85}
21%|██▏ | 336/1572 [41:01<2:30:27, 7.30s/it] {'loss': 0.7484, 'learning_rate': 1.6220472440944883e-05, 'epoch': 0.85}
21%|██▏ | 337/1572 [41:08<2:29:34, 7.27s/it] {'loss': 0.6942, 'learning_rate': 1.620734908136483e-05, 'epoch': 0.86}
22%|██▏ | 338/1572 [41:15<2:26:25, 7.12s/it] {'loss': 0.657, 'learning_rate': 1.6194225721784777e-05, 'epoch': 0.86}
22%|██▏ | 339/1572 [41:22<2:28:04, 7.21s/it] {'loss': 0.7008, 'learning_rate': 1.6181102362204724e-05, 'epoch': 0.86}
22%|██▏ | 340/1572 [41:29<2:26:05, 7.11s/it] {'loss': 0.7285, 'learning_rate': 1.6167979002624675e-05, 'epoch': 0.86}
22%|██▏ | 341/1572 [41:36<2:26:43, 7.15s/it] {'loss': 0.6876, 'learning_rate': 1.615485564304462e-05, 'epoch': 0.87}
22%|██▏ | 342/1572 [41:43<2:27:49, 7.21s/it] {'loss': 0.7145, 'learning_rate': 1.614173228346457e-05, 'epoch': 0.87}
22%|██▏ | 343/1572 [41:51<2:27:52, 7.22s/it] {'loss': 0.6675, 'learning_rate': 1.6128608923884516e-05, 'epoch': 0.87}
22%|██▏ | 344/1572 [41:58<2:27:18, 7.20s/it] {'loss': 0.6565, 'learning_rate': 1.6115485564304463e-05, 'epoch': 0.87}
22%|██▏ | 345/1572 [42:05<2:26:13, 7.15s/it] {'loss': 0.7392, 'learning_rate': 1.610236220472441e-05, 'epoch': 0.88}
22%|██▏ | 346/1572 [42:12<2:28:11, 7.25s/it] {'loss': 0.6847, 'learning_rate': 1.608923884514436e-05, 'epoch': 0.88}
22%|██▏ | 347/1572 [42:20<2:29:54, 7.34s/it] {'loss': 0.6308, 'learning_rate': 1.6076115485564305e-05, 'epoch': 0.88}
22%|██▏ | 348/1572 [42:27<2:29:37, 7.33s/it] {'loss': 0.6739, 'learning_rate': 1.6062992125984255e-05, 'epoch': 0.88}
22%|██▏ | 349/1572 [42:34<2:27:43, 7.25s/it] {'loss': 0.645, 'learning_rate': 1.6049868766404202e-05, 'epoch': 0.89}
22%|██▏ | 350/1572 [42:41<2:27:11, 7.23s/it] {'loss': 0.6569, 'learning_rate': 1.603674540682415e-05, 'epoch': 0.89}
22%|██▏ | 351/1572 [42:49<2:27:11, 7.23s/it] {'loss': 0.8016, 'learning_rate': 1.6023622047244096e-05, 'epoch': 0.89}
22%|██▏ | 352/1572 [42:56<2:25:49, 7.17s/it] {'loss': 0.6679, 'learning_rate': 1.6010498687664044e-05, 'epoch': 0.89}
22%|██▏ | 353/1572 [43:03<2:23:39, 7.07s/it] {'loss': 0.7299, 'learning_rate': 1.599737532808399e-05, 'epoch': 0.9}
23%|██▎ | 354/1572 [43:10<2:22:57, 7.04s/it] {'loss': 0.6514, 'learning_rate': 1.5984251968503938e-05, 'epoch': 0.9}
23%|██▎ | 355/1572 [43:17<2:22:55, 7.05s/it] {'loss': 0.6613, 'learning_rate': 1.5971128608923885e-05, 'epoch': 0.9}
23%|██▎ | 356/1572 [43:24<2:25:07, 7.16s/it] {'loss': 0.6922, 'learning_rate': 1.5958005249343832e-05, 'epoch': 0.9}
23%|██▎ | 357/1572 [43:31<2:24:22, 7.13s/it] {'loss': 0.7856, 'learning_rate': 1.5944881889763783e-05, 'epoch': 0.91}
23%|██▎ | 358/1572 [43:38<2:24:55, 7.16s/it] {'loss': 0.7365, 'learning_rate': 1.5931758530183726e-05, 'epoch': 0.91}
23%|██▎ | 359/1572 [43:46<2:25:38, 7.20s/it] {'loss': 0.6569, 'learning_rate': 1.5918635170603677e-05, 'epoch': 0.91}
23%|██▎ | 360/1572 [43:53<2:26:13, 7.24s/it] {'loss': 0.7049, 'learning_rate': 1.5905511811023624e-05, 'epoch': 0.91}
23%|██▎ | 361/1572 [44:00<2:23:38, 7.12s/it] {'loss': 0.7298, 'learning_rate': 1.589238845144357e-05, 'epoch': 0.92}
23%|██▎ | 362/1572 [44:07<2:23:41, 7.12s/it] {'loss': 0.6846, 'learning_rate': 1.5879265091863518e-05, 'epoch': 0.92}
23%|██▎ | 363/1572 [44:15<2:26:38, 7.28s/it] {'loss': 0.6876, 'learning_rate': 1.5866141732283465e-05, 'epoch': 0.92}
23%|██▎ | 364/1572 [44:22<2:25:10, 7.21s/it] {'loss': 0.8115, 'learning_rate': 1.5853018372703412e-05, 'epoch': 0.92}
23%|██▎ | 365/1572 [44:29<2:24:40, 7.19s/it] {'loss': 0.7524, 'learning_rate': 1.583989501312336e-05, 'epoch': 0.93}
23%|██▎ | 366/1572 [44:36<2:25:47, 7.25s/it] {'loss': 0.6977, 'learning_rate': 1.582677165354331e-05, 'epoch': 0.93}
23%|██▎ | 367/1572 [44:43<2:22:56, 7.12s/it] {'loss': 0.7187, 'learning_rate': 1.5813648293963254e-05, 'epoch': 0.93}
23%|██▎ | 368/1572 [44:50<2:21:21, 7.04s/it] {'loss': 0.7026, 'learning_rate': 1.5800524934383204e-05, 'epoch': 0.93}
23%|██▎ | 369/1572 [44:57<2:21:54, 7.08s/it] {'loss': 0.7194, 'learning_rate': 1.578740157480315e-05, 'epoch': 0.94}
24%|██▎ | 370/1572 [45:04<2:23:19, 7.15s/it] {'loss': 0.6877, 'learning_rate': 1.57742782152231e-05, 'epoch': 0.94}
24%|██▎ | 371/1572 [45:13<2:30:35, 7.52s/it] {'loss': 0.8635, 'learning_rate': 1.5761154855643046e-05, 'epoch': 0.94}
24%|██▎ | 372/1572 [45:20<2:26:34, 7.33s/it] {'loss': 0.6863, 'learning_rate': 1.5748031496062993e-05, 'epoch': 0.94}
24%|██▎ | 373/1572 [45:27<2:24:42, 7.24s/it] {'loss': 0.6409, 'learning_rate': 1.573490813648294e-05, 'epoch': 0.95}
24%|██▍ | 374/1572 [45:34<2:23:57, 7.21s/it] {'loss': 0.7026, 'learning_rate': 1.5721784776902887e-05, 'epoch': 0.95}
24%|██▍ | 375/1572 [45:41<2:26:18, 7.33s/it] {'loss': 0.7152, 'learning_rate': 1.5708661417322837e-05, 'epoch': 0.95}
24%|██▍ | 376/1572 [45:48<2:23:10, 7.18s/it] {'loss': 0.6851, 'learning_rate': 1.5695538057742785e-05, 'epoch': 0.95}
24%|██▍ | 377/1572 [45:56<2:26:44, 7.37s/it] {'loss': 0.7097, 'learning_rate': 1.568241469816273e-05, 'epoch': 0.96}
24%|██▍ | 378/1572 [46:03<2:24:18, 7.25s/it] {'loss': 0.6708, 'learning_rate': 1.566929133858268e-05, 'epoch': 0.96}
24%|██▍ | 379/1572 [46:10<2:25:06, 7.30s/it] {'loss': 0.7262, 'learning_rate': 1.5656167979002626e-05, 'epoch': 0.96}
24%|██▍ | 380/1572 [46:17<2:21:04, 7.10s/it] {'loss': 0.692, 'learning_rate': 1.5643044619422573e-05, 'epoch': 0.96}
24%|██▍ | 381/1572 [46:25<2:23:09, 7.21s/it] {'loss': 0.7964, 'learning_rate': 1.5629921259842524e-05, 'epoch': 0.97}
24%|██▍ | 382/1572 [46:32<2:21:45, 7.15s/it] {'loss': 0.6485, 'learning_rate': 1.5616797900262467e-05, 'epoch': 0.97}
24%|██▍ | 383/1572 [46:39<2:22:02, 7.17s/it] {'loss': 0.6548, 'learning_rate': 1.5603674540682418e-05, 'epoch': 0.97}
24%|██▍ | 384/1572 [46:46<2:22:39, 7.21s/it] {'loss': 0.7234, 'learning_rate': 1.559055118110236e-05, 'epoch': 0.97}
24%|██▍ | 385/1572 [46:54<2:27:31, 7.46s/it] {'loss': 0.6184, 'learning_rate': 1.5577427821522312e-05, 'epoch': 0.98}
25%|██▍ | 386/1572 [47:01<2:24:53, 7.33s/it] {'loss': 0.6784, 'learning_rate': 1.556430446194226e-05, 'epoch': 0.98}
25%|██▍ | 387/1572 [47:08<2:24:35, 7.32s/it] {'loss': 0.6862, 'learning_rate': 1.5551181102362206e-05, 'epoch': 0.98}
25%|██▍ | 388/1572 [47:16<2:24:51, 7.34s/it] {'loss': 0.6898, 'learning_rate': 1.5538057742782153e-05, 'epoch': 0.99}
25%|██▍ | 389/1572 [47:24<2:27:32, 7.48s/it] {'loss': 0.6877, 'learning_rate': 1.55249343832021e-05, 'epoch': 0.99}
25%|██▍ | 390/1572 [47:31<2:24:54, 7.36s/it] {'loss': 0.63, 'learning_rate': 1.5511811023622048e-05, 'epoch': 0.99}
25%|██▍ | 391/1572 [47:38<2:24:04, 7.32s/it] {'loss': 0.7392, 'learning_rate': 1.5498687664041995e-05, 'epoch': 0.99}
25%|██▍ | 392/1572 [47:46<2:25:48, 7.41s/it] {'loss': 0.7621, 'learning_rate': 1.5485564304461945e-05, 'epoch': 1.0}
25%|██▌ | 393/1572 [47:53<2:26:42, 7.47s/it] {'loss': 0.7172, 'learning_rate': 1.547244094488189e-05, 'epoch': 1.0}
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,006 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,007 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,006 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,007 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,007 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,007 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,007 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[WARNING|trainer.py:2348] 2024-07-08 20:07:49,007 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[INFO|trainer.py:2889] 2024-07-08 20:08:13,359 >> Saving model checkpoint to ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393
[INFO|tokenization_utils_base.py:2432] 2024-07-08 20:08:14,809 >> tokenizer config file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-07-08 20:08:14,814 >> Special tokens file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-393/special_tokens_map.json
25%|██▌ | 394/1572 [50:04<14:31:04, 44.37s/it] {'loss': 0.6579, 'learning_rate': 1.545931758530184e-05, 'epoch': 1.0}
25%|██▌ | 395/1572 [50:11<10:51:04, 33.19s/it] {'loss': 0.6238, 'learning_rate': 1.5446194225721787e-05, 'epoch': 1.0}
25%|██▌ | 396/1572 [50:18<8:18:30, 25.43s/it] {'loss': 0.6394, 'learning_rate': 1.5433070866141734e-05, 'epoch': 1.01}
25%|██▌ | 397/1572 [50:25<6:31:24, 19.99s/it] {'loss': 0.6816, 'learning_rate': 1.541994750656168e-05, 'epoch': 1.01}
25%|██▌ | 398/1572 [50:33<5:16:51, 16.19s/it] {'loss': 0.6576, 'learning_rate': 1.5406824146981628e-05, 'epoch': 1.01}
25%|██▌ | 399/1572 [50:40<4:26:08, 13.61s/it] {'loss': 0.657, 'learning_rate': 1.5393700787401575e-05, 'epoch': 1.01}
25%|██▌ | 400/1572 [50:47<3:46:17, 11.59s/it] {'loss': 0.7055, 'learning_rate': 1.5380577427821522e-05, 'epoch': 1.02}
26%|██▌ | 401/1572 [50:54<3:21:11, 10.31s/it] {'loss': 0.5969, 'learning_rate': 1.5367454068241473e-05, 'epoch': 1.02}
26%|██▌ | 402/1572 [51:02<3:02:35, 9.36s/it] {'loss': 0.7107, 'learning_rate': 1.5354330708661416e-05, 'epoch': 1.02}
26%|██▌ | 403/1572 [51:09<2:49:17, 8.69s/it] {'loss': 0.6595, 'learning_rate': 1.5341207349081367e-05, 'epoch': 1.02}
26%|██▌ | 404/1572 [51:16<2:41:10, 8.28s/it] {'loss': 0.7404, 'learning_rate': 1.5328083989501314e-05, 'epoch': 1.03}
26%|██▌ | 405/1572 [51:24<2:38:37, 8.16s/it] {'loss': 0.6249, 'learning_rate': 1.531496062992126e-05, 'epoch': 1.03}
26%|██▌ | 406/1572 [51:32<2:35:46, 8.02s/it] {'loss': 0.6453, 'learning_rate': 1.5301837270341208e-05, 'epoch': 1.03}
26%|██▌ | 407/1572 [51:39<2:31:35, 7.81s/it] {'loss': 0.7113, 'learning_rate': 1.528871391076116e-05, 'epoch': 1.03}
26%|██▌ | 408/1572 [51:46<2:27:57, 7.63s/it] {'loss': 0.7842, 'learning_rate': 1.5275590551181102e-05, 'epoch': 1.04}
26%|██▌ | 409/1572 [51:54<2:29:22, 7.71s/it] {'loss': 0.6437, 'learning_rate': 1.5262467191601053e-05, 'epoch': 1.04}
26%|██▌ | 410/1572 [52:01<2:27:38, 7.62s/it] {'loss': 0.6802, 'learning_rate': 1.5249343832021e-05, 'epoch': 1.04}
26%|██▌ | 411/1572 [52:08<2:21:52, 7.33s/it] {'loss': 0.6295, 'learning_rate': 1.5236220472440946e-05, 'epoch': 1.04}
26%|██▌ | 412/1572 [52:15<2:19:17, 7.20s/it] {'loss': 0.6469, 'learning_rate': 1.5223097112860894e-05, 'epoch': 1.05}
26%|██▋ | 413/1572 [52:22<2:19:30, 7.22s/it] {'loss': 0.7367, 'learning_rate': 1.5209973753280841e-05, 'epoch': 1.05}
26%|██▋ | 414/1572 [52:30<2:19:46, 7.24s/it] {'loss': 0.6417, 'learning_rate': 1.5196850393700789e-05, 'epoch': 1.05}
26%|██▋ | 415/1572 [52:37<2:23:43, 7.45s/it] {'loss': 0.7288, 'learning_rate': 1.5183727034120737e-05, 'epoch': 1.05}
26%|██▋ | 416/1572 [52:46<2:28:16, 7.70s/it] {'loss': 0.796, 'learning_rate': 1.5170603674540683e-05, 'epoch': 1.06}
27%|██▋ | 417/1572 [52:53<2:22:53, 7.42s/it] {'loss': 0.6668, 'learning_rate': 1.5157480314960632e-05, 'epoch': 1.06}
27%|██▋ | 418/1572 [52:59<2:19:02, 7.23s/it] {'loss': 0.7408, 'learning_rate': 1.5144356955380579e-05, 'epoch': 1.06}
27%|██▋ | 419/1572 [53:06<2:18:27, 7.21s/it] {'loss': 0.6293, 'learning_rate': 1.5131233595800526e-05, 'epoch': 1.06}
27%|██▋ | 420/1572 [53:14<2:21:34, 7.37s/it] {'loss': 0.6605, 'learning_rate': 1.5118110236220473e-05, 'epoch': 1.07}
27%|██▋ | 421/1572 [53:21<2:17:46, 7.18s/it] {'loss': 0.6289, 'learning_rate': 1.5104986876640422e-05, 'epoch': 1.07}
27%|██▋ | 422/1572 [53:29<2:19:55, 7.30s/it] {'loss': 0.585, 'learning_rate': 1.5091863517060367e-05, 'epoch': 1.07}
27%|██▋ | 423/1572 [53:36<2:19:46, 7.30s/it] {'loss': 0.6207, 'learning_rate': 1.5078740157480316e-05, 'epoch': 1.07}
27%|██▋ | 424/1572 [53:43<2:17:24, 7.18s/it] {'loss': 0.671, 'learning_rate': 1.5065616797900265e-05, 'epoch': 1.08}
27%|██▋ | 425/1572 [53:50<2:18:05, 7.22s/it] {'loss': 0.6277, 'learning_rate': 1.505249343832021e-05, 'epoch': 1.08}
27%|██▋ | 426/1572 [53:57<2:17:37, 7.21s/it] {'loss': 0.6535, 'learning_rate': 1.5039370078740159e-05, 'epoch': 1.08}
27%|██▋ | 427/1572 [54:04<2:16:49, 7.17s/it] {'loss': 0.5868, 'learning_rate': 1.5026246719160106e-05, 'epoch': 1.08}
27%|██▋ | 428/1572 [54:12<2:17:11, 7.19s/it] {'loss': 0.7606, 'learning_rate': 1.5013123359580053e-05, 'epoch': 1.09}
27%|██▋ | 429/1572 [54:19<2:20:56, 7.40s/it] {'loss': 0.6605, 'learning_rate': 1.5000000000000002e-05, 'epoch': 1.09}
27%|██▋ | 430/1572 [54:26<2:18:44, 7.29s/it] {'loss': 0.6228, 'learning_rate': 1.498687664041995e-05, 'epoch': 1.09}
27%|██▋ | 431/1572 [54:34<2:17:24, 7.23s/it] {'loss': 0.7267, 'learning_rate': 1.4973753280839896e-05, 'epoch': 1.09}
27%|██▋ | 432/1572 [54:40<2:14:54, 7.10s/it] {'loss': 0.6723, 'learning_rate': 1.4960629921259843e-05, 'epoch': 1.1}
28%|██▊ | 433/1572 [54:48<2:16:31, 7.19s/it] {'loss': 0.6098, 'learning_rate': 1.4947506561679792e-05, 'epoch': 1.1}
28%|██▊ | 434/1572 [54:55<2:17:09, 7.23s/it] {'loss': 0.6253, 'learning_rate': 1.4934383202099738e-05, 'epoch': 1.1}
28%|██▊ | 435/1572 [55:02<2:14:09, 7.08s/it] {'loss': 0.6369, 'learning_rate': 1.4921259842519686e-05, 'epoch': 1.1}
28%|██▊ | 436/1572 [55:09<2:14:29, 7.10s/it] {'loss': 0.7005, 'learning_rate': 1.4908136482939635e-05, 'epoch': 1.11}
28%|██▊ | 437/1572 [55:16<2:15:48, 7.18s/it] {'loss': 0.679, 'learning_rate': 1.489501312335958e-05, 'epoch': 1.11}
28%|██▊ | 438/1572 [55:25<2:22:56, 7.56s/it] {'loss': 0.7713, 'learning_rate': 1.488188976377953e-05, 'epoch': 1.11}
28%|██▊ | 439/1572 [55:32<2:22:25, 7.54s/it] {'loss': 0.6451, 'learning_rate': 1.4868766404199477e-05, 'epoch': 1.11}
28%|██▊ | 440/1572 [55:39<2:18:05, 7.32s/it] {'loss': 0.6409, 'learning_rate': 1.4855643044619424e-05, 'epoch': 1.12}
28%|██▊ | 441/1572 [55:46<2:16:10, 7.22s/it] {'loss': 0.7342, 'learning_rate': 1.4842519685039371e-05, 'epoch': 1.12}
28%|██▊ | 442/1572 [55:54<2:19:34, 7.41s/it] {'loss': 0.6575, 'learning_rate': 1.482939632545932e-05, 'epoch': 1.12}
28%|██▊ | 443/1572 [56:01<2:19:44, 7.43s/it] {'loss': 0.6358, 'learning_rate': 1.4816272965879265e-05, 'epoch': 1.12}
28%|██▊ | 444/1572 [56:09<2:17:58, 7.34s/it] {'loss': 0.5813, 'learning_rate': 1.4803149606299214e-05, 'epoch': 1.13}
28%|██▊ | 445/1572 [56:16<2:16:17, 7.26s/it] {'loss': 0.6427, 'learning_rate': 1.4790026246719161e-05, 'epoch': 1.13}
28%|██▊ | 446/1572 [56:23<2:16:37, 7.28s/it] {'loss': 0.7327, 'learning_rate': 1.4776902887139108e-05, 'epoch': 1.13}
28%|██▊ | 
446/1572 [56:23<2:16:37, 7.28s/it] 28%|██▊ | 447/1572 [56:30<2:15:33, 7.23s/it] {'loss': 0.6833, 'learning_rate': 1.4763779527559057e-05, 'epoch': 1.13} 28%|██▊ | 447/1572 [56:30<2:15:33, 7.23s/it] 28%|██▊ | 448/1572 [56:37<2:15:21, 7.23s/it] {'loss': 0.6715, 'learning_rate': 1.4750656167979002e-05, 'epoch': 1.14} 28%|██▊ | 448/1572 [56:37<2:15:21, 7.23s/it] 29%|██▊ | 449/1572 [56:45<2:17:55, 7.37s/it] {'loss': 0.6244, 'learning_rate': 1.4737532808398951e-05, 'epoch': 1.14} 29%|██▊ | 449/1572 [56:45<2:17:55, 7.37s/it] 29%|██▊ | 450/1572 [56:52<2:17:01, 7.33s/it] {'loss': 0.5874, 'learning_rate': 1.47244094488189e-05, 'epoch': 1.14} 29%|██▊ | 450/1572 [56:52<2:17:01, 7.33s/it] 29%|██▊ | 451/1572 [57:00<2:16:58, 7.33s/it] {'loss': 0.8781, 'learning_rate': 1.4711286089238845e-05, 'epoch': 1.15} 29%|██▊ | 451/1572 [57:00<2:16:58, 7.33s/it] 29%|██▉ | 452/1572 [57:07<2:15:22, 7.25s/it] {'loss': 0.5761, 'learning_rate': 1.4698162729658794e-05, 'epoch': 1.15} 29%|██▉ | 452/1572 [57:07<2:15:22, 7.25s/it] 29%|██▉ | 453/1572 [57:14<2:15:11, 7.25s/it] {'loss': 0.6387, 'learning_rate': 1.4685039370078741e-05, 'epoch': 1.15} 29%|██▉ | 453/1572 [57:14<2:15:11, 7.25s/it] 29%|██▉ | 454/1572 [57:21<2:15:17, 7.26s/it] {'loss': 0.6827, 'learning_rate': 1.4671916010498688e-05, 'epoch': 1.15} 29%|██▉ | 454/1572 [57:21<2:15:17, 7.26s/it] 29%|██▉ | 455/1572 [57:28<2:11:43, 7.08s/it] {'loss': 0.697, 'learning_rate': 1.4658792650918636e-05, 'epoch': 1.16} 29%|██▉ | 455/1572 [57:28<2:11:43, 7.08s/it] 29%|██▉ | 456/1572 [57:35<2:12:56, 7.15s/it] {'loss': 0.7584, 'learning_rate': 1.4645669291338584e-05, 'epoch': 1.16} 29%|██▉ | 456/1572 [57:35<2:12:56, 7.15s/it] 29%|██▉ | 457/1572 [57:42<2:13:23, 7.18s/it] {'loss': 0.6165, 'learning_rate': 1.463254593175853e-05, 'epoch': 1.16} 29%|██▉ | 457/1572 [57:42<2:13:23, 7.18s/it] 29%|██▉ | 458/1572 [57:50<2:13:36, 7.20s/it] {'loss': 0.6614, 'learning_rate': 1.4619422572178479e-05, 'epoch': 1.16} 29%|██▉ | 458/1572 [57:50<2:13:36, 7.20s/it] 29%|██▉ | 
459/1572 [57:56<2:10:51, 7.05s/it] {'loss': 0.6898, 'learning_rate': 1.4606299212598427e-05, 'epoch': 1.17} 29%|██▉ | 459/1572 [57:56<2:10:51, 7.05s/it] 29%|██▉ | 460/1572 [58:03<2:09:42, 7.00s/it] {'loss': 0.7243, 'learning_rate': 1.4593175853018373e-05, 'epoch': 1.17} 29%|██▉ | 460/1572 [58:03<2:09:42, 7.00s/it] 29%|██▉ | 461/1572 [58:11<2:13:09, 7.19s/it] {'loss': 0.6284, 'learning_rate': 1.4580052493438322e-05, 'epoch': 1.17} 29%|██▉ | 461/1572 [58:11<2:13:09, 7.19s/it] 29%|██▉ | 462/1572 [58:18<2:13:03, 7.19s/it] {'loss': 0.6523, 'learning_rate': 1.456692913385827e-05, 'epoch': 1.17} 29%|██▉ | 462/1572 [58:18<2:13:03, 7.19s/it] 29%|██▉ | 463/1572 [58:25<2:12:17, 7.16s/it] {'loss': 0.6886, 'learning_rate': 1.4553805774278216e-05, 'epoch': 1.18} 29%|██▉ | 463/1572 [58:25<2:12:17, 7.16s/it] 30%|██▉ | 464/1572 [58:33<2:13:46, 7.24s/it] {'loss': 0.6592, 'learning_rate': 1.4540682414698165e-05, 'epoch': 1.18} 30%|██▉ | 464/1572 [58:33<2:13:46, 7.24s/it] 30%|██▉ | 465/1572 [58:40<2:13:12, 7.22s/it] {'loss': 0.5917, 'learning_rate': 1.4527559055118112e-05, 'epoch': 1.18} 30%|██▉ | 465/1572 [58:40<2:13:12, 7.22s/it] 30%|██▉ | 466/1572 [58:47<2:13:36, 7.25s/it] {'loss': 0.6923, 'learning_rate': 1.4514435695538059e-05, 'epoch': 1.18} 30%|██▉ | 466/1572 [58:47<2:13:36, 7.25s/it] 30%|██▉ | 467/1572 [58:54<2:13:14, 7.24s/it] {'loss': 0.7126, 'learning_rate': 1.4501312335958006e-05, 'epoch': 1.19} 30%|██▉ | 467/1572 [58:54<2:13:14, 7.24s/it] 30%|██▉ | 468/1572 [59:01<2:12:38, 7.21s/it] {'loss': 0.6608, 'learning_rate': 1.4488188976377955e-05, 'epoch': 1.19} 30%|██▉ | 468/1572 [59:01<2:12:38, 7.21s/it] 30%|██▉ | 469/1572 [59:09<2:13:53, 7.28s/it] {'loss': 0.6697, 'learning_rate': 1.44750656167979e-05, 'epoch': 1.19} 30%|██▉ | 469/1572 [59:09<2:13:53, 7.28s/it] 30%|██▉ | 470/1572 [59:15<2:10:27, 7.10s/it] {'loss': 0.6865, 'learning_rate': 1.4461942257217849e-05, 'epoch': 1.19} 30%|██▉ | 470/1572 [59:15<2:10:27, 7.10s/it] 30%|██▉ | 471/1572 [59:23<2:11:51, 7.19s/it] {'loss': 
0.6339, 'learning_rate': 1.4448818897637798e-05, 'epoch': 1.2} 30%|██▉ | 471/1572 [59:23<2:11:51, 7.19s/it] 30%|███ | 472/1572 [59:31<2:16:55, 7.47s/it] {'loss': 0.6488, 'learning_rate': 1.4435695538057743e-05, 'epoch': 1.2} 30%|███ | 472/1572 [59:31<2:16:55, 7.47s/it] 30%|███ | 473/1572 [59:38<2:15:30, 7.40s/it] {'loss': 0.6327, 'learning_rate': 1.4422572178477692e-05, 'epoch': 1.2} 30%|███ | 473/1572 [59:38<2:15:30, 7.40s/it] 30%|███ | 474/1572 [59:46<2:15:18, 7.39s/it] {'loss': 0.6231, 'learning_rate': 1.440944881889764e-05, 'epoch': 1.2} 30%|███ | 474/1572 [59:46<2:15:18, 7.39s/it] 30%|███ | 475/1572 [59:53<2:16:58, 7.49s/it] {'loss': 0.6524, 'learning_rate': 1.4396325459317586e-05, 'epoch': 1.21} 30%|███ | 475/1572 [59:53<2:16:58, 7.49s/it] 30%|███ | 476/1572 [1:00:01<2:16:24, 7.47s/it] {'loss': 0.6779, 'learning_rate': 1.4383202099737535e-05, 'epoch': 1.21} 30%|███ | 476/1572 [1:00:01<2:16:24, 7.47s/it] 30%|███ | 477/1572 [1:00:08<2:14:22, 7.36s/it] {'loss': 0.5976, 'learning_rate': 1.437007874015748e-05, 'epoch': 1.21} 30%|███ | 477/1572 [1:00:08<2:14:22, 7.36s/it] 30%|███ | 478/1572 [1:00:15<2:13:44, 7.34s/it] {'loss': 0.6892, 'learning_rate': 1.435695538057743e-05, 'epoch': 1.21} 30%|███ | 478/1572 [1:00:15<2:13:44, 7.34s/it] 30%|███ | 479/1572 [1:00:23<2:15:35, 7.44s/it] {'loss': 0.6442, 'learning_rate': 1.4343832020997377e-05, 'epoch': 1.22} 30%|███ | 479/1572 [1:00:23<2:15:35, 7.44s/it] 31%|███ | 480/1572 [1:00:30<2:13:20, 7.33s/it] {'loss': 0.662, 'learning_rate': 1.4330708661417324e-05, 'epoch': 1.22} 31%|███ | 480/1572 [1:00:30<2:13:20, 7.33s/it] 31%|███ | 481/1572 [1:00:37<2:12:09, 7.27s/it] {'loss': 0.7119, 'learning_rate': 1.431758530183727e-05, 'epoch': 1.22} 31%|███ | 481/1572 [1:00:37<2:12:09, 7.27s/it] 31%|███ | 482/1572 [1:00:44<2:11:21, 7.23s/it] {'loss': 0.7041, 'learning_rate': 1.430446194225722e-05, 'epoch': 1.22} 31%|███ | 482/1572 [1:00:44<2:11:21, 7.23s/it] 31%|███ | 483/1572 [1:00:52<2:12:35, 7.31s/it] {'loss': 0.6616, 
'learning_rate': 1.4291338582677165e-05, 'epoch': 1.23} 31%|███ | 483/1572 [1:00:52<2:12:35, 7.31s/it] 31%|███ | 484/1572 [1:00:59<2:12:41, 7.32s/it] {'loss': 0.6021, 'learning_rate': 1.4278215223097114e-05, 'epoch': 1.23} 31%|███ | 484/1572 [1:00:59<2:12:41, 7.32s/it] 31%|███ | 485/1572 [1:01:06<2:10:38, 7.21s/it] {'loss': 0.6857, 'learning_rate': 1.4265091863517063e-05, 'epoch': 1.23} 31%|███ | 485/1572 [1:01:06<2:10:38, 7.21s/it] 31%|███ | 486/1572 [1:01:13<2:07:53, 7.07s/it] {'loss': 0.6354, 'learning_rate': 1.4251968503937008e-05, 'epoch': 1.23} 31%|███ | 486/1572 [1:01:13<2:07:53, 7.07s/it] 31%|███ | 487/1572 [1:01:20<2:08:09, 7.09s/it] {'loss': 0.7394, 'learning_rate': 1.4238845144356957e-05, 'epoch': 1.24} 31%|███ | 487/1572 [1:01:20<2:08:09, 7.09s/it] 31%|███ | 488/1572 [1:01:27<2:08:35, 7.12s/it] {'loss': 0.6941, 'learning_rate': 1.4225721784776904e-05, 'epoch': 1.24} 31%|███ | 488/1572 [1:01:27<2:08:35, 7.12s/it] 31%|███ | 489/1572 [1:01:34<2:09:21, 7.17s/it] {'loss': 0.6883, 'learning_rate': 1.4212598425196851e-05, 'epoch': 1.24} 31%|███ | 489/1572 [1:01:34<2:09:21, 7.17s/it] 31%|███ | 490/1572 [1:01:41<2:06:59, 7.04s/it] {'loss': 0.6294, 'learning_rate': 1.4199475065616798e-05, 'epoch': 1.24} 31%|███ | 490/1572 [1:01:41<2:06:59, 7.04s/it] 31%|███ | 491/1572 [1:01:48<2:08:42, 7.14s/it] {'loss': 0.6858, 'learning_rate': 1.4186351706036747e-05, 'epoch': 1.25} 31%|███ | 491/1572 [1:01:48<2:08:42, 7.14s/it] 31%|███▏ | 492/1572 [1:01:56<2:08:59, 7.17s/it] {'loss': 0.7239, 'learning_rate': 1.4173228346456694e-05, 'epoch': 1.25} 31%|███▏ | 492/1572 [1:01:56<2:08:59, 7.17s/it] 31%|███▏ | 493/1572 [1:02:04<2:12:44, 7.38s/it] {'loss': 0.6474, 'learning_rate': 1.4160104986876641e-05, 'epoch': 1.25} 31%|███▏ | 493/1572 [1:02:04<2:12:44, 7.38s/it] 31%|███▏ | 494/1572 [1:02:11<2:10:59, 7.29s/it] {'loss': 0.7456, 'learning_rate': 1.414698162729659e-05, 'epoch': 1.25} 31%|███▏ | 494/1572 [1:02:11<2:10:59, 7.29s/it] 31%|███▏ | 495/1572 [1:02:18<2:09:23, 7.21s/it] 
{'loss': 0.7308, 'learning_rate': 1.4133858267716535e-05, 'epoch': 1.26} 31%|███▏ | 495/1572 [1:02:18<2:09:23, 7.21s/it] 32%|███▏ | 496/1572 [1:02:24<2:07:01, 7.08s/it] {'loss': 0.6171, 'learning_rate': 1.4120734908136484e-05, 'epoch': 1.26} 32%|███▏ | 496/1572 [1:02:24<2:07:01, 7.08s/it] 32%|███▏ | 497/1572 [1:02:32<2:10:19, 7.27s/it] {'loss': 0.6873, 'learning_rate': 1.4107611548556433e-05, 'epoch': 1.26} 32%|███▏ | 497/1572 [1:02:32<2:10:19, 7.27s/it] 32%|███▏ | 498/1572 [1:02:39<2:08:26, 7.18s/it] {'loss': 0.6792, 'learning_rate': 1.4094488188976379e-05, 'epoch': 1.26} 32%|███▏ | 498/1572 [1:02:39<2:08:26, 7.18s/it] 32%|███▏ | 499/1572 [1:02:46<2:07:37, 7.14s/it] {'loss': 0.6495, 'learning_rate': 1.4081364829396327e-05, 'epoch': 1.27} 32%|███▏ | 499/1572 [1:02:46<2:07:37, 7.14s/it] 32%|███▏ | 500/1572 [1:02:54<2:09:13, 7.23s/it] {'loss': 0.6466, 'learning_rate': 1.4068241469816274e-05, 'epoch': 1.27} 32%|███▏ | 500/1572 [1:02:54<2:09:13, 7.23s/it] 32%|███▏ | 501/1572 [1:03:00<2:05:44, 7.04s/it] {'loss': 0.6188, 'learning_rate': 1.4055118110236222e-05, 'epoch': 1.27} 32%|███▏ | 501/1572 [1:03:00<2:05:44, 7.04s/it] 32%|███▏ | 502/1572 [1:03:08<2:09:19, 7.25s/it] {'loss': 0.7188, 'learning_rate': 1.4041994750656169e-05, 'epoch': 1.27} 32%|███▏ | 502/1572 [1:03:08<2:09:19, 7.25s/it] 32%|███▏ | 503/1572 [1:03:15<2:09:54, 7.29s/it] {'loss': 0.6322, 'learning_rate': 1.4028871391076117e-05, 'epoch': 1.28} 32%|███▏ | 503/1572 [1:03:15<2:09:54, 7.29s/it] 32%|███▏ | 504/1572 [1:03:23<2:10:08, 7.31s/it] {'loss': 0.719, 'learning_rate': 1.4015748031496063e-05, 'epoch': 1.28} 32%|███▏ | 504/1572 [1:03:23<2:10:08, 7.31s/it] 32%|███▏ | 505/1572 [1:03:30<2:08:43, 7.24s/it] {'loss': 0.6446, 'learning_rate': 1.4002624671916012e-05, 'epoch': 1.28} 32%|███▏ | 505/1572 [1:03:30<2:08:43, 7.24s/it] 32%|███▏ | 506/1572 [1:03:37<2:08:22, 7.23s/it] {'loss': 0.6482, 'learning_rate': 1.398950131233596e-05, 'epoch': 1.28} 32%|███▏ | 506/1572 [1:03:37<2:08:22, 7.23s/it] 32%|███▏ | 507/1572 
[1:03:44<2:09:28, 7.29s/it] {'loss': 0.727, 'learning_rate': 1.3976377952755906e-05, 'epoch': 1.29} 32%|███▏ | 507/1572 [1:03:44<2:09:28, 7.29s/it] 32%|███▏ | 508/1572 [1:03:52<2:10:23, 7.35s/it] {'loss': 0.5877, 'learning_rate': 1.3963254593175855e-05, 'epoch': 1.29} 32%|███▏ | 508/1572 [1:03:52<2:10:23, 7.35s/it] 32%|███▏ | 509/1572 [1:03:59<2:09:41, 7.32s/it] {'loss': 0.626, 'learning_rate': 1.39501312335958e-05, 'epoch': 1.29} 32%|███▏ | 509/1572 [1:03:59<2:09:41, 7.32s/it] 32%|███▏ | 510/1572 [1:04:06<2:07:27, 7.20s/it] {'loss': 0.7112, 'learning_rate': 1.3937007874015749e-05, 'epoch': 1.29} 32%|███▏ | 510/1572 [1:04:06<2:07:27, 7.20s/it] 33%|███▎ | 511/1572 [1:04:14<2:09:45, 7.34s/it] {'loss': 0.6759, 'learning_rate': 1.3923884514435698e-05, 'epoch': 1.3} 33%|███▎ | 511/1572 [1:04:14<2:09:45, 7.34s/it] 33%|███▎ | 512/1572 [1:04:21<2:11:55, 7.47s/it] {'loss': 0.6676, 'learning_rate': 1.3910761154855643e-05, 'epoch': 1.3} 33%|███▎ | 512/1572 [1:04:21<2:11:55, 7.47s/it] 33%|███▎ | 513/1572 [1:04:29<2:11:24, 7.44s/it] {'loss': 0.6789, 'learning_rate': 1.3897637795275592e-05, 'epoch': 1.3} 33%|███▎ | 513/1572 [1:04:29<2:11:24, 7.44s/it] 33%|███▎ | 514/1572 [1:04:37<2:13:47, 7.59s/it] {'loss': 0.7629, 'learning_rate': 1.388451443569554e-05, 'epoch': 1.3} 33%|███▎ | 514/1572 [1:04:37<2:13:47, 7.59s/it] 33%|███▎ | 515/1572 [1:04:44<2:11:20, 7.46s/it] {'loss': 0.6798, 'learning_rate': 1.3871391076115486e-05, 'epoch': 1.31} 33%|███▎ | 515/1572 [1:04:44<2:11:20, 7.46s/it] 33%|███▎ | 516/1572 [1:04:51<2:10:09, 7.40s/it] {'loss': 0.7309, 'learning_rate': 1.3858267716535433e-05, 'epoch': 1.31} 33%|███▎ | 516/1572 [1:04:51<2:10:09, 7.40s/it] 33%|███▎ | 517/1572 [1:04:59<2:12:14, 7.52s/it] {'loss': 0.6596, 'learning_rate': 1.3845144356955382e-05, 'epoch': 1.31} 33%|███▎ | 517/1572 [1:04:59<2:12:14, 7.52s/it] 33%|███▎ | 518/1572 [1:05:06<2:10:58, 7.46s/it] {'loss': 0.6956, 'learning_rate': 1.3832020997375328e-05, 'epoch': 1.32} 33%|███▎ | 518/1572 [1:05:06<2:10:58, 7.46s/it] 
33%|███▎ | 519/1572 [1:05:14<2:10:29, 7.43s/it] {'loss': 0.712, 'learning_rate': 1.3818897637795276e-05, 'epoch': 1.32} 33%|███▎ | 519/1572 [1:05:14<2:10:29, 7.43s/it] 33%|███▎ | 520/1572 [1:05:21<2:07:49, 7.29s/it] {'loss': 0.7014, 'learning_rate': 1.3805774278215225e-05, 'epoch': 1.32} 33%|███▎ | 520/1572 [1:05:21<2:07:49, 7.29s/it] 33%|███▎ | 521/1572 [1:05:27<2:05:11, 7.15s/it] {'loss': 0.6591, 'learning_rate': 1.379265091863517e-05, 'epoch': 1.32} 33%|███▎ | 521/1572 [1:05:27<2:05:11, 7.15s/it] 33%|███▎ | 522/1572 [1:05:35<2:09:31, 7.40s/it] {'loss': 0.6482, 'learning_rate': 1.377952755905512e-05, 'epoch': 1.33} 33%|███▎ | 522/1572 [1:05:35<2:09:31, 7.40s/it] 33%|███▎ | 523/1572 [1:05:43<2:09:01, 7.38s/it] {'loss': 0.6946, 'learning_rate': 1.3766404199475068e-05, 'epoch': 1.33} 33%|███▎ | 523/1572 [1:05:43<2:09:01, 7.38s/it] 33%|███▎ | 524/1572 [1:05:50<2:06:50, 7.26s/it] {'loss': 0.5933, 'learning_rate': 1.3753280839895014e-05, 'epoch': 1.33} 33%|███▎ | 524/1572 [1:05:50<2:06:50, 7.26s/it] 33%|███▎ | 525/1572 [1:05:57<2:07:50, 7.33s/it] {'loss': 0.6783, 'learning_rate': 1.3740157480314963e-05, 'epoch': 1.33} 33%|███▎ | 525/1572 [1:05:57<2:07:50, 7.33s/it] 33%|███▎ | 526/1572 [1:06:04<2:04:56, 7.17s/it] {'loss': 0.6338, 'learning_rate': 1.372703412073491e-05, 'epoch': 1.34} 33%|███▎ | 526/1572 [1:06:04<2:04:56, 7.17s/it] 34%|███▎ | 527/1572 [1:06:11<2:03:28, 7.09s/it] {'loss': 0.6873, 'learning_rate': 1.3713910761154857e-05, 'epoch': 1.34} 34%|███▎ | 527/1572 [1:06:11<2:03:28, 7.09s/it] 34%|███▎ | 528/1572 [1:06:18<2:04:10, 7.14s/it] {'loss': 0.5833, 'learning_rate': 1.3700787401574804e-05, 'epoch': 1.34} 34%|███▎ | 528/1572 [1:06:18<2:04:10, 7.14s/it] 34%|███▎ | 529/1572 [1:06:25<2:03:29, 7.10s/it] {'loss': 0.6784, 'learning_rate': 1.3687664041994753e-05, 'epoch': 1.34} 34%|███▎ | 529/1572 [1:06:25<2:03:29, 7.10s/it] 34%|███▎ | 530/1572 [1:06:32<2:03:33, 7.11s/it] {'loss': 0.6976, 'learning_rate': 1.3674540682414698e-05, 'epoch': 1.35} 34%|███▎ | 530/1572 
[1:06:32<2:03:33, 7.11s/it] 34%|███▍ | 531/1572 [1:06:39<2:01:29, 7.00s/it] {'loss': 0.7271, 'learning_rate': 1.3661417322834647e-05, 'epoch': 1.35} 34%|███▍ | 531/1572 [1:06:39<2:01:29, 7.00s/it] 34%|███▍ | 532/1572 [1:06:46<2:02:14, 7.05s/it] {'loss': 0.6426, 'learning_rate': 1.3648293963254596e-05, 'epoch': 1.35} 34%|███▍ | 532/1572 [1:06:46<2:02:14, 7.05s/it] 34%|███▍ | 533/1572 [1:06:54<2:04:27, 7.19s/it] {'loss': 0.6955, 'learning_rate': 1.3635170603674541e-05, 'epoch': 1.35} 34%|███▍ | 533/1572 [1:06:54<2:04:27, 7.19s/it] 34%|███▍ | 534/1572 [1:07:02<2:09:21, 7.48s/it] {'loss': 0.7484, 'learning_rate': 1.362204724409449e-05, 'epoch': 1.36} 34%|███▍ | 534/1572 [1:07:02<2:09:21, 7.48s/it] 34%|███▍ | 535/1572 [1:07:09<2:06:44, 7.33s/it] {'loss': 0.6906, 'learning_rate': 1.3608923884514437e-05, 'epoch': 1.36} 34%|███▍ | 535/1572 [1:07:09<2:06:44, 7.33s/it] 34%|███▍ | 536/1572 [1:07:16<2:05:57, 7.29s/it] {'loss': 0.6727, 'learning_rate': 1.3595800524934384e-05, 'epoch': 1.36} 34%|███▍ | 536/1572 [1:07:16<2:05:57, 7.29s/it] 34%|███▍ | 537/1572 [1:07:24<2:06:24, 7.33s/it] {'loss': 0.6647, 'learning_rate': 1.3582677165354331e-05, 'epoch': 1.36} 34%|███▍ | 537/1572 [1:07:24<2:06:24, 7.33s/it] 34%|███▍ | 538/1572 [1:07:30<2:04:13, 7.21s/it] {'loss': 0.6372, 'learning_rate': 1.356955380577428e-05, 'epoch': 1.37} 34%|███▍ | 538/1572 [1:07:30<2:04:13, 7.21s/it] 34%|███▍ | 539/1572 [1:07:38<2:03:22, 7.17s/it] {'loss': 0.5744, 'learning_rate': 1.3556430446194227e-05, 'epoch': 1.37} 34%|███▍ | 539/1572 [1:07:38<2:03:22, 7.17s/it] 34%|███▍ | 540/1572 [1:07:45<2:05:10, 7.28s/it] {'loss': 0.625, 'learning_rate': 1.3543307086614174e-05, 'epoch': 1.37} 34%|███▍ | 540/1572 [1:07:45<2:05:10, 7.28s/it] 34%|███▍ | 541/1572 [1:07:53<2:09:47, 7.55s/it] {'loss': 0.6893, 'learning_rate': 1.3530183727034121e-05, 'epoch': 1.37} 34%|███▍ | 541/1572 [1:07:53<2:09:47, 7.55s/it] 34%|███▍ | 542/1572 [1:08:01<2:11:26, 7.66s/it] {'loss': 0.6194, 'learning_rate': 1.3517060367454069e-05, 'epoch': 
1.38} 34%|███▍ | 542/1572 [1:08:01<2:11:26, 7.66s/it] 35%|███▍ | 543/1572 [1:08:08<2:06:59, 7.41s/it] {'loss': 0.6335, 'learning_rate': 1.3503937007874017e-05, 'epoch': 1.38} 35%|███▍ | 543/1572 [1:08:08<2:06:59, 7.41s/it] 35%|███▍ | 544/1572 [1:08:15<2:04:15, 7.25s/it] {'loss': 0.6877, 'learning_rate': 1.3490813648293963e-05, 'epoch': 1.38} 35%|███▍ | 544/1572 [1:08:15<2:04:15, 7.25s/it] 35%|███▍ | 545/1572 [1:08:22<2:02:28, 7.16s/it] {'loss': 0.6932, 'learning_rate': 1.3477690288713912e-05, 'epoch': 1.38} 35%|███▍ | 545/1572 [1:08:22<2:02:28, 7.16s/it] 35%|███▍ | 546/1572 [1:08:29<2:01:48, 7.12s/it] {'loss': 0.7087, 'learning_rate': 1.346456692913386e-05, 'epoch': 1.39} 35%|███▍ | 546/1572 [1:08:29<2:01:48, 7.12s/it] 35%|███▍ | 547/1572 [1:08:36<2:02:11, 7.15s/it] {'loss': 0.6365, 'learning_rate': 1.3451443569553806e-05, 'epoch': 1.39} 35%|███▍ | 547/1572 [1:08:36<2:02:11, 7.15s/it] 35%|███▍ | 548/1572 [1:08:43<2:02:20, 7.17s/it] {'loss': 0.6548, 'learning_rate': 1.3438320209973755e-05, 'epoch': 1.39} 35%|███▍ | 548/1572 [1:08:43<2:02:20, 7.17s/it] 35%|███▍ | 549/1572 [1:08:51<2:05:16, 7.35s/it] {'loss': 0.6501, 'learning_rate': 1.3425196850393702e-05, 'epoch': 1.39} 35%|███▍ | 549/1572 [1:08:51<2:05:16, 7.35s/it] 35%|███▍ | 550/1572 [1:08:58<2:05:40, 7.38s/it] {'loss': 0.8315, 'learning_rate': 1.3412073490813649e-05, 'epoch': 1.4} 35%|███▍ | 550/1572 [1:08:58<2:05:40, 7.38s/it] 35%|███▌ | 551/1572 [1:09:05<2:02:50, 7.22s/it] {'loss': 0.7062, 'learning_rate': 1.3398950131233596e-05, 'epoch': 1.4} 35%|███▌ | 551/1572 [1:09:05<2:02:50, 7.22s/it] 35%|███▌ | 552/1572 [1:09:13<2:04:34, 7.33s/it] {'loss': 0.6554, 'learning_rate': 1.3385826771653545e-05, 'epoch': 1.4} 35%|███▌ | 552/1572 [1:09:13<2:04:34, 7.33s/it] 35%|███▌ | 553/1572 [1:09:20<2:03:21, 7.26s/it] {'loss': 0.6587, 'learning_rate': 1.337270341207349e-05, 'epoch': 1.4} 35%|███▌ | 553/1572 [1:09:20<2:03:21, 7.26s/it] 35%|███▌ | 554/1572 [1:09:27<2:03:17, 7.27s/it] {'loss': 0.6427, 'learning_rate': 
1.3359580052493439e-05, 'epoch': 1.41} 35%|███▌ | 554/1572 [1:09:27<2:03:17, 7.27s/it] 35%|███▌ | 555/1572 [1:09:34<2:00:42, 7.12s/it] {'loss': 0.6918, 'learning_rate': 1.3346456692913388e-05, 'epoch': 1.41} 35%|███▌ | 555/1572 [1:09:34<2:00:42, 7.12s/it] 35%|███▌ | 556/1572 [1:09:41<2:01:22, 7.17s/it] {'loss': 0.6768, 'learning_rate': 1.3333333333333333e-05, 'epoch': 1.41} 35%|███▌ | 556/1572 [1:09:41<2:01:22, 7.17s/it] 35%|███▌ | 557/1572 [1:09:49<2:01:56, 7.21s/it] {'loss': 0.6826, 'learning_rate': 1.3320209973753282e-05, 'epoch': 1.41} 35%|███▌ | 557/1572 [1:09:49<2:01:56, 7.21s/it] 35%|███▌ | 558/1572 [1:09:56<2:00:02, 7.10s/it] {'loss': 0.6255, 'learning_rate': 1.3307086614173231e-05, 'epoch': 1.42} 35%|███▌ | 558/1572 [1:09:56<2:00:02, 7.10s/it] 36%|███▌ | 559/1572 [1:10:03<2:00:12, 7.12s/it] {'loss': 0.6723, 'learning_rate': 1.3293963254593176e-05, 'epoch': 1.42} 36%|███▌ | 559/1572 [1:10:03<2:00:12, 7.12s/it] 36%|███▌ | 560/1572 [1:10:10<2:01:25, 7.20s/it] {'loss': 0.6109, 'learning_rate': 1.3280839895013125e-05, 'epoch': 1.42} 36%|███▌ | 560/1572 [1:10:10<2:01:25, 7.20s/it] 36%|███▌ | 561/1572 [1:10:18<2:04:27, 7.39s/it] {'loss': 0.6529, 'learning_rate': 1.3267716535433072e-05, 'epoch': 1.42} 36%|███▌ | 561/1572 [1:10:18<2:04:27, 7.39s/it] 36%|███▌ | 562/1572 [1:10:25<2:03:17, 7.32s/it] {'loss': 0.6983, 'learning_rate': 1.325459317585302e-05, 'epoch': 1.43} 36%|███▌ | 562/1572 [1:10:25<2:03:17, 7.32s/it] 36%|███▌ | 563/1572 [1:10:32<2:03:26, 7.34s/it] {'loss': 0.6914, 'learning_rate': 1.3241469816272966e-05, 'epoch': 1.43} 36%|███▌ | 563/1572 [1:10:32<2:03:26, 7.34s/it] 36%|███▌ | 564/1572 [1:10:40<2:03:51, 7.37s/it] {'loss': 0.6383, 'learning_rate': 1.3228346456692915e-05, 'epoch': 1.43} 36%|███▌ | 564/1572 [1:10:40<2:03:51, 7.37s/it] 36%|███▌ | 565/1572 [1:10:47<2:00:52, 7.20s/it] {'loss': 0.6393, 'learning_rate': 1.321522309711286e-05, 'epoch': 1.43} 36%|███▌ | 565/1572 [1:10:47<2:00:52, 7.20s/it] 36%|███▌ | 566/1572 [1:10:54<2:01:41, 7.26s/it] 
{'loss': 0.7771, 'learning_rate': 1.320209973753281e-05, 'epoch': 1.44} 36%|███▌ | 566/1572 [1:10:54<2:01:41, 7.26s/it] 36%|███▌ | 567/1572 [1:11:02<2:03:37, 7.38s/it] {'loss': 0.8403, 'learning_rate': 1.3188976377952758e-05, 'epoch': 1.44} 36%|███▌ | 567/1572 [1:11:02<2:03:37, 7.38s/it] 36%|███▌ | 568/1572 [1:11:09<2:02:56, 7.35s/it] {'loss': 0.728, 'learning_rate': 1.3175853018372704e-05, 'epoch': 1.44} 36%|███▌ | 568/1572 [1:11:09<2:02:56, 7.35s/it] 36%|███▌ | 569/1572 [1:11:16<2:00:48, 7.23s/it] {'loss': 0.7123, 'learning_rate': 1.3162729658792653e-05, 'epoch': 1.44} 36%|███▌ | 569/1572 [1:11:16<2:00:48, 7.23s/it] 36%|███▋ | 570/1572 [1:11:23<2:01:59, 7.30s/it] {'loss': 0.6901, 'learning_rate': 1.3149606299212601e-05, 'epoch': 1.45} 36%|███▋ | 570/1572 [1:11:23<2:01:59, 7.30s/it] 36%|███▋ | 571/1572 [1:11:31<2:00:34, 7.23s/it] {'loss': 0.6541, 'learning_rate': 1.3136482939632547e-05, 'epoch': 1.45} 36%|███▋ | 571/1572 [1:11:31<2:00:34, 7.23s/it] 36%|███▋ | 572/1572 [1:11:38<1:59:28, 7.17s/it] {'loss': 0.6734, 'learning_rate': 1.3123359580052496e-05, 'epoch': 1.45} 36%|███▋ | 572/1572 [1:11:38<1:59:28, 7.17s/it] 36%|███▋ | 573/1572 [1:11:45<1:58:33, 7.12s/it] {'loss': 0.6712, 'learning_rate': 1.3110236220472441e-05, 'epoch': 1.45} 36%|███▋ | 573/1572 [1:11:45<1:58:33, 7.12s/it] 37%|███▋ | 574/1572 [1:11:52<1:58:38, 7.13s/it] {'loss': 0.6116, 'learning_rate': 1.309711286089239e-05, 'epoch': 1.46} 37%|███▋ | 574/1572 [1:11:52<1:58:38, 7.13s/it] 37%|███▋ | 575/1572 [1:11:59<1:58:03, 7.11s/it] {'loss': 0.6835, 'learning_rate': 1.3083989501312337e-05, 'epoch': 1.46} 37%|███▋ | 575/1572 [1:11:59<1:58:03, 7.11s/it] 37%|███▋ | 576/1572 [1:12:06<1:57:02, 7.05s/it] {'loss': 0.6363, 'learning_rate': 1.3070866141732284e-05, 'epoch': 1.46} 37%|███▋ | 576/1572 [1:12:06<1:57:02, 7.05s/it] 37%|███▋ | 577/1572 [1:12:13<1:59:12, 7.19s/it] {'loss': 0.6991, 'learning_rate': 1.3057742782152231e-05, 'epoch': 1.46} 37%|███▋ | 577/1572 [1:12:13<1:59:12, 7.19s/it] 37%|███▋ | 578/1572 
[1:12:21<2:03:21, 7.45s/it] {'loss': 0.6465, 'learning_rate': 1.304461942257218e-05, 'epoch': 1.47} 37%|███▋ | 578/1572 [1:12:21<2:03:21, 7.45s/it] 37%|███▋ | 579/1572 [1:12:28<2:01:44, 7.36s/it] {'loss': 0.6969, 'learning_rate': 1.3031496062992125e-05, 'epoch': 1.47} 37%|███▋ | 579/1572 [1:12:28<2:01:44, 7.36s/it] 37%|███▋ | 580/1572 [1:12:36<2:03:35, 7.48s/it] {'loss': 0.7316, 'learning_rate': 1.3018372703412074e-05, 'epoch': 1.47} 37%|███▋ | 580/1572 [1:12:36<2:03:35, 7.48s/it] 37%|███▋ | 581/1572 [1:12:43<2:01:27, 7.35s/it] {'loss': 0.6521, 'learning_rate': 1.3005249343832023e-05, 'epoch': 1.48} 37%|███▋ | 581/1572 [1:12:43<2:01:27, 7.35s/it] 37%|███▋ | 582/1572 [1:12:50<2:00:11, 7.28s/it] {'loss': 0.6664, 'learning_rate': 1.2992125984251968e-05, 'epoch': 1.48} 37%|███▋ | 582/1572 [1:12:50<2:00:11, 7.28s/it] 37%|███▋ | 583/1572 [1:12:57<1:59:29, 7.25s/it] {'loss': 0.6338, 'learning_rate': 1.2979002624671917e-05, 'epoch': 1.48} 37%|███▋ | 583/1572 [1:12:57<1:59:29, 7.25s/it] 37%|███▋ | 584/1572 [1:13:04<1:56:57, 7.10s/it] {'loss': 0.637, 'learning_rate': 1.2965879265091864e-05, 'epoch': 1.48} 37%|███▋ | 584/1572 [1:13:04<1:56:57, 7.10s/it] 37%|███▋ | 585/1572 [1:13:12<2:00:17, 7.31s/it] {'loss': 0.6806, 'learning_rate': 1.2952755905511812e-05, 'epoch': 1.49} 37%|███▋ | 585/1572 [1:13:12<2:00:17, 7.31s/it] 37%|███▋ | 586/1572 [1:13:20<2:01:24, 7.39s/it] {'loss': 0.7268, 'learning_rate': 1.293963254593176e-05, 'epoch': 1.49} 37%|███▋ | 586/1572 [1:13:20<2:01:24, 7.39s/it] 37%|███▋ | 587/1572 [1:13:27<1:59:18, 7.27s/it] {'loss': 0.6092, 'learning_rate': 1.2926509186351707e-05, 'epoch': 1.49} 37%|███▋ | 587/1572 [1:13:27<1:59:18, 7.27s/it] 37%|███▋ | 588/1572 [1:13:34<2:00:05, 7.32s/it] {'loss': 0.7613, 'learning_rate': 1.2913385826771655e-05, 'epoch': 1.49} 37%|███▋ | 588/1572 [1:13:34<2:00:05, 7.32s/it] 37%|███▋ | 589/1572 [1:13:41<1:58:19, 7.22s/it] {'loss': 0.6138, 'learning_rate': 1.2900262467191602e-05, 'epoch': 1.5} 37%|███▋ | 589/1572 [1:13:41<1:58:19, 
7.22s/it] 38%|███▊ | 590/1572 [1:13:49<1:59:35, 7.31s/it] {'loss': 0.6425, 'learning_rate': 1.288713910761155e-05, 'epoch': 1.5} 38%|███▊ | 590/1572 [1:13:49<1:59:35, 7.31s/it] 38%|███▊ | 591/1572 [1:13:56<1:59:44, 7.32s/it] {'loss': 0.6976, 'learning_rate': 1.2874015748031496e-05, 'epoch': 1.5} 38%|███▊ | 591/1572 [1:13:56<1:59:44, 7.32s/it] 38%|███▊ | 592/1572 [1:14:03<2:00:40, 7.39s/it] {'loss': 0.7524, 'learning_rate': 1.2860892388451445e-05, 'epoch': 1.5} 38%|███▊ | 592/1572 [1:14:03<2:00:40, 7.39s/it] 38%|███▊ | 593/1572 [1:14:11<2:00:08, 7.36s/it] {'loss': 0.6862, 'learning_rate': 1.2847769028871394e-05, 'epoch': 1.51} 38%|███▊ | 593/1572 [1:14:11<2:00:08, 7.36s/it] 38%|███▊ | 594/1572 [1:14:18<1:58:05, 7.24s/it] {'loss': 0.6916, 'learning_rate': 1.2834645669291339e-05, 'epoch': 1.51} 38%|███▊ | 594/1572 [1:14:18<1:58:05, 7.24s/it] 38%|███▊ | 595/1572 [1:14:25<1:58:14, 7.26s/it] {'loss': 0.6429, 'learning_rate': 1.2821522309711288e-05, 'epoch': 1.51} 38%|███▊ | 595/1572 [1:14:25<1:58:14, 7.26s/it] 38%|███▊ | 596/1572 [1:14:32<1:56:28, 7.16s/it] {'loss': 0.7474, 'learning_rate': 1.2808398950131235e-05, 'epoch': 1.51} 38%|███▊ | 596/1572 [1:14:32<1:56:28, 7.16s/it] 38%|███▊ | 597/1572 [1:14:39<1:56:01, 7.14s/it] {'loss': 0.6388, 'learning_rate': 1.2795275590551182e-05, 'epoch': 1.52} 38%|███▊ | 597/1572 [1:14:39<1:56:01, 7.14s/it] 38%|███▊ | 598/1572 [1:14:46<1:57:01, 7.21s/it] {'loss': 0.7393, 'learning_rate': 1.2782152230971129e-05, 'epoch': 1.52} 38%|███▊ | 598/1572 [1:14:46<1:57:01, 7.21s/it] 38%|███▊ | 599/1572 [1:14:54<1:57:07, 7.22s/it] {'loss': 0.632, 'learning_rate': 1.2769028871391078e-05, 'epoch': 1.52} 38%|███▊ | 599/1572 [1:14:54<1:57:07, 7.22s/it] 38%|███▊ | 600/1572 [1:15:01<1:56:41, 7.20s/it] {'loss': 0.6251, 'learning_rate': 1.2755905511811025e-05, 'epoch': 1.52} 38%|███▊ | 600/1572 [1:15:01<1:56:41, 7.20s/it] 38%|███▊ | 601/1572 [1:15:08<1:58:27, 7.32s/it] {'loss': 0.6984, 'learning_rate': 1.2742782152230972e-05, 'epoch': 1.53} 38%|███▊ | 
601/1572 [1:15:08<1:58:27, 7.32s/it] 38%|███▊ | 602/1572 [1:15:16<1:57:55, 7.29s/it] {'loss': 0.6141, 'learning_rate': 1.2729658792650921e-05, 'epoch': 1.53} 38%|███▊ | 602/1572 [1:15:16<1:57:55, 7.29s/it] 38%|███▊ | 603/1572 [1:15:23<1:58:19, 7.33s/it] {'loss': 0.6312, 'learning_rate': 1.2716535433070866e-05, 'epoch': 1.53} 38%|███▊ | 603/1572 [1:15:23<1:58:19, 7.33s/it] 38%|███▊ | 604/1572 [1:15:31<2:00:32, 7.47s/it] {'loss': 0.5493, 'learning_rate': 1.2703412073490815e-05, 'epoch': 1.53} 38%|███▊ | 604/1572 [1:15:31<2:00:32, 7.47s/it] 38%|███▊ | 605/1572 [1:15:38<1:57:07, 7.27s/it] {'loss': 0.6029, 'learning_rate': 1.269028871391076e-05, 'epoch': 1.54} 38%|███▊ | 605/1572 [1:15:38<1:57:07, 7.27s/it] 39%|███▊ | 606/1572 [1:15:45<1:57:38, 7.31s/it] {'loss': 0.6128, 'learning_rate': 1.267716535433071e-05, 'epoch': 1.54} 39%|███▊ | 606/1572 [1:15:45<1:57:38, 7.31s/it] 39%|███▊ | 607/1572 [1:15:52<1:54:45, 7.13s/it] {'loss': 0.6787, 'learning_rate': 1.2664041994750658e-05, 'epoch': 1.54} 39%|███▊ | 607/1572 [1:15:52<1:54:45, 7.13s/it] 39%|███▊ | 608/1572 [1:15:59<1:57:03, 7.29s/it] {'loss': 0.6808, 'learning_rate': 1.2650918635170604e-05, 'epoch': 1.54} 39%|███▊ | 608/1572 [1:15:59<1:57:03, 7.29s/it] 39%|███▊ | 609/1572 [1:16:06<1:55:16, 7.18s/it] {'loss': 0.7465, 'learning_rate': 1.2637795275590552e-05, 'epoch': 1.55} 39%|███▊ | 609/1572 [1:16:06<1:55:16, 7.18s/it] 39%|███▉ | 610/1572 [1:16:13<1:54:39, 7.15s/it] {'loss': 0.5791, 'learning_rate': 1.26246719160105e-05, 'epoch': 1.55} 39%|███▉ | 610/1572 [1:16:13<1:54:39, 7.15s/it] 39%|███▉ | 611/1572 [1:16:21<1:55:35, 7.22s/it] {'loss': 0.7059, 'learning_rate': 1.2611548556430447e-05, 'epoch': 1.55} 39%|███▉ | 611/1572 [1:16:21<1:55:35, 7.22s/it] 39%|███▉ | 612/1572 [1:16:28<1:54:28, 7.15s/it] {'loss': 0.6348, 'learning_rate': 1.2598425196850394e-05, 'epoch': 1.55} 39%|███▉ | 612/1572 [1:16:28<1:54:28, 7.15s/it] 39%|███▉ | 613/1572 [1:16:35<1:53:29, 7.10s/it] {'loss': 0.6382, 'learning_rate': 1.2585301837270343e-05, 
'epoch': 1.56} 39%|███▉ | 613/1572 [1:16:35<1:53:29, 7.10s/it] 39%|███▉ | 614/1572 [1:16:43<1:56:29, 7.30s/it] {'loss': 0.6318, 'learning_rate': 1.2572178477690288e-05, 'epoch': 1.56} 39%|███▉ | 614/1572 [1:16:43<1:56:29, 7.30s/it] 39%|███▉ | 615/1572 [1:16:50<1:57:20, 7.36s/it] {'loss': 0.6412, 'learning_rate': 1.2559055118110237e-05, 'epoch': 1.56} 39%|███▉ | 615/1572 [1:16:50<1:57:20, 7.36s/it] 39%|███▉ | 616/1572 [1:16:58<1:59:12, 7.48s/it] {'loss': 0.6998, 'learning_rate': 1.2545931758530186e-05, 'epoch': 1.56} 39%|███▉ | 616/1572 [1:16:58<1:59:12, 7.48s/it] 39%|███▉ | 617/1572 [1:17:04<1:55:04, 7.23s/it] {'loss': 0.6655, 'learning_rate': 1.2532808398950131e-05, 'epoch': 1.57} 39%|███▉ | 617/1572 [1:17:04<1:55:04, 7.23s/it] 39%|███▉ | 618/1572 [1:17:11<1:52:57, 7.10s/it] {'loss': 0.6509, 'learning_rate': 1.251968503937008e-05, 'epoch': 1.57} 39%|███▉ | 618/1572 [1:17:11<1:52:57, 7.10s/it] 39%|███▉ | 619/1572 [1:17:18<1:51:50, 7.04s/it] {'loss': 0.6825, 'learning_rate': 1.2506561679790029e-05, 'epoch': 1.57} 39%|███▉ | 619/1572 [1:17:18<1:51:50, 7.04s/it] 39%|███▉ | 620/1572 [1:17:25<1:52:35, 7.10s/it] {'loss': 0.7011, 'learning_rate': 1.2493438320209974e-05, 'epoch': 1.57} 39%|███▉ | 620/1572 [1:17:25<1:52:35, 7.10s/it] 40%|███▉ | 621/1572 [1:17:32<1:52:05, 7.07s/it] {'loss': 0.6242, 'learning_rate': 1.2480314960629923e-05, 'epoch': 1.58} 40%|███▉ | 621/1572 [1:17:32<1:52:05, 7.07s/it] 40%|███▉ | 622/1572 [1:17:39<1:51:30, 7.04s/it] {'loss': 0.6564, 'learning_rate': 1.246719160104987e-05, 'epoch': 1.58} 40%|███▉ | 622/1572 [1:17:39<1:51:30, 7.04s/it] 40%|███▉ | 623/1572 [1:17:47<1:52:35, 7.12s/it] {'loss': 0.744, 'learning_rate': 1.2454068241469817e-05, 'epoch': 1.58} 40%|███▉ | 623/1572 [1:17:47<1:52:35, 7.12s/it] 40%|███▉ | 624/1572 [1:17:54<1:51:30, 7.06s/it] {'loss': 0.6374, 'learning_rate': 1.2440944881889764e-05, 'epoch': 1.58} 40%|███▉ | 624/1572 [1:17:54<1:51:30, 7.06s/it] 40%|███▉ | 625/1572 [1:18:01<1:52:17, 7.11s/it] {'loss': 0.6434, 
'learning_rate': 1.2427821522309713e-05, 'epoch': 1.59} 40%|███▉ | 625/1572 [1:18:01<1:52:17, 7.11s/it] 40%|███▉ | 626/1572 [1:18:08<1:50:51, 7.03s/it] {'loss': 0.7321, 'learning_rate': 1.2414698162729659e-05, 'epoch': 1.59} 40%|███▉ | 626/1572 [1:18:08<1:50:51, 7.03s/it] 40%|███▉ | 627/1572 [1:18:15<1:50:21, 7.01s/it] {'loss': 0.6143, 'learning_rate': 1.2401574803149607e-05, 'epoch': 1.59} 40%|███▉ | 627/1572 [1:18:15<1:50:21, 7.01s/it] 40%|███▉ | 628/1572 [1:18:21<1:49:14, 6.94s/it] {'loss': 0.6818, 'learning_rate': 1.2388451443569556e-05, 'epoch': 1.59} 40%|███▉ | 628/1572 [1:18:21<1:49:14, 6.94s/it] 40%|████ | 629/1572 [1:18:28<1:49:17, 6.95s/it] {'loss': 0.6714, 'learning_rate': 1.2375328083989502e-05, 'epoch': 1.6} 40%|████ | 629/1572 [1:18:28<1:49:17, 6.95s/it] 40%|████ | 630/1572 [1:18:36<1:53:26, 7.23s/it] {'loss': 0.6964, 'learning_rate': 1.236220472440945e-05, 'epoch': 1.6} 40%|████ | 630/1572 [1:18:36<1:53:26, 7.23s/it] 40%|████ | 631/1572 [1:18:44<1:53:49, 7.26s/it] {'loss': 0.7756, 'learning_rate': 1.2349081364829398e-05, 'epoch': 1.6} 40%|████ | 631/1572 [1:18:44<1:53:49, 7.26s/it] 40%|████ | 632/1572 [1:18:51<1:52:46, 7.20s/it] {'loss': 0.7802, 'learning_rate': 1.2335958005249345e-05, 'epoch': 1.6} 40%|████ | 632/1572 [1:18:51<1:52:46, 7.20s/it] 40%|████ | 633/1572 [1:18:58<1:52:11, 7.17s/it] {'loss': 0.6589, 'learning_rate': 1.2322834645669293e-05, 'epoch': 1.61} 40%|████ | 633/1572 [1:18:58<1:52:11, 7.17s/it] 40%|████ | 634/1572 [1:19:05<1:51:44, 7.15s/it] {'loss': 0.7549, 'learning_rate': 1.230971128608924e-05, 'epoch': 1.61} 40%|████ | 634/1572 [1:19:05<1:51:44, 7.15s/it] 40%|████ | 635/1572 [1:19:12<1:51:10, 7.12s/it] {'loss': 0.7144, 'learning_rate': 1.2296587926509188e-05, 'epoch': 1.61} 40%|████ | 635/1572 [1:19:12<1:51:10, 7.12s/it] 40%|████ | 636/1572 [1:19:19<1:52:23, 7.20s/it] {'loss': 0.6988, 'learning_rate': 1.2283464566929135e-05, 'epoch': 1.61} 40%|████ | 636/1572 [1:19:19<1:52:23, 7.20s/it] 41%|████ | 637/1572 [1:19:27<1:55:07, 
7.39s/it] {'loss': 0.6001, 'learning_rate': 1.2270341207349082e-05, 'epoch': 1.62} 41%|████ | 637/1572 [1:19:27<1:55:07, 7.39s/it] 41%|████ | 638/1572 [1:19:34<1:53:55, 7.32s/it] {'loss': 0.6947, 'learning_rate': 1.2257217847769029e-05, 'epoch': 1.62} 41%|████ | 638/1572 [1:19:34<1:53:55, 7.32s/it] 41%|████ | 639/1572 [1:19:42<1:54:25, 7.36s/it] {'loss': 0.645, 'learning_rate': 1.2244094488188978e-05, 'epoch': 1.62} 41%|████ | 639/1572 [1:19:42<1:54:25, 7.36s/it] 41%|████ | 640/1572 [1:19:49<1:56:03, 7.47s/it] {'loss': 0.66, 'learning_rate': 1.2230971128608923e-05, 'epoch': 1.62} 41%|████ | 640/1572 [1:19:49<1:56:03, 7.47s/it] 41%|████ | 641/1572 [1:19:57<1:54:05, 7.35s/it] {'loss': 0.6684, 'learning_rate': 1.2217847769028872e-05, 'epoch': 1.63} 41%|████ | 641/1572 [1:19:57<1:54:05, 7.35s/it] 41%|████ | 642/1572 [1:20:04<1:56:14, 7.50s/it] {'loss': 0.7285, 'learning_rate': 1.2204724409448821e-05, 'epoch': 1.63} 41%|████ | 642/1572 [1:20:04<1:56:14, 7.50s/it] 41%|████ | 643/1572 [1:20:11<1:52:50, 7.29s/it] {'loss': 0.6325, 'learning_rate': 1.2191601049868766e-05, 'epoch': 1.63} 41%|████ | 643/1572 [1:20:11<1:52:50, 7.29s/it] 41%|████ | 644/1572 [1:20:19<1:53:16, 7.32s/it] {'loss': 0.6233, 'learning_rate': 1.2178477690288715e-05, 'epoch': 1.64} 41%|████ | 644/1572 [1:20:19<1:53:16, 7.32s/it] 41%|████ | 645/1572 [1:20:26<1:52:06, 7.26s/it] {'loss': 0.6203, 'learning_rate': 1.2165354330708662e-05, 'epoch': 1.64} 41%|████ | 645/1572 [1:20:26<1:52:06, 7.26s/it] 41%|████ | 646/1572 [1:20:33<1:50:26, 7.16s/it] {'loss': 0.6795, 'learning_rate': 1.215223097112861e-05, 'epoch': 1.64} 41%|████ | 646/1572 [1:20:33<1:50:26, 7.16s/it] 41%|████ | 647/1572 [1:20:40<1:52:21, 7.29s/it] {'loss': 0.6835, 'learning_rate': 1.2139107611548558e-05, 'epoch': 1.64} 41%|████ | 647/1572 [1:20:40<1:52:21, 7.29s/it] 41%|████ | 648/1572 [1:20:48<1:54:58, 7.47s/it] {'loss': 0.6351, 'learning_rate': 1.2125984251968505e-05, 'epoch': 1.65} 41%|████ | 648/1572 [1:20:48<1:54:58, 7.47s/it] 41%|████▏ | 
649/1572 [1:20:55<1:53:37, 7.39s/it] {'loss': 0.6153, 'learning_rate': 1.2112860892388452e-05, 'epoch': 1.65} 41%|████▏ | 649/1572 [1:20:55<1:53:37, 7.39s/it] 41%|████▏ | 650/1572 [1:21:03<1:54:20, 7.44s/it] {'loss': 0.6549, 'learning_rate': 1.20997375328084e-05, 'epoch': 1.65} 41%|████▏ | 650/1572 [1:21:03<1:54:20, 7.44s/it] 41%|████▏ | 651/1572 [1:21:10<1:53:32, 7.40s/it] {'loss': 0.5985, 'learning_rate': 1.2086614173228348e-05, 'epoch': 1.65} 41%|████▏ | 651/1572 [1:21:10<1:53:32, 7.40s/it] 41%|████▏ | 652/1572 [1:21:18<1:53:27, 7.40s/it] {'loss': 0.6196, 'learning_rate': 1.2073490813648294e-05, 'epoch': 1.66} 41%|████▏ | 652/1572 [1:21:18<1:53:27, 7.40s/it] 42%|████▏ | 653/1572 [1:21:25<1:52:42, 7.36s/it] {'loss': 0.6036, 'learning_rate': 1.2060367454068243e-05, 'epoch': 1.66} 42%|████▏ | 653/1572 [1:21:25<1:52:42, 7.36s/it] 42%|████▏ | 654/1572 [1:21:33<1:55:00, 7.52s/it] {'loss': 0.733, 'learning_rate': 1.2047244094488191e-05, 'epoch': 1.66} 42%|████▏ | 654/1572 [1:21:33<1:55:00, 7.52s/it] 42%|████▏ | 655/1572 [1:21:41<1:56:13, 7.61s/it] {'loss': 0.655, 'learning_rate': 1.2034120734908137e-05, 'epoch': 1.66} 42%|████▏ | 655/1572 [1:21:41<1:56:13, 7.61s/it] 42%|████▏ | 656/1572 [1:21:48<1:54:28, 7.50s/it] {'loss': 0.6435, 'learning_rate': 1.2020997375328086e-05, 'epoch': 1.67} 42%|████▏ | 656/1572 [1:21:48<1:54:28, 7.50s/it] 42%|████▏ | 657/1572 [1:21:55<1:52:39, 7.39s/it] {'loss': 0.6405, 'learning_rate': 1.2007874015748033e-05, 'epoch': 1.67} 42%|████▏ | 657/1572 [1:21:55<1:52:39, 7.39s/it] 42%|████▏ | 658/1572 [1:22:02<1:52:58, 7.42s/it] {'loss': 0.703, 'learning_rate': 1.199475065616798e-05, 'epoch': 1.67} 42%|████▏ | 658/1572 [1:22:02<1:52:58, 7.42s/it] 42%|████▏ | 659/1572 [1:22:10<1:52:44, 7.41s/it] {'loss': 0.6925, 'learning_rate': 1.1981627296587927e-05, 'epoch': 1.67} 42%|████▏ | 659/1572 [1:22:10<1:52:44, 7.41s/it] 42%|████▏ | 660/1572 [1:22:17<1:51:25, 7.33s/it] {'loss': 0.6758, 'learning_rate': 1.1968503937007876e-05, 'epoch': 1.68} 42%|████▏ | 
660/1572 [1:22:17<1:51:25, 7.33s/it] 42%|████▏ | 661/1572 [1:22:25<1:52:46, 7.43s/it] {'loss': 0.7093, 'learning_rate': 1.1955380577427821e-05, 'epoch': 1.68} 42%|████▏ | 661/1572 [1:22:25<1:52:46, 7.43s/it] 42%|████▏ | 662/1572 [1:22:32<1:50:28, 7.28s/it] {'loss': 0.6541, 'learning_rate': 1.194225721784777e-05, 'epoch': 1.68} 42%|████▏ | 662/1572 [1:22:32<1:50:28, 7.28s/it] 42%|████▏ | 663/1572 [1:22:39<1:50:09, 7.27s/it] {'loss': 0.5868, 'learning_rate': 1.1929133858267719e-05, 'epoch': 1.68} 42%|████▏ | 663/1572 [1:22:39<1:50:09, 7.27s/it] 42%|████▏ | 664/1572 [1:22:46<1:51:23, 7.36s/it] {'loss': 0.6866, 'learning_rate': 1.1916010498687664e-05, 'epoch': 1.69} 42%|████▏ | 664/1572 [1:22:46<1:51:23, 7.36s/it] 42%|████▏ | 665/1572 [1:22:53<1:50:06, 7.28s/it] {'loss': 0.6209, 'learning_rate': 1.1902887139107613e-05, 'epoch': 1.69} 42%|████▏ | 665/1572 [1:22:53<1:50:06, 7.28s/it] 42%|████▏ | 666/1572 [1:23:01<1:51:01, 7.35s/it] {'loss': 0.6694, 'learning_rate': 1.1889763779527562e-05, 'epoch': 1.69} 42%|████▏ | 666/1572 [1:23:01<1:51:01, 7.35s/it] 42%|████▏ | 667/1572 [1:23:08<1:51:04, 7.36s/it] {'loss': 0.8347, 'learning_rate': 1.1876640419947507e-05, 'epoch': 1.69} 42%|████▏ | 667/1572 [1:23:08<1:51:04, 7.36s/it] 42%|████▏ | 668/1572 [1:23:16<1:50:50, 7.36s/it] {'loss': 0.8037, 'learning_rate': 1.1863517060367456e-05, 'epoch': 1.7} 42%|████▏ | 668/1572 [1:23:16<1:50:50, 7.36s/it] 43%|████▎ | 669/1572 [1:23:23<1:51:02, 7.38s/it] {'loss': 0.6485, 'learning_rate': 1.1850393700787401e-05, 'epoch': 1.7} 43%|████▎ | 669/1572 [1:23:23<1:51:02, 7.38s/it] 43%|████▎ | 670/1572 [1:23:30<1:50:49, 7.37s/it] {'loss': 0.6699, 'learning_rate': 1.183727034120735e-05, 'epoch': 1.7} 43%|████▎ | 670/1572 [1:23:30<1:50:49, 7.37s/it] 43%|████▎ | 671/1572 [1:23:37<1:49:03, 7.26s/it] {'loss': 0.7128, 'learning_rate': 1.1824146981627297e-05, 'epoch': 1.7} 43%|████▎ | 671/1572 [1:23:37<1:49:03, 7.26s/it] 43%|████▎ | 672/1572 [1:23:45<1:49:12, 7.28s/it] {'loss': 0.6803, 'learning_rate': 
1.1811023622047245e-05, 'epoch': 1.71} 43%|████▎ | 672/1572 [1:23:45<1:49:12, 7.28s/it] 43%|████▎ | 673/1572 [1:23:52<1:50:30, 7.38s/it] {'loss': 0.6019, 'learning_rate': 1.1797900262467192e-05, 'epoch': 1.71} 43%|████▎ | 673/1572 [1:23:52<1:50:30, 7.38s/it] 43%|████▎ | 674/1572 [1:24:00<1:49:43, 7.33s/it] {'loss': 0.603, 'learning_rate': 1.178477690288714e-05, 'epoch': 1.71} 43%|████▎ | 674/1572 [1:24:00<1:49:43, 7.33s/it] 43%|████▎ | 675/1572 [1:24:07<1:48:39, 7.27s/it] {'loss': 0.7078, 'learning_rate': 1.1771653543307086e-05, 'epoch': 1.71} 43%|████▎ | 675/1572 [1:24:07<1:48:39, 7.27s/it] 43%|████▎ | 676/1572 [1:24:14<1:49:10, 7.31s/it] {'loss': 0.6425, 'learning_rate': 1.1758530183727035e-05, 'epoch': 1.72} 43%|████▎ | 676/1572 [1:24:14<1:49:10, 7.31s/it] 43%|████▎ | 677/1572 [1:24:21<1:48:15, 7.26s/it] {'loss': 0.6275, 'learning_rate': 1.1745406824146984e-05, 'epoch': 1.72} 43%|████▎ | 677/1572 [1:24:21<1:48:15, 7.26s/it] 43%|████▎ | 678/1572 [1:24:29<1:50:08, 7.39s/it] {'loss': 0.6658, 'learning_rate': 1.1732283464566929e-05, 'epoch': 1.72} 43%|████▎ | 678/1572 [1:24:29<1:50:08, 7.39s/it] 43%|████▎ | 679/1572 [1:24:36<1:48:59, 7.32s/it] {'loss': 0.6794, 'learning_rate': 1.1719160104986878e-05, 'epoch': 1.72} 43%|████▎ | 679/1572 [1:24:36<1:48:59, 7.32s/it] 43%|████▎ | 680/1572 [1:24:43<1:48:38, 7.31s/it] {'loss': 0.6406, 'learning_rate': 1.1706036745406827e-05, 'epoch': 1.73} 43%|████▎ | 680/1572 [1:24:43<1:48:38, 7.31s/it] 43%|████▎ | 681/1572 [1:24:51<1:49:16, 7.36s/it] {'loss': 0.6341, 'learning_rate': 1.1692913385826772e-05, 'epoch': 1.73} 43%|████▎ | 681/1572 [1:24:51<1:49:16, 7.36s/it] 43%|████▎ | 682/1572 [1:24:58<1:46:35, 7.19s/it] {'loss': 0.6533, 'learning_rate': 1.167979002624672e-05, 'epoch': 1.73} 43%|████▎ | 682/1572 [1:24:58<1:46:35, 7.19s/it] 43%|████▎ | 683/1572 [1:25:05<1:45:28, 7.12s/it] {'loss': 0.7331, 'learning_rate': 1.1666666666666668e-05, 'epoch': 1.73} 43%|████▎ | 683/1572 [1:25:05<1:45:28, 7.12s/it] 44%|████▎ | 684/1572 
[1:25:12<1:45:48, 7.15s/it] {'loss': 0.658, 'learning_rate': 1.1653543307086615e-05, 'epoch': 1.74} 44%|████▎ | 684/1572 [1:25:12<1:45:48, 7.15s/it] 44%|████▎ | 685/1572 [1:25:19<1:43:45, 7.02s/it] {'loss': 0.6517, 'learning_rate': 1.1640419947506562e-05, 'epoch': 1.74} 44%|████▎ | 685/1572 [1:25:19<1:43:45, 7.02s/it] 44%|████▎ | 686/1572 [1:25:26<1:44:47, 7.10s/it] {'loss': 0.6398, 'learning_rate': 1.1627296587926511e-05, 'epoch': 1.74} 44%|████▎ | 686/1572 [1:25:26<1:44:47, 7.10s/it] 44%|████▎ | 687/1572 [1:25:33<1:45:31, 7.15s/it] {'loss': 0.674, 'learning_rate': 1.1614173228346456e-05, 'epoch': 1.74} 44%|████▎ | 687/1572 [1:25:33<1:45:31, 7.15s/it] 44%|████▍ | 688/1572 [1:25:40<1:44:44, 7.11s/it] {'loss': 0.6094, 'learning_rate': 1.1601049868766405e-05, 'epoch': 1.75} 44%|████▍ | 688/1572 [1:25:40<1:44:44, 7.11s/it] 44%|████▍ | 689/1572 [1:25:47<1:43:59, 7.07s/it] {'loss': 0.5922, 'learning_rate': 1.1587926509186354e-05, 'epoch': 1.75} 44%|████▍ | 689/1572 [1:25:47<1:43:59, 7.07s/it] 44%|████▍ | 690/1572 [1:25:54<1:43:01, 7.01s/it] {'loss': 0.6527, 'learning_rate': 1.15748031496063e-05, 'epoch': 1.75} 44%|████▍ | 690/1572 [1:25:54<1:43:01, 7.01s/it] 44%|████▍ | 691/1572 [1:26:01<1:44:09, 7.09s/it] {'loss': 0.6102, 'learning_rate': 1.1561679790026248e-05, 'epoch': 1.75} 44%|████▍ | 691/1572 [1:26:01<1:44:09, 7.09s/it] 44%|████▍ | 692/1572 [1:26:08<1:43:03, 7.03s/it] {'loss': 0.6422, 'learning_rate': 1.1548556430446195e-05, 'epoch': 1.76} 44%|████▍ | 692/1572 [1:26:08<1:43:03, 7.03s/it] 44%|████▍ | 693/1572 [1:26:15<1:42:08, 6.97s/it] {'loss': 0.6254, 'learning_rate': 1.1535433070866142e-05, 'epoch': 1.76} 44%|████▍ | 693/1572 [1:26:15<1:42:08, 6.97s/it] 44%|████▍ | 694/1572 [1:26:22<1:42:52, 7.03s/it] {'loss': 0.5598, 'learning_rate': 1.1522309711286091e-05, 'epoch': 1.76} 44%|████▍ | 694/1572 [1:26:22<1:42:52, 7.03s/it] 44%|████▍ | 695/1572 [1:26:29<1:42:21, 7.00s/it] {'loss': 0.6273, 'learning_rate': 1.1509186351706038e-05, 'epoch': 1.76} 44%|████▍ | 695/1572 
[1:26:29<1:42:21, 7.00s/it] 44%|████▍ | 696/1572 [1:26:37<1:44:09, 7.13s/it] {'loss': 0.6342, 'learning_rate': 1.1496062992125985e-05, 'epoch': 1.77} 44%|████▍ | 696/1572 [1:26:37<1:44:09, 7.13s/it] 44%|████▍ | 697/1572 [1:26:44<1:44:19, 7.15s/it] {'loss': 0.7083, 'learning_rate': 1.1482939632545933e-05, 'epoch': 1.77} 44%|████▍ | 697/1572 [1:26:44<1:44:19, 7.15s/it] 44%|████▍ | 698/1572 [1:26:51<1:43:28, 7.10s/it] {'loss': 0.683, 'learning_rate': 1.1469816272965881e-05, 'epoch': 1.77} 44%|████▍ | 698/1572 [1:26:51<1:43:28, 7.10s/it] 44%|████▍ | 699/1572 [1:26:58<1:42:39, 7.06s/it] {'loss': 0.5848, 'learning_rate': 1.1456692913385827e-05, 'epoch': 1.77} 44%|████▍ | 699/1572 [1:26:58<1:42:39, 7.06s/it] 45%|████▍ | 700/1572 [1:27:06<1:48:23, 7.46s/it] {'loss': 0.7239, 'learning_rate': 1.1443569553805776e-05, 'epoch': 1.78} 45%|████▍ | 700/1572 [1:27:06<1:48:23, 7.46s/it] 45%|████▍ | 701/1572 [1:27:13<1:46:32, 7.34s/it] {'loss': 0.5985, 'learning_rate': 1.1430446194225721e-05, 'epoch': 1.78} 45%|████▍ | 701/1572 [1:27:13<1:46:32, 7.34s/it] 45%|████▍ | 702/1572 [1:27:20<1:45:27, 7.27s/it] {'loss': 0.699, 'learning_rate': 1.141732283464567e-05, 'epoch': 1.78} 45%|████▍ | 702/1572 [1:27:20<1:45:27, 7.27s/it] 45%|████▍ | 703/1572 [1:27:27<1:44:25, 7.21s/it] {'loss': 0.6788, 'learning_rate': 1.1404199475065619e-05, 'epoch': 1.78} 45%|████▍ | 703/1572 [1:27:27<1:44:25, 7.21s/it] 45%|████▍ | 704/1572 [1:27:34<1:43:12, 7.13s/it] {'loss': 0.6577, 'learning_rate': 1.1391076115485564e-05, 'epoch': 1.79} 45%|████▍ | 704/1572 [1:27:34<1:43:12, 7.13s/it] 45%|████▍ | 705/1572 [1:27:41<1:41:18, 7.01s/it] {'loss': 0.6856, 'learning_rate': 1.1377952755905513e-05, 'epoch': 1.79} 45%|████▍ | 705/1572 [1:27:41<1:41:18, 7.01s/it] 45%|████▍ | 706/1572 [1:27:48<1:40:51, 6.99s/it] {'loss': 0.6631, 'learning_rate': 1.136482939632546e-05, 'epoch': 1.79} 45%|████▍ | 706/1572 [1:27:48<1:40:51, 6.99s/it] 45%|████▍ | 707/1572 [1:27:56<1:44:13, 7.23s/it] {'loss': 0.7377, 'learning_rate': 
1.1351706036745407e-05, 'epoch': 1.79} 45%|████▍ | 707/1572 [1:27:56<1:44:13, 7.23s/it] 45%|████▌ | 708/1572 [1:28:03<1:44:56, 7.29s/it] {'loss': 0.7019, 'learning_rate': 1.1338582677165354e-05, 'epoch': 1.8} 45%|████▌ | 708/1572 [1:28:03<1:44:56, 7.29s/it] 45%|████▌ | 709/1572 [1:28:11<1:46:48, 7.43s/it] {'loss': 0.7614, 'learning_rate': 1.1325459317585303e-05, 'epoch': 1.8} 45%|████▌ | 709/1572 [1:28:11<1:46:48, 7.43s/it] 45%|████▌ | 710/1572 [1:28:18<1:47:18, 7.47s/it] {'loss': 0.6117, 'learning_rate': 1.131233595800525e-05, 'epoch': 1.8} 45%|████▌ | 710/1572 [1:28:18<1:47:18, 7.47s/it] 45%|████▌ | 711/1572 [1:28:26<1:47:31, 7.49s/it] {'loss': 0.6598, 'learning_rate': 1.1299212598425197e-05, 'epoch': 1.81} 45%|████▌ | 711/1572 [1:28:26<1:47:31, 7.49s/it] 45%|████▌ | 712/1572 [1:28:33<1:45:15, 7.34s/it] {'loss': 0.6825, 'learning_rate': 1.1286089238845146e-05, 'epoch': 1.81} 45%|████▌ | 712/1572 [1:28:33<1:45:15, 7.34s/it] 45%|████▌ | 713/1572 [1:28:40<1:44:29, 7.30s/it] {'loss': 0.6691, 'learning_rate': 1.1272965879265092e-05, 'epoch': 1.81} 45%|████▌ | 713/1572 [1:28:40<1:44:29, 7.30s/it] 45%|████▌ | 714/1572 [1:28:49<1:49:01, 7.62s/it] {'loss': 0.6533, 'learning_rate': 1.125984251968504e-05, 'epoch': 1.81} 45%|████▌ | 714/1572 [1:28:49<1:49:01, 7.62s/it] 45%|████▌ | 715/1572 [1:28:55<1:44:48, 7.34s/it] {'loss': 0.5929, 'learning_rate': 1.124671916010499e-05, 'epoch': 1.82} 45%|████▌ | 715/1572 [1:28:55<1:44:48, 7.34s/it] 46%|████▌ | 716/1572 [1:29:02<1:43:10, 7.23s/it] {'loss': 0.6919, 'learning_rate': 1.1233595800524935e-05, 'epoch': 1.82} 46%|████▌ | 716/1572 [1:29:02<1:43:10, 7.23s/it] 46%|████▌ | 717/1572 [1:29:11<1:47:30, 7.54s/it] {'loss': 0.7194, 'learning_rate': 1.1220472440944883e-05, 'epoch': 1.82} 46%|████▌ | 717/1572 [1:29:11<1:47:30, 7.54s/it] 46%|████▌ | 718/1572 [1:29:18<1:45:41, 7.43s/it] {'loss': 0.6258, 'learning_rate': 1.120734908136483e-05, 'epoch': 1.82} 46%|████▌ | 718/1572 [1:29:18<1:45:41, 7.43s/it] 46%|████▌ | 719/1572 
[1:29:25<1:46:25, 7.49s/it] {'loss': 0.7814, 'learning_rate': 1.1194225721784778e-05, 'epoch': 1.83} 46%|████▌ | 719/1572 [1:29:25<1:46:25, 7.49s/it] 46%|████▌ | 720/1572 [1:29:32<1:44:53, 7.39s/it] {'loss': 0.6551, 'learning_rate': 1.1181102362204725e-05, 'epoch': 1.83} 46%|████▌ | 720/1572 [1:29:32<1:44:53, 7.39s/it] 46%|████▌ | 721/1572 [1:29:39<1:42:54, 7.26s/it] {'loss': 0.6137, 'learning_rate': 1.1167979002624674e-05, 'epoch': 1.83} 46%|████▌ | 721/1572 [1:29:39<1:42:54, 7.26s/it] 46%|████▌ | 722/1572 [1:29:46<1:41:34, 7.17s/it] {'loss': 0.6571, 'learning_rate': 1.1154855643044619e-05, 'epoch': 1.83} 46%|████▌ | 722/1572 [1:29:46<1:41:34, 7.17s/it] 46%|████▌ | 723/1572 [1:29:54<1:42:29, 7.24s/it] {'loss': 0.6935, 'learning_rate': 1.1141732283464568e-05, 'epoch': 1.84} 46%|████▌ | 723/1572 [1:29:54<1:42:29, 7.24s/it] 46%|████▌ | 724/1572 [1:30:02<1:44:30, 7.39s/it] {'loss': 0.6791, 'learning_rate': 1.1128608923884517e-05, 'epoch': 1.84} 46%|████▌ | 724/1572 [1:30:02<1:44:30, 7.39s/it] 46%|████▌ | 725/1572 [1:30:09<1:43:58, 7.36s/it] {'loss': 0.6395, 'learning_rate': 1.1115485564304462e-05, 'epoch': 1.84} 46%|████▌ | 725/1572 [1:30:09<1:43:58, 7.36s/it] 46%|████▌ | 726/1572 [1:30:16<1:44:14, 7.39s/it] {'loss': 0.7327, 'learning_rate': 1.1102362204724411e-05, 'epoch': 1.84} 46%|████▌ | 726/1572 [1:30:16<1:44:14, 7.39s/it] 46%|████▌ | 727/1572 [1:30:23<1:41:29, 7.21s/it] {'loss': 0.5885, 'learning_rate': 1.108923884514436e-05, 'epoch': 1.85} 46%|████▌ | 727/1572 [1:30:23<1:41:29, 7.21s/it] 46%|████▋ | 728/1572 [1:30:30<1:42:14, 7.27s/it] {'loss': 0.728, 'learning_rate': 1.1076115485564305e-05, 'epoch': 1.85} 46%|████▋ | 728/1572 [1:30:30<1:42:14, 7.27s/it] 46%|████▋ | 729/1572 [1:30:37<1:40:20, 7.14s/it] {'loss': 0.652, 'learning_rate': 1.1062992125984254e-05, 'epoch': 1.85} 46%|████▋ | 729/1572 [1:30:37<1:40:20, 7.14s/it] 46%|████▋ | 730/1572 [1:30:44<1:38:58, 7.05s/it] {'loss': 0.7011, 'learning_rate': 1.1049868766404201e-05, 'epoch': 1.85} 46%|████▋ | 730/1572 
[1:30:44<1:38:58, 7.05s/it] 47%|████▋ | 731/1572 [1:30:52<1:40:30, 7.17s/it] {'loss': 0.737, 'learning_rate': 1.1036745406824148e-05, 'epoch': 1.86} 47%|████▋ | 731/1572 [1:30:52<1:40:30, 7.17s/it] 47%|████▋ | 732/1572 [1:30:58<1:39:02, 7.07s/it] {'loss': 0.6353, 'learning_rate': 1.1023622047244095e-05, 'epoch': 1.86} 47%|████▋ | 732/1572 [1:30:58<1:39:02, 7.07s/it] 47%|████▋ | 733/1572 [1:31:06<1:40:02, 7.15s/it] {'loss': 0.5945, 'learning_rate': 1.1010498687664042e-05, 'epoch': 1.86} 47%|████▋ | 733/1572 [1:31:06<1:40:02, 7.15s/it] 47%|████▋ | 734/1572 [1:31:13<1:39:32, 7.13s/it] {'loss': 0.621, 'learning_rate': 1.099737532808399e-05, 'epoch': 1.86} 47%|████▋ | 734/1572 [1:31:13<1:39:32, 7.13s/it] 47%|████▋ | 735/1572 [1:31:21<1:42:37, 7.36s/it] {'loss': 0.8395, 'learning_rate': 1.0984251968503938e-05, 'epoch': 1.87} 47%|████▋ | 735/1572 [1:31:21<1:42:37, 7.36s/it] 47%|████▋ | 736/1572 [1:31:28<1:43:05, 7.40s/it] {'loss': 0.6474, 'learning_rate': 1.0971128608923884e-05, 'epoch': 1.87} 47%|████▋ | 736/1572 [1:31:28<1:43:05, 7.40s/it] 47%|████▋ | 737/1572 [1:31:35<1:41:34, 7.30s/it] {'loss': 0.6155, 'learning_rate': 1.0958005249343833e-05, 'epoch': 1.87} 47%|████▋ | 737/1572 [1:31:35<1:41:34, 7.30s/it] 47%|████▋ | 738/1572 [1:31:42<1:39:43, 7.17s/it] {'loss': 0.7122, 'learning_rate': 1.0944881889763781e-05, 'epoch': 1.87} 47%|████▋ | 738/1572 [1:31:42<1:39:43, 7.17s/it] 47%|████▋ | 739/1572 [1:31:49<1:39:53, 7.20s/it] {'loss': 0.6375, 'learning_rate': 1.0931758530183727e-05, 'epoch': 1.88} 47%|████▋ | 739/1572 [1:31:49<1:39:53, 7.20s/it] 47%|████▋ | 740/1572 [1:31:57<1:41:03, 7.29s/it] {'loss': 0.8032, 'learning_rate': 1.0918635170603676e-05, 'epoch': 1.88} 47%|████▋ | 740/1572 [1:31:57<1:41:03, 7.29s/it] 47%|████▋ | 741/1572 [1:32:04<1:39:54, 7.21s/it] {'loss': 0.6403, 'learning_rate': 1.0905511811023624e-05, 'epoch': 1.88} 47%|████▋ | 741/1572 [1:32:04<1:39:54, 7.21s/it] 47%|████▋ | 742/1572 [1:32:11<1:38:06, 7.09s/it] {'loss': 0.6304, 'learning_rate': 
1.089238845144357e-05, 'epoch': 1.88} 47%|████▋ | 742/1572 [1:32:11<1:38:06, 7.09s/it] 47%|████▋ | 743/1572 [1:32:18<1:37:29, 7.06s/it] {'loss': 0.6455, 'learning_rate': 1.0879265091863519e-05, 'epoch': 1.89} 47%|████▋ | 743/1572 [1:32:18<1:37:29, 7.06s/it] 47%|████▋ | 744/1572 [1:32:25<1:36:40, 7.01s/it] {'loss': 0.6916, 'learning_rate': 1.0866141732283466e-05, 'epoch': 1.89} 47%|████▋ | 744/1572 [1:32:25<1:36:40, 7.01s/it] 47%|████▋ | 745/1572 [1:32:32<1:36:55, 7.03s/it] {'loss': 0.646, 'learning_rate': 1.0853018372703413e-05, 'epoch': 1.89} 47%|████▋ | 745/1572 [1:32:32<1:36:55, 7.03s/it] 47%|████▋ | 746/1572 [1:32:39<1:36:04, 6.98s/it] {'loss': 0.717, 'learning_rate': 1.083989501312336e-05, 'epoch': 1.89} 47%|████▋ | 746/1572 [1:32:39<1:36:04, 6.98s/it] 48%|████▊ | 747/1572 [1:32:46<1:37:45, 7.11s/it] {'loss': 0.7486, 'learning_rate': 1.0826771653543309e-05, 'epoch': 1.9} 48%|████▊ | 747/1572 [1:32:46<1:37:45, 7.11s/it] 48%|████▊ | 748/1572 [1:32:53<1:37:05, 7.07s/it] {'loss': 0.7672, 'learning_rate': 1.0813648293963254e-05, 'epoch': 1.9} 48%|████▊ | 748/1572 [1:32:53<1:37:05, 7.07s/it] 48%|████▊ | 749/1572 [1:33:00<1:35:47, 6.98s/it] {'loss': 0.6813, 'learning_rate': 1.0800524934383203e-05, 'epoch': 1.9} 48%|████▊ | 749/1572 [1:33:00<1:35:47, 6.98s/it] 48%|████▊ | 750/1572 [1:33:07<1:35:16, 6.95s/it] {'loss': 0.6072, 'learning_rate': 1.0787401574803152e-05, 'epoch': 1.9} 48%|████▊ | 750/1572 [1:33:07<1:35:16, 6.95s/it] 48%|████▊ | 751/1572 [1:33:14<1:36:05, 7.02s/it] {'loss': 0.6769, 'learning_rate': 1.0774278215223097e-05, 'epoch': 1.91} 48%|████▊ | 751/1572 [1:33:14<1:36:05, 7.02s/it] 48%|████▊ | 752/1572 [1:33:21<1:37:42, 7.15s/it] {'loss': 0.7087, 'learning_rate': 1.0761154855643046e-05, 'epoch': 1.91} 48%|████▊ | 752/1572 [1:33:21<1:37:42, 7.15s/it] 48%|████▊ | 753/1572 [1:33:28<1:37:37, 7.15s/it] {'loss': 0.6325, 'learning_rate': 1.0748031496062993e-05, 'epoch': 1.91} 48%|████▊ | 753/1572 [1:33:28<1:37:37, 7.15s/it] 48%|████▊ | 754/1572 [1:33:36<1:38:03, 
7.19s/it] {'loss': 0.6437, 'learning_rate': 1.073490813648294e-05, 'epoch': 1.91} 48%|████▊ | 754/1572 [1:33:36<1:38:03, 7.19s/it] 48%|████▊ | 755/1572 [1:33:43<1:37:47, 7.18s/it] {'loss': 0.6623, 'learning_rate': 1.0721784776902887e-05, 'epoch': 1.92} 48%|████▊ | 755/1572 [1:33:43<1:37:47, 7.18s/it] 48%|████▊ | 756/1572 [1:33:50<1:38:11, 7.22s/it] {'loss': 0.6227, 'learning_rate': 1.0708661417322836e-05, 'epoch': 1.92} 48%|████▊ | 756/1572 [1:33:50<1:38:11, 7.22s/it] 48%|████▊ | 757/1572 [1:33:57<1:37:31, 7.18s/it] {'loss': 0.723, 'learning_rate': 1.0695538057742783e-05, 'epoch': 1.92} 48%|████▊ | 757/1572 [1:33:57<1:37:31, 7.18s/it] 48%|████▊ | 758/1572 [1:34:04<1:36:54, 7.14s/it] {'loss': 0.729, 'learning_rate': 1.068241469816273e-05, 'epoch': 1.92} 48%|████▊ | 758/1572 [1:34:04<1:36:54, 7.14s/it] 48%|████▊ | 759/1572 [1:34:12<1:38:18, 7.26s/it] {'loss': 0.6285, 'learning_rate': 1.066929133858268e-05, 'epoch': 1.93} 48%|████▊ | 759/1572 [1:34:12<1:38:18, 7.26s/it] 48%|████▊ | 760/1572 [1:34:20<1:39:51, 7.38s/it] {'loss': 0.8486, 'learning_rate': 1.0656167979002625e-05, 'epoch': 1.93} 48%|████▊ | 760/1572 [1:34:20<1:39:51, 7.38s/it] 48%|████▊ | 761/1572 [1:34:27<1:41:45, 7.53s/it] {'loss': 0.629, 'learning_rate': 1.0643044619422573e-05, 'epoch': 1.93} 48%|████▊ | 761/1572 [1:34:27<1:41:45, 7.53s/it] 48%|████▊ | 762/1572 [1:34:35<1:41:13, 7.50s/it] {'loss': 0.6881, 'learning_rate': 1.0629921259842522e-05, 'epoch': 1.93} 48%|████▊ | 762/1572 [1:34:35<1:41:13, 7.50s/it] 49%|████▊ | 763/1572 [1:34:42<1:38:37, 7.31s/it] {'loss': 0.6187, 'learning_rate': 1.0616797900262468e-05, 'epoch': 1.94} 49%|████▊ | 763/1572 [1:34:42<1:38:37, 7.31s/it] 49%|████▊ | 764/1572 [1:34:50<1:41:42, 7.55s/it] {'loss': 0.7408, 'learning_rate': 1.0603674540682417e-05, 'epoch': 1.94} 49%|████▊ | 764/1572 [1:34:50<1:41:42, 7.55s/it] 49%|████▊ | 765/1572 [1:34:58<1:42:18, 7.61s/it] {'loss': 0.5962, 'learning_rate': 1.0590551181102362e-05, 'epoch': 1.94} 49%|████▊ | 765/1572 [1:34:58<1:42:18, 
7.61s/it] 49%|████▊ | 766/1572 [1:35:06<1:45:02, 7.82s/it] {'loss': 0.8271, 'learning_rate': 1.057742782152231e-05, 'epoch': 1.94} 49%|████▊ | 766/1572 [1:35:06<1:45:02, 7.82s/it] 49%|████▉ | 767/1572 [1:35:13<1:41:39, 7.58s/it] {'loss': 0.7126, 'learning_rate': 1.0564304461942258e-05, 'epoch': 1.95} 49%|████▉ | 767/1572 [1:35:13<1:41:39, 7.58s/it] 49%|████▉ | 768/1572 [1:35:20<1:40:59, 7.54s/it] {'loss': 0.6346, 'learning_rate': 1.0551181102362205e-05, 'epoch': 1.95} 49%|████▉ | 768/1572 [1:35:20<1:40:59, 7.54s/it] 49%|████▉ | 769/1572 [1:35:28<1:40:35, 7.52s/it] {'loss': 0.6933, 'learning_rate': 1.0538057742782152e-05, 'epoch': 1.95} 49%|████▉ | 769/1572 [1:35:28<1:40:35, 7.52s/it] 49%|████▉ | 770/1572 [1:35:36<1:41:35, 7.60s/it] {'loss': 0.6603, 'learning_rate': 1.0524934383202101e-05, 'epoch': 1.95} 49%|████▉ | 770/1572 [1:35:36<1:41:35, 7.60s/it] 49%|████▉ | 771/1572 [1:35:43<1:39:55, 7.48s/it] {'loss': 0.7081, 'learning_rate': 1.0511811023622048e-05, 'epoch': 1.96} 49%|████▉ | 771/1572 [1:35:43<1:39:55, 7.48s/it] 49%|████▉ | 772/1572 [1:35:50<1:40:00, 7.50s/it] {'loss': 0.6914, 'learning_rate': 1.0498687664041995e-05, 'epoch': 1.96} 49%|████▉ | 772/1572 [1:35:50<1:40:00, 7.50s/it] 49%|████▉ | 773/1572 [1:35:57<1:37:51, 7.35s/it] {'loss': 0.632, 'learning_rate': 1.0485564304461944e-05, 'epoch': 1.96} 49%|████▉ | 773/1572 [1:35:57<1:37:51, 7.35s/it] 49%|████▉ | 774/1572 [1:36:04<1:35:49, 7.21s/it] {'loss': 0.6257, 'learning_rate': 1.047244094488189e-05, 'epoch': 1.97} 49%|████▉ | 774/1572 [1:36:04<1:35:49, 7.21s/it] 49%|████▉ | 775/1572 [1:36:11<1:35:32, 7.19s/it] {'loss': 0.6747, 'learning_rate': 1.0459317585301838e-05, 'epoch': 1.97} 49%|████▉ | 775/1572 [1:36:11<1:35:32, 7.19s/it] 49%|████▉ | 776/1572 [1:36:19<1:36:37, 7.28s/it] {'loss': 0.7726, 'learning_rate': 1.0446194225721787e-05, 'epoch': 1.97} 49%|████▉ | 776/1572 [1:36:19<1:36:37, 7.28s/it] 49%|████▉ | 777/1572 [1:36:26<1:37:11, 7.34s/it] {'loss': 0.6323, 'learning_rate': 1.0433070866141732e-05, 
'epoch': 1.97} 49%|████▉ | 777/1572 [1:36:26<1:37:11, 7.34s/it] 49%|████▉ | 778/1572 [1:36:34<1:37:42, 7.38s/it] {'loss': 0.6529, 'learning_rate': 1.0419947506561681e-05, 'epoch': 1.98} 49%|████▉ | 778/1572 [1:36:34<1:37:42, 7.38s/it] 50%|████▉ | 779/1572 [1:36:41<1:38:15, 7.43s/it] {'loss': 0.6803, 'learning_rate': 1.0406824146981628e-05, 'epoch': 1.98} 50%|████▉ | 779/1572 [1:36:41<1:38:15, 7.43s/it] 50%|████▉ | 780/1572 [1:36:48<1:36:08, 7.28s/it] {'loss': 0.5972, 'learning_rate': 1.0393700787401575e-05, 'epoch': 1.98} 50%|████▉ | 780/1572 [1:36:48<1:36:08, 7.28s/it] 50%|████▉ | 781/1572 [1:36:55<1:35:01, 7.21s/it] {'loss': 0.664, 'learning_rate': 1.0380577427821523e-05, 'epoch': 1.98} 50%|████▉ | 781/1572 [1:36:55<1:35:01, 7.21s/it] 50%|████▉ | 782/1572 [1:37:03<1:36:26, 7.32s/it] {'loss': 0.6321, 'learning_rate': 1.0367454068241471e-05, 'epoch': 1.99} 50%|████▉ | 782/1572 [1:37:03<1:36:26, 7.32s/it] 50%|████▉ | 783/1572 [1:37:10<1:37:00, 7.38s/it] {'loss': 0.673, 'learning_rate': 1.0354330708661417e-05, 'epoch': 1.99} 50%|████▉ | 783/1572 [1:37:10<1:37:00, 7.38s/it] 50%|████▉ | 784/1572 [1:37:17<1:35:31, 7.27s/it] {'loss': 0.6909, 'learning_rate': 1.0341207349081366e-05, 'epoch': 1.99} 50%|████▉ | 784/1572 [1:37:17<1:35:31, 7.27s/it] 50%|████▉ | 785/1572 [1:37:25<1:34:39, 7.22s/it] {'loss': 0.6972, 'learning_rate': 1.0328083989501314e-05, 'epoch': 1.99} 50%|████▉ | 785/1572 [1:37:25<1:34:39, 7.22s/it] 50%|█████ | 786/1572 [1:37:32<1:37:03, 7.41s/it] {'loss': 0.7406, 'learning_rate': 1.031496062992126e-05, 'epoch': 2.0} 50%|█████ | 786/1572 [1:37:32<1:37:03, 7.41s/it] 50%|█████ | 787/1572 [1:37:40<1:36:47, 7.40s/it] {'loss': 0.6397, 'learning_rate': 1.0301837270341209e-05, 'epoch': 2.0} 50%|█████ | 787/1572 [1:37:40<1:36:47, 7.40s/it][WARNING|trainer.py:2348] 2024-07-08 20:57:34,580 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-787 already exists and is non-empty.Saving will proceed but saved results may be invalid. 
[INFO|trainer.py:2889] 2024-07-08 20:57:58,836 >> Saving model checkpoint to ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-787 [INFO|tokenization_utils_base.py:2432] 2024-07-08 20:58:00,120 >> tokenizer config file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-787/tokenizer_config.json [INFO|tokenization_utils_base.py:2441] 2024-07-08 20:58:00,125 >> Special tokens file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-787/special_tokens_map.json 50%|█████ | 788/1572 [1:39:52<9:47:39, 44.97s/it] {'loss': 0.6186, 'learning_rate': 1.0288713910761157e-05, 'epoch': 2.0} 50%|█████ | 788/1572 [1:39:52<9:47:39, 44.97s/it] 50%|█████ | 789/1572 [1:39:59<7:15:54, 33.40s/it] {'loss': 0.6089, 'learning_rate': 1.0275590551181103e-05, 'epoch': 2.0} 50%|█████ | 789/1572 [1:39:59<7:15:54, 33.40s/it] 50%|█████ | 790/1572 [1:40:07<5:35:17, 25.73s/it] {'loss': 0.7009, 'learning_rate': 1.0262467191601052e-05, 'epoch': 2.01} 50%|█████ | 790/1572 [1:40:07<5:35:17, 25.73s/it] 50%|█████ | 791/1572 [1:40:14<4:23:27, 20.24s/it] {'loss': 0.6637, 'learning_rate': 1.0249343832020999e-05, 'epoch': 2.01} 50%|█████ | 791/1572 [1:40:14<4:23:27, 20.24s/it] 50%|█████ | 792/1572 [1:40:21<3:33:00, 16.39s/it] {'loss': 0.6239, 'learning_rate': 1.0236220472440946e-05, 'epoch': 2.01} 50%|█████ | 792/1572 [1:40:21<3:33:00, 16.39s/it] 50%|█████ | 793/1572 [1:40:28<2:55:56, 13.55s/it] {'loss': 0.6173, 'learning_rate': 1.0223097112860893e-05, 'epoch': 2.01} 50%|█████ | 793/1572 [1:40:28<2:55:56, 13.55s/it] 51%|█████ | 794/1572 [1:40:35<2:30:08, 11.58s/it] {'loss': 0.6577, 'learning_rate': 1.0209973753280842e-05, 'epoch': 2.02} 51%|█████ | 794/1572 [1:40:35<2:30:08, 11.58s/it] 51%|█████ | 795/1572 [1:40:42<2:10:55, 10.11s/it] {'loss': 0.6167, 'learning_rate': 1.0196850393700787e-05, 'epoch': 2.02} 51%|█████ | 795/1572 [1:40:42<2:10:55, 10.11s/it] 51%|█████ | 796/1572 [1:40:49<1:58:45, 9.18s/it] {'loss': 0.6545, 'learning_rate': 1.0183727034120736e-05, 'epoch': 2.02} 51%|█████ | 
796/1572 [1:40:49<1:58:45, 9.18s/it] 51%|█████ | 797/1572 [1:40:56<1:50:31, 8.56s/it] {'loss': 0.7052, 'learning_rate': 1.0170603674540682e-05, 'epoch': 2.02} 51%|█████ | 797/1572 [1:40:56<1:50:31, 8.56s/it] 51%|█████ | 798/1572 [1:41:04<1:46:26, 8.25s/it] {'loss': 0.6266, 'learning_rate': 1.015748031496063e-05, 'epoch': 2.03} 51%|█████ | 798/1572 [1:41:04<1:46:26, 8.25s/it] 51%|█████ | 799/1572 [1:41:11<1:42:11, 7.93s/it] {'loss': 0.6369, 'learning_rate': 1.0144356955380579e-05, 'epoch': 2.03} 51%|█████ | 799/1572 [1:41:11<1:42:11, 7.93s/it] 51%|█████ | 800/1572 [1:41:19<1:41:23, 7.88s/it] {'loss': 0.6615, 'learning_rate': 1.0131233595800525e-05, 'epoch': 2.03} 51%|█████ | 800/1572 [1:41:19<1:41:23, 7.88s/it] 51%|█████ | 801/1572 [1:41:26<1:37:24, 7.58s/it] {'loss': 0.5865, 'learning_rate': 1.0118110236220473e-05, 'epoch': 2.03} 51%|█████ | 801/1572 [1:41:26<1:37:24, 7.58s/it] 51%|█████ | 802/1572 [1:41:33<1:35:18, 7.43s/it] {'loss': 0.6904, 'learning_rate': 1.010498687664042e-05, 'epoch': 2.04} 51%|█████ | 802/1572 [1:41:33<1:35:18, 7.43s/it] 51%|█████ | 803/1572 [1:41:40<1:34:15, 7.35s/it] {'loss': 0.6218, 'learning_rate': 1.0091863517060368e-05, 'epoch': 2.04} 51%|█████ | 803/1572 [1:41:40<1:34:15, 7.35s/it] 51%|█████ | 804/1572 [1:41:47<1:35:16, 7.44s/it] {'loss': 0.6065, 'learning_rate': 1.0078740157480316e-05, 'epoch': 2.04} 51%|█████ | 804/1572 [1:41:47<1:35:16, 7.44s/it] 51%|█████ | 805/1572 [1:41:55<1:37:14, 7.61s/it] {'loss': 0.6221, 'learning_rate': 1.0065616797900264e-05, 'epoch': 2.04} 51%|█████ | 805/1572 [1:41:55<1:37:14, 7.61s/it] 51%|█████▏ | 806/1572 [1:42:03<1:35:12, 7.46s/it] {'loss': 0.6232, 'learning_rate': 1.005249343832021e-05, 'epoch': 2.05} 51%|█████▏ | 806/1572 [1:42:03<1:35:12, 7.46s/it] 51%|█████▏ | 807/1572 [1:42:10<1:34:11, 7.39s/it] {'loss': 0.6404, 'learning_rate': 1.0039370078740158e-05, 'epoch': 2.05} 51%|█████▏ | 807/1572 [1:42:10<1:34:11, 7.39s/it] 51%|█████▏ | 808/1572 [1:42:17<1:32:46, 7.29s/it] {'loss': 0.6067, 
'learning_rate': 1.0026246719160107e-05, 'epoch': 2.05} 51%|█████▏ | 808/1572 [1:42:17<1:32:46, 7.29s/it] 51%|█████▏ | 809/1572 [1:42:24<1:33:39, 7.36s/it] {'loss': 0.6303, 'learning_rate': 1.0013123359580052e-05, 'epoch': 2.05} 51%|█████▏ | 809/1572 [1:42:24<1:33:39, 7.36s/it] 52%|█████▏ | 810/1572 [1:42:31<1:31:16, 7.19s/it] {'loss': 0.5762, 'learning_rate': 1e-05, 'epoch': 2.06} 52%|█████▏ | 810/1572 [1:42:31<1:31:16, 7.19s/it] 52%|█████▏ | 811/1572 [1:42:38<1:29:30, 7.06s/it] {'loss': 0.5743, 'learning_rate': 9.986876640419948e-06, 'epoch': 2.06} 52%|█████▏ | 811/1572 [1:42:38<1:29:30, 7.06s/it] 52%|█████▏ | 812/1572 [1:42:45<1:28:53, 7.02s/it] {'loss': 0.6424, 'learning_rate': 9.973753280839897e-06, 'epoch': 2.06} 52%|█████▏ | 812/1572 [1:42:45<1:28:53, 7.02s/it] 52%|█████▏ | 813/1572 [1:42:52<1:30:56, 7.19s/it] {'loss': 0.6624, 'learning_rate': 9.960629921259844e-06, 'epoch': 2.06} 52%|█████▏ | 813/1572 [1:42:52<1:30:56, 7.19s/it] 52%|█████▏ | 814/1572 [1:42:59<1:30:25, 7.16s/it] {'loss': 0.6268, 'learning_rate': 9.947506561679791e-06, 'epoch': 2.07} 52%|█████▏ | 814/1572 [1:43:00<1:30:25, 7.16s/it] 52%|█████▏ | 815/1572 [1:43:07<1:30:30, 7.17s/it] {'loss': 0.6103, 'learning_rate': 9.934383202099738e-06, 'epoch': 2.07} 52%|█████▏ | 815/1572 [1:43:07<1:30:30, 7.17s/it] 52%|█████▏ | 816/1572 [1:43:14<1:31:28, 7.26s/it] {'loss': 0.6463, 'learning_rate': 9.921259842519685e-06, 'epoch': 2.07} 52%|█████▏ | 816/1572 [1:43:14<1:31:28, 7.26s/it] 52%|█████▏ | 817/1572 [1:43:21<1:30:46, 7.21s/it] {'loss': 0.703, 'learning_rate': 9.908136482939632e-06, 'epoch': 2.07} 52%|█████▏ | 817/1572 [1:43:21<1:30:46, 7.21s/it] 52%|█████▏ | 818/1572 [1:43:28<1:30:05, 7.17s/it] {'loss': 0.5118, 'learning_rate': 9.895013123359581e-06, 'epoch': 2.08} 52%|█████▏ | 818/1572 [1:43:28<1:30:05, 7.17s/it] 52%|█████▏ | 819/1572 [1:43:35<1:29:29, 7.13s/it] {'loss': 0.6595, 'learning_rate': 9.881889763779528e-06, 'epoch': 2.08} 52%|█████▏ | 819/1572 [1:43:35<1:29:29, 7.13s/it] 52%|█████▏ | 
820/1572 [1:43:44<1:33:23, 7.45s/it] {'loss': 0.5826, 'learning_rate': 9.868766404199475e-06, 'epoch': 2.08} 52%|█████▏ | 820/1572 [1:43:44<1:33:23, 7.45s/it] 52%|█████▏ | 821/1572 [1:43:51<1:32:20, 7.38s/it] {'loss': 0.6616, 'learning_rate': 9.855643044619422e-06, 'epoch': 2.08} 52%|█████▏ | 821/1572 [1:43:51<1:32:20, 7.38s/it] 52%|█████▏ | 822/1572 [1:43:59<1:33:29, 7.48s/it] {'loss': 0.6775, 'learning_rate': 9.842519685039371e-06, 'epoch': 2.09} 52%|█████▏ | 822/1572 [1:43:59<1:33:29, 7.48s/it] 52%|█████▏ | 823/1572 [1:44:06<1:32:02, 7.37s/it] {'loss': 0.6202, 'learning_rate': 9.829396325459318e-06, 'epoch': 2.09} 52%|█████▏ | 823/1572 [1:44:06<1:32:02, 7.37s/it] 52%|█████▏ | 824/1572 [1:44:13<1:32:26, 7.42s/it] {'loss': 0.607, 'learning_rate': 9.816272965879266e-06, 'epoch': 2.09} 52%|█████▏ | 824/1572 [1:44:13<1:32:26, 7.42s/it] 52%|█████▏ | 825/1572 [1:44:20<1:31:58, 7.39s/it] {'loss': 0.6514, 'learning_rate': 9.803149606299214e-06, 'epoch': 2.09} 52%|█████▏ | 825/1572 [1:44:20<1:31:58, 7.39s/it] 53%|█████▎ | 826/1572 [1:44:27<1:30:30, 7.28s/it] {'loss': 0.605, 'learning_rate': 9.790026246719161e-06, 'epoch': 2.1} 53%|█████▎ | 826/1572 [1:44:27<1:30:30, 7.28s/it] 53%|█████▎ | 827/1572 [1:44:34<1:29:10, 7.18s/it] {'loss': 0.6088, 'learning_rate': 9.776902887139109e-06, 'epoch': 2.1} 53%|█████▎ | 827/1572 [1:44:34<1:29:10, 7.18s/it] 53%|█████▎ | 828/1572 [1:44:42<1:29:24, 7.21s/it] {'loss': 0.5596, 'learning_rate': 9.763779527559056e-06, 'epoch': 2.1} 53%|█████▎ | 828/1572 [1:44:42<1:29:24, 7.21s/it] 53%|█████▎ | 829/1572 [1:44:49<1:29:11, 7.20s/it] {'loss': 0.5667, 'learning_rate': 9.750656167979003e-06, 'epoch': 2.1} 53%|█████▎ | 829/1572 [1:44:49<1:29:11, 7.20s/it] 53%|█████▎ | 830/1572 [1:44:56<1:29:57, 7.27s/it] {'loss': 0.6606, 'learning_rate': 9.73753280839895e-06, 'epoch': 2.11} 53%|█████▎ | 830/1572 [1:44:56<1:29:57, 7.27s/it] 53%|█████▎ | 831/1572 [1:45:03<1:29:17, 7.23s/it] {'loss': 0.6374, 'learning_rate': 9.724409448818899e-06, 'epoch': 2.11} 
53%|█████▎ | 831/1572 [1:45:03<1:29:17, 7.23s/it] 53%|█████▎ | 832/1572 [1:45:10<1:27:51, 7.12s/it] {'loss': 0.628, 'learning_rate': 9.711286089238846e-06, 'epoch': 2.11} 53%|█████▎ | 832/1572 [1:45:10<1:27:51, 7.12s/it] 53%|█████▎ | 833/1572 [1:45:18<1:29:14, 7.25s/it] {'loss': 0.6703, 'learning_rate': 9.698162729658793e-06, 'epoch': 2.11} 53%|█████▎ | 833/1572 [1:45:18<1:29:14, 7.25s/it] 53%|█████▎ | 834/1572 [1:45:25<1:29:39, 7.29s/it] {'loss': 0.6053, 'learning_rate': 9.68503937007874e-06, 'epoch': 2.12} 53%|█████▎ | 834/1572 [1:45:25<1:29:39, 7.29s/it] 53%|█████▎ | 835/1572 [1:45:33<1:29:52, 7.32s/it] {'loss': 0.6029, 'learning_rate': 9.671916010498689e-06, 'epoch': 2.12} 53%|█████▎ | 835/1572 [1:45:33<1:29:52, 7.32s/it] 53%|█████▎ | 836/1572 [1:45:40<1:29:56, 7.33s/it] {'loss': 0.6459, 'learning_rate': 9.658792650918636e-06, 'epoch': 2.12} 53%|█████▎ | 836/1572 [1:45:40<1:29:56, 7.33s/it] 53%|█████▎ | 837/1572 [1:45:47<1:28:41, 7.24s/it] {'loss': 0.6399, 'learning_rate': 9.645669291338583e-06, 'epoch': 2.13} 53%|█████▎ | 837/1572 [1:45:47<1:28:41, 7.24s/it] 53%|█████▎ | 838/1572 [1:45:54<1:28:25, 7.23s/it] {'loss': 0.6433, 'learning_rate': 9.632545931758532e-06, 'epoch': 2.13} 53%|█████▎ | 838/1572 [1:45:54<1:28:25, 7.23s/it] 53%|█████▎ | 839/1572 [1:46:02<1:30:59, 7.45s/it] {'loss': 0.5805, 'learning_rate': 9.619422572178479e-06, 'epoch': 2.13} 53%|█████▎ | 839/1572 [1:46:02<1:30:59, 7.45s/it] 53%|█████▎ | 840/1572 [1:46:10<1:31:15, 7.48s/it] {'loss': 0.6291, 'learning_rate': 9.606299212598426e-06, 'epoch': 2.13} 53%|█████▎ | 840/1572 [1:46:10<1:31:15, 7.48s/it] 53%|█████▎ | 841/1572 [1:46:18<1:32:40, 7.61s/it] {'loss': 0.705, 'learning_rate': 9.593175853018373e-06, 'epoch': 2.14} 53%|█████▎ | 841/1572 [1:46:18<1:32:40, 7.61s/it] 54%|█████▎ | 842/1572 [1:46:25<1:32:34, 7.61s/it] {'loss': 0.6765, 'learning_rate': 9.58005249343832e-06, 'epoch': 2.14} 54%|█████▎ | 842/1572 [1:46:25<1:32:34, 7.61s/it] 54%|█████▎ | 843/1572 [1:46:32<1:30:04, 7.41s/it] {'loss': 
0.6457, 'learning_rate': 9.566929133858268e-06, 'epoch': 2.14} 54%|█████▎ | 843/1572 [1:46:32<1:30:04, 7.41s/it] 54%|█████▎ | 844/1572 [1:46:40<1:29:29, 7.38s/it] {'loss': 0.6815, 'learning_rate': 9.553805774278216e-06, 'epoch': 2.14} 54%|█████▎ | 844/1572 [1:46:40<1:29:29, 7.38s/it] 54%|█████▍ | 845/1572 [1:46:46<1:26:57, 7.18s/it] {'loss': 0.6055, 'learning_rate': 9.540682414698163e-06, 'epoch': 2.15} 54%|█████▍ | 845/1572 [1:46:46<1:26:57, 7.18s/it] 54%|█████▍ | 846/1572 [1:46:53<1:26:15, 7.13s/it] {'loss': 0.6402, 'learning_rate': 9.52755905511811e-06, 'epoch': 2.15} 54%|█████▍ | 846/1572 [1:46:53<1:26:15, 7.13s/it] 54%|█████▍ | 847/1572 [1:47:01<1:27:45, 7.26s/it] {'loss': 0.6346, 'learning_rate': 9.51443569553806e-06, 'epoch': 2.15} 54%|█████▍ | 847/1572 [1:47:01<1:27:45, 7.26s/it] 54%|█████▍ | 848/1572 [1:47:09<1:30:51, 7.53s/it] {'loss': 0.6713, 'learning_rate': 9.501312335958006e-06, 'epoch': 2.15} 54%|█████▍ | 848/1572 [1:47:09<1:30:51, 7.53s/it] 54%|█████▍ | 849/1572 [1:47:16<1:28:56, 7.38s/it] {'loss': 0.6157, 'learning_rate': 9.488188976377954e-06, 'epoch': 2.16} 54%|█████▍ | 849/1572 [1:47:16<1:28:56, 7.38s/it] 54%|█████▍ | 850/1572 [1:47:23<1:27:54, 7.31s/it] {'loss': 0.6422, 'learning_rate': 9.4750656167979e-06, 'epoch': 2.16} 54%|█████▍ | 850/1572 [1:47:23<1:27:54, 7.31s/it] 54%|█████▍ | 851/1572 [1:47:31<1:28:18, 7.35s/it] {'loss': 0.5974, 'learning_rate': 9.46194225721785e-06, 'epoch': 2.16} 54%|█████▍ | 851/1572 [1:47:31<1:28:18, 7.35s/it] 54%|█████▍ | 852/1572 [1:47:38<1:27:12, 7.27s/it] {'loss': 0.6071, 'learning_rate': 9.448818897637797e-06, 'epoch': 2.16} 54%|█████▍ | 852/1572 [1:47:38<1:27:12, 7.27s/it] 54%|█████▍ | 853/1572 [1:47:45<1:28:38, 7.40s/it] {'loss': 0.682, 'learning_rate': 9.435695538057744e-06, 'epoch': 2.17} 54%|█████▍ | 853/1572 [1:47:45<1:28:38, 7.40s/it] 54%|█████▍ | 854/1572 [1:47:53<1:27:40, 7.33s/it] {'loss': 0.6037, 'learning_rate': 9.422572178477691e-06, 'epoch': 2.17} 54%|█████▍ | 854/1572 [1:47:53<1:27:40, 7.33s/it] 
54%|█████▍ | 855/1572 [1:48:00<1:27:18, 7.31s/it] {'loss': 0.6007, 'learning_rate': 9.409448818897638e-06, 'epoch': 2.17} 54%|█████▍ | 855/1572 [1:48:00<1:27:18, 7.31s/it] 54%|█████▍ | 856/1572 [1:48:07<1:28:21, 7.40s/it] {'loss': 0.7192, 'learning_rate': 9.396325459317585e-06, 'epoch': 2.17} 54%|█████▍ | 856/1572 [1:48:07<1:28:21, 7.40s/it] 55%|█████▍ | 857/1572 [1:48:15<1:27:10, 7.32s/it] {'loss': 0.6368, 'learning_rate': 9.383202099737534e-06, 'epoch': 2.18} 55%|█████▍ | 857/1572 [1:48:15<1:27:10, 7.32s/it] 55%|█████▍ | 858/1572 [1:48:22<1:26:54, 7.30s/it] {'loss': 0.6612, 'learning_rate': 9.370078740157481e-06, 'epoch': 2.18} 55%|█████▍ | 858/1572 [1:48:22<1:26:54, 7.30s/it] 55%|█████▍ | 859/1572 [1:48:29<1:26:07, 7.25s/it] {'loss': 0.6396, 'learning_rate': 9.356955380577428e-06, 'epoch': 2.18} 55%|█████▍ | 859/1572 [1:48:29<1:26:07, 7.25s/it] 55%|█████▍ | 860/1572 [1:48:36<1:25:35, 7.21s/it] {'loss': 0.6173, 'learning_rate': 9.343832020997377e-06, 'epoch': 2.18} 55%|█████▍ | 860/1572 [1:48:36<1:25:35, 7.21s/it] 55%|█████▍ | 861/1572 [1:48:43<1:25:55, 7.25s/it] {'loss': 0.6089, 'learning_rate': 9.330708661417324e-06, 'epoch': 2.19} 55%|█████▍ | 861/1572 [1:48:43<1:25:55, 7.25s/it] 55%|█████▍ | 862/1572 [1:48:51<1:25:54, 7.26s/it] {'loss': 0.6533, 'learning_rate': 9.317585301837271e-06, 'epoch': 2.19} 55%|█████▍ | 862/1572 [1:48:51<1:25:54, 7.26s/it] 55%|█████▍ | 863/1572 [1:48:58<1:24:46, 7.17s/it] {'loss': 0.6849, 'learning_rate': 9.304461942257218e-06, 'epoch': 2.19} 55%|█████▍ | 863/1572 [1:48:58<1:24:46, 7.17s/it] 55%|█████▍ | 864/1572 [1:49:05<1:25:14, 7.22s/it] {'loss': 0.5914, 'learning_rate': 9.291338582677165e-06, 'epoch': 2.19} 55%|█████▍ | 864/1572 [1:49:05<1:25:14, 7.22s/it] 55%|█████▌ | 865/1572 [1:49:12<1:25:43, 7.28s/it] {'loss': 0.703, 'learning_rate': 9.278215223097114e-06, 'epoch': 2.2} 55%|█████▌ | 865/1572 [1:49:12<1:25:43, 7.28s/it] 55%|█████▌ | 866/1572 [1:49:20<1:26:35, 7.36s/it] {'loss': 0.6436, 'learning_rate': 9.265091863517061e-06, 
'epoch': 2.2} 55%|█████▌ | 866/1572 [1:49:20<1:26:35, 7.36s/it] 55%|█████▌ | 867/1572 [1:49:27<1:26:31, 7.36s/it] {'loss': 0.658, 'learning_rate': 9.251968503937008e-06, 'epoch': 2.2} 55%|█████▌ | 867/1572 [1:49:27<1:26:31, 7.36s/it] 55%|█████▌ | 868/1572 [1:49:34<1:25:43, 7.31s/it] {'loss': 0.5291, 'learning_rate': 9.238845144356956e-06, 'epoch': 2.2} 55%|█████▌ | 868/1572 [1:49:34<1:25:43, 7.31s/it] 55%|█████▌ | 869/1572 [1:49:42<1:24:58, 7.25s/it] {'loss': 0.5614, 'learning_rate': 9.225721784776903e-06, 'epoch': 2.21} 55%|█████▌ | 869/1572 [1:49:42<1:24:58, 7.25s/it] 55%|█████▌ | 870/1572 [1:49:49<1:25:05, 7.27s/it] {'loss': 0.6571, 'learning_rate': 9.212598425196852e-06, 'epoch': 2.21} 55%|█████▌ | 870/1572 [1:49:49<1:25:05, 7.27s/it] 55%|█████▌ | 871/1572 [1:49:56<1:25:39, 7.33s/it] {'loss': 0.6326, 'learning_rate': 9.199475065616799e-06, 'epoch': 2.21} 55%|█████▌ | 871/1572 [1:49:56<1:25:39, 7.33s/it] 55%|█████▌ | 872/1572 [1:50:03<1:24:09, 7.21s/it] {'loss': 0.6077, 'learning_rate': 9.186351706036746e-06, 'epoch': 2.21} 55%|█████▌ | 872/1572 [1:50:03<1:24:09, 7.21s/it] 56%|█████▌ | 873/1572 [1:50:11<1:26:01, 7.38s/it] {'loss': 0.6227, 'learning_rate': 9.173228346456695e-06, 'epoch': 2.22} 56%|█████▌ | 873/1572 [1:50:11<1:26:01, 7.38s/it] 56%|█████▌ | 874/1572 [1:50:19<1:26:05, 7.40s/it] {'loss': 0.6865, 'learning_rate': 9.160104986876642e-06, 'epoch': 2.22} 56%|█████▌ | 874/1572 [1:50:19<1:26:05, 7.40s/it] 56%|█████▌ | 875/1572 [1:50:26<1:24:27, 7.27s/it] {'loss': 0.6281, 'learning_rate': 9.146981627296589e-06, 'epoch': 2.22} 56%|█████▌ | 875/1572 [1:50:26<1:24:27, 7.27s/it] 56%|█████▌ | 876/1572 [1:50:33<1:23:20, 7.18s/it] {'loss': 0.5794, 'learning_rate': 9.133858267716536e-06, 'epoch': 2.22} 56%|█████▌ | 876/1572 [1:50:33<1:23:20, 7.18s/it] 56%|█████▌ | 877/1572 [1:50:40<1:23:20, 7.19s/it] {'loss': 0.5695, 'learning_rate': 9.120734908136483e-06, 'epoch': 2.23} 56%|█████▌ | 877/1572 [1:50:40<1:23:20, 7.19s/it] 56%|█████▌ | 878/1572 [1:50:48<1:26:12, 
7.45s/it] {'loss': 0.686, 'learning_rate': 9.10761154855643e-06, 'epoch': 2.23} 56%|█████▌ | 878/1572 [1:50:48<1:26:12, 7.45s/it] 56%|█████▌ | 879/1572 [1:50:55<1:24:29, 7.32s/it] {'loss': 0.6254, 'learning_rate': 9.094488188976379e-06, 'epoch': 2.23} 56%|█████▌ | 879/1572 [1:50:55<1:24:29, 7.32s/it] 56%|█████▌ | 880/1572 [1:51:02<1:23:56, 7.28s/it] {'loss': 0.6484, 'learning_rate': 9.081364829396326e-06, 'epoch': 2.23} 56%|█████▌ | 880/1572 [1:51:02<1:23:56, 7.28s/it] 56%|█████▌ | 881/1572 [1:51:09<1:23:05, 7.21s/it] {'loss': 0.6519, 'learning_rate': 9.068241469816273e-06, 'epoch': 2.24} 56%|█████▌ | 881/1572 [1:51:09<1:23:05, 7.21s/it] 56%|█████▌ | 882/1572 [1:51:16<1:21:32, 7.09s/it] {'loss': 0.6125, 'learning_rate': 9.05511811023622e-06, 'epoch': 2.24} 56%|█████▌ | 882/1572 [1:51:16<1:21:32, 7.09s/it] 56%|█████▌ | 883/1572 [1:51:23<1:21:15, 7.08s/it] {'loss': 0.6773, 'learning_rate': 9.041994750656169e-06, 'epoch': 2.24} 56%|█████▌ | 883/1572 [1:51:23<1:21:15, 7.08s/it] 56%|█████▌ | 884/1572 [1:51:30<1:20:47, 7.05s/it] {'loss': 0.6738, 'learning_rate': 9.028871391076116e-06, 'epoch': 2.24} 56%|█████▌ | 884/1572 [1:51:30<1:20:47, 7.05s/it] 56%|█████▋ | 885/1572 [1:51:37<1:21:01, 7.08s/it] {'loss': 0.6506, 'learning_rate': 9.015748031496063e-06, 'epoch': 2.25} 56%|█████▋ | 885/1572 [1:51:37<1:21:01, 7.08s/it] 56%|█████▋ | 886/1572 [1:51:44<1:20:50, 7.07s/it] {'loss': 0.6296, 'learning_rate': 9.002624671916012e-06, 'epoch': 2.25} 56%|█████▋ | 886/1572 [1:51:44<1:20:50, 7.07s/it] 56%|█████▋ | 887/1572 [1:51:51<1:20:28, 7.05s/it] {'loss': 0.6392, 'learning_rate': 8.98950131233596e-06, 'epoch': 2.25} 56%|█████▋ | 887/1572 [1:51:51<1:20:28, 7.05s/it] 56%|█████▋ | 888/1572 [1:51:58<1:19:55, 7.01s/it] {'loss': 0.5881, 'learning_rate': 8.976377952755906e-06, 'epoch': 2.25} 56%|█████▋ | 888/1572 [1:51:58<1:19:55, 7.01s/it] 57%|█████▋ | 889/1572 [1:52:05<1:21:20, 7.15s/it] {'loss': 0.6211, 'learning_rate': 8.963254593175854e-06, 'epoch': 2.26} 57%|█████▋ | 889/1572 
[1:52:05<1:21:20, 7.15s/it] 57%|█████▋ | 890/1572 [1:52:12<1:19:32, 7.00s/it] {'loss': 0.5231, 'learning_rate': 8.9501312335958e-06, 'epoch': 2.26} 57%|█████▋ | 890/1572 [1:52:12<1:19:32, 7.00s/it] 57%|█████▋ | 891/1572 [1:52:19<1:19:17, 6.99s/it] {'loss': 0.6116, 'learning_rate': 8.937007874015748e-06, 'epoch': 2.26} 57%|█████▋ | 891/1572 [1:52:19<1:19:17, 6.99s/it] 57%|█████▋ | 892/1572 [1:52:27<1:21:06, 7.16s/it] {'loss': 0.6918, 'learning_rate': 8.923884514435697e-06, 'epoch': 2.26} 57%|█████▋ | 892/1572 [1:52:27<1:21:06, 7.16s/it] 57%|█████▋ | 893/1572 [1:52:34<1:22:19, 7.27s/it] {'loss': 0.6539, 'learning_rate': 8.910761154855644e-06, 'epoch': 2.27} 57%|█████▋ | 893/1572 [1:52:34<1:22:19, 7.27s/it] 57%|█████▋ | 894/1572 [1:52:41<1:20:24, 7.12s/it] {'loss': 0.6007, 'learning_rate': 8.89763779527559e-06, 'epoch': 2.27} 57%|█████▋ | 894/1572 [1:52:41<1:20:24, 7.12s/it] 57%|█████▋ | 895/1572 [1:52:48<1:19:09, 7.02s/it] {'loss': 0.6268, 'learning_rate': 8.88451443569554e-06, 'epoch': 2.27} 57%|█████▋ | 895/1572 [1:52:48<1:19:09, 7.02s/it] 57%|█████▋ | 896/1572 [1:52:55<1:19:19, 7.04s/it] {'loss': 0.6474, 'learning_rate': 8.871391076115487e-06, 'epoch': 2.27} 57%|█████▋ | 896/1572 [1:52:55<1:19:19, 7.04s/it] 57%|█████▋ | 897/1572 [1:53:02<1:19:27, 7.06s/it] {'loss': 0.5751, 'learning_rate': 8.858267716535434e-06, 'epoch': 2.28} 57%|█████▋ | 897/1572 [1:53:02<1:19:27, 7.06s/it] 57%|█████▋ | 898/1572 [1:53:09<1:18:11, 6.96s/it] {'loss': 0.6662, 'learning_rate': 8.845144356955381e-06, 'epoch': 2.28} 57%|█████▋ | 898/1572 [1:53:09<1:18:11, 6.96s/it] 57%|█████▋ | 899/1572 [1:53:16<1:20:00, 7.13s/it] {'loss': 0.6464, 'learning_rate': 8.83202099737533e-06, 'epoch': 2.28} 57%|█████▋ | 899/1572 [1:53:16<1:20:00, 7.13s/it] 57%|█████▋ | 900/1572 [1:53:24<1:22:31, 7.37s/it] {'loss': 0.7425, 'learning_rate': 8.818897637795277e-06, 'epoch': 2.28} 57%|█████▋ | 900/1572 [1:53:24<1:22:31, 7.37s/it] 57%|█████▋ | 901/1572 [1:53:31<1:21:06, 7.25s/it] {'loss': 0.646, 'learning_rate': 
8.805774278215224e-06, 'epoch': 2.29} 57%|█████▋ | 901/1572 [1:53:31<1:21:06, 7.25s/it] 57%|█████▋ | 902/1572 [1:53:38<1:20:36, 7.22s/it] {'loss': 0.567, 'learning_rate': 8.792650918635171e-06, 'epoch': 2.29} 57%|█████▋ | 902/1572 [1:53:38<1:20:36, 7.22s/it] 57%|█████▋ | 903/1572 [1:53:45<1:19:45, 7.15s/it] {'loss': 0.6417, 'learning_rate': 8.779527559055118e-06, 'epoch': 2.29} 57%|█████▋ | 903/1572 [1:53:45<1:19:45, 7.15s/it] 58%|█████▊ | 904/1572 [1:53:52<1:19:20, 7.13s/it] {'loss': 0.5776, 'learning_rate': 8.766404199475065e-06, 'epoch': 2.3} 58%|█████▊ | 904/1572 [1:53:52<1:19:20, 7.13s/it] 58%|█████▊ | 905/1572 [1:53:59<1:19:27, 7.15s/it] {'loss': 0.5452, 'learning_rate': 8.753280839895014e-06, 'epoch': 2.3} 58%|█████▊ | 905/1572 [1:53:59<1:19:27, 7.15s/it] 58%|█████▊ | 906/1572 [1:54:07<1:19:53, 7.20s/it] {'loss': 0.624, 'learning_rate': 8.740157480314961e-06, 'epoch': 2.3} 58%|█████▊ | 906/1572 [1:54:07<1:19:53, 7.20s/it] 58%|█████▊ | 907/1572 [1:54:15<1:23:02, 7.49s/it] {'loss': 0.778, 'learning_rate': 8.727034120734908e-06, 'epoch': 2.3} 58%|█████▊ | 907/1572 [1:54:15<1:23:02, 7.49s/it] 58%|█████▊ | 908/1572 [1:54:22<1:20:51, 7.31s/it] {'loss': 0.6186, 'learning_rate': 8.713910761154857e-06, 'epoch': 2.31} 58%|█████▊ | 908/1572 [1:54:22<1:20:51, 7.31s/it] 58%|█████▊ | 909/1572 [1:54:30<1:23:07, 7.52s/it] {'loss': 0.7648, 'learning_rate': 8.700787401574804e-06, 'epoch': 2.31} 58%|█████▊ | 909/1572 [1:54:30<1:23:07, 7.52s/it] 58%|█████▊ | 910/1572 [1:54:37<1:23:20, 7.55s/it] {'loss': 0.6451, 'learning_rate': 8.687664041994751e-06, 'epoch': 2.31} 58%|█████▊ | 910/1572 [1:54:37<1:23:20, 7.55s/it] 58%|█████▊ | 911/1572 [1:54:44<1:20:41, 7.32s/it] {'loss': 0.5695, 'learning_rate': 8.674540682414699e-06, 'epoch': 2.31} 58%|█████▊ | 911/1572 [1:54:44<1:20:41, 7.32s/it] 58%|█████▊ | 912/1572 [1:54:51<1:19:13, 7.20s/it] {'loss': 0.6104, 'learning_rate': 8.661417322834647e-06, 'epoch': 2.32} 58%|█████▊ | 912/1572 [1:54:51<1:19:13, 7.20s/it] 58%|█████▊ | 913/1572 
[1:54:59<1:21:09, 7.39s/it] {'loss': 0.6005, 'learning_rate': 8.648293963254594e-06, 'epoch': 2.32} 58%|█████▊ | 913/1572 [1:54:59<1:21:09, 7.39s/it] 58%|█████▊ | 914/1572 [1:55:07<1:22:23, 7.51s/it] {'loss': 0.6166, 'learning_rate': 8.635170603674542e-06, 'epoch': 2.32} 58%|█████▊ | 914/1572 [1:55:07<1:22:23, 7.51s/it] 58%|█████▊ | 915/1572 [1:55:14<1:21:01, 7.40s/it] {'loss': 0.6402, 'learning_rate': 8.622047244094489e-06, 'epoch': 2.32} 58%|█████▊ | 915/1572 [1:55:14<1:21:01, 7.40s/it] 58%|█████▊ | 916/1572 [1:55:21<1:20:27, 7.36s/it] {'loss': 0.5688, 'learning_rate': 8.608923884514436e-06, 'epoch': 2.33} 58%|█████▊ | 916/1572 [1:55:21<1:20:27, 7.36s/it] 58%|█████▊ | 917/1572 [1:55:29<1:20:05, 7.34s/it] {'loss': 0.6487, 'learning_rate': 8.595800524934383e-06, 'epoch': 2.33} 58%|█████▊ | 917/1572 [1:55:29<1:20:05, 7.34s/it] 58%|█████▊ | 918/1572 [1:55:36<1:19:41, 7.31s/it] {'loss': 0.6313, 'learning_rate': 8.582677165354332e-06, 'epoch': 2.33} 58%|█████▊ | 918/1572 [1:55:36<1:19:41, 7.31s/it] 58%|█████▊ | 919/1572 [1:55:43<1:19:29, 7.30s/it] {'loss': 0.632, 'learning_rate': 8.569553805774279e-06, 'epoch': 2.33} 58%|█████▊ | 919/1572 [1:55:43<1:19:29, 7.30s/it] 59%|█████▊ | 920/1572 [1:55:51<1:20:12, 7.38s/it] {'loss': 0.6531, 'learning_rate': 8.556430446194226e-06, 'epoch': 2.34} 59%|█████▊ | 920/1572 [1:55:51<1:20:12, 7.38s/it] 59%|█████▊ | 921/1572 [1:55:58<1:21:45, 7.54s/it] {'loss': 0.6736, 'learning_rate': 8.543307086614175e-06, 'epoch': 2.34} 59%|█████▊ | 921/1572 [1:55:58<1:21:45, 7.54s/it] 59%|█████▊ | 922/1572 [1:56:06<1:21:19, 7.51s/it] {'loss': 0.7554, 'learning_rate': 8.530183727034122e-06, 'epoch': 2.34} 59%|█████▊ | 922/1572 [1:56:06<1:21:19, 7.51s/it] 59%|█████▊ | 923/1572 [1:56:13<1:20:13, 7.42s/it] {'loss': 0.6537, 'learning_rate': 8.517060367454069e-06, 'epoch': 2.34} 59%|█████▊ | 923/1572 [1:56:13<1:20:13, 7.42s/it] 59%|█████▉ | 924/1572 [1:56:21<1:21:39, 7.56s/it] {'loss': 0.6095, 'learning_rate': 8.503937007874016e-06, 'epoch': 2.35} 
59%|█████▉ | 924/1572 [1:56:21<1:21:39, 7.56s/it] 59%|█████▉ | 925/1572 [1:56:28<1:20:03, 7.42s/it] {'loss': 0.577, 'learning_rate': 8.490813648293963e-06, 'epoch': 2.35} 59%|█████▉ | 925/1572 [1:56:28<1:20:03, 7.42s/it] 59%|█████▉ | 926/1572 [1:56:35<1:18:36, 7.30s/it] {'loss': 0.643, 'learning_rate': 8.47769028871391e-06, 'epoch': 2.35} 59%|█████▉ | 926/1572 [1:56:35<1:18:36, 7.30s/it] 59%|█████▉ | 927/1572 [1:56:43<1:18:46, 7.33s/it] {'loss': 0.68, 'learning_rate': 8.46456692913386e-06, 'epoch': 2.35} 59%|█████▉ | 927/1572 [1:56:43<1:18:46, 7.33s/it] 59%|█████▉ | 928/1572 [1:56:49<1:16:51, 7.16s/it] {'loss': 0.5881, 'learning_rate': 8.451443569553806e-06, 'epoch': 2.36} 59%|█████▉ | 928/1572 [1:56:49<1:16:51, 7.16s/it] 59%|█████▉ | 929/1572 [1:56:57<1:18:07, 7.29s/it] {'loss': 0.6181, 'learning_rate': 8.438320209973753e-06, 'epoch': 2.36} 59%|█████▉ | 929/1572 [1:56:57<1:18:07, 7.29s/it] 59%|█████▉ | 930/1572 [1:57:04<1:18:32, 7.34s/it] {'loss': 0.6661, 'learning_rate': 8.4251968503937e-06, 'epoch': 2.36} 59%|█████▉ | 930/1572 [1:57:04<1:18:32, 7.34s/it] 59%|█████▉ | 931/1572 [1:57:12<1:18:36, 7.36s/it] {'loss': 0.6526, 'learning_rate': 8.41207349081365e-06, 'epoch': 2.36} 59%|█████▉ | 931/1572 [1:57:12<1:18:36, 7.36s/it] 59%|█████▉ | 932/1572 [1:57:19<1:18:19, 7.34s/it] {'loss': 0.6136, 'learning_rate': 8.398950131233596e-06, 'epoch': 2.37} 59%|█████▉ | 932/1572 [1:57:19<1:18:19, 7.34s/it] 59%|█████▉ | 933/1572 [1:57:27<1:19:04, 7.42s/it] {'loss': 0.6261, 'learning_rate': 8.385826771653544e-06, 'epoch': 2.37} 59%|█████▉ | 933/1572 [1:57:27<1:19:04, 7.42s/it] 59%|█████▉ | 934/1572 [1:57:34<1:19:06, 7.44s/it] {'loss': 0.7581, 'learning_rate': 8.372703412073492e-06, 'epoch': 2.37} 59%|█████▉ | 934/1572 [1:57:34<1:19:06, 7.44s/it] 59%|█████▉ | 935/1572 [1:57:41<1:18:17, 7.38s/it] {'loss': 0.7365, 'learning_rate': 8.35958005249344e-06, 'epoch': 2.37} 59%|█████▉ | 935/1572 [1:57:41<1:18:17, 7.38s/it] 60%|█████▉ | 936/1572 [1:57:49<1:17:20, 7.30s/it] {'loss': 0.543, 
'learning_rate': 8.346456692913387e-06, 'epoch': 2.38} 60%|█████▉ | 936/1572 [1:57:49<1:17:20, 7.30s/it] 60%|█████▉ | 937/1572 [1:57:55<1:16:04, 7.19s/it] {'loss': 0.6141, 'learning_rate': 8.333333333333334e-06, 'epoch': 2.38} 60%|█████▉ | 937/1572 [1:57:55<1:16:04, 7.19s/it] 60%|█████▉ | 938/1572 [1:58:03<1:15:43, 7.17s/it] {'loss': 0.6007, 'learning_rate': 8.320209973753281e-06, 'epoch': 2.38} 60%|█████▉ | 938/1572 [1:58:03<1:15:43, 7.17s/it] 60%|█████▉ | 939/1572 [1:58:10<1:15:19, 7.14s/it] {'loss': 0.6645, 'learning_rate': 8.307086614173228e-06, 'epoch': 2.38} 60%|█████▉ | 939/1572 [1:58:10<1:15:19, 7.14s/it] 60%|█████▉ | 940/1572 [1:58:17<1:16:41, 7.28s/it] {'loss': 0.616, 'learning_rate': 8.293963254593177e-06, 'epoch': 2.39} 60%|█████▉ | 940/1572 [1:58:17<1:16:41, 7.28s/it] 60%|█████▉ | 941/1572 [1:58:24<1:14:17, 7.06s/it] {'loss': 0.5948, 'learning_rate': 8.280839895013124e-06, 'epoch': 2.39} 60%|█████▉ | 941/1572 [1:58:24<1:14:17, 7.06s/it] 60%|█████▉ | 942/1572 [1:58:31<1:14:28, 7.09s/it] {'loss': 0.6621, 'learning_rate': 8.267716535433071e-06, 'epoch': 2.39} 60%|█████▉ | 942/1572 [1:58:31<1:14:28, 7.09s/it] 60%|█████▉ | 943/1572 [1:58:38<1:14:38, 7.12s/it] {'loss': 0.5853, 'learning_rate': 8.25459317585302e-06, 'epoch': 2.39} 60%|█████▉ | 943/1572 [1:58:38<1:14:38, 7.12s/it] 60%|██████ | 944/1572 [1:58:45<1:14:22, 7.11s/it] {'loss': 0.6448, 'learning_rate': 8.241469816272967e-06, 'epoch': 2.4} 60%|██████ | 944/1572 [1:58:45<1:14:22, 7.11s/it] 60%|██████ | 945/1572 [1:58:53<1:16:05, 7.28s/it] {'loss': 0.6693, 'learning_rate': 8.228346456692914e-06, 'epoch': 2.4} 60%|██████ | 945/1572 [1:58:53<1:16:05, 7.28s/it] 60%|██████ | 946/1572 [1:59:01<1:18:49, 7.55s/it] {'loss': 0.7148, 'learning_rate': 8.215223097112861e-06, 'epoch': 2.4} 60%|██████ | 946/1572 [1:59:01<1:18:49, 7.55s/it] 60%|██████ | 947/1572 [1:59:09<1:19:30, 7.63s/it] {'loss': 0.7296, 'learning_rate': 8.20209973753281e-06, 'epoch': 2.4} 60%|██████ | 947/1572 [1:59:09<1:19:30, 7.63s/it] 
60%|██████ | 948/1572 [1:59:16<1:18:29, 7.55s/it] {'loss': 0.6092, 'learning_rate': 8.188976377952757e-06, 'epoch': 2.41} 60%|██████ | 948/1572 [1:59:16<1:18:29, 7.55s/it] 60%|██████ | 949/1572 [1:59:24<1:17:39, 7.48s/it] {'loss': 0.6536, 'learning_rate': 8.175853018372704e-06, 'epoch': 2.41} 60%|██████ | 949/1572 [1:59:24<1:17:39, 7.48s/it] 60%|██████ | 950/1572 [1:59:31<1:16:07, 7.34s/it] {'loss': 0.5782, 'learning_rate': 8.162729658792651e-06, 'epoch': 2.41} 60%|██████ | 950/1572 [1:59:31<1:16:07, 7.34s/it] 60%|██████ | 951/1572 [1:59:37<1:13:53, 7.14s/it] {'loss': 0.5929, 'learning_rate': 8.149606299212598e-06, 'epoch': 2.41} 60%|██████ | 951/1572 [1:59:37<1:13:53, 7.14s/it] 61%|██████ | 952/1572 [1:59:44<1:13:19, 7.10s/it] {'loss': 0.6732, 'learning_rate': 8.136482939632546e-06, 'epoch': 2.42} 61%|██████ | 952/1572 [1:59:44<1:13:19, 7.10s/it] 61%|██████ | 953/1572 [1:59:51<1:13:22, 7.11s/it] {'loss': 0.6452, 'learning_rate': 8.123359580052494e-06, 'epoch': 2.42} 61%|██████ | 953/1572 [1:59:51<1:13:22, 7.11s/it] 61%|██████ | 954/1572 [1:59:58<1:13:08, 7.10s/it] {'loss': 0.6733, 'learning_rate': 8.110236220472441e-06, 'epoch': 2.42} 61%|██████ | 954/1572 [1:59:58<1:13:08, 7.10s/it] 61%|██████ | 955/1572 [2:00:05<1:12:32, 7.05s/it] {'loss': 0.5783, 'learning_rate': 8.097112860892389e-06, 'epoch': 2.42} 61%|██████ | 955/1572 [2:00:05<1:12:32, 7.05s/it] 61%|██████ | 956/1572 [2:00:13<1:13:59, 7.21s/it] {'loss': 0.6939, 'learning_rate': 8.083989501312337e-06, 'epoch': 2.43} 61%|██████ | 956/1572 [2:00:13<1:13:59, 7.21s/it] 61%|██████ | 957/1572 [2:00:20<1:14:42, 7.29s/it] {'loss': 0.649, 'learning_rate': 8.070866141732285e-06, 'epoch': 2.43} 61%|██████ | 957/1572 [2:00:20<1:14:42, 7.29s/it] 61%|██████ | 958/1572 [2:00:28<1:16:17, 7.46s/it] {'loss': 0.6461, 'learning_rate': 8.057742782152232e-06, 'epoch': 2.43} 61%|██████ | 958/1572 [2:00:28<1:16:17, 7.46s/it] 61%|██████ | 959/1572 [2:00:35<1:14:43, 7.31s/it] {'loss': 0.6792, 'learning_rate': 8.04461942257218e-06, 
'epoch': 2.43} 61%|██████ | 959/1572 [2:00:35<1:14:43, 7.31s/it] 61%|██████ | 960/1572 [2:00:43<1:14:14, 7.28s/it] {'loss': 0.6308, 'learning_rate': 8.031496062992128e-06, 'epoch': 2.44} 61%|██████ | 960/1572 [2:00:43<1:14:14, 7.28s/it] 61%|██████ | 961/1572 [2:00:50<1:15:13, 7.39s/it] {'loss': 0.6007, 'learning_rate': 8.018372703412075e-06, 'epoch': 2.44} 61%|██████ | 961/1572 [2:00:50<1:15:13, 7.39s/it] 61%|██████ | 962/1572 [2:00:57<1:13:42, 7.25s/it] {'loss': 0.5804, 'learning_rate': 8.005249343832022e-06, 'epoch': 2.44} 61%|██████ | 962/1572 [2:00:57<1:13:42, 7.25s/it] 61%|██████▏ | 963/1572 [2:01:04<1:11:49, 7.08s/it] {'loss': 0.5736, 'learning_rate': 7.992125984251969e-06, 'epoch': 2.44} 61%|██████▏ | 963/1572 [2:01:04<1:11:49, 7.08s/it] 61%|██████▏ | 964/1572 [2:01:11<1:11:37, 7.07s/it] {'loss': 0.705, 'learning_rate': 7.979002624671916e-06, 'epoch': 2.45} 61%|██████▏ | 964/1572 [2:01:11<1:11:37, 7.07s/it] 61%|██████▏ | 965/1572 [2:01:18<1:12:58, 7.21s/it] {'loss': 0.6313, 'learning_rate': 7.965879265091863e-06, 'epoch': 2.45} 61%|██████▏ | 965/1572 [2:01:18<1:12:58, 7.21s/it] 61%|██████▏ | 966/1572 [2:01:26<1:12:46, 7.21s/it] {'loss': 0.8025, 'learning_rate': 7.952755905511812e-06, 'epoch': 2.45} 61%|██████▏ | 966/1572 [2:01:26<1:12:46, 7.21s/it] 62%|██████▏ | 967/1572 [2:01:32<1:11:05, 7.05s/it] {'loss': 0.5595, 'learning_rate': 7.939632545931759e-06, 'epoch': 2.46} 62%|██████▏ | 967/1572 [2:01:32<1:11:05, 7.05s/it] 62%|██████▏ | 968/1572 [2:01:39<1:10:44, 7.03s/it] {'loss': 0.6043, 'learning_rate': 7.926509186351706e-06, 'epoch': 2.46} 62%|██████▏ | 968/1572 [2:01:39<1:10:44, 7.03s/it] 62%|██████▏ | 969/1572 [2:01:46<1:10:40, 7.03s/it] {'loss': 0.7616, 'learning_rate': 7.913385826771655e-06, 'epoch': 2.46} 62%|██████▏ | 969/1572 [2:01:46<1:10:40, 7.03s/it] 62%|██████▏ | 970/1572 [2:01:54<1:11:32, 7.13s/it] {'loss': 0.6214, 'learning_rate': 7.900262467191602e-06, 'epoch': 2.46} 62%|██████▏ | 970/1572 [2:01:54<1:11:32, 7.13s/it] 62%|██████▏ | 971/1572 
[2:02:01<1:11:38, 7.15s/it] {'loss': 0.5419, 'learning_rate': 7.88713910761155e-06, 'epoch': 2.47} 62%|██████▏ | 971/1572 [2:02:01<1:11:38, 7.15s/it] 62%|██████▏ | 972/1572 [2:02:08<1:11:28, 7.15s/it] {'loss': 0.6296, 'learning_rate': 7.874015748031496e-06, 'epoch': 2.47} 62%|██████▏ | 972/1572 [2:02:08<1:11:28, 7.15s/it] 62%|██████▏ | 973/1572 [2:02:15<1:12:17, 7.24s/it] {'loss': 0.6272, 'learning_rate': 7.860892388451443e-06, 'epoch': 2.47} 62%|██████▏ | 973/1572 [2:02:15<1:12:17, 7.24s/it] 62%|██████▏ | 974/1572 [2:02:22<1:11:23, 7.16s/it] {'loss': 0.6284, 'learning_rate': 7.847769028871392e-06, 'epoch': 2.47} 62%|██████▏ | 974/1572 [2:02:22<1:11:23, 7.16s/it] 62%|██████▏ | 975/1572 [2:02:30<1:12:23, 7.28s/it] {'loss': 0.6204, 'learning_rate': 7.83464566929134e-06, 'epoch': 2.48} 62%|██████▏ | 975/1572 [2:02:30<1:12:23, 7.28s/it] 62%|██████▏ | 976/1572 [2:02:37<1:10:48, 7.13s/it] {'loss': 0.5949, 'learning_rate': 7.821522309711287e-06, 'epoch': 2.48} 62%|██████▏ | 976/1572 [2:02:37<1:10:48, 7.13s/it] 62%|██████▏ | 977/1572 [2:02:44<1:10:25, 7.10s/it] {'loss': 0.6518, 'learning_rate': 7.808398950131234e-06, 'epoch': 2.48} 62%|██████▏ | 977/1572 [2:02:44<1:10:25, 7.10s/it] 62%|██████▏ | 978/1572 [2:02:51<1:11:13, 7.19s/it] {'loss': 0.5697, 'learning_rate': 7.79527559055118e-06, 'epoch': 2.48} 62%|██████▏ | 978/1572 [2:02:51<1:11:13, 7.19s/it] 62%|██████▏ | 979/1572 [2:02:58<1:10:08, 7.10s/it] {'loss': 0.6188, 'learning_rate': 7.78215223097113e-06, 'epoch': 2.49} 62%|██████▏ | 979/1572 [2:02:58<1:10:08, 7.10s/it] 62%|██████▏ | 980/1572 [2:03:05<1:09:13, 7.02s/it] {'loss': 0.6032, 'learning_rate': 7.769028871391077e-06, 'epoch': 2.49} 62%|██████▏ | 980/1572 [2:03:05<1:09:13, 7.02s/it] 62%|██████▏ | 981/1572 [2:03:12<1:09:48, 7.09s/it] {'loss': 0.6406, 'learning_rate': 7.755905511811024e-06, 'epoch': 2.49} 62%|██████▏ | 981/1572 [2:03:12<1:09:48, 7.09s/it] 62%|██████▏ | 982/1572 [2:03:19<1:09:04, 7.02s/it] {'loss': 0.5665, 'learning_rate': 7.742782152230973e-06, 
'epoch': 2.49} 62%|██████▏ | 982/1572 [2:03:19<1:09:04, 7.02s/it] 63%|██████▎ | 983/1572 [2:03:26<1:09:07, 7.04s/it] {'loss': 0.5978, 'learning_rate': 7.72965879265092e-06, 'epoch': 2.5} 63%|██████▎ | 983/1572 [2:03:26<1:09:07, 7.04s/it] 63%|██████▎ | 984/1572 [2:03:33<1:08:15, 6.96s/it] {'loss': 0.5523, 'learning_rate': 7.716535433070867e-06, 'epoch': 2.5} 63%|██████▎ | 984/1572 [2:03:33<1:08:15, 6.96s/it] 63%|██████▎ | 985/1572 [2:03:40<1:09:04, 7.06s/it] {'loss': 0.5657, 'learning_rate': 7.703412073490814e-06, 'epoch': 2.5} 63%|██████▎ | 985/1572 [2:03:40<1:09:04, 7.06s/it] 63%|██████▎ | 986/1572 [2:03:48<1:10:17, 7.20s/it] {'loss': 0.6334, 'learning_rate': 7.690288713910761e-06, 'epoch': 2.5} 63%|██████▎ | 986/1572 [2:03:48<1:10:17, 7.20s/it] 63%|██████▎ | 987/1572 [2:03:55<1:10:24, 7.22s/it] {'loss': 0.645, 'learning_rate': 7.677165354330708e-06, 'epoch': 2.51} 63%|██████▎ | 987/1572 [2:03:55<1:10:24, 7.22s/it] 63%|██████▎ | 988/1572 [2:04:02<1:10:00, 7.19s/it] {'loss': 0.6624, 'learning_rate': 7.664041994750657e-06, 'epoch': 2.51} 63%|██████▎ | 988/1572 [2:04:02<1:10:00, 7.19s/it] 63%|██████▎ | 989/1572 [2:04:10<1:10:40, 7.27s/it] {'loss': 0.6452, 'learning_rate': 7.650918635170604e-06, 'epoch': 2.51} 63%|██████▎ | 989/1572 [2:04:10<1:10:40, 7.27s/it] 63%|██████▎ | 990/1572 [2:04:17<1:10:51, 7.30s/it] {'loss': 0.6135, 'learning_rate': 7.637795275590551e-06, 'epoch': 2.51} 63%|██████▎ | 990/1572 [2:04:17<1:10:51, 7.30s/it] 63%|██████▎ | 991/1572 [2:04:25<1:11:58, 7.43s/it] {'loss': 0.584, 'learning_rate': 7.6246719160105e-06, 'epoch': 2.52} 63%|██████▎ | 991/1572 [2:04:25<1:11:58, 7.43s/it] 63%|██████▎ | 992/1572 [2:04:32<1:11:25, 7.39s/it] {'loss': 0.61, 'learning_rate': 7.611548556430447e-06, 'epoch': 2.52} 63%|██████▎ | 992/1572 [2:04:32<1:11:25, 7.39s/it] 63%|██████▎ | 993/1572 [2:04:39<1:11:01, 7.36s/it] {'loss': 0.5792, 'learning_rate': 7.598425196850394e-06, 'epoch': 2.52} 63%|██████▎ | 993/1572 [2:04:39<1:11:01, 7.36s/it] 63%|██████▎ | 994/1572 
[2:04:46<1:10:05, 7.28s/it] {'loss': 0.5922, 'learning_rate': 7.585301837270341e-06, 'epoch': 2.52} 63%|██████▎ | 994/1572 [2:04:46<1:10:05, 7.28s/it] 63%|██████▎ | 995/1572 [2:04:54<1:10:49, 7.37s/it] {'loss': 0.6651, 'learning_rate': 7.572178477690289e-06, 'epoch': 2.53} 63%|██████▎ | 995/1572 [2:04:54<1:10:49, 7.37s/it] 63%|██████▎ | 996/1572 [2:05:01<1:10:16, 7.32s/it] {'loss': 0.7296, 'learning_rate': 7.5590551181102365e-06, 'epoch': 2.53} 63%|██████▎ | 996/1572 [2:05:01<1:10:16, 7.32s/it] 63%|██████▎ | 997/1572 [2:05:09<1:10:26, 7.35s/it] {'loss': 0.5467, 'learning_rate': 7.545931758530184e-06, 'epoch': 2.53} 63%|██████▎ | 997/1572 [2:05:09<1:10:26, 7.35s/it] 63%|██████▎ | 998/1572 [2:05:16<1:11:36, 7.48s/it] {'loss': 0.7355, 'learning_rate': 7.532808398950132e-06, 'epoch': 2.53} 63%|██████▎ | 998/1572 [2:05:16<1:11:36, 7.48s/it] 64%|██████▎ | 999/1572 [2:05:24<1:10:43, 7.41s/it] {'loss': 0.6573, 'learning_rate': 7.5196850393700795e-06, 'epoch': 2.54} 64%|██████▎ | 999/1572 [2:05:24<1:10:43, 7.41s/it] 64%|██████▎ | 1000/1572 [2:05:30<1:08:51, 7.22s/it] {'loss': 0.6198, 'learning_rate': 7.506561679790027e-06, 'epoch': 2.54} 64%|██████▎ | 1000/1572 [2:05:30<1:08:51, 7.22s/it] 64%|██████▎ | 1001/1572 [2:05:38<1:08:40, 7.22s/it] {'loss': 0.5829, 'learning_rate': 7.493438320209975e-06, 'epoch': 2.54} 64%|██████▎ | 1001/1572 [2:05:38<1:08:40, 7.22s/it] 64%|██████▎ | 1002/1572 [2:05:44<1:06:57, 7.05s/it] {'loss': 0.6324, 'learning_rate': 7.480314960629922e-06, 'epoch': 2.54} 64%|██████▎ | 1002/1572 [2:05:44<1:06:57, 7.05s/it] 64%|██████▍ | 1003/1572 [2:05:52<1:07:44, 7.14s/it] {'loss': 0.5266, 'learning_rate': 7.467191601049869e-06, 'epoch': 2.55} 64%|██████▍ | 1003/1572 [2:05:52<1:07:44, 7.14s/it] 64%|██████▍ | 1004/1572 [2:05:59<1:09:10, 7.31s/it] {'loss': 0.6182, 'learning_rate': 7.454068241469818e-06, 'epoch': 2.55} 64%|██████▍ | 1004/1572 [2:05:59<1:09:10, 7.31s/it] 64%|██████▍ | 1005/1572 [2:06:06<1:08:14, 7.22s/it] {'loss': 0.6859, 'learning_rate': 
7.440944881889765e-06, 'epoch': 2.55} 64%|██████▍ | 1005/1572 [2:06:06<1:08:14, 7.22s/it] 64%|██████▍ | 1006/1572 [2:06:13<1:07:58, 7.21s/it] {'loss': 0.5596, 'learning_rate': 7.427821522309712e-06, 'epoch': 2.55} 64%|██████▍ | 1006/1572 [2:06:13<1:07:58, 7.21s/it] 64%|██████▍ | 1007/1572 [2:06:21<1:07:55, 7.21s/it] {'loss': 0.5993, 'learning_rate': 7.41469816272966e-06, 'epoch': 2.56} 64%|██████▍ | 1007/1572 [2:06:21<1:07:55, 7.21s/it] 64%|██████▍ | 1008/1572 [2:06:29<1:09:38, 7.41s/it] {'loss': 0.6331, 'learning_rate': 7.401574803149607e-06, 'epoch': 2.56} 64%|██████▍ | 1008/1572 [2:06:29<1:09:38, 7.41s/it] 64%|██████▍ | 1009/1572 [2:06:36<1:09:30, 7.41s/it] {'loss': 0.6644, 'learning_rate': 7.388451443569554e-06, 'epoch': 2.56} 64%|██████▍ | 1009/1572 [2:06:36<1:09:30, 7.41s/it] 64%|██████▍ | 1010/1572 [2:06:43<1:08:13, 7.28s/it] {'loss': 0.6473, 'learning_rate': 7.375328083989501e-06, 'epoch': 2.56} 64%|██████▍ | 1010/1572 [2:06:43<1:08:13, 7.28s/it] 64%|██████▍ | 1011/1572 [2:06:50<1:06:36, 7.12s/it] {'loss': 0.6239, 'learning_rate': 7.36220472440945e-06, 'epoch': 2.57} 64%|██████▍ | 1011/1572 [2:06:50<1:06:36, 7.12s/it] 64%|██████▍ | 1012/1572 [2:06:56<1:05:23, 7.01s/it] {'loss': 0.6353, 'learning_rate': 7.349081364829397e-06, 'epoch': 2.57} 64%|██████▍ | 1012/1572 [2:06:56<1:05:23, 7.01s/it] 64%|██████▍ | 1013/1572 [2:07:03<1:05:06, 6.99s/it] {'loss': 0.6204, 'learning_rate': 7.335958005249344e-06, 'epoch': 2.57} 64%|██████▍ | 1013/1572 [2:07:03<1:05:06, 6.99s/it] 65%|██████▍ | 1014/1572 [2:07:11<1:05:40, 7.06s/it] {'loss': 0.5977, 'learning_rate': 7.322834645669292e-06, 'epoch': 2.57} 65%|██████▍ | 1014/1572 [2:07:11<1:05:40, 7.06s/it] 65%|██████▍ | 1015/1572 [2:07:18<1:06:01, 7.11s/it] {'loss': 0.6741, 'learning_rate': 7.309711286089239e-06, 'epoch': 2.58} 65%|██████▍ | 1015/1572 [2:07:18<1:06:01, 7.11s/it] 65%|██████▍ | 1016/1572 [2:07:25<1:05:35, 7.08s/it] {'loss': 0.6163, 'learning_rate': 7.2965879265091864e-06, 'epoch': 2.58} 65%|██████▍ | 1016/1572 
[2:07:25<1:05:35, 7.08s/it] 65%|██████▍ | 1017/1572 [2:07:32<1:06:58, 7.24s/it] {'loss': 0.6252, 'learning_rate': 7.283464566929135e-06, 'epoch': 2.58} 65%|██████▍ | 1017/1572 [2:07:32<1:06:58, 7.24s/it] 65%|██████▍ | 1018/1572 [2:07:40<1:07:49, 7.35s/it] {'loss': 0.558, 'learning_rate': 7.270341207349082e-06, 'epoch': 2.58} 65%|██████▍ | 1018/1572 [2:07:40<1:07:49, 7.35s/it] 65%|██████▍ | 1019/1572 [2:07:47<1:07:18, 7.30s/it] {'loss': 0.7629, 'learning_rate': 7.2572178477690295e-06, 'epoch': 2.59} 65%|██████▍ | 1019/1572 [2:07:47<1:07:18, 7.30s/it] 65%|██████▍ | 1020/1572 [2:07:55<1:07:25, 7.33s/it] {'loss': 0.6893, 'learning_rate': 7.2440944881889774e-06, 'epoch': 2.59} 65%|██████▍ | 1020/1572 [2:07:55<1:07:25, 7.33s/it] 65%|██████▍ | 1021/1572 [2:08:02<1:06:53, 7.28s/it] {'loss': 0.6833, 'learning_rate': 7.2309711286089245e-06, 'epoch': 2.59} 65%|██████▍ | 1021/1572 [2:08:02<1:06:53, 7.28s/it] 65%|██████▌ | 1022/1572 [2:08:09<1:06:51, 7.29s/it] {'loss': 0.6381, 'learning_rate': 7.217847769028872e-06, 'epoch': 2.59} 65%|██████▌ | 1022/1572 [2:08:09<1:06:51, 7.29s/it] 65%|██████▌ | 1023/1572 [2:08:16<1:06:16, 7.24s/it] {'loss': 0.606, 'learning_rate': 7.20472440944882e-06, 'epoch': 2.6} 65%|██████▌ | 1023/1572 [2:08:16<1:06:16, 7.24s/it] 65%|██████▌ | 1024/1572 [2:08:24<1:06:33, 7.29s/it] {'loss': 0.6269, 'learning_rate': 7.191601049868768e-06, 'epoch': 2.6} 65%|██████▌ | 1024/1572 [2:08:24<1:06:33, 7.29s/it] 65%|██████▌ | 1025/1572 [2:08:30<1:04:22, 7.06s/it] {'loss': 0.6477, 'learning_rate': 7.178477690288715e-06, 'epoch': 2.6} 65%|██████▌ | 1025/1572 [2:08:30<1:04:22, 7.06s/it] 65%|██████▌ | 1026/1572 [2:08:38<1:05:13, 7.17s/it] {'loss': 0.6106, 'learning_rate': 7.165354330708662e-06, 'epoch': 2.6} 65%|██████▌ | 1026/1572 [2:08:38<1:05:13, 7.17s/it] 65%|██████▌ | 1027/1572 [2:08:44<1:03:42, 7.01s/it] {'loss': 0.5857, 'learning_rate': 7.15223097112861e-06, 'epoch': 2.61} 65%|██████▌ | 1027/1572 [2:08:44<1:03:42, 7.01s/it] 65%|██████▌ | 1028/1572 
[2:08:52<1:04:34, 7.12s/it] {'loss': 0.707, 'learning_rate': 7.139107611548557e-06, 'epoch': 2.61} 65%|██████▌ | 1028/1572 [2:08:52<1:04:34, 7.12s/it] 65%|██████▌ | 1029/1572 [2:08:59<1:03:56, 7.06s/it] {'loss': 0.6282, 'learning_rate': 7.125984251968504e-06, 'epoch': 2.61} 65%|██████▌ | 1029/1572 [2:08:59<1:03:56, 7.06s/it] 66%|██████▌ | 1030/1572 [2:09:06<1:05:13, 7.22s/it] {'loss': 0.5898, 'learning_rate': 7.112860892388452e-06, 'epoch': 2.62} 66%|██████▌ | 1030/1572 [2:09:06<1:05:13, 7.22s/it] 66%|██████▌ | 1031/1572 [2:09:13<1:04:35, 7.16s/it] {'loss': 0.6308, 'learning_rate': 7.099737532808399e-06, 'epoch': 2.62} 66%|██████▌ | 1031/1572 [2:09:13<1:04:35, 7.16s/it] 66%|██████▌ | 1032/1572 [2:09:21<1:05:23, 7.27s/it] {'loss': 0.7467, 'learning_rate': 7.086614173228347e-06, 'epoch': 2.62} 66%|██████▌ | 1032/1572 [2:09:21<1:05:23, 7.27s/it] 66%|██████▌ | 1033/1572 [2:09:28<1:04:53, 7.22s/it] {'loss': 0.5981, 'learning_rate': 7.073490813648295e-06, 'epoch': 2.62} 66%|██████▌ | 1033/1572 [2:09:28<1:04:53, 7.22s/it] 66%|██████▌ | 1034/1572 [2:09:35<1:05:47, 7.34s/it] {'loss': 0.7398, 'learning_rate': 7.060367454068242e-06, 'epoch': 2.63} 66%|██████▌ | 1034/1572 [2:09:35<1:05:47, 7.34s/it] 66%|██████▌ | 1035/1572 [2:09:42<1:04:08, 7.17s/it] {'loss': 0.6665, 'learning_rate': 7.047244094488189e-06, 'epoch': 2.63} 66%|██████▌ | 1035/1572 [2:09:42<1:04:08, 7.17s/it] 66%|██████▌ | 1036/1572 [2:09:50<1:06:20, 7.43s/it] {'loss': 0.6329, 'learning_rate': 7.034120734908137e-06, 'epoch': 2.63} 66%|██████▌ | 1036/1572 [2:09:50<1:06:20, 7.43s/it] 66%|██████▌ | 1037/1572 [2:09:58<1:06:42, 7.48s/it] {'loss': 0.604, 'learning_rate': 7.020997375328084e-06, 'epoch': 2.63} 66%|██████▌ | 1037/1572 [2:09:58<1:06:42, 7.48s/it] 66%|██████▌ | 1038/1572 [2:10:05<1:05:52, 7.40s/it] {'loss': 0.5794, 'learning_rate': 7.0078740157480315e-06, 'epoch': 2.64} 66%|██████▌ | 1038/1572 [2:10:05<1:05:52, 7.40s/it] 66%|██████▌ | 1039/1572 [2:10:12<1:05:01, 7.32s/it] {'loss': 0.7249, 'learning_rate': 
6.99475065616798e-06, 'epoch': 2.64} 66%|██████▌ | 1039/1572 [2:10:12<1:05:01, 7.32s/it] 66%|██████▌ | 1040/1572 [2:10:20<1:06:41, 7.52s/it] {'loss': 0.5788, 'learning_rate': 6.981627296587927e-06, 'epoch': 2.64} 66%|██████▌ | 1040/1572 [2:10:20<1:06:41, 7.52s/it] 66%|██████▌ | 1041/1572 [2:10:28<1:07:01, 7.57s/it] {'loss': 0.6256, 'learning_rate': 6.9685039370078745e-06, 'epoch': 2.64} 66%|██████▌ | 1041/1572 [2:10:28<1:07:01, 7.57s/it] 66%|██████▋ | 1042/1572 [2:10:35<1:05:38, 7.43s/it] {'loss': 0.6902, 'learning_rate': 6.955380577427822e-06, 'epoch': 2.65} 66%|██████▋ | 1042/1572 [2:10:35<1:05:38, 7.43s/it] 66%|██████▋ | 1043/1572 [2:10:42<1:04:51, 7.36s/it] {'loss': 0.6513, 'learning_rate': 6.94225721784777e-06, 'epoch': 2.65} 66%|██████▋ | 1043/1572 [2:10:42<1:04:51, 7.36s/it] 66%|██████▋ | 1044/1572 [2:10:49<1:04:39, 7.35s/it] {'loss': 0.6074, 'learning_rate': 6.929133858267717e-06, 'epoch': 2.65} 66%|██████▋ | 1044/1572 [2:10:49<1:04:39, 7.35s/it] 66%|██████▋ | 1045/1572 [2:10:57<1:04:06, 7.30s/it] {'loss': 0.627, 'learning_rate': 6.916010498687664e-06, 'epoch': 2.65} 66%|██████▋ | 1045/1572 [2:10:57<1:04:06, 7.30s/it] 67%|██████▋ | 1046/1572 [2:11:04<1:04:42, 7.38s/it] {'loss': 0.6453, 'learning_rate': 6.902887139107613e-06, 'epoch': 2.66} 67%|██████▋ | 1046/1572 [2:11:04<1:04:42, 7.38s/it] 67%|██████▋ | 1047/1572 [2:11:11<1:03:31, 7.26s/it] {'loss': 0.6491, 'learning_rate': 6.88976377952756e-06, 'epoch': 2.66} 67%|██████▋ | 1047/1572 [2:11:11<1:03:31, 7.26s/it] 67%|██████▋ | 1048/1572 [2:11:18<1:03:09, 7.23s/it] {'loss': 0.6324, 'learning_rate': 6.876640419947507e-06, 'epoch': 2.66} 67%|██████▋ | 1048/1572 [2:11:18<1:03:09, 7.23s/it] 67%|██████▋ | 1049/1572 [2:11:25<1:01:51, 7.10s/it] {'loss': 0.6118, 'learning_rate': 6.863517060367455e-06, 'epoch': 2.66} 67%|██████▋ | 1049/1572 [2:11:25<1:01:51, 7.10s/it] 67%|██████▋ | 1050/1572 [2:11:32<1:01:23, 7.06s/it] {'loss': 0.6043, 'learning_rate': 6.850393700787402e-06, 'epoch': 2.67} 67%|██████▋ | 1050/1572 
[2:11:32<1:01:23, 7.06s/it] 67%|██████▋ | 1051/1572 [2:11:40<1:04:07, 7.38s/it] {'loss': 0.6692, 'learning_rate': 6.837270341207349e-06, 'epoch': 2.67} 67%|██████▋ | 1051/1572 [2:11:40<1:04:07, 7.38s/it] 67%|██████▋ | 1052/1572 [2:11:47<1:02:47, 7.25s/it] {'loss': 0.649, 'learning_rate': 6.824146981627298e-06, 'epoch': 2.67} 67%|██████▋ | 1052/1572 [2:11:47<1:02:47, 7.25s/it] 67%|██████▋ | 1053/1572 [2:11:54<1:02:33, 7.23s/it] {'loss': 0.626, 'learning_rate': 6.811023622047245e-06, 'epoch': 2.67} 67%|██████▋ | 1053/1572 [2:11:54<1:02:33, 7.23s/it] 67%|██████▋ | 1054/1572 [2:12:03<1:04:59, 7.53s/it] {'loss': 0.6614, 'learning_rate': 6.797900262467192e-06, 'epoch': 2.68} 67%|██████▋ | 1054/1572 [2:12:03<1:04:59, 7.53s/it] 67%|██████▋ | 1055/1572 [2:12:09<1:02:51, 7.30s/it] {'loss': 0.5828, 'learning_rate': 6.78477690288714e-06, 'epoch': 2.68} 67%|██████▋ | 1055/1572 [2:12:09<1:02:51, 7.30s/it] 67%|██████▋ | 1056/1572 [2:12:17<1:03:03, 7.33s/it] {'loss': 0.6297, 'learning_rate': 6.771653543307087e-06, 'epoch': 2.68} 67%|██████▋ | 1056/1572 [2:12:17<1:03:03, 7.33s/it] 67%|██████▋ | 1057/1572 [2:12:24<1:01:46, 7.20s/it] {'loss': 0.6677, 'learning_rate': 6.758530183727034e-06, 'epoch': 2.68} 67%|██████▋ | 1057/1572 [2:12:24<1:01:46, 7.20s/it] 67%|██████▋ | 1058/1572 [2:12:31<1:02:36, 7.31s/it] {'loss': 0.5872, 'learning_rate': 6.745406824146981e-06, 'epoch': 2.69} 67%|██████▋ | 1058/1572 [2:12:31<1:02:36, 7.31s/it] 67%|██████▋ | 1059/1572 [2:12:38<1:01:33, 7.20s/it] {'loss': 0.5833, 'learning_rate': 6.73228346456693e-06, 'epoch': 2.69} 67%|██████▋ | 1059/1572 [2:12:38<1:01:33, 7.20s/it] 67%|██████▋ | 1060/1572 [2:12:45<1:01:22, 7.19s/it] {'loss': 0.5769, 'learning_rate': 6.719160104986877e-06, 'epoch': 2.69} 67%|██████▋ | 1060/1572 [2:12:45<1:01:22, 7.19s/it] 67%|██████▋ | 1061/1572 [2:12:52<1:00:38, 7.12s/it] {'loss': 0.5524, 'learning_rate': 6.7060367454068245e-06, 'epoch': 2.69} 67%|██████▋ | 1061/1572 [2:12:52<1:00:38, 7.12s/it] 68%|██████▊ | 1062/1572 
[2:12:59<59:51, 7.04s/it] {'loss': 0.6334, 'learning_rate': 6.692913385826772e-06, 'epoch': 2.7} 68%|██████▊ | 1062/1572 [2:12:59<59:51, 7.04s/it] 68%|██████▊ | 1063/1572 [2:13:07<1:02:28, 7.36s/it] {'loss': 0.6272, 'learning_rate': 6.6797900262467195e-06, 'epoch': 2.7} 68%|██████▊ | 1063/1572 [2:13:07<1:02:28, 7.36s/it] 68%|██████▊ | 1064/1572 [2:13:14<1:00:32, 7.15s/it] {'loss': 0.5321, 'learning_rate': 6.666666666666667e-06, 'epoch': 2.7} 68%|██████▊ | 1064/1572 [2:13:14<1:00:32, 7.15s/it] 68%|██████▊ | 1065/1572 [2:13:21<1:01:12, 7.24s/it] {'loss': 0.6737, 'learning_rate': 6.6535433070866155e-06, 'epoch': 2.7} 68%|██████▊ | 1065/1572 [2:13:21<1:01:12, 7.24s/it] 68%|██████▊ | 1066/1572 [2:13:29<1:02:30, 7.41s/it] {'loss': 0.6599, 'learning_rate': 6.6404199475065626e-06, 'epoch': 2.71} 68%|██████▊ | 1066/1572 [2:13:29<1:02:30, 7.41s/it] 68%|██████▊ | 1067/1572 [2:13:36<1:00:52, 7.23s/it] {'loss': 0.5876, 'learning_rate': 6.62729658792651e-06, 'epoch': 2.71} 68%|██████▊ | 1067/1572 [2:13:36<1:00:52, 7.23s/it] 68%|██████▊ | 1068/1572 [2:13:43<1:00:16, 7.18s/it] {'loss': 0.6939, 'learning_rate': 6.614173228346458e-06, 'epoch': 2.71} 68%|██████▊ | 1068/1572 [2:13:43<1:00:16, 7.18s/it] 68%|██████▊ | 1069/1572 [2:13:50<1:00:43, 7.24s/it] {'loss': 0.5652, 'learning_rate': 6.601049868766405e-06, 'epoch': 2.71} 68%|██████▊ | 1069/1572 [2:13:50<1:00:43, 7.24s/it] 68%|██████▊ | 1070/1572 [2:13:57<59:20, 7.09s/it] {'loss': 0.6255, 'learning_rate': 6.587926509186352e-06, 'epoch': 2.72} 68%|██████▊ | 1070/1572 [2:13:57<59:20, 7.09s/it] 68%|██████▊ | 1071/1572 [2:14:04<59:16, 7.10s/it] {'loss': 0.5783, 'learning_rate': 6.574803149606301e-06, 'epoch': 2.72} 68%|██████▊ | 1071/1572 [2:14:04<59:16, 7.10s/it] 68%|██████▊ | 1072/1572 [2:14:12<59:42, 7.17s/it] {'loss': 0.6066, 'learning_rate': 6.561679790026248e-06, 'epoch': 2.72} 68%|██████▊ | 1072/1572 [2:14:12<59:42, 7.17s/it] 68%|██████▊ | 1073/1572 [2:14:19<59:27, 7.15s/it] {'loss': 0.6528, 'learning_rate': 
6.548556430446195e-06, 'epoch': 2.72} 68%|██████▊ | 1073/1572 [2:14:19<59:27, 7.15s/it] 68%|██████▊ | 1074/1572 [2:14:26<59:45, 7.20s/it] {'loss': 0.5878, 'learning_rate': 6.535433070866142e-06, 'epoch': 2.73} 68%|██████▊ | 1074/1572 [2:14:26<59:45, 7.20s/it] 68%|██████▊ | 1075/1572 [2:14:34<1:00:42, 7.33s/it] {'loss': 0.612, 'learning_rate': 6.52230971128609e-06, 'epoch': 2.73} 68%|██████▊ | 1075/1572 [2:14:34<1:00:42, 7.33s/it] 68%|██████▊ | 1076/1572 [2:14:40<58:55, 7.13s/it] {'loss': 0.6516, 'learning_rate': 6.509186351706037e-06, 'epoch': 2.73} 68%|██████▊ | 1076/1572 [2:14:40<58:55, 7.13s/it] 69%|██████▊ | 1077/1572 [2:14:47<58:35, 7.10s/it] {'loss': 0.6555, 'learning_rate': 6.496062992125984e-06, 'epoch': 2.73} 69%|██████▊ | 1077/1572 [2:14:47<58:35, 7.10s/it] 69%|██████▊ | 1078/1572 [2:14:54<58:29, 7.10s/it] {'loss': 0.6241, 'learning_rate': 6.482939632545932e-06, 'epoch': 2.74} 69%|██████▊ | 1078/1572 [2:14:54<58:29, 7.10s/it] 69%|██████▊ | 1079/1572 [2:15:02<58:45, 7.15s/it] {'loss': 0.5879, 'learning_rate': 6.46981627296588e-06, 'epoch': 2.74} 69%|██████▊ | 1079/1572 [2:15:02<58:45, 7.15s/it] 69%|██████▊ | 1080/1572 [2:15:09<58:59, 7.19s/it] {'loss': 0.5961, 'learning_rate': 6.456692913385827e-06, 'epoch': 2.74} 69%|██████▊ | 1080/1572 [2:15:09<58:59, 7.19s/it] 69%|██████▉ | 1081/1572 [2:15:16<58:21, 7.13s/it] {'loss': 0.63, 'learning_rate': 6.443569553805775e-06, 'epoch': 2.74} 69%|██████▉ | 1081/1572 [2:15:16<58:21, 7.13s/it] 69%|██████▉ | 1082/1572 [2:15:23<58:26, 7.16s/it] {'loss': 0.5751, 'learning_rate': 6.430446194225722e-06, 'epoch': 2.75} 69%|██████▉ | 1082/1572 [2:15:23<58:26, 7.16s/it] 69%|██████▉ | 1083/1572 [2:15:31<59:06, 7.25s/it] {'loss': 0.6776, 'learning_rate': 6.4173228346456695e-06, 'epoch': 2.75} 69%|██████▉ | 1083/1572 [2:15:31<59:06, 7.25s/it] 69%|██████▉ | 1084/1572 [2:15:37<57:33, 7.08s/it] {'loss': 0.6335, 'learning_rate': 6.4041994750656174e-06, 'epoch': 2.75} 69%|██████▉ | 1084/1572 [2:15:37<57:33, 7.08s/it] 69%|██████▉ | 
1085/1572 [2:15:44<57:36, 7.10s/it] {'loss': 0.6898, 'learning_rate': 6.3910761154855646e-06, 'epoch': 2.75} 69%|██████▉ | 1085/1572 [2:15:44<57:36, 7.10s/it] 69%|██████▉ | 1086/1572 [2:15:51<57:08, 7.05s/it] {'loss': 0.6258, 'learning_rate': 6.3779527559055125e-06, 'epoch': 2.76} 69%|██████▉ | 1086/1572 [2:15:51<57:08, 7.05s/it] 69%|██████▉ | 1087/1572 [2:15:59<58:49, 7.28s/it] {'loss': 0.7041, 'learning_rate': 6.3648293963254605e-06, 'epoch': 2.76} 69%|██████▉ | 1087/1572 [2:15:59<58:49, 7.28s/it] 69%|██████▉ | 1088/1572 [2:16:07<58:45, 7.28s/it] {'loss': 0.6841, 'learning_rate': 6.351706036745408e-06, 'epoch': 2.76} 69%|██████▉ | 1088/1572 [2:16:07<58:45, 7.28s/it] 69%|██████▉ | 1089/1572 [2:16:15<1:01:24, 7.63s/it] {'loss': 0.6457, 'learning_rate': 6.338582677165355e-06, 'epoch': 2.76} 69%|██████▉ | 1089/1572 [2:16:15<1:01:24, 7.63s/it] 69%|██████▉ | 1090/1572 [2:16:23<1:01:54, 7.71s/it] {'loss': 0.6021, 'learning_rate': 6.325459317585302e-06, 'epoch': 2.77} 69%|██████▉ | 1090/1572 [2:16:23<1:01:54, 7.71s/it] 69%|██████▉ | 1091/1572 [2:16:31<1:02:46, 7.83s/it] {'loss': 0.7876, 'learning_rate': 6.31233595800525e-06, 'epoch': 2.77} 69%|██████▉ | 1091/1572 [2:16:31<1:02:46, 7.83s/it] 69%|██████▉ | 1092/1572 [2:16:38<1:00:52, 7.61s/it] {'loss': 0.6511, 'learning_rate': 6.299212598425197e-06, 'epoch': 2.77} 69%|██████▉ | 1092/1572 [2:16:38<1:00:52, 7.61s/it] 70%|██████▉ | 1093/1572 [2:16:45<59:02, 7.40s/it] {'loss': 0.5693, 'learning_rate': 6.286089238845144e-06, 'epoch': 2.77} 70%|██████▉ | 1093/1572 [2:16:45<59:02, 7.40s/it] 70%|██████▉ | 1094/1572 [2:16:52<58:54, 7.39s/it] {'loss': 0.6689, 'learning_rate': 6.272965879265093e-06, 'epoch': 2.78} 70%|██████▉ | 1094/1572 [2:16:52<58:54, 7.39s/it] 70%|██████▉ | 1095/1572 [2:16:59<57:08, 7.19s/it] {'loss': 0.5794, 'learning_rate': 6.25984251968504e-06, 'epoch': 2.78} 70%|██████▉ | 1095/1572 [2:16:59<57:08, 7.19s/it] 70%|██████▉ | 1096/1572 [2:17:06<57:09, 7.21s/it] {'loss': 0.6095, 'learning_rate': 
6.246719160104987e-06, 'epoch': 2.78} 70%|██████▉ | 1096/1572 [2:17:06<57:09, 7.21s/it] 70%|██████▉ | 1097/1572 [2:17:14<58:00, 7.33s/it] {'loss': 0.6198, 'learning_rate': 6.233595800524935e-06, 'epoch': 2.79} 70%|██████▉ | 1097/1572 [2:17:14<58:00, 7.33s/it] 70%|██████▉ | 1098/1572 [2:17:21<57:51, 7.32s/it] {'loss': 0.728, 'learning_rate': 6.220472440944882e-06, 'epoch': 2.79} 70%|██████▉ | 1098/1572 [2:17:21<57:51, 7.32s/it] 70%|██████▉ | 1099/1572 [2:17:29<58:36, 7.43s/it] {'loss': 0.6237, 'learning_rate': 6.207349081364829e-06, 'epoch': 2.79} 70%|██████▉ | 1099/1572 [2:17:29<58:36, 7.43s/it] 70%|██████▉ | 1100/1572 [2:17:36<57:41, 7.33s/it] {'loss': 0.5208, 'learning_rate': 6.194225721784778e-06, 'epoch': 2.79} 70%|██████▉ | 1100/1572 [2:17:36<57:41, 7.33s/it] 70%|███████ | 1101/1572 [2:17:43<57:15, 7.29s/it] {'loss': 0.6644, 'learning_rate': 6.181102362204725e-06, 'epoch': 2.8} 70%|███████ | 1101/1572 [2:17:43<57:15, 7.29s/it] 70%|███████ | 1102/1572 [2:17:50<56:38, 7.23s/it] {'loss': 0.6237, 'learning_rate': 6.167979002624672e-06, 'epoch': 2.8} 70%|███████ | 1102/1572 [2:17:50<56:38, 7.23s/it] 70%|███████ | 1103/1572 [2:17:58<56:50, 7.27s/it] {'loss': 0.6408, 'learning_rate': 6.15485564304462e-06, 'epoch': 2.8} 70%|███████ | 1103/1572 [2:17:58<56:50, 7.27s/it] 70%|███████ | 1104/1572 [2:18:05<56:23, 7.23s/it] {'loss': 0.5923, 'learning_rate': 6.141732283464567e-06, 'epoch': 2.8} 70%|███████ | 1104/1572 [2:18:05<56:23, 7.23s/it] 70%|███████ | 1105/1572 [2:18:12<56:16, 7.23s/it] {'loss': 0.5986, 'learning_rate': 6.1286089238845145e-06, 'epoch': 2.81} 70%|███████ | 1105/1572 [2:18:12<56:16, 7.23s/it] 70%|███████ | 1106/1572 [2:18:19<55:27, 7.14s/it] {'loss': 0.6119, 'learning_rate': 6.115485564304462e-06, 'epoch': 2.81} 70%|███████ | 1106/1572 [2:18:19<55:27, 7.14s/it] 70%|███████ | 1107/1572 [2:18:26<55:16, 7.13s/it] {'loss': 0.5958, 'learning_rate': 6.1023622047244104e-06, 'epoch': 2.81} 70%|███████ | 1107/1572 [2:18:26<55:16, 7.13s/it] 70%|███████ | 1108/1572 
[2:18:34<56:29, 7.31s/it] {'loss': 0.5795, 'learning_rate': 6.0892388451443576e-06, 'epoch': 2.81} 70%|███████ | 1108/1572 [2:18:34<56:29, 7.31s/it] 71%|███████ | 1109/1572 [2:18:41<56:17, 7.30s/it] {'loss': 0.5703, 'learning_rate': 6.076115485564305e-06, 'epoch': 2.82} 71%|███████ | 1109/1572 [2:18:41<56:17, 7.30s/it] 71%|███████ | 1110/1572 [2:18:48<55:12, 7.17s/it] {'loss': 0.5626, 'learning_rate': 6.062992125984253e-06, 'epoch': 2.82} 71%|███████ | 1110/1572 [2:18:48<55:12, 7.17s/it] 71%|███████ | 1111/1572 [2:18:55<55:31, 7.23s/it] {'loss': 0.5996, 'learning_rate': 6.0498687664042e-06, 'epoch': 2.82} 71%|███████ | 1111/1572 [2:18:55<55:31, 7.23s/it] 71%|███████ | 1112/1572 [2:19:03<55:57, 7.30s/it] {'loss': 0.6301, 'learning_rate': 6.036745406824147e-06, 'epoch': 2.82} 71%|███████ | 1112/1572 [2:19:03<55:57, 7.30s/it] 71%|███████ | 1113/1572 [2:19:10<56:12, 7.35s/it] {'loss': 0.6289, 'learning_rate': 6.023622047244096e-06, 'epoch': 2.83} 71%|███████ | 1113/1572 [2:19:10<56:12, 7.35s/it] 71%|███████ | 1114/1572 [2:19:18<55:58, 7.33s/it] {'loss': 0.5775, 'learning_rate': 6.010498687664043e-06, 'epoch': 2.83} 71%|███████ | 1114/1572 [2:19:18<55:58, 7.33s/it] 71%|███████ | 1115/1572 [2:19:25<55:37, 7.30s/it] {'loss': 0.7325, 'learning_rate': 5.99737532808399e-06, 'epoch': 2.83} 71%|███████ | 1115/1572 [2:19:25<55:37, 7.30s/it] 71%|███████ | 1116/1572 [2:19:32<56:18, 7.41s/it] {'loss': 0.6303, 'learning_rate': 5.984251968503938e-06, 'epoch': 2.83} 71%|███████ | 1116/1572 [2:19:32<56:18, 7.41s/it] 71%|███████ | 1117/1572 [2:19:40<57:03, 7.53s/it] {'loss': 0.6087, 'learning_rate': 5.971128608923885e-06, 'epoch': 2.84} 71%|███████ | 1117/1572 [2:19:40<57:03, 7.53s/it] 71%|███████ | 1118/1572 [2:19:47<54:59, 7.27s/it] {'loss': 0.6375, 'learning_rate': 5.958005249343832e-06, 'epoch': 2.84} 71%|███████ | 1118/1572 [2:19:47<54:59, 7.27s/it] 71%|███████ | 1119/1572 [2:19:55<56:03, 7.43s/it] {'loss': 0.6599, 'learning_rate': 5.944881889763781e-06, 'epoch': 2.84} 71%|███████ 
| 1119/1572 [2:19:55<56:03, 7.43s/it] 71%|███████ | 1120/1572 [2:20:02<56:10, 7.46s/it] {'loss': 0.6024, 'learning_rate': 5.931758530183728e-06, 'epoch': 2.84} 71%|███████ | 1120/1572 [2:20:02<56:10, 7.46s/it] 71%|███████▏ | 1121/1572 [2:20:09<55:23, 7.37s/it] {'loss': 0.6323, 'learning_rate': 5.918635170603675e-06, 'epoch': 2.85} 71%|███████▏ | 1121/1572 [2:20:09<55:23, 7.37s/it] 71%|███████▏ | 1122/1572 [2:20:17<56:09, 7.49s/it] {'loss': 0.6376, 'learning_rate': 5.905511811023622e-06, 'epoch': 2.85} 71%|███████▏ | 1122/1572 [2:20:17<56:09, 7.49s/it] 71%|███████▏ | 1123/1572 [2:20:25<55:48, 7.46s/it] {'loss': 0.6958, 'learning_rate': 5.89238845144357e-06, 'epoch': 2.85} 71%|███████▏ | 1123/1572 [2:20:25<55:48, 7.46s/it] 72%|███████▏ | 1124/1572 [2:20:32<56:15, 7.53s/it] {'loss': 0.6853, 'learning_rate': 5.879265091863517e-06, 'epoch': 2.85} 72%|███████▏ | 1124/1572 [2:20:32<56:15, 7.53s/it] 72%|███████▏ | 1125/1572 [2:20:39<55:29, 7.45s/it] {'loss': 0.6357, 'learning_rate': 5.8661417322834645e-06, 'epoch': 2.86} 72%|███████▏ | 1125/1572 [2:20:39<55:29, 7.45s/it] 72%|███████▏ | 1126/1572 [2:20:47<55:09, 7.42s/it] {'loss': 0.6529, 'learning_rate': 5.853018372703413e-06, 'epoch': 2.86} 72%|███████▏ | 1126/1572 [2:20:47<55:09, 7.42s/it] 72%|███████▏ | 1127/1572 [2:20:54<54:31, 7.35s/it] {'loss': 0.6125, 'learning_rate': 5.83989501312336e-06, 'epoch': 2.86} 72%|███████▏ | 1127/1572 [2:20:54<54:31, 7.35s/it] 72%|███████▏ | 1128/1572 [2:21:01<54:00, 7.30s/it] {'loss': 0.648, 'learning_rate': 5.8267716535433075e-06, 'epoch': 2.86} 72%|███████▏ | 1128/1572 [2:21:01<54:00, 7.30s/it] 72%|███████▏ | 1129/1572 [2:21:08<53:44, 7.28s/it] {'loss': 0.655, 'learning_rate': 5.8136482939632555e-06, 'epoch': 2.87} 72%|███████▏ | 1129/1572 [2:21:08<53:44, 7.28s/it] 72%|███████▏ | 1130/1572 [2:21:16<54:24, 7.39s/it] {'loss': 0.5752, 'learning_rate': 5.800524934383203e-06, 'epoch': 2.87} 72%|███████▏ | 1130/1572 [2:21:16<54:24, 7.39s/it] 72%|███████▏ | 1131/1572 [2:21:23<52:49, 7.19s/it] 
{'loss': 0.7096, 'learning_rate': 5.78740157480315e-06, 'epoch': 2.87} 72%|███████▏ | 1131/1572 [2:21:23<52:49, 7.19s/it] 72%|███████▏ | 1132/1572 [2:21:30<52:48, 7.20s/it] {'loss': 0.6383, 'learning_rate': 5.774278215223098e-06, 'epoch': 2.87} 72%|███████▏ | 1132/1572 [2:21:30<52:48, 7.20s/it] 72%|███████▏ | 1133/1572 [2:21:38<53:48, 7.35s/it] {'loss': 0.731, 'learning_rate': 5.761154855643046e-06, 'epoch': 2.88} 72%|███████▏ | 1133/1572 [2:21:38<53:48, 7.35s/it] 72%|███████▏ | 1134/1572 [2:21:46<55:59, 7.67s/it] {'loss': 0.6733, 'learning_rate': 5.748031496062993e-06, 'epoch': 2.88} 72%|███████▏ | 1134/1572 [2:21:46<55:59, 7.67s/it] 72%|███████▏ | 1135/1572 [2:21:53<54:12, 7.44s/it] {'loss': 0.6333, 'learning_rate': 5.734908136482941e-06, 'epoch': 2.88} 72%|███████▏ | 1135/1572 [2:21:53<54:12, 7.44s/it] 72%|███████▏ | 1136/1572 [2:22:01<54:51, 7.55s/it] {'loss': 0.7259, 'learning_rate': 5.721784776902888e-06, 'epoch': 2.88} 72%|███████▏ | 1136/1572 [2:22:01<54:51, 7.55s/it] 72%|███████▏ | 1137/1572 [2:22:08<53:25, 7.37s/it] {'loss': 0.6367, 'learning_rate': 5.708661417322835e-06, 'epoch': 2.89} 72%|███████▏ | 1137/1572 [2:22:08<53:25, 7.37s/it] 72%|███████▏ | 1138/1572 [2:22:15<52:46, 7.30s/it] {'loss': 0.7418, 'learning_rate': 5.695538057742782e-06, 'epoch': 2.89} 72%|███████▏ | 1138/1572 [2:22:15<52:46, 7.30s/it] 72%|███████▏ | 1139/1572 [2:22:22<52:04, 7.22s/it] {'loss': 0.6129, 'learning_rate': 5.68241469816273e-06, 'epoch': 2.89} 72%|███████▏ | 1139/1572 [2:22:22<52:04, 7.22s/it] 73%|███████▎ | 1140/1572 [2:22:29<52:37, 7.31s/it] {'loss': 0.646, 'learning_rate': 5.669291338582677e-06, 'epoch': 2.89} 73%|███████▎ | 1140/1572 [2:22:29<52:37, 7.31s/it] 73%|███████▎ | 1141/1572 [2:22:37<52:15, 7.28s/it] {'loss': 0.6234, 'learning_rate': 5.656167979002625e-06, 'epoch': 2.9} 73%|███████▎ | 1141/1572 [2:22:37<52:15, 7.28s/it] 73%|███████▎ | 1142/1572 [2:22:45<54:16, 7.57s/it] {'loss': 0.6498, 'learning_rate': 5.643044619422573e-06, 'epoch': 2.9} 73%|███████▎ | 
1142/1572 [2:22:45<54:16, 7.57s/it] 73%|███████▎ | 1143/1572 [2:22:52<53:11, 7.44s/it] {'loss': 0.5893, 'learning_rate': 5.62992125984252e-06, 'epoch': 2.9} 73%|███████▎ | 1143/1572 [2:22:52<53:11, 7.44s/it] 73%|███████▎ | 1144/1572 [2:22:59<51:51, 7.27s/it] {'loss': 0.6116, 'learning_rate': 5.616797900262467e-06, 'epoch': 2.9} 73%|███████▎ | 1144/1572 [2:22:59<51:51, 7.27s/it] 73%|███████▎ | 1145/1572 [2:23:06<52:06, 7.32s/it] {'loss': 0.6926, 'learning_rate': 5.603674540682415e-06, 'epoch': 2.91} 73%|███████▎ | 1145/1572 [2:23:06<52:06, 7.32s/it] 73%|███████▎ | 1146/1572 [2:23:14<53:17, 7.51s/it] {'loss': 0.5774, 'learning_rate': 5.590551181102362e-06, 'epoch': 2.91} 73%|███████▎ | 1146/1572 [2:23:14<53:17, 7.51s/it] 73%|███████▎ | 1147/1572 [2:23:21<51:59, 7.34s/it] {'loss': 0.5413, 'learning_rate': 5.5774278215223095e-06, 'epoch': 2.91} 73%|███████▎ | 1147/1572 [2:23:21<51:59, 7.34s/it] 73%|███████▎ | 1148/1572 [2:23:29<51:57, 7.35s/it] {'loss': 0.6365, 'learning_rate': 5.564304461942258e-06, 'epoch': 2.91} 73%|███████▎ | 1148/1572 [2:23:29<51:57, 7.35s/it] 73%|███████▎ | 1149/1572 [2:23:36<51:49, 7.35s/it] {'loss': 0.6671, 'learning_rate': 5.5511811023622054e-06, 'epoch': 2.92} 73%|███████▎ | 1149/1572 [2:23:36<51:49, 7.35s/it] 73%|███████▎ | 1150/1572 [2:23:43<51:20, 7.30s/it] {'loss': 0.5876, 'learning_rate': 5.5380577427821525e-06, 'epoch': 2.92} 73%|███████▎ | 1150/1572 [2:23:43<51:20, 7.30s/it] 73%|███████▎ | 1151/1572 [2:23:51<51:30, 7.34s/it] {'loss': 0.6142, 'learning_rate': 5.5249343832021005e-06, 'epoch': 2.92} 73%|███████▎ | 1151/1572 [2:23:51<51:30, 7.34s/it] 73%|███████▎ | 1152/1572 [2:23:58<50:25, 7.20s/it] {'loss': 0.6116, 'learning_rate': 5.511811023622048e-06, 'epoch': 2.92} 73%|███████▎ | 1152/1572 [2:23:58<50:25, 7.20s/it] 73%|███████▎ | 1153/1572 [2:24:05<50:02, 7.17s/it] {'loss': 0.652, 'learning_rate': 5.498687664041995e-06, 'epoch': 2.93} 73%|███████▎ | 1153/1572 [2:24:05<50:02, 7.17s/it] 73%|███████▎ | 1154/1572 [2:24:12<49:57, 
7.17s/it] {'loss': 0.5818, 'learning_rate': 5.485564304461942e-06, 'epoch': 2.93} 73%|███████▎ | 1154/1572 [2:24:12<49:57, 7.17s/it] 73%|███████▎ | 1155/1572 [2:24:19<50:21, 7.25s/it] {'loss': 0.6518, 'learning_rate': 5.472440944881891e-06, 'epoch': 2.93} 73%|███████▎ | 1155/1572 [2:24:19<50:21, 7.25s/it] 74%|███████▎ | 1156/1572 [2:24:26<50:05, 7.23s/it] {'loss': 0.6722, 'learning_rate': 5.459317585301838e-06, 'epoch': 2.93} 74%|███████▎ | 1156/1572 [2:24:26<50:05, 7.23s/it] 74%|███████▎ | 1157/1572 [2:24:34<50:12, 7.26s/it] {'loss': 0.6402, 'learning_rate': 5.446194225721785e-06, 'epoch': 2.94} 74%|███████▎ | 1157/1572 [2:24:34<50:12, 7.26s/it] 74%|███████▎ | 1158/1572 [2:24:42<51:28, 7.46s/it] {'loss': 0.5531, 'learning_rate': 5.433070866141733e-06, 'epoch': 2.94} 74%|███████▎ | 1158/1572 [2:24:42<51:28, 7.46s/it] 74%|███████▎ | 1159/1572 [2:24:49<51:05, 7.42s/it] {'loss': 0.6255, 'learning_rate': 5.41994750656168e-06, 'epoch': 2.94} 74%|███████▎ | 1159/1572 [2:24:49<51:05, 7.42s/it] 74%|███████▍ | 1160/1572 [2:24:56<50:18, 7.33s/it] {'loss': 0.6685, 'learning_rate': 5.406824146981627e-06, 'epoch': 2.95} 74%|███████▍ | 1160/1572 [2:24:56<50:18, 7.33s/it] 74%|███████▍ | 1161/1572 [2:25:04<50:51, 7.43s/it] {'loss': 0.621, 'learning_rate': 5.393700787401576e-06, 'epoch': 2.95} 74%|███████▍ | 1161/1572 [2:25:04<50:51, 7.43s/it] 74%|███████▍ | 1162/1572 [2:25:11<49:26, 7.24s/it] {'loss': 0.6379, 'learning_rate': 5.380577427821523e-06, 'epoch': 2.95} 74%|███████▍ | 1162/1572 [2:25:11<49:26, 7.24s/it] 74%|███████▍ | 1163/1572 [2:25:17<48:12, 7.07s/it] {'loss': 0.606, 'learning_rate': 5.36745406824147e-06, 'epoch': 2.95} 74%|███████▍ | 1163/1572 [2:25:17<48:12, 7.07s/it] 74%|███████▍ | 1164/1572 [2:25:25<49:13, 7.24s/it] {'loss': 0.6268, 'learning_rate': 5.354330708661418e-06, 'epoch': 2.96} 74%|███████▍ | 1164/1572 [2:25:25<49:13, 7.24s/it] 74%|███████▍ | 1165/1572 [2:25:32<49:22, 7.28s/it] {'loss': 0.7007, 'learning_rate': 5.341207349081365e-06, 'epoch': 2.96} 
74%|███████▍ | 1165/1572 [2:25:32<49:22, 7.28s/it] 74%|███████▍ | 1166/1572 [2:25:40<49:22, 7.30s/it] {'loss': 0.6645, 'learning_rate': 5.328083989501312e-06, 'epoch': 2.96} 74%|███████▍ | 1166/1572 [2:25:40<49:22, 7.30s/it] 74%|███████▍ | 1167/1572 [2:25:47<49:26, 7.32s/it] {'loss': 0.6981, 'learning_rate': 5.314960629921261e-06, 'epoch': 2.96} 74%|███████▍ | 1167/1572 [2:25:47<49:26, 7.32s/it] 74%|███████▍ | 1168/1572 [2:25:54<48:53, 7.26s/it] {'loss': 0.5873, 'learning_rate': 5.301837270341208e-06, 'epoch': 2.97} 74%|███████▍ | 1168/1572 [2:25:54<48:53, 7.26s/it] 74%|███████▍ | 1169/1572 [2:26:01<48:17, 7.19s/it] {'loss': 0.6321, 'learning_rate': 5.288713910761155e-06, 'epoch': 2.97} 74%|███████▍ | 1169/1572 [2:26:01<48:17, 7.19s/it] 74%|███████▍ | 1170/1572 [2:26:08<47:22, 7.07s/it] {'loss': 0.6274, 'learning_rate': 5.2755905511811025e-06, 'epoch': 2.97} 74%|███████▍ | 1170/1572 [2:26:08<47:22, 7.07s/it] 74%|███████▍ | 1171/1572 [2:26:16<49:12, 7.36s/it] {'loss': 0.6425, 'learning_rate': 5.2624671916010505e-06, 'epoch': 2.97} 74%|███████▍ | 1171/1572 [2:26:16<49:12, 7.36s/it] 75%|███████▍ | 1172/1572 [2:26:23<48:18, 7.25s/it] {'loss': 0.6479, 'learning_rate': 5.2493438320209976e-06, 'epoch': 2.98} 75%|███████▍ | 1172/1572 [2:26:23<48:18, 7.25s/it] 75%|███████▍ | 1173/1572 [2:26:30<47:51, 7.20s/it] {'loss': 0.6338, 'learning_rate': 5.236220472440945e-06, 'epoch': 2.98} 75%|███████▍ | 1173/1572 [2:26:30<47:51, 7.20s/it] 75%|███████▍ | 1174/1572 [2:26:37<47:21, 7.14s/it] {'loss': 0.6026, 'learning_rate': 5.2230971128608935e-06, 'epoch': 2.98} 75%|███████▍ | 1174/1572 [2:26:37<47:21, 7.14s/it] 75%|███████▍ | 1175/1572 [2:26:44<46:50, 7.08s/it] {'loss': 0.6891, 'learning_rate': 5.209973753280841e-06, 'epoch': 2.98} 75%|███████▍ | 1175/1572 [2:26:44<46:50, 7.08s/it] 75%|███████▍ | 1176/1572 [2:26:52<49:11, 7.45s/it] {'loss': 0.6816, 'learning_rate': 5.196850393700788e-06, 'epoch': 2.99} 75%|███████▍ | 1176/1572 [2:26:52<49:11, 7.45s/it] 75%|███████▍ | 1177/1572 
[2:26:59<48:14, 7.33s/it] {'loss': 0.5745, 'learning_rate': 5.183727034120736e-06, 'epoch': 2.99} 75%|███████▍ | 1177/1572 [2:26:59<48:14, 7.33s/it] 75%|███████▍ | 1178/1572 [2:27:06<47:20, 7.21s/it] {'loss': 0.621, 'learning_rate': 5.170603674540683e-06, 'epoch': 2.99} 75%|███████▍ | 1178/1572 [2:27:06<47:20, 7.21s/it] 75%|███████▌ | 1179/1572 [2:27:13<47:18, 7.22s/it] {'loss': 0.5757, 'learning_rate': 5.15748031496063e-06, 'epoch': 2.99} 75%|███████▌ | 1179/1572 [2:27:13<47:18, 7.22s/it] 75%|███████▌ | 1180/1572 [2:27:21<48:14, 7.38s/it] {'loss': 0.6507, 'learning_rate': 5.144356955380579e-06, 'epoch': 3.0} 75%|███████▌ | 1180/1572 [2:27:21<48:14, 7.38s/it] 75%|███████▌ | 1181/1572 [2:27:28<47:34, 7.30s/it] {'loss': 0.6384, 'learning_rate': 5.131233595800526e-06, 'epoch': 3.0} 75%|███████▌ | 1181/1572 [2:27:28<47:34, 7.30s/it][WARNING|trainer.py:2348] 2024-07-08 21:47:22,973 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1181 already exists and is non-empty.Saving will proceed but saved results may be invalid.
[INFO|trainer.py:2889] 2024-07-08 21:47:45,977 >> Saving model checkpoint to ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1181 [INFO|tokenization_utils_base.py:2432] 2024-07-08 21:47:47,218 >> tokenizer config file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1181/tokenizer_config.json [INFO|tokenization_utils_base.py:2441] 2024-07-08 21:47:47,228 >> Special tokens file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1181/special_tokens_map.json 75%|███████▌ | 1182/1572 [2:29:39<4:47:33, 44.24s/it] {'loss': 0.5381, 'learning_rate': 5.118110236220473e-06, 'epoch': 3.0} 75%|███████▌ | 1182/1572 [2:29:39<4:47:33, 44.24s/it] 75%|███████▌ | 1183/1572 [2:29:46<3:35:24, 33.22s/it] {'loss': 0.6693, 'learning_rate': 5.104986876640421e-06, 'epoch': 3.0} 75%|███████▌ | 1183/1572 [2:29:46<3:35:24, 33.22s/it] 75%|███████▌ | 1184/1572 [2:29:53<2:44:09, 25.38s/it] {'loss': 0.6092, 'learning_rate': 5.091863517060368e-06, 'epoch': 3.01} 75%|███████▌ | 1184/1572 [2:29:53<2:44:09, 25.38s/it] 75%|███████▌ | 1185/1572 [2:30:01<2:09:00, 20.00s/it] {'loss': 0.5909, 'learning_rate': 5.078740157480315e-06, 'epoch': 3.01} 75%|███████▌ | 1185/1572 [2:30:01<2:09:00, 20.00s/it] 75%|███████▌ | 1186/1572 [2:30:08<1:43:03, 16.02s/it] {'loss': 0.5797, 'learning_rate': 5.065616797900262e-06, 'epoch': 
3.01} 75%|███████▌ | 1186/1572 [2:30:08<1:43:03, 16.02s/it] 76%|███████▌ | 1187/1572 [2:30:14<1:24:49, 13.22s/it] {'loss': 0.5953, 'learning_rate': 5.05249343832021e-06, 'epoch': 3.01} 76%|███████▌ | 1187/1572 [2:30:14<1:24:49, 13.22s/it] 76%|███████▌ | 1188/1572 [2:30:21<1:12:51, 11.38s/it] {'loss': 0.6111, 'learning_rate': 5.039370078740158e-06, 'epoch': 3.02} 76%|███████▌ | 1188/1572 [2:30:21<1:12:51, 11.38s/it] 76%|███████▌ | 1189/1572 [2:30:28<1:04:24, 10.09s/it] {'loss': 0.6314, 'learning_rate': 5.026246719160105e-06, 'epoch': 3.02} 76%|███████▌ | 1189/1572 [2:30:28<1:04:24, 10.09s/it] 76%|███████▌ | 1190/1572 [2:30:36<58:55, 9.25s/it] {'loss': 0.5515, 'learning_rate': 5.013123359580053e-06, 'epoch': 3.02} 76%|███████▌ | 1190/1572 [2:30:36<58:55, 9.25s/it] 76%|███████▌ | 1191/1572 [2:30:43<54:16, 8.55s/it] {'loss': 0.5821, 'learning_rate': 5e-06, 'epoch': 3.02} 76%|███████▌ | 1191/1572 [2:30:43<54:16, 8.55s/it] 76%|███████▌ | 1192/1572 [2:30:50<51:47, 8.18s/it] {'loss': 0.5963, 'learning_rate': 4.986876640419948e-06, 'epoch': 3.03} 76%|███████▌ | 1192/1572 [2:30:50<51:47, 8.18s/it] 76%|███████▌ | 1193/1572 [2:30:57<49:49, 7.89s/it] {'loss': 0.5521, 'learning_rate': 4.9737532808398955e-06, 'epoch': 3.03} 76%|███████▌ | 1193/1572 [2:30:57<49:49, 7.89s/it] 76%|███████▌ | 1194/1572 [2:31:04<47:43, 7.57s/it] {'loss': 0.6068, 'learning_rate': 4.960629921259843e-06, 'epoch': 3.03} 76%|███████▌ | 1194/1572 [2:31:04<47:43, 7.57s/it] 76%|███████▌ | 1195/1572 [2:31:11<46:35, 7.42s/it] {'loss': 0.5339, 'learning_rate': 4.9475065616797906e-06, 'epoch': 3.03} 76%|███████▌ | 1195/1572 [2:31:11<46:35, 7.42s/it] 76%|███████▌ | 1196/1572 [2:31:18<46:26, 7.41s/it] {'loss': 0.5205, 'learning_rate': 4.934383202099738e-06, 'epoch': 3.04} 76%|███████▌ | 1196/1572 [2:31:18<46:26, 7.41s/it] 76%|███████▌ | 1197/1572 [2:31:26<45:58, 7.36s/it] {'loss': 0.5731, 'learning_rate': 4.921259842519686e-06, 'epoch': 3.04} 76%|███████▌ | 1197/1572 [2:31:26<45:58, 7.36s/it] 76%|███████▌ | 
1198/1572 [2:31:34<46:58, 7.54s/it] {'loss': 0.5512, 'learning_rate': 4.908136482939633e-06, 'epoch': 3.04} 76%|███████▌ | 1198/1572 [2:31:34<46:58, 7.54s/it] 76%|███████▋ | 1199/1572 [2:31:41<46:25, 7.47s/it] {'loss': 0.6023, 'learning_rate': 4.895013123359581e-06, 'epoch': 3.04} 76%|███████▋ | 1199/1572 [2:31:41<46:25, 7.47s/it] 76%|███████▋ | 1200/1572 [2:31:48<46:02, 7.42s/it] {'loss': 0.635, 'learning_rate': 4.881889763779528e-06, 'epoch': 3.05} 76%|███████▋ | 1200/1572 [2:31:48<46:02, 7.42s/it] 76%|███████▋ | 1201/1572 [2:31:55<45:08, 7.30s/it] {'loss': 0.6287, 'learning_rate': 4.868766404199475e-06, 'epoch': 3.05} 76%|███████▋ | 1201/1572 [2:31:55<45:08, 7.30s/it] 76%|███████▋ | 1202/1572 [2:32:02<44:24, 7.20s/it] {'loss': 0.6664, 'learning_rate': 4.855643044619423e-06, 'epoch': 3.05} 76%|███████▋ | 1202/1572 [2:32:02<44:24, 7.20s/it] 77%|███████▋ | 1203/1572 [2:32:10<44:56, 7.31s/it] {'loss': 0.6193, 'learning_rate': 4.84251968503937e-06, 'epoch': 3.05} 77%|███████▋ | 1203/1572 [2:32:10<44:56, 7.31s/it] 77%|███████▋ | 1204/1572 [2:32:17<44:36, 7.27s/it] {'loss': 0.7184, 'learning_rate': 4.829396325459318e-06, 'epoch': 3.06} 77%|███████▋ | 1204/1572 [2:32:17<44:36, 7.27s/it] 77%|███████▋ | 1205/1572 [2:32:24<43:18, 7.08s/it] {'loss': 0.6296, 'learning_rate': 4.816272965879266e-06, 'epoch': 3.06} 77%|███████▋ | 1205/1572 [2:32:24<43:18, 7.08s/it] 77%|███████▋ | 1206/1572 [2:32:31<44:03, 7.22s/it] {'loss': 0.7193, 'learning_rate': 4.803149606299213e-06, 'epoch': 3.06} 77%|███████▋ | 1206/1572 [2:32:31<44:03, 7.22s/it] 77%|███████▋ | 1207/1572 [2:32:39<44:14, 7.27s/it] {'loss': 0.571, 'learning_rate': 4.79002624671916e-06, 'epoch': 3.06} 77%|███████▋ | 1207/1572 [2:32:39<44:14, 7.27s/it] 77%|███████▋ | 1208/1572 [2:32:46<43:40, 7.20s/it] {'loss': 0.6051, 'learning_rate': 4.776902887139108e-06, 'epoch': 3.07} 77%|███████▋ | 1208/1572 [2:32:46<43:40, 7.20s/it] 77%|███████▋ | 1209/1572 [2:32:52<42:39, 7.05s/it] {'loss': 0.5848, 'learning_rate': 
4.763779527559055e-06, 'epoch': 3.07} 77%|███████▋ | 1209/1572 [2:32:52<42:39, 7.05s/it] 77%|███████▋ | 1210/1572 [2:32:59<42:34, 7.06s/it] {'loss': 0.6236, 'learning_rate': 4.750656167979003e-06, 'epoch': 3.07} 77%|███████▋ | 1210/1572 [2:32:59<42:34, 7.06s/it] 77%|███████▋ | 1211/1572 [2:33:06<42:23, 7.04s/it] {'loss': 0.6097, 'learning_rate': 4.73753280839895e-06, 'epoch': 3.07} 77%|███████▋ | 1211/1572 [2:33:06<42:23, 7.04s/it] 77%|███████▋ | 1212/1572 [2:33:14<42:28, 7.08s/it] {'loss': 0.6099, 'learning_rate': 4.724409448818898e-06, 'epoch': 3.08} 77%|███████▋ | 1212/1572 [2:33:14<42:28, 7.08s/it] 77%|███████▋ | 1213/1572 [2:33:21<42:44, 7.14s/it] {'loss': 0.578, 'learning_rate': 4.7112860892388454e-06, 'epoch': 3.08} 77%|███████▋ | 1213/1572 [2:33:21<42:44, 7.14s/it] 77%|███████▋ | 1214/1572 [2:33:28<43:16, 7.25s/it] {'loss': 0.6556, 'learning_rate': 4.6981627296587926e-06, 'epoch': 3.08} 77%|███████▋ | 1214/1572 [2:33:28<43:16, 7.25s/it] 77%|███████▋ | 1215/1572 [2:33:35<42:02, 7.06s/it] {'loss': 0.6387, 'learning_rate': 4.6850393700787405e-06, 'epoch': 3.08} 77%|███████▋ | 1215/1572 [2:33:35<42:02, 7.06s/it] 77%|███████▋ | 1216/1572 [2:33:43<43:26, 7.32s/it] {'loss': 0.6708, 'learning_rate': 4.6719160104986885e-06, 'epoch': 3.09} 77%|███████▋ | 1216/1572 [2:33:43<43:26, 7.32s/it] 77%|███████▋ | 1217/1572 [2:33:50<43:33, 7.36s/it] {'loss': 0.5391, 'learning_rate': 4.658792650918636e-06, 'epoch': 3.09} 77%|███████▋ | 1217/1572 [2:33:50<43:33, 7.36s/it] 77%|███████▋ | 1218/1572 [2:33:58<43:35, 7.39s/it] {'loss': 0.6061, 'learning_rate': 4.645669291338583e-06, 'epoch': 3.09} 77%|███████▋ | 1218/1572 [2:33:58<43:35, 7.39s/it] 78%|███████▊ | 1219/1572 [2:34:05<42:56, 7.30s/it] {'loss': 0.5698, 'learning_rate': 4.632545931758531e-06, 'epoch': 3.09} 78%|███████▊ | 1219/1572 [2:34:05<42:56, 7.30s/it] 78%|███████▊ | 1220/1572 [2:34:12<42:57, 7.32s/it] {'loss': 0.5522, 'learning_rate': 4.619422572178478e-06, 'epoch': 3.1} 78%|███████▊ | 1220/1572 [2:34:12<42:57, 
7.32s/it] 78%|███████▊ | 1221/1572 [2:34:19<41:57, 7.17s/it] {'loss': 0.6396, 'learning_rate': 4.606299212598426e-06, 'epoch': 3.1} 78%|███████▊ | 1221/1572 [2:34:19<41:57, 7.17s/it] 78%|███████▊ | 1222/1572 [2:34:27<42:22, 7.26s/it] {'loss': 0.7344, 'learning_rate': 4.593175853018373e-06, 'epoch': 3.1} 78%|███████▊ | 1222/1572 [2:34:27<42:22, 7.26s/it] 78%|███████▊ | 1223/1572 [2:34:33<41:18, 7.10s/it] {'loss': 0.5797, 'learning_rate': 4.580052493438321e-06, 'epoch': 3.11} 78%|███████▊ | 1223/1572 [2:34:33<41:18, 7.10s/it] 78%|███████▊ | 1224/1572 [2:34:41<42:18, 7.30s/it] {'loss': 0.6183, 'learning_rate': 4.566929133858268e-06, 'epoch': 3.11} 78%|███████▊ | 1224/1572 [2:34:41<42:18, 7.30s/it] 78%|███████▊ | 1225/1572 [2:34:48<42:06, 7.28s/it] {'loss': 0.5866, 'learning_rate': 4.553805774278215e-06, 'epoch': 3.11} 78%|███████▊ | 1225/1572 [2:34:48<42:06, 7.28s/it] 78%|███████▊ | 1226/1572 [2:34:56<42:53, 7.44s/it] {'loss': 0.6462, 'learning_rate': 4.540682414698163e-06, 'epoch': 3.11} 78%|███████▊ | 1226/1572 [2:34:56<42:53, 7.44s/it] 78%|███████▊ | 1227/1572 [2:35:04<44:08, 7.68s/it] {'loss': 0.5587, 'learning_rate': 4.52755905511811e-06, 'epoch': 3.12} 78%|███████▊ | 1227/1572 [2:35:04<44:08, 7.68s/it] 78%|███████▊ | 1228/1572 [2:35:12<43:46, 7.63s/it] {'loss': 0.6201, 'learning_rate': 4.514435695538058e-06, 'epoch': 3.12} 78%|███████▊ | 1228/1572 [2:35:12<43:46, 7.63s/it] 78%|███████▊ | 1229/1572 [2:35:19<42:52, 7.50s/it] {'loss': 0.6224, 'learning_rate': 4.501312335958006e-06, 'epoch': 3.12} 78%|███████▊ | 1229/1572 [2:35:19<42:52, 7.50s/it] 78%|███████▊ | 1230/1572 [2:35:26<42:03, 7.38s/it] {'loss': 0.5492, 'learning_rate': 4.488188976377953e-06, 'epoch': 3.12} 78%|███████▊ | 1230/1572 [2:35:26<42:03, 7.38s/it] 78%|███████▊ | 1231/1572 [2:35:33<40:39, 7.15s/it] {'loss': 0.6344, 'learning_rate': 4.4750656167979e-06, 'epoch': 3.13} 78%|███████▊ | 1231/1572 [2:35:33<40:39, 7.15s/it] 78%|███████▊ | 1232/1572 [2:35:40<40:53, 7.22s/it] {'loss': 0.6005, 
'learning_rate': 4.461942257217848e-06, 'epoch': 3.13} 78%|███████▊ | 1232/1572 [2:35:40<40:53, 7.22s/it] 78%|███████▊ | 1233/1572 [2:35:47<40:08, 7.11s/it] {'loss': 0.6069, 'learning_rate': 4.448818897637795e-06, 'epoch': 3.13} 78%|███████▊ | 1233/1572 [2:35:47<40:08, 7.11s/it] 78%|███████▊ | 1234/1572 [2:35:55<40:57, 7.27s/it] {'loss': 0.6872, 'learning_rate': 4.435695538057743e-06, 'epoch': 3.13} 78%|███████▊ | 1234/1572 [2:35:55<40:57, 7.27s/it] 79%|███████▊ | 1235/1572 [2:36:02<40:49, 7.27s/it] {'loss': 0.6643, 'learning_rate': 4.4225721784776905e-06, 'epoch': 3.14} 79%|███████▊ | 1235/1572 [2:36:02<40:49, 7.27s/it] 79%|███████▊ | 1236/1572 [2:36:09<40:43, 7.27s/it] {'loss': 0.5951, 'learning_rate': 4.4094488188976384e-06, 'epoch': 3.14} 79%|███████▊ | 1236/1572 [2:36:09<40:43, 7.27s/it] 79%|███████▊ | 1237/1572 [2:36:17<40:57, 7.34s/it] {'loss': 0.6105, 'learning_rate': 4.3963254593175856e-06, 'epoch': 3.14} 79%|███████▊ | 1237/1572 [2:36:17<40:57, 7.34s/it] 79%|███████▉ | 1238/1572 [2:36:24<40:12, 7.22s/it] {'loss': 0.5205, 'learning_rate': 4.383202099737533e-06, 'epoch': 3.14} 79%|███████▉ | 1238/1572 [2:36:24<40:12, 7.22s/it] 79%|███████▉ | 1239/1572 [2:36:31<39:36, 7.14s/it] {'loss': 0.6237, 'learning_rate': 4.370078740157481e-06, 'epoch': 3.15} 79%|███████▉ | 1239/1572 [2:36:31<39:36, 7.14s/it] 79%|███████▉ | 1240/1572 [2:36:38<39:53, 7.21s/it] {'loss': 0.5476, 'learning_rate': 4.356955380577429e-06, 'epoch': 3.15} 79%|███████▉ | 1240/1572 [2:36:38<39:53, 7.21s/it] 79%|███████▉ | 1241/1572 [2:36:46<41:08, 7.46s/it] {'loss': 0.7175, 'learning_rate': 4.343832020997376e-06, 'epoch': 3.15} 79%|███████▉ | 1241/1572 [2:36:46<41:08, 7.46s/it] 79%|███████▉ | 1242/1572 [2:36:53<39:42, 7.22s/it] {'loss': 0.6419, 'learning_rate': 4.330708661417324e-06, 'epoch': 3.15} 79%|███████▉ | 1242/1572 [2:36:53<39:42, 7.22s/it] 79%|███████▉ | 1243/1572 [2:37:00<40:03, 7.31s/it] {'loss': 0.6084, 'learning_rate': 4.317585301837271e-06, 'epoch': 3.16} 79%|███████▉ | 1243/1572 
[2:37:00<40:03, 7.31s/it] 79%|███████▉ | 1244/1572 [2:37:08<40:48, 7.47s/it] {'loss': 0.5893, 'learning_rate': 4.304461942257218e-06, 'epoch': 3.16} 79%|███████▉ | 1244/1572 [2:37:08<40:48, 7.47s/it] 79%|███████▉ | 1245/1572 [2:37:16<41:18, 7.58s/it] {'loss': 0.5489, 'learning_rate': 4.291338582677166e-06, 'epoch': 3.16} 79%|███████▉ | 1245/1572 [2:37:16<41:18, 7.58s/it] 79%|███████▉ | 1246/1572 [2:37:23<41:07, 7.57s/it] {'loss': 0.6424, 'learning_rate': 4.278215223097113e-06, 'epoch': 3.16} 79%|███████▉ | 1246/1572 [2:37:23<41:07, 7.57s/it] 79%|███████▉ | 1247/1572 [2:37:30<39:31, 7.30s/it] {'loss': 0.6146, 'learning_rate': 4.265091863517061e-06, 'epoch': 3.17} 79%|███████▉ | 1247/1572 [2:37:30<39:31, 7.30s/it] 79%|███████▉ | 1248/1572 [2:37:37<39:03, 7.23s/it] {'loss': 0.6359, 'learning_rate': 4.251968503937008e-06, 'epoch': 3.17} 79%|███████▉ | 1248/1572 [2:37:37<39:03, 7.23s/it] 79%|███████▉ | 1249/1572 [2:37:45<39:16, 7.29s/it] {'loss': 0.6106, 'learning_rate': 4.238845144356955e-06, 'epoch': 3.17} 79%|███████▉ | 1249/1572 [2:37:45<39:16, 7.29s/it] 80%|███████▉ | 1250/1572 [2:37:52<38:58, 7.26s/it] {'loss': 0.6153, 'learning_rate': 4.225721784776903e-06, 'epoch': 3.17} 80%|███████▉ | 1250/1572 [2:37:52<38:58, 7.26s/it] 80%|███████▉ | 1251/1572 [2:37:59<38:36, 7.22s/it] {'loss': 0.5912, 'learning_rate': 4.21259842519685e-06, 'epoch': 3.18} 80%|███████▉ | 1251/1572 [2:37:59<38:36, 7.22s/it] 80%|███████▉ | 1252/1572 [2:38:06<38:43, 7.26s/it] {'loss': 0.6647, 'learning_rate': 4.199475065616798e-06, 'epoch': 3.18} 80%|███████▉ | 1252/1572 [2:38:06<38:43, 7.26s/it] 80%|███████▉ | 1253/1572 [2:38:13<38:15, 7.20s/it] {'loss': 0.6085, 'learning_rate': 4.186351706036746e-06, 'epoch': 3.18} 80%|███████▉ | 1253/1572 [2:38:13<38:15, 7.20s/it] 80%|███████▉ | 1254/1572 [2:38:21<38:27, 7.26s/it] {'loss': 0.528, 'learning_rate': 4.173228346456693e-06, 'epoch': 3.18} 80%|███████▉ | 1254/1572 [2:38:21<38:27, 7.26s/it] 80%|███████▉ | 1255/1572 [2:38:28<39:10, 7.41s/it] {'loss': 
0.5937, 'learning_rate': 4.1601049868766404e-06, 'epoch': 3.19} 80%|███████▉ | 1255/1572 [2:38:28<39:10, 7.41s/it] 80%|███████▉ | 1256/1572 [2:38:35<38:22, 7.29s/it] {'loss': 0.605, 'learning_rate': 4.146981627296588e-06, 'epoch': 3.19} 80%|███████▉ | 1256/1572 [2:38:35<38:22, 7.29s/it] 80%|███████▉ | 1257/1572 [2:38:43<38:40, 7.37s/it] {'loss': 0.5645, 'learning_rate': 4.1338582677165355e-06, 'epoch': 3.19} 80%|███████▉ | 1257/1572 [2:38:43<38:40, 7.37s/it] 80%|████████ | 1258/1572 [2:38:51<39:04, 7.47s/it] {'loss': 0.6515, 'learning_rate': 4.1207349081364835e-06, 'epoch': 3.19} 80%|████████ | 1258/1572 [2:38:51<39:04, 7.47s/it] 80%|████████ | 1259/1572 [2:38:58<38:50, 7.45s/it] {'loss': 0.582, 'learning_rate': 4.107611548556431e-06, 'epoch': 3.2} 80%|████████ | 1259/1572 [2:38:58<38:50, 7.45s/it] 80%|████████ | 1260/1572 [2:39:05<37:49, 7.27s/it] {'loss': 0.6275, 'learning_rate': 4.0944881889763785e-06, 'epoch': 3.2} 80%|████████ | 1260/1572 [2:39:05<37:49, 7.27s/it] 80%|████████ | 1261/1572 [2:39:12<37:44, 7.28s/it] {'loss': 0.5371, 'learning_rate': 4.081364829396326e-06, 'epoch': 3.2} 80%|████████ | 1261/1572 [2:39:12<37:44, 7.28s/it] 80%|████████ | 1262/1572 [2:39:19<37:22, 7.23s/it] {'loss': 0.6456, 'learning_rate': 4.068241469816273e-06, 'epoch': 3.2} 80%|████████ | 1262/1572 [2:39:19<37:22, 7.23s/it] 80%|████████ | 1263/1572 [2:39:27<38:00, 7.38s/it] {'loss': 0.6005, 'learning_rate': 4.055118110236221e-06, 'epoch': 3.21} 80%|████████ | 1263/1572 [2:39:27<38:00, 7.38s/it] 80%|████████ | 1264/1572 [2:39:34<36:45, 7.16s/it] {'loss': 0.5733, 'learning_rate': 4.041994750656169e-06, 'epoch': 3.21} 80%|████████ | 1264/1572 [2:39:34<36:45, 7.16s/it] 80%|████████ | 1265/1572 [2:39:41<37:19, 7.29s/it] {'loss': 0.6131, 'learning_rate': 4.028871391076116e-06, 'epoch': 3.21} 80%|████████ | 1265/1572 [2:39:41<37:19, 7.29s/it] 81%|████████ | 1266/1572 [2:39:49<37:45, 7.40s/it] {'loss': 0.639, 'learning_rate': 4.015748031496064e-06, 'epoch': 3.21} 81%|████████ | 1266/1572 
[2:39:49<37:45, 7.40s/it] 81%|████████ | 1267/1572 [2:39:56<37:15, 7.33s/it] {'loss': 0.583, 'learning_rate': 4.002624671916011e-06, 'epoch': 3.22} 81%|████████ | 1267/1572 [2:39:56<37:15, 7.33s/it] 81%|████████ | 1268/1572 [2:40:03<36:50, 7.27s/it] {'loss': 0.6704, 'learning_rate': 3.989501312335958e-06, 'epoch': 3.22} 81%|████████ | 1268/1572 [2:40:03<36:50, 7.27s/it] 81%|████████ | 1269/1572 [2:40:11<37:24, 7.41s/it] {'loss': 0.6779, 'learning_rate': 3.976377952755906e-06, 'epoch': 3.22} 81%|████████ | 1269/1572 [2:40:11<37:24, 7.41s/it] 81%|████████ | 1270/1572 [2:40:18<36:41, 7.29s/it] {'loss': 0.5783, 'learning_rate': 3.963254593175853e-06, 'epoch': 3.22} 81%|████████ | 1270/1572 [2:40:18<36:41, 7.29s/it] 81%|████████ | 1271/1572 [2:40:25<35:52, 7.15s/it] {'loss': 0.545, 'learning_rate': 3.950131233595801e-06, 'epoch': 3.23} 81%|████████ | 1271/1572 [2:40:25<35:52, 7.15s/it] 81%|████████ | 1272/1572 [2:40:32<35:10, 7.04s/it] {'loss': 0.5147, 'learning_rate': 3.937007874015748e-06, 'epoch': 3.23} 81%|████████ | 1272/1572 [2:40:32<35:10, 7.04s/it] 81%|████████ | 1273/1572 [2:40:39<35:58, 7.22s/it] {'loss': 0.667, 'learning_rate': 3.923884514435696e-06, 'epoch': 3.23} 81%|████████ | 1273/1572 [2:40:39<35:58, 7.22s/it] 81%|████████ | 1274/1572 [2:40:47<36:29, 7.35s/it] {'loss': 0.6262, 'learning_rate': 3.910761154855643e-06, 'epoch': 3.23} 81%|████████ | 1274/1572 [2:40:47<36:29, 7.35s/it] 81%|████████ | 1275/1572 [2:40:54<36:40, 7.41s/it] {'loss': 0.6188, 'learning_rate': 3.89763779527559e-06, 'epoch': 3.24} 81%|████████ | 1275/1572 [2:40:54<36:40, 7.41s/it] 81%|████████ | 1276/1572 [2:41:03<37:42, 7.64s/it] {'loss': 0.6738, 'learning_rate': 3.884514435695538e-06, 'epoch': 3.24} 81%|████████ | 1276/1572 [2:41:03<37:42, 7.64s/it] 81%|████████ | 1277/1572 [2:41:10<37:11, 7.56s/it] {'loss': 0.5153, 'learning_rate': 3.871391076115486e-06, 'epoch': 3.24} 81%|████████ | 1277/1572 [2:41:10<37:11, 7.56s/it] 81%|████████▏ | 1278/1572 [2:41:17<36:30, 7.45s/it] {'loss': 
0.5038, 'learning_rate': 3.858267716535433e-06, 'epoch': 3.24} 81%|████████▏ | 1278/1572 [2:41:17<36:30, 7.45s/it] 81%|████████▏ | 1279/1572 [2:41:24<35:46, 7.32s/it] {'loss': 0.5774, 'learning_rate': 3.8451443569553805e-06, 'epoch': 3.25} 81%|████████▏ | 1279/1572 [2:41:24<35:46, 7.32s/it] 81%|████████▏ | 1280/1572 [2:41:31<34:42, 7.13s/it] {'loss': 0.6238, 'learning_rate': 3.8320209973753285e-06, 'epoch': 3.25} 81%|████████▏ | 1280/1572 [2:41:31<34:42, 7.13s/it] 81%|████████▏ | 1281/1572 [2:41:38<34:45, 7.17s/it] {'loss': 0.5444, 'learning_rate': 3.818897637795276e-06, 'epoch': 3.25} 81%|████████▏ | 1281/1572 [2:41:38<34:45, 7.17s/it] 82%|████████▏ | 1282/1572 [2:41:45<34:26, 7.13s/it] {'loss': 0.6276, 'learning_rate': 3.8057742782152236e-06, 'epoch': 3.25} 82%|████████▏ | 1282/1572 [2:41:45<34:26, 7.13s/it] 82%|████████▏ | 1283/1572 [2:41:53<34:55, 7.25s/it] {'loss': 0.6282, 'learning_rate': 3.7926509186351707e-06, 'epoch': 3.26} 82%|████████▏ | 1283/1572 [2:41:53<34:55, 7.25s/it] 82%|████████▏ | 1284/1572 [2:42:00<34:57, 7.28s/it] {'loss': 0.6081, 'learning_rate': 3.7795275590551182e-06, 'epoch': 3.26} 82%|████████▏ | 1284/1572 [2:42:00<34:57, 7.28s/it] 82%|████████▏ | 1285/1572 [2:42:07<34:29, 7.21s/it] {'loss': 0.6392, 'learning_rate': 3.766404199475066e-06, 'epoch': 3.26} 82%|████████▏ | 1285/1572 [2:42:07<34:29, 7.21s/it] 82%|████████▏ | 1286/1572 [2:42:14<33:37, 7.05s/it] {'loss': 0.579, 'learning_rate': 3.7532808398950133e-06, 'epoch': 3.26} 82%|████████▏ | 1286/1572 [2:42:14<33:37, 7.05s/it] 82%|████████▏ | 1287/1572 [2:42:21<33:42, 7.10s/it] {'loss': 0.698, 'learning_rate': 3.740157480314961e-06, 'epoch': 3.27} 82%|████████▏ | 1287/1572 [2:42:21<33:42, 7.10s/it] 82%|████████▏ | 1288/1572 [2:42:28<33:34, 7.09s/it] {'loss': 0.5969, 'learning_rate': 3.727034120734909e-06, 'epoch': 3.27} 82%|████████▏ | 1288/1572 [2:42:28<33:34, 7.09s/it] 82%|████████▏ | 1289/1572 [2:42:35<33:12, 7.04s/it] {'loss': 0.6102, 'learning_rate': 3.713910761154856e-06, 'epoch': 
3.27} 82%|████████▏ | 1289/1572 [2:42:35<33:12, 7.04s/it] 82%|████████▏ | 1290/1572 [2:42:42<33:14, 7.07s/it] {'loss': 0.5752, 'learning_rate': 3.7007874015748035e-06, 'epoch': 3.28} 82%|████████▏ | 1290/1572 [2:42:42<33:14, 7.07s/it] 82%|████████▏ | 1291/1572 [2:42:49<33:04, 7.06s/it] {'loss': 0.6185, 'learning_rate': 3.6876640419947506e-06, 'epoch': 3.28} 82%|████████▏ | 1291/1572 [2:42:49<33:04, 7.06s/it] 82%|████████▏ | 1292/1572 [2:42:57<33:35, 7.20s/it] {'loss': 0.6071, 'learning_rate': 3.6745406824146986e-06, 'epoch': 3.28} 82%|████████▏ | 1292/1572 [2:42:57<33:35, 7.20s/it] 82%|████████▏ | 1293/1572 [2:43:04<33:01, 7.10s/it] {'loss': 0.576, 'learning_rate': 3.661417322834646e-06, 'epoch': 3.28} 82%|████████▏ | 1293/1572 [2:43:04<33:01, 7.10s/it] 82%|████████▏ | 1294/1572 [2:43:12<33:57, 7.33s/it] {'loss': 0.6004, 'learning_rate': 3.6482939632545932e-06, 'epoch': 3.29} 82%|████████▏ | 1294/1572 [2:43:12<33:57, 7.33s/it] 82%|████████▏ | 1295/1572 [2:43:19<34:33, 7.49s/it] {'loss': 0.5846, 'learning_rate': 3.635170603674541e-06, 'epoch': 3.29} 82%|████████▏ | 1295/1572 [2:43:19<34:33, 7.49s/it] 82%|████████▏ | 1296/1572 [2:43:27<34:53, 7.59s/it] {'loss': 0.6752, 'learning_rate': 3.6220472440944887e-06, 'epoch': 3.29} 82%|████████▏ | 1296/1572 [2:43:27<34:53, 7.59s/it] 83%|████████▎ | 1297/1572 [2:43:34<34:23, 7.50s/it] {'loss': 0.5583, 'learning_rate': 3.608923884514436e-06, 'epoch': 3.29} 83%|████████▎ | 1297/1572 [2:43:34<34:23, 7.50s/it] 83%|████████▎ | 1298/1572 [2:43:41<33:06, 7.25s/it] {'loss': 0.6503, 'learning_rate': 3.595800524934384e-06, 'epoch': 3.3} 83%|████████▎ | 1298/1572 [2:43:41<33:06, 7.25s/it] 83%|████████▎ | 1299/1572 [2:43:49<33:27, 7.35s/it] {'loss': 0.6422, 'learning_rate': 3.582677165354331e-06, 'epoch': 3.3} 83%|████████▎ | 1299/1572 [2:43:49<33:27, 7.35s/it] 83%|████████▎ | 1300/1572 [2:43:55<32:23, 7.14s/it] {'loss': 0.6243, 'learning_rate': 3.5695538057742785e-06, 'epoch': 3.3} 83%|████████▎ | 1300/1572 [2:43:55<32:23, 7.14s/it] 
83%|████████▎ | 1301/1572 [2:44:03<32:15, 7.14s/it] {'loss': 0.6539, 'learning_rate': 3.556430446194226e-06, 'epoch': 3.3} 83%|████████▎ | 1301/1572 [2:44:03<32:15, 7.14s/it] 83%|████████▎ | 1302/1572 [2:44:10<32:26, 7.21s/it] {'loss': 0.62, 'learning_rate': 3.5433070866141735e-06, 'epoch': 3.31} 83%|████████▎ | 1302/1572 [2:44:10<32:26, 7.21s/it] 83%|████████▎ | 1303/1572 [2:44:17<32:16, 7.20s/it] {'loss': 0.5681, 'learning_rate': 3.530183727034121e-06, 'epoch': 3.31} 83%|████████▎ | 1303/1572 [2:44:17<32:16, 7.20s/it] 83%|████████▎ | 1304/1572 [2:44:25<32:52, 7.36s/it] {'loss': 0.5639, 'learning_rate': 3.5170603674540686e-06, 'epoch': 3.31} 83%|████████▎ | 1304/1572 [2:44:25<32:52, 7.36s/it] 83%|████████▎ | 1305/1572 [2:44:32<32:33, 7.31s/it] {'loss': 0.6487, 'learning_rate': 3.5039370078740157e-06, 'epoch': 3.31} 83%|████████▎ | 1305/1572 [2:44:32<32:33, 7.31s/it] 83%|████████▎ | 1306/1572 [2:44:39<31:45, 7.16s/it] {'loss': 0.5617, 'learning_rate': 3.4908136482939637e-06, 'epoch': 3.32} 83%|████████▎ | 1306/1572 [2:44:39<31:45, 7.16s/it] 83%|████████▎ | 1307/1572 [2:44:46<31:27, 7.12s/it] {'loss': 0.5928, 'learning_rate': 3.477690288713911e-06, 'epoch': 3.32} 83%|████████▎ | 1307/1572 [2:44:46<31:27, 7.12s/it] 83%|████████▎ | 1308/1572 [2:44:53<32:01, 7.28s/it] {'loss': 0.6927, 'learning_rate': 3.4645669291338583e-06, 'epoch': 3.32} 83%|████████▎ | 1308/1572 [2:44:53<32:01, 7.28s/it] 83%|████████▎ | 1309/1572 [2:45:01<31:42, 7.23s/it] {'loss': 0.5844, 'learning_rate': 3.4514435695538063e-06, 'epoch': 3.32} 83%|████████▎ | 1309/1572 [2:45:01<31:42, 7.23s/it] 83%|████████▎ | 1310/1572 [2:45:08<31:24, 7.19s/it] {'loss': 0.5415, 'learning_rate': 3.4383202099737534e-06, 'epoch': 3.33} 83%|████████▎ | 1310/1572 [2:45:08<31:24, 7.19s/it] 83%|████████▎ | 1311/1572 [2:45:15<31:00, 7.13s/it] {'loss': 0.5315, 'learning_rate': 3.425196850393701e-06, 'epoch': 3.33} 83%|████████▎ | 1311/1572 [2:45:15<31:00, 7.13s/it] 83%|████████▎ | 1312/1572 [2:45:22<31:06, 7.18s/it] 
{'loss': 0.5704, 'learning_rate': 3.412073490813649e-06, 'epoch': 3.33} 83%|████████▎ | 1312/1572 [2:45:22<31:06, 7.18s/it] 84%|████████▎ | 1313/1572 [2:45:29<30:27, 7.06s/it] {'loss': 0.515, 'learning_rate': 3.398950131233596e-06, 'epoch': 3.33} 84%|████████▎ | 1313/1572 [2:45:29<30:27, 7.06s/it] 84%|████████▎ | 1314/1572 [2:45:37<31:20, 7.29s/it] {'loss': 0.6536, 'learning_rate': 3.3858267716535436e-06, 'epoch': 3.34} 84%|████████▎ | 1314/1572 [2:45:37<31:20, 7.29s/it] 84%|████████▎ | 1315/1572 [2:45:44<30:53, 7.21s/it] {'loss': 0.5739, 'learning_rate': 3.3727034120734907e-06, 'epoch': 3.34} 84%|████████▎ | 1315/1572 [2:45:44<30:53, 7.21s/it] 84%|████████▎ | 1316/1572 [2:45:51<30:34, 7.17s/it] {'loss': 0.5631, 'learning_rate': 3.3595800524934387e-06, 'epoch': 3.34} 84%|████████▎ | 1316/1572 [2:45:51<30:34, 7.17s/it] 84%|████████▍ | 1317/1572 [2:45:58<30:29, 7.18s/it] {'loss': 0.5802, 'learning_rate': 3.346456692913386e-06, 'epoch': 3.34} 84%|████████▍ | 1317/1572 [2:45:58<30:29, 7.18s/it] 84%|████████▍ | 1318/1572 [2:46:05<30:44, 7.26s/it] {'loss': 0.6491, 'learning_rate': 3.3333333333333333e-06, 'epoch': 3.35} 84%|████████▍ | 1318/1572 [2:46:05<30:44, 7.26s/it] 84%|████████▍ | 1319/1572 [2:46:13<30:42, 7.28s/it] {'loss': 0.5736, 'learning_rate': 3.3202099737532813e-06, 'epoch': 3.35} 84%|████████▍ | 1319/1572 [2:46:13<30:42, 7.28s/it] 84%|████████▍ | 1320/1572 [2:46:20<30:21, 7.23s/it] {'loss': 0.6119, 'learning_rate': 3.307086614173229e-06, 'epoch': 3.35} 84%|████████▍ | 1320/1572 [2:46:20<30:21, 7.23s/it] 84%|████████▍ | 1321/1572 [2:46:27<29:59, 7.17s/it] {'loss': 0.6322, 'learning_rate': 3.293963254593176e-06, 'epoch': 3.35} 84%|████████▍ | 1321/1572 [2:46:27<29:59, 7.17s/it] 84%|████████▍ | 1322/1572 [2:46:34<30:00, 7.20s/it] {'loss': 0.5798, 'learning_rate': 3.280839895013124e-06, 'epoch': 3.36} 84%|████████▍ | 1322/1572 [2:46:34<30:00, 7.20s/it] 84%|████████▍ | 1323/1572 [2:46:41<29:27, 7.10s/it] {'loss': 0.565, 'learning_rate': 3.267716535433071e-06, 
'epoch': 3.36} 84%|████████▍ | 1323/1572 [2:46:41<29:27, 7.10s/it] 84%|████████▍ | 1324/1572 [2:46:48<29:08, 7.05s/it] {'loss': 0.5726, 'learning_rate': 3.2545931758530186e-06, 'epoch': 3.36} 84%|████████▍ | 1324/1572 [2:46:48<29:08, 7.05s/it] 84%|████████▍ | 1325/1572 [2:46:54<28:27, 6.91s/it] {'loss': 0.6044, 'learning_rate': 3.241469816272966e-06, 'epoch': 3.36} 84%|████████▍ | 1325/1572 [2:46:54<28:27, 6.91s/it] 84%|████████▍ | 1326/1572 [2:47:01<28:07, 6.86s/it] {'loss': 0.5637, 'learning_rate': 3.2283464566929136e-06, 'epoch': 3.37} 84%|████████▍ | 1326/1572 [2:47:01<28:07, 6.86s/it] 84%|████████▍ | 1327/1572 [2:47:09<28:34, 7.00s/it] {'loss': 0.5935, 'learning_rate': 3.215223097112861e-06, 'epoch': 3.37} 84%|████████▍ | 1327/1572 [2:47:09<28:34, 7.00s/it] 84%|████████▍ | 1328/1572 [2:47:16<28:28, 7.00s/it] {'loss': 0.5824, 'learning_rate': 3.2020997375328087e-06, 'epoch': 3.37} 84%|████████▍ | 1328/1572 [2:47:16<28:28, 7.00s/it] 85%|████████▍ | 1329/1572 [2:47:22<27:52, 6.88s/it] {'loss': 0.594, 'learning_rate': 3.1889763779527563e-06, 'epoch': 3.37} 85%|████████▍ | 1329/1572 [2:47:22<27:52, 6.88s/it] 85%|████████▍ | 1330/1572 [2:47:29<27:31, 6.83s/it] {'loss': 0.5998, 'learning_rate': 3.175853018372704e-06, 'epoch': 3.38} 85%|████████▍ | 1330/1572 [2:47:29<27:31, 6.83s/it] 85%|████████▍ | 1331/1572 [2:47:36<28:24, 7.07s/it] {'loss': 0.5911, 'learning_rate': 3.162729658792651e-06, 'epoch': 3.38} 85%|████████▍ | 1331/1572 [2:47:36<28:24, 7.07s/it] 85%|████████▍ | 1332/1572 [2:47:44<28:29, 7.12s/it] {'loss': 0.6149, 'learning_rate': 3.1496062992125985e-06, 'epoch': 3.38} 85%|████████▍ | 1332/1572 [2:47:44<28:29, 7.12s/it] 85%|████████▍ | 1333/1572 [2:47:51<28:50, 7.24s/it] {'loss': 0.6125, 'learning_rate': 3.1364829396325464e-06, 'epoch': 3.38} 85%|████████▍ | 1333/1572 [2:47:51<28:50, 7.24s/it] 85%|████████▍ | 1334/1572 [2:47:59<28:59, 7.31s/it] {'loss': 0.7193, 'learning_rate': 3.1233595800524935e-06, 'epoch': 3.39} 85%|████████▍ | 1334/1572 [2:47:59<28:59, 
7.31s/it] 85%|████████▍ | 1335/1572 [2:48:05<28:12, 7.14s/it] {'loss': 0.57, 'learning_rate': 3.110236220472441e-06, 'epoch': 3.39} 85%|████████▍ | 1335/1572 [2:48:05<28:12, 7.14s/it] 85%|████████▍ | 1336/1572 [2:48:13<28:15, 7.19s/it] {'loss': 0.625, 'learning_rate': 3.097112860892389e-06, 'epoch': 3.39} 85%|████████▍ | 1336/1572 [2:48:13<28:15, 7.19s/it] 85%|████████▌ | 1337/1572 [2:48:20<28:04, 7.17s/it] {'loss': 0.6525, 'learning_rate': 3.083989501312336e-06, 'epoch': 3.39} 85%|████████▌ | 1337/1572 [2:48:20<28:04, 7.17s/it] 85%|████████▌ | 1338/1572 [2:48:27<27:52, 7.15s/it] {'loss': 0.5988, 'learning_rate': 3.0708661417322837e-06, 'epoch': 3.4} 85%|████████▌ | 1338/1572 [2:48:27<27:52, 7.15s/it] 85%|████████▌ | 1339/1572 [2:48:34<27:48, 7.16s/it] {'loss': 0.6858, 'learning_rate': 3.057742782152231e-06, 'epoch': 3.4} 85%|████████▌ | 1339/1572 [2:48:34<27:48, 7.16s/it] 85%|████████▌ | 1340/1572 [2:48:42<28:38, 7.41s/it] {'loss': 0.6876, 'learning_rate': 3.0446194225721788e-06, 'epoch': 3.4} 85%|████████▌ | 1340/1572 [2:48:42<28:38, 7.41s/it] 85%|████████▌ | 1341/1572 [2:48:49<28:13, 7.33s/it] {'loss': 0.5405, 'learning_rate': 3.0314960629921263e-06, 'epoch': 3.4} 85%|████████▌ | 1341/1572 [2:48:49<28:13, 7.33s/it] 85%|████████▌ | 1342/1572 [2:48:56<27:45, 7.24s/it] {'loss': 0.5811, 'learning_rate': 3.0183727034120734e-06, 'epoch': 3.41} 85%|████████▌ | 1342/1572 [2:48:56<27:45, 7.24s/it] 85%|████████▌ | 1343/1572 [2:49:03<27:28, 7.20s/it] {'loss': 0.6261, 'learning_rate': 3.0052493438320214e-06, 'epoch': 3.41} 85%|████████▌ | 1343/1572 [2:49:03<27:28, 7.20s/it] 85%|████████▌ | 1344/1572 [2:49:11<27:46, 7.31s/it] {'loss': 0.6444, 'learning_rate': 2.992125984251969e-06, 'epoch': 3.41} 85%|████████▌ | 1344/1572 [2:49:11<27:46, 7.31s/it] 86%|████████▌ | 1345/1572 [2:49:19<28:06, 7.43s/it] {'loss': 0.6008, 'learning_rate': 2.979002624671916e-06, 'epoch': 3.41} 86%|████████▌ | 1345/1572 [2:49:19<28:06, 7.43s/it] 86%|████████▌ | 1346/1572 [2:49:27<28:32, 7.58s/it] 
{'loss': 0.6031, 'learning_rate': 2.965879265091864e-06, 'epoch': 3.42} 86%|████████▌ | 1346/1572 [2:49:27<28:32, 7.58s/it] 86%|████████▌ | 1347/1572 [2:49:34<28:18, 7.55s/it] {'loss': 0.636, 'learning_rate': 2.952755905511811e-06, 'epoch': 3.42} 86%|████████▌ | 1347/1572 [2:49:34<28:18, 7.55s/it] 86%|████████▌ | 1348/1572 [2:49:42<28:36, 7.66s/it] {'loss': 0.5578, 'learning_rate': 2.9396325459317587e-06, 'epoch': 3.42} 86%|████████▌ | 1348/1572 [2:49:42<28:36, 7.66s/it] 86%|████████▌ | 1349/1572 [2:49:49<27:48, 7.48s/it] {'loss': 0.6241, 'learning_rate': 2.9265091863517066e-06, 'epoch': 3.42} 86%|████████▌ | 1349/1572 [2:49:49<27:48, 7.48s/it] 86%|████████▌ | 1350/1572 [2:49:56<26:35, 7.19s/it] {'loss': 0.6523, 'learning_rate': 2.9133858267716538e-06, 'epoch': 3.43} 86%|████████▌ | 1350/1572 [2:49:56<26:35, 7.19s/it] 86%|████████▌ | 1351/1572 [2:50:03<26:33, 7.21s/it] {'loss': 0.6231, 'learning_rate': 2.9002624671916013e-06, 'epoch': 3.43} 86%|████████▌ | 1351/1572 [2:50:03<26:33, 7.21s/it] 86%|████████▌ | 1352/1572 [2:50:10<26:41, 7.28s/it] {'loss': 0.6905, 'learning_rate': 2.887139107611549e-06, 'epoch': 3.43} 86%|████████▌ | 1352/1572 [2:50:10<26:41, 7.28s/it] 86%|████████▌ | 1353/1572 [2:50:17<26:08, 7.16s/it] {'loss': 0.5796, 'learning_rate': 2.8740157480314964e-06, 'epoch': 3.44} 86%|████████▌ | 1353/1572 [2:50:17<26:08, 7.16s/it] 86%|████████▌ | 1354/1572 [2:50:25<26:43, 7.36s/it] {'loss': 0.6134, 'learning_rate': 2.860892388451444e-06, 'epoch': 3.44} 86%|████████▌ | 1354/1572 [2:50:25<26:43, 7.36s/it] 86%|████████▌ | 1355/1572 [2:50:32<26:22, 7.29s/it] {'loss': 0.6189, 'learning_rate': 2.847769028871391e-06, 'epoch': 3.44} 86%|████████▌ | 1355/1572 [2:50:32<26:22, 7.29s/it] 86%|████████▋ | 1356/1572 [2:50:39<25:59, 7.22s/it] {'loss': 0.6193, 'learning_rate': 2.8346456692913386e-06, 'epoch': 3.44} 86%|████████▋ | 1356/1572 [2:50:39<25:59, 7.22s/it] 86%|████████▋ | 1357/1572 [2:50:46<25:49, 7.20s/it] {'loss': 0.5601, 'learning_rate': 2.8215223097112865e-06, 
'epoch': 3.45} 86%|████████▋ | 1357/1572 [2:50:46<25:49, 7.20s/it] 86%|████████▋ | 1358/1572 [2:50:53<25:35, 7.18s/it] {'loss': 0.5667, 'learning_rate': 2.8083989501312337e-06, 'epoch': 3.45} 86%|████████▋ | 1358/1572 [2:50:53<25:35, 7.18s/it] 86%|████████▋ | 1359/1572 [2:51:01<25:25, 7.16s/it] {'loss': 0.6043, 'learning_rate': 2.795275590551181e-06, 'epoch': 3.45} 86%|████████▋ | 1359/1572 [2:51:01<25:25, 7.16s/it] 87%|████████▋ | 1360/1572 [2:51:08<25:05, 7.10s/it] {'loss': 0.6244, 'learning_rate': 2.782152230971129e-06, 'epoch': 3.45} 87%|████████▋ | 1360/1572 [2:51:08<25:05, 7.10s/it] 87%|████████▋ | 1361/1572 [2:51:15<25:05, 7.14s/it] {'loss': 0.6286, 'learning_rate': 2.7690288713910763e-06, 'epoch': 3.46} 87%|████████▋ | 1361/1572 [2:51:15<25:05, 7.14s/it] 87%|████████▋ | 1362/1572 [2:51:22<25:29, 7.28s/it] {'loss': 0.6662, 'learning_rate': 2.755905511811024e-06, 'epoch': 3.46} 87%|████████▋ | 1362/1572 [2:51:22<25:29, 7.28s/it] 87%|████████▋ | 1363/1572 [2:51:30<25:19, 7.27s/it] {'loss': 0.5494, 'learning_rate': 2.742782152230971e-06, 'epoch': 3.46} 87%|████████▋ | 1363/1572 [2:51:30<25:19, 7.27s/it] 87%|████████▋ | 1364/1572 [2:51:37<25:33, 7.37s/it] {'loss': 0.688, 'learning_rate': 2.729658792650919e-06, 'epoch': 3.46} 87%|████████▋ | 1364/1572 [2:51:37<25:33, 7.37s/it] 87%|████████▋ | 1365/1572 [2:51:44<24:41, 7.16s/it] {'loss': 0.5471, 'learning_rate': 2.7165354330708664e-06, 'epoch': 3.47} 87%|████████▋ | 1365/1572 [2:51:44<24:41, 7.16s/it] 87%|████████▋ | 1366/1572 [2:51:51<24:29, 7.13s/it] {'loss': 0.5926, 'learning_rate': 2.7034120734908135e-06, 'epoch': 3.47} 87%|████████▋ | 1366/1572 [2:51:51<24:29, 7.13s/it] 87%|████████▋ | 1367/1572 [2:51:59<24:57, 7.30s/it] {'loss': 0.6642, 'learning_rate': 2.6902887139107615e-06, 'epoch': 3.47} 87%|████████▋ | 1367/1572 [2:51:59<24:57, 7.30s/it] 87%|████████▋ | 1368/1572 [2:52:06<24:44, 7.28s/it] {'loss': 0.5687, 'learning_rate': 2.677165354330709e-06, 'epoch': 3.47} 87%|████████▋ | 1368/1572 [2:52:06<24:44, 
7.28s/it] 87%|████████▋ | 1369/1572 [2:52:13<24:54, 7.36s/it] {'loss': 0.6278, 'learning_rate': 2.664041994750656e-06, 'epoch': 3.48} 87%|████████▋ | 1369/1572 [2:52:13<24:54, 7.36s/it] 87%|████████▋ | 1370/1572 [2:52:21<24:53, 7.39s/it] {'loss': 0.6126, 'learning_rate': 2.650918635170604e-06, 'epoch': 3.48} 87%|████████▋ | 1370/1572 [2:52:21<24:53, 7.39s/it] 87%|████████▋ | 1371/1572 [2:52:28<24:18, 7.26s/it] {'loss': 0.5179, 'learning_rate': 2.6377952755905512e-06, 'epoch': 3.48} 87%|████████▋ | 1371/1572 [2:52:28<24:18, 7.26s/it] 87%|████████▋ | 1372/1572 [2:52:35<24:02, 7.21s/it] {'loss': 0.579, 'learning_rate': 2.6246719160104988e-06, 'epoch': 3.48} 87%|████████▋ | 1372/1572 [2:52:35<24:02, 7.21s/it] 87%|████████▋ | 1373/1572 [2:52:42<24:09, 7.28s/it] {'loss': 0.6222, 'learning_rate': 2.6115485564304468e-06, 'epoch': 3.49} 87%|████████▋ | 1373/1572 [2:52:42<24:09, 7.28s/it] 87%|████████▋ | 1374/1572 [2:52:50<24:07, 7.31s/it] {'loss': 0.645, 'learning_rate': 2.598425196850394e-06, 'epoch': 3.49} 87%|████████▋ | 1374/1572 [2:52:50<24:07, 7.31s/it] 87%|████████▋ | 1375/1572 [2:52:58<24:32, 7.48s/it] {'loss': 0.6468, 'learning_rate': 2.5853018372703414e-06, 'epoch': 3.49} 87%|████████▋ | 1375/1572 [2:52:58<24:32, 7.48s/it] 88%|████████▊ | 1376/1572 [2:53:05<24:27, 7.49s/it] {'loss': 0.6632, 'learning_rate': 2.5721784776902894e-06, 'epoch': 3.49} 88%|████████▊ | 1376/1572 [2:53:05<24:27, 7.49s/it] 88%|████████▊ | 1377/1572 [2:53:13<24:18, 7.48s/it] {'loss': 0.6044, 'learning_rate': 2.5590551181102365e-06, 'epoch': 3.5} 88%|████████▊ | 1377/1572 [2:53:13<24:18, 7.48s/it] 88%|████████▊ | 1378/1572 [2:53:20<23:56, 7.41s/it] {'loss': 0.6571, 'learning_rate': 2.545931758530184e-06, 'epoch': 3.5} 88%|████████▊ | 1378/1572 [2:53:20<23:56, 7.41s/it] 88%|████████▊ | 1379/1572 [2:53:27<23:36, 7.34s/it] {'loss': 0.5599, 'learning_rate': 2.532808398950131e-06, 'epoch': 3.5} 88%|████████▊ | 1379/1572 [2:53:27<23:36, 7.34s/it] 88%|████████▊ | 1380/1572 [2:53:35<23:41, 7.41s/it] 
{'loss': 0.6641, 'learning_rate': 2.519685039370079e-06, 'epoch': 3.5} 88%|████████▊ | 1380/1572 [2:53:35<23:41, 7.41s/it] 88%|████████▊ | 1381/1572 [2:53:42<23:28, 7.38s/it] {'loss': 0.5726, 'learning_rate': 2.5065616797900266e-06, 'epoch': 3.51} 88%|████████▊ | 1381/1572 [2:53:42<23:28, 7.38s/it] 88%|████████▊ | 1382/1572 [2:53:49<23:29, 7.42s/it] {'loss': 0.5685, 'learning_rate': 2.493438320209974e-06, 'epoch': 3.51} 88%|████████▊ | 1382/1572 [2:53:49<23:29, 7.42s/it] 88%|████████▊ | 1383/1572 [2:53:57<23:16, 7.39s/it] {'loss': 0.5564, 'learning_rate': 2.4803149606299213e-06, 'epoch': 3.51} 88%|████████▊ | 1383/1572 [2:53:57<23:16, 7.39s/it] 88%|████████▊ | 1384/1572 [2:54:03<22:27, 7.17s/it] {'loss': 0.6384, 'learning_rate': 2.467191601049869e-06, 'epoch': 3.51} 88%|████████▊ | 1384/1572 [2:54:03<22:27, 7.17s/it] 88%|████████▊ | 1385/1572 [2:54:11<22:18, 7.16s/it] {'loss': 0.6037, 'learning_rate': 2.4540682414698164e-06, 'epoch': 3.52} 88%|████████▊ | 1385/1572 [2:54:11<22:18, 7.16s/it] 88%|████████▊ | 1386/1572 [2:54:18<21:59, 7.10s/it] {'loss': 0.5677, 'learning_rate': 2.440944881889764e-06, 'epoch': 3.52} 88%|████████▊ | 1386/1572 [2:54:18<21:59, 7.10s/it] 88%|████████▊ | 1387/1572 [2:54:25<21:49, 7.08s/it] {'loss': 0.6128, 'learning_rate': 2.4278215223097115e-06, 'epoch': 3.52} 88%|████████▊ | 1387/1572 [2:54:25<21:49, 7.08s/it] 88%|████████▊ | 1388/1572 [2:54:32<22:26, 7.32s/it] {'loss': 0.626, 'learning_rate': 2.414698162729659e-06, 'epoch': 3.52} 88%|████████▊ | 1388/1572 [2:54:32<22:26, 7.32s/it] 88%|████████▊ | 1389/1572 [2:54:40<22:17, 7.31s/it] {'loss': 0.5141, 'learning_rate': 2.4015748031496065e-06, 'epoch': 3.53} 88%|████████▊ | 1389/1572 [2:54:40<22:17, 7.31s/it] 88%|████████▊ | 1390/1572 [2:54:47<22:25, 7.39s/it] {'loss': 0.6525, 'learning_rate': 2.388451443569554e-06, 'epoch': 3.53} 88%|████████▊ | 1390/1572 [2:54:47<22:25, 7.39s/it] 88%|████████▊ | 1391/1572 [2:54:54<22:01, 7.30s/it] {'loss': 0.5761, 'learning_rate': 2.3753280839895016e-06, 
'epoch': 3.53} 88%|████████▊ | 1391/1572 [2:54:54<22:01, 7.30s/it] 89%|████████▊ | 1392/1572 [2:55:02<22:14, 7.41s/it] {'loss': 0.6983, 'learning_rate': 2.362204724409449e-06, 'epoch': 3.53} 89%|████████▊ | 1392/1572 [2:55:02<22:14, 7.41s/it] 89%|████████▊ | 1393/1572 [2:55:10<22:35, 7.57s/it] {'loss': 0.5376, 'learning_rate': 2.3490813648293963e-06, 'epoch': 3.54} 89%|████████▊ | 1393/1572 [2:55:10<22:35, 7.57s/it] 89%|████████▊ | 1394/1572 [2:55:18<22:27, 7.57s/it] {'loss': 0.6287, 'learning_rate': 2.3359580052493442e-06, 'epoch': 3.54} 89%|████████▊ | 1394/1572 [2:55:18<22:27, 7.57s/it] 89%|████████▊ | 1395/1572 [2:55:25<21:56, 7.44s/it] {'loss': 0.6647, 'learning_rate': 2.3228346456692914e-06, 'epoch': 3.54} 89%|████████▊ | 1395/1572 [2:55:25<21:56, 7.44s/it] 89%|████████▉ | 1396/1572 [2:55:32<22:02, 7.52s/it] {'loss': 0.7124, 'learning_rate': 2.309711286089239e-06, 'epoch': 3.54} 89%|████████▉ | 1396/1572 [2:55:32<22:02, 7.52s/it] 89%|████████▉ | 1397/1572 [2:55:39<21:31, 7.38s/it] {'loss': 0.6219, 'learning_rate': 2.2965879265091864e-06, 'epoch': 3.55} 89%|████████▉ | 1397/1572 [2:55:39<21:31, 7.38s/it] 89%|████████▉ | 1398/1572 [2:55:46<21:02, 7.26s/it] {'loss': 0.641, 'learning_rate': 2.283464566929134e-06, 'epoch': 3.55} 89%|████████▉ | 1398/1572 [2:55:46<21:02, 7.26s/it] 89%|████████▉ | 1399/1572 [2:55:54<20:53, 7.24s/it] {'loss': 0.5652, 'learning_rate': 2.2703412073490815e-06, 'epoch': 3.55} 89%|████████▉ | 1399/1572 [2:55:54<20:53, 7.24s/it] 89%|████████▉ | 1400/1572 [2:56:01<20:39, 7.21s/it] {'loss': 0.5909, 'learning_rate': 2.257217847769029e-06, 'epoch': 3.55} 89%|████████▉ | 1400/1572 [2:56:01<20:39, 7.21s/it] 89%|████████▉ | 1401/1572 [2:56:08<20:07, 7.06s/it] {'loss': 0.6209, 'learning_rate': 2.2440944881889766e-06, 'epoch': 3.56} 89%|████████▉ | 1401/1572 [2:56:08<20:07, 7.06s/it] 89%|████████▉ | 1402/1572 [2:56:15<20:21, 7.18s/it] {'loss': 0.6429, 'learning_rate': 2.230971128608924e-06, 'epoch': 3.56} 89%|████████▉ | 1402/1572 [2:56:15<20:21, 
7.18s/it] 89%|████████▉ | 1403/1572 [2:56:22<20:12, 7.17s/it] {'loss': 0.6605, 'learning_rate': 2.2178477690288717e-06, 'epoch': 3.56} 89%|████████▉ | 1403/1572 [2:56:22<20:12, 7.17s/it] 89%|████████▉ | 1404/1572 [2:56:30<20:56, 7.48s/it] {'loss': 0.7196, 'learning_rate': 2.2047244094488192e-06, 'epoch': 3.56} 89%|████████▉ | 1404/1572 [2:56:30<20:56, 7.48s/it] 89%|████████▉ | 1405/1572 [2:56:38<20:42, 7.44s/it] {'loss': 0.637, 'learning_rate': 2.1916010498687663e-06, 'epoch': 3.57} 89%|████████▉ | 1405/1572 [2:56:38<20:42, 7.44s/it] 89%|████████▉ | 1406/1572 [2:56:45<20:53, 7.55s/it] {'loss': 0.6716, 'learning_rate': 2.1784776902887143e-06, 'epoch': 3.57} 89%|████████▉ | 1406/1572 [2:56:45<20:53, 7.55s/it] 90%|████████▉ | 1407/1572 [2:56:53<20:32, 7.47s/it] {'loss': 0.5949, 'learning_rate': 2.165354330708662e-06, 'epoch': 3.57} 90%|████████▉ | 1407/1572 [2:56:53<20:32, 7.47s/it] 90%|████████▉ | 1408/1572 [2:57:00<20:25, 7.47s/it] {'loss': 0.571, 'learning_rate': 2.152230971128609e-06, 'epoch': 3.57} 90%|████████▉ | 1408/1572 [2:57:00<20:25, 7.47s/it] 90%|████████▉ | 1409/1572 [2:57:07<19:46, 7.28s/it] {'loss': 0.6095, 'learning_rate': 2.1391076115485565e-06, 'epoch': 3.58} 90%|████████▉ | 1409/1572 [2:57:07<19:46, 7.28s/it] 90%|████████▉ | 1410/1572 [2:57:14<19:39, 7.28s/it] {'loss': 0.6222, 'learning_rate': 2.125984251968504e-06, 'epoch': 3.58} 90%|████████▉ | 1410/1572 [2:57:14<19:39, 7.28s/it] 90%|████████▉ | 1411/1572 [2:57:21<19:11, 7.15s/it] {'loss': 0.5718, 'learning_rate': 2.1128608923884516e-06, 'epoch': 3.58} 90%|████████▉ | 1411/1572 [2:57:21<19:11, 7.15s/it] 90%|████████▉ | 1412/1572 [2:57:28<19:02, 7.14s/it] {'loss': 0.6395, 'learning_rate': 2.099737532808399e-06, 'epoch': 3.58} 90%|████████▉ | 1412/1572 [2:57:28<19:02, 7.14s/it] 90%|████████▉ | 1413/1572 [2:57:36<19:16, 7.27s/it] {'loss': 0.5932, 'learning_rate': 2.0866141732283467e-06, 'epoch': 3.59} 90%|████████▉ | 1413/1572 [2:57:36<19:16, 7.27s/it] 90%|████████▉ | 1414/1572 [2:57:42<18:33, 
7.05s/it] {'loss': 0.5273, 'learning_rate': 2.073490813648294e-06, 'epoch': 3.59} 90%|████████▉ | 1414/1572 [2:57:42<18:33, 7.05s/it] 90%|█████████ | 1415/1572 [2:57:49<18:23, 7.03s/it] {'loss': 0.5639, 'learning_rate': 2.0603674540682417e-06, 'epoch': 3.59} 90%|█████████ | 1415/1572 [2:57:49<18:23, 7.03s/it] 90%|█████████ | 1416/1572 [2:57:57<18:36, 7.16s/it] {'loss': 0.6417, 'learning_rate': 2.0472440944881893e-06, 'epoch': 3.6} 90%|█████████ | 1416/1572 [2:57:57<18:36, 7.16s/it] 90%|█████████ | 1417/1572 [2:58:04<18:21, 7.11s/it] {'loss': 0.5556, 'learning_rate': 2.0341207349081364e-06, 'epoch': 3.6} 90%|█████████ | 1417/1572 [2:58:04<18:21, 7.11s/it] 90%|█████████ | 1418/1572 [2:58:11<18:24, 7.17s/it] {'loss': 0.5647, 'learning_rate': 2.0209973753280844e-06, 'epoch': 3.6} 90%|█████████ | 1418/1572 [2:58:11<18:24, 7.17s/it] 90%|█████████ | 1419/1572 [2:58:19<18:43, 7.34s/it] {'loss': 0.5799, 'learning_rate': 2.007874015748032e-06, 'epoch': 3.6} 90%|█████████ | 1419/1572 [2:58:19<18:43, 7.34s/it] 90%|█████████ | 1420/1572 [2:58:26<18:38, 7.36s/it] {'loss': 0.5283, 'learning_rate': 1.994750656167979e-06, 'epoch': 3.61} 90%|█████████ | 1420/1572 [2:58:26<18:38, 7.36s/it] 90%|█████████ | 1421/1572 [2:58:33<18:19, 7.28s/it] {'loss': 0.5321, 'learning_rate': 1.9816272965879266e-06, 'epoch': 3.61} 90%|█████████ | 1421/1572 [2:58:33<18:19, 7.28s/it] 90%|█████████ | 1422/1572 [2:58:41<18:24, 7.36s/it] {'loss': 0.6185, 'learning_rate': 1.968503937007874e-06, 'epoch': 3.61} 90%|█████████ | 1422/1572 [2:58:41<18:24, 7.36s/it] 91%|█████████ | 1423/1572 [2:58:48<17:53, 7.21s/it] {'loss': 0.5786, 'learning_rate': 1.9553805774278216e-06, 'epoch': 3.61} 91%|█████████ | 1423/1572 [2:58:48<17:53, 7.21s/it] 91%|█████████ | 1424/1572 [2:58:55<17:41, 7.17s/it] {'loss': 0.5717, 'learning_rate': 1.942257217847769e-06, 'epoch': 3.62} 91%|█████████ | 1424/1572 [2:58:55<17:41, 7.17s/it] 91%|█████████ | 1425/1572 [2:59:02<17:40, 7.21s/it] {'loss': 0.6168, 'learning_rate': 
1.9291338582677167e-06, 'epoch': 3.62} 91%|█████████ | 1425/1572 [2:59:02<17:40, 7.21s/it] 91%|█████████ | 1426/1572 [2:59:09<17:31, 7.20s/it] {'loss': 0.5913, 'learning_rate': 1.9160104986876642e-06, 'epoch': 3.62} 91%|█████████ | 1426/1572 [2:59:09<17:31, 7.20s/it] 91%|█████████ | 1427/1572 [2:59:17<17:33, 7.26s/it] {'loss': 0.6777, 'learning_rate': 1.9028871391076118e-06, 'epoch': 3.62} 91%|█████████ | 1427/1572 [2:59:17<17:33, 7.26s/it] 91%|█████████ | 1428/1572 [2:59:24<17:18, 7.21s/it] {'loss': 0.6172, 'learning_rate': 1.8897637795275591e-06, 'epoch': 3.63} 91%|█████████ | 1428/1572 [2:59:24<17:18, 7.21s/it] 91%|█████████ | 1429/1572 [2:59:30<16:44, 7.03s/it] {'loss': 0.5928, 'learning_rate': 1.8766404199475067e-06, 'epoch': 3.63} 91%|█████████ | 1429/1572 [2:59:30<16:44, 7.03s/it] 91%|█████████ | 1430/1572 [2:59:38<17:13, 7.28s/it] {'loss': 0.6762, 'learning_rate': 1.8635170603674544e-06, 'epoch': 3.63} 91%|█████████ | 1430/1572 [2:59:38<17:13, 7.28s/it] 91%|█████████ | 1431/1572 [2:59:45<16:35, 7.06s/it] {'loss': 0.6074, 'learning_rate': 1.8503937007874017e-06, 'epoch': 3.63} 91%|█████████ | 1431/1572 [2:59:45<16:35, 7.06s/it] 91%|█████████ | 1432/1572 [2:59:52<16:36, 7.12s/it] {'loss': 0.5603, 'learning_rate': 1.8372703412073493e-06, 'epoch': 3.64} 91%|█████████ | 1432/1572 [2:59:52<16:36, 7.12s/it] 91%|█████████ | 1433/1572 [2:59:59<16:15, 7.02s/it] {'loss': 0.5717, 'learning_rate': 1.8241469816272966e-06, 'epoch': 3.64} 91%|█████████ | 1433/1572 [2:59:59<16:15, 7.02s/it] 91%|█████████ | 1434/1572 [3:00:06<16:17, 7.08s/it] {'loss': 0.643, 'learning_rate': 1.8110236220472444e-06, 'epoch': 3.64} 91%|█████████ | 1434/1572 [3:00:06<16:17, 7.08s/it] 91%|█████████▏| 1435/1572 [3:00:13<16:16, 7.13s/it] {'loss': 0.6208, 'learning_rate': 1.797900262467192e-06, 'epoch': 3.64} 91%|█████████▏| 1435/1572 [3:00:13<16:16, 7.13s/it] 91%|█████████▏| 1436/1572 [3:00:21<16:13, 7.16s/it] {'loss': 0.621, 'learning_rate': 1.7847769028871392e-06, 'epoch': 3.65} 91%|█████████▏| 
1436/1572 [3:00:21<16:13, 7.16s/it] 91%|█████████▏| 1437/1572 [3:00:29<16:35, 7.37s/it] {'loss': 0.5176, 'learning_rate': 1.7716535433070868e-06, 'epoch': 3.65} 91%|█████████▏| 1437/1572 [3:00:29<16:35, 7.37s/it] 91%|█████████▏| 1438/1572 [3:00:36<16:39, 7.46s/it] {'loss': 0.6123, 'learning_rate': 1.7585301837270343e-06, 'epoch': 3.65} 91%|█████████▏| 1438/1572 [3:00:36<16:39, 7.46s/it] 92%|█████████▏| 1439/1572 [3:00:43<16:14, 7.33s/it] {'loss': 0.6051, 'learning_rate': 1.7454068241469818e-06, 'epoch': 3.65} 92%|█████████▏| 1439/1572 [3:00:43<16:14, 7.33s/it] 92%|█████████▏| 1440/1572 [3:00:51<16:16, 7.40s/it] {'loss': 0.6126, 'learning_rate': 1.7322834645669292e-06, 'epoch': 3.66} 92%|█████████▏| 1440/1572 [3:00:51<16:16, 7.40s/it] 92%|█████████▏| 1441/1572 [3:00:58<16:05, 7.37s/it] {'loss': 0.6502, 'learning_rate': 1.7191601049868767e-06, 'epoch': 3.66} 92%|█████████▏| 1441/1572 [3:00:58<16:05, 7.37s/it] 92%|█████████▏| 1442/1572 [3:01:05<15:58, 7.38s/it] {'loss': 0.5842, 'learning_rate': 1.7060367454068245e-06, 'epoch': 3.66} 92%|█████████▏| 1442/1572 [3:01:05<15:58, 7.38s/it] 92%|█████████▏| 1443/1572 [3:01:14<16:21, 7.61s/it] {'loss': 0.6322, 'learning_rate': 1.6929133858267718e-06, 'epoch': 3.66} 92%|█████████▏| 1443/1572 [3:01:14<16:21, 7.61s/it] 92%|█████████▏| 1444/1572 [3:01:20<15:40, 7.35s/it] {'loss': 0.598, 'learning_rate': 1.6797900262467193e-06, 'epoch': 3.67} 92%|█████████▏| 1444/1572 [3:01:20<15:40, 7.35s/it] 92%|█████████▏| 1445/1572 [3:01:27<15:16, 7.21s/it] {'loss': 0.5934, 'learning_rate': 1.6666666666666667e-06, 'epoch': 3.67} 92%|█████████▏| 1445/1572 [3:01:27<15:16, 7.21s/it] 92%|█████████▏| 1446/1572 [3:01:35<15:14, 7.25s/it] {'loss': 0.6414, 'learning_rate': 1.6535433070866144e-06, 'epoch': 3.67} 92%|█████████▏| 1446/1572 [3:01:35<15:14, 7.25s/it] 92%|█████████▏| 1447/1572 [3:01:42<15:02, 7.22s/it] {'loss': 0.6002, 'learning_rate': 1.640419947506562e-06, 'epoch': 3.67} 92%|█████████▏| 1447/1572 [3:01:42<15:02, 7.22s/it] 92%|█████████▏| 
1448/1572 [3:01:49<14:47, 7.16s/it] {'loss': 0.6249, 'learning_rate': 1.6272965879265093e-06, 'epoch': 3.68} 92%|█████████▏| 1448/1572 [3:01:49<14:47, 7.16s/it] 92%|█████████▏| 1449/1572 [3:01:56<14:35, 7.12s/it] {'loss': 0.6056, 'learning_rate': 1.6141732283464568e-06, 'epoch': 3.68} 92%|█████████▏| 1449/1572 [3:01:56<14:35, 7.12s/it] 92%|█████████▏| 1450/1572 [3:02:03<14:34, 7.17s/it] {'loss': 0.5995, 'learning_rate': 1.6010498687664044e-06, 'epoch': 3.68} 92%|█████████▏| 1450/1572 [3:02:03<14:34, 7.17s/it] 92%|█████████▏| 1451/1572 [3:02:10<14:11, 7.04s/it] {'loss': 0.6305, 'learning_rate': 1.587926509186352e-06, 'epoch': 3.68} 92%|█████████▏| 1451/1572 [3:02:10<14:11, 7.04s/it] 92%|█████████▏| 1452/1572 [3:02:17<13:55, 6.96s/it] {'loss': 0.6167, 'learning_rate': 1.5748031496062992e-06, 'epoch': 3.69} 92%|█████████▏| 1452/1572 [3:02:17<13:55, 6.96s/it] 92%|█████████▏| 1453/1572 [3:02:24<14:06, 7.12s/it] {'loss': 0.6691, 'learning_rate': 1.5616797900262468e-06, 'epoch': 3.69} 92%|█████████▏| 1453/1572 [3:02:24<14:06, 7.12s/it] 92%|█████████▏| 1454/1572 [3:02:31<14:00, 7.12s/it] {'loss': 0.6373, 'learning_rate': 1.5485564304461945e-06, 'epoch': 3.69} 92%|█████████▏| 1454/1572 [3:02:31<14:00, 7.12s/it] 93%|█████████▎| 1455/1572 [3:02:39<14:16, 7.32s/it] {'loss': 0.6748, 'learning_rate': 1.5354330708661418e-06, 'epoch': 3.69} 93%|█████████▎| 1455/1572 [3:02:39<14:16, 7.32s/it] 93%|█████████▎| 1456/1572 [3:02:46<14:06, 7.30s/it] {'loss': 0.5977, 'learning_rate': 1.5223097112860894e-06, 'epoch': 3.7} 93%|█████████▎| 1456/1572 [3:02:46<14:06, 7.30s/it] 93%|█████████▎| 1457/1572 [3:02:53<13:52, 7.24s/it] {'loss': 0.5841, 'learning_rate': 1.5091863517060367e-06, 'epoch': 3.7} 93%|█████████▎| 1457/1572 [3:02:53<13:52, 7.24s/it] 93%|█████████▎| 1458/1572 [3:03:00<13:41, 7.20s/it] {'loss': 0.6255, 'learning_rate': 1.4960629921259845e-06, 'epoch': 3.7} 93%|█████████▎| 1458/1572 [3:03:00<13:41, 7.20s/it] 93%|█████████▎| 1459/1572 [3:03:08<13:43, 7.29s/it] {'loss': 0.6535, 
'learning_rate': 1.482939632545932e-06, 'epoch': 3.7} 93%|█████████▎| 1459/1572 [3:03:08<13:43, 7.29s/it] 93%|█████████▎| 1460/1572 [3:03:15<13:25, 7.19s/it] {'loss': 0.6219, 'learning_rate': 1.4698162729658793e-06, 'epoch': 3.71} 93%|█████████▎| 1460/1572 [3:03:15<13:25, 7.19s/it] 93%|█████████▎| 1461/1572 [3:03:22<13:14, 7.16s/it] {'loss': 0.6383, 'learning_rate': 1.4566929133858269e-06, 'epoch': 3.71} 93%|█████████▎| 1461/1572 [3:03:22<13:14, 7.16s/it] 93%|█████████▎| 1462/1572 [3:03:29<13:13, 7.22s/it] {'loss': 0.6382, 'learning_rate': 1.4435695538057744e-06, 'epoch': 3.71} 93%|█████████▎| 1462/1572 [3:03:29<13:13, 7.22s/it] 93%|█████████▎| 1463/1572 [3:03:36<13:04, 7.20s/it] {'loss': 0.5947, 'learning_rate': 1.430446194225722e-06, 'epoch': 3.71} 93%|█████████▎| 1463/1572 [3:03:36<13:04, 7.20s/it] 93%|█████████▎| 1464/1572 [3:03:43<12:44, 7.07s/it] {'loss': 0.585, 'learning_rate': 1.4173228346456693e-06, 'epoch': 3.72} 93%|█████████▎| 1464/1572 [3:03:43<12:44, 7.07s/it] 93%|█████████▎| 1465/1572 [3:03:50<12:27, 6.98s/it] {'loss': 0.6183, 'learning_rate': 1.4041994750656168e-06, 'epoch': 3.72} 93%|█████████▎| 1465/1572 [3:03:50<12:27, 6.98s/it] 93%|█████████▎| 1466/1572 [3:03:57<12:17, 6.96s/it] {'loss': 0.5613, 'learning_rate': 1.3910761154855646e-06, 'epoch': 3.72} 93%|█████████▎| 1466/1572 [3:03:57<12:17, 6.96s/it] 93%|█████████▎| 1467/1572 [3:04:05<12:33, 7.18s/it] {'loss': 0.6758, 'learning_rate': 1.377952755905512e-06, 'epoch': 3.72} 93%|█████████▎| 1467/1572 [3:04:05<12:33, 7.18s/it] 93%|█████████▎| 1468/1572 [3:04:12<12:35, 7.26s/it] {'loss': 0.6109, 'learning_rate': 1.3648293963254594e-06, 'epoch': 3.73} 93%|█████████▎| 1468/1572 [3:04:12<12:35, 7.26s/it] 93%|█████████▎| 1469/1572 [3:04:19<12:18, 7.17s/it] {'loss': 0.6069, 'learning_rate': 1.3517060367454068e-06, 'epoch': 3.73} 93%|█████████▎| 1469/1572 [3:04:19<12:18, 7.17s/it] 94%|█████████▎| 1470/1572 [3:04:26<12:16, 7.22s/it] {'loss': 0.6242, 'learning_rate': 1.3385826771653545e-06, 'epoch': 3.73} 
94%|█████████▎| 1470/1572 [3:04:26<12:16, 7.22s/it] 94%|█████████▎| 1471/1572 [3:04:34<12:11, 7.24s/it] {'loss': 0.63, 'learning_rate': 1.325459317585302e-06, 'epoch': 3.73} 94%|█████████▎| 1471/1572 [3:04:34<12:11, 7.24s/it] 94%|█████████▎| 1472/1572 [3:04:42<12:25, 7.45s/it] {'loss': 0.6479, 'learning_rate': 1.3123359580052494e-06, 'epoch': 3.74} 94%|█████████▎| 1472/1572 [3:04:42<12:25, 7.45s/it] 94%|█████████▎| 1473/1572 [3:04:49<12:15, 7.43s/it] {'loss': 0.6132, 'learning_rate': 1.299212598425197e-06, 'epoch': 3.74} 94%|█████████▎| 1473/1572 [3:04:49<12:15, 7.43s/it] 94%|█████████▍| 1474/1572 [3:04:56<11:59, 7.34s/it] {'loss': 0.6606, 'learning_rate': 1.2860892388451447e-06, 'epoch': 3.74} 94%|█████████▍| 1474/1572 [3:04:56<11:59, 7.34s/it] 94%|█████████▍| 1475/1572 [3:05:03<11:45, 7.28s/it] {'loss': 0.6714, 'learning_rate': 1.272965879265092e-06, 'epoch': 3.74} 94%|█████████▍| 1475/1572 [3:05:03<11:45, 7.28s/it] 94%|█████████▍| 1476/1572 [3:05:11<11:48, 7.38s/it] {'loss': 0.5975, 'learning_rate': 1.2598425196850396e-06, 'epoch': 3.75} 94%|█████████▍| 1476/1572 [3:05:11<11:48, 7.38s/it] 94%|█████████▍| 1477/1572 [3:05:18<11:33, 7.30s/it] {'loss': 0.5945, 'learning_rate': 1.246719160104987e-06, 'epoch': 3.75} 94%|█████████▍| 1477/1572 [3:05:18<11:33, 7.30s/it] 94%|█████████▍| 1478/1572 [3:05:25<11:16, 7.19s/it] {'loss': 0.6447, 'learning_rate': 1.2335958005249344e-06, 'epoch': 3.75} 94%|█████████▍| 1478/1572 [3:05:25<11:16, 7.19s/it] 94%|█████████▍| 1479/1572 [3:05:32<11:07, 7.18s/it] {'loss': 0.5509, 'learning_rate': 1.220472440944882e-06, 'epoch': 3.75} 94%|█████████▍| 1479/1572 [3:05:32<11:07, 7.18s/it] 94%|█████████▍| 1480/1572 [3:05:39<10:56, 7.13s/it] {'loss': 0.5844, 'learning_rate': 1.2073490813648295e-06, 'epoch': 3.76} 94%|█████████▍| 1480/1572 [3:05:39<10:56, 7.13s/it] 94%|█████████▍| 1481/1572 [3:05:46<10:50, 7.15s/it] {'loss': 0.6216, 'learning_rate': 1.194225721784777e-06, 'epoch': 3.76} 94%|█████████▍| 1481/1572 [3:05:46<10:50, 7.15s/it] 
94%|█████████▍| 1482/1572 [3:05:54<10:55, 7.28s/it] {'loss': 0.6026, 'learning_rate': 1.1811023622047246e-06, 'epoch': 3.76} 94%|█████████▍| 1482/1572 [3:05:54<10:55, 7.28s/it] 94%|█████████▍| 1483/1572 [3:06:01<10:43, 7.23s/it] {'loss': 0.5624, 'learning_rate': 1.1679790026246721e-06, 'epoch': 3.77} 94%|█████████▍| 1483/1572 [3:06:01<10:43, 7.23s/it] 94%|█████████▍| 1484/1572 [3:06:08<10:25, 7.10s/it] {'loss': 0.7082, 'learning_rate': 1.1548556430446194e-06, 'epoch': 3.77} 94%|█████████▍| 1484/1572 [3:06:08<10:25, 7.10s/it] 94%|█████████▍| 1485/1572 [3:06:15<10:21, 7.14s/it] {'loss': 0.6144, 'learning_rate': 1.141732283464567e-06, 'epoch': 3.77} 94%|█████████▍| 1485/1572 [3:06:15<10:21, 7.14s/it] 95%|█████████▍| 1486/1572 [3:06:22<10:18, 7.20s/it] {'loss': 0.5692, 'learning_rate': 1.1286089238845145e-06, 'epoch': 3.77} 95%|█████████▍| 1486/1572 [3:06:22<10:18, 7.20s/it] 95%|█████████▍| 1487/1572 [3:06:30<10:12, 7.21s/it] {'loss': 0.6262, 'learning_rate': 1.115485564304462e-06, 'epoch': 3.78} 95%|█████████▍| 1487/1572 [3:06:30<10:12, 7.21s/it] 95%|█████████▍| 1488/1572 [3:06:36<09:55, 7.09s/it] {'loss': 0.653, 'learning_rate': 1.1023622047244096e-06, 'epoch': 3.78} 95%|█████████▍| 1488/1572 [3:06:36<09:55, 7.09s/it] 95%|█████████▍| 1489/1572 [3:06:43<09:42, 7.02s/it] {'loss': 0.5855, 'learning_rate': 1.0892388451443571e-06, 'epoch': 3.78} 95%|█████████▍| 1489/1572 [3:06:43<09:42, 7.02s/it] 95%|█████████▍| 1490/1572 [3:06:50<09:26, 6.91s/it] {'loss': 0.5973, 'learning_rate': 1.0761154855643045e-06, 'epoch': 3.78} 95%|█████████▍| 1490/1572 [3:06:50<09:26, 6.91s/it] 95%|█████████▍| 1491/1572 [3:06:57<09:33, 7.08s/it] {'loss': 0.6105, 'learning_rate': 1.062992125984252e-06, 'epoch': 3.79} 95%|█████████▍| 1491/1572 [3:06:57<09:33, 7.08s/it] 95%|█████████▍| 1492/1572 [3:07:05<09:40, 7.25s/it] {'loss': 0.5299, 'learning_rate': 1.0498687664041996e-06, 'epoch': 3.79} 95%|█████████▍| 1492/1572 [3:07:05<09:40, 7.25s/it] 95%|█████████▍| 1493/1572 [3:07:12<09:38, 7.32s/it] 
{'loss': 0.6346, 'learning_rate': 1.036745406824147e-06, 'epoch': 3.79} 95%|█████████▍| 1493/1572 [3:07:12<09:38, 7.32s/it] 95%|█████████▌| 1494/1572 [3:07:20<09:40, 7.45s/it] {'loss': 0.6264, 'learning_rate': 1.0236220472440946e-06, 'epoch': 3.79} 95%|█████████▌| 1494/1572 [3:07:20<09:40, 7.45s/it] 95%|█████████▌| 1495/1572 [3:07:28<09:30, 7.40s/it] {'loss': 0.6193, 'learning_rate': 1.0104986876640422e-06, 'epoch': 3.8} 95%|█████████▌| 1495/1572 [3:07:28<09:30, 7.40s/it] 95%|█████████▌| 1496/1572 [3:07:35<09:14, 7.30s/it] {'loss': 0.595, 'learning_rate': 9.973753280839895e-07, 'epoch': 3.8} 95%|█████████▌| 1496/1572 [3:07:35<09:14, 7.30s/it] 95%|█████████▌| 1497/1572 [3:07:42<09:04, 7.26s/it] {'loss': 0.6105, 'learning_rate': 9.84251968503937e-07, 'epoch': 3.8} 95%|█████████▌| 1497/1572 [3:07:42<09:04, 7.26s/it] 95%|█████████▌| 1498/1572 [3:07:49<08:59, 7.30s/it] {'loss': 0.6622, 'learning_rate': 9.711286089238846e-07, 'epoch': 3.8} 95%|█████████▌| 1498/1572 [3:07:49<08:59, 7.30s/it] 95%|█████████▌| 1499/1572 [3:07:56<08:50, 7.27s/it] {'loss': 0.6297, 'learning_rate': 9.580052493438321e-07, 'epoch': 3.81} 95%|█████████▌| 1499/1572 [3:07:56<08:50, 7.27s/it] 95%|█████████▌| 1500/1572 [3:08:03<08:37, 7.18s/it] {'loss': 0.6445, 'learning_rate': 9.448818897637796e-07, 'epoch': 3.81} 95%|█████████▌| 1500/1572 [3:08:03<08:37, 7.18s/it] 95%|█████████▌| 1501/1572 [3:08:11<08:30, 7.19s/it] {'loss': 0.6366, 'learning_rate': 9.317585301837272e-07, 'epoch': 3.81} 95%|█████████▌| 1501/1572 [3:08:11<08:30, 7.19s/it] 96%|█████████▌| 1502/1572 [3:08:17<08:16, 7.09s/it] {'loss': 0.6128, 'learning_rate': 9.186351706036746e-07, 'epoch': 3.81} 96%|█████████▌| 1502/1572 [3:08:17<08:16, 7.09s/it] 96%|█████████▌| 1503/1572 [3:08:24<07:59, 6.95s/it] {'loss': 0.5532, 'learning_rate': 9.055118110236222e-07, 'epoch': 3.82} 96%|█████████▌| 1503/1572 [3:08:24<07:59, 6.95s/it] 96%|█████████▌| 1504/1572 [3:08:31<07:55, 6.99s/it] {'loss': 0.6056, 'learning_rate': 8.923884514435696e-07, 'epoch': 
3.82} 96%|█████████▌| 1504/1572 [3:08:31<07:55, 6.99s/it] 96%|█████████▌| 1505/1572 [3:08:38<07:43, 6.92s/it] {'loss': 0.5792, 'learning_rate': 8.792650918635172e-07, 'epoch': 3.82} 96%|█████████▌| 1505/1572 [3:08:38<07:43, 6.92s/it] 96%|█████████▌| 1506/1572 [3:08:45<07:37, 6.93s/it] {'loss': 0.5819, 'learning_rate': 8.661417322834646e-07, 'epoch': 3.82} 96%|█████████▌| 1506/1572 [3:08:45<07:37, 6.93s/it] 96%|█████████▌| 1507/1572 [3:08:53<07:50, 7.23s/it] {'loss': 0.7251, 'learning_rate': 8.530183727034122e-07, 'epoch': 3.83} 96%|█████████▌| 1507/1572 [3:08:53<07:50, 7.23s/it] 96%|█████████▌| 1508/1572 [3:09:00<07:48, 7.33s/it] {'loss': 0.692, 'learning_rate': 8.398950131233597e-07, 'epoch': 3.83} 96%|█████████▌| 1508/1572 [3:09:00<07:48, 7.33s/it] 96%|█████████▌| 1509/1572 [3:09:08<07:39, 7.30s/it] {'loss': 0.6952, 'learning_rate': 8.267716535433072e-07, 'epoch': 3.83} 96%|█████████▌| 1509/1572 [3:09:08<07:39, 7.30s/it] 96%|█████████▌| 1510/1572 [3:09:15<07:39, 7.42s/it] {'loss': 0.6062, 'learning_rate': 8.136482939632546e-07, 'epoch': 3.83} 96%|█████████▌| 1510/1572 [3:09:15<07:39, 7.42s/it] 96%|█████████▌| 1511/1572 [3:09:23<07:38, 7.52s/it] {'loss': 0.5239, 'learning_rate': 8.005249343832022e-07, 'epoch': 3.84} 96%|█████████▌| 1511/1572 [3:09:23<07:38, 7.52s/it] 96%|█████████▌| 1512/1572 [3:09:31<07:31, 7.53s/it] {'loss': 0.5989, 'learning_rate': 7.874015748031496e-07, 'epoch': 3.84} 96%|█████████▌| 1512/1572 [3:09:31<07:31, 7.53s/it] 96%|█████████▌| 1513/1572 [3:09:38<07:24, 7.53s/it] {'loss': 0.661, 'learning_rate': 7.742782152230973e-07, 'epoch': 3.84} 96%|█████████▌| 1513/1572 [3:09:38<07:24, 7.53s/it] 96%|█████████▋| 1514/1572 [3:09:45<07:08, 7.38s/it] {'loss': 0.5685, 'learning_rate': 7.611548556430447e-07, 'epoch': 3.84} 96%|█████████▋| 1514/1572 [3:09:45<07:08, 7.38s/it] 96%|█████████▋| 1515/1572 [3:09:52<06:52, 7.24s/it] {'loss': 0.6027, 'learning_rate': 7.480314960629922e-07, 'epoch': 3.85} 96%|█████████▋| 1515/1572 [3:09:52<06:52, 7.24s/it] 
96%|█████████▋| 1516/1572 [3:09:59<06:43, 7.21s/it] {'loss': 0.6204, 'learning_rate': 7.349081364829397e-07, 'epoch': 3.85} 96%|█████████▋| 1516/1572 [3:09:59<06:43, 7.21s/it] 97%|█████████▋| 1517/1572 [3:10:06<06:35, 7.19s/it] {'loss': 0.5488, 'learning_rate': 7.217847769028872e-07, 'epoch': 3.85} 97%|█████████▋| 1517/1572 [3:10:06<06:35, 7.19s/it] 97%|█████████▋| 1518/1572 [3:10:13<06:22, 7.09s/it] {'loss': 0.6015, 'learning_rate': 7.086614173228346e-07, 'epoch': 3.85} 97%|█████████▋| 1518/1572 [3:10:13<06:22, 7.09s/it] 97%|█████████▋| 1519/1572 [3:10:21<06:29, 7.34s/it] {'loss': 0.6198, 'learning_rate': 6.955380577427823e-07, 'epoch': 3.86} 97%|█████████▋| 1519/1572 [3:10:21<06:29, 7.34s/it] 97%|█████████▋| 1520/1572 [3:10:28<06:14, 7.21s/it] {'loss': 0.6718, 'learning_rate': 6.824146981627297e-07, 'epoch': 3.86} 97%|█████████▋| 1520/1572 [3:10:28<06:14, 7.21s/it] 97%|█████████▋| 1521/1572 [3:10:35<05:59, 7.05s/it] {'loss': 0.5899, 'learning_rate': 6.692913385826773e-07, 'epoch': 3.86} 97%|█████████▋| 1521/1572 [3:10:35<05:59, 7.05s/it] 97%|█████████▋| 1522/1572 [3:10:43<06:06, 7.33s/it] {'loss': 0.5739, 'learning_rate': 6.561679790026247e-07, 'epoch': 3.86} 97%|█████████▋| 1522/1572 [3:10:43<06:06, 7.33s/it] 97%|█████████▋| 1523/1572 [3:10:50<05:59, 7.33s/it] {'loss': 0.6321, 'learning_rate': 6.430446194225723e-07, 'epoch': 3.87} 97%|█████████▋| 1523/1572 [3:10:50<05:59, 7.33s/it] 97%|█████████▋| 1524/1572 [3:10:57<05:46, 7.22s/it] {'loss': 0.6345, 'learning_rate': 6.299212598425198e-07, 'epoch': 3.87} 97%|█████████▋| 1524/1572 [3:10:57<05:46, 7.22s/it] 97%|█████████▋| 1525/1572 [3:11:04<05:35, 7.14s/it] {'loss': 0.5331, 'learning_rate': 6.167979002624672e-07, 'epoch': 3.87} 97%|█████████▋| 1525/1572 [3:11:04<05:35, 7.14s/it] 97%|█████████▋| 1526/1572 [3:11:12<05:35, 7.29s/it] {'loss': 0.6951, 'learning_rate': 6.036745406824148e-07, 'epoch': 3.87} 97%|█████████▋| 1526/1572 [3:11:12<05:35, 7.29s/it] 97%|█████████▋| 1527/1572 [3:11:18<05:20, 7.12s/it] {'loss': 
0.6193, 'learning_rate': 5.905511811023623e-07, 'epoch': 3.88} 97%|█████████▋| 1527/1572 [3:11:18<05:20, 7.12s/it] 97%|█████████▋| 1528/1572 [3:11:25<05:13, 7.13s/it] {'loss': 0.5597, 'learning_rate': 5.774278215223097e-07, 'epoch': 3.88} 97%|█████████▋| 1528/1572 [3:11:25<05:13, 7.13s/it] 97%|█████████▋| 1529/1572 [3:11:33<05:10, 7.23s/it] {'loss': 0.66, 'learning_rate': 5.643044619422573e-07, 'epoch': 3.88} 97%|█████████▋| 1529/1572 [3:11:33<05:10, 7.23s/it] 97%|█████████▋| 1530/1572 [3:11:40<05:02, 7.21s/it] {'loss': 0.6475, 'learning_rate': 5.511811023622048e-07, 'epoch': 3.88} 97%|█████████▋| 1530/1572 [3:11:40<05:02, 7.21s/it] 97%|█████████▋| 1531/1572 [3:11:47<04:52, 7.13s/it] {'loss': 0.6018, 'learning_rate': 5.380577427821522e-07, 'epoch': 3.89} 97%|█████████▋| 1531/1572 [3:11:47<04:52, 7.13s/it] 97%|█████████▋| 1532/1572 [3:11:54<04:46, 7.17s/it] {'loss': 0.5828, 'learning_rate': 5.249343832020998e-07, 'epoch': 3.89} 97%|█████████▋| 1532/1572 [3:11:54<04:46, 7.17s/it] 98%|█████████▊| 1533/1572 [3:12:02<04:42, 7.25s/it] {'loss': 0.5536, 'learning_rate': 5.118110236220473e-07, 'epoch': 3.89} 98%|█████████▊| 1533/1572 [3:12:02<04:42, 7.25s/it] 98%|█████████▊| 1534/1572 [3:12:09<04:32, 7.18s/it] {'loss': 0.6034, 'learning_rate': 4.986876640419948e-07, 'epoch': 3.89} 98%|█████████▊| 1534/1572 [3:12:09<04:32, 7.18s/it] 98%|█████████▊| 1535/1572 [3:12:16<04:23, 7.12s/it] {'loss': 0.5538, 'learning_rate': 4.855643044619423e-07, 'epoch': 3.9} 98%|█████████▊| 1535/1572 [3:12:16<04:23, 7.12s/it] 98%|█████████▊| 1536/1572 [3:12:23<04:21, 7.26s/it] {'loss': 0.5911, 'learning_rate': 4.724409448818898e-07, 'epoch': 3.9} 98%|█████████▊| 1536/1572 [3:12:23<04:21, 7.26s/it] 98%|█████████▊| 1537/1572 [3:12:30<04:10, 7.16s/it] {'loss': 0.6138, 'learning_rate': 4.593175853018373e-07, 'epoch': 3.9} 98%|█████████▊| 1537/1572 [3:12:30<04:10, 7.16s/it] 98%|█████████▊| 1538/1572 [3:12:37<03:58, 7.00s/it] {'loss': 0.5463, 'learning_rate': 4.461942257217848e-07, 'epoch': 3.9} 
98%|█████████▊| 1538/1572 [3:12:37<03:58, 7.00s/it] 98%|█████████▊| 1539/1572 [3:12:44<03:53, 7.07s/it] {'loss': 0.562, 'learning_rate': 4.330708661417323e-07, 'epoch': 3.91} 98%|█████████▊| 1539/1572 [3:12:44<03:53, 7.07s/it] 98%|█████████▊| 1540/1572 [3:12:51<03:46, 7.07s/it] {'loss': 0.6298, 'learning_rate': 4.1994750656167983e-07, 'epoch': 3.91} 98%|█████████▊| 1540/1572 [3:12:51<03:46, 7.07s/it] 98%|█████████▊| 1541/1572 [3:12:59<03:43, 7.22s/it] {'loss': 0.5527, 'learning_rate': 4.068241469816273e-07, 'epoch': 3.91} 98%|█████████▊| 1541/1572 [3:12:59<03:43, 7.22s/it] 98%|█████████▊| 1542/1572 [3:13:06<03:34, 7.16s/it] {'loss': 0.5853, 'learning_rate': 3.937007874015748e-07, 'epoch': 3.91} 98%|█████████▊| 1542/1572 [3:13:06<03:34, 7.16s/it] 98%|█████████▊| 1543/1572 [3:13:13<03:30, 7.26s/it] {'loss': 0.6271, 'learning_rate': 3.8057742782152235e-07, 'epoch': 3.92} 98%|█████████▊| 1543/1572 [3:13:13<03:30, 7.26s/it] 98%|█████████▊| 1544/1572 [3:13:20<03:21, 7.21s/it] {'loss': 0.5871, 'learning_rate': 3.6745406824146983e-07, 'epoch': 3.92} 98%|█████████▊| 1544/1572 [3:13:20<03:21, 7.21s/it] 98%|█████████▊| 1545/1572 [3:13:28<03:16, 7.26s/it] {'loss': 0.5981, 'learning_rate': 3.543307086614173e-07, 'epoch': 3.92} 98%|█████████▊| 1545/1572 [3:13:28<03:16, 7.26s/it] 98%|█████████▊| 1546/1572 [3:13:35<03:08, 7.23s/it] {'loss': 0.6062, 'learning_rate': 3.4120734908136486e-07, 'epoch': 3.93} 98%|█████████▊| 1546/1572 [3:13:35<03:08, 7.23s/it] 98%|█████████▊| 1547/1572 [3:13:42<03:00, 7.21s/it] {'loss': 0.7301, 'learning_rate': 3.2808398950131235e-07, 'epoch': 3.93} 98%|█████████▊| 1547/1572 [3:13:42<03:00, 7.21s/it] 98%|█████████▊| 1548/1572 [3:13:49<02:54, 7.25s/it] {'loss': 0.5831, 'learning_rate': 3.149606299212599e-07, 'epoch': 3.93} 98%|█████████▊| 1548/1572 [3:13:49<02:54, 7.25s/it] 99%|█████████▊| 1549/1572 [3:13:56<02:44, 7.16s/it] {'loss': 0.5431, 'learning_rate': 3.018372703412074e-07, 'epoch': 3.93} 99%|█████████▊| 1549/1572 [3:13:56<02:44, 7.16s/it] 
99%|█████████▊| 1550/1572 [3:14:03<02:34, 7.03s/it] {'loss': 0.5531, 'learning_rate': 2.8871391076115486e-07, 'epoch': 3.94} 99%|█████████▊| 1550/1572 [3:14:03<02:34, 7.03s/it] 99%|█████████▊| 1551/1572 [3:14:11<02:34, 7.36s/it] {'loss': 0.5909, 'learning_rate': 2.755905511811024e-07, 'epoch': 3.94} 99%|█████████▊| 1551/1572 [3:14:11<02:34, 7.36s/it] 99%|█████████▊| 1552/1572 [3:14:18<02:26, 7.33s/it] {'loss': 0.5495, 'learning_rate': 2.624671916010499e-07, 'epoch': 3.94} 99%|█████████▊| 1552/1572 [3:14:18<02:26, 7.33s/it] 99%|█████████▉| 1553/1572 [3:14:25<02:16, 7.17s/it] {'loss': 0.6404, 'learning_rate': 2.493438320209974e-07, 'epoch': 3.94} 99%|█████████▉| 1553/1572 [3:14:25<02:16, 7.17s/it] 99%|█████████▉| 1554/1572 [3:14:33<02:10, 7.24s/it] {'loss': 0.5947, 'learning_rate': 2.362204724409449e-07, 'epoch': 3.95} 99%|█████████▉| 1554/1572 [3:14:33<02:10, 7.24s/it] 99%|█████████▉| 1555/1572 [3:14:40<02:03, 7.24s/it] {'loss': 0.6294, 'learning_rate': 2.230971128608924e-07, 'epoch': 3.95} 99%|█████████▉| 1555/1572 [3:14:40<02:03, 7.24s/it] 99%|█████████▉| 1556/1572 [3:14:47<01:55, 7.23s/it] {'loss': 0.6102, 'learning_rate': 2.0997375328083992e-07, 'epoch': 3.95} 99%|█████████▉| 1556/1572 [3:14:47<01:55, 7.23s/it] 99%|█████████▉| 1557/1572 [3:14:54<01:46, 7.13s/it] {'loss': 0.5825, 'learning_rate': 1.968503937007874e-07, 'epoch': 3.95} 99%|█████████▉| 1557/1572 [3:14:54<01:46, 7.13s/it] 99%|█████████▉| 1558/1572 [3:15:01<01:41, 7.24s/it] {'loss': 0.5933, 'learning_rate': 1.8372703412073492e-07, 'epoch': 3.96} 99%|█████████▉| 1558/1572 [3:15:01<01:41, 7.24s/it] 99%|█████████▉| 1559/1572 [3:15:09<01:36, 7.43s/it] {'loss': 0.5506, 'learning_rate': 1.7060367454068243e-07, 'epoch': 3.96} 99%|█████████▉| 1559/1572 [3:15:09<01:36, 7.43s/it] 99%|█████████▉| 1560/1572 [3:15:17<01:31, 7.63s/it] {'loss': 0.6542, 'learning_rate': 1.5748031496062994e-07, 'epoch': 3.96} 99%|█████████▉| 1560/1572 [3:15:17<01:31, 7.63s/it] 99%|█████████▉| 1561/1572 [3:15:25<01:22, 7.49s/it] 
{'loss': 0.5879, 'learning_rate': 1.4435695538057743e-07, 'epoch': 3.96} 99%|█████████▉| 1561/1572 [3:15:25<01:22, 7.49s/it] 99%|█████████▉| 1562/1572 [3:15:32<01:14, 7.48s/it] {'loss': 0.5935, 'learning_rate': 1.3123359580052494e-07, 'epoch': 3.97} 99%|█████████▉| 1562/1572 [3:15:32<01:14, 7.48s/it] 99%|█████████▉| 1563/1572 [3:15:39<01:05, 7.24s/it] {'loss': 0.5989, 'learning_rate': 1.1811023622047244e-07, 'epoch': 3.97} 99%|█████████▉| 1563/1572 [3:15:39<01:05, 7.24s/it] 99%|█████████▉| 1564/1572 [3:15:45<00:56, 7.04s/it] {'loss': 0.5682, 'learning_rate': 1.0498687664041996e-07, 'epoch': 3.97} 99%|█████████▉| 1564/1572 [3:15:45<00:56, 7.04s/it] 100%|█████████▉| 1565/1572 [3:15:53<00:50, 7.19s/it] {'loss': 0.6218, 'learning_rate': 9.186351706036746e-08, 'epoch': 3.97} 100%|█████████▉| 1565/1572 [3:15:53<00:50, 7.19s/it] 100%|█████████▉| 1566/1572 [3:16:00<00:43, 7.23s/it] {'loss': 0.6013, 'learning_rate': 7.874015748031497e-08, 'epoch': 3.98} 100%|█████████▉| 1566/1572 [3:16:00<00:43, 7.23s/it] 100%|█████████▉| 1567/1572 [3:16:08<00:36, 7.30s/it] {'loss': 0.6295, 'learning_rate': 6.561679790026247e-08, 'epoch': 3.98} 100%|█████████▉| 1567/1572 [3:16:08<00:36, 7.30s/it] 100%|█████████▉| 1568/1572 [3:16:15<00:29, 7.41s/it] {'loss': 0.5939, 'learning_rate': 5.249343832020998e-08, 'epoch': 3.98} 100%|█████████▉| 1568/1572 [3:16:15<00:29, 7.41s/it] 100%|█████████▉| 1569/1572 [3:16:22<00:21, 7.30s/it] {'loss': 0.5385, 'learning_rate': 3.9370078740157486e-08, 'epoch': 3.98} 100%|█████████▉| 1569/1572 [3:16:22<00:21, 7.30s/it] 100%|█████████▉| 1570/1572 [3:16:30<00:14, 7.38s/it] {'loss': 0.5929, 'learning_rate': 2.624671916010499e-08, 'epoch': 3.99} 100%|█████████▉| 1570/1572 [3:16:30<00:14, 7.38s/it] 100%|█████████▉| 1571/1572 [3:16:37<00:07, 7.43s/it] {'loss': 0.5997, 'learning_rate': 1.3123359580052495e-08, 'epoch': 3.99} 100%|█████████▉| 1571/1572 [3:16:37<00:07, 7.43s/it] 100%|██████████| 1572/1572 [3:16:45<00:00, 7.37s/it] {'loss': 0.6515, 'learning_rate': 0.0, 
'epoch': 3.99} 100%|██████████| 1572/1572 [3:16:45<00:00, 7.37s/it]
[WARNING|trainer.py:2348] 2024-07-08 22:36:34,433 >> Checkpoint destination directory ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1572 already exists and is non-empty.Saving will proceed but saved results may be invalid.
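The `learning_rate` values in the step log fall by the same amount at every step and reach 0.0 at step 1572, which is the pattern of a linear-decay schedule. Every value in this excerpt fits `lr(s) = 2e-5 * (1572 - s) / 1524`, i.e. a peak of 2e-5 with 48 warmup steps. Both numbers are inferred from the logged values, not read from the run's configuration; the learning-rate and warmup fields of the `TrainingArguments` dump are not visible in this excerpt. The sketch below checks a few logged points against that assumed schedule; its shape follows the linear warmup/decay lambda used by Hugging Face `transformers` (`get_linear_schedule_with_warmup`).

```python
import math

# Sketch only: PEAK_LR and WARMUP_STEPS are inferred from the logged values,
# not read from the run's configuration.
TOTAL_STEPS = 1572   # final progress bar: 1572/1572
PEAK_LR = 2e-5       # assumption (inferred)
WARMUP_STEPS = 48    # assumption (inferred); this excerpt never exercises the warmup branch


def linear_schedule_lr(step: int) -> float:
    """LR after `step` optimizer updates: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS))


# (step, learning_rate) pairs copied from the log above.
LOGGED = {
    1369: 2.664041994750656e-06,
    1445: 1.6666666666666667e-06,
    1500: 9.448818897637796e-07,
    1571: 1.3123359580052495e-08,
    1572: 0.0,
}

for step, logged_lr in LOGGED.items():
    predicted = linear_schedule_lr(step)
    match = math.isclose(predicted, logged_lr, rel_tol=1e-9, abs_tol=1e-15)
    print(f"step {step}: logged={logged_lr:.6e} predicted={predicted:.6e} match={match}")
```

Under this reading, the final logged value of 0.0 is expected rather than a sign of a problem: the schedule reaches zero exactly at the last optimizer step.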
[INFO|trainer.py:2889] 2024-07-08 22:36:56,980 >> Saving model checkpoint to ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1572
[INFO|tokenization_utils_base.py:2432] 2024-07-08 22:36:58,243 >> tokenizer config file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1572/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-07-08 22:36:58,248 >> Special tokens file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/checkpoint-1572/special_tokens_map.json
[INFO|trainer.py:1947] 2024-07-08 22:38:37,405 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 11931.9867, 'train_samples_per_second': 16.9, 'train_steps_per_second': 0.132, 'train_loss': 0.6663590594178241, 'epoch': 3.99}
100%|██████████| 1572/1572 [3:18:48<00:00, 7.37s/it]
100%|██████████| 1572/1572 [3:18:48<00:00, 7.59s/it]
dlc1apybk6l37ai7-master-0:429463:430065 [7] NCCL INFO [Service thread] Connection closed by localRank 7
dlc1apybk6l37ai7-master-0:429458:430067 [2] NCCL INFO [Service thread] Connection closed by localRank 2
dlc1apybk6l37ai7-master-0:429459:430070 [3] NCCL INFO [Service thread] Connection closed by localRank 3
dlc1apybk6l37ai7-master-0:429462:430068 [6] NCCL INFO [Service thread] Connection closed by localRank 6
dlc1apybk6l37ai7-master-0:429460:430071 [4] NCCL INFO [Service thread] Connection closed by localRank 4
dlc1apybk6l37ai7-master-0:429457:430066 [1] NCCL INFO [Service thread] Connection closed by localRank 1
dlc1apybk6l37ai7-master-0:429461:430069 [5] NCCL INFO [Service thread] Connection closed by localRank 5
dlc1apybk6l37ai7-master-0:429463:429463 [7] NCCL INFO comm 0x5a835df0 rank 7 nranks 8 cudaDev 7 busId 80 - Abort COMPLETE
dlc1apybk6l37ai7-master-0:429458:429458 [2] NCCL INFO comm 0x2a244470 rank 2 nranks 8 cudaDev 2 busId 30 - Abort COMPLETE
dlc1apybk6l37ai7-master-0:429459:429459 [3] NCCL INFO comm 0x68858fe0 rank 3 nranks 8 cudaDev 3 busId 40 - Abort COMPLETE
dlc1apybk6l37ai7-master-0:429462:429462 [6] NCCL INFO comm 0x2a5517e0 rank 6 nranks 8 cudaDev 6 busId 70 - Abort COMPLETE
dlc1apybk6l37ai7-master-0:429460:429460 [4] NCCL INFO comm 0x68055110 rank 4 nranks 8 cudaDev 4 busId 50 - Abort COMPLETE
dlc1apybk6l37ai7-master-0:429457:429457 [1] NCCL INFO comm 0x2775e5c0 rank 1 nranks 8 cudaDev 1 busId 20 - Abort COMPLETE
dlc1apybk6l37ai7-master-0:429461:429461 [5] NCCL INFO comm 0x2a9ede60 rank 5 nranks 8 cudaDev 5 busId 60 - Abort COMPLETE
[INFO|trainer.py:2889] 2024-07-08 22:39:00,631 >> Saving model checkpoint to ../out/llama3-8b-inst-p0.05-lora-seed3
[INFO|tokenization_utils_base.py:2432] 2024-07-08 22:39:01,946 >> tokenizer config file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-07-08 22:39:01,954 >> Special tokens file saved in ../out/llama3-8b-inst-p0.05-lora-seed3/special_tokens_map.json
***** train metrics *****
  epoch                    =       3.99
  train_loss               =     0.6664
  train_runtime            = 3:18:51.98
  train_samples            =      50413
  train_samples_per_second =       16.9
  train_steps_per_second   =      0.132
dlc1apybk6l37ai7-master-0:429456:430064 [0] NCCL INFO [Service thread] Connection closed by localRank 0
dlc1apybk6l37ai7-master-0:429456:429456 [0] NCCL INFO comm 0x5942be30 rank 0 nranks 8 cudaDev 0 busId 10 - Abort COMPLETE
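The summary figures can be cross-checked with simple arithmetic. The sketch below assumes that `train_samples_per_second` counts 4 full passes over the 50413 training samples (the run stops at epoch 3.99, so 4 requested epochs is an assumption) and that `train_steps_per_second` divides the 1572 optimizer steps by the runtime; both rates are rounded to three decimals, matching the precision shown in the log. It also re-derives the `3:18:51.98` runtime string and the overall 7.59 s/it average in the last progress line from its `3:18:48` elapsed time.

```python
import datetime

TRAIN_RUNTIME_S = 11931.9867  # 'train_runtime' from the final metrics dict
TRAIN_SAMPLES = 50413         # 'train_samples' from the train metrics block
MAX_STEPS = 1572              # optimizer steps, from the 1572/1572 progress bar
NUM_EPOCHS = 4                # ASSUMPTION: training stops at epoch 3.99 of a presumed 4


def throughput(runtime_s: float, samples: int, epochs: int, steps: int) -> tuple[float, float]:
    """(samples/s, steps/s), each rounded to 3 decimals like the logged values."""
    return (
        round(samples * epochs / runtime_s, 3),
        round(steps / runtime_s, 3),
    )


def fmt_runtime(secs: float) -> str:
    """H:MM:SS.cc with the hundredths truncated, e.g. 11931.9867 -> '3:18:51.98'."""
    centis = int((secs - int(secs)) * 100)
    return f"{datetime.timedelta(seconds=int(secs))}.{centis:02d}"


def seconds_per_step(elapsed: str, steps: int) -> float:
    """Average seconds per iteration from a tqdm 'H:MM:SS' elapsed field."""
    h, m, s = (int(part) for part in elapsed.split(":"))
    return (h * 3600 + m * 60 + s) / steps


if __name__ == "__main__":
    print(throughput(TRAIN_RUNTIME_S, TRAIN_SAMPLES, NUM_EPOCHS, MAX_STEPS))  # (16.9, 0.132)
    print(fmt_runtime(TRAIN_RUNTIME_S))                                       # 3:18:51.98
    print(f"{seconds_per_step('3:18:48', MAX_STEPS):.2f} s/it")               # 7.59 s/it
```

If the 4-epoch assumption is wrong, only the samples-per-second figure changes; the steps-per-second and runtime checks depend solely on numbers printed in the log.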