[2024-03-15 12:42:03,639] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:06,097] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-03-15 12:42:06,097] [INFO] [runner.py:555:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None mobilevlm/train/train_mem.py --deepspeed scripts/deepspeed/zero3.json --model_name_or_path /mnt/MobileVLM/outputs/mobilevlm_v2_1.7b_20240315_094506/mobilevlm_v2-1.pretrain --version v1 --data_path data/finetune_data/combined_new_mobilevlm_dataset_filtered.json --image_folder data/finetune_data/ --vision_tower /mnt/clip-vit-large-patch14-336 --vision_tower_type clip --mm_projector_type ldpnetv2 --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --image_aspect_ratio pad --group_by_modality_length True --bf16 True --output_dir /mnt/MobileVLM/outputs/mobilevlm_v2_1.7b_20240315_094506/mobilevlm_v2-2.finetune --num_train_epochs 3 --per_device_train_batch_size 16 --per_device_eval_batch_size 4 --gradient_accumulation_steps 1 --evaluation_strategy no --save_strategy steps --save_steps 2000 --save_total_limit 1 --learning_rate 4e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --dataloader_num_workers 4 --lazy_preprocess True --report_to wandb
[2024-03-15 12:42:07,402] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.17.1-1+cuda12.1
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.17.1-1
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.17.1-1
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.17.1-1+cuda12.1
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-03-15 12:42:09,779] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.17.1-1
[2024-03-15 12:42:09,779] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-03-15 12:42:09,779] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-03-15 12:42:09,779] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-03-15 12:42:09,779] [INFO] [launch.py:163:main] dist_world_size=8
[2024-03-15 12:42:09,780] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-03-15 12:42:14,509] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:14,628] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:14,793] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:15,280] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:15,293] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:15,294] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:15,353] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:15,355] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-15 12:42:16,035] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,036] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,036] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-03-15 12:42:16,100] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,100] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,608] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,608] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,708] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,708] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,731] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,731] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,783] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,783] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,786] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,786] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:16,914] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-15 12:42:16,914] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-15 12:42:31,531] [WARNING] [partition_parameters.py:836:_post_init_method] param `class_embedding` in CLIPVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-03-15 12:42:35,626] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 1.67B parameters
[2024-03-15 12:42:37,075] [WARNING] [partition_parameters.py:836:_post_init_method] param `class_embedding` in CLIPVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-03-15 12:42:37,254] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 1.98B parameters
[2024-03-15 12:42:41,263] [WARNING] [partition_parameters.py:836:_post_init_method] param `class_embedding` in CLIPVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-03-15 12:42:41,601] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 2.28B parameters
Formatting inputs...Skip in lazy mode
/usr/local/lib/python3.8/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Parameter Offload: Total persistent parameters: 449536 in 298 params
git root error: Cmd('git') failed due to: exit code(128)
  cmdline: git rev-parse --show-toplevel
  stderr: 'fatal: detected dubious ownership in repository at '/mnt/MobileVLM'
To add an exception for this directory, call:
        git config --global --add safe.directory /mnt/MobileVLM'
wandb: Currently logged in as: smellslikeml. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.4 is available!
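The `--world_info` argument in the `deepspeed.launcher.launch` command above is just a base64-encoded JSON map of hosts to local GPU ranks; decoding it reproduces the `WORLD INFO DICT` the launcher logs a few lines later. A stdlib-only sketch:

```python
import base64
import json

# --world_info blob copied from the launch command above
world_info_b64 = "eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119"

# Decode base64 -> JSON -> dict mapping host name to its GPU ranks
world_info = json.loads(base64.b64decode(world_info_b64))
print(world_info)  # {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
```

The decoded dict matches the logged `WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}` and explains `dist_world_size=8`: eight local ranks on one node.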
To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.3
wandb: Run data is saved locally in /mnt/MobileVLM/wandb/run-20240315_124251-mgav6nep
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run solar-oath-104
wandb: ⭐️ View project at https://wandb.ai/smellslikeml/huggingface
wandb: 🚀 View run at https://wandb.ai/smellslikeml/huggingface/runs/mgav6nep
  0%|          | 0/660 [00:00<?, ?it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (… > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3106 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3282 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3146 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3038 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3270 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3252 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3350 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3301 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3152 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3216 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3094 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3266 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3267 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3043 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3040 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3154 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3040 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2782 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2814 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3234 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2986 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3227 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3220 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2886 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3010 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2979 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3082 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3346 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3199 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (4031 > 2048). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (2992 > 2048). Running this sequence through the model will result in indexing errors
/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
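The block of `Token indices sequence length...` warnings above means some finetune samples tokenize to more than the `--model_max_length 2048` context window. Hugging Face tokenizers only warn here; if nothing downstream truncates, the embedding lookup will index out of range exactly as the message says. A minimal sketch of the kind of guard a preprocessing or collation step needs (hypothetical helper, not MobileVLM's actual code):

```python
def clip_to_context_window(input_ids, model_max_length=2048):
    """Truncate a token-id sequence to the model's context window.

    Mirrors what a collator must do with the oversized samples
    (3106, 3282, ... tokens) reported in the warnings above.
    """
    if len(input_ids) > model_max_length:
        return input_ids[:model_max_length]
    return input_ids


# An oversized sample like the 3106-token one above gets clipped;
# sequences already inside the window pass through unchanged.
clipped = clip_to_context_window(list(range(3106)))
print(len(clipped))  # 2048
```

Truncation silences the indexing hazard but drops the tail of those conversations; the alternative is to filter or split long samples before training.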
0%| | 1/660 [00:16<3:03:10, 16.68s/it] {'loss': 1.4736, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}
0%| | 2/660 [00:19<1:36:02, 8.76s/it] {'loss': 1.5215, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01}
0%| | 3/660 [00:22<1:05:54, 6.02s/it] {'loss': 1.4678, 'learning_rate': 6e-06, 'epoch': 0.01}
1%| | 4/660 [00:25<51:44, 4.73s/it] {'loss': 1.4473, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}
1%| | 5/660 [00:28<43:53, 4.02s/it] {'loss': 1.2803, 'learning_rate': 1e-05, 'epoch': 0.02}
1%| | 6/660 [00:30<39:06, 3.59s/it] {'loss': 1.042, 'learning_rate': 1.2e-05, 'epoch': 0.03}
1%| | 7/660 [00:33<36:02, 3.31s/it] {'loss': 1.1182, 'learning_rate': 1.4e-05, 'epoch': 0.03}
1%| | 8/660 [00:36<34:06, 3.14s/it] {'loss': 0.9946, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.04}
1%|▏ | 9/660 [00:39<32:49, 3.03s/it] {'loss': 1.022, 'learning_rate': 1.8e-05, 'epoch': 0.04}
2%|▏ | 10/660 [00:41<31:54, 2.95s/it] {'loss': 0.7852, 'learning_rate': 2e-05, 'epoch': 0.05}
2%|▏ | 11/660 [00:44<31:11, 2.88s/it] {'loss': 0.8345, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.05}
2%|▏ | 12/660 [00:47<30:44, 2.85s/it] {'loss': 0.7861, 'learning_rate': 2.4e-05, 'epoch': 0.05}
2%|▏ | 13/660 [00:50<30:29, 2.83s/it] {'loss': 0.7231, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.06}
2%|▏ | 14/660 [00:53<30:16, 2.81s/it] {'loss': 0.5786, 'learning_rate': 2.8e-05, 'epoch': 0.06}
2%|▏ | 15/660 [00:55<30:05, 2.80s/it] {'loss': 0.6423, 'learning_rate': 3.0000000000000004e-05, 'epoch': 0.07}
2%|▏ | 16/660 [00:58<29:53, 2.79s/it] {'loss': 0.5959, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.07}
3%|▎ | 17/660 [01:01<29:46, 2.78s/it] {'loss': 0.5732, 'learning_rate': 3.4e-05, 'epoch': 0.08}
3%|▎ | 18/660 [01:04<29:38, 2.77s/it] {'loss': 0.5588, 'learning_rate': 3.6e-05, 'epoch': 0.08}
3%|▎ | 19/660 [01:06<29:36, 2.77s/it] {'loss': 0.51, 'learning_rate': 3.8e-05, 'epoch': 0.09}
3%|▎ | 20/660 [01:09<29:27, 2.76s/it] {'loss': 0.572, 'learning_rate': 4e-05, 'epoch': 0.09}
3%|▎ | 21/660 [01:12<29:28, 2.77s/it] {'loss': 0.4626, 'learning_rate': 3.9999759043345147e-05, 'epoch': 0.1}
3%|▎ | 22/660 [01:15<29:30, 2.78s/it] {'loss': 0.4841, 'learning_rate': 3.999903617918656e-05, 'epoch': 0.1}
3%|▎ | 23/660 [01:17<29:18, 2.76s/it] {'loss': 0.5466, 'learning_rate': 3.999783142494217e-05, 'epoch': 0.1}
4%|▎ | 24/660 [01:20<29:18, 2.77s/it] {'loss': 0.3667, 'learning_rate': 3.9996144809641296e-05, 'epoch': 0.11}
4%|▍ | 25/660 [01:23<29:11, 2.76s/it] {'loss': 0.4629, 'learning_rate': 3.999397637392409e-05, 'epoch': 0.11}
4%|▍ | 26/660 [01:26<29:09, 2.76s/it] {'loss': 0.4529, 'learning_rate': 3.999132617004043e-05, 'epoch': 0.12}
4%|▍ | 27/660 [01:28<29:04, 2.76s/it] {'loss': 0.4622, 'learning_rate': 3.998819426184875e-05, 'epoch': 0.12}
4%|▍ | 28/660 [01:31<29:01, 2.76s/it] {'loss': 0.4536, 'learning_rate': 3.9984580724814464e-05, 'epoch': 0.13}
4%|▍ | 29/660 [01:34<28:57, 2.75s/it] {'loss': 0.4685, 'learning_rate': 3.998048564600815e-05, 'epoch': 0.13}
5%|▍ | 30/660 [01:37<28:59, 2.76s/it] {'loss': 0.4087, 'learning_rate': 3.997590912410345e-05, 'epoch': 0.14}
5%|▍ | 31/660 [01:39<29:01, 2.77s/it] {'loss': 0.4453, 'learning_rate': 3.997085126937472e-05, 'epoch': 0.14}
5%|▍ | 32/660 [01:42<29:02, 2.78s/it] {'loss': 0.3704, 'learning_rate': 3.996531220369432e-05, 'epoch': 0.15}
5%|▌ | 33/660 [01:45<28:55, 2.77s/it] {'loss': 0.4199, 'learning_rate': 3.9959292060529734e-05, 'epoch': 0.15}
5%|▌ | 34/660 [01:48<28:54, 2.77s/it] {'loss': 0.4004, 'learning_rate': 3.995279098494032e-05, 'epoch': 0.15}
5%|▌ | 35/660 [01:51<28:48, 2.77s/it] {'loss': 0.4397, 'learning_rate': 3.994580913357381e-05, 'epoch': 0.16}
5%|▌ | 36/660 [01:53<28:43, 2.76s/it] {'loss': 0.4116, 'learning_rate': 3.9938346674662565e-05, 'epoch': 0.16}
6%|▌ | 37/660 [01:56<28:42, 2.76s/it] {'loss': 0.4175, 'learning_rate': 3.99304037880195e-05, 'epoch': 0.17}
6%|▌ | 38/660 [01:59<28:33, 2.75s/it] {'loss': 0.5688, 'learning_rate': 3.992198066503375e-05, 'epoch': 0.17}
6%|▌ | 39/660 [02:02<28:30, 2.75s/it] {'loss': 0.4849, 'learning_rate': 3.9913077508666066e-05, 'epoch': 0.18}
6%|▌ | 40/660 [02:04<28:24, 2.75s/it] {'loss': 0.4883, 'learning_rate': 3.990369453344394e-05, 'epoch': 0.18}
6%|▌ | 41/660 [02:07<28:24, 2.75s/it] {'loss': 0.4309, 'learning_rate': 3.989383196545639e-05, 'epoch': 0.19}
6%|▋ | 42/660 [02:10<28:29, 2.77s/it] {'loss': 0.3831, 'learning_rate': 3.988349004234857e-05, 'epoch': 0.19}
7%|▋ | 43/660 [02:13<28:26, 2.77s/it] {'loss': 0.3997, 'learning_rate': 3.987266901331598e-05, 'epoch': 0.2}
7%|▋ | 44/660 [02:15<28:25, 2.77s/it] {'loss': 0.3979, 'learning_rate': 3.986136913909853e-05, 'epoch': 0.2}
7%|▋ | 45/660 [02:18<28:17, 2.76s/it] {'loss': 0.4583, 'learning_rate': 3.9849590691974206e-05, 'epoch': 0.2}
7%|▋ | 46/660 [02:21<28:15, 2.76s/it] {'loss': 0.4446, 'learning_rate': 3.983733395575252e-05, 'epoch': 0.21}
7%|▋ | 47/660 [02:24<28:12, 2.76s/it] {'loss': 0.3961, 'learning_rate': 3.982459922576771e-05, 'epoch': 0.21}
7%|▋ | 48/660 [02:26<28:15, 2.77s/it] {'loss': 0.3503, 'learning_rate': 3.981138680887154e-05, 'epoch': 0.22}
7%|▋ | 49/660 [02:29<28:12, 2.77s/it] {'loss': 0.4221, 'learning_rate': 3.979769702342602e-05, 'epoch': 0.22}
8%|▊ | 50/660 [02:32<28:04, 2.76s/it] {'loss': 0.4985, 'learning_rate': 3.978353019929562e-05, 'epoch': 0.23}
8%|▊ | 51/660 [02:35<28:00, 2.76s/it] {'loss': 0.4048, 'learning_rate': 3.97688866778394e-05, 'epoch': 0.23}
8%|▊ | 52/660 [02:37<27:58, 2.76s/it] {'loss': 0.3792, 'learning_rate': 3.9753766811902756e-05, 'epoch': 0.24}
8%|▊ | 53/660 [02:40<27:56, 2.76s/it] {'loss': 0.4651, 'learning_rate': 3.973817096580892e-05, 'epoch': 0.24}
8%|▊ | 54/660 [02:43<27:52, 2.76s/it] {'loss': 0.3909, 'learning_rate': 3.9722099515350174e-05, 'epoch': 0.25}
8%|▊ | 55/660 [02:46<27:45, 2.75s/it] {'loss': 0.5127, 'learning_rate': 3.970555284777883e-05, 'epoch': 0.25}
8%|▊ | 56/660 [02:48<27:41, 2.75s/it] {'loss': 0.4479, 'learning_rate': 3.9688531361797834e-05, 'epoch': 0.25}
9%|▊ | 57/660 [02:51<27:42, 2.76s/it] {'loss': 0.3767, 'learning_rate': 3.967103546755123e-05, 'epoch': 0.26}
9%|▉ | 58/660 [02:54<27:38, 2.76s/it] {'loss': 0.4307, 'learning_rate': 3.965306558661424e-05, 'epoch': 0.26}
9%|▉ | 59/660 [02:57<27:34, 2.75s/it] {'loss': 0.3948, 'learning_rate': 3.963462215198309e-05, 'epoch': 0.27}
9%|▉ | 60/660 [03:00<27:31, 2.75s/it] {'loss': 0.4614, 'learning_rate': 3.961570560806461e-05, 'epoch': 0.27}
9%|▉ | 61/660 [03:02<27:30, 2.76s/it] {'loss': 0.4114, 'learning_rate': 3.959631641066553e-05, 'epoch': 0.28}
9%|▉ | 62/660 [03:05<27:29, 2.76s/it] {'loss': 0.4158, 'learning_rate': 3.957645502698145e-05, 'epoch': 0.28}
10%|▉ | 63/660 [03:08<27:22, 2.75s/it] {'loss': 0.4541, 'learning_rate': 3.955612193558564e-05, 'epoch': 0.29}
10%|▉ | 64/660 [03:11<27:18, 2.75s/it] {'loss': 0.423, 'learning_rate': 3.953531762641745e-05, 'epoch': 0.29}
10%|▉ | 65/660 [03:13<27:16, 2.75s/it] {'loss': 0.3955, 'learning_rate': 3.9514042600770576e-05, 'epoch': 0.3}
10%|█ | 66/660 [03:16<27:11, 2.75s/it] {'loss': 0.4592, 'learning_rate': 3.94922973712809e-05, 'epoch': 0.3}
10%|█ | 67/660 [03:19<27:15, 2.76s/it] {'loss': 0.4014, 'learning_rate': 3.9470082461914216e-05, 'epoch': 0.3}
10%|█ | 68/660 [03:22<27:16, 2.76s/it] {'loss': 0.3845, 'learning_rate': 3.9447398407953536e-05, 'epoch': 0.31}
10%|█ | 69/660 [03:24<27:11, 2.76s/it] {'loss': 0.4546, 'learning_rate': 3.942424575598624e-05, 'epoch': 0.31}
11%|█ | 70/660 [03:27<27:11, 2.77s/it] {'loss': 0.4053, 'learning_rate': 3.940062506389089e-05, 'epoch': 0.32}
11%|█ | 71/660 [03:30<27:08, 2.76s/it] {'loss': 0.3638, 'learning_rate': 3.9376536900823764e-05, 'epoch': 0.32}
11%|█ | 72/660 [03:33<27:01, 2.76s/it] {'loss': 0.4541, 'learning_rate': 3.93519818472052e-05, 'epoch': 0.33}
11%|█ | 73/660 [03:35<26:58, 2.76s/it] {'loss': 0.4302, 'learning_rate': 3.932696049470555e-05, 'epoch': 0.33}
11%|█ | 74/660 [03:38<26:54, 2.76s/it] {'loss': 0.4272, 'learning_rate': 3.930147344623095e-05, 'epoch': 0.34}
11%|█▏ | 75/660 [03:41<26:56, 2.76s/it] {'loss': 0.3616, 'learning_rate': 3.92755213159088e-05, 'epoch': 0.34}
12%|█▏ | 76/660 [03:44<26:54, 2.76s/it] {'loss': 0.4036, 'learning_rate': 3.9249104729072944e-05, 'epoch': 0.35}
12%|█▏ | 77/660 [03:46<26:51, 2.76s/it] {'loss': 0.3239, 'learning_rate': 3.922222432224864e-05, 'epoch': 0.35}
12%|█▏ | 78/660 [03:49<26:51, 2.77s/it] {'loss': 0.4143, 'learning_rate': 3.919488074313715e-05, 'epoch': 0.35}
12%|█▏ | 79/660 [03:52<26:49, 2.77s/it] {'loss': 0.4331, 'learning_rate': 3.9167074650600235e-05, 'epoch': 0.36}
12%|█▏ | 80/660 [03:55<26:43, 2.76s/it] {'loss': 0.3865, 'learning_rate': 3.913880671464418e-05, 'epoch': 0.36}
12%|█▏ | 81/660 [03:57<26:39, 2.76s/it] {'loss': 0.3398, 'learning_rate': 3.911007761640373e-05, 'epoch': 0.37}
12%|█▏ | 82/660 [04:00<26:35, 2.76s/it] {'loss': 0.3346, 'learning_rate': 3.9080888048125614e-05, 'epoch': 0.37}
13%|█▎ | 83/660 [04:03<26:34, 2.76s/it] {'loss': 0.3809, 'learning_rate': 3.905123871315191e-05, 'epoch': 0.38}
13%|█▎ | 84/660 [04:06<26:31, 2.76s/it] {'loss': 0.3754, 'learning_rate': 3.9021130325903076e-05, 'epoch': 0.38}
13%|█▎ | 85/660 [04:09<26:32, 2.77s/it] {'loss': 0.3518, 'learning_rate': 3.899056361186074e-05, 'epoch': 0.39}
13%|█▎ | 86/660 [04:11<26:26, 2.76s/it] {'loss': 0.4314, 'learning_rate': 3.8959539307550215e-05, 'epoch': 0.39}
13%|█▎ | 87/660 [04:14<26:19, 2.76s/it] {'loss': 0.4265, 'learning_rate': 3.892805816052276e-05, 'epoch': 0.4}
13%|█▎ | 88/660 [04:17<26:20, 2.76s/it] {'loss': 0.3843, 'learning_rate': 3.889612092933756e-05, 'epoch': 0.4}
13%|█▎ | 89/660 [04:20<26:14, 2.76s/it] {'loss': 0.4114, 'learning_rate': 3.8863728383543466e-05, 'epoch': 0.4}
14%|█▎ | 90/660 [04:22<26:09, 2.75s/it] {'loss': 0.4556, 'learning_rate': 3.883088130366042e-05, 'epoch': 0.41}
14%|█▍ | 91/660 [04:25<26:09, 2.76s/it] {'loss': 0.3721, 'learning_rate': 3.8797580481160665e-05, 'epoch': 0.41}
14%|█▍ | 92/660 [04:28<26:08, 2.76s/it] {'loss': 0.4451, 'learning_rate': 3.876382671844969e-05, 'epoch': 0.42}
14%|█▍ | 93/660 [04:31<26:02, 2.75s/it] {'loss': 0.4409, 'learning_rate': 3.8729620828846856e-05, 'epoch': 0.42}
14%|█▍ | 94/660 [04:33<25:59, 2.75s/it] {'loss': 0.3547, 'learning_rate': 3.869496363656585e-05, 'epoch': 0.43}
14%|█▍ | 95/660 [04:36<25:58, 2.76s/it] {'loss': 0.3828, 'learning_rate': 3.865985597669478e-05, 'epoch': 0.43}
15%|█▍ | 96/660 [04:39<25:55, 2.76s/it] {'loss': 0.4224, 'learning_rate': 3.862429869517607e-05, 'epoch': 0.44}
15%|█▍ | 97/660 [04:42<25:53, 2.76s/it] {'loss': 0.4165, 'learning_rate': 3.8588292648786095e-05, 'epoch': 0.44}
15%|█▍ | 98/660 [04:44<25:57, 2.77s/it] {'loss': 0.3352, 'learning_rate': 3.8551838705114484e-05, 'epoch': 0.45}
15%|█▌ | 99/660 [04:47<25:52, 2.77s/it] {'loss': 0.3855, 'learning_rate': 3.851493774254328e-05, 'epoch': 0.45}
15%|█▌ | 100/660 [04:50<25:48, 2.77s/it] {'loss': 0.3268, 'learning_rate': 3.8477590650225735e-05, 'epoch': 0.45}
15%|█▌ | 101/660 [04:53<25:41, 2.76s/it] {'loss': 0.4158, 'learning_rate': 3.843979832806489e-05, 'epoch': 0.46}
15%|█▌ | 102/660 [04:55<25:41, 2.76s/it] {'loss': 0.3711, 'learning_rate': 3.84015616866919e-05, 'epoch': 0.46}
16%|█▌ | 103/660 [04:58<25:37, 2.76s/it] {'loss': 0.3862, 'learning_rate': 3.836288164744409e-05, 'epoch': 0.47}
16%|█▌ | 104/660 [05:01<25:36, 2.76s/it] {'loss': 0.4187, 'learning_rate': 3.832375914234272e-05, 'epoch': 0.47}
16%|█▌ | 105/660 [05:04<25:37, 2.77s/it] {'loss': 0.3416, 'learning_rate': 3.828419511407062e-05, 'epoch': 0.48}
16%|█▌ | 106/660 [05:07<25:29, 2.76s/it] {'loss': 0.4111, 'learning_rate': 3.824419051594935e-05, 'epoch': 0.48}
16%|█▌ | 107/660 [05:09<25:27, 2.76s/it] {'loss': 0.4028, 'learning_rate': 3.820374631191636e-05, 'epoch': 0.49}
16%|█▋ | 108/660 [05:12<25:29, 2.77s/it] {'loss': 0.3098, 'learning_rate': 3.816286347650163e-05, 'epoch': 0.49}
17%|█▋ | 109/660 [05:15<25:30, 2.78s/it] {'loss': 0.3168, 'learning_rate': 3.8121542994804295e-05, 'epoch': 0.5}
17%|█▋ | 110/660 [05:18<25:26, 2.77s/it] {'loss': 0.3551, 'learning_rate': 3.807978586246887e-05, 'epoch': 0.5}
17%|█▋ | 111/660 [05:20<25:25, 2.78s/it] {'loss': 0.3007, 'learning_rate': 3.803759308566123e-05, 'epoch': 0.5}
17%|█▋ | 112/660 [05:23<25:21, 2.78s/it] {'loss': 0.3967, 'learning_rate': 3.7994965681044436e-05, 'epoch': 0.51}
17%|█▋ | 113/660 [05:26<25:19, 2.78s/it] {'loss': 0.3828, 'learning_rate': 3.795190467575414e-05, 'epoch': 0.51}
17%|█▋ | 114/660 [05:29<25:10, 2.77s/it] {'loss': 0.4634, 'learning_rate': 3.790841110737394e-05, 'epoch': 0.52}
17%|█▋ | 115/660 [05:31<25:08, 2.77s/it] {'loss': 0.3923, 'learning_rate': 3.786448602391031e-05, 'epoch': 0.52}
18%|█▊ | 116/660 [05:34<25:03, 2.76s/it] {'loss': 0.427, 'learning_rate': 3.782013048376736e-05, 'epoch': 0.53}
18%|█▊ | 117/660 [05:37<25:03, 2.77s/it] {'loss': 0.3428, 'learning_rate': 3.7775345555721356e-05, 'epoch': 0.53}
18%|█▊ | 118/660 [05:40<24:59, 2.77s/it] {'loss': 0.3997, 'learning_rate': 3.7730132318894936e-05, 'epoch': 0.54}
18%|█▊ | 119/660 [05:43<25:00, 2.77s/it] {'loss': 0.3428, 'learning_rate': 3.768449186273113e-05, 'epoch': 0.54}
120/660 [05:45<25:03, 2.78s/it] {'loss': 0.3105, 'learning_rate': 3.76384252869671e-05, 'epoch': 0.55} 18%|█▊ | 120/660 [05:45<25:03, 2.78s/it] 18%|█▊ | 121/660 [05:48<25:00, 2.78s/it] {'loss': 0.3801, 'learning_rate': 3.759193370160766e-05, 'epoch': 0.55} 18%|█▊ | 121/660 [05:48<25:00, 2.78s/it] 18%|█▊ | 122/660 [05:51<24:51, 2.77s/it] {'loss': 0.469, 'learning_rate': 3.7545018226898486e-05, 'epoch': 0.55} 18%|█▊ | 122/660 [05:51<24:51, 2.77s/it] 19%|█▊ | 123/660 [05:54<24:52, 2.78s/it] {'loss': 0.3047, 'learning_rate': 3.749767999329917e-05, 'epoch': 0.56} 19%|█▊ | 123/660 [05:54<24:52, 2.78s/it] 19%|█▉ | 124/660 [05:56<24:46, 2.77s/it] {'loss': 0.3835, 'learning_rate': 3.744992014145595e-05, 'epoch': 0.56} 19%|█▉ | 124/660 [05:56<24:46, 2.77s/it] 19%|█▉ | 125/660 [05:59<24:42, 2.77s/it] {'loss': 0.4114, 'learning_rate': 3.740173982217423e-05, 'epoch': 0.57} 19%|█▉ | 125/660 [05:59<24:42, 2.77s/it] 19%|█▉ | 126/660 [06:02<24:39, 2.77s/it] {'loss': 0.3203, 'learning_rate': 3.735314019639089e-05, 'epoch': 0.57} 19%|█▉ | 126/660 [06:02<24:39, 2.77s/it] 19%|█▉ | 127/660 [06:05<24:37, 2.77s/it] {'loss': 0.3564, 'learning_rate': 3.730412243514623e-05, 'epoch': 0.58} 19%|█▉ | 127/660 [06:05<24:37, 2.77s/it] 19%|█▉ | 128/660 [06:08<24:38, 2.78s/it] {'loss': 0.3376, 'learning_rate': 3.725468771955584e-05, 'epoch': 0.58} 19%|█▉ | 128/660 [06:08<24:38, 2.78s/it] 20%|█▉ | 129/660 [06:10<24:35, 2.78s/it] {'loss': 0.373, 'learning_rate': 3.720483724078209e-05, 'epoch': 0.59} 20%|█▉ | 129/660 [06:10<24:35, 2.78s/it] 20%|█▉ | 130/660 [06:13<24:30, 2.77s/it] {'loss': 0.3882, 'learning_rate': 3.7154572200005446e-05, 'epoch': 0.59} 20%|█▉ | 130/660 [06:13<24:30, 2.77s/it] 20%|█▉ | 131/660 [06:16<24:25, 2.77s/it] {'loss': 0.4131, 'learning_rate': 3.710389380839551e-05, 'epoch': 0.6} 20%|█▉ | 131/660 [06:16<24:25, 2.77s/it] 20%|██ | 132/660 [06:19<24:17, 2.76s/it] {'loss': 0.3894, 'learning_rate': 3.705280328708185e-05, 'epoch': 0.6} 20%|██ | 132/660 [06:19<24:17, 2.76s/it] 20%|██ | 
133/660 [06:21<24:13, 2.76s/it] {'loss': 0.4065, 'learning_rate': 3.700130186712458e-05, 'epoch': 0.6} 20%|██ | 133/660 [06:21<24:13, 2.76s/it] 20%|██ | 134/660 [06:24<24:06, 2.75s/it] {'loss': 0.4578, 'learning_rate': 3.694939078948469e-05, 'epoch': 0.61} 20%|██ | 134/660 [06:24<24:06, 2.75s/it] 20%|██ | 135/660 [06:27<23:59, 2.74s/it] {'loss': 0.4265, 'learning_rate': 3.6897071304994145e-05, 'epoch': 0.61} 20%|██ | 135/660 [06:27<23:59, 2.74s/it] 21%|██ | 136/660 [06:30<23:59, 2.75s/it] {'loss': 0.4333, 'learning_rate': 3.684434467432573e-05, 'epoch': 0.62} 21%|██ | 136/660 [06:30<23:59, 2.75s/it] 21%|██ | 137/660 [06:32<24:03, 2.76s/it] {'loss': 0.3567, 'learning_rate': 3.679121216796272e-05, 'epoch': 0.62} 21%|██ | 137/660 [06:32<24:03, 2.76s/it] 21%|██ | 138/660 [06:35<24:02, 2.76s/it] {'loss': 0.3972, 'learning_rate': 3.6737675066168185e-05, 'epoch': 0.63} 21%|██ | 138/660 [06:35<24:02, 2.76s/it] 21%|██ | 139/660 [06:38<23:59, 2.76s/it] {'loss': 0.3899, 'learning_rate': 3.668373465895425e-05, 'epoch': 0.63} 21%|██ | 139/660 [06:38<23:59, 2.76s/it] 21%|██ | 140/660 [06:41<23:58, 2.77s/it] {'loss': 0.3789, 'learning_rate': 3.662939224605091e-05, 'epoch': 0.64} 21%|██ | 140/660 [06:41<23:58, 2.77s/it] 21%|██▏ | 141/660 [06:43<23:52, 2.76s/it] {'loss': 0.4131, 'learning_rate': 3.6574649136874766e-05, 'epoch': 0.64} 21%|██▏ | 141/660 [06:43<23:52, 2.76s/it] 22%|██▏ | 142/660 [06:46<23:49, 2.76s/it] {'loss': 0.3967, 'learning_rate': 3.651950665049746e-05, 'epoch': 0.65} 22%|██▏ | 142/660 [06:46<23:49, 2.76s/it] 22%|██▏ | 143/660 [06:49<23:45, 2.76s/it] {'loss': 0.384, 'learning_rate': 3.646396611561392e-05, 'epoch': 0.65} 22%|██▏ | 143/660 [06:49<23:45, 2.76s/it] 22%|██▏ | 144/660 [06:52<23:42, 2.76s/it] {'loss': 0.3544, 'learning_rate': 3.640802887051027e-05, 'epoch': 0.65} 22%|██▏ | 144/660 [06:52<23:42, 2.76s/it] 22%|██▏ | 145/660 [06:55<23:48, 2.77s/it] {'loss': 0.342, 'learning_rate': 3.635169626303168e-05, 'epoch': 0.66} 22%|██▏ | 145/660 [06:55<23:48, 
2.77s/it] 22%|██▏ | 146/660 [06:57<23:42, 2.77s/it] {'loss': 0.418, 'learning_rate': 3.629496965054979e-05, 'epoch': 0.66} 22%|██▏ | 146/660 [06:57<23:42, 2.77s/it] 22%|██▏ | 147/660 [07:00<23:44, 2.78s/it] {'loss': 0.3046, 'learning_rate': 3.62378503999301e-05, 'epoch': 0.67} 22%|██▏ | 147/660 [07:00<23:44, 2.78s/it] 22%|██▏ | 148/660 [07:03<23:40, 2.77s/it] {'loss': 0.3862, 'learning_rate': 3.6180339887498953e-05, 'epoch': 0.67} 22%|██▏ | 148/660 [07:03<23:40, 2.77s/it] 23%|██▎ | 149/660 [07:06<23:36, 2.77s/it] {'loss': 0.3795, 'learning_rate': 3.612243949901042e-05, 'epoch': 0.68} 23%|██▎ | 149/660 [07:06<23:36, 2.77s/it] 23%|██▎ | 150/660 [07:08<23:34, 2.77s/it] {'loss': 0.3823, 'learning_rate': 3.60641506296129e-05, 'epoch': 0.68} 23%|██▎ | 150/660 [07:08<23:34, 2.77s/it] 23%|██▎ | 151/660 [07:11<23:27, 2.77s/it] {'loss': 0.3918, 'learning_rate': 3.600547468381549e-05, 'epoch': 0.69} 23%|██▎ | 151/660 [07:11<23:27, 2.77s/it] 23%|██▎ | 152/660 [07:14<23:24, 2.76s/it] {'loss': 0.4111, 'learning_rate': 3.594641307545414e-05, 'epoch': 0.69} 23%|██▎ | 152/660 [07:14<23:24, 2.76s/it] 23%|██▎ | 153/660 [07:17<23:22, 2.77s/it] {'loss': 0.326, 'learning_rate': 3.5886967227657635e-05, 'epoch': 0.7} 23%|██▎ | 153/660 [07:17<23:22, 2.77s/it] 23%|██▎ | 154/660 [07:19<23:17, 2.76s/it] {'loss': 0.3784, 'learning_rate': 3.582713857281321e-05, 'epoch': 0.7} 23%|██▎ | 154/660 [07:19<23:17, 2.76s/it] 23%|██▎ | 155/660 [07:22<23:19, 2.77s/it] {'loss': 0.3208, 'learning_rate': 3.576692855253213e-05, 'epoch': 0.7} 23%|██▎ | 155/660 [07:22<23:19, 2.77s/it] 24%|██▎ | 156/660 [07:25<23:18, 2.78s/it] {'loss': 0.3953, 'learning_rate': 3.57063386176149e-05, 'epoch': 0.71} 24%|██▎ | 156/660 [07:25<23:18, 2.78s/it] 24%|██▍ | 157/660 [07:28<23:13, 2.77s/it] {'loss': 0.4219, 'learning_rate': 3.564537022801634e-05, 'epoch': 0.71} 24%|██▍ | 157/660 [07:28<23:13, 2.77s/it] 24%|██▍ | 158/660 [07:31<23:10, 2.77s/it] {'loss': 0.4011, 'learning_rate': 3.558402485281034e-05, 'epoch': 0.72} 24%|██▍ | 
158/660 [07:31<23:10, 2.77s/it] 24%|██▍ | 159/660 [07:33<23:08, 2.77s/it] {'loss': 0.3574, 'learning_rate': 3.552230397015456e-05, 'epoch': 0.72} 24%|██▍ | 159/660 [07:33<23:08, 2.77s/it] 24%|██▍ | 160/660 [07:36<23:01, 2.76s/it] {'loss': 0.4443, 'learning_rate': 3.546020906725474e-05, 'epoch': 0.73} 24%|██▍ | 160/660 [07:36<23:01, 2.76s/it] 24%|██▍ | 161/660 [07:39<22:56, 2.76s/it] {'loss': 0.4086, 'learning_rate': 3.5397741640328895e-05, 'epoch': 0.73} 24%|██▍ | 161/660 [07:39<22:56, 2.76s/it] 25%|██▍ | 162/660 [07:42<22:49, 2.75s/it] {'loss': 0.4055, 'learning_rate': 3.5334903194571235e-05, 'epoch': 0.74} 25%|██▍ | 162/660 [07:42<22:49, 2.75s/it] 25%|██▍ | 163/660 [07:44<22:42, 2.74s/it] {'loss': 0.4648, 'learning_rate': 3.5271695244115935e-05, 'epoch': 0.74} 25%|██▍ | 163/660 [07:44<22:42, 2.74s/it] 25%|██▍ | 164/660 [07:47<22:42, 2.75s/it] {'loss': 0.3984, 'learning_rate': 3.520811931200063e-05, 'epoch': 0.75} 25%|██▍ | 164/660 [07:47<22:42, 2.75s/it] 25%|██▌ | 165/660 [07:50<22:39, 2.75s/it] {'loss': 0.4128, 'learning_rate': 3.5144176930129694e-05, 'epoch': 0.75} 25%|██▌ | 165/660 [07:50<22:39, 2.75s/it] 25%|██▌ | 166/660 [07:52<22:39, 2.75s/it] {'loss': 0.3621, 'learning_rate': 3.507986963923739e-05, 'epoch': 0.75} 25%|██▌ | 166/660 [07:52<22:39, 2.75s/it] 25%|██▌ | 167/660 [07:55<22:39, 2.76s/it] {'loss': 0.3967, 'learning_rate': 3.501519898885069e-05, 'epoch': 0.76} 25%|██▌ | 167/660 [07:55<22:39, 2.76s/it] 25%|██▌ | 168/660 [07:58<22:36, 2.76s/it] {'loss': 0.384, 'learning_rate': 3.495016653725194e-05, 'epoch': 0.76} 25%|██▌ | 168/660 [07:58<22:36, 2.76s/it] 26%|██▌ | 169/660 [08:01<22:29, 2.75s/it] {'loss': 0.4456, 'learning_rate': 3.488477385144134e-05, 'epoch': 0.77} 26%|██▌ | 169/660 [08:01<22:29, 2.75s/it] 26%|██▌ | 170/660 [08:03<22:25, 2.74s/it] {'loss': 0.4519, 'learning_rate': 3.4819022507099184e-05, 'epoch': 0.77} 26%|██▌ | 170/660 [08:03<22:25, 2.74s/it] 26%|██▌ | 171/660 [08:06<22:24, 2.75s/it] {'loss': 0.3982, 'learning_rate': 
3.4752914088547864e-05, 'epoch': 0.78} 26%|██▌ | 171/660 [08:06<22:24, 2.75s/it] 26%|██▌ | 172/660 [08:09<22:25, 2.76s/it] {'loss': 0.3798, 'learning_rate': 3.468645018871371e-05, 'epoch': 0.78} 26%|██▌ | 172/660 [08:09<22:25, 2.76s/it] 26%|██▌ | 173/660 [08:12<22:21, 2.76s/it] {'loss': 0.4365, 'learning_rate': 3.461963240908864e-05, 'epoch': 0.79} 26%|██▌ | 173/660 [08:12<22:21, 2.76s/it] 26%|██▋ | 174/660 [08:15<22:26, 2.77s/it] {'loss': 0.2789, 'learning_rate': 3.45524623596915e-05, 'epoch': 0.79} 26%|██▋ | 174/660 [08:15<22:26, 2.77s/it] 27%|██▋ | 175/660 [08:17<22:19, 2.76s/it] {'loss': 0.4272, 'learning_rate': 3.448494165902935e-05, 'epoch': 0.8} 27%|██▋ | 175/660 [08:17<22:19, 2.76s/it] 27%|██▋ | 176/660 [08:20<22:18, 2.77s/it] {'loss': 0.34, 'learning_rate': 3.441707193405838e-05, 'epoch': 0.8} 27%|██▋ | 176/660 [08:20<22:18, 2.77s/it] 27%|██▋ | 177/660 [08:23<22:19, 2.77s/it] {'loss': 0.3225, 'learning_rate': 3.434885482014481e-05, 'epoch': 0.8} 27%|██▋ | 177/660 [08:23<22:19, 2.77s/it] 27%|██▋ | 178/660 [08:26<22:17, 2.78s/it] {'loss': 0.3363, 'learning_rate': 3.428029196102537e-05, 'epoch': 0.81} 27%|██▋ | 178/660 [08:26<22:17, 2.78s/it] 27%|██▋ | 179/660 [08:28<22:12, 2.77s/it] {'loss': 0.3959, 'learning_rate': 3.4211385008767795e-05, 'epoch': 0.81} 27%|██▋ | 179/660 [08:28<22:12, 2.77s/it] 27%|██▋ | 180/660 [08:31<22:11, 2.77s/it] {'loss': 0.3301, 'learning_rate': 3.4142135623730954e-05, 'epoch': 0.82} 27%|██▋ | 180/660 [08:31<22:11, 2.77s/it] 27%|██▋ | 181/660 [08:34<22:03, 2.76s/it] {'loss': 0.4717, 'learning_rate': 3.4072545474524865e-05, 'epoch': 0.82} 27%|██▋ | 181/660 [08:34<22:03, 2.76s/it] 28%|██▊ | 182/660 [08:37<22:03, 2.77s/it] {'loss': 0.3137, 'learning_rate': 3.4002616237970473e-05, 'epoch': 0.83} 28%|██▊ | 182/660 [08:37<22:03, 2.77s/it] 28%|██▊ | 183/660 [08:39<22:01, 2.77s/it] {'loss': 0.3665, 'learning_rate': 3.393234959905928e-05, 'epoch': 0.83} 28%|██▊ | 183/660 [08:40<22:01, 2.77s/it] 28%|██▊ | 184/660 [08:42<21:56, 2.77s/it] 
{'loss': 0.4058, 'learning_rate': 3.3861747250912724e-05, 'epoch': 0.84} 28%|██▊ | 184/660 [08:42<21:56, 2.77s/it] 28%|██▊ | 185/660 [08:45<21:52, 2.76s/it] {'loss': 0.3635, 'learning_rate': 3.379081089474134e-05, 'epoch': 0.84} 28%|██▊ | 185/660 [08:45<21:52, 2.76s/it] 28%|██▊ | 186/660 [08:48<21:53, 2.77s/it] {'loss': 0.3203, 'learning_rate': 3.371954223980386e-05, 'epoch': 0.85} 28%|██▊ | 186/660 [08:48<21:53, 2.77s/it] 28%|██▊ | 187/660 [08:51<21:51, 2.77s/it] {'loss': 0.3591, 'learning_rate': 3.364794300336594e-05, 'epoch': 0.85} 28%|██▊ | 187/660 [08:51<21:51, 2.77s/it] 28%|██▊ | 188/660 [08:53<21:42, 2.76s/it] {'loss': 0.408, 'learning_rate': 3.357601491065884e-05, 'epoch': 0.85} 28%|██▊ | 188/660 [08:53<21:42, 2.76s/it] 29%|██▊ | 189/660 [08:56<21:39, 2.76s/it] {'loss': 0.3716, 'learning_rate': 3.3503759694837814e-05, 'epoch': 0.86} 29%|██▊ | 189/660 [08:56<21:39, 2.76s/it] 29%|██▉ | 190/660 [08:59<21:39, 2.77s/it] {'loss': 0.3826, 'learning_rate': 3.3431179096940375e-05, 'epoch': 0.86} 29%|██▉ | 190/660 [08:59<21:39, 2.77s/it] 29%|██▉ | 191/660 [09:02<21:35, 2.76s/it] {'loss': 0.4131, 'learning_rate': 3.335827486584433e-05, 'epoch': 0.87} 29%|██▉ | 191/660 [09:02<21:35, 2.76s/it] 29%|██▉ | 192/660 [09:04<21:32, 2.76s/it] {'loss': 0.3828, 'learning_rate': 3.328504875822564e-05, 'epoch': 0.87} 29%|██▉ | 192/660 [09:04<21:32, 2.76s/it] 29%|██▉ | 193/660 [09:07<21:32, 2.77s/it] {'loss': 0.353, 'learning_rate': 3.321150253851611e-05, 'epoch': 0.88} 29%|██▉ | 193/660 [09:07<21:32, 2.77s/it] 29%|██▉ | 194/660 [09:10<21:31, 2.77s/it] {'loss': 0.3811, 'learning_rate': 3.313763797886083e-05, 'epoch': 0.88} 29%|██▉ | 194/660 [09:10<21:31, 2.77s/it] 30%|██▉ | 195/660 [09:13<21:26, 2.77s/it] {'loss': 0.363, 'learning_rate': 3.306345685907553e-05, 'epoch': 0.89} 30%|██▉ | 195/660 [09:13<21:26, 2.77s/it] 30%|██▉ | 196/660 [09:15<21:26, 2.77s/it] {'loss': 0.3362, 'learning_rate': 3.298896096660367e-05, 'epoch': 0.89} 30%|██▉ | 196/660 [09:15<21:26, 2.77s/it] 30%|██▉ | 
197/660 [09:18<21:20, 2.77s/it] {'loss': 0.4651, 'learning_rate': 3.291415209647335e-05, 'epoch': 0.9} 30%|██▉ | 197/660 [09:18<21:20, 2.77s/it] 30%|███ | 198/660 [09:21<21:20, 2.77s/it] {'loss': 0.3191, 'learning_rate': 3.283903205125406e-05, 'epoch': 0.9} 30%|███ | 198/660 [09:21<21:20, 2.77s/it] 30%|███ | 199/660 [09:24<21:15, 2.77s/it] {'loss': 0.3352, 'learning_rate': 3.276360264101331e-05, 'epoch': 0.9} 30%|███ | 199/660 [09:24<21:15, 2.77s/it] 30%|███ | 200/660 [09:27<21:09, 2.76s/it] {'loss': 0.4275, 'learning_rate': 3.268786568327291e-05, 'epoch': 0.91} 30%|███ | 200/660 [09:27<21:09, 2.76s/it] 30%|███ | 201/660 [09:29<21:04, 2.76s/it] {'loss': 0.415, 'learning_rate': 3.261182300296528e-05, 'epoch': 0.91} 30%|███ | 201/660 [09:29<21:04, 2.76s/it] 31%|███ | 202/660 [09:32<21:06, 2.76s/it] {'loss': 0.3933, 'learning_rate': 3.2535476432389396e-05, 'epoch': 0.92} 31%|███ | 202/660 [09:32<21:06, 2.76s/it] 31%|███ | 203/660 [09:35<21:04, 2.77s/it] {'loss': 0.3958, 'learning_rate': 3.245882781116668e-05, 'epoch': 0.92} 31%|███ | 203/660 [09:35<21:04, 2.77s/it] 31%|███ | 204/660 [09:38<21:01, 2.77s/it] {'loss': 0.3677, 'learning_rate': 3.238187898619669e-05, 'epoch': 0.93} 31%|███ | 204/660 [09:38<21:01, 2.77s/it] 31%|███ | 205/660 [09:40<21:02, 2.77s/it] {'loss': 0.354, 'learning_rate': 3.230463181161254e-05, 'epoch': 0.93} 31%|███ | 205/660 [09:40<21:02, 2.77s/it] 31%|███ | 206/660 [09:43<20:59, 2.77s/it] {'loss': 0.3787, 'learning_rate': 3.222708814873633e-05, 'epoch': 0.94} 31%|███ | 206/660 [09:43<20:59, 2.77s/it] 31%|███▏ | 207/660 [09:46<20:52, 2.77s/it] {'loss': 0.4424, 'learning_rate': 3.214924986603422e-05, 'epoch': 0.94} 31%|███▏ | 207/660 [09:46<20:52, 2.77s/it] 32%|███▏ | 208/660 [09:49<20:49, 2.76s/it] {'loss': 0.4253, 'learning_rate': 3.207111883907143e-05, 'epoch': 0.95} 32%|███▏ | 208/660 [09:49<20:49, 2.76s/it] 32%|███▏ | 209/660 [09:51<20:47, 2.77s/it] {'loss': 0.3868, 'learning_rate': 3.199269695046705e-05, 'epoch': 0.95} 32%|███▏ | 209/660 
[09:51<20:47, 2.77s/it] 32%|███▏ | 210/660 [09:54<20:43, 2.76s/it] {'loss': 0.3967, 'learning_rate': 3.191398608984867e-05, 'epoch': 0.95} 32%|███▏ | 210/660 [09:54<20:43, 2.76s/it] 32%|███▏ | 211/660 [09:57<20:39, 2.76s/it] {'loss': 0.3848, 'learning_rate': 3.183498815380686e-05, 'epoch': 0.96} 32%|███▏ | 211/660 [09:57<20:39, 2.76s/it] 32%|███▏ | 212/660 [10:00<20:35, 2.76s/it] {'loss': 0.3494, 'learning_rate': 3.1755705045849465e-05, 'epoch': 0.96} 32%|███▏ | 212/660 [10:00<20:35, 2.76s/it] 32%|███▏ | 213/660 [10:02<20:32, 2.76s/it] {'loss': 0.3574, 'learning_rate': 3.167613867635573e-05, 'epoch': 0.97} 32%|███▏ | 213/660 [10:02<20:32, 2.76s/it] 32%|███▏ | 214/660 [10:05<20:36, 2.77s/it] {'loss': 0.3265, 'learning_rate': 3.159629096253028e-05, 'epoch': 0.97} 32%|███▏ | 214/660 [10:05<20:36, 2.77s/it] 33%|███▎ | 215/660 [10:08<20:35, 2.78s/it] {'loss': 0.3718, 'learning_rate': 3.1516163828356915e-05, 'epoch': 0.98} 33%|███▎ | 215/660 [10:08<20:35, 2.78s/it] 33%|███▎ | 216/660 [10:11<20:32, 2.78s/it] {'loss': 0.365, 'learning_rate': 3.1435759204552246e-05, 'epoch': 0.98} 33%|███▎ | 216/660 [10:11<20:32, 2.78s/it] 33%|███▎ | 217/660 [10:14<20:28, 2.77s/it] {'loss': 0.3738, 'learning_rate': 3.1355079028519216e-05, 'epoch': 0.99} 33%|███▎ | 217/660 [10:14<20:28, 2.77s/it] 33%|███▎ | 218/660 [10:16<20:24, 2.77s/it] {'loss': 0.3828, 'learning_rate': 3.1274125244300336e-05, 'epoch': 0.99} 33%|███▎ | 218/660 [10:16<20:24, 2.77s/it] 33%|███▎ | 219/660 [10:19<20:20, 2.77s/it] {'loss': 0.3745, 'learning_rate': 3.119289980253092e-05, 'epoch': 1.0} 33%|███▎ | 219/660 [10:19<20:20, 2.77s/it] 33%|███▎ | 220/660 [10:22<20:46, 2.83s/it] {'loss': 0.3728, 'learning_rate': 3.111140466039205e-05, 'epoch': 1.0} 33%|███▎ | 220/660 [10:22<20:46, 2.83s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (2660 > 2048). 
Running this sequence through the model will result in indexing errors
[the "Token indices sequence length" warning repeats for 31 more samples, with tokenized lengths ranging from 2793 up to 5312, all above the 2048 maximum]
[training progress, steps 221–231 of 660: loss ~0.28–0.40, learning_rate 3.1030e-05 → 3.0198e-05, epoch 1.00 → 1.05; step time briefly rises to ~3.12 s/it before settling back to ~2.78 s/it]
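The warnings above mean some fine-tuning samples tokenize past the `--model_max_length 2048` set in the launch command; Hugging Face tokenizers emit this warning but do not truncate by default, so over-length ids reach the model and can index past its learned positions. A minimal, self-contained sketch of a pre-model guard (the helper name is hypothetical, not part of the MobileVLM code):

```python
# Sketch: clip a token-id sequence so position indices stay in range.
# Assumes token ids were already produced by a tokenizer; in a real
# pipeline this belongs in the dataset's preprocessing step.

def truncate_to_max_length(input_ids, max_length=2048):
    """Return input_ids clipped to at most max_length tokens.

    The tokenizer only warns about over-length sequences; without a
    guard like this, a 2660-token sample (the first warning above)
    overruns a model with 2048 positions.
    """
    if len(input_ids) <= max_length:
        return input_ids
    return input_ids[:max_length]

# A 2660-token sample is clipped to the 2048-token context window;
# short samples pass through untouched.
assert len(truncate_to_max_length(list(range(2660)))) == 2048
assert truncate_to_max_length([1, 2, 3]) == [1, 2, 3]
```

Note that truncating vision-language samples can cut off the answer portion of a conversation; filtering over-length samples out of the dataset (as the dataset filename here suggests was partly done) is the safer alternative.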
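The learning_rate values in this log trace the `--lr_scheduler_type cosine` schedule from the launch command, with base `--learning_rate 4e-5`, `--warmup_ratio 0.03`, and 660 total optimizer steps. A sketch that reproduces the logged values, assuming the Trainer convention of `ceil(warmup_ratio * total_steps)` warmup steps (20 here):

```python
import math

def cosine_lr(step, base_lr=4e-5, total_steps=660, warmup_ratio=0.03):
    """Cosine-with-warmup schedule matching the learning_rate values
    logged above (an assumption based on the launch flags, verified
    against several logged steps)."""
    warmup = math.ceil(warmup_ratio * total_steps)  # 20 steps here
    if step < warmup:
        return base_lr * step / warmup  # linear warmup
    progress = (step - warmup) / (total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Step 180 logs learning_rate 3.4142135623730954e-05,
# step 276 logs learning_rate 2.618033988749895e-05:
assert abs(cosine_lr(180) - 3.4142135623730954e-05) < 1e-12
assert abs(cosine_lr(276) - 2.618033988749895e-05) < 1e-12
```

This confirms the schedule is behaving as configured: the rate falls from 4e-5 after warmup toward zero at step 660.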
[training progress, steps 232–293 of 660 at ~2.77 s/it: loss ~0.27–0.45, learning_rate 3.0113e-05 → 2.4573e-05, epoch 1.05 → 1.33]
 45%|████▍ | 294/660 [13:48<16:57, 2.78s/it] {'loss': 0.3176, 'learning_rate': 
2.4477760957169973e-05, 'epoch': 1.34} 45%|████▍ | 294/660 [13:48<16:57, 2.78s/it] 45%|████▍ | 295/660 [13:51<16:53, 2.78s/it] {'loss': 0.3833, 'learning_rate': 2.4382024803137396e-05, 'epoch': 1.34} 45%|████▍ | 295/660 [13:51<16:53, 2.78s/it] 45%|████▍ | 296/660 [13:53<16:52, 2.78s/it] {'loss': 0.2886, 'learning_rate': 2.4286183061301016e-05, 'epoch': 1.35} 45%|████▍ | 296/660 [13:53<16:52, 2.78s/it] 45%|████▌ | 297/660 [13:56<16:46, 2.77s/it] {'loss': 0.3672, 'learning_rate': 2.4190238041031382e-05, 'epoch': 1.35} 45%|████▌ | 297/660 [13:56<16:46, 2.77s/it] 45%|████▌ | 298/660 [13:59<16:43, 2.77s/it] {'loss': 0.3467, 'learning_rate': 2.4094192054187596e-05, 'epoch': 1.35} 45%|████▌ | 298/660 [13:59<16:43, 2.77s/it] 45%|████▌ | 299/660 [14:02<16:39, 2.77s/it] {'loss': 0.2853, 'learning_rate': 2.399804741506164e-05, 'epoch': 1.36} 45%|████▌ | 299/660 [14:02<16:39, 2.77s/it] 45%|████▌ | 300/660 [14:05<16:35, 2.76s/it] {'loss': 0.3845, 'learning_rate': 2.390180644032257e-05, 'epoch': 1.36} 45%|████▌ | 300/660 [14:05<16:35, 2.76s/it] 46%|████▌ | 301/660 [14:07<16:34, 2.77s/it] {'loss': 0.3082, 'learning_rate': 2.380547144896072e-05, 'epoch': 1.37} 46%|████▌ | 301/660 [14:07<16:34, 2.77s/it] 46%|████▌ | 302/660 [14:10<16:27, 2.76s/it] {'loss': 0.4136, 'learning_rate': 2.370904476223182e-05, 'epoch': 1.37} 46%|████▌ | 302/660 [14:10<16:27, 2.76s/it] 46%|████▌ | 303/660 [14:13<16:26, 2.76s/it] {'loss': 0.3513, 'learning_rate': 2.3612528703601055e-05, 'epoch': 1.38} 46%|████▌ | 303/660 [14:13<16:26, 2.76s/it] 46%|████▌ | 304/660 [14:16<16:26, 2.77s/it] {'loss': 0.2994, 'learning_rate': 2.3515925598687097e-05, 'epoch': 1.38} 46%|████▌ | 304/660 [14:16<16:26, 2.77s/it] 46%|████▌ | 305/660 [14:18<16:24, 2.77s/it] {'loss': 0.3097, 'learning_rate': 2.3419237775206026e-05, 'epoch': 1.39} 46%|████▌ | 305/660 [14:18<16:24, 2.77s/it] 46%|████▋ | 306/660 [14:21<16:22, 2.78s/it] {'loss': 0.3071, 'learning_rate': 2.332246756291531e-05, 'epoch': 1.39} 46%|████▋ | 306/660 [14:21<16:22, 
2.78s/it] 47%|████▋ | 307/660 [14:24<16:18, 2.77s/it] {'loss': 0.3212, 'learning_rate': 2.322561729355761e-05, 'epoch': 1.4} 47%|████▋ | 307/660 [14:24<16:18, 2.77s/it] 47%|████▋ | 308/660 [14:27<16:16, 2.77s/it] {'loss': 0.2954, 'learning_rate': 2.312868930080462e-05, 'epoch': 1.4} 47%|████▋ | 308/660 [14:27<16:16, 2.77s/it] 47%|████▋ | 309/660 [14:29<16:13, 2.77s/it] {'loss': 0.3264, 'learning_rate': 2.3031685920200823e-05, 'epoch': 1.4} 47%|████▋ | 309/660 [14:29<16:13, 2.77s/it] 47%|████▋ | 310/660 [14:32<16:10, 2.77s/it] {'loss': 0.3422, 'learning_rate': 2.2934609489107236e-05, 'epoch': 1.41} 47%|████▋ | 310/660 [14:32<16:10, 2.77s/it] 47%|████▋ | 311/660 [14:35<16:09, 2.78s/it] {'loss': 0.3247, 'learning_rate': 2.2837462346645065e-05, 'epoch': 1.41} 47%|████▋ | 311/660 [14:35<16:09, 2.78s/it] 47%|████▋ | 312/660 [14:38<16:06, 2.78s/it] {'loss': 0.3093, 'learning_rate': 2.2740246833639366e-05, 'epoch': 1.42} 47%|████▋ | 312/660 [14:38<16:06, 2.78s/it] 47%|████▋ | 313/660 [14:41<16:04, 2.78s/it] {'loss': 0.3035, 'learning_rate': 2.2642965292562602e-05, 'epoch': 1.42} 47%|████▋ | 313/660 [14:41<16:04, 2.78s/it] 48%|████▊ | 314/660 [14:43<16:01, 2.78s/it] {'loss': 0.3103, 'learning_rate': 2.2545620067478268e-05, 'epoch': 1.43} 48%|████▊ | 314/660 [14:43<16:01, 2.78s/it] 48%|████▊ | 315/660 [14:46<16:01, 2.79s/it] {'loss': 0.3145, 'learning_rate': 2.2448213503984328e-05, 'epoch': 1.43} 48%|████▊ | 315/660 [14:46<16:01, 2.79s/it] 48%|████▊ | 316/660 [14:49<15:56, 2.78s/it] {'loss': 0.3816, 'learning_rate': 2.2350747949156756e-05, 'epoch': 1.44} 48%|████▊ | 316/660 [14:49<15:56, 2.78s/it] 48%|████▊ | 317/660 [14:52<15:52, 2.78s/it] {'loss': 0.3105, 'learning_rate': 2.2253225751492958e-05, 'epoch': 1.44} 48%|████▊ | 317/660 [14:52<15:52, 2.78s/it] 48%|████▊ | 318/660 [14:54<15:45, 2.77s/it] {'loss': 0.3882, 'learning_rate': 2.2155649260855186e-05, 'epoch': 1.45} 48%|████▊ | 318/660 [14:54<15:45, 2.77s/it] 48%|████▊ | 319/660 [14:57<15:43, 2.77s/it] {'loss': 0.3077, 
'learning_rate': 2.205802082841393e-05, 'epoch': 1.45} 48%|████▊ | 319/660 [14:57<15:43, 2.77s/it] 48%|████▊ | 320/660 [15:00<15:39, 2.76s/it] {'loss': 0.3181, 'learning_rate': 2.196034280659122e-05, 'epoch': 1.45} 48%|████▊ | 320/660 [15:00<15:39, 2.76s/it] 49%|████▊ | 321/660 [15:03<15:39, 2.77s/it] {'loss': 0.3601, 'learning_rate': 2.1862617549003998e-05, 'epoch': 1.46} 49%|████▊ | 321/660 [15:03<15:39, 2.77s/it] 49%|████▉ | 322/660 [15:06<15:36, 2.77s/it] {'loss': 0.3217, 'learning_rate': 2.1764847410407396e-05, 'epoch': 1.46} 49%|████▉ | 322/660 [15:06<15:36, 2.77s/it] 49%|████▉ | 323/660 [15:08<15:31, 2.76s/it] {'loss': 0.3842, 'learning_rate': 2.166703474663795e-05, 'epoch': 1.47} 49%|████▉ | 323/660 [15:08<15:31, 2.76s/it] 49%|████▉ | 324/660 [15:11<15:28, 2.76s/it] {'loss': 0.2976, 'learning_rate': 2.1569181914556904e-05, 'epoch': 1.47} 49%|████▉ | 324/660 [15:11<15:28, 2.76s/it] 49%|████▉ | 325/660 [15:14<15:26, 2.77s/it] {'loss': 0.3408, 'learning_rate': 2.1471291271993353e-05, 'epoch': 1.48} 49%|████▉ | 325/660 [15:14<15:26, 2.77s/it] 49%|████▉ | 326/660 [15:17<15:25, 2.77s/it] {'loss': 0.2855, 'learning_rate': 2.1373365177687474e-05, 'epoch': 1.48} 49%|████▉ | 326/660 [15:17<15:25, 2.77s/it] 50%|████▉ | 327/660 [15:19<15:21, 2.77s/it] {'loss': 0.3542, 'learning_rate': 2.1275405991233696e-05, 'epoch': 1.49} 50%|████▉ | 327/660 [15:19<15:21, 2.77s/it] 50%|████▉ | 328/660 [15:22<15:18, 2.77s/it] {'loss': 0.3743, 'learning_rate': 2.117741607302378e-05, 'epoch': 1.49} 50%|████▉ | 328/660 [15:22<15:18, 2.77s/it] 50%|████▉ | 329/660 [15:25<15:16, 2.77s/it] {'loss': 0.3042, 'learning_rate': 2.107939778419004e-05, 'epoch': 1.5} 50%|████▉ | 329/660 [15:25<15:16, 2.77s/it] 50%|█████ | 330/660 [15:28<15:11, 2.76s/it] {'loss': 0.3662, 'learning_rate': 2.0981353486548363e-05, 'epoch': 1.5} 50%|█████ | 330/660 [15:28<15:11, 2.76s/it] 50%|█████ | 331/660 [15:30<15:11, 2.77s/it] {'loss': 0.3406, 'learning_rate': 2.088328554254135e-05, 'epoch': 1.5} 50%|█████ | 331/660 
[15:30<15:11, 2.77s/it] 50%|█████ | 332/660 [15:33<15:04, 2.76s/it] {'loss': 0.3755, 'learning_rate': 2.0785196315181374e-05, 'epoch': 1.51} 50%|█████ | 332/660 [15:33<15:04, 2.76s/it] 50%|█████ | 333/660 [15:36<15:05, 2.77s/it] {'loss': 0.2773, 'learning_rate': 2.0687088167993648e-05, 'epoch': 1.51} 50%|█████ | 333/660 [15:36<15:05, 2.77s/it] 51%|█████ | 334/660 [15:39<15:02, 2.77s/it] {'loss': 0.3196, 'learning_rate': 2.058896346495927e-05, 'epoch': 1.52} 51%|█████ | 334/660 [15:39<15:02, 2.77s/it] 51%|█████ | 335/660 [15:42<15:01, 2.77s/it] {'loss': 0.318, 'learning_rate': 2.0490824570458248e-05, 'epoch': 1.52} 51%|█████ | 335/660 [15:42<15:01, 2.77s/it] 51%|█████ | 336/660 [15:44<14:55, 2.76s/it] {'loss': 0.3462, 'learning_rate': 2.0392673849212565e-05, 'epoch': 1.53} 51%|█████ | 336/660 [15:44<14:55, 2.76s/it] 51%|█████ | 337/660 [15:47<14:52, 2.76s/it] {'loss': 0.3235, 'learning_rate': 2.0294513666229173e-05, 'epoch': 1.53} 51%|█████ | 337/660 [15:47<14:52, 2.76s/it] 51%|█████ | 338/660 [15:50<14:48, 2.76s/it] {'loss': 0.2921, 'learning_rate': 2.0196346386742997e-05, 'epoch': 1.54} 51%|█████ | 338/660 [15:50<14:48, 2.76s/it] 51%|█████▏ | 339/660 [15:53<14:45, 2.76s/it] {'loss': 0.3219, 'learning_rate': 2.0098174376159965e-05, 'epoch': 1.54} 51%|█████▏ | 339/660 [15:53<14:45, 2.76s/it] 52%|█████▏ | 340/660 [15:55<14:43, 2.76s/it] {'loss': 0.3217, 'learning_rate': 2e-05, 'epoch': 1.55} 52%|█████▏ | 340/660 [15:55<14:43, 2.76s/it] 52%|█████▏ | 341/660 [15:58<14:37, 2.75s/it] {'loss': 0.396, 'learning_rate': 1.990182562384004e-05, 'epoch': 1.55} 52%|█████▏ | 341/660 [15:58<14:37, 2.75s/it] 52%|█████▏ | 342/660 [16:01<14:38, 2.76s/it] {'loss': 0.3062, 'learning_rate': 1.980365361325701e-05, 'epoch': 1.55} 52%|█████▏ | 342/660 [16:01<14:38, 2.76s/it] 52%|█████▏ | 343/660 [16:04<14:34, 2.76s/it] {'loss': 0.3457, 'learning_rate': 1.9705486333770837e-05, 'epoch': 1.56} 52%|█████▏ | 343/660 [16:04<14:34, 2.76s/it] 52%|█████▏ | 344/660 [16:06<14:33, 2.76s/it] {'loss': 
0.3596, 'learning_rate': 1.960732615078744e-05, 'epoch': 1.56} 52%|█████▏ | 344/660 [16:06<14:33, 2.76s/it] 52%|█████▏ | 345/660 [16:09<14:29, 2.76s/it] {'loss': 0.3838, 'learning_rate': 1.950917542954176e-05, 'epoch': 1.57} 52%|█████▏ | 345/660 [16:09<14:29, 2.76s/it] 52%|█████▏ | 346/660 [16:12<14:27, 2.76s/it] {'loss': 0.3003, 'learning_rate': 1.9411036535040737e-05, 'epoch': 1.57} 52%|█████▏ | 346/660 [16:12<14:27, 2.76s/it] 53%|█████▎ | 347/660 [16:15<14:25, 2.77s/it] {'loss': 0.3389, 'learning_rate': 1.9312911832006355e-05, 'epoch': 1.58} 53%|█████▎ | 347/660 [16:15<14:25, 2.77s/it] 53%|█████▎ | 348/660 [16:17<14:24, 2.77s/it] {'loss': 0.2964, 'learning_rate': 1.9214803684818636e-05, 'epoch': 1.58} 53%|█████▎ | 348/660 [16:17<14:24, 2.77s/it] 53%|█████▎ | 349/660 [16:20<14:22, 2.77s/it] {'loss': 0.3171, 'learning_rate': 1.9116714457458657e-05, 'epoch': 1.59} 53%|█████▎ | 349/660 [16:20<14:22, 2.77s/it] 53%|█████▎ | 350/660 [16:23<14:22, 2.78s/it] {'loss': 0.2595, 'learning_rate': 1.901864651345164e-05, 'epoch': 1.59} 53%|█████▎ | 350/660 [16:23<14:22, 2.78s/it] 53%|█████▎ | 351/660 [16:26<14:19, 2.78s/it] {'loss': 0.3716, 'learning_rate': 1.8920602215809963e-05, 'epoch': 1.6} 53%|█████▎ | 351/660 [16:26<14:19, 2.78s/it] 53%|█████▎ | 352/660 [16:29<14:13, 2.77s/it] {'loss': 0.3564, 'learning_rate': 1.882258392697622e-05, 'epoch': 1.6} 53%|█████▎ | 352/660 [16:29<14:13, 2.77s/it] 53%|█████▎ | 353/660 [16:31<14:08, 2.76s/it] {'loss': 0.3958, 'learning_rate': 1.8724594008766314e-05, 'epoch': 1.6} 53%|█████▎ | 353/660 [16:31<14:08, 2.76s/it] 54%|█████▎ | 354/660 [16:34<14:07, 2.77s/it] {'loss': 0.3046, 'learning_rate': 1.862663482231253e-05, 'epoch': 1.61} 54%|█████▎ | 354/660 [16:34<14:07, 2.77s/it] 54%|█████▍ | 355/660 [16:37<14:04, 2.77s/it] {'loss': 0.3191, 'learning_rate': 1.8528708728006654e-05, 'epoch': 1.61} 54%|█████▍ | 355/660 [16:37<14:04, 2.77s/it] 54%|█████▍ | 356/660 [16:40<14:01, 2.77s/it] {'loss': 0.3713, 'learning_rate': 1.8430818085443106e-05, 
'epoch': 1.62} 54%|█████▍ | 356/660 [16:40<14:01, 2.77s/it] 54%|█████▍ | 357/660 [16:42<13:58, 2.77s/it] {'loss': 0.3466, 'learning_rate': 1.833296525336205e-05, 'epoch': 1.62} 54%|█████▍ | 357/660 [16:42<13:58, 2.77s/it] 54%|█████▍ | 358/660 [16:45<13:57, 2.77s/it] {'loss': 0.3254, 'learning_rate': 1.8235152589592614e-05, 'epoch': 1.63} 54%|█████▍ | 358/660 [16:45<13:57, 2.77s/it] 54%|█████▍ | 359/660 [16:48<13:56, 2.78s/it] {'loss': 0.3276, 'learning_rate': 1.813738245099601e-05, 'epoch': 1.63} 54%|█████▍ | 359/660 [16:48<13:56, 2.78s/it] 55%|█████▍ | 360/660 [16:51<13:54, 2.78s/it] {'loss': 0.3124, 'learning_rate': 1.8039657193408788e-05, 'epoch': 1.64} 55%|█████▍ | 360/660 [16:51<13:54, 2.78s/it] 55%|█████▍ | 361/660 [16:53<13:48, 2.77s/it] {'loss': 0.3136, 'learning_rate': 1.7941979171586078e-05, 'epoch': 1.64} 55%|█████▍ | 361/660 [16:53<13:48, 2.77s/it] 55%|█████▍ | 362/660 [16:56<13:45, 2.77s/it] {'loss': 0.3234, 'learning_rate': 1.7844350739144814e-05, 'epoch': 1.65} 55%|█████▍ | 362/660 [16:56<13:45, 2.77s/it] 55%|█████▌ | 363/660 [16:59<14:24, 2.91s/it] {'loss': 0.355, 'learning_rate': 1.774677424850705e-05, 'epoch': 1.65} 55%|█████▌ | 363/660 [16:59<14:24, 2.91s/it] 55%|█████▌ | 364/660 [17:02<14:10, 2.87s/it] {'loss': 0.3156, 'learning_rate': 1.764925205084325e-05, 'epoch': 1.65} 55%|█████▌ | 364/660 [17:02<14:10, 2.87s/it] 55%|█████▌ | 365/660 [17:05<13:58, 2.84s/it] {'loss': 0.2852, 'learning_rate': 1.755178649601568e-05, 'epoch': 1.66} 55%|█████▌ | 365/660 [17:05<13:58, 2.84s/it] 55%|█████▌ | 366/660 [17:08<13:48, 2.82s/it] {'loss': 0.3195, 'learning_rate': 1.745437993252174e-05, 'epoch': 1.66} 55%|█████▌ | 366/660 [17:08<13:48, 2.82s/it] 56%|█████▌ | 367/660 [17:11<13:39, 2.80s/it] {'loss': 0.3387, 'learning_rate': 1.7357034707437397e-05, 'epoch': 1.67} 56%|█████▌ | 367/660 [17:11<13:39, 2.80s/it] 56%|█████▌ | 368/660 [17:13<13:33, 2.79s/it] {'loss': 0.3792, 'learning_rate': 1.7259753166360644e-05, 'epoch': 1.67} 56%|█████▌ | 368/660 [17:13<13:33, 
2.79s/it] 56%|█████▌ | 369/660 [17:16<13:30, 2.79s/it] {'loss': 0.3331, 'learning_rate': 1.716253765335494e-05, 'epoch': 1.68} 56%|█████▌ | 369/660 [17:16<13:30, 2.79s/it] 56%|█████▌ | 370/660 [17:19<13:26, 2.78s/it] {'loss': 0.3182, 'learning_rate': 1.7065390510892767e-05, 'epoch': 1.68} 56%|█████▌ | 370/660 [17:19<13:26, 2.78s/it] 56%|█████▌ | 371/660 [17:22<13:22, 2.78s/it] {'loss': 0.3691, 'learning_rate': 1.696831407979918e-05, 'epoch': 1.69} 56%|█████▌ | 371/660 [17:22<13:22, 2.78s/it] 56%|█████▋ | 372/660 [17:24<13:19, 2.78s/it] {'loss': 0.3152, 'learning_rate': 1.687131069919538e-05, 'epoch': 1.69} 56%|█████▋ | 372/660 [17:24<13:19, 2.78s/it] 57%|█████▋ | 373/660 [17:27<13:19, 2.78s/it] {'loss': 0.3075, 'learning_rate': 1.6774382706442396e-05, 'epoch': 1.7} 57%|█████▋ | 373/660 [17:27<13:19, 2.78s/it] 57%|█████▋ | 374/660 [17:30<13:17, 2.79s/it] {'loss': 0.2823, 'learning_rate': 1.6677532437084696e-05, 'epoch': 1.7} 57%|█████▋ | 374/660 [17:30<13:17, 2.79s/it] 57%|█████▋ | 375/660 [17:33<13:13, 2.78s/it] {'loss': 0.3687, 'learning_rate': 1.6580762224793977e-05, 'epoch': 1.7} 57%|█████▋ | 375/660 [17:33<13:13, 2.78s/it] 57%|█████▋ | 376/660 [17:36<13:07, 2.77s/it] {'loss': 0.3433, 'learning_rate': 1.648407440131291e-05, 'epoch': 1.71} 57%|█████▋ | 376/660 [17:36<13:07, 2.77s/it] 57%|█████▋ | 377/660 [17:38<13:06, 2.78s/it] {'loss': 0.3411, 'learning_rate': 1.6387471296398945e-05, 'epoch': 1.71} 57%|█████▋ | 377/660 [17:38<13:06, 2.78s/it] 57%|█████▋ | 378/660 [17:41<13:02, 2.77s/it] {'loss': 0.3109, 'learning_rate': 1.6290955237768183e-05, 'epoch': 1.72} 57%|█████▋ | 378/660 [17:41<13:02, 2.77s/it] 57%|█████▋ | 379/660 [17:44<12:59, 2.77s/it] {'loss': 0.3356, 'learning_rate': 1.6194528551039285e-05, 'epoch': 1.72} 57%|█████▋ | 379/660 [17:44<12:59, 2.77s/it] 58%|█████▊ | 380/660 [17:47<12:59, 2.78s/it] {'loss': 0.2871, 'learning_rate': 1.609819355967744e-05, 'epoch': 1.73} 58%|█████▊ | 380/660 [17:47<12:59, 2.78s/it] 58%|█████▊ | 381/660 [17:49<12:55, 
2.78s/it] {'loss': 0.3368, 'learning_rate': 1.6001952584938367e-05, 'epoch': 1.73} 58%|█████▊ | 381/660 [17:49<12:55, 2.78s/it] 58%|█████▊ | 382/660 [17:52<12:52, 2.78s/it] {'loss': 0.3088, 'learning_rate': 1.590580794581241e-05, 'epoch': 1.74} 58%|█████▊ | 382/660 [17:52<12:52, 2.78s/it] 58%|█████▊ | 383/660 [17:55<12:50, 2.78s/it] {'loss': 0.2722, 'learning_rate': 1.580976195896863e-05, 'epoch': 1.74} 58%|█████▊ | 383/660 [17:55<12:50, 2.78s/it] 58%|█████▊ | 384/660 [17:58<12:46, 2.78s/it] {'loss': 0.3474, 'learning_rate': 1.571381693869899e-05, 'epoch': 1.75} 58%|█████▊ | 384/660 [17:58<12:46, 2.78s/it] 58%|█████▊ | 385/660 [18:01<12:42, 2.77s/it] {'loss': 0.2866, 'learning_rate': 1.5617975196862607e-05, 'epoch': 1.75} 58%|█████▊ | 385/660 [18:01<12:42, 2.77s/it] 58%|█████▊ | 386/660 [18:03<12:41, 2.78s/it] {'loss': 0.3037, 'learning_rate': 1.5522239042830033e-05, 'epoch': 1.75} 58%|█████▊ | 386/660 [18:03<12:41, 2.78s/it] 59%|█████▊ | 387/660 [18:06<12:36, 2.77s/it] {'loss': 0.3723, 'learning_rate': 1.542661078342761e-05, 'epoch': 1.76} 59%|█████▊ | 387/660 [18:06<12:36, 2.77s/it] 59%|█████▉ | 388/660 [18:09<12:34, 2.77s/it] {'loss': 0.3107, 'learning_rate': 1.53310927228819e-05, 'epoch': 1.76} 59%|█████▉ | 388/660 [18:09<12:34, 2.77s/it] 59%|█████▉ | 389/660 [18:12<12:33, 2.78s/it] {'loss': 0.3101, 'learning_rate': 1.523568716276411e-05, 'epoch': 1.77} 59%|█████▉ | 389/660 [18:12<12:33, 2.78s/it] 59%|█████▉ | 390/660 [18:14<12:29, 2.78s/it] {'loss': 0.335, 'learning_rate': 1.5140396401934725e-05, 'epoch': 1.77} 59%|█████▉ | 390/660 [18:14<12:29, 2.78s/it] 59%|█████▉ | 391/660 [18:17<12:25, 2.77s/it] {'loss': 0.3372, 'learning_rate': 1.5045222736488032e-05, 'epoch': 1.78} 59%|█████▉ | 391/660 [18:17<12:25, 2.77s/it] 59%|█████▉ | 392/660 [18:20<12:23, 2.77s/it] {'loss': 0.3201, 'learning_rate': 1.4950168459696841e-05, 'epoch': 1.78} 59%|█████▉ | 392/660 [18:20<12:23, 2.77s/it] 60%|█████▉ | 393/660 [18:23<12:22, 2.78s/it] {'loss': 0.3141, 'learning_rate': 
1.485523586195721e-05, 'epoch': 1.79} 60%|█████▉ | 393/660 [18:23<12:22, 2.78s/it] 60%|█████▉ | 394/660 [18:26<12:18, 2.78s/it] {'loss': 0.3197, 'learning_rate': 1.4760427230733254e-05, 'epoch': 1.79} 60%|█████▉ | 394/660 [18:26<12:18, 2.78s/it] 60%|█████▉ | 395/660 [18:28<12:14, 2.77s/it] {'loss': 0.3574, 'learning_rate': 1.4665744850502035e-05, 'epoch': 1.8} 60%|█████▉ | 395/660 [18:28<12:14, 2.77s/it] 60%|██████ | 396/660 [18:31<12:13, 2.78s/it] {'loss': 0.2993, 'learning_rate': 1.4571191002698517e-05, 'epoch': 1.8} 60%|██████ | 396/660 [18:31<12:13, 2.78s/it] 60%|██████ | 397/660 [18:34<12:49, 2.93s/it] {'loss': 0.3503, 'learning_rate': 1.4476767965660584e-05, 'epoch': 1.8} 60%|██████ | 397/660 [18:34<12:49, 2.93s/it] 60%|██████ | 398/660 [18:37<12:34, 2.88s/it] {'loss': 0.3347, 'learning_rate': 1.4382478014574164e-05, 'epoch': 1.81} 60%|██████ | 398/660 [18:37<12:34, 2.88s/it] 60%|██████ | 399/660 [18:40<12:19, 2.83s/it] {'loss': 0.412, 'learning_rate': 1.4288323421418357e-05, 'epoch': 1.81} 60%|██████ | 399/660 [18:40<12:19, 2.83s/it] 61%|██████ | 400/660 [18:43<12:10, 2.81s/it] {'loss': 0.3311, 'learning_rate': 1.4194306454910757e-05, 'epoch': 1.82} 61%|██████ | 400/660 [18:43<12:10, 2.81s/it] 61%|██████ | 401/660 [18:45<12:01, 2.79s/it] {'loss': 0.3855, 'learning_rate': 1.410042938045273e-05, 'epoch': 1.82} 61%|██████ | 401/660 [18:45<12:01, 2.79s/it] 61%|██████ | 402/660 [18:48<11:57, 2.78s/it] {'loss': 0.3253, 'learning_rate': 1.4006694460074867e-05, 'epoch': 1.83} 61%|██████ | 402/660 [18:48<11:57, 2.78s/it] 61%|██████ | 403/660 [18:51<11:52, 2.77s/it] {'loss': 0.2908, 'learning_rate': 1.391310395238246e-05, 'epoch': 1.83} 61%|██████ | 403/660 [18:51<11:52, 2.77s/it] 61%|██████ | 404/660 [18:54<11:49, 2.77s/it] {'loss': 0.2981, 'learning_rate': 1.3819660112501054e-05, 'epoch': 1.84} 61%|██████ | 404/660 [18:54<11:49, 2.77s/it] 61%|██████▏ | 405/660 [18:56<11:47, 2.77s/it] {'loss': 0.2854, 'learning_rate': 1.3726365192022173e-05, 'epoch': 1.84} 
61%|██████▏ | 405/660 [18:56<11:47, 2.77s/it] 62%|██████▏ | 406/660 [18:59<11:43, 2.77s/it] {'loss': 0.2953, 'learning_rate': 1.3633221438949007e-05, 'epoch': 1.85} 62%|██████▏ | 406/660 [18:59<11:43, 2.77s/it] 62%|██████▏ | 407/660 [19:02<11:39, 2.77s/it] {'loss': 0.3359, 'learning_rate': 1.3540231097642273e-05, 'epoch': 1.85} 62%|██████▏ | 407/660 [19:02<11:39, 2.77s/it] 62%|██████▏ | 408/660 [19:05<11:35, 2.76s/it] {'loss': 0.3225, 'learning_rate': 1.3447396408766134e-05, 'epoch': 1.85} 62%|██████▏ | 408/660 [19:05<11:35, 2.76s/it] 62%|██████▏ | 409/660 [19:07<11:32, 2.76s/it] {'loss': 0.3533, 'learning_rate': 1.3354719609234188e-05, 'epoch': 1.86} 62%|██████▏ | 409/660 [19:07<11:32, 2.76s/it] 62%|██████▏ | 410/660 [19:10<11:31, 2.77s/it] {'loss': 0.3054, 'learning_rate': 1.3262202932155602e-05, 'epoch': 1.86} 62%|██████▏ | 410/660 [19:10<11:31, 2.77s/it] 62%|██████▏ | 411/660 [19:13<11:28, 2.77s/it] {'loss': 0.3206, 'learning_rate': 1.3169848606781278e-05, 'epoch': 1.87} 62%|██████▏ | 411/660 [19:13<11:28, 2.77s/it] 62%|██████▏ | 412/660 [19:16<11:25, 2.76s/it] {'loss': 0.3147, 'learning_rate': 1.3077658858450137e-05, 'epoch': 1.87} 62%|██████▏ | 412/660 [19:16<11:25, 2.76s/it] 63%|██████▎ | 413/660 [19:18<11:23, 2.77s/it] {'loss': 0.2908, 'learning_rate': 1.2985635908535543e-05, 'epoch': 1.88} 63%|██████▎ | 413/660 [19:18<11:23, 2.77s/it] 63%|██████▎ | 414/660 [19:21<11:21, 2.77s/it] {'loss': 0.3285, 'learning_rate': 1.2893781974391684e-05, 'epoch': 1.88} 63%|██████▎ | 414/660 [19:21<11:21, 2.77s/it] 63%|██████▎ | 415/660 [19:24<11:18, 2.77s/it] {'loss': 0.3513, 'learning_rate': 1.2802099269300237e-05, 'epoch': 1.89} 63%|██████▎ | 415/660 [19:24<11:18, 2.77s/it] 63%|██████▎ | 416/660 [19:27<11:14, 2.76s/it] {'loss': 0.3912, 'learning_rate': 1.2710590002417008e-05, 'epoch': 1.89} 63%|██████▎ | 416/660 [19:27<11:14, 2.76s/it] 63%|██████▎ | 417/660 [19:30<11:11, 2.76s/it] {'loss': 0.3047, 'learning_rate': 1.2619256378718672e-05, 'epoch': 1.9} 63%|██████▎ | 
417/660 [19:30<11:11, 2.76s/it] 63%|██████▎ | 418/660 [19:32<11:06, 2.75s/it] {'loss': 0.3757, 'learning_rate': 1.2528100598949675e-05, 'epoch': 1.9} 63%|██████▎ | 418/660 [19:32<11:06, 2.75s/it] 63%|██████▎ | 419/660 [19:35<11:04, 2.76s/it] {'loss': 0.3464, 'learning_rate': 1.2437124859569191e-05, 'epoch': 1.9} 63%|██████▎ | 419/660 [19:35<11:04, 2.76s/it] 64%|██████▎ | 420/660 [19:38<11:02, 2.76s/it] {'loss': 0.3347, 'learning_rate': 1.2346331352698206e-05, 'epoch': 1.91} 64%|██████▎ | 420/660 [19:38<11:02, 2.76s/it] 64%|██████▍ | 421/660 [19:41<11:00, 2.76s/it] {'loss': 0.3248, 'learning_rate': 1.225572226606669e-05, 'epoch': 1.91} 64%|██████▍ | 421/660 [19:41<11:00, 2.76s/it] 64%|██████▍ | 422/660 [19:43<10:58, 2.77s/it] {'loss': 0.3179, 'learning_rate': 1.2165299782960882e-05, 'epoch': 1.92} 64%|██████▍ | 422/660 [19:43<10:58, 2.77s/it] 64%|██████▍ | 423/660 [19:46<10:54, 2.76s/it] {'loss': 0.3, 'learning_rate': 1.2075066082170693e-05, 'epoch': 1.92} 64%|██████▍ | 423/660 [19:46<10:54, 2.76s/it] 64%|██████▍ | 424/660 [19:49<10:50, 2.76s/it] {'loss': 0.3674, 'learning_rate': 1.1985023337937185e-05, 'epoch': 1.93} 64%|██████▍ | 424/660 [19:49<10:50, 2.76s/it] 64%|██████▍ | 425/660 [19:52<10:48, 2.76s/it] {'loss': 0.311, 'learning_rate': 1.1895173719900206e-05, 'epoch': 1.93} 64%|██████▍ | 425/660 [19:52<10:48, 2.76s/it] 65%|██████▍ | 426/660 [19:54<10:47, 2.77s/it] {'loss': 0.3199, 'learning_rate': 1.1805519393046092e-05, 'epoch': 1.94} 65%|██████▍ | 426/660 [19:54<10:47, 2.77s/it] 65%|██████▍ | 427/660 [19:57<10:44, 2.77s/it] {'loss': 0.3026, 'learning_rate': 1.1716062517655523e-05, 'epoch': 1.94} 65%|██████▍ | 427/660 [19:57<10:44, 2.77s/it] 65%|██████▍ | 428/660 [20:00<10:43, 2.77s/it] {'loss': 0.3115, 'learning_rate': 1.1626805249251444e-05, 'epoch': 1.95} 65%|██████▍ | 428/660 [20:00<10:43, 2.77s/it] 65%|██████▌ | 429/660 [20:03<10:36, 2.76s/it] {'loss': 0.3867, 'learning_rate': 1.153774973854712e-05, 'epoch': 1.95} 65%|██████▌ | 429/660 [20:03<10:36, 
2.76s/it] 65%|██████▌ | 430/660 [20:05<10:35, 2.76s/it] {'loss': 0.3196, 'learning_rate': 1.1448898131394364e-05, 'epoch': 1.95} 65%|██████▌ | 430/660 [20:05<10:35, 2.76s/it] 65%|██████▌ | 431/660 [20:08<10:31, 2.76s/it] {'loss': 0.3701, 'learning_rate': 1.1360252568731764e-05, 'epoch': 1.96} 65%|██████▌ | 431/660 [20:08<10:31, 2.76s/it] 65%|██████▌ | 432/660 [20:11<10:29, 2.76s/it] {'loss': 0.3184, 'learning_rate': 1.1271815186533156e-05, 'epoch': 1.96} 65%|██████▌ | 432/660 [20:11<10:29, 2.76s/it] 66%|██████▌ | 433/660 [20:14<10:26, 2.76s/it] {'loss': 0.3152, 'learning_rate': 1.1183588115756127e-05, 'epoch': 1.97} 66%|██████▌ | 433/660 [20:14<10:26, 2.76s/it] 66%|██████▌ | 434/660 [20:16<10:24, 2.76s/it] {'loss': 0.3196, 'learning_rate': 1.109557348229064e-05, 'epoch': 1.97} 66%|██████▌ | 434/660 [20:16<10:24, 2.76s/it] 66%|██████▌ | 435/660 [20:19<10:19, 2.75s/it] {'loss': 0.3884, 'learning_rate': 1.1007773406907866e-05, 'epoch': 1.98} 66%|██████▌ | 435/660 [20:19<10:19, 2.75s/it] 66%|██████▌ | 436/660 [20:22<10:20, 2.77s/it] {'loss': 0.2828, 'learning_rate': 1.0920190005209066e-05, 'epoch': 1.98} 66%|██████▌ | 436/660 [20:22<10:20, 2.77s/it] 66%|██████▌ | 437/660 [20:25<10:14, 2.76s/it] {'loss': 0.4233, 'learning_rate': 1.0832825387574571e-05, 'epoch': 1.99} 66%|██████▌ | 437/660 [20:25<10:14, 2.76s/it] 66%|██████▋ | 438/660 [20:28<10:13, 2.76s/it] {'loss': 0.3059, 'learning_rate': 1.0745681659113e-05, 'epoch': 1.99} 66%|██████▋ | 438/660 [20:28<10:13, 2.76s/it] 67%|██████▋ | 439/660 [20:30<10:09, 2.76s/it] {'loss': 0.363, 'learning_rate': 1.0658760919610473e-05, 'epoch': 2.0} 67%|██████▋ | 439/660 [20:30<10:09, 2.76s/it] 67%|██████▋ | 440/660 [20:33<10:20, 2.82s/it] {'loss': 0.2411, 'learning_rate': 1.0572065263480046e-05, 'epoch': 2.0} 67%|██████▋ | 440/660 [20:33<10:20, 2.82s/it]Token indices sequence length is longer than the specified maximum sequence length for this model (2959 > 2048). 
Running this sequence through the model will result in indexing errors [this tokenizer warning repeats for 30 more over-length sequences, ranging from 2709 to 3745 tokens]
Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (3149 > 2048). Running this sequence through the model will result in indexing errors
441/660 [20:37<11:18, 3.10s/it] {'loss': 0.3091, 'learning_rate': 1.0485596779711267e-05, 'epoch': 2.0}
442/660 [20:40<10:54, 3.00s/it] {'loss': 0.2852, 'learning_rate': 1.0399357551819778e-05, 'epoch': 2.01}
443/660 [20:43<10:36, 2.93s/it] {'loss': 0.2843, 'learning_rate': 1.0313349657797178e-05, 'epoch': 2.01}
444/660 [20:45<10:21, 2.88s/it] {'loss': 0.296, 'learning_rate': 1.0227575170060909e-05, 'epoch': 2.02}
445/660 [20:48<10:10, 2.84s/it] {'loss': 0.3171, 'learning_rate': 1.0142036155404322e-05, 'epoch': 2.02}
446/660 [20:51<10:03, 2.82s/it] {'loss': 0.2894, 'learning_rate': 1.0056734674946909e-05, 'epoch': 2.03}
447/660 [20:54<09:57, 2.80s/it] {'loss': 0.3055, 'learning_rate': 9.971672784084621e-06, 'epoch': 2.03}
448/660 [20:56<09:54, 2.80s/it] {'loss': 0.2529, 'learning_rate': 9.886852532440312e-06, 'epoch': 2.04}
449/660 [20:59<09:49, 2.80s/it] {'loss': 0.2761, 'learning_rate': 9.802275963814387e-06, 'epoch': 2.04}
450/660 [21:02<09:47, 2.80s/it] {'loss': 0.2314, 'learning_rate': 9.717945116135568e-06, 'epoch': 2.05}
451/660 [21:05<09:40, 2.78s/it] {'loss': 0.3002, 'learning_rate': 9.633862021411735e-06, 'epoch': 2.05}
452/660 [21:07<09:35, 2.76s/it] {'loss': 0.3577, 'learning_rate': 9.550028705681024e-06, 'epoch': 2.05}
453/660 [21:10<09:30, 2.75s/it] {'loss': 0.3756, 'learning_rate': 9.466447188962964e-06, 'epoch': 2.06}
454/660 [21:13<09:27, 2.76s/it] {'loss': 0.3348, 'learning_rate': 9.383119485209812e-06, 'epoch': 2.06}
455/660 [21:16<09:24, 2.75s/it] {'loss': 0.3055, 'learning_rate': 9.30004760225806e-06, 'epoch': 2.07}
456/660 [21:18<09:23, 2.76s/it] {'loss': 0.2466, 'learning_rate': 9.217233541779995e-06, 'epoch': 2.07}
457/660 [21:21<09:20, 2.76s/it] {'loss': 0.2688, 'learning_rate': 9.134679299235526e-06, 'epoch': 2.08}
458/660 [21:24<09:19, 2.77s/it] {'loss': 0.2847, 'learning_rate': 9.052386863824076e-06, 'epoch': 2.08}
459/660 [21:27<09:16, 2.77s/it] {'loss': 0.2799, 'learning_rate': 8.970358218436614e-06, 'epoch': 2.09}
460/660 [21:30<09:13, 2.77s/it] {'loss': 0.2811, 'learning_rate': 8.888595339607961e-06, 'epoch': 2.09}
461/660 [21:32<09:11, 2.77s/it] {'loss': 0.2593, 'learning_rate': 8.807100197469081e-06, 'epoch': 2.1}
462/660 [21:35<09:06, 2.76s/it] {'loss': 0.3242, 'learning_rate': 8.725874755699664e-06, 'epoch': 2.1}
463/660 [21:38<09:04, 2.76s/it] {'loss': 0.2794, 'learning_rate': 8.644920971480797e-06, 'epoch': 2.1}
464/660 [21:41<09:03, 2.77s/it] {'loss': 0.255, 'learning_rate': 8.564240795447758e-06, 'epoch': 2.11}
465/660 [21:43<08:59, 2.77s/it] {'loss': 0.2946, 'learning_rate': 8.483836171643094e-06, 'epoch': 2.11}
466/660 [21:46<08:56, 2.77s/it] {'loss': 0.2794, 'learning_rate': 8.403709037469729e-06, 'epoch': 2.12}
467/660 [21:49<08:53, 2.76s/it] {'loss': 0.3159, 'learning_rate': 8.323861323644274e-06, 'epoch': 2.12}
468/660 [21:52<08:50, 2.76s/it] {'loss': 0.2756, 'learning_rate': 8.24429495415054e-06, 'epoch': 2.13}
469/660 [21:54<08:48, 2.76s/it] {'loss': 0.2874, 'learning_rate': 8.165011846193147e-06, 'epoch': 2.13}
470/660 [21:57<08:45, 2.77s/it] {'loss': 0.2682, 'learning_rate': 8.086013910151334e-06, 'epoch': 2.14}
471/660 [22:00<08:44, 2.77s/it] {'loss': 0.2745, 'learning_rate': 8.007303049532957e-06, 'epoch': 2.14}
472/660 [22:03<08:40, 2.77s/it] {'loss': 0.3018, 'learning_rate': 7.928881160928572e-06, 'epoch': 2.15}
473/660 [22:06<08:38, 2.77s/it] {'loss': 0.2761, 'learning_rate': 7.850750133965783e-06, 'epoch': 2.15}
474/660 [22:08<08:35, 2.77s/it] {'loss': 0.2791, 'learning_rate': 7.772911851263675e-06, 'epoch': 2.15}
475/660 [22:11<08:31, 2.76s/it] {'loss': 0.2853, 'learning_rate': 7.695368188387466e-06, 'epoch': 2.16}
476/660 [22:14<08:28, 2.76s/it] {'loss': 0.3075, 'learning_rate': 7.618121013803319e-06, 'epoch': 2.16}
477/660 [22:17<08:24, 2.76s/it] {'loss': 0.2869, 'learning_rate': 7.541172188833321e-06, 'epoch': 2.17}
478/660 [22:19<08:21, 2.76s/it] {'loss': 0.2665, 'learning_rate': 7.464523567610613e-06, 'epoch': 2.17}
479/660 [22:22<08:21, 2.77s/it] {'loss': 0.2784, 'learning_rate': 7.388176997034724e-06, 'epoch': 2.18}
480/660 [22:25<08:18, 2.77s/it] {'loss': 0.2744, 'learning_rate': 7.312134316727093e-06, 'epoch': 2.18}
481/660 [22:28<08:15, 2.77s/it] {'loss': 0.2787, 'learning_rate': 7.236397358986696e-06, 'epoch': 2.19}
482/660 [22:30<08:12, 2.76s/it] {'loss': 0.2686, 'learning_rate': 7.16096794874594e-06, 'epoch': 2.19}
483/660 [22:33<08:10, 2.77s/it] {'loss': 0.2643, 'learning_rate': 7.0858479035266595e-06, 'epoch': 2.2}
484/660 [22:36<08:08, 2.78s/it] {'loss': 0.2415, 'learning_rate': 7.01103903339633e-06, 'epoch': 2.2}
485/660 [22:39<08:05, 2.78s/it] {'loss': 0.2799, 'learning_rate': 6.93654314092447e-06, 'epoch': 2.2}
486/660 [22:41<08:02, 2.77s/it] {'loss': 0.2947, 'learning_rate': 6.862362021139175e-06, 'epoch': 2.21}
487/660 [22:44<08:00, 2.78s/it] {'loss': 0.2585, 'learning_rate': 6.788497461483896e-06, 'epoch': 2.21}
488/660 [22:47<07:57, 2.78s/it] {'loss': 0.2589, 'learning_rate': 6.7149512417743725e-06, 'epoch': 2.22}
489/660 [22:50<07:55, 2.78s/it] {'loss': 0.303, 'learning_rate': 6.641725134155681e-06, 'epoch': 2.22}
490/660 [22:53<07:51, 2.77s/it] {'loss': 0.3223, 'learning_rate': 6.568820903059632e-06, 'epoch': 2.23}
491/660 [22:55<07:48, 2.77s/it] {'loss': 0.3098, 'learning_rate': 6.496240305162194e-06, 'epoch': 2.23}
492/660 [22:58<07:45, 2.77s/it] {'loss': 0.2794, 'learning_rate': 6.423985089341165e-06, 'epoch': 2.24}
493/660 [23:01<07:42, 2.77s/it] {'loss': 0.277, 'learning_rate': 6.352056996634064e-06, 'epoch': 2.24}
494/660 [23:04<07:41, 2.78s/it] {'loss': 0.2354, 'learning_rate': 6.280457760196148e-06, 'epoch': 2.25}
495/660 [23:06<07:37, 2.78s/it] {'loss': 0.29, 'learning_rate': 6.209189105258661e-06, 'epoch': 2.25}
496/660 [23:09<07:35, 2.78s/it] {'loss': 0.2793, 'learning_rate': 6.138252749087286e-06, 'epoch': 2.25}
497/660 [23:12<07:31, 2.77s/it] {'loss': 0.2941, 'learning_rate': 6.0676504009407165e-06, 'epoch': 2.26}
498/660 [23:15<07:28, 2.77s/it] {'loss': 0.2695, 'learning_rate': 5.99738376202953e-06, 'epoch': 2.26}
499/660 [23:18<07:26, 2.77s/it] {'loss': 0.2406, 'learning_rate': 5.927454525475147e-06, 'epoch': 2.27}
500/660 [23:20<07:24, 2.78s/it] {'loss': 0.2849, 'learning_rate': 5.857864376269051e-06, 'epoch': 2.27}
501/660 [23:23<07:23, 2.79s/it] {'loss': 0.2434, 'learning_rate': 5.788614991232206e-06, 'epoch': 2.28}
502/660 [23:26<07:20, 2.79s/it] {'loss': 0.271, 'learning_rate': 5.719708038974636e-06, 'epoch': 2.28}
503/660 [23:29<07:18, 2.79s/it] {'loss': 0.2633, 'learning_rate': 5.651145179855204e-06, 'epoch': 2.29}
504/660 [23:32<07:15, 2.79s/it] {'loss': 0.243, 'learning_rate': 5.582928065941624e-06, 'epoch': 2.29}
505/660 [23:34<07:11, 2.79s/it] {'loss': 0.2878, 'learning_rate': 5.515058340970665e-06, 'epoch': 2.3}
506/660 [23:37<07:09, 2.79s/it] {'loss': 0.26, 'learning_rate': 5.447537640308502e-06, 'epoch': 2.3}
507/660 [23:40<07:06, 2.79s/it] {'loss': 0.2491, 'learning_rate': 5.380367590911368e-06, 'epoch': 2.3}
508/660 [23:43<07:02, 2.78s/it] {'loss': 0.2683, 'learning_rate': 5.313549811286294e-06, 'epoch': 2.31}
509/660 [23:45<07:00, 2.79s/it] {'loss': 0.2623, 'learning_rate': 5.247085911452143e-06, 'epoch': 2.31}
510/660 [23:48<06:58, 2.79s/it] {'loss': 0.2666, 'learning_rate': 5.180977492900823e-06, 'epoch': 2.32}
511/660 [23:51<06:53, 2.77s/it] {'loss': 0.3058, 'learning_rate': 5.115226148558661e-06, 'epoch': 2.32}
512/660 [23:54<06:50, 2.77s/it] {'loss': 0.2861, 'learning_rate': 5.049833462748061e-06, 'epoch': 2.33}
513/660 [23:57<06:47, 2.78s/it] {'loss': 0.2865, 'learning_rate': 4.9848010111493205e-06, 'epoch': 2.33}
514/660 [23:59<06:44, 2.77s/it] {'loss': 0.2866, 'learning_rate': 4.9201303607626114e-06, 'epoch': 2.34}
515/660 [24:02<06:42, 2.77s/it] {'loss': 0.2914, 'learning_rate': 4.855823069870309e-06, 'epoch': 2.34}
516/660 [24:05<06:40, 2.78s/it] {'loss': 0.2484, 'learning_rate': 4.791880687999382e-06, 'epoch': 2.35}
517/660 [24:08<06:36, 2.77s/it] {'loss': 0.2729, 'learning_rate': 4.72830475588407e-06, 'epoch': 2.35}
518/660 [24:10<06:32, 2.76s/it] {'loss': 0.2854, 'learning_rate': 4.66509680542877e-06, 'epoch': 2.35}
519/660 [24:13<06:29, 2.76s/it] {'loss': 0.2875, 'learning_rate': 4.602258359671115e-06, 'epoch': 2.36}
520/660 [24:16<06:25, 2.76s/it] {'loss': 0.2925, 'learning_rate': 4.53979093274526e-06, 'epoch': 2.36}
521/660 [24:19<06:22, 2.75s/it] {'loss': 0.3237, 'learning_rate': 4.477696029845447e-06, 'epoch': 2.37}
522/660 [24:21<06:21, 2.76s/it] {'loss': 0.2467, 'learning_rate': 4.415975147189666e-06, 'epoch': 2.37}
523/660 [24:24<06:19, 2.77s/it] {'loss': 0.2562, 'learning_rate': 4.354629771983674e-06, 'epoch': 2.38}
524/660 [24:27<06:16, 2.77s/it] {'loss': 0.2855, 'learning_rate': 4.293661382385106e-06, 'epoch': 2.38}
525/660 [24:30<06:12, 2.76s/it] {'loss': 0.3162, 'learning_rate': 4.233071447467876e-06, 'epoch': 2.39}
526/660 [24:32<06:09, 2.76s/it] {'loss': 0.3232, 'learning_rate': 4.172861427186794e-06, 'epoch': 2.39}
527/660 [24:35<06:07, 2.76s/it] {'loss': 0.2599, 'learning_rate': 4.113032772342373e-06, 'epoch': 2.4}
528/660 [24:38<06:04, 2.76s/it] {'loss': 0.3411, 'learning_rate': 4.05358692454586e-06, 'epoch': 2.4}
529/660 [24:41<06:02, 2.77s/it] {'loss': 0.2874, 'learning_rate': 3.994525316184515e-06, 'epoch': 2.4}
530/660 [24:44<05:58, 2.76s/it] {'loss': 0.3206, 'learning_rate': 3.935849370387104e-06, 'epoch': 2.41}
531/660 [24:46<05:56, 2.76s/it] {'loss': 0.2953, 'learning_rate': 3.877560500989581e-06, 'epoch': 2.41}
532/660 [24:49<05:53, 2.76s/it] {'loss': 0.3098, 'learning_rate': 3.819660112501053e-06, 'epoch': 2.42}
533/660 [24:52<05:51, 2.77s/it] {'loss': 0.2803, 'learning_rate': 3.762149600069909e-06, 'epoch': 2.42}
534/660 [24:55<05:48, 2.77s/it] {'loss': 0.2847, 'learning_rate': 3.7050303494502115e-06, 'epoch': 2.43}
535/660 [24:57<05:47, 2.78s/it] {'loss': 0.2615, 'learning_rate': 3.6483037369683284e-06, 'epoch': 2.43}
536/660 [25:00<05:43, 2.77s/it] {'loss': 0.3345, 'learning_rate': 3.5919711294897285e-06, 'epoch': 2.44}
537/660 [25:03<05:41, 2.77s/it] {'loss': 0.2454, 'learning_rate': 3.5360338843860808e-06, 'epoch': 2.44}
538/660 [25:06<05:38, 2.78s/it] {'loss': 0.2797, 'learning_rate': 3.4804933495025407e-06, 'epoch': 2.45}
539/660 [25:08<05:35, 2.77s/it] {'loss': 0.2883, 'learning_rate': 3.4253508631252406e-06, 'epoch': 2.45}
540/660 [25:11<05:32, 2.77s/it] {'loss': 0.2859, 'learning_rate': 3.3706077539490933e-06, 'epoch': 2.45}
541/660 [25:14<05:29, 2.77s/it] {'loss': 0.2999, 'learning_rate': 3.3162653410457545e-06, 'epoch': 2.46}
542/660 [25:17<05:28, 2.78s/it] {'loss': 0.2426, 'learning_rate': 3.262324933831813e-06, 'epoch': 2.46}
543/660 [25:20<05:25, 2.78s/it] {'loss': 0.2478, 'learning_rate': 3.2087878320372923e-06, 'epoch': 2.47}
544/660 [25:22<05:20, 2.77s/it] {'loss': 0.3087, 'learning_rate': 3.155655325674272e-06, 'epoch': 2.47}
545/660 [25:25<05:18, 2.77s/it] {'loss': 0.2605, 'learning_rate': 3.102928695005858e-06, 'epoch': 2.48}
546/660 [25:28<05:16, 2.77s/it] {'loss': 0.2654, 'learning_rate': 3.0506092105153118e-06, 'epoch': 2.48}
547/660 [25:31<05:13, 2.77s/it] {'loss': 0.295, 'learning_rate': 2.998698132875422e-06, 'epoch': 2.49}
548/660 [25:33<05:10, 2.77s/it] {'loss': 0.2882, 'learning_rate': 2.947196712918157e-06, 'epoch': 2.49}
549/660 [25:36<05:07, 2.77s/it] {'loss': 0.2771, 'learning_rate': 2.8961061916044997e-06, 'epoch': 2.5}
550/660 [25:39<05:05, 2.77s/it] {'loss': 0.2568, 'learning_rate': 2.8454277999945603e-06, 'epoch': 2.5}
551/660 [25:42<05:02, 2.77s/it] {'loss': 0.2961, 'learning_rate': 2.7951627592179133e-06, 'epoch': 2.5}
552/660 [25:45<04:59, 2.78s/it] {'loss': 0.2731, 'learning_rate': 2.7453122804441636e-06, 'epoch': 2.51}
553/660 [25:47<04:57, 2.78s/it] {'loss': 0.2705, 'learning_rate': 2.6958775648537794e-06, 'epoch': 2.51}
554/660 [25:50<04:53, 2.77s/it] {'loss': 0.2974, 'learning_rate': 2.646859803609123e-06, 'epoch': 2.52}
555/660 [25:53<04:51, 2.78s/it] {'loss': 0.2506, 'learning_rate': 2.5982601778257733e-06, 'epoch': 2.52}
556/660 [25:56<04:48, 2.77s/it] {'loss': 0.293, 'learning_rate': 2.550079858544057e-06, 'epoch': 2.53}
557/660 [25:58<04:45, 2.77s/it] {'loss': 0.254, 'learning_rate': 2.5023200067008356e-06, 'epoch': 2.53}
558/660 [26:01<04:42, 2.77s/it] {'loss': 0.2372, 'learning_rate': 2.454981773101519e-06, 'epoch': 2.54}
559/660 [26:04<04:40, 2.77s/it] {'loss': 0.3098, 'learning_rate': 2.408066298392342e-06, 'epoch': 2.54}
560/660 [26:07<04:37, 2.78s/it] {'loss': 0.2759, 'learning_rate': 2.3615747130329013e-06, 'epoch': 2.55}
561/660 [26:09<04:34, 2.77s/it] {'loss': 0.301, 'learning_rate': 2.315508137268876e-06, 'epoch': 2.55}
562/660 [26:12<04:31, 2.77s/it] {'loss': 0.2676, 'learning_rate': 2.2698676811050736e-06, 'epoch': 2.55}
563/660 [26:15<04:29, 2.77s/it] {'loss': 0.2759, 'learning_rate': 2.2246544442786535e-06, 'epoch': 2.56}
564/660 [26:18<04:26, 2.78s/it] {'loss': 0.2483, 'learning_rate': 2.1798695162326444e-06, 'epoch': 2.56}
565/660 [26:21<04:23, 2.77s/it] {'loss': 0.2769, 'learning_rate': 2.1355139760896957e-06, 'epoch': 2.57}
566/660 [26:23<04:20, 2.77s/it] {'loss': 0.3066, 'learning_rate': 2.091588892626062e-06, 'epoch': 2.57}
567/660 [26:26<04:17, 2.77s/it] {'loss': 0.2819, 'learning_rate': 2.04809532424586e-06, 'epoch': 2.58}
568/660 [26:29<04:14, 2.77s/it] {'loss': 0.3088, 'learning_rate': 2.0050343189555743e-06, 'epoch': 2.58}
569/660 [26:32<04:12, 2.77s/it] {'loss': 0.3223, 'learning_rate': 1.9624069143387682e-06, 'epoch': 2.59}
570/660 [26:34<04:09, 2.77s/it] {'loss': 0.3167, 'learning_rate': 1.9202141375311335e-06, 'epoch': 2.59}
571/660 [26:37<04:06, 2.77s/it] {'loss': 0.2642, 'learning_rate': 1.8784570051957062e-06, 'epoch': 2.6}
572/660 [26:40<04:04, 2.77s/it] {'loss': 0.2697, 'learning_rate': 1.837136523498373e-06, 'epoch': 2.6}
573/660 [26:43<04:01, 2.77s/it] {'loss': 0.2803, 'learning_rate': 1.7962536880836468e-06, 'epoch': 2.6}
574/660 [26:46<03:58, 2.77s/it] {'loss': 0.2827, 'learning_rate': 1.7558094840506478e-06, 'epoch': 2.61}
575/660 [26:48<03:56, 2.78s/it] {'loss': 0.2955, 'learning_rate': 1.7158048859293863e-06, 'epoch': 2.61}
576/660 [26:51<03:53, 2.78s/it] {'loss': 0.272, 'learning_rate': 1.676240857657283e-06, 'epoch': 2.62}
577/660 [26:54<03:51, 2.79s/it] {'loss': 0.2339, 'learning_rate': 1.637118352555922e-06, 'epoch': 2.62}
578/660 [26:57<03:48, 2.78s/it] {'loss': 0.2882, 'learning_rate': 1.5984383133081038e-06, 'epoch': 2.63}
579/660 [26:59<03:45, 2.78s/it] {'loss': 0.2825, 'learning_rate': 1.560201671935113e-06, 'epoch': 2.63}
580/660 [27:02<03:42, 2.78s/it] {'loss': 0.2831, 'learning_rate': 1.5224093497742654e-06, 'epoch': 2.64}
581/660 [27:05<03:39, 2.78s/it] {'loss': 0.2837, 'learning_rate': 1.4850622574567197e-06, 'epoch': 2.64}
582/660 [27:08<03:36, 2.78s/it] {'loss': 0.265, 'learning_rate': 1.4481612948855195e-06, 'epoch': 2.65}
583/660 [27:11<03:33, 2.77s/it] {'loss': 0.2871, 'learning_rate': 1.4117073512139134e-06, 'epoch': 2.65}
584/660 [27:13<03:30, 2.77s/it] {'loss': 0.2632, 'learning_rate': 1.3757013048239287e-06, 'epoch': 2.65}
585/660 [27:16<03:28, 2.78s/it] {'loss': 0.2576, 'learning_rate': 1.3401440233052233e-06, 'epoch': 2.66}
586/660 [27:19<03:25, 2.77s/it] {'loss': 0.2831, 'learning_rate': 1.3050363634341513e-06, 'epoch': 2.66}
587/660 [27:22<03:22, 2.78s/it] {'loss': 0.2596, 'learning_rate': 1.2703791711531466e-06, 'epoch': 2.67}
588/660 [27:24<03:19, 2.77s/it] {'loss': 0.2789, 'learning_rate': 1.236173281550319e-06, 'epoch': 2.67}
589/660 [27:27<03:17, 2.78s/it] {'loss': 0.2772, 'learning_rate': 1.2024195188393395e-06, 'epoch': 2.68}
590/660 [27:30<03:14, 2.77s/it] {'loss': 0.3138, 'learning_rate': 1.1691186963395861e-06, 'epoch': 2.68}
591/660 [27:33<03:11, 2.77s/it] {'loss': 0.3206, 'learning_rate': 1.1362716164565346e-06, 'epoch': 2.69}
592/660 [27:36<03:08, 2.77s/it] {'loss': 0.276, 'learning_rate': 1.103879070662439e-06, 'epoch': 2.69}
593/660 [27:38<03:05, 2.77s/it] {'loss': 0.3033, 'learning_rate': 1.0719418394772485e-06, 'epoch': 2.7}
594/660 [27:41<03:03, 2.78s/it] {'loss': 0.2861, 'learning_rate': 1.040460692449794e-06, 'epoch': 2.7}
595/660 [27:44<03:00, 2.77s/it] {'loss': 0.2509, 'learning_rate': 1.0094363881392665e-06, 'epoch': 2.7}
596/660 [27:47<02:57, 2.78s/it] {'loss': 0.2758, 'learning_rate': 9.788696740969295e-07, 'epoch': 2.71}
597/660 [27:49<02:54, 2.78s/it] {'loss': 0.2804, 'learning_rate': 9.487612868480945e-07, 'epoch': 2.71}
598/660 [27:52<02:51, 2.77s/it] {'loss': 0.3339, 'learning_rate': 9.191119518743919e-07, 'epoch': 2.72}
599/660 [27:55<02:49, 2.77s/it] {'loss': 0.2886, 'learning_rate': 8.899223835962778e-07, 'epoch': 2.72}
600/660 [27:58<02:45, 2.76s/it] {'loss': 0.291, 'learning_rate': 8.611932853558236e-07, 'epoch': 2.73}
601/660 [28:00<02:43, 2.77s/it] {'loss': 0.2487, 'learning_rate': 8.329253493997736e-07, 'epoch': 2.73}
602/660 [28:03<02:40, 2.76s/it] {'loss': 0.2677, 'learning_rate': 8.051192568628518e-07, 'epoch': 2.74}
603/660 [28:06<02:37, 2.76s/it] {'loss': 0.2979, 'learning_rate': 7.77775677751369e-07, 'epoch': 2.74}
604/660 [28:09<02:34, 2.77s/it] {'loss': 0.2582, 'learning_rate': 7.508952709270567e-07, 'epoch': 2.75}
605/660 [28:12<02:32, 2.77s/it] {'loss': 0.282, 'learning_rate': 7.244786840912033e-07, 'epoch': 2.75}
606/660 [28:14<02:29, 2.77s/it] {'loss': 0.2601, 'learning_rate': 6.985265537690522e-07, 'epoch': 2.75}
92%|█████████▏| 607/660 [28:17<02:26, 2.77s/it] {'loss': 0.2927, 'learning_rate': 6.730395052944549e-07, 'epoch': 2.76} 92%|█████████▏| 607/660 [28:17<02:26, 2.77s/it] 92%|█████████▏| 608/660 [28:20<02:23, 2.77s/it] {'loss': 0.287, 'learning_rate': 6.480181527948049e-07, 'epoch': 2.76} 92%|█████████▏| 608/660 [28:20<02:23, 2.77s/it] 92%|█████████▏| 609/660 [28:23<02:21, 2.77s/it] {'loss': 0.2721, 'learning_rate': 6.234630991762403e-07, 'epoch': 2.77} 92%|█████████▏| 609/660 [28:23<02:21, 2.77s/it] 92%|█████████▏| 610/660 [28:25<02:18, 2.77s/it] {'loss': 0.3389, 'learning_rate': 5.993749361091206e-07, 'epoch': 2.77} 92%|█████████▏| 610/660 [28:25<02:18, 2.77s/it] 93%|█████████▎| 611/660 [28:28<02:15, 2.77s/it] {'loss': 0.2916, 'learning_rate': 5.757542440137643e-07, 'epoch': 2.78} 93%|█████████▎| 611/660 [28:28<02:15, 2.77s/it] 93%|█████████▎| 612/660 [28:31<02:12, 2.77s/it] {'loss': 0.2896, 'learning_rate': 5.526015920464689e-07, 'epoch': 2.78} 93%|█████████▎| 612/660 [28:31<02:12, 2.77s/it] 93%|█████████▎| 613/660 [28:34<02:09, 2.76s/it] {'loss': 0.3224, 'learning_rate': 5.299175380857891e-07, 'epoch': 2.79} 93%|█████████▎| 613/660 [28:34<02:09, 2.76s/it] 93%|█████████▎| 614/660 [28:36<02:07, 2.77s/it] {'loss': 0.2533, 'learning_rate': 5.077026287191e-07, 'epoch': 2.79} 93%|█████████▎| 614/660 [28:36<02:07, 2.77s/it] 93%|█████████▎| 615/660 [28:39<02:04, 2.77s/it] {'loss': 0.2982, 'learning_rate': 4.859573992294309e-07, 'epoch': 2.8} 93%|█████████▎| 615/660 [28:39<02:04, 2.77s/it] 93%|█████████▎| 616/660 [28:42<02:01, 2.76s/it] {'loss': 0.3191, 'learning_rate': 4.646823735825523e-07, 'epoch': 2.8} 93%|█████████▎| 616/660 [28:42<02:01, 2.76s/it] 93%|█████████▎| 617/660 [28:45<01:58, 2.76s/it] {'loss': 0.2998, 'learning_rate': 4.43878064414367e-07, 'epoch': 2.8} 93%|█████████▎| 617/660 [28:45<01:58, 2.76s/it] 94%|█████████▎| 618/660 [28:47<01:56, 2.77s/it] {'loss': 0.2758, 'learning_rate': 4.235449730185548e-07, 'epoch': 2.81} 94%|█████████▎| 618/660 [28:47<01:56, 
 94%|█████████▍| 619/660 [28:50<01:53, 2.76s/it] {'loss': 0.2935, 'learning_rate': 4.036835893344759e-07, 'epoch': 2.81}
 94%|█████████▍| 620/660 [28:53<01:50, 2.77s/it] {'loss': 0.2649, 'learning_rate': 3.842943919353914e-07, 'epoch': 2.82}
 94%|█████████▍| 621/660 [28:56<01:48, 2.77s/it] {'loss': 0.2825, 'learning_rate': 3.6537784801691677e-07, 'epoch': 2.82}
 94%|█████████▍| 622/660 [28:59<01:45, 2.77s/it] {'loss': 0.3108, 'learning_rate': 3.469344133857644e-07, 'epoch': 2.83}
 94%|█████████▍| 623/660 [29:01<01:42, 2.76s/it] {'loss': 0.3005, 'learning_rate': 3.2896453244877005e-07, 'epoch': 2.83}
 95%|█████████▍| 624/660 [29:04<01:39, 2.77s/it] {'loss': 0.2614, 'learning_rate': 3.114686382021681e-07, 'epoch': 2.84}
 95%|█████████▍| 625/660 [29:07<01:36, 2.77s/it] {'loss': 0.2899, 'learning_rate': 2.944471522211756e-07, 'epoch': 2.84}
 95%|█████████▍| 626/660 [29:10<01:34, 2.77s/it] {'loss': 0.2552, 'learning_rate': 2.7790048464982677e-07, 'epoch': 2.85}
 95%|█████████▌| 627/660 [29:12<01:31, 2.78s/it] {'loss': 0.2657, 'learning_rate': 2.6182903419108343e-07, 'epoch': 2.85}
 95%|█████████▌| 628/660 [29:15<01:28, 2.77s/it] {'loss': 0.2708, 'learning_rate': 2.462331880972468e-07, 'epoch': 2.85}
 95%|█████████▌| 629/660 [29:18<01:25, 2.77s/it] {'loss': 0.2538, 'learning_rate': 2.31113322160601e-07, 'epoch': 2.86}
 95%|█████████▌| 630/660 [29:21<01:23, 2.78s/it] {'loss': 0.2571, 'learning_rate': 2.1646980070437973e-07, 'epoch': 2.86}
 96%|█████████▌| 631/660 [29:24<01:20, 2.77s/it] {'loss': 0.3091, 'learning_rate': 2.0230297657398034e-07, 'epoch': 2.87}
 96%|█████████▌| 632/660 [29:26<01:17, 2.78s/it] {'loss': 0.2784, 'learning_rate': 1.88613191128455e-07, 'epoch': 2.87}
 96%|█████████▌| 633/660 [29:29<01:14, 2.77s/it] {'loss': 0.3043, 'learning_rate': 1.7540077423229495e-07, 'epoch': 2.88}
 96%|█████████▌| 634/660 [29:32<01:12, 2.77s/it] {'loss': 0.286, 'learning_rate': 1.6266604424747921e-07, 'epoch': 2.88}
 96%|█████████▌| 635/660 [29:35<01:09, 2.77s/it] {'loss': 0.2731, 'learning_rate': 1.5040930802580066e-07, 'epoch': 2.89}
 96%|█████████▋| 636/660 [29:37<01:06, 2.77s/it] {'loss': 0.3518, 'learning_rate': 1.3863086090147415e-07, 'epoch': 2.89}
 97%|█████████▋| 637/660 [29:40<01:03, 2.77s/it] {'loss': 0.2872, 'learning_rate': 1.2733098668402444e-07, 'epoch': 2.9}
 97%|█████████▋| 638/660 [29:43<01:00, 2.76s/it] {'loss': 0.2607, 'learning_rate': 1.1650995765143613e-07, 'epoch': 2.9}
 97%|█████████▋| 639/660 [29:46<00:58, 2.76s/it] {'loss': 0.2729, 'learning_rate': 1.0616803454361001e-07, 'epoch': 2.9}
 97%|█████████▋| 640/660 [29:48<00:55, 2.76s/it] {'loss': 0.2932, 'learning_rate': 9.630546655606365e-08, 'epoch': 2.91}
 97%|█████████▋| 641/660 [29:51<00:52, 2.76s/it] {'loss': 0.3029, 'learning_rate': 8.692249133393394e-08, 'epoch': 2.91}
 97%|█████████▋| 642/660 [29:54<00:49, 2.77s/it] {'loss': 0.2488, 'learning_rate': 7.801933496625724e-08, 'epoch': 2.92}
 97%|█████████▋| 643/660 [29:57<00:47, 2.78s/it] {'loss': 0.2458, 'learning_rate': 6.959621198050715e-08, 'epoch': 2.92}
 98%|█████████▊| 644/660 [30:00<00:44, 2.78s/it] {'loss': 0.2844, 'learning_rate': 6.165332533744072e-08, 'epoch': 2.93}
 98%|█████████▊| 645/660 [30:02<00:41, 2.78s/it] {'loss': 0.2549, 'learning_rate': 5.4190866426195866e-08, 'epoch': 2.93}
 98%|█████████▊| 646/660 [30:05<00:38, 2.78s/it] {'loss': 0.288, 'learning_rate': 4.7209015059686e-08, 'epoch': 2.94}
 98%|█████████▊| 647/660 [30:08<00:36, 2.78s/it] {'loss': 0.2609, 'learning_rate': 4.0707939470268073e-08, 'epoch': 2.94}
 98%|█████████▊| 648/660 [30:11<00:33, 2.78s/it] {'loss': 0.2925, 'learning_rate': 3.468779630568353e-08, 'epoch': 2.95}
 98%|█████████▊| 649/660 [30:13<00:30, 2.77s/it] {'loss': 0.3137, 'learning_rate': 2.9148730625285782e-08, 'epoch': 2.95}
 98%|█████████▊| 650/660 [30:16<00:27, 2.77s/it] {'loss': 0.2715, 'learning_rate': 2.4090875896551903e-08, 'epoch': 2.95}
 99%|█████████▊| 651/660 [30:19<00:25, 2.78s/it] {'loss': 0.2378, 'learning_rate': 1.9514353991856307e-08, 'epoch': 2.96}
 99%|█████████▉| 652/660 [30:22<00:22, 2.79s/it] {'loss': 0.2437, 'learning_rate': 1.541927518554198e-08, 'epoch': 2.96}
 99%|█████████▉| 653/660 [30:25<00:19, 2.78s/it] {'loss': 0.2534, 'learning_rate': 1.1805738151253743e-08, 'epoch': 2.97}
 99%|█████████▉| 654/660 [30:27<00:16, 2.78s/it] {'loss': 0.29, 'learning_rate': 8.673829959575664e-09, 'epoch': 2.97}
 99%|█████████▉| 655/660 [30:30<00:13, 2.79s/it] {'loss': 0.2528, 'learning_rate': 6.023626075915001e-09, 'epoch': 2.98}
 99%|█████████▉| 656/660 [30:33<00:11, 2.78s/it] {'loss': 0.275, 'learning_rate': 3.855190358703631e-09, 'epoch': 2.98}
100%|█████████▉| 657/660 [30:36<00:08, 2.77s/it] {'loss': 0.2783, 'learning_rate': 2.168575057839295e-09, 'epoch': 2.99}
100%|█████████▉| 658/660 [30:38<00:05, 2.78s/it] {'loss': 0.3087, 'learning_rate': 9.638208134399306e-10, 'epoch': 2.99}
100%|█████████▉| 659/660 [30:41<00:02, 2.77s/it] {'loss': 0.2959, 'learning_rate': 2.409566548622344e-10, 'epoch': 3.0}
100%|██████████| 660/660 [30:44<00:00, 2.81s/it] {'loss': 0.281, 'learning_rate': 0.0, 'epoch': 3.0}
{'train_runtime': 1846.7187, 'train_samples_per_second': 45.554, 'train_steps_per_second': 0.357, 'train_loss': 0.35311297792376894, 'epoch': 3.0}
100%|██████████| 660/660 [30:44<00:00, 2.79s/it]
[2024-03-15 13:13:39,953] [INFO] [launch.py:347:main] Process 17885 exits successfully.
[2024-03-15 13:13:39,953] [INFO] [launch.py:347:main] Process 17883 exits successfully.
[2024-03-15 13:13:40,955] [INFO] [launch.py:347:main] Process 17875 exits successfully.
[2024-03-15 13:13:40,955] [INFO] [launch.py:347:main] Process 17881 exits successfully.
[2024-03-15 13:13:40,955] [INFO] [launch.py:347:main] Process 17874 exits successfully.
[2024-03-15 13:13:40,955] [INFO] [launch.py:347:main] Process 17877 exits successfully.
[2024-03-15 13:13:40,955] [INFO] [launch.py:347:main] Process 17879 exits successfully.
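The logged `learning_rate` values above decay smoothly to exactly 0.0 at step 660, consistent with the run's `--lr_scheduler_type cosine`, `--learning_rate 4e-5`, and `--warmup_ratio 0.03` over 660 total steps. A minimal sketch of that schedule, assuming HF-Trainer-style linear warmup over `ceil(0.03 * 660) = 20` steps followed by a half-cosine decay (the standard `get_cosine_schedule_with_warmup` shape):

```python
import math

BASE_LR = 4e-5                                 # --learning_rate from the launch command
TOTAL_STEPS = 660                              # total optimizer steps in this run
WARMUP_STEPS = math.ceil(0.03 * TOTAL_STEPS)   # --warmup_ratio 0.03 -> 20 steps

def cosine_lr(step: int) -> float:
    """Learning rate after `step` steps: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Reproduces the values logged above, e.g. ~2.4096e-10 at step 659 and 0.0 at step 660.
print(cosine_lr(659), cosine_lr(660))
```

The match at steps 650, 659, and 660 confirms the scheduler ran with 20 warmup steps rather than, say, a floor of the ratio.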
wandb: 0.068 MB of 0.068 MB uploaded
wandb:
wandb: Run history:
wandb:                    train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:              train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:            train/learning_rate ▄███████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:                     train/loss █▂▃▃▃▂▃▂▃▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▂▁▂▁▁▂▁▁
wandb:               train/total_flos ▁
wandb:               train/train_loss ▁
wandb:            train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb:   train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb:                    train/epoch 3.0
wandb:              train/global_step 660
wandb:            train/learning_rate 0.0
wandb:                     train/loss 0.281
wandb:               train/total_flos 466658934128640.0
wandb:               train/train_loss 0.35311
wandb:            train/train_runtime 1846.7187
wandb: train/train_samples_per_second 45.554
wandb:   train/train_steps_per_second 0.357
wandb:
wandb: 🚀 View run solar-oath-104 at: https://wandb.ai/smellslikeml/huggingface/runs/mgav6nep
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240315_124251-mgav6nep/logs
[2024-03-15 13:14:09,987] [INFO] [launch.py:347:main] Process 17873 exits successfully.
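The summary numbers are internally consistent with the launch configuration (16 per-device batch x 8 GPUs x gradient_accumulation_steps 1 = global batch 128, 3 epochs). A quick sanity-check sketch, with all constants copied from this log; the implied dataset size (~28k examples) is an inference, not a logged value:

```python
# Cross-check of the Trainer summary above.
train_runtime = 1846.7187      # seconds ('train_runtime')
total_steps = 660              # final global_step
samples_per_sec = 45.554       # 'train_samples_per_second'
epochs = 3
global_batch = 16 * 8 * 1      # per-device batch * world size * grad-accum steps

steps_per_sec = total_steps / train_runtime       # rounds to the logged 0.357
total_samples = samples_per_sec * train_runtime   # samples seen across all epochs
dataset_size = total_samples / epochs             # ~28k examples per epoch

# ~28k examples at a global batch of 128 gives ~220 steps/epoch, i.e. 660 steps
# over 3 epochs; samples/s sits slightly below steps/s * 128 because the final
# batch of each epoch is partial.
print(round(steps_per_sec, 3), round(dataset_size))
```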