/usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
warnings.warn(
/usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
warnings.warn(
/usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
warnings.warn(
[2024-03-10 11:11:23,156] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,396] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,422] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,477] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:23,477] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2024-03-10 11:11:23,532] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,718] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:23,742] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:23,745] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,848] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,853] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:23,865] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:23,969] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-10 11:11:24,046] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:24,153] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:24,187] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-03-10 11:11:24,302] [INFO] [comm.py:637:init_distributed] cdb=None
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU.
Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00, 2.70s/it]
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 
'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 
'model.video_tower.video_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 
'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 
'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.post_layernorm.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 
'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:08<00:02, 2.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:09<00:03, 3.11s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:09<00:03, 3.24s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:08<00:03, 3.02s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:09<00:03, 3.17s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:08<00:03, 3.06s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:09<00:03, 3.11s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 1.96s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.32s/it]
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 
'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 
'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.post_layernorm.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 
'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 
Loading checkpoint shards: 100%|██████████| 4/4 [00:08<00:00, 1.90s/it]
'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight',
Loading checkpoint shards: 100%|██████████| 4/4 [00:08<00:00, 2.19s/it]
'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight',
'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias',
'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight',
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.image_tower.image_tower.embeddings.class_embedding', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight',
'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.post_layernorm.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 
'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.post_layernorm.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 
'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias',
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.01s/it]
'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.image_tower.image_tower
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.39s/it]
3.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 
'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.vi.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias',
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 1.97s/it]
'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.enc.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 
'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.image_towoder.layers.1.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight',
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.34s/it]
'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.b
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 1.94s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:08<00:00, 1.91s/it]
er.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight',
'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
ias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias',
'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_t
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.28s/it]
model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', ower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.embeddings.class_embedding', 
'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.te
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 
'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 
'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 
'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 
'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 
'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.image_tower.image_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 
Loading checkpoint shards: 100%|██████████| 4/4 [00:08<00:00, 2.23s/it]
'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight',
'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias',
'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.video_tower.video_tower.post_layernorm.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias',
'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.e
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.image_tower.image_tower.pre_layrnorm.weight',
'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attmporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
n.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 
'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 
'model.video_tower.video_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.image_tower
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight',
'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.image_tower.image_tower.post_layernorm.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 
'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 
'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 
'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.post_layernorm.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.05s/it]
ncoder.layers.8.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 
'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 
'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 
'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 
'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00, 2.36s/it]
.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight',
'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight',
'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 
'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 
'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 
'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 
'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 
'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_p
Some weights of the model checkpoint at /mnt/bn/liangkeg/ruohongz/save/gpt4v_video_alignment/Video-LLaVA-Finetune-frames-image_600k-videocaption_300k-from_pretrain were not used when initializing LlavaLlamaForCausalLM: ['model.video_tower.video_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 
'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.3.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.0.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.weight', 
'model.video_tower.video_tower.encoder.layers.7.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.weight', 'model.image_tower.image_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.bias', 
'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.13.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.q_proj.weight', 
'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.image_tower.image_tower.pre_layrnorm.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.embeddings.position_embedding.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.image_tower.image_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 
'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.16.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.bias', 
'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm1.weight', 'model.image_tower.image_tower.embeddings.position_embedding.weight', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.9.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.weight', 
'model.video_tower.video_tower.pre_layrnorm.bias', 'model.video_tower.video_tower.encoder.layers.4.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.9.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.bias', 'model.image_tower.image_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.0.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.bias', 
'model.video_tower.video_tower.encoder.layers.12.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.17.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.bias', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.v_proj.weight', 
'model.image_tower.image_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc1.weight', 'model.image_tower.image_tower.post_layernorm.weight', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.5.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.1.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.8.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.20.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 
'model.video_tower.video_tower.encoder.layers.9.layer_norm1.bias', 'model.video_tower.video_tower.post_layernorm.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.0.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.15.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 
'model.image_tower.image_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.v_proj.weight', 'model.video_tower.video_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.18.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.7.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.k_proj.weight', 
'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.6.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.out_proj.weight', 
'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.0.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.0.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.22.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.19.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.1.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.3.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.1.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.v_proj.bias', 
'model.image_tower.image_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.1.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.2.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.1.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.21.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.10.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.mlp.fc2.weight', 
'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.1.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.12.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.bias']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
roj.weight', 'model.video_tower.video_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.12.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.19.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.21.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.k_proj.bias', 
'model.video_tower.video_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.22.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.12.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.15.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.3.layer_norm2.bias', 
'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.12.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.3.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.23.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.19.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.8.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.16.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.temporal_layer_norm1.bias', 
'model.image_tower.image_tower.encoder.layers.1.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.9.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.9.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.6.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.23.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.14.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.10.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.14.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.10.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.bias', 'model.video_tower.video_tower.embeddings.patch_embedding.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.7.layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.22.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.12.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.pre_layrnorm.weight', 'model.image_tower.image_tower.embeddings.class_embedding', 'model.video_tower.video_tower.encoder.layers.10.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.15.mlp.fc2.bias', 
'model.video_tower.video_tower.encoder.layers.9.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.14.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.11.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.0.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.16.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.layer_norm1.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.17.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.18.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.20.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.7.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.2.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.22.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.23.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.22.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.11.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.16.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.15.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.10.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.10.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.17.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.2.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.14.layer_norm1.bias', 
'model.video_tower.video_tower.encoder.layers.0.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.self_attn.v_proj.weight', 'model.image_tower.image_tower.encoder.layers.9.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.6.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.0.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.10.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.3.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.3.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.23.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.5.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.2.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.post_layernorm.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.4.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.21.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.19.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.18.self_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.mlp.fc1.weight', 
'model.image_tower.image_tower.encoder.layers.13.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.8.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.15.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.2.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.8.mlp.fc1.bias', 'model.image_tower.image_tower.encoder.layers.23.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.15.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.6.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.13.self_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.16.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.16.temporal_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.4.self_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.temporal_attn.out_proj.bias', 'model.image_tower.image_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.20.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.11.self_attn.v_proj.bias', 
'model.video_tower.video_tower.encoder.layers.12.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.5.temporal_embedding', 'model.image_tower.image_tower.encoder.layers.11.layer_norm1.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.22.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.10.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.18.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.12.mlp.fc1.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.7.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.21.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.1.temporal_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.5.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.13.temporal_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.4.mlp.fc1.weight', 'model.video_tower.video_tower.encoder.layers.23.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.13.self_attn.k_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.self_attn.out_proj.weight', 'model.image_tower.image_tower.encoder.layers.2.mlp.fc2.weight', 'model.video_tower.video_tower.encoder.layers.19.temporal_attn.out_proj.weight', 'model.video_tower.video_tower.encoder.layers.18.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.11.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.16.layer_norm2.bias', 'model.image_tower.image_tower.post_layernorm.weight', 'model.image_tower.image_tower.encoder.layers.22.layer_norm2.weight', 'model.image_tower.image_tower.encoder.layers.6.mlp.fc2.bias', 'model.image_tower.image_tower.encoder.layers.11.mlp.fc2.bias', 
'model.image_tower.image_tower.encoder.layers.12.self_attn.k_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.v_proj.bias', 'model.image_tower.image_tower.encoder.layers.7.self_attn.k_proj.bias', 'model.video_tower.video_tower.encoder.layers.12.temporal_embedding', 'model.video_tower.video_tower.encoder.layers.13.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.mlp.fc2.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm2.bias', 'model.video_tower.video_tower.encoder.layers.3.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.20.self_attn.q_proj.weight', 'model.video_tower.video_tower.encoder.layers.4.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.19.self_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.0.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.14.temporal_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.2.layer_norm2.weight', 'model.video_tower.video_tower.encoder.layers.5.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.9.temporal_attn.q_proj.bias', 'model.video_tower.video_tower.encoder.layers.8.temporal_attn.v_proj.weight', 'model.video_tower.video_tower.encoder.layers.20.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.11.temporal_attn.q_proj.bias', 'model.image_tower.image_tower.encoder.layers.17.mlp.fc1.weight', 'model.image_tower.image_tower.encoder.layers.11.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.9.self_attn.k_proj.weight', 'model.video_tower.video_tower.encoder.layers.21.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.18.temporal_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.13.self_attn.q_proj.weight', 'model.image_tower.image_tower.encoder.layers.6.self_attn.q_proj.weight', 
'model.video_tower.video_tower.encoder.layers.1.temporal_layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.20.mlp.fc2.weight', 'model.image_tower.image_tower.encoder.layers.21.self_attn.v_proj.bias', 'model.video_tower.video_tower.encoder.layers.15.temporal_layer_norm1.weight', 'model.video_tower.video_tower.encoder.layers.7.layer_norm2.bias', 'model.image_tower.image_tower.encoder.layers.11.self_attn.out_proj.bias', 'model.video_tower.video_tower.encoder.layers.5.layer_norm1.bias', 'model.video_tower.video_tower.encoder.layers.16.self_attn.q_proj.weight'] - This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 
[I 240310 11:12:06 train:701] data list length: 355000 [I 240310 11:12:06 train:701] data list length: 355000 [I 240310 11:12:06 train:701] data list length: 355000 [I 240310 11:12:06 train:701] data list length: 355000 Formatting inputs...Skip in lazy mode [I 240310 11:12:06 train:701] data list length: 355000 [I 240310 11:12:06 train:701] data list length: 355000 [I 240310 11:12:06 train:701] data list length: 355000 [I 240310 11:12:06 train:701] data list length: 355000 /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead 
of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") 2024-03-10 11:12:22.627 n193-018-074:2301448:2301448 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.627 n193-018-074:2301448:2301448 [0] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.635 n193-018-074:2301448:2301448 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.654 n193-018-074:2301449:2301449 [1] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.654 n193-018-074:2301449:2301449 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.654 n193-018-074:2301449:2301449 [1] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.661 n193-018-074:2301451:2301451 [3] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.661 n193-018-074:2301451:2301451 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.662 n193-018-074:2301451:2301451 [3] NCCL INFO Bootstrap : Using 
eth0:10.193.18.74<0> 2024-03-10 11:12:22.668 n193-018-074:2301452:2301452 [4] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.669 n193-018-074:2301452:2301452 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.669 n193-018-074:2301452:2301452 [4] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.671 n193-018-074:2301455:2301455 [7] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.671 n193-018-074:2301455:2301455 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.671 n193-018-074:2301455:2301455 [7] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.673 n193-018-074:2301449:2301449 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.677 n193-018-074:2301448:2301448 [0] NCCL INFO cudaDriverVersion 12010 NCCL version 2.19.3+cuda12.1 2024-03-10 11:12:22.680 n193-018-074:2301451:2301451 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.685 n193-018-074:2301452:2301452 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.689 n193-018-074:2301455:2301455 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.700 n193-018-074:2301449:2302318 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.709 n193-018-074:2301448:2302319 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.709 n193-018-074:2301451:2302320 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-10 11:12:22.713 n193-018-074:2301452:2302321 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.716 n193-018-074:2301455:2302322 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:22.744 n193-018-074:2301449:2302318 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.745 n193-018-074:2301449:2302318 [1] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.758 n193-018-074:2301449:2302318 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.758 n193-018-074:2301449:2302318 [1] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.758 n193-018-074:2301449:2302318 [1] NCCL INFO Using network IB 2024-03-10 11:12:22.770 n193-018-074:2301455:2302322 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.772 n193-018-074:2301455:2302322 [7] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.776 n193-018-074:2301451:2302320 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.776 n193-018-074:2301451:2302320 [3] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.779 n193-018-074:2301448:2302319 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.780 n193-018-074:2301448:2302319 [0] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.787 n193-018-074:2301452:2302321 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.788 n193-018-074:2301452:2302321 [4] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:22.791 n193-018-074:2301451:2302320 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.791 n193-018-074:2301455:2302322 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.791 
n193-018-074:2301451:2302320 [3] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.791 n193-018-074:2301451:2302320 [3] NCCL INFO Using network IB 2024-03-10 11:12:22.791 n193-018-074:2301455:2302322 [7] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.791 n193-018-074:2301455:2302322 [7] NCCL INFO Using network IB 2024-03-10 11:12:22.792 n193-018-074:2301448:2302319 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.792 n193-018-074:2301448:2302319 [0] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.792 n193-018-074:2301448:2302319 [0] NCCL INFO Using network IB 2024-03-10 11:12:22.800 n193-018-074:2301452:2302321 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:22.800 n193-018-074:2301452:2302321 [4] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:22.800 n193-018-074:2301452:2302321 [4] NCCL INFO Using network IB 2024-03-10 11:12:22.880 n193-018-074:2301453:2301453 [5] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:22.880 n193-018-074:2301453:2301453 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.881 n193-018-074:2301453:2301453 [5] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:22.887 n193-018-074:2301453:2301453 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:22.908 n193-018-074:2301453:2302344 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-10 11:12:22.927 n193-018-074:2301453:2302344 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:22.927 n193-018-074:2301453:2302344 [5] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:23.068 n193-018-074:2301453:2302344 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:23.068 n193-018-074:2301453:2302344 [5] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:23.068 n193-018-074:2301453:2302344 [5] NCCL INFO Using network IB 2024-03-10 11:12:23.392 n193-018-074:2301450:2301450 [2] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:23.392 n193-018-074:2301450:2301450 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.392 n193-018-074:2301450:2301450 [2] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:23.402 n193-018-074:2301450:2301450 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:23.417 n193-018-074:2301454:2301454 [6] NCCL INFO cudaDriverVersion 12010 2024-03-10 11:12:23.417 n193-018-074:2301454:2301454 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.417 n193-018-074:2301454:2301454 [6] NCCL INFO Bootstrap : Using eth0:10.193.18.74<0> 2024-03-10 11:12:23.422 n193-018-074:2301450:2302383 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-10 11:12:23.427 n193-018-074:2301454:2301454 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-10 11:12:23.439 n193-018-074:2301454:2302384 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-10 11:12:23.447 n193-018-074:2301450:2302383 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.448 n193-018-074:2301450:2302383 [2] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:23.461 n193-018-074:2301450:2302383 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:23.461 n193-018-074:2301450:2302383 [2] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:23.461 n193-018-074:2301450:2302383 [2] NCCL INFO Using network IB 2024-03-10 11:12:23.462 n193-018-074:2301454:2302384 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-10 11:12:23.462 n193-018-074:2301454:2302384 [6] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-10 11:12:23.475 n193-018-074:2301454:2302384 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.193.18.74<0> 2024-03-10 11:12:23.475 n193-018-074:2301454:2302384 [6] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:23.475 n193-018-074:2301454:2302384 [6] NCCL INFO Using network IB 2024-03-10 11:12:23.515 n193-018-074:2301454:2302384 [6] NCCL INFO comm 0xb8a29e30 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301450:2302383 [2] NCCL INFO comm 0x186781460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301455:2302322 [7] NCCL INFO comm 0xb983c130 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301453:2302344 [5] NCCL INFO comm 0x6f639cc0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301452:2302321 [4] NCCL INFO comm 0x185b03250 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x2fa19338ddb25d3f - Init START 
2024-03-10 11:12:23.515 n193-018-074:2301449:2302318 [1] NCCL INFO comm 0xa4fb2560 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301448:2302319 [0] NCCL INFO comm 0x198e24fa0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:23.515 n193-018-074:2301451:2302320 [3] NCCL INFO comm 0x6fd77d40 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x2fa19338ddb25d3f - Init START 2024-03-10 11:12:25.574 n193-018-074:2301448:2302319 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.574 n193-018-074:2301448:2302319 [0] NCCL INFO NVLS multicast support is not available on dev 0 2024-03-10 11:12:25.600 n193-018-074:2301454:2302384 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.600 n193-018-074:2301454:2302384 [6] NCCL INFO NVLS multicast support is not available on dev 6 2024-03-10 11:12:25.608 n193-018-074:2301450:2302383 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.608 n193-018-074:2301450:2302383 [2] NCCL INFO NVLS multicast support is not available on dev 2 2024-03-10 11:12:25.611 n193-018-074:2301455:2302322 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.611 n193-018-074:2301455:2302322 [7] NCCL INFO NVLS multicast support is not available on dev 7 2024-03-10 11:12:25.613 n193-018-074:2301452:2302321 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.613 n193-018-074:2301452:2302321 [4] NCCL INFO NVLS multicast support is not available on dev 4 2024-03-10 11:12:25.614 n193-018-074:2301451:2302320 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.614 n193-018-074:2301451:2302320 [3] NCCL INFO NVLS multicast support is not available on dev 3 2024-03-10 
11:12:25.617 n193-018-074:2301449:2302318 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,00000000,ffffffff 2024-03-10 11:12:25.618 n193-018-074:2301449:2302318 [1] NCCL INFO NVLS multicast support is not available on dev 1 2024-03-10 11:12:25.618 n193-018-074:2301453:2302344 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:25.618 n193-018-074:2301453:2302344 [5] NCCL INFO NVLS multicast support is not available on dev 5 2024-03-10 11:12:25.619 n193-018-074:2301453:2302344 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301454:2302384 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-03-10 11:12:25.619 n193-018-074:2301451:2302320 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 
4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-03-10 11:12:25.619 n193-018-074:2301455:2302322 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-03-10 11:12:25.619 n193-018-074:2301449:2302318 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301454:2302384 [6] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301453:2302344 [5] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301452:2302321 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 
[23] 5/-1/-1->4->3 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301450:2302383 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-03-10 11:12:25.619 n193-018-074:2301449:2302318 [1] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301455:2302322 [7] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301451:2302320 [3] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301452:2302321 [4] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301450:2302383 [2] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 
6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-03-10 11:12:25.619 n193-018-074:2301448:2302319 [0] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:26.064 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301454:2302384 
[6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.065 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.068 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.071 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.072 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.072 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 
2024-03-10 11:12:26.071 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.074 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.075 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.075 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.076 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.077 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.078 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.078 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.078 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.079 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.080 n193-018-074:2301448:2302319 [0] NCCL 
INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.080 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.080 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.081 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.082 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.083 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.084 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.086 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read 
2024-03-10 11:12:26.087 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.087 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.089 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.090 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.091 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.092 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301451:2302320 [3] NCCL 
INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.093 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.094 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.094 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.095 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.097 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.099 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.101 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read 
2024-03-10 11:12:26.101 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.102 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.106 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.108 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.109 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.109 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.109 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.110 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.113 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.113 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.114 n193-018-074:2301450:2302383 [2] NCCL 
INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.116 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.117 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.118 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.118 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.118 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.119 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.120 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.120 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.120 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.121 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.121 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.121 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.122 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.122 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read 
2024-03-10 11:12:26.123 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.123 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.123 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.124 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.133 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.133 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.134 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.135 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.136 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.137 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.138 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301455:2302322 [7] NCCL 
INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.139 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.140 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.141 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.142 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.143 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.144 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read 
2024-03-10 11:12:26.145 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.145 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.146 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.147 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.148 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.149 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.150 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.151 n193-018-074:2301453:2302344 [5] NCCL 
INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.152 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.152 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.153 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.154 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.155 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.155 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.156 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.157 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.157 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.159 n193-018-074:2301448:2302319 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.159 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.753 n193-018-074:2301450:2302383 [2] NCCL INFO Connected all rings 2024-03-10 11:12:26.784 n193-018-074:2301449:2302318 [1] NCCL INFO Connected all rings 2024-03-10 11:12:26.784 n193-018-074:2301448:2302319 [0] NCCL 
INFO Connected all rings 2024-03-10 11:12:26.786 n193-018-074:2301452:2302321 [4] NCCL INFO Connected all rings 2024-03-10 11:12:26.786 n193-018-074:2301451:2302320 [3] NCCL INFO Connected all rings 2024-03-10 11:12:26.794 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.796 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.798 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.800 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.801 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.803 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.805 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.808 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.809 n193-018-074:2301455:2302322 [7] NCCL INFO Connected all rings 2024-03-10 11:12:26.809 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.809 n193-018-074:2301453:2302344 [5] NCCL INFO Connected all rings 2024-03-10 11:12:26.809 n193-018-074:2301454:2302384 [6] NCCL INFO Connected all rings 2024-03-10 11:12:26.810 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.812 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.813 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.815 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.816 n193-018-074:2301450:2302383 [2] 
NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.818 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.819 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.820 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.822 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.823 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.824 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.827 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.829 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.829 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.832 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.832 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.835 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.836 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.838 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.839 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.841 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.842 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read 
2024-03-10 11:12:26.843 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.844 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.844 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.846 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.846 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.847 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.847 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.849 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.849 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.849 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.850 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.851 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.852 n193-018-074:2301450:2302383 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:26.852 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.852 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.854 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.854 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.855 n193-018-074:2301449:2302318 [1] NCCL 
INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.856 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.857 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.857 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.858 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.858 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.859 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.859 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.860 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.861 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.861 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.862 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.863 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.863 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.864 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.864 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.865 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.865 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read 
2024-03-10 11:12:26.866 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.866 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.868 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.868 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.868 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.869 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.870 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.870 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.871 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.871 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.872 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.872 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.872 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.873 n193-018-074:2301455:2302322 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:26.873 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301451:2302320 [3] NCCL 
INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.874 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.876 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.877 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.878 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.879 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.880 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.881 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read 
2024-03-10 11:12:26.882 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.882 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.883 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.884 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.885 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.886 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.887 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301451:2302320 [3] NCCL 
INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.888 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.889 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.890 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.891 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.892 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.894 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.895 n193-018-074:2301452:2302321 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:26.896 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read 
2024-03-10 11:12:26.896 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.896 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.897 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.897 n193-018-074:2301449:2302318 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:26.898 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.898 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.898 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.899 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.900 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.900 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.902 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.902 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.902 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.904 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.904 n193-018-074:2301451:2302320 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.904 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.906 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.906 n193-018-074:2301451:2302320 [3] NCCL 
INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:26.907 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.911 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.911 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.913 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.913 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.914 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.915 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.916 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.917 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.918 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.918 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.920 n193-018-074:2301454:2302384 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:26.920 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:26.923 n193-018-074:2301453:2302344 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:27.388 n193-018-074:2301448:2302319 [0] NCCL INFO Connected all trees 2024-03-10 11:12:27.388 n193-018-074:2301448:2302319 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.388 n193-018-074:2301448:2302319 [0] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 
2024-03-10 11:12:27.471 n193-018-074:2301449:2302318 [1] NCCL INFO Connected all trees 2024-03-10 11:12:27.471 n193-018-074:2301449:2302318 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.471 n193-018-074:2301449:2302318 [1] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.520 n193-018-074:2301450:2302383 [2] NCCL INFO Connected all trees 2024-03-10 11:12:27.520 n193-018-074:2301450:2302383 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.520 n193-018-074:2301450:2302383 [2] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.528 n193-018-074:2301451:2302320 [3] NCCL INFO Connected all trees 2024-03-10 11:12:27.528 n193-018-074:2301451:2302320 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.528 n193-018-074:2301451:2302320 [3] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.533 n193-018-074:2301455:2302322 [7] NCCL INFO Connected all trees 2024-03-10 11:12:27.533 n193-018-074:2301455:2302322 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.533 n193-018-074:2301455:2302322 [7] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.535 n193-018-074:2301452:2302321 [4] NCCL INFO Connected all trees 2024-03-10 11:12:27.535 n193-018-074:2301452:2302321 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.535 n193-018-074:2301452:2302321 [4] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.535 n193-018-074:2301454:2302384 [6] NCCL INFO Connected all trees 2024-03-10 11:12:27.535 n193-018-074:2301454:2302384 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.535 n193-018-074:2301454:2302384 [6] NCCL INFO 24 coll 
channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.535 n193-018-074:2301453:2302344 [5] NCCL INFO Connected all trees 2024-03-10 11:12:27.535 n193-018-074:2301453:2302344 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:27.536 n193-018-074:2301453:2302344 [5] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:27.605 n193-018-074:2301455:2302322 [7] NCCL INFO comm 0xb983c130 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301452:2302321 [4] NCCL INFO comm 0x185b03250 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301454:2302384 [6] NCCL INFO comm 0xb8a29e30 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301450:2302383 [2] NCCL INFO comm 0x186781460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301449:2302318 [1] NCCL INFO comm 0xa4fb2560 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301451:2302320 [3] NCCL INFO comm 0x6fd77d40 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301448:2302319 [0] NCCL INFO comm 0x198e24fa0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0x2fa19338ddb25d3f - Init COMPLETE 2024-03-10 11:12:27.605 n193-018-074:2301453:2302344 [5] NCCL INFO comm 0x6f639cc0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x2fa19338ddb25d3f - Init COMPLETE /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` 
warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") 0%| | 0/2774 [00:00<?, ?it/s]2024-03-10 11:12:44.332 n193-018-074:2301448:2302652 [0] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301448:2302652 [0] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301449:2302654 [1] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301449:2302654 [1] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301452:2302653 [4] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301453:2302656 [5] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301454:2302657 [6] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301451:2302655 [3] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301451:2302655 [3] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301452:2302653 [4] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301454:2302657 [6] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301453:2302656 [5] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301450:2302658 [2] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301450:2302658 [2] NCCL INFO Using network IB 2024-03-10 11:12:44.332 n193-018-074:2301455:2302659 [7] NCCL INFO Using non-device net plugin version 0 2024-03-10 11:12:44.332 n193-018-074:2301455:2302659 [7] NCCL INFO Using network IB 2024-03-10 11:12:44.341 n193-018-074:2301454:2302657 [6] NCCL INFO comm 0xb7182140 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:44.341 n193-018-074:2301453:2302656 [5] NCCL INFO comm 0x1862cf940 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x699b1860b4474e85 - Init 
START 2024-03-10 11:12:44.341 n193-018-074:2301452:2302653 [4] NCCL INFO comm 0x185a77340 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:44.341 n193-018-074:2301451:2302655 [3] NCCL INFO comm 0xb6209bc0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:44.341 n193-018-074:2301449:2302654 [1] NCCL INFO comm 0x1862a5d40 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:44.341 n193-018-074:2301448:2302652 [0] NCCL INFO comm 0x1985b4bb0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:44.341 n193-018-074:2301450:2302658 [2] NCCL INFO comm 0xb858a750 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:44.341 n193-018-074:2301455:2302659 [7] NCCL INFO comm 0x1872d08c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x699b1860b4474e85 - Init START 2024-03-10 11:12:46.445 n193-018-074:2301448:2302652 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,00000000,ffffffff 2024-03-10 11:12:46.445 n193-018-074:2301448:2302652 [0] NCCL INFO NVLS multicast support is not available on dev 0 2024-03-10 11:12:46.445 n193-018-074:2301449:2302654 [1] NCCL INFO Setting affinity for GPU 1 to ffffffff,00000000,ffffffff 2024-03-10 11:12:46.445 n193-018-074:2301449:2302654 [1] NCCL INFO NVLS multicast support is not available on dev 1 2024-03-10 11:12:46.446 n193-018-074:2301452:2302653 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:46.446 n193-018-074:2301452:2302653 [4] NCCL INFO NVLS multicast support is not available on dev 4 2024-03-10 11:12:46.451 n193-018-074:2301451:2302655 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,00000000,ffffffff 2024-03-10 11:12:46.451 n193-018-074:2301451:2302655 [3] NCCL INFO NVLS multicast support is not available on dev 3 
2024-03-10 11:12:46.452 n193-018-074:2301455:2302659 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:46.452 n193-018-074:2301455:2302659 [7] NCCL INFO NVLS multicast support is not available on dev 7 2024-03-10 11:12:46.454 n193-018-074:2301454:2302657 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:46.454 n193-018-074:2301454:2302657 [6] NCCL INFO NVLS multicast support is not available on dev 6 2024-03-10 11:12:46.454 n193-018-074:2301450:2302658 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff 2024-03-10 11:12:46.454 n193-018-074:2301453:2302656 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,00000000,ffffffff,00000000 2024-03-10 11:12:46.454 n193-018-074:2301453:2302656 [5] NCCL INFO NVLS multicast support is not available on dev 5 2024-03-10 11:12:46.454 n193-018-074:2301450:2302658 [2] NCCL INFO NVLS multicast support is not available on dev 2 2024-03-10 11:12:46.455 n193-018-074:2301450:2302658 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-03-10 11:12:46.455 n193-018-074:2301450:2302658 [2] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301449:2302654 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 
2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-03-10 11:12:46.455 n193-018-074:2301453:2302656 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301454:2302657 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-03-10 11:12:46.455 n193-018-074:2301453:2302656 [5] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301449:2302654 [1] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301455:2302659 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] 
-1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301454:2302657 [6] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301455:2302659 [7] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301452:2302653 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-03-10 11:12:46.455 n193-018-074:2301451:2302655 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301452:2302653 [4] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301451:2302655 [3] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 06/24 : 0 1 2 3 
4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 
1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-03-10 11:12:46.455 n193-018-074:2301448:2302652 [0] NCCL INFO P2P Chunksize set to 524288 2024-03-10 11:12:46.773 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.774 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.775 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.775 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.775 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.776 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.776 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.777 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.777 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.777 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301448:2302652 [0] NCCL INFO 
Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.778 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.779 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.779 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.779 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.780 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.780 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.780 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.781 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.781 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.781 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.782 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.782 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.782 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.783 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.783 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.784 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 
11:12:46.784 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.785 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.785 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.785 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.786 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.786 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.786 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.787 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.787 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.788 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.788 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.788 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.789 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.789 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.790 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.790 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.790 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.791 n193-018-074:2301455:2302659 [7] NCCL INFO 
Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.791 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.791 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.792 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.792 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.792 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.793 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.793 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.794 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.795 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.795 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.796 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.797 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.799 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.800 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 
11:12:46.800 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.800 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.801 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.803 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.803 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.803 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.804 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.805 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.805 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.805 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.806 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.807 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.808 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.808 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.808 n193-018-074:2301451:2302655 [3] NCCL INFO 
Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.809 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.810 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.810 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.810 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.811 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.811 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.811 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.812 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.812 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.813 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 
11:12:46.814 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.814 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.815 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.815 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.815 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.816 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.816 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.817 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.817 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.817 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.818 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.818 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.818 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.819 n193-018-074:2301449:2302654 [1] NCCL INFO 
Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.820 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.821 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.821 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.822 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.822 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.822 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.823 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.823 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.823 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.824 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 
11:12:46.825 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.825 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.826 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.827 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.828 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.829 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.830 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.830 n193-018-074:2301454:2302657 [6] NCCL INFO 
Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.830 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.831 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.832 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.833 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.833 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.833 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.834 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.835 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 
11:12:46.836 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.837 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.838 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.838 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.838 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.839 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.839 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.840 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.840 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:46.840 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:46.841 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:46.841 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-10 11:12:46.841 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:46.842 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:46.842 n193-018-074:2301450:2302658 [2] NCCL INFO 
Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:46.842 n193-018-074:2301448:2302652 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:46.844 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.408 n193-018-074:2301449:2302654 [1] NCCL INFO Connected all rings 2024-03-10 11:12:47.416 n193-018-074:2301450:2302658 [2] NCCL INFO Connected all rings 2024-03-10 11:12:47.420 n193-018-074:2301448:2302652 [0] NCCL INFO Connected all rings 2024-03-10 11:12:47.445 n193-018-074:2301451:2302655 [3] NCCL INFO Connected all rings 2024-03-10 11:12:47.450 n193-018-074:2301452:2302653 [4] NCCL INFO Connected all rings 2024-03-10 11:12:47.455 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.457 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.459 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.461 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.463 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.465 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.465 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.466 n193-018-074:2301455:2302659 [7] NCCL INFO Connected all rings 2024-03-10 11:12:47.466 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.466 n193-018-074:2301453:2302656 [5] NCCL INFO Connected all rings 2024-03-10 11:12:47.466 n193-018-074:2301454:2302657 [6] NCCL INFO Connected all rings 2024-03-10 11:12:47.467 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read 
2024-03-10 11:12:47.467 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.469 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.469 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.470 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.471 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.472 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.473 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.473 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.473 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.474 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.475 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.475 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.476 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.477 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.477 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.478 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.479 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.479 n193-018-074:2301450:2302658 [2] NCCL 
INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.480 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.480 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.481 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.481 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.482 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.483 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.483 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.484 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.484 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.485 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.486 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.487 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.487 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.488 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.488 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.489 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.490 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read 
2024-03-10 11:12:47.490 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.491 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.491 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.492 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.492 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.493 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.493 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.494 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.495 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.496 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.496 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.497 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.497 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.497 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.498 n193-018-074:2301449:2302654 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-10 11:12:47.498 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.498 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.499 n193-018-074:2301451:2302655 [3] NCCL 
INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.499 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.500 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.500 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.500 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.501 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.502 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.503 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.504 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.504 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.504 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.505 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.505 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.506 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.506 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read 
2024-03-10 11:12:47.508 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.508 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.508 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.509 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.509 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301450:2302658 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.510 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.511 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.512 n193-018-074:2301455:2302659 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-10 11:12:47.513 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301453:2302656 [5] NCCL 
INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.514 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.515 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.516 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.517 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.518 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read 
2024-03-10 11:12:47.519 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.519 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.520 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.520 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.520 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.521 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.521 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.521 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.522 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.522 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.523 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.523 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.524 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.524 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.525 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.525 n193-018-074:2301454:2302657 [6] NCCL 
INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.525 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.526 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.527 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.528 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.528 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.529 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.529 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.529 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.530 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.530 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.530 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.531 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.531 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.531 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read 
2024-03-10 11:12:47.532 n193-018-074:2301452:2302653 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-10 11:12:47.533 n193-018-074:2301451:2302655 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-10 11:12:47.533 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.534 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.535 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.535 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.536 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.537 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.538 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.538 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.539 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.540 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.540 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.542 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.543 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.544 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.544 n193-018-074:2301454:2302657 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-10 11:12:47.545 n193-018-074:2301453:2302656 [5] NCCL 
INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:47.547 n193-018-074:2301453:2302656 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-10 11:12:48.003 n193-018-074:2301448:2302652 [0] NCCL INFO Connected all trees 2024-03-10 11:12:48.003 n193-018-074:2301448:2302652 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.003 n193-018-074:2301448:2302652 [0] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.065 n193-018-074:2301449:2302654 [1] NCCL INFO Connected all trees 2024-03-10 11:12:48.065 n193-018-074:2301449:2302654 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.065 n193-018-074:2301449:2302654 [1] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.071 n193-018-074:2301450:2302658 [2] NCCL INFO Connected all trees 2024-03-10 11:12:48.071 n193-018-074:2301450:2302658 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.071 n193-018-074:2301450:2302658 [2] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.090 n193-018-074:2301455:2302659 [7] NCCL INFO Connected all trees 2024-03-10 11:12:48.090 n193-018-074:2301455:2302659 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.090 n193-018-074:2301455:2302659 [7] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.091 n193-018-074:2301454:2302657 [6] NCCL INFO Connected all trees 2024-03-10 11:12:48.091 n193-018-074:2301454:2302657 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.091 n193-018-074:2301454:2302657 [6] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.092 n193-018-074:2301451:2302655 [3] NCCL INFO Connected all trees 2024-03-10 11:12:48.092 
n193-018-074:2301451:2302655 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.092 n193-018-074:2301451:2302655 [3] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.092 n193-018-074:2301453:2302656 [5] NCCL INFO Connected all trees 2024-03-10 11:12:48.092 n193-018-074:2301452:2302653 [4] NCCL INFO Connected all trees 2024-03-10 11:12:48.092 n193-018-074:2301453:2302656 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.092 n193-018-074:2301452:2302653 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-10 11:12:48.092 n193-018-074:2301453:2302656 [5] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.092 n193-018-074:2301452:2302653 [4] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-10 11:12:48.116 n193-018-074:2301448:2302652 [0] NCCL INFO comm 0x1985b4bb0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301453:2302656 [5] NCCL INFO comm 0x1862cf940 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301454:2302657 [6] NCCL INFO comm 0xb7182140 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301452:2302653 [4] NCCL INFO comm 0x185a77340 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.116 n193-018-074:2301450:2302658 [2] NCCL INFO comm 0xb858a750 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.117 n193-018-074:2301455:2302659 [7] NCCL INFO comm 0x1872d08c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.117 
n193-018-074:2301451:2302655 [3] NCCL INFO comm 0xb6209bc0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0x699b1860b4474e85 - Init COMPLETE 2024-03-10 11:12:48.117 n193-018-074:2301449:2302654 [1] NCCL INFO comm 0x1862a5d40 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0x699b1860b4474e85 - Init COMPLETE /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn(
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). 
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. 
To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing).
This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True). warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn(
/usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)])
/usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) 
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.) total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) 0%| | 1/2774 [00:24<18:30:55, 24.04s/it] {'loss': 1.1294, 'learning_rate': 5.952380952380953e-08, 'epoch': 0.0} 0%| | 1/2774 [00:24<18:30:55, 24.04s/it] 0%| | 2/2774 [00:35<12:43:02, 16.52s/it] {'loss': 1.1387, 'learning_rate': 1.1904761904761906e-07, 'epoch': 0.0} 0%| | 2/2774 [00:35<12:43:02, 16.52s/it] 0%| | 3/2774 [00:46<11:00:19, 14.30s/it] {'loss': 1.1772, 'learning_rate': 1.7857142857142858e-07, 'epoch': 0.0} 0%| | 3/2774 [00:46<11:00:19, 14.30s/it] 0%| | 4/2774 [00:58<10:02:47, 13.06s/it] {'loss': 1.1333, 'learning_rate': 2.3809523809523811e-07, 'epoch': 0.0} 0%| | 4/2774 [00:58<10:02:47, 13.06s/it] 0%| | 5/2774 [01:09<9:34:00, 12.44s/it] {'loss': 1.1484, 'learning_rate': 2.9761904761904765e-07, 'epoch': 0.0} 0%| | 5/2774 [01:09<9:34:00, 12.44s/it] 0%| | 6/2774 [01:22<9:42:47, 12.63s/it] {'loss': 1.1582, 'learning_rate': 3.5714285714285716e-07, 'epoch': 0.0} 0%| | 6/2774 [01:22<9:42:47, 12.63s/it] 0%| | 7/2774 [01:34<9:36:10, 12.49s/it] {'loss': 1.0933, 'learning_rate': 4.1666666666666667e-07, 'epoch': 0.0} 0%| | 7/2774 [01:34<9:36:10, 12.49s/it] 0%| | 8/2774 [01:45<9:16:32, 12.07s/it] {'loss': 1.1504, 'learning_rate': 4.7619047619047623e-07, 'epoch': 0.0} 0%| | 8/2774 [01:45<9:16:32, 12.07s/it] 0%| | 9/2774 [01:58<9:19:28, 12.14s/it] {'loss': 1.1104, 'learning_rate': 5.357142857142857e-07, 'epoch': 0.0} 0%| | 9/2774 [01:58<9:19:28, 12.14s/it] 0%| | 10/2774 [02:09<9:10:17, 11.95s/it] {'loss': 1.1162, 'learning_rate': 5.952380952380953e-07, 'epoch': 0.0} 0%| | 10/2774 [02:09<9:10:17, 11.95s/it] 0%| 
| 11/2774 [02:21<9:04:01, 11.81s/it] {'loss': 1.1558, 'learning_rate': 6.547619047619048e-07, 'epoch': 0.0} 0%| | 11/2774 [02:21<9:04:01, 11.81s/it] 0%| | 12/2774 [02:33<9:06:58, 11.88s/it] {'loss': 1.165, 'learning_rate': 7.142857142857143e-07, 'epoch': 0.0} 0%| | 12/2774 [02:33<9:06:58, 11.88s/it] 0%| | 13/2774 [02:44<9:01:05, 11.76s/it] {'loss': 1.1729, 'learning_rate': 7.738095238095239e-07, 'epoch': 0.0} 0%| | 13/2774 [02:44<9:01:05, 11.76s/it] 1%| | 14/2774 [02:55<8:54:09, 11.61s/it] {'loss': 1.1426, 'learning_rate': 8.333333333333333e-07, 'epoch': 0.01} 1%| | 14/2774 [02:55<8:54:09, 11.61s/it] 1%| | 15/2774 [03:09<9:19:45, 12.17s/it] {'loss': 1.0425, 'learning_rate': 8.928571428571429e-07, 'epoch': 0.01} 1%| | 15/2774 [03:09<9:19:45, 12.17s/it] 1%| | 16/2774 [03:20<9:07:34, 11.91s/it] {'loss': 1.1709, 'learning_rate': 9.523809523809525e-07, 'epoch': 0.01} 1%| | 16/2774 [03:20<9:07:34, 11.91s/it] 1%| | 17/2774 [03:32<9:06:06, 11.88s/it] {'loss': 1.1748, 'learning_rate': 1.011904761904762e-06, 'epoch': 0.01} 1%| | 17/2774 [03:32<9:06:06, 11.88s/it] 1%| | 18/2774 [03:44<9:01:43, 11.79s/it] {'loss': 1.2041, 'learning_rate': 1.0714285714285714e-06, 'epoch': 0.01} 1%| | 18/2774 [03:44<9:01:43, 11.79s/it] 1%| | 19/2774 [03:55<9:00:46, 11.78s/it] {'loss': 1.1641, 'learning_rate': 1.130952380952381e-06, 'epoch': 0.01} 1%| | 19/2774 [03:55<9:00:46, 11.78s/it] 1%| | 20/2774 [04:07<9:02:40, 11.82s/it] {'loss': 1.0566, 'learning_rate': 1.1904761904761906e-06, 'epoch': 0.01} 1%| | 20/2774 [04:07<9:02:40, 11.82s/it] 1%| | 21/2774 [04:19<9:03:47, 11.85s/it] {'loss': 1.1689, 'learning_rate': 1.25e-06, 'epoch': 0.01} 1%| | 21/2774 [04:19<9:03:47, 11.85s/it] 1%| | 22/2774 [04:31<8:57:40, 11.72s/it] {'loss': 1.1099, 'learning_rate': 1.3095238095238096e-06, 'epoch': 0.01} 1%| | 22/2774 [04:31<8:57:40, 11.72s/it] 1%| | 23/2774 [04:42<8:49:14, 11.54s/it] {'loss': 1.104, 'learning_rate': 1.3690476190476193e-06, 'epoch': 0.01} 1%| | 23/2774 [04:42<8:49:14, 11.54s/it] 1%| | 24/2774 
[04:53<8:45:14, 11.46s/it] {'loss': 1.126, 'learning_rate': 1.4285714285714286e-06, 'epoch': 0.01} 1%| | 24/2774 [04:53<8:45:14, 11.46s/it] 1%| | 25/2774 [05:05<8:47:18, 11.51s/it] {'loss': 1.0576, 'learning_rate': 1.4880952380952381e-06, 'epoch': 0.01} 1%| | 25/2774 [05:05<8:47:18, 11.51s/it] 1%| | 26/2774 [05:17<8:54:40, 11.67s/it] {'loss': 1.0488, 'learning_rate': 1.5476190476190479e-06, 'epoch': 0.01} 1%| | 26/2774 [05:17<8:54:40, 11.67s/it] 1%| | 27/2774 [05:28<8:53:58, 11.66s/it] {'loss': 1.1729, 'learning_rate': 1.6071428571428574e-06, 'epoch': 0.01} 1%| | 27/2774 [05:28<8:53:58, 11.66s/it] 1%| | 28/2774 [05:40<8:56:26, 11.72s/it] {'loss': 1.1636, 'learning_rate': 1.6666666666666667e-06, 'epoch': 0.01} 1%| | 28/2774 [05:40<8:56:26, 11.72s/it] 1%| | 29/2774 [05:53<9:06:07, 11.94s/it] {'loss': 1.0781, 'learning_rate': 1.7261904761904764e-06, 'epoch': 0.01} 1%| | 29/2774 [05:53<9:06:07, 11.94s/it] 1%| | 30/2774 [06:04<8:59:55, 11.81s/it] {'loss': 1.0566, 'learning_rate': 1.7857142857142859e-06, 'epoch': 0.01} 1%| | 30/2774 [06:04<8:59:55, 11.81s/it] 1%| | 31/2774 [06:16<8:54:05, 11.68s/it] {'loss': 1.082, 'learning_rate': 1.8452380952380954e-06, 'epoch': 0.01} 1%| | 31/2774 [06:16<8:54:05, 11.68s/it] 1%| | 32/2774 [06:28<9:00:40, 11.83s/it] {'loss': 1.042, 'learning_rate': 1.904761904761905e-06, 'epoch': 0.01} 1%| | 32/2774 [06:28<9:00:40, 11.83s/it] 1%| | 33/2774 [06:41<9:26:14, 12.39s/it] {'loss': 1.0249, 'learning_rate': 1.9642857142857144e-06, 'epoch': 0.01} 1%| | 33/2774 [06:41<9:26:14, 12.39s/it] 1%| | 34/2774 [06:53<9:10:32, 12.06s/it] {'loss': 1.0791, 'learning_rate': 2.023809523809524e-06, 'epoch': 0.01} 1%| | 34/2774 [06:53<9:10:32, 12.06s/it] 1%|▏ | 35/2774 [07:04<9:05:50, 11.96s/it] {'loss': 1.0693, 'learning_rate': 2.0833333333333334e-06, 'epoch': 0.01} 1%|▏ | 35/2774 [07:04<9:05:50, 11.96s/it] 1%|▏ | 36/2774 [07:16<9:04:14, 11.93s/it] {'loss': 1.0415, 'learning_rate': 2.1428571428571427e-06, 'epoch': 0.01} 1%|▏ | 36/2774 [07:16<9:04:14, 11.93s/it] 
1%|▏ | 37/2774 [07:30<9:30:08, 12.50s/it] {'loss': 1.0752, 'learning_rate': 2.2023809523809525e-06, 'epoch': 0.01} 1%|▏ | 37/2774 [07:30<9:30:08, 12.50s/it] 1%|▏ | 38/2774 [07:42<9:17:09, 12.22s/it] {'loss': 1.0649, 'learning_rate': 2.261904761904762e-06, 'epoch': 0.01} 1%|▏ | 38/2774 [07:42<9:17:09, 12.22s/it] 1%|▏ | 39/2774 [07:55<9:31:16, 12.53s/it] {'loss': 1.0972, 'learning_rate': 2.321428571428572e-06, 'epoch': 0.01} 1%|▏ | 39/2774 [07:55<9:31:16, 12.53s/it] 1%|▏ | 40/2774 [08:06<9:14:58, 12.18s/it] {'loss': 1.0835, 'learning_rate': 2.380952380952381e-06, 'epoch': 0.01} 1%|▏ | 40/2774 [08:06<9:14:58, 12.18s/it] 1%|▏ | 41/2774 [08:19<9:28:01, 12.47s/it] {'loss': 1.0435, 'learning_rate': 2.4404761904761905e-06, 'epoch': 0.01} 1%|▏ | 41/2774 [08:19<9:28:01, 12.47s/it] 2%|▏ | 42/2774 [08:32<9:34:32, 12.62s/it] {'loss': 1.105, 'learning_rate': 2.5e-06, 'epoch': 0.02} 2%|▏ | 42/2774 [08:32<9:34:32, 12.62s/it] 2%|▏ | 43/2774 [08:44<9:21:49, 12.34s/it] {'loss': 1.1113, 'learning_rate': 2.5595238095238095e-06, 'epoch': 0.02} 2%|▏ | 43/2774 [08:44<9:21:49, 12.34s/it] 2%|▏ | 44/2774 [08:56<9:10:51, 12.11s/it] {'loss': 1.002, 'learning_rate': 2.6190476190476192e-06, 'epoch': 0.02} 2%|▏ | 44/2774 [08:56<9:10:51, 12.11s/it] 2%|▏ | 45/2774 [09:07<8:59:35, 11.86s/it] {'loss': 1.0391, 'learning_rate': 2.6785714285714285e-06, 'epoch': 0.02} 2%|▏ | 45/2774 [09:07<8:59:35, 11.86s/it] 2%|▏ | 46/2774 [09:18<8:52:02, 11.70s/it] {'loss': 1.0635, 'learning_rate': 2.7380952380952387e-06, 'epoch': 0.02} 2%|▏ | 46/2774 [09:18<8:52:02, 11.70s/it] 2%|▏ | 47/2774 [09:30<8:51:18, 11.69s/it] {'loss': 1.0591, 'learning_rate': 2.797619047619048e-06, 'epoch': 0.02} 2%|▏ | 47/2774 [09:30<8:51:18, 11.69s/it] 2%|▏ | 48/2774 [09:41<8:46:33, 11.59s/it] {'loss': 1.083, 'learning_rate': 2.8571428571428573e-06, 'epoch': 0.02} 2%|▏ | 48/2774 [09:41<8:46:33, 11.59s/it] 2%|▏ | 49/2774 [09:53<8:51:43, 11.71s/it] {'loss': 1.0903, 'learning_rate': 2.916666666666667e-06, 'epoch': 0.02} 2%|▏ | 49/2774 
[09:53<8:51:43, 11.71s/it] 2%|▏ | 50/2774 [10:05<8:49:51, 11.67s/it] {'loss': 1.0776, 'learning_rate': 2.9761904761904763e-06, 'epoch': 0.02} 2%|▏ | 50/2774 [10:05<8:49:51, 11.67s/it] 2%|▏ | 51/2774 [10:16<8:46:36, 11.60s/it] {'loss': 0.9985, 'learning_rate': 3.0357142857142856e-06, 'epoch': 0.02} 2%|▏ | 51/2774 [10:16<8:46:36, 11.60s/it] 2%|▏ | 52/2774 [10:29<9:02:45, 11.96s/it] {'loss': 1.002, 'learning_rate': 3.0952380952380957e-06, 'epoch': 0.02} 2%|▏ | 52/2774 [10:29<9:02:45, 11.96s/it] 2%|▏ | 53/2774 [10:41<9:00:23, 11.92s/it] {'loss': 1.0967, 'learning_rate': 3.154761904761905e-06, 'epoch': 0.02} 2%|▏ | 53/2774 [10:41<9:00:23, 11.92s/it] 2%|▏ | 54/2774 [10:52<8:50:58, 11.71s/it] {'loss': 1.0996, 'learning_rate': 3.2142857142857147e-06, 'epoch': 0.02} 2%|▏ | 54/2774 [10:52<8:50:58, 11.71s/it] 2%|▏ | 55/2774 [11:04<8:46:49, 11.63s/it] {'loss': 1.0845, 'learning_rate': 3.273809523809524e-06, 'epoch': 0.02} 2%|▏ | 55/2774 [11:04<8:46:49, 11.63s/it] 2%|▏ | 56/2774 [11:15<8:45:14, 11.59s/it] {'loss': 1.0469, 'learning_rate': 3.3333333333333333e-06, 'epoch': 0.02} 2%|▏ | 56/2774 [11:15<8:45:14, 11.59s/it] 2%|▏ | 57/2774 [11:27<8:43:57, 11.57s/it] {'loss': 1.0381, 'learning_rate': 3.3928571428571435e-06, 'epoch': 0.02} 2%|▏ | 57/2774 [11:27<8:43:57, 11.57s/it] 2%|▏ | 58/2774 [11:38<8:45:27, 11.61s/it] {'loss': 1.0674, 'learning_rate': 3.4523809523809528e-06, 'epoch': 0.02} 2%|▏ | 58/2774 [11:38<8:45:27, 11.61s/it] 2%|▏ | 59/2774 [11:51<8:56:43, 11.86s/it] {'loss': 1.0181, 'learning_rate': 3.511904761904762e-06, 'epoch': 0.02} 2%|▏ | 59/2774 [11:51<8:56:43, 11.86s/it] 2%|▏ | 60/2774 [12:03<9:06:03, 12.07s/it] {'loss': 1.0278, 'learning_rate': 3.5714285714285718e-06, 'epoch': 0.02} 2%|▏ | 60/2774 [12:03<9:06:03, 12.07s/it] 2%|▏ | 61/2774 [12:17<9:25:33, 12.51s/it] {'loss': 0.9951, 'learning_rate': 3.630952380952381e-06, 'epoch': 0.02} 2%|▏ | 61/2774 [12:17<9:25:33, 12.51s/it] 2%|▏ | 62/2774 [12:29<9:14:41, 12.27s/it] {'loss': 1.0386, 'learning_rate': 
3.690476190476191e-06, 'epoch': 0.02} 2%|▏ | 62/2774 [12:29<9:14:41, 12.27s/it] 2%|▏ | 63/2774 [12:42<9:33:38, 12.70s/it] {'loss': 1.0269, 'learning_rate': 3.7500000000000005e-06, 'epoch': 0.02} 2%|▏ | 63/2774 [12:42<9:33:38, 12.70s/it] 2%|▏ | 64/2774 [12:54<9:16:43, 12.33s/it] {'loss': 1.0552, 'learning_rate': 3.80952380952381e-06, 'epoch': 0.02} 2%|▏ | 64/2774 [12:54<9:16:43, 12.33s/it] 2%|▏ | 65/2774 [13:05<9:03:49, 12.04s/it] {'loss': 1.063, 'learning_rate': 3.869047619047619e-06, 'epoch': 0.02} 2%|▏ | 65/2774 [13:05<9:03:49, 12.04s/it] 2%|▏ | 66/2774 [13:18<9:15:38, 12.31s/it] {'loss': 1.0244, 'learning_rate': 3.928571428571429e-06, 'epoch': 0.02} 2%|▏ | 66/2774 [13:18<9:15:38, 12.31s/it] 2%|▏ | 67/2774 [13:29<9:02:32, 12.03s/it] {'loss': 1.105, 'learning_rate': 3.9880952380952386e-06, 'epoch': 0.02} 2%|▏ | 67/2774 [13:29<9:02:32, 12.03s/it] 2%|▏ | 68/2774 [13:41<8:55:36, 11.88s/it] {'loss': 1.043, 'learning_rate': 4.047619047619048e-06, 'epoch': 0.02} 2%|▏ | 68/2774 [13:41<8:55:36, 11.88s/it] 2%|▏ | 69/2774 [13:53<8:54:22, 11.85s/it] {'loss': 1.0854, 'learning_rate': 4.107142857142857e-06, 'epoch': 0.02} 2%|▏ | 69/2774 [13:53<8:54:22, 11.85s/it] 3%|▎ | 70/2774 [14:04<8:45:55, 11.67s/it] {'loss': 0.9614, 'learning_rate': 4.166666666666667e-06, 'epoch': 0.03} 3%|▎ | 70/2774 [14:04<8:45:55, 11.67s/it] 3%|▎ | 71/2774 [14:16<8:44:02, 11.63s/it] {'loss': 1.0498, 'learning_rate': 4.226190476190477e-06, 'epoch': 0.03} 3%|▎ | 71/2774 [14:16<8:44:02, 11.63s/it] 3%|▎ | 72/2774 [14:27<8:41:28, 11.58s/it] {'loss': 1.0669, 'learning_rate': 4.2857142857142855e-06, 'epoch': 0.03} 3%|▎ | 72/2774 [14:27<8:41:28, 11.58s/it] 3%|▎ | 73/2774 [14:38<8:38:21, 11.51s/it] {'loss': 1.042, 'learning_rate': 4.345238095238096e-06, 'epoch': 0.03} 3%|▎ | 73/2774 [14:38<8:38:21, 11.51s/it] 3%|▎ | 74/2774 [14:50<8:36:57, 11.49s/it] {'loss': 1.083, 'learning_rate': 4.404761904761905e-06, 'epoch': 0.03} 3%|▎ | 74/2774 [14:50<8:36:57, 11.49s/it] 3%|▎ | 75/2774 [15:01<8:37:22, 11.50s/it] {'loss': 
1.0146, 'learning_rate': 4.464285714285715e-06, 'epoch': 0.03} 3%|▎ | 75/2774 [15:01<8:37:22, 11.50s/it] 3%|▎ | 76/2774 [15:12<8:32:22, 11.39s/it] {'loss': 1.0522, 'learning_rate': 4.523809523809524e-06, 'epoch': 0.03} 3%|▎ | 76/2774 [15:12<8:32:22, 11.39s/it] 3%|▎ | 77/2774 [15:24<8:36:05, 11.48s/it] {'loss': 1.0, 'learning_rate': 4.583333333333333e-06, 'epoch': 0.03} 3%|▎ | 77/2774 [15:24<8:36:05, 11.48s/it] 3%|▎ | 78/2774 [15:35<8:33:54, 11.44s/it] {'loss': 1.0752, 'learning_rate': 4.642857142857144e-06, 'epoch': 0.03} 3%|▎ | 78/2774 [15:35<8:33:54, 11.44s/it] 3%|▎ | 79/2774 [15:47<8:30:43, 11.37s/it] {'loss': 1.0649, 'learning_rate': 4.702380952380953e-06, 'epoch': 0.03} 3%|▎ | 79/2774 [15:47<8:30:43, 11.37s/it] 3%|▎ | 80/2774 [15:58<8:30:48, 11.38s/it] {'loss': 1.0405, 'learning_rate': 4.761904761904762e-06, 'epoch': 0.03} 3%|▎ | 80/2774 [15:58<8:30:48, 11.38s/it] 3%|▎ | 81/2774 [16:09<8:31:16, 11.39s/it] {'loss': 1.0566, 'learning_rate': 4.821428571428572e-06, 'epoch': 0.03} 3%|▎ | 81/2774 [16:09<8:31:16, 11.39s/it] 3%|▎ | 82/2774 [16:21<8:31:21, 11.40s/it] {'loss': 1.0557, 'learning_rate': 4.880952380952381e-06, 'epoch': 0.03} 3%|▎ | 82/2774 [16:21<8:31:21, 11.40s/it] 3%|▎ | 83/2774 [16:33<8:41:01, 11.62s/it] {'loss': 0.9883, 'learning_rate': 4.940476190476191e-06, 'epoch': 0.03} 3%|▎ | 83/2774 [16:33<8:41:01, 11.62s/it] 3%|▎ | 84/2774 [16:44<8:35:23, 11.50s/it] {'loss': 0.9775, 'learning_rate': 5e-06, 'epoch': 0.03} 3%|▎ | 84/2774 [16:44<8:35:23, 11.50s/it] 3%|▎ | 85/2774 [16:56<8:42:45, 11.66s/it] {'loss': 0.9976, 'learning_rate': 4.999998295075511e-06, 'epoch': 0.03} 3%|▎ | 85/2774 [16:56<8:42:45, 11.66s/it] 3%|▎ | 86/2774 [17:08<8:43:13, 11.68s/it] {'loss': 1.0503, 'learning_rate': 4.9999931803043675e-06, 'epoch': 0.03} 3%|▎ | 86/2774 [17:08<8:43:13, 11.68s/it] 3%|▎ | 87/2774 [17:19<8:39:57, 11.61s/it] {'loss': 1.0117, 'learning_rate': 4.999984655693547e-06, 'epoch': 0.03} 3%|▎ | 87/2774 [17:19<8:39:57, 11.61s/it] 3%|▎ | 88/2774 [17:31<8:37:36, 
11.56s/it] {'loss': 1.0293, 'learning_rate': 4.999972721254676e-06, 'epoch': 0.03} 3%|▎ | 88/2774 [17:31<8:37:36, 11.56s/it] 3%|▎ | 89/2774 [17:43<8:38:24, 11.58s/it] {'loss': 1.0674, 'learning_rate': 4.999957377004031e-06, 'epoch': 0.03} 3%|▎ | 89/2774 [17:43<8:38:24, 11.58s/it] 3%|▎ | 90/2774 [17:54<8:33:31, 11.48s/it] {'loss': 1.0532, 'learning_rate': 4.9999386229625436e-06, 'epoch': 0.03} 3%|▎ | 90/2774 [17:54<8:33:31, 11.48s/it] 3%|▎ | 91/2774 [18:05<8:31:35, 11.44s/it] {'loss': 0.9653, 'learning_rate': 4.999916459155791e-06, 'epoch': 0.03} 3%|▎ | 91/2774 [18:05<8:31:35, 11.44s/it] 3%|▎ | 92/2774 [18:17<8:32:24, 11.46s/it] {'loss': 1.0649, 'learning_rate': 4.999890885614004e-06, 'epoch': 0.03} 3%|▎ | 92/2774 [18:17<8:32:24, 11.46s/it] 3%|▎ | 93/2774 [18:28<8:27:06, 11.35s/it] {'loss': 1.0786, 'learning_rate': 4.999861902372063e-06, 'epoch': 0.03} 3%|▎ | 93/2774 [18:28<8:27:06, 11.35s/it] 3%|▎ | 94/2774 [18:40<8:32:49, 11.48s/it] {'loss': 1.0527, 'learning_rate': 4.9998295094694995e-06, 'epoch': 0.03} 3%|▎ | 94/2774 [18:40<8:32:49, 11.48s/it] 3%|▎ | 95/2774 [18:53<8:56:49, 12.02s/it] {'loss': 0.979, 'learning_rate': 4.999793706950496e-06, 'epoch': 0.03} 3%|▎ | 95/2774 [18:53<8:56:49, 12.02s/it] 3%|▎ | 96/2774 [19:04<8:44:37, 11.75s/it] {'loss': 1.0317, 'learning_rate': 4.999754494863884e-06, 'epoch': 0.03} 3%|▎ | 96/2774 [19:04<8:44:37, 11.75s/it] 3%|▎ | 97/2774 [19:15<8:41:24, 11.69s/it] {'loss': 1.0347, 'learning_rate': 4.999711873263148e-06, 'epoch': 0.03} 3%|▎ | 97/2774 [19:15<8:41:24, 11.69s/it] 4%|▎ | 98/2774 [19:27<8:37:23, 11.60s/it] {'loss': 1.0322, 'learning_rate': 4.9996658422064195e-06, 'epoch': 0.04} 4%|▎ | 98/2774 [19:27<8:37:23, 11.60s/it] 4%|▎ | 99/2774 [19:39<8:38:23, 11.63s/it] {'loss': 1.0366, 'learning_rate': 4.9996164017564835e-06, 'epoch': 0.04} 4%|▎ | 99/2774 [19:39<8:38:23, 11.63s/it] 4%|▎ | 100/2774 [19:50<8:35:40, 11.57s/it] {'loss': 1.0151, 'learning_rate': 4.999563551980773e-06, 'epoch': 0.04} 4%|▎ | 100/2774 [19:50<8:35:40, 
11.57s/it] 4%|▎ | 101/2774 [20:03<8:55:10, 12.01s/it] {'loss': 0.9834, 'learning_rate': 4.999507292951371e-06, 'epoch': 0.04} 4%|▎ | 101/2774 [20:03<8:55:10, 12.01s/it] 4%|▎ | 102/2774 [20:14<8:43:27, 11.75s/it] {'loss': 1.0244, 'learning_rate': 4.9994476247450145e-06, 'epoch': 0.04} 4%|▎ | 102/2774 [20:14<8:43:27, 11.75s/it] 4%|▎ | 103/2774 [20:26<8:48:57, 11.88s/it] {'loss': 0.9766, 'learning_rate': 4.999384547443084e-06, 'epoch': 0.04} 4%|▎ | 103/2774 [20:26<8:48:57, 11.88s/it] 4%|▎ | 104/2774 [20:38<8:43:11, 11.76s/it] {'loss': 1.0312, 'learning_rate': 4.999318061131614e-06, 'epoch': 0.04} 4%|▎ | 104/2774 [20:38<8:43:11, 11.76s/it] 4%|▍ | 105/2774 [20:49<8:37:39, 11.64s/it] {'loss': 1.0693, 'learning_rate': 4.999248165901289e-06, 'epoch': 0.04} 4%|▍ | 105/2774 [20:49<8:37:39, 11.64s/it] 4%|▍ | 106/2774 [21:01<8:39:37, 11.69s/it] {'loss': 1.0796, 'learning_rate': 4.999174861847441e-06, 'epoch': 0.04} 4%|▍ | 106/2774 [21:01<8:39:37, 11.69s/it] 4%|▍ | 107/2774 [21:13<8:38:49, 11.67s/it] {'loss': 1.0425, 'learning_rate': 4.999098149070052e-06, 'epoch': 0.04} 4%|▍ | 107/2774 [21:13<8:38:49, 11.67s/it] 4%|▍ | 108/2774 [21:26<9:07:31, 12.32s/it] {'loss': 1.0151, 'learning_rate': 4.999018027673754e-06, 'epoch': 0.04} 4%|▍ | 108/2774 [21:26<9:07:31, 12.32s/it] 4%|▍ | 109/2774 [21:38<9:02:47, 12.22s/it] {'loss': 1.0215, 'learning_rate': 4.998934497767829e-06, 'epoch': 0.04} 4%|▍ | 109/2774 [21:38<9:02:47, 12.22s/it] 4%|▍ | 110/2774 [21:50<8:55:07, 12.05s/it] {'loss': 1.0576, 'learning_rate': 4.998847559466204e-06, 'epoch': 0.04} 4%|▍ | 110/2774 [21:50<8:55:07, 12.05s/it] 4%|▍ | 111/2774 [22:03<9:06:03, 12.30s/it] {'loss': 1.0854, 'learning_rate': 4.99875721288746e-06, 'epoch': 0.04} 4%|▍ | 111/2774 [22:03<9:06:03, 12.30s/it] 4%|▍ | 112/2774 [22:14<8:50:11, 11.95s/it] {'loss': 1.042, 'learning_rate': 4.9986634581548235e-06, 'epoch': 0.04} 4%|▍ | 112/2774 [22:14<8:50:11, 11.95s/it] 4%|▍ | 113/2774 [22:26<8:43:53, 11.81s/it] {'loss': 1.0664, 'learning_rate': 
4.998566295396169e-06, 'epoch': 0.04} 4%|▍ | 113/2774 [22:26<8:43:53, 11.81s/it] 4%|▍ | 114/2774 [22:37<8:40:05, 11.73s/it] {'loss': 1.0088, 'learning_rate': 4.998465724744023e-06, 'epoch': 0.04} 4%|▍ | 114/2774 [22:37<8:40:05, 11.73s/it] 4%|▍ | 115/2774 [22:48<8:31:47, 11.55s/it] {'loss': 1.0615, 'learning_rate': 4.998361746335556e-06, 'epoch': 0.04} 4%|▍ | 115/2774 [22:48<8:31:47, 11.55s/it] 4%|▍ | 116/2774 [23:01<8:48:11, 11.92s/it] {'loss': 1.042, 'learning_rate': 4.998254360312589e-06, 'epoch': 0.04} 4%|▍ | 116/2774 [23:01<8:48:11, 11.92s/it] 4%|▍ | 117/2774 [23:13<8:50:49, 11.99s/it] {'loss': 1.0137, 'learning_rate': 4.998143566821589e-06, 'epoch': 0.04} 4%|▍ | 117/2774 [23:13<8:50:49, 11.99s/it] 4%|▍ | 118/2774 [23:25<8:45:34, 11.87s/it] {'loss': 1.0347, 'learning_rate': 4.998029366013674e-06, 'epoch': 0.04} 4%|▍ | 118/2774 [23:25<8:45:34, 11.87s/it] 4%|▍ | 119/2774 [23:36<8:42:12, 11.80s/it] {'loss': 1.0532, 'learning_rate': 4.997911758044605e-06, 'epoch': 0.04} 4%|▍ | 119/2774 [23:36<8:42:12, 11.80s/it] 4%|▍ | 120/2774 [23:48<8:37:05, 11.69s/it] {'loss': 1.0752, 'learning_rate': 4.997790743074793e-06, 'epoch': 0.04} 4%|▍ | 120/2774 [23:48<8:37:05, 11.69s/it] 4%|▍ | 121/2774 [23:59<8:31:51, 11.58s/it] {'loss': 0.9868, 'learning_rate': 4.997666321269294e-06, 'epoch': 0.04} 4%|▍ | 121/2774 [23:59<8:31:51, 11.58s/it] 4%|▍ | 122/2774 [24:11<8:31:34, 11.57s/it] {'loss': 1.0439, 'learning_rate': 4.997538492797813e-06, 'epoch': 0.04} 4%|▍ | 122/2774 [24:11<8:31:34, 11.57s/it] 4%|▍ | 123/2774 [24:22<8:28:32, 11.51s/it] {'loss': 0.9834, 'learning_rate': 4.9974072578347e-06, 'epoch': 0.04} 4%|▍ | 123/2774 [24:22<8:28:32, 11.51s/it] 4%|▍ | 124/2774 [24:34<8:27:25, 11.49s/it] {'loss': 1.106, 'learning_rate': 4.997272616558952e-06, 'epoch': 0.04} 4%|▍ | 124/2774 [24:34<8:27:25, 11.49s/it] 5%|▍ | 125/2774 [24:46<8:46:10, 11.92s/it] {'loss': 1.0312, 'learning_rate': 4.99713456915421e-06, 'epoch': 0.05} 5%|▍ | 125/2774 [24:46<8:46:10, 11.92s/it] 5%|▍ | 126/2774 
[24:58<8:40:16, 11.79s/it] {'loss': 0.9814, 'learning_rate': 4.996993115808765e-06, 'epoch': 0.05} 5%|▍ | 126/2774 [24:58<8:40:16, 11.79s/it] 5%|▍ | 127/2774 [25:10<8:38:26, 11.75s/it] {'loss': 1.0161, 'learning_rate': 4.996848256715547e-06, 'epoch': 0.05} 5%|▍ | 127/2774 [25:10<8:38:26, 11.75s/it] 5%|▍ | 128/2774 [25:22<8:41:46, 11.83s/it] {'loss': 1.0122, 'learning_rate': 4.996699992072139e-06, 'epoch': 0.05} 5%|▍ | 128/2774 [25:22<8:41:46, 11.83s/it] 5%|▍ | 129/2774 [25:34<8:53:12, 12.10s/it] {'loss': 1.0459, 'learning_rate': 4.996548322080763e-06, 'epoch': 0.05} 5%|▍ | 129/2774 [25:34<8:53:12, 12.10s/it] 5%|▍ | 130/2774 [25:46<8:46:33, 11.95s/it] {'loss': 1.0337, 'learning_rate': 4.996393246948288e-06, 'epoch': 0.05} 5%|▍ | 130/2774 [25:46<8:46:33, 11.95s/it] 5%|▍ | 131/2774 [25:57<8:39:34, 11.80s/it] {'loss': 1.0166, 'learning_rate': 4.996234766886227e-06, 'epoch': 0.05} 5%|▍ | 131/2774 [25:57<8:39:34, 11.80s/it] 5%|▍ | 132/2774 [26:09<8:32:49, 11.65s/it] {'loss': 1.0342, 'learning_rate': 4.996072882110737e-06, 'epoch': 0.05} 5%|▍ | 132/2774 [26:09<8:32:49, 11.65s/it] 5%|▍ | 133/2774 [26:20<8:33:21, 11.66s/it] {'loss': 1.0039, 'learning_rate': 4.995907592842619e-06, 'epoch': 0.05} 5%|▍ | 133/2774 [26:20<8:33:21, 11.66s/it] 5%|▍ | 134/2774 [26:32<8:26:23, 11.51s/it] {'loss': 1.0557, 'learning_rate': 4.995738899307319e-06, 'epoch': 0.05} 5%|▍ | 134/2774 [26:32<8:26:23, 11.51s/it] 5%|▍ | 135/2774 [26:43<8:27:52, 11.55s/it] {'loss': 1.0161, 'learning_rate': 4.995566801734923e-06, 'epoch': 0.05} 5%|▍ | 135/2774 [26:43<8:27:52, 11.55s/it] 5%|▍ | 136/2774 [26:55<8:31:51, 11.64s/it] {'loss': 1.0039, 'learning_rate': 4.9953913003601625e-06, 'epoch': 0.05} 5%|▍ | 136/2774 [26:55<8:31:51, 11.64s/it] 5%|▍ | 137/2774 [27:06<8:27:50, 11.56s/it] {'loss': 1.0215, 'learning_rate': 4.995212395422412e-06, 'epoch': 0.05} 5%|▍ | 137/2774 [27:06<8:27:50, 11.56s/it] 5%|▍ | 138/2774 [27:18<8:25:54, 11.52s/it] {'loss': 1.0527, 'learning_rate': 4.995030087165684e-06, 'epoch': 0.05} 
5%|▍ | 138/2774 [27:18<8:25:54, 11.52s/it] 5%|▌ | 139/2774 [27:30<8:32:48, 11.68s/it] {'loss': 0.9849, 'learning_rate': 4.994844375838639e-06, 'epoch': 0.05} 5%|▌ | 139/2774 [27:30<8:32:48, 11.68s/it] 5%|▌ | 140/2774 [27:41<8:29:43, 11.61s/it] {'loss': 1.0498, 'learning_rate': 4.994655261694575e-06, 'epoch': 0.05} 5%|▌ | 140/2774 [27:41<8:29:43, 11.61s/it] 5%|▌ | 141/2774 [27:53<8:29:45, 11.62s/it] {'loss': 1.0322, 'learning_rate': 4.994462744991431e-06, 'epoch': 0.05} 5%|▌ | 141/2774 [27:53<8:29:45, 11.62s/it] 5%|▌ | 142/2774 [28:05<8:29:28, 11.61s/it] {'loss': 1.0396, 'learning_rate': 4.994266825991788e-06, 'epoch': 0.05} 5%|▌ | 142/2774 [28:05<8:29:28, 11.61s/it] 5%|▌ | 143/2774 [28:16<8:28:00, 11.58s/it] {'loss': 1.0088, 'learning_rate': 4.9940675049628715e-06, 'epoch': 0.05} 5%|▌ | 143/2774 [28:16<8:28:00, 11.58s/it] 5%|▌ | 144/2774 [28:28<8:25:43, 11.54s/it] {'loss': 1.0078, 'learning_rate': 4.993864782176539e-06, 'epoch': 0.05} 5%|▌ | 144/2774 [28:28<8:25:43, 11.54s/it] 5%|▌ | 145/2774 [28:40<8:36:47, 11.79s/it] {'loss': 0.9629, 'learning_rate': 4.993658657909294e-06, 'epoch': 0.05} 5%|▌ | 145/2774 [28:40<8:36:47, 11.79s/it] 5%|▌ | 146/2774 [28:53<8:48:59, 12.08s/it] {'loss': 0.9785, 'learning_rate': 4.993449132442278e-06, 'epoch': 0.05} 5%|▌ | 146/2774 [28:53<8:48:59, 12.08s/it] 5%|▌ | 147/2774 [29:04<8:45:38, 12.01s/it] {'loss': 1.0596, 'learning_rate': 4.9932362060612694e-06, 'epoch': 0.05} 5%|▌ | 147/2774 [29:04<8:45:38, 12.01s/it] 5%|▌ | 148/2774 [29:16<8:44:39, 11.99s/it] {'loss': 1.0303, 'learning_rate': 4.993019879056689e-06, 'epoch': 0.05} 5%|▌ | 148/2774 [29:16<8:44:39, 11.99s/it] 5%|▌ | 149/2774 [29:28<8:35:59, 11.79s/it] {'loss': 1.0684, 'learning_rate': 4.992800151723592e-06, 'epoch': 0.05} 5%|▌ | 149/2774 [29:28<8:35:59, 11.79s/it] 5%|▌ | 150/2774 [29:39<8:30:06, 11.66s/it] {'loss': 0.9956, 'learning_rate': 4.9925770243616745e-06, 'epoch': 0.05} 5%|▌ | 150/2774 [29:39<8:30:06, 11.66s/it] 5%|▌ | 151/2774 [29:50<8:24:15, 11.53s/it] {'loss': 
1.0322, 'learning_rate': 4.992350497275268e-06, 'epoch': 0.05} 5%|▌ | 151/2774 [29:50<8:24:15, 11.53s/it] 5%|▌ | 152/2774 [30:02<8:24:34, 11.55s/it] {'loss': 1.0742, 'learning_rate': 4.992120570773342e-06, 'epoch': 0.05} 5%|▌ | 152/2774 [30:02<8:24:34, 11.55s/it] 6%|▌ | 153/2774 [30:14<8:30:18, 11.68s/it] {'loss': 1.0566, 'learning_rate': 4.991887245169502e-06, 'epoch': 0.06} 6%|▌ | 153/2774 [30:14<8:30:18, 11.68s/it] 6%|▌ | 154/2774 [30:26<8:29:25, 11.67s/it] {'loss': 1.0605, 'learning_rate': 4.99165052078199e-06, 'epoch': 0.06} 6%|▌ | 154/2774 [30:26<8:29:25, 11.67s/it] 6%|▌ | 155/2774 [30:38<8:39:45, 11.91s/it] {'loss': 1.0112, 'learning_rate': 4.991410397933685e-06, 'epoch': 0.06} 6%|▌ | 155/2774 [30:38<8:39:45, 11.91s/it] 6%|▌ | 156/2774 [30:49<8:32:35, 11.75s/it] {'loss': 1.0498, 'learning_rate': 4.991166876952098e-06, 'epoch': 0.06} 6%|▌ | 156/2774 [30:49<8:32:35, 11.75s/it] 6%|▌ | 157/2774 [31:01<8:32:24, 11.75s/it] {'loss': 1.0659, 'learning_rate': 4.990919958169379e-06, 'epoch': 0.06} 6%|▌ | 157/2774 [31:01<8:32:24, 11.75s/it] 6%|▌ | 158/2774 [31:14<8:50:46, 12.17s/it] {'loss': 0.9556, 'learning_rate': 4.99066964192231e-06, 'epoch': 0.06} 6%|▌ | 158/2774 [31:14<8:50:46, 12.17s/it] 6%|▌ | 159/2774 [31:27<8:53:02, 12.23s/it] {'loss': 1.0527, 'learning_rate': 4.990415928552306e-06, 'epoch': 0.06} 6%|▌ | 159/2774 [31:27<8:53:02, 12.23s/it] 6%|▌ | 160/2774 [31:40<9:05:10, 12.51s/it] {'loss': 1.1377, 'learning_rate': 4.990158818405417e-06, 'epoch': 0.06} 6%|▌ | 160/2774 [31:40<9:05:10, 12.51s/it] 6%|▌ | 161/2774 [31:52<8:57:01, 12.33s/it] {'loss': 1.0562, 'learning_rate': 4.9898983118323265e-06, 'epoch': 0.06} 6%|▌ | 161/2774 [31:52<8:57:01, 12.33s/it] 6%|▌ | 162/2774 [32:03<8:43:30, 12.03s/it] {'loss': 1.0586, 'learning_rate': 4.989634409188349e-06, 'epoch': 0.06} 6%|▌ | 162/2774 [32:03<8:43:30, 12.03s/it] 6%|▌ | 163/2774 [32:15<8:36:25, 11.87s/it] {'loss': 1.0205, 'learning_rate': 4.989367110833432e-06, 'epoch': 0.06} 6%|▌ | 163/2774 [32:15<8:36:25, 
11.87s/it] 6%|▌ | 164/2774 [32:27<8:46:49, 12.11s/it] {'loss': 1.042, 'learning_rate': 4.9890964171321535e-06, 'epoch': 0.06} 6%|▌ | 164/2774 [32:27<8:46:49, 12.11s/it] 6%|▌ | 165/2774 [32:39<8:35:48, 11.86s/it] {'loss': 1.0288, 'learning_rate': 4.988822328453725e-06, 'epoch': 0.06} 6%|▌ | 165/2774 [32:39<8:35:48, 11.86s/it] 6%|▌ | 166/2774 [32:50<8:35:04, 11.85s/it] {'loss': 1.0459, 'learning_rate': 4.988544845171986e-06, 'epoch': 0.06} 6%|▌ | 166/2774 [32:50<8:35:04, 11.85s/it] 6%|▌ | 167/2774 [33:02<8:31:44, 11.78s/it] {'loss': 1.0356, 'learning_rate': 4.9882639676654075e-06, 'epoch': 0.06} 6%|▌ | 167/2774 [33:02<8:31:44, 11.78s/it] 6%|▌ | 168/2774 [33:13<8:26:07, 11.65s/it] {'loss': 1.0381, 'learning_rate': 4.987979696317088e-06, 'epoch': 0.06} 6%|▌ | 168/2774 [33:13<8:26:07, 11.65s/it] 6%|▌ | 169/2774 [33:25<8:25:22, 11.64s/it] {'loss': 1.0044, 'learning_rate': 4.987692031514758e-06, 'epoch': 0.06} 6%|▌ | 169/2774 [33:25<8:25:22, 11.64s/it] 6%|▌ | 170/2774 [33:36<8:22:40, 11.58s/it] {'loss': 1.0757, 'learning_rate': 4.9874009736507745e-06, 'epoch': 0.06} 6%|▌ | 170/2774 [33:36<8:22:40, 11.58s/it] 6%|▌ | 171/2774 [33:48<8:21:08, 11.55s/it] {'loss': 1.0215, 'learning_rate': 4.987106523122122e-06, 'epoch': 0.06} 6%|▌ | 171/2774 [33:48<8:21:08, 11.55s/it] 6%|▌ | 172/2774 [34:01<8:35:13, 11.88s/it] {'loss': 1.0396, 'learning_rate': 4.986808680330415e-06, 'epoch': 0.06} 6%|▌ | 172/2774 [34:01<8:35:13, 11.88s/it] 6%|▌ | 173/2774 [34:12<8:29:56, 11.76s/it] {'loss': 1.0537, 'learning_rate': 4.9865074456818906e-06, 'epoch': 0.06} 6%|▌ | 173/2774 [34:12<8:29:56, 11.76s/it] 6%|▋ | 174/2774 [34:26<8:57:15, 12.40s/it] {'loss': 1.0645, 'learning_rate': 4.9862028195874165e-06, 'epoch': 0.06} 6%|▋ | 174/2774 [34:26<8:57:15, 12.40s/it] 6%|▋ | 175/2774 [34:38<8:54:03, 12.33s/it] {'loss': 0.999, 'learning_rate': 4.985894802462485e-06, 'epoch': 0.06} 6%|▋ | 175/2774 [34:38<8:54:03, 12.33s/it] 6%|▋ | 176/2774 [34:51<9:02:01, 12.52s/it] {'loss': 1.0571, 'learning_rate': 
4.985583394727211e-06, 'epoch': 0.06} 6%|▋ | 176/2774 [34:51<9:02:01, 12.52s/it] 6%|▋ | 177/2774 [35:04<9:11:46, 12.75s/it] {'loss': 0.9673, 'learning_rate': 4.985268596806336e-06, 'epoch': 0.06} 6%|▋ | 177/2774 [35:04<9:11:46, 12.75s/it] 6%|▋ | 178/2774 [35:15<8:48:26, 12.21s/it] {'loss': 1.0605, 'learning_rate': 4.9849504091292264e-06, 'epoch': 0.06} 6%|▋ | 178/2774 [35:15<8:48:26, 12.21s/it] 6%|▋ | 179/2774 [35:26<8:34:29, 11.90s/it] {'loss': 1.0771, 'learning_rate': 4.98462883212987e-06, 'epoch': 0.06} 6%|▋ | 179/2774 [35:26<8:34:29, 11.90s/it] 6%|▋ | 180/2774 [35:38<8:29:28, 11.78s/it] {'loss': 1.1079, 'learning_rate': 4.984303866246879e-06, 'epoch': 0.06} 6%|▋ | 180/2774 [35:38<8:29:28, 11.78s/it] 7%|▋ | 181/2774 [35:50<8:35:52, 11.94s/it] {'loss': 0.9976, 'learning_rate': 4.983975511923488e-06, 'epoch': 0.07} 7%|▋ | 181/2774 [35:50<8:35:52, 11.94s/it] 7%|▋ | 182/2774 [36:04<8:56:25, 12.42s/it] {'loss': 0.9878, 'learning_rate': 4.98364376960755e-06, 'epoch': 0.07} 7%|▋ | 182/2774 [36:04<8:56:25, 12.42s/it] 7%|▋ | 183/2774 [36:16<8:47:49, 12.22s/it] {'loss': 1.0107, 'learning_rate': 4.983308639751544e-06, 'epoch': 0.07} 7%|▋ | 183/2774 [36:16<8:47:49, 12.22s/it] 7%|▋ | 184/2774 [36:27<8:41:05, 12.07s/it] {'loss': 1.0322, 'learning_rate': 4.982970122812566e-06, 'epoch': 0.07} 7%|▋ | 184/2774 [36:27<8:41:05, 12.07s/it] 7%|▋ | 185/2774 [36:39<8:40:08, 12.05s/it] {'loss': 0.9829, 'learning_rate': 4.9826282192523315e-06, 'epoch': 0.07} 7%|▋ | 185/2774 [36:39<8:40:08, 12.05s/it] 7%|▋ | 186/2774 [36:51<8:31:24, 11.86s/it] {'loss': 1.0029, 'learning_rate': 4.982282929537179e-06, 'epoch': 0.07} 7%|▋ | 186/2774 [36:51<8:31:24, 11.86s/it] 7%|▋ | 187/2774 [37:02<8:24:13, 11.69s/it] {'loss': 1.0386, 'learning_rate': 4.98193425413806e-06, 'epoch': 0.07} 7%|▋ | 187/2774 [37:02<8:24:13, 11.69s/it] 7%|▋ | 188/2774 [37:14<8:22:51, 11.67s/it] {'loss': 1.0215, 'learning_rate': 4.9815821935305475e-06, 'epoch': 0.07} 7%|▋ | 188/2774 [37:14<8:22:51, 11.67s/it] 7%|▋ | 189/2774 
[37:25<8:18:20, 11.57s/it] {'loss': 1.0249, 'learning_rate': 4.981226748194833e-06, 'epoch': 0.07} 7%|▋ | 189/2774 [37:25<8:18:20, 11.57s/it] 7%|▋ | 190/2774 [37:38<8:34:08, 11.94s/it] {'loss': 1.0068, 'learning_rate': 4.980867918615719e-06, 'epoch': 0.07} 7%|▋ | 190/2774 [37:38<8:34:08, 11.94s/it] 7%|▋ | 191/2774 [37:49<8:28:04, 11.80s/it] {'loss': 1.0366, 'learning_rate': 4.980505705282629e-06, 'epoch': 0.07} 7%|▋ | 191/2774 [37:49<8:28:04, 11.80s/it] 7%|▋ | 192/2774 [38:01<8:26:49, 11.78s/it] {'loss': 1.0464, 'learning_rate': 4.980140108689602e-06, 'epoch': 0.07} 7%|▋ | 192/2774 [38:01<8:26:49, 11.78s/it] 7%|▋ | 193/2774 [38:14<8:41:40, 12.13s/it] {'loss': 0.9741, 'learning_rate': 4.979771129335286e-06, 'epoch': 0.07} 7%|▋ | 193/2774 [38:14<8:41:40, 12.13s/it] 7%|▋ | 194/2774 [38:25<8:31:53, 11.90s/it] {'loss': 1.041, 'learning_rate': 4.979398767722949e-06, 'epoch': 0.07} 7%|▋ | 194/2774 [38:25<8:31:53, 11.90s/it] 7%|▋ | 195/2774 [38:36<8:22:28, 11.69s/it] {'loss': 0.9717, 'learning_rate': 4.97902302436047e-06, 'epoch': 0.07} 7%|▋ | 195/2774 [38:36<8:22:28, 11.69s/it] 7%|▋ | 196/2774 [38:48<8:18:11, 11.59s/it] {'loss': 1.0186, 'learning_rate': 4.9786438997603385e-06, 'epoch': 0.07} 7%|▋ | 196/2774 [38:48<8:18:11, 11.59s/it] 7%|▋ | 197/2774 [38:59<8:18:24, 11.60s/it] {'loss': 0.9966, 'learning_rate': 4.978261394439658e-06, 'epoch': 0.07} 7%|▋ | 197/2774 [38:59<8:18:24, 11.60s/it] 7%|▋ | 198/2774 [39:11<8:16:31, 11.56s/it] {'loss': 1.0659, 'learning_rate': 4.9778755089201445e-06, 'epoch': 0.07} 7%|▋ | 198/2774 [39:11<8:16:31, 11.56s/it] 7%|▋ | 199/2774 [39:24<8:35:59, 12.02s/it] {'loss': 1.0732, 'learning_rate': 4.97748624372812e-06, 'epoch': 0.07} 7%|▋ | 199/2774 [39:24<8:35:59, 12.02s/it] 7%|▋ | 200/2774 [39:36<8:28:51, 11.86s/it] {'loss': 1.0776, 'learning_rate': 4.97709359939452e-06, 'epoch': 0.07} 7%|▋ | 200/2774 [39:36<8:28:51, 11.86s/it] 7%|▋ | 201/2774 [39:47<8:24:03, 11.75s/it] {'loss': 1.166, 'learning_rate': 4.976697576454889e-06, 'epoch': 0.07} 7%|▋ | 
201/2774 [39:47<8:24:03, 11.75s/it] 7%|▋ | 202/2774 [39:59<8:20:38, 11.68s/it] {'loss': 1.0479, 'learning_rate': 4.9762981754493755e-06, 'epoch': 0.07} 7%|▋ | 202/2774 [39:59<8:20:38, 11.68s/it] 7%|▋ | 203/2774 [40:10<8:17:45, 11.62s/it] {'loss': 1.0161, 'learning_rate': 4.97589539692274e-06, 'epoch': 0.07} 7%|▋ | 203/2774 [40:10<8:17:45, 11.62s/it] 7%|▋ | 204/2774 [40:23<8:29:58, 11.91s/it] {'loss': 1.0459, 'learning_rate': 4.975489241424347e-06, 'epoch': 0.07} 7%|▋ | 204/2774 [40:23<8:29:58, 11.91s/it] 7%|▋ | 205/2774 [40:34<8:22:34, 11.74s/it] {'loss': 1.0908, 'learning_rate': 4.975079709508171e-06, 'epoch': 0.07} 7%|▋ | 205/2774 [40:34<8:22:34, 11.74s/it] 7%|▋ | 206/2774 [40:45<8:16:36, 11.60s/it] {'loss': 1.0581, 'learning_rate': 4.9746668017327845e-06, 'epoch': 0.07} 7%|▋ | 206/2774 [40:45<8:16:36, 11.60s/it] 7%|▋ | 207/2774 [40:58<8:33:43, 12.01s/it] {'loss': 0.9844, 'learning_rate': 4.974250518661371e-06, 'epoch': 0.07} 7%|▋ | 207/2774 [40:58<8:33:43, 12.01s/it] 7%|▋ | 208/2774 [41:09<8:24:14, 11.79s/it] {'loss': 0.999, 'learning_rate': 4.973830860861717e-06, 'epoch': 0.07} 7%|▋ | 208/2774 [41:09<8:24:14, 11.79s/it] 8%|▊ | 209/2774 [41:21<8:17:14, 11.63s/it] {'loss': 1.0444, 'learning_rate': 4.973407828906208e-06, 'epoch': 0.08} 8%|▊ | 209/2774 [41:21<8:17:14, 11.63s/it] 8%|▊ | 210/2774 [41:33<8:26:28, 11.85s/it] {'loss': 1.084, 'learning_rate': 4.9729814233718345e-06, 'epoch': 0.08} 8%|▊ | 210/2774 [41:33<8:26:28, 11.85s/it] 8%|▊ | 211/2774 [41:44<8:20:25, 11.72s/it] {'loss': 0.9941, 'learning_rate': 4.972551644840188e-06, 'epoch': 0.08} 8%|▊ | 211/2774 [41:44<8:20:25, 11.72s/it] 8%|▊ | 212/2774 [41:57<8:33:28, 12.03s/it] {'loss': 1.0229, 'learning_rate': 4.972118493897461e-06, 'epoch': 0.08} 8%|▊ | 212/2774 [41:57<8:33:28, 12.03s/it] 8%|▊ | 213/2774 [42:08<8:22:07, 11.76s/it] {'loss': 1.0664, 'learning_rate': 4.9716819711344446e-06, 'epoch': 0.08} 8%|▊ | 213/2774 [42:08<8:22:07, 11.76s/it] 8%|▊ | 214/2774 [42:20<8:15:18, 11.61s/it] {'loss': 1.0444, 
'learning_rate': 4.97124207714653e-06, 'epoch': 0.08} 8%|▊ | 214/2774 [42:20<8:15:18, 11.61s/it] 8%|▊ | 215/2774 [42:31<8:17:22, 11.66s/it] {'loss': 0.9995, 'learning_rate': 4.9707988125337056e-06, 'epoch': 0.08} 8%|▊ | 215/2774 [42:31<8:17:22, 11.66s/it] 8%|▊ | 216/2774 [42:44<8:33:38, 12.05s/it] {'loss': 1.0269, 'learning_rate': 4.970352177900558e-06, 'epoch': 0.08} 8%|▊ | 216/2774 [42:44<8:33:38, 12.05s/it] 8%|▊ | 217/2774 [42:56<8:23:16, 11.81s/it] {'loss': 1.0503, 'learning_rate': 4.9699021738562705e-06, 'epoch': 0.08} 8%|▊ | 217/2774 [42:56<8:23:16, 11.81s/it] 8%|▊ | 218/2774 [43:09<8:48:59, 12.42s/it] {'loss': 1.043, 'learning_rate': 4.9694488010146195e-06, 'epoch': 0.08} 8%|▊ | 218/2774 [43:09<8:48:59, 12.42s/it] 8%|▊ | 219/2774 [43:21<8:35:13, 12.10s/it] {'loss': 1.0435, 'learning_rate': 4.968992059993979e-06, 'epoch': 0.08} 8%|▊ | 219/2774 [43:21<8:35:13, 12.10s/it] 8%|▊ | 220/2774 [43:32<8:25:51, 11.88s/it] {'loss': 0.9829, 'learning_rate': 4.9685319514173165e-06, 'epoch': 0.08} 8%|▊ | 220/2774 [43:32<8:25:51, 11.88s/it] 8%|▊ | 221/2774 [43:44<8:25:08, 11.87s/it] {'loss': 1.0205, 'learning_rate': 4.968068475912192e-06, 'epoch': 0.08} 8%|▊ | 221/2774 [43:44<8:25:08, 11.87s/it] 8%|▊ | 222/2774 [43:55<8:16:14, 11.67s/it] {'loss': 1.0195, 'learning_rate': 4.967601634110758e-06, 'epoch': 0.08} 8%|▊ | 222/2774 [43:55<8:16:14, 11.67s/it] 8%|▊ | 223/2774 [44:07<8:14:52, 11.64s/it] {'loss': 1.02, 'learning_rate': 4.9671314266497595e-06, 'epoch': 0.08} 8%|▊ | 223/2774 [44:07<8:14:52, 11.64s/it] 8%|▊ | 224/2774 [44:18<8:11:57, 11.58s/it] {'loss': 1.0938, 'learning_rate': 4.96665785417053e-06, 'epoch': 0.08} 8%|▊ | 224/2774 [44:18<8:11:57, 11.58s/it] 8%|▊ | 225/2774 [44:30<8:08:52, 11.51s/it] {'loss': 0.9912, 'learning_rate': 4.966180917318994e-06, 'epoch': 0.08} 8%|▊ | 225/2774 [44:30<8:08:52, 11.51s/it] 8%|▊ | 226/2774 [44:41<8:05:17, 11.43s/it] {'loss': 1.0371, 'learning_rate': 4.965700616745665e-06, 'epoch': 0.08} 8%|▊ | 226/2774 [44:41<8:05:17, 11.43s/it] 8%|▊ 
| 227/2774 [44:53<8:14:42, 11.65s/it] {'loss': 1.0537, 'learning_rate': 4.965216953105644e-06, 'epoch': 0.08} 8%|▊ | 227/2774 [44:53<8:14:42, 11.65s/it] 8%|▊ | 228/2774 [45:04<8:08:18, 11.51s/it] {'loss': 1.0176, 'learning_rate': 4.964729927058618e-06, 'epoch': 0.08} 8%|▊ | 228/2774 [45:04<8:08:18, 11.51s/it] 8%|▊ | 229/2774 [45:16<8:10:58, 11.58s/it] {'loss': 1.0005, 'learning_rate': 4.964239539268861e-06, 'epoch': 0.08} 8%|▊ | 229/2774 [45:16<8:10:58, 11.58s/it] 8%|▊ | 230/2774 [45:27<8:10:18, 11.56s/it] {'loss': 0.9961, 'learning_rate': 4.963745790405234e-06, 'epoch': 0.08} 8%|▊ | 230/2774 [45:27<8:10:18, 11.56s/it] 8%|▊ | 231/2774 [45:39<8:08:31, 11.53s/it] {'loss': 1.02, 'learning_rate': 4.963248681141179e-06, 'epoch': 0.08} 8%|▊ | 231/2774 [45:39<8:08:31, 11.53s/it] 8%|▊ | 232/2774 [45:50<8:06:49, 11.49s/it] {'loss': 1.0684, 'learning_rate': 4.962748212154724e-06, 'epoch': 0.08} 8%|▊ | 232/2774 [45:50<8:06:49, 11.49s/it] 8%|▊ | 233/2774 [46:02<8:04:26, 11.44s/it] {'loss': 1.0229, 'learning_rate': 4.9622443841284786e-06, 'epoch': 0.08} 8%|▊ | 233/2774 [46:02<8:04:26, 11.44s/it] 8%|▊ | 234/2774 [46:13<8:05:56, 11.48s/it] {'loss': 1.0518, 'learning_rate': 4.961737197749633e-06, 'epoch': 0.08} 8%|▊ | 234/2774 [46:13<8:05:56, 11.48s/it] 8%|▊ | 235/2774 [46:24<8:01:43, 11.38s/it] {'loss': 1.0249, 'learning_rate': 4.961226653709959e-06, 'epoch': 0.08} 8%|▊ | 235/2774 [46:24<8:01:43, 11.38s/it] 9%|▊ | 236/2774 [46:36<8:02:21, 11.40s/it] {'loss': 1.0688, 'learning_rate': 4.960712752705808e-06, 'epoch': 0.09} 9%|▊ | 236/2774 [46:36<8:02:21, 11.40s/it] 9%|▊ | 237/2774 [46:47<8:05:20, 11.48s/it] {'loss': 1.0439, 'learning_rate': 4.96019549543811e-06, 'epoch': 0.09} 9%|▊ | 237/2774 [46:47<8:05:20, 11.48s/it] 9%|▊ | 238/2774 [46:59<8:07:21, 11.53s/it] {'loss': 1.0122, 'learning_rate': 4.959674882612372e-06, 'epoch': 0.09} 9%|▊ | 238/2774 [46:59<8:07:21, 11.53s/it] 9%|▊ | 239/2774 [47:11<8:08:16, 11.56s/it] {'loss': 1.0776, 'learning_rate': 4.95915091493868e-06, 'epoch': 
0.09} 9%|▊ | 239/2774 [47:11<8:08:16, 11.56s/it] 9%|▊ | 240/2774 [47:22<8:10:33, 11.62s/it] {'loss': 1.0278, 'learning_rate': 4.958623593131691e-06, 'epoch': 0.09} 9%|▊ | 240/2774 [47:22<8:10:33, 11.62s/it] 9%|▊ | 241/2774 [47:34<8:07:43, 11.55s/it] {'loss': 0.9966, 'learning_rate': 4.958092917910646e-06, 'epoch': 0.09} 9%|▊ | 241/2774 [47:34<8:07:43, 11.55s/it] 9%|▊ | 242/2774 [47:45<8:06:11, 11.52s/it] {'loss': 1.0532, 'learning_rate': 4.9575588899993464e-06, 'epoch': 0.09} 9%|▊ | 242/2774 [47:45<8:06:11, 11.52s/it] 9%|▉ | 243/2774 [47:57<8:04:53, 11.49s/it] {'loss': 1.0562, 'learning_rate': 4.9570215101261796e-06, 'epoch': 0.09} 9%|▉ | 243/2774 [47:57<8:04:53, 11.49s/it] 9%|▉ | 244/2774 [48:08<8:04:13, 11.48s/it] {'loss': 1.0918, 'learning_rate': 4.956480779024098e-06, 'epoch': 0.09} 9%|▉ | 244/2774 [48:08<8:04:13, 11.48s/it] 9%|▉ | 245/2774 [48:20<8:03:05, 11.46s/it] {'loss': 1.0913, 'learning_rate': 4.955936697430625e-06, 'epoch': 0.09} 9%|▉ | 245/2774 [48:20<8:03:05, 11.46s/it] 9%|▉ | 246/2774 [48:31<8:04:51, 11.51s/it] {'loss': 1.0386, 'learning_rate': 4.955389266087856e-06, 'epoch': 0.09} 9%|▉ | 246/2774 [48:31<8:04:51, 11.51s/it] 9%|▉ | 247/2774 [48:43<8:04:52, 11.51s/it] {'loss': 1.0195, 'learning_rate': 4.954838485742453e-06, 'epoch': 0.09} 9%|▉ | 247/2774 [48:43<8:04:52, 11.51s/it] 9%|▉ | 248/2774 [48:54<8:07:41, 11.58s/it] {'loss': 1.0044, 'learning_rate': 4.95428435714565e-06, 'epoch': 0.09} 9%|▉ | 248/2774 [48:54<8:07:41, 11.58s/it] 9%|▉ | 249/2774 [49:06<8:11:13, 11.67s/it] {'loss': 0.9722, 'learning_rate': 4.953726881053242e-06, 'epoch': 0.09} 9%|▉ | 249/2774 [49:06<8:11:13, 11.67s/it] 9%|▉ | 250/2774 [49:18<8:05:39, 11.54s/it] {'loss': 1.0186, 'learning_rate': 4.9531660582255934e-06, 'epoch': 0.09} 9%|▉ | 250/2774 [49:18<8:05:39, 11.54s/it] 9%|▉ | 251/2774 [49:29<8:03:57, 11.51s/it] {'loss': 1.0244, 'learning_rate': 4.952601889427634e-06, 'epoch': 0.09} 9%|▉ | 251/2774 [49:29<8:03:57, 11.51s/it] 9%|▉ | 252/2774 [49:41<8:06:41, 11.58s/it] {'loss': 
1.0239, 'learning_rate': 4.9520343754288545e-06, 'epoch': 0.09} 9%|▉ | 252/2774 [49:41<8:06:41, 11.58s/it] 9%|▉ | 253/2774 [49:53<8:08:59, 11.64s/it] {'loss': 1.0142, 'learning_rate': 4.951463517003311e-06, 'epoch': 0.09} 9%|▉ | 253/2774 [49:53<8:08:59, 11.64s/it] 9%|▉ | 254/2774 [50:04<8:02:39, 11.49s/it] {'loss': 1.0381, 'learning_rate': 4.950889314929618e-06, 'epoch': 0.09} 9%|▉ | 254/2774 [50:04<8:02:39, 11.49s/it] 9%|▉ | 255/2774 [50:15<7:57:48, 11.38s/it] {'loss': 1.0464, 'learning_rate': 4.9503117699909545e-06, 'epoch': 0.09} 9%|▉ | 255/2774 [50:15<7:57:48, 11.38s/it] 9%|▉ | 256/2774 [50:28<8:17:34, 11.86s/it] {'loss': 1.0361, 'learning_rate': 4.949730882975055e-06, 'epoch': 0.09} 9%|▉ | 256/2774 [50:28<8:17:34, 11.86s/it] 9%|▉ | 257/2774 [50:39<8:11:12, 11.71s/it] {'loss': 1.1025, 'learning_rate': 4.949146654674216e-06, 'epoch': 0.09} 9%|▉ | 257/2774 [50:39<8:11:12, 11.71s/it] 9%|▉ | 258/2774 [50:51<8:08:20, 11.65s/it] {'loss': 1.0479, 'learning_rate': 4.948559085885288e-06, 'epoch': 0.09} 9%|▉ | 258/2774 [50:51<8:08:20, 11.65s/it] 9%|▉ | 259/2774 [51:02<8:05:11, 11.58s/it] {'loss': 1.0273, 'learning_rate': 4.947968177409681e-06, 'epoch': 0.09} 9%|▉ | 259/2774 [51:02<8:05:11, 11.58s/it] 9%|▉ | 260/2774 [51:14<8:05:25, 11.59s/it] {'loss': 1.02, 'learning_rate': 4.9473739300533575e-06, 'epoch': 0.09} 9%|▉ | 260/2774 [51:14<8:05:25, 11.59s/it] 9%|▉ | 261/2774 [51:27<8:23:42, 12.03s/it] {'loss': 1.022, 'learning_rate': 4.946776344626834e-06, 'epoch': 0.09} 9%|▉ | 261/2774 [51:27<8:23:42, 12.03s/it] 9%|▉ | 262/2774 [51:40<8:44:11, 12.52s/it] {'loss': 0.936, 'learning_rate': 4.9461754219451844e-06, 'epoch': 0.09} 9%|▉ | 262/2774 [51:40<8:44:11, 12.52s/it] 9%|▉ | 263/2774 [51:52<8:32:28, 12.25s/it] {'loss': 1.0244, 'learning_rate': 4.945571162828027e-06, 'epoch': 0.09} 9%|▉ | 263/2774 [51:52<8:32:28, 12.25s/it] 10%|▉ | 264/2774 [52:03<8:20:11, 11.96s/it] {'loss': 1.0488, 'learning_rate': 4.9449635680995375e-06, 'epoch': 0.1} 10%|▉ | 264/2774 [52:03<8:20:11, 
11.96s/it] 10%|▉ | 265/2774 [52:15<8:11:54, 11.76s/it] {'loss': 1.0425, 'learning_rate': 4.944352638588436e-06, 'epoch': 0.1} 10%|▉ | 265/2774 [52:15<8:11:54, 11.76s/it] 10%|▉ | 266/2774 [52:26<8:05:23, 11.61s/it] {'loss': 0.9795, 'learning_rate': 4.943738375127996e-06, 'epoch': 0.1} 10%|▉ | 266/2774 [52:26<8:05:23, 11.61s/it] 10%|▉ | 267/2774 [52:38<8:11:32, 11.76s/it] {'loss': 1.0034, 'learning_rate': 4.943120778556034e-06, 'epoch': 0.1} 10%|▉ | 267/2774 [52:38<8:11:32, 11.76s/it] 10%|▉ | 268/2774 [52:50<8:15:01, 11.85s/it] {'loss': 1.0171, 'learning_rate': 4.942499849714915e-06, 'epoch': 0.1} 10%|▉ | 268/2774 [52:50<8:15:01, 11.85s/it] 10%|▉ | 269/2774 [53:02<8:21:09, 12.00s/it] {'loss': 1.002, 'learning_rate': 4.941875589451548e-06, 'epoch': 0.1} 10%|▉ | 269/2774 [53:02<8:21:09, 12.00s/it] 10%|▉ | 270/2774 [53:14<8:19:03, 11.96s/it] {'loss': 1.0405, 'learning_rate': 4.9412479986173854e-06, 'epoch': 0.1} 10%|▉ | 270/2774 [53:14<8:19:03, 11.96s/it] 10%|▉ | 271/2774 [53:26<8:10:31, 11.76s/it] {'loss': 1.0332, 'learning_rate': 4.940617078068426e-06, 'epoch': 0.1} 10%|▉ | 271/2774 [53:26<8:10:31, 11.76s/it] 10%|▉ | 272/2774 [53:37<8:02:05, 11.56s/it] {'loss': 1.0444, 'learning_rate': 4.9399828286652056e-06, 'epoch': 0.1} 10%|▉ | 272/2774 [53:37<8:02:05, 11.56s/it] 10%|▉ | 273/2774 [53:50<8:29:42, 12.23s/it] {'loss': 1.0015, 'learning_rate': 4.939345251272802e-06, 'epoch': 0.1} 10%|▉ | 273/2774 [53:50<8:29:42, 12.23s/it] 10%|▉ | 274/2774 [54:02<8:18:13, 11.96s/it] {'loss': 0.9541, 'learning_rate': 4.938704346760832e-06, 'epoch': 0.1} 10%|▉ | 274/2774 [54:02<8:18:13, 11.96s/it] 10%|▉ | 275/2774 [54:13<8:12:37, 11.83s/it] {'loss': 1.0674, 'learning_rate': 4.938060116003452e-06, 'epoch': 0.1} 10%|▉ | 275/2774 [54:13<8:12:37, 11.83s/it] 10%|▉ | 276/2774 [54:25<8:05:35, 11.66s/it] {'loss': 1.0449, 'learning_rate': 4.937412559879352e-06, 'epoch': 0.1} 10%|▉ | 276/2774 [54:25<8:05:35, 11.66s/it] 10%|▉ | 277/2774 [54:37<8:13:51, 11.87s/it] {'loss': 1.0322, 'learning_rate': 
4.936761679271761e-06, 'epoch': 0.1} 10%|▉ | 277/2774 [54:37<8:13:51, 11.87s/it] 10%|█ | 278/2774 [54:48<8:07:55, 11.73s/it] {'loss': 1.0488, 'learning_rate': 4.9361074750684404e-06, 'epoch': 0.1} 10%|█ | 278/2774 [54:48<8:07:55, 11.73s/it] 10%|█ | 279/2774 [55:00<8:03:12, 11.62s/it] {'loss': 1.0098, 'learning_rate': 4.935449948161684e-06, 'epoch': 0.1} 10%|█ | 279/2774 [55:00<8:03:12, 11.62s/it] 10%|█ | 280/2774 [55:13<8:22:10, 12.08s/it] {'loss': 1.0303, 'learning_rate': 4.93478909944832e-06, 'epoch': 0.1} 10%|█ | 280/2774 [55:13<8:22:10, 12.08s/it] 10%|█ | 281/2774 [55:24<8:15:17, 11.92s/it] {'loss': 1.0347, 'learning_rate': 4.934124929829706e-06, 'epoch': 0.1} 10%|█ | 281/2774 [55:24<8:15:17, 11.92s/it] 10%|█ | 282/2774 [55:36<8:12:33, 11.86s/it] {'loss': 1.0425, 'learning_rate': 4.9334574402117295e-06, 'epoch': 0.1} 10%|█ | 282/2774 [55:36<8:12:33, 11.86s/it] 10%|█ | 283/2774 [55:47<8:06:31, 11.72s/it] {'loss': 0.9795, 'learning_rate': 4.932786631504805e-06, 'epoch': 0.1} 10%|█ | 283/2774 [55:47<8:06:31, 11.72s/it] 10%|█ | 284/2774 [55:59<7:57:55, 11.52s/it] {'loss': 1.0762, 'learning_rate': 4.932112504623876e-06, 'epoch': 0.1} 10%|█ | 284/2774 [55:59<7:57:55, 11.52s/it] 10%|█ | 285/2774 [56:10<7:57:46, 11.52s/it] {'loss': 1.0469, 'learning_rate': 4.931435060488411e-06, 'epoch': 0.1} 10%|█ | 285/2774 [56:10<7:57:46, 11.52s/it] 10%|█ | 286/2774 [56:22<7:59:12, 11.56s/it] {'loss': 0.9966, 'learning_rate': 4.9307543000224024e-06, 'epoch': 0.1} 10%|█ | 286/2774 [56:22<7:59:12, 11.56s/it] 10%|█ | 287/2774 [56:33<7:56:36, 11.50s/it] {'loss': 1.0791, 'learning_rate': 4.930070224154366e-06, 'epoch': 0.1} 10%|█ | 287/2774 [56:33<7:56:36, 11.50s/it] 10%|█ | 288/2774 [56:45<7:59:24, 11.57s/it] {'loss': 1.0449, 'learning_rate': 4.92938283381734e-06, 'epoch': 0.1} 10%|█ | 288/2774 [56:45<7:59:24, 11.57s/it] 10%|█ | 289/2774 [56:57<8:02:19, 11.65s/it] {'loss': 1.1064, 'learning_rate': 4.928692129948884e-06, 'epoch': 0.1} 10%|█ | 289/2774 [56:57<8:02:19, 11.65s/it] 10%|█ | 
290/2774 [57:09<8:13:24, 11.92s/it] {'loss': 0.9844, 'learning_rate': 4.927998113491076e-06, 'epoch': 0.1} 10%|█ | 290/2774 [57:09<8:13:24, 11.92s/it] 10%|█ | 291/2774 [57:21<8:10:34, 11.85s/it] {'loss': 1.0459, 'learning_rate': 4.927300785390513e-06, 'epoch': 0.1} 10%|█ | 291/2774 [57:21<8:10:34, 11.85s/it] 11%|█ | 292/2774 [57:32<8:05:03, 11.73s/it] {'loss': 1.0132, 'learning_rate': 4.926600146598307e-06, 'epoch': 0.11} 11%|█ | 292/2774 [57:32<8:05:03, 11.73s/it] 11%|█ | 293/2774 [57:44<8:04:56, 11.73s/it] {'loss': 1.0474, 'learning_rate': 4.925896198070088e-06, 'epoch': 0.11} 11%|█ | 293/2774 [57:44<8:04:56, 11.73s/it] 11%|█ | 294/2774 [57:56<8:09:43, 11.85s/it] {'loss': 1.0391, 'learning_rate': 4.925188940766e-06, 'epoch': 0.11} 11%|█ | 294/2774 [57:56<8:09:43, 11.85s/it] 11%|█ | 295/2774 [58:07<8:02:02, 11.67s/it] {'loss': 1.0317, 'learning_rate': 4.9244783756506975e-06, 'epoch': 0.11} 11%|█ | 295/2774 [58:07<8:02:02, 11.67s/it] 11%|█ | 296/2774 [58:19<7:59:14, 11.60s/it] {'loss': 1.0244, 'learning_rate': 4.9237645036933505e-06, 'epoch': 0.11} 11%|█ | 296/2774 [58:19<7:59:14, 11.60s/it] 11%|█ | 297/2774 [58:32<8:13:42, 11.96s/it] {'loss': 0.9746, 'learning_rate': 4.923047325867635e-06, 'epoch': 0.11} 11%|█ | 297/2774 [58:32<8:13:42, 11.96s/it] 11%|█ | 298/2774 [58:43<8:11:13, 11.90s/it] {'loss': 1.0181, 'learning_rate': 4.922326843151739e-06, 'epoch': 0.11} 11%|█ | 298/2774 [58:43<8:11:13, 11.90s/it] 11%|█ | 299/2774 [58:55<8:03:23, 11.72s/it] {'loss': 1.0024, 'learning_rate': 4.921603056528358e-06, 'epoch': 0.11} 11%|█ | 299/2774 [58:55<8:03:23, 11.72s/it] 11%|█ | 300/2774 [59:06<8:03:42, 11.73s/it] {'loss': 1.0146, 'learning_rate': 4.920875966984693e-06, 'epoch': 0.11} 11%|█ | 300/2774 [59:06<8:03:42, 11.73s/it] 11%|█ | 301/2774 [59:18<8:03:59, 11.74s/it] {'loss': 1.0181, 'learning_rate': 4.92014557551245e-06, 'epoch': 0.11} 11%|█ | 301/2774 [59:18<8:03:59, 11.74s/it] 11%|█ | 302/2774 [59:30<8:03:29, 11.74s/it] {'loss': 1.0601, 'learning_rate': 
4.91941188310784e-06, 'epoch': 0.11} 11%|█ | 302/2774 [59:30<8:03:29, 11.74s/it] 11%|█ | 303/2774 [59:43<8:13:47, 11.99s/it] {'loss': 1.0024, 'learning_rate': 4.918674890771573e-06, 'epoch': 0.11} 11%|█ | 303/2774 [59:43<8:13:47, 11.99s/it] 11%|█ | 304/2774 [59:54<8:09:48, 11.90s/it] {'loss': 1.0527, 'learning_rate': 4.9179345995088625e-06, 'epoch': 0.11} 11%|█ | 304/2774 [59:54<8:09:48, 11.90s/it] 11%|█ | 305/2774 [1:00:06<8:01:59, 11.71s/it] {'loss': 1.0332, 'learning_rate': 4.917191010329423e-06, 'epoch': 0.11} 11%|█ | 305/2774 [1:00:06<8:01:59, 11.71s/it] 11%|█ | 306/2774 [1:00:17<8:00:32, 11.68s/it] {'loss': 1.0356, 'learning_rate': 4.916444124247463e-06, 'epoch': 0.11} 11%|█ | 306/2774 [1:00:17<8:00:32, 11.68s/it] 11%|█ | 307/2774 [1:00:29<8:06:14, 11.83s/it] {'loss': 1.064, 'learning_rate': 4.915693942281691e-06, 'epoch': 0.11} 11%|█ | 307/2774 [1:00:29<8:06:14, 11.83s/it] 11%|█ | 308/2774 [1:00:41<8:05:03, 11.80s/it] {'loss': 1.0547, 'learning_rate': 4.91494046545531e-06, 'epoch': 0.11} 11%|█ | 308/2774 [1:00:41<8:05:03, 11.80s/it] 11%|█ | 309/2774 [1:00:54<8:21:20, 12.20s/it] {'loss': 1.002, 'learning_rate': 4.914183694796017e-06, 'epoch': 0.11} 11%|█ | 309/2774 [1:00:54<8:21:20, 12.20s/it] 11%|█ | 310/2774 [1:01:07<8:34:30, 12.53s/it] {'loss': 1.0356, 'learning_rate': 4.913423631336e-06, 'epoch': 0.11} 11%|█ | 310/2774 [1:01:07<8:34:30, 12.53s/it] 11%|█ | 311/2774 [1:01:20<8:34:35, 12.54s/it] {'loss': 1.0054, 'learning_rate': 4.912660276111941e-06, 'epoch': 0.11} 11%|█ | 311/2774 [1:01:20<8:34:35, 12.54s/it] 11%|█ | 312/2774 [1:01:32<8:27:07, 12.36s/it] {'loss': 1.0439, 'learning_rate': 4.911893630165011e-06, 'epoch': 0.11} 11%|█ | 312/2774 [1:01:32<8:27:07, 12.36s/it] 11%|█▏ | 313/2774 [1:01:43<8:12:58, 12.02s/it] {'loss': 1.0205, 'learning_rate': 4.911123694540868e-06, 'epoch': 0.11} 11%|█▏ | 313/2774 [1:01:43<8:12:58, 12.02s/it] 11%|█▏ | 314/2774 [1:01:54<8:02:37, 11.77s/it] {'loss': 1.0732, 'learning_rate': 4.910350470289656e-06, 'epoch': 0.11} 11%|█▏ 
| 314/2774 [1:01:54<8:02:37, 11.77s/it] 11%|█▏ | 315/2774 [1:02:06<7:55:02, 11.59s/it] {'loss': 1.019, 'learning_rate': 4.90957395846601e-06, 'epoch': 0.11} 11%|█▏ | 315/2774 [1:02:06<7:55:02, 11.59s/it] 11%|█▏ | 316/2774 [1:02:17<7:49:51, 11.47s/it] {'loss': 1.0181, 'learning_rate': 4.9087941601290416e-06, 'epoch': 0.11} 11%|█▏ | 316/2774 [1:02:17<7:49:51, 11.47s/it] 11%|█▏ | 317/2774 [1:02:28<7:48:07, 11.43s/it] {'loss': 1.0522, 'learning_rate': 4.90801107634235e-06, 'epoch': 0.11} 11%|█▏ | 317/2774 [1:02:28<7:48:07, 11.43s/it] 11%|█▏ | 318/2774 [1:02:39<7:47:19, 11.42s/it] {'loss': 1.0625, 'learning_rate': 4.907224708174014e-06, 'epoch': 0.11} 11%|█▏ | 318/2774 [1:02:39<7:47:19, 11.42s/it] 11%|█▏ | 319/2774 [1:02:52<8:03:08, 11.81s/it] {'loss': 0.9829, 'learning_rate': 4.9064350566965925e-06, 'epoch': 0.11} 11%|█▏ | 319/2774 [1:02:52<8:03:08, 11.81s/it] 12%|█▏ | 320/2774 [1:03:04<8:05:34, 11.87s/it] {'loss': 1.042, 'learning_rate': 4.905642122987123e-06, 'epoch': 0.12} 12%|█▏ | 320/2774 [1:03:04<8:05:34, 11.87s/it] 12%|█▏ | 321/2774 [1:03:16<7:59:01, 11.72s/it] {'loss': 1.019, 'learning_rate': 4.904845908127119e-06, 'epoch': 0.12} 12%|█▏ | 321/2774 [1:03:16<7:59:01, 11.72s/it] 12%|█▏ | 322/2774 [1:03:27<7:57:52, 11.69s/it] {'loss': 1.0566, 'learning_rate': 4.904046413202568e-06, 'epoch': 0.12} 12%|█▏ | 322/2774 [1:03:27<7:57:52, 11.69s/it] 12%|█▏ | 323/2774 [1:03:38<7:52:06, 11.56s/it] {'loss': 1.0234, 'learning_rate': 4.903243639303934e-06, 'epoch': 0.12} 12%|█▏ | 323/2774 [1:03:38<7:52:06, 11.56s/it] 12%|█▏ | 324/2774 [1:03:50<7:51:53, 11.56s/it] {'loss': 1.0107, 'learning_rate': 4.902437587526152e-06, 'epoch': 0.12} 12%|█▏ | 324/2774 [1:03:50<7:51:53, 11.56s/it] 12%|█▏ | 325/2774 [1:04:01<7:47:29, 11.45s/it] {'loss': 1.0151, 'learning_rate': 4.901628258968628e-06, 'epoch': 0.12} 12%|█▏ | 325/2774 [1:04:01<7:47:29, 11.45s/it] 12%|█▏ | 326/2774 [1:04:13<7:48:54, 11.49s/it] {'loss': 1.0229, 'learning_rate': 4.900815654735237e-06, 'epoch': 0.12} 12%|█▏ | 326/2774 
[1:04:13<7:48:54, 11.49s/it] 12%|█▏ | 327/2774 [1:04:25<7:58:22, 11.73s/it] {'loss': 0.9951, 'learning_rate': 4.8999997759343225e-06, 'epoch': 0.12} 12%|█▏ | 327/2774 [1:04:25<7:58:22, 11.73s/it] 12%|█▏ | 328/2774 [1:04:38<8:09:53, 12.02s/it] {'loss': 1.0283, 'learning_rate': 4.899180623678693e-06, 'epoch': 0.12} 12%|█▏ | 328/2774 [1:04:38<8:09:53, 12.02s/it] 12%|█▏ | 329/2774 [1:04:49<8:00:31, 11.79s/it] {'loss': 1.0195, 'learning_rate': 4.898358199085624e-06, 'epoch': 0.12} 12%|█▏ | 329/2774 [1:04:49<8:00:31, 11.79s/it] 12%|█▏ | 330/2774 [1:05:00<7:54:48, 11.66s/it] {'loss': 1.0605, 'learning_rate': 4.897532503276852e-06, 'epoch': 0.12} 12%|█▏ | 330/2774 [1:05:00<7:54:48, 11.66s/it] 12%|█▏ | 331/2774 [1:05:12<7:53:12, 11.62s/it] {'loss': 1.0337, 'learning_rate': 4.896703537378577e-06, 'epoch': 0.12} 12%|█▏ | 331/2774 [1:05:12<7:53:12, 11.62s/it] 12%|█▏ | 332/2774 [1:05:23<7:48:48, 11.52s/it] {'loss': 1.0493, 'learning_rate': 4.895871302521457e-06, 'epoch': 0.12} 12%|█▏ | 332/2774 [1:05:23<7:48:48, 11.52s/it] 12%|█▏ | 333/2774 [1:05:34<7:46:01, 11.45s/it] {'loss': 1.0229, 'learning_rate': 4.89503579984061e-06, 'epoch': 0.12} 12%|█▏ | 333/2774 [1:05:34<7:46:01, 11.45s/it] 12%|█▏ | 334/2774 [1:05:46<7:46:49, 11.48s/it] {'loss': 0.9893, 'learning_rate': 4.894197030475614e-06, 'epoch': 0.12} 12%|█▏ | 334/2774 [1:05:46<7:46:49, 11.48s/it] 12%|█▏ | 335/2774 [1:05:57<7:44:38, 11.43s/it] {'loss': 1.0786, 'learning_rate': 4.893354995570497e-06, 'epoch': 0.12} 12%|█▏ | 335/2774 [1:05:57<7:44:38, 11.43s/it] 12%|█▏ | 336/2774 [1:06:09<7:46:43, 11.49s/it] {'loss': 1.0552, 'learning_rate': 4.892509696273745e-06, 'epoch': 0.12} 12%|█▏ | 336/2774 [1:06:09<7:46:43, 11.49s/it] 12%|█▏ | 337/2774 [1:06:20<7:42:45, 11.39s/it] {'loss': 1.0103, 'learning_rate': 4.891661133738295e-06, 'epoch': 0.12} 12%|█▏ | 337/2774 [1:06:20<7:42:45, 11.39s/it] 12%|█▏ | 338/2774 [1:06:31<7:39:49, 11.33s/it] {'loss': 1.0234, 'learning_rate': 4.8908093091215344e-06, 'epoch': 0.12} 12%|█▏ | 338/2774 
[1:06:31<7:39:49, 11.33s/it] 12%|█▏ | 339/2774 [1:06:45<8:06:33, 11.99s/it] {'loss': 1.021, 'learning_rate': 4.889954223585301e-06, 'epoch': 0.12} 12%|█▏ | 339/2774 [1:06:45<8:06:33, 11.99s/it] 12%|█▏ | 340/2774 [1:06:56<8:02:19, 11.89s/it] {'loss': 1.1162, 'learning_rate': 4.88909587829588e-06, 'epoch': 0.12} 12%|█▏ | 340/2774 [1:06:56<8:02:19, 11.89s/it] 12%|█▏ | 341/2774 [1:07:08<7:54:35, 11.70s/it] {'loss': 1.0635, 'learning_rate': 4.8882342744240015e-06, 'epoch': 0.12} 12%|█▏ | 341/2774 [1:07:08<7:54:35, 11.70s/it] 12%|█▏ | 342/2774 [1:07:19<7:49:17, 11.58s/it] {'loss': 0.978, 'learning_rate': 4.8873694131448425e-06, 'epoch': 0.12} 12%|█▏ | 342/2774 [1:07:19<7:49:17, 11.58s/it] 12%|█▏ | 343/2774 [1:07:30<7:46:14, 11.51s/it] {'loss': 1.0059, 'learning_rate': 4.886501295638021e-06, 'epoch': 0.12} 12%|█▏ | 343/2774 [1:07:30<7:46:14, 11.51s/it] 12%|█▏ | 344/2774 [1:07:42<7:46:09, 11.51s/it] {'loss': 1.0684, 'learning_rate': 4.885629923087597e-06, 'epoch': 0.12} 12%|█▏ | 344/2774 [1:07:42<7:46:09, 11.51s/it] 12%|█▏ | 345/2774 [1:07:54<7:46:59, 11.54s/it] {'loss': 0.9941, 'learning_rate': 4.88475529668207e-06, 'epoch': 0.12} 12%|█▏ | 345/2774 [1:07:54<7:46:59, 11.54s/it] 12%|█▏ | 346/2774 [1:08:05<7:45:31, 11.50s/it] {'loss': 1.0059, 'learning_rate': 4.883877417614376e-06, 'epoch': 0.12} 12%|█▏ | 346/2774 [1:08:05<7:45:31, 11.50s/it] 13%|█▎ | 347/2774 [1:08:16<7:42:02, 11.42s/it] {'loss': 1.0825, 'learning_rate': 4.882996287081892e-06, 'epoch': 0.13} 13%|█▎ | 347/2774 [1:08:16<7:42:02, 11.42s/it] 13%|█▎ | 348/2774 [1:08:28<7:42:25, 11.44s/it] {'loss': 1.0459, 'learning_rate': 4.882111906286425e-06, 'epoch': 0.13} 13%|█▎ | 348/2774 [1:08:28<7:42:25, 11.44s/it] 13%|█▎ | 349/2774 [1:08:39<7:39:40, 11.37s/it] {'loss': 1.002, 'learning_rate': 4.8812242764342165e-06, 'epoch': 0.13} 13%|█▎ | 349/2774 [1:08:39<7:39:40, 11.37s/it] 13%|█▎ | 350/2774 [1:08:50<7:41:25, 11.42s/it] {'loss': 0.998, 'learning_rate': 4.880333398735941e-06, 'epoch': 0.13} 13%|█▎ | 350/2774 
[1:08:50<7:41:25, 11.42s/it] 13%|█▎ | 351/2774 [1:09:02<7:38:04, 11.34s/it] {'loss': 1.062, 'learning_rate': 4.879439274406702e-06, 'epoch': 0.13} 13%|█▎ | 351/2774 [1:09:02<7:38:04, 11.34s/it] 13%|█▎ | 352/2774 [1:09:13<7:40:46, 11.41s/it] {'loss': 0.9668, 'learning_rate': 4.87854190466603e-06, 'epoch': 0.13} 13%|█▎ | 352/2774 [1:09:13<7:40:46, 11.41s/it] 13%|█▎ | 353/2774 [1:09:24<7:39:19, 11.38s/it] {'loss': 1.0552, 'learning_rate': 4.8776412907378845e-06, 'epoch': 0.13} 13%|█▎ | 353/2774 [1:09:24<7:39:19, 11.38s/it] 13%|█▎ | 354/2774 [1:09:36<7:37:47, 11.35s/it] {'loss': 1.0029, 'learning_rate': 4.876737433850647e-06, 'epoch': 0.13} 13%|█▎ | 354/2774 [1:09:36<7:37:47, 11.35s/it] 13%|█▎ | 355/2774 [1:09:47<7:38:12, 11.37s/it] {'loss': 1.0596, 'learning_rate': 4.875830335237125e-06, 'epoch': 0.13} 13%|█▎ | 355/2774 [1:09:47<7:38:12, 11.37s/it] 13%|█▎ | 356/2774 [1:09:59<7:40:32, 11.43s/it] {'loss': 1.001, 'learning_rate': 4.874919996134546e-06, 'epoch': 0.13} 13%|█▎ | 356/2774 [1:09:59<7:40:32, 11.43s/it] 13%|█▎ | 357/2774 [1:10:10<7:38:07, 11.37s/it] {'loss': 1.0957, 'learning_rate': 4.874006417784557e-06, 'epoch': 0.13} 13%|█▎ | 357/2774 [1:10:10<7:38:07, 11.37s/it] 13%|█▎ | 358/2774 [1:10:22<7:40:57, 11.45s/it] {'loss': 1.0806, 'learning_rate': 4.873089601433223e-06, 'epoch': 0.13} 13%|█▎ | 358/2774 [1:10:22<7:40:57, 11.45s/it] 13%|█▎ | 359/2774 [1:10:33<7:40:00, 11.43s/it] {'loss': 1.0415, 'learning_rate': 4.872169548331028e-06, 'epoch': 0.13} 13%|█▎ | 359/2774 [1:10:33<7:40:00, 11.43s/it] 13%|█▎ | 360/2774 [1:10:44<7:38:43, 11.40s/it] {'loss': 1.0732, 'learning_rate': 4.871246259732867e-06, 'epoch': 0.13} 13%|█▎ | 360/2774 [1:10:44<7:38:43, 11.40s/it] 13%|█▎ | 361/2774 [1:10:55<7:35:52, 11.34s/it] {'loss': 1.0498, 'learning_rate': 4.870319736898052e-06, 'epoch': 0.13} 13%|█▎ | 361/2774 [1:10:55<7:35:52, 11.34s/it] 13%|█▎ | 362/2774 [1:11:08<7:44:09, 11.55s/it] {'loss': 1.0068, 'learning_rate': 4.869389981090302e-06, 'epoch': 0.13} 13%|█▎ | 362/2774 
[1:11:08<7:44:09, 11.55s/it] 13%|█▎ | 363/2774 [1:11:19<7:45:41, 11.59s/it] {'loss': 1.0596, 'learning_rate': 4.868456993577749e-06, 'epoch': 0.13} 13%|█▎ | 363/2774 [1:11:19<7:45:41, 11.59s/it] 13%|█▎ | 364/2774 [1:11:30<7:41:57, 11.50s/it] {'loss': 1.0474, 'learning_rate': 4.867520775632931e-06, 'epoch': 0.13} 13%|█▎ | 364/2774 [1:11:30<7:41:57, 11.50s/it] 13%|█▎ | 365/2774 [1:11:42<7:38:31, 11.42s/it] {'loss': 1.0449, 'learning_rate': 4.866581328532793e-06, 'epoch': 0.13} 13%|█▎ | 365/2774 [1:11:42<7:38:31, 11.42s/it] 13%|█▎ | 366/2774 [1:11:55<8:04:17, 12.07s/it] {'loss': 1.0161, 'learning_rate': 4.865638653558684e-06, 'epoch': 0.13} 13%|█▎ | 366/2774 [1:11:55<8:04:17, 12.07s/it] 13%|█▎ | 367/2774 [1:12:07<7:56:18, 11.87s/it] {'loss': 1.0024, 'learning_rate': 4.864692751996356e-06, 'epoch': 0.13} 13%|█▎ | 367/2774 [1:12:07<7:56:18, 11.87s/it] 13%|█▎ | 368/2774 [1:12:19<7:55:38, 11.86s/it] {'loss': 1.0278, 'learning_rate': 4.863743625135962e-06, 'epoch': 0.13} 13%|█▎ | 368/2774 [1:12:19<7:55:38, 11.86s/it] 13%|█▎ | 369/2774 [1:12:30<7:46:08, 11.63s/it] {'loss': 1.0132, 'learning_rate': 4.862791274272053e-06, 'epoch': 0.13} 13%|█▎ | 369/2774 [1:12:30<7:46:08, 11.63s/it] 13%|█▎ | 370/2774 [1:12:43<8:10:15, 12.24s/it] {'loss': 1.0083, 'learning_rate': 4.861835700703578e-06, 'epoch': 0.13} 13%|█▎ | 370/2774 [1:12:43<8:10:15, 12.24s/it] 13%|█▎ | 371/2774 [1:12:55<8:02:39, 12.05s/it] {'loss': 1.02, 'learning_rate': 4.860876905733881e-06, 'epoch': 0.13} 13%|█▎ | 371/2774 [1:12:55<8:02:39, 12.05s/it] 13%|█▎ | 372/2774 [1:13:07<7:57:03, 11.92s/it] {'loss': 0.9839, 'learning_rate': 4.859914890670701e-06, 'epoch': 0.13} 13%|█▎ | 372/2774 [1:13:07<7:57:03, 11.92s/it] 13%|█▎ | 373/2774 [1:13:18<7:50:47, 11.76s/it] {'loss': 1.043, 'learning_rate': 4.85894965682617e-06, 'epoch': 0.13} 13%|█▎ | 373/2774 [1:13:18<7:50:47, 11.76s/it] 13%|█▎ | 374/2774 [1:13:29<7:45:01, 11.63s/it] {'loss': 1.0449, 'learning_rate': 4.857981205516807e-06, 'epoch': 0.13} 13%|█▎ | 374/2774 
[1:13:29<7:45:01, 11.63s/it] 14%|█▎ | 375/2774 [1:13:41<7:40:52, 11.53s/it] {'loss': 1.0703, 'learning_rate': 4.8570095380635215e-06, 'epoch': 0.14} 14%|█▎ | 375/2774 [1:13:41<7:40:52, 11.53s/it] 14%|█▎ | 376/2774 [1:13:52<7:40:15, 11.52s/it] {'loss': 0.9351, 'learning_rate': 4.856034655791608e-06, 'epoch': 0.14} 14%|█▎ | 376/2774 [1:13:52<7:40:15, 11.52s/it] 14%|█▎ | 377/2774 [1:14:04<7:41:19, 11.55s/it] {'loss': 1.0044, 'learning_rate': 4.85505656003075e-06, 'epoch': 0.14} 14%|█▎ | 377/2774 [1:14:04<7:41:19, 11.55s/it] 14%|█▎ | 378/2774 [1:14:15<7:41:13, 11.55s/it] {'loss': 1.0781, 'learning_rate': 4.854075252115007e-06, 'epoch': 0.14} 14%|█▎ | 378/2774 [1:14:15<7:41:13, 11.55s/it] 14%|█▎ | 379/2774 [1:14:27<7:39:40, 11.52s/it] {'loss': 1.0605, 'learning_rate': 4.853090733382827e-06, 'epoch': 0.14} 14%|█▎ | 379/2774 [1:14:27<7:39:40, 11.52s/it] 14%|█▎ | 380/2774 [1:14:38<7:36:36, 11.44s/it] {'loss': 1.0381, 'learning_rate': 4.852103005177033e-06, 'epoch': 0.14} 14%|█▎ | 380/2774 [1:14:38<7:36:36, 11.44s/it] 14%|█▎ | 381/2774 [1:14:49<7:35:26, 11.42s/it] {'loss': 1.0171, 'learning_rate': 4.851112068844827e-06, 'epoch': 0.14} 14%|█▎ | 381/2774 [1:14:49<7:35:26, 11.42s/it] 14%|█▍ | 382/2774 [1:15:01<7:39:02, 11.51s/it] {'loss': 1.0132, 'learning_rate': 4.850117925737784e-06, 'epoch': 0.14} 14%|█▍ | 382/2774 [1:15:01<7:39:02, 11.51s/it] 14%|█▍ | 383/2774 [1:15:14<7:55:38, 11.94s/it] {'loss': 1.0059, 'learning_rate': 4.8491205772118585e-06, 'epoch': 0.14} 14%|█▍ | 383/2774 [1:15:14<7:55:38, 11.94s/it] 14%|█▍ | 384/2774 [1:15:26<7:54:07, 11.90s/it] {'loss': 1.0347, 'learning_rate': 4.848120024627372e-06, 'epoch': 0.14} 14%|█▍ | 384/2774 [1:15:26<7:54:07, 11.90s/it] 14%|█▍ | 385/2774 [1:15:39<8:06:32, 12.22s/it] {'loss': 1.0317, 'learning_rate': 4.847116269349018e-06, 'epoch': 0.14} 14%|█▍ | 385/2774 [1:15:39<8:06:32, 12.22s/it] 14%|█▍ | 386/2774 [1:15:50<7:54:55, 11.93s/it] {'loss': 0.9907, 'learning_rate': 4.846109312745857e-06, 'epoch': 0.14} 14%|█▍ | 386/2774 
[1:15:50<7:54:55, 11.93s/it] 14%|█▍ | 387/2774 [1:16:03<8:05:19, 12.20s/it] {'loss': 1.0088, 'learning_rate': 4.845099156191319e-06, 'epoch': 0.14} 14%|█▍ | 387/2774 [1:16:03<8:05:19, 12.20s/it] 14%|█▍ | 388/2774 [1:16:14<7:58:02, 12.02s/it] {'loss': 1.0156, 'learning_rate': 4.844085801063195e-06, 'epoch': 0.14} 14%|█▍ | 388/2774 [1:16:14<7:58:02, 12.02s/it] 14%|█▍ | 389/2774 [1:16:26<7:50:20, 11.83s/it] {'loss': 1.0845, 'learning_rate': 4.843069248743641e-06, 'epoch': 0.14} 14%|█▍ | 389/2774 [1:16:26<7:50:20, 11.83s/it] 14%|█▍ | 390/2774 [1:16:37<7:48:06, 11.78s/it] {'loss': 1.0801, 'learning_rate': 4.842049500619173e-06, 'epoch': 0.14} 14%|█▍ | 390/2774 [1:16:37<7:48:06, 11.78s/it] 14%|█▍ | 391/2774 [1:16:49<7:41:48, 11.63s/it] {'loss': 1.0278, 'learning_rate': 4.8410265580806645e-06, 'epoch': 0.14} 14%|█▍ | 391/2774 [1:16:49<7:41:48, 11.63s/it] 14%|█▍ | 392/2774 [1:17:00<7:37:29, 11.52s/it] {'loss': 1.0737, 'learning_rate': 4.840000422523348e-06, 'epoch': 0.14} 14%|█▍ | 392/2774 [1:17:00<7:37:29, 11.52s/it] 14%|█▍ | 393/2774 [1:17:12<7:41:08, 11.62s/it] {'loss': 1.0405, 'learning_rate': 4.838971095346811e-06, 'epoch': 0.14} 14%|█▍ | 393/2774 [1:17:12<7:41:08, 11.62s/it] 14%|█▍ | 394/2774 [1:17:23<7:38:34, 11.56s/it] {'loss': 1.021, 'learning_rate': 4.8379385779549944e-06, 'epoch': 0.14} 14%|█▍ | 394/2774 [1:17:23<7:38:34, 11.56s/it] 14%|█▍ | 395/2774 [1:17:37<7:59:20, 12.09s/it] {'loss': 1.0415, 'learning_rate': 4.836902871756187e-06, 'epoch': 0.14} 14%|█▍ | 395/2774 [1:17:37<7:59:20, 12.09s/it] 14%|█▍ | 396/2774 [1:17:48<7:49:30, 11.85s/it] {'loss': 1.0488, 'learning_rate': 4.835863978163032e-06, 'epoch': 0.14} 14%|█▍ | 396/2774 [1:17:48<7:49:30, 11.85s/it] 14%|█▍ | 397/2774 [1:18:01<8:07:33, 12.31s/it] {'loss': 1.0195, 'learning_rate': 4.834821898592516e-06, 'epoch': 0.14} 14%|█▍ | 397/2774 [1:18:01<8:07:33, 12.31s/it] 14%|█▍ | 398/2774 [1:18:12<7:54:20, 11.98s/it] {'loss': 1.0327, 'learning_rate': 4.833776634465973e-06, 'epoch': 0.14} 14%|█▍ | 398/2774 
[1:18:12<7:54:20, 11.98s/it] 14%|█▍ | 399/2774 [1:18:24<7:45:57, 11.77s/it] {'loss': 1.0293, 'learning_rate': 4.83272818720908e-06, 'epoch': 0.14} 14%|█▍ | 399/2774 [1:18:24<7:45:57, 11.77s/it] 14%|█▍ | 400/2774 [1:18:38<8:10:45, 12.40s/it] {'loss': 1.0093, 'learning_rate': 4.8316765582518565e-06, 'epoch': 0.14} 14%|█▍ | 400/2774 [1:18:38<8:10:45, 12.40s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn( 14%|█▍ | 401/2774 [1:19:18<13:46:00, 20.89s/it] {'loss': 0.9956, 'learning_rate': 4.830621749028659e-06, 'epoch': 0.14} 14%|█▍ | 401/2774 [1:19:18<13:46:00, 20.89s/it] 14%|█▍ | 402/2774 [1:19:30<11:52:42, 18.03s/it] {'loss': 1.0186, 'learning_rate': 4.829563760978186e-06, 'epoch': 0.14} 14%|█▍ | 402/2774 [1:19:30<11:52:42, 18.03s/it] 15%|█▍ | 403/2774 [1:19:41<10:32:12, 16.00s/it] {'loss': 1.0161, 'learning_rate': 4.828502595543467e-06, 'epoch': 0.15} 15%|█▍ | 403/2774 [1:19:41<10:32:12, 16.00s/it] 15%|█▍ | 404/2774 [1:19:53<9:40:04, 14.69s/it] {'loss': 1.0298, 'learning_rate': 4.8274382541718695e-06, 'epoch': 0.15} 15%|█▍ | 404/2774 [1:19:53<9:40:04, 14.69s/it] 15%|█▍ | 405/2774 [1:20:06<9:22:00, 14.23s/it] {'loss': 0.9888, 'learning_rate': 4.82637073831509e-06, 'epoch': 0.15} 15%|█▍ | 405/2774 [1:20:06<9:22:00, 14.23s/it] 15%|█▍ | 406/2774 [1:20:17<8:49:03, 13.41s/it] {'loss': 1.0068, 'learning_rate': 4.825300049429155e-06, 'epoch': 0.15} 15%|█▍ | 406/2774 [1:20:17<8:49:03, 13.41s/it] 15%|█▍ | 407/2774 
[1:20:29<8:24:32, 12.79s/it] {'loss': 0.9551, 'learning_rate': 4.82422618897442e-06, 'epoch': 0.15} 15%|█▍ | 407/2774 [1:20:29<8:24:32, 12.79s/it] 15%|█▍ | 408/2774 [1:20:40<8:06:24, 12.33s/it] {'loss': 1.0566, 'learning_rate': 4.8231491584155665e-06, 'epoch': 0.15} 15%|█▍ | 408/2774 [1:20:40<8:06:24, 12.33s/it] 15%|█▍ | 409/2774 [1:20:51<7:53:57, 12.02s/it] {'loss': 1.0229, 'learning_rate': 4.822068959221599e-06, 'epoch': 0.15} 15%|█▍ | 409/2774 [1:20:51<7:53:57, 12.02s/it] 15%|█▍ | 410/2774 [1:21:03<7:46:58, 11.85s/it] {'loss': 1.0591, 'learning_rate': 4.8209855928658425e-06, 'epoch': 0.15} 15%|█▍ | 410/2774 [1:21:03<7:46:58, 11.85s/it] 15%|█▍ | 411/2774 [1:21:14<7:43:28, 11.77s/it] {'loss': 1.0337, 'learning_rate': 4.819899060825943e-06, 'epoch': 0.15} 15%|█▍ | 411/2774 [1:21:14<7:43:28, 11.77s/it] 15%|█▍ | 412/2774 [1:21:25<7:37:31, 11.62s/it] {'loss': 1.0273, 'learning_rate': 4.8188093645838674e-06, 'epoch': 0.15} 15%|█▍ | 412/2774 [1:21:25<7:37:31, 11.62s/it] 15%|█▍ | 413/2774 [1:21:38<7:42:42, 11.76s/it] {'loss': 1.0576, 'learning_rate': 4.817716505625894e-06, 'epoch': 0.15} 15%|█▍ | 413/2774 [1:21:38<7:42:42, 11.76s/it] 15%|█▍ | 414/2774 [1:21:50<7:48:54, 11.92s/it] {'loss': 0.9683, 'learning_rate': 4.816620485442616e-06, 'epoch': 0.15} 15%|█▍ | 414/2774 [1:21:50<7:48:54, 11.92s/it] 15%|█▍ | 415/2774 [1:22:01<7:45:20, 11.84s/it] {'loss': 1.041, 'learning_rate': 4.815521305528939e-06, 'epoch': 0.15} 15%|█▍ | 415/2774 [1:22:01<7:45:20, 11.84s/it] 15%|█▍ | 416/2774 [1:22:13<7:41:47, 11.75s/it] {'loss': 1.0493, 'learning_rate': 4.814418967384078e-06, 'epoch': 0.15} 15%|█▍ | 416/2774 [1:22:13<7:41:47, 11.75s/it] 15%|█▌ | 417/2774 [1:22:25<7:46:33, 11.88s/it] {'loss': 1.0293, 'learning_rate': 4.813313472511555e-06, 'epoch': 0.15} 15%|█▌ | 417/2774 [1:22:25<7:46:33, 11.88s/it] 15%|█▌ | 418/2774 [1:22:37<7:48:51, 11.94s/it] {'loss': 0.9956, 'learning_rate': 4.812204822419199e-06, 'epoch': 0.15} 15%|█▌ | 418/2774 [1:22:37<7:48:51, 11.94s/it] 15%|█▌ | 419/2774 
[1:22:49<7:44:27, 11.83s/it] {'loss': 1.0215, 'learning_rate': 4.811093018619143e-06, 'epoch': 0.15} 15%|█▌ | 419/2774 [1:22:49<7:44:27, 11.83s/it] 15%|█▌ | 420/2774 [1:23:00<7:38:28, 11.69s/it] {'loss': 1.0117, 'learning_rate': 4.809978062627818e-06, 'epoch': 0.15} 15%|█▌ | 420/2774 [1:23:00<7:38:28, 11.69s/it] 15%|█▌ | 421/2774 [1:23:12<7:34:22, 11.59s/it] {'loss': 1.0, 'learning_rate': 4.808859955965957e-06, 'epoch': 0.15} 15%|█▌ | 421/2774 [1:23:12<7:34:22, 11.59s/it] 15%|█▌ | 422/2774 [1:23:23<7:27:52, 11.43s/it] {'loss': 1.0322, 'learning_rate': 4.807738700158592e-06, 'epoch': 0.15} 15%|█▌ | 422/2774 [1:23:23<7:27:52, 11.43s/it] 15%|█▌ | 423/2774 [1:23:34<7:32:29, 11.55s/it] {'loss': 1.0566, 'learning_rate': 4.806614296735045e-06, 'epoch': 0.15} 15%|█▌ | 423/2774 [1:23:34<7:32:29, 11.55s/it] 15%|█▌ | 424/2774 [1:23:46<7:31:24, 11.53s/it] {'loss': 1.0752, 'learning_rate': 4.805486747228936e-06, 'epoch': 0.15} 15%|█▌ | 424/2774 [1:23:46<7:31:24, 11.53s/it] 15%|█▌ | 425/2774 [1:23:57<7:25:58, 11.39s/it] {'loss': 1.0317, 'learning_rate': 4.804356053178175e-06, 'epoch': 0.15} 15%|█▌ | 425/2774 [1:23:57<7:25:58, 11.39s/it] 15%|█▌ | 426/2774 [1:24:09<7:33:48, 11.60s/it] {'loss': 1.0098, 'learning_rate': 4.8032222161249595e-06, 'epoch': 0.15} 15%|█▌ | 426/2774 [1:24:09<7:33:48, 11.60s/it] 15%|█▌ | 427/2774 [1:24:21<7:32:04, 11.56s/it] {'loss': 0.9946, 'learning_rate': 4.802085237615776e-06, 'epoch': 0.15} 15%|█▌ | 427/2774 [1:24:21<7:32:04, 11.56s/it] 15%|█▌ | 428/2774 [1:24:32<7:33:16, 11.59s/it] {'loss': 1.001, 'learning_rate': 4.800945119201392e-06, 'epoch': 0.15} 15%|█▌ | 428/2774 [1:24:32<7:33:16, 11.59s/it] 15%|█▌ | 429/2774 [1:24:45<7:44:28, 11.88s/it] {'loss': 0.9922, 'learning_rate': 4.799801862436863e-06, 'epoch': 0.15} 15%|█▌ | 429/2774 [1:24:45<7:44:28, 11.88s/it] 16%|█▌ | 430/2774 [1:24:58<7:55:07, 12.16s/it] {'loss': 1.0137, 'learning_rate': 4.798655468881519e-06, 'epoch': 0.16} 16%|█▌ | 430/2774 [1:24:58<7:55:07, 12.16s/it] 16%|█▌ | 431/2774 
[1:25:09<7:50:09, 12.04s/it] {'loss': 1.0474, 'learning_rate': 4.797505940098975e-06, 'epoch': 0.16} 16%|█▌ | 431/2774 [1:25:09<7:50:09, 12.04s/it] 16%|█▌ | 432/2774 [1:25:21<7:43:07, 11.86s/it] {'loss': 1.1255, 'learning_rate': 4.796353277657117e-06, 'epoch': 0.16} 16%|█▌ | 432/2774 [1:25:21<7:43:07, 11.86s/it] 16%|█▌ | 433/2774 [1:25:33<7:49:16, 12.03s/it] {'loss': 1.0562, 'learning_rate': 4.795197483128107e-06, 'epoch': 0.16} 16%|█▌ | 433/2774 [1:25:33<7:49:16, 12.03s/it] 16%|█▌ | 434/2774 [1:25:44<7:38:33, 11.76s/it] {'loss': 1.0557, 'learning_rate': 4.794038558088378e-06, 'epoch': 0.16} 16%|█▌ | 434/2774 [1:25:44<7:38:33, 11.76s/it] 16%|█▌ | 435/2774 [1:25:56<7:33:08, 11.62s/it] {'loss': 1.0801, 'learning_rate': 4.792876504118636e-06, 'epoch': 0.16} 16%|█▌ | 435/2774 [1:25:56<7:33:08, 11.62s/it] 16%|█▌ | 436/2774 [1:26:07<7:27:24, 11.48s/it] {'loss': 1.041, 'learning_rate': 4.791711322803852e-06, 'epoch': 0.16} 16%|█▌ | 436/2774 [1:26:07<7:27:24, 11.48s/it] 16%|█▌ | 437/2774 [1:26:18<7:24:36, 11.41s/it] {'loss': 1.0317, 'learning_rate': 4.79054301573326e-06, 'epoch': 0.16} 16%|█▌ | 437/2774 [1:26:18<7:24:36, 11.41s/it] 16%|█▌ | 438/2774 [1:26:30<7:25:43, 11.45s/it] {'loss': 1.062, 'learning_rate': 4.789371584500364e-06, 'epoch': 0.16} 16%|█▌ | 438/2774 [1:26:30<7:25:43, 11.45s/it] 16%|█▌ | 439/2774 [1:26:41<7:24:47, 11.43s/it] {'loss': 1.0542, 'learning_rate': 4.788197030702924e-06, 'epoch': 0.16} 16%|█▌ | 439/2774 [1:26:41<7:24:47, 11.43s/it] 16%|█▌ | 440/2774 [1:26:52<7:24:26, 11.43s/it] {'loss': 1.0288, 'learning_rate': 4.787019355942959e-06, 'epoch': 0.16} 16%|█▌ | 440/2774 [1:26:52<7:24:26, 11.43s/it] 16%|█▌ | 441/2774 [1:27:04<7:21:26, 11.35s/it] {'loss': 1.0215, 'learning_rate': 4.785838561826749e-06, 'epoch': 0.16} 16%|█▌ | 441/2774 [1:27:04<7:21:26, 11.35s/it] 16%|█▌ | 442/2774 [1:27:18<7:57:48, 12.29s/it] {'loss': 1.0317, 'learning_rate': 4.7846546499648224e-06, 'epoch': 0.16} 16%|█▌ | 442/2774 [1:27:18<7:57:48, 12.29s/it] 16%|█▌ | 443/2774 
[1:27:31<8:08:19, 12.57s/it] {'loss': 0.9839, 'learning_rate': 4.783467621971966e-06, 'epoch': 0.16} 16%|█▌ | 443/2774 [1:27:31<8:08:19, 12.57s/it] 16%|█▌ | 444/2774 [1:27:43<8:04:23, 12.47s/it] {'loss': 1.0156, 'learning_rate': 4.782277479467216e-06, 'epoch': 0.16} 16%|█▌ | 444/2774 [1:27:43<8:04:23, 12.47s/it] 16%|█▌ | 445/2774 [1:27:55<7:53:51, 12.21s/it] {'loss': 1.0405, 'learning_rate': 4.78108422407385e-06, 'epoch': 0.16} 16%|█▌ | 445/2774 [1:27:55<7:53:51, 12.21s/it] 16%|█▌ | 446/2774 [1:28:07<7:46:16, 12.02s/it] {'loss': 1.0566, 'learning_rate': 4.7798878574194e-06, 'epoch': 0.16} 16%|█▌ | 446/2774 [1:28:07<7:46:16, 12.02s/it] 16%|█▌ | 447/2774 [1:28:18<7:37:58, 11.81s/it] {'loss': 1.0425, 'learning_rate': 4.778688381135636e-06, 'epoch': 0.16} 16%|█▌ | 447/2774 [1:28:18<7:37:58, 11.81s/it] 16%|█▌ | 448/2774 [1:28:31<7:46:25, 12.03s/it] {'loss': 1.0542, 'learning_rate': 4.777485796858572e-06, 'epoch': 0.16} 16%|█▌ | 448/2774 [1:28:31<7:46:25, 12.03s/it] 16%|█▌ | 449/2774 [1:28:44<7:57:26, 12.32s/it] {'loss': 1.0361, 'learning_rate': 4.77628010622846e-06, 'epoch': 0.16} 16%|█▌ | 449/2774 [1:28:44<7:57:26, 12.32s/it] 16%|█▌ | 450/2774 [1:28:55<7:45:27, 12.02s/it] {'loss': 0.9922, 'learning_rate': 4.775071310889791e-06, 'epoch': 0.16} 16%|█▌ | 450/2774 [1:28:55<7:45:27, 12.02s/it] 16%|█▋ | 451/2774 [1:29:07<7:43:17, 11.97s/it] {'loss': 1.0073, 'learning_rate': 4.773859412491285e-06, 'epoch': 0.16} 16%|█▋ | 451/2774 [1:29:07<7:43:17, 11.97s/it] 16%|█▋ | 452/2774 [1:29:18<7:34:40, 11.75s/it] {'loss': 1.0298, 'learning_rate': 4.7726444126859015e-06, 'epoch': 0.16} 16%|█▋ | 452/2774 [1:29:18<7:34:40, 11.75s/it] 16%|█▋ | 453/2774 [1:29:30<7:37:06, 11.82s/it] {'loss': 1.0654, 'learning_rate': 4.771426313130826e-06, 'epoch': 0.16} 16%|█▋ | 453/2774 [1:29:30<7:37:06, 11.82s/it] 16%|█▋ | 454/2774 [1:29:41<7:33:26, 11.73s/it] {'loss': 1.0249, 'learning_rate': 4.770205115487471e-06, 'epoch': 0.16} 16%|█▋ | 454/2774 [1:29:41<7:33:26, 11.73s/it] 16%|█▋ | 455/2774 
[1:29:53<7:29:22, 11.63s/it] {'loss': 0.9624, 'learning_rate': 4.76898082142148e-06, 'epoch': 0.16} 16%|█▋ | 455/2774 [1:29:53<7:29:22, 11.63s/it] 16%|█▋ | 456/2774 [1:30:04<7:26:27, 11.56s/it] {'loss': 1.001, 'learning_rate': 4.767753432602713e-06, 'epoch': 0.16} 16%|█▋ | 456/2774 [1:30:04<7:26:27, 11.56s/it] 16%|█▋ | 457/2774 [1:30:15<7:22:13, 11.45s/it] {'loss': 1.0176, 'learning_rate': 4.7665229507052545e-06, 'epoch': 0.16} 16%|█▋ | 457/2774 [1:30:15<7:22:13, 11.45s/it] 17%|█▋ | 458/2774 [1:30:27<7:21:47, 11.45s/it] {'loss': 1.0342, 'learning_rate': 4.765289377407409e-06, 'epoch': 0.17} 17%|█▋ | 458/2774 [1:30:27<7:21:47, 11.45s/it] 17%|█▋ | 459/2774 [1:30:39<7:27:38, 11.60s/it] {'loss': 1.0298, 'learning_rate': 4.764052714391695e-06, 'epoch': 0.17} 17%|█▋ | 459/2774 [1:30:39<7:27:38, 11.60s/it] 17%|█▋ | 460/2774 [1:30:50<7:28:27, 11.63s/it] {'loss': 1.0415, 'learning_rate': 4.762812963344845e-06, 'epoch': 0.17} 17%|█▋ | 460/2774 [1:30:50<7:28:27, 11.63s/it] 17%|█▋ | 461/2774 [1:31:02<7:26:51, 11.59s/it] {'loss': 1.0098, 'learning_rate': 4.7615701259578065e-06, 'epoch': 0.17} 17%|█▋ | 461/2774 [1:31:02<7:26:51, 11.59s/it] 17%|█▋ | 462/2774 [1:31:13<7:21:33, 11.46s/it] {'loss': 1.0547, 'learning_rate': 4.760324203925735e-06, 'epoch': 0.17} 17%|█▋ | 462/2774 [1:31:13<7:21:33, 11.46s/it] 17%|█▋ | 463/2774 [1:31:25<7:29:36, 11.67s/it] {'loss': 1.0513, 'learning_rate': 4.75907519894799e-06, 'epoch': 0.17} 17%|█▋ | 463/2774 [1:31:25<7:29:36, 11.67s/it] 17%|█▋ | 464/2774 [1:31:37<7:25:42, 11.58s/it] {'loss': 1.0586, 'learning_rate': 4.757823112728141e-06, 'epoch': 0.17} 17%|█▋ | 464/2774 [1:31:37<7:25:42, 11.58s/it] 17%|█▋ | 465/2774 [1:31:48<7:25:50, 11.59s/it] {'loss': 1.0977, 'learning_rate': 4.756567946973958e-06, 'epoch': 0.17} 17%|█▋ | 465/2774 [1:31:48<7:25:50, 11.59s/it] 17%|█▋ | 466/2774 [1:32:00<7:21:52, 11.49s/it] {'loss': 0.9995, 'learning_rate': 4.75530970339741e-06, 'epoch': 0.17} 17%|█▋ | 466/2774 [1:32:00<7:21:52, 11.49s/it] 17%|█▋ | 467/2774 
[1:32:12<7:38:43, 11.93s/it] {'loss': 1.0361, 'learning_rate': 4.7540483837146675e-06, 'epoch': 0.17} 17%|█▋ | 467/2774 [1:32:13<7:38:43, 11.93s/it] 17%|█▋ | 468/2774 [1:32:24<7:33:16, 11.79s/it] {'loss': 1.0293, 'learning_rate': 4.752783989646092e-06, 'epoch': 0.17} 17%|█▋ | 468/2774 [1:32:24<7:33:16, 11.79s/it] 17%|█▋ | 469/2774 [1:32:35<7:29:47, 11.71s/it] {'loss': 1.0107, 'learning_rate': 4.751516522916242e-06, 'epoch': 0.17} 17%|█▋ | 469/2774 [1:32:35<7:29:47, 11.71s/it] 17%|█▋ | 470/2774 [1:32:47<7:27:41, 11.66s/it] {'loss': 1.0586, 'learning_rate': 4.750245985253864e-06, 'epoch': 0.17} 17%|█▋ | 470/2774 [1:32:47<7:27:41, 11.66s/it] 17%|█▋ | 471/2774 [1:33:00<7:37:19, 11.91s/it] {'loss': 1.0527, 'learning_rate': 4.748972378391897e-06, 'epoch': 0.17} 17%|█▋ | 471/2774 [1:33:00<7:37:19, 11.91s/it] 17%|█▋ | 472/2774 [1:33:11<7:33:15, 11.81s/it] {'loss': 1.0381, 'learning_rate': 4.747695704067462e-06, 'epoch': 0.17} 17%|█▋ | 472/2774 [1:33:11<7:33:15, 11.81s/it] 17%|█▋ | 473/2774 [1:33:22<7:26:11, 11.63s/it] {'loss': 1.0132, 'learning_rate': 4.746415964021866e-06, 'epoch': 0.17} 17%|█▋ | 473/2774 [1:33:22<7:26:11, 11.63s/it] 17%|█▋ | 474/2774 [1:33:36<7:49:14, 12.24s/it] {'loss': 1.0107, 'learning_rate': 4.745133160000598e-06, 'epoch': 0.17} 17%|█▋ | 474/2774 [1:33:36<7:49:14, 12.24s/it] 17%|█▋ | 475/2774 [1:33:47<7:39:39, 12.00s/it] {'loss': 1.0391, 'learning_rate': 4.743847293753323e-06, 'epoch': 0.17} 17%|█▋ | 475/2774 [1:33:47<7:39:39, 12.00s/it] 17%|█▋ | 476/2774 [1:33:59<7:36:00, 11.91s/it] {'loss': 0.9688, 'learning_rate': 4.7425583670338885e-06, 'epoch': 0.17} 17%|█▋ | 476/2774 [1:33:59<7:36:00, 11.91s/it] 17%|█▋ | 477/2774 [1:34:10<7:29:24, 11.74s/it] {'loss': 1.0645, 'learning_rate': 4.741266381600309e-06, 'epoch': 0.17} 17%|█▋ | 477/2774 [1:34:10<7:29:24, 11.74s/it] 17%|█▋ | 478/2774 [1:34:22<7:25:28, 11.64s/it] {'loss': 1.0747, 'learning_rate': 4.739971339214776e-06, 'epoch': 0.17} 17%|█▋ | 478/2774 [1:34:22<7:25:28, 11.64s/it] 17%|█▋ | 479/2774 
[1:34:34<7:31:42, 11.81s/it] {'loss': 1.0225, 'learning_rate': 4.73867324164365e-06, 'epoch': 0.17} 17%|█▋ | 479/2774 [1:34:34<7:31:42, 11.81s/it] 17%|█▋ | 480/2774 [1:34:46<7:30:27, 11.78s/it] {'loss': 1.0137, 'learning_rate': 4.737372090657458e-06, 'epoch': 0.17} 17%|█▋ | 480/2774 [1:34:46<7:30:27, 11.78s/it] 17%|█▋ | 481/2774 [1:34:59<7:47:09, 12.22s/it] {'loss': 1.0283, 'learning_rate': 4.736067888030888e-06, 'epoch': 0.17} 17%|█▋ | 481/2774 [1:34:59<7:47:09, 12.22s/it] 17%|█▋ | 482/2774 [1:35:10<7:34:52, 11.91s/it] {'loss': 1.0547, 'learning_rate': 4.734760635542797e-06, 'epoch': 0.17} 17%|█▋ | 482/2774 [1:35:10<7:34:52, 11.91s/it] 17%|█▋ | 483/2774 [1:35:26<8:20:39, 13.11s/it] {'loss': 1.0171, 'learning_rate': 4.733450334976197e-06, 'epoch': 0.17} 17%|█▋ | 483/2774 [1:35:26<8:20:39, 13.11s/it] 17%|█▋ | 484/2774 [1:35:38<8:00:50, 12.60s/it] {'loss': 1.0459, 'learning_rate': 4.732136988118259e-06, 'epoch': 0.17} 17%|█▋ | 484/2774 [1:35:38<8:00:50, 12.60s/it] 17%|█▋ | 485/2774 [1:35:50<7:59:32, 12.57s/it] {'loss': 0.9922, 'learning_rate': 4.73082059676031e-06, 'epoch': 0.17} 17%|█▋ | 485/2774 [1:35:50<7:59:32, 12.57s/it] 18%|█▊ | 486/2774 [1:36:01<7:46:02, 12.22s/it] {'loss': 1.021, 'learning_rate': 4.7295011626978255e-06, 'epoch': 0.18} 18%|█▊ | 486/2774 [1:36:01<7:46:02, 12.22s/it] 18%|█▊ | 487/2774 [1:36:13<7:35:10, 11.94s/it] {'loss': 1.02, 'learning_rate': 4.728178687730436e-06, 'epoch': 0.18} 18%|█▊ | 487/2774 [1:36:13<7:35:10, 11.94s/it] 18%|█▊ | 488/2774 [1:36:24<7:30:07, 11.81s/it] {'loss': 1.0688, 'learning_rate': 4.726853173661917e-06, 'epoch': 0.18} 18%|█▊ | 488/2774 [1:36:24<7:30:07, 11.81s/it] 18%|█▊ | 489/2774 [1:36:36<7:26:51, 11.73s/it] {'loss': 1.0483, 'learning_rate': 4.725524622300191e-06, 'epoch': 0.18} 18%|█▊ | 489/2774 [1:36:36<7:26:51, 11.73s/it] 18%|█▊ | 490/2774 [1:36:47<7:23:31, 11.65s/it] {'loss': 1.063, 'learning_rate': 4.724193035457319e-06, 'epoch': 0.18} 18%|█▊ | 490/2774 [1:36:47<7:23:31, 11.65s/it] 18%|█▊ | 491/2774 
[1:36:59<7:20:11, 11.57s/it] {'loss': 1.0371, 'learning_rate': 4.722858414949506e-06, 'epoch': 0.18} 18%|█▊ | 491/2774 [1:36:59<7:20:11, 11.57s/it] 18%|█▊ | 492/2774 [1:37:12<7:43:03, 12.18s/it] {'loss': 1.0298, 'learning_rate': 4.721520762597095e-06, 'epoch': 0.18} 18%|█▊ | 492/2774 [1:37:12<7:43:03, 12.18s/it] 18%|█▊ | 493/2774 [1:37:24<7:40:53, 12.12s/it] {'loss': 1.0337, 'learning_rate': 4.720180080224563e-06, 'epoch': 0.18} 18%|█▊ | 493/2774 [1:37:24<7:40:53, 12.12s/it] 18%|█▊ | 494/2774 [1:37:36<7:34:51, 11.97s/it] {'loss': 1.0405, 'learning_rate': 4.718836369660517e-06, 'epoch': 0.18} 18%|█▊ | 494/2774 [1:37:36<7:34:51, 11.97s/it] 18%|█▊ | 495/2774 [1:37:48<7:35:49, 12.00s/it] {'loss': 1.0986, 'learning_rate': 4.7174896327377e-06, 'epoch': 0.18} 18%|█▊ | 495/2774 [1:37:48<7:35:49, 12.00s/it] 18%|█▊ | 496/2774 [1:37:59<7:27:17, 11.78s/it] {'loss': 1.0879, 'learning_rate': 4.7161398712929785e-06, 'epoch': 0.18} 18%|█▊ | 496/2774 [1:37:59<7:27:17, 11.78s/it] 18%|█▊ | 497/2774 [1:38:11<7:27:23, 11.79s/it] {'loss': 1.0732, 'learning_rate': 4.714787087167346e-06, 'epoch': 0.18} 18%|█▊ | 497/2774 [1:38:11<7:27:23, 11.79s/it] 18%|█▊ | 498/2774 [1:38:22<7:21:05, 11.63s/it] {'loss': 1.0303, 'learning_rate': 4.713431282205919e-06, 'epoch': 0.18} 18%|█▊ | 498/2774 [1:38:22<7:21:05, 11.63s/it] 18%|█▊ | 499/2774 [1:38:34<7:17:36, 11.54s/it] {'loss': 1.0791, 'learning_rate': 4.712072458257932e-06, 'epoch': 0.18} 18%|█▊ | 499/2774 [1:38:34<7:17:36, 11.54s/it] 18%|█▊ | 500/2774 [1:38:47<7:38:37, 12.10s/it] {'loss': 1.0293, 'learning_rate': 4.710710617176739e-06, 'epoch': 0.18} 18%|█▊ | 500/2774 [1:38:47<7:38:37, 12.10s/it] 18%|█▊ | 501/2774 [1:38:58<7:29:25, 11.86s/it] {'loss': 1.0601, 'learning_rate': 4.70934576081981e-06, 'epoch': 0.18} 18%|█▊ | 501/2774 [1:38:58<7:29:25, 11.86s/it] 18%|█▊ | 502/2774 [1:39:12<7:44:27, 12.27s/it] {'loss': 0.9668, 'learning_rate': 4.7079778910487264e-06, 'epoch': 0.18} 18%|█▊ | 502/2774 [1:39:12<7:44:27, 12.27s/it] 18%|█▊ | 503/2774 
[1:39:23<7:34:47, 12.02s/it] {'loss': 1.0669, 'learning_rate': 4.70660700972918e-06, 'epoch': 0.18} 18%|█▊ | 503/2774 [1:39:23<7:34:47, 12.02s/it] 18%|█▊ | 504/2774 [1:39:34<7:28:42, 11.86s/it] {'loss': 1.1055, 'learning_rate': 4.705233118730969e-06, 'epoch': 0.18} 18%|█▊ | 504/2774 [1:39:34<7:28:42, 11.86s/it] 18%|█▊ | 505/2774 [1:39:47<7:33:08, 11.98s/it] {'loss': 1.0156, 'learning_rate': 4.703856219927999e-06, 'epoch': 0.18} 18%|█▊ | 505/2774 [1:39:47<7:33:08, 11.98s/it] 18%|█▊ | 506/2774 [1:39:59<7:40:03, 12.17s/it] {'loss': 1.0615, 'learning_rate': 4.702476315198275e-06, 'epoch': 0.18} 18%|█▊ | 506/2774 [1:39:59<7:40:03, 12.17s/it] 18%|█▊ | 507/2774 [1:40:11<7:29:04, 11.89s/it] {'loss': 1.0332, 'learning_rate': 4.701093406423907e-06, 'epoch': 0.18} 18%|█▊ | 507/2774 [1:40:11<7:29:04, 11.89s/it] 18%|█▊ | 508/2774 [1:40:22<7:20:35, 11.67s/it] {'loss': 1.1064, 'learning_rate': 4.699707495491096e-06, 'epoch': 0.18} 18%|█▊ | 508/2774 [1:40:22<7:20:35, 11.67s/it] 18%|█▊ | 509/2774 [1:40:33<7:17:56, 11.60s/it] {'loss': 1.0078, 'learning_rate': 4.698318584290141e-06, 'epoch': 0.18} 18%|█▊ | 509/2774 [1:40:33<7:17:56, 11.60s/it] 18%|█▊ | 510/2774 [1:40:45<7:15:52, 11.55s/it] {'loss': 1.0317, 'learning_rate': 4.696926674715435e-06, 'epoch': 0.18} 18%|█▊ | 510/2774 [1:40:45<7:15:52, 11.55s/it] 18%|█▊ | 511/2774 [1:40:56<7:13:09, 11.48s/it] {'loss': 1.0947, 'learning_rate': 4.695531768665456e-06, 'epoch': 0.18} 18%|█▊ | 511/2774 [1:40:56<7:13:09, 11.48s/it] 18%|█▊ | 512/2774 [1:41:07<7:09:14, 11.39s/it] {'loss': 1.0269, 'learning_rate': 4.694133868042775e-06, 'epoch': 0.18} 18%|█▊ | 512/2774 [1:41:07<7:09:14, 11.39s/it] 18%|█▊ | 513/2774 [1:41:19<7:11:19, 11.45s/it] {'loss': 0.9688, 'learning_rate': 4.692732974754041e-06, 'epoch': 0.18} 18%|█▊ | 513/2774 [1:41:19<7:11:19, 11.45s/it] 19%|█▊ | 514/2774 [1:41:30<7:11:38, 11.46s/it] {'loss': 1.0244, 'learning_rate': 4.691329090709989e-06, 'epoch': 0.19} 19%|█▊ | 514/2774 [1:41:30<7:11:38, 11.46s/it] 19%|█▊ | 515/2774 
[1:41:41<7:08:56, 11.39s/it] {'loss': 1.0576, 'learning_rate': 4.689922217825431e-06, 'epoch': 0.19} 19%|█▊ | 515/2774 [1:41:41<7:08:56, 11.39s/it] 19%|█▊ | 516/2774 [1:41:53<7:10:09, 11.43s/it] {'loss': 1.0073, 'learning_rate': 4.6885123580192575e-06, 'epoch': 0.19} 19%|█▊ | 516/2774 [1:41:53<7:10:09, 11.43s/it] 19%|█▊ | 517/2774 [1:42:04<7:10:47, 11.45s/it] {'loss': 1.1006, 'learning_rate': 4.687099513214433e-06, 'epoch': 0.19} 19%|█▊ | 517/2774 [1:42:04<7:10:47, 11.45s/it] 19%|█▊ | 518/2774 [1:42:16<7:14:30, 11.56s/it] {'loss': 1.0708, 'learning_rate': 4.685683685337991e-06, 'epoch': 0.19} 19%|█▊ | 518/2774 [1:42:16<7:14:30, 11.56s/it] 19%|█▊ | 519/2774 [1:42:28<7:14:18, 11.56s/it] {'loss': 1.061, 'learning_rate': 4.684264876321035e-06, 'epoch': 0.19} 19%|█▊ | 519/2774 [1:42:28<7:14:18, 11.56s/it] 19%|█▊ | 520/2774 [1:42:41<7:36:35, 12.15s/it] {'loss': 0.9976, 'learning_rate': 4.682843088098736e-06, 'epoch': 0.19} 19%|█▊ | 520/2774 [1:42:41<7:36:35, 12.15s/it] 19%|█▉ | 521/2774 [1:42:53<7:34:52, 12.11s/it] {'loss': 1.0098, 'learning_rate': 4.681418322610327e-06, 'epoch': 0.19} 19%|█▉ | 521/2774 [1:42:53<7:34:52, 12.11s/it] 19%|█▉ | 522/2774 [1:43:04<7:22:43, 11.80s/it] {'loss': 1.0615, 'learning_rate': 4.679990581799102e-06, 'epoch': 0.19} 19%|█▉ | 522/2774 [1:43:04<7:22:43, 11.80s/it] 19%|█▉ | 523/2774 [1:43:16<7:23:23, 11.82s/it] {'loss': 1.0679, 'learning_rate': 4.678559867612412e-06, 'epoch': 0.19} 19%|█▉ | 523/2774 [1:43:16<7:23:23, 11.82s/it] 19%|█▉ | 524/2774 [1:43:28<7:22:58, 11.81s/it] {'loss': 1.0898, 'learning_rate': 4.677126182001667e-06, 'epoch': 0.19} 19%|█▉ | 524/2774 [1:43:28<7:22:58, 11.81s/it] 19%|█▉ | 525/2774 [1:43:40<7:20:57, 11.76s/it] {'loss': 1.0562, 'learning_rate': 4.675689526922324e-06, 'epoch': 0.19} 19%|█▉ | 525/2774 [1:43:40<7:20:57, 11.76s/it] 19%|█▉ | 526/2774 [1:43:51<7:14:51, 11.61s/it] {'loss': 1.0024, 'learning_rate': 4.6742499043338985e-06, 'epoch': 0.19} 19%|█▉ | 526/2774 [1:43:51<7:14:51, 11.61s/it] 19%|█▉ | 527/2774 
[1:44:02<7:09:30, 11.47s/it] {'loss': 0.9907, 'learning_rate': 4.672807316199946e-06, 'epoch': 0.19} 19%|█▉ | 527/2774 [1:44:02<7:09:30, 11.47s/it] 19%|█▉ | 528/2774 [1:44:16<7:32:46, 12.10s/it] {'loss': 1.0269, 'learning_rate': 4.671361764488069e-06, 'epoch': 0.19} 19%|█▉ | 528/2774 [1:44:16<7:32:46, 12.10s/it] 19%|█▉ | 529/2774 [1:44:27<7:23:31, 11.85s/it] {'loss': 1.085, 'learning_rate': 4.669913251169914e-06, 'epoch': 0.19} 19%|█▉ | 529/2774 [1:44:27<7:23:31, 11.85s/it] 19%|█▉ | 530/2774 [1:44:40<7:36:23, 12.20s/it] {'loss': 1.0078, 'learning_rate': 4.668461778221165e-06, 'epoch': 0.19} 19%|█▉ | 530/2774 [1:44:40<7:36:23, 12.20s/it] 19%|█▉ | 531/2774 [1:44:51<7:24:15, 11.88s/it] {'loss': 0.9917, 'learning_rate': 4.6670073476215435e-06, 'epoch': 0.19} 19%|█▉ | 531/2774 [1:44:51<7:24:15, 11.88s/it] 19%|█▉ | 532/2774 [1:45:04<7:32:46, 12.12s/it] {'loss': 1.0391, 'learning_rate': 4.665549961354806e-06, 'epoch': 0.19} 19%|█▉ | 532/2774 [1:45:04<7:32:46, 12.12s/it] 19%|█▉ | 533/2774 [1:45:15<7:25:52, 11.94s/it] {'loss': 1.0791, 'learning_rate': 4.664089621408738e-06, 'epoch': 0.19} 19%|█▉ | 533/2774 [1:45:15<7:25:52, 11.94s/it] 19%|█▉ | 534/2774 [1:45:28<7:33:37, 12.15s/it] {'loss': 1.0063, 'learning_rate': 4.662626329775155e-06, 'epoch': 0.19} 19%|█▉ | 534/2774 [1:45:28<7:33:37, 12.15s/it] 19%|█▉ | 535/2774 [1:45:40<7:27:17, 11.99s/it] {'loss': 1.0649, 'learning_rate': 4.6611600884498994e-06, 'epoch': 0.19} 19%|█▉ | 535/2774 [1:45:40<7:27:17, 11.99s/it] 19%|█▉ | 536/2774 [1:45:51<7:22:50, 11.87s/it] {'loss': 1.0093, 'learning_rate': 4.659690899432835e-06, 'epoch': 0.19} 19%|█▉ | 536/2774 [1:45:51<7:22:50, 11.87s/it] 19%|█▉ | 537/2774 [1:46:03<7:17:33, 11.74s/it] {'loss': 1.0449, 'learning_rate': 4.658218764727847e-06, 'epoch': 0.19} 19%|█▉ | 537/2774 [1:46:03<7:17:33, 11.74s/it] 19%|█▉ | 538/2774 [1:46:14<7:11:48, 11.59s/it] {'loss': 1.0474, 'learning_rate': 4.656743686342838e-06, 'epoch': 0.19} 19%|█▉ | 538/2774 [1:46:14<7:11:48, 11.59s/it] 19%|█▉ | 539/2774 
[1:46:25<7:10:31, 11.56s/it] {'loss': 1.061, 'learning_rate': 4.655265666289727e-06, 'epoch': 0.19} 19%|█▉ | 539/2774 [1:46:25<7:10:31, 11.56s/it] 19%|█▉ | 540/2774 [1:46:37<7:11:38, 11.59s/it] {'loss': 1.0562, 'learning_rate': 4.653784706584443e-06, 'epoch': 0.19} 19%|█▉ | 540/2774 [1:46:37<7:11:38, 11.59s/it] 20%|█▉ | 541/2774 [1:46:50<7:26:44, 12.00s/it] {'loss': 1.0645, 'learning_rate': 4.6523008092469255e-06, 'epoch': 0.2} 20%|█▉ | 541/2774 [1:46:50<7:26:44, 12.00s/it] 20%|█▉ | 542/2774 [1:47:01<7:20:49, 11.85s/it] {'loss': 1.0762, 'learning_rate': 4.6508139763011205e-06, 'epoch': 0.2} 20%|█▉ | 542/2774 [1:47:01<7:20:49, 11.85s/it] 20%|█▉ | 543/2774 [1:47:13<7:17:18, 11.76s/it] {'loss': 1.0303, 'learning_rate': 4.649324209774979e-06, 'epoch': 0.2} 20%|█▉ | 543/2774 [1:47:13<7:17:18, 11.76s/it] 20%|█▉ | 544/2774 [1:47:25<7:19:13, 11.82s/it] {'loss': 1.0508, 'learning_rate': 4.647831511700453e-06, 'epoch': 0.2} 20%|█▉ | 544/2774 [1:47:25<7:19:13, 11.82s/it] 20%|█▉ | 545/2774 [1:47:36<7:13:47, 11.68s/it] {'loss': 1.0303, 'learning_rate': 4.646335884113492e-06, 'epoch': 0.2} 20%|█▉ | 545/2774 [1:47:36<7:13:47, 11.68s/it] 20%|█▉ | 546/2774 [1:47:48<7:14:11, 11.69s/it] {'loss': 1.0537, 'learning_rate': 4.644837329054042e-06, 'epoch': 0.2} 20%|█▉ | 546/2774 [1:47:48<7:14:11, 11.69s/it] 20%|█▉ | 547/2774 [1:48:00<7:13:40, 11.68s/it] {'loss': 1.0654, 'learning_rate': 4.6433358485660405e-06, 'epoch': 0.2} 20%|█▉ | 547/2774 [1:48:00<7:13:40, 11.68s/it] 20%|█▉ | 548/2774 [1:48:11<7:12:18, 11.65s/it] {'loss': 1.0566, 'learning_rate': 4.641831444697417e-06, 'epoch': 0.2} 20%|█▉ | 548/2774 [1:48:11<7:12:18, 11.65s/it] 20%|█▉ | 549/2774 [1:48:23<7:09:52, 11.59s/it] {'loss': 1.0347, 'learning_rate': 4.640324119500087e-06, 'epoch': 0.2} 20%|█▉ | 549/2774 [1:48:23<7:09:52, 11.59s/it] 20%|█▉ | 550/2774 [1:48:34<7:10:23, 11.61s/it] {'loss': 1.0337, 'learning_rate': 4.638813875029952e-06, 'epoch': 0.2} 20%|█▉ | 550/2774 [1:48:34<7:10:23, 11.61s/it] 20%|█▉ | 551/2774 
[1:48:47<7:25:42, 12.03s/it] {'loss': 0.9717, 'learning_rate': 4.637300713346894e-06, 'epoch': 0.2} 20%|█▉ | 551/2774 [1:48:47<7:25:42, 12.03s/it] 20%|█▉ | 552/2774 [1:48:59<7:20:58, 11.91s/it] {'loss': 1.0137, 'learning_rate': 4.635784636514773e-06, 'epoch': 0.2} 20%|█▉ | 552/2774 [1:48:59<7:20:58, 11.91s/it] 20%|█▉ | 553/2774 [1:49:10<7:12:42, 11.69s/it] {'loss': 1.0098, 'learning_rate': 4.634265646601427e-06, 'epoch': 0.2} 20%|█▉ | 553/2774 [1:49:10<7:12:42, 11.69s/it] 20%|█▉ | 554/2774 [1:49:23<7:24:04, 12.00s/it] {'loss': 1.0166, 'learning_rate': 4.632743745678667e-06, 'epoch': 0.2} 20%|█▉ | 554/2774 [1:49:23<7:24:04, 12.00s/it] 20%|██ | 555/2774 [1:49:35<7:21:17, 11.93s/it] {'loss': 1.0649, 'learning_rate': 4.631218935822273e-06, 'epoch': 0.2} 20%|██ | 555/2774 [1:49:35<7:21:17, 11.93s/it] 20%|██ | 556/2774 [1:49:46<7:14:27, 11.75s/it] {'loss': 1.0801, 'learning_rate': 4.629691219111993e-06, 'epoch': 0.2} 20%|██ | 556/2774 [1:49:46<7:14:27, 11.75s/it] 20%|██ | 557/2774 [1:49:57<7:09:51, 11.63s/it] {'loss': 1.0073, 'learning_rate': 4.628160597631543e-06, 'epoch': 0.2} 20%|██ | 557/2774 [1:49:57<7:09:51, 11.63s/it] 20%|██ | 558/2774 [1:50:10<7:19:05, 11.89s/it] {'loss': 1.0127, 'learning_rate': 4.626627073468596e-06, 'epoch': 0.2} 20%|██ | 558/2774 [1:50:10<7:19:05, 11.89s/it] 20%|██ | 559/2774 [1:50:21<7:15:29, 11.80s/it] {'loss': 1.061, 'learning_rate': 4.6250906487147865e-06, 'epoch': 0.2} 20%|██ | 559/2774 [1:50:21<7:15:29, 11.80s/it] 20%|██ | 560/2774 [1:50:33<7:14:11, 11.77s/it] {'loss': 1.0293, 'learning_rate': 4.623551325465705e-06, 'epoch': 0.2} 20%|██ | 560/2774 [1:50:33<7:14:11, 11.77s/it] 20%|██ | 561/2774 [1:50:45<7:10:10, 11.66s/it] {'loss': 0.9629, 'learning_rate': 4.622009105820896e-06, 'epoch': 0.2} 20%|██ | 561/2774 [1:50:45<7:10:10, 11.66s/it] 20%|██ | 562/2774 [1:50:56<7:07:16, 11.59s/it] {'loss': 1.0488, 'learning_rate': 4.620463991883853e-06, 'epoch': 0.2} 20%|██ | 562/2774 [1:50:56<7:07:16, 11.59s/it] 20%|██ | 563/2774 [1:51:08<7:07:20, 
11.60s/it] {'loss': 1.0161, 'learning_rate': 4.6189159857620194e-06, 'epoch': 0.2} 20%|██ | 563/2774 [1:51:08<7:07:20, 11.60s/it] 20%|██ | 564/2774 [1:51:21<7:23:16, 12.03s/it] {'loss': 0.9761, 'learning_rate': 4.617365089566782e-06, 'epoch': 0.2} 20%|██ | 564/2774 [1:51:21<7:23:16, 12.03s/it] 20%|██ | 565/2774 [1:51:32<7:16:14, 11.85s/it] {'loss': 1.0459, 'learning_rate': 4.615811305413468e-06, 'epoch': 0.2} 20%|██ | 565/2774 [1:51:32<7:16:14, 11.85s/it] 20%|██ | 566/2774 [1:51:46<7:37:38, 12.44s/it] {'loss': 1.0386, 'learning_rate': 4.614254635421347e-06, 'epoch': 0.2} 20%|██ | 566/2774 [1:51:46<7:37:38, 12.44s/it] 20%|██ | 567/2774 [1:51:58<7:40:02, 12.51s/it] {'loss': 1.0571, 'learning_rate': 4.61269508171362e-06, 'epoch': 0.2} 20%|██ | 567/2774 [1:51:58<7:40:02, 12.51s/it] 20%|██ | 568/2774 [1:52:12<7:49:16, 12.76s/it] {'loss': 0.9976, 'learning_rate': 4.611132646417428e-06, 'epoch': 0.2} 20%|██ | 568/2774 [1:52:12<7:49:16, 12.76s/it] 21%|██ | 569/2774 [1:52:24<7:38:54, 12.49s/it] {'loss': 0.9829, 'learning_rate': 4.609567331663836e-06, 'epoch': 0.21} 21%|██ | 569/2774 [1:52:24<7:38:54, 12.49s/it] 21%|██ | 570/2774 [1:52:35<7:29:21, 12.23s/it] {'loss': 1.0737, 'learning_rate': 4.607999139587838e-06, 'epoch': 0.21} 21%|██ | 570/2774 [1:52:35<7:29:21, 12.23s/it] 21%|██ | 571/2774 [1:52:47<7:18:46, 11.95s/it] {'loss': 1.0352, 'learning_rate': 4.606428072328355e-06, 'epoch': 0.21} 21%|██ | 571/2774 [1:52:47<7:18:46, 11.95s/it] 21%|██ | 572/2774 [1:52:58<7:12:05, 11.77s/it] {'loss': 1.0537, 'learning_rate': 4.604854132028227e-06, 'epoch': 0.21} 21%|██ | 572/2774 [1:52:58<7:12:05, 11.77s/it] 21%|██ | 573/2774 [1:53:09<7:06:12, 11.62s/it] {'loss': 1.0244, 'learning_rate': 4.603277320834213e-06, 'epoch': 0.21} 21%|██ | 573/2774 [1:53:09<7:06:12, 11.62s/it] 21%|██ | 574/2774 [1:53:21<7:06:13, 11.62s/it] {'loss': 1.0391, 'learning_rate': 4.6016976408969895e-06, 'epoch': 0.21} 21%|██ | 574/2774 [1:53:21<7:06:13, 11.62s/it] 21%|██ | 575/2774 [1:53:33<7:11:26, 11.77s/it] 
{'loss': 0.9937, 'learning_rate': 4.600115094371144e-06, 'epoch': 0.21} 21%|██ | 575/2774 [1:53:33<7:11:26, 11.77s/it] 21%|██ | 576/2774 [1:53:44<7:05:11, 11.61s/it] {'loss': 1.0244, 'learning_rate': 4.5985296834151735e-06, 'epoch': 0.21} 21%|██ | 576/2774 [1:53:44<7:05:11, 11.61s/it] 21%|██ | 577/2774 [1:53:56<7:02:27, 11.54s/it] {'loss': 1.0186, 'learning_rate': 4.5969414101914846e-06, 'epoch': 0.21} 21%|██ | 577/2774 [1:53:56<7:02:27, 11.54s/it] 21%|██ | 578/2774 [1:54:07<7:03:56, 11.58s/it] {'loss': 1.0601, 'learning_rate': 4.595350276866384e-06, 'epoch': 0.21} 21%|██ | 578/2774 [1:54:07<7:03:56, 11.58s/it] 21%|██ | 579/2774 [1:54:19<7:01:27, 11.52s/it] {'loss': 1.0034, 'learning_rate': 4.593756285610083e-06, 'epoch': 0.21} 21%|██ | 579/2774 [1:54:19<7:01:27, 11.52s/it] 21%|██ | 580/2774 [1:54:30<7:03:08, 11.57s/it] {'loss': 1.0093, 'learning_rate': 4.592159438596688e-06, 'epoch': 0.21} 21%|██ | 580/2774 [1:54:30<7:03:08, 11.57s/it] 21%|██ | 581/2774 [1:54:42<7:03:39, 11.59s/it] {'loss': 1.0259, 'learning_rate': 4.590559738004203e-06, 'epoch': 0.21} 21%|██ | 581/2774 [1:54:42<7:03:39, 11.59s/it] 21%|██ | 582/2774 [1:54:54<7:04:50, 11.63s/it] {'loss': 1.0225, 'learning_rate': 4.588957186014523e-06, 'epoch': 0.21} 21%|██ | 582/2774 [1:54:54<7:04:50, 11.63s/it] 21%|██ | 583/2774 [1:55:05<7:04:32, 11.63s/it] {'loss': 1.0181, 'learning_rate': 4.587351784813431e-06, 'epoch': 0.21} 21%|██ | 583/2774 [1:55:05<7:04:32, 11.63s/it] 21%|██ | 584/2774 [1:55:16<6:58:51, 11.48s/it] {'loss': 1.0469, 'learning_rate': 4.585743536590599e-06, 'epoch': 0.21} 21%|██ | 584/2774 [1:55:16<6:58:51, 11.48s/it] 21%|██ | 585/2774 [1:55:28<7:00:05, 11.51s/it] {'loss': 1.0615, 'learning_rate': 4.5841324435395785e-06, 'epoch': 0.21} 21%|██ | 585/2774 [1:55:28<7:00:05, 11.51s/it] 21%|██ | 586/2774 [1:55:40<6:59:46, 11.51s/it] {'loss': 1.0869, 'learning_rate': 4.582518507857804e-06, 'epoch': 0.21} 21%|██ | 586/2774 [1:55:40<6:59:46, 11.51s/it] 21%|██ | 587/2774 [1:55:51<6:57:22, 11.45s/it] 
{'loss': 0.9849, 'learning_rate': 4.580901731746587e-06, 'epoch': 0.21} 21%|██ | 587/2774 [1:55:51<6:57:22, 11.45s/it] 21%|██ | 588/2774 [1:56:04<7:11:44, 11.85s/it] {'loss': 1.0464, 'learning_rate': 4.579282117411111e-06, 'epoch': 0.21} 21%|██ | 588/2774 [1:56:04<7:11:44, 11.85s/it] 21%|██ | 589/2774 [1:56:16<7:21:08, 12.11s/it] {'loss': 1.0444, 'learning_rate': 4.577659667060432e-06, 'epoch': 0.21} 21%|██ | 589/2774 [1:56:16<7:21:08, 12.11s/it] 21%|██▏ | 590/2774 [1:56:30<7:39:38, 12.63s/it] {'loss': 1.0205, 'learning_rate': 4.576034382907476e-06, 'epoch': 0.21} 21%|██▏ | 590/2774 [1:56:30<7:39:38, 12.63s/it] 21%|██▏ | 591/2774 [1:56:42<7:29:24, 12.35s/it] {'loss': 1.0474, 'learning_rate': 4.574406267169031e-06, 'epoch': 0.21} 21%|██▏ | 591/2774 [1:56:42<7:29:24, 12.35s/it] 21%|██▏ | 592/2774 [1:56:53<7:20:44, 12.12s/it] {'loss': 1.0615, 'learning_rate': 4.57277532206575e-06, 'epoch': 0.21} 21%|██▏ | 592/2774 [1:56:53<7:20:44, 12.12s/it] 21%|██▏ | 593/2774 [1:57:05<7:19:13, 12.08s/it] {'loss': 1.0107, 'learning_rate': 4.571141549822142e-06, 'epoch': 0.21} 21%|██▏ | 593/2774 [1:57:05<7:19:13, 12.08s/it] 21%|██▏ | 594/2774 [1:57:18<7:28:21, 12.34s/it] {'loss': 1.0454, 'learning_rate': 4.569504952666574e-06, 'epoch': 0.21} 21%|██▏ | 594/2774 [1:57:18<7:28:21, 12.34s/it] 21%|██▏ | 595/2774 [1:57:30<7:17:01, 12.03s/it] {'loss': 1.0732, 'learning_rate': 4.567865532831266e-06, 'epoch': 0.21} 21%|██▏ | 595/2774 [1:57:30<7:17:01, 12.03s/it] 21%|██▏ | 596/2774 [1:57:41<7:08:56, 11.82s/it] {'loss': 1.0122, 'learning_rate': 4.566223292552287e-06, 'epoch': 0.21} 21%|██▏ | 596/2774 [1:57:41<7:08:56, 11.82s/it] 22%|██▏ | 597/2774 [1:57:53<7:08:07, 11.80s/it] {'loss': 1.0415, 'learning_rate': 4.564578234069556e-06, 'epoch': 0.22} 22%|██▏ | 597/2774 [1:57:53<7:08:07, 11.80s/it] 22%|██▏ | 598/2774 [1:58:04<7:05:19, 11.73s/it] {'loss': 1.0215, 'learning_rate': 4.5629303596268295e-06, 'epoch': 0.22} 22%|██▏ | 598/2774 [1:58:04<7:05:19, 11.73s/it] 22%|██▏ | 599/2774 [1:58:16<7:03:57, 
11.70s/it] {'loss': 1.0142, 'learning_rate': 4.561279671471711e-06, 'epoch': 0.22} 22%|██▏ | 599/2774 [1:58:16<7:03:57, 11.70s/it] 22%|██▏ | 600/2774 [1:58:28<7:02:09, 11.65s/it] {'loss': 1.043, 'learning_rate': 4.55962617185564e-06, 'epoch': 0.22} 22%|██▏ | 600/2774 [1:58:28<7:02:09, 11.65s/it] 22%|██▏ | 601/2774 [1:58:39<7:02:08, 11.66s/it] {'loss': 1.0596, 'learning_rate': 4.557969863033889e-06, 'epoch': 0.22} 22%|██▏ | 601/2774 [1:58:39<7:02:08, 11.66s/it] 22%|██▏ | 602/2774 [1:58:51<7:02:37, 11.67s/it] {'loss': 1.0308, 'learning_rate': 4.556310747265562e-06, 'epoch': 0.22} 22%|██▏ | 602/2774 [1:58:51<7:02:37, 11.67s/it] 22%|██▏ | 603/2774 [1:59:02<7:00:08, 11.61s/it] {'loss': 1.0361, 'learning_rate': 4.554648826813595e-06, 'epoch': 0.22} 22%|██▏ | 603/2774 [1:59:02<7:00:08, 11.61s/it] 22%|██▏ | 604/2774 [1:59:14<6:59:12, 11.59s/it] {'loss': 0.9897, 'learning_rate': 4.5529841039447466e-06, 'epoch': 0.22} 22%|██▏ | 604/2774 [1:59:14<6:59:12, 11.59s/it] 22%|██▏ | 605/2774 [1:59:26<6:58:52, 11.59s/it] {'loss': 1.0264, 'learning_rate': 4.551316580929597e-06, 'epoch': 0.22} 22%|██▏ | 605/2774 [1:59:26<6:58:52, 11.59s/it] 22%|██▏ | 606/2774 [1:59:37<6:55:49, 11.51s/it] {'loss': 1.0728, 'learning_rate': 4.5496462600425474e-06, 'epoch': 0.22} 22%|██▏ | 606/2774 [1:59:37<6:55:49, 11.51s/it] 22%|██▏ | 607/2774 [1:59:49<6:58:33, 11.59s/it] {'loss': 1.0391, 'learning_rate': 4.547973143561816e-06, 'epoch': 0.22} 22%|██▏ | 607/2774 [1:59:49<6:58:33, 11.59s/it] 22%|██▏ | 608/2774 [2:00:00<6:57:38, 11.57s/it] {'loss': 1.04, 'learning_rate': 4.54629723376943e-06, 'epoch': 0.22} 22%|██▏ | 608/2774 [2:00:00<6:57:38, 11.57s/it] 22%|██▏ | 609/2774 [2:00:14<7:17:27, 12.12s/it] {'loss': 1.0068, 'learning_rate': 4.544618532951232e-06, 'epoch': 0.22} 22%|██▏ | 609/2774 [2:00:14<7:17:27, 12.12s/it] 22%|██▏ | 610/2774 [2:00:25<7:09:50, 11.92s/it] {'loss': 1.0098, 'learning_rate': 4.542937043396865e-06, 'epoch': 0.22} 22%|██▏ | 610/2774 [2:00:25<7:09:50, 11.92s/it] 22%|██▏ | 611/2774 
[2:00:36<7:04:47, 11.78s/it] {'loss': 1.0566, 'learning_rate': 4.541252767399783e-06, 'epoch': 0.22} 22%|██▏ | 611/2774 [2:00:36<7:04:47, 11.78s/it] 22%|██▏ | 612/2774 [2:00:51<7:31:59, 12.54s/it] {'loss': 1.0215, 'learning_rate': 4.5395657072572345e-06, 'epoch': 0.22} 22%|██▏ | 612/2774 [2:00:51<7:31:59, 12.54s/it] 22%|██▏ | 613/2774 [2:01:03<7:25:47, 12.38s/it] {'loss': 1.0396, 'learning_rate': 4.537875865270267e-06, 'epoch': 0.22} 22%|██▏ | 613/2774 [2:01:03<7:25:47, 12.38s/it] 22%|██▏ | 614/2774 [2:01:14<7:13:32, 12.04s/it] {'loss': 1.084, 'learning_rate': 4.536183243743726e-06, 'epoch': 0.22} 22%|██▏ | 614/2774 [2:01:14<7:13:32, 12.04s/it] 22%|██▏ | 615/2774 [2:01:26<7:17:49, 12.17s/it] {'loss': 1.0376, 'learning_rate': 4.534487844986241e-06, 'epoch': 0.22} 22%|██▏ | 615/2774 [2:01:26<7:17:49, 12.17s/it] 22%|██▏ | 616/2774 [2:01:39<7:25:51, 12.40s/it] {'loss': 0.9961, 'learning_rate': 4.532789671310236e-06, 'epoch': 0.22} 22%|██▏ | 616/2774 [2:01:39<7:25:51, 12.40s/it] 22%|██▏ | 617/2774 [2:01:51<7:12:04, 12.02s/it] {'loss': 1.0186, 'learning_rate': 4.531088725031917e-06, 'epoch': 0.22} 22%|██▏ | 617/2774 [2:01:51<7:12:04, 12.02s/it] 22%|██▏ | 618/2774 [2:02:02<7:09:40, 11.96s/it] {'loss': 1.0024, 'learning_rate': 4.529385008471272e-06, 'epoch': 0.22} 22%|██▏ | 618/2774 [2:02:02<7:09:40, 11.96s/it] 22%|██▏ | 619/2774 [2:02:14<7:06:49, 11.88s/it] {'loss': 1.0552, 'learning_rate': 4.527678523952067e-06, 'epoch': 0.22} 22%|██▏ | 619/2774 [2:02:14<7:06:49, 11.88s/it] 22%|██▏ | 620/2774 [2:02:26<7:04:53, 11.84s/it] {'loss': 1.0347, 'learning_rate': 4.525969273801845e-06, 'epoch': 0.22} 22%|██▏ | 620/2774 [2:02:26<7:04:53, 11.84s/it] 22%|██▏ | 621/2774 [2:02:38<7:08:16, 11.94s/it] {'loss': 1.0371, 'learning_rate': 4.524257260351917e-06, 'epoch': 0.22} 22%|██▏ | 621/2774 [2:02:38<7:08:16, 11.94s/it] 22%|██▏ | 622/2774 [2:02:49<7:03:21, 11.80s/it] {'loss': 1.0161, 'learning_rate': 4.522542485937369e-06, 'epoch': 0.22} 22%|██▏ | 622/2774 [2:02:49<7:03:21, 11.80s/it] 
22%|██▏ | 623/2774 [2:03:01<6:56:47, 11.63s/it] {'loss': 1.0449, 'learning_rate': 4.520824952897048e-06, 'epoch': 0.22} 22%|██▏ | 623/2774 [2:03:01<6:56:47, 11.63s/it] 22%|██▏ | 624/2774 [2:03:12<6:53:34, 11.54s/it] {'loss': 1.085, 'learning_rate': 4.519104663573567e-06, 'epoch': 0.22} 22%|██▏ | 624/2774 [2:03:12<6:53:34, 11.54s/it] 23%|██▎ | 625/2774 [2:03:25<7:10:10, 12.01s/it] {'loss': 1.0317, 'learning_rate': 4.517381620313295e-06, 'epoch': 0.23} 23%|██▎ | 625/2774 [2:03:25<7:10:10, 12.01s/it] 23%|██▎ | 626/2774 [2:03:36<7:01:49, 11.78s/it] {'loss': 1.0098, 'learning_rate': 4.515655825466359e-06, 'epoch': 0.23} 23%|██▎ | 626/2774 [2:03:36<7:01:49, 11.78s/it] 23%|██▎ | 627/2774 [2:03:48<6:59:19, 11.72s/it] {'loss': 1.0015, 'learning_rate': 4.5139272813866395e-06, 'epoch': 0.23} 23%|██▎ | 627/2774 [2:03:48<6:59:19, 11.72s/it] 23%|██▎ | 628/2774 [2:04:00<6:58:11, 11.69s/it] {'loss': 1.0483, 'learning_rate': 4.512195990431767e-06, 'epoch': 0.23} 23%|██▎ | 628/2774 [2:04:00<6:58:11, 11.69s/it] 23%|██▎ | 629/2774 [2:04:12<7:00:37, 11.77s/it] {'loss': 1.0479, 'learning_rate': 4.510461954963116e-06, 'epoch': 0.23} 23%|██▎ | 629/2774 [2:04:12<7:00:37, 11.77s/it] 23%|██▎ | 630/2774 [2:04:23<6:59:02, 11.73s/it] {'loss': 1.0059, 'learning_rate': 4.508725177345809e-06, 'epoch': 0.23} 23%|██▎ | 630/2774 [2:04:23<6:59:02, 11.73s/it] 23%|██▎ | 631/2774 [2:04:35<6:56:12, 11.65s/it] {'loss': 1.0029, 'learning_rate': 4.5069856599487014e-06, 'epoch': 0.23} 23%|██▎ | 631/2774 [2:04:35<6:56:12, 11.65s/it] 23%|██▎ | 632/2774 [2:04:46<6:52:48, 11.56s/it] {'loss': 1.0659, 'learning_rate': 4.505243405144394e-06, 'epoch': 0.23} 23%|██▎ | 632/2774 [2:04:46<6:52:48, 11.56s/it] 23%|██▎ | 633/2774 [2:04:58<6:54:22, 11.61s/it] {'loss': 1.0967, 'learning_rate': 4.5034984153092145e-06, 'epoch': 0.23} 23%|██▎ | 633/2774 [2:04:58<6:54:22, 11.61s/it] 23%|██▎ | 634/2774 [2:05:11<7:10:46, 12.08s/it] {'loss': 0.999, 'learning_rate': 4.501750692823225e-06, 'epoch': 0.23} 23%|██▎ | 634/2774 
[2:05:11<7:10:46, 12.08s/it] 23%|██▎ | 635/2774 [2:05:23<7:06:00, 11.95s/it] {'loss': 1.0054, 'learning_rate': 4.500000240070212e-06, 'epoch': 0.23} 23%|██▎ | 635/2774 [2:05:23<7:06:00, 11.95s/it] 23%|██▎ | 636/2774 [2:05:35<7:11:20, 12.11s/it] {'loss': 1.0859, 'learning_rate': 4.498247059437689e-06, 'epoch': 0.23} 23%|██▎ | 636/2774 [2:05:35<7:11:20, 12.11s/it] 23%|██▎ | 637/2774 [2:05:46<7:03:22, 11.89s/it] {'loss': 1.0498, 'learning_rate': 4.496491153316887e-06, 'epoch': 0.23} 23%|██▎ | 637/2774 [2:05:46<7:03:22, 11.89s/it] 23%|██▎ | 638/2774 [2:05:58<7:00:55, 11.82s/it] {'loss': 1.0977, 'learning_rate': 4.494732524102757e-06, 'epoch': 0.23} 23%|██▎ | 638/2774 [2:05:58<7:00:55, 11.82s/it] 23%|██▎ | 639/2774 [2:06:10<7:03:16, 11.90s/it] {'loss': 1.022, 'learning_rate': 4.492971174193963e-06, 'epoch': 0.23} 23%|██▎ | 639/2774 [2:06:10<7:03:16, 11.90s/it] 23%|██▎ | 640/2774 [2:06:21<6:57:31, 11.74s/it] {'loss': 1.0728, 'learning_rate': 4.4912071059928794e-06, 'epoch': 0.23} 23%|██▎ | 640/2774 [2:06:21<6:57:31, 11.74s/it] 23%|██▎ | 641/2774 [2:06:33<6:52:41, 11.61s/it] {'loss': 1.0312, 'learning_rate': 4.489440321905588e-06, 'epoch': 0.23} 23%|██▎ | 641/2774 [2:06:33<6:52:41, 11.61s/it] 23%|██▎ | 642/2774 [2:06:44<6:47:56, 11.48s/it] {'loss': 1.04, 'learning_rate': 4.487670824341877e-06, 'epoch': 0.23} 23%|██▎ | 642/2774 [2:06:44<6:47:56, 11.48s/it] 23%|██▎ | 643/2774 [2:06:55<6:45:41, 11.42s/it] {'loss': 1.0278, 'learning_rate': 4.485898615715233e-06, 'epoch': 0.23} 23%|██▎ | 643/2774 [2:06:55<6:45:41, 11.42s/it] 23%|██▎ | 644/2774 [2:07:07<6:45:05, 11.41s/it] {'loss': 1.083, 'learning_rate': 4.4841236984428426e-06, 'epoch': 0.23} 23%|██▎ | 644/2774 [2:07:07<6:45:05, 11.41s/it] 23%|██▎ | 645/2774 [2:07:18<6:44:05, 11.39s/it] {'loss': 1.0161, 'learning_rate': 4.482346074945585e-06, 'epoch': 0.23} 23%|██▎ | 645/2774 [2:07:18<6:44:05, 11.39s/it] 23%|██▎ | 646/2774 [2:07:29<6:42:57, 11.36s/it] {'loss': 1.0615, 'learning_rate': 4.4805657476480305e-06, 'epoch': 0.23} 
23%|██▎ | 646/2774 [2:07:29<6:42:57, 11.36s/it] 23%|██▎ | 647/2774 [2:07:41<6:47:26, 11.49s/it] {'loss': 1.0811, 'learning_rate': 4.4787827189784395e-06, 'epoch': 0.23} 23%|██▎ | 647/2774 [2:07:41<6:47:26, 11.49s/it] 23%|██▎ | 648/2774 [2:07:53<6:50:10, 11.58s/it] {'loss': 1.0283, 'learning_rate': 4.476996991368755e-06, 'epoch': 0.23} 23%|██▎ | 648/2774 [2:07:53<6:50:10, 11.58s/it] 23%|██▎ | 649/2774 [2:08:06<7:02:45, 11.94s/it] {'loss': 1.0137, 'learning_rate': 4.4752085672546005e-06, 'epoch': 0.23} 23%|██▎ | 649/2774 [2:08:06<7:02:45, 11.94s/it] 23%|██▎ | 650/2774 [2:08:17<6:55:43, 11.74s/it] {'loss': 1.0586, 'learning_rate': 4.47341744907528e-06, 'epoch': 0.23} 23%|██▎ | 650/2774 [2:08:17<6:55:43, 11.74s/it] 23%|██▎ | 651/2774 [2:08:29<6:56:12, 11.76s/it] {'loss': 1.0518, 'learning_rate': 4.47162363927377e-06, 'epoch': 0.23} 23%|██▎ | 651/2774 [2:08:29<6:56:12, 11.76s/it] 24%|██▎ | 652/2774 [2:08:42<7:08:37, 12.12s/it] {'loss': 1.0103, 'learning_rate': 4.469827140296719e-06, 'epoch': 0.24} 24%|██▎ | 652/2774 [2:08:42<7:08:37, 12.12s/it] 24%|██▎ | 653/2774 [2:08:53<7:00:43, 11.90s/it] {'loss': 1.0269, 'learning_rate': 4.468027954594442e-06, 'epoch': 0.24} 24%|██▎ | 653/2774 [2:08:53<7:00:43, 11.90s/it] 24%|██▎ | 654/2774 [2:09:05<6:57:21, 11.81s/it] {'loss': 0.998, 'learning_rate': 4.466226084620919e-06, 'epoch': 0.24} 24%|██▎ | 654/2774 [2:09:05<6:57:21, 11.81s/it] 24%|██▎ | 655/2774 [2:09:16<6:50:28, 11.62s/it] {'loss': 1.02, 'learning_rate': 4.464421532833794e-06, 'epoch': 0.24} 24%|██▎ | 655/2774 [2:09:16<6:50:28, 11.62s/it] 24%|██▎ | 656/2774 [2:09:27<6:49:11, 11.59s/it] {'loss': 1.0112, 'learning_rate': 4.462614301694367e-06, 'epoch': 0.24} 24%|██▎ | 656/2774 [2:09:27<6:49:11, 11.59s/it] 24%|██▎ | 657/2774 [2:09:39<6:48:59, 11.59s/it] {'loss': 1.0928, 'learning_rate': 4.460804393667589e-06, 'epoch': 0.24} 24%|██▎ | 657/2774 [2:09:39<6:48:59, 11.59s/it] 24%|██▎ | 658/2774 [2:09:51<6:50:05, 11.63s/it] {'loss': 1.0337, 'learning_rate': 4.458991811222067e-06, 
'epoch': 0.24} 24%|██▎ | 658/2774 [2:09:51<6:50:05, 11.63s/it] 24%|██▍ | 659/2774 [2:10:02<6:46:45, 11.54s/it] {'loss': 1.04, 'learning_rate': 4.457176556830054e-06, 'epoch': 0.24} 24%|██▍ | 659/2774 [2:10:02<6:46:45, 11.54s/it] 24%|██▍ | 660/2774 [2:10:13<6:44:41, 11.49s/it] {'loss': 1.063, 'learning_rate': 4.4553586329674484e-06, 'epoch': 0.24} 24%|██▍ | 660/2774 [2:10:13<6:44:41, 11.49s/it] 24%|██▍ | 661/2774 [2:10:25<6:44:07, 11.48s/it] {'loss': 1.0332, 'learning_rate': 4.4535380421137865e-06, 'epoch': 0.24} 24%|██▍ | 661/2774 [2:10:25<6:44:07, 11.48s/it] 24%|██▍ | 662/2774 [2:10:37<6:46:25, 11.55s/it] {'loss': 1.0156, 'learning_rate': 4.451714786752245e-06, 'epoch': 0.24} 24%|██▍ | 662/2774 [2:10:37<6:46:25, 11.55s/it] 24%|██▍ | 663/2774 [2:10:48<6:44:46, 11.50s/it] {'loss': 1.0312, 'learning_rate': 4.449888869369634e-06, 'epoch': 0.24} 24%|██▍ | 663/2774 [2:10:48<6:44:46, 11.50s/it] 24%|██▍ | 664/2774 [2:10:59<6:43:30, 11.47s/it] {'loss': 1.083, 'learning_rate': 4.448060292456395e-06, 'epoch': 0.24} 24%|██▍ | 664/2774 [2:10:59<6:43:30, 11.47s/it] 24%|██▍ | 665/2774 [2:11:11<6:41:01, 11.41s/it] {'loss': 1.0181, 'learning_rate': 4.446229058506596e-06, 'epoch': 0.24} 24%|██▍ | 665/2774 [2:11:11<6:41:01, 11.41s/it] 24%|██▍ | 666/2774 [2:11:22<6:43:55, 11.50s/it] {'loss': 1.0503, 'learning_rate': 4.44439517001793e-06, 'epoch': 0.24} 24%|██▍ | 666/2774 [2:11:22<6:43:55, 11.50s/it] 24%|██▍ | 667/2774 [2:11:36<7:02:27, 12.03s/it] {'loss': 0.9688, 'learning_rate': 4.44255862949171e-06, 'epoch': 0.24} 24%|██▍ | 667/2774 [2:11:36<7:02:27, 12.03s/it] 24%|██▍ | 668/2774 [2:11:47<6:57:46, 11.90s/it] {'loss': 1.0264, 'learning_rate': 4.440719439432866e-06, 'epoch': 0.24} 24%|██▍ | 668/2774 [2:11:47<6:57:46, 11.90s/it] 24%|██▍ | 669/2774 [2:11:58<6:49:59, 11.69s/it] {'loss': 1.0029, 'learning_rate': 4.438877602349941e-06, 'epoch': 0.24} 24%|██▍ | 669/2774 [2:11:58<6:49:59, 11.69s/it] 24%|██▍ | 670/2774 [2:12:12<7:07:08, 12.18s/it] {'loss': 1.0894, 'learning_rate': 
4.437033120755092e-06, 'epoch': 0.24} 24%|██▍ | 670/2774 [2:12:12<7:07:08, 12.18s/it] 24%|██▍ | 671/2774 [2:12:23<6:59:52, 11.98s/it] {'loss': 1.0142, 'learning_rate': 4.435185997164079e-06, 'epoch': 0.24} 24%|██▍ | 671/2774 [2:12:23<6:59:52, 11.98s/it] 24%|██▍ | 672/2774 [2:12:35<6:54:24, 11.83s/it] {'loss': 1.0151, 'learning_rate': 4.433336234096267e-06, 'epoch': 0.24} 24%|██▍ | 672/2774 [2:12:35<6:54:24, 11.83s/it] 24%|██▍ | 673/2774 [2:12:46<6:51:31, 11.75s/it] {'loss': 1.0249, 'learning_rate': 4.431483834074621e-06, 'epoch': 0.24} 24%|██▍ | 673/2774 [2:12:46<6:51:31, 11.75s/it] 24%|██▍ | 674/2774 [2:13:00<7:07:31, 12.22s/it] {'loss': 0.9697, 'learning_rate': 4.429628799625704e-06, 'epoch': 0.24} 24%|██▍ | 674/2774 [2:13:00<7:07:31, 12.22s/it] 24%|██▍ | 675/2774 [2:13:11<6:57:36, 11.94s/it] {'loss': 1.0215, 'learning_rate': 4.4277711332796695e-06, 'epoch': 0.24} 24%|██▍ | 675/2774 [2:13:11<6:57:36, 11.94s/it] 24%|██▍ | 676/2774 [2:13:23<6:56:53, 11.92s/it] {'loss': 0.9863, 'learning_rate': 4.425910837570263e-06, 'epoch': 0.24} 24%|██▍ | 676/2774 [2:13:23<6:56:53, 11.92s/it] 24%|██▍ | 677/2774 [2:13:34<6:51:33, 11.78s/it] {'loss': 1.0327, 'learning_rate': 4.4240479150348145e-06, 'epoch': 0.24} 24%|██▍ | 677/2774 [2:13:34<6:51:33, 11.78s/it] 24%|██▍ | 678/2774 [2:13:46<6:49:44, 11.73s/it] {'loss': 1.04, 'learning_rate': 4.4221823682142385e-06, 'epoch': 0.24} 24%|██▍ | 678/2774 [2:13:46<6:49:44, 11.73s/it] 24%|██▍ | 679/2774 [2:13:57<6:43:29, 11.56s/it] {'loss': 1.0503, 'learning_rate': 4.420314199653028e-06, 'epoch': 0.24} 24%|██▍ | 679/2774 [2:13:57<6:43:29, 11.56s/it] 25%|██▍ | 680/2774 [2:14:08<6:40:39, 11.48s/it] {'loss': 1.0229, 'learning_rate': 4.4184434118992525e-06, 'epoch': 0.25} 25%|██▍ | 680/2774 [2:14:08<6:40:39, 11.48s/it] 25%|██▍ | 681/2774 [2:14:20<6:41:52, 11.52s/it] {'loss': 0.9966, 'learning_rate': 4.4165700075045525e-06, 'epoch': 0.25} 25%|██▍ | 681/2774 [2:14:20<6:41:52, 11.52s/it] 25%|██▍ | 682/2774 [2:14:32<6:44:27, 11.60s/it] {'loss': 
1.0381, 'learning_rate': 4.41469398902414e-06, 'epoch': 0.25} 25%|██▍ | 682/2774 [2:14:32<6:44:27, 11.60s/it] 25%|██▍ | 683/2774 [2:14:43<6:39:45, 11.47s/it] {'loss': 1.0498, 'learning_rate': 4.412815359016789e-06, 'epoch': 0.25} 25%|██▍ | 683/2774 [2:14:43<6:39:45, 11.47s/it] 25%|██▍ | 684/2774 [2:14:54<6:37:44, 11.42s/it] {'loss': 1.0483, 'learning_rate': 4.410934120044838e-06, 'epoch': 0.25} 25%|██▍ | 684/2774 [2:14:54<6:37:44, 11.42s/it] 25%|██▍ | 685/2774 [2:15:06<6:37:22, 11.41s/it] {'loss': 1.0933, 'learning_rate': 4.4090502746741845e-06, 'epoch': 0.25} 25%|██▍ | 685/2774 [2:15:06<6:37:22, 11.41s/it] 25%|██▍ | 686/2774 [2:15:17<6:36:33, 11.40s/it] {'loss': 0.9897, 'learning_rate': 4.4071638254742795e-06, 'epoch': 0.25} 25%|██▍ | 686/2774 [2:15:17<6:36:33, 11.40s/it] 25%|██▍ | 687/2774 [2:15:28<6:35:39, 11.37s/it] {'loss': 1.0376, 'learning_rate': 4.4052747750181245e-06, 'epoch': 0.25} 25%|██▍ | 687/2774 [2:15:28<6:35:39, 11.37s/it] 25%|██▍ | 688/2774 [2:15:40<6:39:03, 11.48s/it] {'loss': 1.0547, 'learning_rate': 4.40338312588227e-06, 'epoch': 0.25} 25%|██▍ | 688/2774 [2:15:40<6:39:03, 11.48s/it] 25%|██▍ | 689/2774 [2:15:52<6:40:13, 11.52s/it] {'loss': 1.0352, 'learning_rate': 4.401488880646813e-06, 'epoch': 0.25} 25%|██▍ | 689/2774 [2:15:52<6:40:13, 11.52s/it] 25%|██▍ | 690/2774 [2:16:03<6:41:23, 11.56s/it] {'loss': 1.0547, 'learning_rate': 4.399592041895389e-06, 'epoch': 0.25} 25%|██▍ | 690/2774 [2:16:03<6:41:23, 11.56s/it] 25%|██▍ | 691/2774 [2:16:14<6:37:48, 11.46s/it] {'loss': 1.0288, 'learning_rate': 4.397692612215169e-06, 'epoch': 0.25} 25%|██▍ | 691/2774 [2:16:14<6:37:48, 11.46s/it] 25%|██▍ | 692/2774 [2:16:26<6:36:45, 11.43s/it] {'loss': 1.0493, 'learning_rate': 4.395790594196864e-06, 'epoch': 0.25} 25%|██▍ | 692/2774 [2:16:26<6:36:45, 11.43s/it] 25%|██▍ | 693/2774 [2:16:37<6:34:02, 11.36s/it] {'loss': 1.0122, 'learning_rate': 4.39388599043471e-06, 'epoch': 0.25} 25%|██▍ | 693/2774 [2:16:37<6:34:02, 11.36s/it] 25%|██▌ | 694/2774 [2:16:50<6:46:19, 
11.72s/it] {'loss': 1.0723, 'learning_rate': 4.391978803526471e-06, 'epoch': 0.25} 25%|██▌ | 694/2774 [2:16:50<6:46:19, 11.72s/it] 25%|██▌ | 695/2774 [2:17:02<6:50:19, 11.84s/it] {'loss': 1.0811, 'learning_rate': 4.390069036073436e-06, 'epoch': 0.25} 25%|██▌ | 695/2774 [2:17:02<6:50:19, 11.84s/it] 25%|██▌ | 696/2774 [2:17:14<6:52:14, 11.90s/it] {'loss': 1.0112, 'learning_rate': 4.3881566906804105e-06, 'epoch': 0.25} 25%|██▌ | 696/2774 [2:17:14<6:52:14, 11.90s/it] 25%|██▌ | 697/2774 [2:17:25<6:50:00, 11.84s/it] {'loss': 1.0562, 'learning_rate': 4.386241769955721e-06, 'epoch': 0.25} 25%|██▌ | 697/2774 [2:17:25<6:50:00, 11.84s/it] 25%|██▌ | 698/2774 [2:17:37<6:43:26, 11.66s/it] {'loss': 1.084, 'learning_rate': 4.3843242765112006e-06, 'epoch': 0.25} 25%|██▌ | 698/2774 [2:17:37<6:43:26, 11.66s/it] 25%|██▌ | 699/2774 [2:17:49<6:54:59, 12.00s/it] {'loss': 0.9971, 'learning_rate': 4.382404212962196e-06, 'epoch': 0.25} 25%|██▌ | 699/2774 [2:17:49<6:54:59, 12.00s/it] 25%|██▌ | 700/2774 [2:18:01<6:48:56, 11.83s/it] {'loss': 0.9771, 'learning_rate': 4.3804815819275585e-06, 'epoch': 0.25} 25%|██▌ | 700/2774 [2:18:01<6:48:56, 11.83s/it] 25%|██▌ | 701/2774 [2:18:12<6:44:53, 11.72s/it] {'loss': 1.0542, 'learning_rate': 4.378556386029638e-06, 'epoch': 0.25} 25%|██▌ | 701/2774 [2:18:12<6:44:53, 11.72s/it] 25%|██▌ | 702/2774 [2:18:23<6:38:47, 11.55s/it] {'loss': 1.0195, 'learning_rate': 4.37662862789429e-06, 'epoch': 0.25} 25%|██▌ | 702/2774 [2:18:23<6:38:47, 11.55s/it] 25%|██▌ | 703/2774 [2:18:35<6:36:09, 11.48s/it] {'loss': 1.0898, 'learning_rate': 4.374698310150856e-06, 'epoch': 0.25} 25%|██▌ | 703/2774 [2:18:35<6:36:09, 11.48s/it] 25%|██▌ | 704/2774 [2:18:46<6:38:18, 11.55s/it] {'loss': 1.0264, 'learning_rate': 4.372765435432176e-06, 'epoch': 0.25} 25%|██▌ | 704/2774 [2:18:46<6:38:18, 11.55s/it] 25%|██▌ | 705/2774 [2:18:59<6:48:57, 11.86s/it] {'loss': 1.0366, 'learning_rate': 4.370830006374571e-06, 'epoch': 0.25} 25%|██▌ | 705/2774 [2:18:59<6:48:57, 11.86s/it] 25%|██▌ | 706/2774 
[2:19:12<7:02:01, 12.24s/it] {'loss': 1.0386, 'learning_rate': 4.368892025617852e-06, 'epoch': 0.25} 25%|██▌ | 706/2774 [2:19:12<7:02:01, 12.24s/it] 25%|██▌ | 707/2774 [2:19:24<6:53:58, 12.02s/it] {'loss': 1.0352, 'learning_rate': 4.366951495805306e-06, 'epoch': 0.25} 25%|██▌ | 707/2774 [2:19:24<6:53:58, 12.02s/it] 26%|██▌ | 708/2774 [2:19:35<6:48:18, 11.86s/it] {'loss': 1.0571, 'learning_rate': 4.3650084195837e-06, 'epoch': 0.26} 26%|██▌ | 708/2774 [2:19:35<6:48:18, 11.86s/it] 26%|██▌ | 709/2774 [2:19:47<6:48:50, 11.88s/it] {'loss': 1.0244, 'learning_rate': 4.363062799603271e-06, 'epoch': 0.26} 26%|██▌ | 709/2774 [2:19:47<6:48:50, 11.88s/it] 26%|██▌ | 710/2774 [2:19:59<6:43:51, 11.74s/it] {'loss': 1.0942, 'learning_rate': 4.361114638517728e-06, 'epoch': 0.26} 26%|██▌ | 710/2774 [2:19:59<6:43:51, 11.74s/it] 26%|██▌ | 711/2774 [2:20:10<6:39:28, 11.62s/it] {'loss': 1.0469, 'learning_rate': 4.359163938984245e-06, 'epoch': 0.26} 26%|██▌ | 711/2774 [2:20:10<6:39:28, 11.62s/it] 26%|██▌ | 712/2774 [2:20:21<6:38:23, 11.59s/it] {'loss': 1.0942, 'learning_rate': 4.357210703663458e-06, 'epoch': 0.26} 26%|██▌ | 712/2774 [2:20:21<6:38:23, 11.59s/it] 26%|██▌ | 713/2774 [2:20:33<6:36:14, 11.54s/it] {'loss': 1.0527, 'learning_rate': 4.355254935219462e-06, 'epoch': 0.26} 26%|██▌ | 713/2774 [2:20:33<6:36:14, 11.54s/it] 26%|██▌ | 714/2774 [2:20:49<7:22:05, 12.88s/it] {'loss': 1.0215, 'learning_rate': 4.353296636319808e-06, 'epoch': 0.26} 26%|██▌ | 714/2774 [2:20:49<7:22:05, 12.88s/it] 26%|██▌ | 715/2774 [2:21:00<7:05:03, 12.39s/it] {'loss': 1.0835, 'learning_rate': 4.3513358096354966e-06, 'epoch': 0.26} 26%|██▌ | 715/2774 [2:21:00<7:05:03, 12.39s/it] 26%|██▌ | 716/2774 [2:21:11<6:54:57, 12.10s/it] {'loss': 1.0234, 'learning_rate': 4.3493724578409756e-06, 'epoch': 0.26} 26%|██▌ | 716/2774 [2:21:11<6:54:57, 12.10s/it] 26%|██▌ | 717/2774 [2:21:23<6:47:22, 11.88s/it] {'loss': 1.0161, 'learning_rate': 4.347406583614141e-06, 'epoch': 0.26} 26%|██▌ | 717/2774 [2:21:23<6:47:22, 11.88s/it] 
26%|██▌ | 718/2774 [2:21:35<6:45:29, 11.83s/it] {'loss': 1.04, 'learning_rate': 4.3454381896363245e-06, 'epoch': 0.26} 26%|██▌ | 718/2774 [2:21:35<6:45:29, 11.83s/it] 26%|██▌ | 719/2774 [2:21:46<6:45:16, 11.83s/it] {'loss': 1.0562, 'learning_rate': 4.343467278592297e-06, 'epoch': 0.26} 26%|██▌ | 719/2774 [2:21:46<6:45:16, 11.83s/it] 26%|██▌ | 720/2774 [2:21:58<6:40:57, 11.71s/it] {'loss': 1.0371, 'learning_rate': 4.341493853170263e-06, 'epoch': 0.26} 26%|██▌ | 720/2774 [2:21:58<6:40:57, 11.71s/it] 26%|██▌ | 721/2774 [2:22:09<6:37:57, 11.63s/it] {'loss': 1.0205, 'learning_rate': 4.3395179160618545e-06, 'epoch': 0.26} 26%|██▌ | 721/2774 [2:22:09<6:37:57, 11.63s/it] 26%|██▌ | 722/2774 [2:22:21<6:35:44, 11.57s/it] {'loss': 1.0317, 'learning_rate': 4.337539469962131e-06, 'epoch': 0.26} 26%|██▌ | 722/2774 [2:22:21<6:35:44, 11.57s/it] 26%|██▌ | 723/2774 [2:22:32<6:34:17, 11.53s/it] {'loss': 1.022, 'learning_rate': 4.335558517569573e-06, 'epoch': 0.26} 26%|██▌ | 723/2774 [2:22:32<6:34:17, 11.53s/it] 26%|██▌ | 724/2774 [2:22:45<6:46:52, 11.91s/it] {'loss': 1.0137, 'learning_rate': 4.333575061586079e-06, 'epoch': 0.26} 26%|██▌ | 724/2774 [2:22:45<6:46:52, 11.91s/it] 26%|██▌ | 725/2774 [2:22:56<6:41:42, 11.76s/it] {'loss': 1.1064, 'learning_rate': 4.331589104716965e-06, 'epoch': 0.26} 26%|██▌ | 725/2774 [2:22:56<6:41:42, 11.76s/it] 26%|██▌ | 726/2774 [2:23:08<6:40:08, 11.72s/it] {'loss': 1.0703, 'learning_rate': 4.329600649670955e-06, 'epoch': 0.26} 26%|██▌ | 726/2774 [2:23:08<6:40:08, 11.72s/it] 26%|██▌ | 727/2774 [2:23:19<6:37:14, 11.64s/it] {'loss': 1.083, 'learning_rate': 4.327609699160183e-06, 'epoch': 0.26} 26%|██▌ | 727/2774 [2:23:19<6:37:14, 11.64s/it] 26%|██▌ | 728/2774 [2:23:31<6:32:48, 11.52s/it] {'loss': 1.042, 'learning_rate': 4.325616255900183e-06, 'epoch': 0.26} 26%|██▌ | 728/2774 [2:23:31<6:32:48, 11.52s/it] 26%|██▋ | 729/2774 [2:23:44<6:49:19, 12.01s/it] {'loss': 0.9995, 'learning_rate': 4.323620322609894e-06, 'epoch': 0.26} 26%|██▋ | 729/2774 
[2:23:44<6:49:19, 12.01s/it] 26%|██▋ | 730/2774 [2:23:55<6:42:33, 11.82s/it] {'loss': 1.0562, 'learning_rate': 4.321621902011645e-06, 'epoch': 0.26} 26%|██▋ | 730/2774 [2:23:55<6:42:33, 11.82s/it] 26%|██▋ | 731/2774 [2:24:06<6:36:56, 11.66s/it] {'loss': 0.9985, 'learning_rate': 4.319620996831164e-06, 'epoch': 0.26} 26%|██▋ | 731/2774 [2:24:06<6:36:56, 11.66s/it] 26%|██▋ | 732/2774 [2:24:20<6:51:27, 12.09s/it] {'loss': 0.9839, 'learning_rate': 4.3176176097975635e-06, 'epoch': 0.26} 26%|██▋ | 732/2774 [2:24:20<6:51:27, 12.09s/it] 26%|██▋ | 733/2774 [2:24:31<6:41:32, 11.80s/it] {'loss': 1.0054, 'learning_rate': 4.315611743643342e-06, 'epoch': 0.26} 26%|██▋ | 733/2774 [2:24:31<6:41:32, 11.80s/it] 26%|██▋ | 734/2774 [2:24:42<6:36:28, 11.66s/it] {'loss': 1.0654, 'learning_rate': 4.31360340110438e-06, 'epoch': 0.26} 26%|██▋ | 734/2774 [2:24:42<6:36:28, 11.66s/it] 26%|██▋ | 735/2774 [2:24:53<6:33:54, 11.59s/it] {'loss': 1.0303, 'learning_rate': 4.311592584919936e-06, 'epoch': 0.26} 26%|██▋ | 735/2774 [2:24:53<6:33:54, 11.59s/it] 27%|██▋ | 736/2774 [2:25:05<6:31:01, 11.51s/it] {'loss': 1.0601, 'learning_rate': 4.309579297832642e-06, 'epoch': 0.27} 27%|██▋ | 736/2774 [2:25:05<6:31:01, 11.51s/it] 27%|██▋ | 737/2774 [2:25:16<6:28:49, 11.45s/it] {'loss': 1.0293, 'learning_rate': 4.307563542588498e-06, 'epoch': 0.27} 27%|██▋ | 737/2774 [2:25:16<6:28:49, 11.45s/it] 27%|██▋ | 738/2774 [2:25:27<6:26:54, 11.40s/it] {'loss': 1.0166, 'learning_rate': 4.305545321936875e-06, 'epoch': 0.27} 27%|██▋ | 738/2774 [2:25:27<6:26:54, 11.40s/it] 27%|██▋ | 739/2774 [2:25:39<6:26:48, 11.40s/it] {'loss': 0.9912, 'learning_rate': 4.303524638630503e-06, 'epoch': 0.27} 27%|██▋ | 739/2774 [2:25:39<6:26:48, 11.40s/it] 27%|██▋ | 740/2774 [2:25:51<6:39:09, 11.77s/it] {'loss': 1.0156, 'learning_rate': 4.301501495425472e-06, 'epoch': 0.27} 27%|██▋ | 740/2774 [2:25:51<6:39:09, 11.77s/it] 27%|██▋ | 741/2774 [2:26:03<6:39:12, 11.78s/it] {'loss': 1.0273, 'learning_rate': 4.299475895081226e-06, 'epoch': 0.27} 
27%|██▋ | 741/2774 [2:26:03<6:39:12, 11.78s/it] 27%|██▋ | 742/2774 [2:26:16<6:51:11, 12.14s/it] {'loss': 1.0278, 'learning_rate': 4.297447840360562e-06, 'epoch': 0.27} 27%|██▋ | 742/2774 [2:26:16<6:51:11, 12.14s/it] 27%|██▋ | 743/2774 [2:26:29<6:59:54, 12.40s/it] {'loss': 0.9575, 'learning_rate': 4.295417334029626e-06, 'epoch': 0.27} 27%|██▋ | 743/2774 [2:26:29<6:59:54, 12.40s/it] 27%|██▋ | 744/2774 [2:26:43<7:11:17, 12.75s/it] {'loss': 0.9702, 'learning_rate': 4.293384378857903e-06, 'epoch': 0.27} 27%|██▋ | 744/2774 [2:26:43<7:11:17, 12.75s/it] 27%|██▋ | 745/2774 [2:26:55<7:06:41, 12.62s/it] {'loss': 1.0752, 'learning_rate': 4.291348977618224e-06, 'epoch': 0.27} 27%|██▋ | 745/2774 [2:26:55<7:06:41, 12.62s/it] 27%|██▋ | 746/2774 [2:27:07<6:54:02, 12.25s/it] {'loss': 1.1016, 'learning_rate': 4.289311133086751e-06, 'epoch': 0.27} 27%|██▋ | 746/2774 [2:27:07<6:54:02, 12.25s/it] 27%|██▋ | 747/2774 [2:27:18<6:42:49, 11.92s/it] {'loss': 1.0044, 'learning_rate': 4.287270848042982e-06, 'epoch': 0.27} 27%|██▋ | 747/2774 [2:27:18<6:42:49, 11.92s/it] 27%|██▋ | 748/2774 [2:27:31<6:57:27, 12.36s/it] {'loss': 0.9775, 'learning_rate': 4.285228125269742e-06, 'epoch': 0.27} 27%|██▋ | 748/2774 [2:27:31<6:57:27, 12.36s/it] 27%|██▋ | 749/2774 [2:27:43<6:50:23, 12.16s/it] {'loss': 1.0493, 'learning_rate': 4.283182967553183e-06, 'epoch': 0.27} 27%|██▋ | 749/2774 [2:27:43<6:50:23, 12.16s/it] 27%|██▋ | 750/2774 [2:27:54<6:43:54, 11.97s/it] {'loss': 1.02, 'learning_rate': 4.281135377682775e-06, 'epoch': 0.27} 27%|██▋ | 750/2774 [2:27:54<6:43:54, 11.97s/it] 27%|██▋ | 751/2774 [2:28:06<6:36:54, 11.77s/it] {'loss': 1.084, 'learning_rate': 4.279085358451307e-06, 'epoch': 0.27} 27%|██▋ | 751/2774 [2:28:06<6:36:54, 11.77s/it] 27%|██▋ | 752/2774 [2:28:17<6:34:21, 11.70s/it] {'loss': 1.041, 'learning_rate': 4.277032912654881e-06, 'epoch': 0.27} 27%|██▋ | 752/2774 [2:28:17<6:34:21, 11.70s/it] 27%|██▋ | 753/2774 [2:28:28<6:30:02, 11.58s/it] {'loss': 1.0649, 'learning_rate': 4.27497804309291e-06, 
'epoch': 0.27} 27%|██▋ | 753/2774 [2:28:28<6:30:02, 11.58s/it] 27%|██▋ | 754/2774 [2:28:40<6:27:11, 11.50s/it] {'loss': 1.043, 'learning_rate': 4.272920752568112e-06, 'epoch': 0.27} 27%|██▋ | 754/2774 [2:28:40<6:27:11, 11.50s/it] 27%|██▋ | 755/2774 [2:28:52<6:36:08, 11.77s/it] {'loss': 1.0254, 'learning_rate': 4.270861043886506e-06, 'epoch': 0.27} 27%|██▋ | 755/2774 [2:28:52<6:36:08, 11.77s/it] 27%|██▋ | 756/2774 [2:29:03<6:30:48, 11.62s/it] {'loss': 1.0215, 'learning_rate': 4.268798919857412e-06, 'epoch': 0.27} 27%|██▋ | 756/2774 [2:29:03<6:30:48, 11.62s/it] 27%|██▋ | 757/2774 [2:29:16<6:39:41, 11.89s/it] {'loss': 1.0444, 'learning_rate': 4.266734383293441e-06, 'epoch': 0.27} 27%|██▋ | 757/2774 [2:29:16<6:39:41, 11.89s/it] 27%|██▋ | 758/2774 [2:29:30<6:56:36, 12.40s/it] {'loss': 1.0283, 'learning_rate': 4.264667437010497e-06, 'epoch': 0.27} 27%|██▋ | 758/2774 [2:29:30<6:56:36, 12.40s/it] 27%|██▋ | 759/2774 [2:29:41<6:44:40, 12.05s/it] {'loss': 1.0664, 'learning_rate': 4.262598083827769e-06, 'epoch': 0.27} 27%|██▋ | 759/2774 [2:29:41<6:44:40, 12.05s/it] 27%|██▋ | 760/2774 [2:29:52<6:37:13, 11.83s/it] {'loss': 1.0234, 'learning_rate': 4.26052632656773e-06, 'epoch': 0.27} 27%|██▋ | 760/2774 [2:29:52<6:37:13, 11.83s/it] 27%|██▋ | 761/2774 [2:30:03<6:30:31, 11.64s/it] {'loss': 1.0239, 'learning_rate': 4.258452168056132e-06, 'epoch': 0.27} 27%|██▋ | 761/2774 [2:30:03<6:30:31, 11.64s/it] 27%|██▋ | 762/2774 [2:30:15<6:26:38, 11.53s/it] {'loss': 1.0439, 'learning_rate': 4.256375611122003e-06, 'epoch': 0.27} 27%|██▋ | 762/2774 [2:30:15<6:26:38, 11.53s/it] 28%|██▊ | 763/2774 [2:30:26<6:25:45, 11.51s/it] {'loss': 1.0244, 'learning_rate': 4.25429665859764e-06, 'epoch': 0.28} 28%|██▊ | 763/2774 [2:30:26<6:25:45, 11.51s/it] 28%|██▊ | 764/2774 [2:30:38<6:26:20, 11.53s/it] {'loss': 1.0332, 'learning_rate': 4.252215313318608e-06, 'epoch': 0.28} 28%|██▊ | 764/2774 [2:30:38<6:26:20, 11.53s/it] 28%|██▊ | 765/2774 [2:30:49<6:25:23, 11.51s/it] {'loss': 1.0469, 'learning_rate': 
4.250131578123737e-06, 'epoch': 0.28} 28%|██▊ | 765/2774 [2:30:49<6:25:23, 11.51s/it] 28%|██▊ | 766/2774 [2:31:00<6:22:58, 11.44s/it] {'loss': 1.0312, 'learning_rate': 4.248045455855116e-06, 'epoch': 0.28} 28%|██▊ | 766/2774 [2:31:00<6:22:58, 11.44s/it] 28%|██▊ | 767/2774 [2:31:12<6:22:17, 11.43s/it] {'loss': 1.0728, 'learning_rate': 4.24595694935809e-06, 'epoch': 0.28} 28%|██▊ | 767/2774 [2:31:12<6:22:17, 11.43s/it] 28%|██▊ | 768/2774 [2:31:23<6:19:39, 11.36s/it] {'loss': 1.0049, 'learning_rate': 4.243866061481256e-06, 'epoch': 0.28} 28%|██▊ | 768/2774 [2:31:23<6:19:39, 11.36s/it] 28%|██▊ | 769/2774 [2:31:35<6:23:16, 11.47s/it] {'loss': 1.0488, 'learning_rate': 4.241772795076458e-06, 'epoch': 0.28} 28%|██▊ | 769/2774 [2:31:35<6:23:16, 11.47s/it] 28%|██▊ | 770/2774 [2:31:46<6:26:05, 11.56s/it] {'loss': 1.0186, 'learning_rate': 4.239677152998784e-06, 'epoch': 0.28} 28%|██▊ | 770/2774 [2:31:46<6:26:05, 11.56s/it] 28%|██▊ | 771/2774 [2:31:59<6:33:33, 11.79s/it] {'loss': 0.9648, 'learning_rate': 4.2375791381065654e-06, 'epoch': 0.28} 28%|██▊ | 771/2774 [2:31:59<6:33:33, 11.79s/it] 28%|██▊ | 772/2774 [2:32:10<6:32:24, 11.76s/it] {'loss': 1.0234, 'learning_rate': 4.235478753261366e-06, 'epoch': 0.28} 28%|██▊ | 772/2774 [2:32:10<6:32:24, 11.76s/it] 28%|██▊ | 773/2774 [2:32:22<6:27:04, 11.61s/it] {'loss': 0.9854, 'learning_rate': 4.233376001327984e-06, 'epoch': 0.28} 28%|██▊ | 773/2774 [2:32:22<6:27:04, 11.61s/it] 28%|██▊ | 774/2774 [2:32:33<6:25:22, 11.56s/it] {'loss': 1.0298, 'learning_rate': 4.231270885174448e-06, 'epoch': 0.28} 28%|██▊ | 774/2774 [2:32:33<6:25:22, 11.56s/it] 28%|██▊ | 775/2774 [2:32:44<6:21:30, 11.45s/it] {'loss': 1.042, 'learning_rate': 4.229163407672007e-06, 'epoch': 0.28} 28%|██▊ | 775/2774 [2:32:44<6:21:30, 11.45s/it] 28%|██▊ | 776/2774 [2:32:57<6:30:20, 11.72s/it] {'loss': 1.0093, 'learning_rate': 4.2270535716951345e-06, 'epoch': 0.28} 28%|██▊ | 776/2774 [2:32:57<6:30:20, 11.72s/it] 28%|██▊ | 777/2774 [2:33:08<6:26:31, 11.61s/it] {'loss': 1.0083, 
'learning_rate': 4.224941380121518e-06, 'epoch': 0.28} 28%|██▊ | 777/2774 [2:33:08<6:26:31, 11.61s/it] 28%|██▊ | 778/2774 [2:33:20<6:26:56, 11.63s/it] {'loss': 1.0776, 'learning_rate': 4.2228268358320605e-06, 'epoch': 0.28} 28%|██▊ | 778/2774 [2:33:20<6:26:56, 11.63s/it] 28%|██▊ | 779/2774 [2:33:31<6:21:41, 11.48s/it] {'loss': 1.0464, 'learning_rate': 4.220709941710871e-06, 'epoch': 0.28} 28%|██▊ | 779/2774 [2:33:31<6:21:41, 11.48s/it] 28%|██▊ | 780/2774 [2:33:42<6:21:10, 11.47s/it] {'loss': 1.0151, 'learning_rate': 4.218590700645267e-06, 'epoch': 0.28} 28%|██▊ | 780/2774 [2:33:42<6:21:10, 11.47s/it] 28%|██▊ | 781/2774 [2:33:54<6:20:57, 11.47s/it] {'loss': 1.0562, 'learning_rate': 4.216469115525763e-06, 'epoch': 0.28} 28%|██▊ | 781/2774 [2:33:54<6:20:57, 11.47s/it] 28%|██▊ | 782/2774 [2:34:05<6:19:40, 11.44s/it] {'loss': 1.0498, 'learning_rate': 4.214345189246077e-06, 'epoch': 0.28} 28%|██▊ | 782/2774 [2:34:05<6:19:40, 11.44s/it] 28%|██▊ | 783/2774 [2:34:20<6:58:43, 12.62s/it] {'loss': 1.0576, 'learning_rate': 4.212218924703111e-06, 'epoch': 0.28} 28%|██▊ | 783/2774 [2:34:20<6:58:43, 12.62s/it] 28%|██▊ | 784/2774 [2:34:32<6:49:44, 12.35s/it] {'loss': 1.0464, 'learning_rate': 4.210090324796965e-06, 'epoch': 0.28} 28%|██▊ | 784/2774 [2:34:32<6:49:44, 12.35s/it] 28%|██▊ | 785/2774 [2:34:44<6:39:55, 12.06s/it] {'loss': 1.04, 'learning_rate': 4.207959392430921e-06, 'epoch': 0.28} 28%|██▊ | 785/2774 [2:34:44<6:39:55, 12.06s/it] 28%|██▊ | 786/2774 [2:34:55<6:37:16, 11.99s/it] {'loss': 1.0273, 'learning_rate': 4.205826130511439e-06, 'epoch': 0.28} 28%|██▊ | 786/2774 [2:34:55<6:37:16, 11.99s/it] 28%|██▊ | 787/2774 [2:35:07<6:30:35, 11.79s/it] {'loss': 1.064, 'learning_rate': 4.2036905419481615e-06, 'epoch': 0.28} 28%|██▊ | 787/2774 [2:35:07<6:30:35, 11.79s/it] 28%|██▊ | 788/2774 [2:35:18<6:28:17, 11.73s/it] {'loss': 1.0757, 'learning_rate': 4.201552629653902e-06, 'epoch': 0.28} 28%|██▊ | 788/2774 [2:35:18<6:28:17, 11.73s/it] 28%|██▊ | 789/2774 [2:35:30<6:24:30, 11.62s/it] 
{'loss': 0.9976, 'learning_rate': 4.1994123965446435e-06, 'epoch': 0.28} 28%|██▊ | 789/2774 [2:35:30<6:24:30, 11.62s/it] 28%|██▊ | 790/2774 [2:35:43<6:35:54, 11.97s/it] {'loss': 0.9971, 'learning_rate': 4.197269845539535e-06, 'epoch': 0.28} 28%|██▊ | 790/2774 [2:35:43<6:35:54, 11.97s/it] 29%|██▊ | 791/2774 [2:35:54<6:30:00, 11.80s/it] {'loss': 1.0527, 'learning_rate': 4.1951249795608865e-06, 'epoch': 0.29} 29%|██▊ | 791/2774 [2:35:54<6:30:00, 11.80s/it] 29%|██▊ | 792/2774 [2:36:05<6:26:18, 11.69s/it] {'loss': 1.0518, 'learning_rate': 4.192977801534165e-06, 'epoch': 0.29} 29%|██▊ | 792/2774 [2:36:05<6:26:18, 11.69s/it] 29%|██▊ | 793/2774 [2:36:17<6:23:55, 11.63s/it] {'loss': 1.063, 'learning_rate': 4.1908283143879925e-06, 'epoch': 0.29} 29%|██▊ | 793/2774 [2:36:17<6:23:55, 11.63s/it] 29%|██▊ | 794/2774 [2:36:29<6:26:43, 11.72s/it] {'loss': 0.9517, 'learning_rate': 4.188676521054139e-06, 'epoch': 0.29} 29%|██▊ | 794/2774 [2:36:29<6:26:43, 11.72s/it] 29%|██▊ | 795/2774 [2:36:40<6:25:00, 11.67s/it] {'loss': 1.0679, 'learning_rate': 4.186522424467522e-06, 'epoch': 0.29} 29%|██▊ | 795/2774 [2:36:40<6:25:00, 11.67s/it] 29%|██▊ | 796/2774 [2:36:52<6:21:45, 11.58s/it] {'loss': 1.0723, 'learning_rate': 4.1843660275661964e-06, 'epoch': 0.29} 29%|██▊ | 796/2774 [2:36:52<6:21:45, 11.58s/it] 29%|██▊ | 797/2774 [2:37:03<6:23:06, 11.63s/it] {'loss': 1.0864, 'learning_rate': 4.1822073332913605e-06, 'epoch': 0.29} 29%|██▊ | 797/2774 [2:37:03<6:23:06, 11.63s/it] 29%|██▉ | 798/2774 [2:37:15<6:22:30, 11.61s/it] {'loss': 1.0532, 'learning_rate': 4.1800463445873405e-06, 'epoch': 0.29} 29%|██▉ | 798/2774 [2:37:15<6:22:30, 11.61s/it] 29%|██▉ | 799/2774 [2:37:28<6:32:21, 11.92s/it] {'loss': 0.998, 'learning_rate': 4.177883064401596e-06, 'epoch': 0.29} 29%|██▉ | 799/2774 [2:37:28<6:32:21, 11.92s/it] 29%|██▉ | 800/2774 [2:37:41<6:45:51, 12.34s/it] {'loss': 0.9844, 'learning_rate': 4.175717495684709e-06, 'epoch': 0.29} 29%|██▉ | 800/2774 [2:37:41<6:45:51, 
12.34s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn( 29%|██▉ | 801/2774 [2:38:20<11:09:33, 20.36s/it] {'loss': 1.0322, 'learning_rate': 4.1735496413903855e-06, 'epoch': 0.29} 29%|██▉ | 801/2774 [2:38:20<11:09:33, 20.36s/it] 29%|██▉ | 802/2774 [2:38:32<9:42:41, 17.73s/it] {'loss': 1.0605, 'learning_rate': 4.171379504475447e-06, 'epoch': 0.29} 29%|██▉ | 802/2774 [2:38:32<9:42:41, 17.73s/it] 29%|██▉ | 803/2774 [2:38:43<8:40:47, 15.85s/it] {'loss': 1.041, 'learning_rate': 4.169207087899829e-06, 'epoch': 0.29} 29%|██▉ | 803/2774 [2:38:43<8:40:47, 15.85s/it] 29%|██▉ | 804/2774 [2:38:54<7:54:39, 14.46s/it] {'loss': 1.0391, 'learning_rate': 4.167032394626579e-06, 'epoch': 0.29} 29%|██▉ | 804/2774 [2:38:54<7:54:39, 14.46s/it] 29%|██▉ | 805/2774 [2:39:06<7:31:41, 13.76s/it] {'loss': 1.0249, 'learning_rate': 4.164855427621844e-06, 'epoch': 0.29} 29%|██▉ | 805/2774 [2:39:06<7:31:41, 13.76s/it] 29%|██▉ | 806/2774 [2:39:18<7:07:35, 13.04s/it] {'loss': 1.0601, 'learning_rate': 4.162676189854877e-06, 'epoch': 0.29} 29%|██▉ | 806/2774 [2:39:18<7:07:35, 13.04s/it] 29%|██▉ | 807/2774 [2:39:29<6:52:33, 12.58s/it] {'loss': 1.085, 'learning_rate': 4.160494684298027e-06, 'epoch': 0.29} 29%|██▉ | 807/2774 [2:39:29<6:52:33, 12.58s/it] 29%|██▉ | 808/2774 [2:39:41<6:42:00, 12.27s/it] {'loss': 1.0444, 'learning_rate': 4.158310913926735e-06, 'epoch': 0.29} 29%|██▉ | 808/2774 [2:39:41<6:42:00, 12.27s/it] 29%|██▉ | 809/2774 
[2:39:53<6:41:19, 12.25s/it] {'loss': 0.9932, 'learning_rate': 4.156124881719533e-06, 'epoch': 0.29} 29%|██▉ | 809/2774 [2:39:53<6:41:19, 12.25s/it] 29%|██▉ | 810/2774 [2:40:05<6:36:39, 12.12s/it] {'loss': 1.0425, 'learning_rate': 4.1539365906580354e-06, 'epoch': 0.29} 29%|██▉ | 810/2774 [2:40:05<6:36:39, 12.12s/it] 29%|██▉ | 811/2774 [2:40:16<6:31:21, 11.96s/it] {'loss': 1.0381, 'learning_rate': 4.15174604372694e-06, 'epoch': 0.29} 29%|██▉ | 811/2774 [2:40:16<6:31:21, 11.96s/it] 29%|██▉ | 812/2774 [2:40:28<6:24:49, 11.77s/it] {'loss': 1.02, 'learning_rate': 4.14955324391402e-06, 'epoch': 0.29} 29%|██▉ | 812/2774 [2:40:28<6:24:49, 11.77s/it] 29%|██▉ | 813/2774 [2:40:40<6:24:07, 11.75s/it] {'loss': 1.019, 'learning_rate': 4.147358194210122e-06, 'epoch': 0.29} 29%|██▉ | 813/2774 [2:40:40<6:24:07, 11.75s/it] 29%|██▉ | 814/2774 [2:40:52<6:33:59, 12.06s/it] {'loss': 0.9517, 'learning_rate': 4.14516089760916e-06, 'epoch': 0.29} 29%|██▉ | 814/2774 [2:40:52<6:33:59, 12.06s/it] 29%|██▉ | 815/2774 [2:41:04<6:31:49, 12.00s/it] {'loss': 1.0571, 'learning_rate': 4.1429613571081164e-06, 'epoch': 0.29} 29%|██▉ | 815/2774 [2:41:04<6:31:49, 12.00s/it] 29%|██▉ | 816/2774 [2:41:16<6:30:26, 11.96s/it] {'loss': 1.0913, 'learning_rate': 4.140759575707031e-06, 'epoch': 0.29} 29%|██▉ | 816/2774 [2:41:16<6:30:26, 11.96s/it] 29%|██▉ | 817/2774 [2:41:27<6:23:57, 11.77s/it] {'loss': 1.0903, 'learning_rate': 4.138555556408998e-06, 'epoch': 0.29} 29%|██▉ | 817/2774 [2:41:27<6:23:57, 11.77s/it] 29%|██▉ | 818/2774 [2:41:41<6:37:56, 12.21s/it] {'loss': 1.0503, 'learning_rate': 4.13634930222017e-06, 'epoch': 0.29} 29%|██▉ | 818/2774 [2:41:41<6:37:56, 12.21s/it] 30%|██▉ | 819/2774 [2:41:52<6:26:57, 11.88s/it] {'loss': 1.0332, 'learning_rate': 4.134140816149742e-06, 'epoch': 0.3} 30%|██▉ | 819/2774 [2:41:52<6:26:57, 11.88s/it] 30%|██▉ | 820/2774 [2:42:03<6:25:37, 11.84s/it] {'loss': 1.0962, 'learning_rate': 4.1319301012099575e-06, 'epoch': 0.3} 30%|██▉ | 820/2774 [2:42:03<6:25:37, 11.84s/it] 30%|██▉ 
| 821/2774 [2:42:15<6:22:16, 11.74s/it] {'loss': 1.0664, 'learning_rate': 4.1297171604160965e-06, 'epoch': 0.3} 30%|██▉ | 821/2774 [2:42:15<6:22:16, 11.74s/it] 30%|██▉ | 822/2774 [2:42:27<6:25:09, 11.84s/it] {'loss': 1.1011, 'learning_rate': 4.127501996786477e-06, 'epoch': 0.3} 30%|██▉ | 822/2774 [2:42:27<6:25:09, 11.84s/it] 30%|██▉ | 823/2774 [2:42:39<6:22:18, 11.76s/it] {'loss': 0.9912, 'learning_rate': 4.125284613342449e-06, 'epoch': 0.3} 30%|██▉ | 823/2774 [2:42:39<6:22:18, 11.76s/it] 30%|██▉ | 824/2774 [2:42:50<6:22:26, 11.77s/it] {'loss': 1.0684, 'learning_rate': 4.1230650131083884e-06, 'epoch': 0.3} 30%|██▉ | 824/2774 [2:42:50<6:22:26, 11.77s/it] 30%|██▉ | 825/2774 [2:43:02<6:16:50, 11.60s/it] {'loss': 1.0308, 'learning_rate': 4.120843199111697e-06, 'epoch': 0.3} 30%|██▉ | 825/2774 [2:43:02<6:16:50, 11.60s/it] 30%|██▉ | 826/2774 [2:43:13<6:14:04, 11.52s/it] {'loss': 1.0405, 'learning_rate': 4.118619174382794e-06, 'epoch': 0.3} 30%|██▉ | 826/2774 [2:43:13<6:14:04, 11.52s/it] 30%|██▉ | 827/2774 [2:43:24<6:13:15, 11.50s/it] {'loss': 1.0684, 'learning_rate': 4.116392941955117e-06, 'epoch': 0.3} 30%|██▉ | 827/2774 [2:43:24<6:13:15, 11.50s/it] 30%|██▉ | 828/2774 [2:43:36<6:13:08, 11.50s/it] {'loss': 0.9971, 'learning_rate': 4.114164504865108e-06, 'epoch': 0.3} 30%|██▉ | 828/2774 [2:43:36<6:13:08, 11.50s/it] 30%|██▉ | 829/2774 [2:43:47<6:11:23, 11.46s/it] {'loss': 1.0957, 'learning_rate': 4.111933866152225e-06, 'epoch': 0.3} 30%|██▉ | 829/2774 [2:43:47<6:11:23, 11.46s/it] 30%|██▉ | 830/2774 [2:43:59<6:13:11, 11.52s/it] {'loss': 1.1035, 'learning_rate': 4.1097010288589225e-06, 'epoch': 0.3} 30%|██▉ | 830/2774 [2:43:59<6:13:11, 11.52s/it] 30%|██▉ | 831/2774 [2:44:10<6:10:36, 11.44s/it] {'loss': 1.0396, 'learning_rate': 4.107465996030657e-06, 'epoch': 0.3} 30%|██▉ | 831/2774 [2:44:10<6:10:36, 11.44s/it] 30%|██▉ | 832/2774 [2:44:22<6:12:21, 11.50s/it] {'loss': 1.061, 'learning_rate': 4.105228770715876e-06, 'epoch': 0.3} 30%|██▉ | 832/2774 [2:44:22<6:12:21, 11.50s/it] 
30%|███ | 833/2774 [2:44:33<6:13:52, 11.56s/it] {'loss': 1.0474, 'learning_rate': 4.102989355966021e-06, 'epoch': 0.3} 30%|███ | 833/2774 [2:44:33<6:13:52, 11.56s/it] 30%|███ | 834/2774 [2:44:45<6:12:24, 11.52s/it] {'loss': 1.0752, 'learning_rate': 4.100747754835518e-06, 'epoch': 0.3} 30%|███ | 834/2774 [2:44:45<6:12:24, 11.52s/it] 30%|███ | 835/2774 [2:44:58<6:30:46, 12.09s/it] {'loss': 1.0083, 'learning_rate': 4.098503970381777e-06, 'epoch': 0.3} 30%|███ | 835/2774 [2:44:58<6:30:46, 12.09s/it] 30%|███ | 836/2774 [2:45:10<6:26:56, 11.98s/it] {'loss': 0.9746, 'learning_rate': 4.0962580056651835e-06, 'epoch': 0.3} 30%|███ | 836/2774 [2:45:10<6:26:56, 11.98s/it] 30%|███ | 837/2774 [2:45:23<6:33:24, 12.19s/it] {'loss': 0.9878, 'learning_rate': 4.0940098637490964e-06, 'epoch': 0.3} 30%|███ | 837/2774 [2:45:23<6:33:24, 12.19s/it] 30%|███ | 838/2774 [2:45:35<6:32:04, 12.15s/it] {'loss': 1.063, 'learning_rate': 4.091759547699848e-06, 'epoch': 0.3} 30%|███ | 838/2774 [2:45:35<6:32:04, 12.15s/it] 30%|███ | 839/2774 [2:45:46<6:25:46, 11.96s/it] {'loss': 1.0576, 'learning_rate': 4.089507060586731e-06, 'epoch': 0.3} 30%|███ | 839/2774 [2:45:46<6:25:46, 11.96s/it] 30%|███ | 840/2774 [2:45:59<6:34:22, 12.24s/it] {'loss': 1.0542, 'learning_rate': 4.087252405482002e-06, 'epoch': 0.3} 30%|███ | 840/2774 [2:45:59<6:34:22, 12.24s/it] 30%|███ | 841/2774 [2:46:11<6:25:59, 11.98s/it] {'loss': 1.0767, 'learning_rate': 4.084995585460877e-06, 'epoch': 0.3} 30%|███ | 841/2774 [2:46:11<6:25:59, 11.98s/it] 30%|███ | 842/2774 [2:46:22<6:21:41, 11.85s/it] {'loss': 1.0278, 'learning_rate': 4.082736603601519e-06, 'epoch': 0.3} 30%|███ | 842/2774 [2:46:22<6:21:41, 11.85s/it] 30%|███ | 843/2774 [2:46:34<6:18:35, 11.76s/it] {'loss': 1.0537, 'learning_rate': 4.0804754629850445e-06, 'epoch': 0.3} 30%|███ | 843/2774 [2:46:34<6:18:35, 11.76s/it] 30%|███ | 844/2774 [2:46:45<6:13:51, 11.62s/it] {'loss': 1.0283, 'learning_rate': 4.078212166695513e-06, 'epoch': 0.3} 30%|███ | 844/2774 [2:46:45<6:13:51, 
11.62s/it] 30%|███ | 845/2774 [2:46:57<6:15:41, 11.69s/it] {'loss': 1.0361, 'learning_rate': 4.075946717819923e-06, 'epoch': 0.3} 30%|███ | 845/2774 [2:46:57<6:15:41, 11.69s/it] 30%|███ | 846/2774 [2:47:08<6:13:44, 11.63s/it] {'loss': 1.0493, 'learning_rate': 4.07367911944821e-06, 'epoch': 0.3} 30%|███ | 846/2774 [2:47:08<6:13:44, 11.63s/it] 31%|███ | 847/2774 [2:47:19<6:08:40, 11.48s/it] {'loss': 1.0215, 'learning_rate': 4.071409374673241e-06, 'epoch': 0.31} 31%|███ | 847/2774 [2:47:19<6:08:40, 11.48s/it] 31%|███ | 848/2774 [2:47:32<6:19:41, 11.83s/it] {'loss': 0.9629, 'learning_rate': 4.069137486590812e-06, 'epoch': 0.31} 31%|███ | 848/2774 [2:47:32<6:19:41, 11.83s/it] 31%|███ | 849/2774 [2:47:43<6:12:58, 11.63s/it] {'loss': 1.0264, 'learning_rate': 4.06686345829964e-06, 'epoch': 0.31} 31%|███ | 849/2774 [2:47:43<6:12:58, 11.63s/it] 31%|███ | 850/2774 [2:47:55<6:14:44, 11.69s/it] {'loss': 1.0, 'learning_rate': 4.0645872929013626e-06, 'epoch': 0.31} 31%|███ | 850/2774 [2:47:55<6:14:44, 11.69s/it] 31%|███ | 851/2774 [2:48:06<6:11:23, 11.59s/it] {'loss': 1.0107, 'learning_rate': 4.062308993500531e-06, 'epoch': 0.31} 31%|███ | 851/2774 [2:48:06<6:11:23, 11.59s/it] 31%|███ | 852/2774 [2:48:18<6:08:01, 11.49s/it] {'loss': 1.0898, 'learning_rate': 4.06002856320461e-06, 'epoch': 0.31} 31%|███ | 852/2774 [2:48:18<6:08:01, 11.49s/it] 31%|███ | 853/2774 [2:48:29<6:07:11, 11.47s/it] {'loss': 1.0215, 'learning_rate': 4.057746005123966e-06, 'epoch': 0.31} 31%|███ | 853/2774 [2:48:29<6:07:11, 11.47s/it] 31%|███ | 854/2774 [2:48:41<6:07:14, 11.48s/it] {'loss': 1.0684, 'learning_rate': 4.055461322371873e-06, 'epoch': 0.31} 31%|███ | 854/2774 [2:48:41<6:07:14, 11.48s/it] 31%|███ | 855/2774 [2:48:52<6:05:49, 11.44s/it] {'loss': 0.9888, 'learning_rate': 4.053174518064499e-06, 'epoch': 0.31} 31%|███ | 855/2774 [2:48:52<6:05:49, 11.44s/it] 31%|███ | 856/2774 [2:49:05<6:22:33, 11.97s/it] {'loss': 1.0635, 'learning_rate': 4.050885595320906e-06, 'epoch': 0.31} 31%|███ | 856/2774 
[2:49:05<6:22:33, 11.97s/it] 31%|███ | 857/2774 [2:49:17<6:18:16, 11.84s/it] {'loss': 1.0342, 'learning_rate': 4.048594557263049e-06, 'epoch': 0.31} 31%|███ | 857/2774 [2:49:17<6:18:16, 11.84s/it] 31%|███ | 858/2774 [2:49:29<6:22:13, 11.97s/it] {'loss': 0.9648, 'learning_rate': 4.046301407015763e-06, 'epoch': 0.31} 31%|███ | 858/2774 [2:49:29<6:22:13, 11.97s/it] 31%|███ | 859/2774 [2:49:40<6:16:49, 11.81s/it] {'loss': 1.063, 'learning_rate': 4.044006147706768e-06, 'epoch': 0.31} 31%|███ | 859/2774 [2:49:40<6:16:49, 11.81s/it] 31%|███ | 860/2774 [2:49:52<6:15:39, 11.78s/it] {'loss': 1.085, 'learning_rate': 4.041708782466657e-06, 'epoch': 0.31} 31%|███ | 860/2774 [2:49:52<6:15:39, 11.78s/it] 31%|███ | 861/2774 [2:50:04<6:14:00, 11.73s/it] {'loss': 1.0498, 'learning_rate': 4.039409314428899e-06, 'epoch': 0.31} 31%|███ | 861/2774 [2:50:04<6:14:00, 11.73s/it] 31%|███ | 862/2774 [2:50:15<6:10:54, 11.64s/it] {'loss': 1.0103, 'learning_rate': 4.0371077467298305e-06, 'epoch': 0.31} 31%|███ | 862/2774 [2:50:15<6:10:54, 11.64s/it] 31%|███ | 863/2774 [2:50:27<6:09:42, 11.61s/it] {'loss': 1.0767, 'learning_rate': 4.0348040825086485e-06, 'epoch': 0.31} 31%|███ | 863/2774 [2:50:27<6:09:42, 11.61s/it] 31%|███ | 864/2774 [2:50:38<6:08:23, 11.57s/it] {'loss': 0.9951, 'learning_rate': 4.032498324907413e-06, 'epoch': 0.31} 31%|███ | 864/2774 [2:50:38<6:08:23, 11.57s/it] 31%|███ | 865/2774 [2:50:50<6:06:07, 11.51s/it] {'loss': 1.0034, 'learning_rate': 4.030190477071039e-06, 'epoch': 0.31} 31%|███ | 865/2774 [2:50:50<6:06:07, 11.51s/it] 31%|███ | 866/2774 [2:51:01<6:06:12, 11.52s/it] {'loss': 1.0073, 'learning_rate': 4.02788054214729e-06, 'epoch': 0.31} 31%|███ | 866/2774 [2:51:01<6:06:12, 11.52s/it] 31%|███▏ | 867/2774 [2:51:12<6:02:25, 11.40s/it] {'loss': 0.9995, 'learning_rate': 4.025568523286778e-06, 'epoch': 0.31} 31%|███▏ | 867/2774 [2:51:12<6:02:25, 11.40s/it] 31%|███▏ | 868/2774 [2:51:24<6:08:10, 11.59s/it] {'loss': 1.0288, 'learning_rate': 4.023254423642957e-06, 'epoch': 0.31} 
31%|███▏ | 868/2774 [2:51:24<6:08:10, 11.59s/it] 31%|███▏ | 869/2774 [2:51:36<6:06:46, 11.55s/it] {'loss': 1.02, 'learning_rate': 4.020938246372119e-06, 'epoch': 0.31} 31%|███▏ | 869/2774 [2:51:36<6:06:46, 11.55s/it] 31%|███▏ | 870/2774 [2:51:48<6:13:43, 11.78s/it] {'loss': 1.0273, 'learning_rate': 4.018619994633391e-06, 'epoch': 0.31} 31%|███▏ | 870/2774 [2:51:48<6:13:43, 11.78s/it] 31%|███▏ | 871/2774 [2:51:59<6:07:54, 11.60s/it] {'loss': 0.9927, 'learning_rate': 4.016299671588728e-06, 'epoch': 0.31} 31%|███▏ | 871/2774 [2:51:59<6:07:54, 11.60s/it] 31%|███▏ | 872/2774 [2:52:10<6:04:32, 11.50s/it] {'loss': 1.0015, 'learning_rate': 4.01397728040291e-06, 'epoch': 0.31} 31%|███▏ | 872/2774 [2:52:10<6:04:32, 11.50s/it] 31%|███▏ | 873/2774 [2:52:22<6:00:31, 11.38s/it] {'loss': 1.0078, 'learning_rate': 4.011652824243538e-06, 'epoch': 0.31} 31%|███▏ | 873/2774 [2:52:22<6:00:31, 11.38s/it] 32%|███▏ | 874/2774 [2:52:33<6:00:07, 11.37s/it] {'loss': 1.0513, 'learning_rate': 4.0093263062810305e-06, 'epoch': 0.32} 32%|███▏ | 874/2774 [2:52:33<6:00:07, 11.37s/it] 32%|███▏ | 875/2774 [2:52:44<5:59:24, 11.36s/it] {'loss': 1.0601, 'learning_rate': 4.006997729688616e-06, 'epoch': 0.32} 32%|███▏ | 875/2774 [2:52:44<5:59:24, 11.36s/it] 32%|███▏ | 876/2774 [2:52:55<5:58:04, 11.32s/it] {'loss': 1.0576, 'learning_rate': 4.004667097642334e-06, 'epoch': 0.32} 32%|███▏ | 876/2774 [2:52:55<5:58:04, 11.32s/it] 32%|███▏ | 877/2774 [2:53:07<5:58:39, 11.34s/it] {'loss': 1.0464, 'learning_rate': 4.002334413321026e-06, 'epoch': 0.32} 32%|███▏ | 877/2774 [2:53:07<5:58:39, 11.34s/it] 32%|███▏ | 878/2774 [2:53:18<6:00:52, 11.42s/it] {'loss': 1.0293, 'learning_rate': 3.999999679906331e-06, 'epoch': 0.32} 32%|███▏ | 878/2774 [2:53:18<6:00:52, 11.42s/it] 32%|███▏ | 879/2774 [2:53:30<5:59:43, 11.39s/it] {'loss': 1.0732, 'learning_rate': 3.997662900582685e-06, 'epoch': 0.32} 32%|███▏ | 879/2774 [2:53:30<5:59:43, 11.39s/it] 32%|███▏ | 880/2774 [2:53:41<6:00:37, 11.42s/it] {'loss': 1.0327, 'learning_rate': 
3.9953240785373145e-06, 'epoch': 0.32} 32%|███▏ | 880/2774 [2:53:41<6:00:37, 11.42s/it] 32%|███▏ | 881/2774 [2:53:53<5:59:42, 11.40s/it] {'loss': 1.002, 'learning_rate': 3.992983216960231e-06, 'epoch': 0.32} 32%|███▏ | 881/2774 [2:53:53<5:59:42, 11.40s/it] 32%|███▏ | 882/2774 [2:54:06<6:15:28, 11.91s/it] {'loss': 0.9893, 'learning_rate': 3.990640319044228e-06, 'epoch': 0.32} 32%|███▏ | 882/2774 [2:54:06<6:15:28, 11.91s/it] 32%|███▏ | 883/2774 [2:54:17<6:11:05, 11.77s/it] {'loss': 0.9893, 'learning_rate': 3.9882953879848764e-06, 'epoch': 0.32} 32%|███▏ | 883/2774 [2:54:17<6:11:05, 11.77s/it] 32%|███▏ | 884/2774 [2:54:29<6:09:54, 11.74s/it] {'loss': 1.061, 'learning_rate': 3.9859484269805215e-06, 'epoch': 0.32} 32%|███▏ | 884/2774 [2:54:29<6:09:54, 11.74s/it] 32%|███▏ | 885/2774 [2:54:40<6:04:38, 11.58s/it] {'loss': 1.0249, 'learning_rate': 3.9835994392322755e-06, 'epoch': 0.32} 32%|███▏ | 885/2774 [2:54:40<6:04:38, 11.58s/it] 32%|███▏ | 886/2774 [2:54:53<6:18:40, 12.03s/it] {'loss': 0.9839, 'learning_rate': 3.981248427944017e-06, 'epoch': 0.32} 32%|███▏ | 886/2774 [2:54:53<6:18:40, 12.03s/it] 32%|███▏ | 887/2774 [2:55:05<6:21:20, 12.13s/it] {'loss': 1.0376, 'learning_rate': 3.9788953963223815e-06, 'epoch': 0.32} 32%|███▏ | 887/2774 [2:55:05<6:21:20, 12.13s/it] 32%|███▏ | 888/2774 [2:55:17<6:15:24, 11.94s/it] {'loss': 1.0083, 'learning_rate': 3.976540347576763e-06, 'epoch': 0.32} 32%|███▏ | 888/2774 [2:55:17<6:15:24, 11.94s/it] 32%|███▏ | 889/2774 [2:55:28<6:09:04, 11.75s/it] {'loss': 1.0679, 'learning_rate': 3.974183284919306e-06, 'epoch': 0.32} 32%|███▏ | 889/2774 [2:55:28<6:09:04, 11.75s/it] 32%|███▏ | 890/2774 [2:55:40<6:07:06, 11.69s/it] {'loss': 1.0908, 'learning_rate': 3.971824211564902e-06, 'epoch': 0.32} 32%|███▏ | 890/2774 [2:55:40<6:07:06, 11.69s/it] 32%|███▏ | 891/2774 [2:55:51<6:05:49, 11.66s/it] {'loss': 1.0347, 'learning_rate': 3.969463130731183e-06, 'epoch': 0.32} 32%|███▏ | 891/2774 [2:55:51<6:05:49, 11.66s/it] 32%|███▏ | 892/2774 [2:56:03<6:04:13, 
11.61s/it] {'loss': 1.0439, 'learning_rate': 3.967100045638522e-06, 'epoch': 0.32} 32%|███▏ | 892/2774 [2:56:03<6:04:13, 11.61s/it] 32%|███▏ | 893/2774 [2:56:16<6:15:10, 11.97s/it] {'loss': 0.9761, 'learning_rate': 3.964734959510024e-06, 'epoch': 0.32} 32%|███▏ | 893/2774 [2:56:16<6:15:10, 11.97s/it] 32%|███▏ | 894/2774 [2:56:28<6:13:36, 11.92s/it] {'loss': 1.0459, 'learning_rate': 3.962367875571522e-06, 'epoch': 0.32} 32%|███▏ | 894/2774 [2:56:28<6:13:36, 11.92s/it] 32%|███▏ | 895/2774 [2:56:39<6:11:25, 11.86s/it] {'loss': 1.0366, 'learning_rate': 3.959998797051578e-06, 'epoch': 0.32} 32%|███▏ | 895/2774 [2:56:39<6:11:25, 11.86s/it] 32%|███▏ | 896/2774 [2:56:51<6:06:44, 11.72s/it] {'loss': 1.0112, 'learning_rate': 3.957627727181472e-06, 'epoch': 0.32} 32%|███▏ | 896/2774 [2:56:51<6:06:44, 11.72s/it] 32%|███▏ | 897/2774 [2:57:02<6:03:22, 11.62s/it] {'loss': 0.9858, 'learning_rate': 3.955254669195198e-06, 'epoch': 0.32} 32%|███▏ | 897/2774 [2:57:02<6:03:22, 11.62s/it] 32%|███▏ | 898/2774 [2:57:14<6:01:50, 11.57s/it] {'loss': 1.02, 'learning_rate': 3.952879626329464e-06, 'epoch': 0.32} 32%|███▏ | 898/2774 [2:57:14<6:01:50, 11.57s/it] 32%|███▏ | 899/2774 [2:57:26<6:06:42, 11.73s/it] {'loss': 1.0513, 'learning_rate': 3.950502601823686e-06, 'epoch': 0.32} 32%|███▏ | 899/2774 [2:57:26<6:06:42, 11.73s/it] 32%|███▏ | 900/2774 [2:57:38<6:09:45, 11.84s/it] {'loss': 1.0151, 'learning_rate': 3.948123598919982e-06, 'epoch': 0.32} 32%|███▏ | 900/2774 [2:57:38<6:09:45, 11.84s/it] 32%|███▏ | 901/2774 [2:57:49<6:06:59, 11.76s/it] {'loss': 1.0107, 'learning_rate': 3.9457426208631674e-06, 'epoch': 0.32} 32%|███▏ | 901/2774 [2:57:49<6:06:59, 11.76s/it] 33%|███▎ | 902/2774 [2:58:01<6:07:56, 11.79s/it] {'loss': 1.0273, 'learning_rate': 3.943359670900753e-06, 'epoch': 0.33} 33%|███▎ | 902/2774 [2:58:01<6:07:56, 11.79s/it] 33%|███▎ | 903/2774 [2:58:13<6:04:13, 11.68s/it] {'loss': 1.0801, 'learning_rate': 3.940974752282939e-06, 'epoch': 0.33} 33%|███▎ | 903/2774 [2:58:13<6:04:13, 
11.68s/it] 33%|███▎ | 904/2774 [2:58:24<6:05:46, 11.74s/it] {'loss': 0.9902, 'learning_rate': 3.9385878682626085e-06, 'epoch': 0.33} 33%|███▎ | 904/2774 [2:58:24<6:05:46, 11.74s/it] 33%|███▎ | 905/2774 [2:58:36<6:02:26, 11.64s/it] {'loss': 0.9917, 'learning_rate': 3.93619902209533e-06, 'epoch': 0.33} 33%|███▎ | 905/2774 [2:58:36<6:02:26, 11.64s/it] 33%|███▎ | 906/2774 [2:58:51<6:30:42, 12.55s/it] {'loss': 0.9932, 'learning_rate': 3.933808217039343e-06, 'epoch': 0.33} 33%|███▎ | 906/2774 [2:58:51<6:30:42, 12.55s/it] 33%|███▎ | 907/2774 [2:59:03<6:31:12, 12.57s/it] {'loss': 1.0381, 'learning_rate': 3.931415456355562e-06, 'epoch': 0.33} 33%|███▎ | 907/2774 [2:59:03<6:31:12, 12.57s/it] 33%|███▎ | 908/2774 [2:59:15<6:23:49, 12.34s/it] {'loss': 1.084, 'learning_rate': 3.929020743307567e-06, 'epoch': 0.33} 33%|███▎ | 908/2774 [2:59:15<6:23:49, 12.34s/it] 33%|███▎ | 909/2774 [2:59:27<6:24:52, 12.38s/it] {'loss': 1.0977, 'learning_rate': 3.926624081161604e-06, 'epoch': 0.33} 33%|███▎ | 909/2774 [2:59:27<6:24:52, 12.38s/it] 33%|███▎ | 910/2774 [2:59:39<6:16:52, 12.13s/it] {'loss': 0.9834, 'learning_rate': 3.9242254731865734e-06, 'epoch': 0.33} 33%|███▎ | 910/2774 [2:59:39<6:16:52, 12.13s/it] 33%|███▎ | 911/2774 [2:59:50<6:09:36, 11.90s/it] {'loss': 1.04, 'learning_rate': 3.921824922654033e-06, 'epoch': 0.33} 33%|███▎ | 911/2774 [2:59:50<6:09:36, 11.90s/it] 33%|███▎ | 912/2774 [3:00:02<6:03:26, 11.71s/it] {'loss': 1.0034, 'learning_rate': 3.919422432838188e-06, 'epoch': 0.33} 33%|███▎ | 912/2774 [3:00:02<6:03:26, 11.71s/it] 33%|███▎ | 913/2774 [3:00:14<6:05:47, 11.79s/it] {'loss': 1.0654, 'learning_rate': 3.91701800701589e-06, 'epoch': 0.33} 33%|███▎ | 913/2774 [3:00:14<6:05:47, 11.79s/it] 33%|███▎ | 914/2774 [3:00:25<6:00:56, 11.64s/it] {'loss': 1.0283, 'learning_rate': 3.914611648466629e-06, 'epoch': 0.33} 33%|███▎ | 914/2774 [3:00:25<6:00:56, 11.64s/it] 33%|███▎ | 915/2774 [3:00:36<5:59:26, 11.60s/it] {'loss': 1.0371, 'learning_rate': 3.912203360472535e-06, 'epoch': 0.33} 
33%|███▎ | 915/2774 [3:00:36<5:59:26, 11.60s/it] 33%|███▎ | 916/2774 [3:00:49<6:04:14, 11.76s/it] {'loss': 1.0063, 'learning_rate': 3.909793146318366e-06, 'epoch': 0.33} 33%|███▎ | 916/2774 [3:00:49<6:04:14, 11.76s/it] 33%|███▎ | 917/2774 [3:01:00<6:02:14, 11.70s/it] {'loss': 1.0693, 'learning_rate': 3.907381009291509e-06, 'epoch': 0.33} 33%|███▎ | 917/2774 [3:01:00<6:02:14, 11.70s/it] 33%|███▎ | 918/2774 [3:01:11<5:59:21, 11.62s/it] {'loss': 1.0557, 'learning_rate': 3.904966952681972e-06, 'epoch': 0.33} 33%|███▎ | 918/2774 [3:01:11<5:59:21, 11.62s/it] 33%|███▎ | 919/2774 [3:01:24<6:04:27, 11.79s/it] {'loss': 1.0449, 'learning_rate': 3.902550979782384e-06, 'epoch': 0.33} 33%|███▎ | 919/2774 [3:01:24<6:04:27, 11.79s/it] 33%|███▎ | 920/2774 [3:01:36<6:04:59, 11.81s/it] {'loss': 1.0718, 'learning_rate': 3.900133093887984e-06, 'epoch': 0.33} 33%|███▎ | 920/2774 [3:01:36<6:04:59, 11.81s/it] 33%|███▎ | 921/2774 [3:01:47<5:59:45, 11.65s/it] {'loss': 1.0659, 'learning_rate': 3.897713298296625e-06, 'epoch': 0.33} 33%|███▎ | 921/2774 [3:01:47<5:59:45, 11.65s/it] 33%|███▎ | 922/2774 [3:01:58<5:56:58, 11.57s/it] {'loss': 1.0146, 'learning_rate': 3.89529159630876e-06, 'epoch': 0.33} 33%|███▎ | 922/2774 [3:01:58<5:56:58, 11.57s/it] 33%|███▎ | 923/2774 [3:02:10<5:56:46, 11.56s/it] {'loss': 1.0591, 'learning_rate': 3.892867991227445e-06, 'epoch': 0.33} 33%|███▎ | 923/2774 [3:02:10<5:56:46, 11.56s/it] 33%|███▎ | 924/2774 [3:02:21<5:55:28, 11.53s/it] {'loss': 1.0298, 'learning_rate': 3.890442486358332e-06, 'epoch': 0.33} 33%|███▎ | 924/2774 [3:02:21<5:55:28, 11.53s/it] 33%|███▎ | 925/2774 [3:02:33<5:57:33, 11.60s/it] {'loss': 1.125, 'learning_rate': 3.88801508500966e-06, 'epoch': 0.33} 33%|███▎ | 925/2774 [3:02:33<5:57:33, 11.60s/it] 33%|███▎ | 926/2774 [3:02:44<5:56:15, 11.57s/it] {'loss': 1.0981, 'learning_rate': 3.88558579049226e-06, 'epoch': 0.33} 33%|███▎ | 926/2774 [3:02:44<5:56:15, 11.57s/it] 33%|███▎ | 927/2774 [3:02:56<5:51:52, 11.43s/it] {'loss': 1.0635, 'learning_rate': 
3.883154606119544e-06, 'epoch': 0.33} 33%|███▎ | 927/2774 [3:02:56<5:51:52, 11.43s/it] 33%|███▎ | 928/2774 [3:03:08<5:59:51, 11.70s/it] {'loss': 1.04, 'learning_rate': 3.8807215352074975e-06, 'epoch': 0.33} 33%|███▎ | 928/2774 [3:03:08<5:59:51, 11.70s/it] 33%|███▎ | 929/2774 [3:03:19<5:54:58, 11.54s/it] {'loss': 1.1113, 'learning_rate': 3.878286581074685e-06, 'epoch': 0.33} 33%|███▎ | 929/2774 [3:03:19<5:54:58, 11.54s/it] 34%|███▎ | 930/2774 [3:03:31<5:54:33, 11.54s/it] {'loss': 1.0601, 'learning_rate': 3.875849747042236e-06, 'epoch': 0.34} 34%|███▎ | 930/2774 [3:03:31<5:54:33, 11.54s/it] 34%|███▎ | 931/2774 [3:03:42<5:54:06, 11.53s/it] {'loss': 1.0752, 'learning_rate': 3.873411036433845e-06, 'epoch': 0.34} 34%|███▎ | 931/2774 [3:03:42<5:54:06, 11.53s/it] 34%|███▎ | 932/2774 [3:03:54<5:56:48, 11.62s/it] {'loss': 1.0366, 'learning_rate': 3.870970452575765e-06, 'epoch': 0.34} 34%|███▎ | 932/2774 [3:03:54<5:56:48, 11.62s/it] 34%|███▎ | 933/2774 [3:04:05<5:54:47, 11.56s/it] {'loss': 1.0059, 'learning_rate': 3.868527998796807e-06, 'epoch': 0.34} 34%|███▎ | 933/2774 [3:04:05<5:54:47, 11.56s/it] 34%|███▎ | 934/2774 [3:04:17<5:57:50, 11.67s/it] {'loss': 1.0298, 'learning_rate': 3.866083678428328e-06, 'epoch': 0.34} 34%|███▎ | 934/2774 [3:04:17<5:57:50, 11.67s/it] 34%|███▎ | 935/2774 [3:04:29<5:53:55, 11.55s/it] {'loss': 1.0698, 'learning_rate': 3.863637494804235e-06, 'epoch': 0.34} 34%|███▎ | 935/2774 [3:04:29<5:53:55, 11.55s/it] 34%|███▎ | 936/2774 [3:04:40<5:56:39, 11.64s/it] {'loss': 1.0244, 'learning_rate': 3.861189451260974e-06, 'epoch': 0.34} 34%|███▎ | 936/2774 [3:04:40<5:56:39, 11.64s/it] 34%|███▍ | 937/2774 [3:04:52<5:56:47, 11.65s/it] {'loss': 1.0205, 'learning_rate': 3.858739551137528e-06, 'epoch': 0.34} 34%|███▍ | 937/2774 [3:04:52<5:56:47, 11.65s/it] 34%|███▍ | 938/2774 [3:05:05<6:08:59, 12.06s/it] {'loss': 0.9458, 'learning_rate': 3.856287797775414e-06, 'epoch': 0.34} 34%|███▍ | 938/2774 [3:05:05<6:08:59, 12.06s/it] 34%|███▍ | 939/2774 [3:05:16<6:00:52, 
11.80s/it] {'loss': 1.0527, 'learning_rate': 3.853834194518675e-06, 'epoch': 0.34} 34%|███▍ | 939/2774 [3:05:16<6:00:52, 11.80s/it] 34%|███▍ | 940/2774 [3:05:27<5:55:03, 11.62s/it] {'loss': 1.0571, 'learning_rate': 3.851378744713879e-06, 'epoch': 0.34} 34%|███▍ | 940/2774 [3:05:27<5:55:03, 11.62s/it] 34%|███▍ | 941/2774 [3:05:40<5:59:38, 11.77s/it] {'loss': 1.0156, 'learning_rate': 3.848921451710107e-06, 'epoch': 0.34} 34%|███▍ | 941/2774 [3:05:40<5:59:38, 11.77s/it] 34%|███▍ | 942/2774 [3:05:51<5:58:10, 11.73s/it] {'loss': 1.0229, 'learning_rate': 3.846462318858962e-06, 'epoch': 0.34} 34%|███▍ | 942/2774 [3:05:51<5:58:10, 11.73s/it] 34%|███▍ | 943/2774 [3:06:03<5:55:28, 11.65s/it] {'loss': 1.0952, 'learning_rate': 3.844001349514553e-06, 'epoch': 0.34} 34%|███▍ | 943/2774 [3:06:03<5:55:28, 11.65s/it] 34%|███▍ | 944/2774 [3:06:14<5:52:24, 11.55s/it] {'loss': 1.0186, 'learning_rate': 3.8415385470334906e-06, 'epoch': 0.34} 34%|███▍ | 944/2774 [3:06:14<5:52:24, 11.55s/it] 34%|███▍ | 945/2774 [3:06:26<5:54:02, 11.61s/it] {'loss': 1.0308, 'learning_rate': 3.83907391477489e-06, 'epoch': 0.34} 34%|███▍ | 945/2774 [3:06:26<5:54:02, 11.61s/it] 34%|███▍ | 946/2774 [3:06:37<5:51:07, 11.52s/it] {'loss': 1.0142, 'learning_rate': 3.836607456100362e-06, 'epoch': 0.34} 34%|███▍ | 946/2774 [3:06:37<5:51:07, 11.52s/it] 34%|███▍ | 947/2774 [3:06:48<5:49:21, 11.47s/it] {'loss': 1.0366, 'learning_rate': 3.834139174374005e-06, 'epoch': 0.34} 34%|███▍ | 947/2774 [3:06:48<5:49:21, 11.47s/it] 34%|███▍ | 948/2774 [3:07:00<5:49:38, 11.49s/it] {'loss': 1.0117, 'learning_rate': 3.831669072962408e-06, 'epoch': 0.34} 34%|███▍ | 948/2774 [3:07:00<5:49:38, 11.49s/it] 34%|███▍ | 949/2774 [3:07:12<5:50:56, 11.54s/it] {'loss': 0.9795, 'learning_rate': 3.82919715523464e-06, 'epoch': 0.34} 34%|███▍ | 949/2774 [3:07:12<5:50:56, 11.54s/it] 34%|███▍ | 950/2774 [3:07:24<5:59:37, 11.83s/it] {'loss': 1.0293, 'learning_rate': 3.826723424562246e-06, 'epoch': 0.34} 34%|███▍ | 950/2774 [3:07:24<5:59:37, 
11.83s/it] 34%|███▍ | 951/2774 [3:07:36<5:55:25, 11.70s/it] {'loss': 1.0405, 'learning_rate': 3.824247884319245e-06, 'epoch': 0.34} 34%|███▍ | 951/2774 [3:07:36<5:55:25, 11.70s/it] 34%|███▍ | 952/2774 [3:07:47<5:52:07, 11.60s/it] {'loss': 0.9946, 'learning_rate': 3.821770537882126e-06, 'epoch': 0.34} 34%|███▍ | 952/2774 [3:07:47<5:52:07, 11.60s/it] 34%|███▍ | 953/2774 [3:07:58<5:50:13, 11.54s/it] {'loss': 0.9727, 'learning_rate': 3.81929138862984e-06, 'epoch': 0.34} 34%|███▍ | 953/2774 [3:07:58<5:50:13, 11.54s/it] 34%|███▍ | 954/2774 [3:08:10<5:51:34, 11.59s/it] {'loss': 1.019, 'learning_rate': 3.816810439943795e-06, 'epoch': 0.34} 34%|███▍ | 954/2774 [3:08:10<5:51:34, 11.59s/it] 34%|███▍ | 955/2774 [3:08:21<5:49:40, 11.53s/it] {'loss': 1.0215, 'learning_rate': 3.814327695207858e-06, 'epoch': 0.34} 34%|███▍ | 955/2774 [3:08:21<5:49:40, 11.53s/it] 34%|███▍ | 956/2774 [3:08:33<5:50:18, 11.56s/it] {'loss': 1.0723, 'learning_rate': 3.81184315780834e-06, 'epoch': 0.34} 34%|███▍ | 956/2774 [3:08:33<5:50:18, 11.56s/it] 34%|███▍ | 957/2774 [3:08:44<5:48:11, 11.50s/it] {'loss': 1.0044, 'learning_rate': 3.8093568311340007e-06, 'epoch': 0.34} 34%|███▍ | 957/2774 [3:08:44<5:48:11, 11.50s/it] 35%|███▍ | 958/2774 [3:08:56<5:46:10, 11.44s/it] {'loss': 1.0259, 'learning_rate': 3.8068687185760406e-06, 'epoch': 0.35} 35%|███▍ | 958/2774 [3:08:56<5:46:10, 11.44s/it] 35%|███▍ | 959/2774 [3:09:07<5:43:24, 11.35s/it] {'loss': 1.0493, 'learning_rate': 3.804378823528093e-06, 'epoch': 0.35} 35%|███▍ | 959/2774 [3:09:07<5:43:24, 11.35s/it] 35%|███▍ | 960/2774 [3:09:18<5:42:05, 11.32s/it] {'loss': 1.0259, 'learning_rate': 3.8018871493862265e-06, 'epoch': 0.35} 35%|███▍ | 960/2774 [3:09:18<5:42:05, 11.32s/it] 35%|███▍ | 961/2774 [3:09:32<6:01:07, 11.95s/it] {'loss': 1.0288, 'learning_rate': 3.799393699548932e-06, 'epoch': 0.35} 35%|███▍ | 961/2774 [3:09:32<6:01:07, 11.95s/it] 35%|███▍ | 962/2774 [3:09:43<5:55:38, 11.78s/it] {'loss': 1.0537, 'learning_rate': 3.7968984774171268e-06, 'epoch': 
0.35} 35%|███▍ | 962/2774 [3:09:43<5:55:38, 11.78s/it] 35%|███▍ | 963/2774 [3:09:54<5:52:03, 11.66s/it] {'loss': 0.9907, 'learning_rate': 3.7944014863941415e-06, 'epoch': 0.35} 35%|███▍ | 963/2774 [3:09:54<5:52:03, 11.66s/it] 35%|███▍ | 964/2774 [3:10:06<5:49:29, 11.59s/it] {'loss': 1.0645, 'learning_rate': 3.7919027298857213e-06, 'epoch': 0.35} 35%|███▍ | 964/2774 [3:10:06<5:49:29, 11.59s/it] 35%|███▍ | 965/2774 [3:10:17<5:49:39, 11.60s/it] {'loss': 1.0532, 'learning_rate': 3.7894022113000184e-06, 'epoch': 0.35} 35%|███▍ | 965/2774 [3:10:17<5:49:39, 11.60s/it] 35%|███▍ | 966/2774 [3:10:29<5:49:58, 11.61s/it] {'loss': 1.002, 'learning_rate': 3.786899934047591e-06, 'epoch': 0.35} 35%|███▍ | 966/2774 [3:10:29<5:49:58, 11.61s/it] 35%|███▍ | 967/2774 [3:10:41<5:49:10, 11.59s/it] {'loss': 1.0059, 'learning_rate': 3.784395901541393e-06, 'epoch': 0.35} 35%|███▍ | 967/2774 [3:10:41<5:49:10, 11.59s/it] 35%|███▍ | 968/2774 [3:10:52<5:46:54, 11.52s/it] {'loss': 1.0615, 'learning_rate': 3.7818901171967727e-06, 'epoch': 0.35} 35%|███▍ | 968/2774 [3:10:52<5:46:54, 11.52s/it] 35%|███▍ | 969/2774 [3:11:03<5:42:59, 11.40s/it] {'loss': 0.9937, 'learning_rate': 3.7793825844314702e-06, 'epoch': 0.35} 35%|███▍ | 969/2774 [3:11:03<5:42:59, 11.40s/it] 35%|███▍ | 970/2774 [3:11:15<5:45:06, 11.48s/it] {'loss': 1.0366, 'learning_rate': 3.7768733066656075e-06, 'epoch': 0.35} 35%|███▍ | 970/2774 [3:11:15<5:45:06, 11.48s/it] 35%|███▌ | 971/2774 [3:11:26<5:44:33, 11.47s/it] {'loss': 1.0361, 'learning_rate': 3.7743622873216886e-06, 'epoch': 0.35} 35%|███▌ | 971/2774 [3:11:26<5:44:33, 11.47s/it] 35%|███▌ | 972/2774 [3:11:38<5:44:20, 11.47s/it] {'loss': 1.0063, 'learning_rate': 3.7718495298245917e-06, 'epoch': 0.35} 35%|███▌ | 972/2774 [3:11:38<5:44:20, 11.47s/it] 35%|███▌ | 973/2774 [3:11:49<5:46:01, 11.53s/it] {'loss': 1.0449, 'learning_rate': 3.769335037601566e-06, 'epoch': 0.35} 35%|███▌ | 973/2774 [3:11:49<5:46:01, 11.53s/it] 35%|███▌ | 974/2774 [3:12:01<5:49:55, 11.66s/it] {'loss': 1.0537, 
'learning_rate': 3.766818814082228e-06, 'epoch': 0.35} 35%|███▌ | 974/2774 [3:12:01<5:49:55, 11.66s/it] 35%|███▌ | 975/2774 [3:12:15<6:07:13, 12.25s/it] {'loss': 1.021, 'learning_rate': 3.7643008626985532e-06, 'epoch': 0.35} 35%|███▌ | 975/2774 [3:12:15<6:07:13, 12.25s/it] 35%|███▌ | 976/2774 [3:12:27<6:02:29, 12.10s/it] {'loss': 1.0664, 'learning_rate': 3.7617811868848774e-06, 'epoch': 0.35} 35%|███▌ | 976/2774 [3:12:27<6:02:29, 12.10s/it] 35%|███▌ | 977/2774 [3:12:40<6:12:14, 12.43s/it] {'loss': 1.0269, 'learning_rate': 3.7592597900778836e-06, 'epoch': 0.35} 35%|███▌ | 977/2774 [3:12:40<6:12:14, 12.43s/it] 35%|███▌ | 978/2774 [3:12:54<6:27:49, 12.96s/it] {'loss': 0.9629, 'learning_rate': 3.756736675716606e-06, 'epoch': 0.35} 35%|███▌ | 978/2774 [3:12:54<6:27:49, 12.96s/it] 35%|███▌ | 979/2774 [3:13:06<6:17:48, 12.63s/it] {'loss': 1.0464, 'learning_rate': 3.7542118472424207e-06, 'epoch': 0.35} 35%|███▌ | 979/2774 [3:13:06<6:17:48, 12.63s/it] 35%|███▌ | 980/2774 [3:13:17<6:08:14, 12.32s/it] {'loss': 1.0469, 'learning_rate': 3.7516853080990403e-06, 'epoch': 0.35} 35%|███▌ | 980/2774 [3:13:17<6:08:14, 12.32s/it] 35%|███▌ | 981/2774 [3:13:28<5:56:31, 11.93s/it] {'loss': 1.002, 'learning_rate': 3.749157061732511e-06, 'epoch': 0.35} 35%|███▌ | 981/2774 [3:13:28<5:56:31, 11.93s/it] 35%|███▌ | 982/2774 [3:13:40<5:54:02, 11.85s/it] {'loss': 1.0425, 'learning_rate': 3.74662711159121e-06, 'epoch': 0.35} 35%|███▌ | 982/2774 [3:13:40<5:54:02, 11.85s/it] 35%|███▌ | 983/2774 [3:13:52<5:58:39, 12.02s/it] {'loss': 1.0137, 'learning_rate': 3.744095461125835e-06, 'epoch': 0.35} 35%|███▌ | 983/2774 [3:13:52<5:58:39, 12.02s/it] 35%|███▌ | 984/2774 [3:14:04<5:55:05, 11.90s/it] {'loss': 1.0259, 'learning_rate': 3.7415621137894055e-06, 'epoch': 0.35} 35%|███▌ | 984/2774 [3:14:04<5:55:05, 11.90s/it] 36%|███▌ | 985/2774 [3:14:15<5:47:28, 11.65s/it] {'loss': 1.0522, 'learning_rate': 3.739027073037253e-06, 'epoch': 0.36} 36%|███▌ | 985/2774 [3:14:15<5:47:28, 11.65s/it] 36%|███▌ | 986/2774 
[3:14:29<6:03:34, 12.20s/it] {'loss': 1.0464, 'learning_rate': 3.7364903423270204e-06, 'epoch': 0.36} 36%|███▌ | 986/2774 [3:14:29<6:03:34, 12.20s/it] 36%|███▌ | 987/2774 [3:14:41<6:00:00, 12.09s/it] {'loss': 1.0625, 'learning_rate': 3.733951925118655e-06, 'epoch': 0.36} 36%|███▌ | 987/2774 [3:14:41<6:00:00, 12.09s/it] 36%|███▌ | 988/2774 [3:14:52<5:56:26, 11.97s/it] {'loss': 1.0444, 'learning_rate': 3.7314118248744045e-06, 'epoch': 0.36} 36%|███▌ | 988/2774 [3:14:52<5:56:26, 11.97s/it] 36%|███▌ | 989/2774 [3:15:04<5:53:13, 11.87s/it] {'loss': 1.0332, 'learning_rate': 3.7288700450588134e-06, 'epoch': 0.36} 36%|███▌ | 989/2774 [3:15:04<5:53:13, 11.87s/it] 36%|███▌ | 990/2774 [3:15:15<5:49:19, 11.75s/it] {'loss': 1.106, 'learning_rate': 3.726326589138714e-06, 'epoch': 0.36} 36%|███▌ | 990/2774 [3:15:15<5:49:19, 11.75s/it] 36%|███▌ | 991/2774 [3:15:27<5:48:18, 11.72s/it] {'loss': 1.0371, 'learning_rate': 3.723781460583228e-06, 'epoch': 0.36} 36%|███▌ | 991/2774 [3:15:27<5:48:18, 11.72s/it] 36%|███▌ | 992/2774 [3:15:38<5:44:06, 11.59s/it] {'loss': 1.0503, 'learning_rate': 3.7212346628637557e-06, 'epoch': 0.36} 36%|███▌ | 992/2774 [3:15:38<5:44:06, 11.59s/it] 36%|███▌ | 993/2774 [3:15:50<5:42:11, 11.53s/it] {'loss': 1.042, 'learning_rate': 3.7186861994539763e-06, 'epoch': 0.36} 36%|███▌ | 993/2774 [3:15:50<5:42:11, 11.53s/it] 36%|███▌ | 994/2774 [3:16:02<5:48:00, 11.73s/it] {'loss': 1.0264, 'learning_rate': 3.716136073829839e-06, 'epoch': 0.36} 36%|███▌ | 994/2774 [3:16:02<5:48:00, 11.73s/it] 36%|███▌ | 995/2774 [3:16:14<5:55:14, 11.98s/it] {'loss': 1.0088, 'learning_rate': 3.713584289469563e-06, 'epoch': 0.36} 36%|███▌ | 995/2774 [3:16:14<5:55:14, 11.98s/it] 36%|███▌ | 996/2774 [3:16:26<5:51:37, 11.87s/it] {'loss': 1.0127, 'learning_rate': 3.7110308498536264e-06, 'epoch': 0.36} 36%|███▌ | 996/2774 [3:16:26<5:51:37, 11.87s/it] 36%|███▌ | 997/2774 [3:16:37<5:46:27, 11.70s/it] {'loss': 1.0225, 'learning_rate': 3.7084757584647662e-06, 'epoch': 0.36} 36%|███▌ | 997/2774 
[3:16:37<5:46:27, 11.70s/it] 36%|███▌ | 998/2774 [3:16:49<5:46:26, 11.70s/it] {'loss': 1.1157, 'learning_rate': 3.705919018787974e-06, 'epoch': 0.36} 36%|███▌ | 998/2774 [3:16:49<5:46:26, 11.70s/it] 36%|███▌ | 999/2774 [3:17:00<5:44:02, 11.63s/it] {'loss': 1.0571, 'learning_rate': 3.7033606343104877e-06, 'epoch': 0.36} 36%|███▌ | 999/2774 [3:17:00<5:44:02, 11.63s/it] 36%|███▌ | 1000/2774 [3:17:12<5:40:08, 11.50s/it] {'loss': 1.0698, 'learning_rate': 3.700800608521789e-06, 'epoch': 0.36} 36%|███▌ | 1000/2774 [3:17:12<5:40:08, 11.50s/it] 36%|███▌ | 1001/2774 [3:17:23<5:38:05, 11.44s/it] {'loss': 1.0522, 'learning_rate': 3.6982389449135986e-06, 'epoch': 0.36} 36%|███▌ | 1001/2774 [3:17:23<5:38:05, 11.44s/it] 36%|███▌ | 1002/2774 [3:17:35<5:41:53, 11.58s/it] {'loss': 1.0659, 'learning_rate': 3.695675646979871e-06, 'epoch': 0.36} 36%|███▌ | 1002/2774 [3:17:35<5:41:53, 11.58s/it] 36%|███▌ | 1003/2774 [3:17:47<5:42:52, 11.62s/it] {'loss': 1.0371, 'learning_rate': 3.6931107182167904e-06, 'epoch': 0.36} 36%|███▌ | 1003/2774 [3:17:47<5:42:52, 11.62s/it] 36%|███▌ | 1004/2774 [3:17:58<5:44:56, 11.69s/it] {'loss': 1.0786, 'learning_rate': 3.690544162122763e-06, 'epoch': 0.36} 36%|███▌ | 1004/2774 [3:17:58<5:44:56, 11.69s/it] 36%|███▌ | 1005/2774 [3:18:10<5:46:25, 11.75s/it] {'loss': 1.0322, 'learning_rate': 3.6879759821984175e-06, 'epoch': 0.36} 36%|███▌ | 1005/2774 [3:18:10<5:46:25, 11.75s/it] 36%|███▋ | 1006/2774 [3:18:22<5:48:25, 11.82s/it] {'loss': 1.0098, 'learning_rate': 3.685406181946596e-06, 'epoch': 0.36} 36%|███▋ | 1006/2774 [3:18:22<5:48:25, 11.82s/it] 36%|███▋ | 1007/2774 [3:18:34<5:43:41, 11.67s/it] {'loss': 1.04, 'learning_rate': 3.682834764872351e-06, 'epoch': 0.36} 36%|███▋ | 1007/2774 [3:18:34<5:43:41, 11.67s/it] 36%|███▋ | 1008/2774 [3:18:45<5:40:23, 11.57s/it] {'loss': 1.0801, 'learning_rate': 3.6802617344829393e-06, 'epoch': 0.36} 36%|███▋ | 1008/2774 [3:18:45<5:40:23, 11.57s/it] 36%|███▋ | 1009/2774 [3:18:56<5:37:06, 11.46s/it] {'loss': 0.9507, 
'learning_rate': 3.6776870942878196e-06, 'epoch': 0.36} 36%|███▋ | 1009/2774 [3:18:56<5:37:06, 11.46s/it] 36%|███▋ | 1010/2774 [3:19:09<5:50:14, 11.91s/it] {'loss': 0.9746, 'learning_rate': 3.675110847798645e-06, 'epoch': 0.36} 36%|███▋ | 1010/2774 [3:19:09<5:50:14, 11.91s/it] 36%|███▋ | 1011/2774 [3:19:21<5:49:30, 11.89s/it] {'loss': 1.0527, 'learning_rate': 3.6725329985292614e-06, 'epoch': 0.36} 36%|███▋ | 1011/2774 [3:19:21<5:49:30, 11.89s/it] 36%|███▋ | 1012/2774 [3:19:33<5:52:26, 12.00s/it] {'loss': 0.9746, 'learning_rate': 3.669953549995698e-06, 'epoch': 0.36} 36%|███▋ | 1012/2774 [3:19:33<5:52:26, 12.00s/it] 37%|███▋ | 1013/2774 [3:19:45<5:47:25, 11.84s/it] {'loss': 1.0317, 'learning_rate': 3.6673725057161676e-06, 'epoch': 0.37} 37%|███▋ | 1013/2774 [3:19:45<5:47:25, 11.84s/it] 37%|███▋ | 1014/2774 [3:19:57<5:47:17, 11.84s/it] {'loss': 1.0029, 'learning_rate': 3.6647898692110578e-06, 'epoch': 0.37} 37%|███▋ | 1014/2774 [3:19:57<5:47:17, 11.84s/it] 37%|███▋ | 1015/2774 [3:20:08<5:43:11, 11.71s/it] {'loss': 1.0073, 'learning_rate': 3.6622056440029303e-06, 'epoch': 0.37} 37%|███▋ | 1015/2774 [3:20:08<5:43:11, 11.71s/it] 37%|███▋ | 1016/2774 [3:20:19<5:41:22, 11.65s/it] {'loss': 1.0317, 'learning_rate': 3.6596198336165107e-06, 'epoch': 0.37} 37%|███▋ | 1016/2774 [3:20:19<5:41:22, 11.65s/it] 37%|███▋ | 1017/2774 [3:20:31<5:38:46, 11.57s/it] {'loss': 1.0498, 'learning_rate': 3.657032441578689e-06, 'epoch': 0.37} 37%|███▋ | 1017/2774 [3:20:31<5:38:46, 11.57s/it] 37%|███▋ | 1018/2774 [3:20:44<5:48:59, 11.92s/it] {'loss': 0.9985, 'learning_rate': 3.6544434714185117e-06, 'epoch': 0.37} 37%|███▋ | 1018/2774 [3:20:44<5:48:59, 11.92s/it] 37%|███▋ | 1019/2774 [3:20:55<5:45:39, 11.82s/it] {'loss': 1.0889, 'learning_rate': 3.6518529266671764e-06, 'epoch': 0.37} 37%|███▋ | 1019/2774 [3:20:55<5:45:39, 11.82s/it] 37%|███▋ | 1020/2774 [3:21:09<6:05:26, 12.50s/it] {'loss': 1.0239, 'learning_rate': 3.649260810858031e-06, 'epoch': 0.37} 37%|███▋ | 1020/2774 [3:21:09<6:05:26, 
12.50s/it] 37%|███▋ | 1021/2774 [3:21:22<6:05:27, 12.51s/it] {'loss': 1.0254, 'learning_rate': 3.6466671275265653e-06, 'epoch': 0.37} 37%|███▋ | 1021/2774 [3:21:22<6:05:27, 12.51s/it] 37%|███▋ | 1022/2774 [3:21:35<6:14:38, 12.83s/it] {'loss': 1.0371, 'learning_rate': 3.644071880210405e-06, 'epoch': 0.37} 37%|███▋ | 1022/2774 [3:21:35<6:14:38, 12.83s/it] 37%|███▋ | 1023/2774 [3:21:47<6:01:38, 12.39s/it] {'loss': 1.0396, 'learning_rate': 3.641475072449312e-06, 'epoch': 0.37} 37%|███▋ | 1023/2774 [3:21:47<6:01:38, 12.39s/it] 37%|███▋ | 1024/2774 [3:21:58<5:52:25, 12.08s/it] {'loss': 1.0586, 'learning_rate': 3.6388767077851745e-06, 'epoch': 0.37} 37%|███▋ | 1024/2774 [3:21:58<5:52:25, 12.08s/it] 37%|███▋ | 1025/2774 [3:22:10<5:49:04, 11.97s/it] {'loss': 1.0269, 'learning_rate': 3.6362767897620054e-06, 'epoch': 0.37} 37%|███▋ | 1025/2774 [3:22:10<5:49:04, 11.97s/it] 37%|███▋ | 1026/2774 [3:22:22<5:46:38, 11.90s/it] {'loss': 1.0015, 'learning_rate': 3.633675321925936e-06, 'epoch': 0.37} 37%|███▋ | 1026/2774 [3:22:22<5:46:38, 11.90s/it] 37%|███▋ | 1027/2774 [3:22:33<5:41:06, 11.72s/it] {'loss': 1.103, 'learning_rate': 3.6310723078252103e-06, 'epoch': 0.37} 37%|███▋ | 1027/2774 [3:22:33<5:41:06, 11.72s/it] 37%|███▋ | 1028/2774 [3:22:44<5:39:22, 11.66s/it] {'loss': 1.0278, 'learning_rate': 3.6284677510101827e-06, 'epoch': 0.37} 37%|███▋ | 1028/2774 [3:22:44<5:39:22, 11.66s/it] 37%|███▋ | 1029/2774 [3:22:57<5:47:25, 11.95s/it] {'loss': 1.04, 'learning_rate': 3.6258616550333128e-06, 'epoch': 0.37} 37%|███▋ | 1029/2774 [3:22:57<5:47:25, 11.95s/it] 37%|███▋ | 1030/2774 [3:23:08<5:41:31, 11.75s/it] {'loss': 1.042, 'learning_rate': 3.623254023449156e-06, 'epoch': 0.37} 37%|███▋ | 1030/2774 [3:23:08<5:41:31, 11.75s/it] 37%|███▋ | 1031/2774 [3:23:20<5:39:36, 11.69s/it] {'loss': 1.061, 'learning_rate': 3.620644859814365e-06, 'epoch': 0.37} 37%|███▋ | 1031/2774 [3:23:20<5:39:36, 11.69s/it] 37%|███▋ | 1032/2774 [3:23:31<5:33:35, 11.49s/it] {'loss': 1.0278, 'learning_rate': 
3.6180341676876818e-06, 'epoch': 0.37} 37%|███▋ | 1032/2774 [3:23:31<5:33:35, 11.49s/it] 37%|███▋ | 1033/2774 [3:23:43<5:38:03, 11.65s/it] {'loss': 1.0215, 'learning_rate': 3.615421950629932e-06, 'epoch': 0.37} 37%|███▋ | 1033/2774 [3:23:43<5:38:03, 11.65s/it] 37%|███▋ | 1034/2774 [3:23:56<5:49:39, 12.06s/it] {'loss': 1.0264, 'learning_rate': 3.6128082122040224e-06, 'epoch': 0.37} 37%|███▋ | 1034/2774 [3:23:56<5:49:39, 12.06s/it] 37%|███▋ | 1035/2774 [3:24:07<5:41:18, 11.78s/it] {'loss': 0.9805, 'learning_rate': 3.610192955974935e-06, 'epoch': 0.37} 37%|███▋ | 1035/2774 [3:24:07<5:41:18, 11.78s/it] 37%|███▋ | 1036/2774 [3:24:18<5:38:32, 11.69s/it] {'loss': 1.0176, 'learning_rate': 3.60757618550972e-06, 'epoch': 0.37} 37%|███▋ | 1036/2774 [3:24:18<5:38:32, 11.69s/it] 37%|███▋ | 1037/2774 [3:24:30<5:34:56, 11.57s/it] {'loss': 1.0298, 'learning_rate': 3.6049579043774946e-06, 'epoch': 0.37} 37%|███▋ | 1037/2774 [3:24:30<5:34:56, 11.57s/it] 37%|███▋ | 1038/2774 [3:24:41<5:33:58, 11.54s/it] {'loss': 1.0312, 'learning_rate': 3.602338116149437e-06, 'epoch': 0.37} 37%|███▋ | 1038/2774 [3:24:41<5:33:58, 11.54s/it] 37%|███▋ | 1039/2774 [3:24:52<5:30:31, 11.43s/it] {'loss': 1.0151, 'learning_rate': 3.599716824398779e-06, 'epoch': 0.37} 37%|███▋ | 1039/2774 [3:24:52<5:30:31, 11.43s/it] 37%|███▋ | 1040/2774 [3:25:04<5:28:57, 11.38s/it] {'loss': 1.0146, 'learning_rate': 3.5970940327008043e-06, 'epoch': 0.37} 37%|███▋ | 1040/2774 [3:25:04<5:28:57, 11.38s/it] 38%|███▊ | 1041/2774 [3:25:15<5:28:37, 11.38s/it] {'loss': 1.0918, 'learning_rate': 3.594469744632843e-06, 'epoch': 0.38} 38%|███▊ | 1041/2774 [3:25:15<5:28:37, 11.38s/it] 38%|███▊ | 1042/2774 [3:25:26<5:28:18, 11.37s/it] {'loss': 1.0337, 'learning_rate': 3.5918439637742648e-06, 'epoch': 0.38} 38%|███▊ | 1042/2774 [3:25:26<5:28:18, 11.37s/it] 38%|███▊ | 1043/2774 [3:25:38<5:31:47, 11.50s/it] {'loss': 1.04, 'learning_rate': 3.5892166937064765e-06, 'epoch': 0.38} 38%|███▊ | 1043/2774 [3:25:38<5:31:47, 11.50s/it] 38%|███▊ | 
1044/2774 [3:25:50<5:33:17, 11.56s/it] {'loss': 1.0474, 'learning_rate': 3.5865879380129157e-06, 'epoch': 0.38} 38%|███▊ | 1044/2774 [3:25:50<5:33:17, 11.56s/it] 38%|███▊ | 1045/2774 [3:26:02<5:34:19, 11.60s/it] {'loss': 1.0444, 'learning_rate': 3.583957700279047e-06, 'epoch': 0.38} 38%|███▊ | 1045/2774 [3:26:02<5:34:19, 11.60s/it] 38%|███▊ | 1046/2774 [3:26:14<5:40:41, 11.83s/it] {'loss': 1.0015, 'learning_rate': 3.5813259840923543e-06, 'epoch': 0.38} 38%|███▊ | 1046/2774 [3:26:14<5:40:41, 11.83s/it] 38%|███▊ | 1047/2774 [3:26:25<5:37:40, 11.73s/it] {'loss': 1.0195, 'learning_rate': 3.5786927930423408e-06, 'epoch': 0.38} 38%|███▊ | 1047/2774 [3:26:25<5:37:40, 11.73s/it] 38%|███▊ | 1048/2774 [3:26:37<5:35:49, 11.67s/it] {'loss': 1.0327, 'learning_rate': 3.57605813072052e-06, 'epoch': 0.38} 38%|███▊ | 1048/2774 [3:26:37<5:35:49, 11.67s/it] 38%|███▊ | 1049/2774 [3:26:49<5:37:25, 11.74s/it] {'loss': 1.0444, 'learning_rate': 3.5734220007204114e-06, 'epoch': 0.38} 38%|███▊ | 1049/2774 [3:26:49<5:37:25, 11.74s/it] 38%|███▊ | 1050/2774 [3:27:01<5:40:02, 11.83s/it] {'loss': 1.0054, 'learning_rate': 3.5707844066375373e-06, 'epoch': 0.38} 38%|███▊ | 1050/2774 [3:27:01<5:40:02, 11.83s/it] 38%|███▊ | 1051/2774 [3:27:13<5:45:15, 12.02s/it] {'loss': 1.0435, 'learning_rate': 3.5681453520694164e-06, 'epoch': 0.38} 38%|███▊ | 1051/2774 [3:27:13<5:45:15, 12.02s/it] 38%|███▊ | 1052/2774 [3:27:25<5:38:11, 11.78s/it] {'loss': 1.02, 'learning_rate': 3.565504840615561e-06, 'epoch': 0.38} 38%|███▊ | 1052/2774 [3:27:25<5:38:11, 11.78s/it] 38%|███▊ | 1053/2774 [3:27:36<5:35:54, 11.71s/it] {'loss': 1.1133, 'learning_rate': 3.5628628758774685e-06, 'epoch': 0.38} 38%|███▊ | 1053/2774 [3:27:36<5:35:54, 11.71s/it] 38%|███▊ | 1054/2774 [3:27:48<5:33:40, 11.64s/it] {'loss': 1.0444, 'learning_rate': 3.5602194614586184e-06, 'epoch': 0.38} 38%|███▊ | 1054/2774 [3:27:48<5:33:40, 11.64s/it] 38%|███▊ | 1055/2774 [3:28:01<5:50:20, 12.23s/it] {'loss': 0.9902, 'learning_rate': 3.5575746009644696e-06, 
'epoch': 0.38} 38%|███▊ | 1055/2774 [3:28:01<5:50:20, 12.23s/it] 38%|███▊ | 1056/2774 [3:28:13<5:46:25, 12.10s/it] {'loss': 0.9507, 'learning_rate': 3.554928298002451e-06, 'epoch': 0.38} 38%|███▊ | 1056/2774 [3:28:13<5:46:25, 12.10s/it] 38%|███▊ | 1057/2774 [3:28:26<5:51:55, 12.30s/it] {'loss': 1.0088, 'learning_rate': 3.5522805561819605e-06, 'epoch': 0.38} 38%|███▊ | 1057/2774 [3:28:26<5:51:55, 12.30s/it] 38%|███▊ | 1058/2774 [3:28:37<5:42:57, 11.99s/it] {'loss': 1.0659, 'learning_rate': 3.5496313791143578e-06, 'epoch': 0.38} 38%|███▊ | 1058/2774 [3:28:37<5:42:57, 11.99s/it] 38%|███▊ | 1059/2774 [3:28:49<5:38:38, 11.85s/it] {'loss': 1.0249, 'learning_rate': 3.54698077041296e-06, 'epoch': 0.38} 38%|███▊ | 1059/2774 [3:28:49<5:38:38, 11.85s/it] 38%|███▊ | 1060/2774 [3:29:00<5:33:26, 11.67s/it] {'loss': 0.9556, 'learning_rate': 3.544328733693038e-06, 'epoch': 0.38} 38%|███▊ | 1060/2774 [3:29:00<5:33:26, 11.67s/it] 38%|███▊ | 1061/2774 [3:29:11<5:30:49, 11.59s/it] {'loss': 1.0688, 'learning_rate': 3.54167527257181e-06, 'epoch': 0.38} 38%|███▊ | 1061/2774 [3:29:11<5:30:49, 11.59s/it] 38%|███▊ | 1062/2774 [3:29:24<5:38:37, 11.87s/it] {'loss': 0.979, 'learning_rate': 3.5390203906684356e-06, 'epoch': 0.38} 38%|███▊ | 1062/2774 [3:29:24<5:38:37, 11.87s/it] 38%|███▊ | 1063/2774 [3:29:35<5:35:54, 11.78s/it] {'loss': 1.0298, 'learning_rate': 3.5363640916040137e-06, 'epoch': 0.38} 38%|███▊ | 1063/2774 [3:29:35<5:35:54, 11.78s/it] 38%|███▊ | 1064/2774 [3:29:47<5:32:46, 11.68s/it] {'loss': 1.0249, 'learning_rate': 3.533706379001577e-06, 'epoch': 0.38} 38%|███▊ | 1064/2774 [3:29:47<5:32:46, 11.68s/it] 38%|███▊ | 1065/2774 [3:29:58<5:32:35, 11.68s/it] {'loss': 1.0166, 'learning_rate': 3.531047256486082e-06, 'epoch': 0.38} 38%|███▊ | 1065/2774 [3:29:58<5:32:35, 11.68s/it] 38%|███▊ | 1066/2774 [3:30:10<5:29:20, 11.57s/it] {'loss': 1.0186, 'learning_rate': 3.5283867276844147e-06, 'epoch': 0.38} 38%|███▊ | 1066/2774 [3:30:10<5:29:20, 11.57s/it] 38%|███▊ | 1067/2774 [3:30:22<5:30:57, 
11.63s/it] {'loss': 1.0537, 'learning_rate': 3.5257247962253727e-06, 'epoch': 0.38} 38%|███▊ | 1067/2774 [3:30:22<5:30:57, 11.63s/it] 39%|███▊ | 1068/2774 [3:30:33<5:30:24, 11.62s/it] {'loss': 1.0591, 'learning_rate': 3.523061465739671e-06, 'epoch': 0.39} 39%|███▊ | 1068/2774 [3:30:33<5:30:24, 11.62s/it] 39%|███▊ | 1069/2774 [3:30:45<5:29:59, 11.61s/it] {'loss': 0.9956, 'learning_rate': 3.520396739859932e-06, 'epoch': 0.39} 39%|███▊ | 1069/2774 [3:30:45<5:29:59, 11.61s/it] 39%|███▊ | 1070/2774 [3:30:59<5:48:07, 12.26s/it] {'loss': 1.0176, 'learning_rate': 3.5177306222206797e-06, 'epoch': 0.39} 39%|███▊ | 1070/2774 [3:30:59<5:48:07, 12.26s/it] 39%|███▊ | 1071/2774 [3:31:10<5:45:15, 12.16s/it] {'loss': 0.9819, 'learning_rate': 3.5150631164583393e-06, 'epoch': 0.39} 39%|███▊ | 1071/2774 [3:31:10<5:45:15, 12.16s/it] 39%|███▊ | 1072/2774 [3:31:22<5:38:43, 11.94s/it] {'loss': 1.0278, 'learning_rate': 3.5123942262112255e-06, 'epoch': 0.39} 39%|███▊ | 1072/2774 [3:31:22<5:38:43, 11.94s/it] 39%|███▊ | 1073/2774 [3:31:34<5:41:35, 12.05s/it] {'loss': 1.0381, 'learning_rate': 3.509723955119544e-06, 'epoch': 0.39} 39%|███▊ | 1073/2774 [3:31:34<5:41:35, 12.05s/it] 39%|███▊ | 1074/2774 [3:31:46<5:38:29, 11.95s/it] {'loss': 1.0088, 'learning_rate': 3.5070523068253835e-06, 'epoch': 0.39} 39%|███▊ | 1074/2774 [3:31:46<5:38:29, 11.95s/it] 39%|███▉ | 1075/2774 [3:31:58<5:36:06, 11.87s/it] {'loss': 1.042, 'learning_rate': 3.5043792849727116e-06, 'epoch': 0.39} 39%|███▉ | 1075/2774 [3:31:58<5:36:06, 11.87s/it] 39%|███▉ | 1076/2774 [3:32:09<5:31:43, 11.72s/it] {'loss': 1.0908, 'learning_rate': 3.5017048932073674e-06, 'epoch': 0.39} 39%|███▉ | 1076/2774 [3:32:09<5:31:43, 11.72s/it] 39%|███▉ | 1077/2774 [3:32:24<5:55:37, 12.57s/it] {'loss': 1.0137, 'learning_rate': 3.49902913517706e-06, 'epoch': 0.39} 39%|███▉ | 1077/2774 [3:32:24<5:55:37, 12.57s/it] 39%|███▉ | 1078/2774 [3:32:35<5:42:48, 12.13s/it] {'loss': 1.083, 'learning_rate': 3.496352014531361e-06, 'epoch': 0.39} 39%|███▉ | 1078/2774 
[3:32:35<5:42:48, 12.13s/it] 39%|███▉ | 1079/2774 [3:32:46<5:34:12, 11.83s/it] {'loss': 1.0103, 'learning_rate': 3.493673534921703e-06, 'epoch': 0.39} 39%|███▉ | 1079/2774 [3:32:46<5:34:12, 11.83s/it] 39%|███▉ | 1080/2774 [3:32:57<5:29:39, 11.68s/it] {'loss': 1.04, 'learning_rate': 3.4909937000013706e-06, 'epoch': 0.39} 39%|███▉ | 1080/2774 [3:32:57<5:29:39, 11.68s/it] 39%|███▉ | 1081/2774 [3:33:08<5:25:56, 11.55s/it] {'loss': 0.9487, 'learning_rate': 3.488312513425495e-06, 'epoch': 0.39} 39%|███▉ | 1081/2774 [3:33:08<5:25:56, 11.55s/it] 39%|███▉ | 1082/2774 [3:33:20<5:25:35, 11.55s/it] {'loss': 1.0898, 'learning_rate': 3.485629978851053e-06, 'epoch': 0.39} 39%|███▉ | 1082/2774 [3:33:20<5:25:35, 11.55s/it] 39%|███▉ | 1083/2774 [3:33:32<5:34:17, 11.86s/it] {'loss': 1.0166, 'learning_rate': 3.4829460999368597e-06, 'epoch': 0.39} 39%|███▉ | 1083/2774 [3:33:32<5:34:17, 11.86s/it] 39%|███▉ | 1084/2774 [3:33:49<6:11:28, 13.19s/it] {'loss': 1.0083, 'learning_rate': 3.480260880343565e-06, 'epoch': 0.39} 39%|███▉ | 1084/2774 [3:33:49<6:11:28, 13.19s/it] 39%|███▉ | 1085/2774 [3:34:00<5:56:42, 12.67s/it] {'loss': 1.0151, 'learning_rate': 3.477574323733645e-06, 'epoch': 0.39} 39%|███▉ | 1085/2774 [3:34:00<5:56:42, 12.67s/it] 39%|███▉ | 1086/2774 [3:34:11<5:43:50, 12.22s/it] {'loss': 1.0444, 'learning_rate': 3.474886433771401e-06, 'epoch': 0.39} 39%|███▉ | 1086/2774 [3:34:11<5:43:50, 12.22s/it] 39%|███▉ | 1087/2774 [3:34:23<5:42:41, 12.19s/it] {'loss': 1.043, 'learning_rate': 3.472197214122953e-06, 'epoch': 0.39} 39%|███▉ | 1087/2774 [3:34:23<5:42:41, 12.19s/it] 39%|███▉ | 1088/2774 [3:34:35<5:37:13, 12.00s/it] {'loss': 1.0225, 'learning_rate': 3.469506668456234e-06, 'epoch': 0.39} 39%|███▉ | 1088/2774 [3:34:35<5:37:13, 12.00s/it] 39%|███▉ | 1089/2774 [3:34:48<5:46:40, 12.34s/it] {'loss': 0.9492, 'learning_rate': 3.466814800440985e-06, 'epoch': 0.39} 39%|███▉ | 1089/2774 [3:34:48<5:46:40, 12.34s/it] 39%|███▉ | 1090/2774 [3:35:00<5:44:19, 12.27s/it] {'loss': 1.0264, 
'learning_rate': 3.464121613748752e-06, 'epoch': 0.39} 39%|███▉ | 1090/2774 [3:35:00<5:44:19, 12.27s/it] 39%|███▉ | 1091/2774 [3:35:12<5:37:59, 12.05s/it] {'loss': 1.041, 'learning_rate': 3.4614271120528787e-06, 'epoch': 0.39} 39%|███▉ | 1091/2774 [3:35:12<5:37:59, 12.05s/it] 39%|███▉ | 1092/2774 [3:35:23<5:31:24, 11.82s/it] {'loss': 1.022, 'learning_rate': 3.458731299028503e-06, 'epoch': 0.39} 39%|███▉ | 1092/2774 [3:35:23<5:31:24, 11.82s/it] 39%|███▉ | 1093/2774 [3:35:34<5:26:34, 11.66s/it] {'loss': 1.0337, 'learning_rate': 3.456034178352551e-06, 'epoch': 0.39} 39%|███▉ | 1093/2774 [3:35:34<5:26:34, 11.66s/it] 39%|███▉ | 1094/2774 [3:35:46<5:25:46, 11.63s/it] {'loss': 1.0239, 'learning_rate': 3.4533357537037315e-06, 'epoch': 0.39} 39%|███▉ | 1094/2774 [3:35:46<5:25:46, 11.63s/it] 39%|███▉ | 1095/2774 [3:35:57<5:21:37, 11.49s/it] {'loss': 1.0645, 'learning_rate': 3.4506360287625337e-06, 'epoch': 0.39} 39%|███▉ | 1095/2774 [3:35:57<5:21:37, 11.49s/it] 40%|███▉ | 1096/2774 [3:36:08<5:19:41, 11.43s/it] {'loss': 1.0762, 'learning_rate': 3.4479350072112183e-06, 'epoch': 0.4} 40%|███▉ | 1096/2774 [3:36:08<5:19:41, 11.43s/it] 40%|███▉ | 1097/2774 [3:36:20<5:22:38, 11.54s/it] {'loss': 1.0298, 'learning_rate': 3.445232692733817e-06, 'epoch': 0.4} 40%|███▉ | 1097/2774 [3:36:20<5:22:38, 11.54s/it] 40%|███▉ | 1098/2774 [3:36:32<5:20:36, 11.48s/it] {'loss': 1.0229, 'learning_rate': 3.442529089016123e-06, 'epoch': 0.4} 40%|███▉ | 1098/2774 [3:36:32<5:20:36, 11.48s/it] 40%|███▉ | 1099/2774 [3:36:43<5:19:40, 11.45s/it] {'loss': 0.9966, 'learning_rate': 3.439824199745688e-06, 'epoch': 0.4} 40%|███▉ | 1099/2774 [3:36:43<5:19:40, 11.45s/it] 40%|███▉ | 1100/2774 [3:36:54<5:17:08, 11.37s/it] {'loss': 1.0884, 'learning_rate': 3.4371180286118172e-06, 'epoch': 0.4} 40%|███▉ | 1100/2774 [3:36:54<5:17:08, 11.37s/it] 40%|███▉ | 1101/2774 [3:37:06<5:20:37, 11.50s/it] {'loss': 1.021, 'learning_rate': 3.434410579305565e-06, 'epoch': 0.4} 40%|███▉ | 1101/2774 [3:37:06<5:20:37, 11.50s/it] 
40%|███▉ | 1102/2774 [3:37:17<5:20:17, 11.49s/it] {'loss': 1.0591, 'learning_rate': 3.4317018555197303e-06, 'epoch': 0.4} 40%|███▉ | 1102/2774 [3:37:17<5:20:17, 11.49s/it] 40%|███▉ | 1103/2774 [3:37:29<5:24:06, 11.64s/it] {'loss': 1.0356, 'learning_rate': 3.4289918609488453e-06, 'epoch': 0.4} 40%|███▉ | 1103/2774 [3:37:29<5:24:06, 11.64s/it] 40%|███▉ | 1104/2774 [3:37:41<5:21:41, 11.56s/it] {'loss': 1.0093, 'learning_rate': 3.426280599289182e-06, 'epoch': 0.4} 40%|███▉ | 1104/2774 [3:37:41<5:21:41, 11.56s/it] 40%|███▉ | 1105/2774 [3:37:52<5:22:02, 11.58s/it] {'loss': 1.064, 'learning_rate': 3.4235680742387355e-06, 'epoch': 0.4} 40%|███▉ | 1105/2774 [3:37:52<5:22:02, 11.58s/it] 40%|███▉ | 1106/2774 [3:38:03<5:17:48, 11.43s/it] {'loss': 0.9858, 'learning_rate': 3.4208542894972272e-06, 'epoch': 0.4} 40%|███▉ | 1106/2774 [3:38:03<5:17:48, 11.43s/it] 40%|███▉ | 1107/2774 [3:38:15<5:16:16, 11.38s/it] {'loss': 1.0391, 'learning_rate': 3.4181392487660964e-06, 'epoch': 0.4} 40%|███▉ | 1107/2774 [3:38:15<5:16:16, 11.38s/it] 40%|███▉ | 1108/2774 [3:38:26<5:17:26, 11.43s/it] {'loss': 1.0908, 'learning_rate': 3.4154229557484924e-06, 'epoch': 0.4} 40%|███▉ | 1108/2774 [3:38:26<5:17:26, 11.43s/it] 40%|███▉ | 1109/2774 [3:38:38<5:19:32, 11.52s/it] {'loss': 1.0366, 'learning_rate': 3.412705414149276e-06, 'epoch': 0.4} 40%|███▉ | 1109/2774 [3:38:38<5:19:32, 11.52s/it] 40%|████ | 1110/2774 [3:38:50<5:19:49, 11.53s/it] {'loss': 1.0205, 'learning_rate': 3.4099866276750106e-06, 'epoch': 0.4} 40%|████ | 1110/2774 [3:38:50<5:19:49, 11.53s/it] 40%|████ | 1111/2774 [3:39:02<5:29:19, 11.88s/it] {'loss': 1.0029, 'learning_rate': 3.407266600033955e-06, 'epoch': 0.4} 40%|████ | 1111/2774 [3:39:02<5:29:19, 11.88s/it] 40%|████ | 1112/2774 [3:39:13<5:22:49, 11.65s/it] {'loss': 1.0264, 'learning_rate': 3.4045453349360643e-06, 'epoch': 0.4} 40%|████ | 1112/2774 [3:39:13<5:22:49, 11.65s/it] 40%|████ | 1113/2774 [3:39:25<5:20:51, 11.59s/it] {'loss': 1.0835, 'learning_rate': 3.401822836092977e-06, 
'epoch': 0.4} 40%|████ | 1113/2774 [3:39:25<5:20:51, 11.59s/it] 40%|████ | 1114/2774 [3:39:37<5:23:13, 11.68s/it] {'loss': 0.9907, 'learning_rate': 3.39909910721802e-06, 'epoch': 0.4} 40%|████ | 1114/2774 [3:39:37<5:23:13, 11.68s/it] 40%|████ | 1115/2774 [3:39:49<5:26:00, 11.79s/it] {'loss': 0.9399, 'learning_rate': 3.396374152026194e-06, 'epoch': 0.4} 40%|████ | 1115/2774 [3:39:49<5:26:00, 11.79s/it] 40%|████ | 1116/2774 [3:40:00<5:23:55, 11.72s/it] {'loss': 1.0205, 'learning_rate': 3.3936479742341734e-06, 'epoch': 0.4} 40%|████ | 1116/2774 [3:40:00<5:23:55, 11.72s/it] 40%|████ | 1117/2774 [3:40:12<5:20:29, 11.61s/it] {'loss': 1.0361, 'learning_rate': 3.390920577560299e-06, 'epoch': 0.4} 40%|████ | 1117/2774 [3:40:12<5:20:29, 11.61s/it] 40%|████ | 1118/2774 [3:40:23<5:20:37, 11.62s/it] {'loss': 1.0571, 'learning_rate': 3.388191965724576e-06, 'epoch': 0.4} 40%|████ | 1118/2774 [3:40:23<5:20:37, 11.62s/it] 40%|████ | 1119/2774 [3:40:34<5:16:37, 11.48s/it] {'loss': 1.02, 'learning_rate': 3.3854621424486663e-06, 'epoch': 0.4} 40%|████ | 1119/2774 [3:40:34<5:16:37, 11.48s/it] 40%|████ | 1120/2774 [3:40:46<5:17:35, 11.52s/it] {'loss': 1.0649, 'learning_rate': 3.3827311114558834e-06, 'epoch': 0.4} 40%|████ | 1120/2774 [3:40:46<5:17:35, 11.52s/it] 40%|████ | 1121/2774 [3:40:58<5:23:30, 11.74s/it] {'loss': 0.9761, 'learning_rate': 3.3799988764711883e-06, 'epoch': 0.4} 40%|████ | 1121/2774 [3:40:58<5:23:30, 11.74s/it] 40%|████ | 1122/2774 [3:41:10<5:21:07, 11.66s/it] {'loss': 1.0967, 'learning_rate': 3.3772654412211854e-06, 'epoch': 0.4} 40%|████ | 1122/2774 [3:41:10<5:21:07, 11.66s/it] 40%|████ | 1123/2774 [3:41:22<5:21:36, 11.69s/it] {'loss': 1.0664, 'learning_rate': 3.3745308094341144e-06, 'epoch': 0.4} 40%|████ | 1123/2774 [3:41:22<5:21:36, 11.69s/it] 41%|████ | 1124/2774 [3:41:33<5:21:54, 11.71s/it] {'loss': 1.0107, 'learning_rate': 3.3717949848398485e-06, 'epoch': 0.41} 41%|████ | 1124/2774 [3:41:33<5:21:54, 11.71s/it] 41%|████ | 1125/2774 [3:41:45<5:19:18, 11.62s/it] 
{'loss': 0.9727, 'learning_rate': 3.369057971169888e-06, 'epoch': 0.41} 41%|████ | 1125/2774 [3:41:45<5:19:18, 11.62s/it] 41%|████ | 1126/2774 [3:41:58<5:31:24, 12.07s/it] {'loss': 1.0039, 'learning_rate': 3.3663197721573516e-06, 'epoch': 0.41} 41%|████ | 1126/2774 [3:41:58<5:31:24, 12.07s/it] 41%|████ | 1127/2774 [3:42:09<5:25:58, 11.88s/it] {'loss': 1.0557, 'learning_rate': 3.3635803915369795e-06, 'epoch': 0.41} 41%|████ | 1127/2774 [3:42:09<5:25:58, 11.88s/it] 41%|████ | 1128/2774 [3:42:21<5:27:46, 11.95s/it] {'loss': 1.0635, 'learning_rate': 3.3608398330451206e-06, 'epoch': 0.41} 41%|████ | 1128/2774 [3:42:21<5:27:46, 11.95s/it] 41%|████ | 1129/2774 [3:42:33<5:25:33, 11.87s/it] {'loss': 1.0112, 'learning_rate': 3.3580981004197323e-06, 'epoch': 0.41} 41%|████ | 1129/2774 [3:42:33<5:25:33, 11.87s/it] 41%|████ | 1130/2774 [3:42:44<5:21:18, 11.73s/it] {'loss': 1.0054, 'learning_rate': 3.35535519740037e-06, 'epoch': 0.41} 41%|████ | 1130/2774 [3:42:44<5:21:18, 11.73s/it] 41%|████ | 1131/2774 [3:42:56<5:19:04, 11.65s/it] {'loss': 1.0371, 'learning_rate': 3.3526111277281897e-06, 'epoch': 0.41} 41%|████ | 1131/2774 [3:42:56<5:19:04, 11.65s/it] 41%|████ | 1132/2774 [3:43:08<5:18:26, 11.64s/it] {'loss': 1.0132, 'learning_rate': 3.3498658951459357e-06, 'epoch': 0.41} 41%|████ | 1132/2774 [3:43:08<5:18:26, 11.64s/it] 41%|████ | 1133/2774 [3:43:19<5:14:37, 11.50s/it] {'loss': 1.0361, 'learning_rate': 3.3471195033979405e-06, 'epoch': 0.41} 41%|████ | 1133/2774 [3:43:19<5:14:37, 11.50s/it] 41%|████ | 1134/2774 [3:43:31<5:17:18, 11.61s/it] {'loss': 1.0649, 'learning_rate': 3.3443719562301147e-06, 'epoch': 0.41} 41%|████ | 1134/2774 [3:43:31<5:17:18, 11.61s/it] 41%|████ | 1135/2774 [3:43:42<5:14:22, 11.51s/it] {'loss': 1.0249, 'learning_rate': 3.341623257389949e-06, 'epoch': 0.41} 41%|████ | 1135/2774 [3:43:42<5:14:22, 11.51s/it] 41%|████ | 1136/2774 [3:43:53<5:13:53, 11.50s/it] {'loss': 1.002, 'learning_rate': 3.3388734106264997e-06, 'epoch': 0.41} 41%|████ | 1136/2774 
[3:43:53<5:13:53, 11.50s/it] 41%|████ | 1137/2774 [3:44:05<5:14:31, 11.53s/it] {'loss': 1.0347, 'learning_rate': 3.336122419690394e-06, 'epoch': 0.41} 41%|████ | 1137/2774 [3:44:05<5:14:31, 11.53s/it] 41%|████ | 1138/2774 [3:44:17<5:15:48, 11.58s/it] {'loss': 1.0264, 'learning_rate': 3.333370288333817e-06, 'epoch': 0.41} 41%|████ | 1138/2774 [3:44:17<5:15:48, 11.58s/it] 41%|████ | 1139/2774 [3:44:31<5:34:12, 12.26s/it] {'loss': 1.0127, 'learning_rate': 3.3306170203105086e-06, 'epoch': 0.41} 41%|████ | 1139/2774 [3:44:31<5:34:12, 12.26s/it] 41%|████ | 1140/2774 [3:44:42<5:25:26, 11.95s/it] {'loss': 1.0225, 'learning_rate': 3.3278626193757607e-06, 'epoch': 0.41} 41%|████ | 1140/2774 [3:44:42<5:25:26, 11.95s/it] 41%|████ | 1141/2774 [3:44:54<5:25:18, 11.95s/it] {'loss': 1.063, 'learning_rate': 3.3251070892864097e-06, 'epoch': 0.41} 41%|████ | 1141/2774 [3:44:54<5:25:18, 11.95s/it] 41%|████ | 1142/2774 [3:45:05<5:19:39, 11.75s/it] {'loss': 0.9995, 'learning_rate': 3.322350433800832e-06, 'epoch': 0.41} 41%|████ | 1142/2774 [3:45:05<5:19:39, 11.75s/it] 41%|████ | 1143/2774 [3:45:17<5:19:02, 11.74s/it] {'loss': 1.0444, 'learning_rate': 3.3195926566789405e-06, 'epoch': 0.41} 41%|████ | 1143/2774 [3:45:17<5:19:02, 11.74s/it] 41%|████ | 1144/2774 [3:45:28<5:14:00, 11.56s/it] {'loss': 1.0435, 'learning_rate': 3.316833761682175e-06, 'epoch': 0.41} 41%|████ | 1144/2774 [3:45:28<5:14:00, 11.56s/it] 41%|████▏ | 1145/2774 [3:45:40<5:15:48, 11.63s/it] {'loss': 1.0327, 'learning_rate': 3.3140737525735017e-06, 'epoch': 0.41} 41%|████▏ | 1145/2774 [3:45:40<5:15:48, 11.63s/it] 41%|████▏ | 1146/2774 [3:45:51<5:13:40, 11.56s/it] {'loss': 1.0522, 'learning_rate': 3.311312633117407e-06, 'epoch': 0.41} 41%|████▏ | 1146/2774 [3:45:51<5:13:40, 11.56s/it] 41%|████▏ | 1147/2774 [3:46:02<5:12:50, 11.54s/it] {'loss': 1.0029, 'learning_rate': 3.3085504070798915e-06, 'epoch': 0.41} 41%|████▏ | 1147/2774 [3:46:02<5:12:50, 11.54s/it] 41%|████▏ | 1148/2774 [3:46:14<5:12:55, 11.55s/it] {'loss': 1.0493, 
'learning_rate': 3.305787078228463e-06, 'epoch': 0.41} 41%|████▏ | 1148/2774 [3:46:14<5:12:55, 11.55s/it] 41%|████▏ | 1149/2774 [3:46:28<5:28:35, 12.13s/it] {'loss': 1.0, 'learning_rate': 3.303022650332136e-06, 'epoch': 0.41} 41%|████▏ | 1149/2774 [3:46:28<5:28:35, 12.13s/it] 41%|████▏ | 1150/2774 [3:46:39<5:23:38, 11.96s/it] {'loss': 1.0249, 'learning_rate': 3.3002571271614233e-06, 'epoch': 0.41} 41%|████▏ | 1150/2774 [3:46:39<5:23:38, 11.96s/it] 41%|████▏ | 1151/2774 [3:46:51<5:20:22, 11.84s/it] {'loss': 1.0166, 'learning_rate': 3.2974905124883315e-06, 'epoch': 0.41} 41%|████▏ | 1151/2774 [3:46:51<5:20:22, 11.84s/it] 42%|████▏ | 1152/2774 [3:47:02<5:17:48, 11.76s/it] {'loss': 1.1074, 'learning_rate': 3.2947228100863558e-06, 'epoch': 0.42} 42%|████▏ | 1152/2774 [3:47:02<5:17:48, 11.76s/it] 42%|████▏ | 1153/2774 [3:47:14<5:17:53, 11.77s/it] {'loss': 1.0884, 'learning_rate': 3.2919540237304746e-06, 'epoch': 0.42} 42%|████▏ | 1153/2774 [3:47:14<5:17:53, 11.77s/it] 42%|████▏ | 1154/2774 [3:47:26<5:17:13, 11.75s/it] {'loss': 0.9922, 'learning_rate': 3.2891841571971463e-06, 'epoch': 0.42} 42%|████▏ | 1154/2774 [3:47:26<5:17:13, 11.75s/it] 42%|████▏ | 1155/2774 [3:47:39<5:30:38, 12.25s/it] {'loss': 0.9536, 'learning_rate': 3.2864132142643e-06, 'epoch': 0.42} 42%|████▏ | 1155/2774 [3:47:39<5:30:38, 12.25s/it] 42%|████▏ | 1156/2774 [3:47:53<5:40:12, 12.62s/it] {'loss': 1.0347, 'learning_rate': 3.283641198711337e-06, 'epoch': 0.42} 42%|████▏ | 1156/2774 [3:47:53<5:40:12, 12.62s/it] 42%|████▏ | 1157/2774 [3:48:04<5:28:54, 12.20s/it] {'loss': 1.0542, 'learning_rate': 3.2808681143191162e-06, 'epoch': 0.42} 42%|████▏ | 1157/2774 [3:48:04<5:28:54, 12.20s/it] 42%|████▏ | 1158/2774 [3:48:15<5:22:01, 11.96s/it] {'loss': 1.0859, 'learning_rate': 3.278093964869959e-06, 'epoch': 0.42} 42%|████▏ | 1158/2774 [3:48:15<5:22:01, 11.96s/it] 42%|████▏ | 1159/2774 [3:48:26<5:15:26, 11.72s/it] {'loss': 1.0015, 'learning_rate': 3.275318754147636e-06, 'epoch': 0.42} 42%|████▏ | 1159/2774 
[3:48:26<5:15:26, 11.72s/it] 42%|████▏ | 1160/2774 [3:48:40<5:30:16, 12.28s/it] {'loss': 1.0317, 'learning_rate': 3.272542485937369e-06, 'epoch': 0.42} 42%|████▏ | 1160/2774 [3:48:40<5:30:16, 12.28s/it] 42%|████▏ | 1161/2774 [3:48:52<5:29:21, 12.25s/it] {'loss': 1.0337, 'learning_rate': 3.2697651640258195e-06, 'epoch': 0.42} 42%|████▏ | 1161/2774 [3:48:52<5:29:21, 12.25s/it] 42%|████▏ | 1162/2774 [3:49:03<5:20:47, 11.94s/it] {'loss': 0.9932, 'learning_rate': 3.266986792201086e-06, 'epoch': 0.42} 42%|████▏ | 1162/2774 [3:49:03<5:20:47, 11.94s/it] 42%|████▏ | 1163/2774 [3:49:16<5:23:41, 12.06s/it] {'loss': 1.0225, 'learning_rate': 3.2642073742527e-06, 'epoch': 0.42} 42%|████▏ | 1163/2774 [3:49:16<5:23:41, 12.06s/it] 42%|████▏ | 1164/2774 [3:49:27<5:16:35, 11.80s/it] {'loss': 1.0181, 'learning_rate': 3.26142691397162e-06, 'epoch': 0.42} 42%|████▏ | 1164/2774 [3:49:27<5:16:35, 11.80s/it] 42%|████▏ | 1165/2774 [3:49:38<5:11:12, 11.61s/it] {'loss': 1.0171, 'learning_rate': 3.258645415150226e-06, 'epoch': 0.42} 42%|████▏ | 1165/2774 [3:49:38<5:11:12, 11.61s/it] 42%|████▏ | 1166/2774 [3:49:49<5:09:27, 11.55s/it] {'loss': 1.0332, 'learning_rate': 3.2558628815823144e-06, 'epoch': 0.42} 42%|████▏ | 1166/2774 [3:49:49<5:09:27, 11.55s/it] 42%|████▏ | 1167/2774 [3:50:03<5:22:13, 12.03s/it] {'loss': 1.0835, 'learning_rate': 3.2530793170630926e-06, 'epoch': 0.42} 42%|████▏ | 1167/2774 [3:50:03<5:22:13, 12.03s/it] 42%|████▏ | 1168/2774 [3:50:14<5:18:01, 11.88s/it] {'loss': 1.0527, 'learning_rate': 3.2502947253891742e-06, 'epoch': 0.42} 42%|████▏ | 1168/2774 [3:50:14<5:18:01, 11.88s/it] 42%|████▏ | 1169/2774 [3:50:27<5:27:00, 12.22s/it] {'loss': 0.9717, 'learning_rate': 3.247509110358575e-06, 'epoch': 0.42} 42%|████▏ | 1169/2774 [3:50:27<5:27:00, 12.22s/it] 42%|████▏ | 1170/2774 [3:50:40<5:29:11, 12.31s/it] {'loss': 1.0449, 'learning_rate': 3.244722475770705e-06, 'epoch': 0.42} 42%|████▏ | 1170/2774 [3:50:40<5:29:11, 12.31s/it] 42%|████▏ | 1171/2774 [3:50:54<5:43:37, 12.86s/it] 
{'loss': 0.978, 'learning_rate': 3.2419348254263653e-06, 'epoch': 0.42} 42%|████▏ | 1171/2774 [3:50:54<5:43:37, 12.86s/it] 42%|████▏ | 1172/2774 [3:51:05<5:29:35, 12.34s/it] {'loss': 0.9688, 'learning_rate': 3.239146163127743e-06, 'epoch': 0.42} 42%|████▏ | 1172/2774 [3:51:05<5:29:35, 12.34s/it] 42%|████▏ | 1173/2774 [3:51:16<5:20:32, 12.01s/it] {'loss': 1.0488, 'learning_rate': 3.236356492678404e-06, 'epoch': 0.42} 42%|████▏ | 1173/2774 [3:51:16<5:20:32, 12.01s/it] 42%|████▏ | 1174/2774 [3:51:29<5:25:39, 12.21s/it] {'loss': 1.0083, 'learning_rate': 3.2335658178832926e-06, 'epoch': 0.42} 42%|████▏ | 1174/2774 [3:51:29<5:25:39, 12.21s/it] 42%|████▏ | 1175/2774 [3:51:40<5:18:18, 11.94s/it] {'loss': 1.0234, 'learning_rate': 3.230774142548718e-06, 'epoch': 0.42} 42%|████▏ | 1175/2774 [3:51:40<5:18:18, 11.94s/it] 42%|████▏ | 1176/2774 [3:51:53<5:24:46, 12.19s/it] {'loss': 0.98, 'learning_rate': 3.2279814704823575e-06, 'epoch': 0.42} 42%|████▏ | 1176/2774 [3:51:53<5:24:46, 12.19s/it] 42%|████▏ | 1177/2774 [3:52:05<5:22:16, 12.11s/it] {'loss': 1.0581, 'learning_rate': 3.2251878054932482e-06, 'epoch': 0.42} 42%|████▏ | 1177/2774 [3:52:05<5:22:16, 12.11s/it] 42%|████▏ | 1178/2774 [3:52:17<5:17:54, 11.95s/it] {'loss': 1.0107, 'learning_rate': 3.222393151391779e-06, 'epoch': 0.42} 42%|████▏ | 1178/2774 [3:52:17<5:17:54, 11.95s/it] 43%|████▎ | 1179/2774 [3:52:28<5:13:06, 11.78s/it] {'loss': 1.0454, 'learning_rate': 3.2195975119896907e-06, 'epoch': 0.43} 43%|████▎ | 1179/2774 [3:52:28<5:13:06, 11.78s/it] 43%|████▎ | 1180/2774 [3:52:42<5:28:02, 12.35s/it] {'loss': 1.0195, 'learning_rate': 3.216800891100065e-06, 'epoch': 0.43} 43%|████▎ | 1180/2774 [3:52:42<5:28:02, 12.35s/it] 43%|████▎ | 1181/2774 [3:52:54<5:32:05, 12.51s/it] {'loss': 1.0405, 'learning_rate': 3.214003292537325e-06, 'epoch': 0.43} 43%|████▎ | 1181/2774 [3:52:54<5:32:05, 12.51s/it] 43%|████▎ | 1182/2774 [3:53:07<5:28:54, 12.40s/it] {'loss': 1.0732, 'learning_rate': 3.211204720117225e-06, 'epoch': 0.43} 43%|████▎ | 
1182/2774 [3:53:07<5:28:54, 12.40s/it] 43%|████▎ | 1183/2774 [3:53:18<5:21:41, 12.13s/it] {'loss': 1.0366, 'learning_rate': 3.2084051776568504e-06, 'epoch': 0.43} 43%|████▎ | 1183/2774 [3:53:18<5:21:41, 12.13s/it] 43%|████▎ | 1184/2774 [3:53:29<5:13:48, 11.84s/it] {'loss': 1.0669, 'learning_rate': 3.205604668974607e-06, 'epoch': 0.43} 43%|████▎ | 1184/2774 [3:53:29<5:13:48, 11.84s/it] 43%|████▎ | 1185/2774 [3:53:41<5:10:12, 11.71s/it] {'loss': 1.0674, 'learning_rate': 3.2028031978902186e-06, 'epoch': 0.43} 43%|████▎ | 1185/2774 [3:53:41<5:10:12, 11.71s/it] 43%|████▎ | 1186/2774 [3:53:53<5:13:38, 11.85s/it] {'loss': 1.0845, 'learning_rate': 3.2000007682247243e-06, 'epoch': 0.43} 43%|████▎ | 1186/2774 [3:53:53<5:13:38, 11.85s/it] 43%|████▎ | 1187/2774 [3:54:05<5:12:21, 11.81s/it] {'loss': 1.0503, 'learning_rate': 3.1971973838004673e-06, 'epoch': 0.43} 43%|████▎ | 1187/2774 [3:54:05<5:12:21, 11.81s/it] 43%|████▎ | 1188/2774 [3:54:16<5:11:44, 11.79s/it] {'loss': 1.0073, 'learning_rate': 3.1943930484410963e-06, 'epoch': 0.43} 43%|████▎ | 1188/2774 [3:54:16<5:11:44, 11.79s/it] 43%|████▎ | 1189/2774 [3:54:29<5:16:04, 11.96s/it] {'loss': 1.0361, 'learning_rate': 3.191587765971553e-06, 'epoch': 0.43} 43%|████▎ | 1189/2774 [3:54:29<5:16:04, 11.96s/it] 43%|████▎ | 1190/2774 [3:54:42<5:28:50, 12.46s/it] {'loss': 1.0171, 'learning_rate': 3.1887815402180756e-06, 'epoch': 0.43} 43%|████▎ | 1190/2774 [3:54:42<5:28:50, 12.46s/it] 43%|████▎ | 1191/2774 [3:54:54<5:19:51, 12.12s/it] {'loss': 1.0322, 'learning_rate': 3.1859743750081853e-06, 'epoch': 0.43} 43%|████▎ | 1191/2774 [3:54:54<5:19:51, 12.12s/it] 43%|████▎ | 1192/2774 [3:55:05<5:14:23, 11.92s/it] {'loss': 1.0059, 'learning_rate': 3.1831662741706853e-06, 'epoch': 0.43} 43%|████▎ | 1192/2774 [3:55:05<5:14:23, 11.92s/it] 43%|████▎ | 1193/2774 [3:55:16<5:09:10, 11.73s/it] {'loss': 1.0601, 'learning_rate': 3.1803572415356576e-06, 'epoch': 0.43} 43%|████▎ | 1193/2774 [3:55:16<5:09:10, 11.73s/it] 43%|████▎ | 1194/2774 
[3:55:27<5:03:15, 11.52s/it] {'loss': 1.0273, 'learning_rate': 3.177547280934451e-06, 'epoch': 0.43} 43%|████▎ | 1194/2774 [3:55:27<5:03:15, 11.52s/it] 43%|████▎ | 1195/2774 [3:55:39<5:03:28, 11.53s/it] {'loss': 0.9937, 'learning_rate': 3.1747363961996823e-06, 'epoch': 0.43} 43%|████▎ | 1195/2774 [3:55:39<5:03:28, 11.53s/it] 43%|████▎ | 1196/2774 [3:55:51<5:04:06, 11.56s/it] {'loss': 1.0278, 'learning_rate': 3.171924591165229e-06, 'epoch': 0.43} 43%|████▎ | 1196/2774 [3:55:51<5:04:06, 11.56s/it] 43%|████▎ | 1197/2774 [3:56:02<5:06:07, 11.65s/it] {'loss': 1.0942, 'learning_rate': 3.1691118696662245e-06, 'epoch': 0.43} 43%|████▎ | 1197/2774 [3:56:02<5:06:07, 11.65s/it] 43%|████▎ | 1198/2774 [3:56:14<5:02:38, 11.52s/it] {'loss': 1.0977, 'learning_rate': 3.166298235539048e-06, 'epoch': 0.43} 43%|████▎ | 1198/2774 [3:56:14<5:02:38, 11.52s/it] 43%|████▎ | 1199/2774 [3:56:26<5:10:37, 11.83s/it] {'loss': 1.0049, 'learning_rate': 3.1634836926213287e-06, 'epoch': 0.43} 43%|████▎ | 1199/2774 [3:56:26<5:10:37, 11.83s/it] 43%|████▎ | 1200/2774 [3:56:37<5:05:58, 11.66s/it] {'loss': 1.0566, 'learning_rate': 3.1606682447519333e-06, 'epoch': 0.43} 43%|████▎ | 1200/2774 [3:56:37<5:05:58, 11.66s/it]/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. 
Gradients will be None warnings.warn( 43%|████▎ | 1201/2774 [3:57:15<8:32:41, 19.56s/it] {'loss': 1.0459, 'learning_rate': 3.157851895770961e-06, 'epoch': 0.43} 43%|████▎ | 1201/2774 [3:57:16<8:32:41, 19.56s/it] 43%|████▎ | 1202/2774 [3:57:29<7:41:41, 17.62s/it] {'loss': 1.0112, 'learning_rate': 3.1550346495197433e-06, 'epoch': 0.43} 43%|████▎ | 1202/2774 [3:57:29<7:41:41, 17.62s/it] 43%|████▎ | 1203/2774 [3:57:40<6:52:07, 15.74s/it] {'loss': 1.0371, 'learning_rate': 3.1522165098408332e-06, 'epoch': 0.43} 43%|████▎ | 1203/2774 [3:57:40<6:52:07, 15.74s/it] 43%|████▎ | 1204/2774 [3:57:51<6:18:28, 14.46s/it] {'loss': 1.1455, 'learning_rate': 3.149397480578002e-06, 'epoch': 0.43} 43%|████▎ | 1204/2774 [3:57:51<6:18:28, 14.46s/it] 43%|████▎ | 1205/2774 [3:58:03<5:53:45, 13.53s/it] {'loss': 1.0488, 'learning_rate': 3.1465775655762377e-06, 'epoch': 0.43} 43%|████▎ | 1205/2774 [3:58:03<5:53:45, 13.53s/it] 43%|████▎ | 1206/2774 [3:58:14<5:36:35, 12.88s/it] {'loss': 1.0068, 'learning_rate': 3.1437567686817317e-06, 'epoch': 0.43} 43%|████▎ | 1206/2774 [3:58:14<5:36:35, 12.88s/it] 44%|████▎ | 1207/2774 [3:58:25<5:24:29, 12.42s/it] {'loss': 1.04, 'learning_rate': 3.140935093741882e-06, 'epoch': 0.44} 44%|████▎ | 1207/2774 [3:58:25<5:24:29, 12.42s/it] 44%|████▎ | 1208/2774 [3:58:37<5:16:06, 12.11s/it] {'loss': 1.0581, 'learning_rate': 3.138112544605282e-06, 'epoch': 0.44} 44%|████▎ | 1208/2774 [3:58:37<5:16:06, 12.11s/it] 44%|████▎ | 1209/2774 [3:58:49<5:17:56, 12.19s/it] {'loss': 1.0122, 'learning_rate': 3.1352891251217183e-06, 'epoch': 0.44} 44%|████▎ | 1209/2774 [3:58:49<5:17:56, 12.19s/it] 44%|████▎ | 1210/2774 [3:59:02<5:22:09, 12.36s/it] {'loss': 0.9976, 'learning_rate': 3.132464839142165e-06, 'epoch': 0.44} 44%|████▎ | 1210/2774 [3:59:02<5:22:09, 12.36s/it] 44%|████▎ | 1211/2774 [3:59:13<5:15:00, 12.09s/it] {'loss': 0.9946, 'learning_rate': 3.129639690518777e-06, 'epoch': 0.44} 44%|████▎ | 1211/2774 [3:59:13<5:15:00, 12.09s/it] 44%|████▎ | 1212/2774 [3:59:25<5:10:48, 
11.94s/it] {'loss': 1.0156, 'learning_rate': 3.126813683104887e-06, 'epoch': 0.44} 44%|████▎ | 1212/2774 [3:59:25<5:10:48, 11.94s/it] 44%|████▎ | 1213/2774 [3:59:37<5:08:15, 11.85s/it] {'loss': 1.0679, 'learning_rate': 3.1239868207549974e-06, 'epoch': 0.44} 44%|████▎ | 1213/2774 [3:59:37<5:08:15, 11.85s/it] 44%|████▍ | 1214/2774 [3:59:48<5:05:14, 11.74s/it] {'loss': 1.1006, 'learning_rate': 3.121159107324778e-06, 'epoch': 0.44} 44%|████▍ | 1214/2774 [3:59:48<5:05:14, 11.74s/it] 44%|████▍ | 1215/2774 [4:00:01<5:10:53, 11.96s/it] {'loss': 1.0308, 'learning_rate': 3.1183305466710605e-06, 'epoch': 0.44} 44%|████▍ | 1215/2774 [4:00:01<5:10:53, 11.96s/it] 44%|████▍ | 1216/2774 [4:00:12<5:05:03, 11.75s/it] {'loss': 1.0269, 'learning_rate': 3.115501142651829e-06, 'epoch': 0.44} 44%|████▍ | 1216/2774 [4:00:12<5:05:03, 11.75s/it] 44%|████▍ | 1217/2774 [4:00:23<5:03:12, 11.68s/it] {'loss': 1.0254, 'learning_rate': 3.1126708991262205e-06, 'epoch': 0.44} 44%|████▍ | 1217/2774 [4:00:23<5:03:12, 11.68s/it] 44%|████▍ | 1218/2774 [4:00:35<5:04:53, 11.76s/it] {'loss': 1.0059, 'learning_rate': 3.109839819954516e-06, 'epoch': 0.44} 44%|████▍ | 1218/2774 [4:00:35<5:04:53, 11.76s/it] 44%|████▍ | 1219/2774 [4:00:47<5:04:41, 11.76s/it] {'loss': 1.019, 'learning_rate': 3.1070079089981364e-06, 'epoch': 0.44} 44%|████▍ | 1219/2774 [4:00:47<5:04:41, 11.76s/it] 44%|████▍ | 1220/2774 [4:01:01<5:17:26, 12.26s/it] {'loss': 1.0669, 'learning_rate': 3.1041751701196377e-06, 'epoch': 0.44} 44%|████▍ | 1220/2774 [4:01:01<5:17:26, 12.26s/it] 44%|████▍ | 1221/2774 [4:01:14<5:22:50, 12.47s/it] {'loss': 0.9658, 'learning_rate': 3.1013416071827034e-06, 'epoch': 0.44} 44%|████▍ | 1221/2774 [4:01:14<5:22:50, 12.47s/it] 44%|████▍ | 1222/2774 [4:01:25<5:16:26, 12.23s/it] {'loss': 1.0435, 'learning_rate': 3.0985072240521434e-06, 'epoch': 0.44} 44%|████▍ | 1222/2774 [4:01:25<5:16:26, 12.23s/it] 44%|████▍ | 1223/2774 [4:01:39<5:26:46, 12.64s/it] {'loss': 0.9946, 'learning_rate': 3.0956720245938845e-06, 'epoch': 
0.44} 44%|████▍ | 1223/2774 [4:01:39<5:26:46, 12.64s/it] 44%|████▍ | 1224/2774 [4:01:50<5:16:44, 12.26s/it] {'loss': 1.0352, 'learning_rate': 3.092836012674968e-06, 'epoch': 0.44} 44%|████▍ | 1224/2774 [4:01:50<5:16:44, 12.26s/it] 44%|████▍ | 1225/2774 [4:02:02<5:17:05, 12.28s/it] {'loss': 1.1304, 'learning_rate': 3.089999192163542e-06, 'epoch': 0.44} 44%|████▍ | 1225/2774 [4:02:02<5:17:05, 12.28s/it] 44%|████▍ | 1226/2774 [4:02:14<5:12:31, 12.11s/it] {'loss': 0.9897, 'learning_rate': 3.0871615669288584e-06, 'epoch': 0.44} 44%|████▍ | 1226/2774 [4:02:14<5:12:31, 12.11s/it] 44%|████▍ | 1227/2774 [4:02:27<5:16:49, 12.29s/it] {'loss': 1.0098, 'learning_rate': 3.0843231408412675e-06, 'epoch': 0.44} 44%|████▍ | 1227/2774 [4:02:27<5:16:49, 12.29s/it] 44%|████▍ | 1228/2774 [4:02:38<5:08:28, 11.97s/it] {'loss': 1.0762, 'learning_rate': 3.0814839177722108e-06, 'epoch': 0.44} 44%|████▍ | 1228/2774 [4:02:38<5:08:28, 11.97s/it] 44%|████▍ | 1229/2774 [4:02:50<5:11:08, 12.08s/it] {'loss': 0.9956, 'learning_rate': 3.078643901594216e-06, 'epoch': 0.44} 44%|████▍ | 1229/2774 [4:02:50<5:11:08, 12.08s/it] 44%|████▍ | 1230/2774 [4:03:02<5:03:50, 11.81s/it] {'loss': 1.0835, 'learning_rate': 3.0758030961808954e-06, 'epoch': 0.44} 44%|████▍ | 1230/2774 [4:03:02<5:03:50, 11.81s/it] 44%|████▍ | 1231/2774 [4:03:13<5:01:24, 11.72s/it] {'loss': 1.0317, 'learning_rate': 3.0729615054069338e-06, 'epoch': 0.44} 44%|████▍ | 1231/2774 [4:03:13<5:01:24, 11.72s/it] 44%|████▍ | 1232/2774 [4:03:25<4:59:07, 11.64s/it] {'loss': 1.0166, 'learning_rate': 3.0701191331480905e-06, 'epoch': 0.44} 44%|████▍ | 1232/2774 [4:03:25<4:59:07, 11.64s/it] 44%|████▍ | 1233/2774 [4:03:36<4:55:31, 11.51s/it] {'loss': 1.0029, 'learning_rate': 3.0672759832811904e-06, 'epoch': 0.44} 44%|████▍ | 1233/2774 [4:03:36<4:55:31, 11.51s/it] 44%|████▍ | 1234/2774 [4:03:47<4:53:43, 11.44s/it] {'loss': 1.0884, 'learning_rate': 3.064432059684117e-06, 'epoch': 0.44} 44%|████▍ | 1234/2774 [4:03:47<4:53:43, 11.44s/it] 45%|████▍ | 1235/2774 
[4:03:59<4:55:01, 11.50s/it] {'loss': 1.0776, 'learning_rate': 3.06158736623581e-06, 'epoch': 0.45} 45%|████▍ | 1235/2774 [4:03:59<4:55:01, 11.50s/it] 45%|████▍ | 1236/2774 [4:04:11<4:57:25, 11.60s/it] {'loss': 1.0776, 'learning_rate': 3.0587419068162605e-06, 'epoch': 0.45} 45%|████▍ | 1236/2774 [4:04:11<4:57:25, 11.60s/it] 45%|████▍ | 1237/2774 [4:04:22<4:55:17, 11.53s/it] {'loss': 1.0825, 'learning_rate': 3.0558956853065024e-06, 'epoch': 0.45} 45%|████▍ | 1237/2774 [4:04:22<4:55:17, 11.53s/it] 45%|████▍ | 1238/2774 [4:04:33<4:54:43, 11.51s/it] {'loss': 1.0132, 'learning_rate': 3.053048705588611e-06, 'epoch': 0.45} 45%|████▍ | 1238/2774 [4:04:33<4:54:43, 11.51s/it] 45%|████▍ | 1239/2774 [4:04:45<4:53:39, 11.48s/it] {'loss': 1.0645, 'learning_rate': 3.050200971545693e-06, 'epoch': 0.45} 45%|████▍ | 1239/2774 [4:04:45<4:53:39, 11.48s/it] 45%|████▍ | 1240/2774 [4:04:56<4:53:32, 11.48s/it] {'loss': 0.9956, 'learning_rate': 3.047352487061887e-06, 'epoch': 0.45} 45%|████▍ | 1240/2774 [4:04:56<4:53:32, 11.48s/it] 45%|████▍ | 1241/2774 [4:05:08<4:52:07, 11.43s/it] {'loss': 1.1069, 'learning_rate': 3.044503256022353e-06, 'epoch': 0.45} 45%|████▍ | 1241/2774 [4:05:08<4:52:07, 11.43s/it] 45%|████▍ | 1242/2774 [4:05:19<4:52:14, 11.45s/it] {'loss': 1.0874, 'learning_rate': 3.041653282313271e-06, 'epoch': 0.45} 45%|████▍ | 1242/2774 [4:05:19<4:52:14, 11.45s/it] 45%|████▍ | 1243/2774 [4:05:31<4:53:50, 11.52s/it] {'loss': 1.0337, 'learning_rate': 3.0388025698218315e-06, 'epoch': 0.45} 45%|████▍ | 1243/2774 [4:05:31<4:53:50, 11.52s/it] 45%|████▍ | 1244/2774 [4:05:42<4:53:06, 11.49s/it] {'loss': 0.981, 'learning_rate': 3.0359511224362353e-06, 'epoch': 0.45} 45%|████▍ | 1244/2774 [4:05:42<4:53:06, 11.49s/it] 45%|████▍ | 1245/2774 [4:05:54<4:54:05, 11.54s/it] {'loss': 1.0, 'learning_rate': 3.0330989440456837e-06, 'epoch': 0.45} 45%|████▍ | 1245/2774 [4:05:54<4:54:05, 11.54s/it] 45%|████▍ | 1246/2774 [4:06:07<5:09:14, 12.14s/it] {'loss': 0.9878, 'learning_rate': 
3.0302460385403763e-06, 'epoch': 0.45} 45%|████▍ | 1246/2774 [4:06:07<5:09:14, 12.14s/it] 45%|████▍ | 1247/2774 [4:06:20<5:08:58, 12.14s/it] {'loss': 1.0156, 'learning_rate': 3.0273924098115045e-06, 'epoch': 0.45} 45%|████▍ | 1247/2774 [4:06:20<5:08:58, 12.14s/it] 45%|████▍ | 1248/2774 [4:06:31<5:01:43, 11.86s/it] {'loss': 1.0747, 'learning_rate': 3.024538061751243e-06, 'epoch': 0.45} 45%|████▍ | 1248/2774 [4:06:31<5:01:43, 11.86s/it] 45%|████▌ | 1249/2774 [4:06:44<5:11:30, 12.26s/it] {'loss': 0.9688, 'learning_rate': 3.021682998252753e-06, 'epoch': 0.45} 45%|████▌ | 1249/2774 [4:06:44<5:11:30, 12.26s/it] 45%|████▌ | 1250/2774 [4:06:55<5:03:55, 11.97s/it] {'loss': 1.1143, 'learning_rate': 3.0188272232101666e-06, 'epoch': 0.45} 45%|████▌ | 1250/2774 [4:06:55<5:03:55, 11.97s/it] 45%|████▌ | 1251/2774 [4:07:06<4:58:15, 11.75s/it] {'loss': 1.0503, 'learning_rate': 3.01597074051859e-06, 'epoch': 0.45} 45%|████▌ | 1251/2774 [4:07:06<4:58:15, 11.75s/it] 45%|████▌ | 1252/2774 [4:07:18<4:56:05, 11.67s/it] {'loss': 0.9985, 'learning_rate': 3.0131135540740915e-06, 'epoch': 0.45} 45%|████▌ | 1252/2774 [4:07:18<4:56:05, 11.67s/it] 45%|████▌ | 1253/2774 [4:07:30<5:01:13, 11.88s/it] {'loss': 0.9858, 'learning_rate': 3.0102556677737024e-06, 'epoch': 0.45} 45%|████▌ | 1253/2774 [4:07:30<5:01:13, 11.88s/it] 45%|████▌ | 1254/2774 [4:07:42<4:57:16, 11.73s/it] {'loss': 1.0024, 'learning_rate': 3.0073970855154057e-06, 'epoch': 0.45} 45%|████▌ | 1254/2774 [4:07:42<4:57:16, 11.73s/it] 45%|████▌ | 1255/2774 [4:07:54<5:04:23, 12.02s/it] {'loss': 0.9883, 'learning_rate': 3.004537811198135e-06, 'epoch': 0.45} 45%|████▌ | 1255/2774 [4:07:54<5:04:23, 12.02s/it] 45%|████▌ | 1256/2774 [4:08:06<5:00:16, 11.87s/it] {'loss': 1.0713, 'learning_rate': 3.0016778487217683e-06, 'epoch': 0.45} 45%|████▌ | 1256/2774 [4:08:06<5:00:16, 11.87s/it] 45%|████▌ | 1257/2774 [4:08:18<4:59:35, 11.85s/it] {'loss': 0.9922, 'learning_rate': 2.9988172019871216e-06, 'epoch': 0.45} 45%|████▌ | 1257/2774 [4:08:18<4:59:35, 
11.85s/it] 45%|████▌ | 1258/2774 [4:08:29<4:54:34, 11.66s/it] {'loss': 1.0132, 'learning_rate': 2.995955874895944e-06, 'epoch': 0.45} 45%|████▌ | 1258/2774 [4:08:29<4:54:34, 11.66s/it] 45%|████▌ | 1259/2774 [4:08:41<4:53:57, 11.64s/it] {'loss': 1.0488, 'learning_rate': 2.9930938713509127e-06, 'epoch': 0.45} 45%|████▌ | 1259/2774 [4:08:41<4:53:57, 11.64s/it] 45%|████▌ | 1260/2774 [4:08:52<4:53:54, 11.65s/it] {'loss': 1.0405, 'learning_rate': 2.9902311952556286e-06, 'epoch': 0.45} 45%|████▌ | 1260/2774 [4:08:52<4:53:54, 11.65s/it] 45%|████▌ | 1261/2774 [4:09:05<5:02:27, 11.99s/it] {'loss': 0.979, 'learning_rate': 2.9873678505146077e-06, 'epoch': 0.45} 45%|████▌ | 1261/2774 [4:09:05<5:02:27, 11.99s/it] 45%|████▌ | 1262/2774 [4:09:18<5:10:33, 12.32s/it] {'loss': 1.061, 'learning_rate': 2.9845038410332793e-06, 'epoch': 0.45} 45%|████▌ | 1262/2774 [4:09:18<5:10:33, 12.32s/it] 46%|████▌ | 1263/2774 [4:09:29<5:02:45, 12.02s/it] {'loss': 1.0059, 'learning_rate': 2.9816391707179802e-06, 'epoch': 0.46} 46%|████▌ | 1263/2774 [4:09:29<5:02:45, 12.02s/it] 46%|████▌ | 1264/2774 [4:09:41<4:59:43, 11.91s/it] {'loss': 1.0088, 'learning_rate': 2.9787738434759472e-06, 'epoch': 0.46} 46%|████▌ | 1264/2774 [4:09:41<4:59:43, 11.91s/it] 46%|████▌ | 1265/2774 [4:09:53<4:59:13, 11.90s/it] {'loss': 1.0557, 'learning_rate': 2.9759078632153145e-06, 'epoch': 0.46} 46%|████▌ | 1265/2774 [4:09:53<4:59:13, 11.90s/it] 46%|████▌ | 1266/2774 [4:10:04<4:55:09, 11.74s/it] {'loss': 1.0107, 'learning_rate': 2.9730412338451044e-06, 'epoch': 0.46} 46%|████▌ | 1266/2774 [4:10:04<4:55:09, 11.74s/it] 46%|████▌ | 1267/2774 [4:10:16<4:54:18, 11.72s/it] {'loss': 1.0137, 'learning_rate': 2.9701739592752265e-06, 'epoch': 0.46} 46%|████▌ | 1267/2774 [4:10:16<4:54:18, 11.72s/it] 46%|████▌ | 1268/2774 [4:10:28<4:57:31, 11.85s/it] {'loss': 1.0244, 'learning_rate': 2.9673060434164712e-06, 'epoch': 0.46} 46%|████▌ | 1268/2774 [4:10:28<4:57:31, 11.85s/it] 46%|████▌ | 1269/2774 [4:10:39<4:53:03, 11.68s/it] {'loss': 
1.0547, 'learning_rate': 2.9644374901805025e-06, 'epoch': 0.46} 46%|████▌ | 1269/2774 [4:10:39<4:53:03, 11.68s/it] 46%|████▌ | 1270/2774 [4:10:53<5:05:04, 12.17s/it] {'loss': 0.9863, 'learning_rate': 2.9615683034798514e-06, 'epoch': 0.46} 46%|████▌ | 1270/2774 [4:10:53<5:05:04, 12.17s/it] 46%|████▌ | 1271/2774 [4:11:05<5:03:30, 12.12s/it] {'loss': 0.9966, 'learning_rate': 2.9586984872279178e-06, 'epoch': 0.46} 46%|████▌ | 1271/2774 [4:11:05<5:03:30, 12.12s/it] 46%|████▌ | 1272/2774 [4:11:18<5:08:22, 12.32s/it] {'loss': 1.0317, 'learning_rate': 2.955828045338957e-06, 'epoch': 0.46} 46%|████▌ | 1272/2774 [4:11:18<5:08:22, 12.32s/it] 46%|████▌ | 1273/2774 [4:11:29<5:03:43, 12.14s/it] {'loss': 1.0034, 'learning_rate': 2.952956981728078e-06, 'epoch': 0.46} 46%|████▌ | 1273/2774 [4:11:29<5:03:43, 12.14s/it] 46%|████▌ | 1274/2774 [4:11:42<5:05:01, 12.20s/it] {'loss': 1.0479, 'learning_rate': 2.9500853003112384e-06, 'epoch': 0.46} 46%|████▌ | 1274/2774 [4:11:42<5:05:01, 12.20s/it] 46%|████▌ | 1275/2774 [4:11:53<4:58:29, 11.95s/it] {'loss': 1.0254, 'learning_rate': 2.9472130050052385e-06, 'epoch': 0.46} 46%|████▌ | 1275/2774 [4:11:53<4:58:29, 11.95s/it] 46%|████▌ | 1276/2774 [4:12:06<5:08:15, 12.35s/it] {'loss': 1.0239, 'learning_rate': 2.944340099727715e-06, 'epoch': 0.46} 46%|████▌ | 1276/2774 [4:12:06<5:08:15, 12.35s/it] 46%|████▌ | 1277/2774 [4:12:18<5:01:24, 12.08s/it] {'loss': 1.0601, 'learning_rate': 2.9414665883971365e-06, 'epoch': 0.46} 46%|████▌ | 1277/2774 [4:12:18<5:01:24, 12.08s/it] 46%|████▌ | 1278/2774 [4:12:29<4:54:42, 11.82s/it] {'loss': 1.0093, 'learning_rate': 2.938592474932801e-06, 'epoch': 0.46} 46%|████▌ | 1278/2774 [4:12:29<4:54:42, 11.82s/it] 46%|████▌ | 1279/2774 [4:12:40<4:50:59, 11.68s/it] {'loss': 1.0312, 'learning_rate': 2.9357177632548234e-06, 'epoch': 0.46} 46%|████▌ | 1279/2774 [4:12:40<4:50:59, 11.68s/it] 46%|████▌ | 1280/2774 [4:12:52<4:50:45, 11.68s/it] {'loss': 1.042, 'learning_rate': 2.9328424572841375e-06, 'epoch': 0.46} 46%|████▌ | 
1280/2774 [4:12:52<4:50:45, 11.68s/it] 46%|████▌ | 1281/2774 [4:13:03<4:48:27, 11.59s/it] {'loss': 1.0854, 'learning_rate': 2.929966560942487e-06, 'epoch': 0.46} 46%|████▌ | 1281/2774 [4:13:03<4:48:27, 11.59s/it] 46%|████▌ | 1282/2774 [4:13:14<4:44:34, 11.44s/it] {'loss': 1.0332, 'learning_rate': 2.9270900781524216e-06, 'epoch': 0.46} 46%|████▌ | 1282/2774 [4:13:14<4:44:34, 11.44s/it] 46%|████▋ | 1283/2774 [4:13:26<4:43:47, 11.42s/it] {'loss': 1.0566, 'learning_rate': 2.9242130128372896e-06, 'epoch': 0.46} 46%|████▋ | 1283/2774 [4:13:26<4:43:47, 11.42s/it] 46%|████▋ | 1284/2774 [4:13:37<4:45:20, 11.49s/it] {'loss': 1.0054, 'learning_rate': 2.9213353689212337e-06, 'epoch': 0.46} 46%|████▋ | 1284/2774 [4:13:37<4:45:20, 11.49s/it] 46%|████▋ | 1285/2774 [4:13:49<4:47:00, 11.57s/it] {'loss': 1.0156, 'learning_rate': 2.9184571503291865e-06, 'epoch': 0.46} 46%|████▋ | 1285/2774 [4:13:49<4:47:00, 11.57s/it] 46%|████▋ | 1286/2774 [4:14:00<4:43:58, 11.45s/it] {'loss': 1.1016, 'learning_rate': 2.915578360986865e-06, 'epoch': 0.46} 46%|████▋ | 1286/2774 [4:14:00<4:43:58, 11.45s/it] 46%|████▋ | 1287/2774 [4:14:12<4:42:54, 11.41s/it] {'loss': 0.9805, 'learning_rate': 2.9126990048207633e-06, 'epoch': 0.46} 46%|████▋ | 1287/2774 [4:14:12<4:42:54, 11.41s/it] 46%|████▋ | 1288/2774 [4:14:23<4:41:42, 11.37s/it] {'loss': 1.0142, 'learning_rate': 2.9098190857581493e-06, 'epoch': 0.46} 46%|████▋ | 1288/2774 [4:14:23<4:41:42, 11.37s/it] 46%|████▋ | 1289/2774 [4:14:35<4:43:22, 11.45s/it] {'loss': 1.0479, 'learning_rate': 2.906938607727059e-06, 'epoch': 0.46} 46%|████▋ | 1289/2774 [4:14:35<4:43:22, 11.45s/it] 47%|████▋ | 1290/2774 [4:14:46<4:45:42, 11.55s/it] {'loss': 1.0278, 'learning_rate': 2.90405757465629e-06, 'epoch': 0.47} 47%|████▋ | 1290/2774 [4:14:46<4:45:42, 11.55s/it] 47%|████▋ | 1291/2774 [4:14:59<4:50:52, 11.77s/it] {'loss': 1.0518, 'learning_rate': 2.901175990475398e-06, 'epoch': 0.47} 47%|████▋ | 1291/2774 [4:14:59<4:50:52, 11.77s/it] 47%|████▋ | 1292/2774 [4:15:11<4:51:57, 
11.82s/it] {'loss': 1.0542, 'learning_rate': 2.8982938591146892e-06, 'epoch': 0.47} 47%|████▋ | 1292/2774 [4:15:11<4:51:57, 11.82s/it] 47%|████▋ | 1293/2774 [4:15:22<4:49:18, 11.72s/it] {'loss': 1.0151, 'learning_rate': 2.895411184505217e-06, 'epoch': 0.47} 47%|████▋ | 1293/2774 [4:15:22<4:49:18, 11.72s/it] 47%|████▋ | 1294/2774 [4:15:36<5:04:57, 12.36s/it] {'loss': 0.9927, 'learning_rate': 2.892527970578775e-06, 'epoch': 0.47} 47%|████▋ | 1294/2774 [4:15:36<5:04:57, 12.36s/it] 47%|████▋ | 1295/2774 [4:15:48<4:58:49, 12.12s/it] {'loss': 1.0298, 'learning_rate': 2.8896442212678933e-06, 'epoch': 0.47} 47%|████▋ | 1295/2774 [4:15:48<4:58:49, 12.12s/it] 47%|████▋ | 1296/2774 [4:15:59<4:54:32, 11.96s/it] {'loss': 0.9819, 'learning_rate': 2.8867599405058315e-06, 'epoch': 0.47} 47%|████▋ | 1296/2774 [4:15:59<4:54:32, 11.96s/it] 47%|████▋ | 1297/2774 [4:16:11<4:51:38, 11.85s/it] {'loss': 1.0771, 'learning_rate': 2.8838751322265746e-06, 'epoch': 0.47} 47%|████▋ | 1297/2774 [4:16:11<4:51:38, 11.85s/it] 47%|████▋ | 1298/2774 [4:16:22<4:49:55, 11.79s/it] {'loss': 1.0503, 'learning_rate': 2.880989800364826e-06, 'epoch': 0.47} 47%|████▋ | 1298/2774 [4:16:22<4:49:55, 11.79s/it] 47%|████▋ | 1299/2774 [4:16:34<4:46:13, 11.64s/it] {'loss': 1.0527, 'learning_rate': 2.8781039488560055e-06, 'epoch': 0.47} 47%|████▋ | 1299/2774 [4:16:34<4:46:13, 11.64s/it] 47%|████▋ | 1300/2774 [4:16:45<4:45:35, 11.63s/it] {'loss': 1.0547, 'learning_rate': 2.8752175816362384e-06, 'epoch': 0.47} 47%|████▋ | 1300/2774 [4:16:45<4:45:35, 11.63s/it] 47%|████▋ | 1301/2774 [4:16:57<4:43:52, 11.56s/it] {'loss': 1.0244, 'learning_rate': 2.8723307026423565e-06, 'epoch': 0.47} 47%|████▋ | 1301/2774 [4:16:57<4:43:52, 11.56s/it] 47%|████▋ | 1302/2774 [4:17:08<4:41:28, 11.47s/it] {'loss': 1.0107, 'learning_rate': 2.869443315811889e-06, 'epoch': 0.47} 47%|████▋ | 1302/2774 [4:17:08<4:41:28, 11.47s/it] 47%|████▋ | 1303/2774 [4:17:20<4:42:27, 11.52s/it] {'loss': 1.0742, 'learning_rate': 2.866555425083055e-06, 'epoch': 
0.47} 47%|████▋ | 1303/2774 [4:17:20<4:42:27, 11.52s/it] 47%|████▋ | 1304/2774 [4:17:31<4:41:38, 11.50s/it] {'loss': 1.0879, 'learning_rate': 2.8636670343947646e-06, 'epoch': 0.47} 47%|████▋ | 1304/2774 [4:17:31<4:41:38, 11.50s/it] 47%|████▋ | 1305/2774 [4:17:43<4:46:10, 11.69s/it] {'loss': 1.0054, 'learning_rate': 2.860778147686608e-06, 'epoch': 0.47} 47%|████▋ | 1305/2774 [4:17:43<4:46:10, 11.69s/it] 47%|████▋ | 1306/2774 [4:17:54<4:43:03, 11.57s/it] {'loss': 1.0205, 'learning_rate': 2.857888768898852e-06, 'epoch': 0.47} 47%|████▋ | 1306/2774 [4:17:54<4:43:03, 11.57s/it] 47%|████▋ | 1307/2774 [4:18:06<4:41:42, 11.52s/it] {'loss': 1.062, 'learning_rate': 2.8549989019724344e-06, 'epoch': 0.47} 47%|████▋ | 1307/2774 [4:18:06<4:41:42, 11.52s/it] 47%|████▋ | 1308/2774 [4:18:17<4:40:04, 11.46s/it] {'loss': 1.0127, 'learning_rate': 2.852108550848959e-06, 'epoch': 0.47} 47%|████▋ | 1308/2774 [4:18:17<4:40:04, 11.46s/it] 47%|████▋ | 1309/2774 [4:18:29<4:40:13, 11.48s/it] {'loss': 1.1143, 'learning_rate': 2.849217719470691e-06, 'epoch': 0.47} 47%|████▋ | 1309/2774 [4:18:29<4:40:13, 11.48s/it] 47%|████▋ | 1310/2774 [4:18:40<4:39:55, 11.47s/it] {'loss': 1.04, 'learning_rate': 2.84632641178055e-06, 'epoch': 0.47} 47%|████▋ | 1310/2774 [4:18:40<4:39:55, 11.47s/it] 47%|████▋ | 1311/2774 [4:18:51<4:38:32, 11.42s/it] {'loss': 1.0347, 'learning_rate': 2.8434346317221033e-06, 'epoch': 0.47} 47%|████▋ | 1311/2774 [4:18:51<4:38:32, 11.42s/it] 47%|████▋ | 1312/2774 [4:19:03<4:39:14, 11.46s/it] {'loss': 0.9863, 'learning_rate': 2.840542383239565e-06, 'epoch': 0.47} 47%|████▋ | 1312/2774 [4:19:03<4:39:14, 11.46s/it] 47%|████▋ | 1313/2774 [4:19:15<4:45:24, 11.72s/it] {'loss': 1.0469, 'learning_rate': 2.8376496702777884e-06, 'epoch': 0.47} 47%|████▋ | 1313/2774 [4:19:15<4:45:24, 11.72s/it] 47%|████▋ | 1314/2774 [4:19:27<4:43:33, 11.65s/it] {'loss': 1.0361, 'learning_rate': 2.8347564967822583e-06, 'epoch': 0.47} 47%|████▋ | 1314/2774 [4:19:27<4:43:33, 11.65s/it] 47%|████▋ | 1315/2774 
[4:19:38<4:41:50, 11.59s/it] {'loss': 1.0122, 'learning_rate': 2.831862866699089e-06, 'epoch': 0.47} 47%|████▋ | 1315/2774 [4:19:38<4:41:50, 11.59s/it] 47%|████▋ | 1316/2774 [4:19:50<4:41:59, 11.60s/it] {'loss': 1.0396, 'learning_rate': 2.8289687839750157e-06, 'epoch': 0.47} 47%|████▋ | 1316/2774 [4:19:50<4:41:59, 11.60s/it] 47%|████▋ | 1317/2774 [4:20:01<4:41:52, 11.61s/it] {'loss': 1.02, 'learning_rate': 2.8260742525573944e-06, 'epoch': 0.47} 47%|████▋ | 1317/2774 [4:20:01<4:41:52, 11.61s/it] 48%|████▊ | 1318/2774 [4:20:13<4:39:48, 11.53s/it] {'loss': 0.9785, 'learning_rate': 2.8231792763941894e-06, 'epoch': 0.48} 48%|████▊ | 1318/2774 [4:20:13<4:39:48, 11.53s/it] 48%|████▊ | 1319/2774 [4:20:26<4:49:21, 11.93s/it] {'loss': 1.0073, 'learning_rate': 2.8202838594339756e-06, 'epoch': 0.48} 48%|████▊ | 1319/2774 [4:20:26<4:49:21, 11.93s/it] 48%|████▊ | 1320/2774 [4:20:37<4:47:31, 11.86s/it] {'loss': 1.0801, 'learning_rate': 2.817388005625924e-06, 'epoch': 0.48} 48%|████▊ | 1320/2774 [4:20:37<4:47:31, 11.86s/it] 48%|████▊ | 1321/2774 [4:20:49<4:47:58, 11.89s/it] {'loss': 0.9878, 'learning_rate': 2.8144917189198055e-06, 'epoch': 0.48} 48%|████▊ | 1321/2774 [4:20:49<4:47:58, 11.89s/it] 48%|████▊ | 1322/2774 [4:21:01<4:44:08, 11.74s/it] {'loss': 1.0576, 'learning_rate': 2.811595003265981e-06, 'epoch': 0.48} 48%|████▊ | 1322/2774 [4:21:01<4:44:08, 11.74s/it] 48%|████▊ | 1323/2774 [4:21:12<4:40:49, 11.61s/it] {'loss': 1.0518, 'learning_rate': 2.808697862615395e-06, 'epoch': 0.48} 48%|████▊ | 1323/2774 [4:21:12<4:40:49, 11.61s/it] 48%|████▊ | 1324/2774 [4:21:24<4:40:27, 11.61s/it] {'loss': 1.0015, 'learning_rate': 2.805800300919572e-06, 'epoch': 0.48} 48%|████▊ | 1324/2774 [4:21:24<4:40:27, 11.61s/it] 48%|████▊ | 1325/2774 [4:21:35<4:36:51, 11.46s/it] {'loss': 0.9849, 'learning_rate': 2.8029023221306117e-06, 'epoch': 0.48} 48%|████▊ | 1325/2774 [4:21:35<4:36:51, 11.46s/it] 48%|████▊ | 1326/2774 [4:21:49<4:57:57, 12.35s/it] {'loss': 1.0156, 'learning_rate': 
2.8000039302011817e-06, 'epoch': 0.48} 48%|████▊ | 1326/2774 [4:21:49<4:57:57, 12.35s/it] 48%|████▊ | 1327/2774 [4:22:01<4:51:31, 12.09s/it] {'loss': 1.0195, 'learning_rate': 2.7971051290845137e-06, 'epoch': 0.48} 48%|████▊ | 1327/2774 [4:22:01<4:51:31, 12.09s/it] 48%|████▊ | 1328/2774 [4:22:12<4:46:31, 11.89s/it] {'loss': 1.0244, 'learning_rate': 2.7942059227343974e-06, 'epoch': 0.48} 48%|████▊ | 1328/2774 [4:22:12<4:46:31, 11.89s/it] 48%|████▊ | 1329/2774 [4:22:24<4:43:16, 11.76s/it] {'loss': 1.021, 'learning_rate': 2.7913063151051744e-06, 'epoch': 0.48} 48%|████▊ | 1329/2774 [4:22:24<4:43:16, 11.76s/it] 48%|████▊ | 1330/2774 [4:22:35<4:41:28, 11.70s/it] {'loss': 1.0317, 'learning_rate': 2.7884063101517354e-06, 'epoch': 0.48} 48%|████▊ | 1330/2774 [4:22:35<4:41:28, 11.70s/it] 48%|████▊ | 1331/2774 [4:22:47<4:43:06, 11.77s/it] {'loss': 1.0361, 'learning_rate': 2.7855059118295114e-06, 'epoch': 0.48} 48%|████▊ | 1331/2774 [4:22:47<4:43:06, 11.77s/it] 48%|████▊ | 1332/2774 [4:22:59<4:41:59, 11.73s/it] {'loss': 1.0029, 'learning_rate': 2.7826051240944706e-06, 'epoch': 0.48} 48%|████▊ | 1332/2774 [4:22:59<4:41:59, 11.73s/it] 48%|████▊ | 1333/2774 [4:23:10<4:40:35, 11.68s/it] {'loss': 1.083, 'learning_rate': 2.779703950903112e-06, 'epoch': 0.48} 48%|████▊ | 1333/2774 [4:23:10<4:40:35, 11.68s/it] 48%|████▊ | 1334/2774 [4:23:23<4:47:03, 11.96s/it] {'loss': 1.0205, 'learning_rate': 2.7768023962124613e-06, 'epoch': 0.48} 48%|████▊ | 1334/2774 [4:23:23<4:47:03, 11.96s/it] 48%|████▊ | 1335/2774 [4:23:34<4:42:36, 11.78s/it] {'loss': 1.0757, 'learning_rate': 2.7739004639800628e-06, 'epoch': 0.48} 48%|████▊ | 1335/2774 [4:23:34<4:42:36, 11.78s/it] 48%|████▊ | 1336/2774 [4:23:46<4:42:15, 11.78s/it] {'loss': 0.9976, 'learning_rate': 2.7709981581639772e-06, 'epoch': 0.48} 48%|████▊ | 1336/2774 [4:23:46<4:42:15, 11.78s/it] 48%|████▊ | 1337/2774 [4:23:58<4:40:48, 11.72s/it] {'loss': 1.0337, 'learning_rate': 2.768095482722775e-06, 'epoch': 0.48} 48%|████▊ | 1337/2774 [4:23:58<4:40:48, 
11.72s/it] 48%|████▊ | 1338/2774 [4:24:11<4:54:47, 12.32s/it] {'loss': 1.0156, 'learning_rate': 2.7651924416155298e-06, 'epoch': 0.48} 48%|████▊ | 1338/2774 [4:24:11<4:54:47, 12.32s/it] 48%|████▊ | 1339/2774 [4:24:23<4:48:41, 12.07s/it] {'loss': 1.0645, 'learning_rate': 2.7622890388018133e-06, 'epoch': 0.48} 48%|████▊ | 1339/2774 [4:24:23<4:48:41, 12.07s/it] 48%|████▊ | 1340/2774 [4:24:35<4:46:22, 11.98s/it] {'loss': 1.061, 'learning_rate': 2.7593852782416923e-06, 'epoch': 0.48} 48%|████▊ | 1340/2774 [4:24:35<4:46:22, 11.98s/it] 48%|████▊ | 1341/2774 [4:24:46<4:40:55, 11.76s/it] {'loss': 1.0259, 'learning_rate': 2.756481163895722e-06, 'epoch': 0.48} 48%|████▊ | 1341/2774 [4:24:46<4:40:55, 11.76s/it] 48%|████▊ | 1342/2774 [4:24:59<4:49:05, 12.11s/it] {'loss': 1.0225, 'learning_rate': 2.753576699724936e-06, 'epoch': 0.48} 48%|████▊ | 1342/2774 [4:24:59<4:49:05, 12.11s/it] 48%|████▊ | 1343/2774 [4:25:10<4:43:27, 11.89s/it] {'loss': 1.0308, 'learning_rate': 2.750671889690851e-06, 'epoch': 0.48} 48%|████▊ | 1343/2774 [4:25:10<4:43:27, 11.89s/it] 48%|████▊ | 1344/2774 [4:25:21<4:37:44, 11.65s/it] {'loss': 1.0288, 'learning_rate': 2.7477667377554506e-06, 'epoch': 0.48} 48%|████▊ | 1344/2774 [4:25:21<4:37:44, 11.65s/it] 48%|████▊ | 1345/2774 [4:25:32<4:34:44, 11.54s/it] {'loss': 1.0298, 'learning_rate': 2.7448612478811878e-06, 'epoch': 0.48} 48%|████▊ | 1345/2774 [4:25:32<4:34:44, 11.54s/it] 49%|████▊ | 1346/2774 [4:25:44<4:34:09, 11.52s/it] {'loss': 1.0244, 'learning_rate': 2.7419554240309737e-06, 'epoch': 0.49} 49%|████▊ | 1346/2774 [4:25:44<4:34:09, 11.52s/it] 49%|████▊ | 1347/2774 [4:25:58<4:51:56, 12.28s/it] {'loss': 0.9937, 'learning_rate': 2.739049270168177e-06, 'epoch': 0.49} 49%|████▊ | 1347/2774 [4:25:58<4:51:56, 12.28s/it] 49%|████▊ | 1348/2774 [4:26:12<5:01:23, 12.68s/it] {'loss': 0.9971, 'learning_rate': 2.7361427902566175e-06, 'epoch': 0.49} 49%|████▊ | 1348/2774 [4:26:12<5:01:23, 12.68s/it] 49%|████▊ | 1349/2774 [4:26:23<4:51:40, 12.28s/it] {'loss': 1.0059, 
'learning_rate': 2.7332359882605563e-06, 'epoch': 0.49} 49%|████▊ | 1349/2774 [4:26:23<4:51:40, 12.28s/it] 49%|████▊ | 1350/2774 [4:26:34<4:45:14, 12.02s/it] {'loss': 1.0342, 'learning_rate': 2.7303288681446966e-06, 'epoch': 0.49} 49%|████▊ | 1350/2774 [4:26:34<4:45:14, 12.02s/it] 49%|████▊ | 1351/2774 [4:26:46<4:42:35, 11.92s/it] {'loss': 1.0073, 'learning_rate': 2.727421433874175e-06, 'epoch': 0.49} 49%|████▊ | 1351/2774 [4:26:46<4:42:35, 11.92s/it] 49%|████▊ | 1352/2774 [4:26:57<4:38:18, 11.74s/it] {'loss': 1.0786, 'learning_rate': 2.7245136894145556e-06, 'epoch': 0.49} 49%|████▊ | 1352/2774 [4:26:57<4:38:18, 11.74s/it] 49%|████▉ | 1353/2774 [4:27:09<4:35:44, 11.64s/it] {'loss': 1.0142, 'learning_rate': 2.7216056387318257e-06, 'epoch': 0.49} 49%|████▉ | 1353/2774 [4:27:09<4:35:44, 11.64s/it] 49%|████▉ | 1354/2774 [4:27:20<4:33:37, 11.56s/it] {'loss': 1.0845, 'learning_rate': 2.7186972857923922e-06, 'epoch': 0.49} 49%|████▉ | 1354/2774 [4:27:20<4:33:37, 11.56s/it] 49%|████▉ | 1355/2774 [4:27:32<4:32:27, 11.52s/it] {'loss': 1.0879, 'learning_rate': 2.715788634563072e-06, 'epoch': 0.49} 49%|████▉ | 1355/2774 [4:27:32<4:32:27, 11.52s/it] 49%|████▉ | 1356/2774 [4:27:44<4:35:34, 11.66s/it] {'loss': 1.0654, 'learning_rate': 2.712879689011089e-06, 'epoch': 0.49} 49%|████▉ | 1356/2774 [4:27:44<4:35:34, 11.66s/it] 49%|████▉ | 1357/2774 [4:27:55<4:32:56, 11.56s/it] {'loss': 1.0576, 'learning_rate': 2.70997045310407e-06, 'epoch': 0.49} 49%|████▉ | 1357/2774 [4:27:55<4:32:56, 11.56s/it] 49%|████▉ | 1358/2774 [4:28:06<4:30:59, 11.48s/it] {'loss': 1.0181, 'learning_rate': 2.707060930810037e-06, 'epoch': 0.49} 49%|████▉ | 1358/2774 [4:28:06<4:30:59, 11.48s/it] 49%|████▉ | 1359/2774 [4:28:18<4:31:49, 11.53s/it] {'loss': 1.0396, 'learning_rate': 2.704151126097403e-06, 'epoch': 0.49} 49%|████▉ | 1359/2774 [4:28:18<4:31:49, 11.53s/it] 49%|████▉ | 1360/2774 [4:28:29<4:32:10, 11.55s/it] {'loss': 0.9819, 'learning_rate': 2.7012410429349656e-06, 'epoch': 0.49} 49%|████▉ | 1360/2774 
[4:28:29<4:32:10, 11.55s/it] 49%|████▉ | 1361/2774 [4:28:41<4:29:42, 11.45s/it] {'loss': 1.0513, 'learning_rate': 2.698330685291902e-06, 'epoch': 0.49} 49%|████▉ | 1361/2774 [4:28:41<4:29:42, 11.45s/it] 49%|████▉ | 1362/2774 [4:28:52<4:29:57, 11.47s/it] {'loss': 1.0181, 'learning_rate': 2.695420057137764e-06, 'epoch': 0.49} 49%|████▉ | 1362/2774 [4:28:52<4:29:57, 11.47s/it] 49%|████▉ | 1363/2774 [4:29:04<4:30:24, 11.50s/it] {'loss': 1.0244, 'learning_rate': 2.692509162442473e-06, 'epoch': 0.49} 49%|████▉ | 1363/2774 [4:29:04<4:30:24, 11.50s/it] 49%|████▉ | 1364/2774 [4:29:15<4:28:15, 11.41s/it] {'loss': 1.0537, 'learning_rate': 2.6895980051763145e-06, 'epoch': 0.49} 49%|████▉ | 1364/2774 [4:29:15<4:28:15, 11.41s/it] 49%|████▉ | 1365/2774 [4:29:26<4:27:49, 11.40s/it] {'loss': 1.0776, 'learning_rate': 2.6866865893099298e-06, 'epoch': 0.49} 49%|████▉ | 1365/2774 [4:29:26<4:27:49, 11.40s/it] 49%|████▉ | 1366/2774 [4:29:38<4:26:26, 11.35s/it] {'loss': 0.9785, 'learning_rate': 2.683774918814314e-06, 'epoch': 0.49} 49%|████▉ | 1366/2774 [4:29:38<4:26:26, 11.35s/it] 49%|████▉ | 1367/2774 [4:29:49<4:29:48, 11.51s/it] {'loss': 1.0386, 'learning_rate': 2.6808629976608114e-06, 'epoch': 0.49} 49%|████▉ | 1367/2774 [4:29:49<4:29:48, 11.51s/it] 49%|████▉ | 1368/2774 [4:30:01<4:30:15, 11.53s/it] {'loss': 1.0732, 'learning_rate': 2.6779508298211055e-06, 'epoch': 0.49} 49%|████▉ | 1368/2774 [4:30:01<4:30:15, 11.53s/it] 49%|████▉ | 1369/2774 [4:30:12<4:28:29, 11.47s/it] {'loss': 1.0068, 'learning_rate': 2.6750384192672172e-06, 'epoch': 0.49} 49%|████▉ | 1369/2774 [4:30:12<4:28:29, 11.47s/it] 49%|████▉ | 1370/2774 [4:30:24<4:26:58, 11.41s/it] {'loss': 0.9795, 'learning_rate': 2.6721257699714985e-06, 'epoch': 0.49} 49%|████▉ | 1370/2774 [4:30:24<4:26:58, 11.41s/it] 49%|████▉ | 1371/2774 [4:30:35<4:27:01, 11.42s/it] {'loss': 1.0039, 'learning_rate': 2.6692128859066283e-06, 'epoch': 0.49} 49%|████▉ | 1371/2774 [4:30:35<4:27:01, 11.42s/it] 49%|████▉ | 1372/2774 [4:30:47<4:27:05, 
11.43s/it] {'loss': 1.0806, 'learning_rate': 2.666299771045603e-06, 'epoch': 0.49} 49%|████▉ | 1372/2774 [4:30:47<4:27:05, 11.43s/it] 49%|████▉ | 1373/2774 [4:31:00<4:38:43, 11.94s/it] {'loss': 0.9761, 'learning_rate': 2.663386429361736e-06, 'epoch': 0.49} 49%|████▉ | 1373/2774 [4:31:00<4:38:43, 11.94s/it] 50%|████▉ | 1374/2774 [4:31:12<4:41:19, 12.06s/it] {'loss': 1.0244, 'learning_rate': 2.6604728648286494e-06, 'epoch': 0.5} 50%|████▉ | 1374/2774 [4:31:12<4:41:19, 12.06s/it] 50%|████▉ | 1375/2774 [4:31:24<4:37:21, 11.90s/it] {'loss': 1.0557, 'learning_rate': 2.657559081420269e-06, 'epoch': 0.5} 50%|████▉ | 1375/2774 [4:31:24<4:37:21, 11.90s/it] 50%|████▉ | 1376/2774 [4:31:37<4:47:41, 12.35s/it] {'loss': 0.9741, 'learning_rate': 2.6546450831108187e-06, 'epoch': 0.5} 50%|████▉ | 1376/2774 [4:31:37<4:47:41, 12.35s/it] 50%|████▉ | 1377/2774 [4:31:50<4:51:09, 12.50s/it] {'loss': 1.0015, 'learning_rate': 2.6517308738748178e-06, 'epoch': 0.5} 50%|████▉ | 1377/2774 [4:31:50<4:51:09, 12.50s/it] 50%|████▉ | 1378/2774 [4:32:04<5:03:54, 13.06s/it] {'loss': 0.9722, 'learning_rate': 2.6488164576870706e-06, 'epoch': 0.5} 50%|████▉ | 1378/2774 [4:32:04<5:03:54, 13.06s/it] 50%|████▉ | 1379/2774 [4:32:15<4:50:50, 12.51s/it] {'loss': 0.9961, 'learning_rate': 2.6459018385226643e-06, 'epoch': 0.5} 50%|████▉ | 1379/2774 [4:32:15<4:50:50, 12.51s/it] 50%|████▉ | 1380/2774 [4:32:27<4:42:52, 12.18s/it] {'loss': 1.0605, 'learning_rate': 2.642987020356964e-06, 'epoch': 0.5} 50%|████▉ | 1380/2774 [4:32:27<4:42:52, 12.18s/it] 50%|████▉ | 1381/2774 [4:32:39<4:39:57, 12.06s/it] {'loss': 1.0288, 'learning_rate': 2.640072007165606e-06, 'epoch': 0.5} 50%|████▉ | 1381/2774 [4:32:39<4:39:57, 12.06s/it] 50%|████▉ | 1382/2774 [4:32:50<4:35:50, 11.89s/it] {'loss': 0.9858, 'learning_rate': 2.637156802924492e-06, 'epoch': 0.5} 50%|████▉ | 1382/2774 [4:32:50<4:35:50, 11.89s/it] 50%|████▉ | 1383/2774 [4:33:01<4:31:44, 11.72s/it] {'loss': 1.019, 'learning_rate': 2.6342414116097838e-06, 'epoch': 0.5} 
50%|████▉ | 1383/2774 [4:33:01<4:31:44, 11.72s/it] 50%|████▉ | 1384/2774 [4:33:13<4:29:12, 11.62s/it] {'loss': 1.0024, 'learning_rate': 2.6313258371978996e-06, 'epoch': 0.5} 50%|████▉ | 1384/2774 [4:33:13<4:29:12, 11.62s/it] 50%|████▉ | 1385/2774 [4:33:31<5:14:40, 13.59s/it] {'loss': 1.0571, 'learning_rate': 2.628410083665506e-06, 'epoch': 0.5} 50%|████▉ | 1385/2774 [4:33:31<5:14:40, 13.59s/it] 50%|████▉ | 1386/2774 [4:33:44<5:11:16, 13.46s/it] {'loss': 1.0059, 'learning_rate': 2.6254941549895156e-06, 'epoch': 0.5} 50%|████▉ | 1386/2774 [4:33:44<5:11:16, 13.46s/it] 50%|█████ | 1387/2774 [4:33:56<5:00:28, 13.00s/it] {'loss': 1.0396, 'learning_rate': 2.62257805514708e-06, 'epoch': 0.5} 50%|█████ | 1387/2774 [4:33:56<5:00:28, 13.00s/it] 50%|█████ | 1388/2774 [4:34:09<5:03:16, 13.13s/it] {'loss': 1.0405, 'learning_rate': 2.61966178811558e-06, 'epoch': 0.5} 50%|█████ | 1388/2774 [4:34:09<5:03:16, 13.13s/it] 50%|█████ | 1389/2774 [4:34:23<5:03:21, 13.14s/it] {'loss': 1.0308, 'learning_rate': 2.6167453578726303e-06, 'epoch': 0.5} 50%|█████ | 1389/2774 [4:34:23<5:03:21, 13.14s/it] 50%|█████ | 1390/2774 [4:34:35<4:55:24, 12.81s/it] {'loss': 1.0508, 'learning_rate': 2.613828768396065e-06, 'epoch': 0.5} 50%|█████ | 1390/2774 [4:34:35<4:55:24, 12.81s/it] 50%|█████ | 1391/2774 [4:34:46<4:44:45, 12.35s/it] {'loss': 1.0996, 'learning_rate': 2.610912023663936e-06, 'epoch': 0.5} 50%|█████ | 1391/2774 [4:34:46<4:44:45, 12.35s/it] 50%|█████ | 1392/2774 [4:35:00<4:53:21, 12.74s/it] {'loss': 1.0029, 'learning_rate': 2.6079951276545067e-06, 'epoch': 0.5} 50%|█████ | 1392/2774 [4:35:00<4:53:21, 12.74s/it] 50%|█████ | 1393/2774 [4:35:12<4:49:43, 12.59s/it] {'loss': 0.9922, 'learning_rate': 2.605078084346247e-06, 'epoch': 0.5} 50%|█████ | 1393/2774 [4:35:12<4:49:43, 12.59s/it] 50%|█████ | 1394/2774 [4:35:26<4:58:47, 12.99s/it] {'loss': 0.9883, 'learning_rate': 2.602160897717828e-06, 'epoch': 0.5} 50%|█████ | 1394/2774 [4:35:26<4:58:47, 12.99s/it] 50%|█████ | 1395/2774 [4:35:37<4:47:16, 
12.50s/it] {'loss': 1.0234, 'learning_rate': 2.599243571748116e-06, 'epoch': 0.5} 50%|█████ | 1395/2774 [4:35:37<4:47:16, 12.50s/it] 50%|█████ | 1396/2774 [4:35:48<4:39:24, 12.17s/it] {'loss': 1.0361, 'learning_rate': 2.596326110416167e-06, 'epoch': 0.5} 50%|█████ | 1396/2774 [4:35:48<4:39:24, 12.17s/it] 50%|█████ | 1397/2774 [4:36:01<4:42:00, 12.29s/it] {'loss': 0.9829, 'learning_rate': 2.593408517701222e-06, 'epoch': 0.5} 50%|█████ | 1397/2774 [4:36:01<4:42:00, 12.29s/it] 50%|█████ | 1398/2774 [4:36:13<4:36:22, 12.05s/it] {'loss': 1.0044, 'learning_rate': 2.5904907975827015e-06, 'epoch': 0.5} 50%|█████ | 1398/2774 [4:36:13<4:36:22, 12.05s/it] 50%|█████ | 1399/2774 [4:36:24<4:34:26, 11.98s/it] {'loss': 1.04, 'learning_rate': 2.5875729540401993e-06, 'epoch': 0.5} 50%|█████ | 1399/2774 [4:36:24<4:34:26, 11.98s/it] 50%|█████ | 1400/2774 [4:36:36<4:29:58, 11.79s/it] {'loss': 1.0386, 'learning_rate': 2.584654991053479e-06, 'epoch': 0.5} 50%|█████ | 1400/2774 [4:36:36<4:29:58, 11.79s/it] 51%|█████ | 1401/2774 [4:36:47<4:26:24, 11.64s/it] {'loss': 1.0483, 'learning_rate': 2.581736912602464e-06, 'epoch': 0.51} 51%|█████ | 1401/2774 [4:36:47<4:26:24, 11.64s/it] 51%|█████ | 1402/2774 [4:36:58<4:23:52, 11.54s/it] {'loss': 1.0571, 'learning_rate': 2.578818722667238e-06, 'epoch': 0.51} 51%|█████ | 1402/2774 [4:36:58<4:23:52, 11.54s/it] 51%|█████ | 1403/2774 [4:37:10<4:22:29, 11.49s/it] {'loss': 1.083, 'learning_rate': 2.575900425228035e-06, 'epoch': 0.51} 51%|█████ | 1403/2774 [4:37:10<4:22:29, 11.49s/it] 51%|█████ | 1404/2774 [4:37:21<4:22:05, 11.48s/it] {'loss': 1.0425, 'learning_rate': 2.5729820242652376e-06, 'epoch': 0.51} 51%|█████ | 1404/2774 [4:37:21<4:22:05, 11.48s/it] 51%|█████ | 1405/2774 [4:37:35<4:37:33, 12.16s/it] {'loss': 0.9502, 'learning_rate': 2.570063523759368e-06, 'epoch': 0.51} 51%|█████ | 1405/2774 [4:37:35<4:37:33, 12.16s/it] 51%|█████ | 1406/2774 [4:37:47<4:34:43, 12.05s/it] {'loss': 1.0723, 'learning_rate': 2.5671449276910836e-06, 'epoch': 0.51} 
51%|█████ | 1406/2774 [4:37:47<4:34:43, 12.05s/it] 51%|█████ | 1407/2774 [4:37:58<4:30:09, 11.86s/it] {'loss': 1.0703, 'learning_rate': 2.5642262400411745e-06, 'epoch': 0.51} 51%|█████ | 1407/2774 [4:37:58<4:30:09, 11.86s/it] 51%|█████ | 1408/2774 [4:38:09<4:25:09, 11.65s/it] {'loss': 1.0239, 'learning_rate': 2.561307464790554e-06, 'epoch': 0.51} 51%|█████ | 1408/2774 [4:38:09<4:25:09, 11.65s/it] 51%|█████ | 1409/2774 [4:38:21<4:24:03, 11.61s/it] {'loss': 0.9731, 'learning_rate': 2.558388605920255e-06, 'epoch': 0.51} 51%|█████ | 1409/2774 [4:38:21<4:24:03, 11.61s/it] 51%|█████ | 1410/2774 [4:38:32<4:22:19, 11.54s/it] {'loss': 0.979, 'learning_rate': 2.5554696674114243e-06, 'epoch': 0.51} 51%|█████ | 1410/2774 [4:38:32<4:22:19, 11.54s/it] 51%|█████ | 1411/2774 [4:38:44<4:22:04, 11.54s/it] {'loss': 0.9814, 'learning_rate': 2.552550653245318e-06, 'epoch': 0.51} 51%|█████ | 1411/2774 [4:38:44<4:22:04, 11.54s/it] 51%|█████ | 1412/2774 [4:38:55<4:23:08, 11.59s/it] {'loss': 1.0835, 'learning_rate': 2.5496315674032952e-06, 'epoch': 0.51} 51%|█████ | 1412/2774 [4:38:55<4:23:08, 11.59s/it] 51%|█████ | 1413/2774 [4:39:07<4:22:12, 11.56s/it] {'loss': 1.0508, 'learning_rate': 2.5467124138668126e-06, 'epoch': 0.51} 51%|█████ | 1413/2774 [4:39:07<4:22:12, 11.56s/it] 51%|█████ | 1414/2774 [4:39:18<4:21:21, 11.53s/it] {'loss': 1.0312, 'learning_rate': 2.54379319661742e-06, 'epoch': 0.51} 51%|█████ | 1414/2774 [4:39:18<4:21:21, 11.53s/it] 51%|█████ | 1415/2774 [4:39:30<4:22:28, 11.59s/it] {'loss': 1.0566, 'learning_rate': 2.540873919636752e-06, 'epoch': 0.51} 51%|█████ | 1415/2774 [4:39:30<4:22:28, 11.59s/it] 51%|█████ | 1416/2774 [4:39:41<4:21:06, 11.54s/it] {'loss': 1.0889, 'learning_rate': 2.537954586906527e-06, 'epoch': 0.51} 51%|█████ | 1416/2774 [4:39:41<4:21:06, 11.54s/it] 51%|█████ | 1417/2774 [4:39:53<4:19:35, 11.48s/it] {'loss': 1.0518, 'learning_rate': 2.5350352024085383e-06, 'epoch': 0.51} 51%|█████ | 1417/2774 [4:39:53<4:19:35, 11.48s/it] 51%|█████ | 1418/2774 
[4:40:04<4:18:01, 11.42s/it] {'loss': 1.0479, 'learning_rate': 2.5321157701246503e-06, 'epoch': 0.51} 51%|█████ | 1418/2774 [4:40:04<4:18:01, 11.42s/it] 51%|█████ | 1419/2774 [4:40:15<4:16:36, 11.36s/it] {'loss': 1.0264, 'learning_rate': 2.5291962940367915e-06, 'epoch': 0.51} 51%|█████ | 1419/2774 [4:40:15<4:16:36, 11.36s/it] 51%|█████ | 1420/2774 [4:40:27<4:16:40, 11.37s/it] {'loss': 1.0513, 'learning_rate': 2.526276778126951e-06, 'epoch': 0.51} 51%|█████ | 1420/2774 [4:40:27<4:16:40, 11.37s/it] 51%|█████ | 1421/2774 [4:40:38<4:15:54, 11.35s/it] {'loss': 1.0171, 'learning_rate': 2.5233572263771727e-06, 'epoch': 0.51} 51%|█████ | 1421/2774 [4:40:38<4:15:54, 11.35s/it] 51%|█████▏ | 1422/2774 [4:40:52<4:35:28, 12.22s/it] {'loss': 1.0981, 'learning_rate': 2.520437642769549e-06, 'epoch': 0.51} 51%|█████▏ | 1422/2774 [4:40:52<4:35:28, 12.22s/it] 51%|█████▏ | 1423/2774 [4:41:04<4:30:12, 12.00s/it] {'loss': 1.0522, 'learning_rate': 2.5175180312862145e-06, 'epoch': 0.51} 51%|█████▏ | 1423/2774 [4:41:04<4:30:12, 12.00s/it] 51%|█████▏ | 1424/2774 [4:41:16<4:30:00, 12.00s/it] {'loss': 1.0879, 'learning_rate': 2.514598395909344e-06, 'epoch': 0.51} 51%|█████▏ | 1424/2774 [4:41:16<4:30:00, 12.00s/it] 51%|█████▏ | 1425/2774 [4:41:28<4:28:20, 11.93s/it] {'loss': 1.0459, 'learning_rate': 2.511678740621143e-06, 'epoch': 0.51} 51%|█████▏ | 1425/2774 [4:41:28<4:28:20, 11.93s/it] 51%|█████▏ | 1426/2774 [4:41:41<4:36:28, 12.31s/it] {'loss': 1.0586, 'learning_rate': 2.5087590694038455e-06, 'epoch': 0.51} 51%|█████▏ | 1426/2774 [4:41:41<4:36:28, 12.31s/it] 51%|█████▏ | 1427/2774 [4:41:52<4:31:20, 12.09s/it] {'loss': 1.0493, 'learning_rate': 2.5058393862397067e-06, 'epoch': 0.51} 51%|█████▏ | 1427/2774 [4:41:52<4:31:20, 12.09s/it] 51%|█████▏ | 1428/2774 [4:42:04<4:29:29, 12.01s/it] {'loss': 1.0288, 'learning_rate': 2.5029196951109975e-06, 'epoch': 0.51} 51%|█████▏ | 1428/2774 [4:42:04<4:29:29, 12.01s/it] 52%|█████▏ | 1429/2774 [4:42:17<4:31:43, 12.12s/it] {'loss': 0.9868, 'learning_rate': 
2.5e-06, 'epoch': 0.52} 52%|█████▏ | 1429/2774 [4:42:17<4:31:43, 12.12s/it] 52%|█████▏ | 1430/2774 [4:42:28<4:28:22, 11.98s/it] {'loss': 1.0098, 'learning_rate': 2.4970803048890033e-06, 'epoch': 0.52} 52%|█████▏ | 1430/2774 [4:42:28<4:28:22, 11.98s/it] 52%|█████▏ | 1431/2774 [4:42:39<4:23:33, 11.77s/it] {'loss': 1.0249, 'learning_rate': 2.4941606137602937e-06, 'epoch': 0.52} 52%|█████▏ | 1431/2774 [4:42:39<4:23:33, 11.77s/it] 52%|█████▏ | 1432/2774 [4:42:51<4:21:42, 11.70s/it] {'loss': 1.0, 'learning_rate': 2.491240930596155e-06, 'epoch': 0.52} 52%|█████▏ | 1432/2774 [4:42:51<4:21:42, 11.70s/it] 52%|█████▏ | 1433/2774 [4:43:02<4:20:17, 11.65s/it] {'loss': 1.0854, 'learning_rate': 2.488321259378857e-06, 'epoch': 0.52} 52%|█████▏ | 1433/2774 [4:43:03<4:20:17, 11.65s/it] 52%|█████▏ | 1434/2774 [4:43:14<4:17:49, 11.54s/it] {'loss': 1.0322, 'learning_rate': 2.4854016040906574e-06, 'epoch': 0.52} 52%|█████▏ | 1434/2774 [4:43:14<4:17:49, 11.54s/it] 52%|█████▏ | 1435/2774 [4:43:25<4:18:03, 11.56s/it] {'loss': 1.0215, 'learning_rate': 2.482481968713787e-06, 'epoch': 0.52} 52%|█████▏ | 1435/2774 [4:43:25<4:18:03, 11.56s/it] 52%|█████▏ | 1436/2774 [4:43:37<4:17:33, 11.55s/it] {'loss': 1.0977, 'learning_rate': 2.4795623572304523e-06, 'epoch': 0.52} 52%|█████▏ | 1436/2774 [4:43:37<4:17:33, 11.55s/it] 52%|█████▏ | 1437/2774 [4:43:48<4:16:49, 11.53s/it] {'loss': 1.0264, 'learning_rate': 2.4766427736228277e-06, 'epoch': 0.52} 52%|█████▏ | 1437/2774 [4:43:48<4:16:49, 11.53s/it] 52%|█████▏ | 1438/2774 [4:44:00<4:15:16, 11.46s/it] {'loss': 1.0391, 'learning_rate': 2.4737232218730495e-06, 'epoch': 0.52} 52%|█████▏ | 1438/2774 [4:44:00<4:15:16, 11.46s/it] 52%|█████▏ | 1439/2774 [4:44:11<4:15:31, 11.48s/it] {'loss': 1.1123, 'learning_rate': 2.4708037059632094e-06, 'epoch': 0.52} 52%|█████▏ | 1439/2774 [4:44:11<4:15:31, 11.48s/it] 52%|█████▏ | 1440/2774 [4:44:23<4:14:36, 11.45s/it] {'loss': 1.0186, 'learning_rate': 2.467884229875351e-06, 'epoch': 0.52} 52%|█████▏ | 1440/2774 
[4:44:23<4:14:36, 11.45s/it] 52%|█████▏ | 1441/2774 [4:44:37<4:30:58, 12.20s/it] {'loss': 0.9585, 'learning_rate': 2.464964797591462e-06, 'epoch': 0.52} 52%|█████▏ | 1441/2774 [4:44:37<4:30:58, 12.20s/it] 52%|█████▏ | 1442/2774 [4:44:48<4:25:34, 11.96s/it] {'loss': 1.1006, 'learning_rate': 2.4620454130934732e-06, 'epoch': 0.52} 52%|█████▏ | 1442/2774 [4:44:48<4:25:34, 11.96s/it] 52%|█████▏ | 1443/2774 [4:45:00<4:22:34, 11.84s/it] {'loss': 1.0444, 'learning_rate': 2.4591260803632484e-06, 'epoch': 0.52} 52%|█████▏ | 1443/2774 [4:45:00<4:22:34, 11.84s/it] 52%|█████▏ | 1444/2774 [4:45:11<4:18:17, 11.65s/it] {'loss': 1.0112, 'learning_rate': 2.4562068033825807e-06, 'epoch': 0.52} 52%|█████▏ | 1444/2774 [4:45:11<4:18:17, 11.65s/it] 52%|█████▏ | 1445/2774 [4:45:22<4:16:54, 11.60s/it] {'loss': 1.0176, 'learning_rate': 2.453287586133188e-06, 'epoch': 0.52} 52%|█████▏ | 1445/2774 [4:45:22<4:16:54, 11.60s/it] 52%|█████▏ | 1446/2774 [4:45:34<4:15:19, 11.54s/it] {'loss': 1.0479, 'learning_rate': 2.450368432596705e-06, 'epoch': 0.52} 52%|█████▏ | 1446/2774 [4:45:34<4:15:19, 11.54s/it] 52%|█████▏ | 1447/2774 [4:45:46<4:21:05, 11.81s/it] {'loss': 1.105, 'learning_rate': 2.4474493467546828e-06, 'epoch': 0.52} 52%|█████▏ | 1447/2774 [4:45:46<4:21:05, 11.81s/it] 52%|█████▏ | 1448/2774 [4:45:58<4:21:31, 11.83s/it] {'loss': 1.0073, 'learning_rate': 2.4445303325885765e-06, 'epoch': 0.52} 52%|█████▏ | 1448/2774 [4:45:58<4:21:31, 11.83s/it] 52%|█████▏ | 1449/2774 [4:46:09<4:19:07, 11.73s/it] {'loss': 1.0127, 'learning_rate': 2.4416113940797457e-06, 'epoch': 0.52} 52%|█████▏ | 1449/2774 [4:46:09<4:19:07, 11.73s/it] 52%|█████▏ | 1450/2774 [4:46:21<4:18:41, 11.72s/it] {'loss': 1.0103, 'learning_rate': 2.4386925352094464e-06, 'epoch': 0.52} 52%|█████▏ | 1450/2774 [4:46:21<4:18:41, 11.72s/it] 52%|█████▏ | 1451/2774 [4:46:33<4:16:24, 11.63s/it] {'loss': 1.0479, 'learning_rate': 2.4357737599588255e-06, 'epoch': 0.52} 52%|█████▏ | 1451/2774 [4:46:33<4:16:24, 11.63s/it] 52%|█████▏ | 1452/2774 
[4:46:44<4:14:31, 11.55s/it] {'loss': 1.0073, 'learning_rate': 2.4328550723089173e-06, 'epoch': 0.52} 52%|█████▏ | 1452/2774 [4:46:44<4:14:31, 11.55s/it] 52%|█████▏ | 1453/2774 [4:46:55<4:14:03, 11.54s/it] {'loss': 1.0278, 'learning_rate': 2.429936476240633e-06, 'epoch': 0.52} 52%|█████▏ | 1453/2774 [4:46:55<4:14:03, 11.54s/it] 52%|█████▏ | 1454/2774 [4:47:07<4:12:53, 11.50s/it] {'loss': 1.0059, 'learning_rate': 2.4270179757347633e-06, 'epoch': 0.52} 52%|█████▏ | 1454/2774 [4:47:07<4:12:53, 11.50s/it] 52%|█████▏ | 1455/2774 [4:47:18<4:13:01, 11.51s/it] {'loss': 1.0088, 'learning_rate': 2.4240995747719657e-06, 'epoch': 0.52} 52%|█████▏ | 1455/2774 [4:47:18<4:13:01, 11.51s/it] 52%|█████▏ | 1456/2774 [4:47:30<4:11:54, 11.47s/it] {'loss': 1.0039, 'learning_rate': 2.421181277332763e-06, 'epoch': 0.52} 52%|█████▏ | 1456/2774 [4:47:30<4:11:54, 11.47s/it] 53%|█████▎ | 1457/2774 [4:47:42<4:18:30, 11.78s/it] {'loss': 1.0498, 'learning_rate': 2.418263087397537e-06, 'epoch': 0.53} 53%|█████▎ | 1457/2774 [4:47:42<4:18:30, 11.78s/it] 53%|█████▎ | 1458/2774 [4:47:56<4:31:43, 12.39s/it] {'loss': 1.0488, 'learning_rate': 2.415345008946522e-06, 'epoch': 0.53} 53%|█████▎ | 1458/2774 [4:47:56<4:31:43, 12.39s/it] 53%|█████▎ | 1459/2774 [4:48:08<4:27:50, 12.22s/it] {'loss': 1.0903, 'learning_rate': 2.4124270459598007e-06, 'epoch': 0.53} 53%|█████▎ | 1459/2774 [4:48:08<4:27:50, 12.22s/it] 53%|█████▎ | 1460/2774 [4:48:19<4:22:30, 11.99s/it] {'loss': 1.0298, 'learning_rate': 2.4095092024172994e-06, 'epoch': 0.53} 53%|█████▎ | 1460/2774 [4:48:19<4:22:30, 11.99s/it] 53%|█████▎ | 1461/2774 [4:48:32<4:27:16, 12.21s/it] {'loss': 1.0024, 'learning_rate': 2.406591482298779e-06, 'epoch': 0.53} 53%|█████▎ | 1461/2774 [4:48:32<4:27:16, 12.21s/it] 53%|█████▎ | 1462/2774 [4:48:44<4:22:12, 11.99s/it] {'loss': 1.0576, 'learning_rate': 2.403673889583835e-06, 'epoch': 0.53} 53%|█████▎ | 1462/2774 [4:48:44<4:22:12, 11.99s/it] 53%|█████▎ | 1463/2774 [4:48:55<4:17:52, 11.80s/it] {'loss': 1.0254, 
'learning_rate': 2.4007564282518854e-06, 'epoch': 0.53} 53%|█████▎ | 1463/2774 [4:48:55<4:17:52, 11.80s/it] 53%|█████▎ | 1464/2774 [4:49:06<4:15:17, 11.69s/it] {'loss': 1.0337, 'learning_rate': 2.397839102282173e-06, 'epoch': 0.53} 53%|█████▎ | 1464/2774 [4:49:06<4:15:17, 11.69s/it] 53%|█████▎ | 1465/2774 [4:49:19<4:18:46, 11.86s/it] {'loss': 1.04, 'learning_rate': 2.394921915653754e-06, 'epoch': 0.53} 53%|█████▎ | 1465/2774 [4:49:19<4:18:46, 11.86s/it] 53%|█████▎ | 1466/2774 [4:49:30<4:17:13, 11.80s/it] {'loss': 0.9746, 'learning_rate': 2.3920048723454938e-06, 'epoch': 0.53} 53%|█████▎ | 1466/2774 [4:49:30<4:17:13, 11.80s/it] 53%|█████▎ | 1467/2774 [4:49:41<4:13:17, 11.63s/it] {'loss': 0.999, 'learning_rate': 2.3890879763360643e-06, 'epoch': 0.53} 53%|█████▎ | 1467/2774 [4:49:41<4:13:17, 11.63s/it] 53%|█████▎ | 1468/2774 [4:49:53<4:10:47, 11.52s/it] {'loss': 1.0801, 'learning_rate': 2.386171231603935e-06, 'epoch': 0.53} 53%|█████▎ | 1468/2774 [4:49:53<4:10:47, 11.52s/it] 53%|█████▎ | 1469/2774 [4:50:04<4:08:39, 11.43s/it] {'loss': 0.9653, 'learning_rate': 2.3832546421273693e-06, 'epoch': 0.53} 53%|█████▎ | 1469/2774 [4:50:04<4:08:39, 11.43s/it] 53%|█████▎ | 1470/2774 [4:50:16<4:09:10, 11.46s/it] {'loss': 1.0659, 'learning_rate': 2.380338211884421e-06, 'epoch': 0.53} 53%|█████▎ | 1470/2774 [4:50:16<4:09:10, 11.46s/it] 53%|█████▎ | 1471/2774 [4:50:29<4:24:41, 12.19s/it] {'loss': 1.0488, 'learning_rate': 2.377421944852922e-06, 'epoch': 0.53} 53%|█████▎ | 1471/2774 [4:50:29<4:24:41, 12.19s/it] 53%|█████▎ | 1472/2774 [4:50:41<4:21:50, 12.07s/it] {'loss': 1.0283, 'learning_rate': 2.374505845010485e-06, 'epoch': 0.53} 53%|█████▎ | 1472/2774 [4:50:41<4:21:50, 12.07s/it] 53%|█████▎ | 1473/2774 [4:50:53<4:18:58, 11.94s/it] {'loss': 1.0571, 'learning_rate': 2.3715899163344947e-06, 'epoch': 0.53} 53%|█████▎ | 1473/2774 [4:50:53<4:18:58, 11.94s/it] 53%|█████▎ | 1474/2774 [4:51:04<4:15:48, 11.81s/it] {'loss': 1.0449, 'learning_rate': 2.3686741628021016e-06, 'epoch': 0.53} 
53%|█████▎ | 1474/2774 [4:51:04<4:15:48, 11.81s/it] 53%|█████▎ | 1475/2774 [4:51:16<4:12:17, 11.65s/it] {'loss': 1.0811, 'learning_rate': 2.365758588390217e-06, 'epoch': 0.53} 53%|█████▎ | 1475/2774 [4:51:16<4:12:17, 11.65s/it] 53%|█████▎ | 1476/2774 [4:51:28<4:18:26, 11.95s/it] {'loss': 0.9927, 'learning_rate': 2.3628431970755087e-06, 'epoch': 0.53} 53%|█████▎ | 1476/2774 [4:51:28<4:18:26, 11.95s/it] 53%|█████▎ | 1477/2774 [4:51:40<4:18:36, 11.96s/it] {'loss': 1.0483, 'learning_rate': 2.359927992834394e-06, 'epoch': 0.53} 53%|█████▎ | 1477/2774 [4:51:40<4:18:36, 11.96s/it] 53%|█████▎ | 1478/2774 [4:51:52<4:18:01, 11.95s/it] {'loss': 0.9756, 'learning_rate': 2.357012979643036e-06, 'epoch': 0.53} 53%|█████▎ | 1478/2774 [4:51:52<4:18:01, 11.95s/it] 53%|█████▎ | 1479/2774 [4:52:04<4:15:41, 11.85s/it] {'loss': 1.022, 'learning_rate': 2.3540981614773366e-06, 'epoch': 0.53} 53%|█████▎ | 1479/2774 [4:52:04<4:15:41, 11.85s/it] 53%|█████▎ | 1480/2774 [4:52:16<4:16:11, 11.88s/it] {'loss': 1.0703, 'learning_rate': 2.3511835423129307e-06, 'epoch': 0.53} 53%|█████▎ | 1480/2774 [4:52:16<4:16:11, 11.88s/it] 53%|█████▎ | 1481/2774 [4:52:27<4:12:10, 11.70s/it] {'loss': 0.9927, 'learning_rate': 2.3482691261251835e-06, 'epoch': 0.53} 53%|█████▎ | 1481/2774 [4:52:27<4:12:10, 11.70s/it] 53%|█████▎ | 1482/2774 [4:52:38<4:10:16, 11.62s/it] {'loss': 1.0234, 'learning_rate': 2.3453549168891817e-06, 'epoch': 0.53} 53%|█████▎ | 1482/2774 [4:52:38<4:10:16, 11.62s/it] 53%|█████▎ | 1483/2774 [4:52:50<4:07:49, 11.52s/it] {'loss': 1.0459, 'learning_rate': 2.342440918579732e-06, 'epoch': 0.53} 53%|█████▎ | 1483/2774 [4:52:50<4:07:49, 11.52s/it] 53%|█████▎ | 1484/2774 [4:53:01<4:06:57, 11.49s/it] {'loss': 0.9639, 'learning_rate': 2.3395271351713515e-06, 'epoch': 0.53} 53%|█████▎ | 1484/2774 [4:53:01<4:06:57, 11.49s/it] 54%|█████▎ | 1485/2774 [4:53:13<4:08:20, 11.56s/it] {'loss': 1.0854, 'learning_rate': 2.3366135706382644e-06, 'epoch': 0.54} 54%|█████▎ | 1485/2774 [4:53:13<4:08:20, 11.56s/it] 
54%|█████▎ | 1486/2774 [4:53:25<4:09:40, 11.63s/it] {'loss': 1.083, 'learning_rate': 2.333700228954398e-06, 'epoch': 0.54} 54%|█████▎ | 1486/2774 [4:53:25<4:09:40, 11.63s/it] 54%|█████▎ | 1487/2774 [4:53:36<4:07:08, 11.52s/it] {'loss': 1.0493, 'learning_rate': 2.3307871140933725e-06, 'epoch': 0.54} 54%|█████▎ | 1487/2774 [4:53:36<4:07:08, 11.52s/it] 54%|█████▎ | 1488/2774 [4:53:47<4:06:36, 11.51s/it] {'loss': 1.0029, 'learning_rate': 2.327874230028502e-06, 'epoch': 0.54} 54%|█████▎ | 1488/2774 [4:53:47<4:06:36, 11.51s/it] 54%|█████▎ | 1489/2774 [4:53:59<4:07:17, 11.55s/it] {'loss': 1.1045, 'learning_rate': 2.3249615807327836e-06, 'epoch': 0.54} 54%|█████▎ | 1489/2774 [4:53:59<4:07:17, 11.55s/it] 54%|█████▎ | 1490/2774 [4:54:11<4:06:36, 11.52s/it] {'loss': 1.085, 'learning_rate': 2.3220491701788953e-06, 'epoch': 0.54} 54%|█████▎ | 1490/2774 [4:54:11<4:06:36, 11.52s/it] 54%|█████▎ | 1491/2774 [4:54:23<4:10:12, 11.70s/it] {'loss': 1.0215, 'learning_rate': 2.3191370023391894e-06, 'epoch': 0.54} 54%|█████▎ | 1491/2774 [4:54:23<4:10:12, 11.70s/it] 54%|█████▍ | 1492/2774 [4:54:34<4:08:44, 11.64s/it] {'loss': 1.0093, 'learning_rate': 2.3162250811856863e-06, 'epoch': 0.54} 54%|█████▍ | 1492/2774 [4:54:34<4:08:44, 11.64s/it] 54%|█████▍ | 1493/2774 [4:54:45<4:05:11, 11.48s/it] {'loss': 1.0586, 'learning_rate': 2.313313410690071e-06, 'epoch': 0.54} 54%|█████▍ | 1493/2774 [4:54:45<4:05:11, 11.48s/it] 54%|█████▍ | 1494/2774 [4:54:57<4:04:55, 11.48s/it] {'loss': 1.1016, 'learning_rate': 2.3104019948236864e-06, 'epoch': 0.54} 54%|█████▍ | 1494/2774 [4:54:57<4:04:55, 11.48s/it] 54%|█████▍ | 1495/2774 [4:55:08<4:04:04, 11.45s/it] {'loss': 1.0366, 'learning_rate': 2.3074908375575273e-06, 'epoch': 0.54} 54%|█████▍ | 1495/2774 [4:55:08<4:04:04, 11.45s/it] 54%|█████▍ | 1496/2774 [4:55:19<4:02:23, 11.38s/it] {'loss': 1.0449, 'learning_rate': 2.3045799428622366e-06, 'epoch': 0.54} 54%|█████▍ | 1496/2774 [4:55:19<4:02:23, 11.38s/it] 54%|█████▍ | 1497/2774 [4:55:31<4:02:58, 11.42s/it] 
{'loss': 1.0093, 'learning_rate': 2.3016693147081e-06, 'epoch': 0.54} 54%|█████▍ | 1497/2774 [4:55:31<4:02:58, 11.42s/it] 54%|█████▍ | 1498/2774 [4:55:44<4:14:47, 11.98s/it] {'loss': 0.9878, 'learning_rate': 2.298758957065036e-06, 'epoch': 0.54} 54%|█████▍ | 1498/2774 [4:55:44<4:14:47, 11.98s/it] 54%|█████▍ | 1499/2774 [4:55:56<4:14:02, 11.95s/it] {'loss': 0.9653, 'learning_rate': 2.295848873902598e-06, 'epoch': 0.54} 54%|█████▍ | 1499/2774 [4:55:56<4:14:02, 11.95s/it] 54%|█████▍ | 1500/2774 [4:56:08<4:11:00, 11.82s/it] {'loss': 1.0483, 'learning_rate': 2.2929390691899635e-06, 'epoch': 0.54} 54%|█████▍ | 1500/2774 [4:56:08<4:11:00, 11.82s/it] 54%|█████▍ | 1501/2774 [4:56:19<4:07:08, 11.65s/it] {'loss': 1.0049, 'learning_rate': 2.2900295468959304e-06, 'epoch': 0.54} 54%|█████▍ | 1501/2774 [4:56:19<4:07:08, 11.65s/it] 54%|█████▍ | 1502/2774 [4:56:31<4:07:32, 11.68s/it] {'loss': 0.9829, 'learning_rate': 2.2871203109889117e-06, 'epoch': 0.54} 54%|█████▍ | 1502/2774 [4:56:31<4:07:32, 11.68s/it] 54%|█████▍ | 1503/2774 [4:56:42<4:07:01, 11.66s/it] {'loss': 1.0122, 'learning_rate': 2.284211365436929e-06, 'epoch': 0.54} 54%|█████▍ | 1503/2774 [4:56:42<4:07:01, 11.66s/it] 54%|█████▍ | 1504/2774 [4:56:54<4:09:00, 11.76s/it] {'loss': 1.019, 'learning_rate': 2.281302714207608e-06, 'epoch': 0.54} 54%|█████▍ | 1504/2774 [4:56:54<4:09:00, 11.76s/it] 54%|█████▍ | 1505/2774 [4:57:08<4:19:31, 12.27s/it] {'loss': 1.0117, 'learning_rate': 2.2783943612681743e-06, 'epoch': 0.54} 54%|█████▍ | 1505/2774 [4:57:08<4:19:31, 12.27s/it] 54%|█████▍ | 1506/2774 [4:57:19<4:14:15, 12.03s/it] {'loss': 0.9893, 'learning_rate': 2.2754863105854456e-06, 'epoch': 0.54} 54%|█████▍ | 1506/2774 [4:57:19<4:14:15, 12.03s/it] 54%|█████▍ | 1507/2774 [4:57:30<4:08:30, 11.77s/it] {'loss': 1.0283, 'learning_rate': 2.272578566125826e-06, 'epoch': 0.54} 54%|█████▍ | 1507/2774 [4:57:30<4:08:30, 11.77s/it] 54%|█████▍ | 1508/2774 [4:57:42<4:06:47, 11.70s/it] {'loss': 1.0322, 'learning_rate': 2.269671131855304e-06, 
'epoch': 0.54} 54%|█████▍ | 1508/2774 [4:57:42<4:06:47, 11.70s/it] 54%|█████▍ | 1509/2774 [4:57:53<4:06:38, 11.70s/it] {'loss': 1.0518, 'learning_rate': 2.266764011739444e-06, 'epoch': 0.54} 54%|█████▍ | 1509/2774 [4:57:53<4:06:38, 11.70s/it] 54%|█████▍ | 1510/2774 [4:58:05<4:04:55, 11.63s/it] {'loss': 1.0142, 'learning_rate': 2.263857209743383e-06, 'epoch': 0.54} 54%|█████▍ | 1510/2774 [4:58:05<4:04:55, 11.63s/it] 54%|█████▍ | 1511/2774 [4:58:17<4:06:29, 11.71s/it] {'loss': 1.0156, 'learning_rate': 2.2609507298318235e-06, 'epoch': 0.54} 54%|█████▍ | 1511/2774 [4:58:17<4:06:29, 11.71s/it] 55%|█████▍ | 1512/2774 [4:58:28<4:05:13, 11.66s/it] {'loss': 1.0151, 'learning_rate': 2.258044575969027e-06, 'epoch': 0.55} 55%|█████▍ | 1512/2774 [4:58:28<4:05:13, 11.66s/it] 55%|█████▍ | 1513/2774 [4:58:40<4:04:06, 11.61s/it] {'loss': 1.0464, 'learning_rate': 2.2551387521188135e-06, 'epoch': 0.55} 55%|█████▍ | 1513/2774 [4:58:40<4:04:06, 11.61s/it] 55%|█████▍ | 1514/2774 [4:58:51<4:00:35, 11.46s/it] {'loss': 1.0225, 'learning_rate': 2.25223326224455e-06, 'epoch': 0.55} 55%|█████▍ | 1514/2774 [4:58:51<4:00:35, 11.46s/it] 55%|█████▍ | 1515/2774 [4:59:02<3:59:23, 11.41s/it] {'loss': 1.0718, 'learning_rate': 2.24932811030915e-06, 'epoch': 0.55} 55%|█████▍ | 1515/2774 [4:59:02<3:59:23, 11.41s/it] 55%|█████▍ | 1516/2774 [4:59:14<3:59:01, 11.40s/it] {'loss': 1.0127, 'learning_rate': 2.246423300275065e-06, 'epoch': 0.55} 55%|█████▍ | 1516/2774 [4:59:14<3:59:01, 11.40s/it] 55%|█████▍ | 1517/2774 [4:59:25<3:57:30, 11.34s/it] {'loss': 1.0371, 'learning_rate': 2.2435188361042794e-06, 'epoch': 0.55} 55%|█████▍ | 1517/2774 [4:59:25<3:57:30, 11.34s/it] 55%|█████▍ | 1518/2774 [4:59:38<4:10:56, 11.99s/it] {'loss': 0.9985, 'learning_rate': 2.240614721758308e-06, 'epoch': 0.55} 55%|█████▍ | 1518/2774 [4:59:38<4:10:56, 11.99s/it] 55%|█████▍ | 1519/2774 [4:59:50<4:09:21, 11.92s/it] {'loss': 1.0176, 'learning_rate': 2.2377109611981875e-06, 'epoch': 0.55} 55%|█████▍ | 1519/2774 [4:59:50<4:09:21, 
11.92s/it] 55%|█████▍ | 1520/2774 [5:00:02<4:09:46, 11.95s/it] {'loss': 1.0322, 'learning_rate': 2.234807558384471e-06, 'epoch': 0.55} 55%|█████▍ | 1520/2774 [5:00:02<4:09:46, 11.95s/it] 55%|█████▍ | 1521/2774 [5:00:15<4:14:19, 12.18s/it] {'loss': 1.0449, 'learning_rate': 2.2319045172772254e-06, 'epoch': 0.55} 55%|█████▍ | 1521/2774 [5:00:15<4:14:19, 12.18s/it] 55%|█████▍ | 1522/2774 [5:00:28<4:20:02, 12.46s/it] {'loss': 0.9907, 'learning_rate': 2.2290018418360228e-06, 'epoch': 0.55} 55%|█████▍ | 1522/2774 [5:00:28<4:20:02, 12.46s/it] 55%|█████▍ | 1523/2774 [5:00:41<4:21:24, 12.54s/it] {'loss': 0.9746, 'learning_rate': 2.2260995360199376e-06, 'epoch': 0.55} 55%|█████▍ | 1523/2774 [5:00:41<4:21:24, 12.54s/it] 55%|█████▍ | 1524/2774 [5:00:52<4:13:33, 12.17s/it] {'loss': 1.0635, 'learning_rate': 2.2231976037875404e-06, 'epoch': 0.55} 55%|█████▍ | 1524/2774 [5:00:52<4:13:33, 12.17s/it] 55%|█████▍ | 1525/2774 [5:01:04<4:11:05, 12.06s/it] {'loss': 1.0654, 'learning_rate': 2.220296049096889e-06, 'epoch': 0.55} 55%|█████▍ | 1525/2774 [5:01:04<4:11:05, 12.06s/it] 55%|█████▌ | 1526/2774 [5:01:15<4:05:43, 11.81s/it] {'loss': 1.0605, 'learning_rate': 2.2173948759055306e-06, 'epoch': 0.55} 55%|█████▌ | 1526/2774 [5:01:15<4:05:43, 11.81s/it] 55%|█████▌ | 1527/2774 [5:01:27<4:04:30, 11.76s/it] {'loss': 1.021, 'learning_rate': 2.21449408817049e-06, 'epoch': 0.55} 55%|█████▌ | 1527/2774 [5:01:27<4:04:30, 11.76s/it] 55%|█████▌ | 1528/2774 [5:01:38<4:01:37, 11.64s/it] {'loss': 0.9937, 'learning_rate': 2.2115936898482654e-06, 'epoch': 0.55} 55%|█████▌ | 1528/2774 [5:01:38<4:01:37, 11.64s/it] 55%|█████▌ | 1529/2774 [5:01:49<3:58:50, 11.51s/it] {'loss': 1.0864, 'learning_rate': 2.208693684894826e-06, 'epoch': 0.55} 55%|█████▌ | 1529/2774 [5:01:49<3:58:50, 11.51s/it] 55%|█████▌ | 1530/2774 [5:02:01<3:59:43, 11.56s/it] {'loss': 1.0703, 'learning_rate': 2.2057940772656034e-06, 'epoch': 0.55} 55%|█████▌ | 1530/2774 [5:02:01<3:59:43, 11.56s/it] 55%|█████▌ | 1531/2774 [5:02:12<3:59:08, 
11.54s/it] {'loss': 1.0186, 'learning_rate': 2.2028948709154867e-06, 'epoch': 0.55} 55%|█████▌ | 1531/2774 [5:02:12<3:59:08, 11.54s/it] 55%|█████▌ | 1532/2774 [5:02:25<4:05:53, 11.88s/it] {'loss': 1.061, 'learning_rate': 2.199996069798819e-06, 'epoch': 0.55} 55%|█████▌ | 1532/2774 [5:02:25<4:05:53, 11.88s/it] 55%|█████▌ | 1533/2774 [5:02:36<4:00:47, 11.64s/it] {'loss': 1.0508, 'learning_rate': 2.197097677869389e-06, 'epoch': 0.55} 55%|█████▌ | 1533/2774 [5:02:36<4:00:47, 11.64s/it] 55%|█████▌ | 1534/2774 [5:02:50<4:12:20, 12.21s/it] {'loss': 1.0029, 'learning_rate': 2.1941996990804287e-06, 'epoch': 0.55} 55%|█████▌ | 1534/2774 [5:02:50<4:12:20, 12.21s/it] 55%|█████▌ | 1535/2774 [5:03:01<4:06:35, 11.94s/it] {'loss': 1.0044, 'learning_rate': 2.1913021373846056e-06, 'epoch': 0.55} 55%|█████▌ | 1535/2774 [5:03:01<4:06:35, 11.94s/it] 55%|█████▌ | 1536/2774 [5:03:15<4:17:21, 12.47s/it] {'loss': 0.9658, 'learning_rate': 2.1884049967340193e-06, 'epoch': 0.55} 55%|█████▌ | 1536/2774 [5:03:15<4:17:21, 12.47s/it] 55%|█████▌ | 1537/2774 [5:03:26<4:10:55, 12.17s/it] {'loss': 1.0859, 'learning_rate': 2.185508281080195e-06, 'epoch': 0.55} 55%|█████▌ | 1537/2774 [5:03:26<4:10:55, 12.17s/it] 55%|█████▌ | 1538/2774 [5:03:38<4:06:56, 11.99s/it] {'loss': 1.0078, 'learning_rate': 2.182611994374077e-06, 'epoch': 0.55} 55%|█████▌ | 1538/2774 [5:03:38<4:06:56, 11.99s/it] 55%|█████▌ | 1539/2774 [5:03:49<4:02:50, 11.80s/it] {'loss': 1.0024, 'learning_rate': 2.1797161405660257e-06, 'epoch': 0.55} 55%|█████▌ | 1539/2774 [5:03:49<4:02:50, 11.80s/it] 56%|█████▌ | 1540/2774 [5:04:01<4:00:41, 11.70s/it] {'loss': 1.0522, 'learning_rate': 2.1768207236058106e-06, 'epoch': 0.56} 56%|█████▌ | 1540/2774 [5:04:01<4:00:41, 11.70s/it] 56%|█████▌ | 1541/2774 [5:04:12<3:59:04, 11.63s/it] {'loss': 0.9976, 'learning_rate': 2.173925747442606e-06, 'epoch': 0.56} 56%|█████▌ | 1541/2774 [5:04:12<3:59:04, 11.63s/it] 56%|█████▌ | 1542/2774 [5:04:23<3:57:04, 11.55s/it] {'loss': 1.085, 'learning_rate': 
2.1710312160249856e-06, 'epoch': 0.56} 56%|█████▌ | 1542/2774 [5:04:23<3:57:04, 11.55s/it] 56%|█████▌ | 1543/2774 [5:04:36<4:04:57, 11.94s/it] {'loss': 0.9932, 'learning_rate': 2.1681371333009127e-06, 'epoch': 0.56} 56%|█████▌ | 1543/2774 [5:04:36<4:04:57, 11.94s/it] 56%|█████▌ | 1544/2774 [5:04:48<4:02:13, 11.82s/it] {'loss': 1.0488, 'learning_rate': 2.1652435032177425e-06, 'epoch': 0.56} 56%|█████▌ | 1544/2774 [5:04:48<4:02:13, 11.82s/it] 56%|█████▌ | 1545/2774 [5:04:59<3:59:04, 11.67s/it] {'loss': 1.0444, 'learning_rate': 2.1623503297222124e-06, 'epoch': 0.56} 56%|█████▌ | 1545/2774 [5:04:59<3:59:04, 11.67s/it] 56%|█████▌ | 1546/2774 [5:05:11<3:58:29, 11.65s/it] {'loss': 1.0547, 'learning_rate': 2.1594576167604355e-06, 'epoch': 0.56} 56%|█████▌ | 1546/2774 [5:05:11<3:58:29, 11.65s/it] 56%|█████▌ | 1547/2774 [5:05:22<3:58:17, 11.65s/it] {'loss': 1.0273, 'learning_rate': 2.1565653682778975e-06, 'epoch': 0.56} 56%|█████▌ | 1547/2774 [5:05:22<3:58:17, 11.65s/it] 56%|█████▌ | 1548/2774 [5:05:34<3:55:48, 11.54s/it] {'loss': 1.0239, 'learning_rate': 2.153673588219451e-06, 'epoch': 0.56} 56%|█████▌ | 1548/2774 [5:05:34<3:55:48, 11.54s/it] 56%|█████▌ | 1549/2774 [5:05:45<3:57:15, 11.62s/it] {'loss': 1.0684, 'learning_rate': 2.150782280529309e-06, 'epoch': 0.56} 56%|█████▌ | 1549/2774 [5:05:45<3:57:15, 11.62s/it] 56%|█████▌ | 1550/2774 [5:05:57<3:55:34, 11.55s/it] {'loss': 1.0542, 'learning_rate': 2.1478914491510412e-06, 'epoch': 0.56} 56%|█████▌ | 1550/2774 [5:05:57<3:55:34, 11.55s/it] 56%|█████▌ | 1551/2774 [5:06:09<3:57:15, 11.64s/it] {'loss': 0.9854, 'learning_rate': 2.145001098027567e-06, 'epoch': 0.56} 56%|█████▌ | 1551/2774 [5:06:09<3:57:15, 11.64s/it] 56%|█████▌ | 1552/2774 [5:06:20<3:56:25, 11.61s/it] {'loss': 0.9917, 'learning_rate': 2.1421112311011493e-06, 'epoch': 0.56} 56%|█████▌ | 1552/2774 [5:06:20<3:56:25, 11.61s/it] 56%|█████▌ | 1553/2774 [5:06:32<3:54:46, 11.54s/it] {'loss': 1.0078, 'learning_rate': 2.1392218523133927e-06, 'epoch': 0.56} 56%|█████▌ | 
1553/2774 [5:06:32<3:54:46, 11.54s/it] 56%|█████▌ | 1554/2774 [5:06:43<3:55:53, 11.60s/it] {'loss': 1.0273, 'learning_rate': 2.136332965605236e-06, 'epoch': 0.56} 56%|█████▌ | 1554/2774 [5:06:43<3:55:53, 11.60s/it] 56%|█████▌ | 1555/2774 [5:06:55<3:55:55, 11.61s/it] {'loss': 1.0586, 'learning_rate': 2.1334445749169457e-06, 'epoch': 0.56} 56%|█████▌ | 1555/2774 [5:06:55<3:55:55, 11.61s/it] 56%|█████▌ | 1556/2774 [5:07:06<3:54:18, 11.54s/it] {'loss': 1.0327, 'learning_rate': 2.130556684188112e-06, 'epoch': 0.56} 56%|█████▌ | 1556/2774 [5:07:06<3:54:18, 11.54s/it] 56%|█████▌ | 1557/2774 [5:07:18<3:54:26, 11.56s/it] {'loss': 1.0215, 'learning_rate': 2.127669297357644e-06, 'epoch': 0.56} 56%|█████▌ | 1557/2774 [5:07:18<3:54:26, 11.56s/it] 56%|█████▌ | 1558/2774 [5:07:29<3:54:04, 11.55s/it] {'loss': 1.019, 'learning_rate': 2.124782418363762e-06, 'epoch': 0.56} 56%|█████▌ | 1558/2774 [5:07:29<3:54:04, 11.55s/it] 56%|█████▌ | 1559/2774 [5:07:41<3:53:42, 11.54s/it] {'loss': 1.0552, 'learning_rate': 2.1218960511439953e-06, 'epoch': 0.56} 56%|█████▌ | 1559/2774 [5:07:41<3:53:42, 11.54s/it] 56%|█████▌ | 1560/2774 [5:07:53<3:53:16, 11.53s/it] {'loss': 0.9746, 'learning_rate': 2.1190101996351745e-06, 'epoch': 0.56} 56%|█████▌ | 1560/2774 [5:07:53<3:53:16, 11.53s/it] 56%|█████▋ | 1561/2774 [5:08:04<3:52:42, 11.51s/it] {'loss': 1.0146, 'learning_rate': 2.1161248677734263e-06, 'epoch': 0.56} 56%|█████▋ | 1561/2774 [5:08:04<3:52:42, 11.51s/it] 56%|█████▋ | 1562/2774 [5:08:17<3:59:18, 11.85s/it] {'loss': 0.9775, 'learning_rate': 2.1132400594941697e-06, 'epoch': 0.56} 56%|█████▋ | 1562/2774 [5:08:17<3:59:18, 11.85s/it] 56%|█████▋ | 1563/2774 [5:08:28<3:55:53, 11.69s/it] {'loss': 1.0454, 'learning_rate': 2.1103557787321076e-06, 'epoch': 0.56} 56%|█████▋ | 1563/2774 [5:08:28<3:55:53, 11.69s/it] 56%|█████▋ | 1564/2774 [5:08:40<3:55:16, 11.67s/it] {'loss': 1.04, 'learning_rate': 2.107472029421226e-06, 'epoch': 0.56} 56%|█████▋ | 1564/2774 [5:08:40<3:55:16, 11.67s/it] 56%|█████▋ | 
1565/2774 [5:08:51<3:54:47, 11.65s/it] {'loss': 1.0537, 'learning_rate': 2.104588815494784e-06, 'epoch': 0.56} 56%|█████▋ | 1565/2774 [5:08:51<3:54:47, 11.65s/it] 56%|█████▋ | 1566/2774 [5:09:03<3:54:09, 11.63s/it] {'loss': 1.041, 'learning_rate': 2.101706140885311e-06, 'epoch': 0.56} 56%|█████▋ | 1566/2774 [5:09:03<3:54:09, 11.63s/it] 56%|█████▋ | 1567/2774 [5:09:14<3:53:14, 11.59s/it] {'loss': 1.0352, 'learning_rate': 2.0988240095246025e-06, 'epoch': 0.56} 56%|█████▋ | 1567/2774 [5:09:14<3:53:14, 11.59s/it] 57%|█████▋ | 1568/2774 [5:09:25<3:50:44, 11.48s/it] {'loss': 1.0142, 'learning_rate': 2.09594242534371e-06, 'epoch': 0.57} 57%|█████▋ | 1568/2774 [5:09:25<3:50:44, 11.48s/it] 57%|█████▋ | 1569/2774 [5:09:37<3:48:25, 11.37s/it] {'loss': 0.9736, 'learning_rate': 2.0930613922729424e-06, 'epoch': 0.57} 57%|█████▋ | 1569/2774 [5:09:37<3:48:25, 11.37s/it] 57%|█████▋ | 1570/2774 [5:09:48<3:48:11, 11.37s/it] {'loss': 1.0083, 'learning_rate': 2.090180914241852e-06, 'epoch': 0.57} 57%|█████▋ | 1570/2774 [5:09:48<3:48:11, 11.37s/it] 57%|█████▋ | 1571/2774 [5:09:59<3:47:19, 11.34s/it] {'loss': 1.04, 'learning_rate': 2.087300995179238e-06, 'epoch': 0.57} 57%|█████▋ | 1571/2774 [5:09:59<3:47:19, 11.34s/it] 57%|█████▋ | 1572/2774 [5:10:10<3:46:30, 11.31s/it] {'loss': 1.0029, 'learning_rate': 2.084421639013136e-06, 'epoch': 0.57} 57%|█████▋ | 1572/2774 [5:10:10<3:46:30, 11.31s/it] 57%|█████▋ | 1573/2774 [5:10:22<3:47:28, 11.36s/it] {'loss': 1.0522, 'learning_rate': 2.0815428496708143e-06, 'epoch': 0.57} 57%|█████▋ | 1573/2774 [5:10:22<3:47:28, 11.36s/it] 57%|█████▋ | 1574/2774 [5:10:33<3:47:11, 11.36s/it] {'loss': 1.0327, 'learning_rate': 2.078664631078767e-06, 'epoch': 0.57} 57%|█████▋ | 1574/2774 [5:10:33<3:47:11, 11.36s/it] 57%|█████▋ | 1575/2774 [5:10:47<3:59:48, 12.00s/it] {'loss': 0.9668, 'learning_rate': 2.0757869871627112e-06, 'epoch': 0.57} 57%|█████▋ | 1575/2774 [5:10:47<3:59:48, 12.00s/it] 57%|█████▋ | 1576/2774 [5:10:58<3:56:42, 11.86s/it] {'loss': 1.0615, 
'learning_rate': 2.0729099218475784e-06, 'epoch': 0.57} 57%|█████▋ | 1576/2774 [5:10:58<3:56:42, 11.86s/it] 57%|█████▋ | 1577/2774 [5:11:09<3:52:20, 11.65s/it] {'loss': 1.0366, 'learning_rate': 2.0700334390575126e-06, 'epoch': 0.57} 57%|█████▋ | 1577/2774 [5:11:09<3:52:20, 11.65s/it] 57%|█████▋ | 1578/2774 [5:11:21<3:52:35, 11.67s/it] {'loss': 0.9565, 'learning_rate': 2.0671575427158638e-06, 'epoch': 0.57} 57%|█████▋ | 1578/2774 [5:11:21<3:52:35, 11.67s/it] 57%|█████▋ | 1579/2774 [5:11:33<3:54:01, 11.75s/it] {'loss': 1.0215, 'learning_rate': 2.0642822367451783e-06, 'epoch': 0.57} 57%|█████▋ | 1579/2774 [5:11:33<3:54:01, 11.75s/it] 57%|█████▋ | 1580/2774 [5:11:46<3:59:13, 12.02s/it] {'loss': 1.0239, 'learning_rate': 2.0614075250672006e-06, 'epoch': 0.57} 57%|█████▋ | 1580/2774 [5:11:46<3:59:13, 12.02s/it] 57%|█████▋ | 1581/2774 [5:11:57<3:56:48, 11.91s/it] {'loss': 1.0464, 'learning_rate': 2.058533411602864e-06, 'epoch': 0.57} 57%|█████▋ | 1581/2774 [5:11:57<3:56:48, 11.91s/it] 57%|█████▋ | 1582/2774 [5:12:09<3:53:56, 11.78s/it] {'loss': 1.0146, 'learning_rate': 2.055659900272286e-06, 'epoch': 0.57} 57%|█████▋ | 1582/2774 [5:12:09<3:53:56, 11.78s/it] 57%|█████▋ | 1583/2774 [5:12:21<3:53:55, 11.78s/it] {'loss': 1.0396, 'learning_rate': 2.052786994994763e-06, 'epoch': 0.57} 57%|█████▋ | 1583/2774 [5:12:21<3:53:55, 11.78s/it] 57%|█████▋ | 1584/2774 [5:12:35<4:07:47, 12.49s/it] {'loss': 0.9941, 'learning_rate': 2.049914699688762e-06, 'epoch': 0.57} 57%|█████▋ | 1584/2774 [5:12:35<4:07:47, 12.49s/it] 57%|█████▋ | 1585/2774 [5:12:46<4:00:58, 12.16s/it] {'loss': 1.0786, 'learning_rate': 2.047043018271922e-06, 'epoch': 0.57} 57%|█████▋ | 1585/2774 [5:12:46<4:00:58, 12.16s/it] 57%|█████▋ | 1586/2774 [5:12:57<3:54:30, 11.84s/it] {'loss': 0.9897, 'learning_rate': 2.044171954661043e-06, 'epoch': 0.57} 57%|█████▋ | 1586/2774 [5:12:57<3:54:30, 11.84s/it] 57%|█████▋ | 1587/2774 [5:13:09<3:55:17, 11.89s/it] {'loss': 1.0459, 'learning_rate': 2.0413015127720826e-06, 'epoch': 0.57} 
57%|█████▋ | 1587/2774 [5:13:09<3:55:17, 11.89s/it] 57%|█████▋ | 1588/2774 [5:13:21<3:51:46, 11.73s/it] {'loss': 1.0244, 'learning_rate': 2.0384316965201494e-06, 'epoch': 0.57} 57%|█████▋ | 1588/2774 [5:13:21<3:51:46, 11.73s/it] 57%|█████▋ | 1589/2774 [5:13:33<3:53:51, 11.84s/it] {'loss': 1.0049, 'learning_rate': 2.035562509819499e-06, 'epoch': 0.57} 57%|█████▋ | 1589/2774 [5:13:33<3:53:51, 11.84s/it] 57%|█████▋ | 1590/2774 [5:13:44<3:51:04, 11.71s/it] {'loss': 1.0215, 'learning_rate': 2.0326939565835296e-06, 'epoch': 0.57} 57%|█████▋ | 1590/2774 [5:13:44<3:51:04, 11.71s/it] 57%|█████▋ | 1591/2774 [5:13:56<3:51:58, 11.77s/it] {'loss': 1.0376, 'learning_rate': 2.029826040724774e-06, 'epoch': 0.57} 57%|█████▋ | 1591/2774 [5:13:56<3:51:58, 11.77s/it] 57%|█████▋ | 1592/2774 [5:14:08<3:51:56, 11.77s/it] {'loss': 1.0303, 'learning_rate': 2.0269587661548964e-06, 'epoch': 0.57} 57%|█████▋ | 1592/2774 [5:14:08<3:51:56, 11.77s/it] 57%|█████▋ | 1593/2774 [5:14:19<3:50:23, 11.70s/it] {'loss': 1.0366, 'learning_rate': 2.0240921367846863e-06, 'epoch': 0.57} 57%|█████▋ | 1593/2774 [5:14:19<3:50:23, 11.70s/it] 57%|█████▋ | 1594/2774 [5:14:31<3:46:48, 11.53s/it] {'loss': 1.0078, 'learning_rate': 2.0212261565240528e-06, 'epoch': 0.57} 57%|█████▋ | 1594/2774 [5:14:31<3:46:48, 11.53s/it] 57%|█████▋ | 1595/2774 [5:14:43<3:50:10, 11.71s/it] {'loss': 1.0356, 'learning_rate': 2.0183608292820197e-06, 'epoch': 0.57} 57%|█████▋ | 1595/2774 [5:14:43<3:50:10, 11.71s/it] 58%|█████▊ | 1596/2774 [5:14:54<3:48:06, 11.62s/it] {'loss': 1.019, 'learning_rate': 2.015496158966722e-06, 'epoch': 0.58} 58%|█████▊ | 1596/2774 [5:14:54<3:48:06, 11.62s/it] 58%|█████▊ | 1597/2774 [5:15:06<3:47:36, 11.60s/it] {'loss': 1.082, 'learning_rate': 2.0126321494853936e-06, 'epoch': 0.58} 58%|█████▊ | 1597/2774 [5:15:06<3:47:36, 11.60s/it] 58%|█████▊ | 1598/2774 [5:15:17<3:45:24, 11.50s/it] {'loss': 1.085, 'learning_rate': 2.0097688047443727e-06, 'epoch': 0.58} 58%|█████▊ | 1598/2774 [5:15:17<3:45:24, 11.50s/it] 
58%|█████▊ | 1599/2774 [5:15:29<3:46:28, 11.57s/it] {'loss': 1.0591, 'learning_rate': 2.0069061286490877e-06, 'epoch': 0.58} 58%|█████▊ | 1599/2774 [5:15:29<3:46:28, 11.57s/it] 58%|█████▊ | 1600/2774 [5:15:40<3:45:14, 11.51s/it] {'loss': 1.0796, 'learning_rate': 2.004044125104057e-06, 'epoch': 0.58} 58%|█████▊ | 1600/2774 [5:15:40<3:45:14, 11.51s/it]
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
58%|█████▊ | 1601/2774 [5:16:18<6:21:54, 19.54s/it] {'loss': 1.0679, 'learning_rate': 2.0011827980128788e-06, 'epoch': 0.58} 58%|█████▊ | 1601/2774 [5:16:18<6:21:54, 19.54s/it] 58%|█████▊ | 1602/2774 [5:16:30<5:36:10, 17.21s/it] {'loss': 1.0166, 'learning_rate': 1.998322151278232e-06, 'epoch': 0.58} 58%|█████▊ | 1602/2774 [5:16:30<5:36:10, 17.21s/it] 58%|█████▊ | 1603/2774 [5:16:41<5:01:00, 15.42s/it] {'loss': 1.0493, 'learning_rate': 1.9954621888018656e-06, 'epoch': 0.58} 58%|█████▊ | 1603/2774 [5:16:41<5:01:00, 15.42s/it] 58%|█████▊ | 1604/2774 [5:16:53<4:40:13, 14.37s/it] {'loss': 1.0366, 'learning_rate': 1.9926029144845956e-06, 'epoch': 0.58} 58%|█████▊ | 1604/2774 [5:16:53<4:40:13, 14.37s/it] 58%|█████▊ | 1605/2774 [5:17:05<4:26:40, 13.69s/it] {'loss': 1.0444, 'learning_rate': 1.9897443322262985e-06, 'epoch': 0.58} 58%|█████▊ | 1605/2774 [5:17:05<4:26:40, 13.69s/it] 58%|█████▊ | 1606/2774 [5:17:17<4:13:51, 13.04s/it] {'loss': 1.0464, 'learning_rate': 1.986886445925909e-06, 'epoch': 0.58} 58%|█████▊ | 1606/2774
[5:17:17<4:13:51, 13.04s/it] 58%|█████▊ | 1607/2774 [5:17:29<4:06:27, 12.67s/it] {'loss': 1.0278, 'learning_rate': 1.984029259481411e-06, 'epoch': 0.58} 58%|█████▊ | 1607/2774 [5:17:29<4:06:27, 12.67s/it] 58%|█████▊ | 1608/2774 [5:17:40<3:58:35, 12.28s/it] {'loss': 1.0576, 'learning_rate': 1.981172776789834e-06, 'epoch': 0.58} 58%|█████▊ | 1608/2774 [5:17:40<3:58:35, 12.28s/it] 58%|█████▊ | 1609/2774 [5:17:52<3:58:56, 12.31s/it] {'loss': 1.0083, 'learning_rate': 1.978317001747248e-06, 'epoch': 0.58} 58%|█████▊ | 1609/2774 [5:17:52<3:58:56, 12.31s/it] 58%|█████▊ | 1610/2774 [5:18:04<3:54:30, 12.09s/it] {'loss': 0.9878, 'learning_rate': 1.9754619382487572e-06, 'epoch': 0.58} 58%|█████▊ | 1610/2774 [5:18:04<3:54:30, 12.09s/it] 58%|█████▊ | 1611/2774 [5:18:16<3:55:43, 12.16s/it] {'loss': 1.0107, 'learning_rate': 1.9726075901884964e-06, 'epoch': 0.58} 58%|█████▊ | 1611/2774 [5:18:16<3:55:43, 12.16s/it] 58%|█████▊ | 1612/2774 [5:18:28<3:52:22, 12.00s/it] {'loss': 1.0063, 'learning_rate': 1.9697539614596237e-06, 'epoch': 0.58} 58%|█████▊ | 1612/2774 [5:18:28<3:52:22, 12.00s/it] 58%|█████▊ | 1613/2774 [5:18:39<3:48:56, 11.83s/it] {'loss': 1.0801, 'learning_rate': 1.9669010559543163e-06, 'epoch': 0.58} 58%|█████▊ | 1613/2774 [5:18:39<3:48:56, 11.83s/it] 58%|█████▊ | 1614/2774 [5:18:51<3:48:19, 11.81s/it] {'loss': 1.0244, 'learning_rate': 1.9640488775637647e-06, 'epoch': 0.58} 58%|█████▊ | 1614/2774 [5:18:51<3:48:19, 11.81s/it] 58%|█████▊ | 1615/2774 [5:19:02<3:44:57, 11.65s/it] {'loss': 1.0552, 'learning_rate': 1.9611974301781693e-06, 'epoch': 0.58} 58%|█████▊ | 1615/2774 [5:19:02<3:44:57, 11.65s/it] 58%|█████▊ | 1616/2774 [5:19:14<3:44:04, 11.61s/it] {'loss': 0.9683, 'learning_rate': 1.95834671768673e-06, 'epoch': 0.58} 58%|█████▊ | 1616/2774 [5:19:14<3:44:04, 11.61s/it] 58%|█████▊ | 1617/2774 [5:19:25<3:43:35, 11.60s/it] {'loss': 1.0337, 'learning_rate': 1.9554967439776474e-06, 'epoch': 0.58} 58%|█████▊ | 1617/2774 [5:19:25<3:43:35, 11.60s/it] 58%|█████▊ | 1618/2774 
[5:19:37<3:44:01, 11.63s/it] {'loss': 1.0786, 'learning_rate': 1.952647512938114e-06, 'epoch': 0.58} 58%|█████▊ | 1618/2774 [5:19:37<3:44:01, 11.63s/it] 58%|█████▊ | 1619/2774 [5:19:48<3:41:13, 11.49s/it] {'loss': 1.0259, 'learning_rate': 1.9497990284543076e-06, 'epoch': 0.58} 58%|█████▊ | 1619/2774 [5:19:48<3:41:13, 11.49s/it] 58%|█████▊ | 1620/2774 [5:20:00<3:40:19, 11.46s/it] {'loss': 0.9878, 'learning_rate': 1.94695129441139e-06, 'epoch': 0.58} 58%|█████▊ | 1620/2774 [5:20:00<3:40:19, 11.46s/it] 58%|█████▊ | 1621/2774 [5:20:11<3:41:13, 11.51s/it] {'loss': 1.0674, 'learning_rate': 1.944104314693498e-06, 'epoch': 0.58} 58%|█████▊ | 1621/2774 [5:20:11<3:41:13, 11.51s/it] 58%|█████▊ | 1622/2774 [5:20:23<3:41:40, 11.55s/it] {'loss': 1.0474, 'learning_rate': 1.94125809318374e-06, 'epoch': 0.58} 58%|█████▊ | 1622/2774 [5:20:23<3:41:40, 11.55s/it] 59%|█████▊ | 1623/2774 [5:20:36<3:51:37, 12.07s/it] {'loss': 1.0312, 'learning_rate': 1.93841263376419e-06, 'epoch': 0.59} 59%|█████▊ | 1623/2774 [5:20:36<3:51:37, 12.07s/it] 59%|█████▊ | 1624/2774 [5:20:48<3:46:49, 11.83s/it] {'loss': 0.9824, 'learning_rate': 1.9355679403158843e-06, 'epoch': 0.59} 59%|█████▊ | 1624/2774 [5:20:48<3:46:49, 11.83s/it] 59%|█████▊ | 1625/2774 [5:20:59<3:46:16, 11.82s/it] {'loss': 1.0137, 'learning_rate': 1.932724016718811e-06, 'epoch': 0.59} 59%|█████▊ | 1625/2774 [5:20:59<3:46:16, 11.82s/it] 59%|█████▊ | 1626/2774 [5:21:11<3:45:13, 11.77s/it] {'loss': 1.0581, 'learning_rate': 1.92988086685191e-06, 'epoch': 0.59} 59%|█████▊ | 1626/2774 [5:21:11<3:45:13, 11.77s/it] 59%|█████▊ | 1627/2774 [5:21:22<3:41:23, 11.58s/it] {'loss': 1.0234, 'learning_rate': 1.9270384945930667e-06, 'epoch': 0.59} 59%|█████▊ | 1627/2774 [5:21:22<3:41:23, 11.58s/it] 59%|█████▊ | 1628/2774 [5:21:35<3:50:47, 12.08s/it] {'loss': 1.0142, 'learning_rate': 1.9241969038191055e-06, 'epoch': 0.59} 59%|█████▊ | 1628/2774 [5:21:35<3:50:47, 12.08s/it] 59%|█████▊ | 1629/2774 [5:21:48<3:51:33, 12.13s/it] {'loss': 1.0376, 'learning_rate': 
1.9213560984057844e-06, 'epoch': 0.59} 59%|█████▊ | 1629/2774 [5:21:48<3:51:33, 12.13s/it] 59%|█████▉ | 1630/2774 [5:21:59<3:48:34, 11.99s/it] {'loss': 1.0186, 'learning_rate': 1.9185160822277896e-06, 'epoch': 0.59} 59%|█████▉ | 1630/2774 [5:21:59<3:48:34, 11.99s/it] 59%|█████▉ | 1631/2774 [5:22:11<3:47:51, 11.96s/it] {'loss': 0.9902, 'learning_rate': 1.915676859158733e-06, 'epoch': 0.59} 59%|█████▉ | 1631/2774 [5:22:11<3:47:51, 11.96s/it] 59%|█████▉ | 1632/2774 [5:22:23<3:46:01, 11.88s/it] {'loss': 1.043, 'learning_rate': 1.9128384330711416e-06, 'epoch': 0.59} 59%|█████▉ | 1632/2774 [5:22:23<3:46:01, 11.88s/it] 59%|█████▉ | 1633/2774 [5:22:34<3:43:05, 11.73s/it] {'loss': 1.02, 'learning_rate': 1.9100008078364586e-06, 'epoch': 0.59} 59%|█████▉ | 1633/2774 [5:22:34<3:43:05, 11.73s/it] 59%|█████▉ | 1634/2774 [5:22:48<3:52:27, 12.23s/it] {'loss': 1.0117, 'learning_rate': 1.9071639873250333e-06, 'epoch': 0.59} 59%|█████▉ | 1634/2774 [5:22:48<3:52:27, 12.23s/it] 59%|█████▉ | 1635/2774 [5:22:59<3:47:22, 11.98s/it] {'loss': 1.0488, 'learning_rate': 1.9043279754061164e-06, 'epoch': 0.59} 59%|█████▉ | 1635/2774 [5:22:59<3:47:22, 11.98s/it] 59%|█████▉ | 1636/2774 [5:23:11<3:44:19, 11.83s/it] {'loss': 1.0405, 'learning_rate': 1.9014927759478575e-06, 'epoch': 0.59} 59%|█████▉ | 1636/2774 [5:23:11<3:44:19, 11.83s/it] 59%|█████▉ | 1637/2774 [5:23:22<3:40:59, 11.66s/it] {'loss': 1.001, 'learning_rate': 1.8986583928172972e-06, 'epoch': 0.59} 59%|█████▉ | 1637/2774 [5:23:22<3:40:59, 11.66s/it] 59%|█████▉ | 1638/2774 [5:23:33<3:40:08, 11.63s/it] {'loss': 1.0308, 'learning_rate': 1.8958248298803634e-06, 'epoch': 0.59} 59%|█████▉ | 1638/2774 [5:23:33<3:40:08, 11.63s/it] 59%|█████▉ | 1639/2774 [5:23:45<3:37:39, 11.51s/it] {'loss': 0.98, 'learning_rate': 1.892992091001864e-06, 'epoch': 0.59} 59%|█████▉ | 1639/2774 [5:23:45<3:37:39, 11.51s/it] 59%|█████▉ | 1640/2774 [5:23:56<3:37:39, 11.52s/it] {'loss': 1.0264, 'learning_rate': 1.8901601800454845e-06, 'epoch': 0.59} 59%|█████▉ | 
1640/2774 [5:23:56<3:37:39, 11.52s/it] 59%|█████▉ | 1641/2774 [5:24:07<3:35:55, 11.43s/it] {'loss': 1.0312, 'learning_rate': 1.8873291008737795e-06, 'epoch': 0.59} 59%|█████▉ | 1641/2774 [5:24:07<3:35:55, 11.43s/it] 59%|█████▉ | 1642/2774 [5:24:19<3:35:14, 11.41s/it] {'loss': 1.062, 'learning_rate': 1.8844988573481722e-06, 'epoch': 0.59} 59%|█████▉ | 1642/2774 [5:24:19<3:35:14, 11.41s/it] 59%|█████▉ | 1643/2774 [5:24:30<3:35:51, 11.45s/it] {'loss': 0.9956, 'learning_rate': 1.8816694533289405e-06, 'epoch': 0.59} 59%|█████▉ | 1643/2774 [5:24:30<3:35:51, 11.45s/it] 59%|█████▉ | 1644/2774 [5:24:42<3:38:28, 11.60s/it] {'loss': 1.0181, 'learning_rate': 1.8788408926752225e-06, 'epoch': 0.59} 59%|█████▉ | 1644/2774 [5:24:42<3:38:28, 11.60s/it] 59%|█████▉ | 1645/2774 [5:24:54<3:36:38, 11.51s/it] {'loss': 1.0698, 'learning_rate': 1.8760131792450034e-06, 'epoch': 0.59} 59%|█████▉ | 1645/2774 [5:24:54<3:36:38, 11.51s/it] 59%|█████▉ | 1646/2774 [5:25:05<3:37:28, 11.57s/it] {'loss': 1.0098, 'learning_rate': 1.8731863168951142e-06, 'epoch': 0.59} 59%|█████▉ | 1646/2774 [5:25:05<3:37:28, 11.57s/it] 59%|█████▉ | 1647/2774 [5:25:17<3:35:59, 11.50s/it] {'loss': 0.9658, 'learning_rate': 1.8703603094812236e-06, 'epoch': 0.59} 59%|█████▉ | 1647/2774 [5:25:17<3:35:59, 11.50s/it] 59%|█████▉ | 1648/2774 [5:25:28<3:34:41, 11.44s/it] {'loss': 1.0034, 'learning_rate': 1.8675351608578358e-06, 'epoch': 0.59} 59%|█████▉ | 1648/2774 [5:25:28<3:34:41, 11.44s/it] 59%|█████▉ | 1649/2774 [5:25:39<3:33:23, 11.38s/it] {'loss': 1.0171, 'learning_rate': 1.864710874878282e-06, 'epoch': 0.59} 59%|█████▉ | 1649/2774 [5:25:39<3:33:23, 11.38s/it] 59%|█████▉ | 1650/2774 [5:25:51<3:35:11, 11.49s/it] {'loss': 1.0508, 'learning_rate': 1.8618874553947189e-06, 'epoch': 0.59} 59%|█████▉ | 1650/2774 [5:25:51<3:35:11, 11.49s/it] 60%|█████▉ | 1651/2774 [5:26:05<3:49:00, 12.24s/it] {'loss': 1.0151, 'learning_rate': 1.8590649062581192e-06, 'epoch': 0.6} 60%|█████▉ | 1651/2774 [5:26:05<3:49:00, 12.24s/it] 60%|█████▉ | 
1652/2774 [5:26:16<3:43:43, 11.96s/it] {'loss': 0.9712, 'learning_rate': 1.8562432313182692e-06, 'epoch': 0.6} 60%|█████▉ | 1652/2774 [5:26:16<3:43:43, 11.96s/it] 60%|█████▉ | 1653/2774 [5:26:27<3:39:47, 11.76s/it] {'loss': 1.0508, 'learning_rate': 1.8534224344237634e-06, 'epoch': 0.6} 60%|█████▉ | 1653/2774 [5:26:27<3:39:47, 11.76s/it] 60%|█████▉ | 1654/2774 [5:26:39<3:39:49, 11.78s/it] {'loss': 1.0146, 'learning_rate': 1.8506025194219984e-06, 'epoch': 0.6} 60%|█████▉ | 1654/2774 [5:26:39<3:39:49, 11.78s/it] 60%|█████▉ | 1655/2774 [5:26:50<3:36:34, 11.61s/it] {'loss': 1.0083, 'learning_rate': 1.8477834901591678e-06, 'epoch': 0.6} 60%|█████▉ | 1655/2774 [5:26:50<3:36:34, 11.61s/it] 60%|█████▉ | 1656/2774 [5:27:03<3:39:01, 11.75s/it] {'loss': 0.9971, 'learning_rate': 1.8449653504802573e-06, 'epoch': 0.6} 60%|█████▉ | 1656/2774 [5:27:03<3:39:01, 11.75s/it] 60%|█████▉ | 1657/2774 [5:27:14<3:38:35, 11.74s/it] {'loss': 1.0532, 'learning_rate': 1.8421481042290393e-06, 'epoch': 0.6} 60%|█████▉ | 1657/2774 [5:27:14<3:38:35, 11.74s/it] 60%|█████▉ | 1658/2774 [5:27:26<3:36:49, 11.66s/it] {'loss': 1.0298, 'learning_rate': 1.8393317552480672e-06, 'epoch': 0.6} 60%|█████▉ | 1658/2774 [5:27:26<3:36:49, 11.66s/it] 60%|█████▉ | 1659/2774 [5:27:37<3:36:36, 11.66s/it] {'loss': 1.0259, 'learning_rate': 1.8365163073786712e-06, 'epoch': 0.6} 60%|█████▉ | 1659/2774 [5:27:37<3:36:36, 11.66s/it] 60%|█████▉ | 1660/2774 [5:27:49<3:34:14, 11.54s/it] {'loss': 0.9697, 'learning_rate': 1.8337017644609532e-06, 'epoch': 0.6} 60%|█████▉ | 1660/2774 [5:27:49<3:34:14, 11.54s/it] 60%|█████▉ | 1661/2774 [5:28:02<3:46:08, 12.19s/it] {'loss': 1.0239, 'learning_rate': 1.8308881303337772e-06, 'epoch': 0.6} 60%|█████▉ | 1661/2774 [5:28:02<3:46:08, 12.19s/it] 60%|█████▉ | 1662/2774 [5:28:15<3:49:00, 12.36s/it] {'loss': 1.0366, 'learning_rate': 1.8280754088347714e-06, 'epoch': 0.6} 60%|█████▉ | 1662/2774 [5:28:15<3:49:00, 12.36s/it] 60%|█████▉ | 1663/2774 [5:28:26<3:43:24, 12.06s/it] {'loss': 1.0098, 
'learning_rate': 1.8252636038003181e-06, 'epoch': 0.6} 60%|█████▉ | 1663/2774 [5:28:26<3:43:24, 12.06s/it] 60%|█████▉ | 1664/2774 [5:28:38<3:39:53, 11.89s/it] {'loss': 1.0171, 'learning_rate': 1.82245271906555e-06, 'epoch': 0.6} 60%|█████▉ | 1664/2774 [5:28:38<3:39:53, 11.89s/it] 60%|██████ | 1665/2774 [5:28:49<3:37:18, 11.76s/it] {'loss': 1.0806, 'learning_rate': 1.8196427584643433e-06, 'epoch': 0.6} 60%|██████ | 1665/2774 [5:28:49<3:37:18, 11.76s/it] 60%|██████ | 1666/2774 [5:29:01<3:36:24, 11.72s/it] {'loss': 1.0269, 'learning_rate': 1.8168337258293145e-06, 'epoch': 0.6} 60%|██████ | 1666/2774 [5:29:01<3:36:24, 11.72s/it] 60%|██████ | 1667/2774 [5:29:13<3:34:56, 11.65s/it] {'loss': 1.0005, 'learning_rate': 1.8140256249918153e-06, 'epoch': 0.6} 60%|██████ | 1667/2774 [5:29:13<3:34:56, 11.65s/it] 60%|██████ | 1668/2774 [5:29:24<3:35:41, 11.70s/it] {'loss': 0.9678, 'learning_rate': 1.8112184597819246e-06, 'epoch': 0.6} 60%|██████ | 1668/2774 [5:29:24<3:35:41, 11.70s/it] 60%|██████ | 1669/2774 [5:29:36<3:35:30, 11.70s/it] {'loss': 1.0684, 'learning_rate': 1.808412234028448e-06, 'epoch': 0.6} 60%|██████ | 1669/2774 [5:29:36<3:35:30, 11.70s/it] 60%|██████ | 1670/2774 [5:29:47<3:31:45, 11.51s/it] {'loss': 0.9517, 'learning_rate': 1.8056069515589054e-06, 'epoch': 0.6} 60%|██████ | 1670/2774 [5:29:47<3:31:45, 11.51s/it] 60%|██████ | 1671/2774 [5:29:59<3:31:19, 11.50s/it] {'loss': 1.0537, 'learning_rate': 1.8028026161995335e-06, 'epoch': 0.6} 60%|██████ | 1671/2774 [5:29:59<3:31:19, 11.50s/it] 60%|██████ | 1672/2774 [5:30:10<3:30:26, 11.46s/it] {'loss': 1.0259, 'learning_rate': 1.7999992317752768e-06, 'epoch': 0.6} 60%|██████ | 1672/2774 [5:30:10<3:30:26, 11.46s/it] 60%|██████ | 1673/2774 [5:30:21<3:30:33, 11.47s/it] {'loss': 0.9946, 'learning_rate': 1.7971968021097818e-06, 'epoch': 0.6} 60%|██████ | 1673/2774 [5:30:21<3:30:33, 11.47s/it] 60%|██████ | 1674/2774 [5:30:33<3:31:46, 11.55s/it] {'loss': 1.0249, 'learning_rate': 1.7943953310253939e-06, 'epoch': 0.6} 60%|██████ 
| 1674/2774 [5:30:33<3:31:46, 11.55s/it] 60%|██████ | 1675/2774 [5:30:45<3:32:27, 11.60s/it] {'loss': 0.9741, 'learning_rate': 1.7915948223431506e-06, 'epoch': 0.6} 60%|██████ | 1675/2774 [5:30:45<3:32:27, 11.60s/it] 60%|██████ | 1676/2774 [5:30:56<3:31:27, 11.56s/it] {'loss': 1.0264, 'learning_rate': 1.788795279882775e-06, 'epoch': 0.6} 60%|██████ | 1676/2774 [5:30:56<3:31:27, 11.56s/it] 60%|██████ | 1677/2774 [5:31:08<3:30:39, 11.52s/it] {'loss': 1.0107, 'learning_rate': 1.7859967074626756e-06, 'epoch': 0.6} 60%|██████ | 1677/2774 [5:31:08<3:30:39, 11.52s/it] 60%|██████ | 1678/2774 [5:31:19<3:29:13, 11.45s/it] {'loss': 1.0093, 'learning_rate': 1.7831991088999357e-06, 'epoch': 0.6} 60%|██████ | 1678/2774 [5:31:19<3:29:13, 11.45s/it] 61%|██████ | 1679/2774 [5:31:30<3:27:46, 11.38s/it] {'loss': 1.0088, 'learning_rate': 1.7804024880103101e-06, 'epoch': 0.61} 61%|██████ | 1679/2774 [5:31:30<3:27:46, 11.38s/it] 61%|██████ | 1680/2774 [5:31:43<3:35:59, 11.85s/it] {'loss': 1.0386, 'learning_rate': 1.7776068486082215e-06, 'epoch': 0.61} 61%|██████ | 1680/2774 [5:31:43<3:35:59, 11.85s/it] 61%|██████ | 1681/2774 [5:31:55<3:34:32, 11.78s/it] {'loss': 1.0269, 'learning_rate': 1.7748121945067526e-06, 'epoch': 0.61} 61%|██████ | 1681/2774 [5:31:55<3:34:32, 11.78s/it] 61%|██████ | 1682/2774 [5:32:06<3:33:04, 11.71s/it] {'loss': 1.0269, 'learning_rate': 1.772018529517643e-06, 'epoch': 0.61} 61%|██████ | 1682/2774 [5:32:06<3:33:04, 11.71s/it] 61%|██████ | 1683/2774 [5:32:18<3:31:53, 11.65s/it] {'loss': 1.0674, 'learning_rate': 1.7692258574512827e-06, 'epoch': 0.61} 61%|██████ | 1683/2774 [5:32:18<3:31:53, 11.65s/it] 61%|██████ | 1684/2774 [5:32:29<3:29:56, 11.56s/it] {'loss': 1.0, 'learning_rate': 1.766434182116708e-06, 'epoch': 0.61} 61%|██████ | 1684/2774 [5:32:29<3:29:56, 11.56s/it] 61%|██████ | 1685/2774 [5:32:41<3:28:28, 11.49s/it] {'loss': 1.0254, 'learning_rate': 1.7636435073215956e-06, 'epoch': 0.61} 61%|██████ | 1685/2774 [5:32:41<3:28:28, 11.49s/it] 61%|██████ | 
1686/2774 [5:32:52<3:27:13, 11.43s/it] {'loss': 1.0176, 'learning_rate': 1.7608538368722572e-06, 'epoch': 0.61} 61%|██████ | 1686/2774 [5:32:52<3:27:13, 11.43s/it] 61%|██████ | 1687/2774 [5:33:04<3:29:25, 11.56s/it] {'loss': 1.0669, 'learning_rate': 1.7580651745736357e-06, 'epoch': 0.61} 61%|██████ | 1687/2774 [5:33:04<3:29:25, 11.56s/it] 61%|██████ | 1688/2774 [5:33:15<3:26:59, 11.44s/it] {'loss': 0.9888, 'learning_rate': 1.755277524229296e-06, 'epoch': 0.61} 61%|██████ | 1688/2774 [5:33:15<3:26:59, 11.44s/it] 61%|██████ | 1689/2774 [5:33:26<3:26:43, 11.43s/it] {'loss': 1.0381, 'learning_rate': 1.752490889641426e-06, 'epoch': 0.61} 61%|██████ | 1689/2774 [5:33:26<3:26:43, 11.43s/it] 61%|██████ | 1690/2774 [5:33:38<3:27:50, 11.50s/it] {'loss': 0.9927, 'learning_rate': 1.7497052746108262e-06, 'epoch': 0.61} 61%|██████ | 1690/2774 [5:33:38<3:27:50, 11.50s/it] 61%|██████ | 1691/2774 [5:33:50<3:27:53, 11.52s/it] {'loss': 1.0376, 'learning_rate': 1.7469206829369085e-06, 'epoch': 0.61} 61%|██████ | 1691/2774 [5:33:50<3:27:53, 11.52s/it] 61%|██████ | 1692/2774 [5:34:01<3:28:52, 11.58s/it] {'loss': 0.9658, 'learning_rate': 1.7441371184176865e-06, 'epoch': 0.61} 61%|██████ | 1692/2774 [5:34:01<3:28:52, 11.58s/it] 61%|██████ | 1693/2774 [5:34:12<3:26:41, 11.47s/it] {'loss': 1.0371, 'learning_rate': 1.7413545848497745e-06, 'epoch': 0.61} 61%|██████ | 1693/2774 [5:34:12<3:26:41, 11.47s/it] 61%|██████ | 1694/2774 [5:34:24<3:25:03, 11.39s/it] {'loss': 1.0171, 'learning_rate': 1.7385730860283806e-06, 'epoch': 0.61} 61%|██████ | 1694/2774 [5:34:24<3:25:03, 11.39s/it] 61%|██████ | 1695/2774 [5:34:35<3:26:09, 11.46s/it] {'loss': 1.0537, 'learning_rate': 1.7357926257473007e-06, 'epoch': 0.61} 61%|██████ | 1695/2774 [5:34:35<3:26:09, 11.46s/it] 61%|██████ | 1696/2774 [5:34:47<3:25:46, 11.45s/it] {'loss': 1.0767, 'learning_rate': 1.7330132077989159e-06, 'epoch': 0.61} 61%|██████ | 1696/2774 [5:34:47<3:25:46, 11.45s/it] 61%|██████ | 1697/2774 [5:34:59<3:29:01, 11.64s/it] {'loss': 
1.0366, 'learning_rate': 1.7302348359741821e-06, 'epoch': 0.61} 61%|██████ | 1697/2774 [5:34:59<3:29:01, 11.64s/it] 61%|██████ | 1698/2774 [5:35:10<3:27:58, 11.60s/it] {'loss': 1.0205, 'learning_rate': 1.7274575140626318e-06, 'epoch': 0.61} 61%|██████ | 1698/2774 [5:35:10<3:27:58, 11.60s/it] 61%|██████ | 1699/2774 [5:35:22<3:26:53, 11.55s/it] {'loss': 1.0396, 'learning_rate': 1.7246812458523642e-06, 'epoch': 0.61} 61%|██████ | 1699/2774 [5:35:22<3:26:53, 11.55s/it] 61%|██████▏ | 1700/2774 [5:35:33<3:25:02, 11.45s/it] {'loss': 1.0181, 'learning_rate': 1.7219060351300417e-06, 'epoch': 0.61} 61%|██████▏ | 1700/2774 [5:35:33<3:25:02, 11.45s/it] 61%|██████▏ | 1701/2774 [5:35:44<3:23:42, 11.39s/it] {'loss': 1.0186, 'learning_rate': 1.7191318856808848e-06, 'epoch': 0.61} 61%|██████▏ | 1701/2774 [5:35:44<3:23:42, 11.39s/it] 61%|██████▏ | 1702/2774 [5:35:56<3:23:08, 11.37s/it] {'loss': 1.0146, 'learning_rate': 1.716358801288664e-06, 'epoch': 0.61} 61%|██████▏ | 1702/2774 [5:35:56<3:23:08, 11.37s/it] 61%|██████▏ | 1703/2774 [5:36:07<3:24:57, 11.48s/it] {'loss': 1.0576, 'learning_rate': 1.7135867857356998e-06, 'epoch': 0.61} 61%|██████▏ | 1703/2774 [5:36:07<3:24:57, 11.48s/it] 61%|██████▏ | 1704/2774 [5:36:19<3:24:07, 11.45s/it] {'loss': 1.0127, 'learning_rate': 1.7108158428028537e-06, 'epoch': 0.61} 61%|██████▏ | 1704/2774 [5:36:19<3:24:07, 11.45s/it] 61%|██████▏ | 1705/2774 [5:36:30<3:22:37, 11.37s/it] {'loss': 1.0298, 'learning_rate': 1.7080459762695262e-06, 'epoch': 0.61} 61%|██████▏ | 1705/2774 [5:36:30<3:22:37, 11.37s/it] 61%|██████▏ | 1706/2774 [5:36:41<3:22:08, 11.36s/it] {'loss': 0.9614, 'learning_rate': 1.7052771899136453e-06, 'epoch': 0.61} 61%|██████▏ | 1706/2774 [5:36:41<3:22:08, 11.36s/it] 62%|██████▏ | 1707/2774 [5:36:52<3:20:53, 11.30s/it] {'loss': 1.0728, 'learning_rate': 1.7025094875116693e-06, 'epoch': 0.62} 62%|██████▏ | 1707/2774 [5:36:52<3:20:53, 11.30s/it] 62%|██████▏ | 1708/2774 [5:37:04<3:22:21, 11.39s/it] {'loss': 1.0068, 'learning_rate': 
1.6997428728385773e-06, 'epoch': 0.62} 62%|██████▏ | 1708/2774 [5:37:04<3:22:21, 11.39s/it] 62%|██████▏ | 1709/2774 [5:37:16<3:23:16, 11.45s/it] {'loss': 1.061, 'learning_rate': 1.6969773496678648e-06, 'epoch': 0.62} 62%|██████▏ | 1709/2774 [5:37:16<3:23:16, 11.45s/it] 62%|██████▏ | 1710/2774 [5:37:28<3:27:00, 11.67s/it] {'loss': 1.0859, 'learning_rate': 1.6942129217715382e-06, 'epoch': 0.62} 62%|██████▏ | 1710/2774 [5:37:28<3:27:00, 11.67s/it] 62%|██████▏ | 1711/2774 [5:37:39<3:26:33, 11.66s/it] {'loss': 1.0635, 'learning_rate': 1.6914495929201098e-06, 'epoch': 0.62} 62%|██████▏ | 1711/2774 [5:37:39<3:26:33, 11.66s/it] 62%|██████▏ | 1712/2774 [5:37:51<3:23:57, 11.52s/it] {'loss': 1.0327, 'learning_rate': 1.6886873668825932e-06, 'epoch': 0.62} 62%|██████▏ | 1712/2774 [5:37:51<3:23:57, 11.52s/it] 62%|██████▏ | 1713/2774 [5:38:02<3:24:26, 11.56s/it] {'loss': 1.0698, 'learning_rate': 1.6859262474264985e-06, 'epoch': 0.62} 62%|██████▏ | 1713/2774 [5:38:02<3:24:26, 11.56s/it] 62%|██████▏ | 1714/2774 [5:38:14<3:24:06, 11.55s/it] {'loss': 1.0859, 'learning_rate': 1.6831662383178262e-06, 'epoch': 0.62} 62%|██████▏ | 1714/2774 [5:38:14<3:24:06, 11.55s/it] 62%|██████▏ | 1715/2774 [5:38:25<3:23:31, 11.53s/it] {'loss': 1.0762, 'learning_rate': 1.6804073433210605e-06, 'epoch': 0.62} 62%|██████▏ | 1715/2774 [5:38:25<3:23:31, 11.53s/it] 62%|██████▏ | 1716/2774 [5:38:36<3:21:15, 11.41s/it] {'loss': 1.0142, 'learning_rate': 1.6776495661991682e-06, 'epoch': 0.62} 62%|██████▏ | 1716/2774 [5:38:36<3:21:15, 11.41s/it] 62%|██████▏ | 1717/2774 [5:38:48<3:19:57, 11.35s/it] {'loss': 1.0317, 'learning_rate': 1.674892910713591e-06, 'epoch': 0.62} 62%|██████▏ | 1717/2774 [5:38:48<3:19:57, 11.35s/it] 62%|██████▏ | 1718/2774 [5:38:59<3:20:04, 11.37s/it] {'loss': 1.0332, 'learning_rate': 1.67213738062424e-06, 'epoch': 0.62} 62%|██████▏ | 1718/2774 [5:38:59<3:20:04, 11.37s/it] 62%|██████▏ | 1719/2774 [5:39:10<3:19:55, 11.37s/it] {'loss': 1.0479, 'learning_rate': 1.6693829796894923e-06, 'epoch': 
0.62} 62%|██████▏ | 1719/2774 [5:39:10<3:19:55, 11.37s/it] 62%|██████▏ | 1720/2774 [5:39:23<3:25:10, 11.68s/it] {'loss': 1.0342, 'learning_rate': 1.666629711666184e-06, 'epoch': 0.62} 62%|██████▏ | 1720/2774 [5:39:23<3:25:10, 11.68s/it] 62%|██████▏ | 1721/2774 [5:39:34<3:22:29, 11.54s/it] {'loss': 0.9551, 'learning_rate': 1.663877580309607e-06, 'epoch': 0.62} 62%|██████▏ | 1721/2774 [5:39:34<3:22:29, 11.54s/it] 62%|██████▏ | 1722/2774 [5:39:46<3:25:41, 11.73s/it] {'loss': 1.0151, 'learning_rate': 1.6611265893735007e-06, 'epoch': 0.62} 62%|██████▏ | 1722/2774 [5:39:46<3:25:41, 11.73s/it] 62%|██████▏ | 1723/2774 [5:39:58<3:24:37, 11.68s/it] {'loss': 1.084, 'learning_rate': 1.6583767426100528e-06, 'epoch': 0.62} 62%|██████▏ | 1723/2774 [5:39:58<3:24:37, 11.68s/it] 62%|██████▏ | 1724/2774 [5:40:09<3:22:01, 11.54s/it] {'loss': 0.998, 'learning_rate': 1.6556280437698857e-06, 'epoch': 0.62} 62%|██████▏ | 1724/2774 [5:40:09<3:22:01, 11.54s/it] 62%|██████▏ | 1725/2774 [5:40:22<3:27:08, 11.85s/it] {'loss': 1.0239, 'learning_rate': 1.6528804966020603e-06, 'epoch': 0.62} 62%|██████▏ | 1725/2774 [5:40:22<3:27:08, 11.85s/it] 62%|██████▏ | 1726/2774 [5:40:33<3:24:00, 11.68s/it] {'loss': 1.0386, 'learning_rate': 1.6501341048540647e-06, 'epoch': 0.62} 62%|██████▏ | 1726/2774 [5:40:33<3:24:00, 11.68s/it] 62%|██████▏ | 1727/2774 [5:40:44<3:23:19, 11.65s/it] {'loss': 1.0283, 'learning_rate': 1.647388872271811e-06, 'epoch': 0.62} 62%|██████▏ | 1727/2774 [5:40:44<3:23:19, 11.65s/it] 62%|██████▏ | 1728/2774 [5:40:56<3:21:44, 11.57s/it] {'loss': 1.0229, 'learning_rate': 1.6446448025996303e-06, 'epoch': 0.62} 62%|██████▏ | 1728/2774 [5:40:56<3:21:44, 11.57s/it] 62%|██████▏ | 1729/2774 [5:41:07<3:21:50, 11.59s/it] {'loss': 0.9937, 'learning_rate': 1.6419018995802685e-06, 'epoch': 0.62} 62%|██████▏ | 1729/2774 [5:41:07<3:21:50, 11.59s/it] 62%|██████▏ | 1730/2774 [5:41:19<3:21:44, 11.59s/it] {'loss': 1.0156, 'learning_rate': 1.6391601669548796e-06, 'epoch': 0.62} 62%|██████▏ | 1730/2774 
[5:41:19<3:21:44, 11.59s/it] 62%|██████▏ | 1731/2774 [5:41:30<3:20:43, 11.55s/it] {'loss': 1.0571, 'learning_rate': 1.6364196084630207e-06, 'epoch': 0.62} 62%|██████▏ | 1731/2774 [5:41:30<3:20:43, 11.55s/it] 62%|██████▏ | 1732/2774 [5:41:43<3:24:30, 11.78s/it] {'loss': 1.0527, 'learning_rate': 1.6336802278426494e-06, 'epoch': 0.62} 62%|██████▏ | 1732/2774 [5:41:43<3:24:30, 11.78s/it] 62%|██████▏ | 1733/2774 [5:41:55<3:28:19, 12.01s/it] {'loss': 1.0278, 'learning_rate': 1.6309420288301136e-06, 'epoch': 0.62} 62%|██████▏ | 1733/2774 [5:41:55<3:28:19, 12.01s/it] 63%|██████▎ | 1734/2774 [5:42:08<3:30:02, 12.12s/it] {'loss': 0.9946, 'learning_rate': 1.628205015160152e-06, 'epoch': 0.63} 63%|██████▎ | 1734/2774 [5:42:08<3:30:02, 12.12s/it] 63%|██████▎ | 1735/2774 [5:42:20<3:28:54, 12.06s/it] {'loss': 1.0776, 'learning_rate': 1.625469190565886e-06, 'epoch': 0.63} 63%|██████▎ | 1735/2774 [5:42:20<3:28:54, 12.06s/it] 63%|██████▎ | 1736/2774 [5:42:31<3:26:07, 11.91s/it] {'loss': 1.0142, 'learning_rate': 1.6227345587788152e-06, 'epoch': 0.63} 63%|██████▎ | 1736/2774 [5:42:31<3:26:07, 11.91s/it] 63%|██████▎ | 1737/2774 [5:42:43<3:23:59, 11.80s/it] {'loss': 0.9854, 'learning_rate': 1.620001123528812e-06, 'epoch': 0.63} 63%|██████▎ | 1737/2774 [5:42:43<3:23:59, 11.80s/it] 63%|██████▎ | 1738/2774 [5:42:56<3:30:57, 12.22s/it] {'loss': 0.9805, 'learning_rate': 1.6172688885441174e-06, 'epoch': 0.63} 63%|██████▎ | 1738/2774 [5:42:56<3:30:57, 12.22s/it] 63%|██████▎ | 1739/2774 [5:43:08<3:30:44, 12.22s/it] {'loss': 1.0596, 'learning_rate': 1.6145378575513343e-06, 'epoch': 0.63} 63%|██████▎ | 1739/2774 [5:43:08<3:30:44, 12.22s/it] 63%|██████▎ | 1740/2774 [5:43:20<3:27:51, 12.06s/it] {'loss': 1.062, 'learning_rate': 1.611808034275424e-06, 'epoch': 0.63} 63%|██████▎ | 1740/2774 [5:43:20<3:27:51, 12.06s/it] 63%|██████▎ | 1741/2774 [5:43:31<3:23:29, 11.82s/it] {'loss': 1.063, 'learning_rate': 1.609079422439702e-06, 'epoch': 0.63} 63%|██████▎ | 1741/2774 [5:43:31<3:23:29, 11.82s/it] 
63%|██████▎ | 1742/2774 [5:43:42<3:20:02, 11.63s/it] {'loss': 1.0278, 'learning_rate': 1.6063520257658278e-06, 'epoch': 0.63} 63%|██████▎ | 1742/2774 [5:43:42<3:20:02, 11.63s/it] 63%|██████▎ | 1743/2774 [5:43:53<3:17:21, 11.49s/it] {'loss': 1.0264, 'learning_rate': 1.6036258479738065e-06, 'epoch': 0.63} 63%|██████▎ | 1743/2774 [5:43:53<3:17:21, 11.49s/it] 63%|██████▎ | 1744/2774 [5:44:05<3:16:40, 11.46s/it] {'loss': 1.0161, 'learning_rate': 1.6009008927819802e-06, 'epoch': 0.63} 63%|██████▎ | 1744/2774 [5:44:05<3:16:40, 11.46s/it] 63%|██████▎ | 1745/2774 [5:44:16<3:17:17, 11.50s/it] {'loss': 1.0464, 'learning_rate': 1.598177163907023e-06, 'epoch': 0.63} 63%|██████▎ | 1745/2774 [5:44:16<3:17:17, 11.50s/it] 63%|██████▎ | 1746/2774 [5:44:30<3:26:08, 12.03s/it] {'loss': 0.9849, 'learning_rate': 1.5954546650639368e-06, 'epoch': 0.63} 63%|██████▎ | 1746/2774 [5:44:30<3:26:08, 12.03s/it] 63%|██████▎ | 1747/2774 [5:44:43<3:33:20, 12.46s/it] {'loss': 0.9863, 'learning_rate': 1.5927333999660457e-06, 'epoch': 0.63} 63%|██████▎ | 1747/2774 [5:44:43<3:33:20, 12.46s/it] 63%|██████▎ | 1748/2774 [5:44:55<3:27:52, 12.16s/it] {'loss': 1.0112, 'learning_rate': 1.59001337232499e-06, 'epoch': 0.63} 63%|██████▎ | 1748/2774 [5:44:55<3:27:52, 12.16s/it] 63%|██████▎ | 1749/2774 [5:45:08<3:33:30, 12.50s/it] {'loss': 0.9531, 'learning_rate': 1.5872945858507239e-06, 'epoch': 0.63} 63%|██████▎ | 1749/2774 [5:45:08<3:33:30, 12.50s/it] 63%|██████▎ | 1750/2774 [5:45:20<3:29:25, 12.27s/it] {'loss': 1.0688, 'learning_rate': 1.5845770442515082e-06, 'epoch': 0.63} 63%|██████▎ | 1750/2774 [5:45:20<3:29:25, 12.27s/it] 63%|██████▎ | 1751/2774 [5:45:31<3:24:25, 11.99s/it] {'loss': 1.0796, 'learning_rate': 1.5818607512339048e-06, 'epoch': 0.63} 63%|██████▎ | 1751/2774 [5:45:31<3:24:25, 11.99s/it] 63%|██████▎ | 1752/2774 [5:45:43<3:22:37, 11.90s/it] {'loss': 1.0728, 'learning_rate': 1.579145710502773e-06, 'epoch': 0.63} 63%|██████▎ | 1752/2774 [5:45:43<3:22:37, 11.90s/it] 63%|██████▎ | 1753/2774 
[5:45:54<3:20:23, 11.78s/it] {'loss': 1.0781, 'learning_rate': 1.5764319257612649e-06, 'epoch': 0.63} 63%|██████▎ | 1753/2774 [5:45:54<3:20:23, 11.78s/it] 63%|██████▎ | 1754/2774 [5:46:05<3:17:53, 11.64s/it] {'loss': 1.0649, 'learning_rate': 1.573719400710819e-06, 'epoch': 0.63} 63%|██████▎ | 1754/2774 [5:46:05<3:17:53, 11.64s/it] 63%|██████▎ | 1755/2774 [5:46:17<3:15:45, 11.53s/it] {'loss': 1.0737, 'learning_rate': 1.571008139051155e-06, 'epoch': 0.63} 63%|██████▎ | 1755/2774 [5:46:17<3:15:45, 11.53s/it] 63%|██████▎ | 1756/2774 [5:46:28<3:15:14, 11.51s/it] {'loss': 1.0762, 'learning_rate': 1.5682981444802708e-06, 'epoch': 0.63} 63%|██████▎ | 1756/2774 [5:46:28<3:15:14, 11.51s/it] 63%|██████▎ | 1757/2774 [5:46:39<3:14:08, 11.45s/it] {'loss': 1.0669, 'learning_rate': 1.565589420694435e-06, 'epoch': 0.63} 63%|██████▎ | 1757/2774 [5:46:40<3:14:08, 11.45s/it] 63%|██████▎ | 1758/2774 [5:46:51<3:13:33, 11.43s/it] {'loss': 1.0376, 'learning_rate': 1.5628819713881832e-06, 'epoch': 0.63} 63%|██████▎ | 1758/2774 [5:46:51<3:13:33, 11.43s/it] 63%|██████▎ | 1759/2774 [5:47:03<3:19:04, 11.77s/it] {'loss': 1.0342, 'learning_rate': 1.5601758002543138e-06, 'epoch': 0.63} 63%|██████▎ | 1759/2774 [5:47:03<3:19:04, 11.77s/it] 63%|██████▎ | 1760/2774 [5:47:15<3:16:30, 11.63s/it] {'loss': 0.9917, 'learning_rate': 1.5574709109838782e-06, 'epoch': 0.63} 63%|██████▎ | 1760/2774 [5:47:15<3:16:30, 11.63s/it] 63%|██████▎ | 1761/2774 [5:47:26<3:15:44, 11.59s/it] {'loss': 1.0225, 'learning_rate': 1.5547673072661837e-06, 'epoch': 0.63} 63%|██████▎ | 1761/2774 [5:47:26<3:15:44, 11.59s/it] 64%|██████▎ | 1762/2774 [5:47:38<3:14:19, 11.52s/it] {'loss': 0.9888, 'learning_rate': 1.552064992788782e-06, 'epoch': 0.64} 64%|██████▎ | 1762/2774 [5:47:38<3:14:19, 11.52s/it] 64%|██████▎ | 1763/2774 [5:47:50<3:20:15, 11.88s/it] {'loss': 0.959, 'learning_rate': 1.5493639712374672e-06, 'epoch': 0.64} 64%|██████▎ | 1763/2774 [5:47:50<3:20:15, 11.88s/it] 64%|██████▎ | 1764/2774 [5:48:02<3:20:30, 11.91s/it] 
{'loss': 0.9824, 'learning_rate': 1.5466642462962695e-06, 'epoch': 0.64} 64%|██████▎ | 1764/2774 [5:48:02<3:20:30, 11.91s/it] 64%|██████▎ | 1765/2774 [5:48:14<3:19:38, 11.87s/it] {'loss': 1.042, 'learning_rate': 1.54396582164745e-06, 'epoch': 0.64} 64%|██████▎ | 1765/2774 [5:48:14<3:19:38, 11.87s/it] 64%|██████▎ | 1766/2774 [5:48:26<3:17:13, 11.74s/it] {'loss': 0.9873, 'learning_rate': 1.5412687009714974e-06, 'epoch': 0.64} 64%|██████▎ | 1766/2774 [5:48:26<3:17:13, 11.74s/it] 64%|██████▎ | 1767/2774 [5:48:37<3:14:59, 11.62s/it] {'loss': 1.0161, 'learning_rate': 1.5385728879471217e-06, 'epoch': 0.64} 64%|██████▎ | 1767/2774 [5:48:37<3:14:59, 11.62s/it] 64%|██████▎ | 1768/2774 [5:48:48<3:14:50, 11.62s/it] {'loss': 0.9897, 'learning_rate': 1.535878386251249e-06, 'epoch': 0.64} 64%|██████▎ | 1768/2774 [5:48:48<3:14:50, 11.62s/it] 64%|██████▍ | 1769/2774 [5:49:00<3:13:57, 11.58s/it] {'loss': 1.0239, 'learning_rate': 1.5331851995590159e-06, 'epoch': 0.64} 64%|██████▍ | 1769/2774 [5:49:00<3:13:57, 11.58s/it] 64%|██████▍ | 1770/2774 [5:49:13<3:19:07, 11.90s/it] {'loss': 1.0444, 'learning_rate': 1.530493331543767e-06, 'epoch': 0.64} 64%|██████▍ | 1770/2774 [5:49:13<3:19:07, 11.90s/it] 64%|██████▍ | 1771/2774 [5:49:24<3:15:30, 11.70s/it] {'loss': 0.9868, 'learning_rate': 1.5278027858770472e-06, 'epoch': 0.64} 64%|██████▍ | 1771/2774 [5:49:24<3:15:30, 11.70s/it] 64%|██████▍ | 1772/2774 [5:49:36<3:16:19, 11.76s/it] {'loss': 1.0654, 'learning_rate': 1.5251135662285993e-06, 'epoch': 0.64} 64%|██████▍ | 1772/2774 [5:49:36<3:16:19, 11.76s/it] 64%|██████▍ | 1773/2774 [5:49:48<3:20:57, 12.05s/it] {'loss': 1.0332, 'learning_rate': 1.5224256762663556e-06, 'epoch': 0.64} 64%|██████▍ | 1773/2774 [5:49:48<3:20:57, 12.05s/it] 64%|██████▍ | 1774/2774 [5:50:00<3:18:20, 11.90s/it] {'loss': 0.9829, 'learning_rate': 1.5197391196564357e-06, 'epoch': 0.64} 64%|██████▍ | 1774/2774 [5:50:00<3:18:20, 11.90s/it] 64%|██████▍ | 1775/2774 [5:50:11<3:15:26, 11.74s/it] {'loss': 1.019, 'learning_rate': 
1.5170539000631407e-06, 'epoch': 0.64} 64%|██████▍ | 1775/2774 [5:50:11<3:15:26, 11.74s/it] 64%|██████▍ | 1776/2774 [5:50:23<3:12:30, 11.57s/it] {'loss': 1.0684, 'learning_rate': 1.5143700211489476e-06, 'epoch': 0.64} 64%|██████▍ | 1776/2774 [5:50:23<3:12:30, 11.57s/it] 64%|██████▍ | 1777/2774 [5:50:34<3:14:01, 11.68s/it] {'loss': 1.0308, 'learning_rate': 1.5116874865745069e-06, 'epoch': 0.64} 64%|██████▍ | 1777/2774 [5:50:34<3:14:01, 11.68s/it] 64%|██████▍ | 1778/2774 [5:50:46<3:12:15, 11.58s/it] {'loss': 1.0015, 'learning_rate': 1.5090062999986304e-06, 'epoch': 0.64} 64%|██████▍ | 1778/2774 [5:50:46<3:12:15, 11.58s/it] 64%|██████▍ | 1779/2774 [5:50:57<3:10:38, 11.50s/it] {'loss': 1.0645, 'learning_rate': 1.5063264650782972e-06, 'epoch': 0.64} 64%|██████▍ | 1779/2774 [5:50:57<3:10:38, 11.50s/it] 64%|██████▍ | 1780/2774 [5:51:09<3:10:17, 11.49s/it] {'loss': 1.0039, 'learning_rate': 1.5036479854686392e-06, 'epoch': 0.64} 64%|██████▍ | 1780/2774 [5:51:09<3:10:17, 11.49s/it] 64%|██████▍ | 1781/2774 [5:51:22<3:18:13, 11.98s/it] {'loss': 1.0532, 'learning_rate': 1.5009708648229409e-06, 'epoch': 0.64} 64%|██████▍ | 1781/2774 [5:51:22<3:18:13, 11.98s/it] 64%|██████▍ | 1782/2774 [5:51:33<3:14:27, 11.76s/it] {'loss': 1.0522, 'learning_rate': 1.4982951067926335e-06, 'epoch': 0.64} 64%|██████▍ | 1782/2774 [5:51:33<3:14:27, 11.76s/it] 64%|██████▍ | 1783/2774 [5:51:44<3:11:37, 11.60s/it] {'loss': 0.999, 'learning_rate': 1.495620715027289e-06, 'epoch': 0.64} 64%|██████▍ | 1783/2774 [5:51:44<3:11:37, 11.60s/it] 64%|██████▍ | 1784/2774 [5:51:56<3:10:56, 11.57s/it] {'loss': 1.0737, 'learning_rate': 1.4929476931746167e-06, 'epoch': 0.64} 64%|██████▍ | 1784/2774 [5:51:56<3:10:56, 11.57s/it] 64%|██████▍ | 1785/2774 [5:52:07<3:09:26, 11.49s/it] {'loss': 1.0142, 'learning_rate': 1.4902760448804559e-06, 'epoch': 0.64} 64%|██████▍ | 1785/2774 [5:52:07<3:09:26, 11.49s/it] 64%|██████▍ | 1786/2774 [5:52:18<3:08:12, 11.43s/it] {'loss': 1.0449, 'learning_rate': 1.4876057737887755e-06, 'epoch': 
0.64} 64%|██████▍ | 1786/2774 [5:52:18<3:08:12, 11.43s/it] 64%|██████▍ | 1787/2774 [5:52:30<3:07:29, 11.40s/it] {'loss': 1.0527, 'learning_rate': 1.484936883541662e-06, 'epoch': 0.64} 64%|██████▍ | 1787/2774 [5:52:30<3:07:29, 11.40s/it] 64%|██████▍ | 1788/2774 [5:52:41<3:07:47, 11.43s/it] {'loss': 0.998, 'learning_rate': 1.4822693777793207e-06, 'epoch': 0.64} 64%|██████▍ | 1788/2774 [5:52:41<3:07:47, 11.43s/it] 64%|██████▍ | 1789/2774 [5:52:53<3:07:31, 11.42s/it] {'loss': 1.0063, 'learning_rate': 1.4796032601400687e-06, 'epoch': 0.64} 64%|██████▍ | 1789/2774 [5:52:53<3:07:31, 11.42s/it] 65%|██████▍ | 1790/2774 [5:53:06<3:19:00, 12.13s/it] {'loss': 0.9961, 'learning_rate': 1.4769385342603292e-06, 'epoch': 0.65} 65%|██████▍ | 1790/2774 [5:53:06<3:19:00, 12.13s/it] 65%|██████▍ | 1791/2774 [5:53:18<3:17:18, 12.04s/it] {'loss': 0.957, 'learning_rate': 1.4742752037746277e-06, 'epoch': 0.65} 65%|██████▍ | 1791/2774 [5:53:18<3:17:18, 12.04s/it] 65%|██████▍ | 1792/2774 [5:53:29<3:13:12, 11.81s/it] {'loss': 1.0806, 'learning_rate': 1.4716132723155864e-06, 'epoch': 0.65} 65%|██████▍ | 1792/2774 [5:53:29<3:13:12, 11.81s/it] 65%|██████▍ | 1793/2774 [5:53:41<3:11:11, 11.69s/it] {'loss': 1.0098, 'learning_rate': 1.4689527435139184e-06, 'epoch': 0.65} 65%|██████▍ | 1793/2774 [5:53:41<3:11:11, 11.69s/it] 65%|██████▍ | 1794/2774 [5:53:53<3:11:10, 11.71s/it] {'loss': 1.0205, 'learning_rate': 1.4662936209984242e-06, 'epoch': 0.65} 65%|██████▍ | 1794/2774 [5:53:53<3:11:10, 11.71s/it] 65%|██████▍ | 1795/2774 [5:54:04<3:10:57, 11.70s/it] {'loss': 1.0142, 'learning_rate': 1.4636359083959867e-06, 'epoch': 0.65} 65%|██████▍ | 1795/2774 [5:54:04<3:10:57, 11.70s/it] 65%|██████▍ | 1796/2774 [5:54:16<3:08:33, 11.57s/it] {'loss': 0.9731, 'learning_rate': 1.460979609331565e-06, 'epoch': 0.65} 65%|██████▍ | 1796/2774 [5:54:16<3:08:33, 11.57s/it] 65%|██████▍ | 1797/2774 [5:54:27<3:08:31, 11.58s/it] {'loss': 1.0439, 'learning_rate': 1.458324727428191e-06, 'epoch': 0.65} 65%|██████▍ | 1797/2774 
[5:54:27<3:08:31, 11.58s/it] 65%|██████▍ | 1798/2774 [5:54:40<3:15:45, 12.03s/it] {'loss': 0.9717, 'learning_rate': 1.4556712663069622e-06, 'epoch': 0.65} 65%|██████▍ | 1798/2774 [5:54:40<3:15:45, 12.03s/it] 65%|██████▍ | 1799/2774 [5:54:52<3:12:47, 11.86s/it] {'loss': 0.9766, 'learning_rate': 1.4530192295870405e-06, 'epoch': 0.65} 65%|██████▍ | 1799/2774 [5:54:52<3:12:47, 11.86s/it] 65%|██████▍ | 1800/2774 [5:55:03<3:10:51, 11.76s/it] {'loss': 1.0596, 'learning_rate': 1.4503686208856426e-06, 'epoch': 0.65} 65%|██████▍ | 1800/2774 [5:55:03<3:10:51, 11.76s/it] 65%|██████▍ | 1801/2774 [5:55:14<3:07:33, 11.57s/it] {'loss': 1.0288, 'learning_rate': 1.4477194438180403e-06, 'epoch': 0.65} 65%|██████▍ | 1801/2774 [5:55:14<3:07:33, 11.57s/it] 65%|██████▍ | 1802/2774 [5:55:26<3:06:30, 11.51s/it] {'loss': 1.0347, 'learning_rate': 1.445071701997549e-06, 'epoch': 0.65} 65%|██████▍ | 1802/2774 [5:55:26<3:06:30, 11.51s/it] 65%|██████▍ | 1803/2774 [5:55:39<3:12:56, 11.92s/it] {'loss': 1.0371, 'learning_rate': 1.4424253990355308e-06, 'epoch': 0.65} 65%|██████▍ | 1803/2774 [5:55:39<3:12:56, 11.92s/it] 65%|██████▌ | 1804/2774 [5:55:51<3:14:49, 12.05s/it] {'loss': 1.021, 'learning_rate': 1.439780538541382e-06, 'epoch': 0.65} 65%|██████▌ | 1804/2774 [5:55:51<3:14:49, 12.05s/it] 65%|██████▌ | 1805/2774 [5:56:03<3:13:21, 11.97s/it] {'loss': 1.0654, 'learning_rate': 1.4371371241225326e-06, 'epoch': 0.65} 65%|██████▌ | 1805/2774 [5:56:03<3:13:21, 11.97s/it] 65%|██████▌ | 1806/2774 [5:56:14<3:10:14, 11.79s/it] {'loss': 1.0342, 'learning_rate': 1.4344951593844391e-06, 'epoch': 0.65} 65%|██████▌ | 1806/2774 [5:56:14<3:10:14, 11.79s/it] 65%|██████▌ | 1807/2774 [5:56:26<3:09:47, 11.78s/it] {'loss': 0.9966, 'learning_rate': 1.431854647930584e-06, 'epoch': 0.65} 65%|██████▌ | 1807/2774 [5:56:26<3:09:47, 11.78s/it] 65%|██████▌ | 1808/2774 [5:56:37<3:07:25, 11.64s/it] {'loss': 1.0718, 'learning_rate': 1.4292155933624642e-06, 'epoch': 0.65} 65%|██████▌ | 1808/2774 [5:56:37<3:07:25, 11.64s/it] 
65%|██████▌ | 1809/2774 [5:56:48<3:05:43, 11.55s/it] {'loss': 1.0454, 'learning_rate': 1.4265779992795894e-06, 'epoch': 0.65} 65%|██████▌ | 1809/2774 [5:56:48<3:05:43, 11.55s/it] 65%|██████▌ | 1810/2774 [5:57:00<3:04:50, 11.50s/it] {'loss': 1.1113, 'learning_rate': 1.4239418692794813e-06, 'epoch': 0.65} 65%|██████▌ | 1810/2774 [5:57:00<3:04:50, 11.50s/it] 65%|██████▌ | 1811/2774 [5:57:11<3:04:26, 11.49s/it] {'loss': 0.9775, 'learning_rate': 1.4213072069576594e-06, 'epoch': 0.65} 65%|██████▌ | 1811/2774 [5:57:11<3:04:26, 11.49s/it] 65%|██████▌ | 1812/2774 [5:57:23<3:04:23, 11.50s/it] {'loss': 0.9922, 'learning_rate': 1.4186740159076461e-06, 'epoch': 0.65} 65%|██████▌ | 1812/2774 [5:57:23<3:04:23, 11.50s/it] 65%|██████▌ | 1813/2774 [5:57:34<3:04:25, 11.51s/it] {'loss': 1.0044, 'learning_rate': 1.4160422997209543e-06, 'epoch': 0.65} 65%|██████▌ | 1813/2774 [5:57:34<3:04:25, 11.51s/it] 65%|██████▌ | 1814/2774 [5:57:46<3:03:19, 11.46s/it] {'loss': 1.0215, 'learning_rate': 1.4134120619870855e-06, 'epoch': 0.65} 65%|██████▌ | 1814/2774 [5:57:46<3:03:19, 11.46s/it] 65%|██████▌ | 1815/2774 [5:57:59<3:13:34, 12.11s/it] {'loss': 0.9731, 'learning_rate': 1.4107833062935244e-06, 'epoch': 0.65} 65%|██████▌ | 1815/2774 [5:57:59<3:13:34, 12.11s/it] 65%|██████▌ | 1816/2774 [5:58:11<3:09:47, 11.89s/it] {'loss': 1.0439, 'learning_rate': 1.4081560362257365e-06, 'epoch': 0.65} 65%|██████▌ | 1816/2774 [5:58:11<3:09:47, 11.89s/it] 66%|██████▌ | 1817/2774 [5:58:22<3:06:42, 11.71s/it] {'loss': 0.9956, 'learning_rate': 1.405530255367158e-06, 'epoch': 0.66} 66%|██████▌ | 1817/2774 [5:58:22<3:06:42, 11.71s/it] 66%|██████▌ | 1818/2774 [5:58:34<3:06:08, 11.68s/it] {'loss': 1.0713, 'learning_rate': 1.402905967299197e-06, 'epoch': 0.66} 66%|██████▌ | 1818/2774 [5:58:34<3:06:08, 11.68s/it] 66%|██████▌ | 1819/2774 [5:58:47<3:12:11, 12.07s/it] {'loss': 0.9941, 'learning_rate': 1.4002831756012215e-06, 'epoch': 0.66} 66%|██████▌ | 1819/2774 [5:58:47<3:12:11, 12.07s/it] 66%|██████▌ | 1820/2774 
[5:58:58<3:09:44, 11.93s/it] {'loss': 0.9834, 'learning_rate': 1.3976618838505637e-06, 'epoch': 0.66} 66%|██████▌ | 1820/2774 [5:58:58<3:09:44, 11.93s/it] 66%|██████▌ | 1821/2774 [5:59:10<3:07:09, 11.78s/it] {'loss': 0.9604, 'learning_rate': 1.3950420956225052e-06, 'epoch': 0.66} 66%|██████▌ | 1821/2774 [5:59:10<3:07:09, 11.78s/it] 66%|██████▌ | 1822/2774 [5:59:21<3:05:02, 11.66s/it] {'loss': 1.0059, 'learning_rate': 1.3924238144902813e-06, 'epoch': 0.66} 66%|██████▌ | 1822/2774 [5:59:21<3:05:02, 11.66s/it] 66%|██████▌ | 1823/2774 [5:59:32<3:02:54, 11.54s/it] {'loss': 1.0239, 'learning_rate': 1.3898070440250656e-06, 'epoch': 0.66} 66%|██████▌ | 1823/2774 [5:59:32<3:02:54, 11.54s/it] 66%|██████▌ | 1824/2774 [5:59:44<3:01:41, 11.48s/it] {'loss': 1.0474, 'learning_rate': 1.387191787795978e-06, 'epoch': 0.66} 66%|██████▌ | 1824/2774 [5:59:44<3:01:41, 11.48s/it] 66%|██████▌ | 1825/2774 [5:59:55<3:00:27, 11.41s/it] {'loss': 0.9966, 'learning_rate': 1.3845780493700684e-06, 'epoch': 0.66} 66%|██████▌ | 1825/2774 [5:59:55<3:00:27, 11.41s/it] 66%|██████▌ | 1826/2774 [6:00:08<3:07:44, 11.88s/it] {'loss': 1.0156, 'learning_rate': 1.3819658323123193e-06, 'epoch': 0.66} 66%|██████▌ | 1826/2774 [6:00:08<3:07:44, 11.88s/it] 66%|██████▌ | 1827/2774 [6:00:19<3:05:47, 11.77s/it] {'loss': 0.9736, 'learning_rate': 1.3793551401856353e-06, 'epoch': 0.66} 66%|██████▌ | 1827/2774 [6:00:19<3:05:47, 11.77s/it] 66%|██████▌ | 1828/2774 [6:00:31<3:04:55, 11.73s/it] {'loss': 1.0249, 'learning_rate': 1.3767459765508448e-06, 'epoch': 0.66} 66%|██████▌ | 1828/2774 [6:00:31<3:04:55, 11.73s/it] 66%|██████▌ | 1829/2774 [6:00:45<3:13:25, 12.28s/it] {'loss': 0.9917, 'learning_rate': 1.3741383449666885e-06, 'epoch': 0.66} 66%|██████▌ | 1829/2774 [6:00:45<3:13:25, 12.28s/it] 66%|██████▌ | 1830/2774 [6:00:56<3:09:13, 12.03s/it] {'loss': 0.98, 'learning_rate': 1.3715322489898169e-06, 'epoch': 0.66} 66%|██████▌ | 1830/2774 [6:00:56<3:09:13, 12.03s/it] 66%|██████▌ | 1831/2774 [6:01:07<3:06:12, 11.85s/it] 
{'loss': 1.0195, 'learning_rate': 1.3689276921747901e-06, 'epoch': 0.66} 66%|██████▌ | 1831/2774 [6:01:07<3:06:12, 11.85s/it] 66%|██████▌ | 1832/2774 [6:01:19<3:05:13, 11.80s/it] {'loss': 1.0381, 'learning_rate': 1.3663246780740653e-06, 'epoch': 0.66} 66%|██████▌ | 1832/2774 [6:01:19<3:05:13, 11.80s/it] 66%|██████▌ | 1833/2774 [6:01:31<3:03:22, 11.69s/it] {'loss': 1.0317, 'learning_rate': 1.363723210237996e-06, 'epoch': 0.66} 66%|██████▌ | 1833/2774 [6:01:31<3:03:22, 11.69s/it] 66%|██████▌ | 1834/2774 [6:01:42<3:02:41, 11.66s/it] {'loss': 1.0469, 'learning_rate': 1.361123292214826e-06, 'epoch': 0.66} 66%|██████▌ | 1834/2774 [6:01:42<3:02:41, 11.66s/it] 66%|██████▌ | 1835/2774 [6:01:55<3:06:14, 11.90s/it] {'loss': 1.0088, 'learning_rate': 1.358524927550689e-06, 'epoch': 0.66} 66%|██████▌ | 1835/2774 [6:01:55<3:06:14, 11.90s/it] 66%|██████▌ | 1836/2774 [6:02:06<3:04:02, 11.77s/it] {'loss': 1.0225, 'learning_rate': 1.3559281197895955e-06, 'epoch': 0.66} 66%|██████▌ | 1836/2774 [6:02:06<3:04:02, 11.77s/it] 66%|██████▌ | 1837/2774 [6:02:19<3:09:07, 12.11s/it] {'loss': 0.9814, 'learning_rate': 1.3533328724734358e-06, 'epoch': 0.66} 66%|██████▌ | 1837/2774 [6:02:19<3:09:07, 12.11s/it] 66%|██████▋ | 1838/2774 [6:02:31<3:07:47, 12.04s/it] {'loss': 0.9575, 'learning_rate': 1.3507391891419689e-06, 'epoch': 0.66} 66%|██████▋ | 1838/2774 [6:02:31<3:07:47, 12.04s/it] 66%|██████▋ | 1839/2774 [6:02:43<3:05:59, 11.94s/it] {'loss': 1.0278, 'learning_rate': 1.3481470733328238e-06, 'epoch': 0.66} 66%|██████▋ | 1839/2774 [6:02:43<3:05:59, 11.94s/it] 66%|██████▋ | 1840/2774 [6:02:54<3:03:36, 11.80s/it] {'loss': 1.041, 'learning_rate': 1.3455565285814898e-06, 'epoch': 0.66} 66%|██████▋ | 1840/2774 [6:02:54<3:03:36, 11.80s/it] 66%|██████▋ | 1841/2774 [6:03:06<3:02:02, 11.71s/it] {'loss': 1.0508, 'learning_rate': 1.3429675584213122e-06, 'epoch': 0.66} 66%|██████▋ | 1841/2774 [6:03:06<3:02:02, 11.71s/it] 66%|██████▋ | 1842/2774 [6:03:17<3:01:07, 11.66s/it] {'loss': 1.0044, 'learning_rate': 
1.3403801663834897e-06, 'epoch': 0.66} 66%|██████▋ | 1842/2774 [6:03:17<3:01:07, 11.66s/it] 66%|██████▋ | 1843/2774 [6:03:29<3:01:20, 11.69s/it] {'loss': 0.979, 'learning_rate': 1.3377943559970707e-06, 'epoch': 0.66} 66%|██████▋ | 1843/2774 [6:03:29<3:01:20, 11.69s/it] 66%|██████▋ | 1844/2774 [6:03:40<3:00:43, 11.66s/it] {'loss': 1.0791, 'learning_rate': 1.3352101307889422e-06, 'epoch': 0.66} 66%|██████▋ | 1844/2774 [6:03:40<3:00:43, 11.66s/it] 67%|██████▋ | 1845/2774 [6:03:52<2:59:37, 11.60s/it] {'loss': 1.0098, 'learning_rate': 1.3326274942838333e-06, 'epoch': 0.67} 67%|██████▋ | 1845/2774 [6:03:52<2:59:37, 11.60s/it] 67%|██████▋ | 1846/2774 [6:04:04<3:01:10, 11.71s/it] {'loss': 1.0166, 'learning_rate': 1.330046450004302e-06, 'epoch': 0.67} 67%|██████▋ | 1846/2774 [6:04:04<3:01:10, 11.71s/it] 67%|██████▋ | 1847/2774 [6:04:18<3:10:23, 12.32s/it] {'loss': 0.9639, 'learning_rate': 1.3274670014707392e-06, 'epoch': 0.67} 67%|██████▋ | 1847/2774 [6:04:18<3:10:23, 12.32s/it] 67%|██████▋ | 1848/2774 [6:04:29<3:05:50, 12.04s/it] {'loss': 1.002, 'learning_rate': 1.3248891522013546e-06, 'epoch': 0.67} 67%|██████▋ | 1848/2774 [6:04:29<3:05:50, 12.04s/it] 67%|██████▋ | 1849/2774 [6:04:40<3:01:53, 11.80s/it] {'loss': 1.0405, 'learning_rate': 1.3223129057121816e-06, 'epoch': 0.67} 67%|██████▋ | 1849/2774 [6:04:40<3:01:53, 11.80s/it] 67%|██████▋ | 1850/2774 [6:04:52<3:00:09, 11.70s/it] {'loss': 1.0806, 'learning_rate': 1.3197382655170616e-06, 'epoch': 0.67} 67%|██████▋ | 1850/2774 [6:04:52<3:00:09, 11.70s/it] 67%|██████▋ | 1851/2774 [6:05:06<3:09:51, 12.34s/it] {'loss': 0.9961, 'learning_rate': 1.3171652351276505e-06, 'epoch': 0.67} 67%|██████▋ | 1851/2774 [6:05:06<3:09:51, 12.34s/it] 67%|██████▋ | 1852/2774 [6:05:18<3:08:26, 12.26s/it] {'loss': 1.0498, 'learning_rate': 1.3145938180534045e-06, 'epoch': 0.67} 67%|██████▋ | 1852/2774 [6:05:18<3:08:26, 12.26s/it] 67%|██████▋ | 1853/2774 [6:05:29<3:03:24, 11.95s/it] {'loss': 1.0029, 'learning_rate': 1.3120240178015834e-06, 'epoch': 
0.67} 67%|██████▋ | 1853/2774 [6:05:29<3:03:24, 11.95s/it] 67%|██████▋ | 1854/2774 [6:05:40<3:00:21, 11.76s/it] {'loss': 1.0596, 'learning_rate': 1.3094558378772383e-06, 'epoch': 0.67} 67%|██████▋ | 1854/2774 [6:05:40<3:00:21, 11.76s/it] 67%|██████▋ | 1855/2774 [6:05:53<3:03:48, 12.00s/it] {'loss': 1.0264, 'learning_rate': 1.3068892817832108e-06, 'epoch': 0.67} 67%|██████▋ | 1855/2774 [6:05:53<3:03:48, 12.00s/it] 67%|██████▋ | 1856/2774 [6:06:04<3:02:24, 11.92s/it] {'loss': 1.0283, 'learning_rate': 1.30432435302013e-06, 'epoch': 0.67} 67%|██████▋ | 1856/2774 [6:06:04<3:02:24, 11.92s/it] 67%|██████▋ | 1857/2774 [6:06:16<2:58:50, 11.70s/it] {'loss': 1.0142, 'learning_rate': 1.3017610550864019e-06, 'epoch': 0.67} 67%|██████▋ | 1857/2774 [6:06:16<2:58:50, 11.70s/it] 67%|██████▋ | 1858/2774 [6:06:27<2:56:55, 11.59s/it] {'loss': 1.0156, 'learning_rate': 1.299199391478212e-06, 'epoch': 0.67} 67%|██████▋ | 1858/2774 [6:06:27<2:56:55, 11.59s/it] 67%|██████▋ | 1859/2774 [6:06:39<2:56:30, 11.57s/it] {'loss': 1.0464, 'learning_rate': 1.2966393656895136e-06, 'epoch': 0.67} 67%|██████▋ | 1859/2774 [6:06:39<2:56:30, 11.57s/it] 67%|██████▋ | 1860/2774 [6:06:50<2:55:41, 11.53s/it] {'loss': 1.0332, 'learning_rate': 1.2940809812120276e-06, 'epoch': 0.67} 67%|██████▋ | 1860/2774 [6:06:50<2:55:41, 11.53s/it] 67%|██████▋ | 1861/2774 [6:07:01<2:55:14, 11.52s/it] {'loss': 0.9932, 'learning_rate': 1.2915242415352346e-06, 'epoch': 0.67} 67%|██████▋ | 1861/2774 [6:07:01<2:55:14, 11.52s/it] 67%|██████▋ | 1862/2774 [6:07:13<2:54:01, 11.45s/it] {'loss': 1.0288, 'learning_rate': 1.2889691501463753e-06, 'epoch': 0.67} 67%|██████▋ | 1862/2774 [6:07:13<2:54:01, 11.45s/it] 67%|██████▋ | 1863/2774 [6:07:24<2:53:55, 11.45s/it] {'loss': 1.0151, 'learning_rate': 1.2864157105304376e-06, 'epoch': 0.67} 67%|██████▋ | 1863/2774 [6:07:24<2:53:55, 11.45s/it] 67%|██████▋ | 1864/2774 [6:07:36<2:56:16, 11.62s/it] {'loss': 1.0317, 'learning_rate': 1.2838639261701614e-06, 'epoch': 0.67} 67%|██████▋ | 1864/2774 
[6:07:36<2:56:16, 11.62s/it] 67%|██████▋ | 1865/2774 [6:07:49<3:03:04, 12.08s/it] {'loss': 0.9688, 'learning_rate': 1.2813138005460241e-06, 'epoch': 0.67} 67%|██████▋ | 1865/2774 [6:07:49<3:03:04, 12.08s/it] 67%|██████▋ | 1866/2774 [6:08:01<2:59:20, 11.85s/it] {'loss': 1.0659, 'learning_rate': 1.278765337136245e-06, 'epoch': 0.67} 67%|██████▋ | 1866/2774 [6:08:01<2:59:20, 11.85s/it] 67%|██████▋ | 1867/2774 [6:08:12<2:57:26, 11.74s/it] {'loss': 1.0635, 'learning_rate': 1.276218539416773e-06, 'epoch': 0.67} 67%|██████▋ | 1867/2774 [6:08:12<2:57:26, 11.74s/it] 67%|██████▋ | 1868/2774 [6:08:25<3:01:35, 12.03s/it] {'loss': 0.9834, 'learning_rate': 1.273673410861287e-06, 'epoch': 0.67} 67%|██████▋ | 1868/2774 [6:08:25<3:01:35, 12.03s/it] 67%|██████▋ | 1869/2774 [6:08:36<2:59:05, 11.87s/it] {'loss': 1.0117, 'learning_rate': 1.271129954941187e-06, 'epoch': 0.67} 67%|██████▋ | 1869/2774 [6:08:36<2:59:05, 11.87s/it] 67%|██████▋ | 1870/2774 [6:08:48<2:55:51, 11.67s/it] {'loss': 1.0557, 'learning_rate': 1.2685881751255957e-06, 'epoch': 0.67} 67%|██████▋ | 1870/2774 [6:08:48<2:55:51, 11.67s/it] 67%|██████▋ | 1871/2774 [6:08:59<2:52:42, 11.48s/it] {'loss': 0.9854, 'learning_rate': 1.2660480748813453e-06, 'epoch': 0.67} 67%|██████▋ | 1871/2774 [6:08:59<2:52:42, 11.48s/it] 67%|██████▋ | 1872/2774 [6:09:10<2:51:53, 11.43s/it] {'loss': 1.0508, 'learning_rate': 1.2635096576729804e-06, 'epoch': 0.67} 67%|██████▋ | 1872/2774 [6:09:10<2:51:53, 11.43s/it] 68%|██████▊ | 1873/2774 [6:09:21<2:52:15, 11.47s/it] {'loss': 1.0557, 'learning_rate': 1.260972926962747e-06, 'epoch': 0.68} 68%|██████▊ | 1873/2774 [6:09:21<2:52:15, 11.47s/it] 68%|██████▊ | 1874/2774 [6:09:33<2:51:21, 11.42s/it] {'loss': 1.0615, 'learning_rate': 1.258437886210595e-06, 'epoch': 0.68} 68%|██████▊ | 1874/2774 [6:09:33<2:51:21, 11.42s/it] 68%|██████▊ | 1875/2774 [6:09:44<2:50:15, 11.36s/it] {'loss': 1.0059, 'learning_rate': 1.2559045388741654e-06, 'epoch': 0.68} 68%|██████▊ | 1875/2774 [6:09:44<2:50:15, 11.36s/it] 
68%|██████▊ | 1876/2774 [6:09:56<2:50:45, 11.41s/it] {'loss': 1.064, 'learning_rate': 1.2533728884087909e-06, 'epoch': 0.68} 68%|██████▊ | 1876/2774 [6:09:56<2:50:45, 11.41s/it] 68%|██████▊ | 1877/2774 [6:10:07<2:50:54, 11.43s/it] {'loss': 1.0615, 'learning_rate': 1.250842938267489e-06, 'epoch': 0.68} 68%|██████▊ | 1877/2774 [6:10:07<2:50:54, 11.43s/it] 68%|██████▊ | 1878/2774 [6:10:19<2:51:02, 11.45s/it] {'loss': 0.9751, 'learning_rate': 1.2483146919009608e-06, 'epoch': 0.68} 68%|██████▊ | 1878/2774 [6:10:19<2:51:02, 11.45s/it] 68%|██████▊ | 1879/2774 [6:10:30<2:52:32, 11.57s/it] {'loss': 1.0312, 'learning_rate': 1.2457881527575808e-06, 'epoch': 0.68} 68%|██████▊ | 1879/2774 [6:10:30<2:52:32, 11.57s/it] 68%|██████▊ | 1880/2774 [6:10:42<2:51:21, 11.50s/it] {'loss': 1.0439, 'learning_rate': 1.2432633242833943e-06, 'epoch': 0.68} 68%|██████▊ | 1880/2774 [6:10:42<2:51:21, 11.50s/it] 68%|██████▊ | 1881/2774 [6:10:53<2:50:26, 11.45s/it] {'loss': 1.0547, 'learning_rate': 1.2407402099221174e-06, 'epoch': 0.68} 68%|██████▊ | 1881/2774 [6:10:53<2:50:26, 11.45s/it] 68%|██████▊ | 1882/2774 [6:11:04<2:49:21, 11.39s/it] {'loss': 1.0283, 'learning_rate': 1.2382188131151234e-06, 'epoch': 0.68} 68%|██████▊ | 1882/2774 [6:11:04<2:49:21, 11.39s/it] 68%|██████▊ | 1883/2774 [6:11:16<2:49:56, 11.44s/it] {'loss': 1.0449, 'learning_rate': 1.235699137301447e-06, 'epoch': 0.68} 68%|██████▊ | 1883/2774 [6:11:16<2:49:56, 11.44s/it] 68%|██████▊ | 1884/2774 [6:11:27<2:49:48, 11.45s/it] {'loss': 1.126, 'learning_rate': 1.2331811859177722e-06, 'epoch': 0.68} 68%|██████▊ | 1884/2774 [6:11:27<2:49:48, 11.45s/it] 68%|██████▊ | 1885/2774 [6:11:39<2:50:40, 11.52s/it] {'loss': 1.0024, 'learning_rate': 1.2306649623984355e-06, 'epoch': 0.68} 68%|██████▊ | 1885/2774 [6:11:39<2:50:40, 11.52s/it] 68%|██████▊ | 1886/2774 [6:11:50<2:49:17, 11.44s/it] {'loss': 1.0249, 'learning_rate': 1.2281504701754094e-06, 'epoch': 0.68} 68%|██████▊ | 1886/2774 [6:11:50<2:49:17, 11.44s/it] 68%|██████▊ | 1887/2774 
[6:12:02<2:50:47, 11.55s/it] {'loss': 1.0093, 'learning_rate': 1.2256377126783128e-06, 'epoch': 0.68} 68%|██████▊ | 1887/2774 [6:12:02<2:50:47, 11.55s/it] 68%|██████▊ | 1888/2774 [6:12:13<2:49:18, 11.47s/it] {'loss': 1.0488, 'learning_rate': 1.223126693334393e-06, 'epoch': 0.68} 68%|██████▊ | 1888/2774 [6:12:13<2:49:18, 11.47s/it] 68%|██████▊ | 1889/2774 [6:12:26<2:55:41, 11.91s/it] {'loss': 0.9512, 'learning_rate': 1.2206174155685308e-06, 'epoch': 0.68} 68%|██████▊ | 1889/2774 [6:12:26<2:55:41, 11.91s/it] 68%|██████▊ | 1890/2774 [6:12:38<2:54:04, 11.81s/it] {'loss': 1.0112, 'learning_rate': 1.2181098828032273e-06, 'epoch': 0.68} 68%|██████▊ | 1890/2774 [6:12:38<2:54:04, 11.81s/it] 68%|██████▊ | 1891/2774 [6:12:49<2:52:59, 11.75s/it] {'loss': 1.0454, 'learning_rate': 1.2156040984586079e-06, 'epoch': 0.68} 68%|██████▊ | 1891/2774 [6:12:49<2:52:59, 11.75s/it] 68%|██████▊ | 1892/2774 [6:13:02<2:54:08, 11.85s/it] {'loss': 1.0679, 'learning_rate': 1.213100065952409e-06, 'epoch': 0.68} 68%|██████▊ | 1892/2774 [6:13:02<2:54:08, 11.85s/it] 68%|██████▊ | 1893/2774 [6:13:13<2:52:15, 11.73s/it] {'loss': 1.082, 'learning_rate': 1.2105977886999814e-06, 'epoch': 0.68} 68%|██████▊ | 1893/2774 [6:13:13<2:52:15, 11.73s/it] 68%|██████▊ | 1894/2774 [6:13:24<2:49:27, 11.55s/it] {'loss': 1.0444, 'learning_rate': 1.2080972701142795e-06, 'epoch': 0.68} 68%|██████▊ | 1894/2774 [6:13:24<2:49:27, 11.55s/it] 68%|██████▊ | 1895/2774 [6:13:36<2:48:40, 11.51s/it] {'loss': 1.0151, 'learning_rate': 1.2055985136058595e-06, 'epoch': 0.68} 68%|██████▊ | 1895/2774 [6:13:36<2:48:40, 11.51s/it] 68%|██████▊ | 1896/2774 [6:13:47<2:48:41, 11.53s/it] {'loss': 1.0059, 'learning_rate': 1.2031015225828734e-06, 'epoch': 0.68} 68%|██████▊ | 1896/2774 [6:13:47<2:48:41, 11.53s/it] 68%|██████▊ | 1897/2774 [6:14:00<2:52:33, 11.81s/it] {'loss': 0.9873, 'learning_rate': 1.200606300451068e-06, 'epoch': 0.68} 68%|██████▊ | 1897/2774 [6:14:00<2:52:33, 11.81s/it] 68%|██████▊ | 1898/2774 [6:14:11<2:50:39, 11.69s/it] 
{'loss': 1.0366, 'learning_rate': 1.1981128506137737e-06, 'epoch': 0.68} 68%|██████▊ | 1898/2774 [6:14:11<2:50:39, 11.69s/it] 68%|██████▊ | 1899/2774 [6:14:22<2:48:55, 11.58s/it] {'loss': 0.9766, 'learning_rate': 1.1956211764719072e-06, 'epoch': 0.68} 68%|██████▊ | 1899/2774 [6:14:22<2:48:55, 11.58s/it] 68%|██████▊ | 1900/2774 [6:14:35<2:55:27, 12.05s/it] {'loss': 0.9917, 'learning_rate': 1.1931312814239607e-06, 'epoch': 0.68} 68%|██████▊ | 1900/2774 [6:14:35<2:55:27, 12.05s/it] 69%|██████▊ | 1901/2774 [6:14:47<2:51:32, 11.79s/it] {'loss': 1.0361, 'learning_rate': 1.1906431688659995e-06, 'epoch': 0.69} 69%|██████▊ | 1901/2774 [6:14:47<2:51:32, 11.79s/it] 69%|██████▊ | 1902/2774 [6:14:58<2:49:40, 11.67s/it] {'loss': 1.0063, 'learning_rate': 1.188156842191661e-06, 'epoch': 0.69} 69%|██████▊ | 1902/2774 [6:14:58<2:49:40, 11.67s/it] 69%|██████▊ | 1903/2774 [6:15:09<2:47:27, 11.54s/it] {'loss': 1.0659, 'learning_rate': 1.1856723047921434e-06, 'epoch': 0.69} 69%|██████▊ | 1903/2774 [6:15:09<2:47:27, 11.54s/it] 69%|██████▊ | 1904/2774 [6:15:21<2:47:08, 11.53s/it] {'loss': 1.0317, 'learning_rate': 1.1831895600562046e-06, 'epoch': 0.69} 69%|██████▊ | 1904/2774 [6:15:21<2:47:08, 11.53s/it] 69%|██████▊ | 1905/2774 [6:15:32<2:45:05, 11.40s/it] {'loss': 1.0171, 'learning_rate': 1.1807086113701608e-06, 'epoch': 0.69} 69%|██████▊ | 1905/2774 [6:15:32<2:45:05, 11.40s/it] 69%|██████▊ | 1906/2774 [6:15:43<2:44:44, 11.39s/it] {'loss': 1.0205, 'learning_rate': 1.178229462117875e-06, 'epoch': 0.69} 69%|██████▊ | 1906/2774 [6:15:43<2:44:44, 11.39s/it] 69%|██████▊ | 1907/2774 [6:15:55<2:47:06, 11.56s/it] {'loss': 1.0215, 'learning_rate': 1.1757521156807556e-06, 'epoch': 0.69} 69%|██████▊ | 1907/2774 [6:15:55<2:47:06, 11.56s/it]/usr/local/lib/python3.9/dist-packages/PIL/TiffImagePlugin.py:850: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 2. 
warnings.warn(str(msg)) 69%|██████▉ | 1908/2774 [6:16:07<2:45:50, 11.49s/it] {'loss': 1.0469, 'learning_rate': 1.1732765754377558e-06, 'epoch': 0.69} 69%|██████▉ | 1908/2774 [6:16:07<2:45:50, 11.49s/it] 69%|██████▉ | 1909/2774 [6:16:18<2:47:11, 11.60s/it] {'loss': 1.064, 'learning_rate': 1.1708028447653614e-06, 'epoch': 0.69} 69%|██████▉ | 1909/2774 [6:16:18<2:47:11, 11.60s/it] 69%|██████▉ | 1910/2774 [6:16:30<2:46:05, 11.53s/it] {'loss': 1.0127, 'learning_rate': 1.1683309270375928e-06, 'epoch': 0.69} 69%|██████▉ | 1910/2774 [6:16:30<2:46:05, 11.53s/it] 69%|██████▉ | 1911/2774 [6:16:42<2:47:57, 11.68s/it] {'loss': 1.0303, 'learning_rate': 1.165860825625995e-06, 'epoch': 0.69} 69%|██████▉ | 1911/2774 [6:16:42<2:47:57, 11.68s/it] 69%|██████▉ | 1912/2774 [6:16:53<2:47:07, 11.63s/it] {'loss': 1.0195, 'learning_rate': 1.16339254389964e-06, 'epoch': 0.69} 69%|██████▉ | 1912/2774 [6:16:53<2:47:07, 11.63s/it] 69%|██████▉ | 1913/2774 [6:17:05<2:45:09, 11.51s/it] {'loss': 1.0151, 'learning_rate': 1.1609260852251105e-06, 'epoch': 0.69} 69%|██████▉ | 1913/2774 [6:17:05<2:45:09, 11.51s/it] 69%|██████▉ | 1914/2774 [6:17:16<2:44:19, 11.46s/it] {'loss': 1.0171, 'learning_rate': 1.158461452966511e-06, 'epoch': 0.69} 69%|██████▉ | 1914/2774 [6:17:16<2:44:19, 11.46s/it] 69%|██████▉ | 1915/2774 [6:17:27<2:44:15, 11.47s/it] {'loss': 1.0654, 'learning_rate': 1.1559986504854481e-06, 'epoch': 0.69} 69%|██████▉ | 1915/2774 [6:17:27<2:44:15, 11.47s/it] 69%|██████▉ | 1916/2774 [6:17:39<2:45:10, 11.55s/it] {'loss': 1.0532, 'learning_rate': 1.1535376811410384e-06, 'epoch': 0.69} 69%|██████▉ | 1916/2774 [6:17:39<2:45:10, 11.55s/it] 69%|██████▉ | 1917/2774 [6:17:51<2:47:10, 11.70s/it] {'loss': 1.0601, 'learning_rate': 1.1510785482898928e-06, 'epoch': 0.69} 69%|██████▉ | 1917/2774 [6:17:51<2:47:10, 11.70s/it] 69%|██████▉ | 1918/2774 [6:18:03<2:46:38, 11.68s/it] {'loss': 1.0205, 'learning_rate': 1.1486212552861225e-06, 'epoch': 0.69} 69%|██████▉ | 1918/2774 [6:18:03<2:46:38, 11.68s/it] 69%|██████▉ 
| 1919/2774 [6:18:15<2:47:00, 11.72s/it] {'loss': 1.0308, 'learning_rate': 1.1461658054813244e-06, 'epoch': 0.69} 69%|██████▉ | 1919/2774 [6:18:15<2:47:00, 11.72s/it] 69%|██████▉ | 1920/2774 [6:18:26<2:46:35, 11.70s/it] {'loss': 1.085, 'learning_rate': 1.1437122022245859e-06, 'epoch': 0.69} 69%|██████▉ | 1920/2774 [6:18:26<2:46:35, 11.70s/it] 69%|██████▉ | 1921/2774 [6:18:38<2:44:45, 11.59s/it] {'loss': 1.04, 'learning_rate': 1.1412604488624721e-06, 'epoch': 0.69} 69%|██████▉ | 1921/2774 [6:18:38<2:44:45, 11.59s/it] 69%|██████▉ | 1922/2774 [6:18:51<2:51:17, 12.06s/it] {'loss': 0.9932, 'learning_rate': 1.1388105487390273e-06, 'epoch': 0.69} 69%|██████▉ | 1922/2774 [6:18:51<2:51:17, 12.06s/it]/usr/local/lib/python3.9/dist-packages/PIL/TiffImagePlugin.py:850: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0. warnings.warn(str(msg)) 69%|██████▉ | 1923/2774 [6:19:02<2:48:02, 11.85s/it] {'loss': 0.999, 'learning_rate': 1.1363625051957655e-06, 'epoch': 0.69} 69%|██████▉ | 1923/2774 [6:19:02<2:48:02, 11.85s/it] 69%|██████▉ | 1924/2774 [6:19:14<2:46:03, 11.72s/it] {'loss': 1.0474, 'learning_rate': 1.1339163215716728e-06, 'epoch': 0.69} 69%|██████▉ | 1924/2774 [6:19:14<2:46:03, 11.72s/it] 69%|██████▉ | 1925/2774 [6:19:25<2:43:56, 11.59s/it] {'loss': 1.0659, 'learning_rate': 1.1314720012031935e-06, 'epoch': 0.69} 69%|██████▉ | 1925/2774 [6:19:25<2:43:56, 11.59s/it] 69%|██████▉ | 1926/2774 [6:19:37<2:44:49, 11.66s/it] {'loss': 0.9482, 'learning_rate': 1.129029547424235e-06, 'epoch': 0.69} 69%|██████▉ | 1926/2774 [6:19:37<2:44:49, 11.66s/it] 69%|██████▉ | 1927/2774 [6:19:48<2:43:11, 11.56s/it] {'loss': 1.0518, 'learning_rate': 1.1265889635661558e-06, 'epoch': 0.69} 69%|██████▉ | 1927/2774 [6:19:48<2:43:11, 11.56s/it] 70%|██████▉ | 1928/2774 [6:20:00<2:43:37, 11.61s/it] {'loss': 1.0073, 'learning_rate': 1.1241502529577642e-06, 'epoch': 0.7} 70%|██████▉ | 1928/2774 [6:20:00<2:43:37, 11.61s/it] 70%|██████▉ | 1929/2774 [6:20:11<2:42:44, 11.56s/it] {'loss': 
1.0137, 'learning_rate': 1.1217134189253155e-06, 'epoch': 0.7} 70%|██████▉ | 1929/2774 [6:20:11<2:42:44, 11.56s/it] 70%|██████▉ | 1930/2774 [6:20:24<2:49:12, 12.03s/it] {'loss': 1.0093, 'learning_rate': 1.1192784647925031e-06, 'epoch': 0.7} 70%|██████▉ | 1930/2774 [6:20:24<2:49:12, 12.03s/it] 70%|██████▉ | 1931/2774 [6:20:36<2:46:57, 11.88s/it] {'loss': 0.998, 'learning_rate': 1.116845393880458e-06, 'epoch': 0.7} 70%|██████▉ | 1931/2774 [6:20:36<2:46:57, 11.88s/it] 70%|██████▉ | 1932/2774 [6:20:48<2:46:03, 11.83s/it] {'loss': 1.0015, 'learning_rate': 1.1144142095077406e-06, 'epoch': 0.7} 70%|██████▉ | 1932/2774 [6:20:48<2:46:03, 11.83s/it] 70%|██████▉ | 1933/2774 [6:21:01<2:51:58, 12.27s/it] {'loss': 0.9766, 'learning_rate': 1.1119849149903414e-06, 'epoch': 0.7} 70%|██████▉ | 1933/2774 [6:21:01<2:51:58, 12.27s/it] 70%|██████▉ | 1934/2774 [6:21:12<2:47:35, 11.97s/it] {'loss': 1.0454, 'learning_rate': 1.1095575136416695e-06, 'epoch': 0.7} 70%|██████▉ | 1934/2774 [6:21:12<2:47:35, 11.97s/it] 70%|██████▉ | 1935/2774 [6:21:24<2:45:30, 11.84s/it] {'loss': 1.0303, 'learning_rate': 1.1071320087725557e-06, 'epoch': 0.7} 70%|██████▉ | 1935/2774 [6:21:24<2:45:30, 11.84s/it] 70%|██████▉ | 1936/2774 [6:21:35<2:43:34, 11.71s/it] {'loss': 1.0181, 'learning_rate': 1.10470840369124e-06, 'epoch': 0.7} 70%|██████▉ | 1936/2774 [6:21:35<2:43:34, 11.71s/it] 70%|██████▉ | 1937/2774 [6:21:47<2:43:10, 11.70s/it] {'loss': 1.0171, 'learning_rate': 1.1022867017033757e-06, 'epoch': 0.7} 70%|██████▉ | 1937/2774 [6:21:47<2:43:10, 11.70s/it] 70%|██████▉ | 1938/2774 [6:21:58<2:40:58, 11.55s/it] {'loss': 1.0244, 'learning_rate': 1.0998669061120157e-06, 'epoch': 0.7} 70%|██████▉ | 1938/2774 [6:21:58<2:40:58, 11.55s/it] 70%|██████▉ | 1939/2774 [6:22:11<2:46:39, 11.98s/it] {'loss': 1.0327, 'learning_rate': 1.097449020217617e-06, 'epoch': 0.7} 70%|██████▉ | 1939/2774 [6:22:11<2:46:39, 11.98s/it] 70%|██████▉ | 1940/2774 [6:22:23<2:49:07, 12.17s/it] {'loss': 0.9478, 'learning_rate': 
1.0950330473180287e-06, 'epoch': 0.7} 70%|██████▉ | 1940/2774 [6:22:23<2:49:07, 12.17s/it] 70%|██████▉ | 1941/2774 [6:22:35<2:45:32, 11.92s/it] {'loss': 1.0537, 'learning_rate': 1.0926189907084922e-06, 'epoch': 0.7} 70%|██████▉ | 1941/2774 [6:22:35<2:45:32, 11.92s/it] 70%|███████ | 1942/2774 [6:22:46<2:42:31, 11.72s/it] {'loss': 1.0225, 'learning_rate': 1.090206853681634e-06, 'epoch': 0.7} 70%|███████ | 1942/2774 [6:22:46<2:42:31, 11.72s/it] 70%|███████ | 1943/2774 [6:22:59<2:47:52, 12.12s/it] {'loss': 0.999, 'learning_rate': 1.0877966395274654e-06, 'epoch': 0.7} 70%|███████ | 1943/2774 [6:22:59<2:47:52, 12.12s/it] 70%|███████ | 1944/2774 [6:23:11<2:45:13, 11.94s/it] {'loss': 1.0176, 'learning_rate': 1.08538835153337e-06, 'epoch': 0.7} 70%|███████ | 1944/2774 [6:23:11<2:45:13, 11.94s/it] 70%|███████ | 1945/2774 [6:23:23<2:44:50, 11.93s/it] {'loss': 1.0073, 'learning_rate': 1.0829819929841104e-06, 'epoch': 0.7} 70%|███████ | 1945/2774 [6:23:23<2:44:50, 11.93s/it] 70%|███████ | 1946/2774 [6:23:34<2:41:56, 11.73s/it] {'loss': 1.0464, 'learning_rate': 1.0805775671618124e-06, 'epoch': 0.7} 70%|███████ | 1946/2774 [6:23:34<2:41:56, 11.73s/it] 70%|███████ | 1947/2774 [6:23:47<2:45:52, 12.03s/it] {'loss': 1.0552, 'learning_rate': 1.078175077345967e-06, 'epoch': 0.7} 70%|███████ | 1947/2774 [6:23:47<2:45:52, 12.03s/it] 70%|███████ | 1948/2774 [6:23:58<2:43:28, 11.87s/it] {'loss': 1.021, 'learning_rate': 1.075774526813427e-06, 'epoch': 0.7} 70%|███████ | 1948/2774 [6:23:58<2:43:28, 11.87s/it] 70%|███████ | 1949/2774 [6:24:09<2:40:36, 11.68s/it] {'loss': 1.0107, 'learning_rate': 1.073375918838397e-06, 'epoch': 0.7} 70%|███████ | 1949/2774 [6:24:09<2:40:36, 11.68s/it] 70%|███████ | 1950/2774 [6:24:20<2:38:11, 11.52s/it] {'loss': 1.04, 'learning_rate': 1.0709792566924333e-06, 'epoch': 0.7} 70%|███████ | 1950/2774 [6:24:20<2:38:11, 11.52s/it] 70%|███████ | 1951/2774 [6:24:32<2:38:09, 11.53s/it] {'loss': 1.0396, 'learning_rate': 1.0685845436444391e-06, 'epoch': 0.7} 70%|███████ | 
1951/2774 [6:24:32<2:38:09, 11.53s/it] 70%|███████ | 1952/2774 [6:24:43<2:37:16, 11.48s/it] {'loss': 0.9595, 'learning_rate': 1.0661917829606585e-06, 'epoch': 0.7} 70%|███████ | 1952/2774 [6:24:43<2:37:16, 11.48s/it] 70%|███████ | 1953/2774 [6:24:55<2:36:10, 11.41s/it] {'loss': 1.0322, 'learning_rate': 1.0638009779046707e-06, 'epoch': 0.7} 70%|███████ | 1953/2774 [6:24:55<2:36:10, 11.41s/it] 70%|███████ | 1954/2774 [6:25:06<2:35:57, 11.41s/it] {'loss': 1.0132, 'learning_rate': 1.061412131737392e-06, 'epoch': 0.7} 70%|███████ | 1954/2774 [6:25:06<2:35:57, 11.41s/it] 70%|███████ | 1955/2774 [6:25:18<2:36:31, 11.47s/it] {'loss': 1.0742, 'learning_rate': 1.0590252477170614e-06, 'epoch': 0.7} 70%|███████ | 1955/2774 [6:25:18<2:36:31, 11.47s/it] 71%|███████ | 1956/2774 [6:25:30<2:38:45, 11.65s/it] {'loss': 1.0288, 'learning_rate': 1.0566403290992471e-06, 'epoch': 0.71} 71%|███████ | 1956/2774 [6:25:30<2:38:45, 11.65s/it] 71%|███████ | 1957/2774 [6:25:41<2:37:20, 11.55s/it] {'loss': 1.1035, 'learning_rate': 1.0542573791368323e-06, 'epoch': 0.71} 71%|███████ | 1957/2774 [6:25:41<2:37:20, 11.55s/it] 71%|███████ | 1958/2774 [6:25:53<2:38:23, 11.65s/it] {'loss': 1.0181, 'learning_rate': 1.0518764010800193e-06, 'epoch': 0.71} 71%|███████ | 1958/2774 [6:25:53<2:38:23, 11.65s/it] 71%|███████ | 1959/2774 [6:26:04<2:37:41, 11.61s/it] {'loss': 1.0112, 'learning_rate': 1.0494973981763145e-06, 'epoch': 0.71} 71%|███████ | 1959/2774 [6:26:04<2:37:41, 11.61s/it] 71%|███████ | 1960/2774 [6:26:16<2:38:06, 11.65s/it] {'loss': 0.9917, 'learning_rate': 1.0471203736705371e-06, 'epoch': 0.71} 71%|███████ | 1960/2774 [6:26:16<2:38:06, 11.65s/it] 71%|███████ | 1961/2774 [6:26:27<2:36:05, 11.52s/it] {'loss': 1.0576, 'learning_rate': 1.044745330804803e-06, 'epoch': 0.71} 71%|███████ | 1961/2774 [6:26:27<2:36:05, 11.52s/it] 71%|███████ | 1962/2774 [6:26:40<2:41:29, 11.93s/it] {'loss': 1.0181, 'learning_rate': 1.0423722728185292e-06, 'epoch': 0.71} 71%|███████ | 1962/2774 [6:26:40<2:41:29, 
11.93s/it] 71%|███████ | 1963/2774 [6:26:52<2:39:44, 11.82s/it] {'loss': 0.998, 'learning_rate': 1.0400012029484216e-06, 'epoch': 0.71} 71%|███████ | 1963/2774 [6:26:52<2:39:44, 11.82s/it] 71%|███████ | 1964/2774 [6:27:03<2:38:02, 11.71s/it] {'loss': 1.0181, 'learning_rate': 1.0376321244284778e-06, 'epoch': 0.71} 71%|███████ | 1964/2774 [6:27:03<2:38:02, 11.71s/it] 71%|███████ | 1965/2774 [6:27:15<2:36:43, 11.62s/it] {'loss': 0.9922, 'learning_rate': 1.0352650404899765e-06, 'epoch': 0.71} 71%|███████ | 1965/2774 [6:27:15<2:36:43, 11.62s/it] 71%|███████ | 1966/2774 [6:27:27<2:37:22, 11.69s/it] {'loss': 1.0283, 'learning_rate': 1.0328999543614782e-06, 'epoch': 0.71} 71%|███████ | 1966/2774 [6:27:27<2:37:22, 11.69s/it] 71%|███████ | 1967/2774 [6:27:40<2:42:30, 12.08s/it] {'loss': 1.061, 'learning_rate': 1.0305368692688175e-06, 'epoch': 0.71} 71%|███████ | 1967/2774 [6:27:40<2:42:30, 12.08s/it] 71%|███████ | 1968/2774 [6:27:51<2:39:21, 11.86s/it] {'loss': 1.0615, 'learning_rate': 1.028175788435099e-06, 'epoch': 0.71} 71%|███████ | 1968/2774 [6:27:51<2:39:21, 11.86s/it] 71%|███████ | 1969/2774 [6:28:02<2:37:16, 11.72s/it] {'loss': 1.0083, 'learning_rate': 1.0258167150806938e-06, 'epoch': 0.71} 71%|███████ | 1969/2774 [6:28:02<2:37:16, 11.72s/it] 71%|███████ | 1970/2774 [6:28:13<2:34:57, 11.56s/it] {'loss': 1.0039, 'learning_rate': 1.0234596524232374e-06, 'epoch': 0.71} 71%|███████ | 1970/2774 [6:28:13<2:34:57, 11.56s/it] 71%|███████ | 1971/2774 [6:28:25<2:34:38, 11.56s/it] {'loss': 1.0176, 'learning_rate': 1.0211046036776187e-06, 'epoch': 0.71} 71%|███████ | 1971/2774 [6:28:25<2:34:38, 11.56s/it] 71%|███████ | 1972/2774 [6:28:37<2:34:25, 11.55s/it] {'loss': 1.0649, 'learning_rate': 1.018751572055984e-06, 'epoch': 0.71} 71%|███████ | 1972/2774 [6:28:37<2:34:25, 11.55s/it] 71%|███████ | 1973/2774 [6:28:48<2:33:12, 11.48s/it] {'loss': 1.0034, 'learning_rate': 1.0164005607677253e-06, 'epoch': 0.71} 71%|███████ | 1973/2774 [6:28:48<2:33:12, 11.48s/it] 71%|███████ | 1974/2774 
[6:29:00<2:35:59, 11.70s/it] {'loss': 0.9907, 'learning_rate': 1.014051573019479e-06, 'epoch': 0.71} 71%|███████ | 1974/2774 [6:29:00<2:35:59, 11.70s/it] 71%|███████ | 1975/2774 [6:29:12<2:35:28, 11.68s/it] {'loss': 1.0874, 'learning_rate': 1.0117046120151242e-06, 'epoch': 0.71} 71%|███████ | 1975/2774 [6:29:12<2:35:28, 11.68s/it] 71%|███████ | 1976/2774 [6:29:23<2:34:06, 11.59s/it] {'loss': 1.0483, 'learning_rate': 1.0093596809557732e-06, 'epoch': 0.71} 71%|███████ | 1976/2774 [6:29:23<2:34:06, 11.59s/it] 71%|███████▏ | 1977/2774 [6:29:35<2:33:21, 11.54s/it] {'loss': 1.02, 'learning_rate': 1.0070167830397702e-06, 'epoch': 0.71} 71%|███████▏ | 1977/2774 [6:29:35<2:33:21, 11.54s/it] 71%|███████▏ | 1978/2774 [6:29:46<2:31:46, 11.44s/it] {'loss': 1.0254, 'learning_rate': 1.004675921462686e-06, 'epoch': 0.71} 71%|███████▏ | 1978/2774 [6:29:46<2:31:46, 11.44s/it] 71%|███████▏ | 1979/2774 [6:29:57<2:31:16, 11.42s/it] {'loss': 0.9995, 'learning_rate': 1.0023370994173155e-06, 'epoch': 0.71} 71%|███████▏ | 1979/2774 [6:29:57<2:31:16, 11.42s/it] 71%|███████▏ | 1980/2774 [6:30:09<2:31:38, 11.46s/it] {'loss': 1.0137, 'learning_rate': 1.000000320093669e-06, 'epoch': 0.71} 71%|███████▏ | 1980/2774 [6:30:09<2:31:38, 11.46s/it] 71%|███████▏ | 1981/2774 [6:30:20<2:31:16, 11.45s/it] {'loss': 0.9404, 'learning_rate': 9.976655866789745e-07, 'epoch': 0.71} 71%|███████▏ | 1981/2774 [6:30:20<2:31:16, 11.45s/it] 71%|███████▏ | 1982/2774 [6:30:31<2:31:01, 11.44s/it] {'loss': 0.9761, 'learning_rate': 9.953329023576655e-07, 'epoch': 0.71} 71%|███████▏ | 1982/2774 [6:30:31<2:31:01, 11.44s/it] 71%|███████▏ | 1983/2774 [6:30:43<2:31:22, 11.48s/it] {'loss': 1.0913, 'learning_rate': 9.93002270311384e-07, 'epoch': 0.71} 71%|███████▏ | 1983/2774 [6:30:43<2:31:22, 11.48s/it] 72%|███████▏ | 1984/2774 [6:30:55<2:31:03, 11.47s/it] {'loss': 0.9878, 'learning_rate': 9.9067369371897e-07, 'epoch': 0.72} 72%|███████▏ | 1984/2774 [6:30:55<2:31:03, 11.47s/it] 72%|███████▏ | 1985/2774 [6:31:07<2:33:21, 
11.66s/it] {'loss': 1.019, 'learning_rate': 9.883471757564634e-07, 'epoch': 0.72} 72%|███████▏ | 1985/2774 [6:31:07<2:33:21, 11.66s/it] 72%|███████▏ | 1986/2774 [6:31:18<2:32:01, 11.58s/it] {'loss': 1.0146, 'learning_rate': 9.860227195970906e-07, 'epoch': 0.72} 72%|███████▏ | 1986/2774 [6:31:18<2:32:01, 11.58s/it] 72%|███████▏ | 1987/2774 [6:31:29<2:30:12, 11.45s/it] {'loss': 0.9927, 'learning_rate': 9.837003284112727e-07, 'epoch': 0.72} 72%|███████▏ | 1987/2774 [6:31:29<2:30:12, 11.45s/it] 72%|███████▏ | 1988/2774 [6:31:40<2:29:06, 11.38s/it] {'loss': 1.0522, 'learning_rate': 9.813800053666086e-07, 'epoch': 0.72} 72%|███████▏ | 1988/2774 [6:31:40<2:29:06, 11.38s/it] 72%|███████▏ | 1989/2774 [6:31:52<2:28:58, 11.39s/it] {'loss': 1.0073, 'learning_rate': 9.790617536278809e-07, 'epoch': 0.72} 72%|███████▏ | 1989/2774 [6:31:52<2:28:58, 11.39s/it] 72%|███████▏ | 1990/2774 [6:32:03<2:29:50, 11.47s/it] {'loss': 1.0352, 'learning_rate': 9.767455763570433e-07, 'epoch': 0.72} 72%|███████▏ | 1990/2774 [6:32:03<2:29:50, 11.47s/it] 72%|███████▏ | 1991/2774 [6:32:17<2:37:19, 12.06s/it] {'loss': 1.0088, 'learning_rate': 9.74431476713223e-07, 'epoch': 0.72} 72%|███████▏ | 1991/2774 [6:32:17<2:37:19, 12.06s/it] 72%|███████▏ | 1992/2774 [6:32:28<2:34:16, 11.84s/it] {'loss': 0.9961, 'learning_rate': 9.721194578527112e-07, 'epoch': 0.72} 72%|███████▏ | 1992/2774 [6:32:28<2:34:16, 11.84s/it] 72%|███████▏ | 1993/2774 [6:32:39<2:31:34, 11.64s/it] {'loss': 0.9985, 'learning_rate': 9.698095229289614e-07, 'epoch': 0.72} 72%|███████▏ | 1993/2774 [6:32:39<2:31:34, 11.64s/it] 72%|███████▏ | 1994/2774 [6:32:51<2:32:10, 11.71s/it] {'loss': 0.9868, 'learning_rate': 9.67501675092587e-07, 'epoch': 0.72} 72%|███████▏ | 1994/2774 [6:32:51<2:32:10, 11.71s/it] 72%|███████▏ | 1995/2774 [6:33:03<2:30:20, 11.58s/it] {'loss': 1.0405, 'learning_rate': 9.65195917491352e-07, 'epoch': 0.72} 72%|███████▏ | 1995/2774 [6:33:03<2:30:20, 11.58s/it] 72%|███████▏ | 1996/2774 [6:33:14<2:31:25, 11.68s/it] {'loss': 
1.0435, 'learning_rate': 9.6289225327017e-07, 'epoch': 0.72} 72%|███████▏ | 1996/2774 [6:33:14<2:31:25, 11.68s/it] 72%|███████▏ | 1997/2774 [6:33:27<2:35:53, 12.04s/it] {'loss': 0.9395, 'learning_rate': 9.605906855711011e-07, 'epoch': 0.72} 72%|███████▏ | 1997/2774 [6:33:27<2:35:53, 12.04s/it] 72%|███████▏ | 1998/2774 [6:33:39<2:35:05, 11.99s/it] {'loss': 1.0776, 'learning_rate': 9.582912175333438e-07, 'epoch': 0.72} 72%|███████▏ | 1998/2774 [6:33:39<2:35:05, 11.99s/it] 72%|███████▏ | 1999/2774 [6:33:51<2:34:06, 11.93s/it] {'loss': 1.0024, 'learning_rate': 9.55993852293233e-07, 'epoch': 0.72} 72%|███████▏ | 1999/2774 [6:33:51<2:34:06, 11.93s/it] 72%|███████▏ | 2000/2774 [6:34:03<2:32:34, 11.83s/it] {'loss': 1.04, 'learning_rate': 9.53698592984238e-07, 'epoch': 0.72} 72%|███████▏ | 2000/2774 [6:34:03<2:32:34, 11.83s/it]
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn(
72%|███████▏ | 2001/2774 [6:34:40<4:13:02, 19.64s/it] {'loss': 1.0435, 'learning_rate': 9.514054427369515e-07, 'epoch': 0.72} 72%|███████▏ | 2001/2774 [6:34:40<4:13:02, 19.64s/it] 72%|███████▏ | 2002/2774 [6:34:53<3:47:07, 17.65s/it] {'loss': 1.0259, 'learning_rate': 9.491144046790939e-07, 'epoch': 0.72} 72%|███████▏ | 2002/2774 [6:34:53<3:47:07, 17.65s/it] 72%|███████▏ | 2003/2774 [6:35:06<3:25:26, 15.99s/it] {'loss': 1.0303, 'learning_rate': 9.468254819355019e-07, 'epoch': 0.72} 72%|███████▏ | 2003/2774 [6:35:06<3:25:26, 15.99s/it] 72%|███████▏ | 2004/2774 [6:35:17<3:07:05, 14.58s/it] {'loss': 1.0024, 'learning_rate': 9.445386776281282e-07, 'epoch': 0.72} 72%|███████▏ | 2004/2774 [6:35:17<3:07:05, 14.58s/it] 72%|███████▏ | 2005/2774 [6:35:29<2:59:05, 13.97s/it] {'loss': 1.0171, 'learning_rate': 9.422539948760342e-07, 'epoch': 0.72} 72%|███████▏ | 2005/2774 [6:35:29<2:59:05, 13.97s/it] 72%|███████▏ | 2006/2774 [6:35:43<2:56:25, 13.78s/it] {'loss': 1.0029, 'learning_rate': 9.399714367953913e-07, 'epoch': 0.72} 72%|███████▏ | 2006/2774 [6:35:43<2:56:25, 13.78s/it] 72%|███████▏ | 2007/2774 [6:35:54<2:47:59, 13.14s/it] {'loss': 1.021, 'learning_rate': 9.37691006499469e-07, 'epoch': 0.72} 72%|███████▏ | 2007/2774 [6:35:54<2:47:59, 13.14s/it] 72%|███████▏ | 2008/2774 [6:36:06<2:41:09, 12.62s/it] {'loss': 1.0415, 'learning_rate': 9.354127070986385e-07, 'epoch': 0.72} 72%|███████▏ | 2008/2774 [6:36:06<2:41:09, 12.62s/it] 72%|███████▏ | 2009/2774 [6:36:17<2:35:38, 12.21s/it] {'loss': 1.0073, 'learning_rate': 9.331365417003602e-07, 'epoch': 0.72} 72%|███████▏ | 2009/2774 [6:36:17<2:35:38, 12.21s/it] 72%|███████▏ | 2010/2774 [6:36:28<2:31:47, 11.92s/it] {'loss': 1.0308, 'learning_rate': 9.308625134091886e-07, 'epoch': 0.72} 72%|███████▏ | 2010/2774 [6:36:28<2:31:47, 11.92s/it] 72%|███████▏ | 2011/2774 [6:36:40<2:29:02, 11.72s/it] {'loss': 0.9873, 'learning_rate': 9.285906253267587e-07, 'epoch': 0.72} 72%|███████▏ | 2011/2774 
[6:36:40<2:29:02, 11.72s/it] 73%|███████▎ | 2012/2774 [6:36:51<2:28:12, 11.67s/it] {'loss': 0.9902, 'learning_rate': 9.263208805517912e-07, 'epoch': 0.73} 73%|███████▎ | 2012/2774 [6:36:51<2:28:12, 11.67s/it] 73%|███████▎ | 2013/2774 [6:37:02<2:26:27, 11.55s/it] {'loss': 1.0034, 'learning_rate': 9.24053282180078e-07, 'epoch': 0.73} 73%|███████▎ | 2013/2774 [6:37:02<2:26:27, 11.55s/it] 73%|███████▎ | 2014/2774 [6:37:14<2:25:32, 11.49s/it] {'loss': 1.103, 'learning_rate': 9.21787833304488e-07, 'epoch': 0.73} 73%|███████▎ | 2014/2774 [6:37:14<2:25:32, 11.49s/it] 73%|███████▎ | 2015/2774 [6:37:25<2:24:26, 11.42s/it] {'loss': 1.0088, 'learning_rate': 9.195245370149555e-07, 'epoch': 0.73} 73%|███████▎ | 2015/2774 [6:37:25<2:24:26, 11.42s/it] 73%|███████▎ | 2016/2774 [6:37:36<2:23:57, 11.39s/it] {'loss': 1.0767, 'learning_rate': 9.172633963984818e-07, 'epoch': 0.73} 73%|███████▎ | 2016/2774 [6:37:36<2:23:57, 11.39s/it] 73%|███████▎ | 2017/2774 [6:37:50<2:31:53, 12.04s/it] {'loss': 0.9736, 'learning_rate': 9.150044145391237e-07, 'epoch': 0.73} 73%|███████▎ | 2017/2774 [6:37:50<2:31:53, 12.04s/it] 73%|███████▎ | 2018/2774 [6:38:01<2:29:18, 11.85s/it] {'loss': 1.0132, 'learning_rate': 9.127475945179982e-07, 'epoch': 0.73} 73%|███████▎ | 2018/2774 [6:38:01<2:29:18, 11.85s/it] 73%|███████▎ | 2019/2774 [6:38:15<2:34:56, 12.31s/it] {'loss': 0.958, 'learning_rate': 9.104929394132706e-07, 'epoch': 0.73} 73%|███████▎ | 2019/2774 [6:38:15<2:34:56, 12.31s/it] 73%|███████▎ | 2020/2774 [6:38:26<2:30:27, 11.97s/it] {'loss': 0.9673, 'learning_rate': 9.082404523001531e-07, 'epoch': 0.73} 73%|███████▎ | 2020/2774 [6:38:26<2:30:27, 11.97s/it] 73%|███████▎ | 2021/2774 [6:38:38<2:29:35, 11.92s/it] {'loss': 1.0376, 'learning_rate': 9.059901362509044e-07, 'epoch': 0.73} 73%|███████▎ | 2021/2774 [6:38:38<2:29:35, 11.92s/it] 73%|███████▎ | 2022/2774 [6:38:49<2:27:21, 11.76s/it] {'loss': 1.0054, 'learning_rate': 9.03741994334818e-07, 'epoch': 0.73} 73%|███████▎ | 2022/2774 [6:38:49<2:27:21, 
11.76s/it] 73%|███████▎ | 2023/2774 [6:39:00<2:25:41, 11.64s/it] {'loss': 0.9878, 'learning_rate': 9.01496029618224e-07, 'epoch': 0.73} 73%|███████▎ | 2023/2774 [6:39:00<2:25:41, 11.64s/it] 73%|███████▎ | 2024/2774 [6:39:12<2:26:01, 11.68s/it] {'loss': 1.021, 'learning_rate': 8.992522451644823e-07, 'epoch': 0.73} 73%|███████▎ | 2024/2774 [6:39:12<2:26:01, 11.68s/it] 73%|███████▎ | 2025/2774 [6:39:24<2:26:05, 11.70s/it] {'loss': 1.0391, 'learning_rate': 8.970106440339801e-07, 'epoch': 0.73} 73%|███████▎ | 2025/2774 [6:39:24<2:26:05, 11.70s/it] 73%|███████▎ | 2026/2774 [6:39:36<2:25:32, 11.67s/it] {'loss': 1.0024, 'learning_rate': 8.947712292841248e-07, 'epoch': 0.73} 73%|███████▎ | 2026/2774 [6:39:36<2:25:32, 11.67s/it] 73%|███████▎ | 2027/2774 [6:39:47<2:25:45, 11.71s/it] {'loss': 1.0337, 'learning_rate': 8.925340039693444e-07, 'epoch': 0.73} 73%|███████▎ | 2027/2774 [6:39:47<2:25:45, 11.71s/it] 73%|███████▎ | 2028/2774 [6:40:00<2:29:11, 12.00s/it] {'loss': 0.999, 'learning_rate': 8.902989711410773e-07, 'epoch': 0.73} 73%|███████▎ | 2028/2774 [6:40:00<2:29:11, 12.00s/it] 73%|███████▎ | 2029/2774 [6:40:12<2:28:53, 11.99s/it] {'loss': 1.0078, 'learning_rate': 8.880661338477753e-07, 'epoch': 0.73} 73%|███████▎ | 2029/2774 [6:40:12<2:28:53, 11.99s/it] 73%|███████▎ | 2030/2774 [6:40:24<2:30:14, 12.12s/it] {'loss': 0.9888, 'learning_rate': 8.858354951348924e-07, 'epoch': 0.73} 73%|███████▎ | 2030/2774 [6:40:24<2:30:14, 12.12s/it] 73%|███████▎ | 2031/2774 [6:40:36<2:27:21, 11.90s/it] {'loss': 0.9946, 'learning_rate': 8.83607058044885e-07, 'epoch': 0.73} 73%|███████▎ | 2031/2774 [6:40:36<2:27:21, 11.90s/it] 73%|███████▎ | 2032/2774 [6:40:49<2:33:20, 12.40s/it] {'loss': 0.9937, 'learning_rate': 8.813808256172063e-07, 'epoch': 0.73} 73%|███████▎ | 2032/2774 [6:40:49<2:33:20, 12.40s/it] 73%|███████▎ | 2033/2774 [6:41:01<2:30:48, 12.21s/it] {'loss': 1.0059, 'learning_rate': 8.791568008883039e-07, 'epoch': 0.73} 73%|███████▎ | 2033/2774 [6:41:01<2:30:48, 12.21s/it] 73%|███████▎ 
| 2034/2774 [6:41:13<2:30:34, 12.21s/it] {'loss': 1.022, 'learning_rate': 8.769349868916119e-07, 'epoch': 0.73} 73%|███████▎ | 2034/2774 [6:41:13<2:30:34, 12.21s/it] 73%|███████▎ | 2035/2774 [6:41:25<2:27:10, 11.95s/it] {'loss': 1.0469, 'learning_rate': 8.747153866575522e-07, 'epoch': 0.73} 73%|███████▎ | 2035/2774 [6:41:25<2:27:10, 11.95s/it] 73%|███████▎ | 2036/2774 [6:41:37<2:30:17, 12.22s/it] {'loss': 0.9766, 'learning_rate': 8.724980032135233e-07, 'epoch': 0.73} 73%|███████▎ | 2036/2774 [6:41:37<2:30:17, 12.22s/it] 73%|███████▎ | 2037/2774 [6:41:49<2:26:28, 11.93s/it] {'loss': 1.0771, 'learning_rate': 8.702828395839044e-07, 'epoch': 0.73} 73%|███████▎ | 2037/2774 [6:41:49<2:26:28, 11.93s/it] 73%|███████▎ | 2038/2774 [6:42:01<2:28:11, 12.08s/it] {'loss': 1.042, 'learning_rate': 8.680698987900435e-07, 'epoch': 0.73} 73%|███████▎ | 2038/2774 [6:42:01<2:28:11, 12.08s/it] 74%|███████▎ | 2039/2774 [6:42:13<2:26:32, 11.96s/it] {'loss': 1.0332, 'learning_rate': 8.658591838502587e-07, 'epoch': 0.74} 74%|███████▎ | 2039/2774 [6:42:13<2:26:32, 11.96s/it] 74%|███████▎ | 2040/2774 [6:42:25<2:25:39, 11.91s/it] {'loss': 1.0391, 'learning_rate': 8.636506977798306e-07, 'epoch': 0.74} 74%|███████▎ | 2040/2774 [6:42:25<2:25:39, 11.91s/it] 74%|███████▎ | 2041/2774 [6:42:36<2:23:49, 11.77s/it] {'loss': 1.0659, 'learning_rate': 8.614444435910024e-07, 'epoch': 0.74} 74%|███████▎ | 2041/2774 [6:42:36<2:23:49, 11.77s/it] 74%|███████▎ | 2042/2774 [6:42:48<2:25:13, 11.90s/it] {'loss': 1.0288, 'learning_rate': 8.592404242929697e-07, 'epoch': 0.74} 74%|███████▎ | 2042/2774 [6:42:48<2:25:13, 11.90s/it] 74%|███████▎ | 2043/2774 [6:43:01<2:26:58, 12.06s/it] {'loss': 0.9707, 'learning_rate': 8.57038642891884e-07, 'epoch': 0.74} 74%|███████▎ | 2043/2774 [6:43:01<2:26:58, 12.06s/it] 74%|███████▎ | 2044/2774 [6:43:12<2:25:19, 11.94s/it] {'loss': 1.0522, 'learning_rate': 8.548391023908403e-07, 'epoch': 0.74} 74%|███████▎ | 2044/2774 [6:43:12<2:25:19, 11.94s/it] 74%|███████▎ | 2045/2774 
[6:43:24<2:23:08, 11.78s/it] {'loss': 1.0239, 'learning_rate': 8.526418057898791e-07, 'epoch': 0.74} 74%|███████▎ | 2045/2774 [6:43:24<2:23:08, 11.78s/it] 74%|███████▍ | 2046/2774 [6:43:35<2:21:28, 11.66s/it] {'loss': 1.02, 'learning_rate': 8.504467560859814e-07, 'epoch': 0.74} 74%|███████▍ | 2046/2774 [6:43:35<2:21:28, 11.66s/it] 74%|███████▍ | 2047/2774 [6:43:47<2:20:38, 11.61s/it] {'loss': 1.04, 'learning_rate': 8.482539562730607e-07, 'epoch': 0.74} 74%|███████▍ | 2047/2774 [6:43:47<2:20:38, 11.61s/it] 74%|███████▍ | 2048/2774 [6:43:59<2:21:22, 11.68s/it] {'loss': 1.0156, 'learning_rate': 8.460634093419662e-07, 'epoch': 0.74} 74%|███████▍ | 2048/2774 [6:43:59<2:21:22, 11.68s/it] 74%|███████▍ | 2049/2774 [6:44:11<2:25:22, 12.03s/it] {'loss': 1.1001, 'learning_rate': 8.43875118280468e-07, 'epoch': 0.74} 74%|███████▍ | 2049/2774 [6:44:11<2:25:22, 12.03s/it] 74%|███████▍ | 2050/2774 [6:44:23<2:22:43, 11.83s/it] {'loss': 0.9956, 'learning_rate': 8.416890860732657e-07, 'epoch': 0.74} 74%|███████▍ | 2050/2774 [6:44:23<2:22:43, 11.83s/it] 74%|███████▍ | 2051/2774 [6:44:35<2:22:30, 11.83s/it] {'loss': 0.9927, 'learning_rate': 8.395053157019733e-07, 'epoch': 0.74} 74%|███████▍ | 2051/2774 [6:44:35<2:22:30, 11.83s/it] 74%|███████▍ | 2052/2774 [6:44:46<2:21:11, 11.73s/it] {'loss': 1.0181, 'learning_rate': 8.373238101451234e-07, 'epoch': 0.74} 74%|███████▍ | 2052/2774 [6:44:46<2:21:11, 11.73s/it] 74%|███████▍ | 2053/2774 [6:44:59<2:24:50, 12.05s/it] {'loss': 1.0181, 'learning_rate': 8.351445723781562e-07, 'epoch': 0.74} 74%|███████▍ | 2053/2774 [6:44:59<2:24:50, 12.05s/it] 74%|███████▍ | 2054/2774 [6:45:10<2:22:19, 11.86s/it] {'loss': 0.9766, 'learning_rate': 8.32967605373422e-07, 'epoch': 0.74} 74%|███████▍ | 2054/2774 [6:45:10<2:22:19, 11.86s/it] 74%|███████▍ | 2055/2774 [6:45:21<2:19:24, 11.63s/it] {'loss': 0.9927, 'learning_rate': 8.307929121001704e-07, 'epoch': 0.74} 74%|███████▍ | 2055/2774 [6:45:21<2:19:24, 11.63s/it] 74%|███████▍ | 2056/2774 [6:45:33<2:18:00, 
11.53s/it] {'loss': 1.0151, 'learning_rate': 8.286204955245535e-07, 'epoch': 0.74} 74%|███████▍ | 2056/2774 [6:45:33<2:18:00, 11.53s/it] 74%|███████▍ | 2057/2774 [6:45:44<2:18:00, 11.55s/it] {'loss': 1.0312, 'learning_rate': 8.264503586096159e-07, 'epoch': 0.74} 74%|███████▍ | 2057/2774 [6:45:44<2:18:00, 11.55s/it] 74%|███████▍ | 2058/2774 [6:45:56<2:17:54, 11.56s/it] {'loss': 1.0542, 'learning_rate': 8.242825043152924e-07, 'epoch': 0.74} 74%|███████▍ | 2058/2774 [6:45:56<2:17:54, 11.56s/it] 74%|███████▍ | 2059/2774 [6:46:07<2:17:50, 11.57s/it] {'loss': 1.0605, 'learning_rate': 8.221169355984051e-07, 'epoch': 0.74} 74%|███████▍ | 2059/2774 [6:46:07<2:17:50, 11.57s/it] 74%|███████▍ | 2060/2774 [6:46:20<2:22:36, 11.98s/it] {'loss': 1.0186, 'learning_rate': 8.199536554126603e-07, 'epoch': 0.74} 74%|███████▍ | 2060/2774 [6:46:20<2:22:36, 11.98s/it] 74%|███████▍ | 2061/2774 [6:46:32<2:19:36, 11.75s/it] {'loss': 1.042, 'learning_rate': 8.177926667086399e-07, 'epoch': 0.74} 74%|███████▍ | 2061/2774 [6:46:32<2:19:36, 11.75s/it] 74%|███████▍ | 2062/2774 [6:46:43<2:18:04, 11.64s/it] {'loss': 1.0693, 'learning_rate': 8.156339724338036e-07, 'epoch': 0.74} 74%|███████▍ | 2062/2774 [6:46:43<2:18:04, 11.64s/it] 74%|███████▍ | 2063/2774 [6:46:55<2:17:48, 11.63s/it] {'loss': 0.9844, 'learning_rate': 8.134775755324784e-07, 'epoch': 0.74} 74%|███████▍ | 2063/2774 [6:46:55<2:17:48, 11.63s/it] 74%|███████▍ | 2064/2774 [6:47:06<2:16:51, 11.57s/it] {'loss': 1.0176, 'learning_rate': 8.11323478945861e-07, 'epoch': 0.74} 74%|███████▍ | 2064/2774 [6:47:06<2:16:51, 11.57s/it] 74%|███████▍ | 2065/2774 [6:47:17<2:15:13, 11.44s/it] {'loss': 0.9912, 'learning_rate': 8.09171685612008e-07, 'epoch': 0.74} 74%|███████▍ | 2065/2774 [6:47:17<2:15:13, 11.44s/it] 74%|███████▍ | 2066/2774 [6:47:29<2:15:20, 11.47s/it] {'loss': 1.0278, 'learning_rate': 8.070221984658358e-07, 'epoch': 0.74} 74%|███████▍ | 2066/2774 [6:47:29<2:15:20, 11.47s/it] 75%|███████▍ | 2067/2774 [6:47:41<2:17:39, 11.68s/it] {'loss': 
1.04, 'learning_rate': 8.048750204391143e-07, 'epoch': 0.75} 75%|███████▍ | 2067/2774 [6:47:41<2:17:39, 11.68s/it] 75%|███████▍ | 2068/2774 [6:47:52<2:16:21, 11.59s/it] {'loss': 1.0083, 'learning_rate': 8.027301544604657e-07, 'epoch': 0.75} 75%|███████▍ | 2068/2774 [6:47:52<2:16:21, 11.59s/it] 75%|███████▍ | 2069/2774 [6:48:04<2:15:27, 11.53s/it] {'loss': 1.0181, 'learning_rate': 8.005876034553575e-07, 'epoch': 0.75} 75%|███████▍ | 2069/2774 [6:48:04<2:15:27, 11.53s/it] 75%|███████▍ | 2070/2774 [6:48:15<2:14:35, 11.47s/it] {'loss': 0.9946, 'learning_rate': 7.984473703460985e-07, 'epoch': 0.75} 75%|███████▍ | 2070/2774 [6:48:15<2:14:35, 11.47s/it] 75%|███████▍ | 2071/2774 [6:48:26<2:14:29, 11.48s/it] {'loss': 1.0322, 'learning_rate': 7.963094580518394e-07, 'epoch': 0.75} 75%|███████▍ | 2071/2774 [6:48:26<2:14:29, 11.48s/it] 75%|███████▍ | 2072/2774 [6:48:40<2:22:33, 12.18s/it] {'loss': 0.9697, 'learning_rate': 7.941738694885614e-07, 'epoch': 0.75} 75%|███████▍ | 2072/2774 [6:48:40<2:22:33, 12.18s/it] 75%|███████▍ | 2073/2774 [6:48:52<2:20:24, 12.02s/it] {'loss': 1.0049, 'learning_rate': 7.920406075690804e-07, 'epoch': 0.75} 75%|███████▍ | 2073/2774 [6:48:52<2:20:24, 12.02s/it] 75%|███████▍ | 2074/2774 [6:49:03<2:17:27, 11.78s/it] {'loss': 1.0376, 'learning_rate': 7.899096752030346e-07, 'epoch': 0.75} 75%|███████▍ | 2074/2774 [6:49:03<2:17:27, 11.78s/it] 75%|███████▍ | 2075/2774 [6:49:14<2:15:33, 11.64s/it] {'loss': 1.02, 'learning_rate': 7.877810752968901e-07, 'epoch': 0.75} 75%|███████▍ | 2075/2774 [6:49:14<2:15:33, 11.64s/it] 75%|███████▍ | 2076/2774 [6:49:26<2:13:43, 11.50s/it] {'loss': 1.0474, 'learning_rate': 7.856548107539247e-07, 'epoch': 0.75} 75%|███████▍ | 2076/2774 [6:49:26<2:13:43, 11.50s/it] 75%|███████▍ | 2077/2774 [6:49:37<2:14:46, 11.60s/it] {'loss': 1.0615, 'learning_rate': 7.835308844742376e-07, 'epoch': 0.75} 75%|███████▍ | 2077/2774 [6:49:37<2:14:46, 11.60s/it] 75%|███████▍ | 2078/2774 [6:49:49<2:14:49, 11.62s/it] {'loss': 1.0791, 
'learning_rate': 7.814092993547342e-07, 'epoch': 0.75} 75%|███████▍ | 2078/2774 [6:49:49<2:14:49, 11.62s/it] 75%|███████▍ | 2079/2774 [6:50:01<2:14:09, 11.58s/it] {'loss': 1.0361, 'learning_rate': 7.792900582891303e-07, 'epoch': 0.75} 75%|███████▍ | 2079/2774 [6:50:01<2:14:09, 11.58s/it] 75%|███████▍ | 2080/2774 [6:50:12<2:13:23, 11.53s/it] {'loss': 1.0083, 'learning_rate': 7.771731641679406e-07, 'epoch': 0.75} 75%|███████▍ | 2080/2774 [6:50:12<2:13:23, 11.53s/it] 75%|███████▌ | 2081/2774 [6:50:24<2:14:16, 11.63s/it] {'loss': 1.0181, 'learning_rate': 7.750586198784829e-07, 'epoch': 0.75} 75%|███████▌ | 2081/2774 [6:50:24<2:14:16, 11.63s/it] 75%|███████▌ | 2082/2774 [6:50:39<2:24:55, 12.57s/it] {'loss': 1.0103, 'learning_rate': 7.72946428304866e-07, 'epoch': 0.75} 75%|███████▌ | 2082/2774 [6:50:39<2:24:55, 12.57s/it] 75%|███████▌ | 2083/2774 [6:50:50<2:21:10, 12.26s/it] {'loss': 0.9775, 'learning_rate': 7.708365923279931e-07, 'epoch': 0.75} 75%|███████▌ | 2083/2774 [6:50:50<2:21:10, 12.26s/it] 75%|███████▌ | 2084/2774 [6:51:02<2:19:54, 12.17s/it] {'loss': 1.0142, 'learning_rate': 7.687291148255527e-07, 'epoch': 0.75} 75%|███████▌ | 2084/2774 [6:51:02<2:19:54, 12.17s/it] 75%|███████▌ | 2085/2774 [6:51:14<2:19:23, 12.14s/it] {'loss': 1.0474, 'learning_rate': 7.666239986720162e-07, 'epoch': 0.75} 75%|███████▌ | 2085/2774 [6:51:14<2:19:23, 12.14s/it] 75%|███████▌ | 2086/2774 [6:51:26<2:16:38, 11.92s/it] {'loss': 1.0283, 'learning_rate': 7.645212467386346e-07, 'epoch': 0.75} 75%|███████▌ | 2086/2774 [6:51:26<2:16:38, 11.92s/it] 75%|███████▌ | 2087/2774 [6:51:37<2:13:41, 11.68s/it] {'loss': 1.0396, 'learning_rate': 7.624208618934356e-07, 'epoch': 0.75} 75%|███████▌ | 2087/2774 [6:51:37<2:13:41, 11.68s/it] 75%|███████▌ | 2088/2774 [6:51:48<2:12:05, 11.55s/it] {'loss': 0.9951, 'learning_rate': 7.603228470012162e-07, 'epoch': 0.75} 75%|███████▌ | 2088/2774 [6:51:48<2:12:05, 11.55s/it] 75%|███████▌ | 2089/2774 [6:52:00<2:11:55, 11.56s/it] {'loss': 1.0215, 'learning_rate': 
7.582272049235431e-07, 'epoch': 0.75} 75%|███████▌ | 2089/2774 [6:52:00<2:11:55, 11.56s/it] 75%|███████▌ | 2090/2774 [6:52:13<2:17:09, 12.03s/it] {'loss': 1.0024, 'learning_rate': 7.561339385187449e-07, 'epoch': 0.75} 75%|███████▌ | 2090/2774 [6:52:13<2:17:09, 12.03s/it] 75%|███████▌ | 2091/2774 [6:52:25<2:19:38, 12.27s/it] {'loss': 0.9609, 'learning_rate': 7.540430506419099e-07, 'epoch': 0.75} 75%|███████▌ | 2091/2774 [6:52:25<2:19:38, 12.27s/it] 75%|███████▌ | 2092/2774 [6:52:37<2:17:10, 12.07s/it] {'loss': 1.0498, 'learning_rate': 7.519545441448842e-07, 'epoch': 0.75} 75%|███████▌ | 2092/2774 [6:52:37<2:17:10, 12.07s/it] 75%|███████▌ | 2093/2774 [6:52:50<2:19:03, 12.25s/it] {'loss': 1.0283, 'learning_rate': 7.498684218762639e-07, 'epoch': 0.75} 75%|███████▌ | 2093/2774 [6:52:50<2:19:03, 12.25s/it] 75%|███████▌ | 2094/2774 [6:53:01<2:16:17, 12.03s/it] {'loss': 1.0894, 'learning_rate': 7.477846866813934e-07, 'epoch': 0.75} 75%|███████▌ | 2094/2774 [6:53:01<2:16:17, 12.03s/it] 76%|███████▌ | 2095/2774 [6:53:12<2:12:48, 11.74s/it] {'loss': 1.0615, 'learning_rate': 7.457033414023613e-07, 'epoch': 0.76} 76%|███████▌ | 2095/2774 [6:53:12<2:12:48, 11.74s/it] 76%|███████▌ | 2096/2774 [6:53:25<2:15:29, 11.99s/it] {'loss': 1.0386, 'learning_rate': 7.436243888779982e-07, 'epoch': 0.76} 76%|███████▌ | 2096/2774 [6:53:25<2:15:29, 11.99s/it] 76%|███████▌ | 2097/2774 [6:53:36<2:13:37, 11.84s/it] {'loss': 1.0293, 'learning_rate': 7.41547831943868e-07, 'epoch': 0.76} 76%|███████▌ | 2097/2774 [6:53:36<2:13:37, 11.84s/it] 76%|███████▌ | 2098/2774 [6:53:50<2:18:07, 12.26s/it] {'loss': 0.9868, 'learning_rate': 7.394736734322705e-07, 'epoch': 0.76} 76%|███████▌ | 2098/2774 [6:53:50<2:18:07, 12.26s/it] 76%|███████▌ | 2099/2774 [6:54:01<2:14:52, 11.99s/it] {'loss': 1.0449, 'learning_rate': 7.374019161722315e-07, 'epoch': 0.76} 76%|███████▌ | 2099/2774 [6:54:01<2:14:52, 11.99s/it] 76%|███████▌ | 2100/2774 [6:54:14<2:19:16, 12.40s/it] {'loss': 1.0029, 'learning_rate': 
7.353325629895039e-07, 'epoch': 0.76} 76%|███████▌ | 2100/2774 [6:54:14<2:19:16, 12.40s/it] 76%|███████▌ | 2101/2774 [6:54:26<2:15:11, 12.05s/it] {'loss': 0.9878, 'learning_rate': 7.332656167065591e-07, 'epoch': 0.76} 76%|███████▌ | 2101/2774 [6:54:26<2:15:11, 12.05s/it] 76%|███████▌ | 2102/2774 [6:54:37<2:13:56, 11.96s/it] {'loss': 1.0098, 'learning_rate': 7.312010801425892e-07, 'epoch': 0.76} 76%|███████▌ | 2102/2774 [6:54:37<2:13:56, 11.96s/it] 76%|███████▌ | 2103/2774 [6:54:49<2:13:07, 11.90s/it] {'loss': 1.0288, 'learning_rate': 7.29138956113494e-07, 'epoch': 0.76} 76%|███████▌ | 2103/2774 [6:54:49<2:13:07, 11.90s/it] 76%|███████▌ | 2104/2774 [6:55:01<2:11:11, 11.75s/it] {'loss': 0.9541, 'learning_rate': 7.270792474318889e-07, 'epoch': 0.76} 76%|███████▌ | 2104/2774 [6:55:01<2:11:11, 11.75s/it] 76%|███████▌ | 2105/2774 [6:55:14<2:15:48, 12.18s/it] {'loss': 0.9346, 'learning_rate': 7.250219569070904e-07, 'epoch': 0.76} 76%|███████▌ | 2105/2774 [6:55:14<2:15:48, 12.18s/it] 76%|███████▌ | 2106/2774 [6:55:25<2:12:57, 11.94s/it] {'loss': 1.0762, 'learning_rate': 7.229670873451197e-07, 'epoch': 0.76} 76%|███████▌ | 2106/2774 [6:55:25<2:12:57, 11.94s/it] 76%|███████▌ | 2107/2774 [6:55:38<2:14:51, 12.13s/it] {'loss': 0.9526, 'learning_rate': 7.20914641548694e-07, 'epoch': 0.76} 76%|███████▌ | 2107/2774 [6:55:38<2:14:51, 12.13s/it] 76%|███████▌ | 2108/2774 [6:55:49<2:12:22, 11.93s/it] {'loss': 1.0376, 'learning_rate': 7.18864622317226e-07, 'epoch': 0.76} 76%|███████▌ | 2108/2774 [6:55:49<2:12:22, 11.93s/it] 76%|███████▌ | 2109/2774 [6:56:01<2:11:02, 11.82s/it] {'loss': 1.0493, 'learning_rate': 7.168170324468171e-07, 'epoch': 0.76} 76%|███████▌ | 2109/2774 [6:56:01<2:11:02, 11.82s/it] 76%|███████▌ | 2110/2774 [6:56:13<2:11:10, 11.85s/it] {'loss': 0.9941, 'learning_rate': 7.147718747302577e-07, 'epoch': 0.76} 76%|███████▌ | 2110/2774 [6:56:13<2:11:10, 11.85s/it] 76%|███████▌ | 2111/2774 [6:56:24<2:09:56, 11.76s/it] {'loss': 1.0103, 'learning_rate': 7.127291519570184e-07, 
'epoch': 0.76} 76%|███████▌ | 2111/2774 [6:56:24<2:09:56, 11.76s/it] 76%|███████▌ | 2112/2774 [6:56:36<2:08:44, 11.67s/it] {'loss': 0.978, 'learning_rate': 7.106888669132497e-07, 'epoch': 0.76} 76%|███████▌ | 2112/2774 [6:56:36<2:08:44, 11.67s/it] 76%|███████▌ | 2113/2774 [6:56:47<2:08:19, 11.65s/it] {'loss': 1.063, 'learning_rate': 7.086510223817766e-07, 'epoch': 0.76} 76%|███████▌ | 2113/2774 [6:56:47<2:08:19, 11.65s/it] 76%|███████▌ | 2114/2774 [6:56:59<2:07:10, 11.56s/it] {'loss': 0.9873, 'learning_rate': 7.066156211420975e-07, 'epoch': 0.76} 76%|███████▌ | 2114/2774 [6:56:59<2:07:10, 11.56s/it] 76%|███████▌ | 2115/2774 [6:57:11<2:08:29, 11.70s/it] {'loss': 1.0122, 'learning_rate': 7.045826659703756e-07, 'epoch': 0.76} 76%|███████▌ | 2115/2774 [6:57:11<2:08:29, 11.70s/it] 76%|███████▋ | 2116/2774 [6:57:22<2:08:00, 11.67s/it] {'loss': 1.0859, 'learning_rate': 7.025521596394382e-07, 'epoch': 0.76} 76%|███████▋ | 2116/2774 [6:57:22<2:08:00, 11.67s/it] 76%|███████▋ | 2117/2774 [6:57:34<2:07:05, 11.61s/it] {'loss': 1.0078, 'learning_rate': 7.005241049187752e-07, 'epoch': 0.76} 76%|███████▋ | 2117/2774 [6:57:34<2:07:05, 11.61s/it] 76%|███████▋ | 2118/2774 [6:57:47<2:12:26, 12.11s/it] {'loss': 0.9414, 'learning_rate': 6.98498504574529e-07, 'epoch': 0.76} 76%|███████▋ | 2118/2774 [6:57:47<2:12:26, 12.11s/it] 76%|███████▋ | 2119/2774 [6:57:58<2:10:16, 11.93s/it] {'loss': 1.0469, 'learning_rate': 6.964753613694977e-07, 'epoch': 0.76} 76%|███████▋ | 2119/2774 [6:57:58<2:10:16, 11.93s/it] 76%|███████▋ | 2120/2774 [6:58:10<2:09:19, 11.87s/it] {'loss': 1.0254, 'learning_rate': 6.944546780631256e-07, 'epoch': 0.76} 76%|███████▋ | 2120/2774 [6:58:10<2:09:19, 11.87s/it] 76%|███████▋ | 2121/2774 [6:58:22<2:08:10, 11.78s/it] {'loss': 1.002, 'learning_rate': 6.924364574115025e-07, 'epoch': 0.76} 76%|███████▋ | 2121/2774 [6:58:22<2:08:10, 11.78s/it] 76%|███████▋ | 2122/2774 [6:58:33<2:05:59, 11.59s/it] {'loss': 1.042, 'learning_rate': 6.90420702167359e-07, 'epoch': 0.76} 
76%|███████▋ | 2122/2774 [6:58:33<2:05:59, 11.59s/it] 77%|███████▋ | 2123/2774 [6:58:44<2:04:43, 11.50s/it] {'loss': 1.0161, 'learning_rate': 6.884074150800649e-07, 'epoch': 0.77} 77%|███████▋ | 2123/2774 [6:58:44<2:04:43, 11.50s/it] 77%|███████▋ | 2124/2774 [6:58:56<2:04:11, 11.46s/it] {'loss': 1.0181, 'learning_rate': 6.863965988956203e-07, 'epoch': 0.77} 77%|███████▋ | 2124/2774 [6:58:56<2:04:11, 11.46s/it] 77%|███████▋ | 2125/2774 [6:59:07<2:03:25, 11.41s/it] {'loss': 0.998, 'learning_rate': 6.843882563566589e-07, 'epoch': 0.77} 77%|███████▋ | 2125/2774 [6:59:07<2:03:25, 11.41s/it] 77%|███████▋ | 2126/2774 [6:59:18<2:03:35, 11.44s/it] {'loss': 1.0073, 'learning_rate': 6.82382390202437e-07, 'epoch': 0.77} 77%|███████▋ | 2126/2774 [6:59:18<2:03:35, 11.44s/it] 77%|███████▋ | 2127/2774 [6:59:30<2:04:22, 11.53s/it] {'loss': 0.9951, 'learning_rate': 6.803790031688365e-07, 'epoch': 0.77} 77%|███████▋ | 2127/2774 [6:59:30<2:04:22, 11.53s/it] 77%|███████▋ | 2128/2774 [6:59:42<2:03:55, 11.51s/it] {'loss': 1.0215, 'learning_rate': 6.783780979883548e-07, 'epoch': 0.77} 77%|███████▋ | 2128/2774 [6:59:42<2:03:55, 11.51s/it] 77%|███████▋ | 2129/2774 [6:59:53<2:03:03, 11.45s/it] {'loss': 1.0542, 'learning_rate': 6.763796773901074e-07, 'epoch': 0.77} 77%|███████▋ | 2129/2774 [6:59:53<2:03:03, 11.45s/it] 77%|███████▋ | 2130/2774 [7:00:04<2:02:31, 11.41s/it] {'loss': 1.0273, 'learning_rate': 6.743837440998169e-07, 'epoch': 0.77} 77%|███████▋ | 2130/2774 [7:00:04<2:02:31, 11.41s/it] 77%|███████▋ | 2131/2774 [7:00:16<2:03:53, 11.56s/it] {'loss': 1.0713, 'learning_rate': 6.723903008398178e-07, 'epoch': 0.77} 77%|███████▋ | 2131/2774 [7:00:16<2:03:53, 11.56s/it] 77%|███████▋ | 2132/2774 [7:00:27<2:02:20, 11.43s/it] {'loss': 1.0767, 'learning_rate': 6.703993503290448e-07, 'epoch': 0.77} 77%|███████▋ | 2132/2774 [7:00:27<2:02:20, 11.43s/it] 77%|███████▋ | 2133/2774 [7:00:40<2:04:51, 11.69s/it] {'loss': 1.064, 'learning_rate': 6.684108952830354e-07, 'epoch': 0.77} 77%|███████▋ | 
2133/2774 [7:00:40<2:04:51, 11.69s/it] 77%|███████▋ | 2134/2774 [7:00:51<2:04:44, 11.69s/it] {'loss': 1.0327, 'learning_rate': 6.66424938413921e-07, 'epoch': 0.77} 77%|███████▋ | 2134/2774 [7:00:51<2:04:44, 11.69s/it] 77%|███████▋ | 2135/2774 [7:01:03<2:04:26, 11.68s/it] {'loss': 1.0454, 'learning_rate': 6.644414824304282e-07, 'epoch': 0.77} 77%|███████▋ | 2135/2774 [7:01:03<2:04:26, 11.68s/it] 77%|███████▋ | 2136/2774 [7:01:15<2:04:14, 11.68s/it] {'loss': 1.0835, 'learning_rate': 6.624605300378703e-07, 'epoch': 0.77} 77%|███████▋ | 2136/2774 [7:01:15<2:04:14, 11.68s/it] 77%|███████▋ | 2137/2774 [7:01:27<2:04:53, 11.76s/it] {'loss': 0.9961, 'learning_rate': 6.604820839381459e-07, 'epoch': 0.77} 77%|███████▋ | 2137/2774 [7:01:27<2:04:53, 11.76s/it] 77%|███████▋ | 2138/2774 [7:01:38<2:03:36, 11.66s/it] {'loss': 1.002, 'learning_rate': 6.585061468297377e-07, 'epoch': 0.77} 77%|███████▋ | 2138/2774 [7:01:38<2:03:36, 11.66s/it] 77%|███████▋ | 2139/2774 [7:01:49<2:02:34, 11.58s/it] {'loss': 0.9863, 'learning_rate': 6.565327214077033e-07, 'epoch': 0.77} 77%|███████▋ | 2139/2774 [7:01:49<2:02:34, 11.58s/it] 77%|███████▋ | 2140/2774 [7:02:00<2:00:29, 11.40s/it] {'loss': 1.0127, 'learning_rate': 6.545618103636764e-07, 'epoch': 0.77} 77%|███████▋ | 2140/2774 [7:02:00<2:00:29, 11.40s/it] 77%|███████▋ | 2141/2774 [7:02:12<2:00:37, 11.43s/it] {'loss': 1.0117, 'learning_rate': 6.525934163858597e-07, 'epoch': 0.77} 77%|███████▋ | 2141/2774 [7:02:12<2:00:37, 11.43s/it] 77%|███████▋ | 2142/2774 [7:02:23<2:00:29, 11.44s/it] {'loss': 1.0298, 'learning_rate': 6.50627542159025e-07, 'epoch': 0.77} 77%|███████▋ | 2142/2774 [7:02:23<2:00:29, 11.44s/it] 77%|███████▋ | 2143/2774 [7:02:35<2:00:22, 11.45s/it] {'loss': 1.0098, 'learning_rate': 6.486641903645044e-07, 'epoch': 0.77} 77%|███████▋ | 2143/2774 [7:02:35<2:00:22, 11.45s/it] 77%|███████▋ | 2144/2774 [7:02:46<2:00:36, 11.49s/it] {'loss': 1.0469, 'learning_rate': 6.467033636801928e-07, 'epoch': 0.77} 77%|███████▋ | 2144/2774 
[7:02:46<2:00:36, 11.49s/it] 77%|███████▋ | 2145/2774 [7:02:58<2:00:54, 11.53s/it] {'loss': 1.0576, 'learning_rate': 6.447450647805378e-07, 'epoch': 0.77} 77%|███████▋ | 2145/2774 [7:02:58<2:00:54, 11.53s/it] 77%|███████▋ | 2146/2774 [7:03:10<2:03:02, 11.76s/it] {'loss': 1.0161, 'learning_rate': 6.427892963365425e-07, 'epoch': 0.77} 77%|███████▋ | 2146/2774 [7:03:10<2:03:02, 11.76s/it] 77%|███████▋ | 2147/2774 [7:03:22<2:02:26, 11.72s/it] {'loss': 1.043, 'learning_rate': 6.40836061015756e-07, 'epoch': 0.77} 77%|███████▋ | 2147/2774 [7:03:22<2:02:26, 11.72s/it] 77%|███████▋ | 2148/2774 [7:03:33<2:01:35, 11.65s/it] {'loss': 1.022, 'learning_rate': 6.388853614822732e-07, 'epoch': 0.77} 77%|███████▋ | 2148/2774 [7:03:33<2:01:35, 11.65s/it] 77%|███████▋ | 2149/2774 [7:03:45<2:00:13, 11.54s/it] {'loss': 1.0615, 'learning_rate': 6.369372003967297e-07, 'epoch': 0.77} 77%|███████▋ | 2149/2774 [7:03:45<2:00:13, 11.54s/it] 78%|███████▊ | 2150/2774 [7:03:57<2:01:00, 11.64s/it] {'loss': 1.0132, 'learning_rate': 6.349915804163012e-07, 'epoch': 0.78} 78%|███████▊ | 2150/2774 [7:03:57<2:01:00, 11.64s/it] 78%|███████▊ | 2151/2774 [7:04:08<1:59:35, 11.52s/it] {'loss': 0.9966, 'learning_rate': 6.330485041946943e-07, 'epoch': 0.78} 78%|███████▊ | 2151/2774 [7:04:08<1:59:35, 11.52s/it] 78%|███████▊ | 2152/2774 [7:04:19<1:59:58, 11.57s/it] {'loss': 1.0664, 'learning_rate': 6.311079743821489e-07, 'epoch': 0.78} 78%|███████▊ | 2152/2774 [7:04:19<1:59:58, 11.57s/it] 78%|███████▊ | 2153/2774 [7:04:32<2:03:06, 11.89s/it] {'loss': 1.0845, 'learning_rate': 6.29169993625429e-07, 'epoch': 0.78} 78%|███████▊ | 2153/2774 [7:04:32<2:03:06, 11.89s/it] 78%|███████▊ | 2154/2774 [7:04:43<2:00:43, 11.68s/it] {'loss': 1.0146, 'learning_rate': 6.272345645678249e-07, 'epoch': 0.78} 78%|███████▊ | 2154/2774 [7:04:43<2:00:43, 11.68s/it] 78%|███████▊ | 2155/2774 [7:04:55<1:59:40, 11.60s/it] {'loss': 1.0488, 'learning_rate': 6.253016898491435e-07, 'epoch': 0.78} 78%|███████▊ | 2155/2774 [7:04:55<1:59:40, 
11.60s/it] 78%|███████▊ | 2156/2774 [7:05:06<2:00:00, 11.65s/it] {'loss': 1.0776, 'learning_rate': 6.233713721057108e-07, 'epoch': 0.78} 78%|███████▊ | 2156/2774 [7:05:06<2:00:00, 11.65s/it] 78%|███████▊ | 2157/2774 [7:05:18<1:59:13, 11.59s/it] {'loss': 1.0864, 'learning_rate': 6.214436139703614e-07, 'epoch': 0.78} 78%|███████▊ | 2157/2774 [7:05:18<1:59:13, 11.59s/it] 78%|███████▊ | 2158/2774 [7:05:30<2:00:01, 11.69s/it] {'loss': 1.0542, 'learning_rate': 6.195184180724429e-07, 'epoch': 0.78} 78%|███████▊ | 2158/2774 [7:05:30<2:00:01, 11.69s/it] 78%|███████▊ | 2159/2774 [7:05:41<1:59:40, 11.67s/it] {'loss': 1.0366, 'learning_rate': 6.175957870378043e-07, 'epoch': 0.78} 78%|███████▊ | 2159/2774 [7:05:41<1:59:40, 11.67s/it] 78%|███████▊ | 2160/2774 [7:05:54<2:02:36, 11.98s/it] {'loss': 1.0381, 'learning_rate': 6.156757234888006e-07, 'epoch': 0.78} 78%|███████▊ | 2160/2774 [7:05:54<2:02:36, 11.98s/it] 78%|███████▊ | 2161/2774 [7:06:06<2:01:26, 11.89s/it] {'loss': 1.0674, 'learning_rate': 6.137582300442807e-07, 'epoch': 0.78} 78%|███████▊ | 2161/2774 [7:06:06<2:01:26, 11.89s/it] 78%|███████▊ | 2162/2774 [7:06:19<2:04:39, 12.22s/it] {'loss': 1.0059, 'learning_rate': 6.118433093195897e-07, 'epoch': 0.78} 78%|███████▊ | 2162/2774 [7:06:19<2:04:39, 12.22s/it] 78%|███████▊ | 2163/2774 [7:06:30<2:01:44, 11.95s/it] {'loss': 1.0752, 'learning_rate': 6.099309639265652e-07, 'epoch': 0.78} 78%|███████▊ | 2163/2774 [7:06:30<2:01:44, 11.95s/it] 78%|███████▊ | 2164/2774 [7:06:42<2:00:36, 11.86s/it] {'loss': 1.0649, 'learning_rate': 6.080211964735292e-07, 'epoch': 0.78} 78%|███████▊ | 2164/2774 [7:06:42<2:00:36, 11.86s/it] 78%|███████▊ | 2165/2774 [7:06:53<1:59:12, 11.75s/it] {'loss': 1.0635, 'learning_rate': 6.061140095652906e-07, 'epoch': 0.78} 78%|███████▊ | 2165/2774 [7:06:53<1:59:12, 11.75s/it] 78%|███████▊ | 2166/2774 [7:07:05<1:58:14, 11.67s/it] {'loss': 1.0542, 'learning_rate': 6.042094058031367e-07, 'epoch': 0.78} 78%|███████▊ | 2166/2774 [7:07:05<1:58:14, 11.67s/it] 
78%|███████▊ | 2167/2774 [7:07:17<2:00:14, 11.88s/it] {'loss': 1.0269, 'learning_rate': 6.023073877848314e-07, 'epoch': 0.78} 78%|███████▊ | 2167/2774 [7:07:17<2:00:14, 11.88s/it] 78%|███████▊ | 2168/2774 [7:07:29<1:58:19, 11.72s/it] {'loss': 1.0161, 'learning_rate': 6.004079581046123e-07, 'epoch': 0.78} 78%|███████▊ | 2168/2774 [7:07:29<1:58:19, 11.72s/it] 78%|███████▊ | 2169/2774 [7:07:41<1:59:08, 11.82s/it] {'loss': 1.0229, 'learning_rate': 5.985111193531878e-07, 'epoch': 0.78} 78%|███████▊ | 2169/2774 [7:07:41<1:59:08, 11.82s/it] 78%|███████▊ | 2170/2774 [7:07:52<1:58:15, 11.75s/it] {'loss': 0.9775, 'learning_rate': 5.9661687411773e-07, 'epoch': 0.78} 78%|███████▊ | 2170/2774 [7:07:52<1:58:15, 11.75s/it] 78%|███████▊ | 2171/2774 [7:08:05<2:00:43, 12.01s/it] {'loss': 1.0293, 'learning_rate': 5.947252249818764e-07, 'epoch': 0.78} 78%|███████▊ | 2171/2774 [7:08:05<2:00:43, 12.01s/it] 78%|███████▊ | 2172/2774 [7:08:16<1:59:05, 11.87s/it] {'loss': 1.04, 'learning_rate': 5.928361745257207e-07, 'epoch': 0.78} 78%|███████▊ | 2172/2774 [7:08:16<1:59:05, 11.87s/it] 78%|███████▊ | 2173/2774 [7:08:29<2:01:09, 12.10s/it] {'loss': 0.9697, 'learning_rate': 5.909497253258153e-07, 'epoch': 0.78} 78%|███████▊ | 2173/2774 [7:08:29<2:01:09, 12.10s/it] 78%|███████▊ | 2174/2774 [7:08:40<1:58:33, 11.86s/it] {'loss': 1.0015, 'learning_rate': 5.890658799551619e-07, 'epoch': 0.78} 78%|███████▊ | 2174/2774 [7:08:40<1:58:33, 11.86s/it] 78%|███████▊ | 2175/2774 [7:08:52<1:58:54, 11.91s/it] {'loss': 1.0601, 'learning_rate': 5.871846409832119e-07, 'epoch': 0.78} 78%|███████▊ | 2175/2774 [7:08:52<1:58:54, 11.91s/it] 78%|███████▊ | 2176/2774 [7:09:04<1:57:14, 11.76s/it] {'loss': 1.0078, 'learning_rate': 5.853060109758608e-07, 'epoch': 0.78} 78%|███████▊ | 2176/2774 [7:09:04<1:57:14, 11.76s/it] 78%|███████▊ | 2177/2774 [7:09:15<1:56:02, 11.66s/it] {'loss': 1.0303, 'learning_rate': 5.834299924954482e-07, 'epoch': 0.78} 78%|███████▊ | 2177/2774 [7:09:15<1:56:02, 11.66s/it] 79%|███████▊ | 
2178/2774 [7:09:28<1:59:42, 12.05s/it] {'loss': 0.9976, 'learning_rate': 5.815565881007481e-07, 'epoch': 0.79} 79%|███████▊ | 2178/2774 [7:09:28<1:59:42, 12.05s/it] 79%|███████▊ | 2179/2774 [7:09:40<1:58:27, 11.94s/it] {'loss': 1.0254, 'learning_rate': 5.796858003469727e-07, 'epoch': 0.79} 79%|███████▊ | 2179/2774 [7:09:40<1:58:27, 11.94s/it] 79%|███████▊ | 2180/2774 [7:09:51<1:56:41, 11.79s/it] {'loss': 1.0244, 'learning_rate': 5.778176317857618e-07, 'epoch': 0.79} 79%|███████▊ | 2180/2774 [7:09:51<1:56:41, 11.79s/it] 79%|███████▊ | 2181/2774 [7:10:04<1:59:45, 12.12s/it] {'loss': 0.9849, 'learning_rate': 5.759520849651862e-07, 'epoch': 0.79} 79%|███████▊ | 2181/2774 [7:10:04<1:59:45, 12.12s/it] 79%|███████▊ | 2182/2774 [7:10:16<1:57:55, 11.95s/it] {'loss': 1.0073, 'learning_rate': 5.740891624297381e-07, 'epoch': 0.79} 79%|███████▊ | 2182/2774 [7:10:16<1:57:55, 11.95s/it] 79%|███████▊ | 2183/2774 [7:10:27<1:55:28, 11.72s/it] {'loss': 1.0234, 'learning_rate': 5.722288667203315e-07, 'epoch': 0.79} 79%|███████▊ | 2183/2774 [7:10:27<1:55:28, 11.72s/it] 79%|███████▊ | 2184/2774 [7:10:39<1:55:35, 11.76s/it] {'loss': 1.0645, 'learning_rate': 5.703712003742965e-07, 'epoch': 0.79} 79%|███████▊ | 2184/2774 [7:10:39<1:55:35, 11.76s/it] 79%|███████▉ | 2185/2774 [7:10:50<1:53:24, 11.55s/it] {'loss': 1.0874, 'learning_rate': 5.685161659253791e-07, 'epoch': 0.79} 79%|███████▉ | 2185/2774 [7:10:50<1:53:24, 11.55s/it] 79%|███████▉ | 2186/2774 [7:11:01<1:52:49, 11.51s/it] {'loss': 0.9956, 'learning_rate': 5.666637659037338e-07, 'epoch': 0.79} 79%|███████▉ | 2186/2774 [7:11:01<1:52:49, 11.51s/it] 79%|███████▉ | 2187/2774 [7:11:13<1:52:36, 11.51s/it] {'loss': 1.0483, 'learning_rate': 5.648140028359214e-07, 'epoch': 0.79} 79%|███████▉ | 2187/2774 [7:11:13<1:52:36, 11.51s/it] 79%|███████▉ | 2188/2774 [7:11:24<1:52:25, 11.51s/it] {'loss': 1.0186, 'learning_rate': 5.629668792449086e-07, 'epoch': 0.79} 79%|███████▉ | 2188/2774 [7:11:24<1:52:25, 11.51s/it] 79%|███████▉ | 2189/2774 
[7:11:36<1:53:17, 11.62s/it] {'loss': 0.998, 'learning_rate': 5.611223976500591e-07, 'epoch': 0.79} 79%|███████▉ | 2189/2774 [7:11:36<1:53:17, 11.62s/it] 79%|███████▉ | 2190/2774 [7:11:48<1:52:39, 11.57s/it] {'loss': 1.0479, 'learning_rate': 5.59280560567135e-07, 'epoch': 0.79} 79%|███████▉ | 2190/2774 [7:11:48<1:52:39, 11.57s/it] 79%|███████▉ | 2191/2774 [7:12:00<1:53:52, 11.72s/it] {'loss': 0.9741, 'learning_rate': 5.574413705082904e-07, 'epoch': 0.79} 79%|███████▉ | 2191/2774 [7:12:00<1:53:52, 11.72s/it] 79%|███████▉ | 2192/2774 [7:12:11<1:52:26, 11.59s/it] {'loss': 1.1064, 'learning_rate': 5.55604829982071e-07, 'epoch': 0.79} 79%|███████▉ | 2192/2774 [7:12:11<1:52:26, 11.59s/it] 79%|███████▉ | 2193/2774 [7:12:22<1:52:01, 11.57s/it] {'loss': 1.0767, 'learning_rate': 5.537709414934045e-07, 'epoch': 0.79} 79%|███████▉ | 2193/2774 [7:12:22<1:52:01, 11.57s/it] 79%|███████▉ | 2194/2774 [7:12:34<1:51:37, 11.55s/it] {'loss': 1.0308, 'learning_rate': 5.519397075436058e-07, 'epoch': 0.79} 79%|███████▉ | 2194/2774 [7:12:34<1:51:37, 11.55s/it] 79%|███████▉ | 2195/2774 [7:12:45<1:51:16, 11.53s/it] {'loss': 1.0386, 'learning_rate': 5.501111306303666e-07, 'epoch': 0.79} 79%|███████▉ | 2195/2774 [7:12:45<1:51:16, 11.53s/it] 79%|███████▉ | 2196/2774 [7:12:58<1:54:13, 11.86s/it] {'loss': 1.0103, 'learning_rate': 5.482852132477562e-07, 'epoch': 0.79} 79%|███████▉ | 2196/2774 [7:12:58<1:54:13, 11.86s/it] 79%|███████▉ | 2197/2774 [7:13:10<1:53:23, 11.79s/it] {'loss': 1.0752, 'learning_rate': 5.464619578862143e-07, 'epoch': 0.79} 79%|███████▉ | 2197/2774 [7:13:10<1:53:23, 11.79s/it] 79%|███████▉ | 2198/2774 [7:13:21<1:51:24, 11.61s/it] {'loss': 0.98, 'learning_rate': 5.446413670325529e-07, 'epoch': 0.79} 79%|███████▉ | 2198/2774 [7:13:21<1:51:24, 11.61s/it] 79%|███████▉ | 2199/2774 [7:13:32<1:51:06, 11.59s/it] {'loss': 0.999, 'learning_rate': 5.428234431699459e-07, 'epoch': 0.79} 79%|███████▉ | 2199/2774 [7:13:32<1:51:06, 11.59s/it] 79%|███████▉ | 2200/2774 [7:13:45<1:53:15, 
11.84s/it] {'loss': 1.0127, 'learning_rate': 5.410081887779334e-07, 'epoch': 0.79} 79%|███████▉ | 2200/2774 [7:13:45<1:53:15, 11.84s/it] 79%|███████▉ | 2201/2774 [7:13:56<1:50:58, 11.62s/it] {'loss': 0.9937, 'learning_rate': 5.391956063324122e-07, 'epoch': 0.79} 79%|███████▉ | 2201/2774 [7:13:56<1:50:58, 11.62s/it] 79%|███████▉ | 2202/2774 [7:14:07<1:50:28, 11.59s/it] {'loss': 1.0317, 'learning_rate': 5.373856983056347e-07, 'epoch': 0.79} 79%|███████▉ | 2202/2774 [7:14:07<1:50:28, 11.59s/it] 79%|███████▉ | 2203/2774 [7:14:19<1:51:18, 11.70s/it] {'loss': 1.0205, 'learning_rate': 5.355784671662059e-07, 'epoch': 0.79} 79%|███████▉ | 2203/2774 [7:14:19<1:51:18, 11.70s/it] 79%|███████▉ | 2204/2774 [7:14:30<1:49:29, 11.53s/it] {'loss': 1.0713, 'learning_rate': 5.337739153790813e-07, 'epoch': 0.79} 79%|███████▉ | 2204/2774 [7:14:30<1:49:29, 11.53s/it] 79%|███████▉ | 2205/2774 [7:14:42<1:50:19, 11.63s/it] {'loss': 1.0532, 'learning_rate': 5.31972045405559e-07, 'epoch': 0.79} 79%|███████▉ | 2205/2774 [7:14:42<1:50:19, 11.63s/it] 80%|███████▉ | 2206/2774 [7:14:56<1:54:50, 12.13s/it] {'loss': 0.9907, 'learning_rate': 5.301728597032821e-07, 'epoch': 0.8} 80%|███████▉ | 2206/2774 [7:14:56<1:54:50, 12.13s/it] 80%|███████▉ | 2207/2774 [7:15:07<1:53:12, 11.98s/it] {'loss': 1.0498, 'learning_rate': 5.283763607262305e-07, 'epoch': 0.8} 80%|███████▉ | 2207/2774 [7:15:07<1:53:12, 11.98s/it] 80%|███████▉ | 2208/2774 [7:15:19<1:50:58, 11.76s/it] {'loss': 0.9775, 'learning_rate': 5.265825509247199e-07, 'epoch': 0.8} 80%|███████▉ | 2208/2774 [7:15:19<1:50:58, 11.76s/it] 80%|███████▉ | 2209/2774 [7:15:30<1:49:52, 11.67s/it] {'loss': 0.9893, 'learning_rate': 5.247914327453996e-07, 'epoch': 0.8} 80%|███████▉ | 2209/2774 [7:15:30<1:49:52, 11.67s/it] 80%|███████▉ | 2210/2774 [7:15:42<1:49:52, 11.69s/it] {'loss': 1.0137, 'learning_rate': 5.23003008631246e-07, 'epoch': 0.8} 80%|███████▉ | 2210/2774 [7:15:42<1:49:52, 11.69s/it] 80%|███████▉ | 2211/2774 [7:15:53<1:49:32, 11.67s/it] {'loss': 
0.9878, 'learning_rate': 5.212172810215607e-07, 'epoch': 0.8} 80%|███████▉ | 2211/2774 [7:15:53<1:49:32, 11.67s/it] 80%|███████▉ | 2212/2774 [7:16:05<1:48:34, 11.59s/it] {'loss': 1.0649, 'learning_rate': 5.194342523519699e-07, 'epoch': 0.8} 80%|███████▉ | 2212/2774 [7:16:05<1:48:34, 11.59s/it] 80%|███████▉ | 2213/2774 [7:16:16<1:47:13, 11.47s/it] {'loss': 1.0488, 'learning_rate': 5.176539250544163e-07, 'epoch': 0.8} 80%|███████▉ | 2213/2774 [7:16:16<1:47:13, 11.47s/it] 80%|███████▉ | 2214/2774 [7:16:28<1:48:52, 11.67s/it] {'loss': 0.9814, 'learning_rate': 5.158763015571581e-07, 'epoch': 0.8} 80%|███████▉ | 2214/2774 [7:16:28<1:48:52, 11.67s/it] 80%|███████▉ | 2215/2774 [7:16:39<1:47:54, 11.58s/it] {'loss': 1.0552, 'learning_rate': 5.141013842847672e-07, 'epoch': 0.8} 80%|███████▉ | 2215/2774 [7:16:39<1:47:54, 11.58s/it] 80%|███████▉ | 2216/2774 [7:16:51<1:47:44, 11.58s/it] {'loss': 1.0396, 'learning_rate': 5.123291756581231e-07, 'epoch': 0.8} 80%|███████▉ | 2216/2774 [7:16:51<1:47:44, 11.58s/it] 80%|███████▉ | 2217/2774 [7:17:02<1:47:02, 11.53s/it] {'loss': 1.0317, 'learning_rate': 5.105596780944122e-07, 'epoch': 0.8} 80%|███████▉ | 2217/2774 [7:17:02<1:47:02, 11.53s/it] 80%|███████▉ | 2218/2774 [7:17:14<1:45:47, 11.42s/it] {'loss': 1.0396, 'learning_rate': 5.087928940071207e-07, 'epoch': 0.8} 80%|███████▉ | 2218/2774 [7:17:14<1:45:47, 11.42s/it] 80%|███████▉ | 2219/2774 [7:17:25<1:45:33, 11.41s/it] {'loss': 0.9702, 'learning_rate': 5.07028825806038e-07, 'epoch': 0.8} 80%|███████▉ | 2219/2774 [7:17:25<1:45:33, 11.41s/it] 80%|████████ | 2220/2774 [7:17:36<1:45:18, 11.40s/it] {'loss': 0.981, 'learning_rate': 5.052674758972431e-07, 'epoch': 0.8} 80%|████████ | 2220/2774 [7:17:36<1:45:18, 11.40s/it] 80%|████████ | 2221/2774 [7:17:50<1:49:54, 11.92s/it] {'loss': 0.9531, 'learning_rate': 5.035088466831134e-07, 'epoch': 0.8} 80%|████████ | 2221/2774 [7:17:50<1:49:54, 11.92s/it] 80%|████████ | 2222/2774 [7:18:01<1:48:34, 11.80s/it] {'loss': 1.0352, 'learning_rate': 
5.017529405623115e-07, 'epoch': 0.8} 80%|████████ | 2222/2774 [7:18:01<1:48:34, 11.80s/it] 80%|████████ | 2223/2774 [7:18:12<1:46:51, 11.64s/it] {'loss': 1.0737, 'learning_rate': 4.999997599297888e-07, 'epoch': 0.8} 80%|████████ | 2223/2774 [7:18:12<1:46:51, 11.64s/it] 80%|████████ | 2224/2774 [7:18:24<1:46:12, 11.59s/it] {'loss': 0.9834, 'learning_rate': 4.982493071767758e-07, 'epoch': 0.8} 80%|████████ | 2224/2774 [7:18:24<1:46:12, 11.59s/it] 80%|████████ | 2225/2774 [7:18:35<1:45:43, 11.55s/it] {'loss': 1.0239, 'learning_rate': 4.965015846907865e-07, 'epoch': 0.8} 80%|████████ | 2225/2774 [7:18:35<1:45:43, 11.55s/it] 80%|████████ | 2226/2774 [7:18:47<1:45:43, 11.58s/it] {'loss': 1.0044, 'learning_rate': 4.947565948556066e-07, 'epoch': 0.8} 80%|████████ | 2226/2774 [7:18:47<1:45:43, 11.58s/it] 80%|████████ | 2227/2774 [7:18:59<1:47:11, 11.76s/it] {'loss': 1.0078, 'learning_rate': 4.930143400512988e-07, 'epoch': 0.8} 80%|████████ | 2227/2774 [7:18:59<1:47:11, 11.76s/it] 80%|████████ | 2228/2774 [7:19:10<1:45:34, 11.60s/it] {'loss': 1.0547, 'learning_rate': 4.912748226541924e-07, 'epoch': 0.8} 80%|████████ | 2228/2774 [7:19:10<1:45:34, 11.60s/it] 80%|████████ | 2229/2774 [7:19:22<1:45:07, 11.57s/it] {'loss': 1.0571, 'learning_rate': 4.895380450368841e-07, 'epoch': 0.8} 80%|████████ | 2229/2774 [7:19:22<1:45:07, 11.57s/it] 80%|████████ | 2230/2774 [7:19:33<1:44:15, 11.50s/it] {'loss': 1.0903, 'learning_rate': 4.878040095682335e-07, 'epoch': 0.8} 80%|████████ | 2230/2774 [7:19:33<1:44:15, 11.50s/it] 80%|████████ | 2231/2774 [7:19:47<1:49:38, 12.11s/it] {'loss': 1.0127, 'learning_rate': 4.860727186133607e-07, 'epoch': 0.8} 80%|████████ | 2231/2774 [7:19:47<1:49:38, 12.11s/it] 80%|████████ | 2232/2774 [7:19:58<1:48:33, 12.02s/it] {'loss': 1.0435, 'learning_rate': 4.843441745336419e-07, 'epoch': 0.8} 80%|████████ | 2232/2774 [7:19:58<1:48:33, 12.02s/it] 80%|████████ | 2233/2774 [7:20:10<1:46:06, 11.77s/it] {'loss': 1.0547, 'learning_rate': 4.826183796867059e-07, 
'epoch': 0.8} 80%|████████ | 2233/2774 [7:20:10<1:46:06, 11.77s/it] 81%|████████ | 2234/2774 [7:20:22<1:48:19, 12.04s/it] {'loss': 1.0034, 'learning_rate': 4.80895336426434e-07, 'epoch': 0.81} 81%|████████ | 2234/2774 [7:20:22<1:48:19, 12.04s/it] 81%|████████ | 2235/2774 [7:20:33<1:45:36, 11.76s/it] {'loss': 1.0024, 'learning_rate': 4.791750471029519e-07, 'epoch': 0.81} 81%|████████ | 2235/2774 [7:20:33<1:45:36, 11.76s/it] 81%|████████ | 2236/2774 [7:20:45<1:44:30, 11.66s/it] {'loss': 1.0684, 'learning_rate': 4.774575140626317e-07, 'epoch': 0.81} 81%|████████ | 2236/2774 [7:20:45<1:44:30, 11.66s/it] 81%|████████ | 2237/2774 [7:20:56<1:43:46, 11.59s/it] {'loss': 1.0527, 'learning_rate': 4.757427396480838e-07, 'epoch': 0.81} 81%|████████ | 2237/2774 [7:20:56<1:43:46, 11.59s/it] 81%|████████ | 2238/2774 [7:21:08<1:43:09, 11.55s/it] {'loss': 1.04, 'learning_rate': 4.7403072619815696e-07, 'epoch': 0.81} 81%|████████ | 2238/2774 [7:21:08<1:43:09, 11.55s/it] 81%|████████ | 2239/2774 [7:21:19<1:42:33, 11.50s/it] {'loss': 1.0562, 'learning_rate': 4.723214760479333e-07, 'epoch': 0.81} 81%|████████ | 2239/2774 [7:21:19<1:42:33, 11.50s/it] 81%|████████ | 2240/2774 [7:21:31<1:43:03, 11.58s/it] {'loss': 1.0356, 'learning_rate': 4.7061499152872866e-07, 'epoch': 0.81} 81%|████████ | 2240/2774 [7:21:31<1:43:03, 11.58s/it] 81%|████████ | 2241/2774 [7:21:44<1:46:09, 11.95s/it] {'loss': 1.0186, 'learning_rate': 4.6891127496808295e-07, 'epoch': 0.81} 81%|████████ | 2241/2774 [7:21:44<1:46:09, 11.95s/it] 81%|████████ | 2242/2774 [7:21:55<1:45:08, 11.86s/it] {'loss': 1.0117, 'learning_rate': 4.6721032868976417e-07, 'epoch': 0.81} 81%|████████ | 2242/2774 [7:21:55<1:45:08, 11.86s/it] 81%|████████ | 2243/2774 [7:22:07<1:43:23, 11.68s/it] {'loss': 1.083, 'learning_rate': 4.6551215501375896e-07, 'epoch': 0.81} 81%|████████ | 2243/2774 [7:22:07<1:43:23, 11.68s/it] 81%|████████ | 2244/2774 [7:22:18<1:42:10, 11.57s/it] {'loss': 1.022, 'learning_rate': 4.638167562562751e-07, 'epoch': 0.81} 
81%|████████ | 2244/2774 [7:22:18<1:42:10, 11.57s/it] 81%|████████ | 2245/2774 [7:22:29<1:41:22, 11.50s/it] {'loss': 1.0049, 'learning_rate': 4.6212413472973257e-07, 'epoch': 0.81} 81%|████████ | 2245/2774 [7:22:29<1:41:22, 11.50s/it] 81%|████████ | 2246/2774 [7:22:40<1:40:10, 11.38s/it] {'loss': 0.9932, 'learning_rate': 4.6043429274276685e-07, 'epoch': 0.81} 81%|████████ | 2246/2774 [7:22:40<1:40:10, 11.38s/it] 81%|████████ | 2247/2774 [7:22:52<1:40:53, 11.49s/it] {'loss': 1.041, 'learning_rate': 4.5874723260021794e-07, 'epoch': 0.81} 81%|████████ | 2247/2774 [7:22:52<1:40:53, 11.49s/it] 81%|████████ | 2248/2774 [7:23:05<1:45:05, 11.99s/it] {'loss': 1.0034, 'learning_rate': 4.570629566031354e-07, 'epoch': 0.81} 81%|████████ | 2248/2774 [7:23:05<1:45:05, 11.99s/it] 81%|████████ | 2249/2774 [7:23:17<1:43:10, 11.79s/it] {'loss': 1.0229, 'learning_rate': 4.553814670487694e-07, 'epoch': 0.81} 81%|████████ | 2249/2774 [7:23:17<1:43:10, 11.79s/it] 81%|████████ | 2250/2774 [7:23:28<1:42:39, 11.75s/it] {'loss': 0.9512, 'learning_rate': 4.537027662305707e-07, 'epoch': 0.81} 81%|████████ | 2250/2774 [7:23:28<1:42:39, 11.75s/it] 81%|████████ | 2251/2774 [7:23:40<1:41:36, 11.66s/it] {'loss': 1.0342, 'learning_rate': 4.5202685643818495e-07, 'epoch': 0.81} 81%|████████ | 2251/2774 [7:23:40<1:41:36, 11.66s/it] 81%|████████ | 2252/2774 [7:23:51<1:40:47, 11.59s/it] {'loss': 1.0464, 'learning_rate': 4.5035373995745287e-07, 'epoch': 0.81} 81%|████████ | 2252/2774 [7:23:51<1:40:47, 11.59s/it] 81%|████████ | 2253/2774 [7:24:02<1:39:24, 11.45s/it] {'loss': 1.0439, 'learning_rate': 4.48683419070404e-07, 'epoch': 0.81} 81%|████████ | 2253/2774 [7:24:02<1:39:24, 11.45s/it] 81%|████████▏ | 2254/2774 [7:24:14<1:38:50, 11.40s/it] {'loss': 1.0693, 'learning_rate': 4.4701589605525427e-07, 'epoch': 0.81} 81%|████████▏ | 2254/2774 [7:24:14<1:38:50, 11.40s/it] 81%|████████▏ | 2255/2774 [7:24:25<1:39:26, 11.50s/it] {'loss': 1.0518, 'learning_rate': 4.4535117318640545e-07, 'epoch': 0.81} 
81%|████████▏ | 2255/2774 [7:24:25<1:39:26, 11.50s/it] 81%|████████▏ | 2256/2774 [7:24:37<1:39:12, 11.49s/it] {'loss': 1.0317, 'learning_rate': 4.4368925273443856e-07, 'epoch': 0.81} 81%|████████▏ | 2256/2774 [7:24:37<1:39:12, 11.49s/it] 81%|████████▏ | 2257/2774 [7:24:48<1:38:55, 11.48s/it] {'loss': 1.0293, 'learning_rate': 4.4203013696611203e-07, 'epoch': 0.81} 81%|████████▏ | 2257/2774 [7:24:48<1:38:55, 11.48s/it] 81%|████████▏ | 2258/2774 [7:25:00<1:38:59, 11.51s/it] {'loss': 1.0068, 'learning_rate': 4.403738281443609e-07, 'epoch': 0.81} 81%|████████▏ | 2258/2774 [7:25:00<1:38:59, 11.51s/it] 81%|████████▏ | 2259/2774 [7:25:11<1:38:57, 11.53s/it] {'loss': 1.0376, 'learning_rate': 4.3872032852828955e-07, 'epoch': 0.81} 81%|████████▏ | 2259/2774 [7:25:11<1:38:57, 11.53s/it] 81%|████████▏ | 2260/2774 [7:25:23<1:38:29, 11.50s/it] {'loss': 1.0215, 'learning_rate': 4.3706964037317085e-07, 'epoch': 0.81} 81%|████████▏ | 2260/2774 [7:25:23<1:38:29, 11.50s/it] 82%|████████▏ | 2261/2774 [7:25:35<1:41:07, 11.83s/it] {'loss': 1.0762, 'learning_rate': 4.354217659304452e-07, 'epoch': 0.82} 82%|████████▏ | 2261/2774 [7:25:35<1:41:07, 11.83s/it] 82%|████████▏ | 2262/2774 [7:25:47<1:41:30, 11.89s/it] {'loss': 1.0068, 'learning_rate': 4.3377670744771253e-07, 'epoch': 0.82} 82%|████████▏ | 2262/2774 [7:25:47<1:41:30, 11.89s/it] 82%|████████▏ | 2263/2774 [7:25:59<1:41:02, 11.86s/it] {'loss': 0.9595, 'learning_rate': 4.321344671687344e-07, 'epoch': 0.82} 82%|████████▏ | 2263/2774 [7:25:59<1:41:02, 11.86s/it] 82%|████████▏ | 2264/2774 [7:26:13<1:46:37, 12.54s/it] {'loss': 0.9985, 'learning_rate': 4.304950473334268e-07, 'epoch': 0.82} 82%|████████▏ | 2264/2774 [7:26:13<1:46:37, 12.54s/it] 82%|████████▏ | 2265/2774 [7:26:25<1:45:11, 12.40s/it] {'loss': 1.0142, 'learning_rate': 4.288584501778592e-07, 'epoch': 0.82} 82%|████████▏ | 2265/2774 [7:26:25<1:45:11, 12.40s/it] 82%|████████▏ | 2266/2774 [7:26:37<1:42:39, 12.12s/it] {'loss': 1.0469, 'learning_rate': 4.2722467793425093e-07, 
'epoch': 0.82} 82%|████████▏ | 2266/2774 [7:26:37<1:42:39, 12.12s/it] 82%|████████▏ | 2267/2774 [7:26:49<1:41:47, 12.05s/it] {'loss': 1.0059, 'learning_rate': 4.255937328309695e-07, 'epoch': 0.82} 82%|████████▏ | 2267/2774 [7:26:49<1:41:47, 12.05s/it] 82%|████████▏ | 2268/2774 [7:27:00<1:39:39, 11.82s/it] {'loss': 0.9692, 'learning_rate': 4.2396561709252436e-07, 'epoch': 0.82} 82%|████████▏ | 2268/2774 [7:27:00<1:39:39, 11.82s/it] 82%|████████▏ | 2269/2774 [7:27:12<1:38:54, 11.75s/it] {'loss': 1.0625, 'learning_rate': 4.2234033293956865e-07, 'epoch': 0.82} 82%|████████▏ | 2269/2774 [7:27:12<1:38:54, 11.75s/it] 82%|████████▏ | 2270/2774 [7:27:23<1:38:01, 11.67s/it] {'loss': 1.019, 'learning_rate': 4.2071788258889025e-07, 'epoch': 0.82} 82%|████████▏ | 2270/2774 [7:27:23<1:38:01, 11.67s/it] 82%|████████▏ | 2271/2774 [7:27:34<1:36:37, 11.53s/it] {'loss': 1.021, 'learning_rate': 4.190982682534145e-07, 'epoch': 0.82} 82%|████████▏ | 2271/2774 [7:27:34<1:36:37, 11.53s/it] 82%|████████▏ | 2272/2774 [7:27:46<1:36:22, 11.52s/it] {'loss': 0.9858, 'learning_rate': 4.174814921421963e-07, 'epoch': 0.82} 82%|████████▏ | 2272/2774 [7:27:46<1:36:22, 11.52s/it] 82%|████████▏ | 2273/2774 [7:27:58<1:38:21, 11.78s/it] {'loss': 1.0645, 'learning_rate': 4.158675564604223e-07, 'epoch': 0.82} 82%|████████▏ | 2273/2774 [7:27:58<1:38:21, 11.78s/it] 82%|████████▏ | 2274/2774 [7:28:10<1:37:18, 11.68s/it] {'loss': 1.0137, 'learning_rate': 4.142564634094021e-07, 'epoch': 0.82} 82%|████████▏ | 2274/2774 [7:28:10<1:37:18, 11.68s/it] 82%|████████▏ | 2275/2774 [7:28:23<1:40:49, 12.12s/it] {'loss': 0.9854, 'learning_rate': 4.126482151865696e-07, 'epoch': 0.82} 82%|████████▏ | 2275/2774 [7:28:23<1:40:49, 12.12s/it] 82%|████████▏ | 2276/2774 [7:28:34<1:38:22, 11.85s/it] {'loss': 1.0342, 'learning_rate': 4.1104281398547746e-07, 'epoch': 0.82} 82%|████████▏ | 2276/2774 [7:28:34<1:38:22, 11.85s/it] 82%|████████▏ | 2277/2774 [7:28:45<1:36:58, 11.71s/it] {'loss': 0.9956, 'learning_rate': 
4.094402619957974e-07, 'epoch': 0.82} 82%|████████▏ | 2277/2774 [7:28:45<1:36:58, 11.71s/it] 82%|████████▏ | 2278/2774 [7:28:57<1:36:50, 11.72s/it] {'loss': 1.0142, 'learning_rate': 4.078405614033126e-07, 'epoch': 0.82} 82%|████████▏ | 2278/2774 [7:28:57<1:36:50, 11.72s/it] 82%|████████▏ | 2279/2774 [7:29:09<1:36:15, 11.67s/it] {'loss': 0.9961, 'learning_rate': 4.062437143899176e-07, 'epoch': 0.82} 82%|████████▏ | 2279/2774 [7:29:09<1:36:15, 11.67s/it] 82%|████████▏ | 2280/2774 [7:29:20<1:35:05, 11.55s/it] {'loss': 1.0923, 'learning_rate': 4.046497231336166e-07, 'epoch': 0.82} 82%|████████▏ | 2280/2774 [7:29:20<1:35:05, 11.55s/it] 82%|████████▏ | 2281/2774 [7:29:32<1:36:02, 11.69s/it] {'loss': 1.0625, 'learning_rate': 4.0305858980851595e-07, 'epoch': 0.82} 82%|████████▏ | 2281/2774 [7:29:32<1:36:02, 11.69s/it] 82%|████████▏ | 2282/2774 [7:29:45<1:39:16, 12.11s/it] {'loss': 1.0444, 'learning_rate': 4.014703165848266e-07, 'epoch': 0.82} 82%|████████▏ | 2282/2774 [7:29:45<1:39:16, 12.11s/it] 82%|████████▏ | 2283/2774 [7:29:58<1:41:58, 12.46s/it] {'loss': 0.9663, 'learning_rate': 3.9988490562885675e-07, 'epoch': 0.82} 82%|████████▏ | 2283/2774 [7:29:58<1:41:58, 12.46s/it] 82%|████████▏ | 2284/2774 [7:30:10<1:39:19, 12.16s/it] {'loss': 0.9785, 'learning_rate': 3.983023591030113e-07, 'epoch': 0.82} 82%|████████▏ | 2284/2774 [7:30:10<1:39:19, 12.16s/it] 82%|████████▏ | 2285/2774 [7:30:22<1:39:50, 12.25s/it] {'loss': 1.0137, 'learning_rate': 3.9672267916578743e-07, 'epoch': 0.82} 82%|████████▏ | 2285/2774 [7:30:22<1:39:50, 12.25s/it] 82%|████████▏ | 2286/2774 [7:30:34<1:37:40, 12.01s/it] {'loss': 1.0454, 'learning_rate': 3.951458679717743e-07, 'epoch': 0.82} 82%|████████▏ | 2286/2774 [7:30:34<1:37:40, 12.01s/it] 82%|████████▏ | 2287/2774 [7:30:45<1:36:44, 11.92s/it] {'loss': 0.999, 'learning_rate': 3.935719276716457e-07, 'epoch': 0.82} 82%|████████▏ | 2287/2774 [7:30:45<1:36:44, 11.92s/it] 82%|████████▏ | 2288/2774 [7:30:57<1:35:42, 11.82s/it] {'loss': 1.0127, 
'learning_rate': 3.920008604121628e-07, 'epoch': 0.82} 82%|████████▏ | 2288/2774 [7:30:57<1:35:42, 11.82s/it] 83%|████████▎ | 2289/2774 [7:31:08<1:34:09, 11.65s/it] {'loss': 1.0488, 'learning_rate': 3.904326683361648e-07, 'epoch': 0.83} 83%|████████▎ | 2289/2774 [7:31:08<1:34:09, 11.65s/it] 83%|████████▎ | 2290/2774 [7:31:20<1:34:47, 11.75s/it] {'loss': 1.0327, 'learning_rate': 3.888673535825727e-07, 'epoch': 0.83} 83%|████████▎ | 2290/2774 [7:31:20<1:34:47, 11.75s/it] 83%|████████▎ | 2291/2774 [7:31:33<1:37:47, 12.15s/it] {'loss': 1.0322, 'learning_rate': 3.8730491828637944e-07, 'epoch': 0.83} 83%|████████▎ | 2291/2774 [7:31:33<1:37:47, 12.15s/it] 83%|████████▎ | 2292/2774 [7:31:45<1:36:47, 12.05s/it] {'loss': 1.0415, 'learning_rate': 3.8574536457865436e-07, 'epoch': 0.83} 83%|████████▎ | 2292/2774 [7:31:45<1:36:47, 12.05s/it] 83%|████████▎ | 2293/2774 [7:31:56<1:34:58, 11.85s/it] {'loss': 1.0576, 'learning_rate': 3.841886945865325e-07, 'epoch': 0.83} 83%|████████▎ | 2293/2774 [7:31:56<1:34:58, 11.85s/it] 83%|████████▎ | 2294/2774 [7:32:08<1:34:04, 11.76s/it] {'loss': 1.0234, 'learning_rate': 3.8263491043321887e-07, 'epoch': 0.83} 83%|████████▎ | 2294/2774 [7:32:08<1:34:04, 11.76s/it] 83%|████████▎ | 2295/2774 [7:32:20<1:33:27, 11.71s/it] {'loss': 1.0195, 'learning_rate': 3.810840142379807e-07, 'epoch': 0.83} 83%|████████▎ | 2295/2774 [7:32:20<1:33:27, 11.71s/it] 83%|████████▎ | 2296/2774 [7:32:31<1:33:09, 11.69s/it] {'loss': 1.0356, 'learning_rate': 3.7953600811614727e-07, 'epoch': 0.83} 83%|████████▎ | 2296/2774 [7:32:31<1:33:09, 11.69s/it] 83%|████████▎ | 2297/2774 [7:32:43<1:33:05, 11.71s/it] {'loss': 1.0522, 'learning_rate': 3.7799089417910467e-07, 'epoch': 0.83} 83%|████████▎ | 2297/2774 [7:32:43<1:33:05, 11.71s/it] 83%|████████▎ | 2298/2774 [7:32:55<1:32:28, 11.66s/it] {'loss': 1.0293, 'learning_rate': 3.7644867453429575e-07, 'epoch': 0.83} 83%|████████▎ | 2298/2774 [7:32:55<1:32:28, 11.66s/it] 83%|████████▎ | 2299/2774 [7:33:06<1:32:02, 11.63s/it] {'loss': 
1.0166, 'learning_rate': 3.749093512852148e-07, 'epoch': 0.83} 83%|████████▎ | 2299/2774 [7:33:06<1:32:02, 11.63s/it] 83%|████████▎ | 2300/2774 [7:33:18<1:31:50, 11.63s/it] {'loss': 1.0107, 'learning_rate': 3.7337292653140485e-07, 'epoch': 0.83} 83%|████████▎ | 2300/2774 [7:33:18<1:31:50, 11.63s/it] 83%|████████▎ | 2301/2774 [7:33:29<1:31:31, 11.61s/it] {'loss': 1.0073, 'learning_rate': 3.7183940236845767e-07, 'epoch': 0.83} 83%|████████▎ | 2301/2774 [7:33:29<1:31:31, 11.61s/it] 83%|████████▎ | 2302/2774 [7:33:41<1:32:01, 11.70s/it] {'loss': 0.9937, 'learning_rate': 3.703087808880071e-07, 'epoch': 0.83} 83%|████████▎ | 2302/2774 [7:33:41<1:32:01, 11.70s/it] 83%|████████▎ | 2303/2774 [7:33:53<1:31:13, 11.62s/it] {'loss': 1.0513, 'learning_rate': 3.6878106417772757e-07, 'epoch': 0.83} 83%|████████▎ | 2303/2774 [7:33:53<1:31:13, 11.62s/it] 83%|████████▎ | 2304/2774 [7:34:06<1:35:15, 12.16s/it] {'loss': 0.9731, 'learning_rate': 3.6725625432133374e-07, 'epoch': 0.83} 83%|████████▎ | 2304/2774 [7:34:06<1:35:15, 12.16s/it] 83%|████████▎ | 2305/2774 [7:34:18<1:34:24, 12.08s/it] {'loss': 0.9956, 'learning_rate': 3.6573435339857384e-07, 'epoch': 0.83} 83%|████████▎ | 2305/2774 [7:34:18<1:34:24, 12.08s/it] 83%|████████▎ | 2306/2774 [7:34:30<1:32:56, 11.92s/it] {'loss': 1.0107, 'learning_rate': 3.6421536348522746e-07, 'epoch': 0.83} 83%|████████▎ | 2306/2774 [7:34:30<1:32:56, 11.92s/it] 83%|████████▎ | 2307/2774 [7:34:41<1:31:41, 11.78s/it] {'loss': 1.0029, 'learning_rate': 3.6269928665310707e-07, 'epoch': 0.83} 83%|████████▎ | 2307/2774 [7:34:41<1:31:41, 11.78s/it] 83%|████████▎ | 2308/2774 [7:34:53<1:32:30, 11.91s/it] {'loss': 1.0669, 'learning_rate': 3.611861249700482e-07, 'epoch': 0.83} 83%|████████▎ | 2308/2774 [7:34:53<1:32:30, 11.91s/it] 83%|████████▎ | 2309/2774 [7:35:05<1:31:07, 11.76s/it] {'loss': 1.0469, 'learning_rate': 3.5967588049991317e-07, 'epoch': 0.83} 83%|████████▎ | 2309/2774 [7:35:05<1:31:07, 11.76s/it] 83%|████████▎ | 2310/2774 [7:35:16<1:30:21, 
11.68s/it] {'loss': 1.0176, 'learning_rate': 3.5816855530258376e-07, 'epoch': 0.83} 83%|████████▎ | 2310/2774 [7:35:16<1:30:21, 11.68s/it] 83%|████████▎ | 2311/2774 [7:35:27<1:29:24, 11.59s/it] {'loss': 1.0029, 'learning_rate': 3.5666415143396054e-07, 'epoch': 0.83} 83%|████████▎ | 2311/2774 [7:35:27<1:29:24, 11.59s/it] 83%|████████▎ | 2312/2774 [7:35:39<1:29:58, 11.69s/it] {'loss': 1.0469, 'learning_rate': 3.551626709459588e-07, 'epoch': 0.83} 83%|████████▎ | 2312/2774 [7:35:39<1:29:58, 11.69s/it] 83%|████████▎ | 2313/2774 [7:35:52<1:32:23, 12.03s/it] {'loss': 1.0259, 'learning_rate': 3.5366411588650866e-07, 'epoch': 0.83} 83%|████████▎ | 2313/2774 [7:35:52<1:32:23, 12.03s/it] 83%|████████▎ | 2314/2774 [7:36:04<1:32:13, 12.03s/it] {'loss': 0.9956, 'learning_rate': 3.5216848829954714e-07, 'epoch': 0.83} 83%|████████▎ | 2314/2774 [7:36:04<1:32:13, 12.03s/it] 83%|████████▎ | 2315/2774 [7:36:16<1:30:23, 11.82s/it] {'loss': 1.0425, 'learning_rate': 3.50675790225021e-07, 'epoch': 0.83} 83%|████████▎ | 2315/2774 [7:36:16<1:30:23, 11.82s/it] 83%|████████▎ | 2316/2774 [7:36:27<1:28:51, 11.64s/it] {'loss': 1.0049, 'learning_rate': 3.491860236988798e-07, 'epoch': 0.83} 83%|████████▎ | 2316/2774 [7:36:27<1:28:51, 11.64s/it] 84%|████████▎ | 2317/2774 [7:36:41<1:35:16, 12.51s/it] {'loss': 0.9854, 'learning_rate': 3.476991907530755e-07, 'epoch': 0.84} 84%|████████▎ | 2317/2774 [7:36:41<1:35:16, 12.51s/it] 84%|████████▎ | 2318/2774 [7:36:53<1:33:43, 12.33s/it] {'loss': 1.0923, 'learning_rate': 3.4621529341555745e-07, 'epoch': 0.84} 84%|████████▎ | 2318/2774 [7:36:53<1:33:43, 12.33s/it] 84%|████████▎ | 2319/2774 [7:37:05<1:31:44, 12.10s/it] {'loss': 1.021, 'learning_rate': 3.4473433371027406e-07, 'epoch': 0.84} 84%|████████▎ | 2319/2774 [7:37:05<1:31:44, 12.10s/it] 84%|████████▎ | 2320/2774 [7:37:16<1:29:44, 11.86s/it] {'loss': 1.0493, 'learning_rate': 3.432563136571621e-07, 'epoch': 0.84} 84%|████████▎ | 2320/2774 [7:37:16<1:29:44, 11.86s/it] 84%|████████▎ | 2321/2774 
[7:37:28<1:28:40, 11.74s/it] {'loss': 0.98, 'learning_rate': 3.417812352721536e-07, 'epoch': 0.84} 84%|████████▎ | 2321/2774 [7:37:28<1:28:40, 11.74s/it] 84%|████████▎ | 2322/2774 [7:37:39<1:28:44, 11.78s/it] {'loss': 0.9312, 'learning_rate': 3.403091005671655e-07, 'epoch': 0.84} 84%|████████▎ | 2322/2774 [7:37:39<1:28:44, 11.78s/it] 84%|████████▎ | 2323/2774 [7:37:52<1:29:25, 11.90s/it] {'loss': 1.0015, 'learning_rate': 3.388399115501012e-07, 'epoch': 0.84} 84%|████████▎ | 2323/2774 [7:37:52<1:29:25, 11.90s/it] 84%|████████▍ | 2324/2774 [7:38:03<1:27:37, 11.68s/it] {'loss': 1.0752, 'learning_rate': 3.373736702248451e-07, 'epoch': 0.84} 84%|████████▍ | 2324/2774 [7:38:03<1:27:37, 11.68s/it] 84%|████████▍ | 2325/2774 [7:38:15<1:27:53, 11.75s/it] {'loss': 0.9497, 'learning_rate': 3.3591037859126266e-07, 'epoch': 0.84} 84%|████████▍ | 2325/2774 [7:38:15<1:27:53, 11.75s/it] 84%|████████▍ | 2326/2774 [7:38:26<1:27:18, 11.69s/it] {'loss': 1.062, 'learning_rate': 3.3445003864519486e-07, 'epoch': 0.84} 84%|████████▍ | 2326/2774 [7:38:26<1:27:18, 11.69s/it] 84%|████████▍ | 2327/2774 [7:38:40<1:32:34, 12.43s/it] {'loss': 1.019, 'learning_rate': 3.329926523784563e-07, 'epoch': 0.84} 84%|████████▍ | 2327/2774 [7:38:40<1:32:34, 12.43s/it] 84%|████████▍ | 2328/2774 [7:38:52<1:30:15, 12.14s/it] {'loss': 1.0464, 'learning_rate': 3.3153822177883543e-07, 'epoch': 0.84} 84%|████████▍ | 2328/2774 [7:38:52<1:30:15, 12.14s/it] 84%|████████▍ | 2329/2774 [7:39:03<1:28:42, 11.96s/it] {'loss': 1.022, 'learning_rate': 3.3008674883008686e-07, 'epoch': 0.84} 84%|████████▍ | 2329/2774 [7:39:03<1:28:42, 11.96s/it] 84%|████████▍ | 2330/2774 [7:39:15<1:27:03, 11.77s/it] {'loss': 1.0703, 'learning_rate': 3.286382355119319e-07, 'epoch': 0.84} 84%|████████▍ | 2330/2774 [7:39:15<1:27:03, 11.77s/it] 84%|████████▍ | 2331/2774 [7:39:26<1:26:14, 11.68s/it] {'loss': 1.0474, 'learning_rate': 3.2719268380005496e-07, 'epoch': 0.84} 84%|████████▍ | 2331/2774 [7:39:26<1:26:14, 11.68s/it] 84%|████████▍ | 
2332/2774 [7:39:38<1:25:18, 11.58s/it] {'loss': 1.0332, 'learning_rate': 3.2575009566610193e-07, 'epoch': 0.84} 84%|████████▍ | 2332/2774 [7:39:38<1:25:18, 11.58s/it] 84%|████████▍ | 2333/2774 [7:39:49<1:24:53, 11.55s/it] {'loss': 1.0713, 'learning_rate': 3.243104730776753e-07, 'epoch': 0.84} 84%|████████▍ | 2333/2774 [7:39:49<1:24:53, 11.55s/it] 84%|████████▍ | 2334/2774 [7:40:00<1:23:33, 11.39s/it] {'loss': 1.0474, 'learning_rate': 3.2287381799833427e-07, 'epoch': 0.84} 84%|████████▍ | 2334/2774 [7:40:00<1:23:33, 11.39s/it] 84%|████████▍ | 2335/2774 [7:40:12<1:25:16, 11.65s/it] {'loss': 1.0483, 'learning_rate': 3.214401323875882e-07, 'epoch': 0.84} 84%|████████▍ | 2335/2774 [7:40:12<1:25:16, 11.65s/it] 84%|████████▍ | 2336/2774 [7:40:25<1:27:56, 12.05s/it] {'loss': 1.0532, 'learning_rate': 3.2000941820089893e-07, 'epoch': 0.84} 84%|████████▍ | 2336/2774 [7:40:25<1:27:56, 12.05s/it] 84%|████████▍ | 2337/2774 [7:40:36<1:25:49, 11.78s/it] {'loss': 1.0405, 'learning_rate': 3.1858167738967383e-07, 'epoch': 0.84} 84%|████████▍ | 2337/2774 [7:40:36<1:25:49, 11.78s/it] 84%|████████▍ | 2338/2774 [7:40:48<1:24:50, 11.68s/it] {'loss': 1.0117, 'learning_rate': 3.171569119012649e-07, 'epoch': 0.84} 84%|████████▍ | 2338/2774 [7:40:48<1:24:50, 11.68s/it] 84%|████████▍ | 2339/2774 [7:40:59<1:24:19, 11.63s/it] {'loss': 1.0142, 'learning_rate': 3.1573512367896545e-07, 'epoch': 0.84} 84%|████████▍ | 2339/2774 [7:40:59<1:24:19, 11.63s/it] 84%|████████▍ | 2340/2774 [7:41:11<1:24:41, 11.71s/it] {'loss': 1.0063, 'learning_rate': 3.143163146620104e-07, 'epoch': 0.84} 84%|████████▍ | 2340/2774 [7:41:11<1:24:41, 11.71s/it] 84%|████████▍ | 2341/2774 [7:41:25<1:28:09, 12.22s/it] {'loss': 1.0674, 'learning_rate': 3.1290048678556786e-07, 'epoch': 0.84} 84%|████████▍ | 2341/2774 [7:41:25<1:28:09, 12.22s/it] 84%|████████▍ | 2342/2774 [7:41:36<1:26:14, 11.98s/it] {'loss': 1.0239, 'learning_rate': 3.1148764198074304e-07, 'epoch': 0.84} 84%|████████▍ | 2342/2774 [7:41:36<1:26:14, 11.98s/it] 
84%|████████▍ | 2343/2774 [7:41:47<1:24:04, 11.70s/it] {'loss': 1.0122, 'learning_rate': 3.1007778217456956e-07, 'epoch': 0.84} 84%|████████▍ | 2343/2774 [7:41:47<1:24:04, 11.70s/it] 84%|████████▍ | 2344/2774 [7:41:59<1:23:13, 11.61s/it] {'loss': 1.0352, 'learning_rate': 3.08670909290012e-07, 'epoch': 0.84} 84%|████████▍ | 2344/2774 [7:41:59<1:23:13, 11.61s/it] 85%|████████▍ | 2345/2774 [7:42:10<1:22:22, 11.52s/it] {'loss': 0.9897, 'learning_rate': 3.0726702524596003e-07, 'epoch': 0.85} 85%|████████▍ | 2345/2774 [7:42:10<1:22:22, 11.52s/it] 85%|████████▍ | 2346/2774 [7:42:21<1:21:44, 11.46s/it] {'loss': 0.9932, 'learning_rate': 3.058661319572259e-07, 'epoch': 0.85} 85%|████████▍ | 2346/2774 [7:42:21<1:21:44, 11.46s/it] 85%|████████▍ | 2347/2774 [7:42:35<1:26:27, 12.15s/it] {'loss': 0.9722, 'learning_rate': 3.0446823133454346e-07, 'epoch': 0.85} 85%|████████▍ | 2347/2774 [7:42:35<1:26:27, 12.15s/it] 85%|████████▍ | 2348/2774 [7:42:46<1:24:08, 11.85s/it] {'loss': 1.0273, 'learning_rate': 3.0307332528456577e-07, 'epoch': 0.85} 85%|████████▍ | 2348/2774 [7:42:46<1:24:08, 11.85s/it] 85%|████████▍ | 2349/2774 [7:42:57<1:22:55, 11.71s/it] {'loss': 1.0522, 'learning_rate': 3.016814157098588e-07, 'epoch': 0.85} 85%|████████▍ | 2349/2774 [7:42:57<1:22:55, 11.71s/it] 85%|████████▍ | 2350/2774 [7:43:09<1:22:34, 11.68s/it] {'loss': 1.0474, 'learning_rate': 3.00292504508905e-07, 'epoch': 0.85} 85%|████████▍ | 2350/2774 [7:43:09<1:22:34, 11.68s/it] 85%|████████▍ | 2351/2774 [7:43:21<1:22:03, 11.64s/it] {'loss': 1.001, 'learning_rate': 2.989065935760943e-07, 'epoch': 0.85} 85%|████████▍ | 2351/2774 [7:43:21<1:22:03, 11.64s/it] 85%|████████▍ | 2352/2774 [7:43:32<1:21:28, 11.58s/it] {'loss': 1.0259, 'learning_rate': 2.975236848017249e-07, 'epoch': 0.85} 85%|████████▍ | 2352/2774 [7:43:32<1:21:28, 11.58s/it] 85%|████████▍ | 2353/2774 [7:43:44<1:21:18, 11.59s/it] {'loss': 1.0107, 'learning_rate': 2.961437800720021e-07, 'epoch': 0.85} 85%|████████▍ | 2353/2774 [7:43:44<1:21:18, 
11.59s/it] 85%|████████▍ | 2354/2774 [7:43:55<1:20:30, 11.50s/it] {'loss': 1.0146, 'learning_rate': 2.947668812690316e-07, 'epoch': 0.85} 85%|████████▍ | 2354/2774 [7:43:55<1:20:30, 11.50s/it] 85%|████████▍ | 2355/2774 [7:44:07<1:21:51, 11.72s/it] {'loss': 1.0132, 'learning_rate': 2.933929902708213e-07, 'epoch': 0.85} 85%|████████▍ | 2355/2774 [7:44:07<1:21:51, 11.72s/it] 85%|████████▍ | 2356/2774 [7:44:18<1:20:29, 11.55s/it] {'loss': 0.979, 'learning_rate': 2.9202210895127424e-07, 'epoch': 0.85} 85%|████████▍ | 2356/2774 [7:44:18<1:20:29, 11.55s/it] 85%|████████▍ | 2357/2774 [7:44:30<1:19:25, 11.43s/it] {'loss': 1.0161, 'learning_rate': 2.906542391801906e-07, 'epoch': 0.85} 85%|████████▍ | 2357/2774 [7:44:30<1:19:25, 11.43s/it] 85%|████████▌ | 2358/2774 [7:44:41<1:18:38, 11.34s/it] {'loss': 1.0283, 'learning_rate': 2.8928938282326123e-07, 'epoch': 0.85} 85%|████████▌ | 2358/2774 [7:44:41<1:18:38, 11.34s/it] 85%|████████▌ | 2359/2774 [7:44:52<1:19:02, 11.43s/it] {'loss': 1.0537, 'learning_rate': 2.8792754174206903e-07, 'epoch': 0.85} 85%|████████▌ | 2359/2774 [7:44:52<1:19:02, 11.43s/it] 85%|████████▌ | 2360/2774 [7:45:04<1:19:14, 11.48s/it] {'loss': 0.9868, 'learning_rate': 2.865687177940818e-07, 'epoch': 0.85} 85%|████████▌ | 2360/2774 [7:45:04<1:19:14, 11.48s/it] 85%|████████▌ | 2361/2774 [7:45:16<1:19:20, 11.53s/it] {'loss': 1.0591, 'learning_rate': 2.8521291283265417e-07, 'epoch': 0.85} 85%|████████▌ | 2361/2774 [7:45:16<1:19:20, 11.53s/it] 85%|████████▌ | 2362/2774 [7:45:27<1:19:33, 11.59s/it] {'loss': 1.0723, 'learning_rate': 2.838601287070214e-07, 'epoch': 0.85} 85%|████████▌ | 2362/2774 [7:45:27<1:19:33, 11.59s/it] 85%|████████▌ | 2363/2774 [7:45:38<1:18:30, 11.46s/it] {'loss': 1.0576, 'learning_rate': 2.825103672623003e-07, 'epoch': 0.85} 85%|████████▌ | 2363/2774 [7:45:38<1:18:30, 11.46s/it] 85%|████████▌ | 2364/2774 [7:45:50<1:17:45, 11.38s/it] {'loss': 1.0005, 'learning_rate': 2.811636303394835e-07, 'epoch': 0.85} 85%|████████▌ | 2364/2774 
[7:45:50<1:17:45, 11.38s/it] 85%|████████▌ | 2365/2774 [7:46:01<1:18:19, 11.49s/it] {'loss': 1.0786, 'learning_rate': 2.7981991977543865e-07, 'epoch': 0.85} 85%|████████▌ | 2365/2774 [7:46:01<1:18:19, 11.49s/it] 85%|████████▌ | 2366/2774 [7:46:13<1:17:57, 11.46s/it] {'loss': 1.0723, 'learning_rate': 2.784792374029055e-07, 'epoch': 0.85} 85%|████████▌ | 2366/2774 [7:46:13<1:17:57, 11.46s/it] 85%|████████▌ | 2367/2774 [7:46:27<1:24:02, 12.39s/it] {'loss': 0.9805, 'learning_rate': 2.7714158505049437e-07, 'epoch': 0.85} 85%|████████▌ | 2367/2774 [7:46:27<1:24:02, 12.39s/it] 85%|████████▌ | 2368/2774 [7:46:39<1:21:42, 12.07s/it] {'loss': 0.9829, 'learning_rate': 2.758069645426817e-07, 'epoch': 0.85} 85%|████████▌ | 2368/2774 [7:46:39<1:21:42, 12.07s/it] 85%|████████▌ | 2369/2774 [7:46:50<1:20:05, 11.87s/it] {'loss': 0.9932, 'learning_rate': 2.744753776998102e-07, 'epoch': 0.85} 85%|████████▌ | 2369/2774 [7:46:50<1:20:05, 11.87s/it] 85%|████████▌ | 2370/2774 [7:47:01<1:18:39, 11.68s/it] {'loss': 1.0312, 'learning_rate': 2.731468263380827e-07, 'epoch': 0.85} 85%|████████▌ | 2370/2774 [7:47:01<1:18:39, 11.68s/it] 85%|████████▌ | 2371/2774 [7:47:13<1:17:38, 11.56s/it] {'loss': 1.042, 'learning_rate': 2.7182131226956427e-07, 'epoch': 0.85} 85%|████████▌ | 2371/2774 [7:47:13<1:17:38, 11.56s/it] 86%|████████▌ | 2372/2774 [7:47:24<1:17:45, 11.61s/it] {'loss': 1.0874, 'learning_rate': 2.7049883730217526e-07, 'epoch': 0.86} 86%|████████▌ | 2372/2774 [7:47:24<1:17:45, 11.61s/it] 86%|████████▌ | 2373/2774 [7:47:36<1:17:15, 11.56s/it] {'loss': 1.0752, 'learning_rate': 2.691794032396916e-07, 'epoch': 0.86} 86%|████████▌ | 2373/2774 [7:47:36<1:17:15, 11.56s/it] 86%|████████▌ | 2374/2774 [7:47:48<1:17:56, 11.69s/it] {'loss': 0.9507, 'learning_rate': 2.678630118817413e-07, 'epoch': 0.86} 86%|████████▌ | 2374/2774 [7:47:48<1:17:56, 11.69s/it] 86%|████████▌ | 2375/2774 [7:47:59<1:17:16, 11.62s/it] {'loss': 1.0298, 'learning_rate': 2.6654966502380365e-07, 'epoch': 0.86} 86%|████████▌ | 
2375/2774 [7:47:59<1:17:16, 11.62s/it] 86%|████████▌ | 2376/2774 [7:48:11<1:16:45, 11.57s/it] {'loss': 1.0508, 'learning_rate': 2.6523936445720407e-07, 'epoch': 0.86} 86%|████████▌ | 2376/2774 [7:48:11<1:16:45, 11.57s/it] 86%|████████▌ | 2377/2774 [7:48:25<1:22:00, 12.39s/it] {'loss': 0.9697, 'learning_rate': 2.6393211196911267e-07, 'epoch': 0.86} 86%|████████▌ | 2377/2774 [7:48:25<1:22:00, 12.39s/it] 86%|████████▌ | 2378/2774 [7:48:36<1:19:39, 12.07s/it] {'loss': 0.9995, 'learning_rate': 2.626279093425438e-07, 'epoch': 0.86} 86%|████████▌ | 2378/2774 [7:48:36<1:19:39, 12.07s/it] 86%|████████▌ | 2379/2774 [7:48:48<1:18:04, 11.86s/it] {'loss': 1.0083, 'learning_rate': 2.6132675835635e-07, 'epoch': 0.86} 86%|████████▌ | 2379/2774 [7:48:48<1:18:04, 11.86s/it] 86%|████████▌ | 2380/2774 [7:48:59<1:16:50, 11.70s/it] {'loss': 1.0146, 'learning_rate': 2.6002866078522425e-07, 'epoch': 0.86} 86%|████████▌ | 2380/2774 [7:48:59<1:16:50, 11.70s/it] 86%|████████▌ | 2381/2774 [7:49:10<1:16:01, 11.61s/it] {'loss': 1.0605, 'learning_rate': 2.587336183996914e-07, 'epoch': 0.86} 86%|████████▌ | 2381/2774 [7:49:10<1:16:01, 11.61s/it] 86%|████████▌ | 2382/2774 [7:49:22<1:15:20, 11.53s/it] {'loss': 1.0439, 'learning_rate': 2.5744163296611307e-07, 'epoch': 0.86} 86%|████████▌ | 2382/2774 [7:49:22<1:15:20, 11.53s/it] 86%|████████▌ | 2383/2774 [7:49:33<1:15:18, 11.56s/it] {'loss': 1.0088, 'learning_rate': 2.5615270624667706e-07, 'epoch': 0.86} 86%|████████▌ | 2383/2774 [7:49:33<1:15:18, 11.56s/it] 86%|████████▌ | 2384/2774 [7:49:45<1:15:29, 11.61s/it] {'loss': 1.0249, 'learning_rate': 2.5486683999940335e-07, 'epoch': 0.86} 86%|████████▌ | 2384/2774 [7:49:45<1:15:29, 11.61s/it] 86%|████████▌ | 2385/2774 [7:49:57<1:15:59, 11.72s/it] {'loss': 1.02, 'learning_rate': 2.5358403597813443e-07, 'epoch': 0.86} 86%|████████▌ | 2385/2774 [7:49:57<1:15:59, 11.72s/it] 86%|████████▌ | 2386/2774 [7:50:08<1:15:12, 11.63s/it] {'loss': 1.0566, 'learning_rate': 2.5230429593253893e-07, 'epoch': 0.86} 
86%|████████▌ | 2386/2774 [7:50:08<1:15:12, 11.63s/it] 86%|████████▌ | 2387/2774 [7:50:22<1:18:02, 12.10s/it] {'loss': 1.0205, 'learning_rate': 2.510276216081037e-07, 'epoch': 0.86} 86%|████████▌ | 2387/2774 [7:50:22<1:18:02, 12.10s/it] 86%|████████▌ | 2388/2774 [7:50:33<1:16:27, 11.88s/it] {'loss': 1.0513, 'learning_rate': 2.497540147461361e-07, 'epoch': 0.86} 86%|████████▌ | 2388/2774 [7:50:33<1:16:27, 11.88s/it] 86%|████████▌ | 2389/2774 [7:50:44<1:15:05, 11.70s/it] {'loss': 0.9673, 'learning_rate': 2.484834770837585e-07, 'epoch': 0.86} 86%|████████▌ | 2389/2774 [7:50:44<1:15:05, 11.70s/it] 86%|████████▌ | 2390/2774 [7:50:57<1:16:29, 11.95s/it] {'loss': 0.9937, 'learning_rate': 2.472160103539084e-07, 'epoch': 0.86} 86%|████████▌ | 2390/2774 [7:50:57<1:16:29, 11.95s/it] 86%|████████▌ | 2391/2774 [7:51:10<1:18:05, 12.23s/it] {'loss': 1.0112, 'learning_rate': 2.4595161628533315e-07, 'epoch': 0.86} 86%|████████▌ | 2391/2774 [7:51:10<1:18:05, 12.23s/it] 86%|████████▌ | 2392/2774 [7:51:22<1:17:55, 12.24s/it] {'loss': 0.9502, 'learning_rate': 2.446902966025902e-07, 'epoch': 0.86} 86%|████████▌ | 2392/2774 [7:51:22<1:17:55, 12.24s/it] 86%|████████▋ | 2393/2774 [7:51:34<1:16:39, 12.07s/it] {'loss': 1.085, 'learning_rate': 2.4343205302604254e-07, 'epoch': 0.86} 86%|████████▋ | 2393/2774 [7:51:34<1:16:39, 12.07s/it] 86%|████████▋ | 2394/2774 [7:51:45<1:15:25, 11.91s/it] {'loss': 1.0283, 'learning_rate': 2.421768872718594e-07, 'epoch': 0.86} 86%|████████▋ | 2394/2774 [7:51:45<1:15:25, 11.91s/it] 86%|████████▋ | 2395/2774 [7:51:56<1:13:59, 11.71s/it] {'loss': 1.0137, 'learning_rate': 2.4092480105201043e-07, 'epoch': 0.86} 86%|████████▋ | 2395/2774 [7:51:56<1:13:59, 11.71s/it] 86%|████████▋ | 2396/2774 [7:52:08<1:12:58, 11.58s/it] {'loss': 1.0186, 'learning_rate': 2.396757960742663e-07, 'epoch': 0.86} 86%|████████▋ | 2396/2774 [7:52:08<1:12:58, 11.58s/it] 86%|████████▋ | 2397/2774 [7:52:19<1:13:02, 11.63s/it] {'loss': 1.0522, 'learning_rate': 2.384298740421939e-07, 'epoch': 
0.86} 86%|████████▋ | 2397/2774 [7:52:19<1:13:02, 11.63s/it] 86%|████████▋ | 2398/2774 [7:52:31<1:13:09, 11.67s/it] {'loss': 1.0171, 'learning_rate': 2.3718703665515515e-07, 'epoch': 0.86} 86%|████████▋ | 2398/2774 [7:52:31<1:13:09, 11.67s/it] 86%|████████▋ | 2399/2774 [7:52:42<1:11:59, 11.52s/it] {'loss': 1.0117, 'learning_rate': 2.3594728560830615e-07, 'epoch': 0.86} 86%|████████▋ | 2399/2774 [7:52:42<1:11:59, 11.52s/it] 87%|████████▋ | 2400/2774 [7:52:54<1:11:20, 11.45s/it] {'loss': 1.0171, 'learning_rate': 2.3471062259259187e-07, 'epoch': 0.87} 87%|████████▋ | 2400/2774 [7:52:54<1:11:20, 11.45s/it]
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn(
 87%|████████▋ | 2401/2774 [7:53:32<2:00:44, 19.42s/it] {'loss': 0.979, 'learning_rate': 2.3347704929474606e-07, 'epoch': 0.87} 87%|████████▋ | 2401/2774 [7:53:32<2:00:44, 19.42s/it] 87%|████████▋ | 2402/2774 [7:53:43<1:45:52, 17.08s/it] {'loss': 0.9883, 'learning_rate': 2.3224656739728761e-07, 'epoch': 0.87} 87%|████████▋ | 2402/2774 [7:53:43<1:45:52, 17.08s/it] 87%|████████▋ | 2403/2774 [7:53:55<1:35:49, 15.50s/it] {'loss': 1.0498, 'learning_rate': 2.310191785785207e-07, 'epoch': 0.87} 87%|████████▋ | 2403/2774 [7:53:55<1:35:49, 15.50s/it] 87%|████████▋ | 2404/2774 [7:54:08<1:29:57, 14.59s/it] {'loss': 0.9873, 'learning_rate': 2.2979488451252834e-07, 'epoch': 0.87} 87%|████████▋ | 2404/2774 [7:54:08<1:29:57, 14.59s/it] 87%|████████▋ | 2405/2774 [7:54:19<1:24:01, 13.66s/it] {'loss': 0.9927, 'learning_rate': 2.285736868691746e-07, 'epoch': 0.87} 87%|████████▋ | 2405/2774 [7:54:19<1:24:01, 13.66s/it] 87%|████████▋ | 2406/2774 [7:54:31<1:19:38, 12.99s/it] {'loss': 1.0112, 'learning_rate': 2.273555873140984e-07, 'epoch': 0.87} 87%|████████▋ | 2406/2774 [7:54:31<1:19:38, 12.99s/it] 87%|████████▋ | 2407/2774 [7:54:42<1:16:30, 12.51s/it] {'loss': 1.0156, 'learning_rate': 2.26140587508715e-07, 'epoch': 0.87} 87%|████████▋ | 2407/2774 [7:54:42<1:16:30, 12.51s/it] 87%|████████▋ | 2408/2774 [7:54:53<1:14:31, 12.22s/it] {'loss': 0.9893, 'learning_rate': 2.249286891102098e-07, 'epoch': 0.87} 87%|████████▋ | 2408/2774 [7:54:53<1:14:31, 12.22s/it] 87%|████████▋ | 2409/2774 [7:55:05<1:12:40, 11.95s/it] {'loss': 0.9448, 'learning_rate': 2.2371989377154013e-07, 'epoch': 0.87} 87%|████████▋ | 2409/2774 [7:55:05<1:12:40, 11.95s/it] 87%|████████▋ | 2410/2774 [7:55:16<1:11:20, 11.76s/it] {'loss': 1.0483, 'learning_rate': 2.2251420314142845e-07, 'epoch': 0.87} 87%|████████▋ | 2410/2774 [7:55:16<1:11:20, 11.76s/it] 87%|████████▋ | 2411/2774 [7:55:28<1:10:53, 11.72s/it] {'loss': 1.0332, 'learning_rate': 2.2131161886436465e-07, 'epoch': 0.87} 
87%|████████▋ | 2411/2774 [7:55:28<1:10:53, 11.72s/it] 87%|████████▋ | 2412/2774 [7:55:39<1:09:47, 11.57s/it] {'loss': 0.9751, 'learning_rate': 2.2011214258060076e-07, 'epoch': 0.87} 87%|████████▋ | 2412/2774 [7:55:39<1:09:47, 11.57s/it] 87%|████████▋ | 2413/2774 [7:55:51<1:09:38, 11.58s/it] {'loss': 1.0034, 'learning_rate': 2.1891577592615065e-07, 'epoch': 0.87} 87%|████████▋ | 2413/2774 [7:55:51<1:09:38, 11.58s/it] 87%|████████▋ | 2414/2774 [7:56:02<1:09:35, 11.60s/it] {'loss': 1.0811, 'learning_rate': 2.1772252053278513e-07, 'epoch': 0.87} 87%|████████▋ | 2414/2774 [7:56:02<1:09:35, 11.60s/it] 87%|████████▋ | 2415/2774 [7:56:14<1:09:08, 11.56s/it] {'loss': 1.0391, 'learning_rate': 2.1653237802803345e-07, 'epoch': 0.87} 87%|████████▋ | 2415/2774 [7:56:14<1:09:08, 11.56s/it] 87%|████████▋ | 2416/2774 [7:56:25<1:08:33, 11.49s/it] {'loss': 0.9873, 'learning_rate': 2.1534535003517736e-07, 'epoch': 0.87} 87%|████████▋ | 2416/2774 [7:56:25<1:08:33, 11.49s/it] 87%|████████▋ | 2417/2774 [7:56:37<1:08:27, 11.51s/it] {'loss': 1.0005, 'learning_rate': 2.1416143817325207e-07, 'epoch': 0.87} 87%|████████▋ | 2417/2774 [7:56:37<1:08:27, 11.51s/it] 87%|████████▋ | 2418/2774 [7:56:49<1:09:36, 11.73s/it] {'loss': 1.0801, 'learning_rate': 2.1298064405704145e-07, 'epoch': 0.87} 87%|████████▋ | 2418/2774 [7:56:49<1:09:36, 11.73s/it] 87%|████████▋ | 2419/2774 [7:57:01<1:10:48, 11.97s/it] {'loss': 1.0039, 'learning_rate': 2.1180296929707717e-07, 'epoch': 0.87} 87%|████████▋ | 2419/2774 [7:57:01<1:10:48, 11.97s/it] 87%|████████▋ | 2420/2774 [7:57:12<1:09:05, 11.71s/it] {'loss': 1.0288, 'learning_rate': 2.106284154996363e-07, 'epoch': 0.87} 87%|████████▋ | 2420/2774 [7:57:12<1:09:05, 11.71s/it] 87%|████████▋ | 2421/2774 [7:57:25<1:10:38, 12.01s/it] {'loss': 1.0098, 'learning_rate': 2.0945698426673988e-07, 'epoch': 0.87} 87%|████████▋ | 2421/2774 [7:57:25<1:10:38, 12.01s/it] 87%|████████▋ | 2422/2774 [7:57:36<1:09:22, 11.82s/it] {'loss': 1.0049, 'learning_rate': 2.0828867719614926e-07, 
'epoch': 0.87} 87%|████████▋ | 2422/2774 [7:57:36<1:09:22, 11.82s/it] 87%|████████▋ | 2423/2774 [7:57:48<1:08:47, 11.76s/it] {'loss': 0.9868, 'learning_rate': 2.0712349588136392e-07, 'epoch': 0.87} 87%|████████▋ | 2423/2774 [7:57:48<1:08:47, 11.76s/it] 87%|████████▋ | 2424/2774 [7:58:00<1:08:02, 11.67s/it] {'loss': 1.0093, 'learning_rate': 2.0596144191162182e-07, 'epoch': 0.87} 87%|████████▋ | 2424/2774 [7:58:00<1:08:02, 11.67s/it] 87%|████████▋ | 2425/2774 [7:58:11<1:07:22, 11.58s/it] {'loss': 1.0298, 'learning_rate': 2.048025168718934e-07, 'epoch': 0.87} 87%|████████▋ | 2425/2774 [7:58:11<1:07:22, 11.58s/it] 87%|████████▋ | 2426/2774 [7:58:23<1:07:12, 11.59s/it] {'loss': 1.0811, 'learning_rate': 2.036467223428834e-07, 'epoch': 0.87} 87%|████████▋ | 2426/2774 [7:58:23<1:07:12, 11.59s/it] 87%|████████▋ | 2427/2774 [7:58:36<1:10:12, 12.14s/it] {'loss': 0.9634, 'learning_rate': 2.0249405990102554e-07, 'epoch': 0.87} 87%|████████▋ | 2427/2774 [7:58:36<1:10:12, 12.14s/it] 88%|████████▊ | 2428/2774 [7:58:47<1:08:24, 11.86s/it] {'loss': 1.0093, 'learning_rate': 2.0134453111848112e-07, 'epoch': 0.88} 88%|████████▊ | 2428/2774 [7:58:47<1:08:24, 11.86s/it] 88%|████████▊ | 2429/2774 [7:58:59<1:07:17, 11.70s/it] {'loss': 1.0107, 'learning_rate': 2.0019813756313815e-07, 'epoch': 0.88} 88%|████████▊ | 2429/2774 [7:58:59<1:07:17, 11.70s/it] 88%|████████▊ | 2430/2774 [7:59:10<1:06:55, 11.67s/it] {'loss': 0.9697, 'learning_rate': 1.990548807986084e-07, 'epoch': 0.88} 88%|████████▊ | 2430/2774 [7:59:10<1:06:55, 11.67s/it] 88%|████████▊ | 2431/2774 [7:59:21<1:06:10, 11.58s/it] {'loss': 1.0566, 'learning_rate': 1.979147623842248e-07, 'epoch': 0.88} 88%|████████▊ | 2431/2774 [7:59:21<1:06:10, 11.58s/it] 88%|████████▊ | 2432/2774 [7:59:34<1:07:30, 11.84s/it] {'loss': 1.0483, 'learning_rate': 1.9677778387504064e-07, 'epoch': 0.88} 88%|████████▊ | 2432/2774 [7:59:34<1:07:30, 11.84s/it] 88%|████████▊ | 2433/2774 [7:59:46<1:06:52, 11.77s/it] {'loss': 1.0356, 'learning_rate': 
1.9564394682182518e-07, 'epoch': 0.88} 88%|████████▊ | 2433/2774 [7:59:46<1:06:52, 11.77s/it] 88%|████████▊ | 2434/2774 [7:59:57<1:06:02, 11.65s/it] {'loss': 0.979, 'learning_rate': 1.9451325277106415e-07, 'epoch': 0.88} 88%|████████▊ | 2434/2774 [7:59:57<1:06:02, 11.65s/it] 88%|████████▊ | 2435/2774 [8:00:09<1:05:49, 11.65s/it] {'loss': 1.0063, 'learning_rate': 1.9338570326495555e-07, 'epoch': 0.88} 88%|████████▊ | 2435/2774 [8:00:09<1:05:49, 11.65s/it] 88%|████████▊ | 2436/2774 [8:00:20<1:05:52, 11.69s/it] {'loss': 1.0327, 'learning_rate': 1.9226129984140945e-07, 'epoch': 0.88} 88%|████████▊ | 2436/2774 [8:00:20<1:05:52, 11.69s/it] 88%|████████▊ | 2437/2774 [8:00:34<1:08:37, 12.22s/it] {'loss': 1.084, 'learning_rate': 1.911400440340433e-07, 'epoch': 0.88} 88%|████████▊ | 2437/2774 [8:00:34<1:08:37, 12.22s/it] 88%|████████▊ | 2438/2774 [8:00:45<1:06:47, 11.93s/it] {'loss': 1.0742, 'learning_rate': 1.9002193737218288e-07, 'epoch': 0.88} 88%|████████▊ | 2438/2774 [8:00:45<1:06:47, 11.93s/it] 88%|████████▊ | 2439/2774 [8:00:56<1:05:31, 11.74s/it] {'loss': 1.0225, 'learning_rate': 1.889069813808575e-07, 'epoch': 0.88} 88%|████████▊ | 2439/2774 [8:00:56<1:05:31, 11.74s/it] 88%|████████▊ | 2440/2774 [8:01:07<1:04:22, 11.56s/it] {'loss': 1.0371, 'learning_rate': 1.8779517758080096e-07, 'epoch': 0.88} 88%|████████▊ | 2440/2774 [8:01:07<1:04:22, 11.56s/it] 88%|████████▊ | 2441/2774 [8:01:20<1:05:53, 11.87s/it] {'loss': 1.0156, 'learning_rate': 1.8668652748844524e-07, 'epoch': 0.88} 88%|████████▊ | 2441/2774 [8:01:20<1:05:53, 11.87s/it] 88%|████████▊ | 2442/2774 [8:01:33<1:07:27, 12.19s/it] {'loss': 0.9971, 'learning_rate': 1.8558103261592298e-07, 'epoch': 0.88} 88%|████████▊ | 2442/2774 [8:01:33<1:07:27, 12.19s/it] 88%|████████▊ | 2443/2774 [8:01:47<1:10:00, 12.69s/it] {'loss': 0.9946, 'learning_rate': 1.8447869447106194e-07, 'epoch': 0.88} 88%|████████▊ | 2443/2774 [8:01:47<1:10:00, 12.69s/it] 88%|████████▊ | 2444/2774 [8:01:58<1:07:24, 12.26s/it] {'loss': 1.061, 
'learning_rate': 1.8337951455738469e-07, 'epoch': 0.88} 88%|████████▊ | 2444/2774 [8:01:58<1:07:24, 12.26s/it] 88%|████████▊ | 2445/2774 [8:02:10<1:06:24, 12.11s/it] {'loss': 1.002, 'learning_rate': 1.822834943741067e-07, 'epoch': 0.88} 88%|████████▊ | 2445/2774 [8:02:10<1:06:24, 12.11s/it] 88%|████████▊ | 2446/2774 [8:02:21<1:05:20, 11.95s/it] {'loss': 1.0156, 'learning_rate': 1.8119063541613302e-07, 'epoch': 0.88} 88%|████████▊ | 2446/2774 [8:02:21<1:05:20, 11.95s/it] 88%|████████▊ | 2447/2774 [8:02:33<1:03:48, 11.71s/it] {'loss': 0.9839, 'learning_rate': 1.8010093917405714e-07, 'epoch': 0.88} 88%|████████▊ | 2447/2774 [8:02:33<1:03:48, 11.71s/it] 88%|████████▊ | 2448/2774 [8:02:44<1:03:27, 11.68s/it] {'loss': 1.061, 'learning_rate': 1.7901440713415873e-07, 'epoch': 0.88} 88%|████████▊ | 2448/2774 [8:02:44<1:03:27, 11.68s/it] 88%|████████▊ | 2449/2774 [8:02:58<1:06:01, 12.19s/it] {'loss': 1.0303, 'learning_rate': 1.779310407784024e-07, 'epoch': 0.88} 88%|████████▊ | 2449/2774 [8:02:58<1:06:01, 12.19s/it] 88%|████████▊ | 2450/2774 [8:03:09<1:04:49, 12.00s/it] {'loss': 1.022, 'learning_rate': 1.768508415844339e-07, 'epoch': 0.88} 88%|████████▊ | 2450/2774 [8:03:09<1:04:49, 12.00s/it] 88%|████████▊ | 2451/2774 [8:03:20<1:03:15, 11.75s/it] {'loss': 1.0259, 'learning_rate': 1.757738110255802e-07, 'epoch': 0.88} 88%|████████▊ | 2451/2774 [8:03:20<1:03:15, 11.75s/it] 88%|████████▊ | 2452/2774 [8:03:34<1:05:33, 12.22s/it] {'loss': 1.0674, 'learning_rate': 1.746999505708452e-07, 'epoch': 0.88} 88%|████████▊ | 2452/2774 [8:03:34<1:05:33, 12.22s/it] 88%|████████▊ | 2453/2774 [8:03:45<1:04:26, 12.05s/it] {'loss': 1.0098, 'learning_rate': 1.7362926168491057e-07, 'epoch': 0.88} 88%|████████▊ | 2453/2774 [8:03:45<1:04:26, 12.05s/it] 88%|████████▊ | 2454/2774 [8:03:57<1:02:58, 11.81s/it] {'loss': 1.02, 'learning_rate': 1.725617458281309e-07, 'epoch': 0.88} 88%|████████▊ | 2454/2774 [8:03:57<1:02:58, 11.81s/it] 89%|████████▊ | 2455/2774 [8:04:10<1:04:49, 12.19s/it] {'loss': 
1.0654, 'learning_rate': 1.7149740445653345e-07, 'epoch': 0.89} 89%|████████▊ | 2455/2774 [8:04:10<1:04:49, 12.19s/it] 89%|████████▊ | 2456/2774 [8:04:21<1:03:48, 12.04s/it] {'loss': 1.0288, 'learning_rate': 1.7043623902181478e-07, 'epoch': 0.89} 89%|████████▊ | 2456/2774 [8:04:21<1:03:48, 12.04s/it] 89%|████████▊ | 2457/2774 [8:04:33<1:02:34, 11.84s/it] {'loss': 1.0459, 'learning_rate': 1.6937825097134126e-07, 'epoch': 0.89} 89%|████████▊ | 2457/2774 [8:04:33<1:02:34, 11.84s/it] 89%|████████▊ | 2458/2774 [8:04:44<1:01:41, 11.71s/it] {'loss': 0.998, 'learning_rate': 1.6832344174814413e-07, 'epoch': 0.89} 89%|████████▊ | 2458/2774 [8:04:44<1:01:41, 11.71s/it] 89%|████████▊ | 2459/2774 [8:04:57<1:03:45, 12.14s/it] {'loss': 1.0039, 'learning_rate': 1.6727181279092037e-07, 'epoch': 0.89} 89%|████████▊ | 2459/2774 [8:04:57<1:03:45, 12.14s/it] 89%|████████▊ | 2460/2774 [8:05:09<1:02:41, 11.98s/it] {'loss': 1.0693, 'learning_rate': 1.662233655340273e-07, 'epoch': 0.89} 89%|████████▊ | 2460/2774 [8:05:09<1:02:41, 11.98s/it] 89%|████████▊ | 2461/2774 [8:05:20<1:01:37, 11.81s/it] {'loss': 1.0024, 'learning_rate': 1.6517810140748436e-07, 'epoch': 0.89} 89%|████████▊ | 2461/2774 [8:05:20<1:01:37, 11.81s/it] 89%|████████▉ | 2462/2774 [8:05:31<1:00:29, 11.63s/it] {'loss': 1.0796, 'learning_rate': 1.6413602183696808e-07, 'epoch': 0.89} 89%|████████▉ | 2462/2774 [8:05:31<1:00:29, 11.63s/it] 89%|████████▉ | 2463/2774 [8:05:44<1:00:55, 11.75s/it] {'loss': 1.0527, 'learning_rate': 1.6309712824381318e-07, 'epoch': 0.89} 89%|████████▉ | 2463/2774 [8:05:44<1:00:55, 11.75s/it] 89%|████████▉ | 2464/2774 [8:05:55<1:00:18, 11.67s/it] {'loss': 1.042, 'learning_rate': 1.620614220450062e-07, 'epoch': 0.89} 89%|████████▉ | 2464/2774 [8:05:55<1:00:18, 11.67s/it] 89%|████████▉ | 2465/2774 [8:06:06<59:30, 11.55s/it] {'loss': 1.0562, 'learning_rate': 1.6102890465318904e-07, 'epoch': 0.89} 89%|████████▉ | 2465/2774 [8:06:06<59:30, 11.55s/it] 89%|████████▉ | 2466/2774 [8:06:18<59:45, 11.64s/it] 
{'loss': 0.9609, 'learning_rate': 1.5999957747665191e-07, 'epoch': 0.89} 89%|████████▉ | 2466/2774 [8:06:18<59:45, 11.64s/it] 89%|████████▉ | 2467/2774 [8:06:29<58:52, 11.51s/it] {'loss': 1.0127, 'learning_rate': 1.5897344191933617e-07, 'epoch': 0.89} 89%|████████▉ | 2467/2774 [8:06:29<58:52, 11.51s/it] 89%|████████▉ | 2468/2774 [8:06:42<1:01:00, 11.96s/it] {'loss': 0.9526, 'learning_rate': 1.5795049938082841e-07, 'epoch': 0.89} 89%|████████▉ | 2468/2774 [8:06:42<1:01:00, 11.96s/it] 89%|████████▉ | 2469/2774 [8:06:54<59:46, 11.76s/it] {'loss': 1.0396, 'learning_rate': 1.5693075125635949e-07, 'epoch': 0.89} 89%|████████▉ | 2469/2774 [8:06:54<59:46, 11.76s/it] 89%|████████▉ | 2470/2774 [8:07:05<59:12, 11.69s/it] {'loss': 1.062, 'learning_rate': 1.5591419893680542e-07, 'epoch': 0.89} 89%|████████▉ | 2470/2774 [8:07:05<59:12, 11.69s/it] 89%|████████▉ | 2471/2774 [8:07:16<58:29, 11.58s/it] {'loss': 0.9961, 'learning_rate': 1.549008438086813e-07, 'epoch': 0.89} 89%|████████▉ | 2471/2774 [8:07:16<58:29, 11.58s/it] 89%|████████▉ | 2472/2774 [8:07:28<58:32, 11.63s/it] {'loss': 1.0503, 'learning_rate': 1.5389068725414346e-07, 'epoch': 0.89} 89%|████████▉ | 2472/2774 [8:07:28<58:32, 11.63s/it] 89%|████████▉ | 2473/2774 [8:07:40<59:17, 11.82s/it] {'loss': 1.0786, 'learning_rate': 1.5288373065098284e-07, 'epoch': 0.89} 89%|████████▉ | 2473/2774 [8:07:40<59:17, 11.82s/it] 89%|████████▉ | 2474/2774 [8:07:52<58:12, 11.64s/it] {'loss': 1.0371, 'learning_rate': 1.5187997537262882e-07, 'epoch': 0.89} 89%|████████▉ | 2474/2774 [8:07:52<58:12, 11.64s/it] 89%|████████▉ | 2475/2774 [8:08:03<57:57, 11.63s/it] {'loss': 0.9902, 'learning_rate': 1.5087942278814188e-07, 'epoch': 0.89} 89%|████████▉ | 2475/2774 [8:08:03<57:57, 11.63s/it] 89%|████████▉ | 2476/2774 [8:08:15<57:31, 11.58s/it] {'loss': 0.9819, 'learning_rate': 1.4988207426221617e-07, 'epoch': 0.89} 89%|████████▉ | 2476/2774 [8:08:15<57:31, 11.58s/it] 89%|████████▉ | 2477/2774 [8:08:26<57:26, 11.60s/it] {'loss': 0.9756, 
'learning_rate': 1.4888793115517412e-07, 'epoch': 0.89} 89%|████████▉ | 2477/2774 [8:08:26<57:26, 11.60s/it] 89%|████████▉ | 2478/2774 [8:08:40<59:55, 12.15s/it] {'loss': 1.0034, 'learning_rate': 1.478969948229675e-07, 'epoch': 0.89} 89%|████████▉ | 2478/2774 [8:08:40<59:55, 12.15s/it] 89%|████████▉ | 2479/2774 [8:08:51<58:15, 11.85s/it] {'loss': 1.0073, 'learning_rate': 1.4690926661717313e-07, 'epoch': 0.89} 89%|████████▉ | 2479/2774 [8:08:51<58:15, 11.85s/it] 89%|████████▉ | 2480/2774 [8:09:03<57:36, 11.76s/it] {'loss': 1.0615, 'learning_rate': 1.4592474788499317e-07, 'epoch': 0.89} 89%|████████▉ | 2480/2774 [8:09:03<57:36, 11.76s/it] 89%|████████▉ | 2481/2774 [8:09:15<58:49, 12.05s/it] {'loss': 0.979, 'learning_rate': 1.449434399692512e-07, 'epoch': 0.89} 89%|████████▉ | 2481/2774 [8:09:15<58:49, 12.05s/it] 89%|████████▉ | 2482/2774 [8:09:27<58:02, 11.93s/it] {'loss': 1.0068, 'learning_rate': 1.4396534420839214e-07, 'epoch': 0.89} 89%|████████▉ | 2482/2774 [8:09:27<58:02, 11.93s/it] 90%|████████▉ | 2483/2774 [8:09:39<58:22, 12.04s/it] {'loss': 1.0342, 'learning_rate': 1.4299046193647914e-07, 'epoch': 0.9} 90%|████████▉ | 2483/2774 [8:09:39<58:22, 12.04s/it] 90%|████████▉ | 2484/2774 [8:09:51<57:42, 11.94s/it] {'loss': 1.0298, 'learning_rate': 1.4201879448319356e-07, 'epoch': 0.9} 90%|████████▉ | 2484/2774 [8:09:51<57:42, 11.94s/it] 90%|████████▉ | 2485/2774 [8:10:02<56:23, 11.71s/it] {'loss': 1.0645, 'learning_rate': 1.4105034317383e-07, 'epoch': 0.9} 90%|████████▉ | 2485/2774 [8:10:02<56:23, 11.71s/it] 90%|████████▉ | 2486/2774 [8:10:13<55:35, 11.58s/it] {'loss': 0.9995, 'learning_rate': 1.4008510932929848e-07, 'epoch': 0.9} 90%|████████▉ | 2486/2774 [8:10:13<55:35, 11.58s/it] 90%|████████▉ | 2487/2774 [8:10:26<56:22, 11.78s/it] {'loss': 1.0591, 'learning_rate': 1.3912309426611924e-07, 'epoch': 0.9} 90%|████████▉ | 2487/2774 [8:10:26<56:22, 11.78s/it] 90%|████████▉ | 2488/2774 [8:10:37<55:24, 11.63s/it] {'loss': 0.9956, 'learning_rate': 1.38164299296423e-07, 
'epoch': 0.9} 90%|████████▉ | 2488/2774 [8:10:37<55:24, 11.63s/it] 90%|████████▉ | 2489/2774 [8:10:49<55:23, 11.66s/it] {'loss': 0.9932, 'learning_rate': 1.372087257279478e-07, 'epoch': 0.9} 90%|████████▉ | 2489/2774 [8:10:49<55:23, 11.66s/it] 90%|████████▉ | 2490/2774 [8:11:01<56:38, 11.97s/it] {'loss': 1.0444, 'learning_rate': 1.362563748640386e-07, 'epoch': 0.9} 90%|████████▉ | 2490/2774 [8:11:01<56:38, 11.97s/it] 90%|████████▉ | 2491/2774 [8:11:12<55:16, 11.72s/it] {'loss': 1.0303, 'learning_rate': 1.353072480036438e-07, 'epoch': 0.9} 90%|████████▉ | 2491/2774 [8:11:12<55:16, 11.72s/it] 90%|████████▉ | 2492/2774 [8:11:24<54:37, 11.62s/it] {'loss': 1.0664, 'learning_rate': 1.3436134644131627e-07, 'epoch': 0.9} 90%|████████▉ | 2492/2774 [8:11:24<54:37, 11.62s/it] 90%|████████▉ | 2493/2774 [8:11:35<53:59, 11.53s/it] {'loss': 1.0273, 'learning_rate': 1.3341867146720755e-07, 'epoch': 0.9} 90%|████████▉ | 2493/2774 [8:11:35<53:59, 11.53s/it] 90%|████████▉ | 2494/2774 [8:11:47<53:39, 11.50s/it] {'loss': 1.0142, 'learning_rate': 1.3247922436706972e-07, 'epoch': 0.9} 90%|████████▉ | 2494/2774 [8:11:47<53:39, 11.50s/it] 90%|████████▉ | 2495/2774 [8:11:58<53:15, 11.45s/it] {'loss': 1.0796, 'learning_rate': 1.3154300642225198e-07, 'epoch': 0.9} 90%|████████▉ | 2495/2774 [8:11:58<53:15, 11.45s/it] 90%|████████▉ | 2496/2774 [8:12:09<52:52, 11.41s/it] {'loss': 1.0122, 'learning_rate': 1.306100189096987e-07, 'epoch': 0.9} 90%|████████▉ | 2496/2774 [8:12:09<52:52, 11.41s/it] 90%|█████████ | 2497/2774 [8:12:22<53:52, 11.67s/it] {'loss': 0.9663, 'learning_rate': 1.2968026310194892e-07, 'epoch': 0.9} 90%|█████████ | 2497/2774 [8:12:22<53:52, 11.67s/it] 90%|█████████ | 2498/2774 [8:12:33<53:24, 11.61s/it] {'loss': 1.0693, 'learning_rate': 1.2875374026713294e-07, 'epoch': 0.9} 90%|█████████ | 2498/2774 [8:12:33<53:24, 11.61s/it] 90%|█████████ | 2499/2774 [8:12:45<53:28, 11.67s/it] {'loss': 1.0288, 'learning_rate': 1.2783045166897296e-07, 'epoch': 0.9} 90%|█████████ | 2499/2774 
[8:12:45<53:28, 11.67s/it] 90%|█████████ | 2500/2774 [8:12:57<54:42, 11.98s/it] {'loss': 1.0083, 'learning_rate': 1.2691039856677744e-07, 'epoch': 0.9} 90%|█████████ | 2500/2774 [8:12:58<54:42, 11.98s/it] 90%|█████████ | 2501/2774 [8:13:09<53:29, 11.76s/it] {'loss': 1.0503, 'learning_rate': 1.259935822154443e-07, 'epoch': 0.9} 90%|█████████ | 2501/2774 [8:13:09<53:29, 11.76s/it] 90%|█████████ | 2502/2774 [8:13:20<53:12, 11.74s/it] {'loss': 0.9917, 'learning_rate': 1.2508000386545482e-07, 'epoch': 0.9} 90%|█████████ | 2502/2774 [8:13:20<53:12, 11.74s/it] 90%|█████████ | 2503/2774 [8:13:32<52:34, 11.64s/it] {'loss': 1.0308, 'learning_rate': 1.2416966476287538e-07, 'epoch': 0.9} 90%|█████████ | 2503/2774 [8:13:32<52:34, 11.64s/it] 90%|█████████ | 2504/2774 [8:13:43<51:53, 11.53s/it] {'loss': 1.0386, 'learning_rate': 1.2326256614935306e-07, 'epoch': 0.9} 90%|█████████ | 2504/2774 [8:13:43<51:53, 11.53s/it] 90%|█████████ | 2505/2774 [8:13:55<51:42, 11.53s/it] {'loss': 1.0693, 'learning_rate': 1.223587092621162e-07, 'epoch': 0.9} 90%|█████████ | 2505/2774 [8:13:55<51:42, 11.53s/it] 90%|█████████ | 2506/2774 [8:14:06<51:34, 11.55s/it] {'loss': 1.0361, 'learning_rate': 1.2145809533397e-07, 'epoch': 0.9} 90%|█████████ | 2506/2774 [8:14:06<51:34, 11.55s/it] 90%|█████████ | 2507/2774 [8:14:19<53:18, 11.98s/it] {'loss': 1.019, 'learning_rate': 1.2056072559329861e-07, 'epoch': 0.9} 90%|█████████ | 2507/2774 [8:14:19<53:18, 11.98s/it] 90%|█████████ | 2508/2774 [8:14:31<53:15, 12.01s/it] {'loss': 1.0361, 'learning_rate': 1.1966660126405934e-07, 'epoch': 0.9} 90%|█████████ | 2508/2774 [8:14:31<53:15, 12.01s/it] 90%|█████████ | 2509/2774 [8:14:43<52:42, 11.93s/it] {'loss': 1.0371, 'learning_rate': 1.1877572356578409e-07, 'epoch': 0.9} 90%|█████████ | 2509/2774 [8:14:43<52:42, 11.93s/it] 90%|█████████ | 2510/2774 [8:14:55<52:35, 11.95s/it] {'loss': 0.9746, 'learning_rate': 1.1788809371357568e-07, 'epoch': 0.9} 90%|█████████ | 2510/2774 [8:14:55<52:35, 11.95s/it] 91%|█████████ | 
2511/2774 [8:15:06<51:40, 11.79s/it] {'loss': 1.0137, 'learning_rate': 1.1700371291810842e-07, 'epoch': 0.91} 91%|█████████ | 2511/2774 [8:15:06<51:40, 11.79s/it] 91%|█████████ | 2512/2774 [8:15:18<51:09, 11.72s/it] {'loss': 1.0518, 'learning_rate': 1.161225823856238e-07, 'epoch': 0.91} 91%|█████████ | 2512/2774 [8:15:18<51:09, 11.72s/it] 91%|█████████ | 2513/2774 [8:15:30<50:48, 11.68s/it] {'loss': 1.0591, 'learning_rate': 1.1524470331793075e-07, 'epoch': 0.91} 91%|█████████ | 2513/2774 [8:15:30<50:48, 11.68s/it] 91%|█████████ | 2514/2774 [8:15:41<50:07, 11.57s/it] {'loss': 0.9868, 'learning_rate': 1.143700769124037e-07, 'epoch': 0.91} 91%|█████████ | 2514/2774 [8:15:41<50:07, 11.57s/it] 91%|█████████ | 2515/2774 [8:15:53<50:32, 11.71s/it] {'loss': 1.042, 'learning_rate': 1.1349870436197924e-07, 'epoch': 0.91} 91%|█████████ | 2515/2774 [8:15:53<50:32, 11.71s/it] 91%|█████████ | 2516/2774 [8:16:04<49:54, 11.61s/it] {'loss': 1.0444, 'learning_rate': 1.1263058685515776e-07, 'epoch': 0.91} 91%|█████████ | 2516/2774 [8:16:04<49:54, 11.61s/it] 91%|█████████ | 2517/2774 [8:16:16<49:44, 11.61s/it] {'loss': 1.0273, 'learning_rate': 1.117657255759988e-07, 'epoch': 0.91} 91%|█████████ | 2517/2774 [8:16:16<49:44, 11.61s/it] 91%|█████████ | 2518/2774 [8:16:28<49:29, 11.60s/it] {'loss': 1.0342, 'learning_rate': 1.1090412170412068e-07, 'epoch': 0.91} 91%|█████████ | 2518/2774 [8:16:28<49:29, 11.60s/it] 91%|█████████ | 2519/2774 [8:16:39<49:00, 11.53s/it] {'loss': 1.0806, 'learning_rate': 1.100457764146995e-07, 'epoch': 0.91} 91%|█████████ | 2519/2774 [8:16:39<49:00, 11.53s/it] 91%|█████████ | 2520/2774 [8:16:50<48:44, 11.52s/it] {'loss': 1.0332, 'learning_rate': 1.0919069087846624e-07, 'epoch': 0.91} 91%|█████████ | 2520/2774 [8:16:50<48:44, 11.52s/it] 91%|█████████ | 2521/2774 [8:17:02<48:18, 11.46s/it] {'loss': 1.0171, 'learning_rate': 1.0833886626170547e-07, 'epoch': 0.91} 91%|█████████ | 2521/2774 [8:17:02<48:18, 11.46s/it] 91%|█████████ | 2522/2774 [8:17:13<48:09, 
11.47s/it] {'loss': 1.062, 'learning_rate': 1.0749030372625535e-07, 'epoch': 0.91} 91%|█████████ | 2522/2774 [8:17:13<48:09, 11.47s/it] 91%|█████████ | 2523/2774 [8:17:25<48:39, 11.63s/it] {'loss': 1.0229, 'learning_rate': 1.0664500442950281e-07, 'epoch': 0.91} 91%|█████████ | 2523/2774 [8:17:25<48:39, 11.63s/it] 91%|█████████ | 2524/2774 [8:17:37<48:18, 11.60s/it] {'loss': 1.0083, 'learning_rate': 1.0580296952438618e-07, 'epoch': 0.91} 91%|█████████ | 2524/2774 [8:17:37<48:18, 11.60s/it] 91%|█████████ | 2525/2774 [8:17:48<47:40, 11.49s/it] {'loss': 0.9702, 'learning_rate': 1.0496420015938924e-07, 'epoch': 0.91} 91%|█████████ | 2525/2774 [8:17:48<47:40, 11.49s/it] 91%|█████████ | 2526/2774 [8:18:00<47:34, 11.51s/it] {'loss': 0.9692, 'learning_rate': 1.0412869747854409e-07, 'epoch': 0.91} 91%|█████████ | 2526/2774 [8:18:00<47:34, 11.51s/it] 91%|█████████ | 2527/2774 [8:18:11<47:23, 11.51s/it] {'loss': 1.0415, 'learning_rate': 1.032964626214239e-07, 'epoch': 0.91} 91%|█████████ | 2527/2774 [8:18:11<47:23, 11.51s/it] 91%|█████████ | 2528/2774 [8:18:25<50:12, 12.25s/it] {'loss': 1.0146, 'learning_rate': 1.0246749672314844e-07, 'epoch': 0.91} 91%|█████████ | 2528/2774 [8:18:25<50:12, 12.25s/it] 91%|█████████ | 2529/2774 [8:18:37<49:48, 12.20s/it] {'loss': 1.0605, 'learning_rate': 1.0164180091437631e-07, 'epoch': 0.91} 91%|█████████ | 2529/2774 [8:18:37<49:48, 12.20s/it] 91%|█████████ | 2530/2774 [8:18:51<52:08, 12.82s/it] {'loss': 0.9883, 'learning_rate': 1.0081937632130695e-07, 'epoch': 0.91} 91%|█████████ | 2530/2774 [8:18:51<52:08, 12.82s/it] 91%|█████████ | 2531/2774 [8:19:04<51:29, 12.71s/it] {'loss': 0.9946, 'learning_rate': 1.0000022406567777e-07, 'epoch': 0.91} 91%|█████████ | 2531/2774 [8:19:04<51:29, 12.71s/it] 91%|█████████▏| 2532/2774 [8:19:17<52:11, 12.94s/it] {'loss': 1.0737, 'learning_rate': 9.918434526476311e-08, 'epoch': 0.91} 91%|█████████▏| 2532/2774 [8:19:17<52:11, 12.94s/it] 91%|█████████▏| 2533/2774 [8:19:29<49:58, 12.44s/it] {'loss': 1.085, 
'learning_rate': 9.837174103137199e-08, 'epoch': 0.91} 91%|█████████▏| 2533/2774 [8:19:29<49:58, 12.44s/it] 91%|█████████▏| 2534/2774 [8:19:40<48:20, 12.09s/it] {'loss': 1.0244, 'learning_rate': 9.756241247384807e-08, 'epoch': 0.91} 91%|█████████▏| 2534/2774 [8:19:40<48:20, 12.09s/it] 91%|█████████▏| 2535/2774 [8:19:52<48:31, 12.18s/it] {'loss': 0.9917, 'learning_rate': 9.675636069606642e-08, 'epoch': 0.91} 91%|█████████▏| 2535/2774 [8:19:52<48:31, 12.18s/it]
/usr/local/lib/python3.9/dist-packages/PIL/TiffImagePlugin.py:850: UserWarning: Corrupt EXIF data. Expecting to read 2 bytes but only got 0. warnings.warn(str(msg))
91%|█████████▏| 2536/2774 [8:20:04<47:40, 12.02s/it] {'loss': 1.0112, 'learning_rate': 9.595358679743261e-08, 'epoch': 0.91} 91%|█████████▏| 2536/2774 [8:20:04<47:40, 12.02s/it] 91%|█████████▏| 2537/2774 [8:20:15<46:44, 11.83s/it] {'loss': 0.9985, 'learning_rate': 9.515409187288188e-08, 'epoch': 0.91} 91%|█████████▏| 2537/2774 [8:20:15<46:44, 11.83s/it] 91%|█████████▏| 2538/2774 [8:20:27<45:55, 11.68s/it] {'loss': 1.0415, 'learning_rate': 9.435787701287724e-08, 'epoch': 0.91} 91%|█████████▏| 2538/2774 [8:20:27<45:55, 11.68s/it] 92%|█████████▏| 2539/2774 [8:20:38<45:03, 11.51s/it] {'loss': 1.0122, 'learning_rate': 9.356494330340749e-08, 'epoch': 0.92} 92%|█████████▏| 2539/2774 [8:20:38<45:03, 11.51s/it] 92%|█████████▏| 2540/2774 [8:20:49<44:41, 11.46s/it] {'loss': 1.0435, 'learning_rate': 9.277529182598638e-08, 'epoch': 0.92} 92%|█████████▏| 2540/2774 [8:20:49<44:41, 11.46s/it] 92%|█████████▏| 2541/2774 [8:21:01<44:52, 11.56s/it] {'loss': 0.9966, 'learning_rate': 9.198892365765072e-08, 'epoch': 0.92} 92%|█████████▏| 2541/2774 [8:21:01<44:52, 11.56s/it] 92%|█████████▏| 2542/2774 [8:21:12<44:32, 11.52s/it] {'loss': 1.0322, 'learning_rate': 9.120583987095921e-08, 'epoch': 0.92} 92%|█████████▏| 2542/2774 [8:21:12<44:32, 11.52s/it] 92%|█████████▏| 2543/2774 [8:21:24<44:10, 11.47s/it] {'loss': 1.0215, 'learning_rate': 9.04260415339911e-08, 'epoch': 0.92}
92%|█████████▏| 2543/2774 [8:21:24<44:10, 11.47s/it] 92%|█████████▏| 2544/2774 [8:21:35<44:19, 11.56s/it] {'loss': 1.0488, 'learning_rate': 8.964952971034418e-08, 'epoch': 0.92} 92%|█████████▏| 2544/2774 [8:21:35<44:19, 11.56s/it] 92%|█████████▏| 2545/2774 [8:21:47<44:04, 11.55s/it] {'loss': 0.98, 'learning_rate': 8.887630545913323e-08, 'epoch': 0.92} 92%|█████████▏| 2545/2774 [8:21:47<44:04, 11.55s/it] 92%|█████████▏| 2546/2774 [8:21:59<44:09, 11.62s/it] {'loss': 1.0576, 'learning_rate': 8.81063698349896e-08, 'epoch': 0.92} 92%|█████████▏| 2546/2774 [8:21:59<44:09, 11.62s/it] 92%|█████████▏| 2547/2774 [8:22:10<43:37, 11.53s/it] {'loss': 0.9971, 'learning_rate': 8.733972388805911e-08, 'epoch': 0.92} 92%|█████████▏| 2547/2774 [8:22:10<43:37, 11.53s/it] 92%|█████████▏| 2548/2774 [8:22:22<43:28, 11.54s/it] {'loss': 1.0005, 'learning_rate': 8.657636866400032e-08, 'epoch': 0.92} 92%|█████████▏| 2548/2774 [8:22:22<43:28, 11.54s/it] 92%|█████████▏| 2549/2774 [8:22:33<42:59, 11.47s/it] {'loss': 1.0708, 'learning_rate': 8.581630520398398e-08, 'epoch': 0.92} 92%|█████████▏| 2549/2774 [8:22:33<42:59, 11.47s/it] 92%|█████████▏| 2550/2774 [8:22:47<45:47, 12.27s/it] {'loss': 0.9536, 'learning_rate': 8.505953454469057e-08, 'epoch': 0.92} 92%|█████████▏| 2550/2774 [8:22:47<45:47, 12.27s/it] 92%|█████████▏| 2551/2774 [8:22:59<44:46, 12.05s/it] {'loss': 1.0186, 'learning_rate': 8.430605771830941e-08, 'epoch': 0.92} 92%|█████████▏| 2551/2774 [8:22:59<44:46, 12.05s/it] 92%|█████████▏| 2552/2774 [8:23:11<45:11, 12.21s/it] {'loss': 0.9834, 'learning_rate': 8.355587575253732e-08, 'epoch': 0.92} 92%|█████████▏| 2552/2774 [8:23:11<45:11, 12.21s/it] 92%|█████████▏| 2553/2774 [8:23:22<43:59, 11.94s/it] {'loss': 0.9927, 'learning_rate': 8.280898967057805e-08, 'epoch': 0.92} 92%|█████████▏| 2553/2774 [8:23:22<43:59, 11.94s/it] 92%|█████████▏| 2554/2774 [8:23:34<43:41, 11.92s/it] {'loss': 1.0493, 'learning_rate': 8.206540049113781e-08, 'epoch': 0.92} 92%|█████████▏| 2554/2774 [8:23:34<43:41, 
11.92s/it] 92%|█████████▏| 2555/2774 [8:23:46<43:29, 11.92s/it] {'loss': 0.9897, 'learning_rate': 8.132510922842812e-08, 'epoch': 0.92} 92%|█████████▏| 2555/2774 [8:23:46<43:29, 11.92s/it] 92%|█████████▏| 2556/2774 [8:23:58<42:40, 11.74s/it] {'loss': 1.0137, 'learning_rate': 8.0588116892161e-08, 'epoch': 0.92} 92%|█████████▏| 2556/2774 [8:23:58<42:40, 11.74s/it] 92%|█████████▏| 2557/2774 [8:24:11<44:45, 12.37s/it] {'loss': 1.02, 'learning_rate': 7.985442448755015e-08, 'epoch': 0.92} 92%|█████████▏| 2557/2774 [8:24:11<44:45, 12.37s/it] 92%|█████████▏| 2558/2774 [8:24:23<43:12, 12.00s/it] {'loss': 1.0171, 'learning_rate': 7.912403301530703e-08, 'epoch': 0.92} 92%|█████████▏| 2558/2774 [8:24:23<43:12, 12.00s/it] 92%|█████████▏| 2559/2774 [8:24:34<42:31, 11.87s/it] {'loss': 1.0571, 'learning_rate': 7.839694347164223e-08, 'epoch': 0.92} 92%|█████████▏| 2559/2774 [8:24:34<42:31, 11.87s/it] 92%|█████████▏| 2560/2774 [8:24:46<42:38, 11.95s/it] {'loss': 1.0679, 'learning_rate': 7.767315684826138e-08, 'epoch': 0.92} 92%|█████████▏| 2560/2774 [8:24:46<42:38, 11.95s/it] 92%|█████████▏| 2561/2774 [8:24:58<42:11, 11.88s/it] {'loss': 1.0547, 'learning_rate': 7.695267413236562e-08, 'epoch': 0.92} 92%|█████████▏| 2561/2774 [8:24:58<42:11, 11.88s/it] 92%|█████████▏| 2562/2774 [8:25:10<42:29, 12.03s/it] {'loss': 1.0122, 'learning_rate': 7.623549630665056e-08, 'epoch': 0.92} 92%|█████████▏| 2562/2774 [8:25:10<42:29, 12.03s/it] 92%|█████████▏| 2563/2774 [8:25:25<44:52, 12.76s/it] {'loss': 0.9458, 'learning_rate': 7.552162434930288e-08, 'epoch': 0.92} 92%|█████████▏| 2563/2774 [8:25:25<44:52, 12.76s/it] 92%|█████████▏| 2564/2774 [8:25:36<43:09, 12.33s/it] {'loss': 1.0293, 'learning_rate': 7.481105923400039e-08, 'epoch': 0.92} 92%|█████████▏| 2564/2774 [8:25:36<43:09, 12.33s/it] 92%|█████████▏| 2565/2774 [8:25:48<42:03, 12.08s/it] {'loss': 0.9995, 'learning_rate': 7.410380192991202e-08, 'epoch': 0.92} 92%|█████████▏| 2565/2774 [8:25:48<42:03, 12.08s/it] 93%|█████████▎| 2566/2774 
[8:25:59<41:31, 11.98s/it] {'loss': 1.0171, 'learning_rate': 7.339985340169359e-08, 'epoch': 0.93} 93%|█████████▎| 2566/2774 [8:25:59<41:31, 11.98s/it] 93%|█████████▎| 2567/2774 [8:26:14<43:41, 12.67s/it] {'loss': 1.0171, 'learning_rate': 7.269921460948764e-08, 'epoch': 0.93} 93%|█████████▎| 2567/2774 [8:26:14<43:41, 12.67s/it] 93%|█████████▎| 2568/2774 [8:26:25<42:32, 12.39s/it] {'loss': 1.0117, 'learning_rate': 7.200188650892448e-08, 'epoch': 0.93} 93%|█████████▎| 2568/2774 [8:26:25<42:32, 12.39s/it] 93%|█████████▎| 2569/2774 [8:26:37<41:28, 12.14s/it] {'loss': 1.0059, 'learning_rate': 7.130787005111605e-08, 'epoch': 0.93} 93%|█████████▎| 2569/2774 [8:26:37<41:28, 12.14s/it] 93%|█████████▎| 2570/2774 [8:26:49<40:43, 11.98s/it] {'loss': 1.0488, 'learning_rate': 7.061716618266018e-08, 'epoch': 0.93} 93%|█████████▎| 2570/2774 [8:26:49<40:43, 11.98s/it] 93%|█████████▎| 2571/2774 [8:27:00<39:50, 11.78s/it] {'loss': 0.9976, 'learning_rate': 6.992977584563465e-08, 'epoch': 0.93} 93%|█████████▎| 2571/2774 [8:27:00<39:50, 11.78s/it] 93%|█████████▎| 2572/2774 [8:27:13<41:12, 12.24s/it] {'loss': 0.9805, 'learning_rate': 6.92456999775984e-08, 'epoch': 0.93} 93%|█████████▎| 2572/2774 [8:27:13<41:12, 12.24s/it] 93%|█████████▎| 2573/2774 [8:27:25<40:26, 12.07s/it] {'loss': 1.1016, 'learning_rate': 6.856493951158949e-08, 'epoch': 0.93} 93%|█████████▎| 2573/2774 [8:27:25<40:26, 12.07s/it] 93%|█████████▎| 2574/2774 [8:27:36<39:36, 11.88s/it] {'loss': 1.0166, 'learning_rate': 6.788749537612411e-08, 'epoch': 0.93} 93%|█████████▎| 2574/2774 [8:27:36<39:36, 11.88s/it] 93%|█████████▎| 2575/2774 [8:27:48<39:02, 11.77s/it] {'loss': 1.0098, 'learning_rate': 6.721336849519505e-08, 'epoch': 0.93} 93%|█████████▎| 2575/2774 [8:27:48<39:02, 11.77s/it] 93%|█████████▎| 2576/2774 [8:27:59<38:31, 11.68s/it] {'loss': 1.0347, 'learning_rate': 6.654255978827101e-08, 'epoch': 0.93} 93%|█████████▎| 2576/2774 [8:27:59<38:31, 11.68s/it] 93%|█████████▎| 2577/2774 [8:28:11<38:22, 11.69s/it] {'loss': 
0.9604, 'learning_rate': 6.587507017029427e-08, 'epoch': 0.93} 93%|█████████▎| 2577/2774 [8:28:11<38:22, 11.69s/it] 93%|█████████▎| 2578/2774 [8:28:23<38:06, 11.67s/it] {'loss': 0.9731, 'learning_rate': 6.521090055168044e-08, 'epoch': 0.93} 93%|█████████▎| 2578/2774 [8:28:23<38:06, 11.67s/it] 93%|█████████▎| 2579/2774 [8:28:35<39:09, 12.05s/it] {'loss': 1.0361, 'learning_rate': 6.455005183831659e-08, 'epoch': 0.93} 93%|█████████▎| 2579/2774 [8:28:35<39:09, 12.05s/it] 93%|█████████▎| 2580/2774 [8:28:47<38:14, 11.83s/it] {'loss': 1.0073, 'learning_rate': 6.389252493156084e-08, 'epoch': 0.93} 93%|█████████▎| 2580/2774 [8:28:47<38:14, 11.83s/it] 93%|█████████▎| 2581/2774 [8:28:58<37:35, 11.69s/it] {'loss': 1.0244, 'learning_rate': 6.323832072823971e-08, 'epoch': 0.93} 93%|█████████▎| 2581/2774 [8:28:58<37:35, 11.69s/it] 93%|█████████▎| 2582/2774 [8:29:10<37:27, 11.71s/it] {'loss': 0.9639, 'learning_rate': 6.258744012064833e-08, 'epoch': 0.93} 93%|█████████▎| 2582/2774 [8:29:10<37:27, 11.71s/it] 93%|█████████▎| 2583/2774 [8:29:21<37:07, 11.66s/it] {'loss': 1.0171, 'learning_rate': 6.193988399654849e-08, 'epoch': 0.93} 93%|█████████▎| 2583/2774 [8:29:21<37:07, 11.66s/it] 93%|█████████▎| 2584/2774 [8:29:33<37:05, 11.71s/it] {'loss': 0.9917, 'learning_rate': 6.129565323916814e-08, 'epoch': 0.93} 93%|█████████▎| 2584/2774 [8:29:33<37:05, 11.71s/it] 93%|█████████▎| 2585/2774 [8:29:45<37:01, 11.75s/it] {'loss': 1.103, 'learning_rate': 6.065474872719856e-08, 'epoch': 0.93} 93%|█████████▎| 2585/2774 [8:29:45<37:01, 11.75s/it] 93%|█████████▎| 2586/2774 [8:29:56<36:16, 11.58s/it] {'loss': 1.0117, 'learning_rate': 6.001717133479496e-08, 'epoch': 0.93} 93%|█████████▎| 2586/2774 [8:29:56<36:16, 11.58s/it] 93%|█████████▎| 2587/2774 [8:30:08<35:44, 11.47s/it] {'loss': 1.021, 'learning_rate': 5.938292193157419e-08, 'epoch': 0.93} 93%|█████████▎| 2587/2774 [8:30:08<35:44, 11.47s/it] 93%|█████████▎| 2588/2774 [8:30:19<35:25, 11.43s/it] {'loss': 1.0669, 'learning_rate': 
5.875200138261428e-08, 'epoch': 0.93} 93%|█████████▎| 2588/2774 [8:30:19<35:25, 11.43s/it] 93%|█████████▎| 2589/2774 [8:30:30<34:59, 11.35s/it] {'loss': 1.0122, 'learning_rate': 5.812441054845325e-08, 'epoch': 0.93} 93%|█████████▎| 2589/2774 [8:30:30<34:59, 11.35s/it] 93%|█████████▎| 2590/2774 [8:30:41<34:49, 11.35s/it] {'loss': 1.0249, 'learning_rate': 5.7500150285086376e-08, 'epoch': 0.93} 93%|█████████▎| 2590/2774 [8:30:41<34:49, 11.35s/it] 93%|█████████▎| 2591/2774 [8:30:53<34:32, 11.32s/it] {'loss': 0.979, 'learning_rate': 5.6879221443967016e-08, 'epoch': 0.93} 93%|█████████▎| 2591/2774 [8:30:53<34:32, 11.32s/it] 93%|█████████▎| 2592/2774 [8:31:04<34:27, 11.36s/it] {'loss': 0.9678, 'learning_rate': 5.626162487200465e-08, 'epoch': 0.93} 93%|█████████▎| 2592/2774 [8:31:04<34:27, 11.36s/it] 93%|█████████▎| 2593/2774 [8:31:16<34:34, 11.46s/it] {'loss': 0.9839, 'learning_rate': 5.564736141156407e-08, 'epoch': 0.93} 93%|█████████▎| 2593/2774 [8:31:16<34:34, 11.46s/it] 94%|█████████▎| 2594/2774 [8:31:27<34:10, 11.39s/it] {'loss': 1.0205, 'learning_rate': 5.503643190046315e-08, 'epoch': 0.94} 94%|█████████▎| 2594/2774 [8:31:27<34:10, 11.39s/it] 94%|█████████▎| 2595/2774 [8:31:38<34:00, 11.40s/it] {'loss': 1.042, 'learning_rate': 5.4428837171973114e-08, 'epoch': 0.94} 94%|█████████▎| 2595/2774 [8:31:38<34:00, 11.40s/it] 94%|█████████▎| 2596/2774 [8:31:50<34:07, 11.50s/it] {'loss': 0.9409, 'learning_rate': 5.382457805481606e-08, 'epoch': 0.94} 94%|█████████▎| 2596/2774 [8:31:50<34:07, 11.50s/it] 94%|█████████▎| 2597/2774 [8:32:02<34:00, 11.53s/it] {'loss': 1.0264, 'learning_rate': 5.322365537316549e-08, 'epoch': 0.94} 94%|█████████▎| 2597/2774 [8:32:02<34:00, 11.53s/it] 94%|█████████▎| 2598/2774 [8:32:13<33:56, 11.57s/it] {'loss': 1.0366, 'learning_rate': 5.2626069946643264e-08, 'epoch': 0.94} 94%|█████████▎| 2598/2774 [8:32:13<33:56, 11.57s/it] 94%|█████████▎| 2599/2774 [8:32:25<33:37, 11.53s/it] {'loss': 1.0552, 'learning_rate': 5.2031822590319636e-08, 'epoch': 0.94} 
94%|█████████▎| 2599/2774 [8:32:25<33:37, 11.53s/it] 94%|█████████▎| 2600/2774 [8:32:36<33:08, 11.43s/it] {'loss': 1.0391, 'learning_rate': 5.144091411471236e-08, 'epoch': 0.94} 94%|█████████▎| 2600/2774 [8:32:36<33:08, 11.43s/it] 94%|█████████▍| 2601/2774 [8:32:47<32:55, 11.42s/it] {'loss': 1.0376, 'learning_rate': 5.0853345325785064e-08, 'epoch': 0.94} 94%|█████████▍| 2601/2774 [8:32:47<32:55, 11.42s/it] 94%|█████████▍| 2602/2774 [8:33:00<33:17, 11.61s/it] {'loss': 1.0283, 'learning_rate': 5.026911702494558e-08, 'epoch': 0.94} 94%|█████████▍| 2602/2774 [8:33:00<33:17, 11.61s/it] 94%|█████████▍| 2603/2774 [8:33:11<33:15, 11.67s/it] {'loss': 0.9492, 'learning_rate': 4.968823000904649e-08, 'epoch': 0.94} 94%|█████████▍| 2603/2774 [8:33:11<33:15, 11.67s/it] 94%|█████████▍| 2604/2774 [8:33:23<33:12, 11.72s/it] {'loss': 1.0137, 'learning_rate': 4.911068507038236e-08, 'epoch': 0.94} 94%|█████████▍| 2604/2774 [8:33:23<33:12, 11.72s/it] 94%|█████████▍| 2605/2774 [8:33:35<32:49, 11.66s/it] {'loss': 1.0508, 'learning_rate': 4.8536482996690004e-08, 'epoch': 0.94} 94%|█████████▍| 2605/2774 [8:33:35<32:49, 11.66s/it] 94%|█████████▍| 2606/2774 [8:33:46<32:33, 11.63s/it] {'loss': 1.0205, 'learning_rate': 4.796562457114573e-08, 'epoch': 0.94} 94%|█████████▍| 2606/2774 [8:33:46<32:33, 11.63s/it] 94%|█████████▍| 2607/2774 [8:34:00<33:49, 12.15s/it] {'loss': 1.0347, 'learning_rate': 4.739811057236615e-08, 'epoch': 0.94} 94%|█████████▍| 2607/2774 [8:34:00<33:49, 12.15s/it] 94%|█████████▍| 2608/2774 [8:34:12<33:29, 12.11s/it] {'loss': 1.0107, 'learning_rate': 4.6833941774406535e-08, 'epoch': 0.94} 94%|█████████▍| 2608/2774 [8:34:12<33:29, 12.11s/it] 94%|█████████▍| 2609/2774 [8:34:23<32:34, 11.85s/it] {'loss': 1.0088, 'learning_rate': 4.627311894675857e-08, 'epoch': 0.94} 94%|█████████▍| 2609/2774 [8:34:23<32:34, 11.85s/it] 94%|█████████▍| 2610/2774 [8:34:35<32:15, 11.80s/it] {'loss': 0.9668, 'learning_rate': 4.5715642854350374e-08, 'epoch': 0.94} 94%|█████████▍| 2610/2774 
[8:34:35<32:15, 11.80s/it] 94%|█████████▍| 2611/2774 [8:34:46<31:48, 11.71s/it] {'loss': 1.0352, 'learning_rate': 4.5161514257546504e-08, 'epoch': 0.94} 94%|█████████▍| 2611/2774 [8:34:46<31:48, 11.71s/it] 94%|█████████▍| 2612/2774 [8:34:57<31:16, 11.59s/it] {'loss': 1.0474, 'learning_rate': 4.46107339121446e-08, 'epoch': 0.94} 94%|█████████▍| 2612/2774 [8:34:57<31:16, 11.59s/it] 94%|█████████▍| 2613/2774 [8:35:09<30:53, 11.51s/it] {'loss': 1.0366, 'learning_rate': 4.406330256937541e-08, 'epoch': 0.94} 94%|█████████▍| 2613/2774 [8:35:09<30:53, 11.51s/it] 94%|█████████▍| 2614/2774 [8:35:20<30:51, 11.57s/it] {'loss': 1.0381, 'learning_rate': 4.3519220975902775e-08, 'epoch': 0.94} 94%|█████████▍| 2614/2774 [8:35:20<30:51, 11.57s/it] 94%|█████████▍| 2615/2774 [8:35:32<30:55, 11.67s/it] {'loss': 1.0034, 'learning_rate': 4.297848987382031e-08, 'epoch': 0.94} 94%|█████████▍| 2615/2774 [8:35:32<30:55, 11.67s/it] 94%|█████████▍| 2616/2774 [8:35:45<31:21, 11.91s/it] {'loss': 1.0098, 'learning_rate': 4.2441110000653596e-08, 'epoch': 0.94} 94%|█████████▍| 2616/2774 [8:35:45<31:21, 11.91s/it] 94%|█████████▍| 2617/2774 [8:35:56<30:40, 11.72s/it] {'loss': 1.0288, 'learning_rate': 4.190708208935579e-08, 'epoch': 0.94} 94%|█████████▍| 2617/2774 [8:35:56<30:40, 11.72s/it] 94%|█████████▍| 2618/2774 [8:36:08<30:23, 11.69s/it] {'loss': 1.0146, 'learning_rate': 4.1376406868308684e-08, 'epoch': 0.94} 94%|█████████▍| 2618/2774 [8:36:08<30:23, 11.69s/it] 94%|█████████▍| 2619/2774 [8:36:21<31:40, 12.26s/it] {'loss': 0.9351, 'learning_rate': 4.084908506132107e-08, 'epoch': 0.94} 94%|█████████▍| 2619/2774 [8:36:21<31:40, 12.26s/it] 94%|█████████▍| 2620/2774 [8:36:33<30:55, 12.05s/it] {'loss': 0.9795, 'learning_rate': 4.0325117387628455e-08, 'epoch': 0.94} 94%|█████████▍| 2620/2774 [8:36:33<30:55, 12.05s/it] 94%|█████████▍| 2621/2774 [8:36:44<30:09, 11.82s/it] {'loss': 1.0723, 'learning_rate': 3.9804504561890554e-08, 'epoch': 0.94} 94%|█████████▍| 2621/2774 [8:36:44<30:09, 11.82s/it] 
95%|█████████▍| 2622/2774 [8:36:55<29:35, 11.68s/it] {'loss': 1.0889, 'learning_rate': 3.928724729419242e-08, 'epoch': 0.95} 95%|█████████▍| 2622/2774 [8:36:55<29:35, 11.68s/it] 95%|█████████▍| 2623/2774 [8:37:07<29:11, 11.60s/it] {'loss': 1.0122, 'learning_rate': 3.877334629004109e-08, 'epoch': 0.95} 95%|█████████▍| 2623/2774 [8:37:07<29:11, 11.60s/it] 95%|█████████▍| 2624/2774 [8:37:18<28:47, 11.52s/it] {'loss': 1.0381, 'learning_rate': 3.826280225036727e-08, 'epoch': 0.95} 95%|█████████▍| 2624/2774 [8:37:18<28:47, 11.52s/it] 95%|█████████▍| 2625/2774 [8:37:30<28:54, 11.64s/it] {'loss': 0.98, 'learning_rate': 3.7755615871521434e-08, 'epoch': 0.95} 95%|█████████▍| 2625/2774 [8:37:30<28:54, 11.64s/it] 95%|█████████▍| 2626/2774 [8:37:42<28:37, 11.60s/it] {'loss': 1.0737, 'learning_rate': 3.725178784527578e-08, 'epoch': 0.95} 95%|█████████▍| 2626/2774 [8:37:42<28:37, 11.60s/it] 95%|█████████▍| 2627/2774 [8:37:53<28:33, 11.65s/it] {'loss': 1.0532, 'learning_rate': 3.6751318858820885e-08, 'epoch': 0.95} 95%|█████████▍| 2627/2774 [8:37:53<28:33, 11.65s/it] 95%|█████████▍| 2628/2774 [8:38:05<28:23, 11.67s/it] {'loss': 1.0146, 'learning_rate': 3.625420959476628e-08, 'epoch': 0.95} 95%|█████████▍| 2628/2774 [8:38:05<28:23, 11.67s/it] 95%|█████████▍| 2629/2774 [8:38:16<27:51, 11.53s/it] {'loss': 1.0215, 'learning_rate': 3.576046073113903e-08, 'epoch': 0.95} 95%|█████████▍| 2629/2774 [8:38:16<27:51, 11.53s/it] 95%|█████████▍| 2630/2774 [8:38:28<27:44, 11.56s/it] {'loss': 1.1143, 'learning_rate': 3.5270072941382684e-08, 'epoch': 0.95} 95%|█████████▍| 2630/2774 [8:38:28<27:44, 11.56s/it] 95%|█████████▍| 2631/2774 [8:38:40<27:33, 11.56s/it] {'loss': 1.0278, 'learning_rate': 3.4783046894356906e-08, 'epoch': 0.95} 95%|█████████▍| 2631/2774 [8:38:40<27:33, 11.56s/it] 95%|█████████▍| 2632/2774 [8:38:51<27:11, 11.49s/it] {'loss': 1.0132, 'learning_rate': 3.429938325433507e-08, 'epoch': 0.95} 95%|█████████▍| 2632/2774 [8:38:51<27:11, 11.49s/it] 95%|█████████▍| 2633/2774 
[8:39:02<27:05, 11.53s/it] {'loss': 1.0474, 'learning_rate': 3.3819082681006145e-08, 'epoch': 0.95} 95%|█████████▍| 2633/2774 [8:39:02<27:05, 11.53s/it] 95%|█████████▍| 2634/2774 [8:39:14<27:00, 11.57s/it] {'loss': 1.0225, 'learning_rate': 3.334214582946998e-08, 'epoch': 0.95} 95%|█████████▍| 2634/2774 [8:39:14<27:00, 11.57s/it] 95%|█████████▍| 2635/2774 [8:39:25<26:40, 11.51s/it] {'loss': 1.0254, 'learning_rate': 3.2868573350240687e-08, 'epoch': 0.95} 95%|█████████▍| 2635/2774 [8:39:25<26:40, 11.51s/it] 95%|█████████▌| 2636/2774 [8:39:37<26:27, 11.50s/it] {'loss': 1.0649, 'learning_rate': 3.239836588924211e-08, 'epoch': 0.95} 95%|█████████▌| 2636/2774 [8:39:37<26:27, 11.50s/it] 95%|█████████▌| 2637/2774 [8:39:49<26:22, 11.55s/it] {'loss': 1.0508, 'learning_rate': 3.1931524087808476e-08, 'epoch': 0.95} 95%|█████████▌| 2637/2774 [8:39:49<26:22, 11.55s/it] 95%|█████████▌| 2638/2774 [8:40:01<26:31, 11.70s/it] {'loss': 0.9888, 'learning_rate': 3.146804858268404e-08, 'epoch': 0.95} 95%|█████████▌| 2638/2774 [8:40:01<26:31, 11.70s/it] 95%|█████████▌| 2639/2774 [8:40:12<26:06, 11.60s/it] {'loss': 1.0625, 'learning_rate': 3.100794000602175e-08, 'epoch': 0.95} 95%|█████████▌| 2639/2774 [8:40:12<26:06, 11.60s/it] 95%|█████████▌| 2640/2774 [8:40:25<26:59, 12.08s/it] {'loss': 1.0308, 'learning_rate': 3.0551198985381284e-08, 'epoch': 0.95} 95%|█████████▌| 2640/2774 [8:40:25<26:59, 12.08s/it] 95%|█████████▌| 2641/2774 [8:40:37<26:19, 11.87s/it] {'loss': 1.0068, 'learning_rate': 3.0097826143730414e-08, 'epoch': 0.95} 95%|█████████▌| 2641/2774 [8:40:37<26:19, 11.87s/it] 95%|█████████▌| 2642/2774 [8:40:48<26:00, 11.82s/it] {'loss': 1.043, 'learning_rate': 2.9647822099442004e-08, 'epoch': 0.95} 95%|█████████▌| 2642/2774 [8:40:48<26:00, 11.82s/it] 95%|█████████▌| 2643/2774 [8:41:00<25:39, 11.75s/it] {'loss': 1.0122, 'learning_rate': 2.9201187466294246e-08, 'epoch': 0.95} 95%|█████████▌| 2643/2774 [8:41:00<25:39, 11.75s/it] 95%|█████████▌| 2644/2774 [8:41:11<25:19, 11.69s/it] {'loss': 
1.0146, 'learning_rate': 2.8757922853470123e-08, 'epoch': 0.95} 95%|█████████▌| 2644/2774 [8:41:11<25:19, 11.69s/it] 95%|█████████▌| 2645/2774 [8:41:23<24:53, 11.58s/it] {'loss': 0.9844, 'learning_rate': 2.8318028865555736e-08, 'epoch': 0.95} 95%|█████████▌| 2645/2774 [8:41:23<24:53, 11.58s/it] 95%|█████████▌| 2646/2774 [8:41:34<24:36, 11.53s/it] {'loss': 1.0308, 'learning_rate': 2.788150610253948e-08, 'epoch': 0.95} 95%|█████████▌| 2646/2774 [8:41:34<24:36, 11.53s/it] 95%|█████████▌| 2647/2774 [8:41:46<24:25, 11.54s/it] {'loss': 1.0068, 'learning_rate': 2.7448355159812313e-08, 'epoch': 0.95} 95%|█████████▌| 2647/2774 [8:41:46<24:25, 11.54s/it] 95%|█████████▌| 2648/2774 [8:41:57<24:02, 11.45s/it] {'loss': 1.0127, 'learning_rate': 2.7018576628166095e-08, 'epoch': 0.95} 95%|█████████▌| 2648/2774 [8:41:57<24:02, 11.45s/it] 95%|█████████▌| 2649/2774 [8:42:08<23:50, 11.44s/it] {'loss': 0.9707, 'learning_rate': 2.659217109379275e-08, 'epoch': 0.95} 95%|█████████▌| 2649/2774 [8:42:08<23:50, 11.44s/it] 96%|█████████▌| 2650/2774 [8:42:20<23:38, 11.44s/it] {'loss': 1.0376, 'learning_rate': 2.616913913828373e-08, 'epoch': 0.96} 96%|█████████▌| 2650/2774 [8:42:20<23:38, 11.44s/it] 96%|█████████▌| 2651/2774 [8:42:31<23:21, 11.39s/it] {'loss': 1.0474, 'learning_rate': 2.574948133862887e-08, 'epoch': 0.96} 96%|█████████▌| 2651/2774 [8:42:31<23:21, 11.39s/it] 96%|█████████▌| 2652/2774 [8:42:42<23:07, 11.38s/it] {'loss': 0.9883, 'learning_rate': 2.5333198267215862e-08, 'epoch': 0.96} 96%|█████████▌| 2652/2774 [8:42:42<23:07, 11.38s/it] 96%|█████████▌| 2653/2774 [8:42:54<23:02, 11.42s/it] {'loss': 1.0703, 'learning_rate': 2.4920290491830257e-08, 'epoch': 0.96} 96%|█████████▌| 2653/2774 [8:42:54<23:02, 11.42s/it] 96%|█████████▌| 2654/2774 [8:43:06<22:58, 11.49s/it] {'loss': 1.0127, 'learning_rate': 2.4510758575652937e-08, 'epoch': 0.96} 96%|█████████▌| 2654/2774 [8:43:06<22:58, 11.49s/it] 96%|█████████▌| 2655/2774 [8:43:18<23:32, 11.87s/it] {'loss': 1.0249, 'learning_rate': 
2.4104603077260703e-08, 'epoch': 0.96} 96%|█████████▌| 2655/2774 [8:43:18<23:32, 11.87s/it] 96%|█████████▌| 2656/2774 [8:43:30<23:25, 11.91s/it] {'loss': 1.0234, 'learning_rate': 2.3701824550624864e-08, 'epoch': 0.96} 96%|█████████▌| 2656/2774 [8:43:30<23:25, 11.91s/it] 96%|█████████▌| 2657/2774 [8:43:44<24:22, 12.50s/it] {'loss': 1.0039, 'learning_rate': 2.3302423545111807e-08, 'epoch': 0.96} 96%|█████████▌| 2657/2774 [8:43:44<24:22, 12.50s/it] 96%|█████████▌| 2658/2774 [8:43:56<23:33, 12.19s/it] {'loss': 1.0308, 'learning_rate': 2.2906400605479663e-08, 'epoch': 0.96} 96%|█████████▌| 2658/2774 [8:43:56<23:33, 12.19s/it] 96%|█████████▌| 2659/2774 [8:44:07<22:58, 11.98s/it] {'loss': 1.0884, 'learning_rate': 2.251375627187996e-08, 'epoch': 0.96} 96%|█████████▌| 2659/2774 [8:44:07<22:58, 11.98s/it] 96%|█████████▌| 2660/2774 [8:44:18<22:19, 11.75s/it] {'loss': 1.0508, 'learning_rate': 2.212449107985598e-08, 'epoch': 0.96} 96%|█████████▌| 2660/2774 [8:44:18<22:19, 11.75s/it] 96%|█████████▌| 2661/2774 [8:44:31<22:33, 11.98s/it] {'loss': 0.9922, 'learning_rate': 2.173860556034163e-08, 'epoch': 0.96} 96%|█████████▌| 2661/2774 [8:44:31<22:33, 11.98s/it] 96%|█████████▌| 2662/2774 [8:44:42<21:55, 11.74s/it] {'loss': 1.0791, 'learning_rate': 2.1356100239662002e-08, 'epoch': 0.96} 96%|█████████▌| 2662/2774 [8:44:42<21:55, 11.74s/it] 96%|█████████▌| 2663/2774 [8:44:54<21:47, 11.78s/it] {'loss': 0.9834, 'learning_rate': 2.0976975639530606e-08, 'epoch': 0.96} 96%|█████████▌| 2663/2774 [8:44:54<21:47, 11.78s/it] 96%|█████████▌| 2664/2774 [8:45:06<21:26, 11.70s/it] {'loss': 0.9858, 'learning_rate': 2.060123227705102e-08, 'epoch': 0.96} 96%|█████████▌| 2664/2774 [8:45:06<21:26, 11.70s/it] 96%|█████████▌| 2665/2774 [8:45:17<21:02, 11.59s/it] {'loss': 1.0151, 'learning_rate': 2.0228870664714128e-08, 'epoch': 0.96} 96%|█████████▌| 2665/2774 [8:45:17<21:02, 11.59s/it] 96%|█████████▌| 2666/2774 [8:45:28<20:41, 11.49s/it] {'loss': 1.04, 'learning_rate': 1.9859891310398948e-08, 'epoch': 
0.96} 96%|█████████▌| 2666/2774 [8:45:28<20:41, 11.49s/it] 96%|█████████▌| 2667/2774 [8:45:40<20:26, 11.46s/it] {'loss': 1.0615, 'learning_rate': 1.9494294717370964e-08, 'epoch': 0.96} 96%|█████████▌| 2667/2774 [8:45:40<20:26, 11.46s/it] 96%|█████████▌| 2668/2774 [8:45:51<20:18, 11.50s/it] {'loss': 1.0225, 'learning_rate': 1.9132081384281575e-08, 'epoch': 0.96} 96%|█████████▌| 2668/2774 [8:45:51<20:18, 11.50s/it] 96%|█████████▌| 2669/2774 [8:46:03<20:07, 11.50s/it] {'loss': 1.0356, 'learning_rate': 1.8773251805168092e-08, 'epoch': 0.96} 96%|█████████▌| 2669/2774 [8:46:03<20:07, 11.50s/it] 96%|█████████▋| 2670/2774 [8:46:14<19:55, 11.50s/it] {'loss': 1.0586, 'learning_rate': 1.8417806469452626e-08, 'epoch': 0.96} 96%|█████████▋| 2670/2774 [8:46:14<19:55, 11.50s/it] 96%|█████████▋| 2671/2774 [8:46:26<19:41, 11.47s/it] {'loss': 1.0811, 'learning_rate': 1.806574586194071e-08, 'epoch': 0.96} 96%|█████████▋| 2671/2774 [8:46:26<19:41, 11.47s/it] 96%|█████████▋| 2672/2774 [8:46:38<19:58, 11.75s/it] {'loss': 0.9658, 'learning_rate': 1.7717070462822116e-08, 'epoch': 0.96} 96%|█████████▋| 2672/2774 [8:46:38<19:58, 11.75s/it] 96%|█████████▋| 2673/2774 [8:46:50<19:41, 11.70s/it] {'loss': 1.0117, 'learning_rate': 1.7371780747668655e-08, 'epoch': 0.96} 96%|█████████▋| 2673/2774 [8:46:50<19:41, 11.70s/it] 96%|█████████▋| 2674/2774 [8:47:01<19:31, 11.72s/it] {'loss': 1.1084, 'learning_rate': 1.7029877187434986e-08, 'epoch': 0.96} 96%|█████████▋| 2674/2774 [8:47:01<19:31, 11.72s/it] 96%|█████████▋| 2675/2774 [8:47:13<19:11, 11.63s/it] {'loss': 1.0889, 'learning_rate': 1.6691360248456412e-08, 'epoch': 0.96} 96%|█████████▋| 2675/2774 [8:47:13<19:11, 11.63s/it] 96%|█████████▋| 2676/2774 [8:47:26<19:57, 12.22s/it] {'loss': 0.9692, 'learning_rate': 1.6356230392450268e-08, 'epoch': 0.96} 96%|█████████▋| 2676/2774 [8:47:26<19:57, 12.22s/it] 97%|█████████▋| 2677/2774 [8:47:37<19:14, 11.90s/it] {'loss': 1.022, 'learning_rate': 1.6024488076512855e-08, 'epoch': 0.97} 97%|█████████▋| 2677/2774 
[8:47:37<19:14, 11.90s/it] 97%|█████████▋| 2678/2774 [8:47:49<18:42, 11.70s/it] {'loss': 0.9805, 'learning_rate': 1.5696133753121124e-08, 'epoch': 0.97} 97%|█████████▋| 2678/2774 [8:47:49<18:42, 11.70s/it] 97%|█████████▋| 2679/2774 [8:48:00<18:20, 11.59s/it] {'loss': 1.0249, 'learning_rate': 1.5371167870130433e-08, 'epoch': 0.97} 97%|█████████▋| 2679/2774 [8:48:00<18:20, 11.59s/it] 97%|█████████▋| 2680/2774 [8:48:11<18:06, 11.56s/it] {'loss': 1.0605, 'learning_rate': 1.504959087077429e-08, 'epoch': 0.97} 97%|█████████▋| 2680/2774 [8:48:11<18:06, 11.56s/it] 97%|█████████▋| 2681/2774 [8:48:23<17:59, 11.61s/it] {'loss': 1.0854, 'learning_rate': 1.473140319366434e-08, 'epoch': 0.97} 97%|█████████▋| 2681/2774 [8:48:23<17:59, 11.61s/it] 97%|█████████▋| 2682/2774 [8:48:35<17:39, 11.52s/it] {'loss': 0.998, 'learning_rate': 1.4416605272789819e-08, 'epoch': 0.97} 97%|█████████▋| 2682/2774 [8:48:35<17:39, 11.52s/it] 97%|█████████▋| 2683/2774 [8:48:46<17:38, 11.63s/it] {'loss': 1.0264, 'learning_rate': 1.4105197537515602e-08, 'epoch': 0.97} 97%|█████████▋| 2683/2774 [8:48:46<17:38, 11.63s/it] 97%|█████████▋| 2684/2774 [8:48:58<17:17, 11.53s/it] {'loss': 0.9487, 'learning_rate': 1.3797180412583322e-08, 'epoch': 0.97} 97%|█████████▋| 2684/2774 [8:48:58<17:17, 11.53s/it] 97%|█████████▋| 2685/2774 [8:49:11<18:01, 12.15s/it] {'loss': 1.0137, 'learning_rate': 1.349255431810942e-08, 'epoch': 0.97} 97%|█████████▋| 2685/2774 [8:49:11<18:01, 12.15s/it] 97%|█████████▋| 2686/2774 [8:49:24<18:11, 12.40s/it] {'loss': 0.9824, 'learning_rate': 1.31913196695857e-08, 'epoch': 0.97} 97%|█████████▋| 2686/2774 [8:49:24<18:11, 12.40s/it] 97%|█████████▋| 2687/2774 [8:49:36<17:33, 12.11s/it] {'loss': 1.0454, 'learning_rate': 1.289347687787823e-08, 'epoch': 0.97} 97%|█████████▋| 2687/2774 [8:49:36<17:33, 12.11s/it] 97%|█████████▋| 2688/2774 [8:49:47<16:57, 11.84s/it] {'loss': 1.0156, 'learning_rate': 1.259902634922594e-08, 'epoch': 0.97} 97%|█████████▋| 2688/2774 [8:49:47<16:57, 11.84s/it] 
97%|█████████▋| 2689/2774 [8:49:59<16:40, 11.77s/it] {'loss': 1.0571, 'learning_rate': 1.2307968485242572e-08, 'epoch': 0.97}
97%|█████████▋| 2690/2774 [8:50:10<16:14, 11.60s/it] {'loss': 0.9727, 'learning_rate': 1.2020303682912237e-08, 'epoch': 0.97}
97%|█████████▋| 2691/2774 [8:50:22<16:09, 11.68s/it] {'loss': 1.0508, 'learning_rate': 1.1736032334593306e-08, 'epoch': 0.97}
97%|█████████▋| 2692/2774 [8:50:35<16:37, 12.17s/it] {'loss': 1.0239, 'learning_rate': 1.1455154828014515e-08, 'epoch': 0.97}
97%|█████████▋| 2693/2774 [8:50:46<16:05, 11.92s/it] {'loss': 0.979, 'learning_rate': 1.1177671546275526e-08, 'epoch': 0.97}
97%|█████████▋| 2694/2774 [8:50:58<15:44, 11.80s/it] {'loss': 1.0034, 'learning_rate': 1.090358286784693e-08, 'epoch': 0.97}
97%|█████████▋| 2695/2774 [8:51:10<15:34, 11.83s/it] {'loss': 1.1143, 'learning_rate': 1.0632889166569128e-08, 'epoch': 0.97}
97%|█████████▋| 2696/2774 [8:51:21<15:16, 11.76s/it] {'loss': 0.9917, 'learning_rate': 1.036559081165206e-08, 'epoch': 0.97}
97%|█████████▋| 2697/2774 [8:51:34<15:38, 12.19s/it] {'loss': 0.9907, 'learning_rate': 1.0101688167674372e-08, 'epoch': 0.97}
97%|█████████▋| 2698/2774 [8:51:46<15:11, 11.99s/it] {'loss': 1.0381, 'learning_rate': 9.841181594583693e-09, 'epoch': 0.97}
97%|█████████▋| 2699/2774 [8:51:57<14:41, 11.75s/it] {'loss': 1.0747, 'learning_rate': 9.584071447694688e-09, 'epoch': 0.97}
97%|█████████▋| 2700/2774 [8:52:10<14:59, 12.16s/it] {'loss': 0.9819, 'learning_rate': 9.330358077690449e-09, 'epoch': 0.97}
97%|█████████▋| 2701/2774 [8:52:22<14:28, 11.90s/it] {'loss': 1.0171, 'learning_rate': 9.080041830620834e-09, 'epoch': 0.97}
97%|█████████▋| 2702/2774 [8:52:34<14:32, 12.12s/it] {'loss': 1.0112, 'learning_rate': 8.833123047901626e-09, 'epoch': 0.97}
97%|█████████▋| 2703/2774 [8:52:46<14:06, 11.93s/it] {'loss': 1.0132, 'learning_rate': 8.589602066315372e-09, 'epoch': 0.97}
97%|█████████▋| 2704/2774 [8:52:57<13:47, 11.82s/it] {'loss': 1.0112, 'learning_rate': 8.349479218009993e-09, 'epoch': 0.97}
98%|█████████▊| 2705/2774 [8:53:09<13:43, 11.94s/it] {'loss': 1.0278, 'learning_rate': 8.112754830498504e-09, 'epoch': 0.98}
98%|█████████▊| 2706/2774 [8:53:22<13:35, 12.00s/it] {'loss': 1.0117, 'learning_rate': 7.879429226658741e-09, 'epoch': 0.98}
98%|█████████▊| 2707/2774 [8:53:34<13:40, 12.25s/it] {'loss': 1.0044, 'learning_rate': 7.649502724732528e-09, 'epoch': 0.98}
98%|█████████▊| 2708/2774 [8:53:46<13:15, 12.05s/it] {'loss': 1.0103, 'learning_rate': 7.4229756383259465e-09, 'epoch': 0.98}
98%|█████████▊| 2709/2774 [8:53:58<12:52, 11.88s/it] {'loss': 1.0322, 'learning_rate': 7.1998482764082386e-09, 'epoch': 0.98}
98%|█████████▊| 2710/2774 [8:54:09<12:28, 11.70s/it] {'loss': 1.063, 'learning_rate': 6.980120943311519e-09, 'epoch': 0.98}
98%|█████████▊| 2711/2774 [8:54:21<12:22, 11.79s/it] {'loss': 1.0371, 'learning_rate': 6.763793938730778e-09, 'epoch': 0.98}
98%|█████████▊| 2712/2774 [8:54:32<11:59, 11.61s/it] {'loss': 1.0015, 'learning_rate': 6.5508675577227735e-09, 'epoch': 0.98}
98%|█████████▊| 2713/2774 [8:54:46<12:26, 12.24s/it] {'loss': 0.9917, 'learning_rate': 6.341342090706304e-09, 'epoch': 0.98}
98%|█████████▊| 2714/2774 [8:54:57<12:03, 12.06s/it] {'loss': 0.9702, 'learning_rate': 6.1352178234613816e-09, 'epoch': 0.98}
98%|█████████▊| 2715/2774 [8:55:09<11:40, 11.87s/it] {'loss': 1.0659, 'learning_rate': 5.9324950371292264e-09, 'epoch': 0.98}
98%|█████████▊| 2716/2774 [8:55:20<11:18, 11.69s/it] {'loss': 1.0391, 'learning_rate': 5.733174008211717e-09, 'epoch': 0.98}
98%|█████████▊| 2717/2774 [8:55:31<10:58, 11.56s/it] {'loss': 0.9941, 'learning_rate': 5.537255008569997e-09, 'epoch': 0.98}
98%|█████████▊| 2718/2774 [8:55:45<11:19, 12.13s/it] {'loss': 1.0166, 'learning_rate': 5.3447383054261445e-09, 'epoch': 0.98}
98%|█████████▊| 2719/2774 [8:55:56<10:57, 11.95s/it] {'loss': 0.9854, 'learning_rate': 5.155624161361505e-09, 'epoch': 0.98}
98%|█████████▊| 2720/2774 [8:56:08<10:44, 11.94s/it] {'loss': 0.9702, 'learning_rate': 4.96991283431586e-09, 'epoch': 0.98}
98%|█████████▊| 2721/2774 [8:56:20<10:23, 11.76s/it] {'loss': 1.0581, 'learning_rate': 4.787604577588534e-09, 'epoch': 0.98}
98%|█████████▊| 2722/2774 [8:56:34<10:47, 12.45s/it] {'loss': 0.9575, 'learning_rate': 4.608699639837288e-09, 'epoch': 0.98}
98%|█████████▊| 2723/2774 [8:56:45<10:17, 12.11s/it] {'loss': 0.9707, 'learning_rate': 4.433198265076932e-09, 'epoch': 0.98}
98%|█████████▊| 2724/2774 [8:56:57<09:58, 11.96s/it] {'loss': 1.0518, 'learning_rate': 4.261100692681264e-09, 'epoch': 0.98}
98%|█████████▊| 2725/2774 [8:57:08<09:38, 11.81s/it] {'loss': 1.0269, 'learning_rate': 4.092407157380851e-09, 'epoch': 0.98}
98%|█████████▊| 2726/2774 [8:57:19<09:22, 11.72s/it] {'loss': 0.9761, 'learning_rate': 3.9271178892635875e-09, 'epoch': 0.98}
98%|█████████▊| 2727/2774 [8:57:32<09:21, 11.94s/it] {'loss': 1.0869, 'learning_rate': 3.765233113773858e-09, 'epoch': 0.98}
98%|█████████▊| 2728/2774 [8:57:43<08:59, 11.73s/it] {'loss': 0.9951, 'learning_rate': 3.6067530517128192e-09, 'epoch': 0.98}
98%|█████████▊| 2729/2774 [8:57:54<08:40, 11.57s/it] {'loss': 1.0156, 'learning_rate': 3.4516779192375616e-09, 'epoch': 0.98}
98%|█████████▊| 2730/2774 [8:58:06<08:26, 11.51s/it] {'loss': 0.9619, 'learning_rate': 3.3000079278611154e-09, 'epoch': 0.98}
98%|█████████▊| 2731/2774 [8:58:17<08:17, 11.56s/it] {'loss': 1.0068, 'learning_rate': 3.151743284452724e-09, 'epoch': 0.98}
98%|█████████▊| 2732/2774 [8:58:29<08:05, 11.56s/it] {'loss': 1.0352, 'learning_rate': 3.0068841912359035e-09, 'epoch': 0.98}
99%|█████████▊| 2733/2774 [8:58:40<07:52, 11.53s/it] {'loss': 1.0337, 'learning_rate': 2.865430845790107e-09, 'epoch': 0.99}
99%|█████████▊| 2734/2774 [8:58:52<07:36, 11.41s/it] {'loss': 0.9585, 'learning_rate': 2.7273834410485033e-09, 'epoch': 0.99}
99%|█████████▊| 2735/2774 [8:59:04<07:33, 11.63s/it] {'loss': 1.0376, 'learning_rate': 2.5927421653001995e-09, 'epoch': 0.99}
99%|█████████▊| 2736/2774 [8:59:15<07:20, 11.59s/it] {'loss': 1.0566, 'learning_rate': 2.4615072021871855e-09, 'epoch': 0.99}
99%|█████████▊| 2737/2774 [8:59:27<07:07, 11.55s/it] {'loss': 1.0405, 'learning_rate': 2.333678730706279e-09, 'epoch': 0.99}
99%|█████████▊| 2738/2774 [8:59:38<06:52, 11.47s/it] {'loss': 1.0356, 'learning_rate': 2.2092569252077366e-09, 'epoch': 0.99}
99%|█████████▊| 2739/2774 [8:59:51<06:57, 11.94s/it] {'loss': 1.0327, 'learning_rate': 2.0882419553952537e-09, 'epoch': 0.99}
99%|█████████▉| 2740/2774 [9:00:03<06:41, 11.82s/it] {'loss': 1.0479, 'learning_rate': 1.9706339863262424e-09, 'epoch': 0.99}
99%|█████████▉| 2741/2774 [9:00:14<06:29, 11.79s/it] {'loss': 1.0444, 'learning_rate': 1.8564331784107214e-09, 'epoch': 0.99}
99%|█████████▉| 2742/2774 [9:00:28<06:33, 12.29s/it] {'loss': 0.9517, 'learning_rate': 1.7456396874115933e-09, 'epoch': 0.99}
99%|█████████▉| 2743/2774 [9:00:39<06:12, 12.02s/it] {'loss': 1.0547, 'learning_rate': 1.6382536644446445e-09, 'epoch': 0.99}
99%|█████████▉| 2744/2774 [9:00:51<05:55, 11.84s/it] {'loss': 1.0366, 'learning_rate': 1.534275255977713e-09, 'epoch': 0.99}
99%|█████████▉| 2745/2774 [9:01:02<05:39, 11.70s/it] {'loss': 1.0293, 'learning_rate': 1.433704603831243e-09, 'epoch': 0.99}
99%|█████████▉| 2746/2774 [9:01:14<05:30, 11.79s/it] {'loss': 1.0122, 'learning_rate': 1.3365418451774526e-09, 'epoch': 0.99}
99%|█████████▉| 2747/2774 [9:01:25<05:16, 11.73s/it] {'loss': 1.1206, 'learning_rate': 1.2427871125403334e-09, 'epoch': 0.99}
99%|█████████▉| 2748/2774 [9:01:37<05:01, 11.58s/it] {'loss': 1.0068, 'learning_rate': 1.1524405337962063e-09, 'epoch': 0.99}
99%|█████████▉| 2749/2774 [9:01:48<04:49, 11.58s/it] {'loss': 1.0703, 'learning_rate': 1.065502232171778e-09, 'epoch': 0.99}
99%|█████████▉| 2750/2774 [9:02:00<04:35, 11.50s/it] {'loss': 1.0166, 'learning_rate': 9.819723262458057e-10, 'epoch': 0.99}
99%|█████████▉| 2751/2774 [9:02:11<04:23, 11.45s/it] {'loss': 1.0249, 'learning_rate': 9.018509299482669e-10, 'epoch': 0.99}
99%|█████████▉| 2752/2774 [9:02:22<04:12, 11.48s/it] {'loss': 1.0063, 'learning_rate': 8.251381525595237e-10, 'epoch': 0.99}
99%|█████████▉| 2753/2774 [9:02:34<04:00, 11.46s/it] {'loss': 1.0864, 'learning_rate': 7.518340987114347e-10, 'epoch': 0.99}
99%|█████████▉| 2754/2774 [9:02:45<03:47, 11.39s/it] {'loss': 0.9976, 'learning_rate': 6.819388683862449e-10, 'epoch': 0.99}
99%|█████████▉| 2755/2774 [9:02:56<03:36, 11.38s/it] {'loss': 1.0342, 'learning_rate': 6.154525569168623e-10, 'epoch': 0.99}
99%|█████████▉| 2756/2774 [9:03:08<03:26, 11.49s/it] {'loss': 1.0415, 'learning_rate': 5.523752549863037e-10, 'epoch': 0.99}
99%|█████████▉| 2757/2774 [9:03:20<03:15, 11.50s/it] {'loss': 1.0205, 'learning_rate': 4.927070486288043e-10, 'epoch': 0.99}
99%|█████████▉| 2758/2774 [9:03:31<03:03, 11.50s/it] {'loss': 1.0024, 'learning_rate': 4.364480192275977e-10, 'epoch': 0.99}
99%|█████████▉| 2759/2774 [9:03:43<02:54, 11.65s/it] {'loss': 1.0308, 'learning_rate': 3.835982435168584e-10, 'epoch': 0.99}
99%|█████████▉| 2760/2774 [9:03:54<02:40, 11.50s/it] {'loss': 0.9834, 'learning_rate': 3.3415779358059174e-10, 'epoch': 0.99}
100%|█████████▉| 2761/2774 [9:04:07<02:35, 11.96s/it] {'loss': 0.9233, 'learning_rate': 2.8812673685235657e-10, 'epoch': 1.0}
100%|█████████▉| 2762/2774 [9:04:20<02:26, 12.20s/it] {'loss': 0.9932, 'learning_rate': 2.4550513611582007e-10, 'epoch': 1.0}
100%|█████████▉| 2763/2774 [9:04:33<02:16, 12.42s/it] {'loss': 1.0405, 'learning_rate': 2.0629304950420258e-10, 'epoch': 1.0}
100%|█████████▉| 2764/2774 [9:04:45<02:01, 12.12s/it] {'loss': 1.0146, 'learning_rate': 1.7049053050083308e-10, 'epoch': 1.0}
100%|█████████▉| 2765/2774 [9:04:56<01:47, 11.95s/it] {'loss': 1.0205, 'learning_rate': 1.380976279374835e-10, 'epoch': 1.0}
100%|█████████▉| 2766/2774 [9:05:08<01:34, 11.85s/it] {'loss': 1.02, 'learning_rate': 1.0911438599686686e-10, 'epoch': 1.0}
100%|█████████▉| 2767/2774 [9:05:19<01:21, 11.61s/it] {'loss': 1.0254, 'learning_rate': 8.35408442095842e-11, 'epoch': 1.0}
100%|█████████▉| 2768/2774 [9:05:30<01:09, 11.55s/it] {'loss': 0.9971, 'learning_rate': 6.137703745717761e-11, 'epoch': 1.0}
100%|█████████▉| 2769/2774 [9:05:42<00:57, 11.53s/it] {'loss': 1.0059, 'learning_rate': 4.262299596907715e-11, 'epoch': 1.0}
100%|█████████▉| 2770/2774 [9:05:53<00:46, 11.58s/it] {'loss': 1.0249, 'learning_rate': 2.7278745325098887e-11, 'epoch': 1.0}
100%|█████████▉| 2771/2774 [9:06:05<00:34, 11.50s/it] {'loss': 1.0352, 'learning_rate': 1.534430645377949e-11, 'epoch': 1.0}
100%|█████████▉| 2772/2774 [9:06:16<00:22, 11.47s/it] {'loss': 0.998, 'learning_rate': 6.819695632931389e-12, 'epoch': 1.0}
100%|█████████▉| 2773/2774 [9:06:27<00:11, 11.42s/it] {'loss': 1.0547, 'learning_rate': 1.7049244896427675e-12, 'epoch': 1.0}
100%|██████████| 2774/2774 [9:06:40<00:00, 11.69s/it] {'loss': 1.0215, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 32801.9753, 'train_samples_per_second': 10.823, 'train_steps_per_second': 0.085, 'train_loss': 1.030239465516853, 'epoch': 1.0}
100%|██████████| 2774/2774 [9:06:40<00:00, 11.82s/it]
2024-03-10 20:19:34.180 n193-018-074:2301448:2302681 [0] NCCL INFO [Service thread] Connection closed by localRank 1
2024-03-10 20:19:34.180 n193-018-074:2301449:2302675 [1] NCCL INFO [Service thread] Connection closed by localRank 1
2024-03-10 20:19:34.180 n193-018-074:2301450:2302682 [2] NCCL INFO [Service thread] Connection closed by localRank 1
2024-03-10 20:19:34.382 n193-018-074:2301451:2302679 [3] NCCL INFO [Service thread] Connection closed by localRank 4
2024-03-10 20:19:34.382 n193-018-074:2301453:2302676 [5] NCCL INFO [Service thread] Connection closed by localRank 4
2024-03-10 20:19:34.382 n193-018-074:2301452:2302680 [4] NCCL INFO [Service thread] Connection closed by localRank 4
2024-03-10 20:19:34.418 n193-018-074:2301451:2302679 [3] NCCL INFO [Service thread] Connection closed by localRank 3
2024-03-10 20:19:34.418 n193-018-074:2301450:2302682 [2] NCCL INFO [Service thread] Connection closed by localRank 3
2024-03-10 20:19:34.418 n193-018-074:2301452:2302680 [4] NCCL INFO [Service thread] Connection closed by localRank 3
2024-03-10 20:19:34.437 n193-018-074:2301448:2302681 [0] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:34.437 n193-018-074:2301454:2302678 [6] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:34.437 n193-018-074:2301455:2302677 [7] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:34.547 n193-018-074:2301453:2302676 [5] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:34.547 n193-018-074:2301454:2302678 [6] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:34.547 n193-018-074:2301455:2302677 [7] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:34.621 n193-018-074:2301452:2302680 [4] NCCL INFO [Service thread] Connection closed by localRank 5
2024-03-10 20:19:34.621 n193-018-074:2301453:2302676 [5] NCCL INFO [Service thread] Connection closed by localRank 5
2024-03-10 20:19:34.621 n193-018-074:2301454:2302678 [6] NCCL INFO [Service thread] Connection closed by localRank 5
2024-03-10 20:19:34.666 n193-018-074:2301449:2302675 [1] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:34.666 n193-018-074:2301451:2302679 [3] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:34.666 n193-018-074:2301450:2302682 [2] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:34.788 n193-018-074:2301454:2301454 [6] NCCL INFO comm 0xb7182140 rank 6 nranks 8 cudaDev 6 busId c5000 - Abort COMPLETE
2024-03-10 20:19:34.791 n193-018-074:2301452:2301452 [4] NCCL INFO comm 0x185a77340 rank 4 nranks 8 cudaDev 4 busId 89000 - Abort COMPLETE
2024-03-10 20:19:34.830 n193-018-074:2301451:2301451 [3] NCCL INFO comm 0xb6209bc0 rank 3 nranks 8 cudaDev 3 busId 4e000 - Abort COMPLETE
2024-03-10 20:19:35.225 n193-018-074:2301453:2301453 [5] NCCL INFO comm 0x1862cf940 rank 5 nranks 8 cudaDev 5 busId 8e000 - Abort COMPLETE
2024-03-10 20:19:35.269 n193-018-074:2301450:2301450 [2] NCCL INFO comm 0xb858a750 rank 2 nranks 8 cudaDev 2 busId 4a000 - Abort COMPLETE
2024-03-10 20:19:37.061 n193-018-074:2301455:2302419 [7] NCCL INFO [Service thread] Connection closed by localRank 6
2024-03-10 20:19:37.656 n193-018-074:2301449:2302422 [1] NCCL INFO [Service thread] Connection closed by localRank 2
2024-03-10 20:19:40.518 n193-018-074:2301455:2301455 [7] NCCL INFO comm 0x1872d08c0 rank 7 nranks 8 cudaDev 7 busId c9000 - Abort COMPLETE
2024-03-10 20:19:40.520 n193-018-074:2301449:2301449 [1] NCCL INFO comm 0x1862a5d40 rank 1 nranks 8 cudaDev 1 busId 16000 - Abort COMPLETE
2024-03-10 20:19:42.723 n193-018-074:2301448:2302425 [0] NCCL INFO [Service thread] Connection closed by localRank 1
2024-03-10 20:19:42.742 n193-018-074:2301448:2302425 [0] NCCL INFO [Service thread] Connection closed by localRank 7
2024-03-10 20:19:52.571 n193-018-074:2301448:2302681 [0] NCCL INFO [Service thread] Connection closed by localRank 0
2024-03-10 20:19:53.416 n193-018-074:2301448:2301448 [0] NCCL INFO comm 0x1985b4bb0 rank 0 nranks 8 cudaDev 0 busId 10000 - Abort COMPLETE