2024-02-12,02:49:55 | INFO | Running with a single process. Device cuda:0.
2024-02-12,02:49:55 | INFO | Loaded ViT-B-32 model config.
2024-02-12,02:49:58 | INFO | Loading pretrained ViT-B-32 weights (laion2b_s34b_b79k).
2024-02-12,02:49:58 | INFO | Model:
2024-02-12,02:49:58 | INFO | CLIP(
  (visual): VisionTransformer(
    (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (transformer): Transformer(
    (resblocks): ModuleList(
      (0-11): 12 x ResidualAttentionBlock(
        (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (ls_1): Identity()
        (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (c_fc): Linear(in_features=512, out_features=2048, bias=True)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=2048, out_features=512, bias=True)
        )
        (ls_2): Identity()
      )
    )
  )
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
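The module dump above is the stock OpenCLIP ViT-B-32: a 12-layer, 768-wide vision transformer with 32x32 patches, a 12-layer, 512-wide text transformer, and a 49,408-token vocabulary. For reference, a minimal sketch of recreating the same starting point outside the training script, assuming the open_clip package that produced this log (the blank image and throwaway captions are placeholders, not data from the run):

    import torch
    import open_clip
    from PIL import Image

    # Same model/pretrained tag as in the log header; runs on CPU here for
    # simplicity, whereas the logged run used cuda:0.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    model.eval()

    # Smoke test: one blank image against two dummy captions.
    image = preprocess(Image.new("RGB", (256, 256))).unsqueeze(0)
    text = tokenizer(["a photo of a dog", "a photo of a cat"])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    print(probs)
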
2024-02-12,02:49:58 | INFO | Params:
2024-02-12,02:49:58 | INFO |   accum_freq: 1
2024-02-12,02:49:58 | INFO |   aug_cfg: {}
2024-02-12,02:49:58 | INFO |   batch_size: 256
2024-02-12,02:49:58 | INFO |   beta1: 0.9
2024-02-12,02:49:58 | INFO |   beta2: 0.98
2024-02-12,02:49:58 | INFO |   checkpoint_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/checkpoints
2024-02-12,02:49:58 | INFO |   coca_caption_loss_weight: 2.0
2024-02-12,02:49:58 | INFO |   coca_contrastive_loss_weight: 1.0
2024-02-12,02:49:58 | INFO |   copy_codebase: False
2024-02-12,02:49:58 | INFO |   csv_caption_key: captions
2024-02-12,02:49:58 | INFO |   csv_img_key: images
2024-02-12,02:49:58 | INFO |   csv_separator: 	
2024-02-12,02:49:58 | INFO |   dataset_resampled: False
2024-02-12,02:49:58 | INFO |   dataset_type: auto
2024-02-12,02:49:58 | INFO |   ddp_static_graph: True
2024-02-12,02:49:58 | INFO |   debug: False
2024-02-12,02:49:58 | INFO |   delete_previous_checkpoint: False
2024-02-12,02:49:58 | INFO |   device: cuda:0
2024-02-12,02:49:58 | INFO |   dist_backend: nccl
2024-02-12,02:49:58 | INFO |   dist_url: env://
2024-02-12,02:49:58 | INFO |   distill: False
2024-02-12,02:49:58 | INFO |   distill_model: None
2024-02-12,02:49:58 | INFO |   distill_pretrained: None
2024-02-12,02:49:58 | INFO |   distributed: False
2024-02-12,02:49:58 | INFO |   epochs: 5
2024-02-12,02:49:58 | INFO |   epochs_cooldown: None
2024-02-12,02:49:58 | INFO |   eps: 1e-06
2024-02-12,02:49:58 | INFO |   force_custom_text: False
2024-02-12,02:49:58 | INFO |   force_image_size: None
2024-02-12,02:49:58 | INFO |   force_patch_dropout: None
2024-02-12,02:49:58 | INFO |   force_quick_gelu: False
2024-02-12,02:49:58 | INFO |   gather_with_grad: True
2024-02-12,02:49:58 | INFO |   grad_checkpointing: False
2024-02-12,02:49:58 | INFO |   grad_clip_norm: None
2024-02-12,02:49:58 | INFO |   horovod: False
2024-02-12,02:49:58 | INFO |   image_interpolation: None
2024-02-12,02:49:58 | INFO |   image_mean: None
2024-02-12,02:49:58 | INFO |   image_resize_mode: None
2024-02-12,02:49:58 | INFO |   image_std: None
2024-02-12,02:49:58 | INFO |   imagenet_v2: None
2024-02-12,02:49:58 | INFO |   imagenet_val: None
2024-02-12,02:49:58 | INFO |   local_loss: True
2024-02-12,02:49:58 | INFO |   local_rank: 0
2024-02-12,02:49:58 | INFO |   lock_image: False
2024-02-12,02:49:58 | INFO |   lock_image_freeze_bn_stats: False
2024-02-12,02:49:58 | INFO |   lock_image_unlocked_groups: 0
2024-02-12,02:49:58 | INFO |   lock_text: False
2024-02-12,02:49:58 | INFO |   lock_text_freeze_layer_norm: False
2024-02-12,02:49:58 | INFO |   lock_text_unlocked_layers: 0
2024-02-12,02:49:58 | INFO |   log_every_n_steps: 100
2024-02-12,02:49:58 | INFO |   log_level: 20
2024-02-12,02:49:58 | INFO |   log_local: False
2024-02-12,02:49:58 | INFO |   log_path: ./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16/out.log
2024-02-12,02:49:58 | INFO |   logs: ./logs/
2024-02-12,02:49:58 | INFO |   lr: 1e-05
2024-02-12,02:49:58 | INFO |   lr_cooldown_end: 0.0
2024-02-12,02:49:58 | INFO |   lr_cooldown_power: 1.0
2024-02-12,02:49:58 | INFO |   lr_scheduler: cosine
2024-02-12,02:49:58 | INFO |   model: ViT-B-32
2024-02-12,02:49:58 | INFO |   name: 2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16
2024-02-12,02:49:58 | INFO |   no_set_device_rank: False
2024-02-12,02:49:58 | INFO |   precision: amp_bf16
2024-02-12,02:49:58 | INFO |   pretrained: laion2b_s34b_b79k
2024-02-12,02:49:58 | INFO |   pretrained_image: False
2024-02-12,02:49:58 | INFO |   rank: 0
2024-02-12,02:49:58 | INFO |   remote_sync: None
2024-02-12,02:49:58 | INFO |   remote_sync_frequency: 300
2024-02-12,02:49:58 | INFO |   remote_sync_protocol: s3
2024-02-12,02:49:58 | INFO |   report_to: 
2024-02-12,02:49:58 | INFO |   resume: None
2024-02-12,02:49:58 | INFO |   save_frequency: 5
2024-02-12,02:49:58 | INFO |   save_most_recent: False
2024-02-12,02:49:58 | INFO |   seed: 0
2024-02-12,02:49:58 | INFO |   siglip: False
2024-02-12,02:49:58 | INFO |   skip_scheduler: False
2024-02-12,02:49:58 | INFO |   tensorboard: False
2024-02-12,02:49:58 | INFO |   tensorboard_path: 
2024-02-12,02:49:58 | INFO |   torchcompile: False
2024-02-12,02:49:58 | INFO |   torchscript: False
2024-02-12,02:49:58 | INFO |   trace: False
2024-02-12,02:49:58 | INFO |   train_data: ../../train_data_counterfactuals_neg_clip2.csv
2024-02-12,02:49:58 | INFO |   train_data_upsampling_factors: None
2024-02-12,02:49:58 | INFO |   train_num_samples: None
2024-02-12,02:49:58 | INFO |   use_bn_sync: False
2024-02-12,02:49:58 | INFO |   use_bnb_linear: None
2024-02-12,02:49:58 | INFO |   val_data: None
2024-02-12,02:49:58 | INFO |   val_frequency: 5
2024-02-12,02:49:58 | INFO |   val_num_samples: None
2024-02-12,02:49:58 | INFO |   wandb: False
2024-02-12,02:49:58 | INFO |   wandb_notes: 
2024-02-12,02:49:58 | INFO |   wandb_project_name: open-clip
2024-02-12,02:49:58 | INFO |   warmup: 1024
2024-02-12,02:49:58 | INFO |   wd: 0.2
2024-02-12,02:49:58 | INFO |   workers: 8
2024-02-12,02:49:58 | INFO |   world_size: 1
2024-02-12,02:49:58 | INFO |   zeroshot_frequency: 5
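The per-step lines below report a Contrastive_loss together with the Logit Scale (open_clip clamps the learned scale at 100 during training, which is why it stays pinned just below 100 throughout this short run). As a reference for what that number is, here is a minimal single-GPU sketch of the symmetric CLIP contrastive loss; open_clip's ClipLoss additionally gathers features across devices for the local_loss/gather_with_grad options, which is a no-op at world_size 1 as in this run:

    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(image_features, text_features, logit_scale):
        # Features are assumed L2-normalized, as in open_clip's forward pass.
        logits_per_image = logit_scale * image_features @ text_features.T
        logits_per_text = logits_per_image.T
        # Matching image/text pairs sit on the diagonal of the similarity matrix.
        labels = torch.arange(image_features.shape[0], device=image_features.device)
        return (F.cross_entropy(logits_per_image, labels) +
                F.cross_entropy(logits_per_text, labels)) / 2
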
2024-02-12,02:49:58 | INFO | Start epoch 0
2024-02-12,02:50:15 | INFO | Train Epoch: 0 [ 1024/27087 (1%)] Data (t): 12.525 Batch (t): 16.592, 15.4295/s, 15.4295/s/gpu LR: 0.000000 Logit Scale: 100.000 Contrastive_loss: 1.0551 (1.0551) Loss: 1.0551 (1.0551)
2024-02-12,02:52:13 | INFO | Train Epoch: 0 [103424/27087 (96%)] Data (t): 0.645 Batch (t): 1.175, 459.500/s, 459.500/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.80440 (0.92975) Loss: 0.80440 (0.92975)
2024-02-12,02:52:20 | INFO | Train Epoch: 0 [107520/27087 (100%)] Data (t): 1.439 Batch (t): 1.884, 43.6989/s, 43.6989/s/gpu LR: 0.000001 Logit Scale: 99.996 Contrastive_loss: 0.73623 (0.86524) Loss: 0.73623 (0.86524)
2024-02-12,02:52:21 | INFO | Start epoch 1
2024-02-12,02:52:33 | INFO | Train Epoch: 1 [ 1024/27087 (1%)] Data (t): 11.817 Batch (t): 12.154, 21.0639/s, 21.0639/s/gpu LR: 0.000001 Logit Scale: 99.995 Contrastive_loss: 0.75390 (0.75390) Loss: 0.75390 (0.75390)
2024-02-12,02:54:37 | INFO | Train Epoch: 1 [103424/27087 (96%)] Data (t): 0.740 Batch (t): 1.238, 460.135/s, 460.135/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.65958 (0.70674) Loss: 0.65958 (0.70674)
2024-02-12,02:54:39 | INFO | Train Epoch: 1 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.557, 459.304/s, 459.304/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.64635 (0.68661) Loss: 0.64635 (0.68661)
2024-02-12,02:54:39 | INFO | Start epoch 2
2024-02-12,02:54:51 | INFO | Train Epoch: 2 [ 1024/27087 (1%)] Data (t): 11.166 Batch (t): 11.505, 22.2512/s, 22.2512/s/gpu LR: 0.000002 Logit Scale: 99.988 Contrastive_loss: 0.53999 (0.53999) Loss: 0.53999 (0.53999)
2024-02-12,02:56:51 | INFO | Train Epoch: 2 [103424/27087 (96%)] Data (t): 0.696 Batch (t): 1.195, 459.292/s, 459.292/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.56759 (0.55379) Loss: 0.56759 (0.55379)
2024-02-12,02:56:54 | INFO | Train Epoch: 2 [107520/27087 (100%)] Data (t): 0.387 Batch (t): 0.888, 457.597/s, 457.597/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.48756 (0.53171) Loss: 0.48756 (0.53171)
2024-02-12,02:56:55 | INFO | Start epoch 3
2024-02-12,02:57:07 | INFO | Train Epoch: 3 [ 1024/27087 (1%)] Data (t): 11.677 Batch (t): 12.022, 21.2941/s, 21.2941/s/gpu LR: 0.000003 Logit Scale: 99.983 Contrastive_loss: 0.44987 (0.44987) Loss: 0.44987 (0.44987)
2024-02-12,02:59:10 | INFO | Train Epoch: 3 [103424/27087 (96%)] Data (t): 0.718 Batch (t): 1.230, 459.886/s, 459.886/s/gpu LR: 0.000004 Logit Scale: 99.981 Contrastive_loss: 0.42789 (0.43888) Loss: 0.42789 (0.43888)
2024-02-12,02:59:12 | INFO | Train Epoch: 3 [107520/27087 (100%)] Data (t): 0.058 Batch (t): 0.558, 459.170/s, 459.170/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.42664 (0.43480) Loss: 0.42664 (0.43480)
2024-02-12,02:59:12 | INFO | Start epoch 4
2024-02-12,02:59:24 | INFO | Train Epoch: 4 [ 1024/27087 (1%)] Data (t): 11.325 Batch (t): 11.659, 21.9575/s, 21.9575/s/gpu LR: 0.000004 Logit Scale: 99.980 Contrastive_loss: 0.34311 (0.34311) Loss: 0.34311 (0.34311)
2024-02-12,03:01:24 | INFO | Train Epoch: 4 [103424/27087 (96%)] Data (t): 0.712 Batch (t): 1.198, 459.840/s, 459.840/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.32785 (0.33548) Loss: 0.32785 (0.33548)
2024-02-12,03:01:27 | INFO | Train Epoch: 4 [107520/27087 (100%)] Data (t): 0.180 Batch (t): 0.623, 313.004/s, 313.004/s/gpu LR: 0.000005 Logit Scale: 99.989 Contrastive_loss: 0.36298 (0.34464) Loss: 0.36298 (0.34464)
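With epochs: 5 and save_frequency: 5, the run should leave a single checkpoint under the checkpoint_path printed in the params above, conventionally named epoch_5.pt by open_clip. A hedged sketch of reloading the fine-tuned weights for evaluation; the exact filename is an assumption, and the "state_dict" key plus the optional "module." prefix stripping follow open_clip's usual checkpoint layout:

    import torch
    import open_clip

    # Filename assumed from checkpoint_path, epochs, and save_frequency above.
    ckpt_path = ("./logs/2024_02_12-02_49_55-model_ViT-B-32-lr_1e-05-b_256-j_8-p_amp_bf16"
                 "/checkpoints/epoch_5.pt")

    model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32")
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    state_dict = checkpoint["state_dict"]
    # Distributed runs prefix keys with "module."; this run had world_size 1,
    # so the strip below should be a no-op, but it is kept for safety.
    state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}
    model.load_state_dict(state_dict)
    model.eval()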