| Using devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=2, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=3, process_index=0, coords=(1,1,0), core_on_chip=0)] | |
| Device count 4 | |
| Global device count 4 | |
| Global Batch: 256 | |
| Node Batch: 256 | |
| Device Batch: 64 | |
| Loading dataset | |
| Loading dataset | |
| DiT: Input of shape (1, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (1, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (1, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (1, 768) dtype float32 | |
| DiT Summary | |
| ┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓ | |
| ┃ path ┃ module ┃ inputs ┃ outputs ┃ params ┃ | |
| ┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩ | |
| │ │ DiT │ - float32[1,32,32,4] │ bfloat16[1,32,32,4] │ │ | |
| │ │ │ - float32[1] │ │ │ | |
| │ │ │ - float32[1] │ │ │ | |
| │ │ │ - int32[1] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ PatchEmbed_0 │ PatchEmbed │ float32[1,32,32,4] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ PatchEmbed_0/Conv_0 │ Conv │ float32[1,32,32,4] │ bfloat16[1,16,16,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[2,2,4,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 13,056 (52.2 KB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ TimestepEmbedder_0 │ TimestepEmbedder │ float32[1] │ float32[1,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ TimestepEmbedder_0/Dense_0 │ Dense │ bfloat16[1,256] │ bfloat16[1,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[256,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 197,376 (789.5 KB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ TimestepEmbedder_0/Dense_1 │ Dense │ bfloat16[1,768] │ float32[1,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ TimestepEmbedder_1 │ TimestepEmbedder │ float32[1] │ float32[1,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ TimestepEmbedder_1/Dense_0 │ Dense │ bfloat16[1,256] │ bfloat16[1,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[256,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 197,376 (789.5 KB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ TimestepEmbedder_1/Dense_1 │ Dense │ bfloat16[1,768] │ float32[1,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ LabelEmbedder_0 │ LabelEmbedder │ int32[1] │ bfloat16[1,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ LabelEmbedder_0/Embed_0 │ Embed │ int32[1] │ bfloat16[1,768] │ embedding: float32[1001,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 768,768 (3.1 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_0/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_1/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_2/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_3/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_4/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_5/MlpBlock_0/Dropout_1 │ Dropout │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6 │ DiTBlock │ - bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| │ │ │ - float32[1,768] │ │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/Dense_0 │ Dense │ float32[1,768] │ bfloat16[1,4608] │ bias: float32[4608] │ | |
| │ │ │ │ │ kernel: float32[768,4608] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 3,543,552 (14.2 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/LayerNorm_0 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/Dense_1 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/Dense_2 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/Dense_3 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/Dense_4 │ Dense │ float32[1,256,768] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[768,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 590,592 (2.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/LayerNorm_1 │ LayerNorm │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/MlpBlock_0 │ MlpBlock │ bfloat16[1,256,768] │ bfloat16[1,256,768] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/MlpBlock_0/Dense_0 │ Dense │ bfloat16[1,256,768] │ bfloat16[1,256,3072] │ bias: float32[3072] │ | |
| │ │ │ │ │ kernel: float32[768,3072] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,362,368 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/MlpBlock_0/Dropout_0 │ Dropout │ bfloat16[1,256,3072] │ bfloat16[1,256,3072] │ │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| │ DiTBlock_6/MlpBlock_0/Dense_1 │ Dense │ bfloat16[1,256,3072] │ bfloat16[1,256,768] │ bias: float32[768] │ | |
| │ │ │ │ │ kernel: float32[3072,768] │ | |
| │ │ │ │ │ │ | |
| │ │ │ │ │ 2,360,064 (9.4 MB) │ | |
| ├────────┼────────┼────────┼────────┼────────┤ | |
| β DiTBlock_6/MlpBlock_0/Dropout_1 β Dropout β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| β β β - [2mfloat32[0m[1,768] β β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β | |
| β β β β β kernel: [2mfloat32[0m[768,4608] β | |
| β β β β β β | |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/LayerNorm_1 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/MlpBlock_0 β MlpBlock β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/MlpBlock_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,3072] β bias: [2mfloat32[0m[3072] β | |
| β β β β β kernel: [2mfloat32[0m[768,3072] β | |
| β β β β β β | |
| β β β β β [1m2,362,368 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/MlpBlock_0/Dropout_0 β Dropout β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,3072] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/MlpBlock_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[3072,768] β | |
| β β β β β β | |
| β β β β β [1m2,360,064 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_7/MlpBlock_0/Dropout_1 β Dropout β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| β β β - [2mfloat32[0m[1,768] β β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β | |
| β β β β β kernel: [2mfloat32[0m[768,4608] β | |
| β β β β β β | |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/LayerNorm_1 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/MlpBlock_0 β MlpBlock β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/MlpBlock_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,3072] β bias: [2mfloat32[0m[3072] β | |
| β β β β β kernel: [2mfloat32[0m[768,3072] β | |
| β β β β β β | |
| β β β β β [1m2,362,368 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/MlpBlock_0/Dropout_0 β Dropout β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,3072] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/MlpBlock_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[3072,768] β | |
| β β β β β β | |
| β β β β β [1m2,360,064 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_8/MlpBlock_0/Dropout_1 β Dropout β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| β β β - [2mfloat32[0m[1,768] β β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β | |
| β β β β β kernel: [2mfloat32[0m[768,4608] β | |
| β β β β β β | |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/LayerNorm_1 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/MlpBlock_0 β MlpBlock β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/MlpBlock_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,3072] β bias: [2mfloat32[0m[3072] β | |
| β β β β β kernel: [2mfloat32[0m[768,3072] β | |
| β β β β β β | |
| β β β β β [1m2,362,368 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/MlpBlock_0/Dropout_0 β Dropout β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,3072] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/MlpBlock_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[3072,768] β | |
| β β β β β β | |
| β β β β β [1m2,360,064 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_9/MlpBlock_0/Dropout_1 β Dropout β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| β β β - [2mfloat32[0m[1,768] β β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β | |
| β β β β β kernel: [2mfloat32[0m[768,4608] β | |
| β β β β β β | |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/LayerNorm_1 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/MlpBlock_0 β MlpBlock β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/MlpBlock_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,3072] β bias: [2mfloat32[0m[3072] β | |
| β β β β β kernel: [2mfloat32[0m[768,3072] β | |
| β β β β β β | |
| β β β β β [1m2,362,368 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/MlpBlock_0/Dropout_0 β Dropout β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,3072] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/MlpBlock_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[3072,768] β | |
| β β β β β β | |
| β β β β β [1m2,360,064 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_10/MlpBlock_0/Dropout_1 β Dropout β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11 β DiTBlock β - [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| β β β - [2mfloat32[0m[1,768] β β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,4608] β bias: [2mfloat32[0m[4608] β | |
| β β β β β kernel: [2mfloat32[0m[768,4608] β | |
| β β β β β β | |
| β β β β β [1m3,543,552 [0m[1;2m(14.2 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/Dense_2 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/Dense_3 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/Dense_4 β Dense β [2mfloat32[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[768,768] β | |
| β β β β β β | |
| β β β β β [1m590,592 [0m[1;2m(2.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/LayerNorm_1 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/MlpBlock_0 β MlpBlock β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/MlpBlock_0/Dense_0 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,3072] β bias: [2mfloat32[0m[3072] β | |
| β β β β β kernel: [2mfloat32[0m[768,3072] β | |
| β β β β β β | |
| β β β β β [1m2,362,368 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/MlpBlock_0/Dropout_0 β Dropout β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,3072] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/MlpBlock_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,3072] β [2mbfloat16[0m[1,256,768] β bias: [2mfloat32[0m[768] β | |
| β β β β β kernel: [2mfloat32[0m[3072,768] β | |
| β β β β β β | |
| β β β β β [1m2,360,064 [0m[1;2m(9.4 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β DiTBlock_11/MlpBlock_0/Dropout_1 β Dropout β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β FinalLayer_0 β FinalLayer β - [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,16] β β | |
| β β β - [2mfloat32[0m[1,768] β β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β FinalLayer_0/Dense_0 β Dense β [2mfloat32[0m[1,768] β [2mbfloat16[0m[1,1536] β bias: [2mfloat32[0m[1536] β | |
| β β β β β kernel: [2mfloat32[0m[768,1536] β | |
| β β β β β β | |
| β β β β β [1m1,181,184 [0m[1;2m(4.7 MB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β FinalLayer_0/LayerNorm_0 β LayerNorm β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,768] β β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β FinalLayer_0/Dense_1 β Dense β [2mbfloat16[0m[1,256,768] β [2mbfloat16[0m[1,256,16] β bias: [2mfloat32[0m[16] β | |
| β β β β β kernel: [2mfloat32[0m[768,16] β | |
| β β β β β β | |
| β β β β β [1m12,304 [0m[1;2m(49.2 KB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β Embed_0 β Embed β [2mint32[0m[1] β [2mfloat32[0m[1,1] β embedding: [2mfloat32[0m[256,1] β | |
| β β β β β β | |
| β β β β β [1m256 [0m[1;2m(1.0 KB)[0m β | |
| ββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββΌββββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ€ | |
| β[1m [0m[1m [0m[1m [0mβ[1m [0m[1m [0m[1m [0mβ[1m [0m[1m [0m[1m [0mβ[1m [0m[1m Total[0m[1m [0mβ[1m [0m[1m131,091,728 [0m[1;2m(524.4 MB)[0m[1m [0m[1m [0mβ | |
| ββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββ΄ββββββββββββββββββββββββ΄ββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββ | |
| Total Parameters: 131,091,728 (524.4 MB) | |
| DiT: Input of shape (1, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (1, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (1, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (1, 768) dtype float32 | |
| Loaded checkpoint from 1097920 seconds ago. | |
| parameter shapes: | |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (768,) | |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,) | |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,) | |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,) | |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,) | |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768) | |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_0', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_1', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_2', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_3', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_4', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_5', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_6', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_7', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_8', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_9', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_10', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_11', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536) | |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1536,) | |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16) | |
| ('FinalLayer_0', 'Dense_1', 'bias'): (16,) | |
| ('Embed_0', 'embedding'): (256, 1) | |
| parameter shapes: | |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('Embed_0', 'embedding'): (1, 256, 1) | |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) | |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) | |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) | |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) | |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) | |
| parameter shapes: | |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('Embed_0', 'embedding'): (1, 256, 1) | |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) | |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) | |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) | |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) | |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) | |
| parameter shapes: | |
| ('DiTBlock_0', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_1', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_1', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_10', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_10', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_11', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_11', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_2', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_2', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_3', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_3', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_4', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_4', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_5', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_5', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_6', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_6', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_7', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_7', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_8', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_8', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('DiTBlock_9', 'Dense_0', 'bias'): (1, 4608) | |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (1, 768, 4608) | |
| ('DiTBlock_9', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_2', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_3', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'Dense_4', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (1, 768, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (1, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (1, 768, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (1, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (1, 3072, 768) | |
| ('Embed_0', 'embedding'): (1, 256, 1) | |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1, 1536) | |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (1, 768, 1536) | |
| ('FinalLayer_0', 'Dense_1', 'bias'): (1, 16) | |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (1, 768, 16) | |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1, 1001, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (1, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (1, 2, 2, 4, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (1, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (1, 256, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (1, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (1, 768, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (1, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (1, 256, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (1, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (1, 768, 768) | |
| parameter shapes: | |
| ('DiTBlock_0', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_0', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_0', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_0', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_1', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_1', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_1', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_1', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_1', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_10', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_10', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_10', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_10', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_10', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_11', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_11', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_11', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_11', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_11', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_2', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_2', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_2', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_2', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_2', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_3', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_3', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_3', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_3', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_3', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_4', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_4', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_4', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_4', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_4', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_5', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_5', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_5', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_5', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_5', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_6', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_6', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_6', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_6', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_6', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_7', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_7', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_7', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_7', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_7', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_8', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_8', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_8', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_8', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_8', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('DiTBlock_9', 'Dense_0', 'bias'): (4608,) | |
| ('DiTBlock_9', 'Dense_0', 'kernel'): (768, 4608) | |
| ('DiTBlock_9', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_1', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_2', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_2', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_3', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_3', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'Dense_4', 'bias'): (768,) | |
| ('DiTBlock_9', 'Dense_4', 'kernel'): (768, 768) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'bias'): (3072,) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_0', 'kernel'): (768, 3072) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'bias'): (768,) | |
| ('DiTBlock_9', 'MlpBlock_0', 'Dense_1', 'kernel'): (3072, 768) | |
| ('Embed_0', 'embedding'): (256, 1) | |
| ('FinalLayer_0', 'Dense_0', 'bias'): (1536,) | |
| ('FinalLayer_0', 'Dense_0', 'kernel'): (768, 1536) | |
| ('FinalLayer_0', 'Dense_1', 'bias'): (16,) | |
| ('FinalLayer_0', 'Dense_1', 'kernel'): (768, 16) | |
| ('LabelEmbedder_0', 'Embed_0', 'embedding'): (1001, 768) | |
| ('PatchEmbed_0', 'Conv_0', 'bias'): (768,) | |
| ('PatchEmbed_0', 'Conv_0', 'kernel'): (2, 2, 4, 768) | |
| ('TimestepEmbedder_0', 'Dense_0', 'bias'): (768,) | |
| ('TimestepEmbedder_0', 'Dense_0', 'kernel'): (256, 768) | |
| ('TimestepEmbedder_0', 'Dense_1', 'bias'): (768,) | |
| ('TimestepEmbedder_0', 'Dense_1', 'kernel'): (768, 768) | |
| ('TimestepEmbedder_1', 'Dense_0', 'bias'): (768,) | |
| ('TimestepEmbedder_1', 'Dense_0', 'kernel'): (256, 768) | |
| ('TimestepEmbedder_1', 'Dense_1', 'bias'): (768,) | |
| ('TimestepEmbedder_1', 'Dense_1', 'kernel'): (768, 768) | |
| ┌────────────────────────────────────────────────┐ | |
| │                                                │ | |
| │                                                │ | |
| │                                                │ | |
| │                                                │ | |
| │                  TPU 0,1,2,3                   │ | |
| │                                                │ | |
| │                                                │ | |
| │                                                │ | |
| │                                                │ | |
| └────────────────────────────────────────────────┘ | |
| ┌────────────────────────────────────────────────────────────────────────┐ | |
| │                                                                        │ | |
| │                                                                        │ | |
| │                                                                        │ | |
| │                                                                        │ | |
| │                              TPU 0,1,2,3                               │ | |
| │                                                                        │ | |
| │                                                                        │ | |
| │                                                                        │ | |
| │                                                                        │ | |
| └────────────────────────────────────────────────────────────────────────┘ | |
| Calc FID for CFG 1.0 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 36.864280700683594 | |
| Calc FID for CFG 1.0 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 18.943756103515625 | |
| Calc FID for CFG 1.0 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 19.45444679260254 | |
| Calc FID for CFG 1.0 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 22.834754943847656 | |
| Calc FID for CFG 1.0 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 28.95001983642578 | |
| Calc FID for CFG 1.0 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 43.88872528076172 | |
| Calc FID for CFG 1.0 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 64.87855529785156 | |
| Calc FID for CFG 1.0 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 39.091678619384766 | |
| Calc FID for CFG 1.25 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 21.947959899902344 | |
| Calc FID for CFG 1.25 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 11.55620002746582 | |
| Calc FID for CFG 1.25 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 12.410966873168945 | |
| Calc FID for CFG 1.25 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 14.745138168334961 | |
| Calc FID for CFG 1.25 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 19.643573760986328 | |
| Calc FID for CFG 1.25 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 32.41807556152344 | |
| Calc FID for CFG 1.25 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 50.73760986328125 | |
| Calc FID for CFG 1.25 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 30.63207244873047 | |
| Calc FID for CFG 1.75 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 9.499155044555664 | |
| Calc FID for CFG 1.75 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 9.292729377746582 | |
| Calc FID for CFG 1.75 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 10.044074058532715 | |
| Calc FID for CFG 1.75 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 10.720970153808594 | |
| Calc FID for CFG 1.75 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 13.427814483642578 | |
| Calc FID for CFG 1.75 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 22.363943099975586 | |
| Calc FID for CFG 1.75 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 37.685306549072266 | |
| Calc FID for CFG 1.75 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 32.443660736083984 | |
| Calc FID for CFG 2.0 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 7.922760963439941 | |
| Calc FID for CFG 2.0 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 10.212583541870117 | |
| Calc FID for CFG 2.0 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 10.722338676452637 | |
| Calc FID for CFG 2.0 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 10.903777122497559 | |
| Calc FID for CFG 2.0 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 12.852094650268555 | |
| Calc FID for CFG 2.0 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 20.36902618408203 | |
| Calc FID for CFG 2.0 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 34.6470947265625 | |
| Calc FID for CFG 2.0 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 37.30630111694336 | |
| Calc FID for CFG 2.25 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 7.680515766143799 | |
| Calc FID for CFG 2.25 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 11.41958999633789 | |
| Calc FID for CFG 2.25 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 11.761509895324707 | |
| Calc FID for CFG 2.25 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 11.56173038482666 | |
| Calc FID for CFG 2.25 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 12.88215446472168 | |
| Calc FID for CFG 2.25 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 19.283063888549805 | |
| Calc FID for CFG 2.25 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 32.80674743652344 | |
| Calc FID for CFG 2.25 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 43.063232421875 | |
| Calc FID for CFG 2.5 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 8.174129486083984 | |
| Calc FID for CFG 2.5 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 12.659521102905273 | |
| Calc FID for CFG 2.5 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 12.81586742401123 | |
| Calc FID for CFG 2.5 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 12.36107349395752 | |
| Calc FID for CFG 2.5 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 13.175222396850586 | |
| Calc FID for CFG 2.5 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 18.757583618164062 | |
| Calc FID for CFG 2.5 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 31.73568344116211 | |
| Calc FID for CFG 2.5 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 48.83190155029297 | |
| Calc FID for CFG 2.75 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 9.024274826049805 | |
| Calc FID for CFG 2.75 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 13.811668395996094 | |
| Calc FID for CFG 2.75 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 13.847970962524414 | |
| Calc FID for CFG 2.75 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 13.199394226074219 | |
| Calc FID for CFG 2.75 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 13.646522521972656 | |
| Calc FID for CFG 2.75 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 18.569910049438477 | |
| Calc FID for CFG 2.75 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 31.189559936523438 | |
| Calc FID for CFG 2.75 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 54.34938049316406 | |
| Calc FID for CFG 3.0 and denoise_timesteps 128 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 10.017110824584961 | |
| Calc FID for CFG 3.0 and denoise_timesteps 64 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 14.84672737121582 | |
| Calc FID for CFG 3.0 and denoise_timesteps 32 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 14.882085800170898 | |
| Calc FID for CFG 3.0 and denoise_timesteps 16 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 14.022567749023438 | |
| Calc FID for CFG 3.0 and denoise_timesteps 8 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 14.15646743774414 | |
| Calc FID for CFG 3.0 and denoise_timesteps 4 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 18.565689086914062 | |
| Calc FID for CFG 3.0 and denoise_timesteps 2 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 31.047740936279297 | |
| Calc FID for CFG 3.0 and denoise_timesteps 1 | |
| DiT: Input of shape (256, 32, 32, 4) dtype float32 | |
| DiT: After patch embed, shape is (256, 256, 768) dtype bfloat16 | |
| DiT: Patch Embed of shape (256, 256, 768) dtype bfloat16 | |
| DiT: Conditioning of shape (256, 768) dtype float32 | |
| FID is 59.46251678466797 | |