INFO:__main__:Initializing DDP settings...
INFO:__main__: is_ddp = True
INFO:__main__:Initializing PyTorch settings...
INFO:__main__:Initializing models and optimizers...
INFO:__main__: Initializing a new model from scrach for pre-train...
INFO:__main__: Loading tokenizer from /root/Nano/tokenizer/tokenizer_16384.json...
INFO:__main__: block_size = 512
INFO:__main__: vocab_size = 16384
INFO:__main__: n_layer = 16
INFO:__main__: n_embd = 512
INFO:__main__: n_head = 16
INFO:__main__: n_kv_head = 8
INFO:__main__: n_hidden = 1408
INFO:__main__: Parameters = 55,591,424 (0.055591424B)
INFO:__main__:Loading dataset...
INFO:__main__: Train set 0 : 19,728,099 samples (10,100,786,688 tokens)
INFO:__main__: Valid set 0 : 986,490 samples (505,082,880 tokens)
INFO:__main__:2024-10-26 18:10:44 | Start training from iteration #0
INFO:__main__:2024-10-26 18:10:53 | Epoch: 0 | Step: 10 | Dataset: 0-9000 | Loss: 9.646 | 667 ms/step , 58909.03 GFLOP/s , 467271.8 tokens/s
INFO:__main__:2024-10-26 18:11:00 | Epoch: 0 | Step: 20 | Dataset: 0-17000 | Loss: 8.915 | 669 ms/step , 58798.16 GFLOP/s , 537662.4 tokens/s
INFO:__main__:2024-10-26 18:11:08 | Epoch: 0 | Step: 30 | Dataset: 0-25000 | Loss: 8.210 | 669 ms/step , 58738.40 GFLOP/s , 536600.9 tokens/s
INFO:__main__:2024-10-26 18:11:15 | Epoch: 0 | Step: 40 | Dataset: 0-33000 | Loss: 7.644 | 668 ms/step , 58854.59 GFLOP/s , 536962.9 tokens/s
INFO:__main__:2024-10-26 18:11:23 | Epoch: 0 | Step: 50 | Dataset: 0-41000 | Loss: 7.282 | 670 ms/step , 58654.35 GFLOP/s , 536397.2 tokens/s
INFO:__main__:2024-10-26 18:11:31 | Epoch: 0 | Step: 60 | Dataset: 0-49000 | Loss: 7.166 | 670 ms/step , 58651.93 GFLOP/s , 535559.8 tokens/s
INFO:__main__:2024-10-26 18:11:38 | Epoch: 0 | Step: 70 | Dataset: 0-57000 | Loss: 7.125 | 672 ms/step , 58533.54 GFLOP/s , 535158.9 tokens/s
INFO:__main__:2024-10-26 18:11:46 | Epoch: 0 | Step: 80 | Dataset: 0-65000 | Loss: 6.948 | 671 ms/step , 58570.42 GFLOP/s , 535292.3 tokens/s
INFO:__main__:2024-10-26 18:11:54 | Epoch: 0 | Step: 90 | Dataset: 0-73000 | Loss: 6.785 | 672 ms/step , 58469.80 GFLOP/s , 534801.9 tokens/s
INFO:__main__:2024-10-26 18:12:01 | Epoch: 0 | Step: 100 | Dataset: 0-81000 | Loss: 9.467 | 672 ms/step , 58467.26 GFLOP/s , 534731.2 tokens/s
INFO:__main__:2024-10-26 18:12:09 | Epoch: 0 | Step: 110 | Dataset: 0-89000 | Loss: 9.212 | 671 ms/step , 58560.03 GFLOP/s , 535442.0 tokens/s
INFO:__main__:2024-10-26 18:12:17 | Epoch: 0 | Step: 120 | Dataset: 0-97000 | Loss: 9.094 | 672 ms/step , 58539.43 GFLOP/s , 535376.9 tokens/s
INFO:__main__:2024-10-26 18:12:24 | Epoch: 0 | Step: 130 | Dataset: 0-105000 | Loss: 9.007 | 672 ms/step , 58485.96 GFLOP/s , 535488.0 tokens/s
INFO:__main__:2024-10-26 18:12:32 | Epoch: 0 | Step: 140 | Dataset: 0-113000 | Loss: 8.916 | 673 ms/step , 58448.40 GFLOP/s , 534698.9 tokens/s
INFO:__main__:2024-10-26 18:12:40 | Epoch: 0 | Step: 150 | Dataset: 0-121000 | Loss: 8.798 | 673 ms/step , 58384.51 GFLOP/s , 534811.5 tokens/s
INFO:__main__:2024-10-26 18:12:47 | Epoch: 0 | Step: 160 | Dataset: 0-129000 | Loss: 8.679 | 672 ms/step , 58469.01 GFLOP/s , 534582.2 tokens/s
INFO:__main__:2024-10-26 18:12:55 | Epoch: 0 | Step: 170 | Dataset: 0-137000 | Loss: 8.527 | 672 ms/step , 58455.82 GFLOP/s , 534866.7 tokens/s
INFO:__main__:2024-10-26 18:13:03 | Epoch: 0 | Step: 180 | Dataset: 0-145000 | Loss: 8.395 | 672 ms/step , 58485.47 GFLOP/s , 534717.0 tokens/s
INFO:__main__:2024-10-26 18:13:10 | Epoch: 0 | Step: 190 | Dataset: 0-153000 | Loss: 8.262 | 672 ms/step , 58471.45 GFLOP/s , 534616.2 tokens/s
INFO:__main__:2024-10-26 18:13:18 | Epoch: 0 | Step: 200 | Dataset: 0-161000 | Loss: 8.124 | 674 ms/step , 58313.42 GFLOP/s , 534455.7 tokens/s
INFO:__main__:2024-10-26 18:13:26 | Epoch: 0 | Step: 210 | Dataset: 0-169000 | Loss: 8.002 | 672 ms/step , 58456.67 GFLOP/s , 534743.8 tokens/s
INFO:__main__:2024-10-26 18:13:33 | Epoch: 0 | Step: 220 | Dataset: 0-177000 | Loss: 7.903 | 674 ms/step , 58322.04 GFLOP/s , 534616.6 tokens/s
INFO:__main__:2024-10-26 18:13:41 | Epoch: 0 | Step: 230 | Dataset: 0-185000 | Loss: 7.768 | 674 ms/step , 58361.22 GFLOP/s , 533818.8 tokens/s
INFO:__main__:2024-10-26 18:13:49 | Epoch: 0 | Step: 240 | Dataset: 0-193000 | Loss: 7.649 | 675 ms/step , 58277.47 GFLOP/s , 534371.4 tokens/s
INFO:__main__:2024-10-26 18:13:56 | Epoch: 0 | Step: 250 | Dataset: 0-201000 | Loss: 7.561 | 674 ms/step , 58324.82 GFLOP/s , 533706.7 tokens/s
INFO:__main__:2024-10-26 18:14:04 | Epoch: 0 | Step: 260 | Dataset: 0-209000 | Loss: 7.478 | 674 ms/step , 58322.90 GFLOP/s , 534327.7 tokens/s
INFO:__main__:2024-10-26 18:14:12 | Epoch: 0 | Step: 270 | Dataset: 0-217000 | Loss: 7.397 | 674 ms/step , 58314.58 GFLOP/s , 534391.1 tokens/s
INFO:__main__:2024-10-26 18:14:19 | Epoch: 0 | Step: 280 | Dataset: 0-225000 | Loss: 7.294 | 673 ms/step , 58395.30 GFLOP/s , 534748.7 tokens/s
INFO:__main__:2024-10-26 18:14:27 | Epoch: 0 | Step: 290 | Dataset: 0-233000 | Loss: 7.167 | 673 ms/step , 58436.65 GFLOP/s , 534810.9 tokens/s
INFO:__main__:2024-10-26 18:14:35 | Epoch: 0 | Step: 300 | Dataset: 0-241000 | Loss: 7.080 | 675 ms/step , 58235.33 GFLOP/s , 534703.0 tokens/s
INFO:__main__:2024-10-26 18:14:42 | Epoch: 0 | Step: 310 | Dataset: 0-249000 | Loss: 6.959 | 672 ms/step , 58493.96 GFLOP/s , 535323.6 tokens/s
INFO:__main__:2024-10-26 18:14:50 | Epoch: 0 | Step: 320 | Dataset: 0-257000 | Loss: 6.879 | 673 ms/step , 58437.95 GFLOP/s , 534651.9 tokens/s
INFO:__main__:2024-10-26 18:14:58 | Epoch: 0 | Step: 330 | Dataset: 0-265000 | Loss: 6.782 | 673 ms/step , 58379.96 GFLOP/s , 534721.6 tokens/s
INFO:__main__:2024-10-26 18:15:05 | Epoch: 0 | Step: 340 | Dataset: 0-273000 | Loss: 6.670 | 674 ms/step , 58353.68 GFLOP/s , 534162.6 tokens/s
INFO:__main__:2024-10-26 18:15:13 | Epoch: 0 | Step: 350 | Dataset: 0-281000 | Loss: 6.635 | 672 ms/step , 58466.50 GFLOP/s , 534667.6 tokens/s
INFO:__main__:2024-10-26 18:15:21 | Epoch: 0 | Step: 360 | Dataset: 0-289000 | Loss: 6.517 | 675 ms/step , 58268.04 GFLOP/s , 534301.2 tokens/s
INFO:__main__:2024-10-26 18:15:28 | Epoch: 0 | Step: 370 | Dataset: 0-297000 | Loss: 6.410 | 673 ms/step , 58431.11 GFLOP/s , 534830.1 tokens/s
INFO:__main__:2024-10-26 18:15:36 | Epoch: 0 | Step: 380 | Dataset: 0-305000 | Loss: 6.394 | 674 ms/step , 58329.33 GFLOP/s , 534233.6 tokens/s
INFO:__main__:2024-10-26 18:15:44 | Epoch: 0 | Step: 390 | Dataset: 0-313000 | Loss: 6.283 | 674 ms/step , 58337.91 GFLOP/s , 534509.7 tokens/s
INFO:__main__:2024-10-26 18:15:51 | Epoch: 0 | Step: 400 | Dataset: 0-321000 | Loss: 6.214 | 676 ms/step , 58126.72 GFLOP/s , 534512.3 tokens/s
INFO:__main__:2024-10-26 18:15:59 | Epoch: 0 | Step: 410 | Dataset: 0-329000 | Loss: 6.151 | 672 ms/step , 58511.09 GFLOP/s , 534774.0 tokens/s
INFO:__main__:2024-10-26 18:16:07 | Epoch: 0 | Step: 420 | Dataset: 0-337000 | Loss: 5.807 | 675 ms/step , 58274.59 GFLOP/s , 533634.7 tokens/s
INFO:__main__:2024-10-26 18:16:14 | Epoch: 0 | Step: 430 | Dataset: 0-345000 | Loss: 5.628 | 673 ms/step , 58419.67 GFLOP/s , 533582.8 tokens/s
INFO:__main__:2024-10-26 18:16:22 | Epoch: 0 | Step: 440 | Dataset: 0-353000 | Loss: 5.436 | 673 ms/step , 58412.03 GFLOP/s , 533919.8 tokens/s
INFO:__main__:2024-10-26 18:16:30 | Epoch: 0 | Step: 450 | Dataset: 0-361000 | Loss: 5.278 | 673 ms/step , 58386.52 GFLOP/s , 533122.6 tokens/s
INFO:__main__:2024-10-26 18:16:37 | Epoch: 0 | Step: 460 | Dataset: 0-369000 | Loss: 5.152 | 673 ms/step , 58378.24 GFLOP/s , 533486.9 tokens/s
INFO:__main__:2024-10-26 18:16:45 | Epoch: 0 | Step: 470 | Dataset: 0-377000 | Loss: 5.030 | 677 ms/step , 58066.68 GFLOP/s , 533278.2 tokens/s
INFO:__main__:2024-10-26 18:16:53 | Epoch: 0 | Step: 480 | Dataset: 0-385000 | Loss: 4.916 | 674 ms/step , 58315.95 GFLOP/s , 533728.2 tokens/s
INFO:__main__:2024-10-26 18:17:00 | Epoch: 0 | Step: 490 | Dataset: 0-393000 | Loss: 4.799 | 674 ms/step , 58360.83 GFLOP/s , 533141.3 tokens/s
INFO:__main__:2024-10-26 18:17:08 | Epoch: 0 | Step: 500 | Dataset: 0-401000 | Loss: 4.674 | 674 ms/step , 58357.67 GFLOP/s , 533699.9 tokens/s
INFO:__main__:2024-10-26 18:17:16 | Epoch: 0 | Step: 510 | Dataset: 0-409000 | Loss: 6.109 | 673 ms/step , 58384.99 GFLOP/s , 534391.7 tokens/s
INFO:__main__:2024-10-26 18:17:23 | Epoch: 0 | Step: 520 | Dataset: 0-417000 | Loss: 5.964 | 673 ms/step , 58387.55 GFLOP/s , 534407.7 tokens/s
INFO:__main__:2024-10-26 18:17:31 | Epoch: 0 | Step: 530 | Dataset: 0-425000 | Loss: 5.822 | 674 ms/step , 58301.85 GFLOP/s , 534230.3 tokens/s
INFO:__main__:2024-10-26 18:17:39 | Epoch: 0 | Step: 540 | Dataset: 0-433000 | Loss: 5.769 | 673 ms/step , 58446.33 GFLOP/s , 534232.3 tokens/s
INFO:__main__:2024-10-26 18:17:46 | Epoch: 0 | Step: 550 | Dataset: 0-441000 | Loss: 5.761 | 673 ms/step , 58389.67 GFLOP/s , 533944.6 tokens/s
INFO:__main__:2024-10-26 18:17:54 | Epoch: 0 | Step: 560 | Dataset: 0-449000 | Loss: 5.670 | 674 ms/step , 58348.04 GFLOP/s , 533749.7 tokens/s
INFO:__main__:2024-10-26 18:18:02 | Epoch: 0 | Step: 570 | Dataset: 0-457000 | Loss: 5.659 | 674 ms/step , 58361.53 GFLOP/s , 534293.3 tokens/s
INFO:__main__:2024-10-26 18:18:09 | Epoch: 0 | Step: 580 | Dataset: 0-465000 | Loss: 5.512 | 674 ms/step , 58301.04 GFLOP/s , 534062.7 tokens/s
INFO:__main__:2024-10-26 18:18:17 | Epoch: 0 | Step: 590 | Dataset: 0-473000 | Loss: 5.446 | 673 ms/step , 58369.98 GFLOP/s , 534219.5 tokens/s
INFO:__main__:2024-10-26 18:18:25 | Epoch: 0 | Step: 600 | Dataset: 0-481000 | Loss: 5.396 | 676 ms/step , 58192.45 GFLOP/s , 534600.2 tokens/s
INFO:__main__:2024-10-26 18:18:32 | Epoch: 0 | Step: 610 | Dataset: 0-489000 | Loss: 5.285 | 675 ms/step , 58260.96 GFLOP/s , 534353.6 tokens/s
INFO:__main__:2024-10-26 18:18:40 | Epoch: 0 | Step: 620 | Dataset: 0-497000 | Loss: 5.233 | 674 ms/step , 58311.34 GFLOP/s , 534707.2 tokens/s
INFO:__main__:2024-10-26 18:18:48 | Epoch: 0 | Step: 630 | Dataset: 0-505000 | Loss: 5.165 | 674 ms/step , 58329.51 GFLOP/s , 534615.7 tokens/s
INFO:__main__:2024-10-26 18:18:55 | Epoch: 0 | Step: 640 | Dataset: 0-513000 | Loss: 5.213 | 674 ms/step , 58293.34 GFLOP/s , 533880.4 tokens/s
INFO:__main__:2024-10-26 18:19:03 | Epoch: 0 | Step: 650 | Dataset: 0-521000 | Loss: 5.173 | 675 ms/step , 58203.13 GFLOP/s , 533075.2 tokens/s
INFO:__main__:2024-10-26 18:19:11 | Epoch: 0 | Step: 660 | Dataset: 0-529000 | Loss: 5.080 | 673 ms/step , 58382.26 GFLOP/s , 534311.6 tokens/s
INFO:__main__:2024-10-26 18:19:18 | Epoch: 0 | Step: 670 | Dataset: 0-537000 | Loss: 5.104 | 675 ms/step , 58238.11 GFLOP/s , 534316.2 tokens/s
INFO:__main__:2024-10-26 18:19:26 | Epoch: 0 | Step: 680 | Dataset: 0-545000 | Loss: 5.079 | 674 ms/step , 58298.41 GFLOP/s , 534726.9 tokens/s
INFO:__main__:2024-10-26 18:19:34 | Epoch: 0 | Step: 690 | Dataset: 0-553000 | Loss: 5.045 | 674 ms/step , 58356.54 GFLOP/s , 534167.2 tokens/s
INFO:__main__:2024-10-26 18:19:41 | Epoch: 0 | Step: 700 | Dataset: 0-561000 | Loss: 4.965 | 674 ms/step , 58308.46 GFLOP/s , 534386.6 tokens/s
INFO:__main__:2024-10-26 18:19:49 | Epoch: 0 | Step: 710 | Dataset: 0-569000 | Loss: 4.895 | 674 ms/step , 58310.85 GFLOP/s , 534421.0 tokens/s
INFO:__main__:2024-10-26 18:19:57 | Epoch: 0 | Step: 720 | Dataset: 0-577000 | Loss: 4.934 | 675 ms/step , 58248.18 GFLOP/s , 534362.6 tokens/s
INFO:__main__:2024-10-26 18:20:04 | Epoch: 0 | Step: 730 | Dataset: 0-585000 | Loss: 4.862 | 674 ms/step , 58305.68 GFLOP/s , 534569.7 tokens/s
INFO:__main__:2024-10-26 18:20:12 | Epoch: 0 | Step: 740 | Dataset: 0-593000 | Loss: 4.808 | 674 ms/step , 58333.58 GFLOP/s , 534145.2 tokens/s
INFO:__main__:2024-10-26 18:20:20 | Epoch: 0 | Step: 750 | Dataset: 0-601000 | Loss: 4.831 | 674 ms/step , 58342.24 GFLOP/s , 534907.2 tokens/s
INFO:__main__:2024-10-26 18:20:27 | Epoch: 0 | Step: 760 | Dataset: 0-609000 | Loss: 4.756 | 674 ms/step , 58354.62 GFLOP/s , 534694.4 tokens/s
INFO:__main__:2024-10-26 18:20:35 | Epoch: 0 | Step: 770 | Dataset: 0-617000 | Loss: 4.715 | 674 ms/step , 58318.38 GFLOP/s , 533980.1 tokens/s
INFO:__main__:2024-10-26 18:20:43 | Epoch: 0 | Step: 780 | Dataset: 0-625000 | Loss: 4.707 | 673 ms/step , 58396.14 GFLOP/s , 534399.2 tokens/s
INFO:__main__:2024-10-26 18:20:50 | Epoch: 0 | Step: 790 | Dataset: 0-633000 | Loss: 4.663 | 674 ms/step , 58345.32 GFLOP/s , 534333.0 tokens/s
INFO:__main__:2024-10-26 18:20:58 | Epoch: 0 | Step: 800 | Dataset: 0-641000 | Loss: 4.663 | 676 ms/step , 58153.84 GFLOP/s , 533730.0 tokens/s
INFO:__main__:2024-10-26 18:21:06 | Epoch: 0 | Step: 810 | Dataset: 0-649000 | Loss: 4.625 | 675 ms/step , 58225.35 GFLOP/s , 533982.6 tokens/s
INFO:__main__:2024-10-26 18:21:13 | Epoch: 0 | Step: 820 | Dataset: 0-657000 | Loss: 4.642 | 675 ms/step , 58260.04 GFLOP/s , 534363.8 tokens/s
INFO:__main__:2024-10-26 18:21:21 | Epoch: 0 | Step: 830 | Dataset: 0-665000 | Loss: 4.569 | 675 ms/step , 58213.88 GFLOP/s , 533216.5 tokens/s
INFO:__main__:2024-10-26 18:21:29 | Epoch: 0 | Step: 840 | Dataset: 0-673000 | Loss: 4.592 | 674 ms/step , 58309.74 GFLOP/s , 534001.5 tokens/s
INFO:__main__:2024-10-26 18:21:36 | Epoch: 0 | Step: 850 | Dataset: 0-681000 | Loss: 4.645 | 673 ms/step , 58403.21 GFLOP/s , 534159.3 tokens/s
INFO:__main__:2024-10-26 18:21:44 | Epoch: 0 | Step: 860 | Dataset: 0-689000 | Loss: 4.557 | 674 ms/step , 58282.40 GFLOP/s , 533859.4 tokens/s
INFO:__main__:2024-10-26 18:21:52 | Epoch: 0 | Step: 870 | Dataset: 0-697000 | Loss: 4.489 | 675 ms/step , 58278.19 GFLOP/s , 533872.7 tokens/s
INFO:__main__:2024-10-26 18:21:59 | Epoch: 0 | Step: 880 | Dataset: 0-705000 | Loss: 4.431 | 675 ms/step , 58252.88 GFLOP/s , 533821.6 tokens/s
INFO:__main__:2024-10-26 18:22:07 | Epoch: 0 | Step: 890 | Dataset: 0-713000 | Loss: 4.502 | 673 ms/step , 58428.00 GFLOP/s , 533937.3 tokens/s
INFO:__main__:2024-10-26 18:22:15 | Epoch: 0 | Step: 900 | Dataset: 0-721000 | Loss: 4.433 | 674 ms/step , 58320.53 GFLOP/s , 533912.0 tokens/s
INFO:__main__:2024-10-26 18:22:22 | Epoch: 0 | Step: 910 | Dataset: 0-729000 | Loss: 4.413 | 674 ms/step , 58311.27 GFLOP/s , 534211.8 tokens/s
INFO:__main__:2024-10-26 18:22:30 | Epoch: 0 | Step: 920 | Dataset: 0-737000 | Loss: 4.373 | 675 ms/step , 58272.41 GFLOP/s , 533610.3 tokens/s
INFO:__main__:2024-10-26 18:22:38 | Epoch: 0 | Step: 930 | Dataset: 0-745000 | Loss: 4.419 | 674 ms/step , 58346.92 GFLOP/s , 534672.9 tokens/s
INFO:__main__:2024-10-26 18:22:45 | Epoch: 0 | Step: 940 | Dataset: 0-753000 | Loss: 4.379 | 675 ms/step , 58252.72 GFLOP/s , 533676.7 tokens/s
INFO:__main__:2024-10-26 18:22:53 | Epoch: 0 | Step: 950 | Dataset: 0-761000 | Loss: 4.402 | 673 ms/step , 58368.67 GFLOP/s , 534040.9 tokens/s
INFO:__main__:2024-10-26 18:23:01 | Epoch: 0 | Step: 960 | Dataset: 0-769000 | Loss: 4.331 | 673 ms/step , 58372.00 GFLOP/s , 534221.7 tokens/s
INFO:__main__:2024-10-26 18:23:08 | Epoch: 0 | Step: 970 | Dataset: 0-777000 | Loss: 4.458 | 674 ms/step , 58293.10 GFLOP/s , 534327.5 tokens/s
INFO:__main__:2024-10-26 18:23:16 | Epoch: 0 | Step: 980 | Dataset: 0-785000 | Loss: 4.285 | 675 ms/step , 58275.16 GFLOP/s , 534104.5 tokens/s
INFO:__main__:2024-10-26 18:23:24 | Epoch: 0 | Step: 990 | Dataset: 0-793000 | Loss: 4.273 | 675 ms/step , 58207.19 GFLOP/s , 533941.5 tokens/s
INFO:__main__:2024-10-26 18:23:31 | Validation | Step: 1000 | Val_loss: 4.342 | Best_val_loss: 19.4081
INFO:__main__:2024-10-26 18:23:31 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_182331_step_1000.pt`
INFO:__main__:2024-10-26 18:23:32 | Epoch: 0 | Step: 1000 | Dataset: 0-801000 | Loss: 4.298 | 673 ms/step , 58434.14 GFLOP/s , 480717.4 tokens/s
INFO:__main__:2024-10-26 18:23:40 | Epoch: 0 | Step: 1010 | Dataset: 0-809000 | Loss: 4.215 | 676 ms/step , 58163.38 GFLOP/s , 531556.0 tokens/s
INFO:__main__:2024-10-26 18:23:48 | Epoch: 0 | Step: 1020 | Dataset: 0-817000 | Loss: 4.220 | 675 ms/step , 58250.83 GFLOP/s , 534034.4 tokens/s
INFO:__main__:2024-10-26 18:23:55 | Epoch: 0 | Step: 1030 | Dataset: 0-825000 | Loss: 4.244 | 674 ms/step , 58330.40 GFLOP/s , 533654.6 tokens/s
INFO:__main__:2024-10-26 18:24:03 | Epoch: 0 | Step: 1040 | Dataset: 0-833000 | Loss: 4.174 | 674 ms/step , 58324.82 GFLOP/s , 534244.9 tokens/s
INFO:__main__:2024-10-26 18:24:11 | Epoch: 0 | Step: 1050 | Dataset: 0-841000 | Loss: 4.168 | 675 ms/step , 58278.00 GFLOP/s , 534158.8 tokens/s
INFO:__main__:2024-10-26 18:24:18 | Epoch: 0 | Step: 1060 | Dataset: 0-849000 | Loss: 4.124 | 675 ms/step , 58250.54 GFLOP/s , 533946.6 tokens/s
INFO:__main__:2024-10-26 18:24:26 | Epoch: 0 | Step: 1070 | Dataset: 0-857000 | Loss: 4.125 | 676 ms/step , 58110.30 GFLOP/s , 533217.8 tokens/s
INFO:__main__:2024-10-26 18:24:34 | Epoch: 0 | Step: 1080 | Dataset: 0-865000 | Loss: 4.076 | 676 ms/step , 58145.88 GFLOP/s , 532422.4 tokens/s
INFO:__main__:2024-10-26 18:24:41 | Epoch: 0 | Step: 1090 | Dataset: 0-873000 | Loss: 4.116 | 675 ms/step , 58241.97 GFLOP/s , 532963.4 tokens/s
INFO:__main__:2024-10-26 18:24:49 | Epoch: 0 | Step: 1100 | Dataset: 0-881000 | Loss: 4.142 | 675 ms/step , 58199.20 GFLOP/s , 533197.6 tokens/s
INFO:__main__:2024-10-26 18:24:57 | Epoch: 0 | Step: 1110 | Dataset: 0-889000 | Loss: 4.110 | 676 ms/step , 58136.61 GFLOP/s , 532130.8 tokens/s
INFO:__main__:2024-10-26 18:25:04 | Epoch: 0 | Step: 1120 | Dataset: 0-897000 | Loss: 4.035 | 674 ms/step , 58333.28 GFLOP/s , 532034.5 tokens/s
INFO:__main__:2024-10-26 18:25:12 | Epoch: 0 | Step: 1130 | Dataset: 0-905000 | Loss: 4.026 | 674 ms/step , 58347.53 GFLOP/s , 532442.8 tokens/s
INFO:__main__:2024-10-26 18:25:20 | Epoch: 0 | Step: 1140 | Dataset: 0-913000 | Loss: 4.038 | 673 ms/step , 58390.02 GFLOP/s , 534546.0 tokens/s
INFO:__main__:2024-10-26 18:25:27 | Epoch: 0 | Step: 1150 | Dataset: 0-921000 | Loss: 4.082 | 673 ms/step , 58380.49 GFLOP/s , 534318.7 tokens/s
INFO:__main__:2024-10-26 18:25:35 | Epoch: 0 | Step: 1160 | Dataset: 0-929000 | Loss: 4.069 | 673 ms/step , 58380.67 GFLOP/s , 534070.6 tokens/s
INFO:__main__:2024-10-26 18:25:43 | Epoch: 0 | Step: 1170 | Dataset: 0-937000 | Loss: 4.051 | 674 ms/step , 58283.42 GFLOP/s , 533666.1 tokens/s
INFO:__main__:2024-10-26 18:25:50 | Epoch: 0 | Step: 1180 | Dataset: 0-945000 | Loss: 3.980 | 674 ms/step , 58336.88 GFLOP/s , 533850.5 tokens/s
INFO:__main__:2024-10-26 18:25:58 | Epoch: 0 | Step: 1190 | Dataset: 0-953000 | Loss: 4.032 | 674 ms/step , 58365.67 GFLOP/s , 534307.6 tokens/s
INFO:__main__:2024-10-26 18:26:06 | Epoch: 0 | Step: 1200 | Dataset: 0-961000 | Loss: 4.015 | 675 ms/step , 58259.27 GFLOP/s , 533922.6 tokens/s
INFO:__main__:2024-10-26 18:26:13 | Epoch: 0 | Step: 1210 | Dataset: 0-969000 | Loss: 3.994 | 676 ms/step , 58191.54 GFLOP/s , 533377.3 tokens/s
INFO:__main__:2024-10-26 18:26:21 | Epoch: 0 | Step: 1220 | Dataset: 0-977000 | Loss: 3.908 | 676 ms/step , 58189.30 GFLOP/s , 533269.6 tokens/s
INFO:__main__:2024-10-26 18:26:29 | Epoch: 0 | Step: 1230 | Dataset: 0-985000 | Loss: 3.967 | 675 ms/step , 58244.29 GFLOP/s , 533038.5 tokens/s
INFO:__main__:2024-10-26 18:26:37 | Epoch: 0 | Step: 1240 | Dataset: 0-993000 | Loss: 3.984 | 673 ms/step , 58366.13 GFLOP/s , 533826.8 tokens/s
INFO:__main__:2024-10-26 18:26:44 | Epoch: 0 | Step: 1250 | Dataset: 0-1001000 | Loss: 3.892 | 673 ms/step , 58390.00 GFLOP/s , 533964.7 tokens/s
INFO:__main__:2024-10-26 18:26:52 | Epoch: 0 | Step: 1260 | Dataset: 0-1009000 | Loss: 3.958 | 674 ms/step , 58343.11 GFLOP/s , 534349.9 tokens/s
INFO:__main__:2024-10-26 18:27:00 | Epoch: 0 | Step: 1270 | Dataset: 0-1017000 | Loss: 3.940 | 675 ms/step , 58269.34 GFLOP/s , 533748.7 tokens/s
INFO:__main__:2024-10-26 18:27:07 | Epoch: 0 | Step: 1280 | Dataset: 0-1025000 | Loss: 3.903 | 673 ms/step , 58380.68 GFLOP/s , 533991.1 tokens/s
INFO:__main__:2024-10-26 18:27:15 | Epoch: 0 | Step: 1290 | Dataset: 0-1033000 | Loss: 3.870 | 673 ms/step , 58443.14 GFLOP/s , 534133.3 tokens/s
INFO:__main__:2024-10-26 18:27:23 | Epoch: 0 | Step: 1300 | Dataset: 0-1041000 | Loss: 3.884 | 673 ms/step , 58389.40 GFLOP/s , 534621.4 tokens/s
INFO:__main__:2024-10-26 18:27:30 | Epoch: 0 | Step: 1310 | Dataset: 0-1049000 | Loss: 4.145 | 673 ms/step , 58373.39 GFLOP/s , 534023.5 tokens/s
INFO:__main__:2024-10-26 18:27:38 | Epoch: 0 | Step: 1320 | Dataset: 0-1057000 | Loss: 3.819 | 673 ms/step , 58377.04 GFLOP/s , 533931.6 tokens/s
INFO:__main__:2024-10-26 18:27:46 | Epoch: 0 | Step: 1330 | Dataset: 0-1065000 | Loss: 3.628 | 675 ms/step , 58250.33 GFLOP/s , 532829.4 tokens/s
INFO:__main__:2024-10-26 18:27:53 | Epoch: 0 | Step: 1340 | Dataset: 0-1073000 | Loss: 3.492 | 674 ms/step , 58312.18 GFLOP/s , 532581.0 tokens/s
INFO:__main__:2024-10-26 18:28:01 | Epoch: 0 | Step: 1350 | Dataset: 0-1081000 | Loss: 3.361 | 674 ms/step , 58363.38 GFLOP/s , 533514.4 tokens/s
INFO:__main__:2024-10-26 18:28:09 | Epoch: 0 | Step: 1360 | Dataset: 0-1089000 | Loss: 3.246 | 674 ms/step , 58321.38 GFLOP/s , 532997.3 tokens/s
INFO:__main__:2024-10-26 18:28:16 | Epoch: 0 | Step: 1370 | Dataset: 0-1097000 | Loss: 3.160 | 673 ms/step , 58411.44 GFLOP/s , 533475.2 tokens/s
INFO:__main__:2024-10-26 18:28:24 | Epoch: 0 | Step: 1380 | Dataset: 0-1105000 | Loss: 3.095 | 673 ms/step , 58381.37 GFLOP/s , 533377.4 tokens/s
INFO:__main__:2024-10-26 18:28:32 | Epoch: 0 | Step: 1390 | Dataset: 0-1113000 | Loss: 3.047 | 674 ms/step , 58335.72 GFLOP/s , 533495.2 tokens/s
INFO:__main__:2024-10-26 18:28:39 | Epoch: 0 | Step: 1400 | Dataset: 0-1121000 | Loss: 4.413 | 676 ms/step , 58121.48 GFLOP/s , 533289.3 tokens/s
INFO:__main__:2024-10-26 18:28:47 | Epoch: 0 | Step: 1410 | Dataset: 0-1129000 | Loss: 4.020 | 674 ms/step , 58344.20 GFLOP/s , 532878.0 tokens/s
INFO:__main__:2024-10-26 18:28:55 | Epoch: 0 | Step: 1420 | Dataset: 0-1137000 | Loss: 3.923 | 675 ms/step , 58243.86 GFLOP/s , 533980.4 tokens/s
INFO:__main__:2024-10-26 18:29:02 | Epoch: 0 | Step: 1430 | Dataset: 0-1145000 | Loss: 3.934 | 673 ms/step , 58450.95 GFLOP/s , 533864.2 tokens/s
INFO:__main__:2024-10-26 18:29:10 | Epoch: 0 | Step: 1440 | Dataset: 0-1153000 | Loss: 3.848 | 674 ms/step , 58324.19 GFLOP/s , 534289.6 tokens/s
INFO:__main__:2024-10-26 18:29:18 | Epoch: 0 | Step: 1450 | Dataset: 0-1161000 | Loss: 3.763 | 675 ms/step , 58278.62 GFLOP/s , 533218.1 tokens/s
INFO:__main__:2024-10-26 18:29:25 | Epoch: 0 | Step: 1460 | Dataset: 0-1169000 | Loss: 3.783 | 674 ms/step , 58290.61 GFLOP/s , 533493.2 tokens/s
INFO:__main__:2024-10-26 18:29:33 | Epoch: 0 | Step: 1470 | Dataset: 0-1177000 | Loss: 3.841 | 674 ms/step , 58354.89 GFLOP/s , 533002.0 tokens/s
INFO:__main__:2024-10-26 18:29:41 | Epoch: 0 | Step: 1480 | Dataset: 0-1185000 | Loss: 3.716 | 675 ms/step , 58271.92 GFLOP/s , 533553.8 tokens/s
INFO:__main__:2024-10-26 18:29:48 | Epoch: 0 | Step: 1490 | Dataset: 0-1193000 | Loss: 3.747 | 674 ms/step , 58302.69 GFLOP/s , 533656.4 tokens/s
INFO:__main__:2024-10-26 18:29:56 | Epoch: 0 | Step: 1500 | Dataset: 0-1201000 | Loss: 3.722 | 674 ms/step , 58285.87 GFLOP/s , 533602.2 tokens/s
INFO:__main__:2024-10-26 18:30:04 | Epoch: 0 | Step: 1510 | Dataset: 0-1209000 | Loss: 3.701 | 674 ms/step , 58301.94 GFLOP/s , 533294.1 tokens/s
INFO:__main__:2024-10-26 18:30:11 | Epoch: 0 | Step: 1520 | Dataset: 0-1217000 | Loss: 3.782 | 673 ms/step , 58383.35 GFLOP/s , 533445.1 tokens/s
INFO:__main__:2024-10-26 18:30:19 | Epoch: 0 | Step: 1530 | Dataset: 0-1225000 | Loss: 3.660 | 674 ms/step , 58317.21 GFLOP/s , 533465.1 tokens/s
INFO:__main__:2024-10-26 18:30:27 | Epoch: 0 | Step: 1540 | Dataset: 0-1233000 | Loss: 3.753 | 674 ms/step , 58306.16 GFLOP/s , 533263.8 tokens/s
INFO:__main__:2024-10-26 18:30:34 | Epoch: 0 | Step: 1550 | Dataset: 0-1241000 | Loss: 3.742 | 674 ms/step , 58303.05 GFLOP/s , 533424.8 tokens/s
INFO:__main__:2024-10-26 18:30:42 | Epoch: 0 | Step: 1560 | Dataset: 0-1249000 | Loss: 3.747 | 674 ms/step , 58345.56 GFLOP/s , 533091.3 tokens/s
INFO:__main__:2024-10-26 18:30:50 | Epoch: 0 | Step: 1570 | Dataset: 0-1257000 | Loss: 3.729 | 675 ms/step , 58218.21 GFLOP/s , 533388.9 tokens/s
INFO:__main__:2024-10-26 18:30:58 | Epoch: 0 | Step: 1580 | Dataset: 0-1265000 | Loss: 3.742 | 675 ms/step , 58213.30 GFLOP/s , 533062.1 tokens/s
INFO:__main__:2024-10-26 18:31:05 | Epoch: 0 | Step: 1590 | Dataset: 0-1273000 | Loss: 3.715 | 674 ms/step , 58293.80 GFLOP/s , 533601.1 tokens/s
INFO:__main__:2024-10-26 18:31:13 | Epoch: 0 | Step: 1600 | Dataset: 0-1281000 | Loss: 3.681 | 675 ms/step , 58266.21 GFLOP/s , 533570.5 tokens/s
INFO:__main__:2024-10-26 18:31:21 | Epoch: 0 | Step: 1610 | Dataset: 0-1289000 | Loss: 3.723 | 674 ms/step , 58333.71 GFLOP/s , 532968.7 tokens/s
INFO:__main__:2024-10-26 18:31:28 | Epoch: 0 | Step: 1620 | Dataset: 0-1297000 | Loss: 3.641 | 674 ms/step , 58296.88 GFLOP/s , 533337.2 tokens/s
INFO:__main__:2024-10-26 18:31:36 | Epoch: 0 | Step: 1630 | Dataset: 0-1305000 | Loss: 3.670 | 674 ms/step , 58330.70 GFLOP/s , 533237.8 tokens/s
INFO:__main__:2024-10-26 18:31:44 | Epoch: 0 | Step: 1640 | Dataset: 0-1313000 | Loss: 3.738 | 674 ms/step , 58329.35 GFLOP/s , 533490.7 tokens/s
INFO:__main__:2024-10-26 18:31:51 | Epoch: 0 | Step: 1650 | Dataset: 0-1321000 | Loss: 3.639 | 674 ms/step , 58339.91 GFLOP/s , 533184.3 tokens/s
INFO:__main__:2024-10-26 18:31:59 | Epoch: 0 | Step: 1660 | Dataset: 0-1329000 | Loss: 3.682 | 674 ms/step , 58308.56 GFLOP/s , 533383.2 tokens/s
INFO:__main__:2024-10-26 18:32:07 | Epoch: 0 | Step: 1670 | Dataset: 0-1337000 | Loss: 3.609 | 675 ms/step , 58269.66 GFLOP/s , 532747.4 tokens/s
INFO:__main__:2024-10-26 18:32:14 | Epoch: 0 | Step: 1680 | Dataset: 0-1345000 | Loss: 3.593 | 674 ms/step , 58308.40 GFLOP/s , 533557.8 tokens/s
INFO:__main__:2024-10-26 18:32:22 | Epoch: 0 | Step: 1690 | Dataset: 0-1353000 | Loss: 3.636 | 674 ms/step , 58300.33 GFLOP/s , 533347.9 tokens/s
INFO:__main__:2024-10-26 18:32:30 | Epoch: 0 | Step: 1700 | Dataset: 0-1361000 | Loss: 3.551 | 675 ms/step , 58235.07 GFLOP/s , 533440.6 tokens/s
INFO:__main__:2024-10-26 18:32:37 | Epoch: 0 | Step: 1710 | Dataset: 0-1369000 | Loss: 3.542 | 674 ms/step , 58310.93 GFLOP/s , 533349.3 tokens/s
INFO:__main__:2024-10-26 18:32:45 | Epoch: 0 | Step: 1720 | Dataset: 0-1377000 | Loss: 3.634 | 675 ms/step , 58259.66 GFLOP/s , 532844.1 tokens/s
INFO:__main__:2024-10-26 18:32:53 | Epoch: 0 | Step: 1730 | Dataset: 0-1385000 | Loss: 3.535 | 674 ms/step , 58315.72 GFLOP/s , 533757.7 tokens/s
INFO:__main__:2024-10-26 18:33:00 | Epoch: 0 | Step: 1740 | Dataset: 0-1393000 | Loss: 3.599 | 673 ms/step , 58381.27 GFLOP/s , 533353.1 tokens/s
INFO:__main__:2024-10-26 18:33:08 | Epoch: 0 | Step: 1750 | Dataset: 0-1401000 | Loss: 3.639 | 674 ms/step , 58341.82 GFLOP/s , 533471.8 tokens/s
INFO:__main__:2024-10-26 18:33:16 | Epoch: 0 | Step: 1760 | Dataset: 0-1409000 | Loss: 3.459 | 673 ms/step , 58398.83 GFLOP/s , 533398.0 tokens/s
INFO:__main__:2024-10-26 18:33:23 | Epoch: 0 | Step: 1770 | Dataset: 0-1417000 | Loss: 3.481 | 674 ms/step , 58353.30 GFLOP/s , 533393.3 tokens/s
INFO:__main__:2024-10-26 18:33:31 | Epoch: 0 | Step: 1780 | Dataset: 0-1425000 | Loss: 3.593 | 675 ms/step , 58257.58 GFLOP/s , 533075.1 tokens/s
INFO:__main__:2024-10-26 18:33:39 | Epoch: 0 | Step: 1790 | Dataset: 0-1433000 | Loss: 3.481 | 675 ms/step , 58258.83 GFLOP/s , 533244.1 tokens/s
INFO:__main__:2024-10-26 18:33:47 | Epoch: 0 | Step: 1800 | Dataset: 0-1441000 | Loss: 3.536 | 675 ms/step , 58277.62 GFLOP/s , 533428.4 tokens/s
INFO:__main__:2024-10-26 18:33:54 | Epoch: 0 | Step: 1810 | Dataset: 0-1449000 | Loss: 3.410 | 673 ms/step , 58449.97 GFLOP/s , 533476.0 tokens/s
INFO:__main__:2024-10-26 18:34:02 | Epoch: 0 | Step: 1820 | Dataset: 0-1457000 | Loss: 3.476 | 673 ms/step , 58429.09 GFLOP/s , 533719.7 tokens/s
INFO:__main__:2024-10-26 18:34:10 | Epoch: 0 | Step: 1830 | Dataset: 0-1465000 | Loss: 3.490 | 673 ms/step , 58373.03 GFLOP/s , 533526.0 tokens/s
INFO:__main__:2024-10-26 18:34:17 | Epoch: 0 | Step: 1840 | Dataset: 0-1473000 | Loss: 3.528 | 675 ms/step , 58268.96 GFLOP/s , 533976.0 tokens/s
INFO:__main__:2024-10-26 18:34:25 | Epoch: 0 | Step: 1850 | Dataset: 0-1481000 | Loss: 3.491 | 675 ms/step , 58197.96 GFLOP/s , 533118.2 tokens/s
INFO:__main__:2024-10-26 18:34:33 | Epoch: 0 | Step: 1860 | Dataset: 0-1489000 | Loss: 3.491 | 673 ms/step , 58407.08 GFLOP/s , 533012.3 tokens/s
INFO:__main__:2024-10-26 18:34:40 | Epoch: 0 | Step: 1870 | Dataset: 0-1497000 | Loss: 3.459 | 674 ms/step , 58321.35 GFLOP/s , 533046.7 tokens/s
INFO:__main__:2024-10-26 18:34:48 | Epoch: 0 | Step: 1880 | Dataset: 0-1505000 | Loss: 3.518 | 673 ms/step , 58380.55 GFLOP/s , 533790.8 tokens/s
INFO:__main__:2024-10-26 18:34:56 | Epoch: 0 | Step: 1890 | Dataset: 0-1513000 | Loss: 3.114 | 675 ms/step , 58235.67 GFLOP/s , 532559.3 tokens/s
INFO:__main__:2024-10-26 18:35:03 | Epoch: 0 | Step: 1900 | Dataset: 0-1521000 | Loss: 2.963 | 673 ms/step , 58398.76 GFLOP/s , 532607.6 tokens/s
INFO:__main__:2024-10-26 18:35:11 | Epoch: 0 | Step: 1910 | Dataset: 0-1529000 | Loss: 2.937 | 673 ms/step , 58374.01 GFLOP/s , 533367.0 tokens/s
INFO:__main__:2024-10-26 18:35:19 | Epoch: 0 | Step: 1920 | Dataset: 0-1537000 | Loss: 2.855 | 674 ms/step , 58288.82 GFLOP/s , 532304.3 tokens/s
INFO:__main__:2024-10-26 18:35:26 | Epoch: 0 | Step: 1930 | Dataset: 0-1545000 | Loss: 2.784 | 675 ms/step , 58236.89 GFLOP/s , 532178.1 tokens/s
INFO:__main__:2024-10-26 18:35:34 | Epoch: 0 | Step: 1940 | Dataset: 0-1553000 | Loss: 2.729 | 673 ms/step , 58367.05 GFLOP/s , 531475.8 tokens/s
INFO:__main__:2024-10-26 18:35:42 | Epoch: 0 | Step: 1950 | Dataset: 0-1561000 | Loss: 2.699 | 672 ms/step , 58469.26 GFLOP/s , 533875.4 tokens/s
INFO:__main__:2024-10-26 18:35:49 | Epoch: 0 | Step: 1960 | Dataset: 0-1569000 | Loss: 2.695 | 673 ms/step , 58388.28 GFLOP/s , 531899.3 tokens/s
INFO:__main__:2024-10-26 18:35:57 | Epoch: 0 | Step: 1970 | Dataset: 0-1577000 | Loss: 4.188 | 674 ms/step , 58320.60 GFLOP/s , 533306.2 tokens/s
INFO:__main__:2024-10-26 18:36:05 | Epoch: 0 | Step: 1980 | Dataset: 0-1585000 | Loss: 3.660 | 675 ms/step , 58246.56 GFLOP/s , 533278.7 tokens/s
INFO:__main__:2024-10-26 18:36:13 | Epoch: 0 | Step: 1990 | Dataset: 0-1593000 | Loss: 3.561 | 674 ms/step , 58291.10 GFLOP/s , 533693.7 tokens/s
INFO:__main__:2024-10-26 18:36:20 | Validation | Step: 2000 | Val_loss: 3.503 | Best_val_loss: 4.3424
INFO:__main__:2024-10-26 18:36:20 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_183620_step_2000.pt`
INFO:__main__:2024-10-26 18:36:21 | Epoch: 0 | Step: 2000 | Dataset: 0-1601000 | Loss: 3.435 | 673 ms/step , 58441.75 GFLOP/s , 480802.9 tokens/s
INFO:__main__:2024-10-26 18:36:29 | Epoch: 0 | Step: 2010 | Dataset: 0-1609000 | Loss: 3.500 | 674 ms/step , 58323.86 GFLOP/s , 533978.4 tokens/s
INFO:__main__:2024-10-26 18:36:36 | Epoch: 0 | Step: 2020 | Dataset: 0-1617000 | Loss: 3.451 | 674 ms/step , 58287.48 GFLOP/s , 533787.7 tokens/s
INFO:__main__:2024-10-26 18:36:44 | Epoch: 0 | Step: 2030 | Dataset: 0-1625000 | Loss: 3.440 | 673 ms/step , 58377.56 GFLOP/s , 533862.3 tokens/s
INFO:__main__:2024-10-26 18:36:52 | Epoch: 0 | Step: 2040 | Dataset: 0-1633000 | Loss: 3.425 | 674 ms/step , 58311.77 GFLOP/s , 534241.8 tokens/s
INFO:__main__:2024-10-26 18:36:59 | Epoch: 0 | Step: 2050 | Dataset: 0-1641000 | Loss: 3.469 | 675 ms/step , 58250.15 GFLOP/s , 533247.8 tokens/s
INFO:__main__:2024-10-26 18:37:07 | Epoch: 0 | Step: 2060 | Dataset: 0-1649000 | Loss: 3.492 | 675 ms/step , 58277.65 GFLOP/s , 533364.3 tokens/s
INFO:__main__:2024-10-26 18:37:15 | Epoch: 0 | Step: 2070 | Dataset: 0-1657000 | Loss: 3.387 | 674 ms/step , 58352.87 GFLOP/s , 533522.8 tokens/s
INFO:__main__:2024-10-26 18:37:22 | Epoch: 0 | Step: 2080 | Dataset: 0-1665000 | Loss: 3.439 | 674 ms/step , 58317.35 GFLOP/s , 533972.4 tokens/s
INFO:__main__:2024-10-26 18:37:30 | Epoch: 0 | Step: 2090 | Dataset: 0-1673000 | Loss: 3.460 | 674 ms/step , 58279.98 GFLOP/s , 533378.0 tokens/s
INFO:__main__:2024-10-26 18:37:38 | Epoch: 0 | Step: 2100 | Dataset: 0-1681000 | Loss: 3.455 | 674 ms/step , 58345.45 GFLOP/s , 533438.8 tokens/s
INFO:__main__:2024-10-26 18:37:45 | Epoch: 0 | Step: 2110 | Dataset: 0-1689000 | Loss: 3.387 | 674 ms/step , 58343.27 GFLOP/s , 533458.6 tokens/s
INFO:__main__:2024-10-26 18:37:53 | Epoch: 0 | Step: 2120 | Dataset: 0-1697000 | Loss: 3.423 | 673 ms/step , 58421.70 GFLOP/s , 533812.8 tokens/s
INFO:__main__:2024-10-26 18:38:01 | Epoch: 0 | Step: 2130 | Dataset: 0-1705000 | Loss: 3.440 | 673 ms/step , 58443.56 GFLOP/s , 534264.8 tokens/s
INFO:__main__:2024-10-26 18:38:08 | Epoch: 0 | Step: 2140 | Dataset: 0-1713000 | Loss: 3.386 | 674 ms/step , 58345.48 GFLOP/s , 533612.0 tokens/s
INFO:__main__:2024-10-26 18:38:16 | Epoch: 0 | Step: 2150 | Dataset: 0-1721000 | Loss: 3.405 | 674 ms/step , 58337.73 GFLOP/s , 533734.9 tokens/s
INFO:__main__:2024-10-26 18:38:24 | Epoch: 0 | Step: 2160 | Dataset: 0-1729000 | Loss: 3.361 | 674 ms/step , 58305.49 GFLOP/s , 533181.1 tokens/s
INFO:__main__:2024-10-26 18:38:32 | Epoch: 0 | Step: 2170 | Dataset: 0-1737000 | Loss: 3.435 | 674 ms/step , 58286.88 GFLOP/s , 533285.4 tokens/s
INFO:__main__:2024-10-26 18:38:39 | Epoch: 0 | Step: 2180 | Dataset: 0-1745000 | Loss: 3.463 | 674 ms/step , 58336.70 GFLOP/s , 533191.1 tokens/s
INFO:__main__:2024-10-26 18:38:47 | Epoch: 0 | Step: 2190 | Dataset: 0-1753000 | Loss: 3.305 | 674 ms/step , 58326.19 GFLOP/s , 533141.2 tokens/s
INFO:__main__:2024-10-26 18:38:55 | Epoch: 0 | Step: 2200 | Dataset: 0-1761000 | Loss: 3.336 | 674 ms/step , 58307.97 GFLOP/s , 533280.3 tokens/s
INFO:__main__:2024-10-26 18:39:02 | Epoch: 0 | Step: 2210 | Dataset: 0-1769000 | Loss: 3.293 | 675 ms/step , 58236.49 GFLOP/s , 533410.0 tokens/s
INFO:__main__:2024-10-26 18:39:10 | Epoch: 0 | Step: 2220 | Dataset: 0-1777000 | Loss: 3.313 | 675 ms/step , 58268.74 GFLOP/s , 533293.9 tokens/s
INFO:__main__:2024-10-26 18:39:18 | Epoch: 0 | Step: 2230 | Dataset: 0-1785000 | Loss: 3.300 | 674 ms/step , 58317.08 GFLOP/s , 533402.1 tokens/s
INFO:__main__:2024-10-26 18:39:25 | Epoch: 0 | Step: 2240 | Dataset: 0-1793000 | Loss: 3.260 | 674 ms/step , 58321.56 GFLOP/s , 533336.8 tokens/s
INFO:__main__:2024-10-26 18:39:33 | Epoch: 0 | Step: 2250 | Dataset: 0-1801000 | Loss: 3.310 | 674 ms/step , 58294.35 GFLOP/s , 532175.9 tokens/s
INFO:__main__:2024-10-26 18:39:41 | Epoch: 0 | Step: 2260 | Dataset: 0-1809000 | Loss: 3.217 | 674 ms/step , 58357.96 GFLOP/s , 533649.8 tokens/s
INFO:__main__:2024-10-26 18:39:48 | Epoch: 0 | Step: 2270 | Dataset: 0-1817000 | Loss: 3.270 | 673 ms/step , 58388.64 GFLOP/s , 533221.3 tokens/s
INFO:__main__:2024-10-26 18:39:56 | Epoch: 0 | Step: 2280 | Dataset: 0-1825000 | Loss: 3.204 | 674 ms/step , 58341.90 GFLOP/s , 533553.4 tokens/s
INFO:__main__:2024-10-26 18:40:04 | Epoch: 0 | Step: 2290 | Dataset: 0-1833000 | Loss: 3.206 | 674 ms/step , 58316.11 GFLOP/s , 533183.3 tokens/s
INFO:__main__:2024-10-26 18:40:11 | Epoch: 0 | Step: 2300 | Dataset: 0-1841000 | Loss: 2.746 | 674 ms/step , 58359.77 GFLOP/s , 532932.3 tokens/s
INFO:__main__:2024-10-26 18:40:19 | Epoch: 0 | Step: 2310 | Dataset: 0-1849000 | Loss: 2.684 | 674 ms/step , 58325.84 GFLOP/s , 532352.0 tokens/s
INFO:__main__:2024-10-26 18:40:27 | Epoch: 0 | Step: 2320 | Dataset: 0-1857000 | Loss: 2.637 | 673 ms/step , 58373.67 GFLOP/s , 533114.6 tokens/s
INFO:__main__:2024-10-26 18:40:34 | Epoch: 0 | Step: 2330 | Dataset: 0-1865000 | Loss: 2.578 | 676 ms/step , 58145.59 GFLOP/s , 532531.9 tokens/s
INFO:__main__:2024-10-26 18:40:42 | Epoch: 0 | Step: 2340 | Dataset: 0-1873000 | Loss: 2.552 | 675 ms/step , 58254.01 GFLOP/s , 532752.4 tokens/s
INFO:__main__:2024-10-26 18:40:50 | Epoch: 0 | Step: 2350 | Dataset: 0-1881000 | Loss: 2.551 | 675 ms/step , 58258.65 GFLOP/s , 532244.0 tokens/s
INFO:__main__:2024-10-26 18:40:58 | Epoch: 0 | Step: 2360 | Dataset: 0-1889000 | Loss: 2.502 | 675 ms/step , 58235.49 GFLOP/s , 532558.8 tokens/s
INFO:__main__:2024-10-26 18:41:05 | Epoch: 0 | Step: 2370 | Dataset: 0-1897000 | Loss: 2.491 | 675 ms/step , 58247.61 GFLOP/s , 532479.4 tokens/s
INFO:__main__:2024-10-26 18:41:13 | Epoch: 0 | Step: 2380 | Dataset: 0-1905000 | Loss: 4.170 | 675 ms/step , 58261.01 GFLOP/s , 532475.2 tokens/s
INFO:__main__:2024-10-26 18:41:21 | Epoch: 0 | Step: 2390 | Dataset: 0-1913000 | Loss: 3.374 | 675 ms/step , 58270.04 GFLOP/s , 533214.2 tokens/s
INFO:__main__:2024-10-26 18:41:28 | Epoch: 0 | Step: 2400 | Dataset: 0-1921000 | Loss: 3.389 | 677 ms/step , 58094.75 GFLOP/s , 532786.1 tokens/s
INFO:__main__:2024-10-26 18:41:36 | Epoch: 0 | Step: 2410 | Dataset: 0-1929000 | Loss: 3.348 | 674 ms/step , 58301.95 GFLOP/s , 532946.3 tokens/s
INFO:__main__:2024-10-26 18:41:44 | Epoch: 0 | Step: 2420 | Dataset: 0-1937000 | Loss: 3.205 | 676 ms/step , 58167.69 GFLOP/s , 533014.7 tokens/s
INFO:__main__:2024-10-26 18:41:51 | Epoch: 0 | Step: 2430 | Dataset: 0-1945000 | Loss: 3.252 | 675 ms/step , 58251.28 GFLOP/s , 532831.3 tokens/s
INFO:__main__:2024-10-26 18:41:59 | Epoch: 0 | Step: 2440 | Dataset: 0-1953000 | Loss: 3.309 | 675 ms/step , 58261.85 GFLOP/s , 532969.1 tokens/s
INFO:__main__:2024-10-26 18:42:07 | Epoch: 0 | Step: 2450 | Dataset: 0-1961000 | Loss: 3.272 | 674 ms/step , 58285.55 GFLOP/s , 532279.3 tokens/s
INFO:__main__:2024-10-26 18:42:14 | Epoch: 0 | Step: 2460 | Dataset: 0-1969000 | Loss: 3.153 | 676 ms/step , 58137.67 GFLOP/s , 531741.5 tokens/s
INFO:__main__:2024-10-26 18:42:22 | Epoch: 0 | Step: 2470 | Dataset: 0-1977000 | Loss: 3.237 | 683 ms/step , 57544.82 GFLOP/s , 529284.0 tokens/s
INFO:__main__:2024-10-26 18:42:30 | Epoch: 0 | Step: 2480 | Dataset: 0-1985000 | Loss: 3.216 | 677 ms/step , 58072.62 GFLOP/s , 528960.4 tokens/s
INFO:__main__:2024-10-26 18:42:38 | Epoch: 0 | Step: 2490 | Dataset: 0-1993000 | Loss: 3.229 | 675 ms/step , 58216.77 GFLOP/s , 530113.8 tokens/s
INFO:__main__:2024-10-26 18:42:45 | Epoch: 0 | Step: 2500 | Dataset: 0-2001000 | Loss: 3.115 | 676 ms/step , 58146.97 GFLOP/s , 530564.7 tokens/s
INFO:__main__:2024-10-26 18:42:53 | Epoch: 0 | Step: 2510 | Dataset: 0-2009000 | Loss: 3.229 | 680 ms/step , 57769.07 GFLOP/s , 530124.8 tokens/s
INFO:__main__:2024-10-26 18:43:01 | Epoch: 0 | Step: 2520 | Dataset: 0-2017000 | Loss: 3.205 | 674 ms/step , 58318.13 GFLOP/s , 532337.1 tokens/s
INFO:__main__:2024-10-26 18:43:08 | Epoch: 0 | Step: 2530 | Dataset: 0-2025000 | Loss: 3.166 | 677 
ms/step , 58077.72 GFLOP/s , 529885.3 tokens/s INFO:__main__:2024-10-26 18:43:16 | Epoch: 0 | Step: 2540 | Dataset: 0-2033000 | Loss: 2.995 | 677 ms/step , 58081.95 GFLOP/s , 530347.7 tokens/s INFO:__main__:2024-10-26 18:43:24 | Epoch: 0 | Step: 2550 | Dataset: 0-2041000 | Loss: 2.518 | 673 ms/step , 58379.63 GFLOP/s , 532315.4 tokens/s INFO:__main__:2024-10-26 18:43:32 | Epoch: 0 | Step: 2560 | Dataset: 0-2049000 | Loss: 2.466 | 674 ms/step , 58361.80 GFLOP/s , 532539.4 tokens/s INFO:__main__:2024-10-26 18:43:39 | Epoch: 0 | Step: 2570 | Dataset: 0-2057000 | Loss: 2.408 | 674 ms/step , 58288.70 GFLOP/s , 532218.3 tokens/s INFO:__main__:2024-10-26 18:43:47 | Epoch: 0 | Step: 2580 | Dataset: 0-2065000 | Loss: 2.415 | 673 ms/step , 58452.43 GFLOP/s , 532755.6 tokens/s INFO:__main__:2024-10-26 18:43:55 | Epoch: 0 | Step: 2590 | Dataset: 0-2073000 | Loss: 2.371 | 674 ms/step , 58349.72 GFLOP/s , 532915.9 tokens/s INFO:__main__:2024-10-26 18:44:02 | Epoch: 0 | Step: 2600 | Dataset: 0-2081000 | Loss: 2.350 | 676 ms/step , 58192.21 GFLOP/s , 532533.9 tokens/s INFO:__main__:2024-10-26 18:44:10 | Epoch: 0 | Step: 2610 | Dataset: 0-2089000 | Loss: 2.323 | 675 ms/step , 58212.71 GFLOP/s , 532883.4 tokens/s INFO:__main__:2024-10-26 18:44:18 | Epoch: 0 | Step: 2620 | Dataset: 0-2097000 | Loss: 2.310 | 674 ms/step , 58314.44 GFLOP/s , 532626.2 tokens/s INFO:__main__:2024-10-26 18:44:25 | Epoch: 0 | Step: 2630 | Dataset: 0-2105000 | Loss: 3.538 | 674 ms/step , 58312.15 GFLOP/s , 533241.4 tokens/s INFO:__main__:2024-10-26 18:44:33 | Epoch: 0 | Step: 2640 | Dataset: 0-2113000 | Loss: 3.305 | 674 ms/step , 58340.93 GFLOP/s , 533830.0 tokens/s INFO:__main__:2024-10-26 18:44:41 | Epoch: 0 | Step: 2650 | Dataset: 0-2121000 | Loss: 3.192 | 673 ms/step , 58375.88 GFLOP/s , 533316.5 tokens/s INFO:__main__:2024-10-26 18:44:48 | Epoch: 0 | Step: 2660 | Dataset: 0-2129000 | Loss: 3.214 | 674 ms/step , 58312.44 GFLOP/s , 533108.7 tokens/s INFO:__main__:2024-10-26 18:44:56 | Epoch: 0 | Step: 
2670 | Dataset: 0-2137000 | Loss: 3.188 | 674 ms/step , 58282.25 GFLOP/s , 533075.5 tokens/s INFO:__main__:2024-10-26 18:45:04 | Epoch: 0 | Step: 2680 | Dataset: 0-2145000 | Loss: 3.090 | 674 ms/step , 58339.56 GFLOP/s , 532724.9 tokens/s INFO:__main__:2024-10-26 18:45:12 | Epoch: 0 | Step: 2690 | Dataset: 0-2153000 | Loss: 3.161 | 673 ms/step , 58397.39 GFLOP/s , 533020.0 tokens/s INFO:__main__:2024-10-26 18:45:19 | Epoch: 0 | Step: 2700 | Dataset: 0-2161000 | Loss: 3.106 | 674 ms/step , 58306.13 GFLOP/s , 533815.5 tokens/s INFO:__main__:2024-10-26 18:45:27 | Epoch: 0 | Step: 2710 | Dataset: 0-2169000 | Loss: 3.015 | 674 ms/step , 58305.99 GFLOP/s , 532986.5 tokens/s INFO:__main__:2024-10-26 18:45:35 | Epoch: 0 | Step: 2720 | Dataset: 0-2177000 | Loss: 3.151 | 673 ms/step , 58428.82 GFLOP/s , 533864.9 tokens/s INFO:__main__:2024-10-26 18:45:42 | Epoch: 0 | Step: 2730 | Dataset: 0-2185000 | Loss: 3.080 | 674 ms/step , 58304.53 GFLOP/s , 533386.8 tokens/s INFO:__main__:2024-10-26 18:45:50 | Epoch: 0 | Step: 2740 | Dataset: 0-2193000 | Loss: 3.108 | 675 ms/step , 58271.36 GFLOP/s , 532606.5 tokens/s INFO:__main__:2024-10-26 18:45:58 | Epoch: 0 | Step: 2750 | Dataset: 0-2201000 | Loss: 3.123 | 674 ms/step , 58307.80 GFLOP/s , 533079.0 tokens/s INFO:__main__:2024-10-26 18:46:05 | Epoch: 0 | Step: 2760 | Dataset: 0-2209000 | Loss: 3.031 | 674 ms/step , 58357.49 GFLOP/s , 533063.6 tokens/s INFO:__main__:2024-10-26 18:46:13 | Epoch: 0 | Step: 2770 | Dataset: 0-2217000 | Loss: 3.060 | 674 ms/step , 58346.66 GFLOP/s , 533241.6 tokens/s INFO:__main__:2024-10-26 18:46:21 | Epoch: 0 | Step: 2780 | Dataset: 0-2225000 | Loss: 3.092 | 674 ms/step , 58360.99 GFLOP/s , 533302.9 tokens/s INFO:__main__:2024-10-26 18:46:28 | Epoch: 0 | Step: 2790 | Dataset: 0-2233000 | Loss: 2.576 | 674 ms/step , 58354.28 GFLOP/s , 533370.8 tokens/s INFO:__main__:2024-10-26 18:46:36 | Epoch: 0 | Step: 2800 | Dataset: 0-2241000 | Loss: 2.454 | 673 ms/step , 58382.95 GFLOP/s , 532823.7 tokens/s 
INFO:__main__:2024-10-26 18:46:44 | Epoch: 0 | Step: 2810 | Dataset: 0-2249000 | Loss: 2.444 | 673 ms/step , 58398.23 GFLOP/s , 530400.2 tokens/s INFO:__main__:2024-10-26 18:46:51 | Epoch: 0 | Step: 2820 | Dataset: 0-2257000 | Loss: 2.412 | 673 ms/step , 58428.13 GFLOP/s , 532999.6 tokens/s INFO:__main__:2024-10-26 18:46:59 | Epoch: 0 | Step: 2830 | Dataset: 0-2265000 | Loss: 2.377 | 673 ms/step , 58375.54 GFLOP/s , 532970.2 tokens/s INFO:__main__:2024-10-26 18:47:07 | Epoch: 0 | Step: 2840 | Dataset: 0-2273000 | Loss: 2.385 | 673 ms/step , 58385.65 GFLOP/s , 532869.1 tokens/s INFO:__main__:2024-10-26 18:47:14 | Epoch: 0 | Step: 2850 | Dataset: 0-2281000 | Loss: 2.340 | 673 ms/step , 58398.40 GFLOP/s , 533001.1 tokens/s INFO:__main__:2024-10-26 18:47:22 | Epoch: 0 | Step: 2860 | Dataset: 0-2289000 | Loss: 2.327 | 673 ms/step , 58425.12 GFLOP/s , 533396.7 tokens/s INFO:__main__:2024-10-26 18:47:30 | Epoch: 0 | Step: 2870 | Dataset: 0-2297000 | Loss: 2.290 | 675 ms/step , 58267.19 GFLOP/s , 532410.9 tokens/s INFO:__main__:2024-10-26 18:47:38 | Epoch: 0 | Step: 2880 | Dataset: 0-2305000 | Loss: 3.498 | 674 ms/step , 58344.43 GFLOP/s , 532981.8 tokens/s INFO:__main__:2024-10-26 18:47:45 | Epoch: 0 | Step: 2890 | Dataset: 0-2313000 | Loss: 3.217 | 675 ms/step , 58220.71 GFLOP/s , 533205.1 tokens/s INFO:__main__:2024-10-26 18:47:53 | Epoch: 0 | Step: 2900 | Dataset: 0-2321000 | Loss: 3.175 | 674 ms/step , 58317.87 GFLOP/s , 533551.7 tokens/s INFO:__main__:2024-10-26 18:48:01 | Epoch: 0 | Step: 2910 | Dataset: 0-2329000 | Loss: 3.078 | 675 ms/step , 58233.46 GFLOP/s , 533002.3 tokens/s INFO:__main__:2024-10-26 18:48:08 | Epoch: 0 | Step: 2920 | Dataset: 0-2337000 | Loss: 3.077 | 675 ms/step , 58263.71 GFLOP/s , 532033.0 tokens/s INFO:__main__:2024-10-26 18:48:16 | Epoch: 0 | Step: 2930 | Dataset: 0-2345000 | Loss: 3.073 | 675 ms/step , 58197.24 GFLOP/s , 532603.0 tokens/s INFO:__main__:2024-10-26 18:48:24 | Epoch: 0 | Step: 2940 | Dataset: 0-2353000 | Loss: 3.085 | 674 
ms/step , 58284.78 GFLOP/s , 532725.5 tokens/s INFO:__main__:2024-10-26 18:48:31 | Epoch: 0 | Step: 2950 | Dataset: 0-2361000 | Loss: 3.109 | 674 ms/step , 58302.47 GFLOP/s , 532567.4 tokens/s INFO:__main__:2024-10-26 18:48:39 | Epoch: 0 | Step: 2960 | Dataset: 0-2369000 | Loss: 3.005 | 674 ms/step , 58295.41 GFLOP/s , 532931.3 tokens/s INFO:__main__:2024-10-26 18:48:47 | Epoch: 0 | Step: 2970 | Dataset: 0-2377000 | Loss: 3.063 | 675 ms/step , 58238.49 GFLOP/s , 532809.2 tokens/s INFO:__main__:2024-10-26 18:48:54 | Epoch: 0 | Step: 2980 | Dataset: 0-2385000 | Loss: 3.056 | 675 ms/step , 58262.03 GFLOP/s , 532174.2 tokens/s INFO:__main__:2024-10-26 18:49:02 | Epoch: 0 | Step: 2990 | Dataset: 0-2393000 | Loss: 3.012 | 676 ms/step , 58163.91 GFLOP/s , 532745.3 tokens/s INFO:__main__:2024-10-26 18:49:09 | Validation | Step: 3000 | Val_loss: 3.105 | Best_val_loss: 3.5027 INFO:__main__:2024-10-26 18:49:09 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_184909_step_3000.pt` INFO:__main__:2024-10-26 18:49:11 | Epoch: 0 | Step: 3000 | Dataset: 0-2401000 | Loss: 2.923 | 674 ms/step , 58300.98 GFLOP/s , 479629.6 tokens/s INFO:__main__:2024-10-26 18:49:18 | Epoch: 0 | Step: 3010 | Dataset: 0-2409000 | Loss: 3.003 | 674 ms/step , 58319.59 GFLOP/s , 531055.3 tokens/s INFO:__main__:2024-10-26 18:49:26 | Epoch: 0 | Step: 3020 | Dataset: 0-2417000 | Loss: 3.104 | 675 ms/step , 58228.64 GFLOP/s , 532824.6 tokens/s INFO:__main__:2024-10-26 18:49:34 | Epoch: 0 | Step: 3030 | Dataset: 0-2425000 | Loss: 2.987 | 675 ms/step , 58248.24 GFLOP/s , 533162.6 tokens/s INFO:__main__:2024-10-26 18:49:41 | Epoch: 0 | Step: 3040 | Dataset: 0-2433000 | Loss: 3.144 | 674 ms/step , 58284.56 GFLOP/s , 533041.3 tokens/s INFO:__main__:2024-10-26 18:49:49 | Epoch: 0 | Step: 3050 | Dataset: 0-2441000 | Loss: 3.130 | 675 ms/step , 58254.55 GFLOP/s , 533395.6 tokens/s INFO:__main__:2024-10-26 18:49:57 | Epoch: 0 | Step: 3060 | Dataset: 0-2449000 | Loss: 3.107 | 675 ms/step , 58232.66 
GFLOP/s , 533127.8 tokens/s INFO:__main__:2024-10-26 18:50:04 | Epoch: 0 | Step: 3070 | Dataset: 0-2457000 | Loss: 3.163 | 674 ms/step , 58311.30 GFLOP/s , 532583.2 tokens/s INFO:__main__:2024-10-26 18:50:12 | Epoch: 0 | Step: 3080 | Dataset: 0-2465000 | Loss: 3.066 | 675 ms/step , 58245.19 GFLOP/s , 533360.2 tokens/s INFO:__main__:2024-10-26 18:50:20 | Epoch: 0 | Step: 3090 | Dataset: 0-2473000 | Loss: 3.056 | 675 ms/step , 58241.17 GFLOP/s , 532681.0 tokens/s INFO:__main__:2024-10-26 18:50:28 | Epoch: 0 | Step: 3100 | Dataset: 0-2481000 | Loss: 2.973 | 675 ms/step , 58252.60 GFLOP/s , 533214.9 tokens/s INFO:__main__:2024-10-26 18:50:35 | Epoch: 0 | Step: 3110 | Dataset: 0-2489000 | Loss: 2.983 | 674 ms/step , 58298.12 GFLOP/s , 532860.4 tokens/s INFO:__main__:2024-10-26 18:50:43 | Epoch: 0 | Step: 3120 | Dataset: 0-2497000 | Loss: 2.943 | 674 ms/step , 58343.61 GFLOP/s , 533568.1 tokens/s INFO:__main__:2024-10-26 18:50:51 | Epoch: 0 | Step: 3130 | Dataset: 0-2505000 | Loss: 3.034 | 675 ms/step , 58268.14 GFLOP/s , 532937.6 tokens/s INFO:__main__:2024-10-26 18:50:58 | Epoch: 0 | Step: 3140 | Dataset: 0-2513000 | Loss: 2.966 | 674 ms/step , 58341.51 GFLOP/s , 533618.7 tokens/s INFO:__main__:2024-10-26 18:51:06 | Epoch: 0 | Step: 3150 | Dataset: 0-2521000 | Loss: 3.034 | 675 ms/step , 58274.26 GFLOP/s , 533193.5 tokens/s INFO:__main__:2024-10-26 18:51:14 | Epoch: 0 | Step: 3160 | Dataset: 0-2529000 | Loss: 3.018 | 674 ms/step , 58322.56 GFLOP/s , 533201.6 tokens/s INFO:__main__:2024-10-26 18:51:21 | Epoch: 0 | Step: 3170 | Dataset: 0-2537000 | Loss: 2.912 | 675 ms/step , 58239.19 GFLOP/s , 532897.8 tokens/s INFO:__main__:2024-10-26 18:51:29 | Epoch: 0 | Step: 3180 | Dataset: 0-2545000 | Loss: 3.010 | 675 ms/step , 58250.61 GFLOP/s , 532525.4 tokens/s INFO:__main__:2024-10-26 18:51:37 | Epoch: 0 | Step: 3190 | Dataset: 0-2553000 | Loss: 2.871 | 674 ms/step , 58316.06 GFLOP/s , 533443.9 tokens/s INFO:__main__:2024-10-26 18:51:44 | Epoch: 0 | Step: 3200 | Dataset: 
0-2561000 | Loss: 3.038 | 675 ms/step , 58223.30 GFLOP/s , 533330.4 tokens/s INFO:__main__:2024-10-26 18:51:52 | Epoch: 0 | Step: 3210 | Dataset: 0-2569000 | Loss: 3.039 | 674 ms/step , 58280.33 GFLOP/s , 533175.5 tokens/s INFO:__main__:2024-10-26 18:52:00 | Epoch: 0 | Step: 3220 | Dataset: 0-2577000 | Loss: 2.897 | 675 ms/step , 58271.97 GFLOP/s , 533242.0 tokens/s INFO:__main__:2024-10-26 18:52:07 | Epoch: 0 | Step: 3230 | Dataset: 0-2585000 | Loss: 2.946 | 674 ms/step , 58326.05 GFLOP/s , 533264.3 tokens/s INFO:__main__:2024-10-26 18:52:15 | Epoch: 0 | Step: 3240 | Dataset: 0-2593000 | Loss: 2.939 | 675 ms/step , 58256.04 GFLOP/s , 532836.7 tokens/s INFO:__main__:2024-10-26 18:52:23 | Epoch: 0 | Step: 3250 | Dataset: 0-2601000 | Loss: 2.939 | 674 ms/step , 58298.86 GFLOP/s , 533422.7 tokens/s INFO:__main__:2024-10-26 18:52:30 | Epoch: 0 | Step: 3260 | Dataset: 0-2609000 | Loss: 2.941 | 674 ms/step , 58300.34 GFLOP/s , 533319.1 tokens/s INFO:__main__:2024-10-26 18:52:38 | Epoch: 0 | Step: 3270 | Dataset: 0-2617000 | Loss: 2.911 | 675 ms/step , 58252.07 GFLOP/s , 533496.3 tokens/s INFO:__main__:2024-10-26 18:52:46 | Epoch: 0 | Step: 3280 | Dataset: 0-2625000 | Loss: 2.986 | 674 ms/step , 58318.21 GFLOP/s , 533303.8 tokens/s INFO:__main__:2024-10-26 18:52:53 | Epoch: 0 | Step: 3290 | Dataset: 0-2633000 | Loss: 2.954 | 674 ms/step , 58313.10 GFLOP/s , 533368.2 tokens/s INFO:__main__:2024-10-26 18:53:01 | Epoch: 0 | Step: 3300 | Dataset: 0-2641000 | Loss: 2.898 | 674 ms/step , 58325.77 GFLOP/s , 533579.5 tokens/s INFO:__main__:2024-10-26 18:53:09 | Epoch: 0 | Step: 3310 | Dataset: 0-2649000 | Loss: 2.855 | 674 ms/step , 58326.65 GFLOP/s , 533483.4 tokens/s INFO:__main__:2024-10-26 18:53:16 | Epoch: 0 | Step: 3320 | Dataset: 0-2657000 | Loss: 2.917 | 674 ms/step , 58318.36 GFLOP/s , 533688.9 tokens/s INFO:__main__:2024-10-26 18:53:24 | Epoch: 0 | Step: 3330 | Dataset: 0-2665000 | Loss: 2.909 | 674 ms/step , 58306.44 GFLOP/s , 532357.6 tokens/s INFO:__main__:2024-10-26 
18:53:32 | Epoch: 0 | Step: 3340 | Dataset: 0-2673000 | Loss: 2.850 | 673 ms/step , 58445.29 GFLOP/s , 534011.4 tokens/s INFO:__main__:2024-10-26 18:53:40 | Epoch: 0 | Step: 3350 | Dataset: 0-2681000 | Loss: 2.861 | 675 ms/step , 58276.07 GFLOP/s , 533629.5 tokens/s INFO:__main__:2024-10-26 18:53:47 | Epoch: 0 | Step: 3360 | Dataset: 0-2689000 | Loss: 2.875 | 673 ms/step , 58384.22 GFLOP/s , 534028.1 tokens/s INFO:__main__:2024-10-26 18:53:55 | Epoch: 0 | Step: 3370 | Dataset: 0-2697000 | Loss: 3.014 | 674 ms/step , 58300.61 GFLOP/s , 533764.9 tokens/s INFO:__main__:2024-10-26 18:54:03 | Epoch: 0 | Step: 3380 | Dataset: 0-2705000 | Loss: 2.951 | 674 ms/step , 58340.21 GFLOP/s , 533780.6 tokens/s INFO:__main__:2024-10-26 18:54:10 | Epoch: 0 | Step: 3390 | Dataset: 0-2713000 | Loss: 2.813 | 675 ms/step , 58196.85 GFLOP/s , 533704.6 tokens/s INFO:__main__:2024-10-26 18:54:18 | Epoch: 0 | Step: 3400 | Dataset: 0-2721000 | Loss: 2.722 | 674 ms/step , 58300.35 GFLOP/s , 533384.8 tokens/s INFO:__main__:2024-10-26 18:54:26 | Epoch: 0 | Step: 3410 | Dataset: 0-2729000 | Loss: 2.734 | 674 ms/step , 58300.35 GFLOP/s , 533507.5 tokens/s INFO:__main__:2024-10-26 18:54:33 | Epoch: 0 | Step: 3420 | Dataset: 0-2737000 | Loss: 2.607 | 674 ms/step , 58349.38 GFLOP/s , 533402.1 tokens/s INFO:__main__:2024-10-26 18:54:41 | Epoch: 0 | Step: 3430 | Dataset: 0-2745000 | Loss: 2.638 | 673 ms/step , 58390.67 GFLOP/s , 533864.7 tokens/s INFO:__main__:2024-10-26 18:54:49 | Epoch: 0 | Step: 3440 | Dataset: 0-2753000 | Loss: 2.588 | 678 ms/step , 58000.06 GFLOP/s , 532964.7 tokens/s INFO:__main__:2024-10-26 18:54:56 | Epoch: 0 | Step: 3450 | Dataset: 0-2761000 | Loss: 2.531 | 674 ms/step , 58344.58 GFLOP/s , 533584.2 tokens/s INFO:__main__:2024-10-26 18:55:04 | Epoch: 0 | Step: 3460 | Dataset: 0-2769000 | Loss: 2.520 | 674 ms/step , 58334.62 GFLOP/s , 533467.6 tokens/s INFO:__main__:2024-10-26 18:55:12 | Epoch: 0 | Step: 3470 | Dataset: 0-2777000 | Loss: 2.508 | 675 ms/step , 58204.34 GFLOP/s 
, 533390.8 tokens/s INFO:__main__:2024-10-26 18:55:19 | Epoch: 0 | Step: 3480 | Dataset: 0-2785000 | Loss: 2.479 | 673 ms/step , 58368.23 GFLOP/s , 533789.8 tokens/s INFO:__main__:2024-10-26 18:55:27 | Epoch: 0 | Step: 3490 | Dataset: 0-2793000 | Loss: 2.405 | 673 ms/step , 58381.76 GFLOP/s , 533554.1 tokens/s INFO:__main__:2024-10-26 18:55:35 | Epoch: 0 | Step: 3500 | Dataset: 0-2801000 | Loss: 2.442 | 674 ms/step , 58333.23 GFLOP/s , 534008.4 tokens/s INFO:__main__:2024-10-26 18:55:42 | Epoch: 0 | Step: 3510 | Dataset: 0-2809000 | Loss: 2.406 | 674 ms/step , 58354.96 GFLOP/s , 533980.9 tokens/s INFO:__main__:2024-10-26 18:55:50 | Epoch: 0 | Step: 3520 | Dataset: 0-2817000 | Loss: 2.458 | 673 ms/step , 58408.60 GFLOP/s , 533478.6 tokens/s INFO:__main__:2024-10-26 18:55:58 | Epoch: 0 | Step: 3530 | Dataset: 0-2825000 | Loss: 3.163 | 673 ms/step , 58402.20 GFLOP/s , 534172.3 tokens/s INFO:__main__:2024-10-26 18:56:05 | Epoch: 0 | Step: 3540 | Dataset: 0-2833000 | Loss: 2.988 | 674 ms/step , 58311.94 GFLOP/s , 533948.7 tokens/s INFO:__main__:2024-10-26 18:56:13 | Epoch: 0 | Step: 3550 | Dataset: 0-2841000 | Loss: 2.964 | 674 ms/step , 58334.76 GFLOP/s , 533709.8 tokens/s INFO:__main__:2024-10-26 18:56:21 | Epoch: 0 | Step: 3560 | Dataset: 0-2849000 | Loss: 2.929 | 673 ms/step , 58385.55 GFLOP/s , 534456.4 tokens/s INFO:__main__:2024-10-26 18:56:28 | Epoch: 0 | Step: 3570 | Dataset: 0-2857000 | Loss: 2.975 | 674 ms/step , 58347.43 GFLOP/s , 534598.4 tokens/s INFO:__main__:2024-10-26 18:56:36 | Epoch: 0 | Step: 3580 | Dataset: 0-2865000 | Loss: 2.896 | 673 ms/step , 58408.46 GFLOP/s , 533962.1 tokens/s INFO:__main__:2024-10-26 18:56:44 | Epoch: 0 | Step: 3590 | Dataset: 0-2873000 | Loss: 2.823 | 674 ms/step , 58314.81 GFLOP/s , 534119.7 tokens/s INFO:__main__:2024-10-26 18:56:51 | Epoch: 0 | Step: 3600 | Dataset: 0-2881000 | Loss: 2.905 | 673 ms/step , 58408.88 GFLOP/s , 533876.4 tokens/s INFO:__main__:2024-10-26 18:56:59 | Epoch: 0 | Step: 3610 | Dataset: 0-2889000 | 
Loss: 2.842 | 675 ms/step , 58277.03 GFLOP/s , 533870.8 tokens/s INFO:__main__:2024-10-26 18:57:07 | Epoch: 0 | Step: 3620 | Dataset: 0-2897000 | Loss: 2.845 | 673 ms/step , 58367.86 GFLOP/s , 533939.0 tokens/s INFO:__main__:2024-10-26 18:57:14 | Epoch: 0 | Step: 3630 | Dataset: 0-2905000 | Loss: 2.881 | 674 ms/step , 58354.97 GFLOP/s , 533874.2 tokens/s INFO:__main__:2024-10-26 18:57:22 | Epoch: 0 | Step: 3640 | Dataset: 0-2913000 | Loss: 2.855 | 674 ms/step , 58295.89 GFLOP/s , 533822.2 tokens/s INFO:__main__:2024-10-26 18:57:30 | Epoch: 0 | Step: 3650 | Dataset: 0-2921000 | Loss: 2.874 | 674 ms/step , 58311.24 GFLOP/s , 533477.2 tokens/s INFO:__main__:2024-10-26 18:57:37 | Epoch: 0 | Step: 3660 | Dataset: 0-2929000 | Loss: 2.830 | 677 ms/step , 58073.14 GFLOP/s , 533652.8 tokens/s INFO:__main__:2024-10-26 18:57:45 | Epoch: 0 | Step: 3670 | Dataset: 0-2937000 | Loss: 2.826 | 674 ms/step , 58355.60 GFLOP/s , 533047.3 tokens/s INFO:__main__:2024-10-26 18:57:53 | Epoch: 0 | Step: 3680 | Dataset: 0-2945000 | Loss: 2.812 | 674 ms/step , 58306.68 GFLOP/s , 533325.6 tokens/s INFO:__main__:2024-10-26 18:58:00 | Epoch: 0 | Step: 3690 | Dataset: 0-2953000 | Loss: 2.873 | 675 ms/step , 58268.69 GFLOP/s , 532883.3 tokens/s INFO:__main__:2024-10-26 18:58:08 | Epoch: 0 | Step: 3700 | Dataset: 0-2961000 | Loss: 2.845 | 675 ms/step , 58206.17 GFLOP/s , 532567.4 tokens/s INFO:__main__:2024-10-26 18:58:16 | Epoch: 0 | Step: 3710 | Dataset: 0-2969000 | Loss: 2.840 | 675 ms/step , 58278.16 GFLOP/s , 532491.6 tokens/s INFO:__main__:2024-10-26 18:58:24 | Epoch: 0 | Step: 3720 | Dataset: 0-2977000 | Loss: 2.872 | 675 ms/step , 58256.64 GFLOP/s , 532567.1 tokens/s INFO:__main__:2024-10-26 18:58:31 | Epoch: 0 | Step: 3730 | Dataset: 0-2985000 | Loss: 2.873 | 675 ms/step , 58222.40 GFLOP/s , 531871.2 tokens/s INFO:__main__:2024-10-26 18:58:39 | Epoch: 0 | Step: 3740 | Dataset: 0-2993000 | Loss: 2.850 | 675 ms/step , 58262.60 GFLOP/s , 532392.5 tokens/s INFO:__main__:2024-10-26 18:58:47 | 
Epoch: 0 | Step: 3750 | Dataset: 0-3001000 | Loss: 2.892 | 675 ms/step , 58193.91 GFLOP/s , 532376.2 tokens/s INFO:__main__:2024-10-26 18:58:54 | Epoch: 0 | Step: 3760 | Dataset: 0-3009000 | Loss: 2.800 | 675 ms/step , 58218.10 GFLOP/s , 531759.0 tokens/s INFO:__main__:2024-10-26 18:59:02 | Epoch: 0 | Step: 3770 | Dataset: 0-3017000 | Loss: 2.823 | 675 ms/step , 58271.21 GFLOP/s , 532311.1 tokens/s INFO:__main__:2024-10-26 18:59:10 | Epoch: 0 | Step: 3780 | Dataset: 0-3025000 | Loss: 2.799 | 675 ms/step , 58199.71 GFLOP/s , 531943.8 tokens/s INFO:__main__:2024-10-26 18:59:17 | Epoch: 0 | Step: 3790 | Dataset: 0-3033000 | Loss: 2.793 | 675 ms/step , 58267.78 GFLOP/s , 531137.1 tokens/s INFO:__main__:2024-10-26 18:59:25 | Epoch: 0 | Step: 3800 | Dataset: 0-3041000 | Loss: 2.846 | 675 ms/step , 58207.43 GFLOP/s , 531390.4 tokens/s INFO:__main__:2024-10-26 18:59:33 | Epoch: 0 | Step: 3810 | Dataset: 0-3049000 | Loss: 2.761 | 674 ms/step , 58323.34 GFLOP/s , 533151.0 tokens/s INFO:__main__:2024-10-26 18:59:41 | Epoch: 0 | Step: 3820 | Dataset: 0-3057000 | Loss: 2.886 | 674 ms/step , 58333.07 GFLOP/s , 532592.2 tokens/s INFO:__main__:2024-10-26 18:59:48 | Epoch: 0 | Step: 3830 | Dataset: 0-3065000 | Loss: 2.755 | 675 ms/step , 58228.65 GFLOP/s , 533086.6 tokens/s INFO:__main__:2024-10-26 18:59:56 | Epoch: 0 | Step: 3840 | Dataset: 0-3073000 | Loss: 2.769 | 675 ms/step , 58250.49 GFLOP/s , 533125.5 tokens/s INFO:__main__:2024-10-26 19:00:03 | Epoch: 0 | Step: 3850 | Dataset: 0-3081000 | Loss: 2.554 | 673 ms/step , 58451.60 GFLOP/s , 608067.5 tokens/s INFO:__main__:2024-10-26 19:00:10 | Epoch: 0 | Step: 3860 | Dataset: 0-3089000 | Loss: 2.441 | 674 ms/step , 58307.21 GFLOP/s , 533017.4 tokens/s INFO:__main__:2024-10-26 19:00:18 | Epoch: 0 | Step: 3870 | Dataset: 0-3097000 | Loss: 2.271 | 674 ms/step , 58322.43 GFLOP/s , 532906.7 tokens/s INFO:__main__:2024-10-26 19:00:26 | Epoch: 0 | Step: 3880 | Dataset: 0-3105000 | Loss: 2.213 | 674 ms/step , 58296.20 GFLOP/s , 532574.0 
tokens/s INFO:__main__:2024-10-26 19:00:33 | Epoch: 0 | Step: 3890 | Dataset: 0-3113000 | Loss: 2.171 | 675 ms/step , 58209.58 GFLOP/s , 531300.0 tokens/s INFO:__main__:2024-10-26 19:00:41 | Epoch: 0 | Step: 3900 | Dataset: 0-3121000 | Loss: 2.103 | 674 ms/step , 58292.50 GFLOP/s , 531206.0 tokens/s INFO:__main__:2024-10-26 19:00:49 | Epoch: 0 | Step: 3910 | Dataset: 0-3129000 | Loss: 2.939 | 674 ms/step , 58315.12 GFLOP/s , 532032.8 tokens/s INFO:__main__:2024-10-26 19:00:57 | Epoch: 0 | Step: 3920 | Dataset: 0-3137000 | Loss: 2.866 | 674 ms/step , 58286.86 GFLOP/s , 531134.3 tokens/s INFO:__main__:2024-10-26 19:01:04 | Epoch: 0 | Step: 3930 | Dataset: 0-3145000 | Loss: 2.740 | 677 ms/step , 58064.40 GFLOP/s , 530958.8 tokens/s INFO:__main__:2024-10-26 19:01:12 | Epoch: 0 | Step: 3940 | Dataset: 0-3153000 | Loss: 2.847 | 677 ms/step , 58084.71 GFLOP/s , 530655.9 tokens/s INFO:__main__:2024-10-26 19:01:20 | Epoch: 0 | Step: 3950 | Dataset: 0-3161000 | Loss: 2.773 | 722 ms/step , 54438.38 GFLOP/s , 524406.8 tokens/s INFO:__main__:2024-10-26 19:01:28 | Epoch: 0 | Step: 3960 | Dataset: 0-3169000 | Loss: 2.780 | 722 ms/step , 54462.23 GFLOP/s , 497048.0 tokens/s INFO:__main__:2024-10-26 19:01:36 | Epoch: 0 | Step: 3970 | Dataset: 0-3177000 | Loss: 2.790 | 675 ms/step , 58213.85 GFLOP/s , 530834.2 tokens/s INFO:__main__:2024-10-26 19:01:43 | Epoch: 0 | Step: 3980 | Dataset: 0-3185000 | Loss: 2.755 | 675 ms/step , 58236.28 GFLOP/s , 533055.6 tokens/s INFO:__main__:2024-10-26 19:01:51 | Epoch: 0 | Step: 3990 | Dataset: 0-3193000 | Loss: 2.752 | 674 ms/step , 58281.00 GFLOP/s , 532946.0 tokens/s INFO:__main__:2024-10-26 19:01:58 | Validation | Step: 4000 | Val_loss: 2.843 | Best_val_loss: 3.1050 INFO:__main__:2024-10-26 19:01:58 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_190158_step_4000.pt` INFO:__main__:2024-10-26 19:02:00 | Epoch: 0 | Step: 4000 | Dataset: 0-3201000 | Loss: 2.735 | 673 ms/step , 58435.55 GFLOP/s , 480344.9 tokens/s 
INFO:__main__:2024-10-26 19:02:07 | Epoch: 0 | Step: 4010 | Dataset: 0-3209000 | Loss: 2.707 | 674 ms/step , 58316.06 GFLOP/s , 532798.5 tokens/s INFO:__main__:2024-10-26 19:02:15 | Epoch: 0 | Step: 4020 | Dataset: 0-3217000 | Loss: 2.760 | 674 ms/step , 58332.28 GFLOP/s , 532810.5 tokens/s INFO:__main__:2024-10-26 19:02:23 | Epoch: 0 | Step: 4030 | Dataset: 0-3225000 | Loss: 2.762 | 673 ms/step , 58385.78 GFLOP/s , 533436.6 tokens/s INFO:__main__:2024-10-26 19:02:30 | Epoch: 0 | Step: 4040 | Dataset: 0-3233000 | Loss: 2.753 | 685 ms/step , 57393.66 GFLOP/s , 525108.7 tokens/s INFO:__main__:2024-10-26 19:02:38 | Epoch: 0 | Step: 4050 | Dataset: 0-3241000 | Loss: 2.713 | 685 ms/step , 57401.45 GFLOP/s , 524861.0 tokens/s INFO:__main__:2024-10-26 19:02:46 | Epoch: 0 | Step: 4060 | Dataset: 0-3249000 | Loss: 2.793 | 675 ms/step , 58240.17 GFLOP/s , 532554.2 tokens/s INFO:__main__:2024-10-26 19:02:54 | Epoch: 0 | Step: 4070 | Dataset: 0-3257000 | Loss: 2.766 | 673 ms/step , 58388.75 GFLOP/s , 533578.5 tokens/s INFO:__main__:2024-10-26 19:03:01 | Epoch: 0 | Step: 4080 | Dataset: 0-3265000 | Loss: 2.724 | 674 ms/step , 58360.37 GFLOP/s , 533254.3 tokens/s INFO:__main__:2024-10-26 19:03:09 | Epoch: 0 | Step: 4090 | Dataset: 0-3273000 | Loss: 2.777 | 673 ms/step , 58368.99 GFLOP/s , 533285.1 tokens/s INFO:__main__:2024-10-26 19:03:17 | Epoch: 0 | Step: 4100 | Dataset: 0-3281000 | Loss: 2.765 | 674 ms/step , 58330.59 GFLOP/s , 533557.6 tokens/s INFO:__main__:2024-10-26 19:03:24 | Epoch: 0 | Step: 4110 | Dataset: 0-3289000 | Loss: 2.791 | 674 ms/step , 58353.25 GFLOP/s , 532940.2 tokens/s INFO:__main__:2024-10-26 19:03:32 | Epoch: 0 | Step: 4120 | Dataset: 0-3297000 | Loss: 2.740 | 676 ms/step , 58173.09 GFLOP/s , 532196.2 tokens/s INFO:__main__:2024-10-26 19:03:40 | Epoch: 0 | Step: 4130 | Dataset: 0-3305000 | Loss: 2.689 | 675 ms/step , 58198.45 GFLOP/s , 531673.7 tokens/s INFO:__main__:2024-10-26 19:03:47 | Epoch: 0 | Step: 4140 | Dataset: 0-3313000 | Loss: 2.728 | 677 
ms/step , 58087.34 GFLOP/s , 531357.7 tokens/s INFO:__main__:2024-10-26 19:03:55 | Epoch: 0 | Step: 4150 | Dataset: 0-3321000 | Loss: 2.723 | 676 ms/step , 58170.69 GFLOP/s , 531025.7 tokens/s INFO:__main__:2024-10-26 19:04:03 | Epoch: 0 | Step: 4160 | Dataset: 0-3329000 | Loss: 2.719 | 674 ms/step , 58318.47 GFLOP/s , 532340.1 tokens/s INFO:__main__:2024-10-26 19:04:11 | Epoch: 0 | Step: 4170 | Dataset: 0-3337000 | Loss: 2.717 | 674 ms/step , 58327.27 GFLOP/s , 533299.5 tokens/s INFO:__main__:2024-10-26 19:04:18 | Epoch: 0 | Step: 4180 | Dataset: 0-3345000 | Loss: 2.648 | 674 ms/step , 58356.05 GFLOP/s , 533422.8 tokens/s INFO:__main__:2024-10-26 19:04:26 | Epoch: 0 | Step: 4190 | Dataset: 0-3353000 | Loss: 2.739 | 675 ms/step , 58248.82 GFLOP/s , 532826.7 tokens/s INFO:__main__:2024-10-26 19:04:34 | Epoch: 0 | Step: 4200 | Dataset: 0-3361000 | Loss: 2.706 | 673 ms/step , 58426.90 GFLOP/s , 533244.2 tokens/s INFO:__main__:2024-10-26 19:04:41 | Epoch: 0 | Step: 4210 | Dataset: 0-3369000 | Loss: 2.690 | 674 ms/step , 58361.89 GFLOP/s , 533567.1 tokens/s INFO:__main__:2024-10-26 19:04:49 | Epoch: 0 | Step: 4220 | Dataset: 0-3377000 | Loss: 2.720 | 674 ms/step , 58364.56 GFLOP/s , 533153.3 tokens/s INFO:__main__:2024-10-26 19:04:57 | Epoch: 0 | Step: 4230 | Dataset: 0-3385000 | Loss: 2.745 | 674 ms/step , 58364.57 GFLOP/s , 533087.5 tokens/s INFO:__main__:2024-10-26 19:05:04 | Epoch: 0 | Step: 4240 | Dataset: 0-3393000 | Loss: 2.761 | 672 ms/step , 58455.94 GFLOP/s , 533369.8 tokens/s INFO:__main__:2024-10-26 19:05:12 | Epoch: 0 | Step: 4250 | Dataset: 0-3401000 | Loss: 2.729 | 674 ms/step , 58331.08 GFLOP/s , 532570.9 tokens/s INFO:__main__:2024-10-26 19:05:20 | Epoch: 0 | Step: 4260 | Dataset: 0-3409000 | Loss: 2.764 | 675 ms/step , 58266.45 GFLOP/s , 533007.5 tokens/s INFO:__main__:2024-10-26 19:05:27 | Epoch: 0 | Step: 4270 | Dataset: 0-3417000 | Loss: 2.773 | 691 ms/step , 56921.51 GFLOP/s , 529503.3 tokens/s INFO:__main__:2024-10-26 19:05:35 | Epoch: 0 | Step: 
4280 | Dataset: 0-3425000 | Loss: 2.739 | 698 ms/step , 56319.47 GFLOP/s , 526006.8 tokens/s INFO:__main__:2024-10-26 19:05:43 | Epoch: 0 | Step: 4290 | Dataset: 0-3433000 | Loss: 2.771 | 675 ms/step , 58252.21 GFLOP/s , 540834.7 tokens/s INFO:__main__:2024-10-26 19:05:51 | Epoch: 0 | Step: 4300 | Dataset: 0-3441000 | Loss: 2.752 | 675 ms/step , 58237.80 GFLOP/s , 531593.0 tokens/s INFO:__main__:2024-10-26 19:05:58 | Epoch: 0 | Step: 4310 | Dataset: 0-3449000 | Loss: 2.727 | 676 ms/step , 58177.93 GFLOP/s , 531544.5 tokens/s INFO:__main__:2024-10-26 19:06:06 | Epoch: 0 | Step: 4320 | Dataset: 0-3457000 | Loss: 2.712 | 712 ms/step , 55217.88 GFLOP/s , 528620.4 tokens/s INFO:__main__:2024-10-26 19:06:14 | Epoch: 0 | Step: 4330 | Dataset: 0-3465000 | Loss: 2.700 | 674 ms/step , 58347.62 GFLOP/s , 527080.6 tokens/s INFO:__main__:2024-10-26 19:06:22 | Epoch: 0 | Step: 4340 | Dataset: 0-3473000 | Loss: 2.668 | 685 ms/step , 57412.09 GFLOP/s , 526374.4 tokens/s INFO:__main__:2024-10-26 19:06:29 | Epoch: 0 | Step: 4350 | Dataset: 0-3481000 | Loss: 2.752 | 675 ms/step , 58237.76 GFLOP/s , 528364.4 tokens/s INFO:__main__:2024-10-26 19:06:37 | Epoch: 0 | Step: 4360 | Dataset: 0-3489000 | Loss: 2.720 | 675 ms/step , 58249.45 GFLOP/s , 531801.4 tokens/s INFO:__main__:2024-10-26 19:06:45 | Epoch: 0 | Step: 4370 | Dataset: 0-3497000 | Loss: 2.698 | 675 ms/step , 58198.37 GFLOP/s , 531832.6 tokens/s INFO:__main__:2024-10-26 19:06:52 | Epoch: 0 | Step: 4380 | Dataset: 0-3505000 | Loss: 2.659 | 674 ms/step , 58351.21 GFLOP/s , 531368.7 tokens/s INFO:__main__:2024-10-26 19:07:00 | Epoch: 0 | Step: 4390 | Dataset: 0-3513000 | Loss: 2.669 | 674 ms/step , 58320.02 GFLOP/s , 533262.0 tokens/s INFO:__main__:2024-10-26 19:07:08 | Epoch: 0 | Step: 4400 | Dataset: 0-3521000 | Loss: 2.798 | 675 ms/step , 58234.17 GFLOP/s , 533204.5 tokens/s INFO:__main__:2024-10-26 19:07:15 | Epoch: 0 | Step: 4410 | Dataset: 0-3529000 | Loss: 2.681 | 674 ms/step , 58296.35 GFLOP/s , 532908.8 tokens/s 
INFO:__main__:2024-10-26 19:07:23 | Epoch: 0 | Step: 4420 | Dataset: 0-3537000 | Loss: 2.604 | 675 ms/step , 58255.09 GFLOP/s , 532849.6 tokens/s INFO:__main__:2024-10-26 19:07:31 | Epoch: 0 | Step: 4430 | Dataset: 0-3545000 | Loss: 2.736 | 674 ms/step , 58289.16 GFLOP/s , 532816.4 tokens/s INFO:__main__:2024-10-26 19:07:38 | Epoch: 0 | Step: 4440 | Dataset: 0-3553000 | Loss: 2.685 | 674 ms/step , 58315.35 GFLOP/s , 533159.4 tokens/s INFO:__main__:2024-10-26 19:07:46 | Epoch: 0 | Step: 4450 | Dataset: 0-3561000 | Loss: 2.710 | 674 ms/step , 58294.45 GFLOP/s , 533286.8 tokens/s INFO:__main__:2024-10-26 19:07:54 | Epoch: 0 | Step: 4460 | Dataset: 0-3569000 | Loss: 2.699 | 675 ms/step , 58212.56 GFLOP/s , 530737.8 tokens/s INFO:__main__:2024-10-26 19:08:02 | Epoch: 0 | Step: 4470 | Dataset: 0-3577000 | Loss: 2.646 | 674 ms/step , 58350.02 GFLOP/s , 533466.2 tokens/s INFO:__main__:2024-10-26 19:08:09 | Epoch: 0 | Step: 4480 | Dataset: 0-3585000 | Loss: 2.650 | 676 ms/step , 58106.97 GFLOP/s , 532625.1 tokens/s INFO:__main__:2024-10-26 19:08:17 | Epoch: 0 | Step: 4490 | Dataset: 0-3593000 | Loss: 2.574 | 674 ms/step , 58302.06 GFLOP/s , 533060.8 tokens/s INFO:__main__:2024-10-26 19:08:25 | Epoch: 0 | Step: 4500 | Dataset: 0-3601000 | Loss: 2.600 | 675 ms/step , 58196.22 GFLOP/s , 532644.8 tokens/s INFO:__main__:2024-10-26 19:08:32 | Epoch: 0 | Step: 4510 | Dataset: 0-3609000 | Loss: 2.604 | 675 ms/step , 58245.49 GFLOP/s , 533159.5 tokens/s INFO:__main__:2024-10-26 19:08:40 | Epoch: 0 | Step: 4520 | Dataset: 0-3617000 | Loss: 2.674 | 674 ms/step , 58304.87 GFLOP/s , 533127.9 tokens/s INFO:__main__:2024-10-26 19:08:48 | Epoch: 0 | Step: 4530 | Dataset: 0-3625000 | Loss: 2.618 | 675 ms/step , 58267.25 GFLOP/s , 532609.1 tokens/s INFO:__main__:2024-10-26 19:08:55 | Epoch: 0 | Step: 4540 | Dataset: 0-3633000 | Loss: 2.602 | 675 ms/step , 58210.20 GFLOP/s , 533124.3 tokens/s INFO:__main__:2024-10-26 19:09:03 | Epoch: 0 | Step: 4550 | Dataset: 0-3641000 | Loss: 2.619 | 675 
ms/step , 58256.03 GFLOP/s , 532865.9 tokens/s INFO:__main__:2024-10-26 19:09:11 | Epoch: 0 | Step: 4560 | Dataset: 0-3649000 | Loss: 2.678 | 674 ms/step , 58339.13 GFLOP/s , 533143.1 tokens/s INFO:__main__:2024-10-26 19:09:18 | Epoch: 0 | Step: 4570 | Dataset: 0-3657000 | Loss: 2.673 | 674 ms/step , 58304.00 GFLOP/s , 532872.4 tokens/s INFO:__main__:2024-10-26 19:09:26 | Epoch: 0 | Step: 4580 | Dataset: 0-3665000 | Loss: 2.792 | 675 ms/step , 58236.25 GFLOP/s , 532921.2 tokens/s INFO:__main__:2024-10-26 19:09:34 | Epoch: 0 | Step: 4590 | Dataset: 0-3673000 | Loss: 2.690 | 675 ms/step , 58230.16 GFLOP/s , 532802.3 tokens/s INFO:__main__:2024-10-26 19:09:41 | Epoch: 0 | Step: 4600 | Dataset: 0-3681000 | Loss: 2.576 | 674 ms/step , 58286.46 GFLOP/s , 532942.1 tokens/s INFO:__main__:2024-10-26 19:09:49 | Epoch: 0 | Step: 4610 | Dataset: 0-3689000 | Loss: 2.697 | 675 ms/step , 58235.58 GFLOP/s , 532776.8 tokens/s INFO:__main__:2024-10-26 19:09:57 | Epoch: 0 | Step: 4620 | Dataset: 0-3697000 | Loss: 2.625 | 674 ms/step , 58326.14 GFLOP/s , 533292.2 tokens/s INFO:__main__:2024-10-26 19:10:05 | Epoch: 0 | Step: 4630 | Dataset: 0-3705000 | Loss: 2.682 | 674 ms/step , 58348.42 GFLOP/s , 532917.6 tokens/s INFO:__main__:2024-10-26 19:10:12 | Epoch: 0 | Step: 4640 | Dataset: 0-3713000 | Loss: 2.663 | 675 ms/step , 58275.00 GFLOP/s , 532710.6 tokens/s INFO:__main__:2024-10-26 19:10:20 | Epoch: 0 | Step: 4650 | Dataset: 0-3721000 | Loss: 2.572 | 674 ms/step , 58314.49 GFLOP/s , 533186.8 tokens/s INFO:__main__:2024-10-26 19:10:28 | Epoch: 0 | Step: 4660 | Dataset: 0-3729000 | Loss: 2.659 | 674 ms/step , 58283.80 GFLOP/s , 532577.7 tokens/s INFO:__main__:2024-10-26 19:10:35 | Epoch: 0 | Step: 4670 | Dataset: 0-3737000 | Loss: 2.619 | 674 ms/step , 58323.14 GFLOP/s , 533408.5 tokens/s INFO:__main__:2024-10-26 19:10:43 | Epoch: 0 | Step: 4680 | Dataset: 0-3745000 | Loss: 2.598 | 675 ms/step , 58245.61 GFLOP/s , 531711.8 tokens/s INFO:__main__:2024-10-26 19:10:51 | Epoch: 0 | Step: 
4690 | Dataset: 0-3753000 | Loss: 2.620 | 674 ms/step , 58315.37 GFLOP/s , 533355.6 tokens/s INFO:__main__:2024-10-26 19:10:58 | Epoch: 0 | Step: 4700 | Dataset: 0-3761000 | Loss: 2.616 | 675 ms/step , 58267.50 GFLOP/s , 533051.5 tokens/s INFO:__main__:2024-10-26 19:11:06 | Epoch: 0 | Step: 4710 | Dataset: 0-3769000 | Loss: 2.705 | 674 ms/step , 58284.03 GFLOP/s , 532836.3 tokens/s INFO:__main__:2024-10-26 19:11:14 | Epoch: 0 | Step: 4720 | Dataset: 0-3777000 | Loss: 3.158 | 675 ms/step , 58233.90 GFLOP/s , 532902.1 tokens/s INFO:__main__:2024-10-26 19:11:21 | Epoch: 0 | Step: 4730 | Dataset: 0-3785000 | Loss: 2.722 | 673 ms/step , 58392.16 GFLOP/s , 533304.5 tokens/s INFO:__main__:2024-10-26 19:11:29 | Epoch: 0 | Step: 4740 | Dataset: 0-3793000 | Loss: 2.482 | 675 ms/step , 58196.51 GFLOP/s , 533005.1 tokens/s INFO:__main__:2024-10-26 19:11:37 | Epoch: 0 | Step: 4750 | Dataset: 0-3801000 | Loss: 2.404 | 675 ms/step , 58245.68 GFLOP/s , 532872.1 tokens/s INFO:__main__:2024-10-26 19:11:44 | Epoch: 0 | Step: 4760 | Dataset: 0-3809000 | Loss: 2.330 | 674 ms/step , 58323.42 GFLOP/s , 533121.0 tokens/s INFO:__main__:2024-10-26 19:11:52 | Epoch: 0 | Step: 4770 | Dataset: 0-3817000 | Loss: 2.268 | 675 ms/step , 58212.30 GFLOP/s , 532426.8 tokens/s INFO:__main__:2024-10-26 19:12:00 | Epoch: 0 | Step: 4780 | Dataset: 0-3825000 | Loss: 2.228 | 673 ms/step , 58379.53 GFLOP/s , 532538.0 tokens/s INFO:__main__:2024-10-26 19:12:08 | Epoch: 0 | Step: 4790 | Dataset: 0-3833000 | Loss: 2.216 | 673 ms/step , 58408.15 GFLOP/s , 532927.7 tokens/s INFO:__main__:2024-10-26 19:12:15 | Epoch: 0 | Step: 4800 | Dataset: 0-3841000 | Loss: 3.330 | 673 ms/step , 58376.28 GFLOP/s , 532716.3 tokens/s INFO:__main__:2024-10-26 19:12:23 | Epoch: 0 | Step: 4810 | Dataset: 0-3849000 | Loss: 2.802 | 675 ms/step , 58220.81 GFLOP/s , 533115.8 tokens/s INFO:__main__:2024-10-26 19:12:31 | Epoch: 0 | Step: 4820 | Dataset: 0-3857000 | Loss: 2.701 | 673 ms/step , 58387.10 GFLOP/s , 533346.9 tokens/s 
INFO:__main__:2024-10-26 19:12:38 | Epoch: 0 | Step: 4830 | Dataset: 0-3865000 | Loss: 2.631 | 674 ms/step , 58336.90 GFLOP/s , 533323.7 tokens/s INFO:__main__:2024-10-26 19:12:46 | Epoch: 0 | Step: 4840 | Dataset: 0-3873000 | Loss: 2.721 | 673 ms/step , 58399.41 GFLOP/s , 533346.7 tokens/s INFO:__main__:2024-10-26 19:12:54 | Epoch: 0 | Step: 4850 | Dataset: 0-3881000 | Loss: 2.594 | 675 ms/step , 58269.71 GFLOP/s , 533493.9 tokens/s INFO:__main__:2024-10-26 19:13:01 | Epoch: 0 | Step: 4860 | Dataset: 0-3889000 | Loss: 2.558 | 673 ms/step , 58387.08 GFLOP/s , 532028.0 tokens/s INFO:__main__:2024-10-26 19:13:09 | Epoch: 0 | Step: 4870 | Dataset: 0-3897000 | Loss: 2.695 | 673 ms/step , 58386.71 GFLOP/s , 534110.7 tokens/s INFO:__main__:2024-10-26 19:13:17 | Epoch: 0 | Step: 4880 | Dataset: 0-3905000 | Loss: 2.605 | 674 ms/step , 58283.09 GFLOP/s , 533588.2 tokens/s INFO:__main__:2024-10-26 19:13:24 | Epoch: 0 | Step: 4890 | Dataset: 0-3913000 | Loss: 2.587 | 675 ms/step , 58272.93 GFLOP/s , 533585.7 tokens/s INFO:__main__:2024-10-26 19:13:32 | Epoch: 0 | Step: 4900 | Dataset: 0-3921000 | Loss: 2.655 | 672 ms/step , 58453.35 GFLOP/s , 533904.0 tokens/s INFO:__main__:2024-10-26 19:13:40 | Epoch: 0 | Step: 4910 | Dataset: 0-3929000 | Loss: 2.687 | 674 ms/step , 58353.66 GFLOP/s , 533416.6 tokens/s INFO:__main__:2024-10-26 19:13:47 | Epoch: 0 | Step: 4920 | Dataset: 0-3937000 | Loss: 2.657 | 674 ms/step , 58356.01 GFLOP/s , 534044.5 tokens/s INFO:__main__:2024-10-26 19:13:55 | Epoch: 0 | Step: 4930 | Dataset: 0-3945000 | Loss: 2.641 | 674 ms/step , 58344.62 GFLOP/s , 533549.3 tokens/s INFO:__main__:2024-10-26 19:14:03 | Epoch: 0 | Step: 4940 | Dataset: 0-3953000 | Loss: 2.673 | 675 ms/step , 58219.76 GFLOP/s , 533404.2 tokens/s INFO:__main__:2024-10-26 19:14:10 | Epoch: 0 | Step: 4950 | Dataset: 0-3961000 | Loss: 2.563 | 673 ms/step , 58408.32 GFLOP/s , 531866.6 tokens/s INFO:__main__:2024-10-26 19:14:18 | Epoch: 0 | Step: 4960 | Dataset: 0-3969000 | Loss: 2.544 | 674 
ms/step , 58323.49 GFLOP/s , 533461.7 tokens/s INFO:__main__:2024-10-26 19:14:26 | Epoch: 0 | Step: 4970 | Dataset: 0-3977000 | Loss: 2.609 | 673 ms/step , 58371.95 GFLOP/s , 533543.9 tokens/s INFO:__main__:2024-10-26 19:14:33 | Epoch: 0 | Step: 4980 | Dataset: 0-3985000 | Loss: 2.651 | 674 ms/step , 58343.85 GFLOP/s , 533813.9 tokens/s INFO:__main__:2024-10-26 19:14:41 | Epoch: 0 | Step: 4990 | Dataset: 0-3993000 | Loss: 2.604 | 675 ms/step , 58268.54 GFLOP/s , 533298.8 tokens/s INFO:__main__:2024-10-26 19:14:48 | Validation | Step: 5000 | Val_loss: 2.635 | Best_val_loss: 2.8429 INFO:__main__:2024-10-26 19:14:48 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_191448_step_5000.pt` INFO:__main__:2024-10-26 19:14:50 | Epoch: 0 | Step: 5000 | Dataset: 0-4001000 | Loss: 2.667 | 672 ms/step , 58460.30 GFLOP/s , 480488.4 tokens/s INFO:__main__:2024-10-26 19:14:57 | Epoch: 0 | Step: 5010 | Dataset: 0-4009000 | Loss: 2.637 | 674 ms/step , 58287.38 GFLOP/s , 533210.4 tokens/s INFO:__main__:2024-10-26 19:15:05 | Epoch: 0 | Step: 5020 | Dataset: 0-4017000 | Loss: 2.630 | 674 ms/step , 58341.73 GFLOP/s , 533331.7 tokens/s INFO:__main__:2024-10-26 19:15:13 | Epoch: 0 | Step: 5030 | Dataset: 0-4025000 | Loss: 2.643 | 674 ms/step , 58281.08 GFLOP/s , 533533.1 tokens/s INFO:__main__:2024-10-26 19:15:20 | Epoch: 0 | Step: 5040 | Dataset: 0-4033000 | Loss: 2.585 | 674 ms/step , 58313.98 GFLOP/s , 533267.2 tokens/s INFO:__main__:2024-10-26 19:15:28 | Epoch: 0 | Step: 5050 | Dataset: 0-4041000 | Loss: 2.594 | 673 ms/step , 58391.58 GFLOP/s , 533318.0 tokens/s INFO:__main__:2024-10-26 19:15:36 | Epoch: 0 | Step: 5060 | Dataset: 0-4049000 | Loss: 2.556 | 675 ms/step , 58234.18 GFLOP/s , 532659.3 tokens/s INFO:__main__:2024-10-26 19:15:43 | Epoch: 0 | Step: 5070 | Dataset: 0-4057000 | Loss: 2.698 | 674 ms/step , 58348.65 GFLOP/s , 532954.0 tokens/s INFO:__main__:2024-10-26 19:15:51 | Epoch: 0 | Step: 5080 | Dataset: 0-4065000 | Loss: 2.565 | 674 ms/step , 58351.67 
GFLOP/s , 533021.2 tokens/s INFO:__main__:2024-10-26 19:15:59 | Epoch: 0 | Step: 5090 | Dataset: 0-4073000 | Loss: 2.622 | 674 ms/step , 58282.08 GFLOP/s , 533194.0 tokens/s INFO:__main__:2024-10-26 19:16:06 | Epoch: 0 | Step: 5100 | Dataset: 0-4081000 | Loss: 2.614 | 674 ms/step , 58295.55 GFLOP/s , 532962.1 tokens/s INFO:__main__:2024-10-26 19:16:14 | Epoch: 0 | Step: 5110 | Dataset: 0-4089000 | Loss: 2.586 | 674 ms/step , 58289.69 GFLOP/s , 532719.1 tokens/s INFO:__main__:2024-10-26 19:16:22 | Epoch: 0 | Step: 5120 | Dataset: 0-4097000 | Loss: 2.572 | 675 ms/step , 58216.39 GFLOP/s , 532764.8 tokens/s INFO:__main__:2024-10-26 19:16:30 | Epoch: 0 | Step: 5130 | Dataset: 0-4105000 | Loss: 2.700 | 675 ms/step , 58265.20 GFLOP/s , 532684.0 tokens/s INFO:__main__:2024-10-26 19:16:37 | Epoch: 0 | Step: 5140 | Dataset: 0-4113000 | Loss: 2.594 | 675 ms/step , 58214.65 GFLOP/s , 532576.2 tokens/s INFO:__main__:2024-10-26 19:16:45 | Epoch: 0 | Step: 5150 | Dataset: 0-4121000 | Loss: 2.640 | 674 ms/step , 58363.00 GFLOP/s , 533000.9 tokens/s INFO:__main__:2024-10-26 19:16:53 | Epoch: 0 | Step: 5160 | Dataset: 0-4129000 | Loss: 2.567 | 673 ms/step , 58388.77 GFLOP/s , 533438.1 tokens/s INFO:__main__:2024-10-26 19:17:00 | Epoch: 0 | Step: 5170 | Dataset: 0-4137000 | Loss: 2.624 | 675 ms/step , 58250.35 GFLOP/s , 532355.2 tokens/s INFO:__main__:2024-10-26 19:17:08 | Epoch: 0 | Step: 5180 | Dataset: 0-4145000 | Loss: 2.657 | 674 ms/step , 58287.48 GFLOP/s , 533030.6 tokens/s INFO:__main__:2024-10-26 19:17:16 | Epoch: 0 | Step: 5190 | Dataset: 0-4153000 | Loss: 2.558 | 675 ms/step , 58274.03 GFLOP/s , 532453.7 tokens/s INFO:__main__:2024-10-26 19:17:23 | Epoch: 0 | Step: 5200 | Dataset: 0-4161000 | Loss: 2.572 | 674 ms/step , 58329.49 GFLOP/s , 532859.7 tokens/s INFO:__main__:2024-10-26 19:17:31 | Epoch: 0 | Step: 5210 | Dataset: 0-4169000 | Loss: 2.661 | 673 ms/step , 58369.28 GFLOP/s , 533229.5 tokens/s INFO:__main__:2024-10-26 19:17:39 | Epoch: 0 | Step: 5220 | Dataset: 
0-4177000 | Loss: 2.551 | 676 ms/step , 58187.25 GFLOP/s , 533130.6 tokens/s INFO:__main__:2024-10-26 19:17:46 | Epoch: 0 | Step: 5230 | Dataset: 0-4185000 | Loss: 2.597 | 676 ms/step , 58189.64 GFLOP/s , 529706.2 tokens/s INFO:__main__:2024-10-26 19:17:54 | Epoch: 0 | Step: 5240 | Dataset: 0-4193000 | Loss: 2.614 | 677 ms/step , 58099.58 GFLOP/s , 527938.9 tokens/s INFO:__main__:2024-10-26 19:18:02 | Epoch: 0 | Step: 5250 | Dataset: 0-4201000 | Loss: 2.528 | 675 ms/step , 58277.85 GFLOP/s , 531503.3 tokens/s INFO:__main__:2024-10-26 19:18:10 | Epoch: 0 | Step: 5260 | Dataset: 0-4209000 | Loss: 2.594 | 674 ms/step , 58314.15 GFLOP/s , 533168.7 tokens/s INFO:__main__:2024-10-26 19:18:17 | Epoch: 0 | Step: 5270 | Dataset: 0-4217000 | Loss: 2.490 | 675 ms/step , 58248.49 GFLOP/s , 533017.4 tokens/s INFO:__main__:2024-10-26 19:18:25 | Epoch: 0 | Step: 5280 | Dataset: 0-4225000 | Loss: 2.572 | 675 ms/step , 58200.51 GFLOP/s , 532579.2 tokens/s INFO:__main__:2024-10-26 19:18:33 | Epoch: 0 | Step: 5290 | Dataset: 0-4233000 | Loss: 2.581 | 678 ms/step , 57987.16 GFLOP/s , 529779.4 tokens/s INFO:__main__:2024-10-26 19:18:40 | Epoch: 0 | Step: 5300 | Dataset: 0-4241000 | Loss: 2.653 | 677 ms/step , 58092.31 GFLOP/s , 529141.9 tokens/s INFO:__main__:2024-10-26 19:18:48 | Epoch: 0 | Step: 5310 | Dataset: 0-4249000 | Loss: 2.622 | 676 ms/step , 58112.20 GFLOP/s , 530838.3 tokens/s INFO:__main__:2024-10-26 19:18:56 | Epoch: 0 | Step: 5320 | Dataset: 0-4257000 | Loss: 2.556 | 675 ms/step , 58203.17 GFLOP/s , 531159.8 tokens/s INFO:__main__:2024-10-26 19:19:04 | Epoch: 0 | Step: 5330 | Dataset: 0-4265000 | Loss: 2.540 | 676 ms/step , 58161.07 GFLOP/s , 530419.3 tokens/s INFO:__main__:2024-10-26 19:19:11 | Epoch: 0 | Step: 5340 | Dataset: 0-4273000 | Loss: 2.654 | 675 ms/step , 58215.82 GFLOP/s , 531367.2 tokens/s INFO:__main__:2024-10-26 19:19:19 | Epoch: 0 | Step: 5350 | Dataset: 0-4281000 | Loss: 2.523 | 676 ms/step , 58138.85 GFLOP/s , 530412.4 tokens/s INFO:__main__:2024-10-26 
19:19:27 | Epoch: 0 | Step: 5360 | Dataset: 0-4289000 | Loss: 2.550 | 677 ms/step , 58102.67 GFLOP/s , 529674.3 tokens/s INFO:__main__:2024-10-26 19:19:34 | Epoch: 0 | Step: 5370 | Dataset: 0-4297000 | Loss: 2.573 | 677 ms/step , 58091.21 GFLOP/s , 530209.4 tokens/s INFO:__main__:2024-10-26 19:19:42 | Epoch: 0 | Step: 5380 | Dataset: 0-4305000 | Loss: 2.546 | 674 ms/step , 58309.05 GFLOP/s , 531593.4 tokens/s INFO:__main__:2024-10-26 19:19:50 | Epoch: 0 | Step: 5390 | Dataset: 0-4313000 | Loss: 2.527 | 675 ms/step , 58236.75 GFLOP/s , 531547.7 tokens/s INFO:__main__:2024-10-26 19:19:58 | Epoch: 0 | Step: 5400 | Dataset: 0-4321000 | Loss: 2.422 | 676 ms/step , 58183.76 GFLOP/s , 530613.2 tokens/s INFO:__main__:2024-10-26 19:20:05 | Epoch: 0 | Step: 5410 | Dataset: 0-4329000 | Loss: 2.500 | 674 ms/step , 58307.30 GFLOP/s , 530463.9 tokens/s INFO:__main__:2024-10-26 19:20:13 | Epoch: 0 | Step: 5420 | Dataset: 0-4337000 | Loss: 2.556 | 674 ms/step , 58331.35 GFLOP/s , 532424.8 tokens/s INFO:__main__:2024-10-26 19:20:21 | Epoch: 0 | Step: 5430 | Dataset: 0-4345000 | Loss: 2.550 | 675 ms/step , 58266.75 GFLOP/s , 532271.2 tokens/s INFO:__main__:2024-10-26 19:20:28 | Epoch: 0 | Step: 5440 | Dataset: 0-4353000 | Loss: 2.504 | 673 ms/step , 58388.52 GFLOP/s , 532660.3 tokens/s INFO:__main__:2024-10-26 19:20:36 | Epoch: 0 | Step: 5450 | Dataset: 0-4361000 | Loss: 2.723 | 674 ms/step , 58308.17 GFLOP/s , 532379.5 tokens/s INFO:__main__:2024-10-26 19:20:44 | Epoch: 0 | Step: 5460 | Dataset: 0-4369000 | Loss: 2.460 | 676 ms/step , 58120.15 GFLOP/s , 532376.1 tokens/s INFO:__main__:2024-10-26 19:20:51 | Epoch: 0 | Step: 5470 | Dataset: 0-4377000 | Loss: 2.389 | 674 ms/step , 58313.08 GFLOP/s , 532484.9 tokens/s INFO:__main__:2024-10-26 19:20:59 | Epoch: 0 | Step: 5480 | Dataset: 0-4385000 | Loss: 2.324 | 675 ms/step , 58212.62 GFLOP/s , 532290.4 tokens/s INFO:__main__:2024-10-26 19:21:07 | Epoch: 0 | Step: 5490 | Dataset: 0-4393000 | Loss: 2.239 | 675 ms/step , 58199.72 GFLOP/s 
, 532899.2 tokens/s INFO:__main__:2024-10-26 19:21:15 | Epoch: 0 | Step: 5500 | Dataset: 0-4401000 | Loss: 2.216 | 675 ms/step , 58247.67 GFLOP/s , 532173.7 tokens/s INFO:__main__:2024-10-26 19:21:22 | Epoch: 0 | Step: 5510 | Dataset: 0-4409000 | Loss: 2.210 | 674 ms/step , 58289.49 GFLOP/s , 532037.3 tokens/s INFO:__main__:2024-10-26 19:21:30 | Epoch: 0 | Step: 5520 | Dataset: 0-4417000 | Loss: 2.192 | 674 ms/step , 58304.15 GFLOP/s , 532520.4 tokens/s INFO:__main__:2024-10-26 19:21:38 | Epoch: 0 | Step: 5530 | Dataset: 0-4425000 | Loss: 2.162 | 675 ms/step , 58251.73 GFLOP/s , 531982.0 tokens/s INFO:__main__:2024-10-26 19:21:45 | Epoch: 0 | Step: 5540 | Dataset: 0-4433000 | Loss: 2.111 | 675 ms/step , 58260.48 GFLOP/s , 530726.4 tokens/s INFO:__main__:2024-10-26 19:21:53 | Epoch: 0 | Step: 5550 | Dataset: 0-4441000 | Loss: 2.095 | 676 ms/step , 58186.68 GFLOP/s , 531682.1 tokens/s INFO:__main__:2024-10-26 19:22:01 | Epoch: 0 | Step: 5560 | Dataset: 0-4449000 | Loss: 2.065 | 674 ms/step , 58329.13 GFLOP/s , 531631.6 tokens/s INFO:__main__:2024-10-26 19:22:08 | Epoch: 0 | Step: 5570 | Dataset: 0-4457000 | Loss: 2.062 | 675 ms/step , 58246.10 GFLOP/s , 531196.5 tokens/s INFO:__main__:2024-10-26 19:22:16 | Epoch: 0 | Step: 5580 | Dataset: 0-4465000 | Loss: 2.041 | 675 ms/step , 58230.00 GFLOP/s , 530627.3 tokens/s INFO:__main__:2024-10-26 19:22:24 | Epoch: 0 | Step: 5590 | Dataset: 0-4473000 | Loss: 2.038 | 676 ms/step , 58169.74 GFLOP/s , 530519.0 tokens/s INFO:__main__:2024-10-26 19:22:32 | Epoch: 0 | Step: 5600 | Dataset: 0-4481000 | Loss: 2.010 | 677 ms/step , 58103.24 GFLOP/s , 530244.1 tokens/s INFO:__main__:2024-10-26 19:22:39 | Epoch: 0 | Step: 5610 | Dataset: 0-4489000 | Loss: 1.966 | 674 ms/step , 58319.83 GFLOP/s , 531575.2 tokens/s INFO:__main__:2024-10-26 19:22:47 | Epoch: 0 | Step: 5620 | Dataset: 0-4497000 | Loss: 2.272 | 674 ms/step , 58285.78 GFLOP/s , 531348.2 tokens/s INFO:__main__:2024-10-26 19:22:55 | Epoch: 0 | Step: 5630 | Dataset: 0-4505000 | 
Loss: 2.746 | 676 ms/step , 58186.64 GFLOP/s , 531230.3 tokens/s INFO:__main__:2024-10-26 19:23:02 | Epoch: 0 | Step: 5640 | Dataset: 0-4513000 | Loss: 2.671 | 675 ms/step , 58203.19 GFLOP/s , 531210.8 tokens/s INFO:__main__:2024-10-26 19:23:10 | Epoch: 0 | Step: 5650 | Dataset: 0-4521000 | Loss: 2.541 | 675 ms/step , 58250.23 GFLOP/s , 527953.3 tokens/s INFO:__main__:2024-10-26 19:23:18 | Epoch: 0 | Step: 5660 | Dataset: 0-4529000 | Loss: 2.570 | 675 ms/step , 58221.47 GFLOP/s , 530613.8 tokens/s INFO:__main__:2024-10-26 19:23:26 | Epoch: 0 | Step: 5670 | Dataset: 0-4537000 | Loss: 2.580 | 674 ms/step , 58280.68 GFLOP/s , 531506.9 tokens/s INFO:__main__:2024-10-26 19:23:33 | Epoch: 0 | Step: 5680 | Dataset: 0-4545000 | Loss: 2.560 | 673 ms/step , 58388.90 GFLOP/s , 532872.9 tokens/s INFO:__main__:2024-10-26 19:23:41 | Epoch: 0 | Step: 5690 | Dataset: 0-4553000 | Loss: 2.598 | 675 ms/step , 58206.65 GFLOP/s , 532970.0 tokens/s INFO:__main__:2024-10-26 19:23:49 | Epoch: 0 | Step: 5700 | Dataset: 0-4561000 | Loss: 2.486 | 675 ms/step , 58266.29 GFLOP/s , 532922.0 tokens/s INFO:__main__:2024-10-26 19:23:56 | Epoch: 0 | Step: 5710 | Dataset: 0-4569000 | Loss: 2.531 | 675 ms/step , 58197.03 GFLOP/s , 532489.8 tokens/s INFO:__main__:2024-10-26 19:24:04 | Epoch: 0 | Step: 5720 | Dataset: 0-4577000 | Loss: 2.599 | 675 ms/step , 58239.77 GFLOP/s , 532315.1 tokens/s INFO:__main__:2024-10-26 19:24:12 | Epoch: 0 | Step: 5730 | Dataset: 0-4585000 | Loss: 2.601 | 675 ms/step , 58254.62 GFLOP/s , 532474.5 tokens/s INFO:__main__:2024-10-26 19:24:19 | Epoch: 0 | Step: 5740 | Dataset: 0-4593000 | Loss: 2.463 | 675 ms/step , 58270.22 GFLOP/s , 532517.6 tokens/s INFO:__main__:2024-10-26 19:24:27 | Epoch: 0 | Step: 5750 | Dataset: 0-4601000 | Loss: 2.534 | 674 ms/step , 58330.61 GFLOP/s , 532973.8 tokens/s INFO:__main__:2024-10-26 19:24:35 | Epoch: 0 | Step: 5760 | Dataset: 0-4609000 | Loss: 2.481 | 675 ms/step , 58273.30 GFLOP/s , 532165.2 tokens/s INFO:__main__:2024-10-26 19:24:43 | 
Epoch: 0 | Step: 5770 | Dataset: 0-4617000 | Loss: 2.524 | 674 ms/step , 58282.22 GFLOP/s , 533146.3 tokens/s INFO:__main__:2024-10-26 19:24:50 | Epoch: 0 | Step: 5780 | Dataset: 0-4625000 | Loss: 2.514 | 675 ms/step , 58233.92 GFLOP/s , 532486.1 tokens/s INFO:__main__:2024-10-26 19:24:58 | Epoch: 0 | Step: 5790 | Dataset: 0-4633000 | Loss: 2.627 | 674 ms/step , 58289.06 GFLOP/s , 532700.1 tokens/s INFO:__main__:2024-10-26 19:25:06 | Epoch: 0 | Step: 5800 | Dataset: 0-4641000 | Loss: 2.574 | 675 ms/step , 58230.20 GFLOP/s , 532044.1 tokens/s INFO:__main__:2024-10-26 19:25:13 | Epoch: 0 | Step: 5810 | Dataset: 0-4649000 | Loss: 2.534 | 674 ms/step , 58325.82 GFLOP/s , 532482.3 tokens/s INFO:__main__:2024-10-26 19:25:21 | Epoch: 0 | Step: 5820 | Dataset: 0-4657000 | Loss: 2.621 | 674 ms/step , 58350.83 GFLOP/s , 532920.5 tokens/s INFO:__main__:2024-10-26 19:25:29 | Epoch: 0 | Step: 5830 | Dataset: 0-4665000 | Loss: 2.595 | 675 ms/step , 58201.57 GFLOP/s , 532458.0 tokens/s INFO:__main__:2024-10-26 19:25:36 | Epoch: 0 | Step: 5840 | Dataset: 0-4673000 | Loss: 2.538 | 675 ms/step , 58253.48 GFLOP/s , 532945.9 tokens/s INFO:__main__:2024-10-26 19:25:44 | Epoch: 0 | Step: 5850 | Dataset: 0-4681000 | Loss: 2.510 | 675 ms/step , 58247.77 GFLOP/s , 531595.6 tokens/s INFO:__main__:2024-10-26 19:25:52 | Epoch: 0 | Step: 5860 | Dataset: 0-4689000 | Loss: 2.514 | 674 ms/step , 58337.90 GFLOP/s , 532944.4 tokens/s INFO:__main__:2024-10-26 19:25:59 | Epoch: 0 | Step: 5870 | Dataset: 0-4697000 | Loss: 2.434 | 673 ms/step , 58385.51 GFLOP/s , 532654.2 tokens/s INFO:__main__:2024-10-26 19:26:07 | Epoch: 0 | Step: 5880 | Dataset: 0-4705000 | Loss: 2.537 | 674 ms/step , 58335.78 GFLOP/s , 533103.0 tokens/s INFO:__main__:2024-10-26 19:26:15 | Epoch: 0 | Step: 5890 | Dataset: 0-4713000 | Loss: 2.547 | 676 ms/step , 58133.36 GFLOP/s , 532413.0 tokens/s INFO:__main__:2024-10-26 19:26:23 | Epoch: 0 | Step: 5900 | Dataset: 0-4721000 | Loss: 2.569 | 675 ms/step , 58201.21 GFLOP/s , 532677.9 
tokens/s INFO:__main__:2024-10-26 19:26:30 | Epoch: 0 | Step: 5910 | Dataset: 0-4729000 | Loss: 2.478 | 675 ms/step , 58210.22 GFLOP/s , 532193.0 tokens/s INFO:__main__:2024-10-26 19:26:38 | Epoch: 0 | Step: 5920 | Dataset: 0-4737000 | Loss: 2.440 | 676 ms/step , 58167.40 GFLOP/s , 532099.7 tokens/s INFO:__main__:2024-10-26 19:26:46 | Epoch: 0 | Step: 5930 | Dataset: 0-4745000 | Loss: 2.577 | 673 ms/step , 58413.99 GFLOP/s , 531516.4 tokens/s INFO:__main__:2024-10-26 19:26:53 | Epoch: 0 | Step: 5940 | Dataset: 0-4753000 | Loss: 2.489 | 674 ms/step , 58326.37 GFLOP/s , 532445.2 tokens/s INFO:__main__:2024-10-26 19:27:01 | Epoch: 0 | Step: 5950 | Dataset: 0-4761000 | Loss: 2.553 | 675 ms/step , 58244.93 GFLOP/s , 533094.5 tokens/s INFO:__main__:2024-10-26 19:27:09 | Epoch: 0 | Step: 5960 | Dataset: 0-4769000 | Loss: 2.483 | 675 ms/step , 58245.62 GFLOP/s , 532601.4 tokens/s INFO:__main__:2024-10-26 19:27:16 | Epoch: 0 | Step: 5970 | Dataset: 0-4777000 | Loss: 2.566 | 674 ms/step , 58317.46 GFLOP/s , 533361.1 tokens/s INFO:__main__:2024-10-26 19:27:24 | Epoch: 0 | Step: 5980 | Dataset: 0-4785000 | Loss: 2.519 | 674 ms/step , 58306.56 GFLOP/s , 533338.7 tokens/s INFO:__main__:2024-10-26 19:27:32 | Epoch: 0 | Step: 5990 | Dataset: 0-4793000 | Loss: 2.594 | 675 ms/step , 58276.03 GFLOP/s , 533624.2 tokens/s INFO:__main__:2024-10-26 19:27:39 | Validation | Step: 6000 | Val_loss: 2.601 | Best_val_loss: 2.6349 INFO:__main__:2024-10-26 19:27:39 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_192739_step_6000.pt` INFO:__main__:2024-10-26 19:27:40 | Epoch: 0 | Step: 6000 | Dataset: 0-4801000 | Loss: 2.522 | 674 ms/step , 58334.50 GFLOP/s , 476086.1 tokens/s INFO:__main__:2024-10-26 19:27:48 | Epoch: 0 | Step: 6010 | Dataset: 0-4809000 | Loss: 2.491 | 674 ms/step , 58294.99 GFLOP/s , 533243.6 tokens/s INFO:__main__:2024-10-26 19:27:56 | Epoch: 0 | Step: 6020 | Dataset: 0-4817000 | Loss: 2.499 | 675 ms/step , 58265.09 GFLOP/s , 532978.8 tokens/s 
INFO:__main__:2024-10-26 19:28:03 | Epoch: 0 | Step: 6030 | Dataset: 0-4825000 | Loss: 2.579 | 674 ms/step , 58358.77 GFLOP/s , 533193.3 tokens/s INFO:__main__:2024-10-26 19:28:11 | Epoch: 0 | Step: 6040 | Dataset: 0-4833000 | Loss: 2.525 | 674 ms/step , 58338.87 GFLOP/s , 533597.7 tokens/s INFO:__main__:2024-10-26 19:28:19 | Epoch: 0 | Step: 6050 | Dataset: 0-4841000 | Loss: 2.489 | 674 ms/step , 58334.75 GFLOP/s , 532424.2 tokens/s INFO:__main__:2024-10-26 19:28:26 | Epoch: 0 | Step: 6060 | Dataset: 0-4849000 | Loss: 2.607 | 675 ms/step , 58215.29 GFLOP/s , 533472.9 tokens/s INFO:__main__:2024-10-26 19:28:34 | Epoch: 0 | Step: 6070 | Dataset: 0-4857000 | Loss: 2.536 | 675 ms/step , 58273.43 GFLOP/s , 532761.0 tokens/s INFO:__main__:2024-10-26 19:28:42 | Epoch: 0 | Step: 6080 | Dataset: 0-4865000 | Loss: 2.452 | 677 ms/step , 58083.10 GFLOP/s , 533463.4 tokens/s INFO:__main__:2024-10-26 19:28:50 | Epoch: 0 | Step: 6090 | Dataset: 0-4873000 | Loss: 2.481 | 674 ms/step , 58313.81 GFLOP/s , 532832.9 tokens/s INFO:__main__:2024-10-26 19:28:57 | Epoch: 0 | Step: 6100 | Dataset: 0-4881000 | Loss: 2.536 | 675 ms/step , 58231.03 GFLOP/s , 532919.6 tokens/s INFO:__main__:2024-10-26 19:29:05 | Epoch: 0 | Step: 6110 | Dataset: 0-4889000 | Loss: 2.532 | 675 ms/step , 58271.16 GFLOP/s , 532909.4 tokens/s INFO:__main__:2024-10-26 19:29:13 | Epoch: 0 | Step: 6120 | Dataset: 0-4897000 | Loss: 2.556 | 677 ms/step , 58059.75 GFLOP/s , 533326.8 tokens/s INFO:__main__:2024-10-26 19:29:20 | Epoch: 0 | Step: 6130 | Dataset: 0-4905000 | Loss: 2.580 | 674 ms/step , 58285.27 GFLOP/s , 533337.0 tokens/s INFO:__main__:2024-10-26 19:29:28 | Epoch: 0 | Step: 6140 | Dataset: 0-4913000 | Loss: 2.503 | 675 ms/step , 58261.17 GFLOP/s , 533108.3 tokens/s INFO:__main__:2024-10-26 19:29:36 | Epoch: 0 | Step: 6150 | Dataset: 0-4921000 | Loss: 2.508 | 674 ms/step , 58315.42 GFLOP/s , 532759.0 tokens/s INFO:__main__:2024-10-26 19:29:43 | Epoch: 0 | Step: 6160 | Dataset: 0-4929000 | Loss: 2.587 | 674 
ms/step , 58359.32 GFLOP/s , 533299.1 tokens/s INFO:__main__:2024-10-26 19:29:51 | Epoch: 0 | Step: 6170 | Dataset: 0-4937000 | Loss: 2.459 | 676 ms/step , 58131.40 GFLOP/s , 532553.7 tokens/s INFO:__main__:2024-10-26 19:29:59 | Epoch: 0 | Step: 6180 | Dataset: 0-4945000 | Loss: 2.464 | 677 ms/step , 58091.45 GFLOP/s , 532259.6 tokens/s INFO:__main__:2024-10-26 19:30:06 | Epoch: 0 | Step: 6190 | Dataset: 0-4953000 | Loss: 2.495 | 674 ms/step , 58294.56 GFLOP/s , 532226.1 tokens/s INFO:__main__:2024-10-26 19:30:14 | Epoch: 0 | Step: 6200 | Dataset: 0-4961000 | Loss: 2.466 | 676 ms/step , 58111.52 GFLOP/s , 531757.2 tokens/s INFO:__main__:2024-10-26 19:30:22 | Epoch: 0 | Step: 6210 | Dataset: 0-4969000 | Loss: 2.483 | 675 ms/step , 58248.85 GFLOP/s , 531458.9 tokens/s INFO:__main__:2024-10-26 19:30:29 | Epoch: 0 | Step: 6220 | Dataset: 0-4977000 | Loss: 2.588 | 674 ms/step , 58335.28 GFLOP/s , 532944.2 tokens/s INFO:__main__:2024-10-26 19:30:37 | Epoch: 0 | Step: 6230 | Dataset: 0-4985000 | Loss: 2.464 | 675 ms/step , 58263.57 GFLOP/s , 531958.8 tokens/s INFO:__main__:2024-10-26 19:30:45 | Epoch: 0 | Step: 6240 | Dataset: 0-4993000 | Loss: 2.572 | 679 ms/step , 57922.27 GFLOP/s , 531283.9 tokens/s INFO:__main__:2024-10-26 19:30:53 | Epoch: 0 | Step: 6250 | Dataset: 0-5001000 | Loss: 2.502 | 674 ms/step , 58306.96 GFLOP/s , 531603.8 tokens/s INFO:__main__:2024-10-26 19:31:00 | Epoch: 0 | Step: 6260 | Dataset: 0-5009000 | Loss: 2.551 | 674 ms/step , 58333.76 GFLOP/s , 532245.8 tokens/s INFO:__main__:2024-10-26 19:31:08 | Epoch: 0 | Step: 6270 | Dataset: 0-5017000 | Loss: 2.610 | 675 ms/step , 58268.74 GFLOP/s , 531708.1 tokens/s INFO:__main__:2024-10-26 19:31:16 | Epoch: 0 | Step: 6280 | Dataset: 0-5025000 | Loss: 2.486 | 673 ms/step , 58374.16 GFLOP/s , 529361.9 tokens/s INFO:__main__:2024-10-26 19:31:23 | Epoch: 0 | Step: 6290 | Dataset: 0-5033000 | Loss: 2.576 | 675 ms/step , 58263.53 GFLOP/s , 533190.6 tokens/s INFO:__main__:2024-10-26 19:31:31 | Epoch: 0 | Step: 
6300 | Dataset: 0-5041000 | Loss: 2.459 | 677 ms/step , 58089.43 GFLOP/s , 531331.0 tokens/s INFO:__main__:2024-10-26 19:31:39 | Epoch: 0 | Step: 6310 | Dataset: 0-5049000 | Loss: 2.428 | 674 ms/step , 58317.76 GFLOP/s , 530476.0 tokens/s INFO:__main__:2024-10-26 19:31:47 | Epoch: 0 | Step: 6320 | Dataset: 0-5057000 | Loss: 2.411 | 674 ms/step , 58324.51 GFLOP/s , 532318.4 tokens/s INFO:__main__:2024-10-26 19:31:54 | Epoch: 0 | Step: 6330 | Dataset: 0-5065000 | Loss: 2.466 | 674 ms/step , 58325.83 GFLOP/s , 532326.5 tokens/s INFO:__main__:2024-10-26 19:32:02 | Epoch: 0 | Step: 6340 | Dataset: 0-5073000 | Loss: 2.460 | 674 ms/step , 58329.49 GFLOP/s , 532206.4 tokens/s INFO:__main__:2024-10-26 19:32:10 | Epoch: 0 | Step: 6350 | Dataset: 0-5081000 | Loss: 2.416 | 675 ms/step , 58277.10 GFLOP/s , 532379.5 tokens/s INFO:__main__:2024-10-26 19:32:17 | Epoch: 0 | Step: 6360 | Dataset: 0-5089000 | Loss: 2.403 | 676 ms/step , 58177.70 GFLOP/s , 532208.0 tokens/s INFO:__main__:2024-10-26 19:32:25 | Epoch: 0 | Step: 6370 | Dataset: 0-5097000 | Loss: 2.394 | 675 ms/step , 58257.83 GFLOP/s , 531544.9 tokens/s INFO:__main__:2024-10-26 19:32:33 | Epoch: 0 | Step: 6380 | Dataset: 0-5105000 | Loss: 2.393 | 675 ms/step , 58234.77 GFLOP/s , 531863.2 tokens/s INFO:__main__:2024-10-26 19:32:40 | Epoch: 0 | Step: 6390 | Dataset: 0-5113000 | Loss: 2.392 | 674 ms/step , 58279.33 GFLOP/s , 531942.7 tokens/s INFO:__main__:2024-10-26 19:32:48 | Epoch: 0 | Step: 6400 | Dataset: 0-5121000 | Loss: 2.380 | 675 ms/step , 58227.69 GFLOP/s , 532259.0 tokens/s INFO:__main__:2024-10-26 19:32:56 | Epoch: 0 | Step: 6410 | Dataset: 0-5129000 | Loss: 2.351 | 675 ms/step , 58226.56 GFLOP/s , 532133.1 tokens/s INFO:__main__:2024-10-26 19:33:04 | Epoch: 0 | Step: 6420 | Dataset: 0-5137000 | Loss: 2.267 | 694 ms/step , 56667.88 GFLOP/s , 530654.6 tokens/s INFO:__main__:2024-10-26 19:33:11 | Epoch: 0 | Step: 6430 | Dataset: 0-5145000 | Loss: 3.397 | 675 ms/step , 58211.99 GFLOP/s , 532272.3 tokens/s 
INFO:__main__:2024-10-26 19:33:19 | Epoch: 0 | Step: 6440 | Dataset: 0-5153000 | Loss: 2.475 | 675 ms/step , 58249.00 GFLOP/s , 531949.7 tokens/s INFO:__main__:2024-10-26 19:33:27 | Epoch: 0 | Step: 6450 | Dataset: 0-5161000 | Loss: 2.269 | 676 ms/step , 58157.70 GFLOP/s , 531375.3 tokens/s INFO:__main__:2024-10-26 19:33:34 | Epoch: 0 | Step: 6460 | Dataset: 0-5169000 | Loss: 2.185 | 676 ms/step , 58165.43 GFLOP/s , 531574.1 tokens/s INFO:__main__:2024-10-26 19:33:42 | Epoch: 0 | Step: 6470 | Dataset: 0-5177000 | Loss: 2.112 | 677 ms/step , 58104.38 GFLOP/s , 530802.3 tokens/s INFO:__main__:2024-10-26 19:33:50 | Epoch: 0 | Step: 6480 | Dataset: 0-5185000 | Loss: 2.119 | 675 ms/step , 58223.33 GFLOP/s , 531594.2 tokens/s INFO:__main__:2024-10-26 19:33:57 | Epoch: 0 | Step: 6490 | Dataset: 0-5193000 | Loss: 2.036 | 680 ms/step , 57779.33 GFLOP/s , 531401.6 tokens/s INFO:__main__:2024-10-26 19:34:05 | Epoch: 0 | Step: 6500 | Dataset: 0-5201000 | Loss: 2.062 | 676 ms/step , 58142.28 GFLOP/s , 531786.3 tokens/s INFO:__main__:2024-10-26 19:34:13 | Epoch: 0 | Step: 6510 | Dataset: 0-5209000 | Loss: 2.012 | 676 ms/step , 58137.65 GFLOP/s , 530641.3 tokens/s INFO:__main__:2024-10-26 19:34:21 | Epoch: 0 | Step: 6520 | Dataset: 0-5217000 | Loss: 2.700 | 678 ms/step , 58016.04 GFLOP/s , 530409.3 tokens/s INFO:__main__:2024-10-26 19:34:28 | Epoch: 0 | Step: 6530 | Dataset: 0-5225000 | Loss: 2.614 | 679 ms/step , 57872.95 GFLOP/s , 531328.8 tokens/s INFO:__main__:2024-10-26 19:34:36 | Epoch: 0 | Step: 6540 | Dataset: 0-5233000 | Loss: 2.542 | 727 ms/step , 54102.35 GFLOP/s , 528024.1 tokens/s INFO:__main__:2024-10-26 19:34:44 | Epoch: 0 | Step: 6550 | Dataset: 0-5241000 | Loss: 2.566 | 675 ms/step , 58217.03 GFLOP/s , 532175.4 tokens/s INFO:__main__:2024-10-26 19:34:51 | Epoch: 0 | Step: 6560 | Dataset: 0-5249000 | Loss: 2.515 | 674 ms/step , 58284.55 GFLOP/s , 531338.9 tokens/s INFO:__main__:2024-10-26 19:34:59 | Epoch: 0 | Step: 6570 | Dataset: 0-5257000 | Loss: 2.469 | 676 
ms/step , 58165.37 GFLOP/s , 531751.7 tokens/s INFO:__main__:2024-10-26 19:35:07 | Epoch: 0 | Step: 6580 | Dataset: 0-5265000 | Loss: 2.502 | 675 ms/step , 58243.30 GFLOP/s , 531571.9 tokens/s INFO:__main__:2024-10-26 19:35:15 | Epoch: 0 | Step: 6590 | Dataset: 0-5273000 | Loss: 2.519 | 674 ms/step , 58335.25 GFLOP/s , 532391.6 tokens/s INFO:__main__:2024-10-26 19:35:22 | Epoch: 0 | Step: 6600 | Dataset: 0-5281000 | Loss: 2.531 | 676 ms/step , 58142.63 GFLOP/s , 532023.2 tokens/s INFO:__main__:2024-10-26 19:35:30 | Epoch: 0 | Step: 6610 | Dataset: 0-5289000 | Loss: 2.478 | 674 ms/step , 58306.14 GFLOP/s , 531990.4 tokens/s INFO:__main__:2024-10-26 19:35:38 | Epoch: 0 | Step: 6620 | Dataset: 0-5297000 | Loss: 2.478 | 675 ms/step , 58238.95 GFLOP/s , 532294.4 tokens/s INFO:__main__:2024-10-26 19:35:45 | Epoch: 0 | Step: 6630 | Dataset: 0-5305000 | Loss: 2.461 | 681 ms/step , 57737.77 GFLOP/s , 530150.8 tokens/s INFO:__main__:2024-10-26 19:35:53 | Epoch: 0 | Step: 6640 | Dataset: 0-5313000 | Loss: 2.522 | 675 ms/step , 58249.52 GFLOP/s , 531254.8 tokens/s INFO:__main__:2024-10-26 19:36:01 | Epoch: 0 | Step: 6650 | Dataset: 0-5321000 | Loss: 2.425 | 677 ms/step , 58100.87 GFLOP/s , 531456.9 tokens/s INFO:__main__:2024-10-26 19:36:09 | Epoch: 0 | Step: 6660 | Dataset: 0-5329000 | Loss: 2.469 | 676 ms/step , 58114.92 GFLOP/s , 529201.8 tokens/s INFO:__main__:2024-10-26 19:36:16 | Epoch: 0 | Step: 6670 | Dataset: 0-5337000 | Loss: 2.471 | 676 ms/step , 58165.58 GFLOP/s , 529449.1 tokens/s INFO:__main__:2024-10-26 19:36:24 | Epoch: 0 | Step: 6680 | Dataset: 0-5345000 | Loss: 2.476 | 675 ms/step , 58265.84 GFLOP/s , 531406.5 tokens/s INFO:__main__:2024-10-26 19:36:32 | Epoch: 0 | Step: 6690 | Dataset: 0-5353000 | Loss: 2.371 | 675 ms/step , 58212.32 GFLOP/s , 531971.5 tokens/s INFO:__main__:2024-10-26 19:36:39 | Epoch: 0 | Step: 6700 | Dataset: 0-5361000 | Loss: 2.308 | 674 ms/step , 58307.05 GFLOP/s , 531224.6 tokens/s INFO:__main__:2024-10-26 19:36:47 | Epoch: 0 | Step: 
6710 | Dataset: 0-5369000 | Loss: 2.237 | 674 ms/step , 58282.47 GFLOP/s , 531439.2 tokens/s INFO:__main__:2024-10-26 19:36:55 | Epoch: 0 | Step: 6720 | Dataset: 0-5377000 | Loss: 2.246 | 675 ms/step , 58218.69 GFLOP/s , 531484.3 tokens/s INFO:__main__:2024-10-26 19:37:03 | Epoch: 0 | Step: 6730 | Dataset: 0-5385000 | Loss: 2.250 | 676 ms/step , 58185.09 GFLOP/s , 529963.3 tokens/s INFO:__main__:2024-10-26 19:37:10 | Epoch: 0 | Step: 6740 | Dataset: 0-5393000 | Loss: 2.188 | 675 ms/step , 58234.39 GFLOP/s , 532124.6 tokens/s INFO:__main__:2024-10-26 19:37:18 | Epoch: 0 | Step: 6750 | Dataset: 0-5401000 | Loss: 2.177 | 675 ms/step , 58246.55 GFLOP/s , 531758.2 tokens/s INFO:__main__:2024-10-26 19:37:26 | Epoch: 0 | Step: 6760 | Dataset: 0-5409000 | Loss: 2.206 | 678 ms/step , 58001.88 GFLOP/s , 531746.1 tokens/s INFO:__main__:2024-10-26 19:37:33 | Epoch: 0 | Step: 6770 | Dataset: 0-5417000 | Loss: 2.145 | 676 ms/step , 58183.50 GFLOP/s , 532367.8 tokens/s INFO:__main__:2024-10-26 19:37:41 | Epoch: 0 | Step: 6780 | Dataset: 0-5425000 | Loss: 2.064 | 674 ms/step , 58319.17 GFLOP/s , 530200.2 tokens/s INFO:__main__:2024-10-26 19:37:49 | Epoch: 0 | Step: 6790 | Dataset: 0-5433000 | Loss: 2.061 | 675 ms/step , 58201.02 GFLOP/s , 531553.2 tokens/s INFO:__main__:2024-10-26 19:37:56 | Epoch: 0 | Step: 6800 | Dataset: 0-5441000 | Loss: 2.034 | 675 ms/step , 58208.96 GFLOP/s , 531415.1 tokens/s INFO:__main__:2024-10-26 19:38:04 | Epoch: 0 | Step: 6810 | Dataset: 0-5449000 | Loss: 2.025 | 677 ms/step , 58049.81 GFLOP/s , 532055.0 tokens/s INFO:__main__:2024-10-26 19:38:12 | Epoch: 0 | Step: 6820 | Dataset: 0-5457000 | Loss: 2.013 | 674 ms/step , 58280.85 GFLOP/s , 532100.9 tokens/s INFO:__main__:2024-10-26 19:38:20 | Epoch: 0 | Step: 6830 | Dataset: 0-5465000 | Loss: 2.011 | 675 ms/step , 58264.41 GFLOP/s , 529101.1 tokens/s INFO:__main__:2024-10-26 19:38:27 | Epoch: 0 | Step: 6840 | Dataset: 0-5473000 | Loss: 1.965 | 675 ms/step , 58196.18 GFLOP/s , 532133.0 tokens/s 
INFO:__main__:2024-10-26 19:38:35 | Epoch: 0 | Step: 6850 | Dataset: 0-5481000 | Loss: 1.965 | 674 ms/step , 58281.06 GFLOP/s , 531721.4 tokens/s INFO:__main__:2024-10-26 19:38:43 | Epoch: 0 | Step: 6860 | Dataset: 0-5489000 | Loss: 1.963 | 675 ms/step , 58205.47 GFLOP/s , 532510.4 tokens/s INFO:__main__:2024-10-26 19:38:50 | Epoch: 0 | Step: 6870 | Dataset: 0-5497000 | Loss: 3.006 | 674 ms/step , 58335.77 GFLOP/s , 532759.3 tokens/s INFO:__main__:2024-10-26 19:38:58 | Epoch: 0 | Step: 6880 | Dataset: 0-5505000 | Loss: 2.699 | 673 ms/step , 58392.37 GFLOP/s , 533437.3 tokens/s INFO:__main__:2024-10-26 19:39:06 | Epoch: 0 | Step: 6890 | Dataset: 0-5513000 | Loss: 2.691 | 674 ms/step , 58285.89 GFLOP/s , 532835.8 tokens/s INFO:__main__:2024-10-26 19:39:13 | Epoch: 0 | Step: 6900 | Dataset: 0-5521000 | Loss: 2.540 | 674 ms/step , 58307.49 GFLOP/s , 533029.3 tokens/s INFO:__main__:2024-10-26 19:39:21 | Epoch: 0 | Step: 6910 | Dataset: 0-5529000 | Loss: 2.509 | 674 ms/step , 58355.48 GFLOP/s , 533362.6 tokens/s INFO:__main__:2024-10-26 19:39:29 | Epoch: 0 | Step: 6920 | Dataset: 0-5537000 | Loss: 2.610 | 674 ms/step , 58320.47 GFLOP/s , 533223.5 tokens/s INFO:__main__:2024-10-26 19:39:36 | Epoch: 0 | Step: 6930 | Dataset: 0-5545000 | Loss: 2.556 | 673 ms/step , 58408.65 GFLOP/s , 534175.9 tokens/s INFO:__main__:2024-10-26 19:39:44 | Epoch: 0 | Step: 6940 | Dataset: 0-5553000 | Loss: 2.531 | 674 ms/step , 58365.57 GFLOP/s , 533309.7 tokens/s INFO:__main__:2024-10-26 19:39:52 | Epoch: 0 | Step: 6950 | Dataset: 0-5561000 | Loss: 2.501 | 674 ms/step , 58316.74 GFLOP/s , 533610.4 tokens/s INFO:__main__:2024-10-26 19:40:00 | Epoch: 0 | Step: 6960 | Dataset: 0-5569000 | Loss: 2.573 | 674 ms/step , 58339.30 GFLOP/s , 532737.8 tokens/s INFO:__main__:2024-10-26 19:40:07 | Epoch: 0 | Step: 6970 | Dataset: 0-5577000 | Loss: 2.573 | 675 ms/step , 58277.13 GFLOP/s , 529737.0 tokens/s INFO:__main__:2024-10-26 19:40:15 | Epoch: 0 | Step: 6980 | Dataset: 0-5585000 | Loss: 2.529 | 673 
ms/step , 58446.06 GFLOP/s , 533176.3 tokens/s INFO:__main__:2024-10-26 19:40:23 | Epoch: 0 | Step: 6990 | Dataset: 0-5593000 | Loss: 2.496 | 674 ms/step , 58288.19 GFLOP/s , 533201.4 tokens/s INFO:__main__:2024-10-26 19:40:30 | Validation | Step: 7000 | Val_loss: 2.518 | Best_val_loss: 2.6011 INFO:__main__:2024-10-26 19:40:30 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_194030_step_7000.pt` INFO:__main__:2024-10-26 19:40:31 | Epoch: 0 | Step: 7000 | Dataset: 0-5601000 | Loss: 2.515 | 673 ms/step , 58392.20 GFLOP/s , 479712.6 tokens/s INFO:__main__:2024-10-26 19:40:39 | Epoch: 0 | Step: 7010 | Dataset: 0-5609000 | Loss: 2.488 | 674 ms/step , 58309.02 GFLOP/s , 533154.4 tokens/s INFO:__main__:2024-10-26 19:40:47 | Epoch: 0 | Step: 7020 | Dataset: 0-5617000 | Loss: 2.493 | 674 ms/step , 58307.07 GFLOP/s , 533853.7 tokens/s INFO:__main__:2024-10-26 19:40:54 | Epoch: 0 | Step: 7030 | Dataset: 0-5625000 | Loss: 2.534 | 674 ms/step , 58340.97 GFLOP/s , 533361.2 tokens/s INFO:__main__:2024-10-26 19:41:02 | Epoch: 0 | Step: 7040 | Dataset: 0-5633000 | Loss: 2.522 | 675 ms/step , 58263.01 GFLOP/s , 533235.0 tokens/s INFO:__main__:2024-10-26 19:41:10 | Epoch: 0 | Step: 7050 | Dataset: 0-5641000 | Loss: 2.473 | 674 ms/step , 58338.70 GFLOP/s , 532780.8 tokens/s INFO:__main__:2024-10-26 19:41:17 | Epoch: 0 | Step: 7060 | Dataset: 0-5649000 | Loss: 2.501 | 674 ms/step , 58298.66 GFLOP/s , 533476.8 tokens/s INFO:__main__:2024-10-26 19:41:25 | Epoch: 0 | Step: 7070 | Dataset: 0-5657000 | Loss: 2.546 | 675 ms/step , 58260.60 GFLOP/s , 532852.1 tokens/s INFO:__main__:2024-10-26 19:41:33 | Epoch: 0 | Step: 7080 | Dataset: 0-5665000 | Loss: 2.491 | 675 ms/step , 58241.41 GFLOP/s , 533224.1 tokens/s INFO:__main__:2024-10-26 19:41:40 | Epoch: 0 | Step: 7090 | Dataset: 0-5673000 | Loss: 2.434 | 675 ms/step , 58208.57 GFLOP/s , 533194.5 tokens/s INFO:__main__:2024-10-26 19:41:48 | Epoch: 0 | Step: 7100 | Dataset: 0-5681000 | Loss: 2.473 | 675 ms/step , 58270.17 
GFLOP/s , 532937.6 tokens/s INFO:__main__:2024-10-26 19:41:56 | Epoch: 0 | Step: 7110 | Dataset: 0-5689000 | Loss: 2.520 | 675 ms/step , 58261.85 GFLOP/s , 532522.8 tokens/s INFO:__main__:2024-10-26 19:42:03 | Epoch: 0 | Step: 7120 | Dataset: 0-5697000 | Loss: 2.531 | 675 ms/step , 58268.82 GFLOP/s , 533043.6 tokens/s INFO:__main__:2024-10-26 19:42:11 | Epoch: 0 | Step: 7130 | Dataset: 0-5705000 | Loss: 2.519 | 675 ms/step , 58278.65 GFLOP/s , 533630.0 tokens/s INFO:__main__:2024-10-26 19:42:19 | Epoch: 0 | Step: 7140 | Dataset: 0-5713000 | Loss: 2.436 | 673 ms/step , 58392.06 GFLOP/s , 533645.9 tokens/s INFO:__main__:2024-10-26 19:42:26 | Epoch: 0 | Step: 7150 | Dataset: 0-5721000 | Loss: 2.452 | 674 ms/step , 58307.10 GFLOP/s , 533497.2 tokens/s INFO:__main__:2024-10-26 19:42:34 | Epoch: 0 | Step: 7160 | Dataset: 0-5729000 | Loss: 2.479 | 673 ms/step , 58379.87 GFLOP/s , 533359.5 tokens/s INFO:__main__:2024-10-26 19:42:42 | Epoch: 0 | Step: 7170 | Dataset: 0-5737000 | Loss: 2.478 | 675 ms/step , 58246.25 GFLOP/s , 533441.2 tokens/s INFO:__main__:2024-10-26 19:42:49 | Epoch: 0 | Step: 7180 | Dataset: 0-5745000 | Loss: 2.517 | 675 ms/step , 58249.20 GFLOP/s , 533406.2 tokens/s INFO:__main__:2024-10-26 19:42:57 | Epoch: 0 | Step: 7190 | Dataset: 0-5753000 | Loss: 2.432 | 674 ms/step , 58301.54 GFLOP/s , 533138.9 tokens/s INFO:__main__:2024-10-26 19:43:05 | Epoch: 0 | Step: 7200 | Dataset: 0-5761000 | Loss: 2.581 | 675 ms/step , 58259.42 GFLOP/s , 533287.1 tokens/s INFO:__main__:2024-10-26 19:43:12 | Epoch: 0 | Step: 7210 | Dataset: 0-5769000 | Loss: 2.455 | 674 ms/step , 58323.44 GFLOP/s , 533060.9 tokens/s INFO:__main__:2024-10-26 19:43:20 | Epoch: 0 | Step: 7220 | Dataset: 0-5777000 | Loss: 2.507 | 674 ms/step , 58297.67 GFLOP/s , 533482.0 tokens/s INFO:__main__:2024-10-26 19:43:28 | Epoch: 0 | Step: 7230 | Dataset: 0-5785000 | Loss: 2.458 | 674 ms/step , 58297.13 GFLOP/s , 533074.4 tokens/s INFO:__main__:2024-10-26 19:43:36 | Epoch: 0 | Step: 7240 | Dataset: 
0-5793000 | Loss: 2.491 | 675 ms/step , 58265.77 GFLOP/s , 533161.3 tokens/s INFO:__main__:2024-10-26 19:43:43 | Epoch: 0 | Step: 7250 | Dataset: 0-5801000 | Loss: 2.472 | 674 ms/step , 58313.26 GFLOP/s , 533136.6 tokens/s INFO:__main__:2024-10-26 19:43:51 | Epoch: 0 | Step: 7260 | Dataset: 0-5809000 | Loss: 2.464 | 675 ms/step , 58278.86 GFLOP/s , 533044.8 tokens/s INFO:__main__:2024-10-26 19:43:59 | Epoch: 0 | Step: 7270 | Dataset: 0-5817000 | Loss: 2.481 | 675 ms/step , 58235.96 GFLOP/s , 532658.4 tokens/s INFO:__main__:2024-10-26 19:44:06 | Epoch: 0 | Step: 7280 | Dataset: 0-5825000 | Loss: 2.446 | 673 ms/step , 58430.70 GFLOP/s , 533445.3 tokens/s INFO:__main__:2024-10-26 19:44:14 | Epoch: 0 | Step: 7290 | Dataset: 0-5833000 | Loss: 2.558 | 674 ms/step , 58335.47 GFLOP/s , 533341.1 tokens/s INFO:__main__:2024-10-26 19:44:22 | Epoch: 0 | Step: 7300 | Dataset: 0-5841000 | Loss: 2.467 | 674 ms/step , 58333.47 GFLOP/s , 532651.5 tokens/s INFO:__main__:2024-10-26 19:44:29 | Epoch: 0 | Step: 7310 | Dataset: 0-5849000 | Loss: 2.387 | 675 ms/step , 58234.64 GFLOP/s , 533013.2 tokens/s INFO:__main__:2024-10-26 19:44:37 | Epoch: 0 | Step: 7320 | Dataset: 0-5857000 | Loss: 2.531 | 673 ms/step , 58383.25 GFLOP/s , 532639.2 tokens/s INFO:__main__:2024-10-26 19:44:45 | Epoch: 0 | Step: 7330 | Dataset: 0-5865000 | Loss: 2.482 | 674 ms/step , 58291.23 GFLOP/s , 533425.6 tokens/s INFO:__main__:2024-10-26 19:44:52 | Epoch: 0 | Step: 7340 | Dataset: 0-5873000 | Loss: 2.494 | 674 ms/step , 58280.77 GFLOP/s , 533235.5 tokens/s INFO:__main__:2024-10-26 19:45:00 | Epoch: 0 | Step: 7350 | Dataset: 0-5881000 | Loss: 2.381 | 675 ms/step , 58247.92 GFLOP/s , 532936.0 tokens/s INFO:__main__:2024-10-26 19:45:08 | Epoch: 0 | Step: 7360 | Dataset: 0-5889000 | Loss: 2.436 | 675 ms/step , 58267.83 GFLOP/s , 533118.1 tokens/s INFO:__main__:2024-10-26 19:45:15 | Epoch: 0 | Step: 7370 | Dataset: 0-5897000 | Loss: 2.517 | 676 ms/step , 58164.79 GFLOP/s , 532100.8 tokens/s INFO:__main__:2024-10-26 
19:45:23 | Epoch: 0 | Step: 7380 | Dataset: 0-5905000 | Loss: 2.523 | 675 ms/step , 58205.55 GFLOP/s , 532226.6 tokens/s INFO:__main__:2024-10-26 19:45:31 | Epoch: 0 | Step: 7390 | Dataset: 0-5913000 | Loss: 2.578 | 675 ms/step , 58253.62 GFLOP/s , 532423.9 tokens/s INFO:__main__:2024-10-26 19:45:38 | Epoch: 0 | Step: 7400 | Dataset: 0-5921000 | Loss: 2.614 | 674 ms/step , 58307.16 GFLOP/s , 532840.4 tokens/s INFO:__main__:2024-10-26 19:45:46 | Epoch: 0 | Step: 7410 | Dataset: 0-5929000 | Loss: 2.508 | 674 ms/step , 58341.12 GFLOP/s , 532850.7 tokens/s INFO:__main__:2024-10-26 19:45:54 | Epoch: 0 | Step: 7420 | Dataset: 0-5937000 | Loss: 2.546 | 675 ms/step , 58229.01 GFLOP/s , 532741.6 tokens/s INFO:__main__:2024-10-26 19:46:02 | Epoch: 0 | Step: 7430 | Dataset: 0-5945000 | Loss: 2.451 | 676 ms/step , 58191.21 GFLOP/s , 531786.1 tokens/s INFO:__main__:2024-10-26 19:46:09 | Epoch: 0 | Step: 7440 | Dataset: 0-5953000 | Loss: 2.567 | 676 ms/step , 58179.02 GFLOP/s , 531126.9 tokens/s INFO:__main__:2024-10-26 19:46:17 | Epoch: 0 | Step: 7450 | Dataset: 0-5961000 | Loss: 2.532 | 675 ms/step , 58257.76 GFLOP/s , 533548.2 tokens/s INFO:__main__:2024-10-26 19:46:25 | Epoch: 0 | Step: 7460 | Dataset: 0-5969000 | Loss: 2.541 | 675 ms/step , 58269.15 GFLOP/s , 533390.9 tokens/s INFO:__main__:2024-10-26 19:46:32 | Epoch: 0 | Step: 7470 | Dataset: 0-5977000 | Loss: 2.499 | 674 ms/step , 58313.88 GFLOP/s , 533483.2 tokens/s INFO:__main__:2024-10-26 19:46:40 | Epoch: 0 | Step: 7480 | Dataset: 0-5985000 | Loss: 2.428 | 675 ms/step , 58253.99 GFLOP/s , 532889.1 tokens/s INFO:__main__:2024-10-26 19:46:48 | Epoch: 0 | Step: 7490 | Dataset: 0-5993000 | Loss: 2.511 | 674 ms/step , 58360.14 GFLOP/s , 533801.9 tokens/s INFO:__main__:2024-10-26 19:46:55 | Epoch: 0 | Step: 7500 | Dataset: 0-6001000 | Loss: 2.497 | 675 ms/step , 58271.63 GFLOP/s , 532740.3 tokens/s INFO:__main__:2024-10-26 19:47:03 | Epoch: 0 | Step: 7510 | Dataset: 0-6009000 | Loss: 2.555 | 675 ms/step , 58243.89 GFLOP/s 
, 532960.9 tokens/s INFO:__main__:2024-10-26 19:47:11 | Epoch: 0 | Step: 7520 | Dataset: 0-6017000 | Loss: 2.496 | 674 ms/step , 58319.04 GFLOP/s , 532772.7 tokens/s INFO:__main__:2024-10-26 19:47:18 | Epoch: 0 | Step: 7530 | Dataset: 0-6025000 | Loss: 2.542 | 673 ms/step , 58433.49 GFLOP/s , 533969.0 tokens/s INFO:__main__:2024-10-26 19:47:26 | Epoch: 0 | Step: 7540 | Dataset: 0-6033000 | Loss: 2.512 | 674 ms/step , 58288.19 GFLOP/s , 533373.6 tokens/s INFO:__main__:2024-10-26 19:47:34 | Epoch: 0 | Step: 7550 | Dataset: 0-6041000 | Loss: 2.488 | 675 ms/step , 58235.91 GFLOP/s , 533491.8 tokens/s INFO:__main__:2024-10-26 19:47:41 | Epoch: 0 | Step: 7560 | Dataset: 0-6049000 | Loss: 2.467 | 674 ms/step , 58307.63 GFLOP/s , 533561.5 tokens/s INFO:__main__:2024-10-26 19:47:49 | Epoch: 0 | Step: 7570 | Dataset: 0-6057000 | Loss: 2.468 | 674 ms/step , 58326.55 GFLOP/s , 533666.5 tokens/s INFO:__main__:2024-10-26 19:47:57 | Epoch: 0 | Step: 7580 | Dataset: 0-6065000 | Loss: 2.450 | 674 ms/step , 58328.73 GFLOP/s , 533654.8 tokens/s INFO:__main__:2024-10-26 19:48:04 | Epoch: 0 | Step: 7590 | Dataset: 0-6073000 | Loss: 2.466 | 674 ms/step , 58355.06 GFLOP/s , 533180.6 tokens/s INFO:__main__:2024-10-26 19:48:12 | Epoch: 0 | Step: 7600 | Dataset: 0-6081000 | Loss: 2.478 | 674 ms/step , 58327.52 GFLOP/s , 533605.6 tokens/s INFO:__main__:2024-10-26 19:48:20 | Epoch: 0 | Step: 7610 | Dataset: 0-6089000 | Loss: 2.439 | 673 ms/step , 58387.67 GFLOP/s , 533451.4 tokens/s INFO:__main__:2024-10-26 19:48:28 | Epoch: 0 | Step: 7620 | Dataset: 0-6097000 | Loss: 2.440 | 673 ms/step , 58371.65 GFLOP/s , 533666.5 tokens/s INFO:__main__:2024-10-26 19:48:35 | Epoch: 0 | Step: 7630 | Dataset: 0-6105000 | Loss: 2.441 | 674 ms/step , 58309.71 GFLOP/s , 533471.5 tokens/s INFO:__main__:2024-10-26 19:48:43 | Epoch: 0 | Step: 7640 | Dataset: 0-6113000 | Loss: 2.442 | 674 ms/step , 58356.77 GFLOP/s , 533714.8 tokens/s INFO:__main__:2024-10-26 19:48:51 | Epoch: 0 | Step: 7650 | Dataset: 0-6121000 | 
Loss: 2.376 | 675 ms/step , 58220.51 GFLOP/s , 533445.9 tokens/s INFO:__main__:2024-10-26 19:48:58 | Epoch: 0 | Step: 7660 | Dataset: 0-6129000 | Loss: 2.452 | 674 ms/step , 58293.24 GFLOP/s , 533576.6 tokens/s INFO:__main__:2024-10-26 19:49:06 | Epoch: 0 | Step: 7670 | Dataset: 0-6137000 | Loss: 2.367 | 675 ms/step , 58232.60 GFLOP/s , 533729.7 tokens/s INFO:__main__:2024-10-26 19:49:14 | Epoch: 0 | Step: 7680 | Dataset: 0-6145000 | Loss: 2.442 | 674 ms/step , 58357.96 GFLOP/s , 533420.9 tokens/s INFO:__main__:2024-10-26 19:49:21 | Epoch: 0 | Step: 7690 | Dataset: 0-6153000 | Loss: 2.535 | 674 ms/step , 58318.14 GFLOP/s , 533295.7 tokens/s INFO:__main__:2024-10-26 19:49:29 | Epoch: 0 | Step: 7700 | Dataset: 0-6161000 | Loss: 2.423 | 674 ms/step , 58310.65 GFLOP/s , 533360.3 tokens/s INFO:__main__:2024-10-26 19:49:37 | Epoch: 0 | Step: 7710 | Dataset: 0-6169000 | Loss: 2.433 | 674 ms/step , 58363.99 GFLOP/s , 533657.5 tokens/s INFO:__main__:2024-10-26 19:49:44 | Epoch: 0 | Step: 7720 | Dataset: 0-6177000 | Loss: 2.481 | 673 ms/step , 58371.10 GFLOP/s , 533729.9 tokens/s INFO:__main__:2024-10-26 19:49:52 | Epoch: 0 | Step: 7730 | Dataset: 0-6185000 | Loss: 2.423 | 674 ms/step , 58287.73 GFLOP/s , 533681.1 tokens/s INFO:__main__:2024-10-26 19:50:00 | Epoch: 0 | Step: 7740 | Dataset: 0-6193000 | Loss: 2.437 | 683 ms/step , 57545.71 GFLOP/s , 532629.9 tokens/s INFO:__main__:2024-10-26 19:50:07 | Epoch: 0 | Step: 7750 | Dataset: 0-6201000 | Loss: 2.367 | 676 ms/step , 58180.42 GFLOP/s , 532564.0 tokens/s INFO:__main__:2024-10-26 19:50:15 | Epoch: 0 | Step: 7760 | Dataset: 0-6209000 | Loss: 2.428 | 674 ms/step , 58356.47 GFLOP/s , 532410.8 tokens/s INFO:__main__:2024-10-26 19:50:23 | Epoch: 0 | Step: 7770 | Dataset: 0-6217000 | Loss: 2.484 | 674 ms/step , 58343.65 GFLOP/s , 533128.2 tokens/s INFO:__main__:2024-10-26 19:50:30 | Epoch: 0 | Step: 7780 | Dataset: 0-6225000 | Loss: 2.426 | 675 ms/step , 58227.33 GFLOP/s , 533286.3 tokens/s INFO:__main__:2024-10-26 19:50:38 | 
Epoch: 0 | Step: 7790 | Dataset: 0-6233000 | Loss: 2.408 | 673 ms/step , 58400.57 GFLOP/s , 533334.5 tokens/s INFO:__main__:2024-10-26 19:50:46 | Epoch: 0 | Step: 7800 | Dataset: 0-6241000 | Loss: 2.404 | 683 ms/step , 57562.91 GFLOP/s , 532903.5 tokens/s INFO:__main__:2024-10-26 19:50:53 | Epoch: 0 | Step: 7810 | Dataset: 0-6249000 | Loss: 2.415 | 673 ms/step , 58398.46 GFLOP/s , 533667.7 tokens/s INFO:__main__:2024-10-26 19:51:01 | Epoch: 0 | Step: 7820 | Dataset: 0-6257000 | Loss: 2.429 | 676 ms/step , 58176.90 GFLOP/s , 533726.7 tokens/s INFO:__main__:2024-10-26 19:51:09 | Epoch: 0 | Step: 7830 | Dataset: 0-6265000 | Loss: 2.340 | 674 ms/step , 58332.02 GFLOP/s , 533489.5 tokens/s INFO:__main__:2024-10-26 19:51:16 | Epoch: 0 | Step: 7840 | Dataset: 0-6273000 | Loss: 2.510 | 676 ms/step , 58162.88 GFLOP/s , 532301.0 tokens/s INFO:__main__:2024-10-26 19:51:24 | Epoch: 0 | Step: 7850 | Dataset: 0-6281000 | Loss: 2.521 | 674 ms/step , 58310.95 GFLOP/s , 532219.1 tokens/s INFO:__main__:2024-10-26 19:51:32 | Epoch: 0 | Step: 7860 | Dataset: 0-6289000 | Loss: 2.469 | 674 ms/step , 58347.55 GFLOP/s , 532924.2 tokens/s INFO:__main__:2024-10-26 19:51:40 | Epoch: 0 | Step: 7870 | Dataset: 0-6297000 | Loss: 2.481 | 675 ms/step , 58271.29 GFLOP/s , 533387.2 tokens/s INFO:__main__:2024-10-26 19:51:47 | Epoch: 0 | Step: 7880 | Dataset: 0-6305000 | Loss: 2.404 | 673 ms/step , 58380.96 GFLOP/s , 533672.3 tokens/s INFO:__main__:2024-10-26 19:51:55 | Epoch: 0 | Step: 7890 | Dataset: 0-6313000 | Loss: 2.464 | 674 ms/step , 58347.70 GFLOP/s , 533695.2 tokens/s INFO:__main__:2024-10-26 19:52:03 | Epoch: 0 | Step: 7900 | Dataset: 0-6321000 | Loss: 2.383 | 674 ms/step , 58327.36 GFLOP/s , 532600.5 tokens/s INFO:__main__:2024-10-26 19:52:10 | Epoch: 0 | Step: 7910 | Dataset: 0-6329000 | Loss: 2.493 | 673 ms/step , 58391.66 GFLOP/s , 533743.0 tokens/s INFO:__main__:2024-10-26 19:52:18 | Epoch: 0 | Step: 7920 | Dataset: 0-6337000 | Loss: 2.404 | 676 ms/step , 58189.39 GFLOP/s , 533489.9 
tokens/s INFO:__main__:2024-10-26 19:52:26 | Epoch: 0 | Step: 7930 | Dataset: 0-6345000 | Loss: 2.389 | 674 ms/step , 58300.45 GFLOP/s , 533087.9 tokens/s INFO:__main__:2024-10-26 19:52:33 | Epoch: 0 | Step: 7940 | Dataset: 0-6353000 | Loss: 2.526 | 674 ms/step , 58311.49 GFLOP/s , 533120.9 tokens/s INFO:__main__:2024-10-26 19:52:41 | Epoch: 0 | Step: 7950 | Dataset: 0-6361000 | Loss: 2.452 | 674 ms/step , 58299.10 GFLOP/s , 533227.6 tokens/s INFO:__main__:2024-10-26 19:52:49 | Epoch: 0 | Step: 7960 | Dataset: 0-6369000 | Loss: 2.620 | 674 ms/step , 58316.10 GFLOP/s , 533284.8 tokens/s INFO:__main__:2024-10-26 19:52:56 | Epoch: 0 | Step: 7970 | Dataset: 0-6377000 | Loss: 2.407 | 674 ms/step , 58354.59 GFLOP/s , 533013.8 tokens/s INFO:__main__:2024-10-26 19:53:04 | Epoch: 0 | Step: 7980 | Dataset: 0-6385000 | Loss: 2.457 | 674 ms/step , 58280.12 GFLOP/s , 533187.3 tokens/s INFO:__main__:2024-10-26 19:53:12 | Epoch: 0 | Step: 7990 | Dataset: 0-6393000 | Loss: 2.487 | 674 ms/step , 58298.16 GFLOP/s , 533523.8 tokens/s INFO:__main__:2024-10-26 19:53:19 | Validation | Step: 8000 | Val_loss: 2.574 | Best_val_loss: 2.5181 INFO:__main__:2024-10-26 19:53:19 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_195319_step_8000.pt` INFO:__main__:2024-10-26 19:53:20 | Epoch: 0 | Step: 8000 | Dataset: 0-6401000 | Loss: 2.663 | 672 ms/step , 58474.33 GFLOP/s , 481423.4 tokens/s INFO:__main__:2024-10-26 19:53:28 | Epoch: 0 | Step: 8010 | Dataset: 0-6409000 | Loss: 2.480 | 674 ms/step , 58361.00 GFLOP/s , 533397.9 tokens/s INFO:__main__:2024-10-26 19:53:36 | Epoch: 0 | Step: 8020 | Dataset: 0-6417000 | Loss: 2.384 | 674 ms/step , 58296.77 GFLOP/s , 532710.8 tokens/s INFO:__main__:2024-10-26 19:53:43 | Epoch: 0 | Step: 8030 | Dataset: 0-6425000 | Loss: 2.310 | 674 ms/step , 58296.91 GFLOP/s , 533393.4 tokens/s INFO:__main__:2024-10-26 19:53:51 | Epoch: 0 | Step: 8040 | Dataset: 0-6433000 | Loss: 2.282 | 675 ms/step , 58196.35 GFLOP/s , 532950.2 tokens/s 
INFO:__main__:2024-10-26 19:53:59 | Epoch: 0 | Step: 8050 | Dataset: 0-6441000 | Loss: 2.281 | 675 ms/step , 58199.84 GFLOP/s , 531845.3 tokens/s INFO:__main__:2024-10-26 19:54:06 | Epoch: 0 | Step: 8060 | Dataset: 0-6449000 | Loss: 2.220 | 674 ms/step , 58282.83 GFLOP/s , 532223.6 tokens/s INFO:__main__:2024-10-26 19:54:14 | Epoch: 0 | Step: 8070 | Dataset: 0-6457000 | Loss: 2.258 | 675 ms/step , 58264.51 GFLOP/s , 532932.0 tokens/s INFO:__main__:2024-10-26 19:54:22 | Epoch: 0 | Step: 8080 | Dataset: 0-6465000 | Loss: 2.182 | 674 ms/step , 58280.42 GFLOP/s , 532511.5 tokens/s INFO:__main__:2024-10-26 19:54:29 | Epoch: 0 | Step: 8090 | Dataset: 0-6473000 | Loss: 2.198 | 674 ms/step , 58327.20 GFLOP/s , 533306.0 tokens/s INFO:__main__:2024-10-26 19:54:37 | Epoch: 0 | Step: 8100 | Dataset: 0-6481000 | Loss: 2.224 | 675 ms/step , 58194.87 GFLOP/s , 532294.5 tokens/s INFO:__main__:2024-10-26 19:54:45 | Epoch: 0 | Step: 8110 | Dataset: 0-6489000 | Loss: 2.188 | 675 ms/step , 58197.40 GFLOP/s , 531906.9 tokens/s INFO:__main__:2024-10-26 19:54:52 | Epoch: 0 | Step: 8120 | Dataset: 0-6497000 | Loss: 2.124 | 674 ms/step , 58301.25 GFLOP/s , 532199.7 tokens/s INFO:__main__:2024-10-26 19:55:00 | Epoch: 0 | Step: 8130 | Dataset: 0-6505000 | Loss: 2.175 | 674 ms/step , 58281.15 GFLOP/s , 533129.6 tokens/s INFO:__main__:2024-10-26 19:55:08 | Epoch: 0 | Step: 8140 | Dataset: 0-6513000 | Loss: 2.129 | 683 ms/step , 57537.64 GFLOP/s , 532706.8 tokens/s INFO:__main__:2024-10-26 19:55:16 | Epoch: 0 | Step: 8150 | Dataset: 0-6521000 | Loss: 2.079 | 674 ms/step , 58336.73 GFLOP/s , 533374.2 tokens/s INFO:__main__:2024-10-26 19:55:23 | Epoch: 0 | Step: 8160 | Dataset: 0-6529000 | Loss: 3.520 | 673 ms/step , 58400.84 GFLOP/s , 533602.4 tokens/s INFO:__main__:2024-10-26 19:55:31 | Epoch: 0 | Step: 8170 | Dataset: 0-6537000 | Loss: 2.525 | 674 ms/step , 58341.64 GFLOP/s , 532568.8 tokens/s INFO:__main__:2024-10-26 19:55:39 | Epoch: 0 | Step: 8180 | Dataset: 0-6545000 | Loss: 2.334 | 675 
ms/step , 58209.95 GFLOP/s , 532781.7 tokens/s INFO:__main__:2024-10-26 19:55:46 | Epoch: 0 | Step: 8190 | Dataset: 0-6553000 | Loss: 2.220 | 676 ms/step , 58163.64 GFLOP/s , 532545.7 tokens/s INFO:__main__:2024-10-26 19:55:54 | Epoch: 0 | Step: 8200 | Dataset: 0-6561000 | Loss: 2.165 | 675 ms/step , 58250.64 GFLOP/s , 532679.0 tokens/s INFO:__main__:2024-10-26 19:56:02 | Epoch: 0 | Step: 8210 | Dataset: 0-6569000 | Loss: 2.083 | 675 ms/step , 58263.51 GFLOP/s , 532760.8 tokens/s INFO:__main__:2024-10-26 19:56:09 | Epoch: 0 | Step: 8220 | Dataset: 0-6577000 | Loss: 2.089 | 674 ms/step , 58315.76 GFLOP/s , 533205.0 tokens/s INFO:__main__:2024-10-26 19:56:17 | Epoch: 0 | Step: 8230 | Dataset: 0-6585000 | Loss: 2.067 | 674 ms/step , 58297.50 GFLOP/s , 533183.8 tokens/s INFO:__main__:2024-10-26 19:56:25 | Epoch: 0 | Step: 8240 | Dataset: 0-6593000 | Loss: 2.064 | 674 ms/step , 58279.70 GFLOP/s , 532334.3 tokens/s INFO:__main__:2024-10-26 19:56:32 | Epoch: 0 | Step: 8250 | Dataset: 0-6601000 | Loss: 2.067 | 674 ms/step , 58337.22 GFLOP/s , 533320.3 tokens/s INFO:__main__:2024-10-26 19:56:40 | Epoch: 0 | Step: 8260 | Dataset: 0-6609000 | Loss: 2.054 | 676 ms/step , 58153.87 GFLOP/s , 532672.8 tokens/s INFO:__main__:2024-10-26 19:56:48 | Epoch: 0 | Step: 8270 | Dataset: 0-6617000 | Loss: 2.016 | 674 ms/step , 58331.06 GFLOP/s , 532843.2 tokens/s INFO:__main__:2024-10-26 19:56:55 | Epoch: 0 | Step: 8280 | Dataset: 0-6625000 | Loss: 2.000 | 675 ms/step , 58217.50 GFLOP/s , 532133.3 tokens/s INFO:__main__:2024-10-26 19:57:03 | Epoch: 0 | Step: 8290 | Dataset: 0-6633000 | Loss: 1.996 | 674 ms/step , 58353.60 GFLOP/s , 532492.6 tokens/s INFO:__main__:2024-10-26 19:57:11 | Epoch: 0 | Step: 8300 | Dataset: 0-6641000 | Loss: 1.989 | 675 ms/step , 58252.31 GFLOP/s , 532650.8 tokens/s INFO:__main__:2024-10-26 19:57:19 | Epoch: 0 | Step: 8310 | Dataset: 0-6649000 | Loss: 1.996 | 674 ms/step , 58292.01 GFLOP/s , 532144.9 tokens/s INFO:__main__:2024-10-26 19:57:26 | Epoch: 0 | Step: 
8320 | Dataset: 0-6657000 | Loss: 1.948 | 674 ms/step , 58314.42 GFLOP/s , 532756.1 tokens/s INFO:__main__:2024-10-26 19:57:34 | Epoch: 0 | Step: 8330 | Dataset: 0-6665000 | Loss: 1.943 | 675 ms/step , 58236.19 GFLOP/s , 532366.1 tokens/s INFO:__main__:2024-10-26 19:57:42 | Epoch: 0 | Step: 8340 | Dataset: 0-6673000 | Loss: 2.978 | 674 ms/step , 58327.64 GFLOP/s , 532862.0 tokens/s INFO:__main__:2024-10-26 19:57:49 | Epoch: 0 | Step: 8350 | Dataset: 0-6681000 | Loss: 2.541 | 674 ms/step , 58318.22 GFLOP/s , 531822.5 tokens/s INFO:__main__:2024-10-26 19:57:57 | Epoch: 0 | Step: 8360 | Dataset: 0-6689000 | Loss: 2.500 | 674 ms/step , 58322.69 GFLOP/s , 533277.2 tokens/s INFO:__main__:2024-10-26 19:58:05 | Epoch: 0 | Step: 8370 | Dataset: 0-6697000 | Loss: 2.426 | 675 ms/step , 58227.92 GFLOP/s , 533272.4 tokens/s INFO:__main__:2024-10-26 19:58:12 | Epoch: 0 | Step: 8380 | Dataset: 0-6705000 | Loss: 2.338 | 673 ms/step , 58394.58 GFLOP/s , 533960.0 tokens/s INFO:__main__:2024-10-26 19:58:20 | Epoch: 0 | Step: 8390 | Dataset: 0-6713000 | Loss: 2.418 | 674 ms/step , 58347.17 GFLOP/s , 533640.3 tokens/s INFO:__main__:2024-10-26 19:58:28 | Epoch: 0 | Step: 8400 | Dataset: 0-6721000 | Loss: 2.316 | 674 ms/step , 58356.51 GFLOP/s , 533753.6 tokens/s INFO:__main__:2024-10-26 19:58:35 | Epoch: 0 | Step: 8410 | Dataset: 0-6729000 | Loss: 2.230 | 675 ms/step , 58199.79 GFLOP/s , 533392.0 tokens/s INFO:__main__:2024-10-26 19:58:43 | Epoch: 0 | Step: 8420 | Dataset: 0-6737000 | Loss: 2.364 | 675 ms/step , 58273.34 GFLOP/s , 533336.2 tokens/s INFO:__main__:2024-10-26 19:58:51 | Epoch: 0 | Step: 8430 | Dataset: 0-6745000 | Loss: 2.262 | 673 ms/step , 58425.85 GFLOP/s , 533742.3 tokens/s INFO:__main__:2024-10-26 19:58:58 | Epoch: 0 | Step: 8440 | Dataset: 0-6753000 | Loss: 2.228 | 675 ms/step , 58242.85 GFLOP/s , 532694.8 tokens/s INFO:__main__:2024-10-26 19:59:06 | Epoch: 0 | Step: 8450 | Dataset: 0-6761000 | Loss: 2.221 | 674 ms/step , 58317.94 GFLOP/s , 533438.9 tokens/s 
INFO:__main__:2024-10-26 19:59:14 | Epoch: 0 | Step: 8460 | Dataset: 0-6769000 | Loss: 2.236 | 675 ms/step , 58204.36 GFLOP/s , 533036.0 tokens/s INFO:__main__:2024-10-26 19:59:21 | Epoch: 0 | Step: 8470 | Dataset: 0-6777000 | Loss: 2.306 | 673 ms/step , 58406.66 GFLOP/s , 533641.6 tokens/s INFO:__main__:2024-10-26 19:59:29 | Epoch: 0 | Step: 8480 | Dataset: 0-6785000 | Loss: 2.212 | 674 ms/step , 58288.51 GFLOP/s , 533416.0 tokens/s INFO:__main__:2024-10-26 19:59:37 | Epoch: 0 | Step: 8490 | Dataset: 0-6793000 | Loss: 2.191 | 674 ms/step , 58302.96 GFLOP/s , 533892.5 tokens/s INFO:__main__:2024-10-26 19:59:45 | Epoch: 0 | Step: 8500 | Dataset: 0-6801000 | Loss: 2.231 | 676 ms/step , 58158.58 GFLOP/s , 532882.6 tokens/s INFO:__main__:2024-10-26 19:59:52 | Epoch: 0 | Step: 8510 | Dataset: 0-6809000 | Loss: 2.081 | 675 ms/step , 58244.91 GFLOP/s , 532311.0 tokens/s INFO:__main__:2024-10-26 20:00:00 | Epoch: 0 | Step: 8520 | Dataset: 0-6817000 | Loss: 2.016 | 674 ms/step , 58316.10 GFLOP/s , 533094.7 tokens/s INFO:__main__:2024-10-26 20:00:08 | Epoch: 0 | Step: 8530 | Dataset: 0-6825000 | Loss: 1.960 | 675 ms/step , 58263.19 GFLOP/s , 535908.2 tokens/s INFO:__main__:2024-10-26 20:00:15 | Epoch: 0 | Step: 8540 | Dataset: 0-6833000 | Loss: 1.985 | 675 ms/step , 58212.91 GFLOP/s , 531699.7 tokens/s INFO:__main__:2024-10-26 20:00:23 | Epoch: 0 | Step: 8550 | Dataset: 0-6841000 | Loss: 1.994 | 675 ms/step , 58275.07 GFLOP/s , 532946.0 tokens/s INFO:__main__:2024-10-26 20:00:31 | Epoch: 0 | Step: 8560 | Dataset: 0-6849000 | Loss: 1.953 | 674 ms/step , 58348.07 GFLOP/s , 533176.1 tokens/s INFO:__main__:2024-10-26 20:00:38 | Epoch: 0 | Step: 8570 | Dataset: 0-6857000 | Loss: 1.958 | 674 ms/step , 58307.54 GFLOP/s , 532947.3 tokens/s INFO:__main__:2024-10-26 20:00:46 | Epoch: 0 | Step: 8580 | Dataset: 0-6865000 | Loss: 1.955 | 675 ms/step , 58256.41 GFLOP/s , 532939.1 tokens/s INFO:__main__:2024-10-26 20:00:54 | Epoch: 0 | Step: 8590 | Dataset: 0-6873000 | Loss: 2.866 | 674 
ms/step , 58328.86 GFLOP/s , 532878.7 tokens/s INFO:__main__:2024-10-26 20:01:01 | Epoch: 0 | Step: 8600 | Dataset: 0-6881000 | Loss: 2.670 | 674 ms/step , 58286.06 GFLOP/s , 532736.2 tokens/s INFO:__main__:2024-10-26 20:01:09 | Epoch: 0 | Step: 8610 | Dataset: 0-6889000 | Loss: 2.561 | 675 ms/step , 58202.45 GFLOP/s , 532287.2 tokens/s INFO:__main__:2024-10-26 20:01:17 | Epoch: 0 | Step: 8620 | Dataset: 0-6897000 | Loss: 2.591 | 674 ms/step , 58304.34 GFLOP/s , 531984.2 tokens/s INFO:__main__:2024-10-26 20:01:24 | Epoch: 0 | Step: 8630 | Dataset: 0-6905000 | Loss: 2.466 | 674 ms/step , 58338.82 GFLOP/s , 533080.3 tokens/s INFO:__main__:2024-10-26 20:01:32 | Epoch: 0 | Step: 8640 | Dataset: 0-6913000 | Loss: 2.540 | 677 ms/step , 58023.58 GFLOP/s , 530431.4 tokens/s INFO:__main__:2024-10-26 20:01:40 | Epoch: 0 | Step: 8650 | Dataset: 0-6921000 | Loss: 2.447 | 674 ms/step , 58284.09 GFLOP/s , 530976.0 tokens/s INFO:__main__:2024-10-26 20:01:48 | Epoch: 0 | Step: 8660 | Dataset: 0-6929000 | Loss: 2.500 | 676 ms/step , 58146.56 GFLOP/s , 531398.3 tokens/s INFO:__main__:2024-10-26 20:01:55 | Epoch: 0 | Step: 8670 | Dataset: 0-6937000 | Loss: 2.471 | 674 ms/step , 58301.42 GFLOP/s , 530390.5 tokens/s INFO:__main__:2024-10-26 20:02:03 | Epoch: 0 | Step: 8680 | Dataset: 0-6945000 | Loss: 2.494 | 674 ms/step , 58364.96 GFLOP/s , 532623.3 tokens/s INFO:__main__:2024-10-26 20:02:11 | Epoch: 0 | Step: 8690 | Dataset: 0-6953000 | Loss: 2.402 | 673 ms/step , 58376.08 GFLOP/s , 532818.1 tokens/s INFO:__main__:2024-10-26 20:02:18 | Epoch: 0 | Step: 8700 | Dataset: 0-6961000 | Loss: 2.450 | 675 ms/step , 58210.69 GFLOP/s , 532581.1 tokens/s INFO:__main__:2024-10-26 20:02:26 | Epoch: 0 | Step: 8710 | Dataset: 0-6969000 | Loss: 2.466 | 676 ms/step , 58143.47 GFLOP/s , 532040.5 tokens/s INFO:__main__:2024-10-26 20:02:34 | Epoch: 0 | Step: 8720 | Dataset: 0-6977000 | Loss: 2.435 | 675 ms/step , 58240.74 GFLOP/s , 532754.0 tokens/s INFO:__main__:2024-10-26 20:02:41 | Epoch: 0 | Step: 
8730 | Dataset: 0-6985000 | Loss: 2.422 | 674 ms/step , 58322.36 GFLOP/s , 532683.0 tokens/s INFO:__main__:2024-10-26 20:02:49 | Epoch: 0 | Step: 8740 | Dataset: 0-6993000 | Loss: 2.460 | 674 ms/step , 58289.53 GFLOP/s , 533171.8 tokens/s INFO:__main__:2024-10-26 20:02:57 | Epoch: 0 | Step: 8750 | Dataset: 0-7001000 | Loss: 2.495 | 674 ms/step , 58300.91 GFLOP/s , 532779.2 tokens/s INFO:__main__:2024-10-26 20:03:04 | Epoch: 0 | Step: 8760 | Dataset: 0-7009000 | Loss: 2.489 | 673 ms/step , 58396.30 GFLOP/s , 533208.5 tokens/s INFO:__main__:2024-10-26 20:03:12 | Epoch: 0 | Step: 8770 | Dataset: 0-7017000 | Loss: 2.497 | 675 ms/step , 58257.91 GFLOP/s , 532922.1 tokens/s INFO:__main__:2024-10-26 20:03:20 | Epoch: 0 | Step: 8780 | Dataset: 0-7025000 | Loss: 2.500 | 675 ms/step , 58200.43 GFLOP/s , 532681.0 tokens/s INFO:__main__:2024-10-26 20:03:28 | Epoch: 0 | Step: 8790 | Dataset: 0-7033000 | Loss: 2.528 | 675 ms/step , 58208.16 GFLOP/s , 532220.9 tokens/s INFO:__main__:2024-10-26 20:03:35 | Epoch: 0 | Step: 8800 | Dataset: 0-7041000 | Loss: 2.465 | 675 ms/step , 58272.89 GFLOP/s , 532214.8 tokens/s INFO:__main__:2024-10-26 20:03:43 | Epoch: 0 | Step: 8810 | Dataset: 0-7049000 | Loss: 2.442 | 675 ms/step , 58275.62 GFLOP/s , 532876.2 tokens/s INFO:__main__:2024-10-26 20:03:51 | Epoch: 0 | Step: 8820 | Dataset: 0-7057000 | Loss: 2.423 | 674 ms/step , 58345.98 GFLOP/s , 533170.3 tokens/s INFO:__main__:2024-10-26 20:03:58 | Epoch: 0 | Step: 8830 | Dataset: 0-7065000 | Loss: 2.525 | 675 ms/step , 58227.24 GFLOP/s , 532797.4 tokens/s INFO:__main__:2024-10-26 20:04:06 | Epoch: 0 | Step: 8840 | Dataset: 0-7073000 | Loss: 2.430 | 675 ms/step , 58221.34 GFLOP/s , 532036.4 tokens/s INFO:__main__:2024-10-26 20:04:14 | Epoch: 0 | Step: 8850 | Dataset: 0-7081000 | Loss: 2.431 | 676 ms/step , 58185.17 GFLOP/s , 532471.8 tokens/s INFO:__main__:2024-10-26 20:04:21 | Epoch: 0 | Step: 8860 | Dataset: 0-7089000 | Loss: 2.438 | 674 ms/step , 58282.44 GFLOP/s , 532062.0 tokens/s 
INFO:__main__:2024-10-26 20:04:29 | Epoch: 0 | Step: 8870 | Dataset: 0-7097000 | Loss: 2.397 | 675 ms/step , 58277.33 GFLOP/s , 532582.0 tokens/s INFO:__main__:2024-10-26 20:04:37 | Epoch: 0 | Step: 8880 | Dataset: 0-7105000 | Loss: 2.394 | 675 ms/step , 58216.28 GFLOP/s , 532651.1 tokens/s INFO:__main__:2024-10-26 20:04:44 | Epoch: 0 | Step: 8890 | Dataset: 0-7113000 | Loss: 2.443 | 675 ms/step , 58215.52 GFLOP/s , 532287.3 tokens/s INFO:__main__:2024-10-26 20:04:52 | Epoch: 0 | Step: 8900 | Dataset: 0-7121000 | Loss: 2.450 | 676 ms/step , 58186.40 GFLOP/s , 531708.2 tokens/s INFO:__main__:2024-10-26 20:05:00 | Epoch: 0 | Step: 8910 | Dataset: 0-7129000 | Loss: 2.385 | 674 ms/step , 58288.47 GFLOP/s , 532361.3 tokens/s INFO:__main__:2024-10-26 20:05:08 | Epoch: 0 | Step: 8920 | Dataset: 0-7137000 | Loss: 2.061 | 675 ms/step , 58241.26 GFLOP/s , 532345.9 tokens/s INFO:__main__:2024-10-26 20:05:15 | Epoch: 0 | Step: 8930 | Dataset: 0-7145000 | Loss: 1.971 | 675 ms/step , 58257.31 GFLOP/s , 532147.5 tokens/s INFO:__main__:2024-10-26 20:05:23 | Epoch: 0 | Step: 8940 | Dataset: 0-7153000 | Loss: 1.946 | 675 ms/step , 58234.93 GFLOP/s , 532227.5 tokens/s INFO:__main__:2024-10-26 20:05:31 | Epoch: 0 | Step: 8950 | Dataset: 0-7161000 | Loss: 1.876 | 679 ms/step , 57927.80 GFLOP/s , 530679.5 tokens/s INFO:__main__:2024-10-26 20:05:38 | Epoch: 0 | Step: 8960 | Dataset: 0-7169000 | Loss: 1.842 | 676 ms/step , 58142.17 GFLOP/s , 530814.8 tokens/s INFO:__main__:2024-10-26 20:05:46 | Epoch: 0 | Step: 8970 | Dataset: 0-7177000 | Loss: 1.866 | 677 ms/step , 58079.58 GFLOP/s , 531111.6 tokens/s INFO:__main__:2024-10-26 20:05:54 | Epoch: 0 | Step: 8980 | Dataset: 0-7185000 | Loss: 1.851 | 675 ms/step , 58277.34 GFLOP/s , 532087.1 tokens/s INFO:__main__:2024-10-26 20:06:02 | Epoch: 0 | Step: 8990 | Dataset: 0-7193000 | Loss: 1.836 | 675 ms/step , 58262.50 GFLOP/s , 532158.3 tokens/s INFO:__main__:2024-10-26 20:06:09 | Validation | Step: 9000 | Val_loss: 2.589 | Best_val_loss: 2.5181 
INFO:__main__:2024-10-26 20:06:09 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_200609_step_9000.pt` INFO:__main__:2024-10-26 20:06:10 | Epoch: 0 | Step: 9000 | Dataset: 0-7201000 | Loss: 2.601 | 672 ms/step , 58462.94 GFLOP/s , 477363.3 tokens/s INFO:__main__:2024-10-26 20:06:18 | Epoch: 0 | Step: 9010 | Dataset: 0-7209000 | Loss: 2.577 | 675 ms/step , 58261.24 GFLOP/s , 532406.8 tokens/s INFO:__main__:2024-10-26 20:06:25 | Epoch: 0 | Step: 9020 | Dataset: 0-7217000 | Loss: 2.510 | 676 ms/step , 58170.36 GFLOP/s , 532559.5 tokens/s INFO:__main__:2024-10-26 20:06:33 | Epoch: 0 | Step: 9030 | Dataset: 0-7225000 | Loss: 2.461 | 674 ms/step , 58308.45 GFLOP/s , 532258.5 tokens/s INFO:__main__:2024-10-26 20:06:41 | Epoch: 0 | Step: 9040 | Dataset: 0-7233000 | Loss: 2.421 | 675 ms/step , 58277.76 GFLOP/s , 532635.4 tokens/s INFO:__main__:2024-10-26 20:06:49 | Epoch: 0 | Step: 9050 | Dataset: 0-7241000 | Loss: 2.422 | 674 ms/step , 58282.96 GFLOP/s , 532312.8 tokens/s INFO:__main__:2024-10-26 20:06:56 | Epoch: 0 | Step: 9060 | Dataset: 0-7249000 | Loss: 2.448 | 675 ms/step , 58213.13 GFLOP/s , 531934.0 tokens/s INFO:__main__:2024-10-26 20:07:04 | Epoch: 0 | Step: 9070 | Dataset: 0-7257000 | Loss: 2.361 | 674 ms/step , 58291.07 GFLOP/s , 532737.0 tokens/s INFO:__main__:2024-10-26 20:07:12 | Epoch: 0 | Step: 9080 | Dataset: 0-7265000 | Loss: 2.389 | 675 ms/step , 58230.30 GFLOP/s , 532835.7 tokens/s INFO:__main__:2024-10-26 20:07:19 | Epoch: 0 | Step: 9090 | Dataset: 0-7273000 | Loss: 2.423 | 676 ms/step , 58189.68 GFLOP/s , 532140.0 tokens/s INFO:__main__:2024-10-26 20:07:27 | Epoch: 0 | Step: 9100 | Dataset: 0-7281000 | Loss: 2.388 | 674 ms/step , 58298.41 GFLOP/s , 532994.4 tokens/s INFO:__main__:2024-10-26 20:07:35 | Epoch: 0 | Step: 9110 | Dataset: 0-7289000 | Loss: 2.399 | 675 ms/step , 58233.45 GFLOP/s , 532497.0 tokens/s INFO:__main__:2024-10-26 20:07:42 | Epoch: 0 | Step: 9120 | Dataset: 0-7297000 | Loss: 2.465 | 675 ms/step , 58247.15 
GFLOP/s , 531265.0 tokens/s INFO:__main__:2024-10-26 20:07:50 | Epoch: 0 | Step: 9130 | Dataset: 0-7305000 | Loss: 2.389 | 674 ms/step , 58315.08 GFLOP/s , 532502.0 tokens/s INFO:__main__:2024-10-26 20:07:58 | Epoch: 0 | Step: 9140 | Dataset: 0-7313000 | Loss: 2.412 | 675 ms/step , 58257.07 GFLOP/s , 532674.4 tokens/s INFO:__main__:2024-10-26 20:08:05 | Epoch: 0 | Step: 9150 | Dataset: 0-7321000 | Loss: 2.391 | 674 ms/step , 58327.65 GFLOP/s , 533042.2 tokens/s INFO:__main__:2024-10-26 20:08:13 | Epoch: 0 | Step: 9160 | Dataset: 0-7329000 | Loss: 2.115 | 673 ms/step , 58386.03 GFLOP/s , 533199.1 tokens/s INFO:__main__:2024-10-26 20:08:21 | Epoch: 0 | Step: 9170 | Dataset: 0-7337000 | Loss: 2.001 | 673 ms/step , 58379.48 GFLOP/s , 532353.0 tokens/s INFO:__main__:2024-10-26 20:08:29 | Epoch: 0 | Step: 9180 | Dataset: 0-7345000 | Loss: 1.979 | 673 ms/step , 58420.78 GFLOP/s , 531882.5 tokens/s INFO:__main__:2024-10-26 20:08:36 | Epoch: 0 | Step: 9190 | Dataset: 0-7353000 | Loss: 1.927 | 673 ms/step , 58369.03 GFLOP/s , 533392.8 tokens/s INFO:__main__:2024-10-26 20:08:44 | Epoch: 0 | Step: 9200 | Dataset: 0-7361000 | Loss: 1.937 | 675 ms/step , 58246.15 GFLOP/s , 532694.2 tokens/s INFO:__main__:2024-10-26 20:08:52 | Epoch: 0 | Step: 9210 | Dataset: 0-7369000 | Loss: 1.941 | 674 ms/step , 58305.78 GFLOP/s , 532743.7 tokens/s INFO:__main__:2024-10-26 20:08:59 | Epoch: 0 | Step: 9220 | Dataset: 0-7377000 | Loss: 1.874 | 673 ms/step , 58402.86 GFLOP/s , 532565.4 tokens/s INFO:__main__:2024-10-26 20:09:07 | Epoch: 0 | Step: 9230 | Dataset: 0-7385000 | Loss: 1.927 | 675 ms/step , 58268.38 GFLOP/s , 532532.8 tokens/s INFO:__main__:2024-10-26 20:09:15 | Epoch: 0 | Step: 9240 | Dataset: 0-7393000 | Loss: 1.944 | 675 ms/step , 58209.34 GFLOP/s , 532326.4 tokens/s INFO:__main__:2024-10-26 20:09:22 | Epoch: 0 | Step: 9250 | Dataset: 0-7401000 | Loss: 2.606 | 675 ms/step , 58256.77 GFLOP/s , 533007.6 tokens/s INFO:__main__:2024-10-26 20:09:30 | Epoch: 0 | Step: 9260 | Dataset: 
0-7409000 | Loss: 2.600 | 674 ms/step , 58364.40 GFLOP/s , 533394.5 tokens/s INFO:__main__:2024-10-26 20:09:38 | Epoch: 0 | Step: 9270 | Dataset: 0-7417000 | Loss: 2.528 | 673 ms/step , 58378.39 GFLOP/s , 533075.4 tokens/s INFO:__main__:2024-10-26 20:09:45 | Epoch: 0 | Step: 9280 | Dataset: 0-7425000 | Loss: 2.473 | 674 ms/step , 58304.64 GFLOP/s , 532897.8 tokens/s INFO:__main__:2024-10-26 20:09:53 | Epoch: 0 | Step: 9290 | Dataset: 0-7433000 | Loss: 2.512 | 674 ms/step , 58348.06 GFLOP/s , 533218.6 tokens/s INFO:__main__:2024-10-26 20:10:01 | Epoch: 0 | Step: 9300 | Dataset: 0-7441000 | Loss: 2.535 | 674 ms/step , 58287.73 GFLOP/s , 533542.6 tokens/s INFO:__main__:2024-10-26 20:10:08 | Epoch: 0 | Step: 9310 | Dataset: 0-7449000 | Loss: 2.477 | 673 ms/step , 58413.82 GFLOP/s , 533145.1 tokens/s INFO:__main__:2024-10-26 20:10:16 | Epoch: 0 | Step: 9320 | Dataset: 0-7457000 | Loss: 2.443 | 674 ms/step , 58363.21 GFLOP/s , 533097.9 tokens/s INFO:__main__:2024-10-26 20:10:24 | Epoch: 0 | Step: 9330 | Dataset: 0-7465000 | Loss: 2.546 | 673 ms/step , 58379.67 GFLOP/s , 533128.0 tokens/s INFO:__main__:2024-10-26 20:10:32 | Epoch: 0 | Step: 9340 | Dataset: 0-7473000 | Loss: 2.405 | 673 ms/step , 58420.48 GFLOP/s , 533516.6 tokens/s INFO:__main__:2024-10-26 20:10:39 | Epoch: 0 | Step: 9350 | Dataset: 0-7481000 | Loss: 2.382 | 674 ms/step , 58326.27 GFLOP/s , 533090.7 tokens/s INFO:__main__:2024-10-26 20:10:47 | Epoch: 0 | Step: 9360 | Dataset: 0-7489000 | Loss: 2.443 | 674 ms/step , 58321.81 GFLOP/s , 533107.6 tokens/s INFO:__main__:2024-10-26 20:10:55 | Epoch: 0 | Step: 9370 | Dataset: 0-7497000 | Loss: 2.439 | 675 ms/step , 58252.32 GFLOP/s , 533199.4 tokens/s INFO:__main__:2024-10-26 20:11:02 | Epoch: 0 | Step: 9380 | Dataset: 0-7505000 | Loss: 2.481 | 674 ms/step , 58330.82 GFLOP/s , 533028.2 tokens/s INFO:__main__:2024-10-26 20:11:10 | Epoch: 0 | Step: 9390 | Dataset: 0-7513000 | Loss: 2.491 | 674 ms/step , 58351.86 GFLOP/s , 532705.9 tokens/s INFO:__main__:2024-10-26 
20:11:18 | Epoch: 0 | Step: 9400 | Dataset: 0-7521000 | Loss: 2.501 | 673 ms/step , 58414.51 GFLOP/s , 533069.8 tokens/s INFO:__main__:2024-10-26 20:11:25 | Epoch: 0 | Step: 9410 | Dataset: 0-7529000 | Loss: 2.517 | 673 ms/step , 58383.32 GFLOP/s , 533353.8 tokens/s INFO:__main__:2024-10-26 20:11:33 | Epoch: 0 | Step: 9420 | Dataset: 0-7537000 | Loss: 2.512 | 674 ms/step , 58323.27 GFLOP/s , 532614.7 tokens/s INFO:__main__:2024-10-26 20:11:41 | Epoch: 0 | Step: 9430 | Dataset: 0-7545000 | Loss: 2.431 | 675 ms/step , 58209.18 GFLOP/s , 532167.4 tokens/s INFO:__main__:2024-10-26 20:11:48 | Epoch: 0 | Step: 9440 | Dataset: 0-7553000 | Loss: 2.435 | 675 ms/step , 58234.76 GFLOP/s , 531706.2 tokens/s INFO:__main__:2024-10-26 20:11:56 | Epoch: 0 | Step: 9450 | Dataset: 0-7561000 | Loss: 2.402 | 674 ms/step , 58320.33 GFLOP/s , 531978.2 tokens/s INFO:__main__:2024-10-26 20:12:04 | Epoch: 0 | Step: 9460 | Dataset: 0-7569000 | Loss: 2.414 | 675 ms/step , 58270.49 GFLOP/s , 532274.6 tokens/s INFO:__main__:2024-10-26 20:12:11 | Epoch: 0 | Step: 9470 | Dataset: 0-7577000 | Loss: 2.414 | 674 ms/step , 58314.31 GFLOP/s , 531951.5 tokens/s INFO:__main__:2024-10-26 20:12:19 | Epoch: 0 | Step: 9480 | Dataset: 0-7585000 | Loss: 2.377 | 674 ms/step , 58327.62 GFLOP/s , 532230.2 tokens/s INFO:__main__:2024-10-26 20:12:27 | Epoch: 0 | Step: 9490 | Dataset: 0-7593000 | Loss: 2.402 | 675 ms/step , 58210.99 GFLOP/s , 531112.6 tokens/s INFO:__main__:2024-10-26 20:12:35 | Epoch: 0 | Step: 9500 | Dataset: 0-7601000 | Loss: 2.370 | 675 ms/step , 58233.66 GFLOP/s , 530358.1 tokens/s INFO:__main__:2024-10-26 20:12:42 | Epoch: 0 | Step: 9510 | Dataset: 0-7609000 | Loss: 2.421 | 674 ms/step , 58335.71 GFLOP/s , 532668.7 tokens/s INFO:__main__:2024-10-26 20:12:50 | Epoch: 0 | Step: 9520 | Dataset: 0-7617000 | Loss: 2.374 | 674 ms/step , 58359.59 GFLOP/s , 533251.4 tokens/s INFO:__main__:2024-10-26 20:12:58 | Epoch: 0 | Step: 9530 | Dataset: 0-7625000 | Loss: 2.430 | 675 ms/step , 58263.95 GFLOP/s 
, 533086.7 tokens/s INFO:__main__:2024-10-26 20:13:05 | Epoch: 0 | Step: 9540 | Dataset: 0-7633000 | Loss: 2.375 | 674 ms/step , 58280.58 GFLOP/s , 532124.8 tokens/s INFO:__main__:2024-10-26 20:13:13 | Epoch: 0 | Step: 9550 | Dataset: 0-7641000 | Loss: 2.463 | 675 ms/step , 58253.50 GFLOP/s , 532878.9 tokens/s INFO:__main__:2024-10-26 20:13:21 | Epoch: 0 | Step: 9560 | Dataset: 0-7649000 | Loss: 2.374 | 674 ms/step , 58365.42 GFLOP/s , 532481.5 tokens/s INFO:__main__:2024-10-26 20:13:28 | Epoch: 0 | Step: 9570 | Dataset: 0-7657000 | Loss: 2.410 | 675 ms/step , 58211.10 GFLOP/s , 532762.0 tokens/s INFO:__main__:2024-10-26 20:13:36 | Epoch: 0 | Step: 9580 | Dataset: 0-7665000 | Loss: 2.430 | 675 ms/step , 58211.11 GFLOP/s , 529680.4 tokens/s INFO:__main__:2024-10-26 20:13:44 | Epoch: 0 | Step: 9590 | Dataset: 0-7673000 | Loss: 2.451 | 675 ms/step , 58215.46 GFLOP/s , 530477.5 tokens/s INFO:__main__:2024-10-26 20:13:52 | Epoch: 0 | Step: 9600 | Dataset: 0-7681000 | Loss: 2.489 | 676 ms/step , 58124.27 GFLOP/s , 531609.2 tokens/s INFO:__main__:2024-10-26 20:13:59 | Epoch: 0 | Step: 9610 | Dataset: 0-7689000 | Loss: 2.459 | 676 ms/step , 58120.69 GFLOP/s , 529930.0 tokens/s INFO:__main__:2024-10-26 20:14:07 | Epoch: 0 | Step: 9620 | Dataset: 0-7697000 | Loss: 2.457 | 676 ms/step , 58166.78 GFLOP/s , 530488.6 tokens/s INFO:__main__:2024-10-26 20:14:15 | Epoch: 0 | Step: 9630 | Dataset: 0-7705000 | Loss: 2.447 | 676 ms/step , 58161.12 GFLOP/s , 530671.3 tokens/s INFO:__main__:2024-10-26 20:14:22 | Epoch: 0 | Step: 9640 | Dataset: 0-7713000 | Loss: 2.454 | 676 ms/step , 58115.51 GFLOP/s , 530376.6 tokens/s INFO:__main__:2024-10-26 20:14:30 | Epoch: 0 | Step: 9650 | Dataset: 0-7721000 | Loss: 2.407 | 675 ms/step , 58202.09 GFLOP/s , 530792.6 tokens/s INFO:__main__:2024-10-26 20:14:38 | Epoch: 0 | Step: 9660 | Dataset: 0-7729000 | Loss: 2.425 | 677 ms/step , 58085.92 GFLOP/s , 528708.7 tokens/s INFO:__main__:2024-10-26 20:14:46 | Epoch: 0 | Step: 9670 | Dataset: 0-7737000 | 
Loss: 2.379 | 676 ms/step , 58113.92 GFLOP/s , 530736.3 tokens/s INFO:__main__:2024-10-26 20:14:53 | Epoch: 0 | Step: 9680 | Dataset: 0-7745000 | Loss: 2.414 | 680 ms/step , 57807.42 GFLOP/s , 530221.2 tokens/s INFO:__main__:2024-10-26 20:15:01 | Epoch: 0 | Step: 9690 | Dataset: 0-7753000 | Loss: 2.384 | 677 ms/step , 58100.70 GFLOP/s , 528451.7 tokens/s INFO:__main__:2024-10-26 20:15:09 | Epoch: 0 | Step: 9700 | Dataset: 0-7761000 | Loss: 2.435 | 677 ms/step , 58088.01 GFLOP/s , 530690.2 tokens/s INFO:__main__:2024-10-26 20:15:17 | Epoch: 0 | Step: 9710 | Dataset: 0-7769000 | Loss: 2.485 | 682 ms/step , 57645.59 GFLOP/s , 528047.4 tokens/s INFO:__main__:2024-10-26 20:15:24 | Epoch: 0 | Step: 9720 | Dataset: 0-7777000 | Loss: 2.440 | 675 ms/step , 58200.97 GFLOP/s , 529440.7 tokens/s INFO:__main__:2024-10-26 20:15:32 | Epoch: 0 | Step: 9730 | Dataset: 0-7785000 | Loss: 2.467 | 680 ms/step , 57809.36 GFLOP/s , 529004.5 tokens/s INFO:__main__:2024-10-26 20:15:40 | Epoch: 0 | Step: 9740 | Dataset: 0-7793000 | Loss: 2.380 | 675 ms/step , 58215.05 GFLOP/s , 526247.2 tokens/s INFO:__main__:2024-10-26 20:15:48 | Epoch: 0 | Step: 9750 | Dataset: 0-7801000 | Loss: 2.396 | 680 ms/step , 57776.81 GFLOP/s , 529506.2 tokens/s INFO:__main__:2024-10-26 20:15:55 | Epoch: 0 | Step: 9760 | Dataset: 0-7809000 | Loss: 2.408 | 681 ms/step , 57756.64 GFLOP/s , 528796.6 tokens/s INFO:__main__:2024-10-26 20:16:03 | Epoch: 0 | Step: 9770 | Dataset: 0-7817000 | Loss: 2.399 | 675 ms/step , 58260.65 GFLOP/s , 528593.7 tokens/s INFO:__main__:2024-10-26 20:16:11 | Epoch: 0 | Step: 9780 | Dataset: 0-7825000 | Loss: 2.396 | 676 ms/step , 58187.95 GFLOP/s , 528983.2 tokens/s INFO:__main__:2024-10-26 20:16:19 | Epoch: 0 | Step: 9790 | Dataset: 0-7833000 | Loss: 2.365 | 676 ms/step , 58155.22 GFLOP/s , 531680.0 tokens/s INFO:__main__:2024-10-26 20:16:26 | Epoch: 0 | Step: 9800 | Dataset: 0-7841000 | Loss: 2.422 | 676 ms/step , 58157.80 GFLOP/s , 530184.2 tokens/s INFO:__main__:2024-10-26 20:16:34 | 
Epoch: 0 | Step: 9810 | Dataset: 0-7849000 | Loss: 2.323 | 674 ms/step , 58299.73 GFLOP/s , 531037.2 tokens/s INFO:__main__:2024-10-26 20:16:42 | Epoch: 0 | Step: 9820 | Dataset: 0-7857000 | Loss: 2.405 | 674 ms/step , 58291.53 GFLOP/s , 531561.2 tokens/s INFO:__main__:2024-10-26 20:16:49 | Epoch: 0 | Step: 9830 | Dataset: 0-7865000 | Loss: 2.248 | 676 ms/step , 58180.41 GFLOP/s , 530815.2 tokens/s INFO:__main__:2024-10-26 20:16:57 | Epoch: 0 | Step: 9840 | Dataset: 0-7873000 | Loss: 2.416 | 675 ms/step , 58204.64 GFLOP/s , 529976.1 tokens/s INFO:__main__:2024-10-26 20:17:05 | Epoch: 0 | Step: 9850 | Dataset: 0-7881000 | Loss: 2.436 | 676 ms/step , 58143.82 GFLOP/s , 531681.9 tokens/s INFO:__main__:2024-10-26 20:17:13 | Epoch: 0 | Step: 9860 | Dataset: 0-7889000 | Loss: 2.361 | 675 ms/step , 58238.67 GFLOP/s , 532446.4 tokens/s INFO:__main__:2024-10-26 20:17:20 | Epoch: 0 | Step: 9870 | Dataset: 0-7897000 | Loss: 2.314 | 674 ms/step , 58303.78 GFLOP/s , 531532.4 tokens/s INFO:__main__:2024-10-26 20:17:28 | Epoch: 0 | Step: 9880 | Dataset: 0-7905000 | Loss: 2.389 | 675 ms/step , 58252.42 GFLOP/s , 532316.7 tokens/s INFO:__main__:2024-10-26 20:17:36 | Epoch: 0 | Step: 9890 | Dataset: 0-7913000 | Loss: 2.431 | 674 ms/step , 58281.18 GFLOP/s , 531969.4 tokens/s INFO:__main__:2024-10-26 20:17:43 | Epoch: 0 | Step: 9900 | Dataset: 0-7921000 | Loss: 2.437 | 674 ms/step , 58289.76 GFLOP/s , 532521.5 tokens/s INFO:__main__:2024-10-26 20:17:51 | Epoch: 0 | Step: 9910 | Dataset: 0-7929000 | Loss: 2.510 | 674 ms/step , 58297.70 GFLOP/s , 532199.5 tokens/s INFO:__main__:2024-10-26 20:17:59 | Epoch: 0 | Step: 9920 | Dataset: 0-7937000 | Loss: 2.427 | 675 ms/step , 58253.25 GFLOP/s , 531960.1 tokens/s INFO:__main__:2024-10-26 20:18:06 | Epoch: 0 | Step: 9930 | Dataset: 0-7945000 | Loss: 2.473 | 676 ms/step , 58183.18 GFLOP/s , 527690.7 tokens/s INFO:__main__:2024-10-26 20:18:14 | Epoch: 0 | Step: 9940 | Dataset: 0-7953000 | Loss: 2.410 | 674 ms/step , 58363.95 GFLOP/s , 531716.8 
tokens/s INFO:__main__:2024-10-26 20:18:22 | Epoch: 0 | Step: 9950 | Dataset: 0-7961000 | Loss: 2.430 | 674 ms/step , 58357.00 GFLOP/s , 532065.5 tokens/s INFO:__main__:2024-10-26 20:18:30 | Epoch: 0 | Step: 9960 | Dataset: 0-7969000 | Loss: 2.371 | 674 ms/step , 58322.07 GFLOP/s , 530432.9 tokens/s INFO:__main__:2024-10-26 20:18:37 | Epoch: 0 | Step: 9970 | Dataset: 0-7977000 | Loss: 2.422 | 675 ms/step , 58227.13 GFLOP/s , 531888.1 tokens/s INFO:__main__:2024-10-26 20:18:45 | Epoch: 0 | Step: 9980 | Dataset: 0-7985000 | Loss: 2.397 | 675 ms/step , 58272.08 GFLOP/s , 531847.7 tokens/s INFO:__main__:2024-10-26 20:18:53 | Epoch: 0 | Step: 9990 | Dataset: 0-7993000 | Loss: 2.284 | 674 ms/step , 58281.39 GFLOP/s , 531519.3 tokens/s INFO:__main__:2024-10-26 20:19:00 | Validation | Step: 10000 | Val_loss: 2.377 | Best_val_loss: 2.5181 INFO:__main__:2024-10-26 20:19:00 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_201900_step_10000.pt` INFO:__main__:2024-10-26 20:19:01 | Epoch: 0 | Step: 10000 | Dataset: 0-8001000 | Loss: 2.353 | 674 ms/step , 58361.89 GFLOP/s , 479448.8 tokens/s INFO:__main__:2024-10-26 20:19:09 | Epoch: 0 | Step: 10010 | Dataset: 0-8009000 | Loss: 2.337 | 675 ms/step , 58263.43 GFLOP/s , 531925.5 tokens/s INFO:__main__:2024-10-26 20:19:17 | Epoch: 0 | Step: 10020 | Dataset: 0-8017000 | Loss: 2.352 | 683 ms/step , 57588.59 GFLOP/s , 529809.1 tokens/s INFO:__main__:2024-10-26 20:19:24 | Epoch: 0 | Step: 10030 | Dataset: 0-8025000 | Loss: 2.335 | 680 ms/step , 57816.30 GFLOP/s , 531128.7 tokens/s INFO:__main__:2024-10-26 20:19:32 | Epoch: 0 | Step: 10040 | Dataset: 0-8033000 | Loss: 2.331 | 679 ms/step , 57863.85 GFLOP/s , 526851.0 tokens/s INFO:__main__:2024-10-26 20:19:40 | Epoch: 0 | Step: 10050 | Dataset: 0-8041000 | Loss: 2.385 | 681 ms/step , 57732.84 GFLOP/s , 526976.4 tokens/s INFO:__main__:2024-10-26 20:19:48 | Epoch: 0 | Step: 10060 | Dataset: 0-8049000 | Loss: 2.346 | 676 ms/step , 58150.19 GFLOP/s , 528979.0 tokens/s 
INFO:__main__:2024-10-26 20:19:55 | Epoch: 0 | Step: 10070 | Dataset: 0-8057000 | Loss: 2.355 | 676 ms/step , 58169.34 GFLOP/s , 528536.0 tokens/s INFO:__main__:2024-10-26 20:20:03 | Epoch: 0 | Step: 10080 | Dataset: 0-8065000 | Loss: 2.362 | 676 ms/step , 58169.10 GFLOP/s , 526598.2 tokens/s INFO:__main__:2024-10-26 20:20:11 | Epoch: 0 | Step: 10090 | Dataset: 0-8073000 | Loss: 2.368 | 675 ms/step , 58232.87 GFLOP/s , 527400.1 tokens/s INFO:__main__:2024-10-26 20:20:19 | Epoch: 0 | Step: 10100 | Dataset: 0-8081000 | Loss: 2.333 | 675 ms/step , 58243.04 GFLOP/s , 531779.1 tokens/s INFO:__main__:2024-10-26 20:20:26 | Epoch: 0 | Step: 10110 | Dataset: 0-8089000 | Loss: 2.359 | 675 ms/step , 58238.89 GFLOP/s , 530198.2 tokens/s INFO:__main__:2024-10-26 20:20:34 | Epoch: 0 | Step: 10120 | Dataset: 0-8097000 | Loss: 2.358 | 676 ms/step , 58133.16 GFLOP/s , 529679.4 tokens/s INFO:__main__:2024-10-26 20:20:42 | Epoch: 0 | Step: 10130 | Dataset: 0-8105000 | Loss: 2.411 | 674 ms/step , 58360.47 GFLOP/s , 530499.9 tokens/s INFO:__main__:2024-10-26 20:20:50 | Epoch: 0 | Step: 10140 | Dataset: 0-8113000 | Loss: 2.363 | 675 ms/step , 58253.43 GFLOP/s , 531415.8 tokens/s INFO:__main__:2024-10-26 20:20:57 | Epoch: 0 | Step: 10150 | Dataset: 0-8121000 | Loss: 2.309 | 675 ms/step , 58222.51 GFLOP/s , 530969.2 tokens/s INFO:__main__:2024-10-26 20:21:05 | Epoch: 0 | Step: 10160 | Dataset: 0-8129000 | Loss: 2.323 | 675 ms/step , 58241.47 GFLOP/s , 530853.9 tokens/s INFO:__main__:2024-10-26 20:21:13 | Epoch: 0 | Step: 10170 | Dataset: 0-8137000 | Loss: 2.289 | 676 ms/step , 58179.36 GFLOP/s , 531294.2 tokens/s INFO:__main__:2024-10-26 20:21:20 | Epoch: 0 | Step: 10180 | Dataset: 0-8145000 | Loss: 2.471 | 675 ms/step , 58256.58 GFLOP/s , 531078.6 tokens/s INFO:__main__:2024-10-26 20:21:28 | Epoch: 0 | Step: 10190 | Dataset: 0-8153000 | Loss: 2.453 | 676 ms/step , 58162.25 GFLOP/s , 531325.1 tokens/s INFO:__main__:2024-10-26 20:21:36 | Epoch: 0 | Step: 10200 | Dataset: 0-8161000 | Loss: 
2.288 | 675 ms/step , 58198.77 GFLOP/s , 530032.8 tokens/s INFO:__main__:2024-10-26 20:21:44 | Epoch: 0 | Step: 10210 | Dataset: 0-8169000 | Loss: 2.354 | 674 ms/step , 58292.02 GFLOP/s , 530196.4 tokens/s INFO:__main__:2024-10-26 20:21:51 | Epoch: 0 | Step: 10220 | Dataset: 0-8177000 | Loss: 2.369 | 675 ms/step , 58210.65 GFLOP/s , 531776.4 tokens/s INFO:__main__:2024-10-26 20:21:59 | Epoch: 0 | Step: 10230 | Dataset: 0-8185000 | Loss: 2.360 | 674 ms/step , 58326.64 GFLOP/s , 533364.2 tokens/s INFO:__main__:2024-10-26 20:22:07 | Epoch: 0 | Step: 10240 | Dataset: 0-8193000 | Loss: 2.339 | 675 ms/step , 58211.61 GFLOP/s , 532945.5 tokens/s INFO:__main__:2024-10-26 20:22:14 | Epoch: 0 | Step: 10250 | Dataset: 0-8201000 | Loss: 2.429 | 678 ms/step , 57957.77 GFLOP/s , 532409.3 tokens/s INFO:__main__:2024-10-26 20:22:22 | Epoch: 0 | Step: 10260 | Dataset: 0-8209000 | Loss: 2.365 | 675 ms/step , 58200.83 GFLOP/s , 531903.4 tokens/s INFO:__main__:2024-10-26 20:22:30 | Epoch: 0 | Step: 10270 | Dataset: 0-8217000 | Loss: 2.340 | 676 ms/step , 58130.11 GFLOP/s , 532257.1 tokens/s INFO:__main__:2024-10-26 20:22:37 | Epoch: 0 | Step: 10280 | Dataset: 0-8225000 | Loss: 2.312 | 678 ms/step , 57990.29 GFLOP/s , 532123.1 tokens/s INFO:__main__:2024-10-26 20:22:45 | Epoch: 0 | Step: 10290 | Dataset: 0-8233000 | Loss: 2.348 | 675 ms/step , 58277.74 GFLOP/s , 531509.5 tokens/s INFO:__main__:2024-10-26 20:22:53 | Epoch: 0 | Step: 10300 | Dataset: 0-8241000 | Loss: 2.363 | 680 ms/step , 57801.19 GFLOP/s , 530940.5 tokens/s INFO:__main__:2024-10-26 20:23:01 | Epoch: 0 | Step: 10310 | Dataset: 0-8249000 | Loss: 2.301 | 678 ms/step , 57982.30 GFLOP/s , 531046.5 tokens/s INFO:__main__:2024-10-26 20:23:08 | Epoch: 0 | Step: 10320 | Dataset: 0-8257000 | Loss: 2.364 | 677 ms/step , 58049.58 GFLOP/s , 531903.5 tokens/s INFO:__main__:2024-10-26 20:23:16 | Epoch: 0 | Step: 10330 | Dataset: 0-8265000 | Loss: 2.269 | 676 ms/step , 58119.84 GFLOP/s , 532019.5 tokens/s INFO:__main__:2024-10-26 
20:23:24 | Epoch: 0 | Step: 10340 | Dataset: 0-8273000 | Loss: 2.259 | 674 ms/step , 58297.45 GFLOP/s , 532324.7 tokens/s INFO:__main__:2024-10-26 20:23:31 | Epoch: 0 | Step: 10350 | Dataset: 0-8281000 | Loss: 2.370 | 675 ms/step , 58220.71 GFLOP/s , 532026.9 tokens/s INFO:__main__:2024-10-26 20:23:39 | Epoch: 0 | Step: 10360 | Dataset: 0-8289000 | Loss: 2.347 | 679 ms/step , 57900.17 GFLOP/s , 530835.5 tokens/s INFO:__main__:2024-10-26 20:23:47 | Epoch: 0 | Step: 10370 | Dataset: 0-8297000 | Loss: 2.348 | 678 ms/step , 57962.53 GFLOP/s , 531190.1 tokens/s INFO:__main__:2024-10-26 20:23:55 | Epoch: 0 | Step: 10380 | Dataset: 0-8305000 | Loss: 2.500 | 675 ms/step , 58279.18 GFLOP/s , 531434.0 tokens/s INFO:__main__:2024-10-26 20:24:02 | Epoch: 0 | Step: 10390 | Dataset: 0-8313000 | Loss: 2.202 | 677 ms/step , 58097.94 GFLOP/s , 531739.5 tokens/s INFO:__main__:2024-10-26 20:24:10 | Epoch: 0 | Step: 10400 | Dataset: 0-8321000 | Loss: 2.063 | 678 ms/step , 58014.19 GFLOP/s , 531310.3 tokens/s INFO:__main__:2024-10-26 20:24:18 | Epoch: 0 | Step: 10410 | Dataset: 0-8329000 | Loss: 1.956 | 677 ms/step , 58098.62 GFLOP/s , 531609.3 tokens/s INFO:__main__:2024-10-26 20:24:25 | Epoch: 0 | Step: 10420 | Dataset: 0-8337000 | Loss: 1.953 | 696 ms/step , 56506.43 GFLOP/s , 530550.1 tokens/s INFO:__main__:2024-10-26 20:24:33 | Epoch: 0 | Step: 10430 | Dataset: 0-8345000 | Loss: 1.915 | 675 ms/step , 58212.24 GFLOP/s , 531961.2 tokens/s INFO:__main__:2024-10-26 20:24:41 | Epoch: 0 | Step: 10440 | Dataset: 0-8353000 | Loss: 1.876 | 677 ms/step , 58082.67 GFLOP/s , 531935.0 tokens/s INFO:__main__:2024-10-26 20:24:48 | Epoch: 0 | Step: 10450 | Dataset: 0-8361000 | Loss: 1.877 | 679 ms/step , 57903.63 GFLOP/s , 531856.5 tokens/s INFO:__main__:2024-10-26 20:24:56 | Epoch: 0 | Step: 10460 | Dataset: 0-8369000 | Loss: 2.773 | 676 ms/step , 58109.78 GFLOP/s , 532122.3 tokens/s INFO:__main__:2024-10-26 20:25:04 | Epoch: 0 | Step: 10470 | Dataset: 0-8377000 | Loss: 2.544 | 676 ms/step , 
58120.61 GFLOP/s , 531755.7 tokens/s INFO:__main__:2024-10-26 20:25:12 | Epoch: 0 | Step: 10480 | Dataset: 0-8385000 | Loss: 2.472 | 678 ms/step , 57939.28 GFLOP/s , 531296.4 tokens/s INFO:__main__:2024-10-26 20:25:19 | Epoch: 0 | Step: 10490 | Dataset: 0-8393000 | Loss: 2.423 | 674 ms/step , 58315.55 GFLOP/s , 529971.0 tokens/s INFO:__main__:2024-10-26 20:25:27 | Epoch: 0 | Step: 10500 | Dataset: 0-8401000 | Loss: 2.368 | 675 ms/step , 58235.14 GFLOP/s , 532229.4 tokens/s INFO:__main__:2024-10-26 20:25:35 | Epoch: 0 | Step: 10510 | Dataset: 0-8409000 | Loss: 2.375 | 675 ms/step , 58224.52 GFLOP/s , 531728.9 tokens/s INFO:__main__:2024-10-26 20:25:42 | Epoch: 0 | Step: 10520 | Dataset: 0-8417000 | Loss: 2.377 | 673 ms/step , 58452.13 GFLOP/s , 533026.5 tokens/s INFO:__main__:2024-10-26 20:25:50 | Epoch: 0 | Step: 10530 | Dataset: 0-8425000 | Loss: 2.349 | 675 ms/step , 58268.73 GFLOP/s , 533394.2 tokens/s INFO:__main__:2024-10-26 20:25:58 | Epoch: 0 | Step: 10540 | Dataset: 0-8433000 | Loss: 2.352 | 674 ms/step , 58356.30 GFLOP/s , 531575.2 tokens/s INFO:__main__:2024-10-26 20:26:06 | Epoch: 0 | Step: 10550 | Dataset: 0-8441000 | Loss: 2.405 | 676 ms/step , 58180.04 GFLOP/s , 527621.6 tokens/s INFO:__main__:2024-10-26 20:26:13 | Epoch: 0 | Step: 10560 | Dataset: 0-8449000 | Loss: 2.397 | 675 ms/step , 58202.80 GFLOP/s , 531694.0 tokens/s INFO:__main__:2024-10-26 20:26:21 | Epoch: 0 | Step: 10570 | Dataset: 0-8457000 | Loss: 2.400 | 674 ms/step , 58310.92 GFLOP/s , 532745.4 tokens/s INFO:__main__:2024-10-26 20:26:29 | Epoch: 0 | Step: 10580 | Dataset: 0-8465000 | Loss: 2.447 | 674 ms/step , 58351.58 GFLOP/s , 532846.2 tokens/s INFO:__main__:2024-10-26 20:26:36 | Epoch: 0 | Step: 10590 | Dataset: 0-8473000 | Loss: 2.453 | 673 ms/step , 58373.87 GFLOP/s , 533960.7 tokens/s INFO:__main__:2024-10-26 20:26:44 | Epoch: 0 | Step: 10600 | Dataset: 0-8481000 | Loss: 2.350 | 674 ms/step , 58293.43 GFLOP/s , 533780.9 tokens/s INFO:__main__:2024-10-26 20:26:52 | Epoch: 0 | 
Step: 10610 | Dataset: 0-8489000 | Loss: 2.343 | 674 ms/step , 58294.46 GFLOP/s , 533730.6 tokens/s INFO:__main__:2024-10-26 20:26:59 | Epoch: 0 | Step: 10620 | Dataset: 0-8497000 | Loss: 2.181 | 674 ms/step , 58323.74 GFLOP/s , 533379.4 tokens/s INFO:__main__:2024-10-26 20:27:07 | Epoch: 0 | Step: 10630 | Dataset: 0-8505000 | Loss: 2.012 | 674 ms/step , 58360.68 GFLOP/s , 533328.0 tokens/s INFO:__main__:2024-10-26 20:27:15 | Epoch: 0 | Step: 10640 | Dataset: 0-8513000 | Loss: 1.918 | 674 ms/step , 58335.53 GFLOP/s , 533484.7 tokens/s INFO:__main__:2024-10-26 20:27:22 | Epoch: 0 | Step: 10650 | Dataset: 0-8521000 | Loss: 1.920 | 673 ms/step , 58370.98 GFLOP/s , 533246.8 tokens/s INFO:__main__:2024-10-26 20:27:30 | Epoch: 0 | Step: 10660 | Dataset: 0-8529000 | Loss: 1.845 | 675 ms/step , 58264.99 GFLOP/s , 533156.6 tokens/s INFO:__main__:2024-10-26 20:27:38 | Epoch: 0 | Step: 10670 | Dataset: 0-8537000 | Loss: 1.854 | 673 ms/step , 58410.26 GFLOP/s , 532433.0 tokens/s INFO:__main__:2024-10-26 20:27:45 | Epoch: 0 | Step: 10680 | Dataset: 0-8545000 | Loss: 1.881 | 674 ms/step , 58333.40 GFLOP/s , 532639.4 tokens/s INFO:__main__:2024-10-26 20:27:53 | Epoch: 0 | Step: 10690 | Dataset: 0-8553000 | Loss: 1.833 | 675 ms/step , 58266.87 GFLOP/s , 532663.4 tokens/s INFO:__main__:2024-10-26 20:28:01 | Epoch: 0 | Step: 10700 | Dataset: 0-8561000 | Loss: 1.826 | 674 ms/step , 58302.64 GFLOP/s , 532199.1 tokens/s INFO:__main__:2024-10-26 20:28:08 | Epoch: 0 | Step: 10710 | Dataset: 0-8569000 | Loss: 2.577 | 675 ms/step , 58234.70 GFLOP/s , 531826.0 tokens/s INFO:__main__:2024-10-26 20:28:16 | Epoch: 0 | Step: 10720 | Dataset: 0-8577000 | Loss: 2.479 | 674 ms/step , 58283.98 GFLOP/s , 531590.1 tokens/s INFO:__main__:2024-10-26 20:28:24 | Epoch: 0 | Step: 10730 | Dataset: 0-8585000 | Loss: 2.474 | 676 ms/step , 58184.91 GFLOP/s , 532712.8 tokens/s INFO:__main__:2024-10-26 20:28:32 | Epoch: 0 | Step: 10740 | Dataset: 0-8593000 | Loss: 2.485 | 674 ms/step , 58321.40 GFLOP/s , 
532119.2 tokens/s INFO:__main__:2024-10-26 20:28:39 | Epoch: 0 | Step: 10750 | Dataset: 0-8601000 | Loss: 2.353 | 674 ms/step , 58346.21 GFLOP/s , 532738.0 tokens/s INFO:__main__:2024-10-26 20:28:47 | Epoch: 0 | Step: 10760 | Dataset: 0-8609000 | Loss: 2.377 | 675 ms/step , 58209.24 GFLOP/s , 532153.8 tokens/s INFO:__main__:2024-10-26 20:28:55 | Epoch: 0 | Step: 10770 | Dataset: 0-8617000 | Loss: 2.445 | 675 ms/step , 58245.32 GFLOP/s , 532312.8 tokens/s INFO:__main__:2024-10-26 20:29:02 | Epoch: 0 | Step: 10780 | Dataset: 0-8625000 | Loss: 2.287 | 675 ms/step , 58193.00 GFLOP/s , 532292.9 tokens/s INFO:__main__:2024-10-26 20:29:10 | Epoch: 0 | Step: 10790 | Dataset: 0-8633000 | Loss: 2.398 | 673 ms/step , 58389.59 GFLOP/s , 532925.5 tokens/s INFO:__main__:2024-10-26 20:29:18 | Epoch: 0 | Step: 10800 | Dataset: 0-8641000 | Loss: 2.456 | 675 ms/step , 58216.22 GFLOP/s , 532229.8 tokens/s INFO:__main__:2024-10-26 20:29:25 | Epoch: 0 | Step: 10810 | Dataset: 0-8649000 | Loss: 2.341 | 675 ms/step , 58268.03 GFLOP/s , 531789.7 tokens/s INFO:__main__:2024-10-26 20:29:33 | Epoch: 0 | Step: 10820 | Dataset: 0-8657000 | Loss: 2.284 | 674 ms/step , 58302.61 GFLOP/s , 531968.5 tokens/s INFO:__main__:2024-10-26 20:29:41 | Epoch: 0 | Step: 10830 | Dataset: 0-8665000 | Loss: 2.397 | 676 ms/step , 58187.19 GFLOP/s , 531997.1 tokens/s INFO:__main__:2024-10-26 20:29:49 | Epoch: 0 | Step: 10840 | Dataset: 0-8673000 | Loss: 2.367 | 675 ms/step , 58254.18 GFLOP/s , 532087.6 tokens/s INFO:__main__:2024-10-26 20:29:56 | Epoch: 0 | Step: 10850 | Dataset: 0-8681000 | Loss: 2.356 | 675 ms/step , 58242.76 GFLOP/s , 531673.2 tokens/s INFO:__main__:2024-10-26 20:30:04 | Epoch: 0 | Step: 10860 | Dataset: 0-8689000 | Loss: 2.357 | 676 ms/step , 58178.62 GFLOP/s , 532348.4 tokens/s INFO:__main__:2024-10-26 20:30:12 | Epoch: 0 | Step: 10870 | Dataset: 0-8697000 | Loss: 2.582 | 674 ms/step , 58298.95 GFLOP/s , 532533.7 tokens/s INFO:__main__:2024-10-26 20:30:19 | Epoch: 0 | Step: 10880 | Dataset: 
0-8705000 | Loss: 2.521 | 675 ms/step , 58245.88 GFLOP/s , 531811.7 tokens/s INFO:__main__:2024-10-26 20:30:27 | Epoch: 0 | Step: 10890 | Dataset: 0-8713000 | Loss: 2.423 | 675 ms/step , 58219.02 GFLOP/s , 532145.5 tokens/s INFO:__main__:2024-10-26 20:30:35 | Epoch: 0 | Step: 10900 | Dataset: 0-8721000 | Loss: 2.437 | 676 ms/step , 58142.41 GFLOP/s , 530715.6 tokens/s INFO:__main__:2024-10-26 20:30:42 | Epoch: 0 | Step: 10910 | Dataset: 0-8729000 | Loss: 2.389 | 675 ms/step , 58226.08 GFLOP/s , 530887.9 tokens/s INFO:__main__:2024-10-26 20:30:50 | Epoch: 0 | Step: 10920 | Dataset: 0-8737000 | Loss: 2.427 | 676 ms/step , 58136.86 GFLOP/s , 531506.2 tokens/s INFO:__main__:2024-10-26 20:30:58 | Epoch: 0 | Step: 10930 | Dataset: 0-8745000 | Loss: 2.407 | 674 ms/step , 58298.37 GFLOP/s , 531043.6 tokens/s INFO:__main__:2024-10-26 20:31:06 | Epoch: 0 | Step: 10940 | Dataset: 0-8753000 | Loss: 2.362 | 674 ms/step , 58292.31 GFLOP/s , 529718.5 tokens/s INFO:__main__:2024-10-26 20:31:13 | Epoch: 0 | Step: 10950 | Dataset: 0-8761000 | Loss: 2.429 | 681 ms/step , 57715.03 GFLOP/s , 530751.5 tokens/s INFO:__main__:2024-10-26 20:31:21 | Epoch: 0 | Step: 10960 | Dataset: 0-8769000 | Loss: 2.336 | 675 ms/step , 58212.32 GFLOP/s , 531017.0 tokens/s INFO:__main__:2024-10-26 20:31:29 | Epoch: 0 | Step: 10970 | Dataset: 0-8777000 | Loss: 2.402 | 676 ms/step , 58160.97 GFLOP/s , 529806.9 tokens/s INFO:__main__:2024-10-26 20:31:37 | Epoch: 0 | Step: 10980 | Dataset: 0-8785000 | Loss: 2.328 | 679 ms/step , 57909.04 GFLOP/s , 529442.7 tokens/s INFO:__main__:2024-10-26 20:31:44 | Epoch: 0 | Step: 10990 | Dataset: 0-8793000 | Loss: 2.289 | 676 ms/step , 58164.57 GFLOP/s , 529168.3 tokens/s INFO:__main__:2024-10-26 20:31:51 | Validation | Step: 11000 | Val_loss: 2.653 | Best_val_loss: 2.3770 INFO:__main__:2024-10-26 20:31:51 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_203151_step_11000.pt` INFO:__main__:2024-10-26 20:31:53 | Epoch: 0 | Step: 11000 | Dataset: 
0-8801000 | Loss: 2.313 | 674 ms/step , 58351.17 GFLOP/s , 478458.5 tokens/s INFO:__main__:2024-10-26 20:32:01 | Epoch: 0 | Step: 11010 | Dataset: 0-8809000 | Loss: 2.242 | 680 ms/step , 57778.62 GFLOP/s , 529888.6 tokens/s INFO:__main__:2024-10-26 20:32:08 | Epoch: 0 | Step: 11020 | Dataset: 0-8817000 | Loss: 2.339 | 675 ms/step , 58256.44 GFLOP/s , 530520.5 tokens/s INFO:__main__:2024-10-26 20:32:16 | Epoch: 0 | Step: 11030 | Dataset: 0-8825000 | Loss: 2.840 | 678 ms/step , 57951.46 GFLOP/s , 531149.9 tokens/s INFO:__main__:2024-10-26 20:32:24 | Epoch: 0 | Step: 11040 | Dataset: 0-8833000 | Loss: 2.129 | 679 ms/step , 57904.93 GFLOP/s , 531481.2 tokens/s INFO:__main__:2024-10-26 20:32:31 | Epoch: 0 | Step: 11050 | Dataset: 0-8841000 | Loss: 2.019 | 677 ms/step , 58063.98 GFLOP/s , 531763.4 tokens/s INFO:__main__:2024-10-26 20:32:39 | Epoch: 0 | Step: 11060 | Dataset: 0-8849000 | Loss: 2.034 | 674 ms/step , 58336.12 GFLOP/s , 532070.0 tokens/s INFO:__main__:2024-10-26 20:32:47 | Epoch: 0 | Step: 11070 | Dataset: 0-8857000 | Loss: 1.973 | 677 ms/step , 58087.03 GFLOP/s , 531431.9 tokens/s INFO:__main__:2024-10-26 20:32:54 | Epoch: 0 | Step: 11080 | Dataset: 0-8865000 | Loss: 1.948 | 681 ms/step , 57731.45 GFLOP/s , 531885.9 tokens/s INFO:__main__:2024-10-26 20:33:02 | Epoch: 0 | Step: 11090 | Dataset: 0-8873000 | Loss: 1.954 | 676 ms/step , 58113.63 GFLOP/s , 531551.0 tokens/s INFO:__main__:2024-10-26 20:33:10 | Epoch: 0 | Step: 11100 | Dataset: 0-8881000 | Loss: 1.923 | 675 ms/step , 58278.30 GFLOP/s , 532065.0 tokens/s INFO:__main__:2024-10-26 20:33:18 | Epoch: 0 | Step: 11110 | Dataset: 0-8889000 | Loss: 1.933 | 677 ms/step , 58051.88 GFLOP/s , 532400.9 tokens/s INFO:__main__:2024-10-26 20:33:25 | Epoch: 0 | Step: 11120 | Dataset: 0-8897000 | Loss: 2.823 | 674 ms/step , 58292.05 GFLOP/s , 531798.7 tokens/s INFO:__main__:2024-10-26 20:33:33 | Epoch: 0 | Step: 11130 | Dataset: 0-8905000 | Loss: 2.595 | 674 ms/step , 58297.79 GFLOP/s , 532049.1 tokens/s 
INFO:__main__:2024-10-26 20:33:41 | Epoch: 0 | Step: 11140 | Dataset: 0-8913000 | Loss: 2.518 | 675 ms/step , 58253.51 GFLOP/s , 531461.9 tokens/s INFO:__main__:2024-10-26 20:33:48 | Epoch: 0 | Step: 11150 | Dataset: 0-8921000 | Loss: 2.367 | 675 ms/step , 58254.73 GFLOP/s , 532344.6 tokens/s INFO:__main__:2024-10-26 20:33:56 | Epoch: 0 | Step: 11160 | Dataset: 0-8929000 | Loss: 2.372 | 676 ms/step , 58172.23 GFLOP/s , 531332.6 tokens/s INFO:__main__:2024-10-26 20:34:04 | Epoch: 0 | Step: 11170 | Dataset: 0-8937000 | Loss: 2.415 | 677 ms/step , 58099.59 GFLOP/s , 531616.2 tokens/s INFO:__main__:2024-10-26 20:34:12 | Epoch: 0 | Step: 11180 | Dataset: 0-8945000 | Loss: 2.348 | 675 ms/step , 58253.66 GFLOP/s , 531968.6 tokens/s INFO:__main__:2024-10-26 20:34:19 | Epoch: 0 | Step: 11190 | Dataset: 0-8953000 | Loss: 2.434 | 675 ms/step , 58269.23 GFLOP/s , 531892.2 tokens/s INFO:__main__:2024-10-26 20:34:27 | Epoch: 0 | Step: 11200 | Dataset: 0-8961000 | Loss: 2.415 | 675 ms/step , 58211.91 GFLOP/s , 531588.0 tokens/s INFO:__main__:2024-10-26 20:34:35 | Epoch: 0 | Step: 11210 | Dataset: 0-8969000 | Loss: 2.387 | 675 ms/step , 58205.81 GFLOP/s , 531793.2 tokens/s INFO:__main__:2024-10-26 20:34:42 | Epoch: 0 | Step: 11220 | Dataset: 0-8977000 | Loss: 2.427 | 676 ms/step , 58151.09 GFLOP/s , 532054.1 tokens/s INFO:__main__:2024-10-26 20:34:50 | Epoch: 0 | Step: 11230 | Dataset: 0-8985000 | Loss: 2.451 | 675 ms/step , 58275.91 GFLOP/s , 531342.1 tokens/s INFO:__main__:2024-10-26 20:34:58 | Epoch: 0 | Step: 11240 | Dataset: 0-8993000 | Loss: 2.317 | 674 ms/step , 58330.78 GFLOP/s , 532361.2 tokens/s INFO:__main__:2024-10-26 20:35:05 | Epoch: 0 | Step: 11250 | Dataset: 0-9001000 | Loss: 2.279 | 676 ms/step , 58137.76 GFLOP/s , 531018.3 tokens/s INFO:__main__:2024-10-26 20:35:13 | Epoch: 0 | Step: 11260 | Dataset: 0-9009000 | Loss: 2.358 | 674 ms/step , 58282.57 GFLOP/s , 532014.3 tokens/s INFO:__main__:2024-10-26 20:35:21 | Epoch: 0 | Step: 11270 | Dataset: 0-9017000 | Loss: 
2.367 | 675 ms/step , 58236.32 GFLOP/s , 532063.8 tokens/s INFO:__main__:2024-10-26 20:35:29 | Epoch: 0 | Step: 11280 | Dataset: 0-9025000 | Loss: 2.392 | 674 ms/step , 58283.04 GFLOP/s , 531997.7 tokens/s INFO:__main__:2024-10-26 20:35:36 | Epoch: 0 | Step: 11290 | Dataset: 0-9033000 | Loss: 2.386 | 675 ms/step , 58272.54 GFLOP/s , 531553.4 tokens/s INFO:__main__:2024-10-26 20:35:44 | Epoch: 0 | Step: 11300 | Dataset: 0-9041000 | Loss: 2.388 | 675 ms/step , 58200.73 GFLOP/s , 531693.7 tokens/s INFO:__main__:2024-10-26 20:35:52 | Epoch: 0 | Step: 11310 | Dataset: 0-9049000 | Loss: 2.393 | 675 ms/step , 58247.39 GFLOP/s , 532012.5 tokens/s INFO:__main__:2024-10-26 20:35:59 | Epoch: 0 | Step: 11320 | Dataset: 0-9057000 | Loss: 2.331 | 674 ms/step , 58294.65 GFLOP/s , 532362.1 tokens/s INFO:__main__:2024-10-26 20:36:07 | Epoch: 0 | Step: 11330 | Dataset: 0-9065000 | Loss: 2.335 | 674 ms/step , 58337.65 GFLOP/s , 533238.9 tokens/s INFO:__main__:2024-10-26 20:36:15 | Epoch: 0 | Step: 11340 | Dataset: 0-9073000 | Loss: 2.442 | 675 ms/step , 58242.71 GFLOP/s , 531352.4 tokens/s INFO:__main__:2024-10-26 20:36:22 | Epoch: 0 | Step: 11350 | Dataset: 0-9081000 | Loss: 2.482 | 675 ms/step , 58226.92 GFLOP/s , 530014.6 tokens/s INFO:__main__:2024-10-26 20:36:30 | Epoch: 0 | Step: 11360 | Dataset: 0-9089000 | Loss: 2.365 | 675 ms/step , 58243.11 GFLOP/s , 528112.6 tokens/s INFO:__main__:2024-10-26 20:36:38 | Epoch: 0 | Step: 11370 | Dataset: 0-9097000 | Loss: 2.426 | 677 ms/step , 58082.18 GFLOP/s , 530679.1 tokens/s INFO:__main__:2024-10-26 20:36:46 | Epoch: 0 | Step: 11380 | Dataset: 0-9105000 | Loss: 2.325 | 676 ms/step , 58170.33 GFLOP/s , 529909.3 tokens/s INFO:__main__:2024-10-26 20:36:53 | Epoch: 0 | Step: 11390 | Dataset: 0-9113000 | Loss: 2.387 | 676 ms/step , 58128.51 GFLOP/s , 529285.9 tokens/s INFO:__main__:2024-10-26 20:37:01 | Epoch: 0 | Step: 11400 | Dataset: 0-9121000 | Loss: 2.388 | 676 ms/step , 58172.67 GFLOP/s , 530588.9 tokens/s INFO:__main__:2024-10-26 
20:37:09 | Epoch: 0 | Step: 11410 | Dataset: 0-9129000 | Loss: 2.403 | 676 ms/step , 58126.58 GFLOP/s , 530046.2 tokens/s INFO:__main__:2024-10-26 20:37:17 | Epoch: 0 | Step: 11420 | Dataset: 0-9137000 | Loss: 2.382 | 675 ms/step , 58224.89 GFLOP/s , 531966.2 tokens/s INFO:__main__:2024-10-26 20:37:24 | Epoch: 0 | Step: 11430 | Dataset: 0-9145000 | Loss: 2.398 | 675 ms/step , 58264.38 GFLOP/s , 531350.3 tokens/s INFO:__main__:2024-10-26 20:37:32 | Epoch: 0 | Step: 11440 | Dataset: 0-9153000 | Loss: 2.383 | 674 ms/step , 58350.67 GFLOP/s , 532474.1 tokens/s INFO:__main__:2024-10-26 20:37:40 | Epoch: 0 | Step: 11450 | Dataset: 0-9161000 | Loss: 2.425 | 674 ms/step , 58288.59 GFLOP/s , 530846.7 tokens/s INFO:__main__:2024-10-26 20:37:47 | Epoch: 0 | Step: 11460 | Dataset: 0-9169000 | Loss: 2.416 | 676 ms/step , 58180.89 GFLOP/s , 532230.6 tokens/s INFO:__main__:2024-10-26 20:37:55 | Epoch: 0 | Step: 11470 | Dataset: 0-9177000 | Loss: 2.344 | 676 ms/step , 58132.83 GFLOP/s , 531677.4 tokens/s INFO:__main__:2024-10-26 20:38:03 | Epoch: 0 | Step: 11480 | Dataset: 0-9185000 | Loss: 2.384 | 674 ms/step , 58350.73 GFLOP/s , 532607.8 tokens/s INFO:__main__:2024-10-26 20:38:10 | Epoch: 0 | Step: 11490 | Dataset: 0-9193000 | Loss: 2.436 | 674 ms/step , 58296.22 GFLOP/s , 531565.9 tokens/s INFO:__main__:2024-10-26 20:38:18 | Epoch: 0 | Step: 11500 | Dataset: 0-9201000 | Loss: 2.343 | 674 ms/step , 58288.95 GFLOP/s , 532148.4 tokens/s INFO:__main__:2024-10-26 20:38:26 | Epoch: 0 | Step: 11510 | Dataset: 0-9209000 | Loss: 2.351 | 676 ms/step , 58159.71 GFLOP/s , 531585.5 tokens/s INFO:__main__:2024-10-26 20:38:34 | Epoch: 0 | Step: 11520 | Dataset: 0-9217000 | Loss: 2.427 | 676 ms/step , 58149.79 GFLOP/s , 532145.9 tokens/s INFO:__main__:2024-10-26 20:38:41 | Epoch: 0 | Step: 11530 | Dataset: 0-9225000 | Loss: 2.428 | 674 ms/step , 58287.02 GFLOP/s , 532332.9 tokens/s INFO:__main__:2024-10-26 20:38:49 | Epoch: 0 | Step: 11540 | Dataset: 0-9233000 | Loss: 2.323 | 675 ms/step , 
58241.91 GFLOP/s , 531994.3 tokens/s INFO:__main__:2024-10-26 20:38:57 | Epoch: 0 | Step: 11550 | Dataset: 0-9241000 | Loss: 2.373 | 674 ms/step , 58303.56 GFLOP/s , 531878.9 tokens/s INFO:__main__:2024-10-26 20:39:04 | Epoch: 0 | Step: 11560 | Dataset: 0-9249000 | Loss: 2.354 | 677 ms/step , 58097.85 GFLOP/s , 532167.0 tokens/s INFO:__main__:2024-10-26 20:39:12 | Epoch: 0 | Step: 11570 | Dataset: 0-9257000 | Loss: 2.341 | 676 ms/step , 58140.48 GFLOP/s , 530611.1 tokens/s INFO:__main__:2024-10-26 20:39:20 | Epoch: 0 | Step: 11580 | Dataset: 0-9265000 | Loss: 2.316 | 676 ms/step , 58157.85 GFLOP/s , 533342.9 tokens/s INFO:__main__:2024-10-26 20:39:27 | Epoch: 0 | Step: 11590 | Dataset: 0-9273000 | Loss: 2.339 | 674 ms/step , 58313.62 GFLOP/s , 533303.6 tokens/s INFO:__main__:2024-10-26 20:39:35 | Epoch: 0 | Step: 11600 | Dataset: 0-9281000 | Loss: 2.294 | 675 ms/step , 58245.86 GFLOP/s , 533451.5 tokens/s INFO:__main__:2024-10-26 20:39:43 | Epoch: 0 | Step: 11610 | Dataset: 0-9289000 | Loss: 2.101 | 674 ms/step , 58289.92 GFLOP/s , 532770.4 tokens/s INFO:__main__:2024-10-26 20:39:50 | Epoch: 0 | Step: 11620 | Dataset: 0-9297000 | Loss: 1.999 | 674 ms/step , 58283.88 GFLOP/s , 532898.1 tokens/s INFO:__main__:2024-10-26 20:39:58 | Epoch: 0 | Step: 11630 | Dataset: 0-9305000 | Loss: 1.979 | 674 ms/step , 58350.23 GFLOP/s , 532791.5 tokens/s INFO:__main__:2024-10-26 20:40:06 | Epoch: 0 | Step: 11640 | Dataset: 0-9313000 | Loss: 1.966 | 674 ms/step , 58365.08 GFLOP/s , 533088.6 tokens/s INFO:__main__:2024-10-26 20:40:14 | Epoch: 0 | Step: 11650 | Dataset: 0-9321000 | Loss: 1.935 | 673 ms/step , 58402.89 GFLOP/s , 532854.0 tokens/s INFO:__main__:2024-10-26 20:40:21 | Epoch: 0 | Step: 11660 | Dataset: 0-9329000 | Loss: 1.958 | 673 ms/step , 58368.01 GFLOP/s , 532987.8 tokens/s INFO:__main__:2024-10-26 20:40:29 | Epoch: 0 | Step: 11670 | Dataset: 0-9337000 | Loss: 1.921 | 675 ms/step , 58253.12 GFLOP/s , 532766.9 tokens/s INFO:__main__:2024-10-26 20:40:37 | Epoch: 0 | 
Step: 11680 | Dataset: 0-9345000 | Loss: 1.881 | 675 ms/step , 58269.98 GFLOP/s , 532784.3 tokens/s INFO:__main__:2024-10-26 20:40:44 | Epoch: 0 | Step: 11690 | Dataset: 0-9353000 | Loss: 1.894 | 676 ms/step , 58120.01 GFLOP/s , 532482.0 tokens/s INFO:__main__:2024-10-26 20:40:52 | Epoch: 0 | Step: 11700 | Dataset: 0-9361000 | Loss: 2.407 | 674 ms/step , 58304.71 GFLOP/s , 533316.5 tokens/s INFO:__main__:2024-10-26 20:41:00 | Epoch: 0 | Step: 11710 | Dataset: 0-9369000 | Loss: 2.308 | 675 ms/step , 58240.22 GFLOP/s , 533064.4 tokens/s INFO:__main__:2024-10-26 20:41:07 | Epoch: 0 | Step: 11720 | Dataset: 0-9377000 | Loss: 2.350 | 675 ms/step , 58269.35 GFLOP/s , 533071.4 tokens/s INFO:__main__:2024-10-26 20:41:15 | Epoch: 0 | Step: 11730 | Dataset: 0-9385000 | Loss: 2.358 | 675 ms/step , 58269.99 GFLOP/s , 532891.2 tokens/s INFO:__main__:2024-10-26 20:41:23 | Epoch: 0 | Step: 11740 | Dataset: 0-9393000 | Loss: 2.308 | 673 ms/step , 58366.01 GFLOP/s , 532505.2 tokens/s INFO:__main__:2024-10-26 20:41:30 | Epoch: 0 | Step: 11750 | Dataset: 0-9401000 | Loss: 2.368 | 675 ms/step , 58234.02 GFLOP/s , 533241.2 tokens/s INFO:__main__:2024-10-26 20:41:38 | Epoch: 0 | Step: 11760 | Dataset: 0-9409000 | Loss: 2.317 | 673 ms/step , 58365.96 GFLOP/s , 533093.3 tokens/s INFO:__main__:2024-10-26 20:41:46 | Epoch: 0 | Step: 11770 | Dataset: 0-9417000 | Loss: 2.245 | 674 ms/step , 58339.86 GFLOP/s , 533615.4 tokens/s INFO:__main__:2024-10-26 20:41:53 | Epoch: 0 | Step: 11780 | Dataset: 0-9425000 | Loss: 2.249 | 674 ms/step , 58287.90 GFLOP/s , 533390.5 tokens/s INFO:__main__:2024-10-26 20:42:01 | Epoch: 0 | Step: 11790 | Dataset: 0-9433000 | Loss: 2.269 | 677 ms/step , 58049.96 GFLOP/s , 533051.4 tokens/s INFO:__main__:2024-10-26 20:42:09 | Epoch: 0 | Step: 11800 | Dataset: 0-9441000 | Loss: 2.334 | 675 ms/step , 58249.50 GFLOP/s , 533355.9 tokens/s INFO:__main__:2024-10-26 20:42:17 | Epoch: 0 | Step: 11810 | Dataset: 0-9449000 | Loss: 2.345 | 700 ms/step , 56155.86 GFLOP/s , 
529916.4 tokens/s INFO:__main__:2024-10-26 20:42:24 | Epoch: 0 | Step: 11820 | Dataset: 0-9457000 | Loss: 2.239 | 674 ms/step , 58308.12 GFLOP/s , 533092.2 tokens/s INFO:__main__:2024-10-26 20:42:32 | Epoch: 0 | Step: 11830 | Dataset: 0-9465000 | Loss: 2.270 | 675 ms/step , 58271.78 GFLOP/s , 533211.1 tokens/s INFO:__main__:2024-10-26 20:42:40 | Epoch: 0 | Step: 11840 | Dataset: 0-9473000 | Loss: 2.228 | 675 ms/step , 58201.58 GFLOP/s , 533782.1 tokens/s INFO:__main__:2024-10-26 20:42:47 | Epoch: 0 | Step: 11850 | Dataset: 0-9481000 | Loss: 2.267 | 677 ms/step , 58101.10 GFLOP/s , 533048.4 tokens/s INFO:__main__:2024-10-26 20:42:55 | Epoch: 0 | Step: 11860 | Dataset: 0-9489000 | Loss: 2.430 | 676 ms/step , 58168.11 GFLOP/s , 533681.3 tokens/s INFO:__main__:2024-10-26 20:43:03 | Epoch: 0 | Step: 11870 | Dataset: 0-9497000 | Loss: 2.437 | 675 ms/step , 58256.68 GFLOP/s , 533511.3 tokens/s INFO:__main__:2024-10-26 20:43:10 | Epoch: 0 | Step: 11880 | Dataset: 0-9505000 | Loss: 2.356 | 674 ms/step , 58297.46 GFLOP/s , 533311.9 tokens/s INFO:__main__:2024-10-26 20:43:18 | Epoch: 0 | Step: 11890 | Dataset: 0-9513000 | Loss: 2.336 | 674 ms/step , 58290.75 GFLOP/s , 533693.6 tokens/s INFO:__main__:2024-10-26 20:43:26 | Epoch: 0 | Step: 11900 | Dataset: 0-9521000 | Loss: 2.382 | 675 ms/step , 58213.96 GFLOP/s , 532467.6 tokens/s INFO:__main__:2024-10-26 20:43:33 | Epoch: 0 | Step: 11910 | Dataset: 0-9529000 | Loss: 2.414 | 674 ms/step , 58292.38 GFLOP/s , 533596.8 tokens/s INFO:__main__:2024-10-26 20:43:41 | Epoch: 0 | Step: 11920 | Dataset: 0-9537000 | Loss: 2.389 | 674 ms/step , 58306.90 GFLOP/s , 532899.8 tokens/s INFO:__main__:2024-10-26 20:43:49 | Epoch: 0 | Step: 11930 | Dataset: 0-9545000 | Loss: 2.319 | 675 ms/step , 58277.70 GFLOP/s , 533476.3 tokens/s INFO:__main__:2024-10-26 20:43:56 | Epoch: 0 | Step: 11940 | Dataset: 0-9553000 | Loss: 2.359 | 675 ms/step , 58235.88 GFLOP/s , 532801.2 tokens/s INFO:__main__:2024-10-26 20:44:04 | Epoch: 0 | Step: 11950 | Dataset: 
0-9561000 | Loss: 2.269 | 675 ms/step , 58236.95 GFLOP/s , 533038.1 tokens/s INFO:__main__:2024-10-26 20:44:12 | Epoch: 0 | Step: 11960 | Dataset: 0-9569000 | Loss: 2.280 | 674 ms/step , 58331.64 GFLOP/s , 533283.8 tokens/s INFO:__main__:2024-10-26 20:44:19 | Epoch: 0 | Step: 11970 | Dataset: 0-9577000 | Loss: 2.303 | 674 ms/step , 58322.71 GFLOP/s , 533016.3 tokens/s INFO:__main__:2024-10-26 20:44:27 | Epoch: 0 | Step: 11980 | Dataset: 0-9585000 | Loss: 2.367 | 676 ms/step , 58179.57 GFLOP/s , 532919.6 tokens/s INFO:__main__:2024-10-26 20:44:35 | Epoch: 0 | Step: 11990 | Dataset: 0-9593000 | Loss: 2.354 | 675 ms/step , 58264.21 GFLOP/s , 533235.2 tokens/s INFO:__main__:2024-10-26 20:44:42 | Validation | Step: 12000 | Val_loss: 2.395 | Best_val_loss: 2.3770 INFO:__main__:2024-10-26 20:44:42 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_204442_step_12000.pt` INFO:__main__:2024-10-26 20:44:43 | Epoch: 0 | Step: 12000 | Dataset: 0-9601000 | Loss: 2.368 | 673 ms/step , 58401.66 GFLOP/s , 480311.5 tokens/s INFO:__main__:2024-10-26 20:44:51 | Epoch: 0 | Step: 12010 | Dataset: 0-9609000 | Loss: 2.401 | 674 ms/step , 58297.50 GFLOP/s , 532373.4 tokens/s INFO:__main__:2024-10-26 20:44:59 | Epoch: 0 | Step: 12020 | Dataset: 0-9617000 | Loss: 2.337 | 675 ms/step , 58197.35 GFLOP/s , 532619.2 tokens/s INFO:__main__:2024-10-26 20:45:06 | Epoch: 0 | Step: 12030 | Dataset: 0-9625000 | Loss: 2.359 | 675 ms/step , 58275.48 GFLOP/s , 532652.6 tokens/s INFO:__main__:2024-10-26 20:45:14 | Epoch: 0 | Step: 12040 | Dataset: 0-9633000 | Loss: 2.309 | 675 ms/step , 58228.37 GFLOP/s , 532897.3 tokens/s INFO:__main__:2024-10-26 20:45:22 | Epoch: 0 | Step: 12050 | Dataset: 0-9641000 | Loss: 2.367 | 675 ms/step , 58270.99 GFLOP/s , 532889.9 tokens/s INFO:__main__:2024-10-26 20:45:29 | Epoch: 0 | Step: 12060 | Dataset: 0-9649000 | Loss: 2.397 | 674 ms/step , 58324.24 GFLOP/s , 533147.8 tokens/s INFO:__main__:2024-10-26 20:45:37 | Epoch: 0 | Step: 12070 | Dataset: 
0-9657000 | Loss: 2.304 | 674 ms/step , 58304.18 GFLOP/s , 532143.0 tokens/s INFO:__main__:2024-10-26 20:45:45 | Epoch: 0 | Step: 12080 | Dataset: 0-9665000 | Loss: 2.269 | 674 ms/step , 58298.60 GFLOP/s , 532958.8 tokens/s INFO:__main__:2024-10-26 20:45:53 | Epoch: 0 | Step: 12090 | Dataset: 0-9673000 | Loss: 2.295 | 674 ms/step , 58283.76 GFLOP/s , 533348.4 tokens/s INFO:__main__:2024-10-26 20:46:00 | Epoch: 0 | Step: 12100 | Dataset: 0-9681000 | Loss: 2.270 | 675 ms/step , 58270.92 GFLOP/s , 532921.5 tokens/s INFO:__main__:2024-10-26 20:46:08 | Epoch: 0 | Step: 12110 | Dataset: 0-9689000 | Loss: 2.234 | 678 ms/step , 58019.81 GFLOP/s , 533063.0 tokens/s INFO:__main__:2024-10-26 20:46:16 | Epoch: 0 | Step: 12120 | Dataset: 0-9697000 | Loss: 2.289 | 674 ms/step , 58281.90 GFLOP/s , 532546.4 tokens/s INFO:__main__:2024-10-26 20:46:23 | Epoch: 0 | Step: 12130 | Dataset: 0-9705000 | Loss: 2.319 | 675 ms/step , 58278.70 GFLOP/s , 533423.9 tokens/s INFO:__main__:2024-10-26 20:46:31 | Epoch: 0 | Step: 12140 | Dataset: 0-9713000 | Loss: 2.199 | 674 ms/step , 58305.07 GFLOP/s , 532876.3 tokens/s INFO:__main__:2024-10-26 20:46:39 | Epoch: 0 | Step: 12150 | Dataset: 0-9721000 | Loss: 2.349 | 676 ms/step , 58152.87 GFLOP/s , 533335.0 tokens/s INFO:__main__:2024-10-26 20:46:46 | Epoch: 0 | Step: 12160 | Dataset: 0-9729000 | Loss: 2.250 | 674 ms/step , 58315.06 GFLOP/s , 532895.2 tokens/s INFO:__main__:2024-10-26 20:46:54 | Epoch: 0 | Step: 12170 | Dataset: 0-9737000 | Loss: 2.270 | 673 ms/step , 58400.18 GFLOP/s , 533308.0 tokens/s INFO:__main__:2024-10-26 20:47:02 | Epoch: 0 | Step: 12180 | Dataset: 0-9745000 | Loss: 2.097 | 675 ms/step , 58278.43 GFLOP/s , 532920.5 tokens/s INFO:__main__:2024-10-26 20:47:09 | Epoch: 0 | Step: 12190 | Dataset: 0-9753000 | Loss: 1.935 | 675 ms/step , 58241.27 GFLOP/s , 532297.7 tokens/s INFO:__main__:2024-10-26 20:47:17 | Epoch: 0 | Step: 12200 | Dataset: 0-9761000 | Loss: 1.871 | 677 ms/step , 58031.20 GFLOP/s , 531407.3 tokens/s 
INFO:__main__:2024-10-26 20:47:25 | Epoch: 0 | Step: 12210 | Dataset: 0-9769000 | Loss: 1.847 | 675 ms/step , 58276.22 GFLOP/s , 532598.5 tokens/s INFO:__main__:2024-10-26 20:47:32 | Epoch: 0 | Step: 12220 | Dataset: 0-9777000 | Loss: 1.823 | 674 ms/step , 58347.97 GFLOP/s , 532946.6 tokens/s INFO:__main__:2024-10-26 20:47:40 | Epoch: 0 | Step: 12230 | Dataset: 0-9785000 | Loss: 1.822 | 675 ms/step , 58222.95 GFLOP/s , 531580.5 tokens/s INFO:__main__:2024-10-26 20:47:48 | Epoch: 0 | Step: 12240 | Dataset: 0-9793000 | Loss: 1.784 | 677 ms/step , 58066.76 GFLOP/s , 531393.9 tokens/s INFO:__main__:2024-10-26 20:47:56 | Epoch: 0 | Step: 12250 | Dataset: 0-9801000 | Loss: 1.822 | 674 ms/step , 58288.18 GFLOP/s , 531290.2 tokens/s INFO:__main__:2024-10-26 20:48:03 | Epoch: 0 | Step: 12260 | Dataset: 0-9809000 | Loss: 1.812 | 675 ms/step , 58201.99 GFLOP/s , 531991.9 tokens/s INFO:__main__:2024-10-26 20:48:11 | Epoch: 0 | Step: 12270 | Dataset: 0-9817000 | Loss: 2.366 | 676 ms/step , 58141.00 GFLOP/s , 532776.0 tokens/s INFO:__main__:2024-10-26 20:48:19 | Epoch: 0 | Step: 12280 | Dataset: 0-9825000 | Loss: 2.353 | 674 ms/step , 58300.69 GFLOP/s , 532404.6 tokens/s INFO:__main__:2024-10-26 20:48:26 | Epoch: 0 | Step: 12290 | Dataset: 0-9833000 | Loss: 2.340 | 675 ms/step , 58239.02 GFLOP/s , 532092.2 tokens/s INFO:__main__:2024-10-26 20:48:34 | Epoch: 0 | Step: 12300 | Dataset: 0-9841000 | Loss: 2.267 | 675 ms/step , 58259.41 GFLOP/s , 530495.3 tokens/s INFO:__main__:2024-10-26 20:48:42 | Epoch: 0 | Step: 12310 | Dataset: 0-9849000 | Loss: 2.294 | 673 ms/step , 58389.90 GFLOP/s , 533486.4 tokens/s INFO:__main__:2024-10-26 20:48:49 | Epoch: 0 | Step: 12320 | Dataset: 0-9857000 | Loss: 2.370 | 674 ms/step , 58332.08 GFLOP/s , 532041.2 tokens/s INFO:__main__:2024-10-26 20:48:57 | Epoch: 0 | Step: 12330 | Dataset: 0-9865000 | Loss: 2.313 | 677 ms/step , 58046.45 GFLOP/s , 531794.2 tokens/s INFO:__main__:2024-10-26 20:49:05 | Epoch: 0 | Step: 12340 | Dataset: 0-9873000 | Loss: 
2.256 | 678 ms/step , 57987.36 GFLOP/s , 530969.9 tokens/s INFO:__main__:2024-10-26 20:49:13 | Epoch: 0 | Step: 12350 | Dataset: 0-9881000 | Loss: 2.226 | 676 ms/step , 58109.74 GFLOP/s , 530975.1 tokens/s INFO:__main__:2024-10-26 20:49:20 | Epoch: 0 | Step: 12360 | Dataset: 0-9889000 | Loss: 2.305 | 672 ms/step , 58463.83 GFLOP/s , 530493.5 tokens/s INFO:__main__:2024-10-26 20:49:28 | Epoch: 0 | Step: 12370 | Dataset: 0-9897000 | Loss: 2.338 | 674 ms/step , 58302.40 GFLOP/s , 532355.3 tokens/s INFO:__main__:2024-10-26 20:49:36 | Epoch: 0 | Step: 12380 | Dataset: 0-9905000 | Loss: 2.172 | 674 ms/step , 58282.29 GFLOP/s , 532753.6 tokens/s INFO:__main__:2024-10-26 20:49:43 | Epoch: 0 | Step: 12390 | Dataset: 0-9913000 | Loss: 2.166 | 673 ms/step , 58394.79 GFLOP/s , 532846.7 tokens/s INFO:__main__:2024-10-26 20:49:51 | Epoch: 0 | Step: 12400 | Dataset: 0-9921000 | Loss: 2.301 | 674 ms/step , 58328.13 GFLOP/s , 533189.9 tokens/s INFO:__main__:2024-10-26 20:49:59 | Epoch: 0 | Step: 12410 | Dataset: 0-9929000 | Loss: 2.243 | 676 ms/step , 58183.40 GFLOP/s , 532011.7 tokens/s INFO:__main__:2024-10-26 20:50:06 | Epoch: 0 | Step: 12420 | Dataset: 0-9937000 | Loss: 2.293 | 675 ms/step , 58257.95 GFLOP/s , 531279.9 tokens/s INFO:__main__:2024-10-26 20:50:14 | Epoch: 0 | Step: 12430 | Dataset: 0-9945000 | Loss: 2.424 | 674 ms/step , 58321.75 GFLOP/s , 533211.7 tokens/s INFO:__main__:2024-10-26 20:50:22 | Epoch: 0 | Step: 12440 | Dataset: 0-9953000 | Loss: 2.412 | 675 ms/step , 58219.25 GFLOP/s , 532219.7 tokens/s INFO:__main__:2024-10-26 20:50:30 | Epoch: 0 | Step: 12450 | Dataset: 0-9961000 | Loss: 2.386 | 675 ms/step , 58248.23 GFLOP/s , 531155.9 tokens/s INFO:__main__:2024-10-26 20:50:37 | Epoch: 0 | Step: 12460 | Dataset: 0-9969000 | Loss: 2.398 | 674 ms/step , 58340.89 GFLOP/s , 530912.7 tokens/s INFO:__main__:2024-10-26 20:50:45 | Epoch: 0 | Step: 12470 | Dataset: 0-9977000 | Loss: 2.387 | 674 ms/step , 58312.05 GFLOP/s , 533589.9 tokens/s INFO:__main__:2024-10-26 
20:50:53 | Epoch: 0 | Step: 12480 | Dataset: 0-9985000 | Loss: 2.352 | 674 ms/step , 58325.02 GFLOP/s , 532994.6 tokens/s INFO:__main__:2024-10-26 20:51:00 | Epoch: 0 | Step: 12490 | Dataset: 0-9993000 | Loss: 2.379 | 675 ms/step , 58196.54 GFLOP/s , 533354.1 tokens/s INFO:__main__:2024-10-26 20:51:08 | Epoch: 0 | Step: 12500 | Dataset: 0-10001000 | Loss: 2.357 | 674 ms/step , 58354.08 GFLOP/s , 533063.2 tokens/s INFO:__main__:2024-10-26 20:51:16 | Epoch: 0 | Step: 12510 | Dataset: 0-10009000 | Loss: 2.321 | 674 ms/step , 58357.13 GFLOP/s , 533944.2 tokens/s INFO:__main__:2024-10-26 20:51:23 | Epoch: 0 | Step: 12520 | Dataset: 0-10017000 | Loss: 2.385 | 675 ms/step , 58244.00 GFLOP/s , 533307.0 tokens/s INFO:__main__:2024-10-26 20:51:31 | Epoch: 0 | Step: 12530 | Dataset: 0-10025000 | Loss: 2.382 | 674 ms/step , 58300.90 GFLOP/s , 533616.6 tokens/s INFO:__main__:2024-10-26 20:51:39 | Epoch: 0 | Step: 12540 | Dataset: 0-10033000 | Loss: 2.345 | 675 ms/step , 58208.97 GFLOP/s , 531825.4 tokens/s INFO:__main__:2024-10-26 20:51:46 | Epoch: 0 | Step: 12550 | Dataset: 0-10041000 | Loss: 2.329 | 676 ms/step , 58139.13 GFLOP/s , 530974.9 tokens/s INFO:__main__:2024-10-26 20:51:54 | Epoch: 0 | Step: 12560 | Dataset: 0-10049000 | Loss: 2.301 | 675 ms/step , 58269.70 GFLOP/s , 532558.4 tokens/s INFO:__main__:2024-10-26 20:52:02 | Epoch: 0 | Step: 12570 | Dataset: 0-10057000 | Loss: 2.300 | 674 ms/step , 58326.03 GFLOP/s , 532361.7 tokens/s INFO:__main__:2024-10-26 20:52:10 | Epoch: 0 | Step: 12580 | Dataset: 0-10065000 | Loss: 2.268 | 675 ms/step , 58277.87 GFLOP/s , 533157.2 tokens/s INFO:__main__:2024-10-26 20:52:17 | Epoch: 0 | Step: 12590 | Dataset: 0-10073000 | Loss: 2.104 | 675 ms/step , 58276.14 GFLOP/s , 532356.6 tokens/s INFO:__main__:2024-10-26 20:52:25 | Epoch: 0 | Step: 12600 | Dataset: 0-10081000 | Loss: 1.991 | 674 ms/step , 58312.92 GFLOP/s , 532109.8 tokens/s INFO:__main__:2024-10-26 20:52:33 | Epoch: 0 | Step: 12610 | Dataset: 0-10089000 | Loss: 1.971 | 676 
ms/step , 58172.35 GFLOP/s , 532041.0 tokens/s INFO:__main__:2024-10-26 20:52:40 | Epoch: 0 | Step: 12620 | Dataset: 0-10097000 | Loss: 1.933 | 675 ms/step , 58228.36 GFLOP/s , 531857.4 tokens/s INFO:__main__:2024-10-26 20:52:48 | Epoch: 0 | Step: 12630 | Dataset: 0-10105000 | Loss: 1.924 | 676 ms/step , 58169.67 GFLOP/s , 532112.6 tokens/s INFO:__main__:2024-10-26 20:52:56 | Epoch: 0 | Step: 12640 | Dataset: 0-10113000 | Loss: 1.893 | 675 ms/step , 58239.52 GFLOP/s , 531723.6 tokens/s INFO:__main__:2024-10-26 20:53:03 | Epoch: 0 | Step: 12650 | Dataset: 0-10121000 | Loss: 1.921 | 675 ms/step , 58196.71 GFLOP/s , 531899.1 tokens/s INFO:__main__:2024-10-26 20:53:11 | Epoch: 0 | Step: 12660 | Dataset: 0-10129000 | Loss: 1.885 | 674 ms/step , 58296.34 GFLOP/s , 532005.9 tokens/s INFO:__main__:2024-10-26 20:53:19 | Epoch: 0 | Step: 12670 | Dataset: 0-10137000 | Loss: 1.875 | 675 ms/step , 58246.83 GFLOP/s , 531922.3 tokens/s INFO:__main__:2024-10-26 20:53:27 | Epoch: 0 | Step: 12680 | Dataset: 0-10145000 | Loss: 2.474 | 674 ms/step , 58338.45 GFLOP/s , 531315.4 tokens/s INFO:__main__:2024-10-26 20:53:34 | Epoch: 0 | Step: 12690 | Dataset: 0-10153000 | Loss: 2.418 | 674 ms/step , 58334.06 GFLOP/s , 533086.5 tokens/s INFO:__main__:2024-10-26 20:53:42 | Epoch: 0 | Step: 12700 | Dataset: 0-10161000 | Loss: 2.319 | 676 ms/step , 58134.12 GFLOP/s , 532114.3 tokens/s INFO:__main__:2024-10-26 20:53:50 | Epoch: 0 | Step: 12710 | Dataset: 0-10169000 | Loss: 2.282 | 674 ms/step , 58306.43 GFLOP/s , 532757.3 tokens/s INFO:__main__:2024-10-26 20:53:57 | Epoch: 0 | Step: 12720 | Dataset: 0-10177000 | Loss: 2.302 | 675 ms/step , 58208.58 GFLOP/s , 532312.1 tokens/s INFO:__main__:2024-10-26 20:54:05 | Epoch: 0 | Step: 12730 | Dataset: 0-10185000 | Loss: 2.366 | 675 ms/step , 58259.67 GFLOP/s , 532402.3 tokens/s INFO:__main__:2024-10-26 20:54:13 | Epoch: 0 | Step: 12740 | Dataset: 0-10193000 | Loss: 2.307 | 675 ms/step , 58251.94 GFLOP/s , 532696.9 tokens/s INFO:__main__:2024-10-26 
20:54:20 | Epoch: 0 | Step: 12750 | Dataset: 0-10201000 | Loss: 2.297 | 675 ms/step , 58222.75 GFLOP/s , 532035.5 tokens/s INFO:__main__:2024-10-26 20:54:28 | Epoch: 0 | Step: 12760 | Dataset: 0-10209000 | Loss: 2.315 | 674 ms/step , 58316.83 GFLOP/s , 532858.4 tokens/s INFO:__main__:2024-10-26 20:54:36 | Epoch: 0 | Step: 12770 | Dataset: 0-10217000 | Loss: 2.325 | 675 ms/step , 58232.25 GFLOP/s , 531948.5 tokens/s INFO:__main__:2024-10-26 20:54:43 | Epoch: 0 | Step: 12780 | Dataset: 0-10225000 | Loss: 2.242 | 674 ms/step , 58281.24 GFLOP/s , 532406.5 tokens/s INFO:__main__:2024-10-26 20:54:51 | Epoch: 0 | Step: 12790 | Dataset: 0-10233000 | Loss: 2.220 | 674 ms/step , 58291.92 GFLOP/s , 532774.2 tokens/s INFO:__main__:2024-10-26 20:54:59 | Epoch: 0 | Step: 12800 | Dataset: 0-10241000 | Loss: 2.325 | 674 ms/step , 58349.18 GFLOP/s , 533463.7 tokens/s INFO:__main__:2024-10-26 20:55:06 | Epoch: 0 | Step: 12810 | Dataset: 0-10249000 | Loss: 2.243 | 674 ms/step , 58308.11 GFLOP/s , 532471.1 tokens/s INFO:__main__:2024-10-26 20:55:14 | Epoch: 0 | Step: 12820 | Dataset: 0-10257000 | Loss: 2.372 | 674 ms/step , 58285.45 GFLOP/s , 533498.2 tokens/s INFO:__main__:2024-10-26 20:55:22 | Epoch: 0 | Step: 12830 | Dataset: 0-10265000 | Loss: 2.326 | 674 ms/step , 58309.91 GFLOP/s , 533225.9 tokens/s INFO:__main__:2024-10-26 20:55:30 | Epoch: 0 | Step: 12840 | Dataset: 0-10273000 | Loss: 2.402 | 675 ms/step , 58219.90 GFLOP/s , 532916.7 tokens/s INFO:__main__:2024-10-26 20:55:37 | Epoch: 0 | Step: 12850 | Dataset: 0-10281000 | Loss: 2.378 | 676 ms/step , 58176.84 GFLOP/s , 532525.4 tokens/s INFO:__main__:2024-10-26 20:55:45 | Epoch: 0 | Step: 12860 | Dataset: 0-10289000 | Loss: 2.388 | 676 ms/step , 58187.17 GFLOP/s , 531087.2 tokens/s INFO:__main__:2024-10-26 20:55:53 | Epoch: 0 | Step: 12870 | Dataset: 0-10297000 | Loss: 2.356 | 674 ms/step , 58300.23 GFLOP/s , 532572.3 tokens/s INFO:__main__:2024-10-26 20:56:00 | Epoch: 0 | Step: 12880 | Dataset: 0-10305000 | Loss: 2.367 | 676 
ms/step , 58180.70 GFLOP/s , 531947.6 tokens/s INFO:__main__:2024-10-26 20:56:08 | Epoch: 0 | Step: 12890 | Dataset: 0-10313000 | Loss: 2.356 | 674 ms/step , 58340.77 GFLOP/s , 532315.5 tokens/s INFO:__main__:2024-10-26 20:56:16 | Epoch: 0 | Step: 12900 | Dataset: 0-10321000 | Loss: 2.381 | 675 ms/step , 58211.93 GFLOP/s , 532267.0 tokens/s INFO:__main__:2024-10-26 20:56:23 | Epoch: 0 | Step: 12910 | Dataset: 0-10329000 | Loss: 2.278 | 675 ms/step , 58256.48 GFLOP/s , 532053.9 tokens/s INFO:__main__:2024-10-26 20:56:31 | Epoch: 0 | Step: 12920 | Dataset: 0-10337000 | Loss: 2.398 | 675 ms/step , 58212.89 GFLOP/s , 531835.2 tokens/s INFO:__main__:2024-10-26 20:56:39 | Epoch: 0 | Step: 12930 | Dataset: 0-10345000 | Loss: 2.345 | 675 ms/step , 58223.64 GFLOP/s , 531563.1 tokens/s INFO:__main__:2024-10-26 20:56:47 | Epoch: 0 | Step: 12940 | Dataset: 0-10353000 | Loss: 2.356 | 718 ms/step , 54712.38 GFLOP/s , 523279.6 tokens/s INFO:__main__:2024-10-26 20:56:54 | Epoch: 0 | Step: 12950 | Dataset: 0-10361000 | Loss: 2.308 | 675 ms/step , 58231.41 GFLOP/s , 530638.8 tokens/s INFO:__main__:2024-10-26 20:57:02 | Epoch: 0 | Step: 12960 | Dataset: 0-10369000 | Loss: 2.331 | 676 ms/step , 58165.24 GFLOP/s , 532084.6 tokens/s INFO:__main__:2024-10-26 20:57:10 | Epoch: 0 | Step: 12970 | Dataset: 0-10377000 | Loss: 2.309 | 675 ms/step , 58217.05 GFLOP/s , 531877.3 tokens/s INFO:__main__:2024-10-26 20:57:18 | Epoch: 0 | Step: 12980 | Dataset: 0-10385000 | Loss: 2.289 | 674 ms/step , 58286.83 GFLOP/s , 529669.2 tokens/s INFO:__main__:2024-10-26 20:57:25 | Epoch: 0 | Step: 12990 | Dataset: 0-10393000 | Loss: 2.258 | 675 ms/step , 58220.47 GFLOP/s , 531187.2 tokens/s INFO:__main__:2024-10-26 20:57:32 | Validation | Step: 13000 | Val_loss: 2.360 | Best_val_loss: 2.3770 INFO:__main__:2024-10-26 20:57:32 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_205732_step_13000.pt` INFO:__main__:2024-10-26 20:57:34 | Epoch: 0 | Step: 13000 | Dataset: 0-10401000 | Loss: 
2.358 | 674 ms/step , 58337.08 GFLOP/s , 477036.0 tokens/s INFO:__main__:2024-10-26 20:57:42 | Epoch: 0 | Step: 13010 | Dataset: 0-10409000 | Loss: 2.346 | 675 ms/step , 58207.88 GFLOP/s , 531738.5 tokens/s INFO:__main__:2024-10-26 20:57:49 | Epoch: 0 | Step: 13020 | Dataset: 0-10417000 | Loss: 2.400 | 675 ms/step , 58277.75 GFLOP/s , 532433.2 tokens/s INFO:__main__:2024-10-26 20:57:57 | Epoch: 0 | Step: 13030 | Dataset: 0-10425000 | Loss: 2.324 | 675 ms/step , 58276.55 GFLOP/s , 531855.0 tokens/s INFO:__main__:2024-10-26 20:58:05 | Epoch: 0 | Step: 13040 | Dataset: 0-10433000 | Loss: 2.384 | 676 ms/step , 58146.05 GFLOP/s , 531959.5 tokens/s INFO:__main__:2024-10-26 20:58:12 | Epoch: 0 | Step: 13050 | Dataset: 0-10441000 | Loss: 2.337 | 674 ms/step , 58321.88 GFLOP/s , 532746.0 tokens/s INFO:__main__:2024-10-26 20:58:20 | Epoch: 0 | Step: 13060 | Dataset: 0-10449000 | Loss: 2.364 | 675 ms/step , 58246.67 GFLOP/s , 530076.7 tokens/s INFO:__main__:2024-10-26 20:58:28 | Epoch: 0 | Step: 13070 | Dataset: 0-10457000 | Loss: 2.269 | 674 ms/step , 58296.19 GFLOP/s , 533312.3 tokens/s INFO:__main__:2024-10-26 20:58:35 | Epoch: 0 | Step: 13080 | Dataset: 0-10465000 | Loss: 2.389 | 675 ms/step , 58274.48 GFLOP/s , 532454.6 tokens/s INFO:__main__:2024-10-26 20:58:43 | Epoch: 0 | Step: 13090 | Dataset: 0-10473000 | Loss: 2.356 | 674 ms/step , 58295.05 GFLOP/s , 533002.0 tokens/s INFO:__main__:2024-10-26 20:58:51 | Epoch: 0 | Step: 13100 | Dataset: 0-10481000 | Loss: 2.292 | 674 ms/step , 58353.36 GFLOP/s , 533297.0 tokens/s INFO:__main__:2024-10-26 20:58:58 | Epoch: 0 | Step: 13110 | Dataset: 0-10489000 | Loss: 2.342 | 674 ms/step , 58352.55 GFLOP/s , 533479.4 tokens/s INFO:__main__:2024-10-26 20:59:06 | Epoch: 0 | Step: 13120 | Dataset: 0-10497000 | Loss: 2.312 | 674 ms/step , 58305.71 GFLOP/s , 533207.7 tokens/s INFO:__main__:2024-10-26 20:59:14 | Epoch: 0 | Step: 13130 | Dataset: 0-10505000 | Loss: 2.348 | 674 ms/step , 58323.41 GFLOP/s , 533304.8 tokens/s 
INFO:__main__:2024-10-26 20:59:21 | Epoch: 0 | Step: 13140 | Dataset: 0-10513000 | Loss: 2.315 | 673 ms/step , 58366.46 GFLOP/s , 533586.4 tokens/s INFO:__main__:2024-10-26 20:59:29 | Epoch: 0 | Step: 13150 | Dataset: 0-10521000 | Loss: 2.325 | 674 ms/step , 58304.41 GFLOP/s , 533274.3 tokens/s INFO:__main__:2024-10-26 20:59:37 | Epoch: 0 | Step: 13160 | Dataset: 0-10529000 | Loss: 2.111 | 673 ms/step , 58382.81 GFLOP/s , 533256.0 tokens/s INFO:__main__:2024-10-26 20:59:45 | Epoch: 0 | Step: 13170 | Dataset: 0-10537000 | Loss: 1.919 | 675 ms/step , 58263.52 GFLOP/s , 531312.9 tokens/s INFO:__main__:2024-10-26 20:59:52 | Epoch: 0 | Step: 13180 | Dataset: 0-10545000 | Loss: 1.878 | 674 ms/step , 58310.19 GFLOP/s , 532861.9 tokens/s INFO:__main__:2024-10-26 21:00:00 | Epoch: 0 | Step: 13190 | Dataset: 0-10553000 | Loss: 1.862 | 674 ms/step , 58310.55 GFLOP/s , 532684.3 tokens/s INFO:__main__:2024-10-26 21:00:07 | Epoch: 0 | Step: 13200 | Dataset: 0-10561000 | Loss: 1.825 | 675 ms/step , 58268.62 GFLOP/s , 612144.6 tokens/s INFO:__main__:2024-10-26 21:00:14 | Epoch: 0 | Step: 13210 | Dataset: 0-10569000 | Loss: 1.797 | 674 ms/step , 58296.48 GFLOP/s , 533086.9 tokens/s INFO:__main__:2024-10-26 21:00:22 | Epoch: 0 | Step: 13220 | Dataset: 0-10577000 | Loss: 1.828 | 674 ms/step , 58331.49 GFLOP/s , 532479.3 tokens/s INFO:__main__:2024-10-26 21:00:30 | Epoch: 0 | Step: 13230 | Dataset: 0-10585000 | Loss: 1.754 | 673 ms/step , 58372.40 GFLOP/s , 532378.5 tokens/s INFO:__main__:2024-10-26 21:00:37 | Epoch: 0 | Step: 13240 | Dataset: 0-10593000 | Loss: 1.795 | 674 ms/step , 58358.58 GFLOP/s , 532760.6 tokens/s INFO:__main__:2024-10-26 21:00:45 | Epoch: 0 | Step: 13250 | Dataset: 0-10601000 | Loss: 1.794 | 674 ms/step , 58360.86 GFLOP/s , 533201.8 tokens/s INFO:__main__:2024-10-26 21:00:53 | Epoch: 0 | Step: 13260 | Dataset: 0-10609000 | Loss: 1.757 | 677 ms/step , 58093.80 GFLOP/s , 532063.2 tokens/s INFO:__main__:2024-10-26 21:01:00 | Epoch: 0 | Step: 13270 | Dataset: 
0-10617000 | Loss: 1.758 | 676 ms/step , 58175.39 GFLOP/s , 531756.3 tokens/s INFO:__main__:2024-10-26 21:01:08 | Epoch: 0 | Step: 13280 | Dataset: 0-10625000 | Loss: 1.768 | 677 ms/step , 58073.03 GFLOP/s , 530998.8 tokens/s INFO:__main__:2024-10-26 21:01:16 | Epoch: 0 | Step: 13290 | Dataset: 0-10633000 | Loss: 1.726 | 678 ms/step , 57946.97 GFLOP/s , 529409.1 tokens/s INFO:__main__:2024-10-26 21:01:24 | Epoch: 0 | Step: 13300 | Dataset: 0-10641000 | Loss: 1.778 | 676 ms/step , 58151.84 GFLOP/s , 531432.8 tokens/s INFO:__main__:2024-10-26 21:01:31 | Epoch: 0 | Step: 13310 | Dataset: 0-10649000 | Loss: 1.779 | 675 ms/step , 58230.32 GFLOP/s , 531804.6 tokens/s INFO:__main__:2024-10-26 21:01:39 | Epoch: 0 | Step: 13320 | Dataset: 0-10657000 | Loss: 1.748 | 687 ms/step , 57206.56 GFLOP/s , 530830.8 tokens/s INFO:__main__:2024-10-26 21:01:47 | Epoch: 0 | Step: 13330 | Dataset: 0-10665000 | Loss: 2.755 | 687 ms/step , 57251.04 GFLOP/s , 523099.6 tokens/s INFO:__main__:2024-10-26 21:01:55 | Epoch: 0 | Step: 13340 | Dataset: 0-10673000 | Loss: 2.419 | 688 ms/step , 57169.25 GFLOP/s , 523432.7 tokens/s INFO:__main__:2024-10-26 21:02:03 | Epoch: 0 | Step: 13350 | Dataset: 0-10681000 | Loss: 2.365 | 686 ms/step , 57299.92 GFLOP/s , 522961.7 tokens/s INFO:__main__:2024-10-26 21:02:10 | Epoch: 0 | Step: 13360 | Dataset: 0-10689000 | Loss: 2.407 | 687 ms/step , 57247.06 GFLOP/s , 523223.0 tokens/s INFO:__main__:2024-10-26 21:02:18 | Epoch: 0 | Step: 13370 | Dataset: 0-10697000 | Loss: 2.329 | 674 ms/step , 58330.02 GFLOP/s , 527339.4 tokens/s INFO:__main__:2024-10-26 21:02:26 | Epoch: 0 | Step: 13380 | Dataset: 0-10705000 | Loss: 2.318 | 675 ms/step , 58240.50 GFLOP/s , 533385.5 tokens/s INFO:__main__:2024-10-26 21:02:33 | Epoch: 0 | Step: 13390 | Dataset: 0-10713000 | Loss: 2.287 | 674 ms/step , 58336.42 GFLOP/s , 532933.8 tokens/s INFO:__main__:2024-10-26 21:02:41 | Epoch: 0 | Step: 13400 | Dataset: 0-10721000 | Loss: 2.318 | 674 ms/step , 58348.51 GFLOP/s , 533720.8 
tokens/s INFO:__main__:2024-10-26 21:02:49 | Epoch: 0 | Step: 13410 | Dataset: 0-10729000 | Loss: 2.402 | 680 ms/step , 57824.28 GFLOP/s , 530301.5 tokens/s INFO:__main__:2024-10-26 21:02:57 | Epoch: 0 | Step: 13420 | Dataset: 0-10737000 | Loss: 2.285 | 679 ms/step , 57870.47 GFLOP/s , 529276.3 tokens/s INFO:__main__:2024-10-26 21:03:04 | Epoch: 0 | Step: 13430 | Dataset: 0-10745000 | Loss: 2.330 | 678 ms/step , 57963.62 GFLOP/s , 529957.0 tokens/s INFO:__main__:2024-10-26 21:03:12 | Epoch: 0 | Step: 13440 | Dataset: 0-10753000 | Loss: 2.311 | 678 ms/step , 58019.39 GFLOP/s , 530856.0 tokens/s INFO:__main__:2024-10-26 21:03:20 | Epoch: 0 | Step: 13450 | Dataset: 0-10761000 | Loss: 2.158 | 678 ms/step , 57950.57 GFLOP/s , 530189.7 tokens/s INFO:__main__:2024-10-26 21:03:28 | Epoch: 0 | Step: 13460 | Dataset: 0-10769000 | Loss: 2.233 | 680 ms/step , 57803.45 GFLOP/s , 530032.9 tokens/s INFO:__main__:2024-10-26 21:03:35 | Epoch: 0 | Step: 13470 | Dataset: 0-10777000 | Loss: 2.281 | 684 ms/step , 57440.27 GFLOP/s , 526383.3 tokens/s INFO:__main__:2024-10-26 21:03:43 | Epoch: 0 | Step: 13480 | Dataset: 0-10785000 | Loss: 2.310 | 674 ms/step , 58316.46 GFLOP/s , 525197.0 tokens/s INFO:__main__:2024-10-26 21:03:51 | Epoch: 0 | Step: 13490 | Dataset: 0-10793000 | Loss: 2.202 | 681 ms/step , 57687.80 GFLOP/s , 526126.7 tokens/s INFO:__main__:2024-10-26 21:03:59 | Epoch: 0 | Step: 13500 | Dataset: 0-10801000 | Loss: 1.815 | 678 ms/step , 58006.19 GFLOP/s , 526708.3 tokens/s INFO:__main__:2024-10-26 21:04:06 | Epoch: 0 | Step: 13510 | Dataset: 0-10809000 | Loss: 1.800 | 676 ms/step , 58185.04 GFLOP/s , 526611.1 tokens/s INFO:__main__:2024-10-26 21:04:14 | Epoch: 0 | Step: 13520 | Dataset: 0-10817000 | Loss: 1.790 | 678 ms/step , 57962.53 GFLOP/s , 526465.5 tokens/s INFO:__main__:2024-10-26 21:04:22 | Epoch: 0 | Step: 13530 | Dataset: 0-10825000 | Loss: 1.780 | 675 ms/step , 58196.87 GFLOP/s , 518647.4 tokens/s INFO:__main__:2024-10-26 21:04:30 | Epoch: 0 | Step: 13540 | 
Dataset: 0-10833000 | Loss: 1.776 | 675 ms/step , 58220.51 GFLOP/s , 520008.2 tokens/s INFO:__main__:2024-10-26 21:04:38 | Epoch: 0 | Step: 13550 | Dataset: 0-10841000 | Loss: 1.744 | 677 ms/step , 58083.62 GFLOP/s , 516154.0 tokens/s INFO:__main__:2024-10-26 21:04:46 | Epoch: 0 | Step: 13560 | Dataset: 0-10849000 | Loss: 1.736 | 674 ms/step , 58289.39 GFLOP/s , 526057.2 tokens/s INFO:__main__:2024-10-26 21:04:53 | Epoch: 0 | Step: 13570 | Dataset: 0-10857000 | Loss: 1.744 | 677 ms/step , 58093.88 GFLOP/s , 526148.4 tokens/s INFO:__main__:2024-10-26 21:05:01 | Epoch: 0 | Step: 13580 | Dataset: 0-10865000 | Loss: 1.762 | 678 ms/step , 58000.72 GFLOP/s , 527957.4 tokens/s INFO:__main__:2024-10-26 21:05:09 | Epoch: 0 | Step: 13590 | Dataset: 0-10873000 | Loss: 1.752 | 677 ms/step , 58094.76 GFLOP/s , 523064.8 tokens/s INFO:__main__:2024-10-26 21:05:17 | Epoch: 0 | Step: 13600 | Dataset: 0-10881000 | Loss: 1.747 | 675 ms/step , 58209.63 GFLOP/s , 529424.5 tokens/s INFO:__main__:2024-10-26 21:05:25 | Epoch: 0 | Step: 13610 | Dataset: 0-10889000 | Loss: 1.753 | 677 ms/step , 58101.93 GFLOP/s , 527607.5 tokens/s INFO:__main__:2024-10-26 21:05:32 | Epoch: 0 | Step: 13620 | Dataset: 0-10897000 | Loss: 1.743 | 676 ms/step , 58164.91 GFLOP/s , 530963.1 tokens/s INFO:__main__:2024-10-26 21:05:40 | Epoch: 0 | Step: 13630 | Dataset: 0-10905000 | Loss: 1.726 | 676 ms/step , 58153.58 GFLOP/s , 530748.6 tokens/s INFO:__main__:2024-10-26 21:05:48 | Epoch: 0 | Step: 13640 | Dataset: 0-10913000 | Loss: 1.725 | 676 ms/step , 58137.40 GFLOP/s , 531270.2 tokens/s INFO:__main__:2024-10-26 21:05:55 | Epoch: 0 | Step: 13650 | Dataset: 0-10921000 | Loss: 1.728 | 678 ms/step , 58004.46 GFLOP/s , 530428.8 tokens/s INFO:__main__:2024-10-26 21:06:03 | Epoch: 0 | Step: 13660 | Dataset: 0-10929000 | Loss: 1.690 | 674 ms/step , 58285.47 GFLOP/s , 540890.6 tokens/s INFO:__main__:2024-10-26 21:06:11 | Epoch: 0 | Step: 13670 | Dataset: 0-10937000 | Loss: 2.539 | 674 ms/step , 58301.81 GFLOP/s , 
531950.7 tokens/s INFO:__main__:2024-10-26 21:06:18 | Epoch: 0 | Step: 13680 | Dataset: 0-10945000 | Loss: 2.429 | 675 ms/step , 58222.69 GFLOP/s , 531774.7 tokens/s INFO:__main__:2024-10-26 21:06:26 | Epoch: 0 | Step: 13690 | Dataset: 0-10953000 | Loss: 2.370 | 674 ms/step , 58314.79 GFLOP/s , 532509.9 tokens/s INFO:__main__:2024-10-26 21:06:34 | Epoch: 0 | Step: 13700 | Dataset: 0-10961000 | Loss: 2.430 | 677 ms/step , 58021.46 GFLOP/s , 530458.0 tokens/s INFO:__main__:2024-10-26 21:06:42 | Epoch: 0 | Step: 13710 | Dataset: 0-10969000 | Loss: 2.286 | 678 ms/step , 57938.23 GFLOP/s , 529939.2 tokens/s INFO:__main__:2024-10-26 21:06:49 | Epoch: 0 | Step: 13720 | Dataset: 0-10977000 | Loss: 2.334 | 676 ms/step , 58177.37 GFLOP/s , 529773.1 tokens/s INFO:__main__:2024-10-26 21:06:57 | Epoch: 0 | Step: 13730 | Dataset: 0-10985000 | Loss: 2.349 | 676 ms/step , 58143.77 GFLOP/s , 528780.4 tokens/s INFO:__main__:2024-10-26 21:07:05 | Epoch: 0 | Step: 13740 | Dataset: 0-10993000 | Loss: 2.293 | 676 ms/step , 58157.00 GFLOP/s , 530267.2 tokens/s INFO:__main__:2024-10-26 21:07:12 | Epoch: 0 | Step: 13750 | Dataset: 0-11001000 | Loss: 2.366 | 675 ms/step , 58230.99 GFLOP/s , 529977.0 tokens/s INFO:__main__:2024-10-26 21:07:20 | Epoch: 0 | Step: 13760 | Dataset: 0-11009000 | Loss: 2.342 | 677 ms/step , 58090.51 GFLOP/s , 529107.0 tokens/s INFO:__main__:2024-10-26 21:07:28 | Epoch: 0 | Step: 13770 | Dataset: 0-11017000 | Loss: 2.279 | 675 ms/step , 58206.88 GFLOP/s , 529255.5 tokens/s INFO:__main__:2024-10-26 21:07:36 | Epoch: 0 | Step: 13780 | Dataset: 0-11025000 | Loss: 2.275 | 674 ms/step , 58297.36 GFLOP/s , 529650.8 tokens/s INFO:__main__:2024-10-26 21:07:43 | Epoch: 0 | Step: 13790 | Dataset: 0-11033000 | Loss: 2.324 | 673 ms/step , 58368.73 GFLOP/s , 530332.0 tokens/s INFO:__main__:2024-10-26 21:07:51 | Epoch: 0 | Step: 13800 | Dataset: 0-11041000 | Loss: 2.341 | 675 ms/step , 58252.11 GFLOP/s , 530261.4 tokens/s INFO:__main__:2024-10-26 21:07:59 | Epoch: 0 | Step: 
13810 | Dataset: 0-11049000 | Loss: 2.325 | 676 ms/step , 58166.74 GFLOP/s , 530036.0 tokens/s INFO:__main__:2024-10-26 21:08:07 | Epoch: 0 | Step: 13820 | Dataset: 0-11057000 | Loss: 2.277 | 676 ms/step , 58163.70 GFLOP/s , 527869.8 tokens/s INFO:__main__:2024-10-26 21:08:14 | Epoch: 0 | Step: 13830 | Dataset: 0-11065000 | Loss: 2.339 | 676 ms/step , 58152.43 GFLOP/s , 527767.6 tokens/s INFO:__main__:2024-10-26 21:08:22 | Epoch: 0 | Step: 13840 | Dataset: 0-11073000 | Loss: 2.233 | 674 ms/step , 58304.03 GFLOP/s , 529776.5 tokens/s INFO:__main__:2024-10-26 21:08:30 | Epoch: 0 | Step: 13850 | Dataset: 0-11081000 | Loss: 2.347 | 676 ms/step , 58145.03 GFLOP/s , 531128.8 tokens/s INFO:__main__:2024-10-26 21:08:38 | Epoch: 0 | Step: 13860 | Dataset: 0-11089000 | Loss: 2.259 | 675 ms/step , 58207.61 GFLOP/s , 527512.5 tokens/s INFO:__main__:2024-10-26 21:08:45 | Epoch: 0 | Step: 13870 | Dataset: 0-11097000 | Loss: 2.259 | 675 ms/step , 58213.69 GFLOP/s , 526354.4 tokens/s INFO:__main__:2024-10-26 21:08:53 | Epoch: 0 | Step: 13880 | Dataset: 0-11105000 | Loss: 2.273 | 675 ms/step , 58262.89 GFLOP/s , 528785.0 tokens/s INFO:__main__:2024-10-26 21:09:01 | Epoch: 0 | Step: 13890 | Dataset: 0-11113000 | Loss: 2.257 | 676 ms/step , 58150.22 GFLOP/s , 528880.9 tokens/s INFO:__main__:2024-10-26 21:09:09 | Epoch: 0 | Step: 13900 | Dataset: 0-11121000 | Loss: 2.279 | 678 ms/step , 58012.61 GFLOP/s , 530449.8 tokens/s INFO:__main__:2024-10-26 21:09:16 | Epoch: 0 | Step: 13910 | Dataset: 0-11129000 | Loss: 2.321 | 676 ms/step , 58137.03 GFLOP/s , 530186.0 tokens/s INFO:__main__:2024-10-26 21:09:24 | Epoch: 0 | Step: 13920 | Dataset: 0-11137000 | Loss: 2.365 | 675 ms/step , 58250.63 GFLOP/s , 530109.2 tokens/s INFO:__main__:2024-10-26 21:09:32 | Epoch: 0 | Step: 13930 | Dataset: 0-11145000 | Loss: 2.243 | 677 ms/step , 58047.60 GFLOP/s , 530803.8 tokens/s INFO:__main__:2024-10-26 21:09:39 | Epoch: 0 | Step: 13940 | Dataset: 0-11153000 | Loss: 2.185 | 675 ms/step , 58236.65 GFLOP/s 
, 531335.7 tokens/s INFO:__main__:2024-10-26 21:09:47 | Epoch: 0 | Step: 13950 | Dataset: 0-11161000 | Loss: 2.221 | 674 ms/step , 58295.64 GFLOP/s , 531038.2 tokens/s INFO:__main__:2024-10-26 21:09:55 | Epoch: 0 | Step: 13960 | Dataset: 0-11169000 | Loss: 2.261 | 677 ms/step , 58084.40 GFLOP/s , 529058.1 tokens/s INFO:__main__:2024-10-26 21:10:03 | Epoch: 0 | Step: 13970 | Dataset: 0-11177000 | Loss: 2.299 | 676 ms/step , 58192.69 GFLOP/s , 530540.0 tokens/s INFO:__main__:2024-10-26 21:10:10 | Epoch: 0 | Step: 13980 | Dataset: 0-11185000 | Loss: 2.261 | 675 ms/step , 58202.73 GFLOP/s , 530267.2 tokens/s INFO:__main__:2024-10-26 21:10:18 | Epoch: 0 | Step: 13990 | Dataset: 0-11193000 | Loss: 2.421 | 676 ms/step , 58167.90 GFLOP/s , 530217.7 tokens/s INFO:__main__:2024-10-26 21:10:25 | Validation | Step: 14000 | Val_loss: 2.313 | Best_val_loss: 2.3601 INFO:__main__:2024-10-26 21:10:25 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_211025_step_14000.pt` INFO:__main__:2024-10-26 21:10:27 | Epoch: 0 | Step: 14000 | Dataset: 0-11201000 | Loss: 2.301 | 674 ms/step , 58314.26 GFLOP/s , 477684.8 tokens/s INFO:__main__:2024-10-26 21:10:34 | Epoch: 0 | Step: 14010 | Dataset: 0-11209000 | Loss: 2.347 | 676 ms/step , 58171.99 GFLOP/s , 529450.7 tokens/s INFO:__main__:2024-10-26 21:10:42 | Epoch: 0 | Step: 14020 | Dataset: 0-11217000 | Loss: 2.358 | 678 ms/step , 57936.26 GFLOP/s , 525873.1 tokens/s INFO:__main__:2024-10-26 21:10:50 | Epoch: 0 | Step: 14030 | Dataset: 0-11225000 | Loss: 2.309 | 677 ms/step , 58066.26 GFLOP/s , 526069.1 tokens/s INFO:__main__:2024-10-26 21:10:58 | Epoch: 0 | Step: 14040 | Dataset: 0-11233000 | Loss: 2.380 | 686 ms/step , 57276.26 GFLOP/s , 525455.3 tokens/s INFO:__main__:2024-10-26 21:11:06 | Epoch: 0 | Step: 14050 | Dataset: 0-11241000 | Loss: 2.404 | 685 ms/step , 57407.09 GFLOP/s , 527843.0 tokens/s INFO:__main__:2024-10-26 21:11:13 | Epoch: 0 | Step: 14060 | Dataset: 0-11249000 | Loss: 2.273 | 678 ms/step , 58000.76 
GFLOP/s , 526438.7 tokens/s INFO:__main__:2024-10-26 21:11:21 | Epoch: 0 | Step: 14070 | Dataset: 0-11257000 | Loss: 2.349 | 676 ms/step , 58149.10 GFLOP/s , 527284.4 tokens/s INFO:__main__:2024-10-26 21:11:29 | Epoch: 0 | Step: 14080 | Dataset: 0-11265000 | Loss: 2.244 | 677 ms/step , 58082.72 GFLOP/s , 529853.9 tokens/s INFO:__main__:2024-10-26 21:11:37 | Epoch: 0 | Step: 14090 | Dataset: 0-11273000 | Loss: 2.280 | 677 ms/step , 58085.09 GFLOP/s , 528559.1 tokens/s INFO:__main__:2024-10-26 21:11:44 | Epoch: 0 | Step: 14100 | Dataset: 0-11281000 | Loss: 2.296 | 677 ms/step , 58078.78 GFLOP/s , 528886.9 tokens/s INFO:__main__:2024-10-26 21:11:52 | Epoch: 0 | Step: 14110 | Dataset: 0-11289000 | Loss: 2.373 | 678 ms/step , 57994.22 GFLOP/s , 528229.0 tokens/s INFO:__main__:2024-10-26 21:12:00 | Epoch: 0 | Step: 14120 | Dataset: 0-11297000 | Loss: 2.349 | 678 ms/step , 57981.01 GFLOP/s , 528966.2 tokens/s INFO:__main__:2024-10-26 21:12:08 | Epoch: 0 | Step: 14130 | Dataset: 0-11305000 | Loss: 2.192 | 676 ms/step , 58113.89 GFLOP/s , 528074.9 tokens/s INFO:__main__:2024-10-26 21:12:15 | Epoch: 0 | Step: 14140 | Dataset: 0-11313000 | Loss: 2.350 | 676 ms/step , 58174.46 GFLOP/s , 527718.2 tokens/s INFO:__main__:2024-10-26 21:12:23 | Epoch: 0 | Step: 14150 | Dataset: 0-11321000 | Loss: 2.377 | 677 ms/step , 58086.42 GFLOP/s , 529336.7 tokens/s INFO:__main__:2024-10-26 21:12:31 | Epoch: 0 | Step: 14160 | Dataset: 0-11329000 | Loss: 2.288 | 675 ms/step , 58192.99 GFLOP/s , 525964.1 tokens/s INFO:__main__:2024-10-26 21:12:39 | Epoch: 0 | Step: 14170 | Dataset: 0-11337000 | Loss: 2.294 | 678 ms/step , 58006.71 GFLOP/s , 528439.1 tokens/s INFO:__main__:2024-10-26 21:12:46 | Epoch: 0 | Step: 14180 | Dataset: 0-11345000 | Loss: 2.305 | 676 ms/step , 58120.24 GFLOP/s , 528662.7 tokens/s INFO:__main__:2024-10-26 21:12:54 | Epoch: 0 | Step: 14190 | Dataset: 0-11353000 | Loss: 2.326 | 676 ms/step , 58125.72 GFLOP/s , 530120.6 tokens/s INFO:__main__:2024-10-26 21:13:02 | Epoch: 0 | 
Step: 14200 | Dataset: 0-11361000 | Loss: 2.263 | 677 ms/step , 58091.16 GFLOP/s , 527569.7 tokens/s INFO:__main__:2024-10-26 21:13:10 | Epoch: 0 | Step: 14210 | Dataset: 0-11369000 | Loss: 2.273 | 680 ms/step , 57826.40 GFLOP/s , 528449.2 tokens/s INFO:__main__:2024-10-26 21:13:17 | Epoch: 0 | Step: 14220 | Dataset: 0-11377000 | Loss: 2.358 | 678 ms/step , 57954.77 GFLOP/s , 527063.4 tokens/s INFO:__main__:2024-10-26 21:13:25 | Epoch: 0 | Step: 14230 | Dataset: 0-11385000 | Loss: 2.238 | 677 ms/step , 58062.67 GFLOP/s , 530831.2 tokens/s INFO:__main__:2024-10-26 21:13:33 | Epoch: 0 | Step: 14240 | Dataset: 0-11393000 | Loss: 2.278 | 677 ms/step , 58060.99 GFLOP/s , 531222.8 tokens/s INFO:__main__:2024-10-26 21:13:41 | Epoch: 0 | Step: 14250 | Dataset: 0-11401000 | Loss: 2.296 | 679 ms/step , 57902.64 GFLOP/s , 528691.6 tokens/s INFO:__main__:2024-10-26 21:13:48 | Epoch: 0 | Step: 14260 | Dataset: 0-11409000 | Loss: 2.217 | 678 ms/step , 57936.43 GFLOP/s , 526852.4 tokens/s INFO:__main__:2024-10-26 21:13:56 | Epoch: 0 | Step: 14270 | Dataset: 0-11417000 | Loss: 2.320 | 676 ms/step , 58117.30 GFLOP/s , 527511.0 tokens/s INFO:__main__:2024-10-26 21:14:04 | Epoch: 0 | Step: 14280 | Dataset: 0-11425000 | Loss: 2.299 | 677 ms/step , 58094.87 GFLOP/s , 527862.3 tokens/s INFO:__main__:2024-10-26 21:14:12 | Epoch: 0 | Step: 14290 | Dataset: 0-11433000 | Loss: 2.294 | 677 ms/step , 58084.80 GFLOP/s , 527624.8 tokens/s INFO:__main__:2024-10-26 21:14:19 | Epoch: 0 | Step: 14300 | Dataset: 0-11441000 | Loss: 2.205 | 676 ms/step , 58142.95 GFLOP/s , 528467.4 tokens/s INFO:__main__:2024-10-26 21:14:27 | Epoch: 0 | Step: 14310 | Dataset: 0-11449000 | Loss: 2.107 | 678 ms/step , 57963.54 GFLOP/s , 528026.2 tokens/s INFO:__main__:2024-10-26 21:14:35 | Epoch: 0 | Step: 14320 | Dataset: 0-11457000 | Loss: 1.881 | 677 ms/step , 58089.55 GFLOP/s , 528844.2 tokens/s INFO:__main__:2024-10-26 21:14:43 | Epoch: 0 | Step: 14330 | Dataset: 0-11465000 | Loss: 1.815 | 677 ms/step , 58092.01 
GFLOP/s , 528821.1 tokens/s INFO:__main__:2024-10-26 21:14:50 | Epoch: 0 | Step: 14340 | Dataset: 0-11473000 | Loss: 1.820 | 677 ms/step , 58033.22 GFLOP/s , 528958.0 tokens/s INFO:__main__:2024-10-26 21:14:58 | Epoch: 0 | Step: 14350 | Dataset: 0-11481000 | Loss: 1.812 | 675 ms/step , 58193.97 GFLOP/s , 526502.6 tokens/s INFO:__main__:2024-10-26 21:15:06 | Epoch: 0 | Step: 14360 | Dataset: 0-11489000 | Loss: 1.783 | 677 ms/step , 58105.02 GFLOP/s , 528631.3 tokens/s INFO:__main__:2024-10-26 21:15:14 | Epoch: 0 | Step: 14370 | Dataset: 0-11497000 | Loss: 1.759 | 675 ms/step , 58272.47 GFLOP/s , 527185.6 tokens/s INFO:__main__:2024-10-26 21:15:21 | Epoch: 0 | Step: 14380 | Dataset: 0-11505000 | Loss: 1.781 | 677 ms/step , 58046.25 GFLOP/s , 528102.9 tokens/s INFO:__main__:2024-10-26 21:15:29 | Epoch: 0 | Step: 14390 | Dataset: 0-11513000 | Loss: 1.767 | 676 ms/step , 58162.15 GFLOP/s , 529114.0 tokens/s INFO:__main__:2024-10-26 21:15:37 | Epoch: 0 | Step: 14400 | Dataset: 0-11521000 | Loss: 2.474 | 677 ms/step , 58087.66 GFLOP/s , 529224.0 tokens/s INFO:__main__:2024-10-26 21:15:45 | Epoch: 0 | Step: 14410 | Dataset: 0-11529000 | Loss: 2.400 | 676 ms/step , 58152.60 GFLOP/s , 529803.3 tokens/s INFO:__main__:2024-10-26 21:15:52 | Epoch: 0 | Step: 14420 | Dataset: 0-11537000 | Loss: 2.262 | 677 ms/step , 58079.67 GFLOP/s , 529321.7 tokens/s INFO:__main__:2024-10-26 21:16:00 | Epoch: 0 | Step: 14430 | Dataset: 0-11545000 | Loss: 2.416 | 676 ms/step , 58146.18 GFLOP/s , 528839.6 tokens/s INFO:__main__:2024-10-26 21:16:08 | Epoch: 0 | Step: 14440 | Dataset: 0-11553000 | Loss: 2.346 | 677 ms/step , 58054.10 GFLOP/s , 528086.6 tokens/s INFO:__main__:2024-10-26 21:16:16 | Epoch: 0 | Step: 14450 | Dataset: 0-11561000 | Loss: 2.402 | 678 ms/step , 57971.80 GFLOP/s , 528188.2 tokens/s INFO:__main__:2024-10-26 21:16:23 | Epoch: 0 | Step: 14460 | Dataset: 0-11569000 | Loss: 2.317 | 679 ms/step , 57902.39 GFLOP/s , 527284.3 tokens/s INFO:__main__:2024-10-26 21:16:31 | Epoch: 0 | 
Step: 14470 | Dataset: 0-11577000 | Loss: 2.301 | 677 ms/step , 58102.18 GFLOP/s , 528008.8 tokens/s INFO:__main__:2024-10-26 21:16:39 | Epoch: 0 | Step: 14480 | Dataset: 0-11585000 | Loss: 2.342 | 676 ms/step , 58129.80 GFLOP/s , 528546.3 tokens/s INFO:__main__:2024-10-26 21:16:47 | Epoch: 0 | Step: 14490 | Dataset: 0-11593000 | Loss: 2.374 | 676 ms/step , 58140.26 GFLOP/s , 528733.1 tokens/s INFO:__main__:2024-10-26 21:16:54 | Epoch: 0 | Step: 14500 | Dataset: 0-11601000 | Loss: 2.351 | 675 ms/step , 58194.52 GFLOP/s , 527479.8 tokens/s INFO:__main__:2024-10-26 21:17:02 | Epoch: 0 | Step: 14510 | Dataset: 0-11609000 | Loss: 2.388 | 678 ms/step , 57949.42 GFLOP/s , 526015.6 tokens/s INFO:__main__:2024-10-26 21:17:10 | Epoch: 0 | Step: 14520 | Dataset: 0-11617000 | Loss: 2.274 | 675 ms/step , 58199.82 GFLOP/s , 529184.4 tokens/s INFO:__main__:2024-10-26 21:17:18 | Epoch: 0 | Step: 14530 | Dataset: 0-11625000 | Loss: 2.337 | 677 ms/step , 58103.43 GFLOP/s , 527879.5 tokens/s INFO:__main__:2024-10-26 21:17:25 | Epoch: 0 | Step: 14540 | Dataset: 0-11633000 | Loss: 2.388 | 675 ms/step , 58218.85 GFLOP/s , 529256.7 tokens/s INFO:__main__:2024-10-26 21:17:33 | Epoch: 0 | Step: 14550 | Dataset: 0-11641000 | Loss: 2.276 | 675 ms/step , 58231.07 GFLOP/s , 530928.6 tokens/s INFO:__main__:2024-10-26 21:17:41 | Epoch: 0 | Step: 14560 | Dataset: 0-11649000 | Loss: 2.361 | 676 ms/step , 58147.39 GFLOP/s , 528997.2 tokens/s INFO:__main__:2024-10-26 21:17:49 | Epoch: 0 | Step: 14570 | Dataset: 0-11657000 | Loss: 2.435 | 676 ms/step , 58155.39 GFLOP/s , 529875.5 tokens/s INFO:__main__:2024-10-26 21:17:56 | Epoch: 0 | Step: 14580 | Dataset: 0-11665000 | Loss: 2.309 | 676 ms/step , 58126.83 GFLOP/s , 530237.0 tokens/s INFO:__main__:2024-10-26 21:18:04 | Epoch: 0 | Step: 14590 | Dataset: 0-11673000 | Loss: 2.302 | 676 ms/step , 58172.11 GFLOP/s , 531067.9 tokens/s INFO:__main__:2024-10-26 21:18:12 | Epoch: 0 | Step: 14600 | Dataset: 0-11681000 | Loss: 2.319 | 674 ms/step , 58302.94 
GFLOP/s , 531365.3 tokens/s INFO:__main__:2024-10-26 21:18:19 | Epoch: 0 | Step: 14610 | Dataset: 0-11689000 | Loss: 2.269 | 674 ms/step , 58282.70 GFLOP/s , 531709.6 tokens/s INFO:__main__:2024-10-26 21:18:27 | Epoch: 0 | Step: 14620 | Dataset: 0-11697000 | Loss: 2.328 | 675 ms/step , 58247.90 GFLOP/s , 531143.9 tokens/s INFO:__main__:2024-10-26 21:18:35 | Epoch: 0 | Step: 14630 | Dataset: 0-11705000 | Loss: 2.253 | 675 ms/step , 58241.68 GFLOP/s , 531222.6 tokens/s INFO:__main__:2024-10-26 21:18:43 | Epoch: 0 | Step: 14640 | Dataset: 0-11713000 | Loss: 2.270 | 675 ms/step , 58252.22 GFLOP/s , 531535.3 tokens/s INFO:__main__:2024-10-26 21:18:50 | Epoch: 0 | Step: 14650 | Dataset: 0-11721000 | Loss: 2.309 | 676 ms/step , 58155.41 GFLOP/s , 529192.9 tokens/s INFO:__main__:2024-10-26 21:18:58 | Epoch: 0 | Step: 14660 | Dataset: 0-11729000 | Loss: 2.256 | 676 ms/step , 58148.76 GFLOP/s , 530473.9 tokens/s INFO:__main__:2024-10-26 21:19:06 | Epoch: 0 | Step: 14670 | Dataset: 0-11737000 | Loss: 2.277 | 674 ms/step , 58281.38 GFLOP/s , 531382.4 tokens/s INFO:__main__:2024-10-26 21:19:13 | Epoch: 0 | Step: 14680 | Dataset: 0-11745000 | Loss: 2.283 | 675 ms/step , 58273.61 GFLOP/s , 531831.5 tokens/s INFO:__main__:2024-10-26 21:19:21 | Epoch: 0 | Step: 14690 | Dataset: 0-11753000 | Loss: 2.152 | 676 ms/step , 58192.34 GFLOP/s , 531552.3 tokens/s INFO:__main__:2024-10-26 21:19:29 | Epoch: 0 | Step: 14700 | Dataset: 0-11761000 | Loss: 2.265 | 676 ms/step , 58133.65 GFLOP/s , 532128.1 tokens/s INFO:__main__:2024-10-26 21:19:37 | Epoch: 0 | Step: 14710 | Dataset: 0-11769000 | Loss: 2.238 | 678 ms/step , 57994.43 GFLOP/s , 531841.2 tokens/s INFO:__main__:2024-10-26 21:19:44 | Epoch: 0 | Step: 14720 | Dataset: 0-11777000 | Loss: 2.887 | 676 ms/step , 58157.43 GFLOP/s , 531897.4 tokens/s INFO:__main__:2024-10-26 21:19:52 | Epoch: 0 | Step: 14730 | Dataset: 0-11785000 | Loss: 2.749 | 674 ms/step , 58283.93 GFLOP/s , 532894.5 tokens/s INFO:__main__:2024-10-26 21:20:00 | Epoch: 0 | 
Step: 14740 | Dataset: 0-11793000 | Loss: 2.715 | 679 ms/step , 57911.73 GFLOP/s , 532710.9 tokens/s INFO:__main__:2024-10-26 21:20:07 | Epoch: 0 | Step: 14750 | Dataset: 0-11801000 | Loss: 2.695 | 677 ms/step , 58076.61 GFLOP/s , 531726.1 tokens/s INFO:__main__:2024-10-26 21:20:15 | Epoch: 0 | Step: 14760 | Dataset: 0-11809000 | Loss: 2.691 | 676 ms/step , 58145.99 GFLOP/s , 531887.5 tokens/s INFO:__main__:2024-10-26 21:20:23 | Epoch: 0 | Step: 14770 | Dataset: 0-11817000 | Loss: 2.595 | 677 ms/step , 58073.79 GFLOP/s , 532095.0 tokens/s INFO:__main__:2024-10-26 21:20:30 | Epoch: 0 | Step: 14780 | Dataset: 0-11825000 | Loss: 2.620 | 677 ms/step , 58085.79 GFLOP/s , 532275.8 tokens/s INFO:__main__:2024-10-26 21:20:38 | Epoch: 0 | Step: 14790 | Dataset: 0-11833000 | Loss: 2.637 | 677 ms/step , 58037.15 GFLOP/s , 532246.7 tokens/s INFO:__main__:2024-10-26 21:20:46 | Epoch: 0 | Step: 14800 | Dataset: 0-11841000 | Loss: 2.600 | 677 ms/step , 58080.77 GFLOP/s , 531741.9 tokens/s INFO:__main__:2024-10-26 21:20:54 | Epoch: 0 | Step: 14810 | Dataset: 0-11849000 | Loss: 2.624 | 675 ms/step , 58244.56 GFLOP/s , 531422.4 tokens/s INFO:__main__:2024-10-26 21:21:01 | Epoch: 0 | Step: 14820 | Dataset: 0-11857000 | Loss: 2.577 | 677 ms/step , 58103.91 GFLOP/s , 532019.6 tokens/s INFO:__main__:2024-10-26 21:21:09 | Epoch: 0 | Step: 14830 | Dataset: 0-11865000 | Loss: 2.547 | 676 ms/step , 58158.97 GFLOP/s , 532019.1 tokens/s INFO:__main__:2024-10-26 21:21:17 | Epoch: 0 | Step: 14840 | Dataset: 0-11873000 | Loss: 2.575 | 675 ms/step , 58243.53 GFLOP/s , 531656.5 tokens/s INFO:__main__:2024-10-26 21:21:24 | Epoch: 0 | Step: 14850 | Dataset: 0-11881000 | Loss: 2.609 | 674 ms/step , 58307.83 GFLOP/s , 531507.0 tokens/s INFO:__main__:2024-10-26 21:21:32 | Epoch: 0 | Step: 14860 | Dataset: 0-11889000 | Loss: 2.630 | 678 ms/step , 58005.39 GFLOP/s , 531436.8 tokens/s INFO:__main__:2024-10-26 21:21:40 | Epoch: 0 | Step: 14870 | Dataset: 0-11897000 | Loss: 2.595 | 678 ms/step , 57956.59 
GFLOP/s , 531233.9 tokens/s INFO:__main__:2024-10-26 21:21:47 | Epoch: 0 | Step: 14880 | Dataset: 0-11905000 | Loss: 2.451 | 678 ms/step , 57998.38 GFLOP/s , 531651.7 tokens/s INFO:__main__:2024-10-26 21:21:55 | Epoch: 0 | Step: 14890 | Dataset: 0-11913000 | Loss: 2.427 | 676 ms/step , 58158.46 GFLOP/s , 531040.3 tokens/s INFO:__main__:2024-10-26 21:22:03 | Epoch: 0 | Step: 14900 | Dataset: 0-11921000 | Loss: 2.312 | 677 ms/step , 58057.25 GFLOP/s , 531247.7 tokens/s INFO:__main__:2024-10-26 21:22:11 | Epoch: 0 | Step: 14910 | Dataset: 0-11929000 | Loss: 2.434 | 675 ms/step , 58228.63 GFLOP/s , 531905.1 tokens/s INFO:__main__:2024-10-26 21:22:18 | Epoch: 0 | Step: 14920 | Dataset: 0-11937000 | Loss: 2.400 | 675 ms/step , 58261.01 GFLOP/s , 531915.1 tokens/s INFO:__main__:2024-10-26 21:22:26 | Epoch: 0 | Step: 14930 | Dataset: 0-11945000 | Loss: 2.296 | 677 ms/step , 58085.26 GFLOP/s , 531145.4 tokens/s INFO:__main__:2024-10-26 21:22:34 | Epoch: 0 | Step: 14940 | Dataset: 0-11953000 | Loss: 2.403 | 678 ms/step , 58016.20 GFLOP/s , 530051.5 tokens/s INFO:__main__:2024-10-26 21:22:41 | Epoch: 0 | Step: 14950 | Dataset: 0-11961000 | Loss: 2.312 | 677 ms/step , 58094.64 GFLOP/s , 530220.5 tokens/s INFO:__main__:2024-10-26 21:22:49 | Epoch: 0 | Step: 14960 | Dataset: 0-11969000 | Loss: 2.384 | 674 ms/step , 58294.39 GFLOP/s , 531565.9 tokens/s INFO:__main__:2024-10-26 21:22:57 | Epoch: 0 | Step: 14970 | Dataset: 0-11977000 | Loss: 2.229 | 675 ms/step , 58251.11 GFLOP/s , 532041.3 tokens/s INFO:__main__:2024-10-26 21:23:05 | Epoch: 0 | Step: 14980 | Dataset: 0-11985000 | Loss: 2.320 | 676 ms/step , 58188.49 GFLOP/s , 531926.7 tokens/s INFO:__main__:2024-10-26 21:23:12 | Epoch: 0 | Step: 14990 | Dataset: 0-11993000 | Loss: 2.281 | 675 ms/step , 58207.85 GFLOP/s , 531398.6 tokens/s INFO:__main__:2024-10-26 21:23:20 | Validation | Step: 15000 | Val_loss: 2.263 | Best_val_loss: 2.3135 INFO:__main__:2024-10-26 21:23:20 | Saving checkpoint to 
`/root/autodl-tmp/checkpoint/checkpoint_20241026_212320_step_15000.pt` INFO:__main__:2024-10-26 21:23:21 | Epoch: 0 | Step: 15000 | Dataset: 0-12001000 | Loss: 2.343 | 675 ms/step , 58268.29 GFLOP/s , 477472.8 tokens/s INFO:__main__:2024-10-26 21:23:29 | Epoch: 0 | Step: 15010 | Dataset: 0-12009000 | Loss: 2.319 | 676 ms/step , 58143.82 GFLOP/s , 531783.2 tokens/s INFO:__main__:2024-10-26 21:23:36 | Epoch: 0 | Step: 15020 | Dataset: 0-12017000 | Loss: 2.262 | 675 ms/step , 58198.79 GFLOP/s , 531977.1 tokens/s INFO:__main__:2024-10-26 21:23:44 | Epoch: 0 | Step: 15030 | Dataset: 0-12025000 | Loss: 2.292 | 677 ms/step , 58085.71 GFLOP/s , 531095.8 tokens/s INFO:__main__:2024-10-26 21:23:52 | Epoch: 0 | Step: 15040 | Dataset: 0-12033000 | Loss: 2.402 | 675 ms/step , 58221.51 GFLOP/s , 531376.9 tokens/s INFO:__main__:2024-10-26 21:23:59 | Epoch: 0 | Step: 15050 | Dataset: 0-12041000 | Loss: 2.252 | 677 ms/step , 58103.83 GFLOP/s , 530256.4 tokens/s INFO:__main__:2024-10-26 21:24:07 | Epoch: 0 | Step: 15060 | Dataset: 0-12049000 | Loss: 2.286 | 682 ms/step , 57679.88 GFLOP/s , 529528.8 tokens/s INFO:__main__:2024-10-26 21:24:15 | Epoch: 0 | Step: 15070 | Dataset: 0-12057000 | Loss: 2.242 | 676 ms/step , 58111.37 GFLOP/s , 529250.2 tokens/s INFO:__main__:2024-10-26 21:24:23 | Epoch: 0 | Step: 15080 | Dataset: 0-12065000 | Loss: 2.354 | 676 ms/step , 58146.58 GFLOP/s , 529770.0 tokens/s INFO:__main__:2024-10-26 21:24:30 | Epoch: 0 | Step: 15090 | Dataset: 0-12073000 | Loss: 2.262 | 675 ms/step , 58220.93 GFLOP/s , 528305.5 tokens/s INFO:__main__:2024-10-26 21:24:38 | Epoch: 0 | Step: 15100 | Dataset: 0-12081000 | Loss: 2.296 | 675 ms/step , 58221.83 GFLOP/s , 529960.4 tokens/s INFO:__main__:2024-10-26 21:24:46 | Epoch: 0 | Step: 15110 | Dataset: 0-12089000 | Loss: 2.266 | 676 ms/step , 58183.09 GFLOP/s , 529805.8 tokens/s INFO:__main__:2024-10-26 21:24:54 | Epoch: 0 | Step: 15120 | Dataset: 0-12097000 | Loss: 2.251 | 676 ms/step , 58144.92 GFLOP/s , 529986.1 tokens/s 
INFO:__main__:2024-10-26 21:25:01 | Epoch: 0 | Step: 15130 | Dataset: 0-12105000 | Loss: 2.311 | 675 ms/step , 58216.07 GFLOP/s , 529903.0 tokens/s INFO:__main__:2024-10-26 21:25:09 | Epoch: 0 | Step: 15140 | Dataset: 0-12113000 | Loss: 2.248 | 677 ms/step , 58076.84 GFLOP/s , 530286.4 tokens/s INFO:__main__:2024-10-26 21:25:17 | Epoch: 0 | Step: 15150 | Dataset: 0-12121000 | Loss: 2.188 | 676 ms/step , 58187.68 GFLOP/s , 530933.1 tokens/s INFO:__main__:2024-10-26 21:25:24 | Epoch: 0 | Step: 15160 | Dataset: 0-12129000 | Loss: 2.248 | 680 ms/step , 57794.16 GFLOP/s , 530628.7 tokens/s INFO:__main__:2024-10-26 21:25:32 | Epoch: 0 | Step: 15170 | Dataset: 0-12137000 | Loss: 2.320 | 676 ms/step , 58122.89 GFLOP/s , 530720.3 tokens/s INFO:__main__:2024-10-26 21:25:40 | Epoch: 0 | Step: 15180 | Dataset: 0-12145000 | Loss: 2.245 | 675 ms/step , 58207.03 GFLOP/s , 531233.7 tokens/s INFO:__main__:2024-10-26 21:25:48 | Epoch: 0 | Step: 15190 | Dataset: 0-12153000 | Loss: 2.247 | 676 ms/step , 58182.42 GFLOP/s , 531016.6 tokens/s INFO:__main__:2024-10-26 21:25:55 | Epoch: 0 | Step: 15200 | Dataset: 0-12161000 | Loss: 2.439 | 678 ms/step , 57940.34 GFLOP/s , 530547.5 tokens/s INFO:__main__:2024-10-26 21:26:03 | Epoch: 0 | Step: 15210 | Dataset: 0-12169000 | Loss: 2.341 | 677 ms/step , 58106.40 GFLOP/s , 529602.5 tokens/s INFO:__main__:2024-10-26 21:26:11 | Epoch: 0 | Step: 15220 | Dataset: 0-12177000 | Loss: 2.354 | 675 ms/step , 58230.28 GFLOP/s , 530013.2 tokens/s INFO:__main__:2024-10-26 21:26:19 | Epoch: 0 | Step: 15230 | Dataset: 0-12185000 | Loss: 2.346 | 676 ms/step , 58164.93 GFLOP/s , 529428.1 tokens/s INFO:__main__:2024-10-26 21:26:26 | Epoch: 0 | Step: 15240 | Dataset: 0-12193000 | Loss: 2.334 | 676 ms/step , 58162.41 GFLOP/s , 530192.4 tokens/s INFO:__main__:2024-10-26 21:26:34 | Epoch: 0 | Step: 15250 | Dataset: 0-12201000 | Loss: 2.325 | 675 ms/step , 58265.93 GFLOP/s , 531106.9 tokens/s INFO:__main__:2024-10-26 21:26:42 | Epoch: 0 | Step: 15260 | Dataset: 
0-12209000 | Loss: 2.360 | 676 ms/step , 58180.92 GFLOP/s , 531060.5 tokens/s INFO:__main__:2024-10-26 21:26:49 | Epoch: 0 | Step: 15270 | Dataset: 0-12217000 | Loss: 2.345 | 676 ms/step , 58154.27 GFLOP/s , 529835.8 tokens/s INFO:__main__:2024-10-26 21:26:57 | Epoch: 0 | Step: 15280 | Dataset: 0-12225000 | Loss: 2.363 | 675 ms/step , 58196.64 GFLOP/s , 530045.7 tokens/s INFO:__main__:2024-10-26 21:27:05 | Epoch: 0 | Step: 15290 | Dataset: 0-12233000 | Loss: 2.398 | 675 ms/step , 58219.85 GFLOP/s , 530017.3 tokens/s INFO:__main__:2024-10-26 21:27:13 | Epoch: 0 | Step: 15300 | Dataset: 0-12241000 | Loss: 2.260 | 675 ms/step , 58263.48 GFLOP/s , 530349.0 tokens/s INFO:__main__:2024-10-26 21:27:20 | Epoch: 0 | Step: 15310 | Dataset: 0-12249000 | Loss: 2.248 | 676 ms/step , 58129.97 GFLOP/s , 529365.1 tokens/s INFO:__main__:2024-10-26 21:27:28 | Epoch: 0 | Step: 15320 | Dataset: 0-12257000 | Loss: 2.245 | 678 ms/step , 57988.65 GFLOP/s , 531176.5 tokens/s INFO:__main__:2024-10-26 21:27:36 | Epoch: 0 | Step: 15330 | Dataset: 0-12265000 | Loss: 2.334 | 677 ms/step , 58024.56 GFLOP/s , 528740.7 tokens/s INFO:__main__:2024-10-26 21:27:43 | Epoch: 0 | Step: 15340 | Dataset: 0-12273000 | Loss: 2.345 | 677 ms/step , 58082.52 GFLOP/s , 530460.8 tokens/s INFO:__main__:2024-10-26 21:27:51 | Epoch: 0 | Step: 15350 | Dataset: 0-12281000 | Loss: 2.277 | 676 ms/step , 58176.54 GFLOP/s , 529732.7 tokens/s INFO:__main__:2024-10-26 21:27:59 | Epoch: 0 | Step: 15360 | Dataset: 0-12289000 | Loss: 2.298 | 673 ms/step , 58378.60 GFLOP/s , 532491.5 tokens/s INFO:__main__:2024-10-26 21:28:07 | Epoch: 0 | Step: 15370 | Dataset: 0-12297000 | Loss: 2.350 | 674 ms/step , 58290.59 GFLOP/s , 532398.3 tokens/s INFO:__main__:2024-10-26 21:28:14 | Epoch: 0 | Step: 15380 | Dataset: 0-12305000 | Loss: 2.285 | 675 ms/step , 58203.69 GFLOP/s , 532310.4 tokens/s INFO:__main__:2024-10-26 21:28:22 | Epoch: 0 | Step: 15390 | Dataset: 0-12313000 | Loss: 2.269 | 674 ms/step , 58337.05 GFLOP/s , 532609.4 
tokens/s INFO:__main__:2024-10-26 21:28:30 | Epoch: 0 | Step: 15400 | Dataset: 0-12321000 | Loss: 2.310 | 675 ms/step , 58235.49 GFLOP/s , 532325.5 tokens/s INFO:__main__:2024-10-26 21:28:37 | Epoch: 0 | Step: 15410 | Dataset: 0-12329000 | Loss: 2.391 | 675 ms/step , 58223.89 GFLOP/s , 530262.9 tokens/s INFO:__main__:2024-10-26 21:28:45 | Epoch: 0 | Step: 15420 | Dataset: 0-12337000 | Loss: 2.268 | 675 ms/step , 58196.68 GFLOP/s , 531589.6 tokens/s INFO:__main__:2024-10-26 21:28:53 | Epoch: 0 | Step: 15430 | Dataset: 0-12345000 | Loss: 2.224 | 676 ms/step , 58143.11 GFLOP/s , 531148.1 tokens/s INFO:__main__:2024-10-26 21:29:01 | Epoch: 0 | Step: 15440 | Dataset: 0-12353000 | Loss: 2.226 | 676 ms/step , 58186.06 GFLOP/s , 531443.2 tokens/s INFO:__main__:2024-10-26 21:29:08 | Epoch: 0 | Step: 15450 | Dataset: 0-12361000 | Loss: 2.243 | 676 ms/step , 58163.32 GFLOP/s , 531353.6 tokens/s INFO:__main__:2024-10-26 21:29:16 | Epoch: 0 | Step: 15460 | Dataset: 0-12369000 | Loss: 2.265 | 676 ms/step , 58138.99 GFLOP/s , 528434.6 tokens/s INFO:__main__:2024-10-26 21:29:24 | Epoch: 0 | Step: 15470 | Dataset: 0-12377000 | Loss: 2.340 | 683 ms/step , 57516.51 GFLOP/s , 528946.5 tokens/s INFO:__main__:2024-10-26 21:29:32 | Epoch: 0 | Step: 15480 | Dataset: 0-12385000 | Loss: 2.223 | 678 ms/step , 57939.28 GFLOP/s , 526884.3 tokens/s INFO:__main__:2024-10-26 21:29:39 | Epoch: 0 | Step: 15490 | Dataset: 0-12393000 | Loss: 2.251 | 680 ms/step , 57798.38 GFLOP/s , 529750.5 tokens/s INFO:__main__:2024-10-26 21:29:47 | Epoch: 0 | Step: 15500 | Dataset: 0-12401000 | Loss: 2.215 | 676 ms/step , 58182.41 GFLOP/s , 529975.8 tokens/s INFO:__main__:2024-10-26 21:29:55 | Epoch: 0 | Step: 15510 | Dataset: 0-12409000 | Loss: 2.308 | 680 ms/step , 57849.38 GFLOP/s , 528913.9 tokens/s INFO:__main__:2024-10-26 21:30:02 | Epoch: 0 | Step: 15520 | Dataset: 0-12417000 | Loss: 2.329 | 677 ms/step , 58050.47 GFLOP/s , 528032.3 tokens/s INFO:__main__:2024-10-26 21:30:10 | Epoch: 0 | Step: 15530 | 
Dataset: 0-12425000 | Loss: 2.303 | 677 ms/step , 58063.41 GFLOP/s , 526020.5 tokens/s INFO:__main__:2024-10-26 21:30:18 | Epoch: 0 | Step: 15540 | Dataset: 0-12433000 | Loss: 2.246 | 678 ms/step , 58000.68 GFLOP/s , 526949.2 tokens/s INFO:__main__:2024-10-26 21:30:26 | Epoch: 0 | Step: 15550 | Dataset: 0-12441000 | Loss: 2.303 | 682 ms/step , 57639.28 GFLOP/s , 528410.9 tokens/s INFO:__main__:2024-10-26 21:30:34 | Epoch: 0 | Step: 15560 | Dataset: 0-12449000 | Loss: 2.380 | 679 ms/step , 57922.15 GFLOP/s , 529057.5 tokens/s INFO:__main__:2024-10-26 21:30:41 | Epoch: 0 | Step: 15570 | Dataset: 0-12457000 | Loss: 2.349 | 676 ms/step , 58127.69 GFLOP/s , 529020.2 tokens/s INFO:__main__:2024-10-26 21:30:49 | Epoch: 0 | Step: 15580 | Dataset: 0-12465000 | Loss: 2.242 | 679 ms/step , 57881.07 GFLOP/s , 529285.3 tokens/s INFO:__main__:2024-10-26 21:30:57 | Epoch: 0 | Step: 15590 | Dataset: 0-12473000 | Loss: 2.251 | 677 ms/step , 58049.08 GFLOP/s , 526590.0 tokens/s INFO:__main__:2024-10-26 21:31:05 | Epoch: 0 | Step: 15600 | Dataset: 0-12481000 | Loss: 2.257 | 677 ms/step , 58102.49 GFLOP/s , 527972.2 tokens/s INFO:__main__:2024-10-26 21:31:12 | Epoch: 0 | Step: 15610 | Dataset: 0-12489000 | Loss: 2.259 | 678 ms/step , 58005.14 GFLOP/s , 527955.9 tokens/s INFO:__main__:2024-10-26 21:31:20 | Epoch: 0 | Step: 15620 | Dataset: 0-12497000 | Loss: 2.309 | 679 ms/step , 57900.95 GFLOP/s , 527925.9 tokens/s INFO:__main__:2024-10-26 21:31:28 | Epoch: 0 | Step: 15630 | Dataset: 0-12505000 | Loss: 2.247 | 675 ms/step , 58193.82 GFLOP/s , 528099.2 tokens/s INFO:__main__:2024-10-26 21:31:36 | Epoch: 0 | Step: 15640 | Dataset: 0-12513000 | Loss: 2.324 | 677 ms/step , 58051.03 GFLOP/s , 526757.8 tokens/s INFO:__main__:2024-10-26 21:31:43 | Epoch: 0 | Step: 15650 | Dataset: 0-12521000 | Loss: 2.295 | 684 ms/step , 57498.32 GFLOP/s , 526696.3 tokens/s INFO:__main__:2024-10-26 21:31:51 | Epoch: 0 | Step: 15660 | Dataset: 0-12529000 | Loss: 2.211 | 676 ms/step , 58134.50 GFLOP/s , 
527423.7 tokens/s INFO:__main__:2024-10-26 21:31:59 | Epoch: 0 | Step: 15670 | Dataset: 0-12537000 | Loss: 2.264 | 677 ms/step , 58105.17 GFLOP/s , 528562.0 tokens/s INFO:__main__:2024-10-26 21:32:07 | Epoch: 0 | Step: 15680 | Dataset: 0-12545000 | Loss: 2.307 | 677 ms/step , 58078.00 GFLOP/s , 528232.3 tokens/s INFO:__main__:2024-10-26 21:32:14 | Epoch: 0 | Step: 15690 | Dataset: 0-12553000 | Loss: 2.220 | 677 ms/step , 58069.32 GFLOP/s , 526116.3 tokens/s INFO:__main__:2024-10-26 21:32:22 | Epoch: 0 | Step: 15700 | Dataset: 0-12561000 | Loss: 2.305 | 676 ms/step , 58149.32 GFLOP/s , 528389.0 tokens/s INFO:__main__:2024-10-26 21:32:30 | Epoch: 0 | Step: 15710 | Dataset: 0-12569000 | Loss: 2.300 | 688 ms/step , 57163.35 GFLOP/s , 528636.4 tokens/s INFO:__main__:2024-10-26 21:32:38 | Epoch: 0 | Step: 15720 | Dataset: 0-12577000 | Loss: 2.223 | 675 ms/step , 58210.32 GFLOP/s , 529068.7 tokens/s INFO:__main__:2024-10-26 21:32:45 | Epoch: 0 | Step: 15730 | Dataset: 0-12585000 | Loss: 2.295 | 675 ms/step , 58263.07 GFLOP/s , 529882.4 tokens/s INFO:__main__:2024-10-26 21:32:53 | Epoch: 0 | Step: 15740 | Dataset: 0-12593000 | Loss: 2.268 | 675 ms/step , 58242.07 GFLOP/s , 531127.3 tokens/s INFO:__main__:2024-10-26 21:33:01 | Epoch: 0 | Step: 15750 | Dataset: 0-12601000 | Loss: 2.198 | 675 ms/step , 58254.27 GFLOP/s , 530312.5 tokens/s INFO:__main__:2024-10-26 21:33:09 | Epoch: 0 | Step: 15760 | Dataset: 0-12609000 | Loss: 2.348 | 676 ms/step , 58142.40 GFLOP/s , 531215.9 tokens/s INFO:__main__:2024-10-26 21:33:16 | Epoch: 0 | Step: 15770 | Dataset: 0-12617000 | Loss: 2.338 | 674 ms/step , 58311.23 GFLOP/s , 531237.5 tokens/s INFO:__main__:2024-10-26 21:33:24 | Epoch: 0 | Step: 15780 | Dataset: 0-12625000 | Loss: 2.315 | 675 ms/step , 58216.53 GFLOP/s , 531357.3 tokens/s INFO:__main__:2024-10-26 21:33:32 | Epoch: 0 | Step: 15790 | Dataset: 0-12633000 | Loss: 2.222 | 675 ms/step , 58268.06 GFLOP/s , 529013.4 tokens/s INFO:__main__:2024-10-26 21:33:39 | Epoch: 0 | Step: 
15800 | Dataset: 0-12641000 | Loss: 2.205 | 679 ms/step , 57925.40 GFLOP/s , 531356.6 tokens/s INFO:__main__:2024-10-26 21:33:47 | Epoch: 0 | Step: 15810 | Dataset: 0-12649000 | Loss: 2.214 | 675 ms/step , 58251.46 GFLOP/s , 529243.2 tokens/s INFO:__main__:2024-10-26 21:33:55 | Epoch: 0 | Step: 15820 | Dataset: 0-12657000 | Loss: 2.260 | 675 ms/step , 58211.37 GFLOP/s , 530832.4 tokens/s INFO:__main__:2024-10-26 21:34:03 | Epoch: 0 | Step: 15830 | Dataset: 0-12665000 | Loss: 2.246 | 674 ms/step , 58291.57 GFLOP/s , 529589.4 tokens/s INFO:__main__:2024-10-26 21:34:10 | Epoch: 0 | Step: 15840 | Dataset: 0-12673000 | Loss: 2.328 | 675 ms/step , 58212.37 GFLOP/s , 530341.2 tokens/s INFO:__main__:2024-10-26 21:34:18 | Epoch: 0 | Step: 15850 | Dataset: 0-12681000 | Loss: 2.354 | 675 ms/step , 58207.96 GFLOP/s , 530460.6 tokens/s INFO:__main__:2024-10-26 21:34:26 | Epoch: 0 | Step: 15860 | Dataset: 0-12689000 | Loss: 2.356 | 679 ms/step , 57851.48 GFLOP/s , 530177.6 tokens/s INFO:__main__:2024-10-26 21:34:33 | Epoch: 0 | Step: 15870 | Dataset: 0-12697000 | Loss: 2.312 | 677 ms/step , 58066.28 GFLOP/s , 529604.9 tokens/s INFO:__main__:2024-10-26 21:34:41 | Epoch: 0 | Step: 15880 | Dataset: 0-12705000 | Loss: 2.219 | 676 ms/step , 58156.84 GFLOP/s , 530073.2 tokens/s INFO:__main__:2024-10-26 21:34:49 | Epoch: 0 | Step: 15890 | Dataset: 0-12713000 | Loss: 2.268 | 675 ms/step , 58225.46 GFLOP/s , 527604.3 tokens/s INFO:__main__:2024-10-26 21:34:57 | Epoch: 0 | Step: 15900 | Dataset: 0-12721000 | Loss: 2.352 | 676 ms/step , 58189.51 GFLOP/s , 530255.7 tokens/s INFO:__main__:2024-10-26 21:35:04 | Epoch: 0 | Step: 15910 | Dataset: 0-12729000 | Loss: 2.314 | 674 ms/step , 58308.06 GFLOP/s , 530613.3 tokens/s INFO:__main__:2024-10-26 21:35:12 | Epoch: 0 | Step: 15920 | Dataset: 0-12737000 | Loss: 2.329 | 673 ms/step , 58374.02 GFLOP/s , 533804.4 tokens/s INFO:__main__:2024-10-26 21:35:20 | Epoch: 0 | Step: 15930 | Dataset: 0-12745000 | Loss: 2.214 | 675 ms/step , 58277.76 GFLOP/s 
, 532676.1 tokens/s INFO:__main__:2024-10-26 21:35:28 | Epoch: 0 | Step: 15940 | Dataset: 0-12753000 | Loss: 2.321 | 676 ms/step , 58189.38 GFLOP/s , 531212.5 tokens/s INFO:__main__:2024-10-26 21:35:35 | Epoch: 0 | Step: 15950 | Dataset: 0-12761000 | Loss: 2.329 | 676 ms/step , 58142.95 GFLOP/s , 531290.9 tokens/s INFO:__main__:2024-10-26 21:35:43 | Epoch: 0 | Step: 15960 | Dataset: 0-12769000 | Loss: 2.297 | 676 ms/step , 58187.15 GFLOP/s , 531204.0 tokens/s INFO:__main__:2024-10-26 21:35:51 | Epoch: 0 | Step: 15970 | Dataset: 0-12777000 | Loss: 2.219 | 676 ms/step , 58140.41 GFLOP/s , 531286.5 tokens/s INFO:__main__:2024-10-26 21:35:58 | Epoch: 0 | Step: 15980 | Dataset: 0-12785000 | Loss: 2.260 | 675 ms/step , 58232.48 GFLOP/s , 531509.3 tokens/s INFO:__main__:2024-10-26 21:36:06 | Epoch: 0 | Step: 15990 | Dataset: 0-12793000 | Loss: 2.286 | 675 ms/step , 58216.59 GFLOP/s , 530754.4 tokens/s INFO:__main__:2024-10-26 21:36:13 | Validation | Step: 16000 | Val_loss: 2.332 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 21:36:13 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_213613_step_16000.pt` INFO:__main__:2024-10-26 21:36:15 | Epoch: 0 | Step: 16000 | Dataset: 0-12801000 | Loss: 2.289 | 674 ms/step , 58302.64 GFLOP/s , 474741.4 tokens/s INFO:__main__:2024-10-26 21:36:22 | Epoch: 0 | Step: 16010 | Dataset: 0-12809000 | Loss: 2.512 | 676 ms/step , 58180.53 GFLOP/s , 531060.4 tokens/s INFO:__main__:2024-10-26 21:36:30 | Epoch: 0 | Step: 16020 | Dataset: 0-12817000 | Loss: 2.540 | 675 ms/step , 58240.27 GFLOP/s , 531346.1 tokens/s INFO:__main__:2024-10-26 21:36:38 | Epoch: 0 | Step: 16030 | Dataset: 0-12825000 | Loss: 2.479 | 675 ms/step , 58246.92 GFLOP/s , 531753.1 tokens/s INFO:__main__:2024-10-26 21:36:46 | Epoch: 0 | Step: 16040 | Dataset: 0-12833000 | Loss: 2.494 | 675 ms/step , 58230.74 GFLOP/s , 531554.8 tokens/s INFO:__main__:2024-10-26 21:36:53 | Epoch: 0 | Step: 16050 | Dataset: 0-12841000 | Loss: 2.445 | 675 ms/step , 58241.25 
GFLOP/s , 531881.2 tokens/s INFO:__main__:2024-10-26 21:37:01 | Epoch: 0 | Step: 16060 | Dataset: 0-12849000 | Loss: 2.404 | 675 ms/step , 58238.29 GFLOP/s , 529596.0 tokens/s INFO:__main__:2024-10-26 21:37:09 | Epoch: 0 | Step: 16070 | Dataset: 0-12857000 | Loss: 2.450 | 675 ms/step , 58193.36 GFLOP/s , 528061.1 tokens/s INFO:__main__:2024-10-26 21:37:16 | Epoch: 0 | Step: 16080 | Dataset: 0-12865000 | Loss: 2.449 | 676 ms/step , 58125.96 GFLOP/s , 531157.1 tokens/s INFO:__main__:2024-10-26 21:37:24 | Epoch: 0 | Step: 16090 | Dataset: 0-12873000 | Loss: 2.510 | 675 ms/step , 58237.02 GFLOP/s , 530755.8 tokens/s INFO:__main__:2024-10-26 21:37:32 | Epoch: 0 | Step: 16100 | Dataset: 0-12881000 | Loss: 2.373 | 676 ms/step , 58159.87 GFLOP/s , 531600.5 tokens/s INFO:__main__:2024-10-26 21:37:40 | Epoch: 0 | Step: 16110 | Dataset: 0-12889000 | Loss: 2.473 | 676 ms/step , 58130.18 GFLOP/s , 530646.4 tokens/s INFO:__main__:2024-10-26 21:37:47 | Epoch: 0 | Step: 16120 | Dataset: 0-12897000 | Loss: 2.395 | 676 ms/step , 58140.69 GFLOP/s , 530848.6 tokens/s INFO:__main__:2024-10-26 21:37:55 | Epoch: 0 | Step: 16130 | Dataset: 0-12905000 | Loss: 2.472 | 676 ms/step , 58115.45 GFLOP/s , 530069.7 tokens/s INFO:__main__:2024-10-26 21:38:03 | Epoch: 0 | Step: 16140 | Dataset: 0-12913000 | Loss: 2.411 | 676 ms/step , 58148.28 GFLOP/s , 530886.6 tokens/s INFO:__main__:2024-10-26 21:38:10 | Epoch: 0 | Step: 16150 | Dataset: 0-12921000 | Loss: 2.419 | 675 ms/step , 58225.21 GFLOP/s , 531027.9 tokens/s INFO:__main__:2024-10-26 21:38:18 | Epoch: 0 | Step: 16160 | Dataset: 0-12929000 | Loss: 2.458 | 677 ms/step , 58042.04 GFLOP/s , 531010.3 tokens/s INFO:__main__:2024-10-26 21:38:26 | Epoch: 0 | Step: 16170 | Dataset: 0-12937000 | Loss: 2.211 | 676 ms/step , 58176.93 GFLOP/s , 530604.4 tokens/s INFO:__main__:2024-10-26 21:38:34 | Epoch: 0 | Step: 16180 | Dataset: 0-12945000 | Loss: 2.050 | 677 ms/step , 58073.94 GFLOP/s , 530956.3 tokens/s INFO:__main__:2024-10-26 21:38:41 | Epoch: 0 | 
Step: 16190 | Dataset: 0-12953000 | Loss: 2.013 | 677 ms/step , 58087.79 GFLOP/s , 528999.9 tokens/s INFO:__main__:2024-10-26 21:38:49 | Epoch: 0 | Step: 16200 | Dataset: 0-12961000 | Loss: 1.981 | 676 ms/step , 58123.13 GFLOP/s , 529077.6 tokens/s INFO:__main__:2024-10-26 21:38:57 | Epoch: 0 | Step: 16210 | Dataset: 0-12969000 | Loss: 1.963 | 676 ms/step , 58160.53 GFLOP/s , 531011.9 tokens/s INFO:__main__:2024-10-26 21:39:05 | Epoch: 0 | Step: 16220 | Dataset: 0-12977000 | Loss: 1.938 | 676 ms/step , 58120.40 GFLOP/s , 530383.0 tokens/s INFO:__main__:2024-10-26 21:39:12 | Epoch: 0 | Step: 16230 | Dataset: 0-12985000 | Loss: 1.902 | 677 ms/step , 58026.13 GFLOP/s , 530200.1 tokens/s INFO:__main__:2024-10-26 21:39:20 | Epoch: 0 | Step: 16240 | Dataset: 0-12993000 | Loss: 1.890 | 676 ms/step , 58168.42 GFLOP/s , 530804.7 tokens/s INFO:__main__:2024-10-26 21:39:28 | Epoch: 0 | Step: 16250 | Dataset: 0-13001000 | Loss: 1.919 | 675 ms/step , 58232.70 GFLOP/s , 530623.6 tokens/s INFO:__main__:2024-10-26 21:39:35 | Epoch: 0 | Step: 16260 | Dataset: 0-13009000 | Loss: 1.870 | 677 ms/step , 58090.57 GFLOP/s , 529282.5 tokens/s INFO:__main__:2024-10-26 21:39:43 | Epoch: 0 | Step: 16270 | Dataset: 0-13017000 | Loss: 1.867 | 674 ms/step , 58279.53 GFLOP/s , 530639.5 tokens/s INFO:__main__:2024-10-26 21:39:51 | Epoch: 0 | Step: 16280 | Dataset: 0-13025000 | Loss: 1.919 | 676 ms/step , 58187.73 GFLOP/s , 531537.8 tokens/s INFO:__main__:2024-10-26 21:39:59 | Epoch: 0 | Step: 16290 | Dataset: 0-13033000 | Loss: 1.875 | 676 ms/step , 58166.39 GFLOP/s , 531217.3 tokens/s INFO:__main__:2024-10-26 21:40:06 | Epoch: 0 | Step: 16300 | Dataset: 0-13041000 | Loss: 1.806 | 676 ms/step , 58164.23 GFLOP/s , 530517.1 tokens/s INFO:__main__:2024-10-26 21:40:14 | Epoch: 0 | Step: 16310 | Dataset: 0-13049000 | Loss: 1.814 | 678 ms/step , 57974.27 GFLOP/s , 530082.6 tokens/s INFO:__main__:2024-10-26 21:40:22 | Epoch: 0 | Step: 16320 | Dataset: 0-13057000 | Loss: 1.853 | 676 ms/step , 58143.96 
GFLOP/s , 530273.5 tokens/s INFO:__main__:2024-10-26 21:40:29 | Epoch: 0 | Step: 16330 | Dataset: 0-13065000 | Loss: 1.849 | 679 ms/step , 57898.05 GFLOP/s , 529422.4 tokens/s INFO:__main__:2024-10-26 21:40:37 | Epoch: 0 | Step: 16340 | Dataset: 0-13073000 | Loss: 2.725 | 678 ms/step , 58018.56 GFLOP/s , 530113.7 tokens/s INFO:__main__:2024-10-26 21:40:45 | Epoch: 0 | Step: 16350 | Dataset: 0-13081000 | Loss: 2.442 | 676 ms/step , 58109.91 GFLOP/s , 529687.3 tokens/s INFO:__main__:2024-10-26 21:40:53 | Epoch: 0 | Step: 16360 | Dataset: 0-13089000 | Loss: 2.307 | 678 ms/step , 57983.81 GFLOP/s , 530389.1 tokens/s INFO:__main__:2024-10-26 21:41:00 | Epoch: 0 | Step: 16370 | Dataset: 0-13097000 | Loss: 2.249 | 675 ms/step , 58266.91 GFLOP/s , 531148.9 tokens/s INFO:__main__:2024-10-26 21:41:08 | Epoch: 0 | Step: 16380 | Dataset: 0-13105000 | Loss: 2.236 | 680 ms/step , 57838.24 GFLOP/s , 531098.5 tokens/s INFO:__main__:2024-10-26 21:41:16 | Epoch: 0 | Step: 16390 | Dataset: 0-13113000 | Loss: 2.226 | 675 ms/step , 58200.78 GFLOP/s , 531597.4 tokens/s INFO:__main__:2024-10-26 21:41:23 | Epoch: 0 | Step: 16400 | Dataset: 0-13121000 | Loss: 2.189 | 677 ms/step , 58098.58 GFLOP/s , 531393.8 tokens/s INFO:__main__:2024-10-26 21:41:31 | Epoch: 0 | Step: 16410 | Dataset: 0-13129000 | Loss: 2.165 | 676 ms/step , 58181.71 GFLOP/s , 531295.1 tokens/s INFO:__main__:2024-10-26 21:41:39 | Epoch: 0 | Step: 16420 | Dataset: 0-13137000 | Loss: 2.151 | 676 ms/step , 58175.95 GFLOP/s , 531724.5 tokens/s INFO:__main__:2024-10-26 21:41:47 | Epoch: 0 | Step: 16430 | Dataset: 0-13145000 | Loss: 2.120 | 677 ms/step , 58059.43 GFLOP/s , 530876.5 tokens/s INFO:__main__:2024-10-26 21:41:54 | Epoch: 0 | Step: 16440 | Dataset: 0-13153000 | Loss: 2.079 | 675 ms/step , 58198.13 GFLOP/s , 531794.2 tokens/s INFO:__main__:2024-10-26 21:42:02 | Epoch: 0 | Step: 16450 | Dataset: 0-13161000 | Loss: 2.177 | 677 ms/step , 58050.42 GFLOP/s , 532189.2 tokens/s INFO:__main__:2024-10-26 21:42:10 | Epoch: 0 | 
Step: 16460 | Dataset: 0-13169000 | Loss: 2.142 | 680 ms/step , 57778.16 GFLOP/s , 531706.7 tokens/s INFO:__main__:2024-10-26 21:42:17 | Epoch: 0 | Step: 16470 | Dataset: 0-13177000 | Loss: 2.155 | 682 ms/step , 57604.16 GFLOP/s , 531206.9 tokens/s INFO:__main__:2024-10-26 21:42:25 | Epoch: 0 | Step: 16480 | Dataset: 0-13185000 | Loss: 2.167 | 678 ms/step , 58011.12 GFLOP/s , 530878.0 tokens/s INFO:__main__:2024-10-26 21:42:33 | Epoch: 0 | Step: 16490 | Dataset: 0-13193000 | Loss: 2.128 | 677 ms/step , 58095.00 GFLOP/s , 532104.3 tokens/s INFO:__main__:2024-10-26 21:42:41 | Epoch: 0 | Step: 16500 | Dataset: 0-13201000 | Loss: 2.193 | 676 ms/step , 58158.45 GFLOP/s , 531577.6 tokens/s INFO:__main__:2024-10-26 21:42:48 | Epoch: 0 | Step: 16510 | Dataset: 0-13209000 | Loss: 2.438 | 676 ms/step , 58152.31 GFLOP/s , 531786.2 tokens/s INFO:__main__:2024-10-26 21:42:56 | Epoch: 0 | Step: 16520 | Dataset: 0-13217000 | Loss: 2.392 | 676 ms/step , 58140.56 GFLOP/s , 532359.8 tokens/s INFO:__main__:2024-10-26 21:43:04 | Epoch: 0 | Step: 16530 | Dataset: 0-13225000 | Loss: 2.392 | 675 ms/step , 58211.41 GFLOP/s , 531909.8 tokens/s INFO:__main__:2024-10-26 21:43:11 | Epoch: 0 | Step: 16540 | Dataset: 0-13233000 | Loss: 2.349 | 676 ms/step , 58181.07 GFLOP/s , 531341.6 tokens/s INFO:__main__:2024-10-26 21:43:19 | Epoch: 0 | Step: 16550 | Dataset: 0-13241000 | Loss: 2.244 | 674 ms/step , 58282.16 GFLOP/s , 532448.2 tokens/s INFO:__main__:2024-10-26 21:43:27 | Epoch: 0 | Step: 16560 | Dataset: 0-13249000 | Loss: 2.280 | 674 ms/step , 58365.16 GFLOP/s , 531981.1 tokens/s INFO:__main__:2024-10-26 21:43:34 | Epoch: 0 | Step: 16570 | Dataset: 0-13257000 | Loss: 2.244 | 676 ms/step , 58186.88 GFLOP/s , 531781.9 tokens/s INFO:__main__:2024-10-26 21:43:42 | Epoch: 0 | Step: 16580 | Dataset: 0-13265000 | Loss: 2.340 | 675 ms/step , 58246.58 GFLOP/s , 531914.5 tokens/s INFO:__main__:2024-10-26 21:43:50 | Epoch: 0 | Step: 16590 | Dataset: 0-13273000 | Loss: 2.238 | 675 ms/step , 58263.76 
GFLOP/s , 532282.4 tokens/s INFO:__main__:2024-10-26 21:43:58 | Epoch: 0 | Step: 16600 | Dataset: 0-13281000 | Loss: 2.334 | 678 ms/step , 57999.14 GFLOP/s , 531457.3 tokens/s INFO:__main__:2024-10-26 21:44:05 | Epoch: 0 | Step: 16610 | Dataset: 0-13289000 | Loss: 2.268 | 678 ms/step , 57982.44 GFLOP/s , 530870.7 tokens/s INFO:__main__:2024-10-26 21:44:13 | Epoch: 0 | Step: 16620 | Dataset: 0-13297000 | Loss: 2.325 | 674 ms/step , 58283.30 GFLOP/s , 532190.4 tokens/s INFO:__main__:2024-10-26 21:44:21 | Epoch: 0 | Step: 16630 | Dataset: 0-13305000 | Loss: 2.342 | 675 ms/step , 58266.11 GFLOP/s , 532116.4 tokens/s INFO:__main__:2024-10-26 21:44:28 | Epoch: 0 | Step: 16640 | Dataset: 0-13313000 | Loss: 2.272 | 676 ms/step , 58123.91 GFLOP/s , 532624.5 tokens/s INFO:__main__:2024-10-26 21:44:36 | Epoch: 0 | Step: 16650 | Dataset: 0-13321000 | Loss: 2.265 | 675 ms/step , 58262.71 GFLOP/s , 532573.6 tokens/s INFO:__main__:2024-10-26 21:44:44 | Epoch: 0 | Step: 16660 | Dataset: 0-13329000 | Loss: 2.281 | 676 ms/step , 58184.70 GFLOP/s , 531881.9 tokens/s INFO:__main__:2024-10-26 21:44:51 | Epoch: 0 | Step: 16670 | Dataset: 0-13337000 | Loss: 2.319 | 675 ms/step , 58203.16 GFLOP/s , 530641.9 tokens/s INFO:__main__:2024-10-26 21:44:59 | Epoch: 0 | Step: 16680 | Dataset: 0-13345000 | Loss: 2.348 | 676 ms/step , 58175.55 GFLOP/s , 531670.8 tokens/s INFO:__main__:2024-10-26 21:45:07 | Epoch: 0 | Step: 16690 | Dataset: 0-13353000 | Loss: 2.169 | 675 ms/step , 58201.94 GFLOP/s , 530150.1 tokens/s INFO:__main__:2024-10-26 21:45:15 | Epoch: 0 | Step: 16700 | Dataset: 0-13361000 | Loss: 2.273 | 675 ms/step , 58243.32 GFLOP/s , 529846.3 tokens/s INFO:__main__:2024-10-26 21:45:22 | Epoch: 0 | Step: 16710 | Dataset: 0-13369000 | Loss: 2.304 | 676 ms/step , 58134.84 GFLOP/s , 530256.8 tokens/s INFO:__main__:2024-10-26 21:45:30 | Epoch: 0 | Step: 16720 | Dataset: 0-13377000 | Loss: 2.304 | 675 ms/step , 58240.25 GFLOP/s , 529998.2 tokens/s INFO:__main__:2024-10-26 21:45:38 | Epoch: 0 | 
Step: 16730 | Dataset: 0-13385000 | Loss: 2.265 | 678 ms/step , 58005.33 GFLOP/s , 526109.5 tokens/s INFO:__main__:2024-10-26 21:45:46 | Epoch: 0 | Step: 16740 | Dataset: 0-13393000 | Loss: 2.305 | 677 ms/step , 58025.16 GFLOP/s , 530794.9 tokens/s INFO:__main__:2024-10-26 21:45:53 | Epoch: 0 | Step: 16750 | Dataset: 0-13401000 | Loss: 2.307 | 678 ms/step , 57998.61 GFLOP/s , 528204.3 tokens/s INFO:__main__:2024-10-26 21:46:01 | Epoch: 0 | Step: 16760 | Dataset: 0-13409000 | Loss: 2.292 | 676 ms/step , 58130.48 GFLOP/s , 529662.1 tokens/s INFO:__main__:2024-10-26 21:46:09 | Epoch: 0 | Step: 16770 | Dataset: 0-13417000 | Loss: 2.330 | 677 ms/step , 58063.96 GFLOP/s , 528063.4 tokens/s INFO:__main__:2024-10-26 21:46:17 | Epoch: 0 | Step: 16780 | Dataset: 0-13425000 | Loss: 2.310 | 676 ms/step , 58123.19 GFLOP/s , 529054.6 tokens/s INFO:__main__:2024-10-26 21:46:24 | Epoch: 0 | Step: 16790 | Dataset: 0-13433000 | Loss: 2.162 | 677 ms/step , 58071.86 GFLOP/s , 529113.9 tokens/s INFO:__main__:2024-10-26 21:46:32 | Epoch: 0 | Step: 16800 | Dataset: 0-13441000 | Loss: 2.253 | 676 ms/step , 58129.75 GFLOP/s , 529792.9 tokens/s INFO:__main__:2024-10-26 21:46:40 | Epoch: 0 | Step: 16810 | Dataset: 0-13449000 | Loss: 2.206 | 676 ms/step , 58127.09 GFLOP/s , 529794.9 tokens/s INFO:__main__:2024-10-26 21:46:47 | Epoch: 0 | Step: 16820 | Dataset: 0-13457000 | Loss: 2.336 | 677 ms/step , 58102.14 GFLOP/s , 530320.6 tokens/s INFO:__main__:2024-10-26 21:46:55 | Epoch: 0 | Step: 16830 | Dataset: 0-13465000 | Loss: 2.016 | 678 ms/step , 58017.83 GFLOP/s , 530576.3 tokens/s INFO:__main__:2024-10-26 21:47:03 | Epoch: 0 | Step: 16840 | Dataset: 0-13473000 | Loss: 1.883 | 675 ms/step , 58262.95 GFLOP/s , 529834.0 tokens/s INFO:__main__:2024-10-26 21:47:11 | Epoch: 0 | Step: 16850 | Dataset: 0-13481000 | Loss: 1.842 | 677 ms/step , 58045.84 GFLOP/s , 529952.2 tokens/s INFO:__main__:2024-10-26 21:47:18 | Epoch: 0 | Step: 16860 | Dataset: 0-13489000 | Loss: 1.827 | 677 ms/step , 58033.56 
GFLOP/s , 529074.8 tokens/s INFO:__main__:2024-10-26 21:47:26 | Epoch: 0 | Step: 16870 | Dataset: 0-13497000 | Loss: 1.800 | 676 ms/step , 58143.32 GFLOP/s , 527653.1 tokens/s INFO:__main__:2024-10-26 21:47:34 | Epoch: 0 | Step: 16880 | Dataset: 0-13505000 | Loss: 1.764 | 677 ms/step , 58073.71 GFLOP/s , 529754.0 tokens/s INFO:__main__:2024-10-26 21:47:42 | Epoch: 0 | Step: 16890 | Dataset: 0-13513000 | Loss: 1.741 | 679 ms/step , 57907.24 GFLOP/s , 529724.6 tokens/s INFO:__main__:2024-10-26 21:47:49 | Epoch: 0 | Step: 16900 | Dataset: 0-13521000 | Loss: 1.781 | 677 ms/step , 58088.60 GFLOP/s , 528899.7 tokens/s INFO:__main__:2024-10-26 21:47:57 | Epoch: 0 | Step: 16910 | Dataset: 0-13529000 | Loss: 1.725 | 677 ms/step , 58095.21 GFLOP/s , 529975.6 tokens/s INFO:__main__:2024-10-26 21:48:05 | Epoch: 0 | Step: 16920 | Dataset: 0-13537000 | Loss: 1.894 | 677 ms/step , 58097.82 GFLOP/s , 529129.1 tokens/s INFO:__main__:2024-10-26 21:48:13 | Epoch: 0 | Step: 16930 | Dataset: 0-13545000 | Loss: 1.853 | 676 ms/step , 58128.97 GFLOP/s , 528988.9 tokens/s INFO:__main__:2024-10-26 21:48:20 | Epoch: 0 | Step: 16940 | Dataset: 0-13553000 | Loss: 1.849 | 677 ms/step , 58082.19 GFLOP/s , 528897.1 tokens/s INFO:__main__:2024-10-26 21:48:28 | Epoch: 0 | Step: 16950 | Dataset: 0-13561000 | Loss: 1.830 | 678 ms/step , 57954.14 GFLOP/s , 528403.7 tokens/s INFO:__main__:2024-10-26 21:48:36 | Epoch: 0 | Step: 16960 | Dataset: 0-13569000 | Loss: 1.835 | 676 ms/step , 58140.60 GFLOP/s , 527814.2 tokens/s INFO:__main__:2024-10-26 21:48:44 | Epoch: 0 | Step: 16970 | Dataset: 0-13577000 | Loss: 1.821 | 676 ms/step , 58167.35 GFLOP/s , 532031.1 tokens/s INFO:__main__:2024-10-26 21:48:51 | Epoch: 0 | Step: 16980 | Dataset: 0-13585000 | Loss: 1.845 | 675 ms/step , 58194.72 GFLOP/s , 531733.1 tokens/s INFO:__main__:2024-10-26 21:48:59 | Epoch: 0 | Step: 16990 | Dataset: 0-13593000 | Loss: 1.830 | 674 ms/step , 58330.36 GFLOP/s , 531951.0 tokens/s INFO:__main__:2024-10-26 21:49:06 | Validation 
| Step: 17000 | Val_loss: 3.087 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 21:49:06 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_214906_step_17000.pt` INFO:__main__:2024-10-26 21:49:08 | Epoch: 0 | Step: 17000 | Dataset: 0-13601000 | Loss: 1.816 | 674 ms/step , 58338.39 GFLOP/s , 477487.0 tokens/s INFO:__main__:2024-10-26 21:49:15 | Epoch: 0 | Step: 17010 | Dataset: 0-13609000 | Loss: 2.420 | 677 ms/step , 58074.87 GFLOP/s , 531412.2 tokens/s INFO:__main__:2024-10-26 21:49:23 | Epoch: 0 | Step: 17020 | Dataset: 0-13617000 | Loss: 2.310 | 677 ms/step , 58074.90 GFLOP/s , 531138.2 tokens/s INFO:__main__:2024-10-26 21:49:31 | Epoch: 0 | Step: 17030 | Dataset: 0-13625000 | Loss: 2.288 | 675 ms/step , 58208.15 GFLOP/s , 531364.8 tokens/s INFO:__main__:2024-10-26 21:49:38 | Epoch: 0 | Step: 17040 | Dataset: 0-13633000 | Loss: 2.205 | 676 ms/step , 58132.25 GFLOP/s , 532701.8 tokens/s INFO:__main__:2024-10-26 21:49:46 | Epoch: 0 | Step: 17050 | Dataset: 0-13641000 | Loss: 2.253 | 677 ms/step , 58069.79 GFLOP/s , 532647.1 tokens/s INFO:__main__:2024-10-26 21:49:54 | Epoch: 0 | Step: 17060 | Dataset: 0-13649000 | Loss: 2.296 | 675 ms/step , 58243.12 GFLOP/s , 533115.8 tokens/s INFO:__main__:2024-10-26 21:50:01 | Epoch: 0 | Step: 17070 | Dataset: 0-13657000 | Loss: 2.317 | 675 ms/step , 58254.68 GFLOP/s , 533485.5 tokens/s INFO:__main__:2024-10-26 21:50:09 | Epoch: 0 | Step: 17080 | Dataset: 0-13665000 | Loss: 2.298 | 677 ms/step , 58045.70 GFLOP/s , 533007.8 tokens/s INFO:__main__:2024-10-26 21:50:17 | Epoch: 0 | Step: 17090 | Dataset: 0-13673000 | Loss: 2.263 | 678 ms/step , 58009.34 GFLOP/s , 531790.5 tokens/s INFO:__main__:2024-10-26 21:50:24 | Epoch: 0 | Step: 17100 | Dataset: 0-13681000 | Loss: 2.250 | 676 ms/step , 58117.04 GFLOP/s , 531011.6 tokens/s INFO:__main__:2024-10-26 21:50:32 | Epoch: 0 | Step: 17110 | Dataset: 0-13689000 | Loss: 2.276 | 674 ms/step , 58316.58 GFLOP/s , 531277.5 tokens/s INFO:__main__:2024-10-26 21:50:40 | 
Epoch: 0 | Step: 17120 | Dataset: 0-13697000 | Loss: 2.219 | 674 ms/step , 58334.19 GFLOP/s , 532544.2 tokens/s INFO:__main__:2024-10-26 21:50:48 | Epoch: 0 | Step: 17130 | Dataset: 0-13705000 | Loss: 2.188 | 674 ms/step , 58311.15 GFLOP/s , 533178.8 tokens/s INFO:__main__:2024-10-26 21:50:55 | Epoch: 0 | Step: 17140 | Dataset: 0-13713000 | Loss: 2.257 | 677 ms/step , 58073.90 GFLOP/s , 531127.6 tokens/s INFO:__main__:2024-10-26 21:51:03 | Epoch: 0 | Step: 17150 | Dataset: 0-13721000 | Loss: 2.278 | 675 ms/step , 58270.28 GFLOP/s , 531300.9 tokens/s INFO:__main__:2024-10-26 21:51:11 | Epoch: 0 | Step: 17160 | Dataset: 0-13729000 | Loss: 2.263 | 674 ms/step , 58325.11 GFLOP/s , 533200.4 tokens/s INFO:__main__:2024-10-26 21:51:18 | Epoch: 0 | Step: 17170 | Dataset: 0-13737000 | Loss: 2.380 | 674 ms/step , 58318.89 GFLOP/s , 533135.2 tokens/s INFO:__main__:2024-10-26 21:51:26 | Epoch: 0 | Step: 17180 | Dataset: 0-13745000 | Loss: 2.334 | 674 ms/step , 58301.56 GFLOP/s , 533197.7 tokens/s INFO:__main__:2024-10-26 21:51:34 | Epoch: 0 | Step: 17190 | Dataset: 0-13753000 | Loss: 2.273 | 674 ms/step , 58285.59 GFLOP/s , 532661.4 tokens/s INFO:__main__:2024-10-26 21:51:41 | Epoch: 0 | Step: 17200 | Dataset: 0-13761000 | Loss: 2.291 | 675 ms/step , 58201.49 GFLOP/s , 532939.9 tokens/s INFO:__main__:2024-10-26 21:51:49 | Epoch: 0 | Step: 17210 | Dataset: 0-13769000 | Loss: 2.385 | 674 ms/step , 58313.05 GFLOP/s , 532504.8 tokens/s INFO:__main__:2024-10-26 21:51:57 | Epoch: 0 | Step: 17220 | Dataset: 0-13777000 | Loss: 2.233 | 675 ms/step , 58259.35 GFLOP/s , 532622.8 tokens/s INFO:__main__:2024-10-26 21:52:04 | Epoch: 0 | Step: 17230 | Dataset: 0-13785000 | Loss: 2.305 | 675 ms/step , 58221.22 GFLOP/s , 532533.7 tokens/s INFO:__main__:2024-10-26 21:52:12 | Epoch: 0 | Step: 17240 | Dataset: 0-13793000 | Loss: 2.337 | 674 ms/step , 58286.00 GFLOP/s , 533178.5 tokens/s INFO:__main__:2024-10-26 21:52:20 | Epoch: 0 | Step: 17250 | Dataset: 0-13801000 | Loss: 2.295 | 674 ms/step , 
58298.64 GFLOP/s , 533309.6 tokens/s INFO:__main__:2024-10-26 21:52:28 | Epoch: 0 | Step: 17260 | Dataset: 0-13809000 | Loss: 2.343 | 674 ms/step , 58344.19 GFLOP/s , 533528.4 tokens/s INFO:__main__:2024-10-26 21:52:35 | Epoch: 0 | Step: 17270 | Dataset: 0-13817000 | Loss: 2.322 | 675 ms/step , 58261.82 GFLOP/s , 533059.8 tokens/s INFO:__main__:2024-10-26 21:52:43 | Epoch: 0 | Step: 17280 | Dataset: 0-13825000 | Loss: 2.377 | 674 ms/step , 58279.95 GFLOP/s , 533165.4 tokens/s INFO:__main__:2024-10-26 21:52:51 | Epoch: 0 | Step: 17290 | Dataset: 0-13833000 | Loss: 2.307 | 676 ms/step , 58174.07 GFLOP/s , 532556.6 tokens/s INFO:__main__:2024-10-26 21:52:58 | Epoch: 0 | Step: 17300 | Dataset: 0-13841000 | Loss: 2.256 | 675 ms/step , 58251.72 GFLOP/s , 532857.7 tokens/s INFO:__main__:2024-10-26 21:53:06 | Epoch: 0 | Step: 17310 | Dataset: 0-13849000 | Loss: 2.376 | 674 ms/step , 58298.05 GFLOP/s , 533200.2 tokens/s INFO:__main__:2024-10-26 21:53:14 | Epoch: 0 | Step: 17320 | Dataset: 0-13857000 | Loss: 2.283 | 674 ms/step , 58328.35 GFLOP/s , 533051.1 tokens/s INFO:__main__:2024-10-26 21:53:21 | Epoch: 0 | Step: 17330 | Dataset: 0-13865000 | Loss: 2.284 | 675 ms/step , 58248.97 GFLOP/s , 532883.1 tokens/s INFO:__main__:2024-10-26 21:53:29 | Epoch: 0 | Step: 17340 | Dataset: 0-13873000 | Loss: 2.295 | 674 ms/step , 58346.01 GFLOP/s , 532464.1 tokens/s INFO:__main__:2024-10-26 21:53:37 | Epoch: 0 | Step: 17350 | Dataset: 0-13881000 | Loss: 2.294 | 674 ms/step , 58348.71 GFLOP/s , 533569.8 tokens/s INFO:__main__:2024-10-26 21:53:44 | Epoch: 0 | Step: 17360 | Dataset: 0-13889000 | Loss: 2.199 | 674 ms/step , 58352.88 GFLOP/s , 533152.5 tokens/s INFO:__main__:2024-10-26 21:53:52 | Epoch: 0 | Step: 17370 | Dataset: 0-13897000 | Loss: 2.356 | 675 ms/step , 58245.13 GFLOP/s , 533074.6 tokens/s INFO:__main__:2024-10-26 21:54:00 | Epoch: 0 | Step: 17380 | Dataset: 0-13905000 | Loss: 2.273 | 675 ms/step , 58232.00 GFLOP/s , 532265.8 tokens/s INFO:__main__:2024-10-26 21:54:07 | 
Epoch: 0 | Step: 17390 | Dataset: 0-13913000 | Loss: 2.282 | 675 ms/step , 58228.43 GFLOP/s , 532343.3 tokens/s INFO:__main__:2024-10-26 21:54:15 | Epoch: 0 | Step: 17400 | Dataset: 0-13921000 | Loss: 2.178 | 675 ms/step , 58231.34 GFLOP/s , 533008.2 tokens/s INFO:__main__:2024-10-26 21:54:23 | Epoch: 0 | Step: 17410 | Dataset: 0-13929000 | Loss: 2.319 | 674 ms/step , 58283.45 GFLOP/s , 532923.2 tokens/s INFO:__main__:2024-10-26 21:54:31 | Epoch: 0 | Step: 17420 | Dataset: 0-13937000 | Loss: 2.214 | 676 ms/step , 58176.26 GFLOP/s , 532449.3 tokens/s INFO:__main__:2024-10-26 21:54:38 | Epoch: 0 | Step: 17430 | Dataset: 0-13945000 | Loss: 2.291 | 676 ms/step , 58107.61 GFLOP/s , 531002.1 tokens/s INFO:__main__:2024-10-26 21:54:46 | Epoch: 0 | Step: 17440 | Dataset: 0-13953000 | Loss: 2.289 | 675 ms/step , 58223.64 GFLOP/s , 531303.2 tokens/s INFO:__main__:2024-10-26 21:54:54 | Epoch: 0 | Step: 17450 | Dataset: 0-13961000 | Loss: 2.256 | 677 ms/step , 58029.94 GFLOP/s , 530550.3 tokens/s INFO:__main__:2024-10-26 21:55:01 | Epoch: 0 | Step: 17460 | Dataset: 0-13969000 | Loss: 2.231 | 676 ms/step , 58129.90 GFLOP/s , 527919.3 tokens/s INFO:__main__:2024-10-26 21:55:09 | Epoch: 0 | Step: 17470 | Dataset: 0-13977000 | Loss: 2.236 | 678 ms/step , 58011.15 GFLOP/s , 529266.6 tokens/s INFO:__main__:2024-10-26 21:55:17 | Epoch: 0 | Step: 17480 | Dataset: 0-13985000 | Loss: 2.230 | 676 ms/step , 58129.28 GFLOP/s , 529553.1 tokens/s INFO:__main__:2024-10-26 21:55:25 | Epoch: 0 | Step: 17490 | Dataset: 0-13993000 | Loss: 2.191 | 676 ms/step , 58169.69 GFLOP/s , 531265.4 tokens/s INFO:__main__:2024-10-26 21:55:32 | Epoch: 0 | Step: 17500 | Dataset: 0-14001000 | Loss: 2.120 | 677 ms/step , 58021.66 GFLOP/s , 527320.2 tokens/s INFO:__main__:2024-10-26 21:55:40 | Epoch: 0 | Step: 17510 | Dataset: 0-14009000 | Loss: 2.141 | 676 ms/step , 58123.68 GFLOP/s , 529663.3 tokens/s INFO:__main__:2024-10-26 21:55:48 | Epoch: 0 | Step: 17520 | Dataset: 0-14017000 | Loss: 2.128 | 676 ms/step , 
58114.21 GFLOP/s , 529251.5 tokens/s INFO:__main__:2024-10-26 21:55:56 | Epoch: 0 | Step: 17530 | Dataset: 0-14025000 | Loss: 2.073 | 677 ms/step , 58094.99 GFLOP/s , 529439.2 tokens/s INFO:__main__:2024-10-26 21:56:03 | Epoch: 0 | Step: 17540 | Dataset: 0-14033000 | Loss: 2.112 | 678 ms/step , 58018.88 GFLOP/s , 527164.9 tokens/s INFO:__main__:2024-10-26 21:56:11 | Epoch: 0 | Step: 17550 | Dataset: 0-14041000 | Loss: 2.124 | 678 ms/step , 57998.66 GFLOP/s , 529469.5 tokens/s INFO:__main__:2024-10-26 21:56:19 | Epoch: 0 | Step: 17560 | Dataset: 0-14049000 | Loss: 2.067 | 677 ms/step , 58098.83 GFLOP/s , 528829.8 tokens/s INFO:__main__:2024-10-26 21:56:27 | Epoch: 0 | Step: 17570 | Dataset: 0-14057000 | Loss: 2.018 | 679 ms/step , 57915.39 GFLOP/s , 529944.6 tokens/s INFO:__main__:2024-10-26 21:56:34 | Epoch: 0 | Step: 17580 | Dataset: 0-14065000 | Loss: 2.059 | 678 ms/step , 57990.13 GFLOP/s , 529072.6 tokens/s INFO:__main__:2024-10-26 21:56:42 | Epoch: 0 | Step: 17590 | Dataset: 0-14073000 | Loss: 2.118 | 678 ms/step , 57995.90 GFLOP/s , 528302.4 tokens/s INFO:__main__:2024-10-26 21:56:50 | Epoch: 0 | Step: 17600 | Dataset: 0-14081000 | Loss: 2.096 | 679 ms/step , 57859.36 GFLOP/s , 528381.8 tokens/s INFO:__main__:2024-10-26 21:56:58 | Epoch: 0 | Step: 17610 | Dataset: 0-14089000 | Loss: 2.030 | 677 ms/step , 58048.17 GFLOP/s , 528349.3 tokens/s INFO:__main__:2024-10-26 21:57:05 | Epoch: 0 | Step: 17620 | Dataset: 0-14097000 | Loss: 2.037 | 677 ms/step , 58057.47 GFLOP/s , 529075.4 tokens/s INFO:__main__:2024-10-26 21:57:13 | Epoch: 0 | Step: 17630 | Dataset: 0-14105000 | Loss: 2.089 | 677 ms/step , 58063.96 GFLOP/s , 528967.2 tokens/s INFO:__main__:2024-10-26 21:57:21 | Epoch: 0 | Step: 17640 | Dataset: 0-14113000 | Loss: 1.998 | 676 ms/step , 58175.17 GFLOP/s , 529296.6 tokens/s INFO:__main__:2024-10-26 21:57:28 | Epoch: 0 | Step: 17650 | Dataset: 0-14121000 | Loss: 2.414 | 676 ms/step , 58126.52 GFLOP/s , 530164.7 tokens/s INFO:__main__:2024-10-26 21:57:36 | 
Epoch: 0 | Step: 17660 | Dataset: 0-14129000 | Loss: 2.364 | 676 ms/step , 58141.57 GFLOP/s , 529452.1 tokens/s INFO:__main__:2024-10-26 21:57:44 | Epoch: 0 | Step: 17670 | Dataset: 0-14137000 | Loss: 2.353 | 676 ms/step , 58173.83 GFLOP/s , 529827.6 tokens/s INFO:__main__:2024-10-26 21:57:52 | Epoch: 0 | Step: 17680 | Dataset: 0-14145000 | Loss: 2.362 | 678 ms/step , 57946.53 GFLOP/s , 528748.7 tokens/s INFO:__main__:2024-10-26 21:57:59 | Epoch: 0 | Step: 17690 | Dataset: 0-14153000 | Loss: 2.290 | 678 ms/step , 57973.49 GFLOP/s , 528479.9 tokens/s INFO:__main__:2024-10-26 21:58:07 | Epoch: 0 | Step: 17700 | Dataset: 0-14161000 | Loss: 2.316 | 682 ms/step , 57645.58 GFLOP/s , 527699.8 tokens/s INFO:__main__:2024-10-26 21:58:15 | Epoch: 0 | Step: 17710 | Dataset: 0-14169000 | Loss: 2.247 | 680 ms/step , 57828.19 GFLOP/s , 526451.3 tokens/s INFO:__main__:2024-10-26 21:58:23 | Epoch: 0 | Step: 17720 | Dataset: 0-14177000 | Loss: 2.233 | 677 ms/step , 58033.10 GFLOP/s , 528310.2 tokens/s INFO:__main__:2024-10-26 21:58:30 | Epoch: 0 | Step: 17730 | Dataset: 0-14185000 | Loss: 2.359 | 677 ms/step , 58061.24 GFLOP/s , 529661.9 tokens/s INFO:__main__:2024-10-26 21:58:38 | Epoch: 0 | Step: 17740 | Dataset: 0-14193000 | Loss: 2.231 | 679 ms/step , 57933.02 GFLOP/s , 529149.3 tokens/s INFO:__main__:2024-10-26 21:58:46 | Epoch: 0 | Step: 17750 | Dataset: 0-14201000 | Loss: 2.207 | 677 ms/step , 58096.23 GFLOP/s , 530021.0 tokens/s INFO:__main__:2024-10-26 21:58:54 | Epoch: 0 | Step: 17760 | Dataset: 0-14209000 | Loss: 2.287 | 677 ms/step , 58042.92 GFLOP/s , 529436.4 tokens/s INFO:__main__:2024-10-26 21:59:01 | Epoch: 0 | Step: 17770 | Dataset: 0-14217000 | Loss: 2.285 | 677 ms/step , 58079.97 GFLOP/s , 531241.9 tokens/s INFO:__main__:2024-10-26 21:59:09 | Epoch: 0 | Step: 17780 | Dataset: 0-14225000 | Loss: 2.166 | 678 ms/step , 57967.51 GFLOP/s , 530633.3 tokens/s INFO:__main__:2024-10-26 21:59:17 | Epoch: 0 | Step: 17790 | Dataset: 0-14233000 | Loss: 2.129 | 679 ms/step , 
57866.21 GFLOP/s , 529561.0 tokens/s INFO:__main__:2024-10-26 21:59:25 | Epoch: 0 | Step: 17800 | Dataset: 0-14241000 | Loss: 2.205 | 677 ms/step , 58038.89 GFLOP/s , 529107.6 tokens/s INFO:__main__:2024-10-26 21:59:32 | Epoch: 0 | Step: 17810 | Dataset: 0-14249000 | Loss: 2.177 | 680 ms/step , 57850.10 GFLOP/s , 529939.1 tokens/s INFO:__main__:2024-10-26 21:59:40 | Epoch: 0 | Step: 17820 | Dataset: 0-14257000 | Loss: 1.948 | 678 ms/step , 57963.62 GFLOP/s , 529711.6 tokens/s INFO:__main__:2024-10-26 21:59:48 | Epoch: 0 | Step: 17830 | Dataset: 0-14265000 | Loss: 1.839 | 678 ms/step , 57984.04 GFLOP/s , 528895.6 tokens/s INFO:__main__:2024-10-26 21:59:56 | Epoch: 0 | Step: 17840 | Dataset: 0-14273000 | Loss: 1.829 | 677 ms/step , 58039.61 GFLOP/s , 528848.5 tokens/s INFO:__main__:2024-10-26 22:00:03 | Epoch: 0 | Step: 17850 | Dataset: 0-14281000 | Loss: 1.800 | 679 ms/step , 57907.36 GFLOP/s , 535203.6 tokens/s INFO:__main__:2024-10-26 22:00:11 | Epoch: 0 | Step: 17860 | Dataset: 0-14289000 | Loss: 1.807 | 678 ms/step , 57969.61 GFLOP/s , 529215.3 tokens/s INFO:__main__:2024-10-26 22:00:19 | Epoch: 0 | Step: 17870 | Dataset: 0-14297000 | Loss: 1.786 | 676 ms/step , 58126.83 GFLOP/s , 529703.5 tokens/s INFO:__main__:2024-10-26 22:00:26 | Epoch: 0 | Step: 17880 | Dataset: 0-14305000 | Loss: 1.779 | 679 ms/step , 57868.83 GFLOP/s , 529023.7 tokens/s INFO:__main__:2024-10-26 22:00:34 | Epoch: 0 | Step: 17890 | Dataset: 0-14313000 | Loss: 1.770 | 678 ms/step , 57962.70 GFLOP/s , 529828.6 tokens/s INFO:__main__:2024-10-26 22:00:42 | Epoch: 0 | Step: 17900 | Dataset: 0-14321000 | Loss: 2.416 | 676 ms/step , 58190.50 GFLOP/s , 530476.8 tokens/s INFO:__main__:2024-10-26 22:00:50 | Epoch: 0 | Step: 17910 | Dataset: 0-14329000 | Loss: 2.356 | 678 ms/step , 57983.32 GFLOP/s , 531253.1 tokens/s INFO:__main__:2024-10-26 22:00:57 | Epoch: 0 | Step: 17920 | Dataset: 0-14337000 | Loss: 2.346 | 676 ms/step , 58160.67 GFLOP/s , 529508.1 tokens/s INFO:__main__:2024-10-26 22:01:05 | 
Epoch: 0 | Step: 17930 | Dataset: 0-14345000 | Loss: 2.266 | 676 ms/step , 58189.93 GFLOP/s , 530147.0 tokens/s INFO:__main__:2024-10-26 22:01:13 | Epoch: 0 | Step: 17940 | Dataset: 0-14353000 | Loss: 2.294 | 675 ms/step , 58237.13 GFLOP/s , 530398.2 tokens/s INFO:__main__:2024-10-26 22:01:20 | Epoch: 0 | Step: 17950 | Dataset: 0-14361000 | Loss: 2.297 | 676 ms/step , 58108.76 GFLOP/s , 530213.3 tokens/s INFO:__main__:2024-10-26 22:01:28 | Epoch: 0 | Step: 17960 | Dataset: 0-14369000 | Loss: 2.407 | 677 ms/step , 58045.03 GFLOP/s , 529033.1 tokens/s INFO:__main__:2024-10-26 22:01:36 | Epoch: 0 | Step: 17970 | Dataset: 0-14377000 | Loss: 2.359 | 678 ms/step , 57944.62 GFLOP/s , 529293.8 tokens/s INFO:__main__:2024-10-26 22:01:44 | Epoch: 0 | Step: 17980 | Dataset: 0-14385000 | Loss: 2.309 | 680 ms/step , 57818.24 GFLOP/s , 529690.9 tokens/s INFO:__main__:2024-10-26 22:01:51 | Epoch: 0 | Step: 17990 | Dataset: 0-14393000 | Loss: 2.285 | 676 ms/step , 58115.60 GFLOP/s , 530607.5 tokens/s INFO:__main__:2024-10-26 22:01:59 | Validation | Step: 18000 | Val_loss: 2.316 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 22:01:59 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_220159_step_18000.pt` INFO:__main__:2024-10-26 22:02:00 | Epoch: 0 | Step: 18000 | Dataset: 0-14401000 | Loss: 2.218 | 675 ms/step , 58217.40 GFLOP/s , 477091.7 tokens/s INFO:__main__:2024-10-26 22:02:08 | Epoch: 0 | Step: 18010 | Dataset: 0-14409000 | Loss: 2.292 | 677 ms/step , 58030.84 GFLOP/s , 529146.9 tokens/s INFO:__main__:2024-10-26 22:02:16 | Epoch: 0 | Step: 18020 | Dataset: 0-14417000 | Loss: 2.287 | 679 ms/step , 57916.85 GFLOP/s , 527647.0 tokens/s INFO:__main__:2024-10-26 22:02:23 | Epoch: 0 | Step: 18030 | Dataset: 0-14425000 | Loss: 2.222 | 679 ms/step , 57919.39 GFLOP/s , 526384.5 tokens/s INFO:__main__:2024-10-26 22:02:31 | Epoch: 0 | Step: 18040 | Dataset: 0-14433000 | Loss: 2.278 | 677 ms/step , 58036.34 GFLOP/s , 526669.8 tokens/s INFO:__main__:2024-10-26 
22:02:39 | Epoch: 0 | Step: 18050 | Dataset: 0-14441000 | Loss: 2.385 | 678 ms/step , 57960.93 GFLOP/s , 527763.8 tokens/s INFO:__main__:2024-10-26 22:02:47 | Epoch: 0 | Step: 18060 | Dataset: 0-14449000 | Loss: 1.856 | 677 ms/step , 58069.40 GFLOP/s , 529129.5 tokens/s INFO:__main__:2024-10-26 22:02:54 | Epoch: 0 | Step: 18070 | Dataset: 0-14457000 | Loss: 1.809 | 678 ms/step , 57991.75 GFLOP/s , 529172.8 tokens/s INFO:__main__:2024-10-26 22:03:02 | Epoch: 0 | Step: 18080 | Dataset: 0-14465000 | Loss: 1.763 | 677 ms/step , 58070.96 GFLOP/s , 528421.9 tokens/s INFO:__main__:2024-10-26 22:03:10 | Epoch: 0 | Step: 18090 | Dataset: 0-14473000 | Loss: 1.792 | 678 ms/step , 57988.24 GFLOP/s , 528712.3 tokens/s INFO:__main__:2024-10-26 22:03:18 | Epoch: 0 | Step: 18100 | Dataset: 0-14481000 | Loss: 1.769 | 680 ms/step , 57813.31 GFLOP/s , 527021.0 tokens/s INFO:__main__:2024-10-26 22:03:25 | Epoch: 0 | Step: 18110 | Dataset: 0-14489000 | Loss: 1.725 | 677 ms/step , 58096.52 GFLOP/s , 529333.4 tokens/s INFO:__main__:2024-10-26 22:03:33 | Epoch: 0 | Step: 18120 | Dataset: 0-14497000 | Loss: 1.722 | 677 ms/step , 58086.03 GFLOP/s , 529301.3 tokens/s INFO:__main__:2024-10-26 22:03:41 | Epoch: 0 | Step: 18130 | Dataset: 0-14505000 | Loss: 1.737 | 675 ms/step , 58194.85 GFLOP/s , 529495.6 tokens/s INFO:__main__:2024-10-26 22:03:49 | Epoch: 0 | Step: 18140 | Dataset: 0-14513000 | Loss: 1.731 | 675 ms/step , 58232.26 GFLOP/s , 531242.7 tokens/s INFO:__main__:2024-10-26 22:03:56 | Epoch: 0 | Step: 18150 | Dataset: 0-14521000 | Loss: 2.297 | 675 ms/step , 58235.91 GFLOP/s , 532308.3 tokens/s INFO:__main__:2024-10-26 22:04:04 | Epoch: 0 | Step: 18160 | Dataset: 0-14529000 | Loss: 2.337 | 676 ms/step , 58181.13 GFLOP/s , 532162.6 tokens/s INFO:__main__:2024-10-26 22:04:12 | Epoch: 0 | Step: 18170 | Dataset: 0-14537000 | Loss: 2.207 | 676 ms/step , 58185.19 GFLOP/s , 531974.0 tokens/s INFO:__main__:2024-10-26 22:04:19 | Epoch: 0 | Step: 18180 | Dataset: 0-14545000 | Loss: 2.331 | 676 
ms/step , 58180.96 GFLOP/s , 531872.8 tokens/s INFO:__main__:2024-10-26 22:04:27 | Epoch: 0 | Step: 18190 | Dataset: 0-14553000 | Loss: 2.338 | 676 ms/step , 58175.56 GFLOP/s , 531829.0 tokens/s INFO:__main__:2024-10-26 22:04:35 | Epoch: 0 | Step: 18200 | Dataset: 0-14561000 | Loss: 2.263 | 676 ms/step , 58145.07 GFLOP/s , 531812.9 tokens/s INFO:__main__:2024-10-26 22:04:42 | Epoch: 0 | Step: 18210 | Dataset: 0-14569000 | Loss: 2.137 | 675 ms/step , 58195.75 GFLOP/s , 532359.3 tokens/s INFO:__main__:2024-10-26 22:04:50 | Epoch: 0 | Step: 18220 | Dataset: 0-14577000 | Loss: 2.174 | 678 ms/step , 57985.55 GFLOP/s , 531405.4 tokens/s INFO:__main__:2024-10-26 22:04:58 | Epoch: 0 | Step: 18230 | Dataset: 0-14585000 | Loss: 2.267 | 676 ms/step , 58114.09 GFLOP/s , 532121.8 tokens/s INFO:__main__:2024-10-26 22:05:06 | Epoch: 0 | Step: 18240 | Dataset: 0-14593000 | Loss: 2.195 | 676 ms/step , 58140.58 GFLOP/s , 531754.6 tokens/s INFO:__main__:2024-10-26 22:05:13 | Epoch: 0 | Step: 18250 | Dataset: 0-14601000 | Loss: 2.259 | 675 ms/step , 58223.34 GFLOP/s , 531675.8 tokens/s INFO:__main__:2024-10-26 22:05:21 | Epoch: 0 | Step: 18260 | Dataset: 0-14609000 | Loss: 2.255 | 677 ms/step , 58098.20 GFLOP/s , 531124.3 tokens/s INFO:__main__:2024-10-26 22:05:29 | Epoch: 0 | Step: 18270 | Dataset: 0-14617000 | Loss: 2.218 | 682 ms/step , 57608.26 GFLOP/s , 530057.7 tokens/s INFO:__main__:2024-10-26 22:05:36 | Epoch: 0 | Step: 18280 | Dataset: 0-14625000 | Loss: 2.238 | 681 ms/step , 57689.96 GFLOP/s , 529166.4 tokens/s INFO:__main__:2024-10-26 22:05:44 | Epoch: 0 | Step: 18290 | Dataset: 0-14633000 | Loss: 2.301 | 679 ms/step , 57868.50 GFLOP/s , 528721.5 tokens/s INFO:__main__:2024-10-26 22:05:52 | Epoch: 0 | Step: 18300 | Dataset: 0-14641000 | Loss: 2.236 | 681 ms/step , 57738.50 GFLOP/s , 529408.1 tokens/s INFO:__main__:2024-10-26 22:06:00 | Epoch: 0 | Step: 18310 | Dataset: 0-14649000 | Loss: 2.321 | 679 ms/step , 57852.77 GFLOP/s , 529113.4 tokens/s INFO:__main__:2024-10-26 
22:06:07 | Epoch: 0 | Step: 18320 | Dataset: 0-14657000 | Loss: 2.294 | 679 ms/step , 57904.20 GFLOP/s , 529473.0 tokens/s INFO:__main__:2024-10-26 22:06:15 | Epoch: 0 | Step: 18330 | Dataset: 0-14665000 | Loss: 2.404 | 677 ms/step , 58027.92 GFLOP/s , 529199.5 tokens/s INFO:__main__:2024-10-26 22:06:23 | Epoch: 0 | Step: 18340 | Dataset: 0-14673000 | Loss: 2.319 | 680 ms/step , 57827.43 GFLOP/s , 530649.8 tokens/s INFO:__main__:2024-10-26 22:06:31 | Epoch: 0 | Step: 18350 | Dataset: 0-14681000 | Loss: 2.223 | 677 ms/step , 58042.51 GFLOP/s , 528956.1 tokens/s INFO:__main__:2024-10-26 22:06:38 | Epoch: 0 | Step: 18360 | Dataset: 0-14689000 | Loss: 2.223 | 678 ms/step , 57960.27 GFLOP/s , 528957.1 tokens/s INFO:__main__:2024-10-26 22:06:46 | Epoch: 0 | Step: 18370 | Dataset: 0-14697000 | Loss: 2.239 | 680 ms/step , 57813.78 GFLOP/s , 525497.3 tokens/s INFO:__main__:2024-10-26 22:06:54 | Epoch: 0 | Step: 18380 | Dataset: 0-14705000 | Loss: 2.303 | 676 ms/step , 58188.73 GFLOP/s , 529045.0 tokens/s INFO:__main__:2024-10-26 22:07:02 | Epoch: 0 | Step: 18390 | Dataset: 0-14713000 | Loss: 2.305 | 679 ms/step , 57931.31 GFLOP/s , 527555.3 tokens/s INFO:__main__:2024-10-26 22:07:09 | Epoch: 0 | Step: 18400 | Dataset: 0-14721000 | Loss: 2.301 | 677 ms/step , 58035.55 GFLOP/s , 528370.7 tokens/s INFO:__main__:2024-10-26 22:07:17 | Epoch: 0 | Step: 18410 | Dataset: 0-14729000 | Loss: 2.273 | 677 ms/step , 58055.15 GFLOP/s , 528575.2 tokens/s INFO:__main__:2024-10-26 22:07:25 | Epoch: 0 | Step: 18420 | Dataset: 0-14737000 | Loss: 2.310 | 676 ms/step , 58145.84 GFLOP/s , 529812.2 tokens/s INFO:__main__:2024-10-26 22:07:33 | Epoch: 0 | Step: 18430 | Dataset: 0-14745000 | Loss: 2.201 | 677 ms/step , 58080.62 GFLOP/s , 529846.3 tokens/s INFO:__main__:2024-10-26 22:07:40 | Epoch: 0 | Step: 18440 | Dataset: 0-14753000 | Loss: 2.312 | 678 ms/step , 57993.09 GFLOP/s , 528132.1 tokens/s INFO:__main__:2024-10-26 22:07:48 | Epoch: 0 | Step: 18450 | Dataset: 0-14761000 | Loss: 2.307 | 677 
ms/step , 58077.42 GFLOP/s , 528483.9 tokens/s INFO:__main__:2024-10-26 22:07:56 | Epoch: 0 | Step: 18460 | Dataset: 0-14769000 | Loss: 2.164 | 677 ms/step , 58068.45 GFLOP/s , 529744.1 tokens/s INFO:__main__:2024-10-26 22:08:04 | Epoch: 0 | Step: 18470 | Dataset: 0-14777000 | Loss: 2.368 | 681 ms/step , 57734.88 GFLOP/s , 530165.2 tokens/s INFO:__main__:2024-10-26 22:08:11 | Epoch: 0 | Step: 18480 | Dataset: 0-14785000 | Loss: 2.273 | 677 ms/step , 58067.63 GFLOP/s , 529144.7 tokens/s INFO:__main__:2024-10-26 22:08:19 | Epoch: 0 | Step: 18490 | Dataset: 0-14793000 | Loss: 2.294 | 679 ms/step , 57924.16 GFLOP/s , 528978.6 tokens/s INFO:__main__:2024-10-26 22:08:27 | Epoch: 0 | Step: 18500 | Dataset: 0-14801000 | Loss: 2.223 | 677 ms/step , 58069.58 GFLOP/s , 528355.2 tokens/s INFO:__main__:2024-10-26 22:08:35 | Epoch: 0 | Step: 18510 | Dataset: 0-14809000 | Loss: 2.233 | 679 ms/step , 57884.32 GFLOP/s , 528951.3 tokens/s INFO:__main__:2024-10-26 22:08:42 | Epoch: 0 | Step: 18520 | Dataset: 0-14817000 | Loss: 2.332 | 679 ms/step , 57870.09 GFLOP/s , 527972.5 tokens/s INFO:__main__:2024-10-26 22:08:50 | Epoch: 0 | Step: 18530 | Dataset: 0-14825000 | Loss: 2.292 | 680 ms/step , 57785.28 GFLOP/s , 528156.2 tokens/s INFO:__main__:2024-10-26 22:08:58 | Epoch: 0 | Step: 18540 | Dataset: 0-14833000 | Loss: 2.251 | 676 ms/step , 58177.39 GFLOP/s , 528387.8 tokens/s INFO:__main__:2024-10-26 22:09:06 | Epoch: 0 | Step: 18550 | Dataset: 0-14841000 | Loss: 2.338 | 676 ms/step , 58126.47 GFLOP/s , 529844.5 tokens/s INFO:__main__:2024-10-26 22:09:13 | Epoch: 0 | Step: 18560 | Dataset: 0-14849000 | Loss: 2.241 | 677 ms/step , 58087.42 GFLOP/s , 530547.4 tokens/s INFO:__main__:2024-10-26 22:09:21 | Epoch: 0 | Step: 18570 | Dataset: 0-14857000 | Loss: 2.269 | 678 ms/step , 58018.55 GFLOP/s , 528389.3 tokens/s INFO:__main__:2024-10-26 22:09:29 | Epoch: 0 | Step: 18580 | Dataset: 0-14865000 | Loss: 2.241 | 680 ms/step , 57842.09 GFLOP/s , 529368.4 tokens/s INFO:__main__:2024-10-26 
22:09:36 | Epoch: 0 | Step: 18590 | Dataset: 0-14873000 | Loss: 2.202 | 679 ms/step , 57908.12 GFLOP/s , 528751.8 tokens/s INFO:__main__:2024-10-26 22:09:44 | Epoch: 0 | Step: 18600 | Dataset: 0-14881000 | Loss: 2.254 | 675 ms/step , 58225.99 GFLOP/s , 529625.7 tokens/s INFO:__main__:2024-10-26 22:09:52 | Epoch: 0 | Step: 18610 | Dataset: 0-14889000 | Loss: 2.268 | 677 ms/step , 58030.99 GFLOP/s , 532178.6 tokens/s INFO:__main__:2024-10-26 22:10:00 | Epoch: 0 | Step: 18620 | Dataset: 0-14897000 | Loss: 2.214 | 675 ms/step , 58205.86 GFLOP/s , 530363.6 tokens/s INFO:__main__:2024-10-26 22:10:07 | Epoch: 0 | Step: 18630 | Dataset: 0-14905000 | Loss: 2.343 | 680 ms/step , 57804.56 GFLOP/s , 531386.1 tokens/s INFO:__main__:2024-10-26 22:10:15 | Epoch: 0 | Step: 18640 | Dataset: 0-14913000 | Loss: 2.226 | 676 ms/step , 58185.11 GFLOP/s , 530540.2 tokens/s INFO:__main__:2024-10-26 22:10:23 | Epoch: 0 | Step: 18650 | Dataset: 0-14921000 | Loss: 2.357 | 676 ms/step , 58159.06 GFLOP/s , 529450.1 tokens/s INFO:__main__:2024-10-26 22:10:31 | Epoch: 0 | Step: 18660 | Dataset: 0-14929000 | Loss: 2.336 | 677 ms/step , 58049.43 GFLOP/s , 529390.4 tokens/s INFO:__main__:2024-10-26 22:10:38 | Epoch: 0 | Step: 18670 | Dataset: 0-14937000 | Loss: 2.226 | 676 ms/step , 58192.00 GFLOP/s , 529420.3 tokens/s INFO:__main__:2024-10-26 22:10:46 | Epoch: 0 | Step: 18680 | Dataset: 0-14945000 | Loss: 2.328 | 677 ms/step , 58085.14 GFLOP/s , 530527.2 tokens/s INFO:__main__:2024-10-26 22:10:54 | Epoch: 0 | Step: 18690 | Dataset: 0-14953000 | Loss: 2.224 | 676 ms/step , 58135.21 GFLOP/s , 529617.0 tokens/s INFO:__main__:2024-10-26 22:11:01 | Epoch: 0 | Step: 18700 | Dataset: 0-14961000 | Loss: 2.260 | 677 ms/step , 58088.92 GFLOP/s , 529543.9 tokens/s INFO:__main__:2024-10-26 22:11:09 | Epoch: 0 | Step: 18710 | Dataset: 0-14969000 | Loss: 2.280 | 677 ms/step , 58078.51 GFLOP/s , 528867.5 tokens/s INFO:__main__:2024-10-26 22:11:17 | Epoch: 0 | Step: 18720 | Dataset: 0-14977000 | Loss: 2.267 | 678 
ms/step , 57968.20 GFLOP/s , 529746.5 tokens/s INFO:__main__:2024-10-26 22:11:25 | Epoch: 0 | Step: 18730 | Dataset: 0-14985000 | Loss: 2.324 | 677 ms/step , 58058.05 GFLOP/s , 530176.3 tokens/s INFO:__main__:2024-10-26 22:11:32 | Epoch: 0 | Step: 18740 | Dataset: 0-14993000 | Loss: 2.317 | 676 ms/step , 58181.41 GFLOP/s , 529900.7 tokens/s INFO:__main__:2024-10-26 22:11:40 | Epoch: 0 | Step: 18750 | Dataset: 0-15001000 | Loss: 2.275 | 676 ms/step , 58135.90 GFLOP/s , 529549.2 tokens/s INFO:__main__:2024-10-26 22:11:48 | Epoch: 0 | Step: 18760 | Dataset: 0-15009000 | Loss: 2.298 | 675 ms/step , 58245.47 GFLOP/s , 530352.6 tokens/s INFO:__main__:2024-10-26 22:11:56 | Epoch: 0 | Step: 18770 | Dataset: 0-15017000 | Loss: 2.274 | 677 ms/step , 58042.97 GFLOP/s , 530554.1 tokens/s INFO:__main__:2024-10-26 22:12:03 | Epoch: 0 | Step: 18780 | Dataset: 0-15025000 | Loss: 2.297 | 677 ms/step , 58062.79 GFLOP/s , 529570.0 tokens/s INFO:__main__:2024-10-26 22:12:11 | Epoch: 0 | Step: 18790 | Dataset: 0-15033000 | Loss: 2.265 | 676 ms/step , 58168.69 GFLOP/s , 529445.1 tokens/s INFO:__main__:2024-10-26 22:12:19 | Epoch: 0 | Step: 18800 | Dataset: 0-15041000 | Loss: 2.320 | 677 ms/step , 58051.04 GFLOP/s , 528380.4 tokens/s INFO:__main__:2024-10-26 22:12:27 | Epoch: 0 | Step: 18810 | Dataset: 0-15049000 | Loss: 2.247 | 677 ms/step , 58040.80 GFLOP/s , 529129.9 tokens/s INFO:__main__:2024-10-26 22:12:34 | Epoch: 0 | Step: 18820 | Dataset: 0-15057000 | Loss: 2.289 | 677 ms/step , 58105.71 GFLOP/s , 529565.7 tokens/s INFO:__main__:2024-10-26 22:12:42 | Epoch: 0 | Step: 18830 | Dataset: 0-15065000 | Loss: 2.274 | 677 ms/step , 58029.75 GFLOP/s , 529039.0 tokens/s INFO:__main__:2024-10-26 22:12:50 | Epoch: 0 | Step: 18840 | Dataset: 0-15073000 | Loss: 2.341 | 676 ms/step , 58112.40 GFLOP/s , 529002.3 tokens/s INFO:__main__:2024-10-26 22:12:57 | Epoch: 0 | Step: 18850 | Dataset: 0-15081000 | Loss: 2.287 | 677 ms/step , 58044.03 GFLOP/s , 530343.1 tokens/s INFO:__main__:2024-10-26 
22:13:05 | Epoch: 0 | Step: 18860 | Dataset: 0-15089000 | Loss: 2.205 | 677 ms/step , 58045.01 GFLOP/s , 527288.9 tokens/s INFO:__main__:2024-10-26 22:13:13 | Epoch: 0 | Step: 18870 | Dataset: 0-15097000 | Loss: 2.358 | 677 ms/step , 58097.25 GFLOP/s , 528975.9 tokens/s INFO:__main__:2024-10-26 22:13:21 | Epoch: 0 | Step: 18880 | Dataset: 0-15105000 | Loss: 2.272 | 678 ms/step , 57973.28 GFLOP/s , 525058.2 tokens/s INFO:__main__:2024-10-26 22:13:29 | Epoch: 0 | Step: 18890 | Dataset: 0-15113000 | Loss: 2.225 | 678 ms/step , 57943.71 GFLOP/s , 528958.5 tokens/s INFO:__main__:2024-10-26 22:13:36 | Epoch: 0 | Step: 18900 | Dataset: 0-15121000 | Loss: 2.294 | 679 ms/step , 57930.64 GFLOP/s , 528217.4 tokens/s INFO:__main__:2024-10-26 22:13:44 | Epoch: 0 | Step: 18910 | Dataset: 0-15129000 | Loss: 2.283 | 678 ms/step , 57987.22 GFLOP/s , 527886.6 tokens/s INFO:__main__:2024-10-26 22:13:52 | Epoch: 0 | Step: 18920 | Dataset: 0-15137000 | Loss: 2.257 | 678 ms/step , 58004.16 GFLOP/s , 528187.5 tokens/s INFO:__main__:2024-10-26 22:14:00 | Epoch: 0 | Step: 18930 | Dataset: 0-15145000 | Loss: 2.251 | 678 ms/step , 58016.81 GFLOP/s , 528878.1 tokens/s INFO:__main__:2024-10-26 22:14:07 | Epoch: 0 | Step: 18940 | Dataset: 0-15153000 | Loss: 2.316 | 679 ms/step , 57931.32 GFLOP/s , 528570.8 tokens/s INFO:__main__:2024-10-26 22:14:15 | Epoch: 0 | Step: 18950 | Dataset: 0-15161000 | Loss: 2.292 | 677 ms/step , 58041.90 GFLOP/s , 529249.5 tokens/s INFO:__main__:2024-10-26 22:14:23 | Epoch: 0 | Step: 18960 | Dataset: 0-15169000 | Loss: 2.287 | 678 ms/step , 58001.56 GFLOP/s , 528950.2 tokens/s INFO:__main__:2024-10-26 22:14:30 | Epoch: 0 | Step: 18970 | Dataset: 0-15177000 | Loss: 2.239 | 675 ms/step , 58196.32 GFLOP/s , 529365.1 tokens/s INFO:__main__:2024-10-26 22:14:38 | Epoch: 0 | Step: 18980 | Dataset: 0-15185000 | Loss: 2.240 | 677 ms/step , 58023.61 GFLOP/s , 528631.4 tokens/s INFO:__main__:2024-10-26 22:14:46 | Epoch: 0 | Step: 18990 | Dataset: 0-15193000 | Loss: 2.262 | 676 
ms/step , 58152.03 GFLOP/s , 530539.9 tokens/s INFO:__main__:2024-10-26 22:14:53 | Validation | Step: 19000 | Val_loss: 2.284 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 22:14:53 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_221453_step_19000.pt` INFO:__main__:2024-10-26 22:14:55 | Epoch: 0 | Step: 19000 | Dataset: 0-15201000 | Loss: 2.243 | 675 ms/step , 58208.40 GFLOP/s , 477708.8 tokens/s INFO:__main__:2024-10-26 22:15:02 | Epoch: 0 | Step: 19010 | Dataset: 0-15209000 | Loss: 2.243 | 676 ms/step , 58137.78 GFLOP/s , 531535.0 tokens/s INFO:__main__:2024-10-26 22:15:10 | Epoch: 0 | Step: 19020 | Dataset: 0-15217000 | Loss: 2.236 | 676 ms/step , 58173.27 GFLOP/s , 531705.1 tokens/s INFO:__main__:2024-10-26 22:15:18 | Epoch: 0 | Step: 19030 | Dataset: 0-15225000 | Loss: 2.282 | 676 ms/step , 58179.11 GFLOP/s , 532127.5 tokens/s INFO:__main__:2024-10-26 22:15:25 | Epoch: 0 | Step: 19040 | Dataset: 0-15233000 | Loss: 2.251 | 676 ms/step , 58113.90 GFLOP/s , 531929.3 tokens/s INFO:__main__:2024-10-26 22:15:33 | Epoch: 0 | Step: 19050 | Dataset: 0-15241000 | Loss: 2.273 | 676 ms/step , 58149.48 GFLOP/s , 531659.6 tokens/s INFO:__main__:2024-10-26 22:15:41 | Epoch: 0 | Step: 19060 | Dataset: 0-15249000 | Loss: 2.165 | 675 ms/step , 58252.93 GFLOP/s , 531232.6 tokens/s INFO:__main__:2024-10-26 22:15:48 | Epoch: 0 | Step: 19070 | Dataset: 0-15257000 | Loss: 2.248 | 676 ms/step , 58192.00 GFLOP/s , 532740.7 tokens/s INFO:__main__:2024-10-26 22:15:56 | Epoch: 0 | Step: 19080 | Dataset: 0-15265000 | Loss: 2.270 | 676 ms/step , 58121.47 GFLOP/s , 531859.3 tokens/s INFO:__main__:2024-10-26 22:16:04 | Epoch: 0 | Step: 19090 | Dataset: 0-15273000 | Loss: 2.269 | 678 ms/step , 57996.10 GFLOP/s , 529353.3 tokens/s INFO:__main__:2024-10-26 22:16:12 | Epoch: 0 | Step: 19100 | Dataset: 0-15281000 | Loss: 2.167 | 679 ms/step , 57895.32 GFLOP/s , 529410.7 tokens/s INFO:__main__:2024-10-26 22:16:19 | Epoch: 0 | Step: 19110 | Dataset: 0-15289000 | Loss: 
2.161 | 677 ms/step , 58024.04 GFLOP/s , 529335.2 tokens/s INFO:__main__:2024-10-26 22:16:27 | Epoch: 0 | Step: 19120 | Dataset: 0-15297000 | Loss: 1.897 | 680 ms/step , 57845.26 GFLOP/s , 527053.7 tokens/s INFO:__main__:2024-10-26 22:16:35 | Epoch: 0 | Step: 19130 | Dataset: 0-15305000 | Loss: 1.837 | 679 ms/step , 57850.82 GFLOP/s , 525316.4 tokens/s INFO:__main__:2024-10-26 22:16:43 | Epoch: 0 | Step: 19140 | Dataset: 0-15313000 | Loss: 1.822 | 683 ms/step , 57578.09 GFLOP/s , 528837.3 tokens/s INFO:__main__:2024-10-26 22:16:50 | Epoch: 0 | Step: 19150 | Dataset: 0-15321000 | Loss: 1.793 | 679 ms/step , 57916.32 GFLOP/s , 527585.2 tokens/s INFO:__main__:2024-10-26 22:16:58 | Epoch: 0 | Step: 19160 | Dataset: 0-15329000 | Loss: 1.792 | 677 ms/step , 58075.71 GFLOP/s , 528515.1 tokens/s INFO:__main__:2024-10-26 22:17:06 | Epoch: 0 | Step: 19170 | Dataset: 0-15337000 | Loss: 1.782 | 677 ms/step , 58097.24 GFLOP/s , 525442.5 tokens/s INFO:__main__:2024-10-26 22:17:14 | Epoch: 0 | Step: 19180 | Dataset: 0-15345000 | Loss: 1.781 | 677 ms/step , 58046.00 GFLOP/s , 529019.8 tokens/s INFO:__main__:2024-10-26 22:17:21 | Epoch: 0 | Step: 19190 | Dataset: 0-15353000 | Loss: 1.733 | 678 ms/step , 57986.00 GFLOP/s , 527724.3 tokens/s INFO:__main__:2024-10-26 22:17:29 | Epoch: 0 | Step: 19200 | Dataset: 0-15361000 | Loss: 2.448 | 678 ms/step , 58020.11 GFLOP/s , 528073.0 tokens/s INFO:__main__:2024-10-26 22:17:37 | Epoch: 0 | Step: 19210 | Dataset: 0-15369000 | Loss: 2.331 | 679 ms/step , 57925.15 GFLOP/s , 529855.2 tokens/s INFO:__main__:2024-10-26 22:17:45 | Epoch: 0 | Step: 19220 | Dataset: 0-15377000 | Loss: 2.254 | 677 ms/step , 58049.50 GFLOP/s , 530565.3 tokens/s INFO:__main__:2024-10-26 22:17:52 | Epoch: 0 | Step: 19230 | Dataset: 0-15385000 | Loss: 2.218 | 679 ms/step , 57894.08 GFLOP/s , 528670.8 tokens/s INFO:__main__:2024-10-26 22:18:00 | Epoch: 0 | Step: 19240 | Dataset: 0-15393000 | Loss: 2.269 | 675 ms/step , 58230.31 GFLOP/s , 529275.7 tokens/s 
INFO:__main__:2024-10-26 22:18:08 | Epoch: 0 | Step: 19250 | Dataset: 0-15401000 | Loss: 2.263 | 677 ms/step , 58048.05 GFLOP/s , 529506.6 tokens/s INFO:__main__:2024-10-26 22:18:16 | Epoch: 0 | Step: 19260 | Dataset: 0-15409000 | Loss: 2.249 | 677 ms/step , 58077.60 GFLOP/s , 526316.7 tokens/s INFO:__main__:2024-10-26 22:18:23 | Epoch: 0 | Step: 19270 | Dataset: 0-15417000 | Loss: 2.217 | 677 ms/step , 58024.57 GFLOP/s , 527863.9 tokens/s INFO:__main__:2024-10-26 22:18:31 | Epoch: 0 | Step: 19280 | Dataset: 0-15425000 | Loss: 2.221 | 678 ms/step , 57990.92 GFLOP/s , 529047.9 tokens/s INFO:__main__:2024-10-26 22:18:39 | Epoch: 0 | Step: 19290 | Dataset: 0-15433000 | Loss: 2.225 | 677 ms/step , 58089.82 GFLOP/s , 529080.3 tokens/s INFO:__main__:2024-10-26 22:18:47 | Epoch: 0 | Step: 19300 | Dataset: 0-15441000 | Loss: 2.224 | 678 ms/step , 58017.74 GFLOP/s , 528805.6 tokens/s INFO:__main__:2024-10-26 22:18:54 | Epoch: 0 | Step: 19310 | Dataset: 0-15449000 | Loss: 2.237 | 677 ms/step , 58074.20 GFLOP/s , 528578.9 tokens/s INFO:__main__:2024-10-26 22:19:02 | Epoch: 0 | Step: 19320 | Dataset: 0-15457000 | Loss: 2.237 | 677 ms/step , 58059.66 GFLOP/s , 529088.4 tokens/s INFO:__main__:2024-10-26 22:19:10 | Epoch: 0 | Step: 19330 | Dataset: 0-15465000 | Loss: 2.239 | 677 ms/step , 58030.47 GFLOP/s , 528213.9 tokens/s INFO:__main__:2024-10-26 22:19:18 | Epoch: 0 | Step: 19340 | Dataset: 0-15473000 | Loss: 2.212 | 677 ms/step , 58079.17 GFLOP/s , 530003.9 tokens/s INFO:__main__:2024-10-26 22:19:25 | Epoch: 0 | Step: 19350 | Dataset: 0-15481000 | Loss: 2.223 | 677 ms/step , 58043.96 GFLOP/s , 529808.0 tokens/s INFO:__main__:2024-10-26 22:19:33 | Epoch: 0 | Step: 19360 | Dataset: 0-15489000 | Loss: 2.004 | 677 ms/step , 58106.36 GFLOP/s , 529906.2 tokens/s INFO:__main__:2024-10-26 22:19:41 | Epoch: 0 | Step: 19370 | Dataset: 0-15497000 | Loss: 1.914 | 677 ms/step , 58094.91 GFLOP/s , 528849.4 tokens/s INFO:__main__:2024-10-26 22:19:49 | Epoch: 0 | Step: 19380 | Dataset: 
0-15505000 | Loss: 1.903 | 677 ms/step , 58060.15 GFLOP/s , 530339.9 tokens/s INFO:__main__:2024-10-26 22:19:56 | Epoch: 0 | Step: 19390 | Dataset: 0-15513000 | Loss: 1.875 | 677 ms/step , 58105.94 GFLOP/s , 528410.9 tokens/s INFO:__main__:2024-10-26 22:20:04 | Epoch: 0 | Step: 19400 | Dataset: 0-15521000 | Loss: 1.850 | 676 ms/step , 58148.68 GFLOP/s , 531591.6 tokens/s INFO:__main__:2024-10-26 22:20:12 | Epoch: 0 | Step: 19410 | Dataset: 0-15529000 | Loss: 1.859 | 675 ms/step , 58213.52 GFLOP/s , 532171.3 tokens/s INFO:__main__:2024-10-26 22:20:19 | Epoch: 0 | Step: 19420 | Dataset: 0-15537000 | Loss: 1.847 | 675 ms/step , 58195.46 GFLOP/s , 532483.8 tokens/s INFO:__main__:2024-10-26 22:20:27 | Epoch: 0 | Step: 19430 | Dataset: 0-15545000 | Loss: 1.855 | 675 ms/step , 58211.03 GFLOP/s , 532517.9 tokens/s INFO:__main__:2024-10-26 22:20:35 | Epoch: 0 | Step: 19440 | Dataset: 0-15553000 | Loss: 1.834 | 675 ms/step , 58229.76 GFLOP/s , 532913.5 tokens/s INFO:__main__:2024-10-26 22:20:43 | Epoch: 0 | Step: 19450 | Dataset: 0-15561000 | Loss: 1.830 | 676 ms/step , 58187.20 GFLOP/s , 532575.5 tokens/s INFO:__main__:2024-10-26 22:20:50 | Epoch: 0 | Step: 19460 | Dataset: 0-15569000 | Loss: 1.844 | 676 ms/step , 58181.71 GFLOP/s , 532732.1 tokens/s INFO:__main__:2024-10-26 22:20:58 | Epoch: 0 | Step: 19470 | Dataset: 0-15577000 | Loss: 1.823 | 676 ms/step , 58175.17 GFLOP/s , 532247.2 tokens/s INFO:__main__:2024-10-26 22:21:06 | Epoch: 0 | Step: 19480 | Dataset: 0-15585000 | Loss: 1.816 | 675 ms/step , 58253.42 GFLOP/s , 532091.5 tokens/s INFO:__main__:2024-10-26 22:21:13 | Epoch: 0 | Step: 19490 | Dataset: 0-15593000 | Loss: 1.799 | 676 ms/step , 58190.83 GFLOP/s , 532195.2 tokens/s INFO:__main__:2024-10-26 22:21:21 | Epoch: 0 | Step: 19500 | Dataset: 0-15601000 | Loss: 1.798 | 676 ms/step , 58158.29 GFLOP/s , 531086.0 tokens/s INFO:__main__:2024-10-26 22:21:29 | Epoch: 0 | Step: 19510 | Dataset: 0-15609000 | Loss: 1.801 | 676 ms/step , 58110.08 GFLOP/s , 532435.8 
tokens/s INFO:__main__:2024-10-26 22:21:36 | Epoch: 0 | Step: 19520 | Dataset: 0-15617000 | Loss: 1.794 | 675 ms/step , 58235.56 GFLOP/s , 532081.8 tokens/s INFO:__main__:2024-10-26 22:21:44 | Epoch: 0 | Step: 19530 | Dataset: 0-15625000 | Loss: 1.789 | 675 ms/step , 58259.99 GFLOP/s , 532041.4 tokens/s INFO:__main__:2024-10-26 22:21:52 | Epoch: 0 | Step: 19540 | Dataset: 0-15633000 | Loss: 2.369 | 675 ms/step , 58273.24 GFLOP/s , 533003.3 tokens/s INFO:__main__:2024-10-26 22:21:59 | Epoch: 0 | Step: 19550 | Dataset: 0-15641000 | Loss: 2.359 | 676 ms/step , 58177.57 GFLOP/s , 532941.8 tokens/s INFO:__main__:2024-10-26 22:22:07 | Epoch: 0 | Step: 19560 | Dataset: 0-15649000 | Loss: 2.273 | 676 ms/step , 58123.84 GFLOP/s , 532831.2 tokens/s INFO:__main__:2024-10-26 22:22:15 | Epoch: 0 | Step: 19570 | Dataset: 0-15657000 | Loss: 2.248 | 675 ms/step , 58224.58 GFLOP/s , 530885.0 tokens/s INFO:__main__:2024-10-26 22:22:23 | Epoch: 0 | Step: 19580 | Dataset: 0-15665000 | Loss: 2.336 | 677 ms/step , 58030.83 GFLOP/s , 530980.0 tokens/s INFO:__main__:2024-10-26 22:22:30 | Epoch: 0 | Step: 19590 | Dataset: 0-15673000 | Loss: 2.340 | 677 ms/step , 58026.64 GFLOP/s , 529899.2 tokens/s INFO:__main__:2024-10-26 22:22:38 | Epoch: 0 | Step: 19600 | Dataset: 0-15681000 | Loss: 2.237 | 675 ms/step , 58261.76 GFLOP/s , 529785.3 tokens/s INFO:__main__:2024-10-26 22:22:46 | Epoch: 0 | Step: 19610 | Dataset: 0-15689000 | Loss: 2.281 | 678 ms/step , 57938.13 GFLOP/s , 529934.6 tokens/s INFO:__main__:2024-10-26 22:22:54 | Epoch: 0 | Step: 19620 | Dataset: 0-15697000 | Loss: 2.200 | 677 ms/step , 58070.50 GFLOP/s , 529355.7 tokens/s INFO:__main__:2024-10-26 22:23:01 | Epoch: 0 | Step: 19630 | Dataset: 0-15705000 | Loss: 2.296 | 676 ms/step , 58135.44 GFLOP/s , 530210.2 tokens/s INFO:__main__:2024-10-26 22:23:09 | Epoch: 0 | Step: 19640 | Dataset: 0-15713000 | Loss: 2.220 | 678 ms/step , 58011.65 GFLOP/s , 530632.3 tokens/s INFO:__main__:2024-10-26 22:23:17 | Epoch: 0 | Step: 19650 | 
Dataset: 0-15721000 | Loss: 2.284 | 676 ms/step , 58112.60 GFLOP/s , 530279.2 tokens/s INFO:__main__:2024-10-26 22:23:24 | Epoch: 0 | Step: 19660 | Dataset: 0-15729000 | Loss: 2.293 | 676 ms/step , 58113.37 GFLOP/s , 530034.5 tokens/s INFO:__main__:2024-10-26 22:23:32 | Epoch: 0 | Step: 19670 | Dataset: 0-15737000 | Loss: 2.218 | 677 ms/step , 58090.92 GFLOP/s , 529899.5 tokens/s INFO:__main__:2024-10-26 22:23:40 | Epoch: 0 | Step: 19680 | Dataset: 0-15745000 | Loss: 2.184 | 676 ms/step , 58112.23 GFLOP/s , 530220.0 tokens/s INFO:__main__:2024-10-26 22:23:48 | Epoch: 0 | Step: 19690 | Dataset: 0-15753000 | Loss: 2.182 | 677 ms/step , 58069.18 GFLOP/s , 530918.7 tokens/s INFO:__main__:2024-10-26 22:23:55 | Epoch: 0 | Step: 19700 | Dataset: 0-15761000 | Loss: 1.827 | 677 ms/step , 58104.91 GFLOP/s , 529476.6 tokens/s INFO:__main__:2024-10-26 22:24:03 | Epoch: 0 | Step: 19710 | Dataset: 0-15769000 | Loss: 1.767 | 676 ms/step , 58166.78 GFLOP/s , 529527.0 tokens/s INFO:__main__:2024-10-26 22:24:11 | Epoch: 0 | Step: 19720 | Dataset: 0-15777000 | Loss: 1.760 | 680 ms/step , 57767.46 GFLOP/s , 525901.2 tokens/s INFO:__main__:2024-10-26 22:24:19 | Epoch: 0 | Step: 19730 | Dataset: 0-15785000 | Loss: 1.732 | 676 ms/step , 58132.07 GFLOP/s , 530478.7 tokens/s INFO:__main__:2024-10-26 22:24:26 | Epoch: 0 | Step: 19740 | Dataset: 0-15793000 | Loss: 1.728 | 677 ms/step , 58102.26 GFLOP/s , 529417.4 tokens/s INFO:__main__:2024-10-26 22:24:34 | Epoch: 0 | Step: 19750 | Dataset: 0-15801000 | Loss: 1.726 | 676 ms/step , 58172.63 GFLOP/s , 529785.8 tokens/s INFO:__main__:2024-10-26 22:24:42 | Epoch: 0 | Step: 19760 | Dataset: 0-15809000 | Loss: 1.700 | 677 ms/step , 58061.74 GFLOP/s , 530315.6 tokens/s INFO:__main__:2024-10-26 22:24:49 | Epoch: 0 | Step: 19770 | Dataset: 0-15817000 | Loss: 1.704 | 678 ms/step , 57998.67 GFLOP/s , 529931.1 tokens/s INFO:__main__:2024-10-26 22:24:57 | Epoch: 0 | Step: 19780 | Dataset: 0-15825000 | Loss: 1.713 | 677 ms/step , 58087.90 GFLOP/s , 
528650.5 tokens/s INFO:__main__:2024-10-26 22:25:05 | Epoch: 0 | Step: 19790 | Dataset: 0-15833000 | Loss: 2.363 | 679 ms/step , 57886.46 GFLOP/s , 528543.3 tokens/s INFO:__main__:2024-10-26 22:25:13 | Epoch: 0 | Step: 19800 | Dataset: 0-15841000 | Loss: 2.314 | 676 ms/step , 58129.44 GFLOP/s , 529996.3 tokens/s INFO:__main__:2024-10-26 22:25:20 | Epoch: 0 | Step: 19810 | Dataset: 0-15849000 | Loss: 2.337 | 676 ms/step , 58109.37 GFLOP/s , 530779.7 tokens/s INFO:__main__:2024-10-26 22:25:28 | Epoch: 0 | Step: 19820 | Dataset: 0-15857000 | Loss: 2.185 | 678 ms/step , 58014.22 GFLOP/s , 530713.5 tokens/s INFO:__main__:2024-10-26 22:25:36 | Epoch: 0 | Step: 19830 | Dataset: 0-15865000 | Loss: 2.296 | 676 ms/step , 58124.33 GFLOP/s , 530422.7 tokens/s INFO:__main__:2024-10-26 22:25:44 | Epoch: 0 | Step: 19840 | Dataset: 0-15873000 | Loss: 2.285 | 678 ms/step , 57954.93 GFLOP/s , 530173.7 tokens/s INFO:__main__:2024-10-26 22:25:51 | Epoch: 0 | Step: 19850 | Dataset: 0-15881000 | Loss: 2.232 | 676 ms/step , 58192.85 GFLOP/s , 529577.0 tokens/s INFO:__main__:2024-10-26 22:25:59 | Epoch: 0 | Step: 19860 | Dataset: 0-15889000 | Loss: 2.326 | 678 ms/step , 58012.27 GFLOP/s , 530202.2 tokens/s INFO:__main__:2024-10-26 22:26:07 | Epoch: 0 | Step: 19870 | Dataset: 0-15897000 | Loss: 2.240 | 676 ms/step , 58115.49 GFLOP/s , 529775.6 tokens/s INFO:__main__:2024-10-26 22:26:15 | Epoch: 0 | Step: 19880 | Dataset: 0-15905000 | Loss: 2.258 | 676 ms/step , 58190.18 GFLOP/s , 527357.1 tokens/s INFO:__main__:2024-10-26 22:26:22 | Epoch: 0 | Step: 19890 | Dataset: 0-15913000 | Loss: 2.298 | 677 ms/step , 58082.29 GFLOP/s , 530098.6 tokens/s INFO:__main__:2024-10-26 22:26:30 | Epoch: 0 | Step: 19900 | Dataset: 0-15921000 | Loss: 2.329 | 679 ms/step , 57928.80 GFLOP/s , 529506.5 tokens/s INFO:__main__:2024-10-26 22:26:38 | Epoch: 0 | Step: 19910 | Dataset: 0-15929000 | Loss: 2.281 | 675 ms/step , 58248.58 GFLOP/s , 530295.0 tokens/s INFO:__main__:2024-10-26 22:26:45 | Epoch: 0 | Step: 
19920 | Dataset: 0-15937000 | Loss: 2.275 | 678 ms/step , 58015.39 GFLOP/s , 532217.7 tokens/s INFO:__main__:2024-10-26 22:26:53 | Epoch: 0 | Step: 19930 | Dataset: 0-15945000 | Loss: 2.208 | 677 ms/step , 58090.49 GFLOP/s , 529434.8 tokens/s INFO:__main__:2024-10-26 22:27:01 | Epoch: 0 | Step: 19940 | Dataset: 0-15953000 | Loss: 2.294 | 676 ms/step , 58177.95 GFLOP/s , 530664.6 tokens/s INFO:__main__:2024-10-26 22:27:09 | Epoch: 0 | Step: 19950 | Dataset: 0-15961000 | Loss: 2.272 | 677 ms/step , 58063.60 GFLOP/s , 529776.9 tokens/s INFO:__main__:2024-10-26 22:27:16 | Epoch: 0 | Step: 19960 | Dataset: 0-15969000 | Loss: 2.252 | 677 ms/step , 58077.45 GFLOP/s , 530749.3 tokens/s INFO:__main__:2024-10-26 22:27:24 | Epoch: 0 | Step: 19970 | Dataset: 0-15977000 | Loss: 2.235 | 678 ms/step , 58008.62 GFLOP/s , 530681.3 tokens/s INFO:__main__:2024-10-26 22:27:32 | Epoch: 0 | Step: 19980 | Dataset: 0-15985000 | Loss: 2.243 | 675 ms/step , 58212.18 GFLOP/s , 529464.6 tokens/s INFO:__main__:2024-10-26 22:27:39 | Epoch: 0 | Step: 19990 | Dataset: 0-15993000 | Loss: 2.195 | 676 ms/step , 58157.83 GFLOP/s , 530861.6 tokens/s INFO:__main__:2024-10-26 22:27:47 | Validation | Step: 20000 | Val_loss: 2.299 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 22:27:47 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_222747_step_20000.pt` INFO:__main__:2024-10-26 22:27:48 | Epoch: 0 | Step: 20000 | Dataset: 0-16001000 | Loss: 2.241 | 676 ms/step , 58191.88 GFLOP/s , 476558.3 tokens/s INFO:__main__:2024-10-26 22:27:56 | Epoch: 0 | Step: 20010 | Dataset: 0-16009000 | Loss: 2.208 | 676 ms/step , 58177.69 GFLOP/s , 530405.6 tokens/s INFO:__main__:2024-10-26 22:28:04 | Epoch: 0 | Step: 20020 | Dataset: 0-16017000 | Loss: 2.222 | 677 ms/step , 58055.42 GFLOP/s , 530610.0 tokens/s INFO:__main__:2024-10-26 22:28:11 | Epoch: 0 | Step: 20030 | Dataset: 0-16025000 | Loss: 2.184 | 677 ms/step , 58099.68 GFLOP/s , 529087.4 tokens/s INFO:__main__:2024-10-26 22:28:19 | Epoch: 0 
| Step: 20040 | Dataset: 0-16033000 | Loss: 2.235 | 678 ms/step , 57981.10 GFLOP/s , 530608.6 tokens/s INFO:__main__:2024-10-26 22:28:27 | Epoch: 0 | Step: 20050 | Dataset: 0-16041000 | Loss: 2.249 | 677 ms/step , 58068.13 GFLOP/s , 529723.2 tokens/s INFO:__main__:2024-10-26 22:28:34 | Epoch: 0 | Step: 20060 | Dataset: 0-16049000 | Loss: 2.226 | 677 ms/step , 58043.27 GFLOP/s , 530938.5 tokens/s INFO:__main__:2024-10-26 22:28:42 | Epoch: 0 | Step: 20070 | Dataset: 0-16057000 | Loss: 2.254 | 677 ms/step , 58082.05 GFLOP/s , 530598.2 tokens/s INFO:__main__:2024-10-26 22:28:50 | Epoch: 0 | Step: 20080 | Dataset: 0-16065000 | Loss: 2.220 | 676 ms/step , 58163.02 GFLOP/s , 526745.6 tokens/s INFO:__main__:2024-10-26 22:28:58 | Epoch: 0 | Step: 20090 | Dataset: 0-16073000 | Loss: 2.179 | 676 ms/step , 58179.59 GFLOP/s , 530075.0 tokens/s INFO:__main__:2024-10-26 22:29:05 | Epoch: 0 | Step: 20100 | Dataset: 0-16081000 | Loss: 2.230 | 681 ms/step , 57762.19 GFLOP/s , 529074.5 tokens/s INFO:__main__:2024-10-26 22:29:13 | Epoch: 0 | Step: 20110 | Dataset: 0-16089000 | Loss: 2.172 | 692 ms/step , 56810.32 GFLOP/s , 527427.4 tokens/s INFO:__main__:2024-10-26 22:29:21 | Epoch: 0 | Step: 20120 | Dataset: 0-16097000 | Loss: 2.307 | 676 ms/step , 58150.79 GFLOP/s , 529008.4 tokens/s INFO:__main__:2024-10-26 22:29:29 | Epoch: 0 | Step: 20130 | Dataset: 0-16105000 | Loss: 2.311 | 678 ms/step , 57992.57 GFLOP/s , 528778.6 tokens/s INFO:__main__:2024-10-26 22:29:36 | Epoch: 0 | Step: 20140 | Dataset: 0-16113000 | Loss: 2.170 | 678 ms/step , 57975.32 GFLOP/s , 530235.0 tokens/s INFO:__main__:2024-10-26 22:29:44 | Epoch: 0 | Step: 20150 | Dataset: 0-16121000 | Loss: 2.194 | 677 ms/step , 58031.01 GFLOP/s , 529599.3 tokens/s INFO:__main__:2024-10-26 22:29:52 | Epoch: 0 | Step: 20160 | Dataset: 0-16129000 | Loss: 2.262 | 677 ms/step , 58097.59 GFLOP/s , 530347.2 tokens/s INFO:__main__:2024-10-26 22:30:00 | Epoch: 0 | Step: 20170 | Dataset: 0-16137000 | Loss: 2.255 | 677 ms/step , 58047.45 
GFLOP/s , 529346.4 tokens/s INFO:__main__:2024-10-26 22:30:07 | Epoch: 0 | Step: 20180 | Dataset: 0-16145000 | Loss: 2.214 | 678 ms/step , 57959.93 GFLOP/s , 530208.2 tokens/s INFO:__main__:2024-10-26 22:30:15 | Epoch: 0 | Step: 20190 | Dataset: 0-16153000 | Loss: 2.186 | 675 ms/step , 58197.97 GFLOP/s , 530287.5 tokens/s INFO:__main__:2024-10-26 22:30:23 | Epoch: 0 | Step: 20200 | Dataset: 0-16161000 | Loss: 2.155 | 676 ms/step , 58122.62 GFLOP/s , 530398.4 tokens/s INFO:__main__:2024-10-26 22:30:30 | Epoch: 0 | Step: 20210 | Dataset: 0-16169000 | Loss: 2.249 | 678 ms/step , 57961.81 GFLOP/s , 529309.9 tokens/s INFO:__main__:2024-10-26 22:30:38 | Epoch: 0 | Step: 20220 | Dataset: 0-16177000 | Loss: 2.238 | 676 ms/step , 58156.42 GFLOP/s , 529093.4 tokens/s INFO:__main__:2024-10-26 22:30:46 | Epoch: 0 | Step: 20230 | Dataset: 0-16185000 | Loss: 2.228 | 676 ms/step , 58166.49 GFLOP/s , 527821.0 tokens/s INFO:__main__:2024-10-26 22:30:54 | Epoch: 0 | Step: 20240 | Dataset: 0-16193000 | Loss: 2.250 | 676 ms/step , 58188.37 GFLOP/s , 529562.7 tokens/s INFO:__main__:2024-10-26 22:31:01 | Epoch: 0 | Step: 20250 | Dataset: 0-16201000 | Loss: 2.199 | 676 ms/step , 58154.24 GFLOP/s , 529226.9 tokens/s INFO:__main__:2024-10-26 22:31:09 | Epoch: 0 | Step: 20260 | Dataset: 0-16209000 | Loss: 2.250 | 677 ms/step , 58093.20 GFLOP/s , 530130.5 tokens/s INFO:__main__:2024-10-26 22:31:17 | Epoch: 0 | Step: 20270 | Dataset: 0-16217000 | Loss: 2.284 | 677 ms/step , 58032.67 GFLOP/s , 529524.2 tokens/s INFO:__main__:2024-10-26 22:31:25 | Epoch: 0 | Step: 20280 | Dataset: 0-16225000 | Loss: 2.276 | 678 ms/step , 57952.97 GFLOP/s , 529370.8 tokens/s INFO:__main__:2024-10-26 22:31:32 | Epoch: 0 | Step: 20290 | Dataset: 0-16233000 | Loss: 2.265 | 677 ms/step , 58028.69 GFLOP/s , 529102.4 tokens/s INFO:__main__:2024-10-26 22:31:40 | Epoch: 0 | Step: 20300 | Dataset: 0-16241000 | Loss: 2.258 | 677 ms/step , 58099.90 GFLOP/s , 527803.5 tokens/s INFO:__main__:2024-10-26 22:31:48 | Epoch: 0 | 
Step: 20310 | Dataset: 0-16249000 | Loss: 2.223 | 677 ms/step , 58090.37 GFLOP/s , 524828.6 tokens/s INFO:__main__:2024-10-26 22:31:56 | Epoch: 0 | Step: 20320 | Dataset: 0-16257000 | Loss: 2.257 | 676 ms/step , 58110.27 GFLOP/s , 528158.6 tokens/s INFO:__main__:2024-10-26 22:32:03 | Epoch: 0 | Step: 20330 | Dataset: 0-16265000 | Loss: 2.249 | 677 ms/step , 58036.94 GFLOP/s , 528368.6 tokens/s INFO:__main__:2024-10-26 22:32:11 | Epoch: 0 | Step: 20340 | Dataset: 0-16273000 | Loss: 2.234 | 680 ms/step , 57805.73 GFLOP/s , 527232.8 tokens/s INFO:__main__:2024-10-26 22:32:19 | Epoch: 0 | Step: 20350 | Dataset: 0-16281000 | Loss: 2.231 | 677 ms/step , 58095.55 GFLOP/s , 527102.2 tokens/s INFO:__main__:2024-10-26 22:32:27 | Epoch: 0 | Step: 20360 | Dataset: 0-16289000 | Loss: 2.198 | 677 ms/step , 58038.60 GFLOP/s , 528345.2 tokens/s INFO:__main__:2024-10-26 22:32:35 | Epoch: 0 | Step: 20370 | Dataset: 0-16297000 | Loss: 2.240 | 677 ms/step , 58072.39 GFLOP/s , 528902.4 tokens/s INFO:__main__:2024-10-26 22:32:42 | Epoch: 0 | Step: 20380 | Dataset: 0-16305000 | Loss: 2.206 | 677 ms/step , 58029.97 GFLOP/s , 526051.3 tokens/s INFO:__main__:2024-10-26 22:32:50 | Epoch: 0 | Step: 20390 | Dataset: 0-16313000 | Loss: 2.284 | 676 ms/step , 58127.06 GFLOP/s , 528832.3 tokens/s INFO:__main__:2024-10-26 22:32:58 | Epoch: 0 | Step: 20400 | Dataset: 0-16321000 | Loss: 2.248 | 677 ms/step , 58063.06 GFLOP/s , 529573.9 tokens/s INFO:__main__:2024-10-26 22:33:06 | Epoch: 0 | Step: 20410 | Dataset: 0-16329000 | Loss: 2.216 | 677 ms/step , 58087.59 GFLOP/s , 527312.7 tokens/s INFO:__main__:2024-10-26 22:33:13 | Epoch: 0 | Step: 20420 | Dataset: 0-16337000 | Loss: 2.283 | 678 ms/step , 58002.73 GFLOP/s , 527314.0 tokens/s INFO:__main__:2024-10-26 22:33:21 | Epoch: 0 | Step: 20430 | Dataset: 0-16345000 | Loss: 1.915 | 677 ms/step , 58065.42 GFLOP/s , 528677.2 tokens/s INFO:__main__:2024-10-26 22:33:29 | Epoch: 0 | Step: 20440 | Dataset: 0-16353000 | Loss: 1.808 | 682 ms/step , 57612.91 
GFLOP/s , 527922.4 tokens/s INFO:__main__:2024-10-26 22:33:37 | Epoch: 0 | Step: 20450 | Dataset: 0-16361000 | Loss: 1.777 | 679 ms/step , 57881.95 GFLOP/s , 528805.3 tokens/s INFO:__main__:2024-10-26 22:33:44 | Epoch: 0 | Step: 20460 | Dataset: 0-16369000 | Loss: 1.734 | 677 ms/step , 58084.25 GFLOP/s , 530061.3 tokens/s INFO:__main__:2024-10-26 22:33:52 | Epoch: 0 | Step: 20470 | Dataset: 0-16377000 | Loss: 1.744 | 676 ms/step , 58159.69 GFLOP/s , 530885.1 tokens/s INFO:__main__:2024-10-26 22:34:00 | Epoch: 0 | Step: 20480 | Dataset: 0-16385000 | Loss: 1.721 | 677 ms/step , 58088.24 GFLOP/s , 530715.7 tokens/s INFO:__main__:2024-10-26 22:34:07 | Epoch: 0 | Step: 20490 | Dataset: 0-16393000 | Loss: 1.713 | 678 ms/step , 58018.02 GFLOP/s , 530719.8 tokens/s INFO:__main__:2024-10-26 22:34:15 | Epoch: 0 | Step: 20500 | Dataset: 0-16401000 | Loss: 1.736 | 676 ms/step , 58171.65 GFLOP/s , 530597.6 tokens/s INFO:__main__:2024-10-26 22:34:23 | Epoch: 0 | Step: 20510 | Dataset: 0-16409000 | Loss: 1.743 | 678 ms/step , 57935.94 GFLOP/s , 530079.2 tokens/s INFO:__main__:2024-10-26 22:34:31 | Epoch: 0 | Step: 20520 | Dataset: 0-16417000 | Loss: 2.313 | 677 ms/step , 58058.69 GFLOP/s , 530385.4 tokens/s INFO:__main__:2024-10-26 22:34:38 | Epoch: 0 | Step: 20530 | Dataset: 0-16425000 | Loss: 2.256 | 677 ms/step , 58087.84 GFLOP/s , 529373.0 tokens/s INFO:__main__:2024-10-26 22:34:46 | Epoch: 0 | Step: 20540 | Dataset: 0-16433000 | Loss: 2.195 | 676 ms/step , 58165.39 GFLOP/s , 530104.4 tokens/s INFO:__main__:2024-10-26 22:34:54 | Epoch: 0 | Step: 20550 | Dataset: 0-16441000 | Loss: 2.308 | 677 ms/step , 58040.83 GFLOP/s , 530267.4 tokens/s INFO:__main__:2024-10-26 22:35:02 | Epoch: 0 | Step: 20560 | Dataset: 0-16449000 | Loss: 2.229 | 677 ms/step , 58081.15 GFLOP/s , 527639.0 tokens/s INFO:__main__:2024-10-26 22:35:09 | Epoch: 0 | Step: 20570 | Dataset: 0-16457000 | Loss: 2.295 | 677 ms/step , 58030.11 GFLOP/s , 530835.1 tokens/s INFO:__main__:2024-10-26 22:35:17 | Epoch: 0 | 
Step: 20580 | Dataset: 0-16465000 | Loss: 2.256 | 677 ms/step , 58097.91 GFLOP/s , 530262.0 tokens/s INFO:__main__:2024-10-26 22:35:25 | Epoch: 0 | Step: 20590 | Dataset: 0-16473000 | Loss: 2.263 | 677 ms/step , 58041.34 GFLOP/s , 527935.7 tokens/s INFO:__main__:2024-10-26 22:35:33 | Epoch: 0 | Step: 20600 | Dataset: 0-16481000 | Loss: 2.156 | 678 ms/step , 58011.53 GFLOP/s , 527697.7 tokens/s INFO:__main__:2024-10-26 22:35:40 | Epoch: 0 | Step: 20610 | Dataset: 0-16489000 | Loss: 2.200 | 678 ms/step , 57940.86 GFLOP/s , 529100.2 tokens/s INFO:__main__:2024-10-26 22:35:48 | Epoch: 0 | Step: 20620 | Dataset: 0-16497000 | Loss: 2.191 | 676 ms/step , 58117.08 GFLOP/s , 529608.1 tokens/s INFO:__main__:2024-10-26 22:35:56 | Epoch: 0 | Step: 20630 | Dataset: 0-16505000 | Loss: 2.203 | 680 ms/step , 57794.32 GFLOP/s , 528760.2 tokens/s INFO:__main__:2024-10-26 22:36:03 | Epoch: 0 | Step: 20640 | Dataset: 0-16513000 | Loss: 2.237 | 680 ms/step , 57787.96 GFLOP/s , 528770.2 tokens/s INFO:__main__:2024-10-26 22:36:11 | Epoch: 0 | Step: 20650 | Dataset: 0-16521000 | Loss: 2.224 | 679 ms/step , 57856.87 GFLOP/s , 528368.2 tokens/s INFO:__main__:2024-10-26 22:36:19 | Epoch: 0 | Step: 20660 | Dataset: 0-16529000 | Loss: 2.125 | 677 ms/step , 58039.42 GFLOP/s , 528388.6 tokens/s INFO:__main__:2024-10-26 22:36:27 | Epoch: 0 | Step: 20670 | Dataset: 0-16537000 | Loss: 2.270 | 678 ms/step , 57975.59 GFLOP/s , 529408.6 tokens/s INFO:__main__:2024-10-26 22:36:34 | Epoch: 0 | Step: 20680 | Dataset: 0-16545000 | Loss: 2.196 | 678 ms/step , 58009.49 GFLOP/s , 527873.6 tokens/s INFO:__main__:2024-10-26 22:36:42 | Epoch: 0 | Step: 20690 | Dataset: 0-16553000 | Loss: 2.198 | 676 ms/step , 58118.44 GFLOP/s , 524957.9 tokens/s INFO:__main__:2024-10-26 22:36:50 | Epoch: 0 | Step: 20700 | Dataset: 0-16561000 | Loss: 2.161 | 678 ms/step , 57987.28 GFLOP/s , 528332.2 tokens/s INFO:__main__:2024-10-26 22:36:58 | Epoch: 0 | Step: 20710 | Dataset: 0-16569000 | Loss: 2.200 | 678 ms/step , 57986.60 
GFLOP/s , 525601.9 tokens/s INFO:__main__:2024-10-26 22:37:06 | Epoch: 0 | Step: 20720 | Dataset: 0-16577000 | Loss: 2.185 | 678 ms/step , 57951.25 GFLOP/s , 528564.3 tokens/s INFO:__main__:2024-10-26 22:37:13 | Epoch: 0 | Step: 20730 | Dataset: 0-16585000 | Loss: 2.167 | 678 ms/step , 57977.51 GFLOP/s , 529945.0 tokens/s INFO:__main__:2024-10-26 22:37:21 | Epoch: 0 | Step: 20740 | Dataset: 0-16593000 | Loss: 2.216 | 677 ms/step , 58067.10 GFLOP/s , 529124.9 tokens/s INFO:__main__:2024-10-26 22:37:29 | Epoch: 0 | Step: 20750 | Dataset: 0-16601000 | Loss: 2.183 | 677 ms/step , 58046.92 GFLOP/s , 529577.7 tokens/s INFO:__main__:2024-10-26 22:37:37 | Epoch: 0 | Step: 20760 | Dataset: 0-16609000 | Loss: 2.168 | 678 ms/step , 57992.04 GFLOP/s , 528756.8 tokens/s INFO:__main__:2024-10-26 22:37:44 | Epoch: 0 | Step: 20770 | Dataset: 0-16617000 | Loss: 2.179 | 676 ms/step , 58122.67 GFLOP/s , 529926.3 tokens/s INFO:__main__:2024-10-26 22:37:52 | Epoch: 0 | Step: 20780 | Dataset: 0-16625000 | Loss: 2.262 | 679 ms/step , 57927.84 GFLOP/s , 529554.1 tokens/s INFO:__main__:2024-10-26 22:38:00 | Epoch: 0 | Step: 20790 | Dataset: 0-16633000 | Loss: 2.141 | 677 ms/step , 58038.97 GFLOP/s , 529966.6 tokens/s INFO:__main__:2024-10-26 22:38:07 | Epoch: 0 | Step: 20800 | Dataset: 0-16641000 | Loss: 2.263 | 677 ms/step , 58098.17 GFLOP/s , 530832.6 tokens/s INFO:__main__:2024-10-26 22:38:15 | Epoch: 0 | Step: 20810 | Dataset: 0-16649000 | Loss: 2.293 | 677 ms/step , 58092.18 GFLOP/s , 530375.5 tokens/s INFO:__main__:2024-10-26 22:38:23 | Epoch: 0 | Step: 20820 | Dataset: 0-16657000 | Loss: 2.062 | 677 ms/step , 58023.40 GFLOP/s , 529500.2 tokens/s INFO:__main__:2024-10-26 22:38:31 | Epoch: 0 | Step: 20830 | Dataset: 0-16665000 | Loss: 2.237 | 679 ms/step , 57933.85 GFLOP/s , 529996.7 tokens/s INFO:__main__:2024-10-26 22:38:38 | Epoch: 0 | Step: 20840 | Dataset: 0-16673000 | Loss: 1.826 | 678 ms/step , 57998.78 GFLOP/s , 528898.2 tokens/s INFO:__main__:2024-10-26 22:38:46 | Epoch: 0 | 
Step: 20850 | Dataset: 0-16681000 | Loss: 1.817 | 678 ms/step , 57972.19 GFLOP/s , 528860.5 tokens/s INFO:__main__:2024-10-26 22:38:54 | Epoch: 0 | Step: 20860 | Dataset: 0-16689000 | Loss: 1.785 | 678 ms/step , 57942.15 GFLOP/s , 529191.2 tokens/s INFO:__main__:2024-10-26 22:39:02 | Epoch: 0 | Step: 20870 | Dataset: 0-16697000 | Loss: 1.770 | 678 ms/step , 57949.25 GFLOP/s , 529441.6 tokens/s INFO:__main__:2024-10-26 22:39:09 | Epoch: 0 | Step: 20880 | Dataset: 0-16705000 | Loss: 1.726 | 678 ms/step , 57959.61 GFLOP/s , 529528.1 tokens/s INFO:__main__:2024-10-26 22:39:17 | Epoch: 0 | Step: 20890 | Dataset: 0-16713000 | Loss: 1.743 | 678 ms/step , 57960.00 GFLOP/s , 529024.7 tokens/s INFO:__main__:2024-10-26 22:39:25 | Epoch: 0 | Step: 20900 | Dataset: 0-16721000 | Loss: 1.740 | 677 ms/step , 58060.16 GFLOP/s , 529304.1 tokens/s INFO:__main__:2024-10-26 22:39:33 | Epoch: 0 | Step: 20910 | Dataset: 0-16729000 | Loss: 1.707 | 677 ms/step , 58028.40 GFLOP/s , 528966.9 tokens/s INFO:__main__:2024-10-26 22:39:40 | Epoch: 0 | Step: 20920 | Dataset: 0-16737000 | Loss: 2.523 | 677 ms/step , 58089.94 GFLOP/s , 529220.3 tokens/s INFO:__main__:2024-10-26 22:39:48 | Epoch: 0 | Step: 20930 | Dataset: 0-16745000 | Loss: 2.310 | 677 ms/step , 58105.73 GFLOP/s , 530338.1 tokens/s INFO:__main__:2024-10-26 22:39:56 | Epoch: 0 | Step: 20940 | Dataset: 0-16753000 | Loss: 2.215 | 678 ms/step , 57993.21 GFLOP/s , 528544.4 tokens/s INFO:__main__:2024-10-26 22:40:04 | Epoch: 0 | Step: 20950 | Dataset: 0-16761000 | Loss: 2.245 | 678 ms/step , 57976.63 GFLOP/s , 527384.7 tokens/s INFO:__main__:2024-10-26 22:40:11 | Epoch: 0 | Step: 20960 | Dataset: 0-16769000 | Loss: 2.174 | 679 ms/step , 57876.96 GFLOP/s , 529690.0 tokens/s INFO:__main__:2024-10-26 22:40:19 | Epoch: 0 | Step: 20970 | Dataset: 0-16777000 | Loss: 2.197 | 680 ms/step , 57765.53 GFLOP/s , 529035.8 tokens/s INFO:__main__:2024-10-26 22:40:27 | Epoch: 0 | Step: 20980 | Dataset: 0-16785000 | Loss: 2.220 | 678 ms/step , 57964.35 
GFLOP/s , 529326.0 tokens/s
INFO:__main__:2024-10-26 22:40:35 | Epoch: 0 | Step: 20990 | Dataset: 0-16793000 | Loss: 2.208 | 679 ms/step , 57891.75 GFLOP/s , 529339.1 tokens/s
INFO:__main__:2024-10-26 22:40:42 | Validation | Step: 21000 | Val_loss: 2.342 | Best_val_loss: 2.2627
INFO:__main__:2024-10-26 22:40:42 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_224042_step_21000.pt`
INFO:__main__:2024-10-26 22:40:43 | Epoch: 0 | Step: 21000 | Dataset: 0-16801000 | Loss: 2.181 | 676 ms/step , 58183.26 GFLOP/s , 473396.2 tokens/s
INFO:__main__:2024-10-26 22:40:51 | Epoch: 0 | Step: 21010 | Dataset: 0-16809000 | Loss: 2.199 | 676 ms/step , 58142.21 GFLOP/s , 529664.9 tokens/s
INFO:__main__:2024-10-26 22:40:59 | Epoch: 0 | Step: 21020 | Dataset: 0-16817000 | Loss: 2.119 | 676 ms/step , 58156.76 GFLOP/s , 526131.3 tokens/s
INFO:__main__:2024-10-26 22:41:06 | Epoch: 0 | Step: 21030 | Dataset: 0-16825000 | Loss: 2.193 | 677 ms/step , 58027.63 GFLOP/s , 529710.1 tokens/s
INFO:__main__:2024-10-26 22:41:14 | Epoch: 0 | Step: 21040 | Dataset: 0-16833000 | Loss: 2.213 | 677 ms/step , 58048.36 GFLOP/s , 529827.6 tokens/s
INFO:__main__:2024-10-26 22:41:22 | Epoch: 0 | Step: 21050 | Dataset: 0-16841000 | Loss: 2.238 | 682 ms/step , 57667.07 GFLOP/s , 529386.6 tokens/s
INFO:__main__:2024-10-26 22:41:30 | Epoch: 0 | Step: 21060 | Dataset: 0-16849000 | Loss: 2.204 | 676 ms/step , 58114.62 GFLOP/s , 530374.5 tokens/s
INFO:__main__:2024-10-26 22:41:37 | Epoch: 0 | Step: 21070 | Dataset: 0-16857000 | Loss: 2.180 | 680 ms/step , 57836.39 GFLOP/s , 529695.6 tokens/s
INFO:__main__:2024-10-26 22:41:45 | Epoch: 0 | Step: 21080 | Dataset: 0-16865000 | Loss: 2.202 | 681 ms/step , 57702.08 GFLOP/s , 529980.1 tokens/s
INFO:__main__:2024-10-26 22:41:53 | Epoch: 0 | Step: 21090 | Dataset: 0-16873000 | Loss: 1.749 | 677 ms/step , 58094.62 GFLOP/s , 530695.8 tokens/s
INFO:__main__:2024-10-26 22:42:01 | Epoch: 0 | Step: 21100 | Dataset: 0-16881000 | Loss: 1.736 | 676 ms/step ,
58123.31 GFLOP/s , 529622.5 tokens/s INFO:__main__:2024-10-26 22:42:08 | Epoch: 0 | Step: 21110 | Dataset: 0-16889000 | Loss: 1.737 | 678 ms/step , 58000.20 GFLOP/s , 530471.5 tokens/s INFO:__main__:2024-10-26 22:42:16 | Epoch: 0 | Step: 21120 | Dataset: 0-16897000 | Loss: 1.723 | 677 ms/step , 58058.52 GFLOP/s , 530299.7 tokens/s INFO:__main__:2024-10-26 22:42:24 | Epoch: 0 | Step: 21130 | Dataset: 0-16905000 | Loss: 1.721 | 676 ms/step , 58190.63 GFLOP/s , 530046.6 tokens/s INFO:__main__:2024-10-26 22:42:31 | Epoch: 0 | Step: 21140 | Dataset: 0-16913000 | Loss: 1.687 | 676 ms/step , 58120.03 GFLOP/s , 531171.4 tokens/s INFO:__main__:2024-10-26 22:42:39 | Epoch: 0 | Step: 21150 | Dataset: 0-16921000 | Loss: 1.700 | 676 ms/step , 58150.88 GFLOP/s , 531416.7 tokens/s INFO:__main__:2024-10-26 22:42:47 | Epoch: 0 | Step: 21160 | Dataset: 0-16929000 | Loss: 1.692 | 678 ms/step , 58015.15 GFLOP/s , 530915.3 tokens/s INFO:__main__:2024-10-26 22:42:55 | Epoch: 0 | Step: 21170 | Dataset: 0-16937000 | Loss: 2.416 | 677 ms/step , 58065.76 GFLOP/s , 529197.2 tokens/s INFO:__main__:2024-10-26 22:43:02 | Epoch: 0 | Step: 21180 | Dataset: 0-16945000 | Loss: 2.393 | 675 ms/step , 58229.85 GFLOP/s , 531738.9 tokens/s INFO:__main__:2024-10-26 22:43:10 | Epoch: 0 | Step: 21190 | Dataset: 0-16953000 | Loss: 2.332 | 720 ms/step , 54590.79 GFLOP/s , 528170.1 tokens/s INFO:__main__:2024-10-26 22:43:18 | Epoch: 0 | Step: 21200 | Dataset: 0-16961000 | Loss: 2.322 | 677 ms/step , 58066.45 GFLOP/s , 529499.6 tokens/s INFO:__main__:2024-10-26 22:43:26 | Epoch: 0 | Step: 21210 | Dataset: 0-16969000 | Loss: 2.328 | 676 ms/step , 58178.06 GFLOP/s , 526586.7 tokens/s INFO:__main__:2024-10-26 22:43:33 | Epoch: 0 | Step: 21220 | Dataset: 0-16977000 | Loss: 2.389 | 679 ms/step , 57889.15 GFLOP/s , 528155.5 tokens/s INFO:__main__:2024-10-26 22:43:41 | Epoch: 0 | Step: 21230 | Dataset: 0-16985000 | Loss: 2.259 | 678 ms/step , 57974.19 GFLOP/s , 527930.1 tokens/s INFO:__main__:2024-10-26 22:43:49 | 
Epoch: 0 | Step: 21240 | Dataset: 0-16993000 | Loss: 2.258 | 678 ms/step , 57986.11 GFLOP/s , 528398.6 tokens/s INFO:__main__:2024-10-26 22:43:57 | Epoch: 0 | Step: 21250 | Dataset: 0-17001000 | Loss: 2.287 | 679 ms/step , 57896.88 GFLOP/s , 525316.2 tokens/s INFO:__main__:2024-10-26 22:44:04 | Epoch: 0 | Step: 21260 | Dataset: 0-17009000 | Loss: 2.261 | 677 ms/step , 58066.87 GFLOP/s , 526589.3 tokens/s INFO:__main__:2024-10-26 22:44:12 | Epoch: 0 | Step: 21270 | Dataset: 0-17017000 | Loss: 2.249 | 678 ms/step , 58005.27 GFLOP/s , 528142.6 tokens/s INFO:__main__:2024-10-26 22:44:20 | Epoch: 0 | Step: 21280 | Dataset: 0-17025000 | Loss: 2.275 | 677 ms/step , 58023.39 GFLOP/s , 529374.7 tokens/s INFO:__main__:2024-10-26 22:44:28 | Epoch: 0 | Step: 21290 | Dataset: 0-17033000 | Loss: 2.240 | 678 ms/step , 57995.69 GFLOP/s , 529965.6 tokens/s INFO:__main__:2024-10-26 22:44:35 | Epoch: 0 | Step: 21300 | Dataset: 0-17041000 | Loss: 2.308 | 680 ms/step , 57829.90 GFLOP/s , 523324.8 tokens/s INFO:__main__:2024-10-26 22:44:43 | Epoch: 0 | Step: 21310 | Dataset: 0-17049000 | Loss: 2.218 | 679 ms/step , 57924.92 GFLOP/s , 527338.3 tokens/s INFO:__main__:2024-10-26 22:44:51 | Epoch: 0 | Step: 21320 | Dataset: 0-17057000 | Loss: 2.336 | 678 ms/step , 57983.06 GFLOP/s , 527930.3 tokens/s INFO:__main__:2024-10-26 22:44:59 | Epoch: 0 | Step: 21330 | Dataset: 0-17065000 | Loss: 2.331 | 678 ms/step , 57968.78 GFLOP/s , 527698.3 tokens/s INFO:__main__:2024-10-26 22:45:07 | Epoch: 0 | Step: 21340 | Dataset: 0-17073000 | Loss: 2.340 | 678 ms/step , 57945.14 GFLOP/s , 524914.4 tokens/s INFO:__main__:2024-10-26 22:45:14 | Epoch: 0 | Step: 21350 | Dataset: 0-17081000 | Loss: 2.332 | 679 ms/step , 57924.88 GFLOP/s , 527213.5 tokens/s INFO:__main__:2024-10-26 22:45:22 | Epoch: 0 | Step: 21360 | Dataset: 0-17089000 | Loss: 2.340 | 678 ms/step , 57995.49 GFLOP/s , 530031.1 tokens/s INFO:__main__:2024-10-26 22:45:30 | Epoch: 0 | Step: 21370 | Dataset: 0-17097000 | Loss: 2.346 | 683 ms/step , 
57562.82 GFLOP/s , 529752.1 tokens/s INFO:__main__:2024-10-26 22:45:37 | Epoch: 0 | Step: 21380 | Dataset: 0-17105000 | Loss: 2.333 | 677 ms/step , 58096.71 GFLOP/s , 528156.2 tokens/s INFO:__main__:2024-10-26 22:45:45 | Epoch: 0 | Step: 21390 | Dataset: 0-17113000 | Loss: 2.241 | 678 ms/step , 58007.83 GFLOP/s , 529816.7 tokens/s INFO:__main__:2024-10-26 22:45:53 | Epoch: 0 | Step: 21400 | Dataset: 0-17121000 | Loss: 2.294 | 678 ms/step , 57948.34 GFLOP/s , 529196.4 tokens/s INFO:__main__:2024-10-26 22:46:01 | Epoch: 0 | Step: 21410 | Dataset: 0-17129000 | Loss: 2.303 | 678 ms/step , 57994.50 GFLOP/s , 528602.4 tokens/s INFO:__main__:2024-10-26 22:46:08 | Epoch: 0 | Step: 21420 | Dataset: 0-17137000 | Loss: 2.279 | 683 ms/step , 57566.90 GFLOP/s , 528825.4 tokens/s INFO:__main__:2024-10-26 22:46:16 | Epoch: 0 | Step: 21430 | Dataset: 0-17145000 | Loss: 2.258 | 678 ms/step , 57992.21 GFLOP/s , 527378.1 tokens/s INFO:__main__:2024-10-26 22:46:24 | Epoch: 0 | Step: 21440 | Dataset: 0-17153000 | Loss: 2.217 | 677 ms/step , 58080.65 GFLOP/s , 528360.7 tokens/s INFO:__main__:2024-10-26 22:46:32 | Epoch: 0 | Step: 21450 | Dataset: 0-17161000 | Loss: 2.203 | 678 ms/step , 57959.61 GFLOP/s , 527876.8 tokens/s INFO:__main__:2024-10-26 22:46:39 | Epoch: 0 | Step: 21460 | Dataset: 0-17169000 | Loss: 2.323 | 678 ms/step , 58000.95 GFLOP/s , 529656.5 tokens/s INFO:__main__:2024-10-26 22:46:47 | Epoch: 0 | Step: 21470 | Dataset: 0-17177000 | Loss: 2.224 | 681 ms/step , 57723.27 GFLOP/s , 529341.6 tokens/s INFO:__main__:2024-10-26 22:46:55 | Epoch: 0 | Step: 21480 | Dataset: 0-17185000 | Loss: 2.281 | 703 ms/step , 55910.06 GFLOP/s , 527656.2 tokens/s INFO:__main__:2024-10-26 22:47:03 | Epoch: 0 | Step: 21490 | Dataset: 0-17193000 | Loss: 1.940 | 679 ms/step , 57870.94 GFLOP/s , 527231.6 tokens/s INFO:__main__:2024-10-26 22:47:10 | Epoch: 0 | Step: 21500 | Dataset: 0-17201000 | Loss: 1.788 | 679 ms/step , 57911.76 GFLOP/s , 529285.0 tokens/s INFO:__main__:2024-10-26 22:47:18 | 
Epoch: 0 | Step: 21510 | Dataset: 0-17209000 | Loss: 1.748 | 679 ms/step , 57906.07 GFLOP/s , 528844.3 tokens/s INFO:__main__:2024-10-26 22:47:26 | Epoch: 0 | Step: 21520 | Dataset: 0-17217000 | Loss: 1.710 | 679 ms/step , 57929.34 GFLOP/s , 528860.8 tokens/s INFO:__main__:2024-10-26 22:47:34 | Epoch: 0 | Step: 21530 | Dataset: 0-17225000 | Loss: 1.724 | 677 ms/step , 58027.96 GFLOP/s , 529483.3 tokens/s INFO:__main__:2024-10-26 22:47:41 | Epoch: 0 | Step: 21540 | Dataset: 0-17233000 | Loss: 1.724 | 679 ms/step , 57890.68 GFLOP/s , 529388.4 tokens/s INFO:__main__:2024-10-26 22:47:49 | Epoch: 0 | Step: 21550 | Dataset: 0-17241000 | Loss: 1.712 | 684 ms/step , 57487.04 GFLOP/s , 529273.3 tokens/s INFO:__main__:2024-10-26 22:47:57 | Epoch: 0 | Step: 21560 | Dataset: 0-17249000 | Loss: 1.703 | 678 ms/step , 57987.05 GFLOP/s , 528880.8 tokens/s INFO:__main__:2024-10-26 22:48:05 | Epoch: 0 | Step: 21570 | Dataset: 0-17257000 | Loss: 1.719 | 678 ms/step , 58017.75 GFLOP/s , 527331.1 tokens/s INFO:__main__:2024-10-26 22:48:12 | Epoch: 0 | Step: 21580 | Dataset: 0-17265000 | Loss: 2.346 | 677 ms/step , 58104.94 GFLOP/s , 529778.0 tokens/s INFO:__main__:2024-10-26 22:48:20 | Epoch: 0 | Step: 21590 | Dataset: 0-17273000 | Loss: 2.342 | 676 ms/step , 58120.25 GFLOP/s , 529797.6 tokens/s INFO:__main__:2024-10-26 22:48:28 | Epoch: 0 | Step: 21600 | Dataset: 0-17281000 | Loss: 2.306 | 677 ms/step , 58063.99 GFLOP/s , 530536.6 tokens/s INFO:__main__:2024-10-26 22:48:36 | Epoch: 0 | Step: 21610 | Dataset: 0-17289000 | Loss: 2.327 | 677 ms/step , 58064.99 GFLOP/s , 530385.0 tokens/s INFO:__main__:2024-10-26 22:48:43 | Epoch: 0 | Step: 21620 | Dataset: 0-17297000 | Loss: 2.351 | 681 ms/step , 57695.19 GFLOP/s , 527491.8 tokens/s INFO:__main__:2024-10-26 22:48:51 | Epoch: 0 | Step: 21630 | Dataset: 0-17305000 | Loss: 2.298 | 678 ms/step , 57980.10 GFLOP/s , 523261.1 tokens/s INFO:__main__:2024-10-26 22:48:59 | Epoch: 0 | Step: 21640 | Dataset: 0-17313000 | Loss: 2.298 | 677 ms/step , 
58055.69 GFLOP/s , 528026.4 tokens/s INFO:__main__:2024-10-26 22:49:07 | Epoch: 0 | Step: 21650 | Dataset: 0-17321000 | Loss: 2.270 | 676 ms/step , 58120.03 GFLOP/s , 528958.8 tokens/s INFO:__main__:2024-10-26 22:49:14 | Epoch: 0 | Step: 21660 | Dataset: 0-17329000 | Loss: 2.275 | 678 ms/step , 58006.49 GFLOP/s , 530625.9 tokens/s INFO:__main__:2024-10-26 22:49:22 | Epoch: 0 | Step: 21670 | Dataset: 0-17337000 | Loss: 2.189 | 680 ms/step , 57815.92 GFLOP/s , 529999.1 tokens/s INFO:__main__:2024-10-26 22:49:30 | Epoch: 0 | Step: 21680 | Dataset: 0-17345000 | Loss: 2.234 | 679 ms/step , 57916.35 GFLOP/s , 530010.2 tokens/s INFO:__main__:2024-10-26 22:49:38 | Epoch: 0 | Step: 21690 | Dataset: 0-17353000 | Loss: 2.255 | 677 ms/step , 58067.24 GFLOP/s , 530122.6 tokens/s INFO:__main__:2024-10-26 22:49:45 | Epoch: 0 | Step: 21700 | Dataset: 0-17361000 | Loss: 2.273 | 677 ms/step , 58049.36 GFLOP/s , 530020.8 tokens/s INFO:__main__:2024-10-26 22:49:53 | Epoch: 0 | Step: 21710 | Dataset: 0-17369000 | Loss: 2.151 | 677 ms/step , 58044.87 GFLOP/s , 529568.6 tokens/s INFO:__main__:2024-10-26 22:50:01 | Epoch: 0 | Step: 21720 | Dataset: 0-17377000 | Loss: 2.217 | 677 ms/step , 58074.44 GFLOP/s , 528755.7 tokens/s INFO:__main__:2024-10-26 22:50:09 | Epoch: 0 | Step: 21730 | Dataset: 0-17385000 | Loss: 2.285 | 676 ms/step , 58121.23 GFLOP/s , 529414.2 tokens/s INFO:__main__:2024-10-26 22:50:16 | Epoch: 0 | Step: 21740 | Dataset: 0-17393000 | Loss: 2.522 | 678 ms/step , 58020.29 GFLOP/s , 530770.0 tokens/s INFO:__main__:2024-10-26 22:50:24 | Epoch: 0 | Step: 21750 | Dataset: 0-17401000 | Loss: 2.410 | 677 ms/step , 58073.94 GFLOP/s , 530849.2 tokens/s INFO:__main__:2024-10-26 22:50:32 | Epoch: 0 | Step: 21760 | Dataset: 0-17409000 | Loss: 2.330 | 678 ms/step , 57996.57 GFLOP/s , 530452.7 tokens/s INFO:__main__:2024-10-26 22:50:39 | Epoch: 0 | Step: 21770 | Dataset: 0-17417000 | Loss: 2.287 | 676 ms/step , 58187.44 GFLOP/s , 530695.3 tokens/s INFO:__main__:2024-10-26 22:50:47 | 
Epoch: 0 | Step: 21780 | Dataset: 0-17425000 | Loss: 2.250 | 681 ms/step , 57739.00 GFLOP/s , 529177.1 tokens/s INFO:__main__:2024-10-26 22:50:55 | Epoch: 0 | Step: 21790 | Dataset: 0-17433000 | Loss: 2.238 | 677 ms/step , 58055.19 GFLOP/s , 530353.5 tokens/s INFO:__main__:2024-10-26 22:51:03 | Epoch: 0 | Step: 21800 | Dataset: 0-17441000 | Loss: 2.233 | 677 ms/step , 58090.64 GFLOP/s , 529104.8 tokens/s INFO:__main__:2024-10-26 22:51:10 | Epoch: 0 | Step: 21810 | Dataset: 0-17449000 | Loss: 2.194 | 677 ms/step , 58037.11 GFLOP/s , 529913.1 tokens/s INFO:__main__:2024-10-26 22:51:18 | Epoch: 0 | Step: 21820 | Dataset: 0-17457000 | Loss: 2.157 | 677 ms/step , 58082.90 GFLOP/s , 530866.9 tokens/s INFO:__main__:2024-10-26 22:51:26 | Epoch: 0 | Step: 21830 | Dataset: 0-17465000 | Loss: 2.161 | 679 ms/step , 57882.54 GFLOP/s , 530169.4 tokens/s INFO:__main__:2024-10-26 22:51:34 | Epoch: 0 | Step: 21840 | Dataset: 0-17473000 | Loss: 2.213 | 676 ms/step , 58131.06 GFLOP/s , 530095.2 tokens/s INFO:__main__:2024-10-26 22:51:41 | Epoch: 0 | Step: 21850 | Dataset: 0-17481000 | Loss: 2.142 | 676 ms/step , 58108.40 GFLOP/s , 529979.3 tokens/s INFO:__main__:2024-10-26 22:51:49 | Epoch: 0 | Step: 21860 | Dataset: 0-17489000 | Loss: 2.191 | 677 ms/step , 58061.27 GFLOP/s , 529722.0 tokens/s INFO:__main__:2024-10-26 22:51:57 | Epoch: 0 | Step: 21870 | Dataset: 0-17497000 | Loss: 2.128 | 676 ms/step , 58109.56 GFLOP/s , 525209.9 tokens/s INFO:__main__:2024-10-26 22:52:05 | Epoch: 0 | Step: 21880 | Dataset: 0-17505000 | Loss: 2.175 | 675 ms/step , 58223.87 GFLOP/s , 529579.1 tokens/s INFO:__main__:2024-10-26 22:52:12 | Epoch: 0 | Step: 21890 | Dataset: 0-17513000 | Loss: 2.101 | 675 ms/step , 58265.74 GFLOP/s , 529522.9 tokens/s INFO:__main__:2024-10-26 22:52:20 | Epoch: 0 | Step: 21900 | Dataset: 0-17521000 | Loss: 2.340 | 678 ms/step , 57960.00 GFLOP/s , 528896.0 tokens/s INFO:__main__:2024-10-26 22:52:28 | Epoch: 0 | Step: 21910 | Dataset: 0-17529000 | Loss: 2.371 | 677 ms/step , 
58030.86 GFLOP/s , 528801.1 tokens/s
INFO:__main__:2024-10-26 22:52:35 | Epoch: 0 | Step: 21920 | Dataset: 0-17537000 | Loss: 2.325 | 677 ms/step , 58077.37 GFLOP/s , 528590.7 tokens/s
INFO:__main__:2024-10-26 22:52:43 | Epoch: 0 | Step: 21930 | Dataset: 0-17545000 | Loss: 2.196 | 677 ms/step , 58036.33 GFLOP/s , 529674.5 tokens/s
INFO:__main__:2024-10-26 22:52:51 | Epoch: 0 | Step: 21940 | Dataset: 0-17553000 | Loss: 2.281 | 677 ms/step , 58057.24 GFLOP/s , 528627.7 tokens/s
INFO:__main__:2024-10-26 22:52:59 | Epoch: 0 | Step: 21950 | Dataset: 0-17561000 | Loss: 2.269 | 677 ms/step , 58062.86 GFLOP/s , 528803.8 tokens/s
INFO:__main__:2024-10-26 22:53:06 | Epoch: 0 | Step: 21960 | Dataset: 0-17569000 | Loss: 2.266 | 678 ms/step , 57941.91 GFLOP/s , 528094.5 tokens/s
INFO:__main__:2024-10-26 22:53:14 | Epoch: 0 | Step: 21970 | Dataset: 0-17577000 | Loss: 2.186 | 678 ms/step , 57953.86 GFLOP/s , 528407.3 tokens/s
INFO:__main__:2024-10-26 22:53:22 | Epoch: 0 | Step: 21980 | Dataset: 0-17585000 | Loss: 2.185 | 677 ms/step , 58033.43 GFLOP/s , 527998.9 tokens/s
INFO:__main__:2024-10-26 22:53:30 | Epoch: 0 | Step: 21990 | Dataset: 0-17593000 | Loss: 2.210 | 679 ms/step , 57906.92 GFLOP/s , 526977.4 tokens/s
INFO:__main__:2024-10-26 22:53:37 | Validation | Step: 22000 | Val_loss: 2.333 | Best_val_loss: 2.2627
INFO:__main__:2024-10-26 22:53:37 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_225337_step_22000.pt`
INFO:__main__:2024-10-26 22:53:38 | Epoch: 0 | Step: 22000 | Dataset: 0-17601000 | Loss: 2.214 | 675 ms/step , 58214.87 GFLOP/s , 474030.9 tokens/s
INFO:__main__:2024-10-26 22:53:46 | Epoch: 0 | Step: 22010 | Dataset: 0-17609000 | Loss: 2.184 | 676 ms/step , 58107.37 GFLOP/s , 527091.2 tokens/s
INFO:__main__:2024-10-26 22:53:54 | Epoch: 0 | Step: 22020 | Dataset: 0-17617000 | Loss: 2.278 | 677 ms/step , 58032.39 GFLOP/s , 524432.4 tokens/s
INFO:__main__:2024-10-26 22:54:02 | Epoch: 0 | Step: 22030 | Dataset: 0-17625000 | Loss: 2.248 | 677
ms/step , 58027.96 GFLOP/s , 529961.4 tokens/s INFO:__main__:2024-10-26 22:54:09 | Epoch: 0 | Step: 22040 | Dataset: 0-17633000 | Loss: 2.196 | 678 ms/step , 58011.61 GFLOP/s , 529939.5 tokens/s INFO:__main__:2024-10-26 22:54:17 | Epoch: 0 | Step: 22050 | Dataset: 0-17641000 | Loss: 2.219 | 678 ms/step , 57938.17 GFLOP/s , 529632.2 tokens/s INFO:__main__:2024-10-26 22:54:25 | Epoch: 0 | Step: 22060 | Dataset: 0-17649000 | Loss: 2.301 | 676 ms/step , 58106.94 GFLOP/s , 530357.8 tokens/s INFO:__main__:2024-10-26 22:54:33 | Epoch: 0 | Step: 22070 | Dataset: 0-17657000 | Loss: 2.246 | 676 ms/step , 58166.31 GFLOP/s , 529839.6 tokens/s INFO:__main__:2024-10-26 22:54:40 | Epoch: 0 | Step: 22080 | Dataset: 0-17665000 | Loss: 2.296 | 677 ms/step , 58044.88 GFLOP/s , 529229.4 tokens/s INFO:__main__:2024-10-26 22:54:48 | Epoch: 0 | Step: 22090 | Dataset: 0-17673000 | Loss: 2.220 | 676 ms/step , 58118.88 GFLOP/s , 529222.9 tokens/s INFO:__main__:2024-10-26 22:54:56 | Epoch: 0 | Step: 22100 | Dataset: 0-17681000 | Loss: 2.281 | 677 ms/step , 58052.48 GFLOP/s , 528713.7 tokens/s INFO:__main__:2024-10-26 22:55:04 | Epoch: 0 | Step: 22110 | Dataset: 0-17689000 | Loss: 2.279 | 678 ms/step , 58004.16 GFLOP/s , 530620.2 tokens/s INFO:__main__:2024-10-26 22:55:11 | Epoch: 0 | Step: 22120 | Dataset: 0-17697000 | Loss: 2.305 | 678 ms/step , 58011.12 GFLOP/s , 530985.5 tokens/s INFO:__main__:2024-10-26 22:55:19 | Epoch: 0 | Step: 22130 | Dataset: 0-17705000 | Loss: 2.160 | 679 ms/step , 57870.16 GFLOP/s , 530022.0 tokens/s INFO:__main__:2024-10-26 22:55:27 | Epoch: 0 | Step: 22140 | Dataset: 0-17713000 | Loss: 2.288 | 677 ms/step , 58058.31 GFLOP/s , 530767.0 tokens/s INFO:__main__:2024-10-26 22:55:34 | Epoch: 0 | Step: 22150 | Dataset: 0-17721000 | Loss: 2.302 | 677 ms/step , 58087.97 GFLOP/s , 529982.5 tokens/s INFO:__main__:2024-10-26 22:55:42 | Epoch: 0 | Step: 22160 | Dataset: 0-17729000 | Loss: 2.196 | 677 ms/step , 58036.90 GFLOP/s , 528537.8 tokens/s INFO:__main__:2024-10-26 
22:55:50 | Epoch: 0 | Step: 22170 | Dataset: 0-17737000 | Loss: 2.267 | 677 ms/step , 58071.22 GFLOP/s , 529576.2 tokens/s INFO:__main__:2024-10-26 22:55:58 | Epoch: 0 | Step: 22180 | Dataset: 0-17745000 | Loss: 2.144 | 678 ms/step , 58020.90 GFLOP/s , 528149.5 tokens/s INFO:__main__:2024-10-26 22:56:05 | Epoch: 0 | Step: 22190 | Dataset: 0-17753000 | Loss: 2.259 | 678 ms/step , 57983.27 GFLOP/s , 528929.7 tokens/s INFO:__main__:2024-10-26 22:56:13 | Epoch: 0 | Step: 22200 | Dataset: 0-17761000 | Loss: 2.222 | 677 ms/step , 58056.56 GFLOP/s , 527545.7 tokens/s INFO:__main__:2024-10-26 22:56:21 | Epoch: 0 | Step: 22210 | Dataset: 0-17769000 | Loss: 2.248 | 678 ms/step , 58002.41 GFLOP/s , 527942.9 tokens/s INFO:__main__:2024-10-26 22:56:29 | Epoch: 0 | Step: 22220 | Dataset: 0-17777000 | Loss: 2.283 | 678 ms/step , 57964.88 GFLOP/s , 528019.2 tokens/s INFO:__main__:2024-10-26 22:56:36 | Epoch: 0 | Step: 22230 | Dataset: 0-17785000 | Loss: 2.400 | 678 ms/step , 57963.85 GFLOP/s , 528027.8 tokens/s INFO:__main__:2024-10-26 22:56:44 | Epoch: 0 | Step: 22240 | Dataset: 0-17793000 | Loss: 2.259 | 678 ms/step , 57936.84 GFLOP/s , 527505.6 tokens/s INFO:__main__:2024-10-26 22:56:52 | Epoch: 0 | Step: 22250 | Dataset: 0-17801000 | Loss: 2.266 | 678 ms/step , 58018.64 GFLOP/s , 528237.3 tokens/s INFO:__main__:2024-10-26 22:57:00 | Epoch: 0 | Step: 22260 | Dataset: 0-17809000 | Loss: 2.319 | 678 ms/step , 57996.23 GFLOP/s , 528153.8 tokens/s INFO:__main__:2024-10-26 22:57:07 | Epoch: 0 | Step: 22270 | Dataset: 0-17817000 | Loss: 2.205 | 679 ms/step , 57935.36 GFLOP/s , 528119.4 tokens/s INFO:__main__:2024-10-26 22:57:15 | Epoch: 0 | Step: 22280 | Dataset: 0-17825000 | Loss: 2.267 | 681 ms/step , 57686.42 GFLOP/s , 527844.9 tokens/s INFO:__main__:2024-10-26 22:57:23 | Epoch: 0 | Step: 22290 | Dataset: 0-17833000 | Loss: 2.248 | 682 ms/step , 57677.75 GFLOP/s , 526455.7 tokens/s INFO:__main__:2024-10-26 22:57:31 | Epoch: 0 | Step: 22300 | Dataset: 0-17841000 | Loss: 2.232 | 677 
ms/step , 58032.03 GFLOP/s , 527875.6 tokens/s INFO:__main__:2024-10-26 22:57:39 | Epoch: 0 | Step: 22310 | Dataset: 0-17849000 | Loss: 2.265 | 676 ms/step , 58138.14 GFLOP/s , 528773.7 tokens/s INFO:__main__:2024-10-26 22:57:46 | Epoch: 0 | Step: 22320 | Dataset: 0-17857000 | Loss: 2.217 | 678 ms/step , 57971.85 GFLOP/s , 528495.8 tokens/s INFO:__main__:2024-10-26 22:57:54 | Epoch: 0 | Step: 22330 | Dataset: 0-17865000 | Loss: 2.279 | 678 ms/step , 58020.22 GFLOP/s , 528766.0 tokens/s INFO:__main__:2024-10-26 22:58:02 | Epoch: 0 | Step: 22340 | Dataset: 0-17873000 | Loss: 2.206 | 677 ms/step , 58087.11 GFLOP/s , 528565.0 tokens/s INFO:__main__:2024-10-26 22:58:10 | Epoch: 0 | Step: 22350 | Dataset: 0-17881000 | Loss: 2.233 | 677 ms/step , 58068.09 GFLOP/s , 525359.6 tokens/s INFO:__main__:2024-10-26 22:58:17 | Epoch: 0 | Step: 22360 | Dataset: 0-17889000 | Loss: 2.222 | 677 ms/step , 58031.71 GFLOP/s , 528611.2 tokens/s INFO:__main__:2024-10-26 22:58:25 | Epoch: 0 | Step: 22370 | Dataset: 0-17897000 | Loss: 2.264 | 678 ms/step , 57999.68 GFLOP/s , 528397.6 tokens/s INFO:__main__:2024-10-26 22:58:33 | Epoch: 0 | Step: 22380 | Dataset: 0-17905000 | Loss: 2.173 | 677 ms/step , 58051.80 GFLOP/s , 528649.2 tokens/s INFO:__main__:2024-10-26 22:58:41 | Epoch: 0 | Step: 22390 | Dataset: 0-17913000 | Loss: 1.857 | 676 ms/step , 58145.90 GFLOP/s , 529005.0 tokens/s INFO:__main__:2024-10-26 22:58:48 | Epoch: 0 | Step: 22400 | Dataset: 0-17921000 | Loss: 1.802 | 677 ms/step , 58027.58 GFLOP/s , 528264.3 tokens/s INFO:__main__:2024-10-26 22:58:56 | Epoch: 0 | Step: 22410 | Dataset: 0-17929000 | Loss: 1.774 | 679 ms/step , 57892.40 GFLOP/s , 528939.1 tokens/s INFO:__main__:2024-10-26 22:59:04 | Epoch: 0 | Step: 22420 | Dataset: 0-17937000 | Loss: 1.760 | 677 ms/step , 58053.95 GFLOP/s , 528740.4 tokens/s INFO:__main__:2024-10-26 22:59:12 | Epoch: 0 | Step: 22430 | Dataset: 0-17945000 | Loss: 1.750 | 676 ms/step , 58151.87 GFLOP/s , 526146.2 tokens/s INFO:__main__:2024-10-26 
22:59:19 | Epoch: 0 | Step: 22440 | Dataset: 0-17953000 | Loss: 1.738 | 678 ms/step , 57997.23 GFLOP/s , 529969.9 tokens/s INFO:__main__:2024-10-26 22:59:27 | Epoch: 0 | Step: 22450 | Dataset: 0-17961000 | Loss: 1.711 | 677 ms/step , 58065.97 GFLOP/s , 530084.9 tokens/s INFO:__main__:2024-10-26 22:59:35 | Epoch: 0 | Step: 22460 | Dataset: 0-17969000 | Loss: 1.711 | 677 ms/step , 58030.43 GFLOP/s , 529963.0 tokens/s INFO:__main__:2024-10-26 22:59:43 | Epoch: 0 | Step: 22470 | Dataset: 0-17977000 | Loss: 1.747 | 677 ms/step , 58065.10 GFLOP/s , 529957.7 tokens/s INFO:__main__:2024-10-26 22:59:50 | Epoch: 0 | Step: 22480 | Dataset: 0-17985000 | Loss: 1.725 | 677 ms/step , 58039.93 GFLOP/s , 530312.2 tokens/s INFO:__main__:2024-10-26 22:59:58 | Epoch: 0 | Step: 22490 | Dataset: 0-17993000 | Loss: 1.736 | 677 ms/step , 58047.05 GFLOP/s , 530046.6 tokens/s INFO:__main__:2024-10-26 23:00:06 | Epoch: 0 | Step: 22500 | Dataset: 0-18001000 | Loss: 1.702 | 678 ms/step , 58007.99 GFLOP/s , 534247.7 tokens/s INFO:__main__:2024-10-26 23:00:13 | Epoch: 0 | Step: 22510 | Dataset: 0-18009000 | Loss: 1.693 | 678 ms/step , 58001.33 GFLOP/s , 525245.8 tokens/s INFO:__main__:2024-10-26 23:00:21 | Epoch: 0 | Step: 22520 | Dataset: 0-18017000 | Loss: 1.686 | 679 ms/step , 57859.57 GFLOP/s , 529053.6 tokens/s INFO:__main__:2024-10-26 23:00:29 | Epoch: 0 | Step: 22530 | Dataset: 0-18025000 | Loss: 1.691 | 677 ms/step , 58089.76 GFLOP/s , 529528.1 tokens/s INFO:__main__:2024-10-26 23:00:37 | Epoch: 0 | Step: 22540 | Dataset: 0-18033000 | Loss: 1.706 | 678 ms/step , 57940.02 GFLOP/s , 529509.1 tokens/s INFO:__main__:2024-10-26 23:00:44 | Epoch: 0 | Step: 22550 | Dataset: 0-18041000 | Loss: 1.691 | 676 ms/step , 58119.81 GFLOP/s , 529932.6 tokens/s INFO:__main__:2024-10-26 23:00:52 | Epoch: 0 | Step: 22560 | Dataset: 0-18049000 | Loss: 2.467 | 677 ms/step , 58087.51 GFLOP/s , 530018.6 tokens/s INFO:__main__:2024-10-26 23:01:00 | Epoch: 0 | Step: 22570 | Dataset: 0-18057000 | Loss: 2.339 | 676 
ms/step , 58140.80 GFLOP/s , 530119.8 tokens/s INFO:__main__:2024-10-26 23:01:08 | Epoch: 0 | Step: 22580 | Dataset: 0-18065000 | Loss: 2.259 | 677 ms/step , 58042.96 GFLOP/s , 529890.9 tokens/s INFO:__main__:2024-10-26 23:01:15 | Epoch: 0 | Step: 22590 | Dataset: 0-18073000 | Loss: 2.261 | 677 ms/step , 58044.10 GFLOP/s , 528949.9 tokens/s INFO:__main__:2024-10-26 23:01:23 | Epoch: 0 | Step: 22600 | Dataset: 0-18081000 | Loss: 2.238 | 676 ms/step , 58153.47 GFLOP/s , 529471.0 tokens/s INFO:__main__:2024-10-26 23:01:31 | Epoch: 0 | Step: 22610 | Dataset: 0-18089000 | Loss: 2.168 | 677 ms/step , 58073.49 GFLOP/s , 531076.4 tokens/s INFO:__main__:2024-10-26 23:01:38 | Epoch: 0 | Step: 22620 | Dataset: 0-18097000 | Loss: 2.206 | 678 ms/step , 57968.98 GFLOP/s , 530057.1 tokens/s INFO:__main__:2024-10-26 23:01:46 | Epoch: 0 | Step: 22630 | Dataset: 0-18105000 | Loss: 2.241 | 678 ms/step , 58017.50 GFLOP/s , 529859.0 tokens/s INFO:__main__:2024-10-26 23:01:54 | Epoch: 0 | Step: 22640 | Dataset: 0-18113000 | Loss: 2.254 | 677 ms/step , 58054.81 GFLOP/s , 529575.2 tokens/s INFO:__main__:2024-10-26 23:02:02 | Epoch: 0 | Step: 22650 | Dataset: 0-18121000 | Loss: 2.256 | 677 ms/step , 58041.43 GFLOP/s , 529717.2 tokens/s INFO:__main__:2024-10-26 23:02:09 | Epoch: 0 | Step: 22660 | Dataset: 0-18129000 | Loss: 2.233 | 678 ms/step , 58012.62 GFLOP/s , 529796.5 tokens/s INFO:__main__:2024-10-26 23:02:17 | Epoch: 0 | Step: 22670 | Dataset: 0-18137000 | Loss: 2.177 | 677 ms/step , 58089.13 GFLOP/s , 530243.0 tokens/s INFO:__main__:2024-10-26 23:02:25 | Epoch: 0 | Step: 22680 | Dataset: 0-18145000 | Loss: 2.265 | 675 ms/step , 58195.98 GFLOP/s , 531051.0 tokens/s INFO:__main__:2024-10-26 23:02:33 | Epoch: 0 | Step: 22690 | Dataset: 0-18153000 | Loss: 2.143 | 677 ms/step , 58101.35 GFLOP/s , 530949.6 tokens/s INFO:__main__:2024-10-26 23:02:40 | Epoch: 0 | Step: 22700 | Dataset: 0-18161000 | Loss: 2.169 | 676 ms/step , 58173.33 GFLOP/s , 531147.1 tokens/s INFO:__main__:2024-10-26 
23:02:48 | Epoch: 0 | Step: 22710 | Dataset: 0-18169000 | Loss: 2.267 | 676 ms/step , 58120.18 GFLOP/s , 531041.8 tokens/s INFO:__main__:2024-10-26 23:02:56 | Epoch: 0 | Step: 22720 | Dataset: 0-18177000 | Loss: 1.891 | 678 ms/step , 57975.64 GFLOP/s , 530758.3 tokens/s INFO:__main__:2024-10-26 23:03:03 | Epoch: 0 | Step: 22730 | Dataset: 0-18185000 | Loss: 1.868 | 677 ms/step , 58030.20 GFLOP/s , 530636.7 tokens/s INFO:__main__:2024-10-26 23:03:11 | Epoch: 0 | Step: 22740 | Dataset: 0-18193000 | Loss: 1.840 | 676 ms/step , 58137.44 GFLOP/s , 530590.2 tokens/s INFO:__main__:2024-10-26 23:03:19 | Epoch: 0 | Step: 22750 | Dataset: 0-18201000 | Loss: 1.833 | 678 ms/step , 57971.88 GFLOP/s , 530970.0 tokens/s INFO:__main__:2024-10-26 23:03:27 | Epoch: 0 | Step: 22760 | Dataset: 0-18209000 | Loss: 1.849 | 676 ms/step , 58148.90 GFLOP/s , 530981.8 tokens/s INFO:__main__:2024-10-26 23:03:34 | Epoch: 0 | Step: 22770 | Dataset: 0-18217000 | Loss: 1.819 | 677 ms/step , 58026.24 GFLOP/s , 530674.2 tokens/s INFO:__main__:2024-10-26 23:03:42 | Epoch: 0 | Step: 22780 | Dataset: 0-18225000 | Loss: 1.813 | 678 ms/step , 58019.21 GFLOP/s , 530052.6 tokens/s INFO:__main__:2024-10-26 23:03:50 | Epoch: 0 | Step: 22790 | Dataset: 0-18233000 | Loss: 1.805 | 676 ms/step , 58150.12 GFLOP/s , 530670.3 tokens/s INFO:__main__:2024-10-26 23:03:57 | Epoch: 0 | Step: 22800 | Dataset: 0-18241000 | Loss: 1.760 | 676 ms/step , 58172.98 GFLOP/s , 530836.9 tokens/s INFO:__main__:2024-10-26 23:04:05 | Epoch: 0 | Step: 22810 | Dataset: 0-18249000 | Loss: 2.356 | 677 ms/step , 58064.10 GFLOP/s , 530686.5 tokens/s INFO:__main__:2024-10-26 23:04:13 | Epoch: 0 | Step: 22820 | Dataset: 0-18257000 | Loss: 2.351 | 677 ms/step , 58021.17 GFLOP/s , 530405.4 tokens/s INFO:__main__:2024-10-26 23:04:21 | Epoch: 0 | Step: 22830 | Dataset: 0-18265000 | Loss: 2.305 | 677 ms/step , 58058.29 GFLOP/s , 530258.5 tokens/s INFO:__main__:2024-10-26 23:04:28 | Epoch: 0 | Step: 22840 | Dataset: 0-18273000 | Loss: 2.325 | 678 
ms/step , 57971.69 GFLOP/s , 529949.7 tokens/s INFO:__main__:2024-10-26 23:04:36 | Epoch: 0 | Step: 22850 | Dataset: 0-18281000 | Loss: 2.311 | 678 ms/step , 57970.41 GFLOP/s , 528528.4 tokens/s INFO:__main__:2024-10-26 23:04:44 | Epoch: 0 | Step: 22860 | Dataset: 0-18289000 | Loss: 2.234 | 679 ms/step , 57892.89 GFLOP/s , 528532.5 tokens/s INFO:__main__:2024-10-26 23:04:52 | Epoch: 0 | Step: 22870 | Dataset: 0-18297000 | Loss: 2.362 | 677 ms/step , 58071.36 GFLOP/s , 528623.4 tokens/s INFO:__main__:2024-10-26 23:04:59 | Epoch: 0 | Step: 22880 | Dataset: 0-18305000 | Loss: 2.222 | 676 ms/step , 58107.01 GFLOP/s , 529251.2 tokens/s INFO:__main__:2024-10-26 23:05:07 | Epoch: 0 | Step: 22890 | Dataset: 0-18313000 | Loss: 2.243 | 677 ms/step , 58056.34 GFLOP/s , 528838.0 tokens/s INFO:__main__:2024-10-26 23:05:15 | Epoch: 0 | Step: 22900 | Dataset: 0-18321000 | Loss: 2.220 | 678 ms/step , 57967.03 GFLOP/s , 528966.6 tokens/s INFO:__main__:2024-10-26 23:05:23 | Epoch: 0 | Step: 22910 | Dataset: 0-18329000 | Loss: 2.292 | 679 ms/step , 57932.58 GFLOP/s , 528732.0 tokens/s INFO:__main__:2024-10-26 23:05:30 | Epoch: 0 | Step: 22920 | Dataset: 0-18337000 | Loss: 2.241 | 678 ms/step , 57937.93 GFLOP/s , 528168.1 tokens/s INFO:__main__:2024-10-26 23:05:38 | Epoch: 0 | Step: 22930 | Dataset: 0-18345000 | Loss: 2.247 | 678 ms/step , 57939.16 GFLOP/s , 528258.9 tokens/s INFO:__main__:2024-10-26 23:05:46 | Epoch: 0 | Step: 22940 | Dataset: 0-18353000 | Loss: 2.213 | 678 ms/step , 57986.24 GFLOP/s , 528940.6 tokens/s INFO:__main__:2024-10-26 23:05:54 | Epoch: 0 | Step: 22950 | Dataset: 0-18361000 | Loss: 2.262 | 678 ms/step , 57944.49 GFLOP/s , 528473.8 tokens/s INFO:__main__:2024-10-26 23:06:01 | Epoch: 0 | Step: 22960 | Dataset: 0-18369000 | Loss: 2.212 | 679 ms/step , 57864.90 GFLOP/s , 528143.6 tokens/s INFO:__main__:2024-10-26 23:06:09 | Epoch: 0 | Step: 22970 | Dataset: 0-18377000 | Loss: 2.236 | 678 ms/step , 57938.53 GFLOP/s , 528631.3 tokens/s INFO:__main__:2024-10-26 
23:06:17 | Epoch: 0 | Step: 22980 | Dataset: 0-18385000 | Loss: 2.296 | 677 ms/step , 58023.89 GFLOP/s , 528429.6 tokens/s INFO:__main__:2024-10-26 23:06:25 | Epoch: 0 | Step: 22990 | Dataset: 0-18393000 | Loss: 2.265 | 678 ms/step , 57948.78 GFLOP/s , 528562.9 tokens/s INFO:__main__:2024-10-26 23:06:32 | Validation | Step: 23000 | Val_loss: 2.296 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 23:06:32 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_230632_step_23000.pt` INFO:__main__:2024-10-26 23:06:33 | Epoch: 0 | Step: 23000 | Dataset: 0-18401000 | Loss: 2.200 | 674 ms/step , 58306.38 GFLOP/s , 474573.7 tokens/s INFO:__main__:2024-10-26 23:06:41 | Epoch: 0 | Step: 23010 | Dataset: 0-18409000 | Loss: 2.191 | 679 ms/step , 57929.26 GFLOP/s , 528898.5 tokens/s INFO:__main__:2024-10-26 23:06:49 | Epoch: 0 | Step: 23020 | Dataset: 0-18417000 | Loss: 2.318 | 677 ms/step , 58067.39 GFLOP/s , 529678.8 tokens/s INFO:__main__:2024-10-26 23:06:56 | Epoch: 0 | Step: 23030 | Dataset: 0-18425000 | Loss: 2.248 | 677 ms/step , 58061.82 GFLOP/s , 529565.9 tokens/s INFO:__main__:2024-10-26 23:07:04 | Epoch: 0 | Step: 23040 | Dataset: 0-18433000 | Loss: 2.292 | 676 ms/step , 58172.38 GFLOP/s , 530069.9 tokens/s INFO:__main__:2024-10-26 23:07:12 | Epoch: 0 | Step: 23050 | Dataset: 0-18441000 | Loss: 2.279 | 675 ms/step , 58217.57 GFLOP/s , 530572.8 tokens/s INFO:__main__:2024-10-26 23:07:20 | Epoch: 0 | Step: 23060 | Dataset: 0-18449000 | Loss: 2.240 | 676 ms/step , 58128.59 GFLOP/s , 530123.5 tokens/s INFO:__main__:2024-10-26 23:07:27 | Epoch: 0 | Step: 23070 | Dataset: 0-18457000 | Loss: 2.214 | 677 ms/step , 58027.17 GFLOP/s , 529977.2 tokens/s INFO:__main__:2024-10-26 23:07:35 | Epoch: 0 | Step: 23080 | Dataset: 0-18465000 | Loss: 2.231 | 678 ms/step , 57992.22 GFLOP/s , 529345.7 tokens/s INFO:__main__:2024-10-26 23:07:43 | Epoch: 0 | Step: 23090 | Dataset: 0-18473000 | Loss: 2.276 | 677 ms/step , 58063.83 GFLOP/s , 529667.0 tokens/s 
INFO:__main__:2024-10-26 23:07:51 | Epoch: 0 | Step: 23100 | Dataset: 0-18481000 | Loss: 2.251 | 676 ms/step , 58174.09 GFLOP/s , 530333.3 tokens/s INFO:__main__:2024-10-26 23:07:58 | Epoch: 0 | Step: 23110 | Dataset: 0-18489000 | Loss: 2.243 | 676 ms/step , 58179.23 GFLOP/s , 530030.0 tokens/s INFO:__main__:2024-10-26 23:08:06 | Epoch: 0 | Step: 23120 | Dataset: 0-18497000 | Loss: 2.280 | 677 ms/step , 58027.75 GFLOP/s , 529548.0 tokens/s INFO:__main__:2024-10-26 23:08:14 | Epoch: 0 | Step: 23130 | Dataset: 0-18505000 | Loss: 2.195 | 675 ms/step , 58245.05 GFLOP/s , 530327.6 tokens/s INFO:__main__:2024-10-26 23:08:21 | Epoch: 0 | Step: 23140 | Dataset: 0-18513000 | Loss: 2.291 | 676 ms/step , 58132.20 GFLOP/s , 530639.0 tokens/s INFO:__main__:2024-10-26 23:08:29 | Epoch: 0 | Step: 23150 | Dataset: 0-18521000 | Loss: 2.162 | 676 ms/step , 58132.02 GFLOP/s , 530118.2 tokens/s INFO:__main__:2024-10-26 23:08:37 | Epoch: 0 | Step: 23160 | Dataset: 0-18529000 | Loss: 2.224 | 676 ms/step , 58133.90 GFLOP/s , 530572.1 tokens/s INFO:__main__:2024-10-26 23:08:45 | Epoch: 0 | Step: 23170 | Dataset: 0-18537000 | Loss: 2.241 | 677 ms/step , 58059.14 GFLOP/s , 529763.0 tokens/s INFO:__main__:2024-10-26 23:08:52 | Epoch: 0 | Step: 23180 | Dataset: 0-18545000 | Loss: 2.186 | 677 ms/step , 58075.63 GFLOP/s , 530012.9 tokens/s INFO:__main__:2024-10-26 23:09:00 | Epoch: 0 | Step: 23190 | Dataset: 0-18553000 | Loss: 2.280 | 676 ms/step , 58112.93 GFLOP/s , 529689.9 tokens/s INFO:__main__:2024-10-26 23:09:08 | Epoch: 0 | Step: 23200 | Dataset: 0-18561000 | Loss: 2.202 | 676 ms/step , 58119.29 GFLOP/s , 530865.2 tokens/s INFO:__main__:2024-10-26 23:09:16 | Epoch: 0 | Step: 23210 | Dataset: 0-18569000 | Loss: 2.105 | 677 ms/step , 58061.71 GFLOP/s , 530190.4 tokens/s INFO:__main__:2024-10-26 23:09:23 | Epoch: 0 | Step: 23220 | Dataset: 0-18577000 | Loss: 2.225 | 676 ms/step , 58108.93 GFLOP/s , 529932.8 tokens/s INFO:__main__:2024-10-26 23:09:31 | Epoch: 0 | Step: 23230 | Dataset: 
0-18585000 | Loss: 2.180 | 677 ms/step , 58074.55 GFLOP/s , 529719.3 tokens/s INFO:__main__:2024-10-26 23:09:39 | Epoch: 0 | Step: 23240 | Dataset: 0-18593000 | Loss: 2.206 | 676 ms/step , 58162.61 GFLOP/s , 529846.6 tokens/s INFO:__main__:2024-10-26 23:09:46 | Epoch: 0 | Step: 23250 | Dataset: 0-18601000 | Loss: 2.163 | 676 ms/step , 58192.91 GFLOP/s , 530812.5 tokens/s INFO:__main__:2024-10-26 23:09:54 | Epoch: 0 | Step: 23260 | Dataset: 0-18609000 | Loss: 2.197 | 678 ms/step , 57979.90 GFLOP/s , 529053.1 tokens/s INFO:__main__:2024-10-26 23:10:02 | Epoch: 0 | Step: 23270 | Dataset: 0-18617000 | Loss: 2.106 | 679 ms/step , 57863.54 GFLOP/s , 528512.1 tokens/s INFO:__main__:2024-10-26 23:10:10 | Epoch: 0 | Step: 23280 | Dataset: 0-18625000 | Loss: 2.177 | 680 ms/step , 57833.56 GFLOP/s , 528078.7 tokens/s INFO:__main__:2024-10-26 23:10:17 | Epoch: 0 | Step: 23290 | Dataset: 0-18633000 | Loss: 2.337 | 678 ms/step , 58015.31 GFLOP/s , 528280.2 tokens/s INFO:__main__:2024-10-26 23:10:25 | Epoch: 0 | Step: 23300 | Dataset: 0-18641000 | Loss: 2.297 | 677 ms/step , 58076.90 GFLOP/s , 529390.7 tokens/s INFO:__main__:2024-10-26 23:10:33 | Epoch: 0 | Step: 23310 | Dataset: 0-18649000 | Loss: 2.281 | 677 ms/step , 58092.72 GFLOP/s , 529506.4 tokens/s INFO:__main__:2024-10-26 23:10:41 | Epoch: 0 | Step: 23320 | Dataset: 0-18657000 | Loss: 2.212 | 678 ms/step , 57969.87 GFLOP/s , 528760.6 tokens/s INFO:__main__:2024-10-26 23:10:48 | Epoch: 0 | Step: 23330 | Dataset: 0-18665000 | Loss: 2.245 | 679 ms/step , 57917.65 GFLOP/s , 528517.1 tokens/s INFO:__main__:2024-10-26 23:10:56 | Epoch: 0 | Step: 23340 | Dataset: 0-18673000 | Loss: 2.143 | 678 ms/step , 58018.46 GFLOP/s , 528893.8 tokens/s INFO:__main__:2024-10-26 23:11:04 | Epoch: 0 | Step: 23350 | Dataset: 0-18681000 | Loss: 2.242 | 677 ms/step , 58071.48 GFLOP/s , 528875.3 tokens/s INFO:__main__:2024-10-26 23:11:12 | Epoch: 0 | Step: 23360 | Dataset: 0-18689000 | Loss: 2.204 | 678 ms/step , 57980.40 GFLOP/s , 528061.5 
tokens/s INFO:__main__:2024-10-26 23:11:19 | Epoch: 0 | Step: 23370 | Dataset: 0-18697000 | Loss: 2.187 | 677 ms/step , 58066.85 GFLOP/s , 528586.7 tokens/s INFO:__main__:2024-10-26 23:11:27 | Epoch: 0 | Step: 23380 | Dataset: 0-18705000 | Loss: 2.083 | 677 ms/step , 58043.33 GFLOP/s , 529600.5 tokens/s INFO:__main__:2024-10-26 23:11:35 | Epoch: 0 | Step: 23390 | Dataset: 0-18713000 | Loss: 2.198 | 678 ms/step , 58006.15 GFLOP/s , 529129.7 tokens/s INFO:__main__:2024-10-26 23:11:43 | Epoch: 0 | Step: 23400 | Dataset: 0-18721000 | Loss: 2.149 | 677 ms/step , 58037.27 GFLOP/s , 530374.0 tokens/s INFO:__main__:2024-10-26 23:11:50 | Epoch: 0 | Step: 23410 | Dataset: 0-18729000 | Loss: 2.214 | 676 ms/step , 58137.51 GFLOP/s , 529334.9 tokens/s INFO:__main__:2024-10-26 23:11:58 | Epoch: 0 | Step: 23420 | Dataset: 0-18737000 | Loss: 2.244 | 677 ms/step , 58098.00 GFLOP/s , 529875.2 tokens/s INFO:__main__:2024-10-26 23:12:06 | Epoch: 0 | Step: 23430 | Dataset: 0-18745000 | Loss: 2.152 | 677 ms/step , 58096.84 GFLOP/s , 529592.2 tokens/s INFO:__main__:2024-10-26 23:12:14 | Epoch: 0 | Step: 23440 | Dataset: 0-18753000 | Loss: 2.135 | 677 ms/step , 58048.34 GFLOP/s , 527677.3 tokens/s INFO:__main__:2024-10-26 23:12:21 | Epoch: 0 | Step: 23450 | Dataset: 0-18761000 | Loss: 2.060 | 677 ms/step , 58053.41 GFLOP/s , 529641.2 tokens/s INFO:__main__:2024-10-26 23:12:29 | Epoch: 0 | Step: 23460 | Dataset: 0-18769000 | Loss: 1.927 | 677 ms/step , 58050.18 GFLOP/s , 529215.6 tokens/s INFO:__main__:2024-10-26 23:12:37 | Epoch: 0 | Step: 23470 | Dataset: 0-18777000 | Loss: 1.859 | 677 ms/step , 58096.61 GFLOP/s , 529783.3 tokens/s INFO:__main__:2024-10-26 23:12:44 | Epoch: 0 | Step: 23480 | Dataset: 0-18785000 | Loss: 1.824 | 678 ms/step , 57963.83 GFLOP/s , 529840.1 tokens/s INFO:__main__:2024-10-26 23:12:52 | Epoch: 0 | Step: 23490 | Dataset: 0-18793000 | Loss: 1.859 | 678 ms/step , 58014.62 GFLOP/s , 530111.1 tokens/s INFO:__main__:2024-10-26 23:13:00 | Epoch: 0 | Step: 23500 | 
Dataset: 0-18801000 | Loss: 1.822 | 678 ms/step , 57971.02 GFLOP/s , 529716.9 tokens/s INFO:__main__:2024-10-26 23:13:08 | Epoch: 0 | Step: 23510 | Dataset: 0-18809000 | Loss: 1.829 | 679 ms/step , 57921.04 GFLOP/s , 529933.6 tokens/s INFO:__main__:2024-10-26 23:13:15 | Epoch: 0 | Step: 23520 | Dataset: 0-18817000 | Loss: 1.856 | 677 ms/step , 58024.30 GFLOP/s , 529762.8 tokens/s INFO:__main__:2024-10-26 23:13:23 | Epoch: 0 | Step: 23530 | Dataset: 0-18825000 | Loss: 1.798 | 678 ms/step , 57983.28 GFLOP/s , 530114.7 tokens/s INFO:__main__:2024-10-26 23:13:31 | Epoch: 0 | Step: 23540 | Dataset: 0-18833000 | Loss: 2.368 | 679 ms/step , 57935.44 GFLOP/s , 529837.3 tokens/s INFO:__main__:2024-10-26 23:13:39 | Epoch: 0 | Step: 23550 | Dataset: 0-18841000 | Loss: 2.312 | 677 ms/step , 58092.28 GFLOP/s , 530241.5 tokens/s INFO:__main__:2024-10-26 23:13:46 | Epoch: 0 | Step: 23560 | Dataset: 0-18849000 | Loss: 2.238 | 677 ms/step , 58064.01 GFLOP/s , 530894.3 tokens/s INFO:__main__:2024-10-26 23:13:54 | Epoch: 0 | Step: 23570 | Dataset: 0-18857000 | Loss: 2.241 | 677 ms/step , 58041.65 GFLOP/s , 530167.2 tokens/s INFO:__main__:2024-10-26 23:14:02 | Epoch: 0 | Step: 23580 | Dataset: 0-18865000 | Loss: 2.175 | 676 ms/step , 58150.57 GFLOP/s , 530846.3 tokens/s INFO:__main__:2024-10-26 23:14:09 | Epoch: 0 | Step: 23590 | Dataset: 0-18873000 | Loss: 2.252 | 677 ms/step , 58070.92 GFLOP/s , 530538.1 tokens/s INFO:__main__:2024-10-26 23:14:17 | Epoch: 0 | Step: 23600 | Dataset: 0-18881000 | Loss: 2.215 | 678 ms/step , 57996.88 GFLOP/s , 529922.2 tokens/s INFO:__main__:2024-10-26 23:14:25 | Epoch: 0 | Step: 23610 | Dataset: 0-18889000 | Loss: 2.228 | 678 ms/step , 57957.92 GFLOP/s , 529903.4 tokens/s INFO:__main__:2024-10-26 23:14:33 | Epoch: 0 | Step: 23620 | Dataset: 0-18897000 | Loss: 2.286 | 679 ms/step , 57860.51 GFLOP/s , 530302.5 tokens/s INFO:__main__:2024-10-26 23:14:40 | Epoch: 0 | Step: 23630 | Dataset: 0-18905000 | Loss: 2.163 | 677 ms/step , 58032.96 GFLOP/s , 
530061.2 tokens/s INFO:__main__:2024-10-26 23:14:48 | Epoch: 0 | Step: 23640 | Dataset: 0-18913000 | Loss: 2.231 | 678 ms/step , 57972.00 GFLOP/s , 530305.1 tokens/s INFO:__main__:2024-10-26 23:14:56 | Epoch: 0 | Step: 23650 | Dataset: 0-18921000 | Loss: 2.173 | 677 ms/step , 58067.43 GFLOP/s , 530464.8 tokens/s INFO:__main__:2024-10-26 23:15:04 | Epoch: 0 | Step: 23660 | Dataset: 0-18929000 | Loss: 2.308 | 676 ms/step , 58134.03 GFLOP/s , 530525.0 tokens/s INFO:__main__:2024-10-26 23:15:11 | Epoch: 0 | Step: 23670 | Dataset: 0-18937000 | Loss: 2.140 | 681 ms/step , 57715.36 GFLOP/s , 530983.3 tokens/s INFO:__main__:2024-10-26 23:15:19 | Epoch: 0 | Step: 23680 | Dataset: 0-18945000 | Loss: 2.073 | 676 ms/step , 58124.31 GFLOP/s , 530530.8 tokens/s INFO:__main__:2024-10-26 23:15:27 | Epoch: 0 | Step: 23690 | Dataset: 0-18953000 | Loss: 2.239 | 677 ms/step , 58092.43 GFLOP/s , 530918.4 tokens/s INFO:__main__:2024-10-26 23:15:34 | Epoch: 0 | Step: 23700 | Dataset: 0-18961000 | Loss: 2.317 | 678 ms/step , 57990.24 GFLOP/s , 529993.4 tokens/s INFO:__main__:2024-10-26 23:15:42 | Epoch: 0 | Step: 23710 | Dataset: 0-18969000 | Loss: 2.273 | 676 ms/step , 58152.74 GFLOP/s , 530709.0 tokens/s INFO:__main__:2024-10-26 23:15:50 | Epoch: 0 | Step: 23720 | Dataset: 0-18977000 | Loss: 2.119 | 677 ms/step , 58102.94 GFLOP/s , 531238.3 tokens/s INFO:__main__:2024-10-26 23:15:58 | Epoch: 0 | Step: 23730 | Dataset: 0-18985000 | Loss: 2.286 | 678 ms/step , 58013.54 GFLOP/s , 530560.5 tokens/s INFO:__main__:2024-10-26 23:16:05 | Epoch: 0 | Step: 23740 | Dataset: 0-18993000 | Loss: 2.294 | 676 ms/step , 58146.34 GFLOP/s , 530628.4 tokens/s INFO:__main__:2024-10-26 23:16:13 | Epoch: 0 | Step: 23750 | Dataset: 0-19001000 | Loss: 2.274 | 677 ms/step , 58096.44 GFLOP/s , 530885.5 tokens/s INFO:__main__:2024-10-26 23:16:21 | Epoch: 0 | Step: 23760 | Dataset: 0-19009000 | Loss: 2.256 | 676 ms/step , 58120.88 GFLOP/s , 531077.1 tokens/s INFO:__main__:2024-10-26 23:16:28 | Epoch: 0 | Step: 
23770 | Dataset: 0-19017000 | Loss: 2.210 | 678 ms/step , 57999.81 GFLOP/s , 530629.4 tokens/s INFO:__main__:2024-10-26 23:16:36 | Epoch: 0 | Step: 23780 | Dataset: 0-19025000 | Loss: 2.177 | 677 ms/step , 58053.02 GFLOP/s , 530971.2 tokens/s INFO:__main__:2024-10-26 23:16:44 | Epoch: 0 | Step: 23790 | Dataset: 0-19033000 | Loss: 2.202 | 677 ms/step , 58066.59 GFLOP/s , 530616.5 tokens/s INFO:__main__:2024-10-26 23:16:52 | Epoch: 0 | Step: 23800 | Dataset: 0-19041000 | Loss: 2.264 | 678 ms/step , 57985.67 GFLOP/s , 530661.4 tokens/s INFO:__main__:2024-10-26 23:16:59 | Epoch: 0 | Step: 23810 | Dataset: 0-19049000 | Loss: 2.236 | 677 ms/step , 58067.86 GFLOP/s , 531152.7 tokens/s INFO:__main__:2024-10-26 23:17:07 | Epoch: 0 | Step: 23820 | Dataset: 0-19057000 | Loss: 2.213 | 676 ms/step , 58112.63 GFLOP/s , 531080.0 tokens/s INFO:__main__:2024-10-26 23:17:15 | Epoch: 0 | Step: 23830 | Dataset: 0-19065000 | Loss: 2.226 | 676 ms/step , 58147.69 GFLOP/s , 531182.7 tokens/s INFO:__main__:2024-10-26 23:17:22 | Epoch: 0 | Step: 23840 | Dataset: 0-19073000 | Loss: 2.200 | 677 ms/step , 58076.19 GFLOP/s , 530911.0 tokens/s INFO:__main__:2024-10-26 23:17:30 | Epoch: 0 | Step: 23850 | Dataset: 0-19081000 | Loss: 2.249 | 676 ms/step , 58135.99 GFLOP/s , 530831.5 tokens/s INFO:__main__:2024-10-26 23:17:38 | Epoch: 0 | Step: 23860 | Dataset: 0-19089000 | Loss: 2.259 | 675 ms/step , 58219.91 GFLOP/s , 530921.2 tokens/s INFO:__main__:2024-10-26 23:17:46 | Epoch: 0 | Step: 23870 | Dataset: 0-19097000 | Loss: 2.256 | 677 ms/step , 58092.99 GFLOP/s , 530920.5 tokens/s INFO:__main__:2024-10-26 23:17:53 | Epoch: 0 | Step: 23880 | Dataset: 0-19105000 | Loss: 2.242 | 677 ms/step , 58061.11 GFLOP/s , 530938.5 tokens/s INFO:__main__:2024-10-26 23:18:01 | Epoch: 0 | Step: 23890 | Dataset: 0-19113000 | Loss: 2.217 | 677 ms/step , 58083.67 GFLOP/s , 531076.8 tokens/s INFO:__main__:2024-10-26 23:18:09 | Epoch: 0 | Step: 23900 | Dataset: 0-19121000 | Loss: 2.272 | 676 ms/step , 58111.46 GFLOP/s 
, 530661.2 tokens/s INFO:__main__:2024-10-26 23:18:16 | Epoch: 0 | Step: 23910 | Dataset: 0-19129000 | Loss: 2.145 | 677 ms/step , 58046.74 GFLOP/s , 530749.0 tokens/s INFO:__main__:2024-10-26 23:18:24 | Epoch: 0 | Step: 23920 | Dataset: 0-19137000 | Loss: 2.230 | 676 ms/step , 58138.25 GFLOP/s , 531059.5 tokens/s INFO:__main__:2024-10-26 23:18:32 | Epoch: 0 | Step: 23930 | Dataset: 0-19145000 | Loss: 2.163 | 675 ms/step , 58230.55 GFLOP/s , 531463.8 tokens/s INFO:__main__:2024-10-26 23:18:40 | Epoch: 0 | Step: 23940 | Dataset: 0-19153000 | Loss: 2.259 | 676 ms/step , 58175.34 GFLOP/s , 531817.5 tokens/s INFO:__main__:2024-10-26 23:18:47 | Epoch: 0 | Step: 23950 | Dataset: 0-19161000 | Loss: 2.251 | 677 ms/step , 58101.01 GFLOP/s , 531267.8 tokens/s INFO:__main__:2024-10-26 23:18:55 | Epoch: 0 | Step: 23960 | Dataset: 0-19169000 | Loss: 2.248 | 677 ms/step , 58097.95 GFLOP/s , 530813.6 tokens/s INFO:__main__:2024-10-26 23:19:03 | Epoch: 0 | Step: 23970 | Dataset: 0-19177000 | Loss: 2.208 | 677 ms/step , 58067.44 GFLOP/s , 530958.3 tokens/s INFO:__main__:2024-10-26 23:19:10 | Epoch: 0 | Step: 23980 | Dataset: 0-19185000 | Loss: 2.245 | 676 ms/step , 58129.10 GFLOP/s , 528228.9 tokens/s INFO:__main__:2024-10-26 23:19:18 | Epoch: 0 | Step: 23990 | Dataset: 0-19193000 | Loss: 2.169 | 676 ms/step , 58182.92 GFLOP/s , 531148.5 tokens/s INFO:__main__:2024-10-26 23:19:25 | Validation | Step: 24000 | Val_loss: 2.238 | Best_val_loss: 2.2627 INFO:__main__:2024-10-26 23:19:25 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_231925_step_24000.pt` INFO:__main__:2024-10-26 23:19:27 | Epoch: 0 | Step: 24000 | Dataset: 0-19201000 | Loss: 2.214 | 676 ms/step , 58169.07 GFLOP/s , 478230.9 tokens/s INFO:__main__:2024-10-26 23:19:34 | Epoch: 0 | Step: 24010 | Dataset: 0-19209000 | Loss: 2.236 | 676 ms/step , 58147.01 GFLOP/s , 531011.7 tokens/s INFO:__main__:2024-10-26 23:19:42 | Epoch: 0 | Step: 24020 | Dataset: 0-19217000 | Loss: 2.900 | 676 ms/step , 58132.50 
GFLOP/s , 531313.0 tokens/s INFO:__main__:2024-10-26 23:19:50 | Epoch: 0 | Step: 24030 | Dataset: 0-19225000 | Loss: 2.708 | 676 ms/step , 58142.38 GFLOP/s , 532014.4 tokens/s INFO:__main__:2024-10-26 23:19:58 | Epoch: 0 | Step: 24040 | Dataset: 0-19233000 | Loss: 2.596 | 675 ms/step , 58220.21 GFLOP/s , 532157.2 tokens/s INFO:__main__:2024-10-26 23:20:05 | Epoch: 0 | Step: 24050 | Dataset: 0-19241000 | Loss: 2.615 | 675 ms/step , 58252.54 GFLOP/s , 532075.5 tokens/s INFO:__main__:2024-10-26 23:20:13 | Epoch: 0 | Step: 24060 | Dataset: 0-19249000 | Loss: 2.613 | 678 ms/step , 58019.83 GFLOP/s , 531212.7 tokens/s INFO:__main__:2024-10-26 23:20:21 | Epoch: 0 | Step: 24070 | Dataset: 0-19257000 | Loss: 2.548 | 676 ms/step , 58150.78 GFLOP/s , 531496.6 tokens/s INFO:__main__:2024-10-26 23:20:28 | Epoch: 0 | Step: 24080 | Dataset: 0-19265000 | Loss: 2.612 | 677 ms/step , 58095.93 GFLOP/s , 531439.6 tokens/s INFO:__main__:2024-10-26 23:20:36 | Epoch: 0 | Step: 24090 | Dataset: 0-19273000 | Loss: 2.542 | 675 ms/step , 58214.38 GFLOP/s , 531757.9 tokens/s INFO:__main__:2024-10-26 23:20:44 | Epoch: 0 | Step: 24100 | Dataset: 0-19281000 | Loss: 2.489 | 676 ms/step , 58185.27 GFLOP/s , 531900.2 tokens/s INFO:__main__:2024-10-26 23:20:51 | Epoch: 0 | Step: 24110 | Dataset: 0-19289000 | Loss: 2.515 | 676 ms/step , 58149.54 GFLOP/s , 531365.0 tokens/s INFO:__main__:2024-10-26 23:20:59 | Epoch: 0 | Step: 24120 | Dataset: 0-19297000 | Loss: 2.475 | 677 ms/step , 58074.17 GFLOP/s , 531117.1 tokens/s INFO:__main__:2024-10-26 23:21:07 | Epoch: 0 | Step: 24130 | Dataset: 0-19305000 | Loss: 2.540 | 676 ms/step , 58160.33 GFLOP/s , 531247.5 tokens/s INFO:__main__:2024-10-26 23:21:15 | Epoch: 0 | Step: 24140 | Dataset: 0-19313000 | Loss: 2.472 | 676 ms/step , 58123.84 GFLOP/s , 531202.3 tokens/s INFO:__main__:2024-10-26 23:21:22 | Epoch: 0 | Step: 24150 | Dataset: 0-19321000 | Loss: 2.498 | 677 ms/step , 58021.53 GFLOP/s , 531789.5 tokens/s INFO:__main__:2024-10-26 23:21:30 | Epoch: 0 | 
Step: 24160 | Dataset: 0-19329000 | Loss: 2.548 | 676 ms/step , 58132.38 GFLOP/s , 531190.9 tokens/s INFO:__main__:2024-10-26 23:21:38 | Epoch: 0 | Step: 24170 | Dataset: 0-19337000 | Loss: 2.508 | 677 ms/step , 58050.88 GFLOP/s , 531297.8 tokens/s INFO:__main__:2024-10-26 23:21:45 | Epoch: 0 | Step: 24180 | Dataset: 0-19345000 | Loss: 2.450 | 677 ms/step , 58050.13 GFLOP/s , 530656.1 tokens/s INFO:__main__:2024-10-26 23:21:53 | Epoch: 0 | Step: 24190 | Dataset: 0-19353000 | Loss: 2.360 | 677 ms/step , 58023.23 GFLOP/s , 530332.2 tokens/s INFO:__main__:2024-10-26 23:22:01 | Epoch: 0 | Step: 24200 | Dataset: 0-19361000 | Loss: 2.266 | 679 ms/step , 57929.84 GFLOP/s , 525411.1 tokens/s INFO:__main__:2024-10-26 23:22:09 | Epoch: 0 | Step: 24210 | Dataset: 0-19369000 | Loss: 2.282 | 677 ms/step , 58027.28 GFLOP/s , 529418.9 tokens/s INFO:__main__:2024-10-26 23:22:16 | Epoch: 0 | Step: 24220 | Dataset: 0-19377000 | Loss: 2.221 | 679 ms/step , 57881.92 GFLOP/s , 528385.6 tokens/s INFO:__main__:2024-10-26 23:22:24 | Epoch: 0 | Step: 24230 | Dataset: 0-19385000 | Loss: 2.219 | 677 ms/step , 58056.85 GFLOP/s , 528078.0 tokens/s INFO:__main__:2024-10-26 23:22:32 | Epoch: 0 | Step: 24240 | Dataset: 0-19393000 | Loss: 2.256 | 678 ms/step , 58011.96 GFLOP/s , 528869.6 tokens/s INFO:__main__:2024-10-26 23:22:40 | Epoch: 0 | Step: 24250 | Dataset: 0-19401000 | Loss: 2.192 | 678 ms/step , 57977.76 GFLOP/s , 529429.1 tokens/s INFO:__main__:2024-10-26 23:22:47 | Epoch: 0 | Step: 24260 | Dataset: 0-19409000 | Loss: 2.239 | 677 ms/step , 58079.99 GFLOP/s , 529443.6 tokens/s INFO:__main__:2024-10-26 23:22:55 | Epoch: 0 | Step: 24270 | Dataset: 0-19417000 | Loss: 2.274 | 677 ms/step , 58061.36 GFLOP/s , 530966.4 tokens/s INFO:__main__:2024-10-26 23:23:03 | Epoch: 0 | Step: 24280 | Dataset: 0-19425000 | Loss: 2.270 | 677 ms/step , 58021.50 GFLOP/s , 530997.3 tokens/s INFO:__main__:2024-10-26 23:23:11 | Epoch: 0 | Step: 24290 | Dataset: 0-19433000 | Loss: 2.181 | 677 ms/step , 58022.11 
GFLOP/s , 529868.6 tokens/s INFO:__main__:2024-10-26 23:23:18 | Epoch: 0 | Step: 24300 | Dataset: 0-19441000 | Loss: 2.204 | 677 ms/step , 58053.02 GFLOP/s , 530638.0 tokens/s INFO:__main__:2024-10-26 23:23:26 | Epoch: 0 | Step: 24310 | Dataset: 0-19449000 | Loss: 2.273 | 676 ms/step , 58114.09 GFLOP/s , 530661.6 tokens/s INFO:__main__:2024-10-26 23:23:34 | Epoch: 0 | Step: 24320 | Dataset: 0-19457000 | Loss: 2.240 | 676 ms/step , 58165.68 GFLOP/s , 531335.5 tokens/s INFO:__main__:2024-10-26 23:23:41 | Epoch: 0 | Step: 24330 | Dataset: 0-19465000 | Loss: 2.245 | 677 ms/step , 58096.58 GFLOP/s , 531356.6 tokens/s INFO:__main__:2024-10-26 23:23:49 | Epoch: 0 | Step: 24340 | Dataset: 0-19473000 | Loss: 2.286 | 676 ms/step , 58115.79 GFLOP/s , 530747.1 tokens/s INFO:__main__:2024-10-26 23:23:57 | Epoch: 0 | Step: 24350 | Dataset: 0-19481000 | Loss: 2.203 | 677 ms/step , 58085.54 GFLOP/s , 531106.0 tokens/s INFO:__main__:2024-10-26 23:24:05 | Epoch: 0 | Step: 24360 | Dataset: 0-19489000 | Loss: 2.222 | 676 ms/step , 58143.26 GFLOP/s , 530932.4 tokens/s INFO:__main__:2024-10-26 23:24:12 | Epoch: 0 | Step: 24370 | Dataset: 0-19497000 | Loss: 2.184 | 676 ms/step , 58184.02 GFLOP/s , 531387.2 tokens/s INFO:__main__:2024-10-26 23:24:20 | Epoch: 0 | Step: 24380 | Dataset: 0-19505000 | Loss: 2.161 | 677 ms/step , 58080.65 GFLOP/s , 531047.7 tokens/s INFO:__main__:2024-10-26 23:24:28 | Epoch: 0 | Step: 24390 | Dataset: 0-19513000 | Loss: 2.233 | 677 ms/step , 58057.76 GFLOP/s , 530889.7 tokens/s INFO:__main__:2024-10-26 23:24:35 | Epoch: 0 | Step: 24400 | Dataset: 0-19521000 | Loss: 2.263 | 676 ms/step , 58116.22 GFLOP/s , 531102.4 tokens/s INFO:__main__:2024-10-26 23:24:43 | Epoch: 0 | Step: 24410 | Dataset: 0-19529000 | Loss: 2.173 | 676 ms/step , 58165.33 GFLOP/s , 530760.4 tokens/s INFO:__main__:2024-10-26 23:24:51 | Epoch: 0 | Step: 24420 | Dataset: 0-19537000 | Loss: 2.161 | 675 ms/step , 58196.37 GFLOP/s , 531404.9 tokens/s INFO:__main__:2024-10-26 23:24:59 | Epoch: 0 | 
Step: 24430 | Dataset: 0-19545000 | Loss: 2.171 | 676 ms/step , 58151.79 GFLOP/s , 531216.3 tokens/s INFO:__main__:2024-10-26 23:25:06 | Epoch: 0 | Step: 24440 | Dataset: 0-19553000 | Loss: 2.264 | 676 ms/step , 58187.22 GFLOP/s , 530822.3 tokens/s INFO:__main__:2024-10-26 23:25:14 | Epoch: 0 | Step: 24450 | Dataset: 0-19561000 | Loss: 2.189 | 676 ms/step , 58120.80 GFLOP/s , 531316.3 tokens/s INFO:__main__:2024-10-26 23:25:22 | Epoch: 0 | Step: 24460 | Dataset: 0-19569000 | Loss: 2.151 | 676 ms/step , 58118.76 GFLOP/s , 530864.1 tokens/s INFO:__main__:2024-10-26 23:25:29 | Epoch: 0 | Step: 24470 | Dataset: 0-19577000 | Loss: 2.210 | 676 ms/step , 58149.63 GFLOP/s , 531581.3 tokens/s INFO:__main__:2024-10-26 23:25:37 | Epoch: 0 | Step: 24480 | Dataset: 0-19585000 | Loss: 2.229 | 676 ms/step , 58108.06 GFLOP/s , 531499.2 tokens/s INFO:__main__:2024-10-26 23:25:45 | Epoch: 0 | Step: 24490 | Dataset: 0-19593000 | Loss: 2.135 | 677 ms/step , 58022.65 GFLOP/s , 530815.5 tokens/s INFO:__main__:2024-10-26 23:25:53 | Epoch: 0 | Step: 24500 | Dataset: 0-19601000 | Loss: 2.275 | 678 ms/step , 58000.45 GFLOP/s , 530795.2 tokens/s INFO:__main__:2024-10-26 23:26:00 | Epoch: 0 | Step: 24510 | Dataset: 0-19609000 | Loss: 2.265 | 677 ms/step , 58053.28 GFLOP/s , 530828.9 tokens/s INFO:__main__:2024-10-26 23:26:08 | Epoch: 0 | Step: 24520 | Dataset: 0-19617000 | Loss: 2.241 | 677 ms/step , 58100.82 GFLOP/s , 530931.1 tokens/s INFO:__main__:2024-10-26 23:26:16 | Epoch: 0 | Step: 24530 | Dataset: 0-19625000 | Loss: 2.228 | 677 ms/step , 58089.55 GFLOP/s , 531114.7 tokens/s INFO:__main__:2024-10-26 23:26:23 | Epoch: 0 | Step: 24540 | Dataset: 0-19633000 | Loss: 2.261 | 675 ms/step , 58211.59 GFLOP/s , 531684.3 tokens/s INFO:__main__:2024-10-26 23:26:31 | Epoch: 0 | Step: 24550 | Dataset: 0-19641000 | Loss: 2.225 | 677 ms/step , 58050.28 GFLOP/s , 531902.6 tokens/s INFO:__main__:2024-10-26 23:26:39 | Epoch: 0 | Step: 24560 | Dataset: 0-19649000 | Loss: 2.220 | 675 ms/step , 58224.91 
GFLOP/s , 531358.5 tokens/s INFO:__main__:2024-10-26 23:26:47 | Epoch: 0 | Step: 24570 | Dataset: 0-19657000 | Loss: 2.225 | 676 ms/step , 58137.32 GFLOP/s , 532015.5 tokens/s INFO:__main__:2024-10-26 23:26:54 | Epoch: 0 | Step: 24580 | Dataset: 0-19665000 | Loss: 2.155 | 676 ms/step , 58125.38 GFLOP/s , 531263.8 tokens/s INFO:__main__:2024-10-26 23:27:02 | Epoch: 0 | Step: 24590 | Dataset: 0-19673000 | Loss: 2.293 | 676 ms/step , 58114.80 GFLOP/s , 531286.3 tokens/s INFO:__main__:2024-10-26 23:27:10 | Epoch: 0 | Step: 24600 | Dataset: 0-19681000 | Loss: 2.270 | 677 ms/step , 58031.31 GFLOP/s , 530506.7 tokens/s INFO:__main__:2024-10-26 23:27:17 | Epoch: 0 | Step: 24610 | Dataset: 0-19689000 | Loss: 2.253 | 677 ms/step , 58053.25 GFLOP/s , 530877.9 tokens/s INFO:__main__:2024-10-26 23:27:25 | Epoch: 0 | Step: 24620 | Dataset: 0-19697000 | Loss: 2.163 | 676 ms/step , 58107.38 GFLOP/s , 530902.8 tokens/s INFO:__main__:2024-10-26 23:27:33 | Epoch: 0 | Step: 24630 | Dataset: 0-19705000 | Loss: 2.234 | 678 ms/step , 57994.25 GFLOP/s , 530639.2 tokens/s INFO:__main__:2024-10-26 23:27:41 | Epoch: 0 | Step: 24640 | Dataset: 0-19713000 | Loss: 2.262 | 676 ms/step , 58175.34 GFLOP/s , 531567.0 tokens/s INFO:__main__:2024-10-26 23:27:48 | Epoch: 0 | Step: 24650 | Dataset: 0-19721000 | Loss: 2.184 | 675 ms/step , 58194.43 GFLOP/s , 530729.1 tokens/s INFO:__main__:2024-10-26 23:27:56 | Epoch: 1 | Step: 24660 | Dataset: 0-901 | Loss: 2.247 | 676 ms/step , 58188.49 GFLOP/s , 530784.2 tokens/s INFO:__main__:2024-10-26 23:28:04 | Epoch: 1 | Step: 24670 | Dataset: 0-8901 | Loss: 2.035 | 675 ms/step , 58223.28 GFLOP/s , 530900.5 tokens/s INFO:__main__:2024-10-26 23:28:11 | Epoch: 1 | Step: 24680 | Dataset: 0-16901 | Loss: 1.950 | 677 ms/step , 58047.30 GFLOP/s , 530816.5 tokens/s INFO:__main__:2024-10-26 23:28:19 | Epoch: 1 | Step: 24690 | Dataset: 0-24901 | Loss: 1.875 | 676 ms/step , 58132.09 GFLOP/s , 530617.3 tokens/s INFO:__main__:2024-10-26 23:28:27 | Epoch: 1 | Step: 24700 | 
Dataset: 0-32901 | Loss: 1.882 | 676 ms/step , 58153.19 GFLOP/s , 530353.4 tokens/s INFO:__main__:2024-10-26 23:28:35 | Epoch: 1 | Step: 24710 | Dataset: 0-40901 | Loss: 1.827 | 677 ms/step , 58065.24 GFLOP/s , 530367.1 tokens/s INFO:__main__:2024-10-26 23:28:42 | Epoch: 1 | Step: 24720 | Dataset: 0-48901 | Loss: 1.859 | 677 ms/step , 58097.91 GFLOP/s , 530476.2 tokens/s INFO:__main__:2024-10-26 23:28:50 | Epoch: 1 | Step: 24730 | Dataset: 0-56901 | Loss: 1.812 | 676 ms/step , 58143.22 GFLOP/s , 530023.8 tokens/s INFO:__main__:2024-10-26 23:28:58 | Epoch: 1 | Step: 24740 | Dataset: 0-64901 | Loss: 1.855 | 676 ms/step , 58122.35 GFLOP/s , 530516.5 tokens/s INFO:__main__:2024-10-26 23:29:05 | Epoch: 1 | Step: 24750 | Dataset: 0-72901 | Loss: 1.836 | 677 ms/step , 58065.78 GFLOP/s , 530248.8 tokens/s INFO:__main__:2024-10-26 23:29:13 | Epoch: 1 | Step: 24760 | Dataset: 0-80901 | Loss: 2.225 | 676 ms/step , 58114.34 GFLOP/s , 528526.0 tokens/s INFO:__main__:2024-10-26 23:29:21 | Epoch: 1 | Step: 24770 | Dataset: 0-88901 | Loss: 2.217 | 675 ms/step , 58223.75 GFLOP/s , 531119.8 tokens/s INFO:__main__:2024-10-26 23:29:29 | Epoch: 1 | Step: 24780 | Dataset: 0-96901 | Loss: 2.268 | 677 ms/step , 58100.50 GFLOP/s , 530880.0 tokens/s INFO:__main__:2024-10-26 23:29:36 | Epoch: 1 | Step: 24790 | Dataset: 0-104901 | Loss: 2.132 | 676 ms/step , 58118.45 GFLOP/s , 531209.7 tokens/s INFO:__main__:2024-10-26 23:29:44 | Epoch: 1 | Step: 24800 | Dataset: 0-112901 | Loss: 2.257 | 676 ms/step , 58157.50 GFLOP/s , 531371.9 tokens/s INFO:__main__:2024-10-26 23:29:52 | Epoch: 1 | Step: 24810 | Dataset: 0-120901 | Loss: 2.167 | 676 ms/step , 58112.68 GFLOP/s , 531367.6 tokens/s INFO:__main__:2024-10-26 23:29:59 | Epoch: 1 | Step: 24820 | Dataset: 0-128901 | Loss: 2.259 | 676 ms/step , 58174.32 GFLOP/s , 531468.7 tokens/s INFO:__main__:2024-10-26 23:30:07 | Epoch: 1 | Step: 24830 | Dataset: 0-136901 | Loss: 2.179 | 675 ms/step , 58215.60 GFLOP/s , 531798.4 tokens/s INFO:__main__:2024-10-26 
23:30:15 | Epoch: 1 | Step: 24840 | Dataset: 0-144901 | Loss: 2.134 | 677 ms/step , 58082.62 GFLOP/s , 530934.6 tokens/s INFO:__main__:2024-10-26 23:30:23 | Epoch: 1 | Step: 24850 | Dataset: 0-152901 | Loss: 2.218 | 677 ms/step , 58069.87 GFLOP/s , 530593.3 tokens/s INFO:__main__:2024-10-26 23:30:30 | Epoch: 1 | Step: 24860 | Dataset: 0-160901 | Loss: 2.193 | 677 ms/step , 58038.65 GFLOP/s , 530716.2 tokens/s INFO:__main__:2024-10-26 23:30:38 | Epoch: 1 | Step: 24870 | Dataset: 0-168901 | Loss: 2.160 | 676 ms/step , 58127.42 GFLOP/s , 530831.9 tokens/s INFO:__main__:2024-10-26 23:30:46 | Epoch: 1 | Step: 24880 | Dataset: 0-176901 | Loss: 2.206 | 677 ms/step , 58071.05 GFLOP/s , 530788.5 tokens/s INFO:__main__:2024-10-26 23:30:53 | Epoch: 1 | Step: 24890 | Dataset: 0-184901 | Loss: 2.199 | 676 ms/step , 58149.82 GFLOP/s , 531359.2 tokens/s INFO:__main__:2024-10-26 23:31:01 | Epoch: 1 | Step: 24900 | Dataset: 0-192901 | Loss: 2.200 | 676 ms/step , 58109.91 GFLOP/s , 530235.6 tokens/s INFO:__main__:2024-10-26 23:31:09 | Epoch: 1 | Step: 24910 | Dataset: 0-200901 | Loss: 2.145 | 677 ms/step , 58065.97 GFLOP/s , 530607.0 tokens/s INFO:__main__:2024-10-26 23:31:17 | Epoch: 1 | Step: 24920 | Dataset: 0-208901 | Loss: 2.265 | 676 ms/step , 58169.25 GFLOP/s , 530787.1 tokens/s INFO:__main__:2024-10-26 23:31:24 | Epoch: 1 | Step: 24930 | Dataset: 0-216901 | Loss: 2.223 | 676 ms/step , 58123.61 GFLOP/s , 530812.0 tokens/s INFO:__main__:2024-10-26 23:31:32 | Epoch: 1 | Step: 24940 | Dataset: 0-224901 | Loss: 2.331 | 677 ms/step , 58025.28 GFLOP/s , 529988.2 tokens/s INFO:__main__:2024-10-26 23:31:40 | Epoch: 1 | Step: 24950 | Dataset: 0-232901 | Loss: 2.272 | 677 ms/step , 58061.37 GFLOP/s , 530368.8 tokens/s INFO:__main__:2024-10-26 23:31:48 | Epoch: 1 | Step: 24960 | Dataset: 0-240901 | Loss: 2.251 | 677 ms/step , 58032.07 GFLOP/s , 530153.9 tokens/s INFO:__main__:2024-10-26 23:31:55 | Epoch: 1 | Step: 24970 | Dataset: 0-248901 | Loss: 2.285 | 677 ms/step , 58029.94 GFLOP/s 
, 530282.0 tokens/s INFO:__main__:2024-10-26 23:32:03 | Epoch: 1 | Step: 24980 | Dataset: 0-256901 | Loss: 2.243 | 678 ms/step , 57957.02 GFLOP/s , 530162.1 tokens/s INFO:__main__:2024-10-26 23:32:11 | Epoch: 1 | Step: 24990 | Dataset: 0-264901 | Loss: 2.308 | 677 ms/step , 58102.97 GFLOP/s , 530833.1 tokens/s INFO:__main__:2024-10-26 23:32:18 | Validation | Step: 25000 | Val_loss: 2.307 | Best_val_loss: 2.2378 INFO:__main__:2024-10-26 23:32:18 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_233218_step_25000.pt` INFO:__main__:2024-10-26 23:32:19 | Epoch: 1 | Step: 25000 | Dataset: 0-272901 | Loss: 2.214 | 676 ms/step , 58128.65 GFLOP/s , 475078.9 tokens/s INFO:__main__:2024-10-26 23:32:27 | Epoch: 1 | Step: 25010 | Dataset: 0-280901 | Loss: 2.291 | 677 ms/step , 58081.36 GFLOP/s , 529785.6 tokens/s INFO:__main__:2024-10-26 23:32:35 | Epoch: 1 | Step: 25020 | Dataset: 0-288901 | Loss: 2.315 | 677 ms/step , 58082.96 GFLOP/s , 530104.8 tokens/s INFO:__main__:2024-10-26 23:32:42 | Epoch: 1 | Step: 25030 | Dataset: 0-296901 | Loss: 2.320 | 677 ms/step , 58075.32 GFLOP/s , 530208.5 tokens/s INFO:__main__:2024-10-26 23:32:50 | Epoch: 1 | Step: 25040 | Dataset: 0-304901 | Loss: 2.266 | 676 ms/step , 58188.68 GFLOP/s , 530513.6 tokens/s INFO:__main__:2024-10-26 23:32:58 | Epoch: 1 | Step: 25050 | Dataset: 0-312901 | Loss: 2.240 | 678 ms/step , 57966.20 GFLOP/s , 530396.1 tokens/s INFO:__main__:2024-10-26 23:33:06 | Epoch: 1 | Step: 25060 | Dataset: 0-320901 | Loss: 2.190 | 676 ms/step , 58130.87 GFLOP/s , 530732.3 tokens/s INFO:__main__:2024-10-26 23:33:13 | Epoch: 1 | Step: 25070 | Dataset: 0-328901 | Loss: 2.239 | 676 ms/step , 58169.33 GFLOP/s , 530342.8 tokens/s INFO:__main__:2024-10-26 23:33:21 | Epoch: 1 | Step: 25080 | Dataset: 0-336901 | Loss: 1.943 | 677 ms/step , 58065.59 GFLOP/s , 530509.7 tokens/s INFO:__main__:2024-10-26 23:33:29 | Epoch: 1 | Step: 25090 | Dataset: 0-344901 | Loss: 1.871 | 677 ms/step , 58075.03 GFLOP/s , 530396.6 
tokens/s INFO:__main__:2024-10-26 23:33:37 | Epoch: 1 | Step: 25100 | Dataset: 0-352901 | Loss: 1.881 | 677 ms/step , 58066.87 GFLOP/s , 530434.7 tokens/s INFO:__main__:2024-10-26 23:33:44 | Epoch: 1 | Step: 25110 | Dataset: 0-360901 | Loss: 1.859 | 676 ms/step , 58142.06 GFLOP/s , 530659.5 tokens/s INFO:__main__:2024-10-26 23:33:52 | Epoch: 1 | Step: 25120 | Dataset: 0-368901 | Loss: 1.852 | 677 ms/step , 58077.43 GFLOP/s , 531192.6 tokens/s INFO:__main__:2024-10-26 23:34:00 | Epoch: 1 | Step: 25130 | Dataset: 0-376901 | Loss: 1.816 | 677 ms/step , 58029.88 GFLOP/s , 530412.6 tokens/s INFO:__main__:2024-10-26 23:34:07 | Epoch: 1 | Step: 25140 | Dataset: 0-384901 | Loss: 1.828 | 677 ms/step , 58097.66 GFLOP/s , 530475.3 tokens/s INFO:__main__:2024-10-26 23:34:15 | Epoch: 1 | Step: 25150 | Dataset: 0-392901 | Loss: 1.822 | 677 ms/step , 58048.49 GFLOP/s , 530430.1 tokens/s INFO:__main__:2024-10-26 23:34:23 | Epoch: 1 | Step: 25160 | Dataset: 0-400901 | Loss: 1.799 | 676 ms/step , 58113.25 GFLOP/s , 530599.5 tokens/s INFO:__main__:2024-10-26 23:34:31 | Epoch: 1 | Step: 25170 | Dataset: 0-408901 | Loss: 2.340 | 676 ms/step , 58118.70 GFLOP/s , 531455.8 tokens/s INFO:__main__:2024-10-26 23:34:38 | Epoch: 1 | Step: 25180 | Dataset: 0-416901 | Loss: 2.212 | 678 ms/step , 58020.61 GFLOP/s , 530718.1 tokens/s INFO:__main__:2024-10-26 23:34:46 | Epoch: 1 | Step: 25190 | Dataset: 0-424901 | Loss: 2.226 | 675 ms/step , 58207.93 GFLOP/s , 531257.7 tokens/s INFO:__main__:2024-10-26 23:34:54 | Epoch: 1 | Step: 25200 | Dataset: 0-432901 | Loss: 2.243 | 675 ms/step , 58204.35 GFLOP/s , 531484.3 tokens/s INFO:__main__:2024-10-26 23:35:01 | Epoch: 1 | Step: 25210 | Dataset: 0-440901 | Loss: 2.317 | 677 ms/step , 58030.61 GFLOP/s , 530973.0 tokens/s INFO:__main__:2024-10-26 23:35:09 | Epoch: 1 | Step: 25220 | Dataset: 0-448901 | Loss: 2.207 | 676 ms/step , 58128.06 GFLOP/s , 530701.2 tokens/s INFO:__main__:2024-10-26 23:35:17 | Epoch: 1 | Step: 25230 | Dataset: 0-456901 | Loss: 2.304 
| 675 ms/step , 58277.00 GFLOP/s , 531141.5 tokens/s INFO:__main__:2024-10-26 23:35:25 | Epoch: 1 | Step: 25240 | Dataset: 0-464901 | Loss: 2.227 | 676 ms/step , 58180.94 GFLOP/s , 531421.0 tokens/s INFO:__main__:2024-10-26 23:35:32 | Epoch: 1 | Step: 25250 | Dataset: 0-472901 | Loss: 2.251 | 675 ms/step , 58251.06 GFLOP/s , 531581.2 tokens/s INFO:__main__:2024-10-26 23:35:40 | Epoch: 1 | Step: 25260 | Dataset: 0-480901 | Loss: 2.252 | 677 ms/step , 58065.28 GFLOP/s , 530997.7 tokens/s INFO:__main__:2024-10-26 23:35:48 | Epoch: 1 | Step: 25270 | Dataset: 0-488901 | Loss: 2.196 | 675 ms/step , 58199.32 GFLOP/s , 531303.1 tokens/s INFO:__main__:2024-10-26 23:35:55 | Epoch: 1 | Step: 25280 | Dataset: 0-496901 | Loss: 2.194 | 676 ms/step , 58115.49 GFLOP/s , 531142.2 tokens/s INFO:__main__:2024-10-26 23:36:03 | Epoch: 1 | Step: 25290 | Dataset: 0-504901 | Loss: 2.105 | 676 ms/step , 58123.67 GFLOP/s , 530779.8 tokens/s INFO:__main__:2024-10-26 23:36:11 | Epoch: 1 | Step: 25300 | Dataset: 0-512901 | Loss: 2.253 | 675 ms/step , 58250.31 GFLOP/s , 531665.7 tokens/s INFO:__main__:2024-10-26 23:36:19 | Epoch: 1 | Step: 25310 | Dataset: 0-520901 | Loss: 2.220 | 675 ms/step , 58204.13 GFLOP/s , 531258.7 tokens/s INFO:__main__:2024-10-26 23:36:26 | Epoch: 1 | Step: 25320 | Dataset: 0-528901 | Loss: 2.243 | 676 ms/step , 58145.73 GFLOP/s , 531080.4 tokens/s INFO:__main__:2024-10-26 23:36:34 | Epoch: 1 | Step: 25330 | Dataset: 0-536901 | Loss: 2.245 | 675 ms/step , 58204.00 GFLOP/s , 530855.2 tokens/s INFO:__main__:2024-10-26 23:36:42 | Epoch: 1 | Step: 25340 | Dataset: 0-544901 | Loss: 2.201 | 676 ms/step , 58182.14 GFLOP/s , 531003.7 tokens/s INFO:__main__:2024-10-26 23:36:49 | Epoch: 1 | Step: 25350 | Dataset: 0-552901 | Loss: 2.209 | 675 ms/step , 58239.27 GFLOP/s , 531016.2 tokens/s INFO:__main__:2024-10-26 23:36:57 | Epoch: 1 | Step: 25360 | Dataset: 0-560901 | Loss: 2.174 | 677 ms/step , 58067.43 GFLOP/s , 531124.9 tokens/s INFO:__main__:2024-10-26 23:37:05 | Epoch: 1 | 
Step: 25370 | Dataset: 0-568901 | Loss: 2.172 | 677 ms/step , 58082.17 GFLOP/s , 530601.1 tokens/s INFO:__main__:2024-10-26 23:37:13 | Epoch: 1 | Step: 25380 | Dataset: 0-576901 | Loss: 2.034 | 677 ms/step , 58029.87 GFLOP/s , 530856.4 tokens/s INFO:__main__:2024-10-26 23:37:20 | Epoch: 1 | Step: 25390 | Dataset: 0-584901 | Loss: 2.089 | 675 ms/step , 58204.48 GFLOP/s , 531252.4 tokens/s INFO:__main__:2024-10-26 23:37:28 | Epoch: 1 | Step: 25400 | Dataset: 0-592901 | Loss: 2.140 | 676 ms/step , 58161.72 GFLOP/s , 531372.6 tokens/s INFO:__main__:2024-10-26 23:37:36 | Epoch: 1 | Step: 25410 | Dataset: 0-600901 | Loss: 2.126 | 677 ms/step , 58099.70 GFLOP/s , 530666.4 tokens/s INFO:__main__:2024-10-26 23:37:43 | Epoch: 1 | Step: 25420 | Dataset: 0-608901 | Loss: 2.140 | 675 ms/step , 58245.77 GFLOP/s , 530916.9 tokens/s INFO:__main__:2024-10-26 23:37:51 | Epoch: 1 | Step: 25430 | Dataset: 0-616901 | Loss: 2.114 | 677 ms/step , 58098.27 GFLOP/s , 530893.6 tokens/s INFO:__main__:2024-10-26 23:37:59 | Epoch: 1 | Step: 25440 | Dataset: 0-624901 | Loss: 2.206 | 676 ms/step , 58109.77 GFLOP/s , 530521.2 tokens/s INFO:__main__:2024-10-26 23:38:07 | Epoch: 1 | Step: 25450 | Dataset: 0-632901 | Loss: 2.065 | 676 ms/step , 58133.08 GFLOP/s , 530659.4 tokens/s INFO:__main__:2024-10-26 23:38:14 | Epoch: 1 | Step: 25460 | Dataset: 0-640901 | Loss: 2.178 | 676 ms/step , 58137.91 GFLOP/s , 531111.7 tokens/s INFO:__main__:2024-10-26 23:38:22 | Epoch: 1 | Step: 25470 | Dataset: 0-648901 | Loss: 2.199 | 676 ms/step , 58114.64 GFLOP/s , 530864.9 tokens/s INFO:__main__:2024-10-26 23:38:30 | Epoch: 1 | Step: 25480 | Dataset: 0-656901 | Loss: 2.155 | 677 ms/step , 58099.78 GFLOP/s , 530519.6 tokens/s INFO:__main__:2024-10-26 23:38:37 | Epoch: 1 | Step: 25490 | Dataset: 0-664901 | Loss: 2.213 | 676 ms/step , 58179.15 GFLOP/s , 531049.9 tokens/s INFO:__main__:2024-10-26 23:38:45 | Epoch: 1 | Step: 25500 | Dataset: 0-672901 | Loss: 2.167 | 676 ms/step , 58113.54 GFLOP/s , 530879.1 tokens/s 
INFO:__main__:2024-10-26 23:38:53 | Epoch: 1 | Step: 25510 | Dataset: 0-680901 | Loss: 2.194 | 676 ms/step , 58111.86 GFLOP/s , 530438.9 tokens/s INFO:__main__:2024-10-26 23:39:01 | Epoch: 1 | Step: 25520 | Dataset: 0-688901 | Loss: 2.165 | 676 ms/step , 58116.78 GFLOP/s , 530540.2 tokens/s INFO:__main__:2024-10-26 23:39:08 | Epoch: 1 | Step: 25530 | Dataset: 0-696901 | Loss: 2.238 | 677 ms/step , 58088.14 GFLOP/s , 530268.7 tokens/s INFO:__main__:2024-10-26 23:39:16 | Epoch: 1 | Step: 25540 | Dataset: 0-704901 | Loss: 2.183 | 676 ms/step , 58189.46 GFLOP/s , 531070.0 tokens/s INFO:__main__:2024-10-26 23:39:24 | Epoch: 1 | Step: 25550 | Dataset: 0-712901 | Loss: 2.159 | 675 ms/step , 58234.20 GFLOP/s , 530995.9 tokens/s INFO:__main__:2024-10-26 23:39:31 | Epoch: 1 | Step: 25560 | Dataset: 0-720901 | Loss: 2.109 | 676 ms/step , 58130.25 GFLOP/s , 531307.3 tokens/s INFO:__main__:2024-10-26 23:39:39 | Epoch: 1 | Step: 25570 | Dataset: 0-728901 | Loss: 2.183 | 677 ms/step , 58100.03 GFLOP/s , 530918.7 tokens/s INFO:__main__:2024-10-26 23:39:47 | Epoch: 1 | Step: 25580 | Dataset: 0-736901 | Loss: 2.101 | 675 ms/step , 58238.53 GFLOP/s , 530692.6 tokens/s INFO:__main__:2024-10-26 23:39:55 | Epoch: 1 | Step: 25590 | Dataset: 0-744901 | Loss: 2.105 | 677 ms/step , 58095.93 GFLOP/s , 530914.4 tokens/s INFO:__main__:2024-10-26 23:40:02 | Epoch: 1 | Step: 25600 | Dataset: 0-752901 | Loss: 2.129 | 676 ms/step , 58157.70 GFLOP/s , 530847.7 tokens/s INFO:__main__:2024-10-26 23:40:10 | Epoch: 1 | Step: 25610 | Dataset: 0-760901 | Loss: 2.281 | 676 ms/step , 58143.28 GFLOP/s , 531249.9 tokens/s INFO:__main__:2024-10-26 23:40:18 | Epoch: 1 | Step: 25620 | Dataset: 0-768901 | Loss: 2.236 | 677 ms/step , 58060.64 GFLOP/s , 530739.5 tokens/s INFO:__main__:2024-10-26 23:40:25 | Epoch: 1 | Step: 25630 | Dataset: 0-776901 | Loss: 2.241 | 676 ms/step , 58183.61 GFLOP/s , 531339.2 tokens/s INFO:__main__:2024-10-26 23:40:33 | Epoch: 1 | Step: 25640 | Dataset: 0-784901 | Loss: 2.183 | 676 
ms/step , 58189.14 GFLOP/s , 531370.6 tokens/s INFO:__main__:2024-10-26 23:40:41 | Epoch: 1 | Step: 25650 | Dataset: 0-792901 | Loss: 2.218 | 675 ms/step , 58204.01 GFLOP/s , 531446.3 tokens/s INFO:__main__:2024-10-26 23:40:49 | Epoch: 1 | Step: 25660 | Dataset: 0-800901 | Loss: 2.210 | 676 ms/step , 58144.99 GFLOP/s , 530600.0 tokens/s INFO:__main__:2024-10-26 23:40:56 | Epoch: 1 | Step: 25670 | Dataset: 0-808901 | Loss: 2.253 | 677 ms/step , 58053.73 GFLOP/s , 530805.3 tokens/s INFO:__main__:2024-10-26 23:41:04 | Epoch: 1 | Step: 25680 | Dataset: 0-816901 | Loss: 2.125 | 677 ms/step , 58092.13 GFLOP/s , 530857.5 tokens/s INFO:__main__:2024-10-26 23:41:12 | Epoch: 1 | Step: 25690 | Dataset: 0-824901 | Loss: 2.191 | 675 ms/step , 58205.80 GFLOP/s , 530774.5 tokens/s INFO:__main__:2024-10-26 23:41:19 | Epoch: 1 | Step: 25700 | Dataset: 0-832901 | Loss: 2.146 | 676 ms/step , 58142.56 GFLOP/s , 530692.4 tokens/s INFO:__main__:2024-10-26 23:41:27 | Epoch: 1 | Step: 25710 | Dataset: 0-840901 | Loss: 2.103 | 677 ms/step , 58082.57 GFLOP/s , 530570.5 tokens/s INFO:__main__:2024-10-26 23:41:35 | Epoch: 1 | Step: 25720 | Dataset: 0-848901 | Loss: 2.183 | 676 ms/step , 58143.22 GFLOP/s , 531398.2 tokens/s INFO:__main__:2024-10-26 23:41:43 | Epoch: 1 | Step: 25730 | Dataset: 0-856901 | Loss: 2.161 | 675 ms/step , 58225.39 GFLOP/s , 531409.1 tokens/s INFO:__main__:2024-10-26 23:41:50 | Epoch: 1 | Step: 25740 | Dataset: 0-864901 | Loss: 2.166 | 676 ms/step , 58142.64 GFLOP/s , 530431.0 tokens/s INFO:__main__:2024-10-26 23:41:58 | Epoch: 1 | Step: 25750 | Dataset: 0-872901 | Loss: 2.130 | 676 ms/step , 58185.84 GFLOP/s , 530723.7 tokens/s INFO:__main__:2024-10-26 23:42:06 | Epoch: 1 | Step: 25760 | Dataset: 0-880901 | Loss: 2.179 | 676 ms/step , 58171.45 GFLOP/s , 531332.3 tokens/s INFO:__main__:2024-10-26 23:42:13 | Epoch: 1 | Step: 25770 | Dataset: 0-888901 | Loss: 2.206 | 675 ms/step , 58197.35 GFLOP/s , 531172.3 tokens/s INFO:__main__:2024-10-26 23:42:21 | Epoch: 1 | Step: 
25780 | Dataset: 0-896901 | Loss: 2.165 | 678 ms/step , 58010.76 GFLOP/s , 530987.9 tokens/s INFO:__main__:2024-10-26 23:42:29 | Epoch: 1 | Step: 25790 | Dataset: 0-904901 | Loss: 2.042 | 678 ms/step , 57990.34 GFLOP/s , 530282.7 tokens/s INFO:__main__:2024-10-26 23:42:37 | Epoch: 1 | Step: 25800 | Dataset: 0-912901 | Loss: 2.137 | 676 ms/step , 58124.97 GFLOP/s , 530657.6 tokens/s INFO:__main__:2024-10-26 23:42:44 | Epoch: 1 | Step: 25810 | Dataset: 0-920901 | Loss: 2.137 | 677 ms/step , 58095.34 GFLOP/s , 530742.0 tokens/s INFO:__main__:2024-10-26 23:42:52 | Epoch: 1 | Step: 25820 | Dataset: 0-928901 | Loss: 2.177 | 678 ms/step , 58017.60 GFLOP/s , 530532.1 tokens/s INFO:__main__:2024-10-26 23:43:00 | Epoch: 1 | Step: 25830 | Dataset: 0-936901 | Loss: 2.176 | 676 ms/step , 58162.25 GFLOP/s , 531155.5 tokens/s INFO:__main__:2024-10-26 23:43:07 | Epoch: 1 | Step: 25840 | Dataset: 0-944901 | Loss: 2.104 | 675 ms/step , 58195.47 GFLOP/s , 531127.6 tokens/s INFO:__main__:2024-10-26 23:43:15 | Epoch: 1 | Step: 25850 | Dataset: 0-952901 | Loss: 2.237 | 677 ms/step , 58082.12 GFLOP/s , 530863.0 tokens/s INFO:__main__:2024-10-26 23:43:23 | Epoch: 1 | Step: 25860 | Dataset: 0-960901 | Loss: 2.230 | 676 ms/step , 58143.25 GFLOP/s , 530987.8 tokens/s INFO:__main__:2024-10-26 23:43:31 | Epoch: 1 | Step: 25870 | Dataset: 0-968901 | Loss: 2.154 | 675 ms/step , 58201.24 GFLOP/s , 531658.3 tokens/s INFO:__main__:2024-10-26 23:43:38 | Epoch: 1 | Step: 25880 | Dataset: 0-976901 | Loss: 2.082 | 676 ms/step , 58170.88 GFLOP/s , 531122.0 tokens/s INFO:__main__:2024-10-26 23:43:46 | Epoch: 1 | Step: 25890 | Dataset: 0-984901 | Loss: 2.193 | 677 ms/step , 58061.06 GFLOP/s , 530762.9 tokens/s INFO:__main__:2024-10-26 23:43:54 | Epoch: 1 | Step: 25900 | Dataset: 0-992901 | Loss: 2.270 | 675 ms/step , 58193.54 GFLOP/s , 530943.5 tokens/s INFO:__main__:2024-10-26 23:44:01 | Epoch: 1 | Step: 25910 | Dataset: 0-1000901 | Loss: 2.194 | 676 ms/step , 58137.10 GFLOP/s , 530979.9 tokens/s 
INFO:__main__:2024-10-26 23:44:09 | Epoch: 1 | Step: 25920 | Dataset: 0-1008901 | Loss: 2.160 | 676 ms/step , 58140.39 GFLOP/s , 530624.0 tokens/s INFO:__main__:2024-10-26 23:44:17 | Epoch: 1 | Step: 25930 | Dataset: 0-1016901 | Loss: 2.149 | 677 ms/step , 58048.38 GFLOP/s , 530251.3 tokens/s INFO:__main__:2024-10-26 23:44:25 | Epoch: 1 | Step: 25940 | Dataset: 0-1024901 | Loss: 2.132 | 676 ms/step , 58137.55 GFLOP/s , 531236.9 tokens/s INFO:__main__:2024-10-26 23:44:32 | Epoch: 1 | Step: 25950 | Dataset: 0-1032901 | Loss: 2.187 | 676 ms/step , 58151.22 GFLOP/s , 530815.1 tokens/s INFO:__main__:2024-10-26 23:44:40 | Epoch: 1 | Step: 25960 | Dataset: 0-1040901 | Loss: 2.134 | 676 ms/step , 58148.47 GFLOP/s , 530910.6 tokens/s INFO:__main__:2024-10-26 23:44:48 | Epoch: 1 | Step: 25970 | Dataset: 0-1048901 | Loss: 2.244 | 675 ms/step , 58231.61 GFLOP/s , 531073.3 tokens/s INFO:__main__:2024-10-26 23:44:55 | Epoch: 1 | Step: 25980 | Dataset: 0-1056901 | Loss: 1.959 | 676 ms/step , 58111.22 GFLOP/s , 531329.0 tokens/s INFO:__main__:2024-10-26 23:45:03 | Epoch: 1 | Step: 25990 | Dataset: 0-1064901 | Loss: 1.884 | 676 ms/step , 58150.35 GFLOP/s , 530658.0 tokens/s INFO:__main__:2024-10-26 23:45:10 | Validation | Step: 26000 | Val_loss: 2.691 | Best_val_loss: 2.2378 INFO:__main__:2024-10-26 23:45:10 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_234510_step_26000.pt` INFO:__main__:2024-10-26 23:45:12 | Epoch: 1 | Step: 26000 | Dataset: 0-1072901 | Loss: 1.860 | 676 ms/step , 58179.12 GFLOP/s , 477803.6 tokens/s INFO:__main__:2024-10-26 23:45:19 | Epoch: 1 | Step: 26010 | Dataset: 0-1080901 | Loss: 1.890 | 676 ms/step , 58133.51 GFLOP/s , 530342.6 tokens/s INFO:__main__:2024-10-26 23:45:27 | Epoch: 1 | Step: 26020 | Dataset: 0-1088901 | Loss: 1.829 | 679 ms/step , 57898.25 GFLOP/s , 530255.2 tokens/s INFO:__main__:2024-10-26 23:45:35 | Epoch: 1 | Step: 26030 | Dataset: 0-1096901 | Loss: 1.836 | 678 ms/step , 57944.39 GFLOP/s , 529634.6 tokens/s 
INFO:__main__:2024-10-26 23:45:43 | Epoch: 1 | Step: 26040 | Dataset: 0-1104901 | Loss: 1.798 | 676 ms/step , 58123.11 GFLOP/s , 530754.2 tokens/s INFO:__main__:2024-10-26 23:45:50 | Epoch: 1 | Step: 26050 | Dataset: 0-1112901 | Loss: 1.834 | 676 ms/step , 58169.42 GFLOP/s , 530698.1 tokens/s INFO:__main__:2024-10-26 23:45:58 | Epoch: 1 | Step: 26060 | Dataset: 0-1120901 | Loss: 2.415 | 677 ms/step , 58085.44 GFLOP/s , 530048.6 tokens/s INFO:__main__:2024-10-26 23:46:06 | Epoch: 1 | Step: 26070 | Dataset: 0-1128901 | Loss: 2.313 | 676 ms/step , 58109.61 GFLOP/s , 529193.0 tokens/s INFO:__main__:2024-10-26 23:46:14 | Epoch: 1 | Step: 26080 | Dataset: 0-1136901 | Loss: 2.256 | 677 ms/step , 58040.52 GFLOP/s , 526365.6 tokens/s INFO:__main__:2024-10-26 23:46:21 | Epoch: 1 | Step: 26090 | Dataset: 0-1144901 | Loss: 2.257 | 678 ms/step , 58010.79 GFLOP/s , 524398.1 tokens/s INFO:__main__:2024-10-26 23:46:29 | Epoch: 1 | Step: 26100 | Dataset: 0-1152901 | Loss: 2.254 | 676 ms/step , 58122.53 GFLOP/s , 528385.2 tokens/s INFO:__main__:2024-10-26 23:46:37 | Epoch: 1 | Step: 26110 | Dataset: 0-1160901 | Loss: 2.240 | 677 ms/step , 58036.01 GFLOP/s , 530387.4 tokens/s INFO:__main__:2024-10-26 23:46:45 | Epoch: 1 | Step: 26120 | Dataset: 0-1168901 | Loss: 2.192 | 677 ms/step , 58039.56 GFLOP/s , 530333.1 tokens/s INFO:__main__:2024-10-26 23:46:52 | Epoch: 1 | Step: 26130 | Dataset: 0-1176901 | Loss: 2.185 | 677 ms/step , 58082.85 GFLOP/s , 529571.2 tokens/s INFO:__main__:2024-10-26 23:47:00 | Epoch: 1 | Step: 26140 | Dataset: 0-1184901 | Loss: 2.191 | 677 ms/step , 58076.40 GFLOP/s , 530083.8 tokens/s INFO:__main__:2024-10-26 23:47:08 | Epoch: 1 | Step: 26150 | Dataset: 0-1192901 | Loss: 2.222 | 679 ms/step , 57933.85 GFLOP/s , 529162.1 tokens/s INFO:__main__:2024-10-26 23:47:16 | Epoch: 1 | Step: 26160 | Dataset: 0-1200901 | Loss: 2.184 | 678 ms/step , 57973.80 GFLOP/s , 528513.3 tokens/s INFO:__main__:2024-10-26 23:47:23 | Epoch: 1 | Step: 26170 | Dataset: 0-1208901 | Loss: 
2.169 | 678 ms/step , 57938.25 GFLOP/s , 526566.2 tokens/s INFO:__main__:2024-10-26 23:47:31 | Epoch: 1 | Step: 26180 | Dataset: 0-1216901 | Loss: 2.310 | 677 ms/step , 58079.52 GFLOP/s , 529782.6 tokens/s INFO:__main__:2024-10-26 23:47:39 | Epoch: 1 | Step: 26190 | Dataset: 0-1224901 | Loss: 2.248 | 677 ms/step , 58062.03 GFLOP/s , 530502.7 tokens/s INFO:__main__:2024-10-26 23:47:46 | Epoch: 1 | Step: 26200 | Dataset: 0-1232901 | Loss: 2.186 | 675 ms/step , 58224.36 GFLOP/s , 531327.0 tokens/s INFO:__main__:2024-10-26 23:47:54 | Epoch: 1 | Step: 26210 | Dataset: 0-1240901 | Loss: 2.268 | 677 ms/step , 58091.95 GFLOP/s , 531387.9 tokens/s INFO:__main__:2024-10-26 23:48:02 | Epoch: 1 | Step: 26220 | Dataset: 0-1248901 | Loss: 2.226 | 677 ms/step , 58086.44 GFLOP/s , 531177.9 tokens/s INFO:__main__:2024-10-26 23:48:10 | Epoch: 1 | Step: 26230 | Dataset: 0-1256901 | Loss: 2.201 | 677 ms/step , 58103.01 GFLOP/s , 531013.5 tokens/s INFO:__main__:2024-10-26 23:48:17 | Epoch: 1 | Step: 26240 | Dataset: 0-1264901 | Loss: 2.192 | 675 ms/step , 58216.42 GFLOP/s , 531626.9 tokens/s INFO:__main__:2024-10-26 23:48:25 | Epoch: 1 | Step: 26250 | Dataset: 0-1272901 | Loss: 2.210 | 677 ms/step , 58065.50 GFLOP/s , 531251.8 tokens/s INFO:__main__:2024-10-26 23:48:33 | Epoch: 1 | Step: 26260 | Dataset: 0-1280901 | Loss: 2.257 | 677 ms/step , 58081.80 GFLOP/s , 530923.4 tokens/s INFO:__main__:2024-10-26 23:48:40 | Epoch: 1 | Step: 26270 | Dataset: 0-1288901 | Loss: 2.275 | 676 ms/step , 58181.04 GFLOP/s , 531086.3 tokens/s INFO:__main__:2024-10-26 23:48:48 | Epoch: 1 | Step: 26280 | Dataset: 0-1296901 | Loss: 2.245 | 677 ms/step , 58050.99 GFLOP/s , 531087.6 tokens/s INFO:__main__:2024-10-26 23:48:56 | Epoch: 1 | Step: 26290 | Dataset: 0-1304901 | Loss: 2.239 | 675 ms/step , 58240.63 GFLOP/s , 532218.2 tokens/s INFO:__main__:2024-10-26 23:49:04 | Epoch: 1 | Step: 26300 | Dataset: 0-1312901 | Loss: 2.218 | 676 ms/step , 58189.01 GFLOP/s , 531198.0 tokens/s INFO:__main__:2024-10-26 
23:49:11 | Epoch: 1 | Step: 26310 | Dataset: 0-1320901 | Loss: 2.203 | 677 ms/step , 58090.42 GFLOP/s , 531307.1 tokens/s INFO:__main__:2024-10-26 23:49:19 | Epoch: 1 | Step: 26320 | Dataset: 0-1328901 | Loss: 2.208 | 676 ms/step , 58180.12 GFLOP/s , 531094.5 tokens/s INFO:__main__:2024-10-26 23:49:27 | Epoch: 1 | Step: 26330 | Dataset: 0-1336901 | Loss: 2.168 | 675 ms/step , 58231.95 GFLOP/s , 531469.1 tokens/s INFO:__main__:2024-10-26 23:49:34 | Epoch: 1 | Step: 26340 | Dataset: 0-1344901 | Loss: 2.182 | 676 ms/step , 58159.08 GFLOP/s , 531098.1 tokens/s INFO:__main__:2024-10-26 23:49:42 | Epoch: 1 | Step: 26350 | Dataset: 0-1352901 | Loss: 2.261 | 676 ms/step , 58124.50 GFLOP/s , 531306.0 tokens/s INFO:__main__:2024-10-26 23:49:50 | Epoch: 1 | Step: 26360 | Dataset: 0-1360901 | Loss: 2.182 | 677 ms/step , 58054.79 GFLOP/s , 531555.0 tokens/s INFO:__main__:2024-10-26 23:49:58 | Epoch: 1 | Step: 26370 | Dataset: 0-1368901 | Loss: 2.178 | 677 ms/step , 58094.24 GFLOP/s , 531324.5 tokens/s INFO:__main__:2024-10-26 23:50:05 | Epoch: 1 | Step: 26380 | Dataset: 0-1376901 | Loss: 2.226 | 676 ms/step , 58126.92 GFLOP/s , 531811.9 tokens/s INFO:__main__:2024-10-26 23:50:13 | Epoch: 1 | Step: 26390 | Dataset: 0-1384901 | Loss: 2.228 | 676 ms/step , 58164.93 GFLOP/s , 531721.7 tokens/s INFO:__main__:2024-10-26 23:50:21 | Epoch: 1 | Step: 26400 | Dataset: 0-1392901 | Loss: 2.220 | 675 ms/step , 58225.07 GFLOP/s , 531715.2 tokens/s INFO:__main__:2024-10-26 23:50:28 | Epoch: 1 | Step: 26410 | Dataset: 0-1400901 | Loss: 2.255 | 675 ms/step , 58196.51 GFLOP/s , 532285.2 tokens/s INFO:__main__:2024-10-26 23:50:36 | Epoch: 1 | Step: 26420 | Dataset: 0-1408901 | Loss: 2.215 | 676 ms/step , 58124.07 GFLOP/s , 531711.4 tokens/s INFO:__main__:2024-10-26 23:50:44 | Epoch: 1 | Step: 26430 | Dataset: 0-1416901 | Loss: 2.218 | 677 ms/step , 58081.89 GFLOP/s , 530776.6 tokens/s INFO:__main__:2024-10-26 23:50:51 | Epoch: 1 | Step: 26440 | Dataset: 0-1424901 | Loss: 2.285 | 677 ms/step , 
58089.15 GFLOP/s , 531148.4 tokens/s INFO:__main__:2024-10-26 23:50:59 | Epoch: 1 | Step: 26450 | Dataset: 0-1432901 | Loss: 2.189 | 676 ms/step , 58170.34 GFLOP/s , 530658.2 tokens/s INFO:__main__:2024-10-26 23:51:07 | Epoch: 1 | Step: 26460 | Dataset: 0-1440901 | Loss: 2.267 | 675 ms/step , 58203.61 GFLOP/s , 531679.7 tokens/s INFO:__main__:2024-10-26 23:51:15 | Epoch: 1 | Step: 26470 | Dataset: 0-1448901 | Loss: 2.196 | 676 ms/step , 58121.55 GFLOP/s , 531612.5 tokens/s INFO:__main__:2024-10-26 23:51:22 | Epoch: 1 | Step: 26480 | Dataset: 0-1456901 | Loss: 2.309 | 676 ms/step , 58116.03 GFLOP/s , 530986.0 tokens/s INFO:__main__:2024-10-26 23:51:30 | Epoch: 1 | Step: 26490 | Dataset: 0-1464901 | Loss: 2.158 | 677 ms/step , 58066.53 GFLOP/s , 531040.8 tokens/s INFO:__main__:2024-10-26 23:51:38 | Epoch: 1 | Step: 26500 | Dataset: 0-1472901 | Loss: 2.094 | 677 ms/step , 58073.71 GFLOP/s , 530892.7 tokens/s INFO:__main__:2024-10-26 23:51:45 | Epoch: 1 | Step: 26510 | Dataset: 0-1480901 | Loss: 2.219 | 676 ms/step , 58171.19 GFLOP/s , 531246.3 tokens/s INFO:__main__:2024-10-26 23:51:53 | Epoch: 1 | Step: 26520 | Dataset: 0-1488901 | Loss: 2.203 | 676 ms/step , 58178.59 GFLOP/s , 531415.8 tokens/s INFO:__main__:2024-10-26 23:52:01 | Epoch: 1 | Step: 26530 | Dataset: 0-1496901 | Loss: 2.150 | 677 ms/step , 58069.77 GFLOP/s , 530605.7 tokens/s INFO:__main__:2024-10-26 23:52:09 | Epoch: 1 | Step: 26540 | Dataset: 0-1504901 | Loss: 2.170 | 677 ms/step , 58030.93 GFLOP/s , 530705.6 tokens/s INFO:__main__:2024-10-26 23:52:16 | Epoch: 1 | Step: 26550 | Dataset: 0-1512901 | Loss: 1.963 | 675 ms/step , 58272.82 GFLOP/s , 530847.7 tokens/s INFO:__main__:2024-10-26 23:52:24 | Epoch: 1 | Step: 26560 | Dataset: 0-1520901 | Loss: 1.874 | 675 ms/step , 58232.60 GFLOP/s , 531106.2 tokens/s INFO:__main__:2024-10-26 23:52:32 | Epoch: 1 | Step: 26570 | Dataset: 0-1528901 | Loss: 1.889 | 677 ms/step , 58030.39 GFLOP/s , 530602.0 tokens/s INFO:__main__:2024-10-26 23:52:39 | Epoch: 1 | 
Step: 26580 | Dataset: 0-1536901 | Loss: 1.852 | 677 ms/step , 58097.43 GFLOP/s , 530669.1 tokens/s INFO:__main__:2024-10-26 23:52:47 | Epoch: 1 | Step: 26590 | Dataset: 0-1544901 | Loss: 1.817 | 677 ms/step , 58085.76 GFLOP/s , 529993.3 tokens/s INFO:__main__:2024-10-26 23:52:55 | Epoch: 1 | Step: 26600 | Dataset: 0-1552901 | Loss: 1.802 | 676 ms/step , 58177.41 GFLOP/s , 531159.5 tokens/s INFO:__main__:2024-10-26 23:53:03 | Epoch: 1 | Step: 26610 | Dataset: 0-1560901 | Loss: 1.821 | 676 ms/step , 58149.89 GFLOP/s , 531316.5 tokens/s INFO:__main__:2024-10-26 23:53:10 | Epoch: 1 | Step: 26620 | Dataset: 0-1568901 | Loss: 1.811 | 675 ms/step , 58270.50 GFLOP/s , 531059.1 tokens/s INFO:__main__:2024-10-26 23:53:18 | Epoch: 1 | Step: 26630 | Dataset: 0-1576901 | Loss: 2.508 | 676 ms/step , 58134.92 GFLOP/s , 530756.9 tokens/s INFO:__main__:2024-10-26 23:53:26 | Epoch: 1 | Step: 26640 | Dataset: 0-1584901 | Loss: 2.273 | 675 ms/step , 58204.48 GFLOP/s , 531420.0 tokens/s INFO:__main__:2024-10-26 23:53:33 | Epoch: 1 | Step: 26650 | Dataset: 0-1592901 | Loss: 2.212 | 676 ms/step , 58111.82 GFLOP/s , 531741.0 tokens/s INFO:__main__:2024-10-26 23:53:41 | Epoch: 1 | Step: 26660 | Dataset: 0-1600901 | Loss: 2.140 | 674 ms/step , 58308.39 GFLOP/s , 524787.1 tokens/s INFO:__main__:2024-10-26 23:53:49 | Epoch: 1 | Step: 26670 | Dataset: 0-1608901 | Loss: 2.286 | 676 ms/step , 58110.46 GFLOP/s , 531918.1 tokens/s INFO:__main__:2024-10-26 23:53:57 | Epoch: 1 | Step: 26680 | Dataset: 0-1616901 | Loss: 2.156 | 675 ms/step , 58245.99 GFLOP/s , 531330.1 tokens/s INFO:__main__:2024-10-26 23:54:04 | Epoch: 1 | Step: 26690 | Dataset: 0-1624901 | Loss: 2.220 | 676 ms/step , 58147.38 GFLOP/s , 531516.3 tokens/s INFO:__main__:2024-10-26 23:54:12 | Epoch: 1 | Step: 26700 | Dataset: 0-1632901 | Loss: 2.198 | 676 ms/step , 58134.10 GFLOP/s , 530823.4 tokens/s INFO:__main__:2024-10-26 23:54:20 | Epoch: 1 | Step: 26710 | Dataset: 0-1640901 | Loss: 2.258 | 675 ms/step , 58239.14 GFLOP/s , 
531452.1 tokens/s INFO:__main__:2024-10-26 23:54:27 | Epoch: 1 | Step: 26720 | Dataset: 0-1648901 | Loss: 2.239 | 675 ms/step , 58259.94 GFLOP/s , 531693.0 tokens/s INFO:__main__:2024-10-26 23:54:35 | Epoch: 1 | Step: 26730 | Dataset: 0-1656901 | Loss: 2.190 | 675 ms/step , 58194.23 GFLOP/s , 531634.5 tokens/s INFO:__main__:2024-10-26 23:54:43 | Epoch: 1 | Step: 26740 | Dataset: 0-1664901 | Loss: 2.207 | 676 ms/step , 58111.53 GFLOP/s , 531171.1 tokens/s INFO:__main__:2024-10-26 23:54:51 | Epoch: 1 | Step: 26750 | Dataset: 0-1672901 | Loss: 2.204 | 675 ms/step , 58216.43 GFLOP/s , 532493.4 tokens/s INFO:__main__:2024-10-26 23:54:58 | Epoch: 1 | Step: 26760 | Dataset: 0-1680901 | Loss: 2.300 | 674 ms/step , 58328.48 GFLOP/s , 532244.2 tokens/s INFO:__main__:2024-10-26 23:55:06 | Epoch: 1 | Step: 26770 | Dataset: 0-1688901 | Loss: 2.245 | 677 ms/step , 58037.16 GFLOP/s , 531056.4 tokens/s INFO:__main__:2024-10-26 23:55:14 | Epoch: 1 | Step: 26780 | Dataset: 0-1696901 | Loss: 2.196 | 676 ms/step , 58180.01 GFLOP/s , 531677.7 tokens/s INFO:__main__:2024-10-26 23:55:21 | Epoch: 1 | Step: 26790 | Dataset: 0-1704901 | Loss: 2.214 | 676 ms/step , 58148.78 GFLOP/s , 530882.2 tokens/s INFO:__main__:2024-10-26 23:55:29 | Epoch: 1 | Step: 26800 | Dataset: 0-1712901 | Loss: 2.269 | 677 ms/step , 58057.65 GFLOP/s , 530790.8 tokens/s INFO:__main__:2024-10-26 23:55:37 | Epoch: 1 | Step: 26810 | Dataset: 0-1720901 | Loss: 2.284 | 676 ms/step , 58170.65 GFLOP/s , 531065.4 tokens/s INFO:__main__:2024-10-26 23:55:45 | Epoch: 1 | Step: 26820 | Dataset: 0-1728901 | Loss: 2.282 | 676 ms/step , 58134.84 GFLOP/s , 531464.9 tokens/s INFO:__main__:2024-10-26 23:55:52 | Epoch: 1 | Step: 26830 | Dataset: 0-1736901 | Loss: 2.318 | 675 ms/step , 58225.43 GFLOP/s , 531451.2 tokens/s INFO:__main__:2024-10-26 23:56:00 | Epoch: 1 | Step: 26840 | Dataset: 0-1744901 | Loss: 2.315 | 676 ms/step , 58124.13 GFLOP/s , 530984.8 tokens/s INFO:__main__:2024-10-26 23:56:08 | Epoch: 1 | Step: 26850 | Dataset: 
0-1752901 | Loss: 2.236 | 678 ms/step , 58020.35 GFLOP/s , 530773.9 tokens/s INFO:__main__:2024-10-26 23:56:15 | Epoch: 1 | Step: 26860 | Dataset: 0-1760901 | Loss: 2.302 | 676 ms/step , 58109.44 GFLOP/s , 531129.3 tokens/s INFO:__main__:2024-10-26 23:56:23 | Epoch: 1 | Step: 26870 | Dataset: 0-1768901 | Loss: 2.309 | 676 ms/step , 58146.04 GFLOP/s , 531422.2 tokens/s INFO:__main__:2024-10-26 23:56:31 | Epoch: 1 | Step: 26880 | Dataset: 0-1776901 | Loss: 2.287 | 676 ms/step , 58127.61 GFLOP/s , 530882.8 tokens/s INFO:__main__:2024-10-26 23:56:39 | Epoch: 1 | Step: 26890 | Dataset: 0-1784901 | Loss: 2.267 | 675 ms/step , 58217.95 GFLOP/s , 531561.8 tokens/s INFO:__main__:2024-10-26 23:56:46 | Epoch: 1 | Step: 26900 | Dataset: 0-1792901 | Loss: 2.185 | 677 ms/step , 58096.84 GFLOP/s , 530872.7 tokens/s INFO:__main__:2024-10-26 23:56:54 | Epoch: 1 | Step: 26910 | Dataset: 0-1800901 | Loss: 2.189 | 675 ms/step , 58206.88 GFLOP/s , 531539.7 tokens/s INFO:__main__:2024-10-26 23:57:02 | Epoch: 1 | Step: 26920 | Dataset: 0-1808901 | Loss: 2.195 | 676 ms/step , 58181.74 GFLOP/s , 531026.8 tokens/s INFO:__main__:2024-10-26 23:57:09 | Epoch: 1 | Step: 26930 | Dataset: 0-1816901 | Loss: 2.247 | 676 ms/step , 58155.31 GFLOP/s , 531704.5 tokens/s INFO:__main__:2024-10-26 23:57:17 | Epoch: 1 | Step: 26940 | Dataset: 0-1824901 | Loss: 2.275 | 676 ms/step , 58125.78 GFLOP/s , 531054.7 tokens/s INFO:__main__:2024-10-26 23:57:25 | Epoch: 1 | Step: 26950 | Dataset: 0-1832901 | Loss: 2.133 | 676 ms/step , 58152.72 GFLOP/s , 531418.4 tokens/s INFO:__main__:2024-10-26 23:57:33 | Epoch: 1 | Step: 26960 | Dataset: 0-1840901 | Loss: 1.883 | 676 ms/step , 58158.10 GFLOP/s , 530472.7 tokens/s INFO:__main__:2024-10-26 23:57:40 | Epoch: 1 | Step: 26970 | Dataset: 0-1848901 | Loss: 1.869 | 676 ms/step , 58187.47 GFLOP/s , 531039.9 tokens/s INFO:__main__:2024-10-26 23:57:48 | Epoch: 1 | Step: 26980 | Dataset: 0-1856901 | Loss: 1.833 | 676 ms/step , 58122.95 GFLOP/s , 530939.1 tokens/s 
INFO:__main__:2024-10-26 23:57:56 | Epoch: 1 | Step: 26990 | Dataset: 0-1864901 | Loss: 1.798 | 676 ms/step , 58132.95 GFLOP/s , 531372.7 tokens/s INFO:__main__:2024-10-26 23:58:03 | Validation | Step: 27000 | Val_loss: 2.607 | Best_val_loss: 2.2378 INFO:__main__:2024-10-26 23:58:03 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241026_235803_step_27000.pt` INFO:__main__:2024-10-26 23:58:04 | Epoch: 1 | Step: 27000 | Dataset: 0-1872901 | Loss: 1.798 | 674 ms/step , 58286.58 GFLOP/s , 477077.0 tokens/s INFO:__main__:2024-10-26 23:58:12 | Epoch: 1 | Step: 27010 | Dataset: 0-1880901 | Loss: 1.756 | 676 ms/step , 58124.90 GFLOP/s , 530978.3 tokens/s INFO:__main__:2024-10-26 23:58:20 | Epoch: 1 | Step: 27020 | Dataset: 0-1888901 | Loss: 1.770 | 676 ms/step , 58136.27 GFLOP/s , 530988.1 tokens/s INFO:__main__:2024-10-26 23:58:27 | Epoch: 1 | Step: 27030 | Dataset: 0-1896901 | Loss: 1.789 | 675 ms/step , 58194.06 GFLOP/s , 531331.4 tokens/s INFO:__main__:2024-10-26 23:58:35 | Epoch: 1 | Step: 27040 | Dataset: 0-1904901 | Loss: 2.479 | 676 ms/step , 58147.05 GFLOP/s , 531323.3 tokens/s INFO:__main__:2024-10-26 23:58:43 | Epoch: 1 | Step: 27050 | Dataset: 0-1912901 | Loss: 2.298 | 676 ms/step , 58144.06 GFLOP/s , 531475.8 tokens/s INFO:__main__:2024-10-26 23:58:51 | Epoch: 1 | Step: 27060 | Dataset: 0-1920901 | Loss: 2.320 | 676 ms/step , 58144.16 GFLOP/s , 531802.3 tokens/s INFO:__main__:2024-10-26 23:58:58 | Epoch: 1 | Step: 27070 | Dataset: 0-1928901 | Loss: 2.198 | 676 ms/step , 58124.44 GFLOP/s , 532003.7 tokens/s INFO:__main__:2024-10-26 23:59:06 | Epoch: 1 | Step: 27080 | Dataset: 0-1936901 | Loss: 2.216 | 676 ms/step , 58149.17 GFLOP/s , 531562.7 tokens/s INFO:__main__:2024-10-26 23:59:14 | Epoch: 1 | Step: 27090 | Dataset: 0-1944901 | Loss: 2.097 | 676 ms/step , 58176.76 GFLOP/s , 531882.4 tokens/s INFO:__main__:2024-10-26 23:59:21 | Epoch: 1 | Step: 27100 | Dataset: 0-1952901 | Loss: 2.245 | 676 ms/step , 58150.12 GFLOP/s , 531520.5 tokens/s 
INFO:__main__:2024-10-26 23:59:29 | Epoch: 1 | Step: 27110 | Dataset: 0-1960901 | Loss: 2.143 | 676 ms/step , 58180.97 GFLOP/s , 531411.1 tokens/s INFO:__main__:2024-10-26 23:59:37 | Epoch: 1 | Step: 27120 | Dataset: 0-1968901 | Loss: 2.125 | 676 ms/step , 58189.17 GFLOP/s , 530389.4 tokens/s INFO:__main__:2024-10-26 23:59:44 | Epoch: 1 | Step: 27130 | Dataset: 0-1976901 | Loss: 2.220 | 676 ms/step , 58173.90 GFLOP/s , 531114.9 tokens/s INFO:__main__:2024-10-26 23:59:52 | Epoch: 1 | Step: 27140 | Dataset: 0-1984901 | Loss: 2.279 | 674 ms/step , 58296.43 GFLOP/s , 531840.6 tokens/s INFO:__main__:2024-10-27 00:00:00 | Epoch: 1 | Step: 27150 | Dataset: 0-1992901 | Loss: 2.109 | 676 ms/step , 58181.32 GFLOP/s , 531852.2 tokens/s INFO:__main__:2024-10-27 00:00:07 | Epoch: 1 | Step: 27160 | Dataset: 0-2000901 | Loss: 2.072 | 676 ms/step , 58142.37 GFLOP/s , 552003.7 tokens/s INFO:__main__:2024-10-27 00:00:15 | Epoch: 1 | Step: 27170 | Dataset: 0-2008901 | Loss: 2.287 | 676 ms/step , 58117.60 GFLOP/s , 531242.3 tokens/s INFO:__main__:2024-10-27 00:00:23 | Epoch: 1 | Step: 27180 | Dataset: 0-2016901 | Loss: 2.134 | 677 ms/step , 58095.74 GFLOP/s , 531569.1 tokens/s INFO:__main__:2024-10-27 00:00:30 | Epoch: 1 | Step: 27190 | Dataset: 0-2024901 | Loss: 2.133 | 675 ms/step , 58241.13 GFLOP/s , 532502.3 tokens/s INFO:__main__:2024-10-27 00:00:38 | Epoch: 1 | Step: 27200 | Dataset: 0-2032901 | Loss: 2.133 | 674 ms/step , 58323.45 GFLOP/s , 532240.1 tokens/s INFO:__main__:2024-10-27 00:00:46 | Epoch: 1 | Step: 27210 | Dataset: 0-2040901 | Loss: 1.778 | 679 ms/step , 57920.14 GFLOP/s , 530632.4 tokens/s INFO:__main__:2024-10-27 00:00:54 | Epoch: 1 | Step: 27220 | Dataset: 0-2048901 | Loss: 1.760 | 679 ms/step , 57901.05 GFLOP/s , 529125.6 tokens/s INFO:__main__:2024-10-27 00:01:01 | Epoch: 1 | Step: 27230 | Dataset: 0-2056901 | Loss: 1.708 | 679 ms/step , 57884.68 GFLOP/s , 528888.1 tokens/s INFO:__main__:2024-10-27 00:01:09 | Epoch: 1 | Step: 27240 | Dataset: 0-2064901 | Loss: 
1.688 | 681 ms/step , 57710.66 GFLOP/s , 527165.8 tokens/s INFO:__main__:2024-10-27 00:01:17 | Epoch: 1 | Step: 27250 | Dataset: 0-2072901 | Loss: 1.713 | 681 ms/step , 57691.62 GFLOP/s , 526769.7 tokens/s INFO:__main__:2024-10-27 00:01:25 | Epoch: 1 | Step: 27260 | Dataset: 0-2080901 | Loss: 1.700 | 681 ms/step , 57718.25 GFLOP/s , 527341.9 tokens/s INFO:__main__:2024-10-27 00:01:32 | Epoch: 1 | Step: 27270 | Dataset: 0-2088901 | Loss: 1.674 | 678 ms/step , 57938.07 GFLOP/s , 529167.4 tokens/s INFO:__main__:2024-10-27 00:01:40 | Epoch: 1 | Step: 27280 | Dataset: 0-2096901 | Loss: 1.660 | 677 ms/step , 58094.58 GFLOP/s , 530438.7 tokens/s INFO:__main__:2024-10-27 00:01:48 | Epoch: 1 | Step: 27290 | Dataset: 0-2104901 | Loss: 2.323 | 678 ms/step , 57949.99 GFLOP/s , 531126.1 tokens/s INFO:__main__:2024-10-27 00:01:56 | Epoch: 1 | Step: 27300 | Dataset: 0-2112901 | Loss: 2.295 | 677 ms/step , 58038.72 GFLOP/s , 529973.8 tokens/s INFO:__main__:2024-10-27 00:02:03 | Epoch: 1 | Step: 27310 | Dataset: 0-2120901 | Loss: 2.143 | 678 ms/step , 57959.18 GFLOP/s , 530117.7 tokens/s INFO:__main__:2024-10-27 00:02:11 | Epoch: 1 | Step: 27320 | Dataset: 0-2128901 | Loss: 2.301 | 677 ms/step , 58060.46 GFLOP/s , 529963.8 tokens/s INFO:__main__:2024-10-27 00:02:19 | Epoch: 1 | Step: 27330 | Dataset: 0-2136901 | Loss: 2.213 | 678 ms/step , 57972.84 GFLOP/s , 530311.2 tokens/s INFO:__main__:2024-10-27 00:02:26 | Epoch: 1 | Step: 27340 | Dataset: 0-2144901 | Loss: 2.147 | 679 ms/step , 57862.76 GFLOP/s , 529517.9 tokens/s INFO:__main__:2024-10-27 00:02:34 | Epoch: 1 | Step: 27350 | Dataset: 0-2152901 | Loss: 2.129 | 675 ms/step , 58251.01 GFLOP/s , 531280.3 tokens/s INFO:__main__:2024-10-27 00:02:42 | Epoch: 1 | Step: 27360 | Dataset: 0-2160901 | Loss: 2.075 | 677 ms/step , 58065.95 GFLOP/s , 531220.9 tokens/s INFO:__main__:2024-10-27 00:02:50 | Epoch: 1 | Step: 27370 | Dataset: 0-2168901 | Loss: 2.047 | 676 ms/step , 58112.21 GFLOP/s , 531498.3 tokens/s INFO:__main__:2024-10-27 
00:02:57 | Epoch: 1 | Step: 27380 | Dataset: 0-2176901 | Loss: 2.239 | 675 ms/step , 58248.09 GFLOP/s , 532058.4 tokens/s INFO:__main__:2024-10-27 00:03:05 | Epoch: 1 | Step: 27390 | Dataset: 0-2184901 | Loss: 2.140 | 675 ms/step , 58252.32 GFLOP/s , 532212.2 tokens/s INFO:__main__:2024-10-27 00:03:13 | Epoch: 1 | Step: 27400 | Dataset: 0-2192901 | Loss: 2.157 | 676 ms/step , 58185.56 GFLOP/s , 531961.1 tokens/s INFO:__main__:2024-10-27 00:03:20 | Epoch: 1 | Step: 27410 | Dataset: 0-2200901 | Loss: 2.127 | 676 ms/step , 58168.78 GFLOP/s , 531926.4 tokens/s INFO:__main__:2024-10-27 00:03:28 | Epoch: 1 | Step: 27420 | Dataset: 0-2208901 | Loss: 2.146 | 675 ms/step , 58202.19 GFLOP/s , 531922.4 tokens/s INFO:__main__:2024-10-27 00:03:36 | Epoch: 1 | Step: 27430 | Dataset: 0-2216901 | Loss: 2.218 | 677 ms/step , 58102.49 GFLOP/s , 530399.5 tokens/s INFO:__main__:2024-10-27 00:03:43 | Epoch: 1 | Step: 27440 | Dataset: 0-2224901 | Loss: 2.193 | 676 ms/step , 58185.18 GFLOP/s , 531588.5 tokens/s INFO:__main__:2024-10-27 00:03:51 | Epoch: 1 | Step: 27450 | Dataset: 0-2232901 | Loss: 1.935 | 676 ms/step , 58192.35 GFLOP/s , 531033.1 tokens/s INFO:__main__:2024-10-27 00:03:59 | Epoch: 1 | Step: 27460 | Dataset: 0-2240901 | Loss: 1.844 | 675 ms/step , 58205.57 GFLOP/s , 531589.8 tokens/s INFO:__main__:2024-10-27 00:04:07 | Epoch: 1 | Step: 27470 | Dataset: 0-2248901 | Loss: 1.818 | 676 ms/step , 58150.47 GFLOP/s , 530321.8 tokens/s INFO:__main__:2024-10-27 00:04:14 | Epoch: 1 | Step: 27480 | Dataset: 0-2256901 | Loss: 1.817 | 676 ms/step , 58147.25 GFLOP/s , 530515.4 tokens/s INFO:__main__:2024-10-27 00:04:22 | Epoch: 1 | Step: 27490 | Dataset: 0-2264901 | Loss: 1.790 | 676 ms/step , 58160.95 GFLOP/s , 530837.5 tokens/s INFO:__main__:2024-10-27 00:04:30 | Epoch: 1 | Step: 27500 | Dataset: 0-2272901 | Loss: 1.809 | 677 ms/step , 58048.74 GFLOP/s , 531097.0 tokens/s INFO:__main__:2024-10-27 00:04:37 | Epoch: 1 | Step: 27510 | Dataset: 0-2280901 | Loss: 1.792 | 675 ms/step , 
58248.50 GFLOP/s , 531345.8 tokens/s INFO:__main__:2024-10-27 00:04:45 | Epoch: 1 | Step: 27520 | Dataset: 0-2288901 | Loss: 1.793 | 677 ms/step , 58084.66 GFLOP/s , 530873.7 tokens/s INFO:__main__:2024-10-27 00:04:53 | Epoch: 1 | Step: 27530 | Dataset: 0-2296901 | Loss: 1.774 | 678 ms/step , 58007.82 GFLOP/s , 530221.7 tokens/s INFO:__main__:2024-10-27 00:05:01 | Epoch: 1 | Step: 27540 | Dataset: 0-2304901 | Loss: 2.360 | 677 ms/step , 58091.33 GFLOP/s , 529937.2 tokens/s INFO:__main__:2024-10-27 00:05:08 | Epoch: 1 | Step: 27550 | Dataset: 0-2312901 | Loss: 2.220 | 676 ms/step , 58162.63 GFLOP/s , 531328.1 tokens/s INFO:__main__:2024-10-27 00:05:16 | Epoch: 1 | Step: 27560 | Dataset: 0-2320901 | Loss: 2.268 | 677 ms/step , 58091.89 GFLOP/s , 531151.3 tokens/s INFO:__main__:2024-10-27 00:05:24 | Epoch: 1 | Step: 27570 | Dataset: 0-2328901 | Loss: 2.219 | 676 ms/step , 58116.73 GFLOP/s , 531177.4 tokens/s INFO:__main__:2024-10-27 00:05:31 | Epoch: 1 | Step: 27580 | Dataset: 0-2336901 | Loss: 2.165 | 676 ms/step , 58141.05 GFLOP/s , 531018.8 tokens/s INFO:__main__:2024-10-27 00:05:39 | Epoch: 1 | Step: 27590 | Dataset: 0-2344901 | Loss: 2.172 | 677 ms/step , 58073.28 GFLOP/s , 531575.3 tokens/s INFO:__main__:2024-10-27 00:05:47 | Epoch: 1 | Step: 27600 | Dataset: 0-2352901 | Loss: 2.240 | 677 ms/step , 58039.33 GFLOP/s , 529803.4 tokens/s INFO:__main__:2024-10-27 00:05:55 | Epoch: 1 | Step: 27610 | Dataset: 0-2360901 | Loss: 2.263 | 675 ms/step , 58208.53 GFLOP/s , 530529.6 tokens/s INFO:__main__:2024-10-27 00:06:02 | Epoch: 1 | Step: 27620 | Dataset: 0-2368901 | Loss: 2.149 | 676 ms/step , 58153.35 GFLOP/s , 531778.9 tokens/s INFO:__main__:2024-10-27 00:06:10 | Epoch: 1 | Step: 27630 | Dataset: 0-2376901 | Loss: 2.166 | 676 ms/step , 58126.35 GFLOP/s , 536397.0 tokens/s INFO:__main__:2024-10-27 00:06:18 | Epoch: 1 | Step: 27640 | Dataset: 0-2384901 | Loss: 2.259 | 675 ms/step , 58232.11 GFLOP/s , 531505.1 tokens/s INFO:__main__:2024-10-27 00:06:25 | Epoch: 1 | 
Step: 27650 | Dataset: 0-2392901 | Loss: 2.195 | 676 ms/step , 58141.83 GFLOP/s , 531139.8 tokens/s INFO:__main__:2024-10-27 00:06:33 | Epoch: 1 | Step: 27660 | Dataset: 0-2400901 | Loss: 2.088 | 675 ms/step , 58227.00 GFLOP/s , 531764.9 tokens/s INFO:__main__:2024-10-27 00:06:41 | Epoch: 1 | Step: 27670 | Dataset: 0-2408901 | Loss: 2.117 | 677 ms/step , 58070.72 GFLOP/s , 531324.3 tokens/s INFO:__main__:2024-10-27 00:06:49 | Epoch: 1 | Step: 27680 | Dataset: 0-2416901 | Loss: 2.238 | 676 ms/step , 58185.51 GFLOP/s , 531692.6 tokens/s INFO:__main__:2024-10-27 00:06:56 | Epoch: 1 | Step: 27690 | Dataset: 0-2424901 | Loss: 2.121 | 677 ms/step , 58103.06 GFLOP/s , 531283.4 tokens/s INFO:__main__:2024-10-27 00:07:04 | Epoch: 1 | Step: 27700 | Dataset: 0-2432901 | Loss: 2.252 | 676 ms/step , 58129.01 GFLOP/s , 532186.5 tokens/s INFO:__main__:2024-10-27 00:07:12 | Epoch: 1 | Step: 27710 | Dataset: 0-2440901 | Loss: 2.280 | 675 ms/step , 58233.99 GFLOP/s , 531741.7 tokens/s INFO:__main__:2024-10-27 00:07:19 | Epoch: 1 | Step: 27720 | Dataset: 0-2448901 | Loss: 2.259 | 676 ms/step , 58171.59 GFLOP/s , 532414.5 tokens/s INFO:__main__:2024-10-27 00:07:27 | Epoch: 1 | Step: 27730 | Dataset: 0-2456901 | Loss: 2.300 | 675 ms/step , 58193.36 GFLOP/s , 531514.0 tokens/s INFO:__main__:2024-10-27 00:07:35 | Epoch: 1 | Step: 27740 | Dataset: 0-2464901 | Loss: 2.216 | 677 ms/step , 58095.15 GFLOP/s , 531728.2 tokens/s INFO:__main__:2024-10-27 00:07:42 | Epoch: 1 | Step: 27750 | Dataset: 0-2472901 | Loss: 2.244 | 676 ms/step , 58176.83 GFLOP/s , 531939.1 tokens/s INFO:__main__:2024-10-27 00:07:50 | Epoch: 1 | Step: 27760 | Dataset: 0-2480901 | Loss: 2.189 | 675 ms/step , 58199.95 GFLOP/s , 531690.2 tokens/s INFO:__main__:2024-10-27 00:07:58 | Epoch: 1 | Step: 27770 | Dataset: 0-2488901 | Loss: 2.166 | 676 ms/step , 58182.66 GFLOP/s , 531918.5 tokens/s INFO:__main__:2024-10-27 00:08:06 | Epoch: 1 | Step: 27780 | Dataset: 0-2496901 | Loss: 2.169 | 676 ms/step , 58124.44 GFLOP/s , 
531246.8 tokens/s INFO:__main__:2024-10-27 00:08:13 | Epoch: 1 | Step: 27790 | Dataset: 0-2504901 | Loss: 2.260 | 675 ms/step , 58213.91 GFLOP/s , 531846.8 tokens/s INFO:__main__:2024-10-27 00:08:21 | Epoch: 1 | Step: 27800 | Dataset: 0-2512901 | Loss: 2.222 | 676 ms/step , 58116.21 GFLOP/s , 531452.4 tokens/s INFO:__main__:2024-10-27 00:08:29 | Epoch: 1 | Step: 27810 | Dataset: 0-2520901 | Loss: 2.285 | 676 ms/step , 58129.66 GFLOP/s , 531554.7 tokens/s INFO:__main__:2024-10-27 00:08:36 | Epoch: 1 | Step: 27820 | Dataset: 0-2528901 | Loss: 2.191 | 675 ms/step , 58221.63 GFLOP/s , 532033.6 tokens/s INFO:__main__:2024-10-27 00:08:44 | Epoch: 1 | Step: 27830 | Dataset: 0-2536901 | Loss: 2.196 | 676 ms/step , 58181.50 GFLOP/s , 531871.7 tokens/s INFO:__main__:2024-10-27 00:08:52 | Epoch: 1 | Step: 27840 | Dataset: 0-2544901 | Loss: 2.199 | 677 ms/step , 58105.50 GFLOP/s , 531486.1 tokens/s INFO:__main__:2024-10-27 00:08:59 | Epoch: 1 | Step: 27850 | Dataset: 0-2552901 | Loss: 2.219 | 676 ms/step , 58190.30 GFLOP/s , 531511.4 tokens/s INFO:__main__:2024-10-27 00:09:07 | Epoch: 1 | Step: 27860 | Dataset: 0-2560901 | Loss: 2.243 | 676 ms/step , 58171.92 GFLOP/s , 531771.7 tokens/s INFO:__main__:2024-10-27 00:09:15 | Epoch: 1 | Step: 27870 | Dataset: 0-2568901 | Loss: 2.214 | 675 ms/step , 58207.05 GFLOP/s , 531408.5 tokens/s INFO:__main__:2024-10-27 00:09:23 | Epoch: 1 | Step: 27880 | Dataset: 0-2576901 | Loss: 2.204 | 676 ms/step , 58162.47 GFLOP/s , 531360.6 tokens/s INFO:__main__:2024-10-27 00:09:30 | Epoch: 1 | Step: 27890 | Dataset: 0-2584901 | Loss: 2.273 | 676 ms/step , 58134.63 GFLOP/s , 531756.0 tokens/s INFO:__main__:2024-10-27 00:09:38 | Epoch: 1 | Step: 27900 | Dataset: 0-2592901 | Loss: 2.194 | 676 ms/step , 58157.68 GFLOP/s , 531690.4 tokens/s INFO:__main__:2024-10-27 00:09:46 | Epoch: 1 | Step: 27910 | Dataset: 0-2600901 | Loss: 2.200 | 677 ms/step , 58095.33 GFLOP/s , 531635.3 tokens/s INFO:__main__:2024-10-27 00:09:53 | Epoch: 1 | Step: 27920 | Dataset: 
0-2608901 | Loss: 2.221 | 675 ms/step , 58235.59 GFLOP/s , 532361.9 tokens/s INFO:__main__:2024-10-27 00:10:01 | Epoch: 1 | Step: 27930 | Dataset: 0-2616901 | Loss: 2.164 | 675 ms/step , 58252.80 GFLOP/s , 531782.9 tokens/s INFO:__main__:2024-10-27 00:10:09 | Epoch: 1 | Step: 27940 | Dataset: 0-2624901 | Loss: 2.220 | 676 ms/step , 58137.55 GFLOP/s , 531982.2 tokens/s INFO:__main__:2024-10-27 00:10:16 | Epoch: 1 | Step: 27950 | Dataset: 0-2632901 | Loss: 2.216 | 677 ms/step , 58102.61 GFLOP/s , 531867.9 tokens/s INFO:__main__:2024-10-27 00:10:24 | Epoch: 1 | Step: 27960 | Dataset: 0-2640901 | Loss: 2.231 | 674 ms/step , 58284.40 GFLOP/s , 532116.3 tokens/s INFO:__main__:2024-10-27 00:10:32 | Epoch: 1 | Step: 27970 | Dataset: 0-2648901 | Loss: 2.148 | 676 ms/step , 58145.06 GFLOP/s , 532159.7 tokens/s INFO:__main__:2024-10-27 00:10:40 | Epoch: 1 | Step: 27980 | Dataset: 0-2656901 | Loss: 2.230 | 676 ms/step , 58157.38 GFLOP/s , 531193.6 tokens/s INFO:__main__:2024-10-27 00:10:47 | Epoch: 1 | Step: 27990 | Dataset: 0-2664901 | Loss: 2.202 | 676 ms/step , 58181.06 GFLOP/s , 531978.5 tokens/s INFO:__main__:2024-10-27 00:10:55 | Validation | Step: 28000 | Val_loss: 2.192 | Best_val_loss: 2.2378 INFO:__main__:2024-10-27 00:10:55 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_001055_step_28000.pt` INFO:__main__:2024-10-27 00:10:56 | Epoch: 1 | Step: 28000 | Dataset: 0-2672901 | Loss: 2.140 | 674 ms/step , 58306.83 GFLOP/s , 477559.8 tokens/s INFO:__main__:2024-10-27 00:11:04 | Epoch: 1 | Step: 28010 | Dataset: 0-2680901 | Loss: 2.208 | 674 ms/step , 58322.77 GFLOP/s , 532371.6 tokens/s INFO:__main__:2024-10-27 00:11:11 | Epoch: 1 | Step: 28020 | Dataset: 0-2688901 | Loss: 2.178 | 675 ms/step , 58262.26 GFLOP/s , 532417.3 tokens/s INFO:__main__:2024-10-27 00:11:19 | Epoch: 1 | Step: 28030 | Dataset: 0-2696901 | Loss: 2.375 | 676 ms/step , 58187.33 GFLOP/s , 532629.2 tokens/s INFO:__main__:2024-10-27 00:11:27 | Epoch: 1 | Step: 28040 | Dataset: 
0-2704901 | Loss: 2.299 | 676 ms/step , 58179.78 GFLOP/s , 531902.4 tokens/s INFO:__main__:2024-10-27 00:11:34 | Epoch: 1 | Step: 28050 | Dataset: 0-2712901 | Loss: 2.245 | 675 ms/step , 58225.83 GFLOP/s , 532486.1 tokens/s INFO:__main__:2024-10-27 00:11:42 | Epoch: 1 | Step: 28060 | Dataset: 0-2720901 | Loss: 2.188 | 676 ms/step , 58151.91 GFLOP/s , 532202.6 tokens/s INFO:__main__:2024-10-27 00:11:50 | Epoch: 1 | Step: 28070 | Dataset: 0-2728901 | Loss: 2.149 | 677 ms/step , 58101.41 GFLOP/s , 531701.1 tokens/s INFO:__main__:2024-10-27 00:11:57 | Epoch: 1 | Step: 28080 | Dataset: 0-2736901 | Loss: 2.143 | 675 ms/step , 58213.62 GFLOP/s , 532471.1 tokens/s INFO:__main__:2024-10-27 00:12:05 | Epoch: 1 | Step: 28090 | Dataset: 0-2744901 | Loss: 2.126 | 676 ms/step , 58130.53 GFLOP/s , 532218.0 tokens/s INFO:__main__:2024-10-27 00:12:13 | Epoch: 1 | Step: 28100 | Dataset: 0-2752901 | Loss: 2.089 | 676 ms/step , 58141.70 GFLOP/s , 532042.6 tokens/s INFO:__main__:2024-10-27 00:12:21 | Epoch: 1 | Step: 28110 | Dataset: 0-2760901 | Loss: 2.091 | 675 ms/step , 58198.40 GFLOP/s , 531834.1 tokens/s INFO:__main__:2024-10-27 00:12:28 | Epoch: 1 | Step: 28120 | Dataset: 0-2768901 | Loss: 2.067 | 675 ms/step , 58210.57 GFLOP/s , 532195.5 tokens/s INFO:__main__:2024-10-27 00:12:36 | Epoch: 1 | Step: 28130 | Dataset: 0-2776901 | Loss: 2.035 | 676 ms/step , 58151.70 GFLOP/s , 531946.8 tokens/s INFO:__main__:2024-10-27 00:12:44 | Epoch: 1 | Step: 28140 | Dataset: 0-2784901 | Loss: 2.056 | 675 ms/step , 58207.47 GFLOP/s , 531292.2 tokens/s INFO:__main__:2024-10-27 00:12:51 | Epoch: 1 | Step: 28150 | Dataset: 0-2792901 | Loss: 2.034 | 675 ms/step , 58206.40 GFLOP/s , 531810.3 tokens/s INFO:__main__:2024-10-27 00:12:59 | Epoch: 1 | Step: 28160 | Dataset: 0-2800901 | Loss: 2.001 | 675 ms/step , 58225.76 GFLOP/s , 532018.8 tokens/s INFO:__main__:2024-10-27 00:13:07 | Epoch: 1 | Step: 28170 | Dataset: 0-2808901 | Loss: 2.005 | 676 ms/step , 58147.59 GFLOP/s , 531727.5 tokens/s 
INFO:__main__:2024-10-27 00:13:14 | Epoch: 1 | Step: 28180 | Dataset: 0-2816901 | Loss: 2.052 | 676 ms/step , 58155.77 GFLOP/s , 531841.3 tokens/s INFO:__main__:2024-10-27 00:13:22 | Epoch: 1 | Step: 28190 | Dataset: 0-2824901 | Loss: 2.373 | 675 ms/step , 58211.16 GFLOP/s , 532240.9 tokens/s INFO:__main__:2024-10-27 00:13:30 | Epoch: 1 | Step: 28200 | Dataset: 0-2832901 | Loss: 2.311 | 675 ms/step , 58278.11 GFLOP/s , 531989.0 tokens/s INFO:__main__:2024-10-27 00:13:38 | Epoch: 1 | Step: 28210 | Dataset: 0-2840901 | Loss: 2.359 | 675 ms/step , 58265.48 GFLOP/s , 532587.6 tokens/s INFO:__main__:2024-10-27 00:13:45 | Epoch: 1 | Step: 28220 | Dataset: 0-2848901 | Loss: 2.266 | 676 ms/step , 58157.87 GFLOP/s , 531477.3 tokens/s INFO:__main__:2024-10-27 00:13:53 | Epoch: 1 | Step: 28230 | Dataset: 0-2856901 | Loss: 2.256 | 675 ms/step , 58211.67 GFLOP/s , 531814.2 tokens/s INFO:__main__:2024-10-27 00:14:01 | Epoch: 1 | Step: 28240 | Dataset: 0-2864901 | Loss: 2.202 | 676 ms/step , 58154.01 GFLOP/s , 531495.8 tokens/s INFO:__main__:2024-10-27 00:14:08 | Epoch: 1 | Step: 28250 | Dataset: 0-2872901 | Loss: 2.225 | 676 ms/step , 58146.23 GFLOP/s , 531934.0 tokens/s INFO:__main__:2024-10-27 00:14:16 | Epoch: 1 | Step: 28260 | Dataset: 0-2880901 | Loss: 2.186 | 676 ms/step , 58162.29 GFLOP/s , 531409.8 tokens/s INFO:__main__:2024-10-27 00:14:24 | Epoch: 1 | Step: 28270 | Dataset: 0-2888901 | Loss: 2.250 | 675 ms/step , 58209.19 GFLOP/s , 531323.3 tokens/s INFO:__main__:2024-10-27 00:14:31 | Epoch: 1 | Step: 28280 | Dataset: 0-2896901 | Loss: 2.231 | 675 ms/step , 58224.18 GFLOP/s , 532033.8 tokens/s INFO:__main__:2024-10-27 00:14:39 | Epoch: 1 | Step: 28290 | Dataset: 0-2904901 | Loss: 2.255 | 676 ms/step , 58142.38 GFLOP/s , 532456.8 tokens/s INFO:__main__:2024-10-27 00:14:47 | Epoch: 1 | Step: 28300 | Dataset: 0-2912901 | Loss: 2.181 | 675 ms/step , 58245.93 GFLOP/s , 532573.9 tokens/s INFO:__main__:2024-10-27 00:14:55 | Epoch: 1 | Step: 28310 | Dataset: 0-2920901 | Loss: 
2.227 | 676 ms/step , 58157.66 GFLOP/s , 531887.1 tokens/s INFO:__main__:2024-10-27 00:15:02 | Epoch: 1 | Step: 28320 | Dataset: 0-2928901 | Loss: 2.214 | 675 ms/step , 58206.13 GFLOP/s , 531934.3 tokens/s INFO:__main__:2024-10-27 00:15:10 | Epoch: 1 | Step: 28330 | Dataset: 0-2936901 | Loss: 2.176 | 675 ms/step , 58265.39 GFLOP/s , 532721.0 tokens/s INFO:__main__:2024-10-27 00:15:18 | Epoch: 1 | Step: 28340 | Dataset: 0-2944901 | Loss: 2.260 | 675 ms/step , 58200.55 GFLOP/s , 532366.2 tokens/s INFO:__main__:2024-10-27 00:15:25 | Epoch: 1 | Step: 28350 | Dataset: 0-2952901 | Loss: 2.286 | 676 ms/step , 58171.28 GFLOP/s , 532248.2 tokens/s INFO:__main__:2024-10-27 00:15:33 | Epoch: 1 | Step: 28360 | Dataset: 0-2960901 | Loss: 2.297 | 675 ms/step , 58254.53 GFLOP/s , 532254.7 tokens/s INFO:__main__:2024-10-27 00:15:41 | Epoch: 1 | Step: 28370 | Dataset: 0-2968901 | Loss: 2.302 | 675 ms/step , 58206.25 GFLOP/s , 531464.4 tokens/s INFO:__main__:2024-10-27 00:15:48 | Epoch: 1 | Step: 28380 | Dataset: 0-2976901 | Loss: 2.259 | 675 ms/step , 58263.09 GFLOP/s , 532203.8 tokens/s INFO:__main__:2024-10-27 00:15:56 | Epoch: 1 | Step: 28390 | Dataset: 0-2984901 | Loss: 2.283 | 676 ms/step , 58191.02 GFLOP/s , 532055.4 tokens/s INFO:__main__:2024-10-27 00:16:04 | Epoch: 1 | Step: 28400 | Dataset: 0-2992901 | Loss: 2.227 | 675 ms/step , 58198.17 GFLOP/s , 531288.0 tokens/s INFO:__main__:2024-10-27 00:16:12 | Epoch: 1 | Step: 28410 | Dataset: 0-3000901 | Loss: 2.267 | 675 ms/step , 58250.70 GFLOP/s , 532486.6 tokens/s INFO:__main__:2024-10-27 00:16:19 | Epoch: 1 | Step: 28420 | Dataset: 0-3008901 | Loss: 2.186 | 674 ms/step , 58302.43 GFLOP/s , 532666.1 tokens/s INFO:__main__:2024-10-27 00:16:27 | Epoch: 1 | Step: 28430 | Dataset: 0-3016901 | Loss: 2.306 | 675 ms/step , 58277.99 GFLOP/s , 532948.9 tokens/s INFO:__main__:2024-10-27 00:16:35 | Epoch: 1 | Step: 28440 | Dataset: 0-3024901 | Loss: 2.230 | 675 ms/step , 58246.84 GFLOP/s , 532776.5 tokens/s INFO:__main__:2024-10-27 
00:16:42 | Epoch: 1 | Step: 28450 | Dataset: 0-3032901 | Loss: 2.225 | 675 ms/step , 58267.56 GFLOP/s , 532640.0 tokens/s INFO:__main__:2024-10-27 00:16:50 | Epoch: 1 | Step: 28460 | Dataset: 0-3040901 | Loss: 2.285 | 677 ms/step , 58099.07 GFLOP/s , 532566.1 tokens/s INFO:__main__:2024-10-27 00:16:58 | Epoch: 1 | Step: 28470 | Dataset: 0-3048901 | Loss: 2.249 | 676 ms/step , 58172.11 GFLOP/s , 532401.6 tokens/s INFO:__main__:2024-10-27 00:17:05 | Epoch: 1 | Step: 28480 | Dataset: 0-3056901 | Loss: 2.282 | 675 ms/step , 58196.38 GFLOP/s , 532219.8 tokens/s INFO:__main__:2024-10-27 00:17:13 | Epoch: 1 | Step: 28490 | Dataset: 0-3064901 | Loss: 2.212 | 676 ms/step , 58185.46 GFLOP/s , 532287.3 tokens/s INFO:__main__:2024-10-27 00:17:21 | Epoch: 1 | Step: 28500 | Dataset: 0-3072901 | Loss: 2.274 | 675 ms/step , 58203.79 GFLOP/s , 532165.0 tokens/s INFO:__main__:2024-10-27 00:17:28 | Epoch: 1 | Step: 28510 | Dataset: 0-3080901 | Loss: 1.970 | 675 ms/step , 58251.36 GFLOP/s , 532208.9 tokens/s INFO:__main__:2024-10-27 00:17:36 | Epoch: 1 | Step: 28520 | Dataset: 0-3088901 | Loss: 1.845 | 675 ms/step , 58226.03 GFLOP/s , 532565.1 tokens/s INFO:__main__:2024-10-27 00:17:44 | Epoch: 1 | Step: 28530 | Dataset: 0-3096901 | Loss: 1.758 | 675 ms/step , 58252.01 GFLOP/s , 532774.3 tokens/s INFO:__main__:2024-10-27 00:17:52 | Epoch: 1 | Step: 28540 | Dataset: 0-3104901 | Loss: 1.796 | 675 ms/step , 58249.13 GFLOP/s , 532965.4 tokens/s INFO:__main__:2024-10-27 00:17:59 | Epoch: 1 | Step: 28550 | Dataset: 0-3112901 | Loss: 1.751 | 675 ms/step , 58202.15 GFLOP/s , 532292.7 tokens/s INFO:__main__:2024-10-27 00:18:07 | Epoch: 1 | Step: 28560 | Dataset: 0-3120901 | Loss: 1.697 | 675 ms/step , 58204.39 GFLOP/s , 532763.8 tokens/s INFO:__main__:2024-10-27 00:18:15 | Epoch: 1 | Step: 28570 | Dataset: 0-3128901 | Loss: 2.332 | 676 ms/step , 58174.32 GFLOP/s , 532370.3 tokens/s INFO:__main__:2024-10-27 00:18:22 | Epoch: 1 | Step: 28580 | Dataset: 0-3136901 | Loss: 2.317 | 676 ms/step , 
58159.55 GFLOP/s , 532271.0 tokens/s INFO:__main__:2024-10-27 00:18:30 | Epoch: 1 | Step: 28590 | Dataset: 0-3144901 | Loss: 2.268 | 674 ms/step , 58298.59 GFLOP/s , 533174.0 tokens/s INFO:__main__:2024-10-27 00:18:38 | Epoch: 1 | Step: 28600 | Dataset: 0-3152901 | Loss: 2.275 | 676 ms/step , 58175.83 GFLOP/s , 533026.0 tokens/s INFO:__main__:2024-10-27 00:18:45 | Epoch: 1 | Step: 28610 | Dataset: 0-3160901 | Loss: 2.186 | 674 ms/step , 58304.23 GFLOP/s , 532247.0 tokens/s INFO:__main__:2024-10-27 00:18:53 | Epoch: 1 | Step: 28620 | Dataset: 0-3168901 | Loss: 2.295 | 674 ms/step , 58295.89 GFLOP/s , 532507.7 tokens/s INFO:__main__:2024-10-27 00:19:01 | Epoch: 1 | Step: 28630 | Dataset: 0-3176901 | Loss: 2.258 | 676 ms/step , 58133.81 GFLOP/s , 532458.3 tokens/s INFO:__main__:2024-10-27 00:19:08 | Epoch: 1 | Step: 28640 | Dataset: 0-3184901 | Loss: 2.182 | 675 ms/step , 58198.07 GFLOP/s , 532270.6 tokens/s INFO:__main__:2024-10-27 00:19:16 | Epoch: 1 | Step: 28650 | Dataset: 0-3192901 | Loss: 2.248 | 676 ms/step , 58167.53 GFLOP/s , 532617.3 tokens/s INFO:__main__:2024-10-27 00:19:24 | Epoch: 1 | Step: 28660 | Dataset: 0-3200901 | Loss: 2.190 | 677 ms/step , 58096.98 GFLOP/s , 532168.3 tokens/s INFO:__main__:2024-10-27 00:19:32 | Epoch: 1 | Step: 28670 | Dataset: 0-3208901 | Loss: 2.183 | 676 ms/step , 58163.51 GFLOP/s , 532139.8 tokens/s INFO:__main__:2024-10-27 00:19:39 | Epoch: 1 | Step: 28680 | Dataset: 0-3216901 | Loss: 2.221 | 677 ms/step , 58054.25 GFLOP/s , 532334.1 tokens/s INFO:__main__:2024-10-27 00:19:47 | Epoch: 1 | Step: 28690 | Dataset: 0-3224901 | Loss: 2.259 | 676 ms/step , 58126.97 GFLOP/s , 531890.1 tokens/s INFO:__main__:2024-10-27 00:19:55 | Epoch: 1 | Step: 28700 | Dataset: 0-3232901 | Loss: 2.190 | 676 ms/step , 58180.28 GFLOP/s , 532623.0 tokens/s INFO:__main__:2024-10-27 00:20:02 | Epoch: 1 | Step: 28710 | Dataset: 0-3240901 | Loss: 2.260 | 676 ms/step , 58132.66 GFLOP/s , 531788.9 tokens/s INFO:__main__:2024-10-27 00:20:10 | Epoch: 1 | 
Step: 28720 | Dataset: 0-3248901 | Loss: 2.247 | 675 ms/step , 58249.49 GFLOP/s , 532296.9 tokens/s INFO:__main__:2024-10-27 00:20:18 | Epoch: 1 | Step: 28730 | Dataset: 0-3256901 | Loss: 2.308 | 676 ms/step , 58187.72 GFLOP/s , 532262.4 tokens/s INFO:__main__:2024-10-27 00:20:25 | Epoch: 1 | Step: 28740 | Dataset: 0-3264901 | Loss: 2.273 | 676 ms/step , 58151.45 GFLOP/s , 532275.0 tokens/s INFO:__main__:2024-10-27 00:20:33 | Epoch: 1 | Step: 28750 | Dataset: 0-3272901 | Loss: 2.253 | 676 ms/step , 58179.57 GFLOP/s , 532194.3 tokens/s INFO:__main__:2024-10-27 00:20:41 | Epoch: 1 | Step: 28760 | Dataset: 0-3280901 | Loss: 2.260 | 676 ms/step , 58118.02 GFLOP/s , 532406.4 tokens/s INFO:__main__:2024-10-27 00:20:48 | Epoch: 1 | Step: 28770 | Dataset: 0-3288901 | Loss: 2.243 | 674 ms/step , 58315.16 GFLOP/s , 532438.9 tokens/s INFO:__main__:2024-10-27 00:20:56 | Epoch: 1 | Step: 28780 | Dataset: 0-3296901 | Loss: 2.242 | 678 ms/step , 57980.13 GFLOP/s , 532275.3 tokens/s INFO:__main__:2024-10-27 00:21:04 | Epoch: 1 | Step: 28790 | Dataset: 0-3304901 | Loss: 2.257 | 676 ms/step , 58189.48 GFLOP/s , 532465.2 tokens/s INFO:__main__:2024-10-27 00:21:12 | Epoch: 1 | Step: 28800 | Dataset: 0-3312901 | Loss: 2.236 | 675 ms/step , 58239.36 GFLOP/s , 532359.3 tokens/s INFO:__main__:2024-10-27 00:21:19 | Epoch: 1 | Step: 28810 | Dataset: 0-3320901 | Loss: 2.234 | 676 ms/step , 58149.16 GFLOP/s , 532822.5 tokens/s INFO:__main__:2024-10-27 00:21:27 | Epoch: 1 | Step: 28820 | Dataset: 0-3328901 | Loss: 2.253 | 676 ms/step , 58151.11 GFLOP/s , 532186.9 tokens/s INFO:__main__:2024-10-27 00:21:35 | Epoch: 1 | Step: 28830 | Dataset: 0-3336901 | Loss: 2.153 | 676 ms/step , 58161.00 GFLOP/s , 532493.7 tokens/s INFO:__main__:2024-10-27 00:21:42 | Epoch: 1 | Step: 28840 | Dataset: 0-3344901 | Loss: 2.234 | 675 ms/step , 58226.49 GFLOP/s , 532178.9 tokens/s INFO:__main__:2024-10-27 00:21:50 | Epoch: 1 | Step: 28850 | Dataset: 0-3352901 | Loss: 2.273 | 675 ms/step , 58204.26 GFLOP/s , 
532747.3 tokens/s INFO:__main__:2024-10-27 00:21:58 | Epoch: 1 | Step: 28860 | Dataset: 0-3360901 | Loss: 2.167 | 675 ms/step , 58203.71 GFLOP/s , 533039.3 tokens/s INFO:__main__:2024-10-27 00:22:05 | Epoch: 1 | Step: 28870 | Dataset: 0-3368901 | Loss: 2.156 | 676 ms/step , 58185.99 GFLOP/s , 532190.7 tokens/s INFO:__main__:2024-10-27 00:22:13 | Epoch: 1 | Step: 28880 | Dataset: 0-3376901 | Loss: 2.249 | 676 ms/step , 58167.06 GFLOP/s , 531978.6 tokens/s INFO:__main__:2024-10-27 00:22:21 | Epoch: 1 | Step: 28890 | Dataset: 0-3384901 | Loss: 2.299 | 677 ms/step , 58091.53 GFLOP/s , 531087.9 tokens/s INFO:__main__:2024-10-27 00:22:29 | Epoch: 1 | Step: 28900 | Dataset: 0-3392901 | Loss: 2.243 | 677 ms/step , 58103.59 GFLOP/s , 531589.1 tokens/s INFO:__main__:2024-10-27 00:22:36 | Epoch: 1 | Step: 28910 | Dataset: 0-3400901 | Loss: 2.225 | 676 ms/step , 58149.09 GFLOP/s , 531493.9 tokens/s INFO:__main__:2024-10-27 00:22:44 | Epoch: 1 | Step: 28920 | Dataset: 0-3408901 | Loss: 2.191 | 676 ms/step , 58121.93 GFLOP/s , 531578.2 tokens/s INFO:__main__:2024-10-27 00:22:52 | Epoch: 1 | Step: 28930 | Dataset: 0-3416901 | Loss: 2.226 | 676 ms/step , 58158.91 GFLOP/s , 531230.4 tokens/s INFO:__main__:2024-10-27 00:22:59 | Epoch: 1 | Step: 28940 | Dataset: 0-3424901 | Loss: 2.191 | 677 ms/step , 58087.96 GFLOP/s , 531351.8 tokens/s INFO:__main__:2024-10-27 00:23:07 | Epoch: 1 | Step: 28950 | Dataset: 0-3432901 | Loss: 2.247 | 676 ms/step , 58162.40 GFLOP/s , 531517.3 tokens/s INFO:__main__:2024-10-27 00:23:15 | Epoch: 1 | Step: 28960 | Dataset: 0-3440901 | Loss: 2.179 | 677 ms/step , 58095.85 GFLOP/s , 531314.0 tokens/s INFO:__main__:2024-10-27 00:23:22 | Epoch: 1 | Step: 28970 | Dataset: 0-3448901 | Loss: 2.150 | 676 ms/step , 58149.88 GFLOP/s , 530576.3 tokens/s INFO:__main__:2024-10-27 00:23:30 | Epoch: 1 | Step: 28980 | Dataset: 0-3456901 | Loss: 2.156 | 676 ms/step , 58152.91 GFLOP/s , 531413.4 tokens/s INFO:__main__:2024-10-27 00:23:38 | Epoch: 1 | Step: 28990 | Dataset: 
0-3464901 | Loss: 2.245 | 675 ms/step , 58213.99 GFLOP/s , 532480.0 tokens/s INFO:__main__:2024-10-27 00:23:45 | Validation | Step: 29000 | Val_loss: 2.215 | Best_val_loss: 2.1915 INFO:__main__:2024-10-27 00:23:45 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_002345_step_29000.pt` INFO:__main__:2024-10-27 00:23:46 | Epoch: 1 | Step: 29000 | Dataset: 0-3472901 | Loss: 2.195 | 674 ms/step , 58364.73 GFLOP/s , 479557.8 tokens/s INFO:__main__:2024-10-27 00:23:54 | Epoch: 1 | Step: 29010 | Dataset: 0-3480901 | Loss: 2.184 | 675 ms/step , 58275.08 GFLOP/s , 532550.3 tokens/s INFO:__main__:2024-10-27 00:24:02 | Epoch: 1 | Step: 29020 | Dataset: 0-3488901 | Loss: 2.196 | 676 ms/step , 58179.00 GFLOP/s , 530347.9 tokens/s INFO:__main__:2024-10-27 00:24:10 | Epoch: 1 | Step: 29030 | Dataset: 0-3496901 | Loss: 2.158 | 675 ms/step , 58197.58 GFLOP/s , 531967.7 tokens/s INFO:__main__:2024-10-27 00:24:17 | Epoch: 1 | Step: 29040 | Dataset: 0-3504901 | Loss: 2.184 | 676 ms/step , 58133.74 GFLOP/s , 531974.6 tokens/s INFO:__main__:2024-10-27 00:24:25 | Epoch: 1 | Step: 29050 | Dataset: 0-3512901 | Loss: 2.193 | 676 ms/step , 58161.10 GFLOP/s , 531859.9 tokens/s INFO:__main__:2024-10-27 00:24:33 | Epoch: 1 | Step: 29060 | Dataset: 0-3520901 | Loss: 2.273 | 676 ms/step , 58186.09 GFLOP/s , 532176.7 tokens/s INFO:__main__:2024-10-27 00:24:40 | Epoch: 1 | Step: 29070 | Dataset: 0-3528901 | Loss: 2.237 | 675 ms/step , 58254.42 GFLOP/s , 532259.9 tokens/s INFO:__main__:2024-10-27 00:24:48 | Epoch: 1 | Step: 29080 | Dataset: 0-3536901 | Loss: 2.183 | 675 ms/step , 58248.54 GFLOP/s , 532057.1 tokens/s INFO:__main__:2024-10-27 00:24:56 | Epoch: 1 | Step: 29090 | Dataset: 0-3544901 | Loss: 2.173 | 675 ms/step , 58264.83 GFLOP/s , 532570.7 tokens/s INFO:__main__:2024-10-27 00:25:03 | Epoch: 1 | Step: 29100 | Dataset: 0-3552901 | Loss: 2.233 | 674 ms/step , 58333.18 GFLOP/s , 532736.4 tokens/s INFO:__main__:2024-10-27 00:25:11 | Epoch: 1 | Step: 29110 | Dataset: 
0-3560901 | Loss: 2.184 | 675 ms/step , 58240.95 GFLOP/s , 532213.7 tokens/s INFO:__main__:2024-10-27 00:25:19 | Epoch: 1 | Step: 29120 | Dataset: 0-3568901 | Loss: 2.186 | 675 ms/step , 58241.23 GFLOP/s , 532119.1 tokens/s INFO:__main__:2024-10-27 00:25:26 | Epoch: 1 | Step: 29130 | Dataset: 0-3576901 | Loss: 2.253 | 675 ms/step , 58253.67 GFLOP/s , 532034.4 tokens/s INFO:__main__:2024-10-27 00:25:34 | Epoch: 1 | Step: 29140 | Dataset: 0-3584901 | Loss: 2.052 | 676 ms/step , 58173.28 GFLOP/s , 531780.3 tokens/s INFO:__main__:2024-10-27 00:25:42 | Epoch: 1 | Step: 29150 | Dataset: 0-3592901 | Loss: 2.214 | 675 ms/step , 58199.88 GFLOP/s , 531609.5 tokens/s INFO:__main__:2024-10-27 00:25:50 | Epoch: 1 | Step: 29160 | Dataset: 0-3600901 | Loss: 2.131 | 675 ms/step , 58216.16 GFLOP/s , 532200.2 tokens/s INFO:__main__:2024-10-27 00:25:57 | Epoch: 1 | Step: 29170 | Dataset: 0-3608901 | Loss: 2.198 | 676 ms/step , 58181.58 GFLOP/s , 531884.1 tokens/s INFO:__main__:2024-10-27 00:26:05 | Epoch: 1 | Step: 29180 | Dataset: 0-3616901 | Loss: 2.231 | 675 ms/step , 58276.72 GFLOP/s , 533074.3 tokens/s INFO:__main__:2024-10-27 00:26:13 | Epoch: 1 | Step: 29190 | Dataset: 0-3624901 | Loss: 2.160 | 677 ms/step , 58070.68 GFLOP/s , 532480.4 tokens/s INFO:__main__:2024-10-27 00:26:20 | Epoch: 1 | Step: 29200 | Dataset: 0-3632901 | Loss: 2.153 | 674 ms/step , 58324.53 GFLOP/s , 533133.8 tokens/s INFO:__main__:2024-10-27 00:26:28 | Epoch: 1 | Step: 29210 | Dataset: 0-3640901 | Loss: 2.208 | 676 ms/step , 58176.01 GFLOP/s , 532601.5 tokens/s INFO:__main__:2024-10-27 00:26:36 | Epoch: 1 | Step: 29220 | Dataset: 0-3648901 | Loss: 2.227 | 675 ms/step , 58225.46 GFLOP/s , 530424.7 tokens/s INFO:__main__:2024-10-27 00:26:43 | Epoch: 1 | Step: 29230 | Dataset: 0-3656901 | Loss: 2.251 | 677 ms/step , 58105.29 GFLOP/s , 532320.7 tokens/s INFO:__main__:2024-10-27 00:26:51 | Epoch: 1 | Step: 29240 | Dataset: 0-3664901 | Loss: 2.290 | 676 ms/step , 58181.86 GFLOP/s , 532052.6 tokens/s 
INFO:__main__:2024-10-27 00:26:59 | Epoch: 1 | Step: 29250 | Dataset: 0-3672901 | Loss: 2.266 | 676 ms/step , 58183.97 GFLOP/s , 532198.9 tokens/s INFO:__main__:2024-10-27 00:27:07 | Epoch: 1 | Step: 29260 | Dataset: 0-3680901 | Loss: 2.100 | 675 ms/step , 58233.87 GFLOP/s , 531871.5 tokens/s INFO:__main__:2024-10-27 00:27:14 | Epoch: 1 | Step: 29270 | Dataset: 0-3688901 | Loss: 2.267 | 676 ms/step , 58177.73 GFLOP/s , 532174.2 tokens/s INFO:__main__:2024-10-27 00:27:22 | Epoch: 1 | Step: 29280 | Dataset: 0-3696901 | Loss: 2.184 | 675 ms/step , 58193.27 GFLOP/s , 532013.3 tokens/s INFO:__main__:2024-10-27 00:27:30 | Epoch: 1 | Step: 29290 | Dataset: 0-3704901 | Loss: 2.273 | 675 ms/step , 58217.74 GFLOP/s , 532453.9 tokens/s INFO:__main__:2024-10-27 00:27:37 | Epoch: 1 | Step: 29300 | Dataset: 0-3712901 | Loss: 2.174 | 675 ms/step , 58249.91 GFLOP/s , 532789.3 tokens/s INFO:__main__:2024-10-27 00:27:45 | Epoch: 1 | Step: 29310 | Dataset: 0-3720901 | Loss: 2.169 | 675 ms/step , 58274.27 GFLOP/s , 532459.5 tokens/s INFO:__main__:2024-10-27 00:27:53 | Epoch: 1 | Step: 29320 | Dataset: 0-3728901 | Loss: 2.216 | 676 ms/step , 58154.50 GFLOP/s , 532139.0 tokens/s INFO:__main__:2024-10-27 00:28:00 | Epoch: 1 | Step: 29330 | Dataset: 0-3736901 | Loss: 2.134 | 675 ms/step , 58223.12 GFLOP/s , 531645.4 tokens/s INFO:__main__:2024-10-27 00:28:08 | Epoch: 1 | Step: 29340 | Dataset: 0-3744901 | Loss: 2.204 | 674 ms/step , 58321.63 GFLOP/s , 531552.7 tokens/s INFO:__main__:2024-10-27 00:28:16 | Epoch: 1 | Step: 29350 | Dataset: 0-3752901 | Loss: 2.138 | 674 ms/step , 58292.38 GFLOP/s , 531949.1 tokens/s INFO:__main__:2024-10-27 00:28:24 | Epoch: 1 | Step: 29360 | Dataset: 0-3760901 | Loss: 2.233 | 677 ms/step , 58080.26 GFLOP/s , 531426.4 tokens/s INFO:__main__:2024-10-27 00:28:31 | Epoch: 1 | Step: 29370 | Dataset: 0-3768901 | Loss: 2.201 | 677 ms/step , 58090.06 GFLOP/s , 530820.3 tokens/s INFO:__main__:2024-10-27 00:28:39 | Epoch: 1 | Step: 29380 | Dataset: 0-3776901 | Loss: 
2.053 | 677 ms/step , 58073.90 GFLOP/s , 530423.3 tokens/s INFO:__main__:2024-10-27 00:28:47 | Epoch: 1 | Step: 29390 | Dataset: 0-3784901 | Loss: 1.892 | 679 ms/step , 57884.38 GFLOP/s , 529823.9 tokens/s INFO:__main__:2024-10-27 00:28:54 | Epoch: 1 | Step: 29400 | Dataset: 0-3792901 | Loss: 1.825 | 677 ms/step , 58046.27 GFLOP/s , 530865.9 tokens/s INFO:__main__:2024-10-27 00:29:02 | Epoch: 1 | Step: 29410 | Dataset: 0-3800901 | Loss: 1.811 | 678 ms/step , 57949.14 GFLOP/s , 530578.3 tokens/s INFO:__main__:2024-10-27 00:29:10 | Epoch: 1 | Step: 29420 | Dataset: 0-3808901 | Loss: 1.781 | 677 ms/step , 58056.51 GFLOP/s , 529484.8 tokens/s INFO:__main__:2024-10-27 00:29:18 | Epoch: 1 | Step: 29430 | Dataset: 0-3816901 | Loss: 1.773 | 678 ms/step , 58007.87 GFLOP/s , 530823.8 tokens/s INFO:__main__:2024-10-27 00:29:25 | Epoch: 1 | Step: 29440 | Dataset: 0-3824901 | Loss: 1.739 | 677 ms/step , 58054.18 GFLOP/s , 529820.8 tokens/s INFO:__main__:2024-10-27 00:29:33 | Epoch: 1 | Step: 29450 | Dataset: 0-3832901 | Loss: 1.775 | 677 ms/step , 58057.57 GFLOP/s , 530422.4 tokens/s INFO:__main__:2024-10-27 00:29:41 | Epoch: 1 | Step: 29460 | Dataset: 0-3840901 | Loss: 2.447 | 674 ms/step , 58298.30 GFLOP/s , 531647.9 tokens/s INFO:__main__:2024-10-27 00:29:48 | Epoch: 1 | Step: 29470 | Dataset: 0-3848901 | Loss: 2.229 | 676 ms/step , 58126.26 GFLOP/s , 532499.5 tokens/s INFO:__main__:2024-10-27 00:29:56 | Epoch: 1 | Step: 29480 | Dataset: 0-3856901 | Loss: 2.269 | 676 ms/step , 58121.47 GFLOP/s , 531483.6 tokens/s INFO:__main__:2024-10-27 00:30:04 | Epoch: 1 | Step: 29490 | Dataset: 0-3864901 | Loss: 2.117 | 676 ms/step , 58137.35 GFLOP/s , 531358.1 tokens/s INFO:__main__:2024-10-27 00:30:12 | Epoch: 1 | Step: 29500 | Dataset: 0-3872901 | Loss: 2.143 | 677 ms/step , 58065.66 GFLOP/s , 531274.0 tokens/s INFO:__main__:2024-10-27 00:30:19 | Epoch: 1 | Step: 29510 | Dataset: 0-3880901 | Loss: 2.121 | 676 ms/step , 58130.82 GFLOP/s , 531954.6 tokens/s INFO:__main__:2024-10-27 
00:30:27 | Epoch: 1 | Step: 29520 | Dataset: 0-3888901 | Loss: 2.097 | 676 ms/step , 58140.74 GFLOP/s , 532070.5 tokens/s INFO:__main__:2024-10-27 00:30:35 | Epoch: 1 | Step: 29530 | Dataset: 0-3896901 | Loss: 2.193 | 676 ms/step , 58166.58 GFLOP/s , 532395.0 tokens/s INFO:__main__:2024-10-27 00:30:42 | Epoch: 1 | Step: 29540 | Dataset: 0-3904901 | Loss: 2.114 | 676 ms/step , 58187.88 GFLOP/s , 532680.7 tokens/s INFO:__main__:2024-10-27 00:30:50 | Epoch: 1 | Step: 29550 | Dataset: 0-3912901 | Loss: 2.231 | 676 ms/step , 58120.26 GFLOP/s , 532099.7 tokens/s INFO:__main__:2024-10-27 00:30:58 | Epoch: 1 | Step: 29560 | Dataset: 0-3920901 | Loss: 2.207 | 675 ms/step , 58215.94 GFLOP/s , 532264.7 tokens/s INFO:__main__:2024-10-27 00:31:05 | Epoch: 1 | Step: 29570 | Dataset: 0-3928901 | Loss: 2.236 | 676 ms/step , 58173.50 GFLOP/s , 531900.7 tokens/s INFO:__main__:2024-10-27 00:31:13 | Epoch: 1 | Step: 29580 | Dataset: 0-3936901 | Loss: 2.151 | 675 ms/step , 58204.37 GFLOP/s , 532569.4 tokens/s INFO:__main__:2024-10-27 00:31:21 | Epoch: 1 | Step: 29590 | Dataset: 0-3944901 | Loss: 2.147 | 674 ms/step , 58327.38 GFLOP/s , 532191.5 tokens/s INFO:__main__:2024-10-27 00:31:29 | Epoch: 1 | Step: 29600 | Dataset: 0-3952901 | Loss: 2.268 | 677 ms/step , 58090.08 GFLOP/s , 532663.0 tokens/s INFO:__main__:2024-10-27 00:31:36 | Epoch: 1 | Step: 29610 | Dataset: 0-3960901 | Loss: 2.103 | 675 ms/step , 58201.66 GFLOP/s , 532287.3 tokens/s INFO:__main__:2024-10-27 00:31:44 | Epoch: 1 | Step: 29620 | Dataset: 0-3968901 | Loss: 2.178 | 679 ms/step , 57899.68 GFLOP/s , 532310.1 tokens/s INFO:__main__:2024-10-27 00:31:52 | Epoch: 1 | Step: 29630 | Dataset: 0-3976901 | Loss: 2.248 | 676 ms/step , 58134.24 GFLOP/s , 532033.4 tokens/s INFO:__main__:2024-10-27 00:31:59 | Epoch: 1 | Step: 29640 | Dataset: 0-3984901 | Loss: 2.260 | 676 ms/step , 58189.80 GFLOP/s , 532294.6 tokens/s INFO:__main__:2024-10-27 00:32:07 | Epoch: 1 | Step: 29650 | Dataset: 0-3992901 | Loss: 2.252 | 677 ms/step , 
58056.77 GFLOP/s , 532185.9 tokens/s INFO:__main__:2024-10-27 00:32:15 | Epoch: 1 | Step: 29660 | Dataset: 0-4000901 | Loss: 2.288 | 677 ms/step , 58092.94 GFLOP/s , 531996.3 tokens/s INFO:__main__:2024-10-27 00:32:22 | Epoch: 1 | Step: 29670 | Dataset: 0-4008901 | Loss: 2.229 | 676 ms/step , 58190.60 GFLOP/s , 532094.2 tokens/s INFO:__main__:2024-10-27 00:32:30 | Epoch: 1 | Step: 29680 | Dataset: 0-4016901 | Loss: 2.219 | 675 ms/step , 58224.67 GFLOP/s , 532154.1 tokens/s INFO:__main__:2024-10-27 00:32:38 | Epoch: 1 | Step: 29690 | Dataset: 0-4024901 | Loss: 2.282 | 676 ms/step , 58178.89 GFLOP/s , 532260.7 tokens/s INFO:__main__:2024-10-27 00:32:45 | Epoch: 1 | Step: 29700 | Dataset: 0-4032901 | Loss: 2.203 | 676 ms/step , 58124.60 GFLOP/s , 532371.3 tokens/s INFO:__main__:2024-10-27 00:32:53 | Epoch: 1 | Step: 29710 | Dataset: 0-4040901 | Loss: 2.232 | 675 ms/step , 58241.45 GFLOP/s , 532530.9 tokens/s INFO:__main__:2024-10-27 00:33:01 | Epoch: 1 | Step: 29720 | Dataset: 0-4048901 | Loss: 2.124 | 676 ms/step , 58156.23 GFLOP/s , 532060.8 tokens/s INFO:__main__:2024-10-27 00:33:09 | Epoch: 1 | Step: 29730 | Dataset: 0-4056901 | Loss: 2.260 | 675 ms/step , 58231.57 GFLOP/s , 532391.2 tokens/s INFO:__main__:2024-10-27 00:33:16 | Epoch: 1 | Step: 29740 | Dataset: 0-4064901 | Loss: 2.208 | 675 ms/step , 58232.65 GFLOP/s , 532261.8 tokens/s INFO:__main__:2024-10-27 00:33:24 | Epoch: 1 | Step: 29750 | Dataset: 0-4072901 | Loss: 2.203 | 675 ms/step , 58252.48 GFLOP/s , 532257.9 tokens/s INFO:__main__:2024-10-27 00:33:32 | Epoch: 1 | Step: 29760 | Dataset: 0-4080901 | Loss: 2.208 | 675 ms/step , 58225.40 GFLOP/s , 532617.9 tokens/s INFO:__main__:2024-10-27 00:33:39 | Epoch: 1 | Step: 29770 | Dataset: 0-4088901 | Loss: 2.201 | 676 ms/step , 58181.92 GFLOP/s , 532059.1 tokens/s INFO:__main__:2024-10-27 00:33:47 | Epoch: 1 | Step: 29780 | Dataset: 0-4096901 | Loss: 2.168 | 676 ms/step , 58174.62 GFLOP/s , 532839.4 tokens/s INFO:__main__:2024-10-27 00:33:55 | Epoch: 1 | 
Step: 29790 | Dataset: 0-4104901 | Loss: 2.244 | 676 ms/step , 58187.38 GFLOP/s , 532197.4 tokens/s INFO:__main__:2024-10-27 00:34:02 | Epoch: 1 | Step: 29800 | Dataset: 0-4112901 | Loss: 2.229 | 675 ms/step , 58264.15 GFLOP/s , 532480.5 tokens/s INFO:__main__:2024-10-27 00:34:10 | Epoch: 1 | Step: 29810 | Dataset: 0-4120901 | Loss: 2.280 | 676 ms/step , 58136.17 GFLOP/s , 532414.9 tokens/s INFO:__main__:2024-10-27 00:34:18 | Epoch: 1 | Step: 29820 | Dataset: 0-4128901 | Loss: 2.181 | 676 ms/step , 58190.15 GFLOP/s , 532336.5 tokens/s INFO:__main__:2024-10-27 00:34:26 | Epoch: 1 | Step: 29830 | Dataset: 0-4136901 | Loss: 2.290 | 675 ms/step , 58242.27 GFLOP/s , 532266.8 tokens/s INFO:__main__:2024-10-27 00:34:33 | Epoch: 1 | Step: 29840 | Dataset: 0-4144901 | Loss: 2.247 | 673 ms/step , 58410.51 GFLOP/s , 532480.7 tokens/s INFO:__main__:2024-10-27 00:34:41 | Epoch: 1 | Step: 29850 | Dataset: 0-4152901 | Loss: 2.251 | 674 ms/step , 58310.97 GFLOP/s , 532623.2 tokens/s INFO:__main__:2024-10-27 00:34:49 | Epoch: 1 | Step: 29860 | Dataset: 0-4160901 | Loss: 2.247 | 674 ms/step , 58292.61 GFLOP/s , 532865.8 tokens/s INFO:__main__:2024-10-27 00:34:56 | Epoch: 1 | Step: 29870 | Dataset: 0-4168901 | Loss: 2.197 | 675 ms/step , 58223.57 GFLOP/s , 532949.3 tokens/s INFO:__main__:2024-10-27 00:35:04 | Epoch: 1 | Step: 29880 | Dataset: 0-4176901 | Loss: 2.182 | 675 ms/step , 58274.58 GFLOP/s , 531610.3 tokens/s INFO:__main__:2024-10-27 00:35:12 | Epoch: 1 | Step: 29890 | Dataset: 0-4184901 | Loss: 2.271 | 675 ms/step , 58236.39 GFLOP/s , 532383.3 tokens/s INFO:__main__:2024-10-27 00:35:19 | Epoch: 1 | Step: 29900 | Dataset: 0-4192901 | Loss: 2.259 | 676 ms/step , 58181.17 GFLOP/s , 532255.9 tokens/s INFO:__main__:2024-10-27 00:35:27 | Epoch: 1 | Step: 29910 | Dataset: 0-4200901 | Loss: 2.186 | 675 ms/step , 58245.40 GFLOP/s , 532632.4 tokens/s INFO:__main__:2024-10-27 00:35:35 | Epoch: 1 | Step: 29920 | Dataset: 0-4208901 | Loss: 2.184 | 677 ms/step , 58105.24 GFLOP/s , 
532198.2 tokens/s INFO:__main__:2024-10-27 00:35:42 | Epoch: 1 | Step: 29930 | Dataset: 0-4216901 | Loss: 2.171 | 675 ms/step , 58223.40 GFLOP/s , 532464.6 tokens/s INFO:__main__:2024-10-27 00:35:50 | Epoch: 1 | Step: 29940 | Dataset: 0-4224901 | Loss: 2.257 | 675 ms/step , 58271.82 GFLOP/s , 532875.7 tokens/s INFO:__main__:2024-10-27 00:35:58 | Epoch: 1 | Step: 29950 | Dataset: 0-4232901 | Loss: 2.277 | 675 ms/step , 58267.36 GFLOP/s , 532341.9 tokens/s INFO:__main__:2024-10-27 00:36:06 | Epoch: 1 | Step: 29960 | Dataset: 0-4240901 | Loss: 2.243 | 676 ms/step , 58191.75 GFLOP/s , 532463.8 tokens/s INFO:__main__:2024-10-27 00:36:13 | Epoch: 1 | Step: 29970 | Dataset: 0-4248901 | Loss: 2.144 | 675 ms/step , 58274.00 GFLOP/s , 532537.3 tokens/s INFO:__main__:2024-10-27 00:36:21 | Epoch: 1 | Step: 29980 | Dataset: 0-4256901 | Loss: 2.181 | 675 ms/step , 58259.12 GFLOP/s , 533239.9 tokens/s INFO:__main__:2024-10-27 00:36:29 | Epoch: 1 | Step: 29990 | Dataset: 0-4264901 | Loss: 2.189 | 675 ms/step , 58244.17 GFLOP/s , 532902.3 tokens/s INFO:__main__:2024-10-27 00:36:36 | Validation | Step: 30000 | Val_loss: 2.177 | Best_val_loss: 2.1915 INFO:__main__:2024-10-27 00:36:36 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_003636_step_30000.pt` INFO:__main__:2024-10-27 00:36:37 | Epoch: 1 | Step: 30000 | Dataset: 0-4272901 | Loss: 2.246 | 673 ms/step , 58380.31 GFLOP/s , 479858.1 tokens/s INFO:__main__:2024-10-27 00:36:45 | Epoch: 1 | Step: 30010 | Dataset: 0-4280901 | Loss: 2.173 | 674 ms/step , 58280.31 GFLOP/s , 533065.5 tokens/s INFO:__main__:2024-10-27 00:36:52 | Epoch: 1 | Step: 30020 | Dataset: 0-4288901 | Loss: 2.170 | 676 ms/step , 58146.89 GFLOP/s , 532934.7 tokens/s INFO:__main__:2024-10-27 00:37:00 | Epoch: 1 | Step: 30030 | Dataset: 0-4296901 | Loss: 2.273 | 676 ms/step , 58173.21 GFLOP/s , 532890.1 tokens/s INFO:__main__:2024-10-27 00:37:08 | Epoch: 1 | Step: 30040 | Dataset: 0-4304901 | Loss: 2.218 | 676 ms/step , 58176.18 GFLOP/s , 
532923.9 tokens/s INFO:__main__:2024-10-27 00:37:16 | Epoch: 1 | Step: 30050 | Dataset: 0-4312901 | Loss: 2.172 | 675 ms/step , 58275.74 GFLOP/s , 532986.8 tokens/s INFO:__main__:2024-10-27 00:37:23 | Epoch: 1 | Step: 30060 | Dataset: 0-4320901 | Loss: 2.133 | 675 ms/step , 58233.23 GFLOP/s , 532669.7 tokens/s INFO:__main__:2024-10-27 00:37:31 | Epoch: 1 | Step: 30070 | Dataset: 0-4328901 | Loss: 2.225 | 675 ms/step , 58256.13 GFLOP/s , 532785.0 tokens/s INFO:__main__:2024-10-27 00:37:39 | Epoch: 1 | Step: 30080 | Dataset: 0-4336901 | Loss: 2.212 | 676 ms/step , 58176.65 GFLOP/s , 532430.7 tokens/s INFO:__main__:2024-10-27 00:37:46 | Epoch: 1 | Step: 30090 | Dataset: 0-4344901 | Loss: 2.151 | 675 ms/step , 58198.33 GFLOP/s , 532456.4 tokens/s INFO:__main__:2024-10-27 00:37:54 | Epoch: 1 | Step: 30100 | Dataset: 0-4352901 | Loss: 2.262 | 675 ms/step , 58226.16 GFLOP/s , 532486.0 tokens/s INFO:__main__:2024-10-27 00:38:02 | Epoch: 1 | Step: 30110 | Dataset: 0-4360901 | Loss: 2.077 | 676 ms/step , 58178.13 GFLOP/s , 532117.4 tokens/s INFO:__main__:2024-10-27 00:38:09 | Epoch: 1 | Step: 30120 | Dataset: 0-4368901 | Loss: 1.919 | 676 ms/step , 58125.42 GFLOP/s , 531958.5 tokens/s INFO:__main__:2024-10-27 00:38:17 | Epoch: 1 | Step: 30130 | Dataset: 0-4376901 | Loss: 1.894 | 676 ms/step , 58167.39 GFLOP/s , 531873.9 tokens/s INFO:__main__:2024-10-27 00:38:25 | Epoch: 1 | Step: 30140 | Dataset: 0-4384901 | Loss: 1.863 | 676 ms/step , 58134.25 GFLOP/s , 531681.1 tokens/s INFO:__main__:2024-10-27 00:38:32 | Epoch: 1 | Step: 30150 | Dataset: 0-4392901 | Loss: 1.832 | 676 ms/step , 58117.48 GFLOP/s , 532021.4 tokens/s INFO:__main__:2024-10-27 00:38:40 | Epoch: 1 | Step: 30160 | Dataset: 0-4400901 | Loss: 1.810 | 676 ms/step , 58147.07 GFLOP/s , 531808.2 tokens/s INFO:__main__:2024-10-27 00:38:48 | Epoch: 1 | Step: 30170 | Dataset: 0-4408901 | Loss: 1.802 | 677 ms/step , 58085.46 GFLOP/s , 532096.8 tokens/s INFO:__main__:2024-10-27 00:38:56 | Epoch: 1 | Step: 30180 | Dataset: 
0-4416901 | Loss: 1.837 | 676 ms/step , 58163.89 GFLOP/s , 531381.4 tokens/s INFO:__main__:2024-10-27 00:39:03 | Epoch: 1 | Step: 30190 | Dataset: 0-4424901 | Loss: 1.825 | 676 ms/step , 58134.72 GFLOP/s , 531341.0 tokens/s INFO:__main__:2024-10-27 00:39:11 | Epoch: 1 | Step: 30200 | Dataset: 0-4432901 | Loss: 1.769 | 677 ms/step , 58055.78 GFLOP/s , 531492.5 tokens/s INFO:__main__:2024-10-27 00:39:19 | Epoch: 1 | Step: 30210 | Dataset: 0-4440901 | Loss: 1.744 | 676 ms/step , 58121.02 GFLOP/s , 531286.9 tokens/s INFO:__main__:2024-10-27 00:39:26 | Epoch: 1 | Step: 30220 | Dataset: 0-4448901 | Loss: 1.743 | 676 ms/step , 58112.05 GFLOP/s , 530904.1 tokens/s INFO:__main__:2024-10-27 00:39:34 | Epoch: 1 | Step: 30230 | Dataset: 0-4456901 | Loss: 1.725 | 677 ms/step , 58060.22 GFLOP/s , 530270.0 tokens/s INFO:__main__:2024-10-27 00:39:42 | Epoch: 1 | Step: 30240 | Dataset: 0-4464901 | Loss: 1.720 | 676 ms/step , 58156.52 GFLOP/s , 530562.1 tokens/s INFO:__main__:2024-10-27 00:39:50 | Epoch: 1 | Step: 30250 | Dataset: 0-4472901 | Loss: 1.717 | 677 ms/step , 58065.81 GFLOP/s , 530574.5 tokens/s INFO:__main__:2024-10-27 00:39:57 | Epoch: 1 | Step: 30260 | Dataset: 0-4480901 | Loss: 1.698 | 676 ms/step , 58120.00 GFLOP/s , 530554.3 tokens/s INFO:__main__:2024-10-27 00:40:05 | Epoch: 1 | Step: 30270 | Dataset: 0-4488901 | Loss: 1.704 | 678 ms/step , 57957.50 GFLOP/s , 530824.2 tokens/s INFO:__main__:2024-10-27 00:40:13 | Epoch: 1 | Step: 30280 | Dataset: 0-4496901 | Loss: 1.709 | 678 ms/step , 58008.17 GFLOP/s , 530343.3 tokens/s INFO:__main__:2024-10-27 00:40:20 | Epoch: 1 | Step: 30290 | Dataset: 0-4504901 | Loss: 2.308 | 678 ms/step , 58003.93 GFLOP/s , 531099.7 tokens/s INFO:__main__:2024-10-27 00:40:28 | Epoch: 1 | Step: 30300 | Dataset: 0-4512901 | Loss: 2.240 | 676 ms/step , 58129.64 GFLOP/s , 531165.6 tokens/s INFO:__main__:2024-10-27 00:40:36 | Epoch: 1 | Step: 30310 | Dataset: 0-4520901 | Loss: 2.255 | 678 ms/step , 58007.83 GFLOP/s , 530888.4 tokens/s 
INFO:__main__:2024-10-27 00:40:44 | Epoch: 1 | Step: 30320 | Dataset: 0-4528901 | Loss: 2.151 | 677 ms/step , 58096.03 GFLOP/s , 528582.4 tokens/s INFO:__main__:2024-10-27 00:40:51 | Epoch: 1 | Step: 30330 | Dataset: 0-4536901 | Loss: 2.267 | 676 ms/step , 58154.95 GFLOP/s , 531362.4 tokens/s INFO:__main__:2024-10-27 00:40:59 | Epoch: 1 | Step: 30340 | Dataset: 0-4544901 | Loss: 2.252 | 676 ms/step , 58108.76 GFLOP/s , 531691.4 tokens/s INFO:__main__:2024-10-27 00:41:07 | Epoch: 1 | Step: 30350 | Dataset: 0-4552901 | Loss: 2.222 | 676 ms/step , 58183.45 GFLOP/s , 531967.6 tokens/s INFO:__main__:2024-10-27 00:41:14 | Epoch: 1 | Step: 30360 | Dataset: 0-4560901 | Loss: 2.194 | 675 ms/step , 58229.95 GFLOP/s , 531961.6 tokens/s INFO:__main__:2024-10-27 00:41:22 | Epoch: 1 | Step: 30370 | Dataset: 0-4568901 | Loss: 2.176 | 676 ms/step , 58135.34 GFLOP/s , 532139.4 tokens/s INFO:__main__:2024-10-27 00:41:30 | Epoch: 1 | Step: 30380 | Dataset: 0-4576901 | Loss: 2.277 | 677 ms/step , 58069.50 GFLOP/s , 531654.1 tokens/s INFO:__main__:2024-10-27 00:41:38 | Epoch: 1 | Step: 30390 | Dataset: 0-4584901 | Loss: 2.147 | 676 ms/step , 58186.63 GFLOP/s , 530907.9 tokens/s INFO:__main__:2024-10-27 00:41:45 | Epoch: 1 | Step: 30400 | Dataset: 0-4592901 | Loss: 2.101 | 675 ms/step , 58252.67 GFLOP/s , 532605.0 tokens/s INFO:__main__:2024-10-27 00:41:53 | Epoch: 1 | Step: 30410 | Dataset: 0-4600901 | Loss: 2.187 | 675 ms/step , 58273.82 GFLOP/s , 533351.4 tokens/s INFO:__main__:2024-10-27 00:42:01 | Epoch: 1 | Step: 30420 | Dataset: 0-4608901 | Loss: 2.192 | 674 ms/step , 58305.84 GFLOP/s , 533221.0 tokens/s INFO:__main__:2024-10-27 00:42:08 | Epoch: 1 | Step: 30430 | Dataset: 0-4616901 | Loss: 2.140 | 674 ms/step , 58315.42 GFLOP/s , 533365.0 tokens/s INFO:__main__:2024-10-27 00:42:16 | Epoch: 1 | Step: 30440 | Dataset: 0-4624901 | Loss: 2.180 | 678 ms/step , 57991.01 GFLOP/s , 532504.8 tokens/s INFO:__main__:2024-10-27 00:42:24 | Epoch: 1 | Step: 30450 | Dataset: 0-4632901 | Loss: 
2.329 | 674 ms/step , 58317.14 GFLOP/s , 531513.8 tokens/s INFO:__main__:2024-10-27 00:42:31 | Epoch: 1 | Step: 30460 | Dataset: 0-4640901 | Loss: 2.187 | 675 ms/step , 58259.81 GFLOP/s , 532869.5 tokens/s INFO:__main__:2024-10-27 00:42:39 | Epoch: 1 | Step: 30470 | Dataset: 0-4648901 | Loss: 2.216 | 675 ms/step , 58228.66 GFLOP/s , 532777.0 tokens/s INFO:__main__:2024-10-27 00:42:47 | Epoch: 1 | Step: 30480 | Dataset: 0-4656901 | Loss: 2.253 | 675 ms/step , 58198.30 GFLOP/s , 532629.7 tokens/s INFO:__main__:2024-10-27 00:42:54 | Epoch: 1 | Step: 30490 | Dataset: 0-4664901 | Loss: 2.133 | 675 ms/step , 58240.71 GFLOP/s , 533153.4 tokens/s INFO:__main__:2024-10-27 00:43:02 | Epoch: 1 | Step: 30500 | Dataset: 0-4672901 | Loss: 2.160 | 675 ms/step , 58227.08 GFLOP/s , 532668.1 tokens/s INFO:__main__:2024-10-27 00:43:10 | Epoch: 1 | Step: 30510 | Dataset: 0-4680901 | Loss: 2.173 | 676 ms/step , 58192.15 GFLOP/s , 532791.8 tokens/s INFO:__main__:2024-10-27 00:43:18 | Epoch: 1 | Step: 30520 | Dataset: 0-4688901 | Loss: 2.263 | 676 ms/step , 58154.32 GFLOP/s , 532771.8 tokens/s INFO:__main__:2024-10-27 00:43:25 | Epoch: 1 | Step: 30530 | Dataset: 0-4696901 | Loss: 2.093 | 675 ms/step , 58249.70 GFLOP/s , 533066.0 tokens/s INFO:__main__:2024-10-27 00:43:33 | Epoch: 1 | Step: 30540 | Dataset: 0-4704901 | Loss: 2.210 | 674 ms/step , 58307.85 GFLOP/s , 533041.8 tokens/s INFO:__main__:2024-10-27 00:43:41 | Epoch: 1 | Step: 30550 | Dataset: 0-4712901 | Loss: 2.195 | 675 ms/step , 58217.44 GFLOP/s , 533204.0 tokens/s INFO:__main__:2024-10-27 00:43:48 | Epoch: 1 | Step: 30560 | Dataset: 0-4720901 | Loss: 2.146 | 675 ms/step , 58272.27 GFLOP/s , 532896.0 tokens/s INFO:__main__:2024-10-27 00:43:56 | Epoch: 1 | Step: 30570 | Dataset: 0-4728901 | Loss: 2.155 | 674 ms/step , 58311.12 GFLOP/s , 533172.8 tokens/s INFO:__main__:2024-10-27 00:44:04 | Epoch: 1 | Step: 30580 | Dataset: 0-4736901 | Loss: 2.072 | 675 ms/step , 58236.42 GFLOP/s , 533405.9 tokens/s INFO:__main__:2024-10-27 
00:44:11 | Epoch: 1 | Step: 30590 | Dataset: 0-4744901 | Loss: 2.241 | 674 ms/step , 58304.31 GFLOP/s , 532735.0 tokens/s INFO:__main__:2024-10-27 00:44:19 | Epoch: 1 | Step: 30600 | Dataset: 0-4752901 | Loss: 2.162 | 675 ms/step , 58266.31 GFLOP/s , 533186.7 tokens/s INFO:__main__:2024-10-27 00:44:27 | Epoch: 1 | Step: 30610 | Dataset: 0-4760901 | Loss: 2.118 | 675 ms/step , 58277.30 GFLOP/s , 532668.8 tokens/s INFO:__main__:2024-10-27 00:44:34 | Epoch: 1 | Step: 30620 | Dataset: 0-4768901 | Loss: 2.142 | 674 ms/step , 58308.97 GFLOP/s , 533240.6 tokens/s INFO:__main__:2024-10-27 00:44:42 | Epoch: 1 | Step: 30630 | Dataset: 0-4776901 | Loss: 2.196 | 675 ms/step , 58216.61 GFLOP/s , 532761.2 tokens/s INFO:__main__:2024-10-27 00:44:50 | Epoch: 1 | Step: 30640 | Dataset: 0-4784901 | Loss: 2.086 | 676 ms/step , 58190.76 GFLOP/s , 532897.6 tokens/s INFO:__main__:2024-10-27 00:44:57 | Epoch: 1 | Step: 30650 | Dataset: 0-4792901 | Loss: 2.237 | 675 ms/step , 58207.32 GFLOP/s , 532779.2 tokens/s INFO:__main__:2024-10-27 00:45:05 | Epoch: 1 | Step: 30660 | Dataset: 0-4800901 | Loss: 2.169 | 675 ms/step , 58261.78 GFLOP/s , 532845.9 tokens/s INFO:__main__:2024-10-27 00:45:13 | Epoch: 1 | Step: 30670 | Dataset: 0-4808901 | Loss: 2.065 | 675 ms/step , 58231.54 GFLOP/s , 532744.7 tokens/s INFO:__main__:2024-10-27 00:45:20 | Epoch: 1 | Step: 30680 | Dataset: 0-4816901 | Loss: 2.155 | 676 ms/step , 58145.33 GFLOP/s , 532598.8 tokens/s INFO:__main__:2024-10-27 00:45:28 | Epoch: 1 | Step: 30690 | Dataset: 0-4824901 | Loss: 2.223 | 674 ms/step , 58325.75 GFLOP/s , 533061.5 tokens/s INFO:__main__:2024-10-27 00:45:36 | Epoch: 1 | Step: 30700 | Dataset: 0-4832901 | Loss: 2.149 | 676 ms/step , 58182.64 GFLOP/s , 532756.2 tokens/s INFO:__main__:2024-10-27 00:45:44 | Epoch: 1 | Step: 30710 | Dataset: 0-4840901 | Loss: 2.156 | 676 ms/step , 58182.09 GFLOP/s , 533287.1 tokens/s INFO:__main__:2024-10-27 00:45:51 | Epoch: 1 | Step: 30720 | Dataset: 0-4848901 | Loss: 2.187 | 675 ms/step , 
58243.55 GFLOP/s , 532669.8 tokens/s INFO:__main__:2024-10-27 00:45:59 | Epoch: 1 | Step: 30730 | Dataset: 0-4856901 | Loss: 2.165 | 675 ms/step , 58252.37 GFLOP/s , 533088.0 tokens/s INFO:__main__:2024-10-27 00:46:07 | Epoch: 1 | Step: 30740 | Dataset: 0-4864901 | Loss: 2.192 | 674 ms/step , 58291.39 GFLOP/s , 532698.2 tokens/s INFO:__main__:2024-10-27 00:46:14 | Epoch: 1 | Step: 30750 | Dataset: 0-4872901 | Loss: 2.212 | 675 ms/step , 58204.14 GFLOP/s , 532656.2 tokens/s INFO:__main__:2024-10-27 00:46:22 | Epoch: 1 | Step: 30760 | Dataset: 0-4880901 | Loss: 2.193 | 677 ms/step , 58093.00 GFLOP/s , 532485.3 tokens/s INFO:__main__:2024-10-27 00:46:30 | Epoch: 1 | Step: 30770 | Dataset: 0-4888901 | Loss: 2.229 | 676 ms/step , 58109.48 GFLOP/s , 532596.4 tokens/s INFO:__main__:2024-10-27 00:46:37 | Epoch: 1 | Step: 30780 | Dataset: 0-4896901 | Loss: 2.200 | 676 ms/step , 58178.64 GFLOP/s , 532405.0 tokens/s INFO:__main__:2024-10-27 00:46:45 | Epoch: 1 | Step: 30790 | Dataset: 0-4904901 | Loss: 2.180 | 675 ms/step , 58213.68 GFLOP/s , 532707.3 tokens/s INFO:__main__:2024-10-27 00:46:53 | Epoch: 1 | Step: 30800 | Dataset: 0-4912901 | Loss: 2.197 | 677 ms/step , 58089.27 GFLOP/s , 532695.7 tokens/s INFO:__main__:2024-10-27 00:47:00 | Epoch: 1 | Step: 30810 | Dataset: 0-4920901 | Loss: 2.136 | 675 ms/step , 58229.28 GFLOP/s , 532489.6 tokens/s INFO:__main__:2024-10-27 00:47:08 | Epoch: 1 | Step: 30820 | Dataset: 0-4928901 | Loss: 2.230 | 674 ms/step , 58296.69 GFLOP/s , 532879.6 tokens/s INFO:__main__:2024-10-27 00:47:16 | Epoch: 1 | Step: 30830 | Dataset: 0-4936901 | Loss: 2.147 | 675 ms/step , 58255.53 GFLOP/s , 532282.6 tokens/s INFO:__main__:2024-10-27 00:47:24 | Epoch: 1 | Step: 30840 | Dataset: 0-4944901 | Loss: 2.091 | 676 ms/step , 58121.91 GFLOP/s , 532400.8 tokens/s INFO:__main__:2024-10-27 00:47:31 | Epoch: 1 | Step: 30850 | Dataset: 0-4952901 | Loss: 2.221 | 678 ms/step , 58008.81 GFLOP/s , 532013.4 tokens/s INFO:__main__:2024-10-27 00:47:39 | Epoch: 1 | 
Step: 30860 | Dataset: 0-4960901 | Loss: 2.067 | 675 ms/step , 58224.05 GFLOP/s , 532452.0 tokens/s INFO:__main__:2024-10-27 00:47:47 | Epoch: 1 | Step: 30870 | Dataset: 0-4968901 | Loss: 2.181 | 675 ms/step , 58248.75 GFLOP/s , 532025.8 tokens/s INFO:__main__:2024-10-27 00:47:54 | Epoch: 1 | Step: 30880 | Dataset: 0-4976901 | Loss: 2.168 | 675 ms/step , 58271.32 GFLOP/s , 532527.2 tokens/s INFO:__main__:2024-10-27 00:48:02 | Epoch: 1 | Step: 30890 | Dataset: 0-4984901 | Loss: 2.190 | 676 ms/step , 58119.70 GFLOP/s , 532242.9 tokens/s INFO:__main__:2024-10-27 00:48:10 | Epoch: 1 | Step: 30900 | Dataset: 0-4992901 | Loss: 2.220 | 676 ms/step , 58131.97 GFLOP/s , 531896.1 tokens/s INFO:__main__:2024-10-27 00:48:17 | Epoch: 1 | Step: 30910 | Dataset: 0-5000901 | Loss: 2.123 | 676 ms/step , 58108.30 GFLOP/s , 531772.4 tokens/s INFO:__main__:2024-10-27 00:48:25 | Epoch: 1 | Step: 30920 | Dataset: 0-5008901 | Loss: 2.171 | 676 ms/step , 58184.50 GFLOP/s , 531732.1 tokens/s INFO:__main__:2024-10-27 00:48:33 | Epoch: 1 | Step: 30930 | Dataset: 0-5016901 | Loss: 2.297 | 677 ms/step , 58090.53 GFLOP/s , 532404.7 tokens/s INFO:__main__:2024-10-27 00:48:41 | Epoch: 1 | Step: 30940 | Dataset: 0-5024901 | Loss: 2.251 | 675 ms/step , 58200.38 GFLOP/s , 532061.3 tokens/s INFO:__main__:2024-10-27 00:48:48 | Epoch: 1 | Step: 30950 | Dataset: 0-5032901 | Loss: 2.251 | 674 ms/step , 58296.25 GFLOP/s , 532831.3 tokens/s INFO:__main__:2024-10-27 00:48:56 | Epoch: 1 | Step: 30960 | Dataset: 0-5040901 | Loss: 2.207 | 677 ms/step , 58085.42 GFLOP/s , 531905.9 tokens/s INFO:__main__:2024-10-27 00:49:04 | Epoch: 1 | Step: 30970 | Dataset: 0-5048901 | Loss: 2.212 | 678 ms/step , 58016.07 GFLOP/s , 532367.3 tokens/s INFO:__main__:2024-10-27 00:49:11 | Epoch: 1 | Step: 30980 | Dataset: 0-5056901 | Loss: 2.162 | 676 ms/step , 58166.11 GFLOP/s , 532435.9 tokens/s INFO:__main__:2024-10-27 00:49:19 | Epoch: 1 | Step: 30990 | Dataset: 0-5064901 | Loss: 2.186 | 674 ms/step , 58285.74 GFLOP/s , 
532680.8 tokens/s INFO:__main__:2024-10-27 00:49:26 | Validation | Step: 31000 | Val_loss: 2.584 | Best_val_loss: 2.1773 INFO:__main__:2024-10-27 00:49:26 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_004926_step_31000.pt` INFO:__main__:2024-10-27 00:49:28 | Epoch: 1 | Step: 31000 | Dataset: 0-5072901 | Loss: 2.206 | 674 ms/step , 58319.17 GFLOP/s , 479661.1 tokens/s INFO:__main__:2024-10-27 00:49:35 | Epoch: 1 | Step: 31010 | Dataset: 0-5080901 | Loss: 2.086 | 676 ms/step , 58155.86 GFLOP/s , 532473.6 tokens/s INFO:__main__:2024-10-27 00:49:43 | Epoch: 1 | Step: 31020 | Dataset: 0-5088901 | Loss: 2.133 | 675 ms/step , 58257.36 GFLOP/s , 532703.8 tokens/s INFO:__main__:2024-10-27 00:49:51 | Epoch: 1 | Step: 31030 | Dataset: 0-5096901 | Loss: 2.122 | 676 ms/step , 58157.79 GFLOP/s , 532330.6 tokens/s INFO:__main__:2024-10-27 00:49:58 | Epoch: 1 | Step: 31040 | Dataset: 0-5104901 | Loss: 2.149 | 676 ms/step , 58169.77 GFLOP/s , 533009.0 tokens/s INFO:__main__:2024-10-27 00:50:06 | Epoch: 1 | Step: 31050 | Dataset: 0-5112901 | Loss: 2.162 | 675 ms/step , 58230.62 GFLOP/s , 532666.6 tokens/s INFO:__main__:2024-10-27 00:50:14 | Epoch: 1 | Step: 31060 | Dataset: 0-5120901 | Loss: 2.176 | 678 ms/step , 58009.33 GFLOP/s , 532697.3 tokens/s INFO:__main__:2024-10-27 00:50:21 | Epoch: 1 | Step: 31070 | Dataset: 0-5128901 | Loss: 2.185 | 675 ms/step , 58278.57 GFLOP/s , 532610.1 tokens/s INFO:__main__:2024-10-27 00:50:29 | Epoch: 1 | Step: 31080 | Dataset: 0-5136901 | Loss: 2.053 | 675 ms/step , 58198.14 GFLOP/s , 532989.4 tokens/s INFO:__main__:2024-10-27 00:50:37 | Epoch: 1 | Step: 31090 | Dataset: 0-5144901 | Loss: 2.177 | 674 ms/step , 58311.42 GFLOP/s , 532626.1 tokens/s INFO:__main__:2024-10-27 00:50:44 | Epoch: 1 | Step: 31100 | Dataset: 0-5152901 | Loss: 1.865 | 679 ms/step , 57932.13 GFLOP/s , 531667.1 tokens/s INFO:__main__:2024-10-27 00:50:52 | Epoch: 1 | Step: 31110 | Dataset: 0-5160901 | Loss: 1.803 | 675 ms/step , 58205.19 GFLOP/s , 
532083.5 tokens/s INFO:__main__:2024-10-27 00:51:00 | Epoch: 1 | Step: 31120 | Dataset: 0-5168901 | Loss: 1.794 | 678 ms/step , 57976.30 GFLOP/s , 531731.6 tokens/s INFO:__main__:2024-10-27 00:51:08 | Epoch: 1 | Step: 31130 | Dataset: 0-5176901 | Loss: 1.764 | 675 ms/step , 58230.50 GFLOP/s , 532176.0 tokens/s INFO:__main__:2024-10-27 00:51:15 | Epoch: 1 | Step: 31140 | Dataset: 0-5184901 | Loss: 1.786 | 676 ms/step , 58171.81 GFLOP/s , 532036.8 tokens/s INFO:__main__:2024-10-27 00:51:23 | Epoch: 1 | Step: 31150 | Dataset: 0-5192901 | Loss: 1.742 | 676 ms/step , 58141.52 GFLOP/s , 532031.7 tokens/s INFO:__main__:2024-10-27 00:51:31 | Epoch: 1 | Step: 31160 | Dataset: 0-5200901 | Loss: 1.744 | 674 ms/step , 58296.70 GFLOP/s , 532032.3 tokens/s INFO:__main__:2024-10-27 00:51:38 | Epoch: 1 | Step: 31170 | Dataset: 0-5208901 | Loss: 1.708 | 704 ms/step , 55842.22 GFLOP/s , 529435.0 tokens/s INFO:__main__:2024-10-27 00:51:46 | Epoch: 1 | Step: 31180 | Dataset: 0-5216901 | Loss: 2.282 | 675 ms/step , 58195.67 GFLOP/s , 531751.6 tokens/s INFO:__main__:2024-10-27 00:51:54 | Epoch: 1 | Step: 31190 | Dataset: 0-5224901 | Loss: 2.261 | 674 ms/step , 58300.20 GFLOP/s , 532025.6 tokens/s INFO:__main__:2024-10-27 00:52:01 | Epoch: 1 | Step: 31200 | Dataset: 0-5232901 | Loss: 2.216 | 674 ms/step , 58309.57 GFLOP/s , 532383.0 tokens/s INFO:__main__:2024-10-27 00:52:09 | Epoch: 1 | Step: 31210 | Dataset: 0-5240901 | Loss: 2.202 | 676 ms/step , 58138.92 GFLOP/s , 532073.1 tokens/s INFO:__main__:2024-10-27 00:52:17 | Epoch: 1 | Step: 31220 | Dataset: 0-5248901 | Loss: 2.184 | 675 ms/step , 58223.71 GFLOP/s , 532213.0 tokens/s INFO:__main__:2024-10-27 00:52:25 | Epoch: 1 | Step: 31230 | Dataset: 0-5256901 | Loss: 2.191 | 676 ms/step , 58177.04 GFLOP/s , 531013.9 tokens/s INFO:__main__:2024-10-27 00:52:32 | Epoch: 1 | Step: 31240 | Dataset: 0-5264901 | Loss: 2.153 | 674 ms/step , 58310.13 GFLOP/s , 532007.4 tokens/s INFO:__main__:2024-10-27 00:52:40 | Epoch: 1 | Step: 31250 | Dataset: 
0-5272901 | Loss: 2.223 | 675 ms/step , 58260.00 GFLOP/s , 532573.9 tokens/s INFO:__main__:2024-10-27 00:52:48 | Epoch: 1 | Step: 31260 | Dataset: 0-5280901 | Loss: 2.158 | 675 ms/step , 58202.28 GFLOP/s , 532960.0 tokens/s INFO:__main__:2024-10-27 00:52:55 | Epoch: 1 | Step: 31270 | Dataset: 0-5288901 | Loss: 2.106 | 674 ms/step , 58346.05 GFLOP/s , 532781.4 tokens/s INFO:__main__:2024-10-27 00:53:03 | Epoch: 1 | Step: 31280 | Dataset: 0-5296901 | Loss: 2.249 | 676 ms/step , 58141.87 GFLOP/s , 533221.1 tokens/s INFO:__main__:2024-10-27 00:53:11 | Epoch: 1 | Step: 31290 | Dataset: 0-5304901 | Loss: 2.077 | 675 ms/step , 58232.17 GFLOP/s , 533394.9 tokens/s INFO:__main__:2024-10-27 00:53:18 | Epoch: 1 | Step: 31300 | Dataset: 0-5312901 | Loss: 2.104 | 676 ms/step , 58181.43 GFLOP/s , 532478.8 tokens/s INFO:__main__:2024-10-27 00:53:26 | Epoch: 1 | Step: 31310 | Dataset: 0-5320901 | Loss: 2.148 | 674 ms/step , 58285.79 GFLOP/s , 533181.6 tokens/s INFO:__main__:2024-10-27 00:53:34 | Epoch: 1 | Step: 31320 | Dataset: 0-5328901 | Loss: 2.141 | 676 ms/step , 58151.65 GFLOP/s , 532418.7 tokens/s INFO:__main__:2024-10-27 00:53:41 | Epoch: 1 | Step: 31330 | Dataset: 0-5336901 | Loss: 2.163 | 678 ms/step , 57987.83 GFLOP/s , 532734.8 tokens/s INFO:__main__:2024-10-27 00:53:49 | Epoch: 1 | Step: 31340 | Dataset: 0-5344901 | Loss: 2.111 | 674 ms/step , 58309.76 GFLOP/s , 532481.5 tokens/s INFO:__main__:2024-10-27 00:53:57 | Epoch: 1 | Step: 31350 | Dataset: 0-5352901 | Loss: 2.081 | 675 ms/step , 58246.47 GFLOP/s , 532958.9 tokens/s INFO:__main__:2024-10-27 00:54:04 | Epoch: 1 | Step: 31360 | Dataset: 0-5360901 | Loss: 2.030 | 674 ms/step , 58287.36 GFLOP/s , 532526.8 tokens/s INFO:__main__:2024-10-27 00:54:12 | Epoch: 1 | Step: 31370 | Dataset: 0-5368901 | Loss: 1.996 | 674 ms/step , 58294.17 GFLOP/s , 532797.1 tokens/s INFO:__main__:2024-10-27 00:54:20 | Epoch: 1 | Step: 31380 | Dataset: 0-5376901 | Loss: 2.032 | 675 ms/step , 58245.53 GFLOP/s , 531844.8 tokens/s 
INFO:__main__:2024-10-27 00:54:28 | Epoch: 1 | Step: 31390 | Dataset: 0-5384901 | Loss: 1.990 | 675 ms/step , 58231.40 GFLOP/s , 532434.2 tokens/s INFO:__main__:2024-10-27 00:54:35 | Epoch: 1 | Step: 31400 | Dataset: 0-5392901 | Loss: 1.941 | 676 ms/step , 58129.45 GFLOP/s , 532678.8 tokens/s INFO:__main__:2024-10-27 00:54:43 | Epoch: 1 | Step: 31410 | Dataset: 0-5400901 | Loss: 1.924 | 675 ms/step , 58271.79 GFLOP/s , 532409.7 tokens/s INFO:__main__:2024-10-27 00:54:51 | Epoch: 1 | Step: 31420 | Dataset: 0-5408901 | Loss: 1.960 | 674 ms/step , 58346.52 GFLOP/s , 532393.1 tokens/s INFO:__main__:2024-10-27 00:54:58 | Epoch: 1 | Step: 31430 | Dataset: 0-5416901 | Loss: 1.951 | 675 ms/step , 58278.67 GFLOP/s , 532268.4 tokens/s INFO:__main__:2024-10-27 00:55:06 | Epoch: 1 | Step: 31440 | Dataset: 0-5424901 | Loss: 1.836 | 677 ms/step , 58086.71 GFLOP/s , 531665.8 tokens/s INFO:__main__:2024-10-27 00:55:14 | Epoch: 1 | Step: 31450 | Dataset: 0-5432901 | Loss: 1.811 | 675 ms/step , 58235.16 GFLOP/s , 529853.4 tokens/s INFO:__main__:2024-10-27 00:55:21 | Epoch: 1 | Step: 31460 | Dataset: 0-5440901 | Loss: 1.839 | 674 ms/step , 58289.39 GFLOP/s , 532005.8 tokens/s INFO:__main__:2024-10-27 00:55:29 | Epoch: 1 | Step: 31470 | Dataset: 0-5448901 | Loss: 1.776 | 675 ms/step , 58239.00 GFLOP/s , 532184.2 tokens/s INFO:__main__:2024-10-27 00:55:37 | Epoch: 1 | Step: 31480 | Dataset: 0-5456901 | Loss: 1.792 | 677 ms/step , 58030.02 GFLOP/s , 532170.4 tokens/s INFO:__main__:2024-10-27 00:55:45 | Epoch: 1 | Step: 31490 | Dataset: 0-5464901 | Loss: 1.784 | 675 ms/step , 58205.28 GFLOP/s , 531797.2 tokens/s INFO:__main__:2024-10-27 00:55:52 | Epoch: 1 | Step: 31500 | Dataset: 0-5472901 | Loss: 1.797 | 675 ms/step , 58232.94 GFLOP/s , 532231.2 tokens/s INFO:__main__:2024-10-27 00:56:00 | Epoch: 1 | Step: 31510 | Dataset: 0-5480901 | Loss: 1.794 | 676 ms/step , 58163.53 GFLOP/s , 532226.2 tokens/s INFO:__main__:2024-10-27 00:56:08 | Epoch: 1 | Step: 31520 | Dataset: 0-5488901 | Loss: 
1.775 | 677 ms/step , 58049.92 GFLOP/s , 531758.1 tokens/s INFO:__main__:2024-10-27 00:56:15 | Epoch: 1 | Step: 31530 | Dataset: 0-5496901 | Loss: 2.526 | 676 ms/step , 58141.50 GFLOP/s , 532070.4 tokens/s INFO:__main__:2024-10-27 00:56:23 | Epoch: 1 | Step: 31540 | Dataset: 0-5504901 | Loss: 2.320 | 675 ms/step , 58258.13 GFLOP/s , 532904.8 tokens/s INFO:__main__:2024-10-27 00:56:31 | Epoch: 1 | Step: 31550 | Dataset: 0-5512901 | Loss: 2.297 | 675 ms/step , 58193.88 GFLOP/s , 532899.4 tokens/s INFO:__main__:2024-10-27 00:56:38 | Epoch: 1 | Step: 31560 | Dataset: 0-5520901 | Loss: 2.244 | 675 ms/step , 58274.81 GFLOP/s , 532326.1 tokens/s INFO:__main__:2024-10-27 00:56:46 | Epoch: 1 | Step: 31570 | Dataset: 0-5528901 | Loss: 2.271 | 675 ms/step , 58204.82 GFLOP/s , 532647.3 tokens/s INFO:__main__:2024-10-27 00:56:54 | Epoch: 1 | Step: 31580 | Dataset: 0-5536901 | Loss: 2.247 | 675 ms/step , 58265.99 GFLOP/s , 532093.3 tokens/s INFO:__main__:2024-10-27 00:57:02 | Epoch: 1 | Step: 31590 | Dataset: 0-5544901 | Loss: 2.258 | 676 ms/step , 58133.21 GFLOP/s , 532325.4 tokens/s INFO:__main__:2024-10-27 00:57:09 | Epoch: 1 | Step: 31600 | Dataset: 0-5552901 | Loss: 2.214 | 676 ms/step , 58132.20 GFLOP/s , 531971.3 tokens/s INFO:__main__:2024-10-27 00:57:17 | Epoch: 1 | Step: 31610 | Dataset: 0-5560901 | Loss: 2.212 | 676 ms/step , 58117.52 GFLOP/s , 532221.1 tokens/s INFO:__main__:2024-10-27 00:57:25 | Epoch: 1 | Step: 31620 | Dataset: 0-5568901 | Loss: 2.234 | 677 ms/step , 58064.28 GFLOP/s , 532091.9 tokens/s INFO:__main__:2024-10-27 00:57:32 | Epoch: 1 | Step: 31630 | Dataset: 0-5576901 | Loss: 2.206 | 676 ms/step , 58152.91 GFLOP/s , 531968.3 tokens/s INFO:__main__:2024-10-27 00:57:40 | Epoch: 1 | Step: 31640 | Dataset: 0-5584901 | Loss: 2.234 | 676 ms/step , 58192.84 GFLOP/s , 532112.3 tokens/s INFO:__main__:2024-10-27 00:57:48 | Epoch: 1 | Step: 31650 | Dataset: 0-5592901 | Loss: 2.182 | 678 ms/step , 58018.61 GFLOP/s , 531801.3 tokens/s INFO:__main__:2024-10-27 
00:57:55 | Epoch: 1 | Step: 31660 | Dataset: 0-5600901 | Loss: 2.255 | 675 ms/step , 58197.22 GFLOP/s , 532413.5 tokens/s INFO:__main__:2024-10-27 00:58:03 | Epoch: 1 | Step: 31670 | Dataset: 0-5608901 | Loss: 2.210 | 676 ms/step , 58111.59 GFLOP/s , 531986.7 tokens/s INFO:__main__:2024-10-27 00:58:11 | Epoch: 1 | Step: 31680 | Dataset: 0-5616901 | Loss: 2.176 | 676 ms/step , 58124.04 GFLOP/s , 532187.5 tokens/s INFO:__main__:2024-10-27 00:58:19 | Epoch: 1 | Step: 31690 | Dataset: 0-5624901 | Loss: 2.204 | 676 ms/step , 58150.32 GFLOP/s , 531946.5 tokens/s INFO:__main__:2024-10-27 00:58:26 | Epoch: 1 | Step: 31700 | Dataset: 0-5632901 | Loss: 2.252 | 675 ms/step , 58221.13 GFLOP/s , 532273.5 tokens/s INFO:__main__:2024-10-27 00:58:34 | Epoch: 1 | Step: 31710 | Dataset: 0-5640901 | Loss: 2.244 | 676 ms/step , 58131.84 GFLOP/s , 532137.4 tokens/s INFO:__main__:2024-10-27 00:58:42 | Epoch: 1 | Step: 31720 | Dataset: 0-5648901 | Loss: 2.243 | 677 ms/step , 58030.03 GFLOP/s , 530200.9 tokens/s INFO:__main__:2024-10-27 00:58:49 | Epoch: 1 | Step: 31730 | Dataset: 0-5656901 | Loss: 2.297 | 676 ms/step , 58163.04 GFLOP/s , 530408.9 tokens/s INFO:__main__:2024-10-27 00:58:57 | Epoch: 1 | Step: 31740 | Dataset: 0-5664901 | Loss: 2.216 | 676 ms/step , 58150.08 GFLOP/s , 530677.2 tokens/s INFO:__main__:2024-10-27 00:59:05 | Epoch: 1 | Step: 31750 | Dataset: 0-5672901 | Loss: 2.197 | 677 ms/step , 58070.65 GFLOP/s , 530730.5 tokens/s INFO:__main__:2024-10-27 00:59:13 | Epoch: 1 | Step: 31760 | Dataset: 0-5680901 | Loss: 2.231 | 676 ms/step , 58123.77 GFLOP/s , 528843.3 tokens/s INFO:__main__:2024-10-27 00:59:20 | Epoch: 1 | Step: 31770 | Dataset: 0-5688901 | Loss: 2.220 | 676 ms/step , 58181.13 GFLOP/s , 532371.2 tokens/s INFO:__main__:2024-10-27 00:59:28 | Epoch: 1 | Step: 31780 | Dataset: 0-5696901 | Loss: 2.234 | 675 ms/step , 58248.68 GFLOP/s , 532328.0 tokens/s INFO:__main__:2024-10-27 00:59:36 | Epoch: 1 | Step: 31790 | Dataset: 0-5704901 | Loss: 2.234 | 675 ms/step , 
58223.35 GFLOP/s , 532629.2 tokens/s INFO:__main__:2024-10-27 00:59:43 | Epoch: 1 | Step: 31800 | Dataset: 0-5712901 | Loss: 2.230 | 676 ms/step , 58188.80 GFLOP/s , 532427.9 tokens/s INFO:__main__:2024-10-27 00:59:51 | Epoch: 1 | Step: 31810 | Dataset: 0-5720901 | Loss: 2.219 | 676 ms/step , 58159.54 GFLOP/s , 532505.1 tokens/s INFO:__main__:2024-10-27 00:59:59 | Epoch: 1 | Step: 31820 | Dataset: 0-5728901 | Loss: 2.208 | 675 ms/step , 58269.14 GFLOP/s , 532758.0 tokens/s INFO:__main__:2024-10-27 01:00:05 | Epoch: 1 | Step: 31830 | Dataset: 0-5736901 | Loss: 2.178 | 675 ms/step , 58240.51 GFLOP/s , 609373.0 tokens/s INFO:__main__:2024-10-27 01:00:13 | Epoch: 1 | Step: 31840 | Dataset: 0-5744901 | Loss: 2.255 | 676 ms/step , 58163.57 GFLOP/s , 531902.5 tokens/s INFO:__main__:2024-10-27 01:00:21 | Epoch: 1 | Step: 31850 | Dataset: 0-5752901 | Loss: 2.178 | 675 ms/step , 58201.28 GFLOP/s , 530981.4 tokens/s INFO:__main__:2024-10-27 01:00:29 | Epoch: 1 | Step: 31860 | Dataset: 0-5760901 | Loss: 2.368 | 676 ms/step , 58155.87 GFLOP/s , 531701.9 tokens/s INFO:__main__:2024-10-27 01:00:36 | Epoch: 1 | Step: 31870 | Dataset: 0-5768901 | Loss: 2.235 | 675 ms/step , 58205.12 GFLOP/s , 531766.6 tokens/s INFO:__main__:2024-10-27 01:00:44 | Epoch: 1 | Step: 31880 | Dataset: 0-5776901 | Loss: 2.242 | 677 ms/step , 58086.87 GFLOP/s , 530743.6 tokens/s INFO:__main__:2024-10-27 01:00:52 | Epoch: 1 | Step: 31890 | Dataset: 0-5784901 | Loss: 2.175 | 674 ms/step , 58288.52 GFLOP/s , 531845.7 tokens/s INFO:__main__:2024-10-27 01:00:59 | Epoch: 1 | Step: 31900 | Dataset: 0-5792901 | Loss: 2.235 | 675 ms/step , 58236.73 GFLOP/s , 532830.2 tokens/s INFO:__main__:2024-10-27 01:01:07 | Epoch: 1 | Step: 31910 | Dataset: 0-5800901 | Loss: 2.226 | 678 ms/step , 57994.62 GFLOP/s , 531246.7 tokens/s INFO:__main__:2024-10-27 01:01:15 | Epoch: 1 | Step: 31920 | Dataset: 0-5808901 | Loss: 2.190 | 675 ms/step , 58244.70 GFLOP/s , 532420.6 tokens/s INFO:__main__:2024-10-27 01:01:22 | Epoch: 1 | 
Step: 31930 | Dataset: 0-5816901 | Loss: 2.206 | 676 ms/step , 58140.11 GFLOP/s , 532791.4 tokens/s INFO:__main__:2024-10-27 01:01:30 | Epoch: 1 | Step: 31940 | Dataset: 0-5824901 | Loss: 2.206 | 675 ms/step , 58248.16 GFLOP/s , 533035.2 tokens/s INFO:__main__:2024-10-27 01:01:38 | Epoch: 1 | Step: 31950 | Dataset: 0-5832901 | Loss: 2.213 | 674 ms/step , 58309.90 GFLOP/s , 532899.6 tokens/s INFO:__main__:2024-10-27 01:01:45 | Epoch: 1 | Step: 31960 | Dataset: 0-5840901 | Loss: 2.099 | 674 ms/step , 58296.52 GFLOP/s , 532341.5 tokens/s INFO:__main__:2024-10-27 01:01:53 | Epoch: 1 | Step: 31970 | Dataset: 0-5848901 | Loss: 2.169 | 675 ms/step , 58233.31 GFLOP/s , 532890.7 tokens/s INFO:__main__:2024-10-27 01:02:01 | Epoch: 1 | Step: 31980 | Dataset: 0-5856901 | Loss: 2.254 | 675 ms/step , 58270.98 GFLOP/s , 532263.3 tokens/s INFO:__main__:2024-10-27 01:02:09 | Epoch: 1 | Step: 31990 | Dataset: 0-5864901 | Loss: 2.180 | 675 ms/step , 58240.06 GFLOP/s , 532541.8 tokens/s INFO:__main__:2024-10-27 01:02:16 | Validation | Step: 32000 | Val_loss: 2.224 | Best_val_loss: 2.1773 INFO:__main__:2024-10-27 01:02:16 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_010216_step_32000.pt` INFO:__main__:2024-10-27 01:02:17 | Epoch: 1 | Step: 32000 | Dataset: 0-5872901 | Loss: 2.215 | 676 ms/step , 58183.79 GFLOP/s , 478099.9 tokens/s INFO:__main__:2024-10-27 01:02:25 | Epoch: 1 | Step: 32010 | Dataset: 0-5880901 | Loss: 2.161 | 677 ms/step , 58096.77 GFLOP/s , 530775.2 tokens/s INFO:__main__:2024-10-27 01:02:33 | Epoch: 1 | Step: 32020 | Dataset: 0-5888901 | Loss: 2.218 | 677 ms/step , 58106.68 GFLOP/s , 530845.8 tokens/s INFO:__main__:2024-10-27 01:02:40 | Epoch: 1 | Step: 32030 | Dataset: 0-5896901 | Loss: 2.278 | 676 ms/step , 58175.99 GFLOP/s , 531319.4 tokens/s INFO:__main__:2024-10-27 01:02:48 | Epoch: 1 | Step: 32040 | Dataset: 0-5904901 | Loss: 2.215 | 676 ms/step , 58130.85 GFLOP/s , 530023.9 tokens/s INFO:__main__:2024-10-27 01:02:56 | Epoch: 1 | 
Step: 32050 | Dataset: 0-5912901 | Loss: 2.246 | 676 ms/step , 58181.87 GFLOP/s , 530628.6 tokens/s INFO:__main__:2024-10-27 01:03:03 | Epoch: 1 | Step: 32060 | Dataset: 0-5920901 | Loss: 2.234 | 677 ms/step , 58098.27 GFLOP/s , 531535.9 tokens/s INFO:__main__:2024-10-27 01:03:11 | Epoch: 1 | Step: 32070 | Dataset: 0-5928901 | Loss: 2.288 | 677 ms/step , 58096.73 GFLOP/s , 530834.7 tokens/s INFO:__main__:2024-10-27 01:03:19 | Epoch: 1 | Step: 32080 | Dataset: 0-5936901 | Loss: 2.186 | 717 ms/step , 54787.28 GFLOP/s , 514455.4 tokens/s INFO:__main__:2024-10-27 01:03:27 | Epoch: 1 | Step: 32090 | Dataset: 0-5944901 | Loss: 2.148 | 719 ms/step , 54699.55 GFLOP/s , 499575.8 tokens/s INFO:__main__:2024-10-27 01:03:35 | Epoch: 1 | Step: 32100 | Dataset: 0-5952901 | Loss: 2.292 | 676 ms/step , 58160.24 GFLOP/s , 511268.7 tokens/s INFO:__main__:2024-10-27 01:03:43 | Epoch: 1 | Step: 32110 | Dataset: 0-5960901 | Loss: 2.266 | 675 ms/step , 58247.43 GFLOP/s , 531680.4 tokens/s INFO:__main__:2024-10-27 01:03:51 | Epoch: 1 | Step: 32120 | Dataset: 0-5968901 | Loss: 2.231 | 675 ms/step , 58255.32 GFLOP/s , 531733.7 tokens/s INFO:__main__:2024-10-27 01:03:58 | Epoch: 1 | Step: 32130 | Dataset: 0-5976901 | Loss: 2.241 | 675 ms/step , 58261.97 GFLOP/s , 530907.9 tokens/s INFO:__main__:2024-10-27 01:04:06 | Epoch: 1 | Step: 32140 | Dataset: 0-5984901 | Loss: 2.197 | 674 ms/step , 58347.96 GFLOP/s , 533202.4 tokens/s INFO:__main__:2024-10-27 01:04:14 | Epoch: 1 | Step: 32150 | Dataset: 0-5992901 | Loss: 2.222 | 675 ms/step , 58260.78 GFLOP/s , 533053.0 tokens/s INFO:__main__:2024-10-27 01:04:21 | Epoch: 1 | Step: 32160 | Dataset: 0-6000901 | Loss: 2.219 | 674 ms/step , 58303.42 GFLOP/s , 532739.5 tokens/s INFO:__main__:2024-10-27 01:04:29 | Epoch: 1 | Step: 32170 | Dataset: 0-6008901 | Loss: 2.279 | 676 ms/step , 58117.57 GFLOP/s , 533037.4 tokens/s INFO:__main__:2024-10-27 01:04:37 | Epoch: 1 | Step: 32180 | Dataset: 0-6016901 | Loss: 2.222 | 675 ms/step , 58255.82 GFLOP/s , 
532881.4 tokens/s INFO:__main__:2024-10-27 01:04:45 | Epoch: 1 | Step: 32190 | Dataset: 0-6024901 | Loss: 2.289 | 675 ms/step , 58243.01 GFLOP/s , 532448.9 tokens/s INFO:__main__:2024-10-27 01:04:52 | Epoch: 1 | Step: 32200 | Dataset: 0-6032901 | Loss: 2.231 | 677 ms/step , 58027.34 GFLOP/s , 532667.2 tokens/s INFO:__main__:2024-10-27 01:05:00 | Epoch: 1 | Step: 32210 | Dataset: 0-6040901 | Loss: 2.181 | 675 ms/step , 58266.46 GFLOP/s , 531313.1 tokens/s INFO:__main__:2024-10-27 01:05:08 | Epoch: 1 | Step: 32220 | Dataset: 0-6048901 | Loss: 2.129 | 675 ms/step , 58210.81 GFLOP/s , 532367.7 tokens/s INFO:__main__:2024-10-27 01:05:15 | Epoch: 1 | Step: 32230 | Dataset: 0-6056901 | Loss: 2.182 | 675 ms/step , 58233.56 GFLOP/s , 532271.9 tokens/s INFO:__main__:2024-10-27 01:05:23 | Epoch: 1 | Step: 32240 | Dataset: 0-6064901 | Loss: 2.136 | 675 ms/step , 58276.91 GFLOP/s , 532120.0 tokens/s INFO:__main__:2024-10-27 01:05:30 | Epoch: 1 | Step: 32250 | Dataset: 0-6072901 | Loss: 2.160 | 676 ms/step , 58158.37 GFLOP/s , 551530.4 tokens/s INFO:__main__:2024-10-27 01:05:38 | Epoch: 1 | Step: 32260 | Dataset: 0-6080901 | Loss: 2.185 | 676 ms/step , 58192.21 GFLOP/s , 532114.9 tokens/s INFO:__main__:2024-10-27 01:05:46 | Epoch: 1 | Step: 32270 | Dataset: 0-6088901 | Loss: 2.106 | 676 ms/step , 58151.51 GFLOP/s , 531981.4 tokens/s INFO:__main__:2024-10-27 01:05:54 | Epoch: 1 | Step: 32280 | Dataset: 0-6096901 | Loss: 2.195 | 677 ms/step , 58102.39 GFLOP/s , 532149.6 tokens/s INFO:__main__:2024-10-27 01:06:01 | Epoch: 1 | Step: 32290 | Dataset: 0-6104901 | Loss: 2.164 | 677 ms/step , 58081.17 GFLOP/s , 531753.6 tokens/s INFO:__main__:2024-10-27 01:06:09 | Epoch: 1 | Step: 32300 | Dataset: 0-6112901 | Loss: 2.123 | 675 ms/step , 58194.21 GFLOP/s , 531961.5 tokens/s INFO:__main__:2024-10-27 01:06:17 | Epoch: 1 | Step: 32310 | Dataset: 0-6120901 | Loss: 2.118 | 675 ms/step , 58238.71 GFLOP/s , 531759.1 tokens/s INFO:__main__:2024-10-27 01:06:24 | Epoch: 1 | Step: 32320 | Dataset: 
0-6128901 | Loss: 2.190 | 676 ms/step , 58109.19 GFLOP/s , 532070.9 tokens/s INFO:__main__:2024-10-27 01:06:32 | Epoch: 1 | Step: 32330 | Dataset: 0-6136901 | Loss: 2.079 | 675 ms/step , 58235.52 GFLOP/s , 531984.8 tokens/s INFO:__main__:2024-10-27 01:06:40 | Epoch: 1 | Step: 32340 | Dataset: 0-6144901 | Loss: 2.180 | 675 ms/step , 58216.95 GFLOP/s , 532304.6 tokens/s INFO:__main__:2024-10-27 01:06:47 | Epoch: 1 | Step: 32350 | Dataset: 0-6152901 | Loss: 2.219 | 676 ms/step , 58141.31 GFLOP/s , 531524.4 tokens/s INFO:__main__:2024-10-27 01:06:55 | Epoch: 1 | Step: 32360 | Dataset: 0-6160901 | Loss: 2.222 | 674 ms/step , 58289.73 GFLOP/s , 531964.0 tokens/s INFO:__main__:2024-10-27 01:07:03 | Epoch: 1 | Step: 32370 | Dataset: 0-6168901 | Loss: 2.253 | 676 ms/step , 58140.14 GFLOP/s , 532164.7 tokens/s INFO:__main__:2024-10-27 01:07:11 | Epoch: 1 | Step: 32380 | Dataset: 0-6176901 | Loss: 2.123 | 675 ms/step , 58211.59 GFLOP/s , 531935.9 tokens/s INFO:__main__:2024-10-27 01:07:18 | Epoch: 1 | Step: 32390 | Dataset: 0-6184901 | Loss: 2.085 | 675 ms/step , 58275.56 GFLOP/s , 532668.1 tokens/s INFO:__main__:2024-10-27 01:07:26 | Epoch: 1 | Step: 32400 | Dataset: 0-6192901 | Loss: 2.196 | 676 ms/step , 58187.90 GFLOP/s , 531393.4 tokens/s INFO:__main__:2024-10-27 01:07:34 | Epoch: 1 | Step: 32410 | Dataset: 0-6200901 | Loss: 2.117 | 676 ms/step , 58146.85 GFLOP/s , 531692.7 tokens/s INFO:__main__:2024-10-27 01:07:41 | Epoch: 1 | Step: 32420 | Dataset: 0-6208901 | Loss: 2.180 | 677 ms/step , 58090.88 GFLOP/s , 531616.6 tokens/s INFO:__main__:2024-10-27 01:07:49 | Epoch: 1 | Step: 32430 | Dataset: 0-6216901 | Loss: 2.187 | 676 ms/step , 58146.02 GFLOP/s , 532108.6 tokens/s INFO:__main__:2024-10-27 01:07:57 | Epoch: 1 | Step: 32440 | Dataset: 0-6224901 | Loss: 2.225 | 676 ms/step , 58191.69 GFLOP/s , 531859.2 tokens/s INFO:__main__:2024-10-27 01:08:04 | Epoch: 1 | Step: 32450 | Dataset: 0-6232901 | Loss: 2.165 | 675 ms/step , 58208.90 GFLOP/s , 532000.3 tokens/s 
INFO:__main__:2024-10-27 01:08:12 | Epoch: 1 | Step: 32460 | Dataset: 0-6240901 | Loss: 2.180 | 675 ms/step , 58213.50 GFLOP/s , 531904.9 tokens/s INFO:__main__:2024-10-27 01:08:20 | Epoch: 1 | Step: 32470 | Dataset: 0-6248901 | Loss: 2.202 | 676 ms/step , 58181.27 GFLOP/s , 532426.2 tokens/s INFO:__main__:2024-10-27 01:08:28 | Epoch: 1 | Step: 32480 | Dataset: 0-6256901 | Loss: 2.141 | 676 ms/step , 58167.44 GFLOP/s , 532041.2 tokens/s INFO:__main__:2024-10-27 01:08:35 | Epoch: 1 | Step: 32490 | Dataset: 0-6264901 | Loss: 2.132 | 676 ms/step , 58166.18 GFLOP/s , 531895.2 tokens/s INFO:__main__:2024-10-27 01:08:43 | Epoch: 1 | Step: 32500 | Dataset: 0-6272901 | Loss: 2.190 | 676 ms/step , 58183.91 GFLOP/s , 532179.3 tokens/s INFO:__main__:2024-10-27 01:08:51 | Epoch: 1 | Step: 32510 | Dataset: 0-6280901 | Loss: 2.267 | 675 ms/step , 58245.97 GFLOP/s , 532355.1 tokens/s INFO:__main__:2024-10-27 01:08:58 | Epoch: 1 | Step: 32520 | Dataset: 0-6288901 | Loss: 2.176 | 675 ms/step , 58224.19 GFLOP/s , 532421.8 tokens/s INFO:__main__:2024-10-27 01:09:06 | Epoch: 1 | Step: 32530 | Dataset: 0-6296901 | Loss: 2.229 | 676 ms/step , 58182.92 GFLOP/s , 532089.5 tokens/s INFO:__main__:2024-10-27 01:09:14 | Epoch: 1 | Step: 32540 | Dataset: 0-6304901 | Loss: 2.155 | 675 ms/step , 58220.24 GFLOP/s , 532037.6 tokens/s INFO:__main__:2024-10-27 01:09:21 | Epoch: 1 | Step: 32550 | Dataset: 0-6312901 | Loss: 2.183 | 675 ms/step , 58197.53 GFLOP/s , 531760.5 tokens/s INFO:__main__:2024-10-27 01:09:29 | Epoch: 1 | Step: 32560 | Dataset: 0-6320901 | Loss: 2.222 | 676 ms/step , 58183.81 GFLOP/s , 532119.0 tokens/s INFO:__main__:2024-10-27 01:09:37 | Epoch: 1 | Step: 32570 | Dataset: 0-6328901 | Loss: 2.184 | 675 ms/step , 58258.70 GFLOP/s , 531954.2 tokens/s INFO:__main__:2024-10-27 01:09:45 | Epoch: 1 | Step: 32580 | Dataset: 0-6336901 | Loss: 2.194 | 676 ms/step , 58188.66 GFLOP/s , 532227.4 tokens/s INFO:__main__:2024-10-27 01:09:52 | Epoch: 1 | Step: 32590 | Dataset: 0-6344901 | Loss: 
2.221 | 676 ms/step , 58175.89 GFLOP/s , 532045.1 tokens/s INFO:__main__:2024-10-27 01:10:00 | Epoch: 1 | Step: 32600 | Dataset: 0-6352901 | Loss: 2.260 | 675 ms/step , 58254.41 GFLOP/s , 531779.3 tokens/s INFO:__main__:2024-10-27 01:10:08 | Epoch: 1 | Step: 32610 | Dataset: 0-6360901 | Loss: 2.238 | 674 ms/step , 58281.74 GFLOP/s , 532355.0 tokens/s INFO:__main__:2024-10-27 01:10:15 | Epoch: 1 | Step: 32620 | Dataset: 0-6368901 | Loss: 2.291 | 673 ms/step , 58370.33 GFLOP/s , 532619.8 tokens/s INFO:__main__:2024-10-27 01:10:23 | Epoch: 1 | Step: 32630 | Dataset: 0-6376901 | Loss: 2.153 | 675 ms/step , 58253.49 GFLOP/s , 532487.2 tokens/s INFO:__main__:2024-10-27 01:10:31 | Epoch: 1 | Step: 32640 | Dataset: 0-6384901 | Loss: 2.104 | 675 ms/step , 58273.73 GFLOP/s , 532192.1 tokens/s INFO:__main__:2024-10-27 01:10:38 | Epoch: 1 | Step: 32650 | Dataset: 0-6392901 | Loss: 2.121 | 674 ms/step , 58283.69 GFLOP/s , 531884.7 tokens/s INFO:__main__:2024-10-27 01:10:46 | Epoch: 1 | Step: 32660 | Dataset: 0-6400901 | Loss: 2.351 | 677 ms/step , 58043.79 GFLOP/s , 530713.6 tokens/s INFO:__main__:2024-10-27 01:10:54 | Epoch: 1 | Step: 32670 | Dataset: 0-6408901 | Loss: 2.263 | 674 ms/step , 58309.30 GFLOP/s , 532487.6 tokens/s INFO:__main__:2024-10-27 01:11:01 | Epoch: 1 | Step: 32680 | Dataset: 0-6416901 | Loss: 2.185 | 676 ms/step , 58178.82 GFLOP/s , 532790.8 tokens/s INFO:__main__:2024-10-27 01:11:09 | Epoch: 1 | Step: 32690 | Dataset: 0-6424901 | Loss: 2.176 | 676 ms/step , 58162.50 GFLOP/s , 532086.4 tokens/s INFO:__main__:2024-10-27 01:11:17 | Epoch: 1 | Step: 32700 | Dataset: 0-6432901 | Loss: 2.103 | 675 ms/step , 58194.92 GFLOP/s , 532460.1 tokens/s INFO:__main__:2024-10-27 01:11:25 | Epoch: 1 | Step: 32710 | Dataset: 0-6440901 | Loss: 2.135 | 675 ms/step , 58236.83 GFLOP/s , 530541.9 tokens/s INFO:__main__:2024-10-27 01:11:32 | Epoch: 1 | Step: 32720 | Dataset: 0-6448901 | Loss: 2.072 | 674 ms/step , 58354.37 GFLOP/s , 532587.4 tokens/s INFO:__main__:2024-10-27 
01:11:40 | Epoch: 1 | Step: 32730 | Dataset: 0-6456901 | Loss: 2.120 | 674 ms/step , 58290.26 GFLOP/s , 532714.5 tokens/s INFO:__main__:2024-10-27 01:11:48 | Epoch: 1 | Step: 32740 | Dataset: 0-6464901 | Loss: 2.098 | 674 ms/step , 58316.75 GFLOP/s , 532932.8 tokens/s INFO:__main__:2024-10-27 01:11:55 | Epoch: 1 | Step: 32750 | Dataset: 0-6472901 | Loss: 2.036 | 676 ms/step , 58190.12 GFLOP/s , 532381.0 tokens/s INFO:__main__:2024-10-27 01:12:03 | Epoch: 1 | Step: 32760 | Dataset: 0-6480901 | Loss: 2.046 | 674 ms/step , 58327.03 GFLOP/s , 532499.3 tokens/s INFO:__main__:2024-10-27 01:12:11 | Epoch: 1 | Step: 32770 | Dataset: 0-6488901 | Loss: 2.049 | 676 ms/step , 58160.52 GFLOP/s , 532252.2 tokens/s INFO:__main__:2024-10-27 01:12:18 | Epoch: 1 | Step: 32780 | Dataset: 0-6496901 | Loss: 2.049 | 676 ms/step , 58188.82 GFLOP/s , 532545.7 tokens/s INFO:__main__:2024-10-27 01:12:26 | Epoch: 1 | Step: 32790 | Dataset: 0-6504901 | Loss: 2.009 | 675 ms/step , 58239.51 GFLOP/s , 532072.0 tokens/s INFO:__main__:2024-10-27 01:12:34 | Epoch: 1 | Step: 32800 | Dataset: 0-6512901 | Loss: 2.009 | 675 ms/step , 58203.16 GFLOP/s , 532197.4 tokens/s INFO:__main__:2024-10-27 01:12:42 | Epoch: 1 | Step: 32810 | Dataset: 0-6520901 | Loss: 1.995 | 675 ms/step , 58249.33 GFLOP/s , 532257.2 tokens/s INFO:__main__:2024-10-27 01:12:49 | Epoch: 1 | Step: 32820 | Dataset: 0-6528901 | Loss: 2.230 | 675 ms/step , 58195.48 GFLOP/s , 532437.3 tokens/s INFO:__main__:2024-10-27 01:12:57 | Epoch: 1 | Step: 32830 | Dataset: 0-6536901 | Loss: 1.989 | 674 ms/step , 58343.15 GFLOP/s , 532413.7 tokens/s INFO:__main__:2024-10-27 01:13:05 | Epoch: 1 | Step: 32840 | Dataset: 0-6544901 | Loss: 1.925 | 675 ms/step , 58201.74 GFLOP/s , 531862.3 tokens/s INFO:__main__:2024-10-27 01:13:12 | Epoch: 1 | Step: 32850 | Dataset: 0-6552901 | Loss: 1.888 | 675 ms/step , 58252.23 GFLOP/s , 531967.6 tokens/s INFO:__main__:2024-10-27 01:13:20 | Epoch: 1 | Step: 32860 | Dataset: 0-6560901 | Loss: 1.874 | 676 ms/step , 
58180.25 GFLOP/s , 531826.8 tokens/s INFO:__main__:2024-10-27 01:13:28 | Epoch: 1 | Step: 32870 | Dataset: 0-6568901 | Loss: 1.817 | 677 ms/step , 58092.39 GFLOP/s , 531807.4 tokens/s INFO:__main__:2024-10-27 01:13:35 | Epoch: 1 | Step: 32880 | Dataset: 0-6576901 | Loss: 1.836 | 676 ms/step , 58120.20 GFLOP/s , 531397.9 tokens/s INFO:__main__:2024-10-27 01:13:43 | Epoch: 1 | Step: 32890 | Dataset: 0-6584901 | Loss: 1.828 | 675 ms/step , 58263.02 GFLOP/s , 531802.9 tokens/s INFO:__main__:2024-10-27 01:13:51 | Epoch: 1 | Step: 32900 | Dataset: 0-6592901 | Loss: 1.850 | 674 ms/step , 58309.35 GFLOP/s , 532369.1 tokens/s INFO:__main__:2024-10-27 01:13:59 | Epoch: 1 | Step: 32910 | Dataset: 0-6600901 | Loss: 1.817 | 675 ms/step , 58229.46 GFLOP/s , 532323.3 tokens/s INFO:__main__:2024-10-27 01:14:06 | Epoch: 1 | Step: 32920 | Dataset: 0-6608901 | Loss: 1.851 | 676 ms/step , 58149.13 GFLOP/s , 532165.5 tokens/s INFO:__main__:2024-10-27 01:14:14 | Epoch: 1 | Step: 32930 | Dataset: 0-6616901 | Loss: 1.831 | 676 ms/step , 58177.17 GFLOP/s , 531473.3 tokens/s INFO:__main__:2024-10-27 01:14:22 | Epoch: 1 | Step: 32940 | Dataset: 0-6624901 | Loss: 1.812 | 676 ms/step , 58169.28 GFLOP/s , 531845.8 tokens/s INFO:__main__:2024-10-27 01:14:29 | Epoch: 1 | Step: 32950 | Dataset: 0-6632901 | Loss: 1.783 | 675 ms/step , 58224.09 GFLOP/s , 531496.1 tokens/s INFO:__main__:2024-10-27 01:14:37 | Epoch: 1 | Step: 32960 | Dataset: 0-6640901 | Loss: 1.814 | 675 ms/step , 58237.85 GFLOP/s , 531575.6 tokens/s INFO:__main__:2024-10-27 01:14:45 | Epoch: 1 | Step: 32970 | Dataset: 0-6648901 | Loss: 1.827 | 675 ms/step , 58259.39 GFLOP/s , 532189.0 tokens/s INFO:__main__:2024-10-27 01:14:52 | Epoch: 1 | Step: 32980 | Dataset: 0-6656901 | Loss: 1.765 | 676 ms/step , 58171.52 GFLOP/s , 532241.4 tokens/s INFO:__main__:2024-10-27 01:15:00 | Epoch: 1 | Step: 32990 | Dataset: 0-6664901 | Loss: 1.768 | 676 ms/step , 58124.77 GFLOP/s , 531829.0 tokens/s INFO:__main__:2024-10-27 01:15:07 | Validation | 
Step: 33000 | Val_loss: 2.077 | Best_val_loss: 2.1773 INFO:__main__:2024-10-27 01:15:07 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_011507_step_33000.pt` INFO:__main__:2024-10-27 01:15:09 | Epoch: 1 | Step: 33000 | Dataset: 0-6672901 | Loss: 2.503 | 674 ms/step , 58361.53 GFLOP/s , 478454.0 tokens/s INFO:__main__:2024-10-27 01:15:16 | Epoch: 1 | Step: 33010 | Dataset: 0-6680901 | Loss: 2.213 | 676 ms/step , 58150.84 GFLOP/s , 531516.5 tokens/s INFO:__main__:2024-10-27 01:15:24 | Epoch: 1 | Step: 33020 | Dataset: 0-6688901 | Loss: 2.196 | 675 ms/step , 58260.25 GFLOP/s , 532293.4 tokens/s INFO:__main__:2024-10-27 01:15:32 | Epoch: 1 | Step: 33030 | Dataset: 0-6696901 | Loss: 2.165 | 676 ms/step , 58129.15 GFLOP/s , 531182.9 tokens/s INFO:__main__:2024-10-27 01:15:40 | Epoch: 1 | Step: 33040 | Dataset: 0-6704901 | Loss: 2.169 | 675 ms/step , 58194.80 GFLOP/s , 531289.2 tokens/s INFO:__main__:2024-10-27 01:15:47 | Epoch: 1 | Step: 33050 | Dataset: 0-6712901 | Loss: 2.194 | 675 ms/step , 58245.39 GFLOP/s , 531372.3 tokens/s INFO:__main__:2024-10-27 01:15:55 | Epoch: 1 | Step: 33060 | Dataset: 0-6720901 | Loss: 2.070 | 676 ms/step , 58146.28 GFLOP/s , 531143.5 tokens/s INFO:__main__:2024-10-27 01:16:03 | Epoch: 1 | Step: 33070 | Dataset: 0-6728901 | Loss: 2.025 | 676 ms/step , 58139.96 GFLOP/s , 530202.5 tokens/s INFO:__main__:2024-10-27 01:16:10 | Epoch: 1 | Step: 33080 | Dataset: 0-6736901 | Loss: 2.078 | 676 ms/step , 58183.78 GFLOP/s , 532027.4 tokens/s INFO:__main__:2024-10-27 01:16:18 | Epoch: 1 | Step: 33090 | Dataset: 0-6744901 | Loss: 2.095 | 675 ms/step , 58214.42 GFLOP/s , 532665.0 tokens/s INFO:__main__:2024-10-27 01:16:26 | Epoch: 1 | Step: 33100 | Dataset: 0-6752901 | Loss: 2.102 | 677 ms/step , 58070.72 GFLOP/s , 531954.1 tokens/s INFO:__main__:2024-10-27 01:16:33 | Epoch: 1 | Step: 33110 | Dataset: 0-6760901 | Loss: 2.001 | 676 ms/step , 58134.00 GFLOP/s , 530572.1 tokens/s INFO:__main__:2024-10-27 01:16:41 | Epoch: 1 | Step: 
33120 | Dataset: 0-6768901 | Loss: 2.008 | 676 ms/step , 58137.19 GFLOP/s , 531060.9 tokens/s INFO:__main__:2024-10-27 01:16:49 | Epoch: 1 | Step: 33130 | Dataset: 0-6776901 | Loss: 2.122 | 677 ms/step , 58041.92 GFLOP/s , 531099.9 tokens/s INFO:__main__:2024-10-27 01:16:57 | Epoch: 1 | Step: 33140 | Dataset: 0-6784901 | Loss: 2.054 | 677 ms/step , 58099.72 GFLOP/s , 531190.4 tokens/s INFO:__main__:2024-10-27 01:17:04 | Epoch: 1 | Step: 33150 | Dataset: 0-6792901 | Loss: 1.967 | 676 ms/step , 58115.04 GFLOP/s , 531010.9 tokens/s INFO:__main__:2024-10-27 01:17:12 | Epoch: 1 | Step: 33160 | Dataset: 0-6800901 | Loss: 1.944 | 676 ms/step , 58147.58 GFLOP/s , 531538.0 tokens/s INFO:__main__:2024-10-27 01:17:20 | Epoch: 1 | Step: 33170 | Dataset: 0-6808901 | Loss: 1.862 | 677 ms/step , 58030.66 GFLOP/s , 530313.3 tokens/s INFO:__main__:2024-10-27 01:17:27 | Epoch: 1 | Step: 33180 | Dataset: 0-6816901 | Loss: 1.793 | 676 ms/step , 58116.87 GFLOP/s , 529978.4 tokens/s INFO:__main__:2024-10-27 01:17:35 | Epoch: 1 | Step: 33190 | Dataset: 0-6824901 | Loss: 1.796 | 675 ms/step , 58225.43 GFLOP/s , 531355.4 tokens/s INFO:__main__:2024-10-27 01:17:43 | Epoch: 1 | Step: 33200 | Dataset: 0-6832901 | Loss: 1.793 | 675 ms/step , 58235.20 GFLOP/s , 532129.9 tokens/s INFO:__main__:2024-10-27 01:17:51 | Epoch: 1 | Step: 33210 | Dataset: 0-6840901 | Loss: 1.794 | 676 ms/step , 58192.18 GFLOP/s , 532316.3 tokens/s INFO:__main__:2024-10-27 01:17:58 | Epoch: 1 | Step: 33220 | Dataset: 0-6848901 | Loss: 1.773 | 676 ms/step , 58117.51 GFLOP/s , 531819.8 tokens/s INFO:__main__:2024-10-27 01:18:06 | Epoch: 1 | Step: 33230 | Dataset: 0-6856901 | Loss: 1.804 | 676 ms/step , 58187.86 GFLOP/s , 531846.9 tokens/s INFO:__main__:2024-10-27 01:18:14 | Epoch: 1 | Step: 33240 | Dataset: 0-6864901 | Loss: 1.796 | 675 ms/step , 58212.38 GFLOP/s , 531897.7 tokens/s INFO:__main__:2024-10-27 01:18:21 | Epoch: 1 | Step: 33250 | Dataset: 0-6872901 | Loss: 2.450 | 675 ms/step , 58273.99 GFLOP/s , 532269.2 
tokens/s INFO:__main__:2024-10-27 01:18:29 | Epoch: 1 | Step: 33260 | Dataset: 0-6880901 | Loss: 2.355 | 675 ms/step , 58266.46 GFLOP/s , 532434.5 tokens/s INFO:__main__:2024-10-27 01:18:37 | Epoch: 1 | Step: 33270 | Dataset: 0-6888901 | Loss: 2.268 | 675 ms/step , 58239.24 GFLOP/s , 532850.8 tokens/s INFO:__main__:2024-10-27 01:18:44 | Epoch: 1 | Step: 33280 | Dataset: 0-6896901 | Loss: 2.278 | 675 ms/step , 58221.62 GFLOP/s , 532638.4 tokens/s INFO:__main__:2024-10-27 01:18:52 | Epoch: 1 | Step: 33290 | Dataset: 0-6904901 | Loss: 2.302 | 675 ms/step , 58220.87 GFLOP/s , 532863.4 tokens/s INFO:__main__:2024-10-27 01:19:00 | Epoch: 1 | Step: 33300 | Dataset: 0-6912901 | Loss: 2.240 | 675 ms/step , 58234.66 GFLOP/s , 532689.7 tokens/s INFO:__main__:2024-10-27 01:19:08 | Epoch: 1 | Step: 33310 | Dataset: 0-6920901 | Loss: 2.256 | 676 ms/step , 58148.74 GFLOP/s , 533406.3 tokens/s INFO:__main__:2024-10-27 01:19:15 | Epoch: 1 | Step: 33320 | Dataset: 0-6928901 | Loss: 2.255 | 677 ms/step , 58077.35 GFLOP/s , 532117.0 tokens/s INFO:__main__:2024-10-27 01:19:23 | Epoch: 1 | Step: 33330 | Dataset: 0-6936901 | Loss: 2.261 | 678 ms/step , 58010.16 GFLOP/s , 531575.4 tokens/s INFO:__main__:2024-10-27 01:19:31 | Epoch: 1 | Step: 33340 | Dataset: 0-6944901 | Loss: 2.244 | 675 ms/step , 58238.81 GFLOP/s , 531042.5 tokens/s INFO:__main__:2024-10-27 01:19:38 | Epoch: 1 | Step: 33350 | Dataset: 0-6952901 | Loss: 2.190 | 676 ms/step , 58107.12 GFLOP/s , 532100.2 tokens/s INFO:__main__:2024-10-27 01:19:46 | Epoch: 1 | Step: 33360 | Dataset: 0-6960901 | Loss: 2.223 | 675 ms/step , 58244.50 GFLOP/s , 532261.7 tokens/s INFO:__main__:2024-10-27 01:19:54 | Epoch: 1 | Step: 33370 | Dataset: 0-6968901 | Loss: 2.232 | 675 ms/step , 58264.88 GFLOP/s , 532191.5 tokens/s INFO:__main__:2024-10-27 01:20:01 | Epoch: 1 | Step: 33380 | Dataset: 0-6976901 | Loss: 2.235 | 675 ms/step , 58274.57 GFLOP/s , 532415.3 tokens/s INFO:__main__:2024-10-27 01:20:09 | Epoch: 1 | Step: 33390 | Dataset: 0-6984901 
| Loss: 2.235 | 675 ms/step , 58201.60 GFLOP/s , 532538.3 tokens/s INFO:__main__:2024-10-27 01:20:17 | Epoch: 1 | Step: 33400 | Dataset: 0-6992901 | Loss: 2.274 | 675 ms/step , 58214.96 GFLOP/s , 532618.9 tokens/s INFO:__main__:2024-10-27 01:20:24 | Epoch: 1 | Step: 33410 | Dataset: 0-7000901 | Loss: 2.251 | 675 ms/step , 58272.20 GFLOP/s , 532780.6 tokens/s INFO:__main__:2024-10-27 01:20:32 | Epoch: 1 | Step: 33420 | Dataset: 0-7008901 | Loss: 2.211 | 675 ms/step , 58212.71 GFLOP/s , 532679.7 tokens/s INFO:__main__:2024-10-27 01:20:40 | Epoch: 1 | Step: 33430 | Dataset: 0-7016901 | Loss: 2.294 | 675 ms/step , 58193.23 GFLOP/s , 532408.0 tokens/s INFO:__main__:2024-10-27 01:20:48 | Epoch: 1 | Step: 33440 | Dataset: 0-7024901 | Loss: 2.250 | 674 ms/step , 58341.56 GFLOP/s , 532342.4 tokens/s INFO:__main__:2024-10-27 01:20:55 | Epoch: 1 | Step: 33450 | Dataset: 0-7032901 | Loss: 2.256 | 676 ms/step , 58141.98 GFLOP/s , 532956.6 tokens/s INFO:__main__:2024-10-27 01:21:03 | Epoch: 1 | Step: 33460 | Dataset: 0-7040901 | Loss: 2.200 | 675 ms/step , 58208.97 GFLOP/s , 532553.0 tokens/s INFO:__main__:2024-10-27 01:21:11 | Epoch: 1 | Step: 33470 | Dataset: 0-7048901 | Loss: 2.226 | 674 ms/step , 58281.82 GFLOP/s , 532967.5 tokens/s INFO:__main__:2024-10-27 01:21:18 | Epoch: 1 | Step: 33480 | Dataset: 0-7056901 | Loss: 2.217 | 675 ms/step , 58197.34 GFLOP/s , 532402.7 tokens/s INFO:__main__:2024-10-27 01:21:26 | Epoch: 1 | Step: 33490 | Dataset: 0-7064901 | Loss: 2.283 | 675 ms/step , 58220.69 GFLOP/s , 532653.8 tokens/s INFO:__main__:2024-10-27 01:21:34 | Epoch: 1 | Step: 33500 | Dataset: 0-7072901 | Loss: 2.263 | 675 ms/step , 58212.65 GFLOP/s , 532291.5 tokens/s INFO:__main__:2024-10-27 01:21:41 | Epoch: 1 | Step: 33510 | Dataset: 0-7080901 | Loss: 2.226 | 674 ms/step , 58280.20 GFLOP/s , 532321.1 tokens/s INFO:__main__:2024-10-27 01:21:49 | Epoch: 1 | Step: 33520 | Dataset: 0-7088901 | Loss: 2.252 | 676 ms/step , 58186.36 GFLOP/s , 532680.3 tokens/s 
INFO:__main__:2024-10-27 01:21:57 | Epoch: 1 | Step: 33530 | Dataset: 0-7096901 | Loss: 2.192 | 677 ms/step , 58037.15 GFLOP/s , 531438.2 tokens/s INFO:__main__:2024-10-27 01:22:04 | Epoch: 1 | Step: 33540 | Dataset: 0-7104901 | Loss: 2.161 | 675 ms/step , 58272.86 GFLOP/s , 532809.0 tokens/s INFO:__main__:2024-10-27 01:22:12 | Epoch: 1 | Step: 33550 | Dataset: 0-7112901 | Loss: 2.168 | 674 ms/step , 58316.85 GFLOP/s , 532315.0 tokens/s INFO:__main__:2024-10-27 01:22:20 | Epoch: 1 | Step: 33560 | Dataset: 0-7120901 | Loss: 2.226 | 674 ms/step , 58313.60 GFLOP/s , 532883.0 tokens/s INFO:__main__:2024-10-27 01:22:28 | Epoch: 1 | Step: 33570 | Dataset: 0-7128901 | Loss: 1.895 | 675 ms/step , 58231.32 GFLOP/s , 532098.1 tokens/s INFO:__main__:2024-10-27 01:22:35 | Epoch: 1 | Step: 33580 | Dataset: 0-7136901 | Loss: 1.764 | 676 ms/step , 58172.14 GFLOP/s , 532221.6 tokens/s INFO:__main__:2024-10-27 01:22:43 | Epoch: 1 | Step: 33590 | Dataset: 0-7144901 | Loss: 1.759 | 674 ms/step , 58298.14 GFLOP/s , 532254.1 tokens/s INFO:__main__:2024-10-27 01:22:51 | Epoch: 1 | Step: 33600 | Dataset: 0-7152901 | Loss: 1.736 | 675 ms/step , 58233.76 GFLOP/s , 532390.5 tokens/s INFO:__main__:2024-10-27 01:22:58 | Epoch: 1 | Step: 33610 | Dataset: 0-7160901 | Loss: 1.726 | 675 ms/step , 58195.45 GFLOP/s , 532371.9 tokens/s INFO:__main__:2024-10-27 01:23:06 | Epoch: 1 | Step: 33620 | Dataset: 0-7168901 | Loss: 1.694 | 676 ms/step , 58136.26 GFLOP/s , 531783.6 tokens/s INFO:__main__:2024-10-27 01:23:14 | Epoch: 1 | Step: 33630 | Dataset: 0-7176901 | Loss: 1.716 | 676 ms/step , 58170.63 GFLOP/s , 532263.2 tokens/s INFO:__main__:2024-10-27 01:23:21 | Epoch: 1 | Step: 33640 | Dataset: 0-7184901 | Loss: 1.670 | 675 ms/step , 58235.71 GFLOP/s , 532294.9 tokens/s INFO:__main__:2024-10-27 01:23:29 | Epoch: 1 | Step: 33650 | Dataset: 0-7192901 | Loss: 1.685 | 675 ms/step , 58225.27 GFLOP/s , 531851.2 tokens/s INFO:__main__:2024-10-27 01:23:37 | Epoch: 1 | Step: 33660 | Dataset: 0-7200901 | Loss: 
2.297 | 675 ms/step , 58273.08 GFLOP/s , 531649.4 tokens/s INFO:__main__:2024-10-27 01:23:45 | Epoch: 1 | Step: 33670 | Dataset: 0-7208901 | Loss: 2.313 | 676 ms/step , 58186.90 GFLOP/s , 532789.4 tokens/s INFO:__main__:2024-10-27 01:23:52 | Epoch: 1 | Step: 33680 | Dataset: 0-7216901 | Loss: 2.218 | 676 ms/step , 58183.70 GFLOP/s , 532352.1 tokens/s INFO:__main__:2024-10-27 01:24:00 | Epoch: 1 | Step: 33690 | Dataset: 0-7224901 | Loss: 2.181 | 675 ms/step , 58216.77 GFLOP/s , 532848.3 tokens/s INFO:__main__:2024-10-27 01:24:08 | Epoch: 1 | Step: 33700 | Dataset: 0-7232901 | Loss: 2.195 | 677 ms/step , 58094.33 GFLOP/s , 532130.5 tokens/s INFO:__main__:2024-10-27 01:24:15 | Epoch: 1 | Step: 33710 | Dataset: 0-7240901 | Loss: 2.212 | 675 ms/step , 58212.09 GFLOP/s , 533123.7 tokens/s INFO:__main__:2024-10-27 01:24:23 | Epoch: 1 | Step: 33720 | Dataset: 0-7248901 | Loss: 2.238 | 675 ms/step , 58211.69 GFLOP/s , 532563.3 tokens/s INFO:__main__:2024-10-27 01:24:31 | Epoch: 1 | Step: 33730 | Dataset: 0-7256901 | Loss: 2.223 | 675 ms/step , 58267.70 GFLOP/s , 532568.7 tokens/s INFO:__main__:2024-10-27 01:24:38 | Epoch: 1 | Step: 33740 | Dataset: 0-7264901 | Loss: 2.130 | 674 ms/step , 58299.36 GFLOP/s , 532643.7 tokens/s INFO:__main__:2024-10-27 01:24:46 | Epoch: 1 | Step: 33750 | Dataset: 0-7272901 | Loss: 2.187 | 677 ms/step , 58070.12 GFLOP/s , 532762.4 tokens/s INFO:__main__:2024-10-27 01:24:54 | Epoch: 1 | Step: 33760 | Dataset: 0-7280901 | Loss: 2.200 | 677 ms/step , 58064.40 GFLOP/s , 532235.8 tokens/s INFO:__main__:2024-10-27 01:25:01 | Epoch: 1 | Step: 33770 | Dataset: 0-7288901 | Loss: 2.138 | 676 ms/step , 58170.18 GFLOP/s , 532283.9 tokens/s INFO:__main__:2024-10-27 01:25:09 | Epoch: 1 | Step: 33780 | Dataset: 0-7296901 | Loss: 2.238 | 675 ms/step , 58193.44 GFLOP/s , 532400.9 tokens/s INFO:__main__:2024-10-27 01:25:17 | Epoch: 1 | Step: 33790 | Dataset: 0-7304901 | Loss: 2.210 | 675 ms/step , 58210.67 GFLOP/s , 532230.5 tokens/s INFO:__main__:2024-10-27 
01:25:25 | Epoch: 1 | Step: 33800 | Dataset: 0-7312901 | Loss: 2.176 | 676 ms/step , 58112.20 GFLOP/s , 532697.4 tokens/s INFO:__main__:2024-10-27 01:25:32 | Epoch: 1 | Step: 33810 | Dataset: 0-7320901 | Loss: 2.190 | 675 ms/step , 58215.80 GFLOP/s , 532190.7 tokens/s INFO:__main__:2024-10-27 01:25:40 | Epoch: 1 | Step: 33820 | Dataset: 0-7328901 | Loss: 1.900 | 676 ms/step , 58175.10 GFLOP/s , 532482.6 tokens/s INFO:__main__:2024-10-27 01:25:48 | Epoch: 1 | Step: 33830 | Dataset: 0-7336901 | Loss: 1.816 | 676 ms/step , 58182.10 GFLOP/s , 532193.1 tokens/s INFO:__main__:2024-10-27 01:25:55 | Epoch: 1 | Step: 33840 | Dataset: 0-7344901 | Loss: 1.805 | 675 ms/step , 58205.43 GFLOP/s , 531539.5 tokens/s INFO:__main__:2024-10-27 01:26:03 | Epoch: 1 | Step: 33850 | Dataset: 0-7352901 | Loss: 1.775 | 676 ms/step , 58124.21 GFLOP/s , 531920.3 tokens/s INFO:__main__:2024-10-27 01:26:11 | Epoch: 1 | Step: 33860 | Dataset: 0-7360901 | Loss: 1.769 | 676 ms/step , 58109.55 GFLOP/s , 531559.3 tokens/s INFO:__main__:2024-10-27 01:26:18 | Epoch: 1 | Step: 33870 | Dataset: 0-7368901 | Loss: 1.800 | 676 ms/step , 58143.04 GFLOP/s , 532172.7 tokens/s INFO:__main__:2024-10-27 01:26:26 | Epoch: 1 | Step: 33880 | Dataset: 0-7376901 | Loss: 1.793 | 675 ms/step , 58241.96 GFLOP/s , 531536.0 tokens/s INFO:__main__:2024-10-27 01:26:34 | Epoch: 1 | Step: 33890 | Dataset: 0-7384901 | Loss: 1.791 | 676 ms/step , 58189.27 GFLOP/s , 531832.7 tokens/s INFO:__main__:2024-10-27 01:26:42 | Epoch: 1 | Step: 33900 | Dataset: 0-7392901 | Loss: 1.794 | 676 ms/step , 58171.02 GFLOP/s , 531672.7 tokens/s INFO:__main__:2024-10-27 01:26:49 | Epoch: 1 | Step: 33910 | Dataset: 0-7400901 | Loss: 2.358 | 676 ms/step , 58168.18 GFLOP/s , 532443.1 tokens/s INFO:__main__:2024-10-27 01:26:57 | Epoch: 1 | Step: 33920 | Dataset: 0-7408901 | Loss: 2.277 | 676 ms/step , 58171.60 GFLOP/s , 532388.6 tokens/s INFO:__main__:2024-10-27 01:27:05 | Epoch: 1 | Step: 33930 | Dataset: 0-7416901 | Loss: 2.298 | 675 ms/step , 
58203.59 GFLOP/s , 532345.0 tokens/s INFO:__main__:2024-10-27 01:27:12 | Epoch: 1 | Step: 33940 | Dataset: 0-7424901 | Loss: 2.191 | 675 ms/step , 58267.87 GFLOP/s , 532430.4 tokens/s INFO:__main__:2024-10-27 01:27:20 | Epoch: 1 | Step: 33950 | Dataset: 0-7432901 | Loss: 2.223 | 675 ms/step , 58215.65 GFLOP/s , 532130.2 tokens/s INFO:__main__:2024-10-27 01:27:28 | Epoch: 1 | Step: 33960 | Dataset: 0-7440901 | Loss: 2.239 | 676 ms/step , 58192.05 GFLOP/s , 532736.2 tokens/s INFO:__main__:2024-10-27 01:27:35 | Epoch: 1 | Step: 33970 | Dataset: 0-7448901 | Loss: 2.196 | 675 ms/step , 58252.43 GFLOP/s , 532243.2 tokens/s INFO:__main__:2024-10-27 01:27:43 | Epoch: 1 | Step: 33980 | Dataset: 0-7456901 | Loss: 2.215 | 675 ms/step , 58259.21 GFLOP/s , 532676.1 tokens/s INFO:__main__:2024-10-27 01:27:51 | Epoch: 1 | Step: 33990 | Dataset: 0-7464901 | Loss: 2.311 | 674 ms/step , 58363.41 GFLOP/s , 532403.8 tokens/s INFO:__main__:2024-10-27 01:27:58 | Validation | Step: 34000 | Val_loss: 1.944 | Best_val_loss: 2.0767 INFO:__main__:2024-10-27 01:27:58 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_012758_step_34000.pt` INFO:__main__:2024-10-27 01:27:59 | Epoch: 1 | Step: 34000 | Dataset: 0-7472901 | Loss: 2.194 | 673 ms/step , 58447.25 GFLOP/s , 479326.8 tokens/s INFO:__main__:2024-10-27 01:28:07 | Epoch: 1 | Step: 34010 | Dataset: 0-7480901 | Loss: 2.197 | 675 ms/step , 58273.85 GFLOP/s , 533042.6 tokens/s INFO:__main__:2024-10-27 01:28:15 | Epoch: 1 | Step: 34020 | Dataset: 0-7488901 | Loss: 2.232 | 675 ms/step , 58206.62 GFLOP/s , 532932.4 tokens/s INFO:__main__:2024-10-27 01:28:22 | Epoch: 1 | Step: 34030 | Dataset: 0-7496901 | Loss: 2.231 | 675 ms/step , 58232.93 GFLOP/s , 532835.4 tokens/s INFO:__main__:2024-10-27 01:28:30 | Epoch: 1 | Step: 34040 | Dataset: 0-7504901 | Loss: 2.230 | 674 ms/step , 58308.53 GFLOP/s , 532477.7 tokens/s INFO:__main__:2024-10-27 01:28:38 | Epoch: 1 | Step: 34050 | Dataset: 0-7512901 | Loss: 2.268 | 679 ms/step , 
57899.54 GFLOP/s , 532722.4 tokens/s INFO:__main__:2024-10-27 01:28:45 | Epoch: 1 | Step: 34060 | Dataset: 0-7520901 | Loss: 2.249 | 675 ms/step , 58251.01 GFLOP/s , 532343.3 tokens/s INFO:__main__:2024-10-27 01:28:53 | Epoch: 1 | Step: 34070 | Dataset: 0-7528901 | Loss: 2.196 | 674 ms/step , 58317.21 GFLOP/s , 533109.6 tokens/s INFO:__main__:2024-10-27 01:29:01 | Epoch: 1 | Step: 34080 | Dataset: 0-7536901 | Loss: 2.213 | 675 ms/step , 58269.81 GFLOP/s , 532395.2 tokens/s INFO:__main__:2024-10-27 01:29:09 | Epoch: 1 | Step: 34090 | Dataset: 0-7544901 | Loss: 2.149 | 676 ms/step , 58174.93 GFLOP/s , 532380.9 tokens/s INFO:__main__:2024-10-27 01:29:16 | Epoch: 1 | Step: 34100 | Dataset: 0-7552901 | Loss: 2.196 | 675 ms/step , 58256.33 GFLOP/s , 532085.9 tokens/s INFO:__main__:2024-10-27 01:29:24 | Epoch: 1 | Step: 34110 | Dataset: 0-7560901 | Loss: 2.127 | 675 ms/step , 58195.21 GFLOP/s , 532077.0 tokens/s INFO:__main__:2024-10-27 01:29:32 | Epoch: 1 | Step: 34120 | Dataset: 0-7568901 | Loss: 2.204 | 675 ms/step , 58241.97 GFLOP/s , 532024.6 tokens/s INFO:__main__:2024-10-27 01:29:39 | Epoch: 1 | Step: 34130 | Dataset: 0-7576901 | Loss: 2.206 | 674 ms/step , 58313.78 GFLOP/s , 531516.4 tokens/s INFO:__main__:2024-10-27 01:29:47 | Epoch: 1 | Step: 34140 | Dataset: 0-7584901 | Loss: 2.196 | 676 ms/step , 58171.39 GFLOP/s , 531414.4 tokens/s INFO:__main__:2024-10-27 01:29:55 | Epoch: 1 | Step: 34150 | Dataset: 0-7592901 | Loss: 2.238 | 673 ms/step , 58369.14 GFLOP/s , 531965.3 tokens/s INFO:__main__:2024-10-27 01:30:02 | Epoch: 1 | Step: 34160 | Dataset: 0-7600901 | Loss: 2.250 | 675 ms/step , 58236.48 GFLOP/s , 531870.6 tokens/s INFO:__main__:2024-10-27 01:30:10 | Epoch: 1 | Step: 34170 | Dataset: 0-7608901 | Loss: 2.240 | 675 ms/step , 58269.56 GFLOP/s , 532101.2 tokens/s INFO:__main__:2024-10-27 01:30:18 | Epoch: 1 | Step: 34180 | Dataset: 0-7616901 | Loss: 2.196 | 675 ms/step , 58259.65 GFLOP/s , 532927.8 tokens/s INFO:__main__:2024-10-27 01:30:26 | Epoch: 1 | 
Step: 34190 | Dataset: 0-7624901 | Loss: 2.191 | 679 ms/step , 57878.80 GFLOP/s , 531693.0 tokens/s INFO:__main__:2024-10-27 01:30:33 | Epoch: 1 | Step: 34200 | Dataset: 0-7632901 | Loss: 2.155 | 675 ms/step , 58246.73 GFLOP/s , 531615.8 tokens/s INFO:__main__:2024-10-27 01:30:41 | Epoch: 1 | Step: 34210 | Dataset: 0-7640901 | Loss: 2.174 | 675 ms/step , 58228.97 GFLOP/s , 528602.8 tokens/s INFO:__main__:2024-10-27 01:30:49 | Epoch: 1 | Step: 34220 | Dataset: 0-7648901 | Loss: 2.162 | 675 ms/step , 58202.13 GFLOP/s , 532100.7 tokens/s INFO:__main__:2024-10-27 01:30:56 | Epoch: 1 | Step: 34230 | Dataset: 0-7656901 | Loss: 2.199 | 675 ms/step , 58210.36 GFLOP/s , 532560.1 tokens/s INFO:__main__:2024-10-27 01:31:04 | Epoch: 1 | Step: 34240 | Dataset: 0-7664901 | Loss: 2.158 | 675 ms/step , 58266.16 GFLOP/s , 532979.0 tokens/s INFO:__main__:2024-10-27 01:31:12 | Epoch: 1 | Step: 34250 | Dataset: 0-7672901 | Loss: 2.237 | 675 ms/step , 58261.10 GFLOP/s , 532973.0 tokens/s INFO:__main__:2024-10-27 01:31:19 | Epoch: 1 | Step: 34260 | Dataset: 0-7680901 | Loss: 2.272 | 675 ms/step , 58276.96 GFLOP/s , 532835.7 tokens/s INFO:__main__:2024-10-27 01:31:27 | Epoch: 1 | Step: 34270 | Dataset: 0-7688901 | Loss: 2.255 | 674 ms/step , 58338.25 GFLOP/s , 532344.5 tokens/s INFO:__main__:2024-10-27 01:31:35 | Epoch: 1 | Step: 34280 | Dataset: 0-7696901 | Loss: 2.265 | 674 ms/step , 58291.14 GFLOP/s , 532934.3 tokens/s INFO:__main__:2024-10-27 01:31:42 | Epoch: 1 | Step: 34290 | Dataset: 0-7704901 | Loss: 2.195 | 674 ms/step , 58297.71 GFLOP/s , 532478.1 tokens/s INFO:__main__:2024-10-27 01:31:50 | Epoch: 1 | Step: 34300 | Dataset: 0-7712901 | Loss: 2.200 | 676 ms/step , 58144.14 GFLOP/s , 532327.5 tokens/s INFO:__main__:2024-10-27 01:31:58 | Epoch: 1 | Step: 34310 | Dataset: 0-7720901 | Loss: 2.204 | 675 ms/step , 58214.24 GFLOP/s , 531784.1 tokens/s INFO:__main__:2024-10-27 01:32:06 | Epoch: 1 | Step: 34320 | Dataset: 0-7728901 | Loss: 2.139 | 674 ms/step , 58347.10 GFLOP/s , 
532272.7 tokens/s INFO:__main__:2024-10-27 01:32:13 | Epoch: 1 | Step: 34330 | Dataset: 0-7736901 | Loss: 2.220 | 675 ms/step , 58277.14 GFLOP/s , 532746.7 tokens/s INFO:__main__:2024-10-27 01:32:21 | Epoch: 1 | Step: 34340 | Dataset: 0-7744901 | Loss: 2.228 | 675 ms/step , 58216.43 GFLOP/s , 531577.0 tokens/s INFO:__main__:2024-10-27 01:32:29 | Epoch: 1 | Step: 34350 | Dataset: 0-7752901 | Loss: 2.140 | 676 ms/step , 58151.82 GFLOP/s , 532250.7 tokens/s INFO:__main__:2024-10-27 01:32:36 | Epoch: 1 | Step: 34360 | Dataset: 0-7760901 | Loss: 2.209 | 674 ms/step , 58308.99 GFLOP/s , 532984.0 tokens/s INFO:__main__:2024-10-27 01:32:44 | Epoch: 1 | Step: 34370 | Dataset: 0-7768901 | Loss: 2.230 | 675 ms/step , 58260.25 GFLOP/s , 532378.2 tokens/s INFO:__main__:2024-10-27 01:32:52 | Epoch: 1 | Step: 34380 | Dataset: 0-7776901 | Loss: 2.184 | 675 ms/step , 58263.60 GFLOP/s , 532398.4 tokens/s INFO:__main__:2024-10-27 01:32:59 | Epoch: 1 | Step: 34390 | Dataset: 0-7784901 | Loss: 2.189 | 678 ms/step , 58017.96 GFLOP/s , 532025.7 tokens/s INFO:__main__:2024-10-27 01:33:07 | Epoch: 1 | Step: 34400 | Dataset: 0-7792901 | Loss: 2.135 | 675 ms/step , 58272.83 GFLOP/s , 533140.3 tokens/s INFO:__main__:2024-10-27 01:33:15 | Epoch: 1 | Step: 34410 | Dataset: 0-7800901 | Loss: 2.181 | 674 ms/step , 58318.66 GFLOP/s , 532773.6 tokens/s INFO:__main__:2024-10-27 01:33:22 | Epoch: 1 | Step: 34420 | Dataset: 0-7808901 | Loss: 2.236 | 676 ms/step , 58192.05 GFLOP/s , 532112.7 tokens/s INFO:__main__:2024-10-27 01:33:30 | Epoch: 1 | Step: 34430 | Dataset: 0-7816901 | Loss: 2.163 | 675 ms/step , 58261.60 GFLOP/s , 531775.8 tokens/s INFO:__main__:2024-10-27 01:33:38 | Epoch: 1 | Step: 34440 | Dataset: 0-7824901 | Loss: 2.260 | 675 ms/step , 58260.15 GFLOP/s , 532942.0 tokens/s INFO:__main__:2024-10-27 01:33:46 | Epoch: 1 | Step: 34450 | Dataset: 0-7832901 | Loss: 2.104 | 676 ms/step , 58163.36 GFLOP/s , 532613.7 tokens/s INFO:__main__:2024-10-27 01:33:53 | Epoch: 1 | Step: 34460 | Dataset: 
0-7840901 | Loss: 2.155 | 675 ms/step , 58213.18 GFLOP/s , 532993.9 tokens/s INFO:__main__:2024-10-27 01:34:01 | Epoch: 1 | Step: 34470 | Dataset: 0-7848901 | Loss: 2.126 | 674 ms/step , 58306.68 GFLOP/s , 533213.6 tokens/s INFO:__main__:2024-10-27 01:34:09 | Epoch: 1 | Step: 34480 | Dataset: 0-7856901 | Loss: 2.185 | 675 ms/step , 58198.36 GFLOP/s , 532393.0 tokens/s INFO:__main__:2024-10-27 01:34:16 | Epoch: 1 | Step: 34490 | Dataset: 0-7864901 | Loss: 2.089 | 677 ms/step , 58077.79 GFLOP/s , 532562.1 tokens/s INFO:__main__:2024-10-27 01:34:24 | Epoch: 1 | Step: 34500 | Dataset: 0-7872901 | Loss: 2.177 | 675 ms/step , 58198.17 GFLOP/s , 532223.3 tokens/s INFO:__main__:2024-10-27 01:34:32 | Epoch: 1 | Step: 34510 | Dataset: 0-7880901 | Loss: 2.210 | 675 ms/step , 58230.37 GFLOP/s , 533134.7 tokens/s INFO:__main__:2024-10-27 01:34:39 | Epoch: 1 | Step: 34520 | Dataset: 0-7888901 | Loss: 2.148 | 674 ms/step , 58279.83 GFLOP/s , 532452.3 tokens/s INFO:__main__:2024-10-27 01:34:47 | Epoch: 1 | Step: 34530 | Dataset: 0-7896901 | Loss: 2.156 | 678 ms/step , 58016.96 GFLOP/s , 532663.2 tokens/s INFO:__main__:2024-10-27 01:34:55 | Epoch: 1 | Step: 34540 | Dataset: 0-7904901 | Loss: 2.210 | 674 ms/step , 58283.31 GFLOP/s , 532625.8 tokens/s INFO:__main__:2024-10-27 01:35:02 | Epoch: 1 | Step: 34550 | Dataset: 0-7912901 | Loss: 2.265 | 675 ms/step , 58263.47 GFLOP/s , 532292.0 tokens/s INFO:__main__:2024-10-27 01:35:10 | Epoch: 1 | Step: 34560 | Dataset: 0-7920901 | Loss: 2.249 | 677 ms/step , 58080.87 GFLOP/s , 530944.1 tokens/s INFO:__main__:2024-10-27 01:35:18 | Epoch: 1 | Step: 34570 | Dataset: 0-7928901 | Loss: 2.239 | 676 ms/step , 58131.70 GFLOP/s , 530177.2 tokens/s INFO:__main__:2024-10-27 01:35:26 | Epoch: 1 | Step: 34580 | Dataset: 0-7936901 | Loss: 2.245 | 677 ms/step , 58100.67 GFLOP/s , 530690.0 tokens/s INFO:__main__:2024-10-27 01:35:33 | Epoch: 1 | Step: 34590 | Dataset: 0-7944901 | Loss: 2.291 | 677 ms/step , 58067.72 GFLOP/s , 530570.5 tokens/s 
INFO:__main__:2024-10-27 01:35:41 | Epoch: 1 | Step: 34600 | Dataset: 0-7952901 | Loss: 2.199 | 676 ms/step , 58184.37 GFLOP/s , 530602.0 tokens/s INFO:__main__:2024-10-27 01:35:49 | Epoch: 1 | Step: 34610 | Dataset: 0-7960901 | Loss: 2.205 | 677 ms/step , 58051.51 GFLOP/s , 531028.5 tokens/s INFO:__main__:2024-10-27 01:35:57 | Epoch: 1 | Step: 34620 | Dataset: 0-7968901 | Loss: 2.199 | 677 ms/step , 58097.44 GFLOP/s , 530369.6 tokens/s INFO:__main__:2024-10-27 01:36:04 | Epoch: 1 | Step: 34630 | Dataset: 0-7976901 | Loss: 2.293 | 676 ms/step , 58130.13 GFLOP/s , 529132.1 tokens/s INFO:__main__:2024-10-27 01:36:12 | Epoch: 1 | Step: 34640 | Dataset: 0-7984901 | Loss: 2.182 | 675 ms/step , 58198.92 GFLOP/s , 531885.2 tokens/s INFO:__main__:2024-10-27 01:36:20 | Epoch: 1 | Step: 34650 | Dataset: 0-7992901 | Loss: 2.118 | 676 ms/step , 58190.90 GFLOP/s , 532267.8 tokens/s INFO:__main__:2024-10-27 01:36:27 | Epoch: 1 | Step: 34660 | Dataset: 0-8000901 | Loss: 2.168 | 675 ms/step , 58219.75 GFLOP/s , 532360.5 tokens/s INFO:__main__:2024-10-27 01:36:35 | Epoch: 1 | Step: 34670 | Dataset: 0-8008901 | Loss: 2.130 | 676 ms/step , 58186.20 GFLOP/s , 532767.5 tokens/s INFO:__main__:2024-10-27 01:36:43 | Epoch: 1 | Step: 34680 | Dataset: 0-8016901 | Loss: 2.108 | 675 ms/step , 58256.09 GFLOP/s , 531985.5 tokens/s INFO:__main__:2024-10-27 01:36:50 | Epoch: 1 | Step: 34690 | Dataset: 0-8024901 | Loss: 2.208 | 676 ms/step , 58164.56 GFLOP/s , 532474.8 tokens/s INFO:__main__:2024-10-27 01:36:58 | Epoch: 1 | Step: 34700 | Dataset: 0-8032901 | Loss: 2.112 | 676 ms/step , 58178.32 GFLOP/s , 532453.2 tokens/s INFO:__main__:2024-10-27 01:37:06 | Epoch: 1 | Step: 34710 | Dataset: 0-8040901 | Loss: 2.125 | 674 ms/step , 58291.55 GFLOP/s , 532902.0 tokens/s INFO:__main__:2024-10-27 01:37:13 | Epoch: 1 | Step: 34720 | Dataset: 0-8048901 | Loss: 2.167 | 675 ms/step , 58260.79 GFLOP/s , 532309.7 tokens/s INFO:__main__:2024-10-27 01:37:21 | Epoch: 1 | Step: 34730 | Dataset: 0-8056901 | Loss: 
2.153 | 676 ms/step , 58187.64 GFLOP/s , 532935.2 tokens/s INFO:__main__:2024-10-27 01:37:29 | Epoch: 1 | Step: 34740 | Dataset: 0-8064901 | Loss: 2.136 | 677 ms/step , 58060.16 GFLOP/s , 532586.2 tokens/s INFO:__main__:2024-10-27 01:37:37 | Epoch: 1 | Step: 34750 | Dataset: 0-8072901 | Loss: 2.174 | 675 ms/step , 58197.95 GFLOP/s , 532491.1 tokens/s INFO:__main__:2024-10-27 01:37:44 | Epoch: 1 | Step: 34760 | Dataset: 0-8080901 | Loss: 2.148 | 675 ms/step , 58265.64 GFLOP/s , 532747.4 tokens/s INFO:__main__:2024-10-27 01:37:52 | Epoch: 1 | Step: 34770 | Dataset: 0-8088901 | Loss: 2.133 | 677 ms/step , 58071.61 GFLOP/s , 532673.9 tokens/s INFO:__main__:2024-10-27 01:38:00 | Epoch: 1 | Step: 34780 | Dataset: 0-8096901 | Loss: 2.196 | 675 ms/step , 58251.48 GFLOP/s , 532556.4 tokens/s INFO:__main__:2024-10-27 01:38:07 | Epoch: 1 | Step: 34790 | Dataset: 0-8104901 | Loss: 2.244 | 674 ms/step , 58338.99 GFLOP/s , 532718.9 tokens/s INFO:__main__:2024-10-27 01:38:15 | Epoch: 1 | Step: 34800 | Dataset: 0-8112901 | Loss: 2.192 | 677 ms/step , 58072.10 GFLOP/s , 532873.5 tokens/s INFO:__main__:2024-10-27 01:38:23 | Epoch: 1 | Step: 34810 | Dataset: 0-8120901 | Loss: 2.117 | 676 ms/step , 58117.10 GFLOP/s , 532216.6 tokens/s INFO:__main__:2024-10-27 01:38:30 | Epoch: 1 | Step: 34820 | Dataset: 0-8128901 | Loss: 2.143 | 677 ms/step , 58035.01 GFLOP/s , 531625.5 tokens/s INFO:__main__:2024-10-27 01:38:38 | Epoch: 1 | Step: 34830 | Dataset: 0-8136901 | Loss: 2.178 | 676 ms/step , 58189.78 GFLOP/s , 530987.6 tokens/s INFO:__main__:2024-10-27 01:38:46 | Epoch: 1 | Step: 34840 | Dataset: 0-8144901 | Loss: 2.151 | 674 ms/step , 58307.51 GFLOP/s , 532253.8 tokens/s INFO:__main__:2024-10-27 01:38:54 | Epoch: 1 | Step: 34850 | Dataset: 0-8152901 | Loss: 2.215 | 675 ms/step , 58205.98 GFLOP/s , 532346.0 tokens/s INFO:__main__:2024-10-27 01:39:01 | Epoch: 1 | Step: 34860 | Dataset: 0-8160901 | Loss: 2.095 | 676 ms/step , 58163.26 GFLOP/s , 532619.5 tokens/s INFO:__main__:2024-10-27 
01:39:09 | Epoch: 1 | Step: 34870 | Dataset: 0-8168901 | Loss: 2.164 | 675 ms/step , 58255.05 GFLOP/s , 532721.7 tokens/s INFO:__main__:2024-10-27 01:39:17 | Epoch: 1 | Step: 34880 | Dataset: 0-8176901 | Loss: 2.093 | 674 ms/step , 58316.34 GFLOP/s , 532810.6 tokens/s INFO:__main__:2024-10-27 01:39:24 | Epoch: 1 | Step: 34890 | Dataset: 0-8184901 | Loss: 2.152 | 675 ms/step , 58198.96 GFLOP/s , 532492.6 tokens/s INFO:__main__:2024-10-27 01:39:32 | Epoch: 1 | Step: 34900 | Dataset: 0-8192901 | Loss: 2.191 | 675 ms/step , 58196.87 GFLOP/s , 532185.1 tokens/s INFO:__main__:2024-10-27 01:39:40 | Epoch: 1 | Step: 34910 | Dataset: 0-8200901 | Loss: 2.186 | 675 ms/step , 58223.67 GFLOP/s , 532480.0 tokens/s INFO:__main__:2024-10-27 01:39:47 | Epoch: 1 | Step: 34920 | Dataset: 0-8208901 | Loss: 2.150 | 674 ms/step , 58311.17 GFLOP/s , 532519.9 tokens/s INFO:__main__:2024-10-27 01:39:55 | Epoch: 1 | Step: 34930 | Dataset: 0-8216901 | Loss: 2.138 | 674 ms/step , 58280.38 GFLOP/s , 532914.8 tokens/s INFO:__main__:2024-10-27 01:40:03 | Epoch: 1 | Step: 34940 | Dataset: 0-8224901 | Loss: 2.166 | 676 ms/step , 58129.78 GFLOP/s , 532669.3 tokens/s INFO:__main__:2024-10-27 01:40:10 | Epoch: 1 | Step: 34950 | Dataset: 0-8232901 | Loss: 2.175 | 676 ms/step , 58144.56 GFLOP/s , 532970.6 tokens/s INFO:__main__:2024-10-27 01:40:18 | Epoch: 1 | Step: 34960 | Dataset: 0-8240901 | Loss: 2.138 | 673 ms/step , 58365.86 GFLOP/s , 532950.4 tokens/s INFO:__main__:2024-10-27 01:40:26 | Epoch: 1 | Step: 34970 | Dataset: 0-8248901 | Loss: 2.121 | 675 ms/step , 58255.40 GFLOP/s , 531819.4 tokens/s INFO:__main__:2024-10-27 01:40:33 | Epoch: 1 | Step: 34980 | Dataset: 0-8256901 | Loss: 2.220 | 675 ms/step , 58203.33 GFLOP/s , 532198.0 tokens/s INFO:__main__:2024-10-27 01:40:41 | Epoch: 1 | Step: 34990 | Dataset: 0-8264901 | Loss: 2.171 | 675 ms/step , 58239.95 GFLOP/s , 532555.1 tokens/s INFO:__main__:2024-10-27 01:40:48 | Validation | Step: 35000 | Val_loss: 2.469 | Best_val_loss: 1.9440 
INFO:__main__:2024-10-27 01:40:48 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_014048_step_35000.pt` INFO:__main__:2024-10-27 01:40:50 | Epoch: 1 | Step: 35000 | Dataset: 0-8272901 | Loss: 2.077 | 673 ms/step , 58376.55 GFLOP/s , 479951.5 tokens/s INFO:__main__:2024-10-27 01:40:57 | Epoch: 1 | Step: 35010 | Dataset: 0-8280901 | Loss: 2.141 | 676 ms/step , 58177.88 GFLOP/s , 532082.0 tokens/s INFO:__main__:2024-10-27 01:41:05 | Epoch: 1 | Step: 35020 | Dataset: 0-8288901 | Loss: 2.131 | 677 ms/step , 58095.94 GFLOP/s , 532554.2 tokens/s INFO:__main__:2024-10-27 01:41:13 | Epoch: 1 | Step: 35030 | Dataset: 0-8296901 | Loss: 2.133 | 673 ms/step , 58369.32 GFLOP/s , 532455.8 tokens/s INFO:__main__:2024-10-27 01:41:20 | Epoch: 1 | Step: 35040 | Dataset: 0-8304901 | Loss: 1.946 | 675 ms/step , 58252.26 GFLOP/s , 532375.4 tokens/s INFO:__main__:2024-10-27 01:41:28 | Epoch: 1 | Step: 35050 | Dataset: 0-8312901 | Loss: 1.849 | 675 ms/step , 58248.59 GFLOP/s , 532075.5 tokens/s INFO:__main__:2024-10-27 01:41:36 | Epoch: 1 | Step: 35060 | Dataset: 0-8320901 | Loss: 1.826 | 675 ms/step , 58202.29 GFLOP/s , 532098.0 tokens/s INFO:__main__:2024-10-27 01:41:44 | Epoch: 1 | Step: 35070 | Dataset: 0-8328901 | Loss: 1.733 | 675 ms/step , 58229.12 GFLOP/s , 532096.8 tokens/s INFO:__main__:2024-10-27 01:41:51 | Epoch: 1 | Step: 35080 | Dataset: 0-8336901 | Loss: 1.728 | 674 ms/step , 58329.72 GFLOP/s , 532415.0 tokens/s INFO:__main__:2024-10-27 01:41:59 | Epoch: 1 | Step: 35090 | Dataset: 0-8344901 | Loss: 1.759 | 675 ms/step , 58230.96 GFLOP/s , 531852.8 tokens/s INFO:__main__:2024-10-27 01:42:07 | Epoch: 1 | Step: 35100 | Dataset: 0-8352901 | Loss: 1.704 | 675 ms/step , 58229.62 GFLOP/s , 532018.3 tokens/s INFO:__main__:2024-10-27 01:42:14 | Epoch: 1 | Step: 35110 | Dataset: 0-8360901 | Loss: 1.723 | 674 ms/step , 58290.10 GFLOP/s , 532532.9 tokens/s INFO:__main__:2024-10-27 01:42:22 | Epoch: 1 | Step: 35120 | Dataset: 0-8368901 | Loss: 2.383 | 674 ms/step 
, 58287.57 GFLOP/s , 532228.3 tokens/s INFO:__main__:2024-10-27 01:42:30 | Epoch: 1 | Step: 35130 | Dataset: 0-8376901 | Loss: 2.295 | 675 ms/step , 58273.32 GFLOP/s , 533246.0 tokens/s INFO:__main__:2024-10-27 01:42:37 | Epoch: 1 | Step: 35140 | Dataset: 0-8384901 | Loss: 2.210 | 675 ms/step , 58209.32 GFLOP/s , 532207.2 tokens/s INFO:__main__:2024-10-27 01:42:45 | Epoch: 1 | Step: 35150 | Dataset: 0-8392901 | Loss: 2.281 | 675 ms/step , 58201.79 GFLOP/s , 532656.6 tokens/s INFO:__main__:2024-10-27 01:42:53 | Epoch: 1 | Step: 35160 | Dataset: 0-8400901 | Loss: 2.211 | 674 ms/step , 58307.55 GFLOP/s , 532499.6 tokens/s INFO:__main__:2024-10-27 01:43:01 | Epoch: 1 | Step: 35170 | Dataset: 0-8408901 | Loss: 2.203 | 675 ms/step , 58239.01 GFLOP/s , 532837.9 tokens/s INFO:__main__:2024-10-27 01:43:08 | Epoch: 1 | Step: 35180 | Dataset: 0-8416901 | Loss: 2.183 | 678 ms/step , 58009.19 GFLOP/s , 532337.7 tokens/s INFO:__main__:2024-10-27 01:43:16 | Epoch: 1 | Step: 35190 | Dataset: 0-8424901 | Loss: 2.205 | 675 ms/step , 58220.57 GFLOP/s , 531433.5 tokens/s INFO:__main__:2024-10-27 01:43:24 | Epoch: 1 | Step: 35200 | Dataset: 0-8432901 | Loss: 2.186 | 676 ms/step , 58184.89 GFLOP/s , 532724.1 tokens/s INFO:__main__:2024-10-27 01:43:31 | Epoch: 1 | Step: 35210 | Dataset: 0-8440901 | Loss: 2.245 | 675 ms/step , 58254.92 GFLOP/s , 532543.8 tokens/s INFO:__main__:2024-10-27 01:43:39 | Epoch: 1 | Step: 35220 | Dataset: 0-8448901 | Loss: 2.213 | 676 ms/step , 58148.06 GFLOP/s , 532685.0 tokens/s INFO:__main__:2024-10-27 01:43:47 | Epoch: 1 | Step: 35230 | Dataset: 0-8456901 | Loss: 2.216 | 676 ms/step , 58175.27 GFLOP/s , 531995.6 tokens/s INFO:__main__:2024-10-27 01:43:54 | Epoch: 1 | Step: 35240 | Dataset: 0-8464901 | Loss: 2.269 | 674 ms/step , 58303.53 GFLOP/s , 532711.6 tokens/s INFO:__main__:2024-10-27 01:44:02 | Epoch: 1 | Step: 35250 | Dataset: 0-8472901 | Loss: 2.221 | 676 ms/step , 58119.29 GFLOP/s , 532164.5 tokens/s INFO:__main__:2024-10-27 01:44:10 | Epoch: 1 | 
Step: 35260 | Dataset: 0-8480901 | Loss: 2.172 | 676 ms/step , 58123.60 GFLOP/s , 528723.3 tokens/s INFO:__main__:2024-10-27 01:44:18 | Epoch: 1 | Step: 35270 | Dataset: 0-8488901 | Loss: 2.243 | 676 ms/step , 58147.59 GFLOP/s , 531376.0 tokens/s INFO:__main__:2024-10-27 01:44:25 | Epoch: 1 | Step: 35280 | Dataset: 0-8496901 | Loss: 1.934 | 675 ms/step , 58231.67 GFLOP/s , 531445.4 tokens/s INFO:__main__:2024-10-27 01:44:33 | Epoch: 1 | Step: 35290 | Dataset: 0-8504901 | Loss: 1.804 | 677 ms/step , 58057.78 GFLOP/s , 531564.2 tokens/s INFO:__main__:2024-10-27 01:44:41 | Epoch: 1 | Step: 35300 | Dataset: 0-8512901 | Loss: 1.747 | 676 ms/step , 58160.94 GFLOP/s , 530838.8 tokens/s INFO:__main__:2024-10-27 01:44:48 | Epoch: 1 | Step: 35310 | Dataset: 0-8520901 | Loss: 1.772 | 677 ms/step , 58071.48 GFLOP/s , 531126.1 tokens/s INFO:__main__:2024-10-27 01:44:56 | Epoch: 1 | Step: 35320 | Dataset: 0-8528901 | Loss: 1.719 | 676 ms/step , 58167.76 GFLOP/s , 530935.7 tokens/s INFO:__main__:2024-10-27 01:45:04 | Epoch: 1 | Step: 35330 | Dataset: 0-8536901 | Loss: 1.713 | 677 ms/step , 58084.07 GFLOP/s , 530991.9 tokens/s INFO:__main__:2024-10-27 01:45:12 | Epoch: 1 | Step: 35340 | Dataset: 0-8544901 | Loss: 1.740 | 676 ms/step , 58141.16 GFLOP/s , 529852.7 tokens/s INFO:__main__:2024-10-27 01:45:19 | Epoch: 1 | Step: 35350 | Dataset: 0-8552901 | Loss: 1.682 | 675 ms/step , 58252.38 GFLOP/s , 531564.9 tokens/s INFO:__main__:2024-10-27 01:45:27 | Epoch: 1 | Step: 35360 | Dataset: 0-8560901 | Loss: 1.690 | 677 ms/step , 58023.44 GFLOP/s , 530031.7 tokens/s INFO:__main__:2024-10-27 01:45:35 | Epoch: 1 | Step: 35370 | Dataset: 0-8568901 | Loss: 2.279 | 676 ms/step , 58158.33 GFLOP/s , 530730.6 tokens/s INFO:__main__:2024-10-27 01:45:42 | Epoch: 1 | Step: 35380 | Dataset: 0-8576901 | Loss: 2.327 | 675 ms/step , 58203.42 GFLOP/s , 532459.3 tokens/s INFO:__main__:2024-10-27 01:45:50 | Epoch: 1 | Step: 35390 | Dataset: 0-8584901 | Loss: 2.255 | 675 ms/step , 58263.59 GFLOP/s , 
532695.9 tokens/s INFO:__main__:2024-10-27 01:45:58 | Epoch: 1 | Step: 35400 | Dataset: 0-8592901 | Loss: 2.259 | 676 ms/step , 58160.26 GFLOP/s , 532703.4 tokens/s INFO:__main__:2024-10-27 01:46:05 | Epoch: 1 | Step: 35410 | Dataset: 0-8600901 | Loss: 2.125 | 674 ms/step , 58320.47 GFLOP/s , 532466.9 tokens/s INFO:__main__:2024-10-27 01:46:13 | Epoch: 1 | Step: 35420 | Dataset: 0-8608901 | Loss: 2.214 | 676 ms/step , 58150.95 GFLOP/s , 532092.4 tokens/s INFO:__main__:2024-10-27 01:46:21 | Epoch: 1 | Step: 35430 | Dataset: 0-8616901 | Loss: 2.286 | 676 ms/step , 58120.92 GFLOP/s , 532619.8 tokens/s INFO:__main__:2024-10-27 01:46:29 | Epoch: 1 | Step: 35440 | Dataset: 0-8624901 | Loss: 2.086 | 675 ms/step , 58194.71 GFLOP/s , 532725.9 tokens/s INFO:__main__:2024-10-27 01:46:36 | Epoch: 1 | Step: 35450 | Dataset: 0-8632901 | Loss: 2.172 | 675 ms/step , 58229.69 GFLOP/s , 532481.1 tokens/s INFO:__main__:2024-10-27 01:46:44 | Epoch: 1 | Step: 35460 | Dataset: 0-8640901 | Loss: 2.243 | 675 ms/step , 58220.08 GFLOP/s , 532750.9 tokens/s INFO:__main__:2024-10-27 01:46:52 | Epoch: 1 | Step: 35470 | Dataset: 0-8648901 | Loss: 2.145 | 675 ms/step , 58234.55 GFLOP/s , 532217.3 tokens/s INFO:__main__:2024-10-27 01:46:59 | Epoch: 1 | Step: 35480 | Dataset: 0-8656901 | Loss: 2.135 | 675 ms/step , 58254.76 GFLOP/s , 532524.9 tokens/s INFO:__main__:2024-10-27 01:47:07 | Epoch: 1 | Step: 35490 | Dataset: 0-8664901 | Loss: 2.213 | 675 ms/step , 58198.20 GFLOP/s , 531730.8 tokens/s INFO:__main__:2024-10-27 01:47:15 | Epoch: 1 | Step: 35500 | Dataset: 0-8672901 | Loss: 2.182 | 675 ms/step , 58248.13 GFLOP/s , 532110.8 tokens/s INFO:__main__:2024-10-27 01:47:22 | Epoch: 1 | Step: 35510 | Dataset: 0-8680901 | Loss: 2.126 | 676 ms/step , 58121.68 GFLOP/s , 531673.1 tokens/s INFO:__main__:2024-10-27 01:47:30 | Epoch: 1 | Step: 35520 | Dataset: 0-8688901 | Loss: 2.217 | 683 ms/step , 57591.63 GFLOP/s , 531913.0 tokens/s INFO:__main__:2024-10-27 01:47:38 | Epoch: 1 | Step: 35530 | Dataset: 
0-8696901 | Loss: 2.404 | 675 ms/step , 58197.26 GFLOP/s , 531404.5 tokens/s INFO:__main__:2024-10-27 01:47:46 | Epoch: 1 | Step: 35540 | Dataset: 0-8704901 | Loss: 2.342 | 676 ms/step , 58189.88 GFLOP/s , 531188.8 tokens/s INFO:__main__:2024-10-27 01:47:53 | Epoch: 1 | Step: 35550 | Dataset: 0-8712901 | Loss: 2.316 | 674 ms/step , 58290.45 GFLOP/s , 532611.0 tokens/s INFO:__main__:2024-10-27 01:48:01 | Epoch: 1 | Step: 35560 | Dataset: 0-8720901 | Loss: 2.295 | 676 ms/step , 58155.30 GFLOP/s , 532161.3 tokens/s INFO:__main__:2024-10-27 01:48:09 | Epoch: 1 | Step: 35570 | Dataset: 0-8728901 | Loss: 2.312 | 675 ms/step , 58232.47 GFLOP/s , 533099.0 tokens/s INFO:__main__:2024-10-27 01:48:16 | Epoch: 1 | Step: 35580 | Dataset: 0-8736901 | Loss: 2.238 | 674 ms/step , 58335.39 GFLOP/s , 533031.9 tokens/s INFO:__main__:2024-10-27 01:48:24 | Epoch: 1 | Step: 35590 | Dataset: 0-8744901 | Loss: 2.285 | 676 ms/step , 58182.98 GFLOP/s , 533052.8 tokens/s INFO:__main__:2024-10-27 01:48:32 | Epoch: 1 | Step: 35600 | Dataset: 0-8752901 | Loss: 2.235 | 678 ms/step , 57993.92 GFLOP/s , 531337.1 tokens/s INFO:__main__:2024-10-27 01:48:39 | Epoch: 1 | Step: 35610 | Dataset: 0-8760901 | Loss: 2.244 | 675 ms/step , 58247.03 GFLOP/s , 532819.5 tokens/s INFO:__main__:2024-10-27 01:48:47 | Epoch: 1 | Step: 35620 | Dataset: 0-8768901 | Loss: 2.162 | 675 ms/step , 58213.80 GFLOP/s , 532497.3 tokens/s INFO:__main__:2024-10-27 01:48:55 | Epoch: 1 | Step: 35630 | Dataset: 0-8776901 | Loss: 2.215 | 676 ms/step , 58150.45 GFLOP/s , 531811.5 tokens/s INFO:__main__:2024-10-27 01:49:02 | Epoch: 1 | Step: 35640 | Dataset: 0-8784901 | Loss: 2.200 | 675 ms/step , 58213.32 GFLOP/s , 532764.8 tokens/s INFO:__main__:2024-10-27 01:49:10 | Epoch: 1 | Step: 35650 | Dataset: 0-8792901 | Loss: 2.146 | 675 ms/step , 58204.23 GFLOP/s , 532078.2 tokens/s INFO:__main__:2024-10-27 01:49:18 | Epoch: 1 | Step: 35660 | Dataset: 0-8800901 | Loss: 2.191 | 676 ms/step , 58175.82 GFLOP/s , 532723.0 tokens/s 
INFO:__main__:2024-10-27 01:49:26 | Epoch: 1 | Step: 35670 | Dataset: 0-8808901 | Loss: 2.135 | 675 ms/step , 58247.93 GFLOP/s , 532497.8 tokens/s INFO:__main__:2024-10-27 01:49:33 | Epoch: 1 | Step: 35680 | Dataset: 0-8816901 | Loss: 2.195 | 675 ms/step , 58197.08 GFLOP/s , 532703.4 tokens/s INFO:__main__:2024-10-27 01:49:41 | Epoch: 1 | Step: 35690 | Dataset: 0-8824901 | Loss: 2.319 | 675 ms/step , 58233.92 GFLOP/s , 531789.6 tokens/s INFO:__main__:2024-10-27 01:49:49 | Epoch: 1 | Step: 35700 | Dataset: 0-8832901 | Loss: 1.945 | 675 ms/step , 58202.03 GFLOP/s , 532246.4 tokens/s INFO:__main__:2024-10-27 01:49:56 | Epoch: 1 | Step: 35710 | Dataset: 0-8840901 | Loss: 1.882 | 678 ms/step , 57966.90 GFLOP/s , 531690.9 tokens/s INFO:__main__:2024-10-27 01:50:04 | Epoch: 1 | Step: 35720 | Dataset: 0-8848901 | Loss: 1.876 | 674 ms/step , 58316.76 GFLOP/s , 532114.5 tokens/s INFO:__main__:2024-10-27 01:50:12 | Epoch: 1 | Step: 35730 | Dataset: 0-8856901 | Loss: 1.846 | 675 ms/step , 58216.36 GFLOP/s , 532326.4 tokens/s INFO:__main__:2024-10-27 01:50:19 | Epoch: 1 | Step: 35740 | Dataset: 0-8864901 | Loss: 1.846 | 676 ms/step , 58186.48 GFLOP/s , 531729.9 tokens/s INFO:__main__:2024-10-27 01:50:27 | Epoch: 1 | Step: 35750 | Dataset: 0-8872901 | Loss: 1.826 | 674 ms/step , 58314.39 GFLOP/s , 532155.5 tokens/s INFO:__main__:2024-10-27 01:50:35 | Epoch: 1 | Step: 35760 | Dataset: 0-8880901 | Loss: 1.784 | 675 ms/step , 58215.72 GFLOP/s , 531960.2 tokens/s INFO:__main__:2024-10-27 01:50:42 | Epoch: 1 | Step: 35770 | Dataset: 0-8888901 | Loss: 1.785 | 675 ms/step , 58243.37 GFLOP/s , 531930.2 tokens/s INFO:__main__:2024-10-27 01:50:50 | Epoch: 1 | Step: 35780 | Dataset: 0-8896901 | Loss: 2.447 | 675 ms/step , 58251.26 GFLOP/s , 532027.5 tokens/s INFO:__main__:2024-10-27 01:50:58 | Epoch: 1 | Step: 35790 | Dataset: 0-8904901 | Loss: 2.261 | 675 ms/step , 58244.78 GFLOP/s , 532446.5 tokens/s INFO:__main__:2024-10-27 01:51:06 | Epoch: 1 | Step: 35800 | Dataset: 0-8912901 | Loss: 
2.267 | 674 ms/step , 58291.34 GFLOP/s , 531812.9 tokens/s INFO:__main__:2024-10-27 01:51:13 | Epoch: 1 | Step: 35810 | Dataset: 0-8920901 | Loss: 2.192 | 674 ms/step , 58281.99 GFLOP/s , 532753.8 tokens/s INFO:__main__:2024-10-27 01:51:21 | Epoch: 1 | Step: 35820 | Dataset: 0-8928901 | Loss: 2.160 | 674 ms/step , 58297.11 GFLOP/s , 532883.2 tokens/s INFO:__main__:2024-10-27 01:51:29 | Epoch: 1 | Step: 35830 | Dataset: 0-8936901 | Loss: 2.207 | 675 ms/step , 58254.56 GFLOP/s , 532595.2 tokens/s INFO:__main__:2024-10-27 01:51:36 | Epoch: 1 | Step: 35840 | Dataset: 0-8944901 | Loss: 2.305 | 675 ms/step , 58270.42 GFLOP/s , 533158.2 tokens/s INFO:__main__:2024-10-27 01:51:44 | Epoch: 1 | Step: 35850 | Dataset: 0-8952901 | Loss: 2.222 | 676 ms/step , 58156.38 GFLOP/s , 532082.9 tokens/s INFO:__main__:2024-10-27 01:51:52 | Epoch: 1 | Step: 35860 | Dataset: 0-8960901 | Loss: 2.192 | 675 ms/step , 58221.21 GFLOP/s , 532398.4 tokens/s INFO:__main__:2024-10-27 01:51:59 | Epoch: 1 | Step: 35870 | Dataset: 0-8968901 | Loss: 2.255 | 675 ms/step , 58261.80 GFLOP/s , 532886.5 tokens/s INFO:__main__:2024-10-27 01:52:07 | Epoch: 1 | Step: 35880 | Dataset: 0-8976901 | Loss: 2.227 | 675 ms/step , 58265.36 GFLOP/s , 532877.0 tokens/s INFO:__main__:2024-10-27 01:52:15 | Epoch: 1 | Step: 35890 | Dataset: 0-8984901 | Loss: 2.265 | 675 ms/step , 58245.46 GFLOP/s , 532665.2 tokens/s INFO:__main__:2024-10-27 01:52:22 | Epoch: 1 | Step: 35900 | Dataset: 0-8992901 | Loss: 2.164 | 675 ms/step , 58275.78 GFLOP/s , 532025.5 tokens/s INFO:__main__:2024-10-27 01:52:30 | Epoch: 1 | Step: 35910 | Dataset: 0-9000901 | Loss: 2.114 | 675 ms/step , 58202.21 GFLOP/s , 532153.8 tokens/s INFO:__main__:2024-10-27 01:52:38 | Epoch: 1 | Step: 35920 | Dataset: 0-9008901 | Loss: 2.182 | 674 ms/step , 58321.54 GFLOP/s , 532861.9 tokens/s INFO:__main__:2024-10-27 01:52:46 | Epoch: 1 | Step: 35930 | Dataset: 0-9016901 | Loss: 2.223 | 676 ms/step , 58124.38 GFLOP/s , 531797.1 tokens/s INFO:__main__:2024-10-27 
01:52:53 | Epoch: 1 | Step: 35940 | Dataset: 0-9024901 | Loss: 2.240 | 674 ms/step , 58286.91 GFLOP/s , 532447.8 tokens/s INFO:__main__:2024-10-27 01:53:01 | Epoch: 1 | Step: 35950 | Dataset: 0-9032901 | Loss: 2.236 | 675 ms/step , 58226.96 GFLOP/s , 532237.5 tokens/s INFO:__main__:2024-10-27 01:53:09 | Epoch: 1 | Step: 35960 | Dataset: 0-9040901 | Loss: 2.243 | 676 ms/step , 58132.22 GFLOP/s , 530555.4 tokens/s INFO:__main__:2024-10-27 01:53:16 | Epoch: 1 | Step: 35970 | Dataset: 0-9048901 | Loss: 2.245 | 676 ms/step , 58180.76 GFLOP/s , 531066.4 tokens/s INFO:__main__:2024-10-27 01:53:24 | Epoch: 1 | Step: 35980 | Dataset: 0-9056901 | Loss: 2.279 | 676 ms/step , 58153.03 GFLOP/s , 531129.9 tokens/s INFO:__main__:2024-10-27 01:53:32 | Epoch: 1 | Step: 35990 | Dataset: 0-9064901 | Loss: 2.145 | 676 ms/step , 58107.09 GFLOP/s , 531048.2 tokens/s INFO:__main__:2024-10-27 01:53:39 | Validation | Step: 36000 | Val_loss: 2.092 | Best_val_loss: 1.9440 INFO:__main__:2024-10-27 01:53:39 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_015339_step_36000.pt` INFO:__main__:2024-10-27 01:53:40 | Epoch: 1 | Step: 36000 | Dataset: 0-9072901 | Loss: 2.264 | 674 ms/step , 58360.74 GFLOP/s , 478087.9 tokens/s INFO:__main__:2024-10-27 01:53:48 | Epoch: 1 | Step: 36010 | Dataset: 0-9080901 | Loss: 2.280 | 675 ms/step , 58270.35 GFLOP/s , 531950.7 tokens/s INFO:__main__:2024-10-27 01:53:56 | Epoch: 1 | Step: 36020 | Dataset: 0-9088901 | Loss: 2.171 | 676 ms/step , 58122.08 GFLOP/s , 531354.2 tokens/s INFO:__main__:2024-10-27 01:54:04 | Epoch: 1 | Step: 36030 | Dataset: 0-9096901 | Loss: 2.200 | 676 ms/step , 58127.57 GFLOP/s , 528693.9 tokens/s INFO:__main__:2024-10-27 01:54:11 | Epoch: 1 | Step: 36040 | Dataset: 0-9104901 | Loss: 2.213 | 675 ms/step , 58243.40 GFLOP/s , 531735.4 tokens/s INFO:__main__:2024-10-27 01:54:19 | Epoch: 1 | Step: 36050 | Dataset: 0-9112901 | Loss: 2.218 | 675 ms/step , 58198.56 GFLOP/s , 532093.4 tokens/s INFO:__main__:2024-10-27 
01:54:27 | Epoch: 1 | Step: 36060 | Dataset: 0-9120901 | Loss: 2.187 | 675 ms/step , 58209.03 GFLOP/s , 532522.6 tokens/s INFO:__main__:2024-10-27 01:54:34 | Epoch: 1 | Step: 36070 | Dataset: 0-9128901 | Loss: 2.233 | 675 ms/step , 58266.29 GFLOP/s , 531883.4 tokens/s INFO:__main__:2024-10-27 01:54:42 | Epoch: 1 | Step: 36080 | Dataset: 0-9136901 | Loss: 2.309 | 676 ms/step , 58182.95 GFLOP/s , 532176.8 tokens/s INFO:__main__:2024-10-27 01:54:50 | Epoch: 1 | Step: 36090 | Dataset: 0-9144901 | Loss: 2.292 | 675 ms/step , 58253.97 GFLOP/s , 532205.0 tokens/s INFO:__main__:2024-10-27 01:54:57 | Epoch: 1 | Step: 36100 | Dataset: 0-9152901 | Loss: 2.228 | 675 ms/step , 58208.60 GFLOP/s , 532489.0 tokens/s INFO:__main__:2024-10-27 01:55:05 | Epoch: 1 | Step: 36110 | Dataset: 0-9160901 | Loss: 2.260 | 675 ms/step , 58235.51 GFLOP/s , 532490.6 tokens/s INFO:__main__:2024-10-27 01:55:13 | Epoch: 1 | Step: 36120 | Dataset: 0-9168901 | Loss: 2.277 | 675 ms/step , 58220.69 GFLOP/s , 532450.8 tokens/s INFO:__main__:2024-10-27 01:55:21 | Epoch: 1 | Step: 36130 | Dataset: 0-9176901 | Loss: 2.183 | 675 ms/step , 58256.77 GFLOP/s , 532149.7 tokens/s INFO:__main__:2024-10-27 01:55:28 | Epoch: 1 | Step: 36140 | Dataset: 0-9184901 | Loss: 2.146 | 675 ms/step , 58214.43 GFLOP/s , 532097.8 tokens/s INFO:__main__:2024-10-27 01:55:36 | Epoch: 1 | Step: 36150 | Dataset: 0-9192901 | Loss: 2.217 | 675 ms/step , 58194.10 GFLOP/s , 532404.7 tokens/s INFO:__main__:2024-10-27 01:55:44 | Epoch: 1 | Step: 36160 | Dataset: 0-9200901 | Loss: 2.152 | 675 ms/step , 58245.43 GFLOP/s , 532225.5 tokens/s INFO:__main__:2024-10-27 01:55:51 | Epoch: 1 | Step: 36170 | Dataset: 0-9208901 | Loss: 2.250 | 676 ms/step , 58155.14 GFLOP/s , 532519.7 tokens/s INFO:__main__:2024-10-27 01:55:59 | Epoch: 1 | Step: 36180 | Dataset: 0-9216901 | Loss: 2.204 | 676 ms/step , 58178.88 GFLOP/s , 532211.5 tokens/s INFO:__main__:2024-10-27 01:56:07 | Epoch: 1 | Step: 36190 | Dataset: 0-9224901 | Loss: 2.224 | 676 ms/step , 
58172.51 GFLOP/s , 532300.0 tokens/s INFO:__main__:2024-10-27 01:56:14 | Epoch: 1 | Step: 36200 | Dataset: 0-9232901 | Loss: 2.152 | 674 ms/step , 58327.69 GFLOP/s , 532338.7 tokens/s INFO:__main__:2024-10-27 01:56:22 | Epoch: 1 | Step: 36210 | Dataset: 0-9240901 | Loss: 2.171 | 675 ms/step , 58243.20 GFLOP/s , 532299.5 tokens/s INFO:__main__:2024-10-27 01:56:30 | Epoch: 1 | Step: 36220 | Dataset: 0-9248901 | Loss: 2.117 | 675 ms/step , 58207.36 GFLOP/s , 532381.1 tokens/s INFO:__main__:2024-10-27 01:56:37 | Epoch: 1 | Step: 36230 | Dataset: 0-9256901 | Loss: 2.167 | 674 ms/step , 58315.11 GFLOP/s , 532356.0 tokens/s INFO:__main__:2024-10-27 01:56:45 | Epoch: 1 | Step: 36240 | Dataset: 0-9264901 | Loss: 2.225 | 674 ms/step , 58339.18 GFLOP/s , 532761.1 tokens/s INFO:__main__:2024-10-27 01:56:53 | Epoch: 1 | Step: 36250 | Dataset: 0-9272901 | Loss: 2.168 | 674 ms/step , 58329.10 GFLOP/s , 532967.8 tokens/s INFO:__main__:2024-10-27 01:57:01 | Epoch: 1 | Step: 36260 | Dataset: 0-9280901 | Loss: 2.090 | 674 ms/step , 58290.25 GFLOP/s , 532507.6 tokens/s INFO:__main__:2024-10-27 01:57:08 | Epoch: 1 | Step: 36270 | Dataset: 0-9288901 | Loss: 1.931 | 675 ms/step , 58272.23 GFLOP/s , 531833.6 tokens/s INFO:__main__:2024-10-27 01:57:16 | Epoch: 1 | Step: 36280 | Dataset: 0-9296901 | Loss: 1.838 | 675 ms/step , 58244.61 GFLOP/s , 531963.7 tokens/s INFO:__main__:2024-10-27 01:57:24 | Epoch: 1 | Step: 36290 | Dataset: 0-9304901 | Loss: 1.821 | 674 ms/step , 58328.50 GFLOP/s , 531683.9 tokens/s INFO:__main__:2024-10-27 01:57:31 | Epoch: 1 | Step: 36300 | Dataset: 0-9312901 | Loss: 1.832 | 676 ms/step , 58146.78 GFLOP/s , 532428.4 tokens/s INFO:__main__:2024-10-27 01:57:39 | Epoch: 1 | Step: 36310 | Dataset: 0-9320901 | Loss: 1.805 | 674 ms/step , 58350.70 GFLOP/s , 531841.5 tokens/s INFO:__main__:2024-10-27 01:57:47 | Epoch: 1 | Step: 36320 | Dataset: 0-9328901 | Loss: 1.821 | 676 ms/step , 58108.60 GFLOP/s , 531675.0 tokens/s INFO:__main__:2024-10-27 01:57:54 | Epoch: 1 | 
Step: 36330 | Dataset: 0-9336901 | Loss: 1.797 | 675 ms/step , 58232.05 GFLOP/s , 531862.3 tokens/s INFO:__main__:2024-10-27 01:58:02 | Epoch: 1 | Step: 36340 | Dataset: 0-9344901 | Loss: 1.778 | 676 ms/step , 58183.62 GFLOP/s , 531098.8 tokens/s INFO:__main__:2024-10-27 01:58:10 | Epoch: 1 | Step: 36350 | Dataset: 0-9352901 | Loss: 1.787 | 675 ms/step , 58232.01 GFLOP/s , 531587.1 tokens/s INFO:__main__:2024-10-27 01:58:18 | Epoch: 1 | Step: 36360 | Dataset: 0-9360901 | Loss: 2.258 | 674 ms/step , 58297.24 GFLOP/s , 532659.6 tokens/s INFO:__main__:2024-10-27 01:58:25 | Epoch: 1 | Step: 36370 | Dataset: 0-9368901 | Loss: 2.162 | 679 ms/step , 57924.71 GFLOP/s , 532501.4 tokens/s INFO:__main__:2024-10-27 01:58:33 | Epoch: 1 | Step: 36380 | Dataset: 0-9376901 | Loss: 2.159 | 675 ms/step , 58199.01 GFLOP/s , 532175.8 tokens/s INFO:__main__:2024-10-27 01:58:41 | Epoch: 1 | Step: 36390 | Dataset: 0-9384901 | Loss: 2.125 | 675 ms/step , 58214.88 GFLOP/s , 532264.4 tokens/s INFO:__main__:2024-10-27 01:58:48 | Epoch: 1 | Step: 36400 | Dataset: 0-9392901 | Loss: 2.165 | 676 ms/step , 58130.10 GFLOP/s , 531689.0 tokens/s INFO:__main__:2024-10-27 01:58:56 | Epoch: 1 | Step: 36410 | Dataset: 0-9400901 | Loss: 2.141 | 676 ms/step , 58177.33 GFLOP/s , 531140.7 tokens/s INFO:__main__:2024-10-27 01:59:04 | Epoch: 1 | Step: 36420 | Dataset: 0-9408901 | Loss: 2.185 | 674 ms/step , 58280.89 GFLOP/s , 532504.3 tokens/s INFO:__main__:2024-10-27 01:59:11 | Epoch: 1 | Step: 36430 | Dataset: 0-9416901 | Loss: 2.130 | 675 ms/step , 58266.02 GFLOP/s , 532880.9 tokens/s INFO:__main__:2024-10-27 01:59:19 | Epoch: 1 | Step: 36440 | Dataset: 0-9424901 | Loss: 2.150 | 675 ms/step , 58265.56 GFLOP/s , 532595.2 tokens/s INFO:__main__:2024-10-27 01:59:27 | Epoch: 1 | Step: 36450 | Dataset: 0-9432901 | Loss: 2.148 | 675 ms/step , 58216.20 GFLOP/s , 532998.0 tokens/s INFO:__main__:2024-10-27 01:59:34 | Epoch: 1 | Step: 36460 | Dataset: 0-9440901 | Loss: 2.169 | 676 ms/step , 58116.02 GFLOP/s , 
531629.5 tokens/s INFO:__main__:2024-10-27 01:59:42 | Epoch: 1 | Step: 36470 | Dataset: 0-9448901 | Loss: 2.131 | 674 ms/step , 58332.07 GFLOP/s , 532318.8 tokens/s INFO:__main__:2024-10-27 01:59:50 | Epoch: 1 | Step: 36480 | Dataset: 0-9456901 | Loss: 2.172 | 675 ms/step , 58247.83 GFLOP/s , 532547.5 tokens/s INFO:__main__:2024-10-27 01:59:58 | Epoch: 1 | Step: 36490 | Dataset: 0-9464901 | Loss: 2.123 | 675 ms/step , 58262.79 GFLOP/s , 532836.5 tokens/s INFO:__main__:2024-10-27 02:00:04 | Epoch: 1 | Step: 36500 | Dataset: 0-9472901 | Loss: 2.074 | 675 ms/step , 58226.72 GFLOP/s , 604519.0 tokens/s INFO:__main__:2024-10-27 02:00:12 | Epoch: 1 | Step: 36510 | Dataset: 0-9480901 | Loss: 2.170 | 674 ms/step , 58296.69 GFLOP/s , 532497.7 tokens/s INFO:__main__:2024-10-27 02:00:20 | Epoch: 1 | Step: 36520 | Dataset: 0-9488901 | Loss: 2.179 | 674 ms/step , 58286.49 GFLOP/s , 532663.1 tokens/s INFO:__main__:2024-10-27 02:00:27 | Epoch: 1 | Step: 36530 | Dataset: 0-9496901 | Loss: 2.204 | 676 ms/step , 58145.02 GFLOP/s , 532560.8 tokens/s INFO:__main__:2024-10-27 02:00:35 | Epoch: 1 | Step: 36540 | Dataset: 0-9504901 | Loss: 2.200 | 675 ms/step , 58259.50 GFLOP/s , 532908.2 tokens/s INFO:__main__:2024-10-27 02:00:43 | Epoch: 1 | Step: 36550 | Dataset: 0-9512901 | Loss: 2.201 | 676 ms/step , 58170.96 GFLOP/s , 532653.9 tokens/s INFO:__main__:2024-10-27 02:00:50 | Epoch: 1 | Step: 36560 | Dataset: 0-9520901 | Loss: 2.221 | 675 ms/step , 58275.83 GFLOP/s , 532807.6 tokens/s INFO:__main__:2024-10-27 02:00:58 | Epoch: 1 | Step: 36570 | Dataset: 0-9528901 | Loss: 2.274 | 676 ms/step , 58156.93 GFLOP/s , 532561.4 tokens/s INFO:__main__:2024-10-27 02:01:06 | Epoch: 1 | Step: 36580 | Dataset: 0-9536901 | Loss: 2.235 | 675 ms/step , 58199.06 GFLOP/s , 532672.2 tokens/s INFO:__main__:2024-10-27 02:01:14 | Epoch: 1 | Step: 36590 | Dataset: 0-9544901 | Loss: 2.219 | 674 ms/step , 58356.38 GFLOP/s , 532721.9 tokens/s INFO:__main__:2024-10-27 02:01:21 | Epoch: 1 | Step: 36600 | Dataset: 
0-9552901 | Loss: 2.146 | 675 ms/step , 58245.98 GFLOP/s , 532616.0 tokens/s INFO:__main__:2024-10-27 02:01:29 | Epoch: 1 | Step: 36610 | Dataset: 0-9560901 | Loss: 2.188 | 675 ms/step , 58208.75 GFLOP/s , 533272.4 tokens/s INFO:__main__:2024-10-27 02:01:37 | Epoch: 1 | Step: 36620 | Dataset: 0-9568901 | Loss: 2.130 | 680 ms/step , 57818.29 GFLOP/s , 529373.5 tokens/s INFO:__main__:2024-10-27 02:01:44 | Epoch: 1 | Step: 36630 | Dataset: 0-9576901 | Loss: 2.127 | 679 ms/step , 57887.77 GFLOP/s , 529034.5 tokens/s INFO:__main__:2024-10-27 02:01:52 | Epoch: 1 | Step: 36640 | Dataset: 0-9584901 | Loss: 2.132 | 679 ms/step , 57907.10 GFLOP/s , 529049.7 tokens/s INFO:__main__:2024-10-27 02:02:00 | Epoch: 1 | Step: 36650 | Dataset: 0-9592901 | Loss: 2.173 | 680 ms/step , 57811.09 GFLOP/s , 528721.4 tokens/s INFO:__main__:2024-10-27 02:02:08 | Epoch: 1 | Step: 36660 | Dataset: 0-9600901 | Loss: 2.202 | 680 ms/step , 57800.92 GFLOP/s , 528731.2 tokens/s INFO:__main__:2024-10-27 02:02:15 | Epoch: 1 | Step: 36670 | Dataset: 0-9608901 | Loss: 2.195 | 680 ms/step , 57813.63 GFLOP/s , 528388.0 tokens/s INFO:__main__:2024-10-27 02:02:23 | Epoch: 1 | Step: 36680 | Dataset: 0-9616901 | Loss: 2.180 | 679 ms/step , 57872.78 GFLOP/s , 528574.4 tokens/s INFO:__main__:2024-10-27 02:02:31 | Epoch: 1 | Step: 36690 | Dataset: 0-9624901 | Loss: 2.158 | 681 ms/step , 57756.69 GFLOP/s , 528663.4 tokens/s INFO:__main__:2024-10-27 02:02:39 | Epoch: 1 | Step: 36700 | Dataset: 0-9632901 | Loss: 2.162 | 680 ms/step , 57800.45 GFLOP/s , 528483.2 tokens/s INFO:__main__:2024-10-27 02:02:46 | Epoch: 1 | Step: 36710 | Dataset: 0-9640901 | Loss: 2.156 | 681 ms/step , 57725.71 GFLOP/s , 528091.7 tokens/s INFO:__main__:2024-10-27 02:02:54 | Epoch: 1 | Step: 36720 | Dataset: 0-9648901 | Loss: 2.235 | 680 ms/step , 57846.82 GFLOP/s , 528615.3 tokens/s INFO:__main__:2024-10-27 02:03:02 | Epoch: 1 | Step: 36730 | Dataset: 0-9656901 | Loss: 2.190 | 679 ms/step , 57857.92 GFLOP/s , 527940.3 tokens/s 
INFO:__main__:2024-10-27 02:03:10 | Epoch: 1 | Step: 36740 | Dataset: 0-9664901 | Loss: 2.164 | 681 ms/step , 57705.57 GFLOP/s , 528394.6 tokens/s INFO:__main__:2024-10-27 02:03:17 | Epoch: 1 | Step: 36750 | Dataset: 0-9672901 | Loss: 2.065 | 682 ms/step , 57675.49 GFLOP/s , 528103.0 tokens/s INFO:__main__:2024-10-27 02:03:25 | Epoch: 1 | Step: 36760 | Dataset: 0-9680901 | Loss: 2.126 | 675 ms/step , 58265.79 GFLOP/s , 529811.8 tokens/s INFO:__main__:2024-10-27 02:03:33 | Epoch: 1 | Step: 36770 | Dataset: 0-9688901 | Loss: 2.142 | 674 ms/step , 58307.04 GFLOP/s , 532973.3 tokens/s INFO:__main__:2024-10-27 02:03:41 | Epoch: 1 | Step: 36780 | Dataset: 0-9696901 | Loss: 2.261 | 676 ms/step , 58171.97 GFLOP/s , 532123.2 tokens/s INFO:__main__:2024-10-27 02:03:48 | Epoch: 1 | Step: 36790 | Dataset: 0-9704901 | Loss: 2.195 | 677 ms/step , 58064.16 GFLOP/s , 531282.1 tokens/s INFO:__main__:2024-10-27 02:03:56 | Epoch: 1 | Step: 36800 | Dataset: 0-9712901 | Loss: 2.083 | 678 ms/step , 57986.83 GFLOP/s , 530890.7 tokens/s INFO:__main__:2024-10-27 02:04:04 | Epoch: 1 | Step: 36810 | Dataset: 0-9720901 | Loss: 2.126 | 677 ms/step , 58053.89 GFLOP/s , 531272.8 tokens/s INFO:__main__:2024-10-27 02:04:11 | Epoch: 1 | Step: 36820 | Dataset: 0-9728901 | Loss: 2.117 | 677 ms/step , 58080.27 GFLOP/s , 531246.1 tokens/s INFO:__main__:2024-10-27 02:04:19 | Epoch: 1 | Step: 36830 | Dataset: 0-9736901 | Loss: 2.050 | 676 ms/step , 58136.13 GFLOP/s , 531291.7 tokens/s INFO:__main__:2024-10-27 02:04:27 | Epoch: 1 | Step: 36840 | Dataset: 0-9744901 | Loss: 1.861 | 677 ms/step , 58022.56 GFLOP/s , 530547.4 tokens/s INFO:__main__:2024-10-27 02:04:34 | Epoch: 1 | Step: 36850 | Dataset: 0-9752901 | Loss: 1.765 | 676 ms/step , 58142.98 GFLOP/s , 531117.4 tokens/s INFO:__main__:2024-10-27 02:04:42 | Epoch: 1 | Step: 36860 | Dataset: 0-9760901 | Loss: 1.749 | 677 ms/step , 58046.55 GFLOP/s , 530652.9 tokens/s INFO:__main__:2024-10-27 02:04:50 | Epoch: 1 | Step: 36870 | Dataset: 0-9768901 | Loss: 
1.756 | 677 ms/step , 58092.05 GFLOP/s , 530990.7 tokens/s INFO:__main__:2024-10-27 02:04:58 | Epoch: 1 | Step: 36880 | Dataset: 0-9776901 | Loss: 1.715 | 676 ms/step , 58155.32 GFLOP/s , 530817.6 tokens/s INFO:__main__:2024-10-27 02:05:05 | Epoch: 1 | Step: 36890 | Dataset: 0-9784901 | Loss: 1.705 | 676 ms/step , 58181.28 GFLOP/s , 530929.3 tokens/s INFO:__main__:2024-10-27 02:05:13 | Epoch: 1 | Step: 36900 | Dataset: 0-9792901 | Loss: 1.667 | 676 ms/step , 58158.49 GFLOP/s , 531410.4 tokens/s INFO:__main__:2024-10-27 02:05:21 | Epoch: 1 | Step: 36910 | Dataset: 0-9800901 | Loss: 1.702 | 676 ms/step , 58169.49 GFLOP/s , 531217.4 tokens/s INFO:__main__:2024-10-27 02:05:28 | Epoch: 1 | Step: 36920 | Dataset: 0-9808901 | Loss: 1.708 | 676 ms/step , 58165.39 GFLOP/s , 531322.5 tokens/s INFO:__main__:2024-10-27 02:05:36 | Epoch: 1 | Step: 36930 | Dataset: 0-9816901 | Loss: 2.203 | 676 ms/step , 58114.63 GFLOP/s , 530481.8 tokens/s INFO:__main__:2024-10-27 02:05:44 | Epoch: 1 | Step: 36940 | Dataset: 0-9824901 | Loss: 2.161 | 677 ms/step , 58077.51 GFLOP/s , 531732.0 tokens/s INFO:__main__:2024-10-27 02:05:52 | Epoch: 1 | Step: 36950 | Dataset: 0-9832901 | Loss: 2.178 | 677 ms/step , 58039.24 GFLOP/s , 530559.9 tokens/s INFO:__main__:2024-10-27 02:05:59 | Epoch: 1 | Step: 36960 | Dataset: 0-9840901 | Loss: 2.158 | 675 ms/step , 58246.85 GFLOP/s , 532229.5 tokens/s INFO:__main__:2024-10-27 02:06:07 | Epoch: 1 | Step: 36970 | Dataset: 0-9848901 | Loss: 2.133 | 676 ms/step , 58172.07 GFLOP/s , 532095.0 tokens/s INFO:__main__:2024-10-27 02:06:15 | Epoch: 1 | Step: 36980 | Dataset: 0-9856901 | Loss: 2.188 | 674 ms/step , 58321.86 GFLOP/s , 533108.2 tokens/s INFO:__main__:2024-10-27 02:06:22 | Epoch: 1 | Step: 36990 | Dataset: 0-9864901 | Loss: 2.175 | 677 ms/step , 58091.41 GFLOP/s , 532704.1 tokens/s INFO:__main__:2024-10-27 02:06:30 | Validation | Step: 37000 | Val_loss: 2.079 | Best_val_loss: 1.9440 INFO:__main__:2024-10-27 02:06:30 | Saving checkpoint to 
`/root/autodl-tmp/checkpoint/checkpoint_20241027_020630_step_37000.pt` INFO:__main__:2024-10-27 02:06:31 | Epoch: 1 | Step: 37000 | Dataset: 0-9872901 | Loss: 2.140 | 673 ms/step , 58378.27 GFLOP/s , 479243.7 tokens/s INFO:__main__:2024-10-27 02:06:39 | Epoch: 1 | Step: 37010 | Dataset: 0-9880901 | Loss: 2.040 | 676 ms/step , 58124.38 GFLOP/s , 532381.0 tokens/s INFO:__main__:2024-10-27 02:06:46 | Epoch: 1 | Step: 37020 | Dataset: 0-9888901 | Loss: 2.128 | 675 ms/step , 58219.65 GFLOP/s , 532499.0 tokens/s INFO:__main__:2024-10-27 02:06:54 | Epoch: 1 | Step: 37030 | Dataset: 0-9896901 | Loss: 2.080 | 675 ms/step , 58248.28 GFLOP/s , 532641.8 tokens/s INFO:__main__:2024-10-27 02:07:02 | Epoch: 1 | Step: 37040 | Dataset: 0-9904901 | Loss: 2.071 | 675 ms/step , 58236.18 GFLOP/s , 542292.5 tokens/s INFO:__main__:2024-10-27 02:07:09 | Epoch: 1 | Step: 37050 | Dataset: 0-9912901 | Loss: 2.132 | 676 ms/step , 58188.51 GFLOP/s , 532594.0 tokens/s INFO:__main__:2024-10-27 02:07:17 | Epoch: 1 | Step: 37060 | Dataset: 0-9920901 | Loss: 2.142 | 675 ms/step , 58246.45 GFLOP/s , 532492.7 tokens/s INFO:__main__:2024-10-27 02:07:25 | Epoch: 1 | Step: 37070 | Dataset: 0-9928901 | Loss: 2.097 | 674 ms/step , 58295.15 GFLOP/s , 532520.8 tokens/s INFO:__main__:2024-10-27 02:07:32 | Epoch: 1 | Step: 37080 | Dataset: 0-9936901 | Loss: 2.196 | 676 ms/step , 58169.11 GFLOP/s , 532045.7 tokens/s INFO:__main__:2024-10-27 02:07:40 | Epoch: 1 | Step: 37090 | Dataset: 0-9944901 | Loss: 2.300 | 674 ms/step , 58292.03 GFLOP/s , 532925.3 tokens/s INFO:__main__:2024-10-27 02:07:48 | Epoch: 1 | Step: 37100 | Dataset: 0-9952901 | Loss: 2.239 | 675 ms/step , 58214.81 GFLOP/s , 532827.3 tokens/s INFO:__main__:2024-10-27 02:07:55 | Epoch: 1 | Step: 37110 | Dataset: 0-9960901 | Loss: 2.245 | 675 ms/step , 58263.14 GFLOP/s , 532098.8 tokens/s INFO:__main__:2024-10-27 02:08:03 | Epoch: 1 | Step: 37120 | Dataset: 0-9968901 | Loss: 2.266 | 674 ms/step , 58319.89 GFLOP/s , 532868.6 tokens/s 
INFO:__main__:2024-10-27 02:08:11 | Epoch: 1 | Step: 37130 | Dataset: 0-9976901 | Loss: 2.236 | 676 ms/step , 58164.08 GFLOP/s , 532068.7 tokens/s INFO:__main__:2024-10-27 02:08:18 | Epoch: 1 | Step: 37140 | Dataset: 0-9984901 | Loss: 2.254 | 675 ms/step , 58228.26 GFLOP/s , 532634.3 tokens/s INFO:__main__:2024-10-27 02:08:26 | Epoch: 1 | Step: 37150 | Dataset: 0-9992901 | Loss: 2.224 | 674 ms/step , 58319.82 GFLOP/s , 532328.8 tokens/s INFO:__main__:2024-10-27 02:08:34 | Epoch: 1 | Step: 37160 | Dataset: 0-10000901 | Loss: 2.244 | 675 ms/step , 58274.41 GFLOP/s , 532928.6 tokens/s INFO:__main__:2024-10-27 02:08:42 | Epoch: 1 | Step: 37170 | Dataset: 0-10008901 | Loss: 2.235 | 674 ms/step , 58324.57 GFLOP/s , 532498.0 tokens/s INFO:__main__:2024-10-27 02:08:49 | Epoch: 1 | Step: 37180 | Dataset: 0-10016901 | Loss: 2.225 | 675 ms/step , 58203.89 GFLOP/s , 532735.7 tokens/s INFO:__main__:2024-10-27 02:08:57 | Epoch: 1 | Step: 37190 | Dataset: 0-10024901 | Loss: 2.218 | 678 ms/step , 58013.11 GFLOP/s , 532368.8 tokens/s INFO:__main__:2024-10-27 02:09:05 | Epoch: 1 | Step: 37200 | Dataset: 0-10032901 | Loss: 2.161 | 675 ms/step , 58270.22 GFLOP/s , 531096.4 tokens/s INFO:__main__:2024-10-27 02:09:12 | Epoch: 1 | Step: 37210 | Dataset: 0-10040901 | Loss: 2.167 | 675 ms/step , 58225.90 GFLOP/s , 532699.9 tokens/s INFO:__main__:2024-10-27 02:09:20 | Epoch: 1 | Step: 37220 | Dataset: 0-10048901 | Loss: 2.145 | 674 ms/step , 58353.19 GFLOP/s , 532301.8 tokens/s INFO:__main__:2024-10-27 02:09:28 | Epoch: 1 | Step: 37230 | Dataset: 0-10056901 | Loss: 2.180 | 674 ms/step , 58284.76 GFLOP/s , 533109.2 tokens/s INFO:__main__:2024-10-27 02:09:35 | Epoch: 1 | Step: 37240 | Dataset: 0-10064901 | Loss: 2.198 | 675 ms/step , 58197.69 GFLOP/s , 531118.1 tokens/s INFO:__main__:2024-10-27 02:09:43 | Epoch: 1 | Step: 37250 | Dataset: 0-10072901 | Loss: 1.943 | 676 ms/step , 58186.56 GFLOP/s , 532034.7 tokens/s INFO:__main__:2024-10-27 02:09:51 | Epoch: 1 | Step: 37260 | Dataset: 
0-10080901 | Loss: 1.841 | 675 ms/step , 58244.95 GFLOP/s , 531381.9 tokens/s INFO:__main__:2024-10-27 02:09:59 | Epoch: 1 | Step: 37270 | Dataset: 0-10088901 | Loss: 1.844 | 675 ms/step , 58274.02 GFLOP/s , 532493.8 tokens/s INFO:__main__:2024-10-27 02:10:06 | Epoch: 1 | Step: 37280 | Dataset: 0-10096901 | Loss: 1.796 | 674 ms/step , 58315.68 GFLOP/s , 531341.3 tokens/s INFO:__main__:2024-10-27 02:10:14 | Epoch: 1 | Step: 37290 | Dataset: 0-10104901 | Loss: 1.814 | 674 ms/step , 58286.33 GFLOP/s , 532714.1 tokens/s INFO:__main__:2024-10-27 02:10:22 | Epoch: 1 | Step: 37300 | Dataset: 0-10112901 | Loss: 1.787 | 675 ms/step , 58263.18 GFLOP/s , 532242.0 tokens/s INFO:__main__:2024-10-27 02:10:29 | Epoch: 1 | Step: 37310 | Dataset: 0-10120901 | Loss: 1.795 | 675 ms/step , 58255.32 GFLOP/s , 531973.1 tokens/s INFO:__main__:2024-10-27 02:10:37 | Epoch: 1 | Step: 37320 | Dataset: 0-10128901 | Loss: 1.805 | 676 ms/step , 58150.79 GFLOP/s , 531870.1 tokens/s INFO:__main__:2024-10-27 02:10:45 | Epoch: 1 | Step: 37330 | Dataset: 0-10136901 | Loss: 1.775 | 675 ms/step , 58228.18 GFLOP/s , 530638.6 tokens/s INFO:__main__:2024-10-27 02:10:52 | Epoch: 1 | Step: 37340 | Dataset: 0-10144901 | Loss: 2.221 | 675 ms/step , 58208.35 GFLOP/s , 531540.6 tokens/s INFO:__main__:2024-10-27 02:11:00 | Epoch: 1 | Step: 37350 | Dataset: 0-10152901 | Loss: 2.200 | 676 ms/step , 58146.46 GFLOP/s , 531467.2 tokens/s INFO:__main__:2024-10-27 02:11:08 | Epoch: 1 | Step: 37360 | Dataset: 0-10160901 | Loss: 2.100 | 676 ms/step , 58180.90 GFLOP/s , 531853.7 tokens/s INFO:__main__:2024-10-27 02:11:16 | Epoch: 1 | Step: 37370 | Dataset: 0-10168901 | Loss: 2.119 | 675 ms/step , 58264.59 GFLOP/s , 531952.4 tokens/s INFO:__main__:2024-10-27 02:11:23 | Epoch: 1 | Step: 37380 | Dataset: 0-10176901 | Loss: 2.141 | 676 ms/step , 58115.63 GFLOP/s , 531965.6 tokens/s INFO:__main__:2024-10-27 02:11:31 | Epoch: 1 | Step: 37390 | Dataset: 0-10184901 | Loss: 2.189 | 677 ms/step , 58105.42 GFLOP/s , 531791.8 
tokens/s INFO:__main__:2024-10-27 02:11:39 | Epoch: 1 | Step: 37400 | Dataset: 0-10192901 | Loss: 2.087 | 675 ms/step , 58265.80 GFLOP/s , 530773.7 tokens/s INFO:__main__:2024-10-27 02:11:46 | Epoch: 1 | Step: 37410 | Dataset: 0-10200901 | Loss: 2.135 | 676 ms/step , 58163.64 GFLOP/s , 532704.7 tokens/s INFO:__main__:2024-10-27 02:11:54 | Epoch: 1 | Step: 37420 | Dataset: 0-10208901 | Loss: 2.110 | 674 ms/step , 58339.39 GFLOP/s , 532694.9 tokens/s INFO:__main__:2024-10-27 02:12:02 | Epoch: 1 | Step: 37430 | Dataset: 0-10216901 | Loss: 2.142 | 676 ms/step , 58132.70 GFLOP/s , 532683.6 tokens/s INFO:__main__:2024-10-27 02:12:09 | Epoch: 1 | Step: 37440 | Dataset: 0-10224901 | Loss: 2.154 | 675 ms/step , 58247.70 GFLOP/s , 532058.5 tokens/s INFO:__main__:2024-10-27 02:12:17 | Epoch: 1 | Step: 37450 | Dataset: 0-10232901 | Loss: 2.152 | 676 ms/step , 58127.49 GFLOP/s , 531573.0 tokens/s INFO:__main__:2024-10-27 02:12:25 | Epoch: 1 | Step: 37460 | Dataset: 0-10240901 | Loss: 2.140 | 676 ms/step , 58122.04 GFLOP/s , 530278.6 tokens/s INFO:__main__:2024-10-27 02:12:33 | Epoch: 1 | Step: 37470 | Dataset: 0-10248901 | Loss: 2.090 | 675 ms/step , 58217.64 GFLOP/s , 531686.4 tokens/s INFO:__main__:2024-10-27 02:12:40 | Epoch: 1 | Step: 37480 | Dataset: 0-10256901 | Loss: 2.178 | 675 ms/step , 58219.09 GFLOP/s , 531108.4 tokens/s INFO:__main__:2024-10-27 02:12:48 | Epoch: 1 | Step: 37490 | Dataset: 0-10264901 | Loss: 2.183 | 677 ms/step , 58106.26 GFLOP/s , 531233.8 tokens/s INFO:__main__:2024-10-27 02:12:56 | Epoch: 1 | Step: 37500 | Dataset: 0-10272901 | Loss: 2.262 | 676 ms/step , 58177.24 GFLOP/s , 531457.5 tokens/s INFO:__main__:2024-10-27 02:13:03 | Epoch: 1 | Step: 37510 | Dataset: 0-10280901 | Loss: 2.235 | 675 ms/step , 58199.79 GFLOP/s , 531577.1 tokens/s INFO:__main__:2024-10-27 02:13:11 | Epoch: 1 | Step: 37520 | Dataset: 0-10288901 | Loss: 2.180 | 675 ms/step , 58227.80 GFLOP/s , 531217.2 tokens/s INFO:__main__:2024-10-27 02:13:19 | Epoch: 1 | Step: 37530 | 
Dataset: 0-10296901 | Loss: 2.231 | 675 ms/step , 58246.35 GFLOP/s , 529511.9 tokens/s INFO:__main__:2024-10-27 02:13:27 | Epoch: 1 | Step: 37540 | Dataset: 0-10304901 | Loss: 2.194 | 674 ms/step , 58284.14 GFLOP/s , 532271.1 tokens/s INFO:__main__:2024-10-27 02:13:34 | Epoch: 1 | Step: 37550 | Dataset: 0-10312901 | Loss: 2.145 | 675 ms/step , 58235.76 GFLOP/s , 532992.3 tokens/s INFO:__main__:2024-10-27 02:13:42 | Epoch: 1 | Step: 37560 | Dataset: 0-10320901 | Loss: 2.230 | 675 ms/step , 58258.00 GFLOP/s , 532553.8 tokens/s INFO:__main__:2024-10-27 02:13:50 | Epoch: 1 | Step: 37570 | Dataset: 0-10328901 | Loss: 2.149 | 674 ms/step , 58304.03 GFLOP/s , 532428.5 tokens/s INFO:__main__:2024-10-27 02:13:57 | Epoch: 1 | Step: 37580 | Dataset: 0-10336901 | Loss: 2.215 | 674 ms/step , 58297.63 GFLOP/s , 533122.6 tokens/s INFO:__main__:2024-10-27 02:14:05 | Epoch: 1 | Step: 37590 | Dataset: 0-10344901 | Loss: 2.183 | 675 ms/step , 58221.63 GFLOP/s , 532192.1 tokens/s INFO:__main__:2024-10-27 02:14:13 | Epoch: 1 | Step: 37600 | Dataset: 0-10352901 | Loss: 2.130 | 676 ms/step , 58161.32 GFLOP/s , 532269.7 tokens/s INFO:__main__:2024-10-27 02:14:20 | Epoch: 1 | Step: 37610 | Dataset: 0-10360901 | Loss: 2.152 | 676 ms/step , 58182.36 GFLOP/s , 532791.6 tokens/s INFO:__main__:2024-10-27 02:14:28 | Epoch: 1 | Step: 37620 | Dataset: 0-10368901 | Loss: 2.130 | 675 ms/step , 58263.96 GFLOP/s , 532797.7 tokens/s INFO:__main__:2024-10-27 02:14:36 | Epoch: 1 | Step: 37630 | Dataset: 0-10376901 | Loss: 2.116 | 675 ms/step , 58266.21 GFLOP/s , 532369.2 tokens/s INFO:__main__:2024-10-27 02:14:43 | Epoch: 1 | Step: 37640 | Dataset: 0-10384901 | Loss: 2.172 | 676 ms/step , 58187.13 GFLOP/s , 531473.3 tokens/s INFO:__main__:2024-10-27 02:14:51 | Epoch: 1 | Step: 37650 | Dataset: 0-10392901 | Loss: 2.159 | 675 ms/step , 58216.23 GFLOP/s , 532438.1 tokens/s INFO:__main__:2024-10-27 02:14:59 | Epoch: 1 | Step: 37660 | Dataset: 0-10400901 | Loss: 2.210 | 674 ms/step , 58341.97 GFLOP/s , 
532833.8 tokens/s INFO:__main__:2024-10-27 02:15:07 | Epoch: 1 | Step: 37670 | Dataset: 0-10408901 | Loss: 2.223 | 674 ms/step , 58280.69 GFLOP/s , 532741.6 tokens/s INFO:__main__:2024-10-27 02:15:14 | Epoch: 1 | Step: 37680 | Dataset: 0-10416901 | Loss: 2.219 | 675 ms/step , 58209.64 GFLOP/s , 532375.1 tokens/s INFO:__main__:2024-10-27 02:15:22 | Epoch: 1 | Step: 37690 | Dataset: 0-10424901 | Loss: 2.236 | 675 ms/step , 58260.60 GFLOP/s , 532876.2 tokens/s INFO:__main__:2024-10-27 02:15:30 | Epoch: 1 | Step: 37700 | Dataset: 0-10432901 | Loss: 2.193 | 675 ms/step , 58249.55 GFLOP/s , 532623.0 tokens/s INFO:__main__:2024-10-27 02:15:37 | Epoch: 1 | Step: 37710 | Dataset: 0-10440901 | Loss: 2.219 | 675 ms/step , 58231.74 GFLOP/s , 532866.8 tokens/s INFO:__main__:2024-10-27 02:15:45 | Epoch: 1 | Step: 37720 | Dataset: 0-10448901 | Loss: 2.245 | 675 ms/step , 58245.03 GFLOP/s , 533286.3 tokens/s INFO:__main__:2024-10-27 02:15:53 | Epoch: 1 | Step: 37730 | Dataset: 0-10456901 | Loss: 2.139 | 675 ms/step , 58226.03 GFLOP/s , 532415.1 tokens/s INFO:__main__:2024-10-27 02:16:00 | Epoch: 1 | Step: 37740 | Dataset: 0-10464901 | Loss: 2.225 | 675 ms/step , 58232.56 GFLOP/s , 532588.8 tokens/s INFO:__main__:2024-10-27 02:16:08 | Epoch: 1 | Step: 37750 | Dataset: 0-10472901 | Loss: 2.218 | 675 ms/step , 58247.33 GFLOP/s , 531107.7 tokens/s INFO:__main__:2024-10-27 02:16:16 | Epoch: 1 | Step: 37760 | Dataset: 0-10480901 | Loss: 2.181 | 675 ms/step , 58249.96 GFLOP/s , 532099.2 tokens/s INFO:__main__:2024-10-27 02:16:23 | Epoch: 1 | Step: 37770 | Dataset: 0-10488901 | Loss: 2.206 | 675 ms/step , 58253.46 GFLOP/s , 532161.5 tokens/s INFO:__main__:2024-10-27 02:16:31 | Epoch: 1 | Step: 37780 | Dataset: 0-10496901 | Loss: 2.216 | 675 ms/step , 58266.19 GFLOP/s , 532537.2 tokens/s INFO:__main__:2024-10-27 02:16:39 | Epoch: 1 | Step: 37790 | Dataset: 0-10504901 | Loss: 2.263 | 675 ms/step , 58224.47 GFLOP/s , 532946.7 tokens/s INFO:__main__:2024-10-27 02:16:47 | Epoch: 1 | Step: 
37800 | Dataset: 0-10512901 | Loss: 2.204 | 675 ms/step , 58277.35 GFLOP/s , 533026.7 tokens/s INFO:__main__:2024-10-27 02:16:54 | Epoch: 1 | Step: 37810 | Dataset: 0-10520901 | Loss: 2.270 | 675 ms/step , 58204.39 GFLOP/s , 532122.4 tokens/s INFO:__main__:2024-10-27 02:17:02 | Epoch: 1 | Step: 37820 | Dataset: 0-10528901 | Loss: 1.866 | 675 ms/step , 58238.29 GFLOP/s , 532213.2 tokens/s INFO:__main__:2024-10-27 02:17:10 | Epoch: 1 | Step: 37830 | Dataset: 0-10536901 | Loss: 1.751 | 675 ms/step , 58206.50 GFLOP/s , 532125.2 tokens/s INFO:__main__:2024-10-27 02:17:17 | Epoch: 1 | Step: 37840 | Dataset: 0-10544901 | Loss: 1.748 | 674 ms/step , 58324.08 GFLOP/s , 531957.4 tokens/s INFO:__main__:2024-10-27 02:17:25 | Epoch: 1 | Step: 37850 | Dataset: 0-10552901 | Loss: 1.747 | 675 ms/step , 58196.53 GFLOP/s , 531941.9 tokens/s INFO:__main__:2024-10-27 02:17:33 | Epoch: 1 | Step: 37860 | Dataset: 0-10560901 | Loss: 1.729 | 675 ms/step , 58215.47 GFLOP/s , 531983.2 tokens/s INFO:__main__:2024-10-27 02:17:40 | Epoch: 1 | Step: 37870 | Dataset: 0-10568901 | Loss: 1.705 | 675 ms/step , 58261.20 GFLOP/s , 531921.3 tokens/s INFO:__main__:2024-10-27 02:17:48 | Epoch: 1 | Step: 37880 | Dataset: 0-10576901 | Loss: 1.724 | 675 ms/step , 58253.27 GFLOP/s , 532359.1 tokens/s INFO:__main__:2024-10-27 02:17:56 | Epoch: 1 | Step: 37890 | Dataset: 0-10584901 | Loss: 1.688 | 675 ms/step , 58252.51 GFLOP/s , 532393.5 tokens/s INFO:__main__:2024-10-27 02:18:04 | Epoch: 1 | Step: 37900 | Dataset: 0-10592901 | Loss: 1.691 | 676 ms/step , 58160.57 GFLOP/s , 531769.5 tokens/s INFO:__main__:2024-10-27 02:18:11 | Epoch: 1 | Step: 37910 | Dataset: 0-10600901 | Loss: 1.714 | 675 ms/step , 58213.89 GFLOP/s , 532285.1 tokens/s INFO:__main__:2024-10-27 02:18:19 | Epoch: 1 | Step: 37920 | Dataset: 0-10608901 | Loss: 1.673 | 676 ms/step , 58187.06 GFLOP/s , 532229.7 tokens/s INFO:__main__:2024-10-27 02:18:27 | Epoch: 1 | Step: 37930 | Dataset: 0-10616901 | Loss: 1.647 | 675 ms/step , 58208.49 GFLOP/s 
, 531905.5 tokens/s INFO:__main__:2024-10-27 02:18:34 | Epoch: 1 | Step: 37940 | Dataset: 0-10624901 | Loss: 1.641 | 675 ms/step , 58238.14 GFLOP/s , 530752.0 tokens/s INFO:__main__:2024-10-27 02:18:42 | Epoch: 1 | Step: 37950 | Dataset: 0-10632901 | Loss: 1.653 | 675 ms/step , 58216.97 GFLOP/s , 528469.8 tokens/s INFO:__main__:2024-10-27 02:18:50 | Epoch: 1 | Step: 37960 | Dataset: 0-10640901 | Loss: 1.690 | 675 ms/step , 58221.43 GFLOP/s , 532122.4 tokens/s INFO:__main__:2024-10-27 02:18:57 | Epoch: 1 | Step: 37970 | Dataset: 0-10648901 | Loss: 1.686 | 674 ms/step , 58343.59 GFLOP/s , 532100.5 tokens/s INFO:__main__:2024-10-27 02:19:05 | Epoch: 1 | Step: 37980 | Dataset: 0-10656901 | Loss: 1.656 | 676 ms/step , 58180.13 GFLOP/s , 532376.0 tokens/s INFO:__main__:2024-10-27 02:19:13 | Epoch: 1 | Step: 37990 | Dataset: 0-10664901 | Loss: 2.458 | 676 ms/step , 58132.85 GFLOP/s , 531460.8 tokens/s INFO:__main__:2024-10-27 02:19:20 | Validation | Step: 38000 | Val_loss: 2.042 | Best_val_loss: 1.9440 INFO:__main__:2024-10-27 02:19:20 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_021920_step_38000.pt` INFO:__main__:2024-10-27 02:19:21 | Epoch: 1 | Step: 38000 | Dataset: 0-10672901 | Loss: 2.245 | 674 ms/step , 58364.61 GFLOP/s , 478900.1 tokens/s INFO:__main__:2024-10-27 02:19:29 | Epoch: 1 | Step: 38010 | Dataset: 0-10680901 | Loss: 2.198 | 676 ms/step , 58116.56 GFLOP/s , 531733.3 tokens/s INFO:__main__:2024-10-27 02:19:37 | Epoch: 1 | Step: 38020 | Dataset: 0-10688901 | Loss: 2.163 | 675 ms/step , 58250.95 GFLOP/s , 531200.6 tokens/s INFO:__main__:2024-10-27 02:19:45 | Epoch: 1 | Step: 38030 | Dataset: 0-10696901 | Loss: 2.091 | 675 ms/step , 58239.87 GFLOP/s , 532286.9 tokens/s INFO:__main__:2024-10-27 02:19:52 | Epoch: 1 | Step: 38040 | Dataset: 0-10704901 | Loss: 2.159 | 675 ms/step , 58214.98 GFLOP/s , 532684.6 tokens/s INFO:__main__:2024-10-27 02:20:00 | Epoch: 1 | Step: 38050 | Dataset: 0-10712901 | Loss: 2.135 | 676 ms/step , 58188.93 
GFLOP/s , 532149.6 tokens/s INFO:__main__:2024-10-27 02:20:08 | Epoch: 1 | Step: 38060 | Dataset: 0-10720901 | Loss: 2.163 | 675 ms/step , 58258.65 GFLOP/s , 532670.7 tokens/s INFO:__main__:2024-10-27 02:20:15 | Epoch: 1 | Step: 38070 | Dataset: 0-10728901 | Loss: 2.249 | 676 ms/step , 58188.03 GFLOP/s , 532191.5 tokens/s INFO:__main__:2024-10-27 02:20:23 | Epoch: 1 | Step: 38080 | Dataset: 0-10736901 | Loss: 2.140 | 676 ms/step , 58173.79 GFLOP/s , 532258.0 tokens/s INFO:__main__:2024-10-27 02:20:31 | Epoch: 1 | Step: 38090 | Dataset: 0-10744901 | Loss: 2.183 | 674 ms/step , 58284.91 GFLOP/s , 532410.7 tokens/s INFO:__main__:2024-10-27 02:20:38 | Epoch: 1 | Step: 38100 | Dataset: 0-10752901 | Loss: 2.153 | 676 ms/step , 58173.22 GFLOP/s , 532204.1 tokens/s INFO:__main__:2024-10-27 02:20:46 | Epoch: 1 | Step: 38110 | Dataset: 0-10760901 | Loss: 2.092 | 675 ms/step , 58238.46 GFLOP/s , 532363.4 tokens/s INFO:__main__:2024-10-27 02:20:54 | Epoch: 1 | Step: 38120 | Dataset: 0-10768901 | Loss: 2.143 | 674 ms/step , 58318.39 GFLOP/s , 532258.6 tokens/s INFO:__main__:2024-10-27 02:21:01 | Epoch: 1 | Step: 38130 | Dataset: 0-10776901 | Loss: 2.043 | 674 ms/step , 58328.35 GFLOP/s , 532837.5 tokens/s INFO:__main__:2024-10-27 02:21:09 | Epoch: 1 | Step: 38140 | Dataset: 0-10784901 | Loss: 2.242 | 675 ms/step , 58228.14 GFLOP/s , 532217.6 tokens/s INFO:__main__:2024-10-27 02:21:17 | Epoch: 1 | Step: 38150 | Dataset: 0-10792901 | Loss: 2.158 | 675 ms/step , 58256.10 GFLOP/s , 532625.0 tokens/s INFO:__main__:2024-10-27 02:21:25 | Epoch: 1 | Step: 38160 | Dataset: 0-10800901 | Loss: 1.729 | 675 ms/step , 58229.05 GFLOP/s , 532072.6 tokens/s INFO:__main__:2024-10-27 02:21:32 | Epoch: 1 | Step: 38170 | Dataset: 0-10808901 | Loss: 1.712 | 674 ms/step , 58310.79 GFLOP/s , 532124.9 tokens/s INFO:__main__:2024-10-27 02:21:40 | Epoch: 1 | Step: 38180 | Dataset: 0-10816901 | Loss: 1.689 | 675 ms/step , 58229.54 GFLOP/s , 532204.7 tokens/s INFO:__main__:2024-10-27 02:21:48 | Epoch: 1 | 
Step: 38190 | Dataset: 0-10824901 | Loss: 1.684 | 675 ms/step , 58233.92 GFLOP/s , 532396.3 tokens/s INFO:__main__:2024-10-27 02:21:55 | Epoch: 1 | Step: 38200 | Dataset: 0-10832901 | Loss: 1.657 | 675 ms/step , 58239.40 GFLOP/s , 532286.1 tokens/s INFO:__main__:2024-10-27 02:22:03 | Epoch: 1 | Step: 38210 | Dataset: 0-10840901 | Loss: 1.642 | 675 ms/step , 58232.97 GFLOP/s , 532086.4 tokens/s INFO:__main__:2024-10-27 02:22:11 | Epoch: 1 | Step: 38220 | Dataset: 0-10848901 | Loss: 1.676 | 675 ms/step , 58206.85 GFLOP/s , 532266.1 tokens/s INFO:__main__:2024-10-27 02:22:18 | Epoch: 1 | Step: 38230 | Dataset: 0-10856901 | Loss: 1.648 | 677 ms/step , 58026.94 GFLOP/s , 531512.6 tokens/s INFO:__main__:2024-10-27 02:22:26 | Epoch: 1 | Step: 38240 | Dataset: 0-10864901 | Loss: 1.656 | 675 ms/step , 58204.99 GFLOP/s , 532081.3 tokens/s INFO:__main__:2024-10-27 02:22:34 | Epoch: 1 | Step: 38250 | Dataset: 0-10872901 | Loss: 1.635 | 676 ms/step , 58169.78 GFLOP/s , 531915.4 tokens/s INFO:__main__:2024-10-27 02:22:42 | Epoch: 1 | Step: 38260 | Dataset: 0-10880901 | Loss: 1.676 | 678 ms/step , 57998.67 GFLOP/s , 531414.6 tokens/s INFO:__main__:2024-10-27 02:22:49 | Epoch: 1 | Step: 38270 | Dataset: 0-10888901 | Loss: 1.666 | 676 ms/step , 58112.40 GFLOP/s , 531422.2 tokens/s INFO:__main__:2024-10-27 02:22:57 | Epoch: 1 | Step: 38280 | Dataset: 0-10896901 | Loss: 1.660 | 675 ms/step , 58269.36 GFLOP/s , 530727.2 tokens/s INFO:__main__:2024-10-27 02:23:05 | Epoch: 1 | Step: 38290 | Dataset: 0-10904901 | Loss: 1.641 | 676 ms/step , 58186.35 GFLOP/s , 531239.1 tokens/s INFO:__main__:2024-10-27 02:23:12 | Epoch: 1 | Step: 38300 | Dataset: 0-10912901 | Loss: 1.622 | 714 ms/step , 55056.09 GFLOP/s , 528887.8 tokens/s INFO:__main__:2024-10-27 02:23:20 | Epoch: 1 | Step: 38310 | Dataset: 0-10920901 | Loss: 1.631 | 676 ms/step , 58127.34 GFLOP/s , 530985.9 tokens/s INFO:__main__:2024-10-27 02:23:28 | Epoch: 1 | Step: 38320 | Dataset: 0-10928901 | Loss: 1.608 | 676 ms/step , 58131.27 
GFLOP/s , 530292.8 tokens/s INFO:__main__:2024-10-27 02:23:36 | Epoch: 1 | Step: 38330 | Dataset: 0-10936901 | Loss: 2.388 | 674 ms/step , 58290.54 GFLOP/s , 531611.7 tokens/s INFO:__main__:2024-10-27 02:23:43 | Epoch: 1 | Step: 38340 | Dataset: 0-10944901 | Loss: 2.285 | 681 ms/step , 57759.00 GFLOP/s , 531592.5 tokens/s INFO:__main__:2024-10-27 02:23:51 | Epoch: 1 | Step: 38350 | Dataset: 0-10952901 | Loss: 2.196 | 675 ms/step , 58204.88 GFLOP/s , 531594.7 tokens/s INFO:__main__:2024-10-27 02:23:59 | Epoch: 1 | Step: 38360 | Dataset: 0-10960901 | Loss: 2.251 | 675 ms/step , 58204.55 GFLOP/s , 531229.9 tokens/s INFO:__main__:2024-10-27 02:24:06 | Epoch: 1 | Step: 38370 | Dataset: 0-10968901 | Loss: 2.225 | 675 ms/step , 58194.45 GFLOP/s , 531599.6 tokens/s INFO:__main__:2024-10-27 02:24:14 | Epoch: 1 | Step: 38380 | Dataset: 0-10976901 | Loss: 2.193 | 675 ms/step , 58225.93 GFLOP/s , 531771.7 tokens/s INFO:__main__:2024-10-27 02:24:22 | Epoch: 1 | Step: 38390 | Dataset: 0-10984901 | Loss: 2.216 | 676 ms/step , 58179.00 GFLOP/s , 531225.9 tokens/s INFO:__main__:2024-10-27 02:24:30 | Epoch: 1 | Step: 38400 | Dataset: 0-10992901 | Loss: 2.178 | 676 ms/step , 58179.31 GFLOP/s , 530527.0 tokens/s INFO:__main__:2024-10-27 02:24:37 | Epoch: 1 | Step: 38410 | Dataset: 0-11000901 | Loss: 2.178 | 675 ms/step , 58206.69 GFLOP/s , 531924.3 tokens/s INFO:__main__:2024-10-27 02:24:45 | Epoch: 1 | Step: 38420 | Dataset: 0-11008901 | Loss: 2.189 | 674 ms/step , 58289.08 GFLOP/s , 532677.1 tokens/s INFO:__main__:2024-10-27 02:24:53 | Epoch: 1 | Step: 38430 | Dataset: 0-11016901 | Loss: 2.147 | 677 ms/step , 58024.66 GFLOP/s , 531617.7 tokens/s INFO:__main__:2024-10-27 02:25:00 | Epoch: 1 | Step: 38440 | Dataset: 0-11024901 | Loss: 2.231 | 673 ms/step , 58402.08 GFLOP/s , 531592.6 tokens/s INFO:__main__:2024-10-27 02:25:08 | Epoch: 1 | Step: 38450 | Dataset: 0-11032901 | Loss: 2.179 | 676 ms/step , 58171.99 GFLOP/s , 532786.7 tokens/s INFO:__main__:2024-10-27 02:25:16 | Epoch: 1 | 
Step: 38460 | Dataset: 0-11040901 | Loss: 2.213 | 675 ms/step , 58199.37 GFLOP/s , 532053.9 tokens/s INFO:__main__:2024-10-27 02:25:23 | Epoch: 1 | Step: 38470 | Dataset: 0-11048901 | Loss: 2.187 | 675 ms/step , 58196.54 GFLOP/s , 532433.6 tokens/s INFO:__main__:2024-10-27 02:25:31 | Epoch: 1 | Step: 38480 | Dataset: 0-11056901 | Loss: 2.173 | 675 ms/step , 58267.35 GFLOP/s , 532600.8 tokens/s INFO:__main__:2024-10-27 02:25:39 | Epoch: 1 | Step: 38490 | Dataset: 0-11064901 | Loss: 2.175 | 676 ms/step , 58190.55 GFLOP/s , 532538.3 tokens/s INFO:__main__:2024-10-27 02:25:46 | Epoch: 1 | Step: 38500 | Dataset: 0-11072901 | Loss: 2.079 | 675 ms/step , 58257.24 GFLOP/s , 532231.7 tokens/s INFO:__main__:2024-10-27 02:25:54 | Epoch: 1 | Step: 38510 | Dataset: 0-11080901 | Loss: 2.246 | 675 ms/step , 58198.22 GFLOP/s , 532819.3 tokens/s INFO:__main__:2024-10-27 02:26:02 | Epoch: 1 | Step: 38520 | Dataset: 0-11088901 | Loss: 2.033 | 675 ms/step , 58257.62 GFLOP/s , 532677.1 tokens/s INFO:__main__:2024-10-27 02:26:10 | Epoch: 1 | Step: 38530 | Dataset: 0-11096901 | Loss: 2.222 | 675 ms/step , 58228.83 GFLOP/s , 532532.0 tokens/s INFO:__main__:2024-10-27 02:26:17 | Epoch: 1 | Step: 38540 | Dataset: 0-11104901 | Loss: 2.133 | 675 ms/step , 58206.82 GFLOP/s , 532401.0 tokens/s INFO:__main__:2024-10-27 02:26:25 | Epoch: 1 | Step: 38550 | Dataset: 0-11112901 | Loss: 2.145 | 675 ms/step , 58276.16 GFLOP/s , 532286.7 tokens/s INFO:__main__:2024-10-27 02:26:33 | Epoch: 1 | Step: 38560 | Dataset: 0-11120901 | Loss: 2.151 | 674 ms/step , 58282.76 GFLOP/s , 532595.4 tokens/s INFO:__main__:2024-10-27 02:26:40 | Epoch: 1 | Step: 38570 | Dataset: 0-11128901 | Loss: 2.184 | 675 ms/step , 58217.09 GFLOP/s , 532583.8 tokens/s INFO:__main__:2024-10-27 02:26:48 | Epoch: 1 | Step: 38580 | Dataset: 0-11136901 | Loss: 2.211 | 674 ms/step , 58334.00 GFLOP/s , 532874.6 tokens/s INFO:__main__:2024-10-27 02:26:56 | Epoch: 1 | Step: 38590 | Dataset: 0-11144901 | Loss: 2.144 | 674 ms/step , 58301.06 
GFLOP/s , 532305.3 tokens/s INFO:__main__:2024-10-27 02:27:03 | Epoch: 1 | Step: 38600 | Dataset: 0-11152901 | Loss: 2.088 | 674 ms/step , 58279.87 GFLOP/s , 532793.8 tokens/s INFO:__main__:2024-10-27 02:27:11 | Epoch: 1 | Step: 38610 | Dataset: 0-11160901 | Loss: 2.173 | 675 ms/step , 58196.44 GFLOP/s , 531801.5 tokens/s INFO:__main__:2024-10-27 02:27:19 | Epoch: 1 | Step: 38620 | Dataset: 0-11168901 | Loss: 2.081 | 676 ms/step , 58139.98 GFLOP/s , 532833.4 tokens/s INFO:__main__:2024-10-27 02:27:26 | Epoch: 1 | Step: 38630 | Dataset: 0-11176901 | Loss: 2.171 | 675 ms/step , 58199.69 GFLOP/s , 532034.9 tokens/s INFO:__main__:2024-10-27 02:27:34 | Epoch: 1 | Step: 38640 | Dataset: 0-11184901 | Loss: 2.114 | 674 ms/step , 58306.63 GFLOP/s , 532230.1 tokens/s INFO:__main__:2024-10-27 02:27:42 | Epoch: 1 | Step: 38650 | Dataset: 0-11192901 | Loss: 2.270 | 675 ms/step , 58249.59 GFLOP/s , 532377.6 tokens/s INFO:__main__:2024-10-27 02:27:50 | Epoch: 1 | Step: 38660 | Dataset: 0-11200901 | Loss: 2.199 | 674 ms/step , 58344.04 GFLOP/s , 533010.8 tokens/s INFO:__main__:2024-10-27 02:27:57 | Epoch: 1 | Step: 38670 | Dataset: 0-11208901 | Loss: 2.250 | 674 ms/step , 58362.08 GFLOP/s , 532777.7 tokens/s INFO:__main__:2024-10-27 02:28:05 | Epoch: 1 | Step: 38680 | Dataset: 0-11216901 | Loss: 2.220 | 675 ms/step , 58200.68 GFLOP/s , 532354.6 tokens/s INFO:__main__:2024-10-27 02:28:13 | Epoch: 1 | Step: 38690 | Dataset: 0-11224901 | Loss: 2.200 | 675 ms/step , 58240.46 GFLOP/s , 532117.2 tokens/s INFO:__main__:2024-10-27 02:28:20 | Epoch: 1 | Step: 38700 | Dataset: 0-11232901 | Loss: 2.219 | 676 ms/step , 58147.87 GFLOP/s , 531711.1 tokens/s INFO:__main__:2024-10-27 02:28:28 | Epoch: 1 | Step: 38710 | Dataset: 0-11240901 | Loss: 2.248 | 674 ms/step , 58296.72 GFLOP/s , 532246.0 tokens/s INFO:__main__:2024-10-27 02:28:36 | Epoch: 1 | Step: 38720 | Dataset: 0-11248901 | Loss: 2.217 | 675 ms/step , 58239.92 GFLOP/s , 532090.6 tokens/s INFO:__main__:2024-10-27 02:28:43 | Epoch: 1 | 
Step: 38730 | Dataset: 0-11256901 | Loss: 2.258 | 675 ms/step , 58263.55 GFLOP/s , 532158.2 tokens/s INFO:__main__:2024-10-27 02:28:51 | Epoch: 1 | Step: 38740 | Dataset: 0-11264901 | Loss: 2.156 | 675 ms/step , 58254.75 GFLOP/s , 531830.9 tokens/s INFO:__main__:2024-10-27 02:28:59 | Epoch: 1 | Step: 38750 | Dataset: 0-11272901 | Loss: 2.157 | 675 ms/step , 58271.95 GFLOP/s , 532342.2 tokens/s INFO:__main__:2024-10-27 02:29:07 | Epoch: 1 | Step: 38760 | Dataset: 0-11280901 | Loss: 2.206 | 675 ms/step , 58245.61 GFLOP/s , 532234.7 tokens/s INFO:__main__:2024-10-27 02:29:14 | Epoch: 1 | Step: 38770 | Dataset: 0-11288901 | Loss: 2.229 | 675 ms/step , 58208.78 GFLOP/s , 531672.7 tokens/s INFO:__main__:2024-10-27 02:29:22 | Epoch: 1 | Step: 38780 | Dataset: 0-11296901 | Loss: 2.162 | 677 ms/step , 58090.40 GFLOP/s , 532578.0 tokens/s INFO:__main__:2024-10-27 02:29:30 | Epoch: 1 | Step: 38790 | Dataset: 0-11304901 | Loss: 2.123 | 674 ms/step , 58345.88 GFLOP/s , 532259.6 tokens/s INFO:__main__:2024-10-27 02:29:37 | Epoch: 1 | Step: 38800 | Dataset: 0-11312901 | Loss: 2.212 | 675 ms/step , 58272.90 GFLOP/s , 532738.1 tokens/s INFO:__main__:2024-10-27 02:29:45 | Epoch: 1 | Step: 38810 | Dataset: 0-11320901 | Loss: 2.312 | 675 ms/step , 58264.99 GFLOP/s , 532091.7 tokens/s INFO:__main__:2024-10-27 02:29:53 | Epoch: 1 | Step: 38820 | Dataset: 0-11328901 | Loss: 2.184 | 674 ms/step , 58347.32 GFLOP/s , 532824.6 tokens/s INFO:__main__:2024-10-27 02:30:00 | Epoch: 1 | Step: 38830 | Dataset: 0-11336901 | Loss: 2.198 | 675 ms/step , 58211.10 GFLOP/s , 529200.1 tokens/s INFO:__main__:2024-10-27 02:30:08 | Epoch: 1 | Step: 38840 | Dataset: 0-11344901 | Loss: 2.224 | 676 ms/step , 58180.44 GFLOP/s , 530548.4 tokens/s INFO:__main__:2024-10-27 02:30:16 | Epoch: 1 | Step: 38850 | Dataset: 0-11352901 | Loss: 2.188 | 675 ms/step , 58197.04 GFLOP/s , 530954.0 tokens/s INFO:__main__:2024-10-27 02:30:24 | Epoch: 1 | Step: 38860 | Dataset: 0-11360901 | Loss: 2.210 | 674 ms/step , 58299.75 
GFLOP/s , 531593.0 tokens/s INFO:__main__:2024-10-27 02:30:31 | Epoch: 1 | Step: 38870 | Dataset: 0-11368901 | Loss: 2.118 | 675 ms/step , 58209.23 GFLOP/s , 531313.2 tokens/s INFO:__main__:2024-10-27 02:30:39 | Epoch: 1 | Step: 38880 | Dataset: 0-11376901 | Loss: 2.218 | 676 ms/step , 58123.04 GFLOP/s , 531074.7 tokens/s INFO:__main__:2024-10-27 02:30:47 | Epoch: 1 | Step: 38890 | Dataset: 0-11384901 | Loss: 2.181 | 676 ms/step , 58118.25 GFLOP/s , 530378.8 tokens/s INFO:__main__:2024-10-27 02:30:54 | Epoch: 1 | Step: 38900 | Dataset: 0-11392901 | Loss: 2.214 | 678 ms/step , 57938.27 GFLOP/s , 528253.4 tokens/s INFO:__main__:2024-10-27 02:31:02 | Epoch: 1 | Step: 38910 | Dataset: 0-11400901 | Loss: 2.231 | 675 ms/step , 58273.06 GFLOP/s , 531943.4 tokens/s INFO:__main__:2024-10-27 02:31:10 | Epoch: 1 | Step: 38920 | Dataset: 0-11408901 | Loss: 2.118 | 675 ms/step , 58257.88 GFLOP/s , 531766.6 tokens/s INFO:__main__:2024-10-27 02:31:18 | Epoch: 1 | Step: 38930 | Dataset: 0-11416901 | Loss: 2.235 | 676 ms/step , 58180.39 GFLOP/s , 532382.2 tokens/s INFO:__main__:2024-10-27 02:31:25 | Epoch: 1 | Step: 38940 | Dataset: 0-11424901 | Loss: 2.198 | 676 ms/step , 58182.28 GFLOP/s , 531329.9 tokens/s INFO:__main__:2024-10-27 02:31:33 | Epoch: 1 | Step: 38950 | Dataset: 0-11432901 | Loss: 2.201 | 675 ms/step , 58202.38 GFLOP/s , 531880.1 tokens/s INFO:__main__:2024-10-27 02:31:41 | Epoch: 1 | Step: 38960 | Dataset: 0-11440901 | Loss: 2.140 | 675 ms/step , 58253.25 GFLOP/s , 531720.7 tokens/s INFO:__main__:2024-10-27 02:31:48 | Epoch: 1 | Step: 38970 | Dataset: 0-11448901 | Loss: 1.905 | 676 ms/step , 58155.67 GFLOP/s , 530989.4 tokens/s INFO:__main__:2024-10-27 02:31:56 | Epoch: 1 | Step: 38980 | Dataset: 0-11456901 | Loss: 1.774 | 676 ms/step , 58128.58 GFLOP/s , 531406.1 tokens/s INFO:__main__:2024-10-27 02:32:04 | Epoch: 1 | Step: 38990 | Dataset: 0-11464901 | Loss: 1.738 | 675 ms/step , 58252.77 GFLOP/s , 531207.0 tokens/s INFO:__main__:2024-10-27 02:32:11 | Validation 
| Step: 39000 | Val_loss: 1.836 | Best_val_loss: 1.9440 INFO:__main__:2024-10-27 02:32:11 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_023211_step_39000.pt` INFO:__main__:2024-10-27 02:32:12 | Epoch: 1 | Step: 39000 | Dataset: 0-11472901 | Loss: 1.734 | 674 ms/step , 58290.35 GFLOP/s , 477888.1 tokens/s INFO:__main__:2024-10-27 02:32:20 | Epoch: 1 | Step: 39010 | Dataset: 0-11480901 | Loss: 1.722 | 675 ms/step , 58224.64 GFLOP/s , 531157.9 tokens/s INFO:__main__:2024-10-27 02:32:28 | Epoch: 1 | Step: 39020 | Dataset: 0-11488901 | Loss: 1.713 | 676 ms/step , 58139.29 GFLOP/s , 531374.0 tokens/s INFO:__main__:2024-10-27 02:32:36 | Epoch: 1 | Step: 39030 | Dataset: 0-11496901 | Loss: 1.680 | 674 ms/step , 58284.96 GFLOP/s , 531453.9 tokens/s INFO:__main__:2024-10-27 02:32:43 | Epoch: 1 | Step: 39040 | Dataset: 0-11504901 | Loss: 1.672 | 675 ms/step , 58218.84 GFLOP/s , 532064.6 tokens/s INFO:__main__:2024-10-27 02:32:51 | Epoch: 1 | Step: 39050 | Dataset: 0-11512901 | Loss: 1.688 | 675 ms/step , 58257.67 GFLOP/s , 532080.0 tokens/s INFO:__main__:2024-10-27 02:32:59 | Epoch: 1 | Step: 39060 | Dataset: 0-11520901 | Loss: 2.291 | 674 ms/step , 58338.13 GFLOP/s , 532586.3 tokens/s INFO:__main__:2024-10-27 02:33:06 | Epoch: 1 | Step: 39070 | Dataset: 0-11528901 | Loss: 2.330 | 675 ms/step , 58226.96 GFLOP/s , 532346.9 tokens/s INFO:__main__:2024-10-27 02:33:14 | Epoch: 1 | Step: 39080 | Dataset: 0-11536901 | Loss: 2.232 | 676 ms/step , 58182.84 GFLOP/s , 531766.8 tokens/s INFO:__main__:2024-10-27 02:33:22 | Epoch: 1 | Step: 39090 | Dataset: 0-11544901 | Loss: 2.315 | 675 ms/step , 58194.30 GFLOP/s , 531867.6 tokens/s INFO:__main__:2024-10-27 02:33:29 | Epoch: 1 | Step: 39100 | Dataset: 0-11552901 | Loss: 2.208 | 674 ms/step , 58308.79 GFLOP/s , 532420.3 tokens/s INFO:__main__:2024-10-27 02:33:37 | Epoch: 1 | Step: 39110 | Dataset: 0-11560901 | Loss: 2.259 | 674 ms/step , 58335.35 GFLOP/s , 532689.6 tokens/s INFO:__main__:2024-10-27 02:33:45 | 
Epoch: 1 | Step: 39120 | Dataset: 0-11568901 | Loss: 2.265 | 675 ms/step , 58217.64 GFLOP/s , 531887.0 tokens/s INFO:__main__:2024-10-27 02:33:52 | Epoch: 1 | Step: 39130 | Dataset: 0-11576901 | Loss: 2.204 | 675 ms/step , 58198.11 GFLOP/s , 532013.3 tokens/s INFO:__main__:2024-10-27 02:34:00 | Epoch: 1 | Step: 39140 | Dataset: 0-11584901 | Loss: 2.206 | 674 ms/step , 58289.23 GFLOP/s , 532099.8 tokens/s INFO:__main__:2024-10-27 02:34:08 | Epoch: 1 | Step: 39150 | Dataset: 0-11592901 | Loss: 2.261 | 675 ms/step , 58214.37 GFLOP/s , 532139.9 tokens/s INFO:__main__:2024-10-27 02:34:16 | Epoch: 1 | Step: 39160 | Dataset: 0-11600901 | Loss: 2.215 | 675 ms/step , 58258.27 GFLOP/s , 532460.8 tokens/s INFO:__main__:2024-10-27 02:34:23 | Epoch: 1 | Step: 39170 | Dataset: 0-11608901 | Loss: 2.228 | 675 ms/step , 58251.12 GFLOP/s , 532199.0 tokens/s INFO:__main__:2024-10-27 02:34:31 | Epoch: 1 | Step: 39180 | Dataset: 0-11616901 | Loss: 2.218 | 675 ms/step , 58242.61 GFLOP/s , 532331.9 tokens/s INFO:__main__:2024-10-27 02:34:39 | Epoch: 1 | Step: 39190 | Dataset: 0-11624901 | Loss: 2.201 | 675 ms/step , 58272.16 GFLOP/s , 532431.2 tokens/s INFO:__main__:2024-10-27 02:34:46 | Epoch: 1 | Step: 39200 | Dataset: 0-11632901 | Loss: 2.234 | 675 ms/step , 58215.40 GFLOP/s , 532242.8 tokens/s INFO:__main__:2024-10-27 02:34:54 | Epoch: 1 | Step: 39210 | Dataset: 0-11640901 | Loss: 2.138 | 674 ms/step , 58321.33 GFLOP/s , 532615.5 tokens/s INFO:__main__:2024-10-27 02:35:02 | Epoch: 1 | Step: 39220 | Dataset: 0-11648901 | Loss: 2.226 | 678 ms/step , 57947.63 GFLOP/s , 530992.7 tokens/s INFO:__main__:2024-10-27 02:35:09 | Epoch: 1 | Step: 39230 | Dataset: 0-11656901 | Loss: 2.209 | 677 ms/step , 58034.85 GFLOP/s , 529725.2 tokens/s INFO:__main__:2024-10-27 02:35:17 | Epoch: 1 | Step: 39240 | Dataset: 0-11664901 | Loss: 2.114 | 675 ms/step , 58224.99 GFLOP/s , 531693.8 tokens/s INFO:__main__:2024-10-27 02:35:25 | Epoch: 1 | Step: 39250 | Dataset: 0-11672901 | Loss: 2.137 | 675 ms/step , 
58193.55 GFLOP/s , 531597.5 tokens/s INFO:__main__:2024-10-27 02:35:33 | Epoch: 1 | Step: 39260 | Dataset: 0-11680901 | Loss: 2.079 | 676 ms/step , 58190.87 GFLOP/s , 531903.7 tokens/s INFO:__main__:2024-10-27 02:35:40 | Epoch: 1 | Step: 39270 | Dataset: 0-11688901 | Loss: 2.163 | 676 ms/step , 58176.36 GFLOP/s , 531929.3 tokens/s INFO:__main__:2024-10-27 02:35:48 | Epoch: 1 | Step: 39280 | Dataset: 0-11696901 | Loss: 2.208 | 675 ms/step , 58216.43 GFLOP/s , 532266.9 tokens/s INFO:__main__:2024-10-27 02:35:56 | Epoch: 1 | Step: 39290 | Dataset: 0-11704901 | Loss: 2.213 | 675 ms/step , 58230.69 GFLOP/s , 532579.3 tokens/s INFO:__main__:2024-10-27 02:36:03 | Epoch: 1 | Step: 39300 | Dataset: 0-11712901 | Loss: 2.135 | 675 ms/step , 58216.59 GFLOP/s , 531500.1 tokens/s INFO:__main__:2024-10-27 02:36:11 | Epoch: 1 | Step: 39310 | Dataset: 0-11720901 | Loss: 2.132 | 676 ms/step , 58145.78 GFLOP/s , 530874.0 tokens/s INFO:__main__:2024-10-27 02:36:19 | Epoch: 1 | Step: 39320 | Dataset: 0-11728901 | Loss: 2.103 | 675 ms/step , 58251.08 GFLOP/s , 532089.5 tokens/s INFO:__main__:2024-10-27 02:36:26 | Epoch: 1 | Step: 39330 | Dataset: 0-11736901 | Loss: 2.123 | 675 ms/step , 58202.82 GFLOP/s , 532703.3 tokens/s INFO:__main__:2024-10-27 02:36:34 | Epoch: 1 | Step: 39340 | Dataset: 0-11744901 | Loss: 2.121 | 675 ms/step , 58272.89 GFLOP/s , 533017.3 tokens/s INFO:__main__:2024-10-27 02:36:42 | Epoch: 1 | Step: 39350 | Dataset: 0-11752901 | Loss: 2.048 | 680 ms/step , 57802.49 GFLOP/s , 532964.5 tokens/s INFO:__main__:2024-10-27 02:36:50 | Epoch: 1 | Step: 39360 | Dataset: 0-11760901 | Loss: 2.152 | 674 ms/step , 58308.28 GFLOP/s , 532326.3 tokens/s INFO:__main__:2024-10-27 02:36:57 | Epoch: 1 | Step: 39370 | Dataset: 0-11768901 | Loss: 2.159 | 675 ms/step , 58269.17 GFLOP/s , 533368.9 tokens/s INFO:__main__:2024-10-27 02:37:05 | Epoch: 1 | Step: 39380 | Dataset: 0-11776901 | Loss: 2.743 | 673 ms/step , 58375.62 GFLOP/s , 533427.3 tokens/s INFO:__main__:2024-10-27 02:37:13 | 
Epoch: 1 | Step: 39390 | Dataset: 0-11784901 | Loss: 2.630 | 673 ms/step , 58375.44 GFLOP/s , 532120.6 tokens/s INFO:__main__:2024-10-27 02:37:20 | Epoch: 1 | Step: 39400 | Dataset: 0-11792901 | Loss: 2.629 | 676 ms/step , 58185.45 GFLOP/s , 533330.3 tokens/s INFO:__main__:2024-10-27 02:37:28 | Epoch: 1 | Step: 39410 | Dataset: 0-11800901 | Loss: 2.604 | 677 ms/step , 58091.68 GFLOP/s , 532864.4 tokens/s INFO:__main__:2024-10-27 02:37:36 | Epoch: 1 | Step: 39420 | Dataset: 0-11808901 | Loss: 2.569 | 675 ms/step , 58241.03 GFLOP/s , 531956.8 tokens/s INFO:__main__:2024-10-27 02:37:43 | Epoch: 1 | Step: 39430 | Dataset: 0-11816901 | Loss: 2.511 | 676 ms/step , 58176.50 GFLOP/s , 532895.7 tokens/s INFO:__main__:2024-10-27 02:37:51 | Epoch: 1 | Step: 39440 | Dataset: 0-11824901 | Loss: 2.499 | 675 ms/step , 58267.44 GFLOP/s , 533239.0 tokens/s INFO:__main__:2024-10-27 02:37:59 | Epoch: 1 | Step: 39450 | Dataset: 0-11832901 | Loss: 2.495 | 674 ms/step , 58345.78 GFLOP/s , 532814.5 tokens/s INFO:__main__:2024-10-27 02:38:06 | Epoch: 1 | Step: 39460 | Dataset: 0-11840901 | Loss: 2.442 | 674 ms/step , 58360.77 GFLOP/s , 533282.0 tokens/s INFO:__main__:2024-10-27 02:38:14 | Epoch: 1 | Step: 39470 | Dataset: 0-11848901 | Loss: 2.524 | 673 ms/step , 58379.93 GFLOP/s , 532753.8 tokens/s INFO:__main__:2024-10-27 02:38:22 | Epoch: 1 | Step: 39480 | Dataset: 0-11856901 | Loss: 2.461 | 674 ms/step , 58279.62 GFLOP/s , 533491.5 tokens/s INFO:__main__:2024-10-27 02:38:29 | Epoch: 1 | Step: 39490 | Dataset: 0-11864901 | Loss: 2.435 | 674 ms/step , 58280.02 GFLOP/s , 533204.9 tokens/s INFO:__main__:2024-10-27 02:38:37 | Epoch: 1 | Step: 39500 | Dataset: 0-11872901 | Loss: 2.481 | 675 ms/step , 58258.49 GFLOP/s , 533075.7 tokens/s INFO:__main__:2024-10-27 02:38:45 | Epoch: 1 | Step: 39510 | Dataset: 0-11880901 | Loss: 2.466 | 673 ms/step , 58373.10 GFLOP/s , 533155.2 tokens/s INFO:__main__:2024-10-27 02:38:53 | Epoch: 1 | Step: 39520 | Dataset: 0-11888901 | Loss: 2.533 | 674 ms/step , 
58321.61 GFLOP/s , 532616.1 tokens/s INFO:__main__:2024-10-27 02:39:00 | Epoch: 1 | Step: 39530 | Dataset: 0-11896901 | Loss: 2.474 | 675 ms/step , 58278.85 GFLOP/s , 533173.2 tokens/s INFO:__main__:2024-10-27 02:39:08 | Epoch: 1 | Step: 39540 | Dataset: 0-11904901 | Loss: 2.291 | 676 ms/step , 58153.49 GFLOP/s , 533225.6 tokens/s INFO:__main__:2024-10-27 02:39:16 | Epoch: 1 | Step: 39550 | Dataset: 0-11912901 | Loss: 2.324 | 675 ms/step , 58234.63 GFLOP/s , 532840.0 tokens/s INFO:__main__:2024-10-27 02:39:23 | Epoch: 1 | Step: 39560 | Dataset: 0-11920901 | Loss: 2.236 | 676 ms/step , 58171.52 GFLOP/s , 532845.8 tokens/s INFO:__main__:2024-10-27 02:39:31 | Epoch: 1 | Step: 39570 | Dataset: 0-11928901 | Loss: 2.291 | 675 ms/step , 58193.81 GFLOP/s , 533023.6 tokens/s INFO:__main__:2024-10-27 02:39:39 | Epoch: 1 | Step: 39580 | Dataset: 0-11936901 | Loss: 2.286 | 675 ms/step , 58201.83 GFLOP/s , 532524.3 tokens/s INFO:__main__:2024-10-27 02:39:46 | Epoch: 1 | Step: 39590 | Dataset: 0-11944901 | Loss: 2.219 | 675 ms/step , 58234.44 GFLOP/s , 532629.9 tokens/s INFO:__main__:2024-10-27 02:39:54 | Epoch: 1 | Step: 39600 | Dataset: 0-11952901 | Loss: 2.261 | 675 ms/step , 58245.94 GFLOP/s , 532831.2 tokens/s INFO:__main__:2024-10-27 02:40:02 | Epoch: 1 | Step: 39610 | Dataset: 0-11960901 | Loss: 2.228 | 675 ms/step , 58224.99 GFLOP/s , 532976.0 tokens/s INFO:__main__:2024-10-27 02:40:09 | Epoch: 1 | Step: 39620 | Dataset: 0-11968901 | Loss: 2.255 | 676 ms/step , 58169.46 GFLOP/s , 532712.1 tokens/s INFO:__main__:2024-10-27 02:40:17 | Epoch: 1 | Step: 39630 | Dataset: 0-11976901 | Loss: 2.193 | 675 ms/step , 58275.50 GFLOP/s , 532484.7 tokens/s INFO:__main__:2024-10-27 02:40:25 | Epoch: 1 | Step: 39640 | Dataset: 0-11984901 | Loss: 2.187 | 674 ms/step , 58291.28 GFLOP/s , 532523.6 tokens/s INFO:__main__:2024-10-27 02:40:32 | Epoch: 1 | Step: 39650 | Dataset: 0-11992901 | Loss: 2.219 | 675 ms/step , 58223.79 GFLOP/s , 532772.5 tokens/s INFO:__main__:2024-10-27 02:40:40 | 
Epoch: 1 | Step: 39660 | Dataset: 0-12000901 | Loss: 2.184 | 675 ms/step , 58198.87 GFLOP/s , 533010.8 tokens/s INFO:__main__:2024-10-27 02:40:48 | Epoch: 1 | Step: 39670 | Dataset: 0-12008901 | Loss: 2.130 | 675 ms/step , 58195.45 GFLOP/s , 532167.2 tokens/s INFO:__main__:2024-10-27 02:40:56 | Epoch: 1 | Step: 39680 | Dataset: 0-12016901 | Loss: 2.186 | 675 ms/step , 58272.11 GFLOP/s , 532472.9 tokens/s INFO:__main__:2024-10-27 02:41:03 | Epoch: 1 | Step: 39690 | Dataset: 0-12024901 | Loss: 2.190 | 676 ms/step , 58151.51 GFLOP/s , 532760.7 tokens/s INFO:__main__:2024-10-27 02:41:11 | Epoch: 1 | Step: 39700 | Dataset: 0-12032901 | Loss: 2.333 | 675 ms/step , 58255.61 GFLOP/s , 532494.1 tokens/s INFO:__main__:2024-10-27 02:41:19 | Epoch: 1 | Step: 39710 | Dataset: 0-12040901 | Loss: 2.184 | 675 ms/step , 58243.23 GFLOP/s , 532486.3 tokens/s INFO:__main__:2024-10-27 02:41:26 | Epoch: 1 | Step: 39720 | Dataset: 0-12048901 | Loss: 2.096 | 675 ms/step , 58248.17 GFLOP/s , 532541.7 tokens/s INFO:__main__:2024-10-27 02:41:34 | Epoch: 1 | Step: 39730 | Dataset: 0-12056901 | Loss: 2.135 | 675 ms/step , 58262.90 GFLOP/s , 532613.7 tokens/s INFO:__main__:2024-10-27 02:41:42 | Epoch: 1 | Step: 39740 | Dataset: 0-12064901 | Loss: 2.244 | 674 ms/step , 58326.28 GFLOP/s , 532649.0 tokens/s INFO:__main__:2024-10-27 02:41:49 | Epoch: 1 | Step: 39750 | Dataset: 0-12072901 | Loss: 2.178 | 676 ms/step , 58189.69 GFLOP/s , 532606.7 tokens/s INFO:__main__:2024-10-27 02:41:57 | Epoch: 1 | Step: 39760 | Dataset: 0-12080901 | Loss: 2.199 | 674 ms/step , 58302.02 GFLOP/s , 532652.8 tokens/s INFO:__main__:2024-10-27 02:42:05 | Epoch: 1 | Step: 39770 | Dataset: 0-12088901 | Loss: 2.132 | 674 ms/step , 58324.99 GFLOP/s , 533316.7 tokens/s INFO:__main__:2024-10-27 02:42:12 | Epoch: 1 | Step: 39780 | Dataset: 0-12096901 | Loss: 2.154 | 674 ms/step , 58324.10 GFLOP/s , 533215.1 tokens/s INFO:__main__:2024-10-27 02:42:20 | Epoch: 1 | Step: 39790 | Dataset: 0-12104901 | Loss: 2.136 | 674 ms/step , 
58320.74 GFLOP/s , 532386.5 tokens/s INFO:__main__:2024-10-27 02:42:28 | Epoch: 1 | Step: 39800 | Dataset: 0-12112901 | Loss: 2.166 | 674 ms/step , 58319.06 GFLOP/s , 533380.7 tokens/s INFO:__main__:2024-10-27 02:42:35 | Epoch: 1 | Step: 39810 | Dataset: 0-12120901 | Loss: 2.146 | 675 ms/step , 58238.20 GFLOP/s , 532514.7 tokens/s INFO:__main__:2024-10-27 02:42:43 | Epoch: 1 | Step: 39820 | Dataset: 0-12128901 | Loss: 2.145 | 675 ms/step , 58247.98 GFLOP/s , 532605.2 tokens/s INFO:__main__:2024-10-27 02:42:51 | Epoch: 1 | Step: 39830 | Dataset: 0-12136901 | Loss: 2.189 | 677 ms/step , 58100.67 GFLOP/s , 532290.5 tokens/s INFO:__main__:2024-10-27 02:42:59 | Epoch: 1 | Step: 39840 | Dataset: 0-12144901 | Loss: 2.145 | 675 ms/step , 58237.94 GFLOP/s , 532923.4 tokens/s INFO:__main__:2024-10-27 02:43:06 | Epoch: 1 | Step: 39850 | Dataset: 0-12152901 | Loss: 2.105 | 675 ms/step , 58218.14 GFLOP/s , 532735.3 tokens/s INFO:__main__:2024-10-27 02:43:14 | Epoch: 1 | Step: 39860 | Dataset: 0-12160901 | Loss: 2.333 | 675 ms/step , 58274.49 GFLOP/s , 533210.4 tokens/s INFO:__main__:2024-10-27 02:43:22 | Epoch: 1 | Step: 39870 | Dataset: 0-12168901 | Loss: 2.240 | 676 ms/step , 58188.64 GFLOP/s , 533000.7 tokens/s INFO:__main__:2024-10-27 02:43:29 | Epoch: 1 | Step: 39880 | Dataset: 0-12176901 | Loss: 2.223 | 675 ms/step , 58251.25 GFLOP/s , 532739.0 tokens/s INFO:__main__:2024-10-27 02:43:37 | Epoch: 1 | Step: 39890 | Dataset: 0-12184901 | Loss: 2.218 | 674 ms/step , 58291.15 GFLOP/s , 533174.1 tokens/s INFO:__main__:2024-10-27 02:43:45 | Epoch: 1 | Step: 39900 | Dataset: 0-12192901 | Loss: 2.241 | 675 ms/step , 58257.07 GFLOP/s , 532544.2 tokens/s INFO:__main__:2024-10-27 02:43:52 | Epoch: 1 | Step: 39910 | Dataset: 0-12200901 | Loss: 2.204 | 674 ms/step , 58354.90 GFLOP/s , 533020.7 tokens/s INFO:__main__:2024-10-27 02:44:00 | Epoch: 1 | Step: 39920 | Dataset: 0-12208901 | Loss: 2.250 | 674 ms/step , 58288.61 GFLOP/s , 532906.7 tokens/s INFO:__main__:2024-10-27 02:44:08 | 
Epoch: 1 | Step: 39930 | Dataset: 0-12216901 | Loss: 2.215 | 674 ms/step , 58350.30 GFLOP/s , 533226.9 tokens/s INFO:__main__:2024-10-27 02:44:15 | Epoch: 1 | Step: 39940 | Dataset: 0-12224901 | Loss: 2.208 | 674 ms/step , 58363.71 GFLOP/s , 533506.3 tokens/s INFO:__main__:2024-10-27 02:44:23 | Epoch: 1 | Step: 39950 | Dataset: 0-12232901 | Loss: 2.253 | 674 ms/step , 58300.07 GFLOP/s , 533339.0 tokens/s INFO:__main__:2024-10-27 02:44:31 | Epoch: 1 | Step: 39960 | Dataset: 0-12240901 | Loss: 2.158 | 675 ms/step , 58277.10 GFLOP/s , 533132.6 tokens/s INFO:__main__:2024-10-27 02:44:38 | Epoch: 1 | Step: 39970 | Dataset: 0-12248901 | Loss: 2.232 | 674 ms/step , 58293.34 GFLOP/s , 533553.8 tokens/s INFO:__main__:2024-10-27 02:44:46 | Epoch: 1 | Step: 39980 | Dataset: 0-12256901 | Loss: 2.208 | 674 ms/step , 58317.35 GFLOP/s , 533459.4 tokens/s INFO:__main__:2024-10-27 02:44:54 | Epoch: 1 | Step: 39990 | Dataset: 0-12264901 | Loss: 2.223 | 674 ms/step , 58351.77 GFLOP/s , 533729.1 tokens/s INFO:__main__:2024-10-27 02:45:01 | Validation | Step: 40000 | Val_loss: 2.346 | Best_val_loss: 1.8362 INFO:__main__:2024-10-27 02:45:01 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_024501_step_40000.pt` INFO:__main__:2024-10-27 02:45:02 | Epoch: 1 | Step: 40000 | Dataset: 0-12272901 | Loss: 2.246 | 672 ms/step , 58454.91 GFLOP/s , 478804.4 tokens/s INFO:__main__:2024-10-27 02:45:10 | Epoch: 1 | Step: 40010 | Dataset: 0-12280901 | Loss: 2.229 | 674 ms/step , 58279.38 GFLOP/s , 533084.5 tokens/s INFO:__main__:2024-10-27 02:45:18 | Epoch: 1 | Step: 40020 | Dataset: 0-12288901 | Loss: 2.162 | 673 ms/step , 58368.65 GFLOP/s , 533391.0 tokens/s INFO:__main__:2024-10-27 02:45:25 | Epoch: 1 | Step: 40030 | Dataset: 0-12296901 | Loss: 2.204 | 674 ms/step , 58362.86 GFLOP/s , 533662.4 tokens/s INFO:__main__:2024-10-27 02:45:33 | Epoch: 1 | Step: 40040 | Dataset: 0-12304901 | Loss: 2.178 | 674 ms/step , 58349.23 GFLOP/s , 533585.9 tokens/s INFO:__main__:2024-10-27 
02:45:41 | Epoch: 1 | Step: 40050 | Dataset: 0-12312901 | Loss: 2.163 | 674 ms/step , 58284.11 GFLOP/s , 533193.7 tokens/s INFO:__main__:2024-10-27 02:45:48 | Epoch: 1 | Step: 40060 | Dataset: 0-12320901 | Loss: 2.166 | 674 ms/step , 58349.96 GFLOP/s , 533568.2 tokens/s INFO:__main__:2024-10-27 02:45:56 | Epoch: 1 | Step: 40070 | Dataset: 0-12328901 | Loss: 2.178 | 674 ms/step , 58355.91 GFLOP/s , 533422.4 tokens/s INFO:__main__:2024-10-27 02:46:04 | Epoch: 1 | Step: 40080 | Dataset: 0-12336901 | Loss: 2.099 | 674 ms/step , 58314.86 GFLOP/s , 533320.3 tokens/s INFO:__main__:2024-10-27 02:46:11 | Epoch: 1 | Step: 40090 | Dataset: 0-12344901 | Loss: 2.143 | 674 ms/step , 58318.46 GFLOP/s , 533194.9 tokens/s INFO:__main__:2024-10-27 02:46:19 | Epoch: 1 | Step: 40100 | Dataset: 0-12352901 | Loss: 2.091 | 674 ms/step , 58363.62 GFLOP/s , 532967.5 tokens/s INFO:__main__:2024-10-27 02:46:27 | Epoch: 1 | Step: 40110 | Dataset: 0-12360901 | Loss: 2.188 | 674 ms/step , 58321.57 GFLOP/s , 533229.7 tokens/s INFO:__main__:2024-10-27 02:46:35 | Epoch: 1 | Step: 40120 | Dataset: 0-12368901 | Loss: 2.061 | 675 ms/step , 58243.32 GFLOP/s , 533028.6 tokens/s INFO:__main__:2024-10-27 02:46:42 | Epoch: 1 | Step: 40130 | Dataset: 0-12376901 | Loss: 2.200 | 673 ms/step , 58446.19 GFLOP/s , 533701.5 tokens/s INFO:__main__:2024-10-27 02:46:50 | Epoch: 1 | Step: 40140 | Dataset: 0-12384901 | Loss: 2.163 | 673 ms/step , 58391.22 GFLOP/s , 533729.2 tokens/s INFO:__main__:2024-10-27 02:46:58 | Epoch: 1 | Step: 40150 | Dataset: 0-12392901 | Loss: 2.193 | 674 ms/step , 58308.71 GFLOP/s , 532999.0 tokens/s INFO:__main__:2024-10-27 02:47:05 | Epoch: 1 | Step: 40160 | Dataset: 0-12400901 | Loss: 2.120 | 675 ms/step , 58260.41 GFLOP/s , 531251.9 tokens/s INFO:__main__:2024-10-27 02:47:13 | Epoch: 1 | Step: 40170 | Dataset: 0-12408901 | Loss: 2.126 | 674 ms/step , 58309.94 GFLOP/s , 533311.5 tokens/s INFO:__main__:2024-10-27 02:47:21 | Epoch: 1 | Step: 40180 | Dataset: 0-12416901 | Loss: 2.210 | 675 
ms/step , 58277.00 GFLOP/s , 532981.0 tokens/s INFO:__main__:2024-10-27 02:47:28 | Epoch: 1 | Step: 40190 | Dataset: 0-12424901 | Loss: 2.190 | 674 ms/step , 58293.43 GFLOP/s , 533197.8 tokens/s INFO:__main__:2024-10-27 02:47:36 | Epoch: 1 | Step: 40200 | Dataset: 0-12432901 | Loss: 2.168 | 674 ms/step , 58329.15 GFLOP/s , 532576.4 tokens/s INFO:__main__:2024-10-27 02:47:44 | Epoch: 1 | Step: 40210 | Dataset: 0-12440901 | Loss: 2.177 | 674 ms/step , 58297.24 GFLOP/s , 532116.8 tokens/s INFO:__main__:2024-10-27 02:47:51 | Epoch: 1 | Step: 40220 | Dataset: 0-12448901 | Loss: 2.251 | 676 ms/step , 58174.15 GFLOP/s , 532503.3 tokens/s INFO:__main__:2024-10-27 02:47:59 | Epoch: 1 | Step: 40230 | Dataset: 0-12456901 | Loss: 2.195 | 677 ms/step , 58091.77 GFLOP/s , 530698.6 tokens/s INFO:__main__:2024-10-27 02:48:07 | Epoch: 1 | Step: 40240 | Dataset: 0-12464901 | Loss: 2.172 | 676 ms/step , 58171.39 GFLOP/s , 531012.0 tokens/s INFO:__main__:2024-10-27 02:48:15 | Epoch: 1 | Step: 40250 | Dataset: 0-12472901 | Loss: 2.126 | 675 ms/step , 58272.28 GFLOP/s , 531625.6 tokens/s INFO:__main__:2024-10-27 02:48:22 | Epoch: 1 | Step: 40260 | Dataset: 0-12480901 | Loss: 2.121 | 675 ms/step , 58245.08 GFLOP/s , 532115.7 tokens/s INFO:__main__:2024-10-27 02:48:30 | Epoch: 1 | Step: 40270 | Dataset: 0-12488901 | Loss: 2.200 | 675 ms/step , 58269.41 GFLOP/s , 531491.9 tokens/s INFO:__main__:2024-10-27 02:48:38 | Epoch: 1 | Step: 40280 | Dataset: 0-12496901 | Loss: 2.193 | 675 ms/step , 58233.40 GFLOP/s , 531644.2 tokens/s INFO:__main__:2024-10-27 02:48:45 | Epoch: 1 | Step: 40290 | Dataset: 0-12504901 | Loss: 2.161 | 676 ms/step , 58134.50 GFLOP/s , 531305.2 tokens/s INFO:__main__:2024-10-27 02:48:53 | Epoch: 1 | Step: 40300 | Dataset: 0-12512901 | Loss: 2.251 | 676 ms/step , 58145.06 GFLOP/s , 530366.4 tokens/s INFO:__main__:2024-10-27 02:49:01 | Epoch: 1 | Step: 40310 | Dataset: 0-12520901 | Loss: 2.173 | 676 ms/step , 58123.04 GFLOP/s , 529319.7 tokens/s INFO:__main__:2024-10-27 
02:49:08 | Epoch: 1 | Step: 40320 | Dataset: 0-12528901 | Loss: 2.137 | 674 ms/step , 58322.29 GFLOP/s , 532058.0 tokens/s INFO:__main__:2024-10-27 02:49:16 | Epoch: 1 | Step: 40330 | Dataset: 0-12536901 | Loss: 2.131 | 675 ms/step , 58267.37 GFLOP/s , 532952.0 tokens/s INFO:__main__:2024-10-27 02:49:24 | Epoch: 1 | Step: 40340 | Dataset: 0-12544901 | Loss: 2.229 | 676 ms/step , 58161.74 GFLOP/s , 532117.1 tokens/s INFO:__main__:2024-10-27 02:49:32 | Epoch: 1 | Step: 40350 | Dataset: 0-12552901 | Loss: 2.196 | 675 ms/step , 58202.98 GFLOP/s , 532337.9 tokens/s INFO:__main__:2024-10-27 02:49:39 | Epoch: 1 | Step: 40360 | Dataset: 0-12560901 | Loss: 2.204 | 675 ms/step , 58233.27 GFLOP/s , 531532.5 tokens/s INFO:__main__:2024-10-27 02:49:47 | Epoch: 1 | Step: 40370 | Dataset: 0-12568901 | Loss: 2.203 | 675 ms/step , 58234.79 GFLOP/s , 532873.4 tokens/s INFO:__main__:2024-10-27 02:49:55 | Epoch: 1 | Step: 40380 | Dataset: 0-12576901 | Loss: 2.159 | 676 ms/step , 58151.78 GFLOP/s , 532533.5 tokens/s INFO:__main__:2024-10-27 02:50:02 | Epoch: 1 | Step: 40390 | Dataset: 0-12584901 | Loss: 2.224 | 675 ms/step , 58247.76 GFLOP/s , 532369.8 tokens/s INFO:__main__:2024-10-27 02:50:10 | Epoch: 1 | Step: 40400 | Dataset: 0-12592901 | Loss: 2.135 | 675 ms/step , 58201.89 GFLOP/s , 532605.6 tokens/s INFO:__main__:2024-10-27 02:50:18 | Epoch: 1 | Step: 40410 | Dataset: 0-12600901 | Loss: 2.070 | 675 ms/step , 58215.96 GFLOP/s , 532425.4 tokens/s INFO:__main__:2024-10-27 02:50:25 | Epoch: 1 | Step: 40420 | Dataset: 0-12608901 | Loss: 2.205 | 676 ms/step , 58172.57 GFLOP/s , 532462.6 tokens/s INFO:__main__:2024-10-27 02:50:33 | Epoch: 1 | Step: 40430 | Dataset: 0-12616901 | Loss: 2.200 | 674 ms/step , 58287.53 GFLOP/s , 532839.9 tokens/s INFO:__main__:2024-10-27 02:50:41 | Epoch: 1 | Step: 40440 | Dataset: 0-12624901 | Loss: 2.258 | 674 ms/step , 58352.37 GFLOP/s , 533241.4 tokens/s INFO:__main__:2024-10-27 02:50:48 | Epoch: 1 | Step: 40450 | Dataset: 0-12632901 | Loss: 2.173 | 675 
ms/step , 58267.04 GFLOP/s , 532327.1 tokens/s INFO:__main__:2024-10-27 02:50:56 | Epoch: 1 | Step: 40460 | Dataset: 0-12640901 | Loss: 2.132 | 675 ms/step , 58273.11 GFLOP/s , 532747.1 tokens/s INFO:__main__:2024-10-27 02:51:04 | Epoch: 1 | Step: 40470 | Dataset: 0-12648901 | Loss: 2.210 | 675 ms/step , 58234.13 GFLOP/s , 532417.0 tokens/s INFO:__main__:2024-10-27 02:51:12 | Epoch: 1 | Step: 40480 | Dataset: 0-12656901 | Loss: 2.099 | 674 ms/step , 58284.11 GFLOP/s , 532880.2 tokens/s INFO:__main__:2024-10-27 02:51:19 | Epoch: 1 | Step: 40490 | Dataset: 0-12664901 | Loss: 2.185 | 675 ms/step , 58209.00 GFLOP/s , 532521.3 tokens/s INFO:__main__:2024-10-27 02:51:27 | Epoch: 1 | Step: 40500 | Dataset: 0-12672901 | Loss: 2.233 | 675 ms/step , 58256.72 GFLOP/s , 533202.6 tokens/s INFO:__main__:2024-10-27 02:51:35 | Epoch: 1 | Step: 40510 | Dataset: 0-12680901 | Loss: 2.237 | 675 ms/step , 58208.54 GFLOP/s , 532562.4 tokens/s INFO:__main__:2024-10-27 02:51:42 | Epoch: 1 | Step: 40520 | Dataset: 0-12688901 | Loss: 2.187 | 673 ms/step , 58390.29 GFLOP/s , 532915.3 tokens/s INFO:__main__:2024-10-27 02:51:50 | Epoch: 1 | Step: 40530 | Dataset: 0-12696901 | Loss: 2.222 | 675 ms/step , 58208.34 GFLOP/s , 533022.4 tokens/s INFO:__main__:2024-10-27 02:51:58 | Epoch: 1 | Step: 40540 | Dataset: 0-12704901 | Loss: 2.142 | 674 ms/step , 58321.62 GFLOP/s , 533246.1 tokens/s INFO:__main__:2024-10-27 02:52:05 | Epoch: 1 | Step: 40550 | Dataset: 0-12712901 | Loss: 2.179 | 675 ms/step , 58237.77 GFLOP/s , 533553.3 tokens/s INFO:__main__:2024-10-27 02:52:13 | Epoch: 1 | Step: 40560 | Dataset: 0-12720901 | Loss: 2.224 | 674 ms/step , 58355.47 GFLOP/s , 533484.5 tokens/s INFO:__main__:2024-10-27 02:52:21 | Epoch: 1 | Step: 40570 | Dataset: 0-12728901 | Loss: 2.220 | 676 ms/step , 58180.42 GFLOP/s , 533356.0 tokens/s INFO:__main__:2024-10-27 02:52:28 | Epoch: 1 | Step: 40580 | Dataset: 0-12736901 | Loss: 2.263 | 674 ms/step , 58347.89 GFLOP/s , 533259.0 tokens/s INFO:__main__:2024-10-27 
02:52:36 | Epoch: 1 | Step: 40590 | Dataset: 0-12744901 | Loss: 2.161 | 674 ms/step , 58338.55 GFLOP/s , 533470.7 tokens/s INFO:__main__:2024-10-27 02:52:44 | Epoch: 1 | Step: 40600 | Dataset: 0-12752901 | Loss: 2.205 | 674 ms/step , 58289.78 GFLOP/s , 533278.5 tokens/s INFO:__main__:2024-10-27 02:52:51 | Epoch: 1 | Step: 40610 | Dataset: 0-12760901 | Loss: 2.246 | 676 ms/step , 58152.37 GFLOP/s , 533482.3 tokens/s INFO:__main__:2024-10-27 02:52:59 | Epoch: 1 | Step: 40620 | Dataset: 0-12768901 | Loss: 2.233 | 673 ms/step , 58366.23 GFLOP/s , 533467.7 tokens/s INFO:__main__:2024-10-27 02:53:07 | Epoch: 1 | Step: 40630 | Dataset: 0-12776901 | Loss: 2.213 | 675 ms/step , 58228.34 GFLOP/s , 531343.4 tokens/s INFO:__main__:2024-10-27 02:53:14 | Epoch: 1 | Step: 40640 | Dataset: 0-12784901 | Loss: 2.214 | 675 ms/step , 58248.98 GFLOP/s , 533246.2 tokens/s INFO:__main__:2024-10-27 02:53:22 | Epoch: 1 | Step: 40650 | Dataset: 0-12792901 | Loss: 2.159 | 673 ms/step , 58369.99 GFLOP/s , 533135.6 tokens/s INFO:__main__:2024-10-27 02:53:30 | Epoch: 1 | Step: 40660 | Dataset: 0-12800901 | Loss: 2.190 | 674 ms/step , 58344.37 GFLOP/s , 533548.6 tokens/s INFO:__main__:2024-10-27 02:53:38 | Epoch: 1 | Step: 40670 | Dataset: 0-12808901 | Loss: 2.418 | 673 ms/step , 58441.96 GFLOP/s , 533099.6 tokens/s INFO:__main__:2024-10-27 02:53:45 | Epoch: 1 | Step: 40680 | Dataset: 0-12816901 | Loss: 2.424 | 673 ms/step , 58441.61 GFLOP/s , 533868.9 tokens/s INFO:__main__:2024-10-27 02:53:53 | Epoch: 1 | Step: 40690 | Dataset: 0-12824901 | Loss: 2.429 | 675 ms/step , 58252.95 GFLOP/s , 533221.0 tokens/s INFO:__main__:2024-10-27 02:54:01 | Epoch: 1 | Step: 40700 | Dataset: 0-12832901 | Loss: 2.409 | 674 ms/step , 58282.80 GFLOP/s , 533448.7 tokens/s INFO:__main__:2024-10-27 02:54:08 | Epoch: 1 | Step: 40710 | Dataset: 0-12840901 | Loss: 2.322 | 673 ms/step , 58388.05 GFLOP/s , 533846.6 tokens/s INFO:__main__:2024-10-27 02:54:16 | Epoch: 1 | Step: 40720 | Dataset: 0-12848901 | Loss: 2.323 | 674 
ms/step , 58325.41 GFLOP/s , 533803.0 tokens/s INFO:__main__:2024-10-27 02:54:24 | Epoch: 1 | Step: 40730 | Dataset: 0-12856901 | Loss: 2.382 | 673 ms/step , 58369.47 GFLOP/s , 533623.7 tokens/s INFO:__main__:2024-10-27 02:54:31 | Epoch: 1 | Step: 40740 | Dataset: 0-12864901 | Loss: 2.380 | 674 ms/step , 58353.22 GFLOP/s , 533645.7 tokens/s INFO:__main__:2024-10-27 02:54:39 | Epoch: 1 | Step: 40750 | Dataset: 0-12872901 | Loss: 2.358 | 674 ms/step , 58341.50 GFLOP/s , 533516.4 tokens/s INFO:__main__:2024-10-27 02:54:47 | Epoch: 1 | Step: 40760 | Dataset: 0-12880901 | Loss: 2.274 | 675 ms/step , 58196.39 GFLOP/s , 533203.9 tokens/s INFO:__main__:2024-10-27 02:54:54 | Epoch: 1 | Step: 40770 | Dataset: 0-12888901 | Loss: 2.426 | 674 ms/step , 58320.71 GFLOP/s , 533449.1 tokens/s INFO:__main__:2024-10-27 02:55:02 | Epoch: 1 | Step: 40780 | Dataset: 0-12896901 | Loss: 2.267 | 674 ms/step , 58332.85 GFLOP/s , 533209.0 tokens/s INFO:__main__:2024-10-27 02:55:10 | Epoch: 1 | Step: 40790 | Dataset: 0-12904901 | Loss: 2.349 | 673 ms/step , 58397.50 GFLOP/s , 533722.2 tokens/s INFO:__main__:2024-10-27 02:55:17 | Epoch: 1 | Step: 40800 | Dataset: 0-12912901 | Loss: 2.313 | 675 ms/step , 58277.53 GFLOP/s , 533536.3 tokens/s INFO:__main__:2024-10-27 02:55:25 | Epoch: 1 | Step: 40810 | Dataset: 0-12920901 | Loss: 2.340 | 674 ms/step , 58342.46 GFLOP/s , 533686.0 tokens/s INFO:__main__:2024-10-27 02:55:33 | Epoch: 1 | Step: 40820 | Dataset: 0-12928901 | Loss: 2.391 | 676 ms/step , 58115.81 GFLOP/s , 533387.5 tokens/s INFO:__main__:2024-10-27 02:55:40 | Epoch: 1 | Step: 40830 | Dataset: 0-12936901 | Loss: 2.038 | 675 ms/step , 58274.54 GFLOP/s , 531944.1 tokens/s INFO:__main__:2024-10-27 02:55:48 | Epoch: 1 | Step: 40840 | Dataset: 0-12944901 | Loss: 1.936 | 675 ms/step , 58193.25 GFLOP/s , 533244.1 tokens/s INFO:__main__:2024-10-27 02:55:56 | Epoch: 1 | Step: 40850 | Dataset: 0-12952901 | Loss: 1.916 | 674 ms/step , 58326.43 GFLOP/s , 533176.5 tokens/s INFO:__main__:2024-10-27 
02:56:03 | Epoch: 1 | Step: 40860 | Dataset: 0-12960901 | Loss: 1.879 | 674 ms/step , 58363.14 GFLOP/s , 533376.1 tokens/s INFO:__main__:2024-10-27 02:56:11 | Epoch: 1 | Step: 40870 | Dataset: 0-12968901 | Loss: 1.853 | 674 ms/step , 58299.14 GFLOP/s , 532667.3 tokens/s INFO:__main__:2024-10-27 02:56:19 | Epoch: 1 | Step: 40880 | Dataset: 0-12976901 | Loss: 1.824 | 674 ms/step , 58348.51 GFLOP/s , 533326.9 tokens/s INFO:__main__:2024-10-27 02:56:26 | Epoch: 1 | Step: 40890 | Dataset: 0-12984901 | Loss: 1.821 | 674 ms/step , 58282.50 GFLOP/s , 532910.7 tokens/s INFO:__main__:2024-10-27 02:56:34 | Epoch: 1 | Step: 40900 | Dataset: 0-12992901 | Loss: 1.840 | 673 ms/step , 58390.59 GFLOP/s , 532888.8 tokens/s INFO:__main__:2024-10-27 02:56:42 | Epoch: 1 | Step: 40910 | Dataset: 0-13000901 | Loss: 1.833 | 674 ms/step , 58337.31 GFLOP/s , 532752.6 tokens/s INFO:__main__:2024-10-27 02:56:50 | Epoch: 1 | Step: 40920 | Dataset: 0-13008901 | Loss: 1.792 | 674 ms/step , 58312.32 GFLOP/s , 533132.8 tokens/s INFO:__main__:2024-10-27 02:56:57 | Epoch: 1 | Step: 40930 | Dataset: 0-13016901 | Loss: 1.795 | 674 ms/step , 58322.27 GFLOP/s , 533422.1 tokens/s INFO:__main__:2024-10-27 02:57:05 | Epoch: 1 | Step: 40940 | Dataset: 0-13024901 | Loss: 1.817 | 673 ms/step , 58405.80 GFLOP/s , 533232.3 tokens/s INFO:__main__:2024-10-27 02:57:13 | Epoch: 1 | Step: 40950 | Dataset: 0-13032901 | Loss: 1.807 | 675 ms/step , 58265.36 GFLOP/s , 533446.0 tokens/s INFO:__main__:2024-10-27 02:57:20 | Epoch: 1 | Step: 40960 | Dataset: 0-13040901 | Loss: 1.741 | 674 ms/step , 58292.83 GFLOP/s , 532657.7 tokens/s INFO:__main__:2024-10-27 02:57:28 | Epoch: 1 | Step: 40970 | Dataset: 0-13048901 | Loss: 1.794 | 675 ms/step , 58221.02 GFLOP/s , 532641.9 tokens/s INFO:__main__:2024-10-27 02:57:36 | Epoch: 1 | Step: 40980 | Dataset: 0-13056901 | Loss: 1.765 | 675 ms/step , 58270.60 GFLOP/s , 532406.0 tokens/s INFO:__main__:2024-10-27 02:57:43 | Epoch: 1 | Step: 40990 | Dataset: 0-13064901 | Loss: 1.789 | 676 
ms/step , 58162.40 GFLOP/s , 532785.8 tokens/s INFO:__main__:2024-10-27 02:57:51 | Validation | Step: 41000 | Val_loss: 1.783 | Best_val_loss: 1.8362 INFO:__main__:2024-10-27 02:57:51 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_025751_step_41000.pt` INFO:__main__:2024-10-27 02:57:52 | Epoch: 1 | Step: 41000 | Dataset: 0-13072901 | Loss: 2.490 | 674 ms/step , 58340.95 GFLOP/s , 479669.7 tokens/s INFO:__main__:2024-10-27 02:58:00 | Epoch: 1 | Step: 41010 | Dataset: 0-13080901 | Loss: 2.331 | 675 ms/step , 58256.50 GFLOP/s , 533135.1 tokens/s INFO:__main__:2024-10-27 02:58:07 | Epoch: 1 | Step: 41020 | Dataset: 0-13088901 | Loss: 2.232 | 675 ms/step , 58274.22 GFLOP/s , 533500.9 tokens/s INFO:__main__:2024-10-27 02:58:15 | Epoch: 1 | Step: 41030 | Dataset: 0-13096901 | Loss: 2.129 | 674 ms/step , 58280.01 GFLOP/s , 533412.9 tokens/s INFO:__main__:2024-10-27 02:58:23 | Epoch: 1 | Step: 41040 | Dataset: 0-13104901 | Loss: 2.164 | 674 ms/step , 58315.84 GFLOP/s , 533238.2 tokens/s INFO:__main__:2024-10-27 02:58:30 | Epoch: 1 | Step: 41050 | Dataset: 0-13112901 | Loss: 2.087 | 675 ms/step , 58271.56 GFLOP/s , 533010.5 tokens/s INFO:__main__:2024-10-27 02:58:38 | Epoch: 1 | Step: 41060 | Dataset: 0-13120901 | Loss: 2.112 | 675 ms/step , 58273.15 GFLOP/s , 533520.4 tokens/s INFO:__main__:2024-10-27 02:58:46 | Epoch: 1 | Step: 41070 | Dataset: 0-13128901 | Loss: 2.125 | 675 ms/step , 58236.10 GFLOP/s , 532995.9 tokens/s INFO:__main__:2024-10-27 02:58:53 | Epoch: 1 | Step: 41080 | Dataset: 0-13136901 | Loss: 2.104 | 675 ms/step , 58234.04 GFLOP/s , 533181.7 tokens/s INFO:__main__:2024-10-27 02:59:01 | Epoch: 1 | Step: 41090 | Dataset: 0-13144901 | Loss: 2.031 | 674 ms/step , 58346.71 GFLOP/s , 533001.1 tokens/s INFO:__main__:2024-10-27 02:59:09 | Epoch: 1 | Step: 41100 | Dataset: 0-13152901 | Loss: 2.047 | 676 ms/step , 58181.99 GFLOP/s , 533303.2 tokens/s INFO:__main__:2024-10-27 02:59:16 | Epoch: 1 | Step: 41110 | Dataset: 0-13160901 | Loss: 
2.129 | 677 ms/step , 58085.51 GFLOP/s , 532587.6 tokens/s INFO:__main__:2024-10-27 02:59:24 | Epoch: 1 | Step: 41120 | Dataset: 0-13168901 | Loss: 2.117 | 675 ms/step , 58228.66 GFLOP/s , 533308.5 tokens/s INFO:__main__:2024-10-27 02:59:32 | Epoch: 1 | Step: 41130 | Dataset: 0-13176901 | Loss: 2.012 | 675 ms/step , 58226.07 GFLOP/s , 533514.9 tokens/s INFO:__main__:2024-10-27 02:59:39 | Epoch: 1 | Step: 41140 | Dataset: 0-13184901 | Loss: 2.123 | 674 ms/step , 58319.37 GFLOP/s , 533074.5 tokens/s INFO:__main__:2024-10-27 02:59:47 | Epoch: 1 | Step: 41150 | Dataset: 0-13192901 | Loss: 2.004 | 676 ms/step , 58129.89 GFLOP/s , 533331.1 tokens/s INFO:__main__:2024-10-27 02:59:55 | Epoch: 1 | Step: 41160 | Dataset: 0-13200901 | Loss: 2.109 | 674 ms/step , 58293.92 GFLOP/s , 532946.5 tokens/s INFO:__main__:2024-10-27 03:00:02 | Epoch: 1 | Step: 41170 | Dataset: 0-13208901 | Loss: 2.331 | 674 ms/step , 58359.03 GFLOP/s , 532471.6 tokens/s INFO:__main__:2024-10-27 03:00:10 | Epoch: 1 | Step: 41180 | Dataset: 0-13216901 | Loss: 2.341 | 675 ms/step , 58201.61 GFLOP/s , 532557.1 tokens/s INFO:__main__:2024-10-27 03:00:18 | Epoch: 1 | Step: 41190 | Dataset: 0-13224901 | Loss: 2.273 | 677 ms/step , 58096.61 GFLOP/s , 532246.0 tokens/s INFO:__main__:2024-10-27 03:00:26 | Epoch: 1 | Step: 41200 | Dataset: 0-13232901 | Loss: 2.202 | 675 ms/step , 58231.00 GFLOP/s , 532038.8 tokens/s INFO:__main__:2024-10-27 03:00:33 | Epoch: 1 | Step: 41210 | Dataset: 0-13240901 | Loss: 2.144 | 674 ms/step , 58294.96 GFLOP/s , 532978.7 tokens/s INFO:__main__:2024-10-27 03:00:41 | Epoch: 1 | Step: 41220 | Dataset: 0-13248901 | Loss: 2.167 | 675 ms/step , 58240.04 GFLOP/s , 532670.5 tokens/s INFO:__main__:2024-10-27 03:00:49 | Epoch: 1 | Step: 41230 | Dataset: 0-13256901 | Loss: 2.241 | 676 ms/step , 58128.41 GFLOP/s , 531839.6 tokens/s INFO:__main__:2024-10-27 03:00:56 | Epoch: 1 | Step: 41240 | Dataset: 0-13264901 | Loss: 2.208 | 676 ms/step , 58146.76 GFLOP/s , 532473.6 tokens/s 
INFO:__main__:2024-10-27 03:01:04 | Epoch: 1 | Step: 41250 | Dataset: 0-13272901 | Loss: 2.173 | 675 ms/step , 58256.03 GFLOP/s , 532457.7 tokens/s INFO:__main__:2024-10-27 03:01:12 | Epoch: 1 | Step: 41260 | Dataset: 0-13280901 | Loss: 2.173 | 676 ms/step , 58177.06 GFLOP/s , 532072.0 tokens/s INFO:__main__:2024-10-27 03:01:19 | Epoch: 1 | Step: 41270 | Dataset: 0-13288901 | Loss: 2.159 | 676 ms/step , 58166.96 GFLOP/s , 531277.3 tokens/s INFO:__main__:2024-10-27 03:01:27 | Epoch: 1 | Step: 41280 | Dataset: 0-13296901 | Loss: 2.275 | 675 ms/step , 58245.18 GFLOP/s , 532757.7 tokens/s INFO:__main__:2024-10-27 03:01:35 | Epoch: 1 | Step: 41290 | Dataset: 0-13304901 | Loss: 2.252 | 675 ms/step , 58248.73 GFLOP/s , 532935.6 tokens/s INFO:__main__:2024-10-27 03:01:42 | Epoch: 1 | Step: 41300 | Dataset: 0-13312901 | Loss: 2.089 | 675 ms/step , 58225.46 GFLOP/s , 533281.0 tokens/s INFO:__main__:2024-10-27 03:01:50 | Epoch: 1 | Step: 41310 | Dataset: 0-13320901 | Loss: 2.201 | 675 ms/step , 58237.16 GFLOP/s , 532898.5 tokens/s INFO:__main__:2024-10-27 03:01:58 | Epoch: 1 | Step: 41320 | Dataset: 0-13328901 | Loss: 2.157 | 676 ms/step , 58142.85 GFLOP/s , 532991.0 tokens/s INFO:__main__:2024-10-27 03:02:06 | Epoch: 1 | Step: 41330 | Dataset: 0-13336901 | Loss: 2.133 | 676 ms/step , 58112.36 GFLOP/s , 532776.1 tokens/s INFO:__main__:2024-10-27 03:02:13 | Epoch: 1 | Step: 41340 | Dataset: 0-13344901 | Loss: 2.197 | 677 ms/step , 58067.01 GFLOP/s , 533228.7 tokens/s INFO:__main__:2024-10-27 03:02:21 | Epoch: 1 | Step: 41350 | Dataset: 0-13352901 | Loss: 2.153 | 675 ms/step , 58261.05 GFLOP/s , 533456.8 tokens/s INFO:__main__:2024-10-27 03:02:29 | Epoch: 1 | Step: 41360 | Dataset: 0-13360901 | Loss: 2.160 | 674 ms/step , 58353.25 GFLOP/s , 533533.1 tokens/s INFO:__main__:2024-10-27 03:02:36 | Epoch: 1 | Step: 41370 | Dataset: 0-13368901 | Loss: 2.203 | 674 ms/step , 58332.63 GFLOP/s , 533644.9 tokens/s INFO:__main__:2024-10-27 03:02:44 | Epoch: 1 | Step: 41380 | Dataset: 
0-13376901 | Loss: 2.188 | 674 ms/step , 58328.21 GFLOP/s , 533347.4 tokens/s INFO:__main__:2024-10-27 03:02:52 | Epoch: 1 | Step: 41390 | Dataset: 0-13384901 | Loss: 2.178 | 674 ms/step , 58282.29 GFLOP/s , 533367.8 tokens/s INFO:__main__:2024-10-27 03:02:59 | Epoch: 1 | Step: 41400 | Dataset: 0-13392901 | Loss: 2.139 | 673 ms/step , 58379.06 GFLOP/s , 533618.1 tokens/s INFO:__main__:2024-10-27 03:03:07 | Epoch: 1 | Step: 41410 | Dataset: 0-13400901 | Loss: 2.322 | 673 ms/step , 58405.72 GFLOP/s , 533710.2 tokens/s INFO:__main__:2024-10-27 03:03:15 | Epoch: 1 | Step: 41420 | Dataset: 0-13408901 | Loss: 2.136 | 675 ms/step , 58217.75 GFLOP/s , 533706.2 tokens/s INFO:__main__:2024-10-27 03:03:22 | Epoch: 1 | Step: 41430 | Dataset: 0-13416901 | Loss: 2.158 | 676 ms/step , 58174.80 GFLOP/s , 533211.5 tokens/s INFO:__main__:2024-10-27 03:03:30 | Epoch: 1 | Step: 41440 | Dataset: 0-13424901 | Loss: 2.112 | 675 ms/step , 58276.43 GFLOP/s , 533111.2 tokens/s INFO:__main__:2024-10-27 03:03:38 | Epoch: 1 | Step: 41450 | Dataset: 0-13432901 | Loss: 2.147 | 676 ms/step , 58176.24 GFLOP/s , 533248.2 tokens/s INFO:__main__:2024-10-27 03:03:45 | Epoch: 1 | Step: 41460 | Dataset: 0-13440901 | Loss: 2.140 | 675 ms/step , 58203.44 GFLOP/s , 533010.5 tokens/s INFO:__main__:2024-10-27 03:03:53 | Epoch: 1 | Step: 41470 | Dataset: 0-13448901 | Loss: 2.118 | 674 ms/step , 58289.40 GFLOP/s , 533480.0 tokens/s INFO:__main__:2024-10-27 03:04:01 | Epoch: 1 | Step: 41480 | Dataset: 0-13456901 | Loss: 2.234 | 674 ms/step , 58342.81 GFLOP/s , 533211.7 tokens/s INFO:__main__:2024-10-27 03:04:08 | Epoch: 1 | Step: 41490 | Dataset: 0-13464901 | Loss: 1.886 | 674 ms/step , 58300.58 GFLOP/s , 532267.1 tokens/s INFO:__main__:2024-10-27 03:04:16 | Epoch: 1 | Step: 41500 | Dataset: 0-13472901 | Loss: 1.778 | 675 ms/step , 58228.25 GFLOP/s , 532769.4 tokens/s INFO:__main__:2024-10-27 03:04:24 | Epoch: 1 | Step: 41510 | Dataset: 0-13480901 | Loss: 1.739 | 674 ms/step , 58356.07 GFLOP/s , 533089.5 
tokens/s INFO:__main__:2024-10-27 03:04:31 | Epoch: 1 | Step: 41520 | Dataset: 0-13488901 | Loss: 1.734 | 675 ms/step , 58273.02 GFLOP/s , 533034.8 tokens/s INFO:__main__:2024-10-27 03:04:39 | Epoch: 1 | Step: 41530 | Dataset: 0-13496901 | Loss: 1.738 | 674 ms/step , 58353.51 GFLOP/s , 533006.0 tokens/s INFO:__main__:2024-10-27 03:04:47 | Epoch: 1 | Step: 41540 | Dataset: 0-13504901 | Loss: 1.688 | 673 ms/step , 58376.66 GFLOP/s , 533554.1 tokens/s INFO:__main__:2024-10-27 03:04:55 | Epoch: 1 | Step: 41550 | Dataset: 0-13512901 | Loss: 1.671 | 675 ms/step , 58202.46 GFLOP/s , 533144.0 tokens/s INFO:__main__:2024-10-27 03:05:02 | Epoch: 1 | Step: 41560 | Dataset: 0-13520901 | Loss: 1.715 | 674 ms/step , 58335.88 GFLOP/s , 532419.1 tokens/s INFO:__main__:2024-10-27 03:05:10 | Epoch: 1 | Step: 41570 | Dataset: 0-13528901 | Loss: 1.691 | 675 ms/step , 58203.28 GFLOP/s , 533120.7 tokens/s INFO:__main__:2024-10-27 03:05:18 | Epoch: 1 | Step: 41580 | Dataset: 0-13536901 | Loss: 1.802 | 674 ms/step , 58300.69 GFLOP/s , 532902.8 tokens/s INFO:__main__:2024-10-27 03:05:25 | Epoch: 1 | Step: 41590 | Dataset: 0-13544901 | Loss: 1.786 | 675 ms/step , 58208.08 GFLOP/s , 532686.4 tokens/s INFO:__main__:2024-10-27 03:05:33 | Epoch: 1 | Step: 41600 | Dataset: 0-13552901 | Loss: 1.805 | 674 ms/step , 58329.31 GFLOP/s , 533125.4 tokens/s INFO:__main__:2024-10-27 03:05:41 | Epoch: 1 | Step: 41610 | Dataset: 0-13560901 | Loss: 1.790 | 674 ms/step , 58345.27 GFLOP/s , 533257.3 tokens/s INFO:__main__:2024-10-27 03:05:48 | Epoch: 1 | Step: 41620 | Dataset: 0-13568901 | Loss: 1.760 | 676 ms/step , 58176.10 GFLOP/s , 531682.6 tokens/s INFO:__main__:2024-10-27 03:05:56 | Epoch: 1 | Step: 41630 | Dataset: 0-13576901 | Loss: 1.751 | 675 ms/step , 58230.21 GFLOP/s , 531134.6 tokens/s INFO:__main__:2024-10-27 03:06:04 | Epoch: 1 | Step: 41640 | Dataset: 0-13584901 | Loss: 1.762 | 676 ms/step , 58139.95 GFLOP/s , 531551.6 tokens/s INFO:__main__:2024-10-27 03:06:11 | Epoch: 1 | Step: 41650 | 
Dataset: 0-13592901 | Loss: 1.756 | 675 ms/step , 58232.27 GFLOP/s , 530600.5 tokens/s INFO:__main__:2024-10-27 03:06:19 | Epoch: 1 | Step: 41660 | Dataset: 0-13600901 | Loss: 1.786 | 673 ms/step , 58392.33 GFLOP/s , 531711.7 tokens/s INFO:__main__:2024-10-27 03:06:27 | Epoch: 1 | Step: 41670 | Dataset: 0-13608901 | Loss: 2.263 | 676 ms/step , 58157.44 GFLOP/s , 532535.4 tokens/s INFO:__main__:2024-10-27 03:06:35 | Epoch: 1 | Step: 41680 | Dataset: 0-13616901 | Loss: 2.162 | 677 ms/step , 58084.72 GFLOP/s , 531374.5 tokens/s INFO:__main__:2024-10-27 03:06:42 | Epoch: 1 | Step: 41690 | Dataset: 0-13624901 | Loss: 2.139 | 676 ms/step , 58137.03 GFLOP/s , 529621.9 tokens/s INFO:__main__:2024-10-27 03:06:50 | Epoch: 1 | Step: 41700 | Dataset: 0-13632901 | Loss: 2.132 | 674 ms/step , 58289.17 GFLOP/s , 532062.4 tokens/s INFO:__main__:2024-10-27 03:06:58 | Epoch: 1 | Step: 41710 | Dataset: 0-13640901 | Loss: 2.146 | 676 ms/step , 58119.29 GFLOP/s , 532668.8 tokens/s INFO:__main__:2024-10-27 03:07:05 | Epoch: 1 | Step: 41720 | Dataset: 0-13648901 | Loss: 2.206 | 676 ms/step , 58151.98 GFLOP/s , 533692.9 tokens/s INFO:__main__:2024-10-27 03:07:13 | Epoch: 1 | Step: 41730 | Dataset: 0-13656901 | Loss: 2.227 | 676 ms/step , 58117.08 GFLOP/s , 532867.5 tokens/s INFO:__main__:2024-10-27 03:07:21 | Epoch: 1 | Step: 41740 | Dataset: 0-13664901 | Loss: 2.261 | 674 ms/step , 58292.18 GFLOP/s , 533389.8 tokens/s INFO:__main__:2024-10-27 03:07:28 | Epoch: 1 | Step: 41750 | Dataset: 0-13672901 | Loss: 2.191 | 675 ms/step , 58261.92 GFLOP/s , 533207.9 tokens/s INFO:__main__:2024-10-27 03:07:36 | Epoch: 1 | Step: 41760 | Dataset: 0-13680901 | Loss: 2.201 | 674 ms/step , 58336.93 GFLOP/s , 533229.7 tokens/s INFO:__main__:2024-10-27 03:07:44 | Epoch: 1 | Step: 41770 | Dataset: 0-13688901 | Loss: 2.261 | 676 ms/step , 58149.36 GFLOP/s , 533536.6 tokens/s INFO:__main__:2024-10-27 03:07:51 | Epoch: 1 | Step: 41780 | Dataset: 0-13696901 | Loss: 2.154 | 675 ms/step , 58254.82 GFLOP/s , 
533127.0 tokens/s INFO:__main__:2024-10-27 03:07:59 | Epoch: 1 | Step: 41790 | Dataset: 0-13704901 | Loss: 2.105 | 674 ms/step , 58296.01 GFLOP/s , 533667.8 tokens/s INFO:__main__:2024-10-27 03:08:07 | Epoch: 1 | Step: 41800 | Dataset: 0-13712901 | Loss: 2.176 | 674 ms/step , 58292.20 GFLOP/s , 533461.6 tokens/s INFO:__main__:2024-10-27 03:08:15 | Epoch: 1 | Step: 41810 | Dataset: 0-13720901 | Loss: 2.161 | 674 ms/step , 58324.49 GFLOP/s , 533546.4 tokens/s INFO:__main__:2024-10-27 03:08:22 | Epoch: 1 | Step: 41820 | Dataset: 0-13728901 | Loss: 2.229 | 676 ms/step , 58154.33 GFLOP/s , 533556.0 tokens/s INFO:__main__:2024-10-27 03:08:30 | Epoch: 1 | Step: 41830 | Dataset: 0-13736901 | Loss: 2.252 | 675 ms/step , 58271.86 GFLOP/s , 533509.3 tokens/s INFO:__main__:2024-10-27 03:08:38 | Epoch: 1 | Step: 41840 | Dataset: 0-13744901 | Loss: 2.196 | 674 ms/step , 58343.40 GFLOP/s , 533532.3 tokens/s INFO:__main__:2024-10-27 03:08:45 | Epoch: 1 | Step: 41850 | Dataset: 0-13752901 | Loss: 2.217 | 675 ms/step , 58273.60 GFLOP/s , 533356.6 tokens/s INFO:__main__:2024-10-27 03:08:53 | Epoch: 1 | Step: 41860 | Dataset: 0-13760901 | Loss: 2.219 | 676 ms/step , 58180.57 GFLOP/s , 533180.4 tokens/s INFO:__main__:2024-10-27 03:09:01 | Epoch: 1 | Step: 41870 | Dataset: 0-13768901 | Loss: 2.193 | 675 ms/step , 58277.27 GFLOP/s , 533335.3 tokens/s INFO:__main__:2024-10-27 03:09:08 | Epoch: 1 | Step: 41880 | Dataset: 0-13776901 | Loss: 2.181 | 675 ms/step , 58262.56 GFLOP/s , 533235.9 tokens/s INFO:__main__:2024-10-27 03:09:16 | Epoch: 1 | Step: 41890 | Dataset: 0-13784901 | Loss: 2.242 | 675 ms/step , 58247.24 GFLOP/s , 532755.1 tokens/s INFO:__main__:2024-10-27 03:09:24 | Epoch: 1 | Step: 41900 | Dataset: 0-13792901 | Loss: 2.266 | 675 ms/step , 58214.43 GFLOP/s , 533133.6 tokens/s INFO:__main__:2024-10-27 03:09:31 | Epoch: 1 | Step: 41910 | Dataset: 0-13800901 | Loss: 2.190 | 675 ms/step , 58270.06 GFLOP/s , 532986.0 tokens/s INFO:__main__:2024-10-27 03:09:39 | Epoch: 1 | Step: 
41920 | Dataset: 0-13808901 | Loss: 2.188 | 676 ms/step , 58136.84 GFLOP/s , 532972.1 tokens/s INFO:__main__:2024-10-27 03:09:47 | Epoch: 1 | Step: 41930 | Dataset: 0-13816901 | Loss: 2.174 | 676 ms/step , 58173.37 GFLOP/s , 533231.6 tokens/s INFO:__main__:2024-10-27 03:09:54 | Epoch: 1 | Step: 41940 | Dataset: 0-13824901 | Loss: 2.275 | 675 ms/step , 58198.30 GFLOP/s , 533001.3 tokens/s INFO:__main__:2024-10-27 03:10:02 | Epoch: 1 | Step: 41950 | Dataset: 0-13832901 | Loss: 2.223 | 675 ms/step , 58261.56 GFLOP/s , 533298.0 tokens/s INFO:__main__:2024-10-27 03:10:10 | Epoch: 1 | Step: 41960 | Dataset: 0-13840901 | Loss: 2.152 | 675 ms/step , 58278.62 GFLOP/s , 533238.8 tokens/s INFO:__main__:2024-10-27 03:10:17 | Epoch: 1 | Step: 41970 | Dataset: 0-13848901 | Loss: 2.263 | 675 ms/step , 58229.98 GFLOP/s , 533048.6 tokens/s INFO:__main__:2024-10-27 03:10:25 | Epoch: 1 | Step: 41980 | Dataset: 0-13856901 | Loss: 2.234 | 676 ms/step , 58190.32 GFLOP/s , 532774.0 tokens/s INFO:__main__:2024-10-27 03:10:33 | Epoch: 1 | Step: 41990 | Dataset: 0-13864901 | Loss: 2.236 | 675 ms/step , 58240.72 GFLOP/s , 533178.3 tokens/s INFO:__main__:2024-10-27 03:10:40 | Validation | Step: 42000 | Val_loss: 2.114 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 03:10:40 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_031040_step_42000.pt` INFO:__main__:2024-10-27 03:10:41 | Epoch: 1 | Step: 42000 | Dataset: 0-13872901 | Loss: 2.168 | 673 ms/step , 58397.66 GFLOP/s , 480116.5 tokens/s INFO:__main__:2024-10-27 03:10:49 | Epoch: 1 | Step: 42010 | Dataset: 0-13880901 | Loss: 2.153 | 675 ms/step , 58267.11 GFLOP/s , 533298.3 tokens/s INFO:__main__:2024-10-27 03:10:57 | Epoch: 1 | Step: 42020 | Dataset: 0-13888901 | Loss: 2.154 | 674 ms/step , 58316.71 GFLOP/s , 533263.5 tokens/s INFO:__main__:2024-10-27 03:11:04 | Epoch: 1 | Step: 42030 | Dataset: 0-13896901 | Loss: 2.258 | 674 ms/step , 58289.57 GFLOP/s , 533274.9 tokens/s INFO:__main__:2024-10-27 03:11:12 | Epoch: 1 
| Step: 42040 | Dataset: 0-13904901 | Loss: 2.108 | 674 ms/step , 58289.41 GFLOP/s , 532905.4 tokens/s INFO:__main__:2024-10-27 03:11:20 | Epoch: 1 | Step: 42050 | Dataset: 0-13912901 | Loss: 2.193 | 674 ms/step , 58336.34 GFLOP/s , 533401.2 tokens/s INFO:__main__:2024-10-27 03:11:27 | Epoch: 1 | Step: 42060 | Dataset: 0-13920901 | Loss: 2.083 | 674 ms/step , 58302.28 GFLOP/s , 533450.6 tokens/s INFO:__main__:2024-10-27 03:11:35 | Epoch: 1 | Step: 42070 | Dataset: 0-13928901 | Loss: 2.276 | 674 ms/step , 58327.02 GFLOP/s , 533398.1 tokens/s INFO:__main__:2024-10-27 03:11:43 | Epoch: 1 | Step: 42080 | Dataset: 0-13936901 | Loss: 2.166 | 675 ms/step , 58228.62 GFLOP/s , 533148.8 tokens/s INFO:__main__:2024-10-27 03:11:50 | Epoch: 1 | Step: 42090 | Dataset: 0-13944901 | Loss: 2.164 | 676 ms/step , 58154.81 GFLOP/s , 533297.7 tokens/s INFO:__main__:2024-10-27 03:11:58 | Epoch: 1 | Step: 42100 | Dataset: 0-13952901 | Loss: 2.145 | 674 ms/step , 58286.65 GFLOP/s , 533536.2 tokens/s INFO:__main__:2024-10-27 03:12:06 | Epoch: 1 | Step: 42110 | Dataset: 0-13960901 | Loss: 2.193 | 674 ms/step , 58291.88 GFLOP/s , 533558.0 tokens/s INFO:__main__:2024-10-27 03:12:13 | Epoch: 1 | Step: 42120 | Dataset: 0-13968901 | Loss: 2.177 | 675 ms/step , 58244.34 GFLOP/s , 533270.0 tokens/s INFO:__main__:2024-10-27 03:12:21 | Epoch: 1 | Step: 42130 | Dataset: 0-13976901 | Loss: 2.222 | 674 ms/step , 58313.83 GFLOP/s , 533354.1 tokens/s INFO:__main__:2024-10-27 03:12:29 | Epoch: 1 | Step: 42140 | Dataset: 0-13984901 | Loss: 2.213 | 674 ms/step , 58289.30 GFLOP/s , 533384.5 tokens/s INFO:__main__:2024-10-27 03:12:37 | Epoch: 1 | Step: 42150 | Dataset: 0-13992901 | Loss: 2.132 | 675 ms/step , 58200.34 GFLOP/s , 533208.8 tokens/s INFO:__main__:2024-10-27 03:12:44 | Epoch: 1 | Step: 42160 | Dataset: 0-14000901 | Loss: 2.113 | 674 ms/step , 58343.82 GFLOP/s , 533788.2 tokens/s INFO:__main__:2024-10-27 03:12:52 | Epoch: 1 | Step: 42170 | Dataset: 0-14008901 | Loss: 2.048 | 674 ms/step , 58297.07 
GFLOP/s , 533435.5 tokens/s INFO:__main__:2024-10-27 03:13:00 | Epoch: 1 | Step: 42180 | Dataset: 0-14016901 | Loss: 2.065 | 675 ms/step , 58211.76 GFLOP/s , 532007.6 tokens/s INFO:__main__:2024-10-27 03:13:07 | Epoch: 1 | Step: 42190 | Dataset: 0-14024901 | Loss: 2.055 | 675 ms/step , 58278.94 GFLOP/s , 533424.9 tokens/s INFO:__main__:2024-10-27 03:13:15 | Epoch: 1 | Step: 42200 | Dataset: 0-14032901 | Loss: 2.056 | 674 ms/step , 58304.69 GFLOP/s , 533461.6 tokens/s INFO:__main__:2024-10-27 03:13:23 | Epoch: 1 | Step: 42210 | Dataset: 0-14040901 | Loss: 1.990 | 674 ms/step , 58318.07 GFLOP/s , 533768.0 tokens/s INFO:__main__:2024-10-27 03:13:30 | Epoch: 1 | Step: 42220 | Dataset: 0-14048901 | Loss: 2.006 | 674 ms/step , 58321.25 GFLOP/s , 533538.7 tokens/s INFO:__main__:2024-10-27 03:13:38 | Epoch: 1 | Step: 42230 | Dataset: 0-14056901 | Loss: 2.004 | 675 ms/step , 58245.08 GFLOP/s , 533683.1 tokens/s INFO:__main__:2024-10-27 03:13:46 | Epoch: 1 | Step: 42240 | Dataset: 0-14064901 | Loss: 2.017 | 674 ms/step , 58354.52 GFLOP/s , 533171.9 tokens/s INFO:__main__:2024-10-27 03:13:53 | Epoch: 1 | Step: 42250 | Dataset: 0-14072901 | Loss: 2.024 | 674 ms/step , 58310.52 GFLOP/s , 533686.4 tokens/s INFO:__main__:2024-10-27 03:14:01 | Epoch: 1 | Step: 42260 | Dataset: 0-14080901 | Loss: 2.058 | 674 ms/step , 58302.19 GFLOP/s , 533537.8 tokens/s INFO:__main__:2024-10-27 03:14:09 | Epoch: 1 | Step: 42270 | Dataset: 0-14088901 | Loss: 1.981 | 674 ms/step , 58344.61 GFLOP/s , 533048.4 tokens/s INFO:__main__:2024-10-27 03:14:16 | Epoch: 1 | Step: 42280 | Dataset: 0-14096901 | Loss: 2.017 | 673 ms/step , 58391.93 GFLOP/s , 532463.8 tokens/s INFO:__main__:2024-10-27 03:14:24 | Epoch: 1 | Step: 42290 | Dataset: 0-14104901 | Loss: 1.964 | 675 ms/step , 58203.78 GFLOP/s , 532821.8 tokens/s INFO:__main__:2024-10-27 03:14:32 | Epoch: 1 | Step: 42300 | Dataset: 0-14112901 | Loss: 1.952 | 676 ms/step , 58148.30 GFLOP/s , 531684.3 tokens/s INFO:__main__:2024-10-27 03:14:39 | Epoch: 1 | 
Step: 42310 | Dataset: 0-14120901 | Loss: 2.321 | 675 ms/step , 58216.97 GFLOP/s , 532226.3 tokens/s INFO:__main__:2024-10-27 03:14:47 | Epoch: 1 | Step: 42320 | Dataset: 0-14128901 | Loss: 2.246 | 674 ms/step , 58279.22 GFLOP/s , 532868.2 tokens/s INFO:__main__:2024-10-27 03:14:55 | Epoch: 1 | Step: 42330 | Dataset: 0-14136901 | Loss: 2.194 | 675 ms/step , 58263.84 GFLOP/s , 532275.9 tokens/s INFO:__main__:2024-10-27 03:15:03 | Epoch: 1 | Step: 42340 | Dataset: 0-14144901 | Loss: 2.273 | 674 ms/step , 58306.74 GFLOP/s , 532997.7 tokens/s INFO:__main__:2024-10-27 03:15:10 | Epoch: 1 | Step: 42350 | Dataset: 0-14152901 | Loss: 2.203 | 676 ms/step , 58164.55 GFLOP/s , 532471.9 tokens/s INFO:__main__:2024-10-27 03:15:18 | Epoch: 1 | Step: 42360 | Dataset: 0-14160901 | Loss: 2.109 | 676 ms/step , 58120.16 GFLOP/s , 531436.7 tokens/s INFO:__main__:2024-10-27 03:15:26 | Epoch: 1 | Step: 42370 | Dataset: 0-14168901 | Loss: 2.167 | 673 ms/step , 58411.95 GFLOP/s , 532493.6 tokens/s INFO:__main__:2024-10-27 03:15:33 | Epoch: 1 | Step: 42380 | Dataset: 0-14176901 | Loss: 2.137 | 675 ms/step , 58242.82 GFLOP/s , 533678.6 tokens/s INFO:__main__:2024-10-27 03:15:41 | Epoch: 1 | Step: 42390 | Dataset: 0-14184901 | Loss: 2.214 | 675 ms/step , 58252.53 GFLOP/s , 533433.9 tokens/s INFO:__main__:2024-10-27 03:15:49 | Epoch: 1 | Step: 42400 | Dataset: 0-14192901 | Loss: 2.103 | 674 ms/step , 58303.30 GFLOP/s , 533401.9 tokens/s INFO:__main__:2024-10-27 03:15:56 | Epoch: 1 | Step: 42410 | Dataset: 0-14200901 | Loss: 2.250 | 676 ms/step , 58121.27 GFLOP/s , 533501.7 tokens/s INFO:__main__:2024-10-27 03:16:04 | Epoch: 1 | Step: 42420 | Dataset: 0-14208901 | Loss: 2.226 | 675 ms/step , 58274.06 GFLOP/s , 532784.3 tokens/s INFO:__main__:2024-10-27 03:16:12 | Epoch: 1 | Step: 42430 | Dataset: 0-14216901 | Loss: 2.159 | 674 ms/step , 58286.63 GFLOP/s , 533396.0 tokens/s INFO:__main__:2024-10-27 03:16:19 | Epoch: 1 | Step: 42440 | Dataset: 0-14224901 | Loss: 2.117 | 674 ms/step , 58296.91 
GFLOP/s , 533191.0 tokens/s INFO:__main__:2024-10-27 03:16:27 | Epoch: 1 | Step: 42450 | Dataset: 0-14232901 | Loss: 2.066 | 674 ms/step , 58307.67 GFLOP/s , 533500.3 tokens/s INFO:__main__:2024-10-27 03:16:35 | Epoch: 1 | Step: 42460 | Dataset: 0-14240901 | Loss: 2.167 | 674 ms/step , 58282.34 GFLOP/s , 532690.1 tokens/s INFO:__main__:2024-10-27 03:16:42 | Epoch: 1 | Step: 42470 | Dataset: 0-14248901 | Loss: 2.002 | 673 ms/step , 58365.76 GFLOP/s , 533257.5 tokens/s INFO:__main__:2024-10-27 03:16:50 | Epoch: 1 | Step: 42480 | Dataset: 0-14256901 | Loss: 1.849 | 674 ms/step , 58351.20 GFLOP/s , 533453.5 tokens/s INFO:__main__:2024-10-27 03:16:58 | Epoch: 1 | Step: 42490 | Dataset: 0-14264901 | Loss: 1.756 | 675 ms/step , 58236.06 GFLOP/s , 532762.0 tokens/s INFO:__main__:2024-10-27 03:17:05 | Epoch: 1 | Step: 42500 | Dataset: 0-14272901 | Loss: 1.731 | 673 ms/step , 58366.00 GFLOP/s , 533115.6 tokens/s INFO:__main__:2024-10-27 03:17:13 | Epoch: 1 | Step: 42510 | Dataset: 0-14280901 | Loss: 1.706 | 673 ms/step , 58377.92 GFLOP/s , 533142.2 tokens/s INFO:__main__:2024-10-27 03:17:21 | Epoch: 1 | Step: 42520 | Dataset: 0-14288901 | Loss: 1.713 | 675 ms/step , 58235.72 GFLOP/s , 532887.9 tokens/s INFO:__main__:2024-10-27 03:17:29 | Epoch: 1 | Step: 42530 | Dataset: 0-14296901 | Loss: 1.695 | 674 ms/step , 58338.58 GFLOP/s , 532690.9 tokens/s INFO:__main__:2024-10-27 03:17:36 | Epoch: 1 | Step: 42540 | Dataset: 0-14304901 | Loss: 1.693 | 675 ms/step , 58253.73 GFLOP/s , 533319.4 tokens/s INFO:__main__:2024-10-27 03:17:44 | Epoch: 1 | Step: 42550 | Dataset: 0-14312901 | Loss: 1.703 | 677 ms/step , 58102.00 GFLOP/s , 532939.7 tokens/s INFO:__main__:2024-10-27 03:17:52 | Epoch: 1 | Step: 42560 | Dataset: 0-14320901 | Loss: 2.207 | 674 ms/step , 58348.30 GFLOP/s , 533589.2 tokens/s INFO:__main__:2024-10-27 03:17:59 | Epoch: 1 | Step: 42570 | Dataset: 0-14328901 | Loss: 2.287 | 675 ms/step , 58242.42 GFLOP/s , 533071.6 tokens/s INFO:__main__:2024-10-27 03:18:07 | Epoch: 1 | 
Step: 42580 | Dataset: 0-14336901 | Loss: 2.258 | 675 ms/step , 58255.84 GFLOP/s , 533133.3 tokens/s INFO:__main__:2024-10-27 03:18:15 | Epoch: 1 | Step: 42590 | Dataset: 0-14344901 | Loss: 2.178 | 674 ms/step , 58294.88 GFLOP/s , 533307.7 tokens/s INFO:__main__:2024-10-27 03:18:22 | Epoch: 1 | Step: 42600 | Dataset: 0-14352901 | Loss: 2.189 | 675 ms/step , 58276.56 GFLOP/s , 533270.6 tokens/s INFO:__main__:2024-10-27 03:18:30 | Epoch: 1 | Step: 42610 | Dataset: 0-14360901 | Loss: 2.264 | 674 ms/step , 58336.82 GFLOP/s , 533297.0 tokens/s INFO:__main__:2024-10-27 03:18:38 | Epoch: 1 | Step: 42620 | Dataset: 0-14368901 | Loss: 2.214 | 675 ms/step , 58259.51 GFLOP/s , 532821.3 tokens/s INFO:__main__:2024-10-27 03:18:45 | Epoch: 1 | Step: 42630 | Dataset: 0-14376901 | Loss: 2.218 | 673 ms/step , 58442.07 GFLOP/s , 532606.6 tokens/s INFO:__main__:2024-10-27 03:18:53 | Epoch: 1 | Step: 42640 | Dataset: 0-14384901 | Loss: 2.168 | 691 ms/step , 56872.91 GFLOP/s , 531651.3 tokens/s INFO:__main__:2024-10-27 03:19:01 | Epoch: 1 | Step: 42650 | Dataset: 0-14392901 | Loss: 2.245 | 674 ms/step , 58359.62 GFLOP/s , 533700.9 tokens/s INFO:__main__:2024-10-27 03:19:08 | Epoch: 1 | Step: 42660 | Dataset: 0-14400901 | Loss: 2.200 | 674 ms/step , 58295.10 GFLOP/s , 533194.6 tokens/s INFO:__main__:2024-10-27 03:19:16 | Epoch: 1 | Step: 42670 | Dataset: 0-14408901 | Loss: 2.157 | 674 ms/step , 58308.06 GFLOP/s , 533625.1 tokens/s INFO:__main__:2024-10-27 03:19:24 | Epoch: 1 | Step: 42680 | Dataset: 0-14416901 | Loss: 2.170 | 674 ms/step , 58319.38 GFLOP/s , 533264.3 tokens/s INFO:__main__:2024-10-27 03:19:31 | Epoch: 1 | Step: 42690 | Dataset: 0-14424901 | Loss: 2.148 | 673 ms/step , 58372.39 GFLOP/s , 533375.6 tokens/s INFO:__main__:2024-10-27 03:19:39 | Epoch: 1 | Step: 42700 | Dataset: 0-14432901 | Loss: 2.157 | 674 ms/step , 58297.50 GFLOP/s , 533107.8 tokens/s INFO:__main__:2024-10-27 03:19:47 | Epoch: 1 | Step: 42710 | Dataset: 0-14440901 | Loss: 2.256 | 675 ms/step , 58276.73 
GFLOP/s , 533159.8 tokens/s INFO:__main__:2024-10-27 03:19:55 | Epoch: 1 | Step: 42720 | Dataset: 0-14448901 | Loss: 1.755 | 674 ms/step , 58320.39 GFLOP/s , 532823.0 tokens/s INFO:__main__:2024-10-27 03:20:02 | Epoch: 1 | Step: 42730 | Dataset: 0-14456901 | Loss: 1.711 | 675 ms/step , 58260.15 GFLOP/s , 532512.8 tokens/s INFO:__main__:2024-10-27 03:20:10 | Epoch: 1 | Step: 42740 | Dataset: 0-14464901 | Loss: 1.681 | 675 ms/step , 58264.36 GFLOP/s , 532764.4 tokens/s INFO:__main__:2024-10-27 03:20:18 | Epoch: 1 | Step: 42750 | Dataset: 0-14472901 | Loss: 1.711 | 674 ms/step , 58337.50 GFLOP/s , 532913.8 tokens/s INFO:__main__:2024-10-27 03:20:25 | Epoch: 1 | Step: 42760 | Dataset: 0-14480901 | Loss: 1.688 | 675 ms/step , 58274.71 GFLOP/s , 532746.6 tokens/s INFO:__main__:2024-10-27 03:20:33 | Epoch: 1 | Step: 42770 | Dataset: 0-14488901 | Loss: 1.647 | 676 ms/step , 58120.97 GFLOP/s , 532600.2 tokens/s INFO:__main__:2024-10-27 03:20:41 | Epoch: 1 | Step: 42780 | Dataset: 0-14496901 | Loss: 1.638 | 674 ms/step , 58309.48 GFLOP/s , 532759.3 tokens/s INFO:__main__:2024-10-27 03:20:48 | Epoch: 1 | Step: 42790 | Dataset: 0-14504901 | Loss: 1.605 | 678 ms/step , 58015.37 GFLOP/s , 532564.2 tokens/s INFO:__main__:2024-10-27 03:20:56 | Epoch: 1 | Step: 42800 | Dataset: 0-14512901 | Loss: 1.667 | 674 ms/step , 58289.21 GFLOP/s , 532611.4 tokens/s INFO:__main__:2024-10-27 03:21:04 | Epoch: 1 | Step: 42810 | Dataset: 0-14520901 | Loss: 2.179 | 673 ms/step , 58367.98 GFLOP/s , 533694.3 tokens/s INFO:__main__:2024-10-27 03:21:11 | Epoch: 1 | Step: 42820 | Dataset: 0-14528901 | Loss: 2.176 | 677 ms/step , 58087.50 GFLOP/s , 533338.9 tokens/s INFO:__main__:2024-10-27 03:21:19 | Epoch: 1 | Step: 42830 | Dataset: 0-14536901 | Loss: 2.199 | 674 ms/step , 58328.03 GFLOP/s , 533082.1 tokens/s INFO:__main__:2024-10-27 03:21:27 | Epoch: 1 | Step: 42840 | Dataset: 0-14544901 | Loss: 2.186 | 674 ms/step , 58312.10 GFLOP/s , 533159.8 tokens/s INFO:__main__:2024-10-27 03:21:34 | Epoch: 1 | 
Step: 42850 | Dataset: 0-14552901 | Loss: 2.206 | 674 ms/step , 58308.27 GFLOP/s , 533303.6 tokens/s INFO:__main__:2024-10-27 03:21:42 | Epoch: 1 | Step: 42860 | Dataset: 0-14560901 | Loss: 2.121 | 675 ms/step , 58222.17 GFLOP/s , 533057.2 tokens/s INFO:__main__:2024-10-27 03:21:50 | Epoch: 1 | Step: 42870 | Dataset: 0-14568901 | Loss: 2.096 | 674 ms/step , 58295.41 GFLOP/s , 533197.6 tokens/s INFO:__main__:2024-10-27 03:21:57 | Epoch: 1 | Step: 42880 | Dataset: 0-14576901 | Loss: 2.112 | 675 ms/step , 58217.16 GFLOP/s , 533397.6 tokens/s INFO:__main__:2024-10-27 03:22:05 | Epoch: 1 | Step: 42890 | Dataset: 0-14584901 | Loss: 2.090 | 674 ms/step , 58363.30 GFLOP/s , 533334.2 tokens/s INFO:__main__:2024-10-27 03:22:13 | Epoch: 1 | Step: 42900 | Dataset: 0-14592901 | Loss: 2.176 | 674 ms/step , 58288.96 GFLOP/s , 533749.9 tokens/s INFO:__main__:2024-10-27 03:22:21 | Epoch: 1 | Step: 42910 | Dataset: 0-14600901 | Loss: 2.105 | 674 ms/step , 58328.60 GFLOP/s , 533162.6 tokens/s INFO:__main__:2024-10-27 03:22:28 | Epoch: 1 | Step: 42920 | Dataset: 0-14608901 | Loss: 2.220 | 676 ms/step , 58181.07 GFLOP/s , 533394.8 tokens/s INFO:__main__:2024-10-27 03:22:36 | Epoch: 1 | Step: 42930 | Dataset: 0-14616901 | Loss: 2.198 | 675 ms/step , 58206.20 GFLOP/s , 533235.2 tokens/s INFO:__main__:2024-10-27 03:22:44 | Epoch: 1 | Step: 42940 | Dataset: 0-14624901 | Loss: 2.236 | 674 ms/step , 58335.76 GFLOP/s , 533586.1 tokens/s INFO:__main__:2024-10-27 03:22:51 | Epoch: 1 | Step: 42950 | Dataset: 0-14632901 | Loss: 2.198 | 674 ms/step , 58298.10 GFLOP/s , 533446.2 tokens/s INFO:__main__:2024-10-27 03:22:59 | Epoch: 1 | Step: 42960 | Dataset: 0-14640901 | Loss: 2.148 | 675 ms/step , 58278.87 GFLOP/s , 533314.1 tokens/s INFO:__main__:2024-10-27 03:23:07 | Epoch: 1 | Step: 42970 | Dataset: 0-14648901 | Loss: 2.276 | 675 ms/step , 58272.83 GFLOP/s , 531970.6 tokens/s INFO:__main__:2024-10-27 03:23:14 | Epoch: 1 | Step: 42980 | Dataset: 0-14656901 | Loss: 2.183 | 674 ms/step , 58313.95 
GFLOP/s , 531780.8 tokens/s INFO:__main__:2024-10-27 03:23:22 | Epoch: 1 | Step: 42990 | Dataset: 0-14664901 | Loss: 2.294 | 675 ms/step , 58236.31 GFLOP/s , 531793.0 tokens/s INFO:__main__:2024-10-27 03:23:29 | Validation | Step: 43000 | Val_loss: 2.044 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 03:23:29 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_032329_step_43000.pt` INFO:__main__:2024-10-27 03:23:31 | Epoch: 1 | Step: 43000 | Dataset: 0-14672901 | Loss: 2.255 | 674 ms/step , 58298.43 GFLOP/s , 478503.5 tokens/s INFO:__main__:2024-10-27 03:23:38 | Epoch: 1 | Step: 43010 | Dataset: 0-14680901 | Loss: 2.235 | 674 ms/step , 58304.50 GFLOP/s , 532496.9 tokens/s INFO:__main__:2024-10-27 03:23:46 | Epoch: 1 | Step: 43020 | Dataset: 0-14688901 | Loss: 2.128 | 674 ms/step , 58294.51 GFLOP/s , 532265.5 tokens/s INFO:__main__:2024-10-27 03:23:54 | Epoch: 1 | Step: 43030 | Dataset: 0-14696901 | Loss: 2.124 | 674 ms/step , 58291.07 GFLOP/s , 531540.4 tokens/s INFO:__main__:2024-10-27 03:24:01 | Epoch: 1 | Step: 43040 | Dataset: 0-14704901 | Loss: 2.259 | 675 ms/step , 58238.06 GFLOP/s , 530747.8 tokens/s INFO:__main__:2024-10-27 03:24:09 | Epoch: 1 | Step: 43050 | Dataset: 0-14712901 | Loss: 2.190 | 674 ms/step , 58300.75 GFLOP/s , 533301.7 tokens/s INFO:__main__:2024-10-27 03:24:17 | Epoch: 1 | Step: 43060 | Dataset: 0-14720901 | Loss: 2.301 | 674 ms/step , 58355.70 GFLOP/s , 532947.2 tokens/s INFO:__main__:2024-10-27 03:24:24 | Epoch: 1 | Step: 43070 | Dataset: 0-14728901 | Loss: 2.174 | 673 ms/step , 58395.26 GFLOP/s , 533629.9 tokens/s INFO:__main__:2024-10-27 03:24:32 | Epoch: 1 | Step: 43080 | Dataset: 0-14736901 | Loss: 2.225 | 673 ms/step , 58403.57 GFLOP/s , 533187.2 tokens/s INFO:__main__:2024-10-27 03:24:40 | Epoch: 1 | Step: 43090 | Dataset: 0-14744901 | Loss: 2.235 | 675 ms/step , 58268.61 GFLOP/s , 533267.4 tokens/s INFO:__main__:2024-10-27 03:24:47 | Epoch: 1 | Step: 43100 | Dataset: 0-14752901 | Loss: 2.258 | 674 ms/step , 
58317.23 GFLOP/s , 533247.4 tokens/s INFO:__main__:2024-10-27 03:24:55 | Epoch: 1 | Step: 43110 | Dataset: 0-14760901 | Loss: 2.185 | 674 ms/step , 58341.30 GFLOP/s , 533230.6 tokens/s INFO:__main__:2024-10-27 03:25:03 | Epoch: 1 | Step: 43120 | Dataset: 0-14768901 | Loss: 2.106 | 675 ms/step , 58238.59 GFLOP/s , 533380.4 tokens/s INFO:__main__:2024-10-27 03:25:11 | Epoch: 1 | Step: 43130 | Dataset: 0-14776901 | Loss: 2.262 | 675 ms/step , 58251.61 GFLOP/s , 533339.7 tokens/s INFO:__main__:2024-10-27 03:25:18 | Epoch: 1 | Step: 43140 | Dataset: 0-14784901 | Loss: 2.172 | 674 ms/step , 58288.90 GFLOP/s , 533549.8 tokens/s INFO:__main__:2024-10-27 03:25:26 | Epoch: 1 | Step: 43150 | Dataset: 0-14792901 | Loss: 2.240 | 674 ms/step , 58313.02 GFLOP/s , 533562.8 tokens/s INFO:__main__:2024-10-27 03:25:34 | Epoch: 1 | Step: 43160 | Dataset: 0-14800901 | Loss: 2.174 | 676 ms/step , 58174.65 GFLOP/s , 533723.5 tokens/s INFO:__main__:2024-10-27 03:25:41 | Epoch: 1 | Step: 43170 | Dataset: 0-14808901 | Loss: 2.194 | 674 ms/step , 58325.35 GFLOP/s , 533510.3 tokens/s INFO:__main__:2024-10-27 03:25:49 | Epoch: 1 | Step: 43180 | Dataset: 0-14816901 | Loss: 2.290 | 674 ms/step , 58295.28 GFLOP/s , 533778.2 tokens/s INFO:__main__:2024-10-27 03:25:57 | Epoch: 1 | Step: 43190 | Dataset: 0-14824901 | Loss: 2.179 | 674 ms/step , 58295.58 GFLOP/s , 533624.7 tokens/s INFO:__main__:2024-10-27 03:26:04 | Epoch: 1 | Step: 43200 | Dataset: 0-14832901 | Loss: 2.182 | 674 ms/step , 58358.44 GFLOP/s , 533301.3 tokens/s INFO:__main__:2024-10-27 03:26:12 | Epoch: 1 | Step: 43210 | Dataset: 0-14840901 | Loss: 2.191 | 675 ms/step , 58263.44 GFLOP/s , 533555.0 tokens/s INFO:__main__:2024-10-27 03:26:20 | Epoch: 1 | Step: 43220 | Dataset: 0-14848901 | Loss: 2.235 | 675 ms/step , 58261.30 GFLOP/s , 533297.7 tokens/s INFO:__main__:2024-10-27 03:26:27 | Epoch: 1 | Step: 43230 | Dataset: 0-14856901 | Loss: 2.181 | 675 ms/step , 58267.24 GFLOP/s , 533443.2 tokens/s INFO:__main__:2024-10-27 03:26:35 | 
Epoch: 1 | Step: 43240 | Dataset: 0-14864901 | Loss: 2.165 | 674 ms/step , 58284.96 GFLOP/s , 533151.9 tokens/s INFO:__main__:2024-10-27 03:26:43 | Epoch: 1 | Step: 43250 | Dataset: 0-14872901 | Loss: 2.094 | 674 ms/step , 58317.46 GFLOP/s , 533569.3 tokens/s INFO:__main__:2024-10-27 03:26:50 | Epoch: 1 | Step: 43260 | Dataset: 0-14880901 | Loss: 2.180 | 674 ms/step , 58283.38 GFLOP/s , 533626.9 tokens/s INFO:__main__:2024-10-27 03:26:58 | Epoch: 1 | Step: 43270 | Dataset: 0-14888901 | Loss: 2.175 | 674 ms/step , 58296.57 GFLOP/s , 533764.8 tokens/s INFO:__main__:2024-10-27 03:27:06 | Epoch: 1 | Step: 43280 | Dataset: 0-14896901 | Loss: 2.118 | 674 ms/step , 58293.08 GFLOP/s , 533445.1 tokens/s INFO:__main__:2024-10-27 03:27:13 | Epoch: 1 | Step: 43290 | Dataset: 0-14904901 | Loss: 2.205 | 675 ms/step , 58244.50 GFLOP/s , 533536.2 tokens/s INFO:__main__:2024-10-27 03:27:21 | Epoch: 1 | Step: 43300 | Dataset: 0-14912901 | Loss: 2.130 | 674 ms/step , 58358.16 GFLOP/s , 533900.8 tokens/s INFO:__main__:2024-10-27 03:27:29 | Epoch: 1 | Step: 43310 | Dataset: 0-14920901 | Loss: 2.261 | 674 ms/step , 58300.43 GFLOP/s , 533512.9 tokens/s INFO:__main__:2024-10-27 03:27:36 | Epoch: 1 | Step: 43320 | Dataset: 0-14928901 | Loss: 2.284 | 674 ms/step , 58338.05 GFLOP/s , 533787.8 tokens/s INFO:__main__:2024-10-27 03:27:44 | Epoch: 1 | Step: 43330 | Dataset: 0-14936901 | Loss: 2.182 | 675 ms/step , 58272.94 GFLOP/s , 533458.0 tokens/s INFO:__main__:2024-10-27 03:27:52 | Epoch: 1 | Step: 43340 | Dataset: 0-14944901 | Loss: 2.193 | 673 ms/step , 58444.43 GFLOP/s , 533988.0 tokens/s INFO:__main__:2024-10-27 03:27:59 | Epoch: 1 | Step: 43350 | Dataset: 0-14952901 | Loss: 2.175 | 674 ms/step , 58352.81 GFLOP/s , 533802.8 tokens/s INFO:__main__:2024-10-27 03:28:07 | Epoch: 1 | Step: 43360 | Dataset: 0-14960901 | Loss: 2.118 | 674 ms/step , 58300.16 GFLOP/s , 534027.7 tokens/s INFO:__main__:2024-10-27 03:28:15 | Epoch: 1 | Step: 43370 | Dataset: 0-14968901 | Loss: 2.190 | 673 ms/step , 
58405.43 GFLOP/s , 532891.4 tokens/s INFO:__main__:2024-10-27 03:28:22 | Epoch: 1 | Step: 43380 | Dataset: 0-14976901 | Loss: 2.219 | 675 ms/step , 58270.25 GFLOP/s , 533490.3 tokens/s INFO:__main__:2024-10-27 03:28:30 | Epoch: 1 | Step: 43390 | Dataset: 0-14984901 | Loss: 2.263 | 674 ms/step , 58325.83 GFLOP/s , 533324.3 tokens/s INFO:__main__:2024-10-27 03:28:38 | Epoch: 1 | Step: 43400 | Dataset: 0-14992901 | Loss: 2.227 | 674 ms/step , 58302.72 GFLOP/s , 533403.2 tokens/s INFO:__main__:2024-10-27 03:28:45 | Epoch: 1 | Step: 43410 | Dataset: 0-15000901 | Loss: 2.232 | 674 ms/step , 58364.12 GFLOP/s , 533996.2 tokens/s INFO:__main__:2024-10-27 03:28:53 | Epoch: 1 | Step: 43420 | Dataset: 0-15008901 | Loss: 2.174 | 677 ms/step , 58096.08 GFLOP/s , 532235.9 tokens/s INFO:__main__:2024-10-27 03:29:01 | Epoch: 1 | Step: 43430 | Dataset: 0-15016901 | Loss: 2.236 | 675 ms/step , 58269.44 GFLOP/s , 532409.1 tokens/s INFO:__main__:2024-10-27 03:29:09 | Epoch: 1 | Step: 43440 | Dataset: 0-15024901 | Loss: 2.211 | 675 ms/step , 58229.33 GFLOP/s , 532415.3 tokens/s INFO:__main__:2024-10-27 03:29:16 | Epoch: 1 | Step: 43450 | Dataset: 0-15032901 | Loss: 2.267 | 676 ms/step , 58118.23 GFLOP/s , 531797.8 tokens/s INFO:__main__:2024-10-27 03:29:24 | Epoch: 1 | Step: 43460 | Dataset: 0-15040901 | Loss: 2.222 | 676 ms/step , 58144.78 GFLOP/s , 532108.1 tokens/s INFO:__main__:2024-10-27 03:29:32 | Epoch: 1 | Step: 43470 | Dataset: 0-15048901 | Loss: 2.163 | 675 ms/step , 58214.43 GFLOP/s , 532603.3 tokens/s INFO:__main__:2024-10-27 03:29:39 | Epoch: 1 | Step: 43480 | Dataset: 0-15056901 | Loss: 2.178 | 675 ms/step , 58245.07 GFLOP/s , 532327.0 tokens/s INFO:__main__:2024-10-27 03:29:47 | Epoch: 1 | Step: 43490 | Dataset: 0-15064901 | Loss: 2.174 | 674 ms/step , 58286.55 GFLOP/s , 532869.6 tokens/s INFO:__main__:2024-10-27 03:29:55 | Epoch: 1 | Step: 43500 | Dataset: 0-15072901 | Loss: 2.243 | 675 ms/step , 58196.90 GFLOP/s , 532319.2 tokens/s INFO:__main__:2024-10-27 03:30:02 | 
Epoch: 1 | Step: 43510 | Dataset: 0-15080901 | Loss: 2.209 | 675 ms/step , 58250.69 GFLOP/s , 531008.0 tokens/s INFO:__main__:2024-10-27 03:30:10 | Epoch: 1 | Step: 43520 | Dataset: 0-15088901 | Loss: 2.126 | 675 ms/step , 58210.93 GFLOP/s , 532969.9 tokens/s INFO:__main__:2024-10-27 03:30:18 | Epoch: 1 | Step: 43530 | Dataset: 0-15096901 | Loss: 2.268 | 675 ms/step , 58206.85 GFLOP/s , 532820.7 tokens/s INFO:__main__:2024-10-27 03:30:25 | Epoch: 1 | Step: 43540 | Dataset: 0-15104901 | Loss: 2.217 | 674 ms/step , 58328.25 GFLOP/s , 532693.3 tokens/s INFO:__main__:2024-10-27 03:30:33 | Epoch: 1 | Step: 43550 | Dataset: 0-15112901 | Loss: 2.187 | 675 ms/step , 58236.39 GFLOP/s , 532182.1 tokens/s INFO:__main__:2024-10-27 03:30:41 | Epoch: 1 | Step: 43560 | Dataset: 0-15120901 | Loss: 2.241 | 674 ms/step , 58287.71 GFLOP/s , 533291.8 tokens/s INFO:__main__:2024-10-27 03:30:49 | Epoch: 1 | Step: 43570 | Dataset: 0-15128901 | Loss: 2.208 | 675 ms/step , 58200.69 GFLOP/s , 532544.6 tokens/s INFO:__main__:2024-10-27 03:30:56 | Epoch: 1 | Step: 43580 | Dataset: 0-15136901 | Loss: 2.168 | 674 ms/step , 58335.24 GFLOP/s , 533250.9 tokens/s INFO:__main__:2024-10-27 03:31:04 | Epoch: 1 | Step: 43590 | Dataset: 0-15144901 | Loss: 2.129 | 674 ms/step , 58283.37 GFLOP/s , 533141.1 tokens/s INFO:__main__:2024-10-27 03:31:12 | Epoch: 1 | Step: 43600 | Dataset: 0-15152901 | Loss: 2.225 | 674 ms/step , 58288.56 GFLOP/s , 533082.8 tokens/s INFO:__main__:2024-10-27 03:31:19 | Epoch: 1 | Step: 43610 | Dataset: 0-15160901 | Loss: 2.172 | 674 ms/step , 58296.60 GFLOP/s , 533064.3 tokens/s INFO:__main__:2024-10-27 03:31:27 | Epoch: 1 | Step: 43620 | Dataset: 0-15168901 | Loss: 2.195 | 674 ms/step , 58285.87 GFLOP/s , 532951.0 tokens/s INFO:__main__:2024-10-27 03:31:35 | Epoch: 1 | Step: 43630 | Dataset: 0-15176901 | Loss: 2.184 | 675 ms/step , 58249.14 GFLOP/s , 533099.4 tokens/s INFO:__main__:2024-10-27 03:31:42 | Epoch: 1 | Step: 43640 | Dataset: 0-15184901 | Loss: 2.158 | 674 ms/step , 
58326.83 GFLOP/s , 533315.4 tokens/s INFO:__main__:2024-10-27 03:31:50 | Epoch: 1 | Step: 43650 | Dataset: 0-15192901 | Loss: 2.151 | 675 ms/step , 58212.36 GFLOP/s , 533622.1 tokens/s INFO:__main__:2024-10-27 03:31:58 | Epoch: 1 | Step: 43660 | Dataset: 0-15200901 | Loss: 2.174 | 674 ms/step , 58352.12 GFLOP/s , 533285.3 tokens/s INFO:__main__:2024-10-27 03:32:05 | Epoch: 1 | Step: 43670 | Dataset: 0-15208901 | Loss: 2.171 | 675 ms/step , 58263.81 GFLOP/s , 533332.5 tokens/s INFO:__main__:2024-10-27 03:32:13 | Epoch: 1 | Step: 43680 | Dataset: 0-15216901 | Loss: 2.176 | 676 ms/step , 58155.15 GFLOP/s , 532482.8 tokens/s INFO:__main__:2024-10-27 03:32:21 | Epoch: 1 | Step: 43690 | Dataset: 0-15224901 | Loss: 2.095 | 675 ms/step , 58264.86 GFLOP/s , 533345.4 tokens/s INFO:__main__:2024-10-27 03:32:28 | Epoch: 1 | Step: 43700 | Dataset: 0-15232901 | Loss: 2.118 | 676 ms/step , 58181.13 GFLOP/s , 533435.4 tokens/s INFO:__main__:2024-10-27 03:32:36 | Epoch: 1 | Step: 43710 | Dataset: 0-15240901 | Loss: 2.173 | 674 ms/step , 58296.23 GFLOP/s , 533360.6 tokens/s INFO:__main__:2024-10-27 03:32:44 | Epoch: 1 | Step: 43720 | Dataset: 0-15248901 | Loss: 2.050 | 674 ms/step , 58290.08 GFLOP/s , 533439.6 tokens/s INFO:__main__:2024-10-27 03:32:51 | Epoch: 1 | Step: 43730 | Dataset: 0-15256901 | Loss: 2.110 | 674 ms/step , 58319.30 GFLOP/s , 533357.7 tokens/s INFO:__main__:2024-10-27 03:32:59 | Epoch: 1 | Step: 43740 | Dataset: 0-15264901 | Loss: 2.130 | 674 ms/step , 58325.89 GFLOP/s , 533461.8 tokens/s INFO:__main__:2024-10-27 03:33:07 | Epoch: 1 | Step: 43750 | Dataset: 0-15272901 | Loss: 2.187 | 674 ms/step , 58333.64 GFLOP/s , 533412.8 tokens/s INFO:__main__:2024-10-27 03:33:14 | Epoch: 1 | Step: 43760 | Dataset: 0-15280901 | Loss: 2.102 | 675 ms/step , 58196.30 GFLOP/s , 533327.4 tokens/s INFO:__main__:2024-10-27 03:33:22 | Epoch: 1 | Step: 43770 | Dataset: 0-15288901 | Loss: 2.049 | 674 ms/step , 58335.75 GFLOP/s , 532844.1 tokens/s INFO:__main__:2024-10-27 03:33:30 | 
Epoch: 1 | Step: 43780 | Dataset: 0-15296901 | Loss: 1.827 | 675 ms/step , 58199.22 GFLOP/s , 532788.3 tokens/s INFO:__main__:2024-10-27 03:33:38 | Epoch: 1 | Step: 43790 | Dataset: 0-15304901 | Loss: 1.769 | 674 ms/step , 58319.02 GFLOP/s , 532784.6 tokens/s INFO:__main__:2024-10-27 03:33:45 | Epoch: 1 | Step: 43800 | Dataset: 0-15312901 | Loss: 1.774 | 674 ms/step , 58292.51 GFLOP/s , 532943.2 tokens/s INFO:__main__:2024-10-27 03:33:53 | Epoch: 1 | Step: 43810 | Dataset: 0-15320901 | Loss: 1.726 | 674 ms/step , 58315.94 GFLOP/s , 533118.0 tokens/s INFO:__main__:2024-10-27 03:34:01 | Epoch: 1 | Step: 43820 | Dataset: 0-15328901 | Loss: 1.728 | 674 ms/step , 58343.25 GFLOP/s , 532814.5 tokens/s INFO:__main__:2024-10-27 03:34:08 | Epoch: 1 | Step: 43830 | Dataset: 0-15336901 | Loss: 1.741 | 674 ms/step , 58340.76 GFLOP/s , 533357.6 tokens/s INFO:__main__:2024-10-27 03:34:16 | Epoch: 1 | Step: 43840 | Dataset: 0-15344901 | Loss: 1.705 | 673 ms/step , 58406.74 GFLOP/s , 533366.7 tokens/s INFO:__main__:2024-10-27 03:34:24 | Epoch: 1 | Step: 43850 | Dataset: 0-15352901 | Loss: 1.686 | 674 ms/step , 58302.98 GFLOP/s , 533069.4 tokens/s INFO:__main__:2024-10-27 03:34:31 | Epoch: 1 | Step: 43860 | Dataset: 0-15360901 | Loss: 2.348 | 673 ms/step , 58430.64 GFLOP/s , 533571.7 tokens/s INFO:__main__:2024-10-27 03:34:39 | Epoch: 1 | Step: 43870 | Dataset: 0-15368901 | Loss: 2.269 | 673 ms/step , 58401.06 GFLOP/s , 534385.7 tokens/s INFO:__main__:2024-10-27 03:34:47 | Epoch: 1 | Step: 43880 | Dataset: 0-15376901 | Loss: 2.102 | 674 ms/step , 58285.17 GFLOP/s , 533611.0 tokens/s INFO:__main__:2024-10-27 03:34:54 | Epoch: 1 | Step: 43890 | Dataset: 0-15384901 | Loss: 2.186 | 674 ms/step , 58291.19 GFLOP/s , 533391.8 tokens/s INFO:__main__:2024-10-27 03:35:02 | Epoch: 1 | Step: 43900 | Dataset: 0-15392901 | Loss: 2.192 | 674 ms/step , 58339.65 GFLOP/s , 533161.6 tokens/s INFO:__main__:2024-10-27 03:35:10 | Epoch: 1 | Step: 43910 | Dataset: 0-15400901 | Loss: 2.166 | 674 ms/step , 
58290.71 GFLOP/s , 533051.8 tokens/s INFO:__main__:2024-10-27 03:35:17 | Epoch: 1 | Step: 43920 | Dataset: 0-15408901 | Loss: 2.095 | 674 ms/step , 58286.47 GFLOP/s , 533497.5 tokens/s INFO:__main__:2024-10-27 03:35:25 | Epoch: 1 | Step: 43930 | Dataset: 0-15416901 | Loss: 2.180 | 674 ms/step , 58305.73 GFLOP/s , 532810.8 tokens/s INFO:__main__:2024-10-27 03:35:33 | Epoch: 1 | Step: 43940 | Dataset: 0-15424901 | Loss: 2.152 | 675 ms/step , 58268.32 GFLOP/s , 533278.8 tokens/s INFO:__main__:2024-10-27 03:35:40 | Epoch: 1 | Step: 43950 | Dataset: 0-15432901 | Loss: 2.196 | 675 ms/step , 58271.62 GFLOP/s , 532914.7 tokens/s INFO:__main__:2024-10-27 03:35:48 | Epoch: 1 | Step: 43960 | Dataset: 0-15440901 | Loss: 2.109 | 674 ms/step , 58304.01 GFLOP/s , 533318.2 tokens/s INFO:__main__:2024-10-27 03:35:56 | Epoch: 1 | Step: 43970 | Dataset: 0-15448901 | Loss: 2.174 | 674 ms/step , 58341.65 GFLOP/s , 533460.6 tokens/s INFO:__main__:2024-10-27 03:36:03 | Epoch: 1 | Step: 43980 | Dataset: 0-15456901 | Loss: 2.163 | 675 ms/step , 58249.03 GFLOP/s , 533388.4 tokens/s INFO:__main__:2024-10-27 03:36:11 | Epoch: 1 | Step: 43990 | Dataset: 0-15464901 | Loss: 2.150 | 675 ms/step , 58214.04 GFLOP/s , 532972.3 tokens/s INFO:__main__:2024-10-27 03:36:18 | Validation | Step: 44000 | Val_loss: 2.056 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 03:36:18 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_033618_step_44000.pt` INFO:__main__:2024-10-27 03:36:20 | Epoch: 1 | Step: 44000 | Dataset: 0-15472901 | Loss: 2.117 | 674 ms/step , 58332.65 GFLOP/s , 476370.5 tokens/s INFO:__main__:2024-10-27 03:36:27 | Epoch: 1 | Step: 44010 | Dataset: 0-15480901 | Loss: 2.176 | 674 ms/step , 58339.06 GFLOP/s , 533186.7 tokens/s INFO:__main__:2024-10-27 03:36:35 | Epoch: 1 | Step: 44020 | Dataset: 0-15488901 | Loss: 1.899 | 676 ms/step , 58192.80 GFLOP/s , 532764.3 tokens/s INFO:__main__:2024-10-27 03:36:43 | Epoch: 1 | Step: 44030 | Dataset: 0-15496901 | Loss: 1.836 | 675 
ms/step , 58240.99 GFLOP/s , 532717.0 tokens/s INFO:__main__:2024-10-27 03:36:51 | Epoch: 1 | Step: 44040 | Dataset: 0-15504901 | Loss: 1.847 | 674 ms/step , 58334.43 GFLOP/s , 533189.4 tokens/s INFO:__main__:2024-10-27 03:36:58 | Epoch: 1 | Step: 44050 | Dataset: 0-15512901 | Loss: 1.843 | 675 ms/step , 58197.35 GFLOP/s , 532865.5 tokens/s INFO:__main__:2024-10-27 03:37:06 | Epoch: 1 | Step: 44060 | Dataset: 0-15520901 | Loss: 1.835 | 674 ms/step , 58295.51 GFLOP/s , 532703.7 tokens/s INFO:__main__:2024-10-27 03:37:14 | Epoch: 1 | Step: 44070 | Dataset: 0-15528901 | Loss: 1.807 | 674 ms/step , 58341.35 GFLOP/s , 532892.2 tokens/s INFO:__main__:2024-10-27 03:37:21 | Epoch: 1 | Step: 44080 | Dataset: 0-15536901 | Loss: 1.795 | 675 ms/step , 58267.26 GFLOP/s , 532697.0 tokens/s INFO:__main__:2024-10-27 03:37:29 | Epoch: 1 | Step: 44090 | Dataset: 0-15544901 | Loss: 1.775 | 675 ms/step , 58223.30 GFLOP/s , 532772.8 tokens/s INFO:__main__:2024-10-27 03:37:37 | Epoch: 1 | Step: 44100 | Dataset: 0-15552901 | Loss: 1.794 | 676 ms/step , 58144.27 GFLOP/s , 532421.1 tokens/s INFO:__main__:2024-10-27 03:37:44 | Epoch: 1 | Step: 44110 | Dataset: 0-15560901 | Loss: 1.802 | 674 ms/step , 58280.32 GFLOP/s , 533045.4 tokens/s INFO:__main__:2024-10-27 03:37:52 | Epoch: 1 | Step: 44120 | Dataset: 0-15568901 | Loss: 1.807 | 674 ms/step , 58303.86 GFLOP/s , 532965.4 tokens/s INFO:__main__:2024-10-27 03:38:00 | Epoch: 1 | Step: 44130 | Dataset: 0-15576901 | Loss: 1.769 | 675 ms/step , 58273.59 GFLOP/s , 532741.0 tokens/s INFO:__main__:2024-10-27 03:38:07 | Epoch: 1 | Step: 44140 | Dataset: 0-15584901 | Loss: 1.766 | 674 ms/step , 58305.19 GFLOP/s , 532586.6 tokens/s INFO:__main__:2024-10-27 03:38:15 | Epoch: 1 | Step: 44150 | Dataset: 0-15592901 | Loss: 1.755 | 675 ms/step , 58224.48 GFLOP/s , 532235.8 tokens/s INFO:__main__:2024-10-27 03:38:23 | Epoch: 1 | Step: 44160 | Dataset: 0-15600901 | Loss: 1.746 | 675 ms/step , 58266.53 GFLOP/s , 533093.6 tokens/s INFO:__main__:2024-10-27 
03:38:30 | Epoch: 1 | Step: 44170 | Dataset: 0-15608901 | Loss: 1.757 | 675 ms/step , 58234.62 GFLOP/s , 532423.1 tokens/s INFO:__main__:2024-10-27 03:38:38 | Epoch: 1 | Step: 44180 | Dataset: 0-15616901 | Loss: 1.739 | 675 ms/step , 58209.02 GFLOP/s , 532569.7 tokens/s INFO:__main__:2024-10-27 03:38:46 | Epoch: 1 | Step: 44190 | Dataset: 0-15624901 | Loss: 1.735 | 677 ms/step , 58078.58 GFLOP/s , 531937.3 tokens/s INFO:__main__:2024-10-27 03:38:54 | Epoch: 1 | Step: 44200 | Dataset: 0-15632901 | Loss: 2.308 | 675 ms/step , 58227.79 GFLOP/s , 531813.4 tokens/s INFO:__main__:2024-10-27 03:39:01 | Epoch: 1 | Step: 44210 | Dataset: 0-15640901 | Loss: 2.225 | 678 ms/step , 58002.25 GFLOP/s , 529677.7 tokens/s INFO:__main__:2024-10-27 03:39:09 | Epoch: 1 | Step: 44220 | Dataset: 0-15648901 | Loss: 2.190 | 675 ms/step , 58270.29 GFLOP/s , 533705.6 tokens/s INFO:__main__:2024-10-27 03:39:17 | Epoch: 1 | Step: 44230 | Dataset: 0-15656901 | Loss: 2.147 | 674 ms/step , 58350.01 GFLOP/s , 533602.4 tokens/s INFO:__main__:2024-10-27 03:39:24 | Epoch: 1 | Step: 44240 | Dataset: 0-15664901 | Loss: 2.239 | 675 ms/step , 58245.13 GFLOP/s , 532655.1 tokens/s INFO:__main__:2024-10-27 03:39:32 | Epoch: 1 | Step: 44250 | Dataset: 0-15672901 | Loss: 2.249 | 676 ms/step , 58192.86 GFLOP/s , 533027.1 tokens/s INFO:__main__:2024-10-27 03:39:40 | Epoch: 1 | Step: 44260 | Dataset: 0-15680901 | Loss: 2.158 | 676 ms/step , 58152.20 GFLOP/s , 532837.3 tokens/s INFO:__main__:2024-10-27 03:39:47 | Epoch: 1 | Step: 44270 | Dataset: 0-15688901 | Loss: 2.147 | 675 ms/step , 58235.94 GFLOP/s , 532580.6 tokens/s INFO:__main__:2024-10-27 03:39:55 | Epoch: 1 | Step: 44280 | Dataset: 0-15696901 | Loss: 2.147 | 677 ms/step , 58070.80 GFLOP/s , 532498.2 tokens/s INFO:__main__:2024-10-27 03:40:03 | Epoch: 1 | Step: 44290 | Dataset: 0-15704901 | Loss: 2.225 | 674 ms/step , 58280.48 GFLOP/s , 532937.6 tokens/s INFO:__main__:2024-10-27 03:40:10 | Epoch: 1 | Step: 44300 | Dataset: 0-15712901 | Loss: 2.142 | 675 
ms/step , 58247.69 GFLOP/s , 532637.3 tokens/s INFO:__main__:2024-10-27 03:40:18 | Epoch: 1 | Step: 44310 | Dataset: 0-15720901 | Loss: 2.235 | 676 ms/step , 58160.56 GFLOP/s , 532954.9 tokens/s INFO:__main__:2024-10-27 03:40:26 | Epoch: 1 | Step: 44320 | Dataset: 0-15728901 | Loss: 2.151 | 675 ms/step , 58193.55 GFLOP/s , 532674.6 tokens/s INFO:__main__:2024-10-27 03:40:34 | Epoch: 1 | Step: 44330 | Dataset: 0-15736901 | Loss: 2.169 | 675 ms/step , 58254.96 GFLOP/s , 532536.0 tokens/s INFO:__main__:2024-10-27 03:40:41 | Epoch: 1 | Step: 44340 | Dataset: 0-15744901 | Loss: 2.104 | 675 ms/step , 58211.42 GFLOP/s , 532389.3 tokens/s INFO:__main__:2024-10-27 03:40:49 | Epoch: 1 | Step: 44350 | Dataset: 0-15752901 | Loss: 2.159 | 674 ms/step , 58343.71 GFLOP/s , 532848.8 tokens/s INFO:__main__:2024-10-27 03:40:57 | Epoch: 1 | Step: 44360 | Dataset: 0-15760901 | Loss: 1.764 | 674 ms/step , 58337.41 GFLOP/s , 533233.3 tokens/s INFO:__main__:2024-10-27 03:41:04 | Epoch: 1 | Step: 44370 | Dataset: 0-15768901 | Loss: 1.743 | 675 ms/step , 58195.47 GFLOP/s , 532212.0 tokens/s INFO:__main__:2024-10-27 03:41:12 | Epoch: 1 | Step: 44380 | Dataset: 0-15776901 | Loss: 1.697 | 674 ms/step , 58286.51 GFLOP/s , 532597.8 tokens/s INFO:__main__:2024-10-27 03:41:20 | Epoch: 1 | Step: 44390 | Dataset: 0-15784901 | Loss: 1.668 | 676 ms/step , 58128.26 GFLOP/s , 531814.2 tokens/s INFO:__main__:2024-10-27 03:41:27 | Epoch: 1 | Step: 44400 | Dataset: 0-15792901 | Loss: 1.681 | 675 ms/step , 58211.95 GFLOP/s , 531466.0 tokens/s INFO:__main__:2024-10-27 03:41:35 | Epoch: 1 | Step: 44410 | Dataset: 0-15800901 | Loss: 1.658 | 676 ms/step , 58173.93 GFLOP/s , 531282.9 tokens/s INFO:__main__:2024-10-27 03:41:43 | Epoch: 1 | Step: 44420 | Dataset: 0-15808901 | Loss: 1.655 | 675 ms/step , 58210.14 GFLOP/s , 531009.2 tokens/s INFO:__main__:2024-10-27 03:41:51 | Epoch: 1 | Step: 44430 | Dataset: 0-15816901 | Loss: 1.676 | 675 ms/step , 58243.69 GFLOP/s , 531627.8 tokens/s INFO:__main__:2024-10-27 
03:41:58 | Epoch: 1 | Step: 44440 | Dataset: 0-15824901 | Loss: 1.649 | 675 ms/step , 58208.64 GFLOP/s , 532015.5 tokens/s INFO:__main__:2024-10-27 03:42:06 | Epoch: 1 | Step: 44450 | Dataset: 0-15832901 | Loss: 2.298 | 675 ms/step , 58212.06 GFLOP/s , 531102.4 tokens/s INFO:__main__:2024-10-27 03:42:14 | Epoch: 1 | Step: 44460 | Dataset: 0-15840901 | Loss: 2.177 | 675 ms/step , 58211.77 GFLOP/s , 529213.7 tokens/s INFO:__main__:2024-10-27 03:42:21 | Epoch: 1 | Step: 44470 | Dataset: 0-15848901 | Loss: 2.221 | 675 ms/step , 58258.31 GFLOP/s , 533370.1 tokens/s INFO:__main__:2024-10-27 03:42:29 | Epoch: 1 | Step: 44480 | Dataset: 0-15856901 | Loss: 2.201 | 673 ms/step , 58378.43 GFLOP/s , 533194.3 tokens/s INFO:__main__:2024-10-27 03:42:37 | Epoch: 1 | Step: 44490 | Dataset: 0-15864901 | Loss: 2.208 | 673 ms/step , 58376.86 GFLOP/s , 533541.5 tokens/s INFO:__main__:2024-10-27 03:42:44 | Epoch: 1 | Step: 44500 | Dataset: 0-15872901 | Loss: 2.203 | 673 ms/step , 58414.07 GFLOP/s , 533369.4 tokens/s INFO:__main__:2024-10-27 03:42:52 | Epoch: 1 | Step: 44510 | Dataset: 0-15880901 | Loss: 2.198 | 673 ms/step , 58382.43 GFLOP/s , 533230.6 tokens/s INFO:__main__:2024-10-27 03:43:00 | Epoch: 1 | Step: 44520 | Dataset: 0-15888901 | Loss: 2.213 | 675 ms/step , 58251.94 GFLOP/s , 532110.7 tokens/s INFO:__main__:2024-10-27 03:43:07 | Epoch: 1 | Step: 44530 | Dataset: 0-15896901 | Loss: 2.142 | 674 ms/step , 58291.16 GFLOP/s , 532501.4 tokens/s INFO:__main__:2024-10-27 03:43:15 | Epoch: 1 | Step: 44540 | Dataset: 0-15904901 | Loss: 2.081 | 673 ms/step , 58410.77 GFLOP/s , 533404.8 tokens/s INFO:__main__:2024-10-27 03:43:23 | Epoch: 1 | Step: 44550 | Dataset: 0-15912901 | Loss: 2.161 | 675 ms/step , 58267.32 GFLOP/s , 533288.5 tokens/s INFO:__main__:2024-10-27 03:43:30 | Epoch: 1 | Step: 44560 | Dataset: 0-15920901 | Loss: 2.206 | 675 ms/step , 58260.14 GFLOP/s , 532839.3 tokens/s INFO:__main__:2024-10-27 03:43:38 | Epoch: 1 | Step: 44570 | Dataset: 0-15928901 | Loss: 2.214 | 675 
ms/step , 58247.20 GFLOP/s , 532592.0 tokens/s INFO:__main__:2024-10-27 03:43:46 | Epoch: 1 | Step: 44580 | Dataset: 0-15936901 | Loss: 2.211 | 674 ms/step , 58300.88 GFLOP/s , 532465.7 tokens/s INFO:__main__:2024-10-27 03:43:54 | Epoch: 1 | Step: 44590 | Dataset: 0-15944901 | Loss: 2.187 | 675 ms/step , 58226.29 GFLOP/s , 532152.6 tokens/s INFO:__main__:2024-10-27 03:44:01 | Epoch: 1 | Step: 44600 | Dataset: 0-15952901 | Loss: 2.097 | 673 ms/step , 58408.42 GFLOP/s , 532106.0 tokens/s INFO:__main__:2024-10-27 03:44:09 | Epoch: 1 | Step: 44610 | Dataset: 0-15960901 | Loss: 2.144 | 674 ms/step , 58284.06 GFLOP/s , 532135.3 tokens/s INFO:__main__:2024-10-27 03:44:17 | Epoch: 1 | Step: 44620 | Dataset: 0-15968901 | Loss: 2.238 | 675 ms/step , 58232.27 GFLOP/s , 531988.2 tokens/s INFO:__main__:2024-10-27 03:44:24 | Epoch: 1 | Step: 44630 | Dataset: 0-15976901 | Loss: 2.282 | 673 ms/step , 58411.92 GFLOP/s , 532484.5 tokens/s INFO:__main__:2024-10-27 03:44:32 | Epoch: 1 | Step: 44640 | Dataset: 0-15984901 | Loss: 2.184 | 673 ms/step , 58390.98 GFLOP/s , 533269.4 tokens/s INFO:__main__:2024-10-27 03:44:40 | Epoch: 1 | Step: 44650 | Dataset: 0-15992901 | Loss: 2.173 | 677 ms/step , 58055.66 GFLOP/s , 531177.0 tokens/s INFO:__main__:2024-10-27 03:44:47 | Epoch: 1 | Step: 44660 | Dataset: 0-16000901 | Loss: 2.144 | 677 ms/step , 58067.08 GFLOP/s , 529995.8 tokens/s INFO:__main__:2024-10-27 03:44:55 | Epoch: 1 | Step: 44670 | Dataset: 0-16008901 | Loss: 2.176 | 676 ms/step , 58176.75 GFLOP/s , 531717.5 tokens/s INFO:__main__:2024-10-27 03:45:03 | Epoch: 1 | Step: 44680 | Dataset: 0-16016901 | Loss: 2.081 | 674 ms/step , 58285.30 GFLOP/s , 532448.7 tokens/s INFO:__main__:2024-10-27 03:45:11 | Epoch: 1 | Step: 44690 | Dataset: 0-16024901 | Loss: 2.057 | 674 ms/step , 58323.44 GFLOP/s , 533447.7 tokens/s INFO:__main__:2024-10-27 03:45:18 | Epoch: 1 | Step: 44700 | Dataset: 0-16032901 | Loss: 2.150 | 674 ms/step , 58345.73 GFLOP/s , 533053.2 tokens/s INFO:__main__:2024-10-27 
03:45:26 | Epoch: 1 | Step: 44710 | Dataset: 0-16040901 | Loss: 2.145 | 674 ms/step , 58364.66 GFLOP/s , 533308.8 tokens/s INFO:__main__:2024-10-27 03:45:34 | Epoch: 1 | Step: 44720 | Dataset: 0-16048901 | Loss: 2.206 | 675 ms/step , 58254.44 GFLOP/s , 533151.9 tokens/s INFO:__main__:2024-10-27 03:45:41 | Epoch: 1 | Step: 44730 | Dataset: 0-16056901 | Loss: 2.175 | 675 ms/step , 58233.73 GFLOP/s , 533260.6 tokens/s INFO:__main__:2024-10-27 03:45:49 | Epoch: 1 | Step: 44740 | Dataset: 0-16064901 | Loss: 2.189 | 675 ms/step , 58270.15 GFLOP/s , 533288.6 tokens/s INFO:__main__:2024-10-27 03:45:57 | Epoch: 1 | Step: 44750 | Dataset: 0-16072901 | Loss: 2.128 | 675 ms/step , 58242.12 GFLOP/s , 532998.5 tokens/s INFO:__main__:2024-10-27 03:46:04 | Epoch: 1 | Step: 44760 | Dataset: 0-16080901 | Loss: 2.159 | 674 ms/step , 58289.70 GFLOP/s , 533004.9 tokens/s INFO:__main__:2024-10-27 03:46:12 | Epoch: 1 | Step: 44770 | Dataset: 0-16088901 | Loss: 2.129 | 675 ms/step , 58229.86 GFLOP/s , 533005.9 tokens/s INFO:__main__:2024-10-27 03:46:20 | Epoch: 1 | Step: 44780 | Dataset: 0-16096901 | Loss: 2.205 | 675 ms/step , 58219.40 GFLOP/s , 533085.5 tokens/s INFO:__main__:2024-10-27 03:46:27 | Epoch: 1 | Step: 44790 | Dataset: 0-16104901 | Loss: 2.191 | 676 ms/step , 58150.92 GFLOP/s , 532271.5 tokens/s INFO:__main__:2024-10-27 03:46:35 | Epoch: 1 | Step: 44800 | Dataset: 0-16112901 | Loss: 2.123 | 675 ms/step , 58238.43 GFLOP/s , 532577.6 tokens/s INFO:__main__:2024-10-27 03:46:43 | Epoch: 1 | Step: 44810 | Dataset: 0-16120901 | Loss: 2.129 | 675 ms/step , 58258.52 GFLOP/s , 532397.4 tokens/s INFO:__main__:2024-10-27 03:46:50 | Epoch: 1 | Step: 44820 | Dataset: 0-16128901 | Loss: 2.149 | 674 ms/step , 58288.66 GFLOP/s , 533203.9 tokens/s INFO:__main__:2024-10-27 03:46:58 | Epoch: 1 | Step: 44830 | Dataset: 0-16136901 | Loss: 2.136 | 675 ms/step , 58261.01 GFLOP/s , 532924.1 tokens/s INFO:__main__:2024-10-27 03:47:06 | Epoch: 1 | Step: 44840 | Dataset: 0-16144901 | Loss: 2.077 | 674 
ms/step , 58306.67 GFLOP/s , 532975.3 tokens/s INFO:__main__:2024-10-27 03:47:14 | Epoch: 1 | Step: 44850 | Dataset: 0-16152901 | Loss: 2.168 | 675 ms/step , 58227.77 GFLOP/s , 532550.8 tokens/s INFO:__main__:2024-10-27 03:47:21 | Epoch: 1 | Step: 44860 | Dataset: 0-16160901 | Loss: 2.198 | 675 ms/step , 58269.70 GFLOP/s , 532126.6 tokens/s INFO:__main__:2024-10-27 03:47:29 | Epoch: 1 | Step: 44870 | Dataset: 0-16168901 | Loss: 2.180 | 675 ms/step , 58210.90 GFLOP/s , 532536.3 tokens/s INFO:__main__:2024-10-27 03:47:37 | Epoch: 1 | Step: 44880 | Dataset: 0-16176901 | Loss: 2.111 | 674 ms/step , 58308.88 GFLOP/s , 532904.6 tokens/s INFO:__main__:2024-10-27 03:47:44 | Epoch: 1 | Step: 44890 | Dataset: 0-16184901 | Loss: 2.186 | 676 ms/step , 58167.47 GFLOP/s , 532497.1 tokens/s INFO:__main__:2024-10-27 03:47:52 | Epoch: 1 | Step: 44900 | Dataset: 0-16192901 | Loss: 2.191 | 675 ms/step , 58200.89 GFLOP/s , 532134.8 tokens/s INFO:__main__:2024-10-27 03:48:00 | Epoch: 1 | Step: 44910 | Dataset: 0-16200901 | Loss: 2.165 | 675 ms/step , 58252.49 GFLOP/s , 531870.2 tokens/s INFO:__main__:2024-10-27 03:48:07 | Epoch: 1 | Step: 44920 | Dataset: 0-16208901 | Loss: 2.118 | 675 ms/step , 58276.01 GFLOP/s , 532465.4 tokens/s INFO:__main__:2024-10-27 03:48:15 | Epoch: 1 | Step: 44930 | Dataset: 0-16216901 | Loss: 2.116 | 675 ms/step , 58241.50 GFLOP/s , 533119.6 tokens/s INFO:__main__:2024-10-27 03:48:23 | Epoch: 1 | Step: 44940 | Dataset: 0-16224901 | Loss: 2.239 | 678 ms/step , 58002.92 GFLOP/s , 532307.4 tokens/s INFO:__main__:2024-10-27 03:48:30 | Epoch: 1 | Step: 44950 | Dataset: 0-16232901 | Loss: 2.180 | 675 ms/step , 58243.25 GFLOP/s , 532248.6 tokens/s INFO:__main__:2024-10-27 03:48:38 | Epoch: 1 | Step: 44960 | Dataset: 0-16240901 | Loss: 2.173 | 675 ms/step , 58203.83 GFLOP/s , 532362.1 tokens/s INFO:__main__:2024-10-27 03:48:46 | Epoch: 1 | Step: 44970 | Dataset: 0-16248901 | Loss: 2.152 | 675 ms/step , 58209.51 GFLOP/s , 532455.4 tokens/s INFO:__main__:2024-10-27 
03:48:54 | Epoch: 1 | Step: 44980 | Dataset: 0-16256901 | Loss: 2.152 | 676 ms/step , 58152.47 GFLOP/s , 532461.5 tokens/s INFO:__main__:2024-10-27 03:49:01 | Epoch: 1 | Step: 44990 | Dataset: 0-16264901 | Loss: 2.198 | 675 ms/step , 58233.05 GFLOP/s , 532194.0 tokens/s INFO:__main__:2024-10-27 03:49:08 | Validation | Step: 45000 | Val_loss: 2.185 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 03:49:08 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_034908_step_45000.pt` INFO:__main__:2024-10-27 03:49:10 | Epoch: 1 | Step: 45000 | Dataset: 0-16272901 | Loss: 2.159 | 674 ms/step , 58326.70 GFLOP/s , 478466.6 tokens/s INFO:__main__:2024-10-27 03:49:17 | Epoch: 1 | Step: 45010 | Dataset: 0-16280901 | Loss: 2.113 | 675 ms/step , 58266.61 GFLOP/s , 532043.0 tokens/s INFO:__main__:2024-10-27 03:49:25 | Epoch: 1 | Step: 45020 | Dataset: 0-16288901 | Loss: 2.069 | 676 ms/step , 58181.36 GFLOP/s , 531834.0 tokens/s INFO:__main__:2024-10-27 03:49:33 | Epoch: 1 | Step: 45030 | Dataset: 0-16296901 | Loss: 2.151 | 678 ms/step , 58011.61 GFLOP/s , 530939.6 tokens/s INFO:__main__:2024-10-27 03:49:41 | Epoch: 1 | Step: 45040 | Dataset: 0-16304901 | Loss: 2.192 | 674 ms/step , 58284.93 GFLOP/s , 531858.0 tokens/s INFO:__main__:2024-10-27 03:49:48 | Epoch: 1 | Step: 45050 | Dataset: 0-16312901 | Loss: 2.129 | 676 ms/step , 58169.61 GFLOP/s , 532142.4 tokens/s INFO:__main__:2024-10-27 03:49:56 | Epoch: 1 | Step: 45060 | Dataset: 0-16320901 | Loss: 2.106 | 675 ms/step , 58212.31 GFLOP/s , 532282.5 tokens/s INFO:__main__:2024-10-27 03:50:04 | Epoch: 1 | Step: 45070 | Dataset: 0-16328901 | Loss: 2.149 | 675 ms/step , 58224.61 GFLOP/s , 532132.4 tokens/s INFO:__main__:2024-10-27 03:50:11 | Epoch: 1 | Step: 45080 | Dataset: 0-16336901 | Loss: 2.201 | 675 ms/step , 58234.01 GFLOP/s , 532036.2 tokens/s INFO:__main__:2024-10-27 03:50:19 | Epoch: 1 | Step: 45090 | Dataset: 0-16344901 | Loss: 1.848 | 674 ms/step , 58320.43 GFLOP/s , 531665.5 tokens/s 
INFO:__main__:2024-10-27 03:50:27 | Epoch: 1 | Step: 45100 | Dataset: 0-16352901 | Loss: 1.741 | 673 ms/step , 58372.74 GFLOP/s , 532275.8 tokens/s INFO:__main__:2024-10-27 03:50:34 | Epoch: 1 | Step: 45110 | Dataset: 0-16360901 | Loss: 1.714 | 677 ms/step , 58088.22 GFLOP/s , 532067.2 tokens/s INFO:__main__:2024-10-27 03:50:42 | Epoch: 1 | Step: 45120 | Dataset: 0-16368901 | Loss: 1.704 | 677 ms/step , 58088.43 GFLOP/s , 530591.0 tokens/s INFO:__main__:2024-10-27 03:50:50 | Epoch: 1 | Step: 45130 | Dataset: 0-16376901 | Loss: 1.684 | 675 ms/step , 58207.88 GFLOP/s , 531863.8 tokens/s INFO:__main__:2024-10-27 03:50:58 | Epoch: 1 | Step: 45140 | Dataset: 0-16384901 | Loss: 1.692 | 677 ms/step , 58094.67 GFLOP/s , 531414.5 tokens/s INFO:__main__:2024-10-27 03:51:05 | Epoch: 1 | Step: 45150 | Dataset: 0-16392901 | Loss: 1.684 | 675 ms/step , 58228.68 GFLOP/s , 532187.5 tokens/s INFO:__main__:2024-10-27 03:51:13 | Epoch: 1 | Step: 45160 | Dataset: 0-16400901 | Loss: 1.657 | 674 ms/step , 58336.85 GFLOP/s , 532934.8 tokens/s INFO:__main__:2024-10-27 03:51:21 | Epoch: 1 | Step: 45170 | Dataset: 0-16408901 | Loss: 1.690 | 674 ms/step , 58289.43 GFLOP/s , 532429.2 tokens/s INFO:__main__:2024-10-27 03:51:28 | Epoch: 1 | Step: 45180 | Dataset: 0-16416901 | Loss: 2.201 | 674 ms/step , 58341.24 GFLOP/s , 532437.1 tokens/s INFO:__main__:2024-10-27 03:51:36 | Epoch: 1 | Step: 45190 | Dataset: 0-16424901 | Loss: 2.160 | 675 ms/step , 58193.61 GFLOP/s , 532609.2 tokens/s INFO:__main__:2024-10-27 03:51:44 | Epoch: 1 | Step: 45200 | Dataset: 0-16432901 | Loss: 2.124 | 674 ms/step , 58310.13 GFLOP/s , 532801.8 tokens/s INFO:__main__:2024-10-27 03:51:51 | Epoch: 1 | Step: 45210 | Dataset: 0-16440901 | Loss: 2.192 | 675 ms/step , 58250.38 GFLOP/s , 532719.7 tokens/s INFO:__main__:2024-10-27 03:51:59 | Epoch: 1 | Step: 45220 | Dataset: 0-16448901 | Loss: 2.148 | 674 ms/step , 58287.49 GFLOP/s , 532702.9 tokens/s INFO:__main__:2024-10-27 03:52:07 | Epoch: 1 | Step: 45230 | Dataset: 
0-16456901 | Loss: 2.221 | 674 ms/step , 58304.56 GFLOP/s , 532162.1 tokens/s INFO:__main__:2024-10-27 03:52:15 | Epoch: 1 | Step: 45240 | Dataset: 0-16464901 | Loss: 2.161 | 674 ms/step , 58283.15 GFLOP/s , 532533.3 tokens/s INFO:__main__:2024-10-27 03:52:22 | Epoch: 1 | Step: 45250 | Dataset: 0-16472901 | Loss: 2.167 | 675 ms/step , 58243.19 GFLOP/s , 532752.1 tokens/s INFO:__main__:2024-10-27 03:52:30 | Epoch: 1 | Step: 45260 | Dataset: 0-16480901 | Loss: 2.128 | 674 ms/step , 58307.34 GFLOP/s , 532317.7 tokens/s INFO:__main__:2024-10-27 03:52:38 | Epoch: 1 | Step: 45270 | Dataset: 0-16488901 | Loss: 2.102 | 674 ms/step , 58288.58 GFLOP/s , 532638.4 tokens/s INFO:__main__:2024-10-27 03:52:45 | Epoch: 1 | Step: 45280 | Dataset: 0-16496901 | Loss: 2.099 | 675 ms/step , 58215.64 GFLOP/s , 532481.2 tokens/s INFO:__main__:2024-10-27 03:52:53 | Epoch: 1 | Step: 45290 | Dataset: 0-16504901 | Loss: 2.085 | 675 ms/step , 58262.16 GFLOP/s , 532402.7 tokens/s INFO:__main__:2024-10-27 03:53:01 | Epoch: 1 | Step: 45300 | Dataset: 0-16512901 | Loss: 2.151 | 676 ms/step , 58158.33 GFLOP/s , 532469.8 tokens/s INFO:__main__:2024-10-27 03:53:08 | Epoch: 1 | Step: 45310 | Dataset: 0-16520901 | Loss: 2.108 | 675 ms/step , 58209.01 GFLOP/s , 532523.4 tokens/s INFO:__main__:2024-10-27 03:53:16 | Epoch: 1 | Step: 45320 | Dataset: 0-16528901 | Loss: 2.205 | 674 ms/step , 58310.29 GFLOP/s , 532392.0 tokens/s INFO:__main__:2024-10-27 03:53:24 | Epoch: 1 | Step: 45330 | Dataset: 0-16536901 | Loss: 2.139 | 675 ms/step , 58208.06 GFLOP/s , 532484.4 tokens/s INFO:__main__:2024-10-27 03:53:31 | Epoch: 1 | Step: 45340 | Dataset: 0-16544901 | Loss: 2.136 | 674 ms/step , 58365.25 GFLOP/s , 532893.7 tokens/s INFO:__main__:2024-10-27 03:53:39 | Epoch: 1 | Step: 45350 | Dataset: 0-16552901 | Loss: 2.162 | 674 ms/step , 58342.29 GFLOP/s , 533058.8 tokens/s INFO:__main__:2024-10-27 03:53:47 | Epoch: 1 | Step: 45360 | Dataset: 0-16560901 | Loss: 2.112 | 674 ms/step , 58319.04 GFLOP/s , 532902.5 
tokens/s INFO:__main__:2024-10-27 03:53:55 | Epoch: 1 | Step: 45370 | Dataset: 0-16568901 | Loss: 2.160 | 675 ms/step , 58265.79 GFLOP/s , 532308.1 tokens/s INFO:__main__:2024-10-27 03:54:02 | Epoch: 1 | Step: 45380 | Dataset: 0-16576901 | Loss: 2.081 | 674 ms/step , 58308.13 GFLOP/s , 532913.3 tokens/s INFO:__main__:2024-10-27 03:54:10 | Epoch: 1 | Step: 45390 | Dataset: 0-16584901 | Loss: 2.069 | 674 ms/step , 58286.01 GFLOP/s , 532304.8 tokens/s INFO:__main__:2024-10-27 03:54:18 | Epoch: 1 | Step: 45400 | Dataset: 0-16592901 | Loss: 2.153 | 674 ms/step , 58300.96 GFLOP/s , 532442.6 tokens/s INFO:__main__:2024-10-27 03:54:25 | Epoch: 1 | Step: 45410 | Dataset: 0-16600901 | Loss: 2.172 | 674 ms/step , 58292.37 GFLOP/s , 532471.7 tokens/s INFO:__main__:2024-10-27 03:54:33 | Epoch: 1 | Step: 45420 | Dataset: 0-16608901 | Loss: 2.097 | 675 ms/step , 58254.55 GFLOP/s , 532387.0 tokens/s INFO:__main__:2024-10-27 03:54:41 | Epoch: 1 | Step: 45430 | Dataset: 0-16616901 | Loss: 2.164 | 674 ms/step , 58337.22 GFLOP/s , 532550.7 tokens/s INFO:__main__:2024-10-27 03:54:48 | Epoch: 1 | Step: 45440 | Dataset: 0-16624901 | Loss: 2.182 | 674 ms/step , 58282.72 GFLOP/s , 533161.3 tokens/s INFO:__main__:2024-10-27 03:54:56 | Epoch: 1 | Step: 45450 | Dataset: 0-16632901 | Loss: 2.111 | 675 ms/step , 58228.72 GFLOP/s , 532383.6 tokens/s INFO:__main__:2024-10-27 03:55:04 | Epoch: 1 | Step: 45460 | Dataset: 0-16640901 | Loss: 2.233 | 674 ms/step , 58296.60 GFLOP/s , 532475.8 tokens/s INFO:__main__:2024-10-27 03:55:11 | Epoch: 1 | Step: 45470 | Dataset: 0-16648901 | Loss: 2.231 | 675 ms/step , 58227.06 GFLOP/s , 532379.3 tokens/s INFO:__main__:2024-10-27 03:55:19 | Epoch: 1 | Step: 45480 | Dataset: 0-16656901 | Loss: 2.033 | 675 ms/step , 58262.99 GFLOP/s , 532601.8 tokens/s INFO:__main__:2024-10-27 03:55:27 | Epoch: 1 | Step: 45490 | Dataset: 0-16664901 | Loss: 2.118 | 675 ms/step , 58259.56 GFLOP/s , 532760.6 tokens/s INFO:__main__:2024-10-27 03:55:35 | Epoch: 1 | Step: 45500 | 
Dataset: 0-16672901 | Loss: 1.769 | 675 ms/step , 58279.15 GFLOP/s , 532302.5 tokens/s INFO:__main__:2024-10-27 03:55:42 | Epoch: 1 | Step: 45510 | Dataset: 0-16680901 | Loss: 1.762 | 675 ms/step , 58271.63 GFLOP/s , 532223.6 tokens/s INFO:__main__:2024-10-27 03:55:50 | Epoch: 1 | Step: 45520 | Dataset: 0-16688901 | Loss: 1.712 | 675 ms/step , 58241.79 GFLOP/s , 532026.8 tokens/s INFO:__main__:2024-10-27 03:55:58 | Epoch: 1 | Step: 45530 | Dataset: 0-16696901 | Loss: 1.730 | 675 ms/step , 58248.55 GFLOP/s , 532155.6 tokens/s INFO:__main__:2024-10-27 03:56:05 | Epoch: 1 | Step: 45540 | Dataset: 0-16704901 | Loss: 1.721 | 676 ms/step , 58144.25 GFLOP/s , 531129.6 tokens/s INFO:__main__:2024-10-27 03:56:13 | Epoch: 1 | Step: 45550 | Dataset: 0-16712901 | Loss: 1.701 | 676 ms/step , 58169.80 GFLOP/s , 530964.8 tokens/s INFO:__main__:2024-10-27 03:56:21 | Epoch: 1 | Step: 45560 | Dataset: 0-16720901 | Loss: 1.709 | 676 ms/step , 58176.49 GFLOP/s , 530374.7 tokens/s INFO:__main__:2024-10-27 03:56:28 | Epoch: 1 | Step: 45570 | Dataset: 0-16728901 | Loss: 1.669 | 674 ms/step , 58279.26 GFLOP/s , 531186.8 tokens/s INFO:__main__:2024-10-27 03:56:36 | Epoch: 1 | Step: 45580 | Dataset: 0-16736901 | Loss: 2.408 | 675 ms/step , 58266.17 GFLOP/s , 531886.2 tokens/s INFO:__main__:2024-10-27 03:56:44 | Epoch: 1 | Step: 45590 | Dataset: 0-16744901 | Loss: 2.168 | 676 ms/step , 58162.37 GFLOP/s , 531623.7 tokens/s INFO:__main__:2024-10-27 03:56:52 | Epoch: 1 | Step: 45600 | Dataset: 0-16752901 | Loss: 2.179 | 675 ms/step , 58211.78 GFLOP/s , 531491.7 tokens/s INFO:__main__:2024-10-27 03:56:59 | Epoch: 1 | Step: 45610 | Dataset: 0-16760901 | Loss: 2.200 | 674 ms/step , 58281.54 GFLOP/s , 531692.9 tokens/s INFO:__main__:2024-10-27 03:57:07 | Epoch: 1 | Step: 45620 | Dataset: 0-16768901 | Loss: 2.117 | 676 ms/step , 58163.82 GFLOP/s , 531992.0 tokens/s INFO:__main__:2024-10-27 03:57:15 | Epoch: 1 | Step: 45630 | Dataset: 0-16776901 | Loss: 2.149 | 677 ms/step , 58071.26 GFLOP/s , 
530719.5 tokens/s INFO:__main__:2024-10-27 03:57:22 | Epoch: 1 | Step: 45640 | Dataset: 0-16784901 | Loss: 2.204 | 676 ms/step , 58189.66 GFLOP/s , 530712.2 tokens/s INFO:__main__:2024-10-27 03:57:30 | Epoch: 1 | Step: 45650 | Dataset: 0-16792901 | Loss: 2.097 | 675 ms/step , 58194.44 GFLOP/s , 531740.0 tokens/s INFO:__main__:2024-10-27 03:57:38 | Epoch: 1 | Step: 45660 | Dataset: 0-16800901 | Loss: 2.079 | 676 ms/step , 58175.29 GFLOP/s , 531871.8 tokens/s INFO:__main__:2024-10-27 03:57:46 | Epoch: 1 | Step: 45670 | Dataset: 0-16808901 | Loss: 2.162 | 676 ms/step , 58177.96 GFLOP/s , 532376.8 tokens/s INFO:__main__:2024-10-27 03:57:53 | Epoch: 1 | Step: 45680 | Dataset: 0-16816901 | Loss: 2.146 | 675 ms/step , 58202.04 GFLOP/s , 532225.0 tokens/s INFO:__main__:2024-10-27 03:58:01 | Epoch: 1 | Step: 45690 | Dataset: 0-16824901 | Loss: 2.123 | 675 ms/step , 58244.52 GFLOP/s , 532287.1 tokens/s INFO:__main__:2024-10-27 03:58:09 | Epoch: 1 | Step: 45700 | Dataset: 0-16832901 | Loss: 2.077 | 675 ms/step , 58247.54 GFLOP/s , 532063.0 tokens/s INFO:__main__:2024-10-27 03:58:16 | Epoch: 1 | Step: 45710 | Dataset: 0-16840901 | Loss: 2.103 | 675 ms/step , 58232.31 GFLOP/s , 532328.5 tokens/s INFO:__main__:2024-10-27 03:58:24 | Epoch: 1 | Step: 45720 | Dataset: 0-16848901 | Loss: 2.125 | 674 ms/step , 58357.03 GFLOP/s , 532340.8 tokens/s INFO:__main__:2024-10-27 03:58:32 | Epoch: 1 | Step: 45730 | Dataset: 0-16856901 | Loss: 2.166 | 674 ms/step , 58286.69 GFLOP/s , 533036.6 tokens/s INFO:__main__:2024-10-27 03:58:39 | Epoch: 1 | Step: 45740 | Dataset: 0-16864901 | Loss: 2.037 | 675 ms/step , 58240.08 GFLOP/s , 532188.0 tokens/s INFO:__main__:2024-10-27 03:58:47 | Epoch: 1 | Step: 45750 | Dataset: 0-16872901 | Loss: 1.733 | 674 ms/step , 58296.25 GFLOP/s , 532645.4 tokens/s INFO:__main__:2024-10-27 03:58:55 | Epoch: 1 | Step: 45760 | Dataset: 0-16880901 | Loss: 1.695 | 676 ms/step , 58157.17 GFLOP/s , 531996.0 tokens/s INFO:__main__:2024-10-27 03:59:02 | Epoch: 1 | Step: 
45770 | Dataset: 0-16888901 | Loss: 1.698 | 675 ms/step , 58274.06 GFLOP/s , 532029.8 tokens/s INFO:__main__:2024-10-27 03:59:10 | Epoch: 1 | Step: 45780 | Dataset: 0-16896901 | Loss: 1.690 | 674 ms/step , 58299.62 GFLOP/s , 531302.8 tokens/s INFO:__main__:2024-10-27 03:59:18 | Epoch: 1 | Step: 45790 | Dataset: 0-16904901 | Loss: 1.664 | 676 ms/step , 58107.16 GFLOP/s , 530889.5 tokens/s INFO:__main__:2024-10-27 03:59:26 | Epoch: 1 | Step: 45800 | Dataset: 0-16912901 | Loss: 1.642 | 677 ms/step , 58105.98 GFLOP/s , 531198.3 tokens/s INFO:__main__:2024-10-27 03:59:33 | Epoch: 1 | Step: 45810 | Dataset: 0-16920901 | Loss: 1.663 | 676 ms/step , 58174.43 GFLOP/s , 530955.2 tokens/s INFO:__main__:2024-10-27 03:59:41 | Epoch: 1 | Step: 45820 | Dataset: 0-16928901 | Loss: 1.626 | 675 ms/step , 58230.53 GFLOP/s , 531107.6 tokens/s INFO:__main__:2024-10-27 03:59:49 | Epoch: 1 | Step: 45830 | Dataset: 0-16936901 | Loss: 2.320 | 676 ms/step , 58173.00 GFLOP/s , 531108.6 tokens/s INFO:__main__:2024-10-27 03:59:56 | Epoch: 1 | Step: 45840 | Dataset: 0-16944901 | Loss: 2.309 | 676 ms/step , 58173.64 GFLOP/s , 531217.9 tokens/s INFO:__main__:2024-10-27 04:00:04 | Epoch: 1 | Step: 45850 | Dataset: 0-16952901 | Loss: 2.263 | 675 ms/step , 58272.68 GFLOP/s , 547775.1 tokens/s INFO:__main__:2024-10-27 04:00:12 | Epoch: 1 | Step: 45860 | Dataset: 0-16960901 | Loss: 2.220 | 675 ms/step , 58214.06 GFLOP/s , 531485.3 tokens/s INFO:__main__:2024-10-27 04:00:19 | Epoch: 1 | Step: 45870 | Dataset: 0-16968901 | Loss: 2.246 | 677 ms/step , 58105.93 GFLOP/s , 531736.7 tokens/s INFO:__main__:2024-10-27 04:00:27 | Epoch: 1 | Step: 45880 | Dataset: 0-16976901 | Loss: 2.280 | 676 ms/step , 58176.36 GFLOP/s , 531386.9 tokens/s INFO:__main__:2024-10-27 04:00:35 | Epoch: 1 | Step: 45890 | Dataset: 0-16984901 | Loss: 2.202 | 676 ms/step , 58133.39 GFLOP/s , 531744.6 tokens/s INFO:__main__:2024-10-27 04:00:42 | Epoch: 1 | Step: 45900 | Dataset: 0-16992901 | Loss: 2.254 | 675 ms/step , 58208.92 GFLOP/s 
, 531504.6 tokens/s INFO:__main__:2024-10-27 04:00:50 | Epoch: 1 | Step: 45910 | Dataset: 0-17000901 | Loss: 2.212 | 676 ms/step , 58182.68 GFLOP/s , 531796.9 tokens/s INFO:__main__:2024-10-27 04:00:58 | Epoch: 1 | Step: 45920 | Dataset: 0-17008901 | Loss: 2.217 | 676 ms/step , 58109.60 GFLOP/s , 530875.2 tokens/s INFO:__main__:2024-10-27 04:01:06 | Epoch: 1 | Step: 45930 | Dataset: 0-17016901 | Loss: 2.198 | 676 ms/step , 58153.50 GFLOP/s , 531413.4 tokens/s INFO:__main__:2024-10-27 04:01:13 | Epoch: 1 | Step: 45940 | Dataset: 0-17024901 | Loss: 2.243 | 678 ms/step , 57983.42 GFLOP/s , 531120.0 tokens/s INFO:__main__:2024-10-27 04:01:21 | Epoch: 1 | Step: 45950 | Dataset: 0-17032901 | Loss: 2.173 | 678 ms/step , 57980.47 GFLOP/s , 530022.4 tokens/s INFO:__main__:2024-10-27 04:01:29 | Epoch: 1 | Step: 45960 | Dataset: 0-17040901 | Loss: 2.187 | 677 ms/step , 58047.55 GFLOP/s , 529915.4 tokens/s INFO:__main__:2024-10-27 04:01:36 | Epoch: 1 | Step: 45970 | Dataset: 0-17048901 | Loss: 2.152 | 678 ms/step , 58000.34 GFLOP/s , 530285.2 tokens/s INFO:__main__:2024-10-27 04:01:44 | Epoch: 1 | Step: 45980 | Dataset: 0-17056901 | Loss: 2.207 | 677 ms/step , 58031.26 GFLOP/s , 530049.6 tokens/s INFO:__main__:2024-10-27 04:01:52 | Epoch: 1 | Step: 45990 | Dataset: 0-17064901 | Loss: 2.261 | 674 ms/step , 58310.43 GFLOP/s , 532438.5 tokens/s INFO:__main__:2024-10-27 04:01:59 | Validation | Step: 46000 | Val_loss: 2.001 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 04:01:59 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_040159_step_46000.pt` INFO:__main__:2024-10-27 04:02:00 | Epoch: 1 | Step: 46000 | Dataset: 0-17072901 | Loss: 2.247 | 673 ms/step , 58407.11 GFLOP/s , 479286.2 tokens/s INFO:__main__:2024-10-27 04:02:08 | Epoch: 1 | Step: 46010 | Dataset: 0-17080901 | Loss: 2.274 | 674 ms/step , 58294.80 GFLOP/s , 532613.0 tokens/s INFO:__main__:2024-10-27 04:02:16 | Epoch: 1 | Step: 46020 | Dataset: 0-17088901 | Loss: 2.234 | 674 ms/step , 58341.91 
GFLOP/s , 532863.4 tokens/s INFO:__main__:2024-10-27 04:02:23 | Epoch: 1 | Step: 46030 | Dataset: 0-17096901 | Loss: 2.255 | 673 ms/step , 58426.13 GFLOP/s , 532958.2 tokens/s INFO:__main__:2024-10-27 04:02:31 | Epoch: 1 | Step: 46040 | Dataset: 0-17104901 | Loss: 2.187 | 674 ms/step , 58351.01 GFLOP/s , 533322.5 tokens/s INFO:__main__:2024-10-27 04:02:39 | Epoch: 1 | Step: 46050 | Dataset: 0-17112901 | Loss: 2.154 | 676 ms/step , 58176.86 GFLOP/s , 532822.7 tokens/s INFO:__main__:2024-10-27 04:02:47 | Epoch: 1 | Step: 46060 | Dataset: 0-17120901 | Loss: 2.208 | 676 ms/step , 58178.64 GFLOP/s , 532502.7 tokens/s INFO:__main__:2024-10-27 04:02:54 | Epoch: 1 | Step: 46070 | Dataset: 0-17128901 | Loss: 2.182 | 674 ms/step , 58312.62 GFLOP/s , 532787.6 tokens/s INFO:__main__:2024-10-27 04:03:02 | Epoch: 1 | Step: 46080 | Dataset: 0-17136901 | Loss: 2.239 | 675 ms/step , 58278.54 GFLOP/s , 532429.5 tokens/s INFO:__main__:2024-10-27 04:03:10 | Epoch: 1 | Step: 46090 | Dataset: 0-17144901 | Loss: 2.254 | 674 ms/step , 58339.92 GFLOP/s , 532727.7 tokens/s INFO:__main__:2024-10-27 04:03:17 | Epoch: 1 | Step: 46100 | Dataset: 0-17152901 | Loss: 2.157 | 674 ms/step , 58316.69 GFLOP/s , 533150.4 tokens/s INFO:__main__:2024-10-27 04:03:25 | Epoch: 1 | Step: 46110 | Dataset: 0-17160901 | Loss: 2.130 | 673 ms/step , 58367.79 GFLOP/s , 533159.0 tokens/s INFO:__main__:2024-10-27 04:03:33 | Epoch: 1 | Step: 46120 | Dataset: 0-17168901 | Loss: 2.194 | 675 ms/step , 58259.67 GFLOP/s , 533341.5 tokens/s INFO:__main__:2024-10-27 04:03:40 | Epoch: 1 | Step: 46130 | Dataset: 0-17176901 | Loss: 2.182 | 674 ms/step , 58338.31 GFLOP/s , 533165.7 tokens/s INFO:__main__:2024-10-27 04:03:48 | Epoch: 1 | Step: 46140 | Dataset: 0-17184901 | Loss: 2.219 | 674 ms/step , 58283.27 GFLOP/s , 532981.4 tokens/s INFO:__main__:2024-10-27 04:03:56 | Epoch: 1 | Step: 46150 | Dataset: 0-17192901 | Loss: 1.838 | 675 ms/step , 58264.87 GFLOP/s , 532711.7 tokens/s INFO:__main__:2024-10-27 04:04:03 | Epoch: 1 | 
Step: 46160 | Dataset: 0-17200901 | Loss: 1.742 | 675 ms/step , 58218.34 GFLOP/s , 531982.1 tokens/s INFO:__main__:2024-10-27 04:04:11 | Epoch: 1 | Step: 46170 | Dataset: 0-17208901 | Loss: 1.695 | 675 ms/step , 58264.19 GFLOP/s , 531932.9 tokens/s INFO:__main__:2024-10-27 04:04:19 | Epoch: 1 | Step: 46180 | Dataset: 0-17216901 | Loss: 1.695 | 674 ms/step , 58334.86 GFLOP/s , 532039.9 tokens/s INFO:__main__:2024-10-27 04:04:27 | Epoch: 1 | Step: 46190 | Dataset: 0-17224901 | Loss: 1.676 | 674 ms/step , 58308.23 GFLOP/s , 532101.3 tokens/s INFO:__main__:2024-10-27 04:04:34 | Epoch: 1 | Step: 46200 | Dataset: 0-17232901 | Loss: 1.665 | 675 ms/step , 58222.40 GFLOP/s , 532231.0 tokens/s INFO:__main__:2024-10-27 04:04:42 | Epoch: 1 | Step: 46210 | Dataset: 0-17240901 | Loss: 1.674 | 674 ms/step , 58324.18 GFLOP/s , 531888.0 tokens/s INFO:__main__:2024-10-27 04:04:50 | Epoch: 1 | Step: 46220 | Dataset: 0-17248901 | Loss: 1.638 | 675 ms/step , 58274.90 GFLOP/s , 532014.0 tokens/s INFO:__main__:2024-10-27 04:04:57 | Epoch: 1 | Step: 46230 | Dataset: 0-17256901 | Loss: 1.668 | 674 ms/step , 58286.67 GFLOP/s , 531971.5 tokens/s INFO:__main__:2024-10-27 04:05:05 | Epoch: 1 | Step: 46240 | Dataset: 0-17264901 | Loss: 2.305 | 675 ms/step , 58219.83 GFLOP/s , 531944.7 tokens/s INFO:__main__:2024-10-27 04:05:13 | Epoch: 1 | Step: 46250 | Dataset: 0-17272901 | Loss: 2.215 | 676 ms/step , 58191.80 GFLOP/s , 531808.2 tokens/s INFO:__main__:2024-10-27 04:05:20 | Epoch: 1 | Step: 46260 | Dataset: 0-17280901 | Loss: 2.200 | 676 ms/step , 58145.92 GFLOP/s , 531969.8 tokens/s INFO:__main__:2024-10-27 04:05:28 | Epoch: 1 | Step: 46270 | Dataset: 0-17288901 | Loss: 2.252 | 676 ms/step , 58170.19 GFLOP/s , 531848.1 tokens/s INFO:__main__:2024-10-27 04:05:36 | Epoch: 1 | Step: 46280 | Dataset: 0-17296901 | Loss: 2.198 | 676 ms/step , 58191.92 GFLOP/s , 534612.5 tokens/s INFO:__main__:2024-10-27 04:05:43 | Epoch: 1 | Step: 46290 | Dataset: 0-17304901 | Loss: 2.199 | 676 ms/step , 58118.67 
GFLOP/s , 532089.4 tokens/s INFO:__main__:2024-10-27 04:05:51 | Epoch: 1 | Step: 46300 | Dataset: 0-17312901 | Loss: 2.203 | 676 ms/step , 58130.64 GFLOP/s , 532020.5 tokens/s INFO:__main__:2024-10-27 04:05:59 | Epoch: 1 | Step: 46310 | Dataset: 0-17320901 | Loss: 2.193 | 676 ms/step , 58175.61 GFLOP/s , 532029.0 tokens/s INFO:__main__:2024-10-27 04:06:07 | Epoch: 1 | Step: 46320 | Dataset: 0-17328901 | Loss: 2.213 | 674 ms/step , 58338.64 GFLOP/s , 532687.4 tokens/s INFO:__main__:2024-10-27 04:06:14 | Epoch: 1 | Step: 46330 | Dataset: 0-17336901 | Loss: 2.179 | 675 ms/step , 58218.42 GFLOP/s , 532198.7 tokens/s INFO:__main__:2024-10-27 04:06:22 | Epoch: 1 | Step: 46340 | Dataset: 0-17344901 | Loss: 2.255 | 676 ms/step , 58145.99 GFLOP/s , 531746.7 tokens/s INFO:__main__:2024-10-27 04:06:30 | Epoch: 1 | Step: 46350 | Dataset: 0-17352901 | Loss: 2.216 | 676 ms/step , 58171.99 GFLOP/s , 531948.6 tokens/s INFO:__main__:2024-10-27 04:06:37 | Epoch: 1 | Step: 46360 | Dataset: 0-17360901 | Loss: 2.269 | 675 ms/step , 58196.93 GFLOP/s , 531971.7 tokens/s INFO:__main__:2024-10-27 04:06:45 | Epoch: 1 | Step: 46370 | Dataset: 0-17368901 | Loss: 2.101 | 675 ms/step , 58230.25 GFLOP/s , 532188.7 tokens/s INFO:__main__:2024-10-27 04:06:53 | Epoch: 1 | Step: 46380 | Dataset: 0-17376901 | Loss: 2.130 | 675 ms/step , 58262.83 GFLOP/s , 532207.5 tokens/s INFO:__main__:2024-10-27 04:07:00 | Epoch: 1 | Step: 46390 | Dataset: 0-17384901 | Loss: 2.182 | 676 ms/step , 58109.68 GFLOP/s , 531760.7 tokens/s INFO:__main__:2024-10-27 04:07:08 | Epoch: 1 | Step: 46400 | Dataset: 0-17392901 | Loss: 2.422 | 676 ms/step , 58146.11 GFLOP/s , 532159.1 tokens/s INFO:__main__:2024-10-27 04:07:16 | Epoch: 1 | Step: 46410 | Dataset: 0-17400901 | Loss: 2.296 | 676 ms/step , 58137.21 GFLOP/s , 531899.1 tokens/s INFO:__main__:2024-10-27 04:07:24 | Epoch: 1 | Step: 46420 | Dataset: 0-17408901 | Loss: 2.208 | 675 ms/step , 58252.83 GFLOP/s , 532191.5 tokens/s INFO:__main__:2024-10-27 04:07:31 | Epoch: 1 | 
Step: 46430 | Dataset: 0-17416901 | Loss: 2.210 | 674 ms/step , 58303.49 GFLOP/s , 532504.2 tokens/s INFO:__main__:2024-10-27 04:07:39 | Epoch: 1 | Step: 46440 | Dataset: 0-17424901 | Loss: 2.199 | 675 ms/step , 58267.82 GFLOP/s , 532828.2 tokens/s INFO:__main__:2024-10-27 04:07:47 | Epoch: 1 | Step: 46450 | Dataset: 0-17432901 | Loss: 2.190 | 674 ms/step , 58320.23 GFLOP/s , 532721.1 tokens/s INFO:__main__:2024-10-27 04:07:54 | Epoch: 1 | Step: 46460 | Dataset: 0-17440901 | Loss: 2.216 | 675 ms/step , 58212.19 GFLOP/s , 532545.8 tokens/s INFO:__main__:2024-10-27 04:08:02 | Epoch: 1 | Step: 46470 | Dataset: 0-17448901 | Loss: 2.117 | 674 ms/step , 58362.17 GFLOP/s , 532518.5 tokens/s INFO:__main__:2024-10-27 04:08:10 | Epoch: 1 | Step: 46480 | Dataset: 0-17456901 | Loss: 2.136 | 675 ms/step , 58220.93 GFLOP/s , 532781.4 tokens/s INFO:__main__:2024-10-27 04:08:17 | Epoch: 1 | Step: 46490 | Dataset: 0-17464901 | Loss: 2.134 | 675 ms/step , 58222.18 GFLOP/s , 531296.2 tokens/s INFO:__main__:2024-10-27 04:08:25 | Epoch: 1 | Step: 46500 | Dataset: 0-17472901 | Loss: 2.147 | 676 ms/step , 58141.92 GFLOP/s , 532021.0 tokens/s INFO:__main__:2024-10-27 04:08:33 | Epoch: 1 | Step: 46510 | Dataset: 0-17480901 | Loss: 2.093 | 675 ms/step , 58196.14 GFLOP/s , 531884.1 tokens/s INFO:__main__:2024-10-27 04:08:40 | Epoch: 1 | Step: 46520 | Dataset: 0-17488901 | Loss: 2.123 | 676 ms/step , 58152.58 GFLOP/s , 532122.5 tokens/s INFO:__main__:2024-10-27 04:08:48 | Epoch: 1 | Step: 46530 | Dataset: 0-17496901 | Loss: 2.090 | 675 ms/step , 58234.02 GFLOP/s , 531891.0 tokens/s INFO:__main__:2024-10-27 04:08:56 | Epoch: 1 | Step: 46540 | Dataset: 0-17504901 | Loss: 2.150 | 674 ms/step , 58307.18 GFLOP/s , 531862.2 tokens/s INFO:__main__:2024-10-27 04:09:04 | Epoch: 1 | Step: 46550 | Dataset: 0-17512901 | Loss: 2.051 | 674 ms/step , 58327.69 GFLOP/s , 532362.4 tokens/s INFO:__main__:2024-10-27 04:09:11 | Epoch: 1 | Step: 46560 | Dataset: 0-17520901 | Loss: 2.240 | 674 ms/step , 58292.27 
GFLOP/s , 531737.7 tokens/s INFO:__main__:2024-10-27 04:09:19 | Epoch: 1 | Step: 46570 | Dataset: 0-17528901 | Loss: 2.263 | 676 ms/step , 58173.31 GFLOP/s , 531156.3 tokens/s INFO:__main__:2024-10-27 04:09:27 | Epoch: 1 | Step: 46580 | Dataset: 0-17536901 | Loss: 2.268 | 675 ms/step , 58201.95 GFLOP/s , 530898.1 tokens/s INFO:__main__:2024-10-27 04:09:34 | Epoch: 1 | Step: 46590 | Dataset: 0-17544901 | Loss: 2.213 | 675 ms/step , 58216.58 GFLOP/s , 532622.2 tokens/s INFO:__main__:2024-10-27 04:09:42 | Epoch: 1 | Step: 46600 | Dataset: 0-17552901 | Loss: 2.182 | 674 ms/step , 58290.91 GFLOP/s , 532880.8 tokens/s INFO:__main__:2024-10-27 04:09:50 | Epoch: 1 | Step: 46610 | Dataset: 0-17560901 | Loss: 2.254 | 677 ms/step , 58071.49 GFLOP/s , 533069.9 tokens/s INFO:__main__:2024-10-27 04:09:57 | Epoch: 1 | Step: 46620 | Dataset: 0-17568901 | Loss: 2.155 | 674 ms/step , 58300.76 GFLOP/s , 532436.1 tokens/s INFO:__main__:2024-10-27 04:10:05 | Epoch: 1 | Step: 46630 | Dataset: 0-17576901 | Loss: 2.134 | 675 ms/step , 58249.45 GFLOP/s , 532888.9 tokens/s INFO:__main__:2024-10-27 04:10:13 | Epoch: 1 | Step: 46640 | Dataset: 0-17584901 | Loss: 2.157 | 675 ms/step , 58256.93 GFLOP/s , 532702.5 tokens/s INFO:__main__:2024-10-27 04:10:21 | Epoch: 1 | Step: 46650 | Dataset: 0-17592901 | Loss: 2.106 | 678 ms/step , 57961.44 GFLOP/s , 532507.7 tokens/s INFO:__main__:2024-10-27 04:10:28 | Epoch: 1 | Step: 46660 | Dataset: 0-17600901 | Loss: 2.198 | 675 ms/step , 58266.99 GFLOP/s , 531974.6 tokens/s INFO:__main__:2024-10-27 04:10:36 | Epoch: 1 | Step: 46670 | Dataset: 0-17608901 | Loss: 2.042 | 675 ms/step , 58230.03 GFLOP/s , 532495.6 tokens/s INFO:__main__:2024-10-27 04:10:44 | Epoch: 1 | Step: 46680 | Dataset: 0-17616901 | Loss: 2.191 | 674 ms/step , 58284.24 GFLOP/s , 532683.1 tokens/s INFO:__main__:2024-10-27 04:10:51 | Epoch: 1 | Step: 46690 | Dataset: 0-17624901 | Loss: 2.198 | 675 ms/step , 58268.57 GFLOP/s , 532666.0 tokens/s INFO:__main__:2024-10-27 04:10:59 | Epoch: 1 | 
Step: 46700 | Dataset: 0-17632901 | Loss: 2.168 | 675 ms/step , 58240.08 GFLOP/s , 532866.0 tokens/s INFO:__main__:2024-10-27 04:11:07 | Epoch: 1 | Step: 46710 | Dataset: 0-17640901 | Loss: 2.070 | 676 ms/step , 58170.59 GFLOP/s , 531717.5 tokens/s INFO:__main__:2024-10-27 04:11:14 | Epoch: 1 | Step: 46720 | Dataset: 0-17648901 | Loss: 2.247 | 674 ms/step , 58363.77 GFLOP/s , 533337.5 tokens/s INFO:__main__:2024-10-27 04:11:22 | Epoch: 1 | Step: 46730 | Dataset: 0-17656901 | Loss: 2.210 | 684 ms/step , 57474.50 GFLOP/s , 531969.6 tokens/s INFO:__main__:2024-10-27 04:11:30 | Epoch: 1 | Step: 46740 | Dataset: 0-17664901 | Loss: 2.218 | 674 ms/step , 58292.20 GFLOP/s , 532869.3 tokens/s INFO:__main__:2024-10-27 04:11:37 | Epoch: 1 | Step: 46750 | Dataset: 0-17672901 | Loss: 2.169 | 675 ms/step , 58204.62 GFLOP/s , 532522.0 tokens/s INFO:__main__:2024-10-27 04:11:45 | Epoch: 1 | Step: 46760 | Dataset: 0-17680901 | Loss: 2.195 | 675 ms/step , 58243.74 GFLOP/s , 532461.9 tokens/s INFO:__main__:2024-10-27 04:11:53 | Epoch: 1 | Step: 46770 | Dataset: 0-17688901 | Loss: 2.190 | 675 ms/step , 58259.75 GFLOP/s , 532919.6 tokens/s INFO:__main__:2024-10-27 04:12:01 | Epoch: 1 | Step: 46780 | Dataset: 0-17696901 | Loss: 2.224 | 675 ms/step , 58203.35 GFLOP/s , 532902.2 tokens/s INFO:__main__:2024-10-27 04:12:08 | Epoch: 1 | Step: 46790 | Dataset: 0-17704901 | Loss: 2.176 | 673 ms/step , 58365.81 GFLOP/s , 533238.8 tokens/s INFO:__main__:2024-10-27 04:12:16 | Epoch: 1 | Step: 46800 | Dataset: 0-17712901 | Loss: 2.208 | 676 ms/step , 58127.93 GFLOP/s , 532001.9 tokens/s INFO:__main__:2024-10-27 04:12:24 | Epoch: 1 | Step: 46810 | Dataset: 0-17720901 | Loss: 2.261 | 677 ms/step , 58091.04 GFLOP/s , 532039.4 tokens/s INFO:__main__:2024-10-27 04:12:31 | Epoch: 1 | Step: 46820 | Dataset: 0-17728901 | Loss: 2.138 | 674 ms/step , 58288.68 GFLOP/s , 532562.6 tokens/s INFO:__main__:2024-10-27 04:12:39 | Epoch: 1 | Step: 46830 | Dataset: 0-17736901 | Loss: 2.170 | 674 ms/step , 58336.04 
GFLOP/s , 533087.7 tokens/s INFO:__main__:2024-10-27 04:12:47 | Epoch: 1 | Step: 46840 | Dataset: 0-17744901 | Loss: 2.165 | 674 ms/step , 58281.63 GFLOP/s , 532815.8 tokens/s INFO:__main__:2024-10-27 04:12:54 | Epoch: 1 | Step: 46850 | Dataset: 0-17752901 | Loss: 2.213 | 677 ms/step , 58094.46 GFLOP/s , 532866.9 tokens/s INFO:__main__:2024-10-27 04:13:02 | Epoch: 1 | Step: 46860 | Dataset: 0-17760901 | Loss: 2.183 | 676 ms/step , 58190.12 GFLOP/s , 532539.8 tokens/s INFO:__main__:2024-10-27 04:13:10 | Epoch: 1 | Step: 46870 | Dataset: 0-17768901 | Loss: 2.103 | 675 ms/step , 58214.58 GFLOP/s , 532149.7 tokens/s INFO:__main__:2024-10-27 04:13:17 | Epoch: 1 | Step: 46880 | Dataset: 0-17776901 | Loss: 2.235 | 675 ms/step , 58251.88 GFLOP/s , 532757.5 tokens/s INFO:__main__:2024-10-27 04:13:25 | Epoch: 1 | Step: 46890 | Dataset: 0-17784901 | Loss: 2.324 | 674 ms/step , 58307.78 GFLOP/s , 532809.8 tokens/s INFO:__main__:2024-10-27 04:13:33 | Epoch: 1 | Step: 46900 | Dataset: 0-17792901 | Loss: 2.234 | 676 ms/step , 58161.11 GFLOP/s , 532603.6 tokens/s INFO:__main__:2024-10-27 04:13:41 | Epoch: 1 | Step: 46910 | Dataset: 0-17800901 | Loss: 2.188 | 676 ms/step , 58167.30 GFLOP/s , 532534.8 tokens/s INFO:__main__:2024-10-27 04:13:48 | Epoch: 1 | Step: 46920 | Dataset: 0-17808901 | Loss: 2.276 | 676 ms/step , 58161.52 GFLOP/s , 532227.9 tokens/s INFO:__main__:2024-10-27 04:13:56 | Epoch: 1 | Step: 46930 | Dataset: 0-17816901 | Loss: 2.102 | 674 ms/step , 58303.29 GFLOP/s , 532790.7 tokens/s INFO:__main__:2024-10-27 04:14:04 | Epoch: 1 | Step: 46940 | Dataset: 0-17824901 | Loss: 2.185 | 674 ms/step , 58299.74 GFLOP/s , 533052.4 tokens/s INFO:__main__:2024-10-27 04:14:11 | Epoch: 1 | Step: 46950 | Dataset: 0-17832901 | Loss: 2.258 | 674 ms/step , 58283.07 GFLOP/s , 532642.6 tokens/s INFO:__main__:2024-10-27 04:14:19 | Epoch: 1 | Step: 46960 | Dataset: 0-17840901 | Loss: 2.205 | 678 ms/step , 57975.42 GFLOP/s , 532252.7 tokens/s INFO:__main__:2024-10-27 04:14:27 | Epoch: 1 | 
Step: 46970 | Dataset: 0-17848901 | Loss: 2.161 | 676 ms/step , 58184.57 GFLOP/s , 532444.9 tokens/s INFO:__main__:2024-10-27 04:14:34 | Epoch: 1 | Step: 46980 | Dataset: 0-17856901 | Loss: 2.145 | 676 ms/step , 58149.91 GFLOP/s , 532235.7 tokens/s INFO:__main__:2024-10-27 04:14:42 | Epoch: 1 | Step: 46990 | Dataset: 0-17864901 | Loss: 2.218 | 674 ms/step , 58363.71 GFLOP/s , 532823.7 tokens/s INFO:__main__:2024-10-27 04:14:49 | Validation | Step: 47000 | Val_loss: 2.299 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 04:14:49 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_041449_step_47000.pt` INFO:__main__:2024-10-27 04:14:51 | Epoch: 1 | Step: 47000 | Dataset: 0-17872901 | Loss: 2.186 | 674 ms/step , 58311.39 GFLOP/s , 479602.6 tokens/s INFO:__main__:2024-10-27 04:14:58 | Epoch: 1 | Step: 47010 | Dataset: 0-17880901 | Loss: 2.230 | 677 ms/step , 58029.14 GFLOP/s , 531925.0 tokens/s INFO:__main__:2024-10-27 04:15:06 | Epoch: 1 | Step: 47020 | Dataset: 0-17888901 | Loss: 2.143 | 675 ms/step , 58226.41 GFLOP/s , 532188.6 tokens/s INFO:__main__:2024-10-27 04:15:14 | Epoch: 1 | Step: 47030 | Dataset: 0-17896901 | Loss: 2.227 | 675 ms/step , 58213.61 GFLOP/s , 532348.7 tokens/s INFO:__main__:2024-10-27 04:15:21 | Epoch: 1 | Step: 47040 | Dataset: 0-17904901 | Loss: 2.082 | 676 ms/step , 58131.78 GFLOP/s , 532045.1 tokens/s INFO:__main__:2024-10-27 04:15:29 | Epoch: 1 | Step: 47050 | Dataset: 0-17912901 | Loss: 1.809 | 675 ms/step , 58269.16 GFLOP/s , 531781.7 tokens/s INFO:__main__:2024-10-27 04:15:37 | Epoch: 1 | Step: 47060 | Dataset: 0-17920901 | Loss: 1.757 | 676 ms/step , 58123.28 GFLOP/s , 532149.6 tokens/s INFO:__main__:2024-10-27 04:15:44 | Epoch: 1 | Step: 47070 | Dataset: 0-17928901 | Loss: 1.697 | 675 ms/step , 58193.67 GFLOP/s , 531632.3 tokens/s INFO:__main__:2024-10-27 04:15:52 | Epoch: 1 | Step: 47080 | Dataset: 0-17936901 | Loss: 1.720 | 674 ms/step , 58281.82 GFLOP/s , 532082.7 tokens/s INFO:__main__:2024-10-27 04:16:00 | 
Epoch: 1 | Step: 47090 | Dataset: 0-17944901 | Loss: 1.719 | 674 ms/step , 58315.84 GFLOP/s , 532014.2 tokens/s INFO:__main__:2024-10-27 04:16:08 | Epoch: 1 | Step: 47100 | Dataset: 0-17952901 | Loss: 1.653 | 675 ms/step , 58260.52 GFLOP/s , 531640.9 tokens/s INFO:__main__:2024-10-27 04:16:15 | Epoch: 1 | Step: 47110 | Dataset: 0-17960901 | Loss: 1.686 | 677 ms/step , 58064.38 GFLOP/s , 531417.4 tokens/s INFO:__main__:2024-10-27 04:16:23 | Epoch: 1 | Step: 47120 | Dataset: 0-17968901 | Loss: 1.691 | 676 ms/step , 58169.07 GFLOP/s , 531831.9 tokens/s INFO:__main__:2024-10-27 04:16:31 | Epoch: 1 | Step: 47130 | Dataset: 0-17976901 | Loss: 1.690 | 676 ms/step , 58180.16 GFLOP/s , 531685.6 tokens/s INFO:__main__:2024-10-27 04:16:38 | Epoch: 1 | Step: 47140 | Dataset: 0-17984901 | Loss: 1.694 | 676 ms/step , 58185.26 GFLOP/s , 531852.2 tokens/s INFO:__main__:2024-10-27 04:16:46 | Epoch: 1 | Step: 47150 | Dataset: 0-17992901 | Loss: 1.673 | 675 ms/step , 58260.50 GFLOP/s , 532071.5 tokens/s INFO:__main__:2024-10-27 04:16:54 | Epoch: 1 | Step: 47160 | Dataset: 0-18000901 | Loss: 1.643 | 674 ms/step , 58357.58 GFLOP/s , 532016.6 tokens/s INFO:__main__:2024-10-27 04:17:01 | Epoch: 1 | Step: 47170 | Dataset: 0-18008901 | Loss: 1.675 | 676 ms/step , 58187.77 GFLOP/s , 531699.1 tokens/s INFO:__main__:2024-10-27 04:17:09 | Epoch: 1 | Step: 47180 | Dataset: 0-18016901 | Loss: 1.639 | 675 ms/step , 58242.77 GFLOP/s , 531370.1 tokens/s INFO:__main__:2024-10-27 04:17:17 | Epoch: 1 | Step: 47190 | Dataset: 0-18024901 | Loss: 1.633 | 676 ms/step , 58168.82 GFLOP/s , 531250.0 tokens/s INFO:__main__:2024-10-27 04:17:25 | Epoch: 1 | Step: 47200 | Dataset: 0-18032901 | Loss: 1.680 | 676 ms/step , 58139.21 GFLOP/s , 531064.3 tokens/s INFO:__main__:2024-10-27 04:17:32 | Epoch: 1 | Step: 47210 | Dataset: 0-18040901 | Loss: 1.654 | 675 ms/step , 58254.34 GFLOP/s , 531667.8 tokens/s INFO:__main__:2024-10-27 04:17:40 | Epoch: 1 | Step: 47220 | Dataset: 0-18048901 | Loss: 2.335 | 675 ms/step , 
58209.84 GFLOP/s , 531658.6 tokens/s INFO:__main__:2024-10-27 04:17:48 | Epoch: 1 | Step: 47230 | Dataset: 0-18056901 | Loss: 2.239 | 675 ms/step , 58193.25 GFLOP/s , 531948.1 tokens/s INFO:__main__:2024-10-27 04:17:55 | Epoch: 1 | Step: 47240 | Dataset: 0-18064901 | Loss: 2.161 | 675 ms/step , 58278.92 GFLOP/s , 531406.1 tokens/s INFO:__main__:2024-10-27 04:18:03 | Epoch: 1 | Step: 47250 | Dataset: 0-18072901 | Loss: 2.218 | 675 ms/step , 58261.99 GFLOP/s , 530546.0 tokens/s INFO:__main__:2024-10-27 04:18:11 | Epoch: 1 | Step: 47260 | Dataset: 0-18080901 | Loss: 2.165 | 675 ms/step , 58266.20 GFLOP/s , 530511.3 tokens/s INFO:__main__:2024-10-27 04:18:19 | Epoch: 1 | Step: 47270 | Dataset: 0-18088901 | Loss: 2.087 | 675 ms/step , 58226.85 GFLOP/s , 532147.6 tokens/s INFO:__main__:2024-10-27 04:18:26 | Epoch: 1 | Step: 47280 | Dataset: 0-18096901 | Loss: 2.119 | 674 ms/step , 58293.39 GFLOP/s , 532815.9 tokens/s INFO:__main__:2024-10-27 04:18:34 | Epoch: 1 | Step: 47290 | Dataset: 0-18104901 | Loss: 2.211 | 674 ms/step , 58299.17 GFLOP/s , 532671.4 tokens/s INFO:__main__:2024-10-27 04:18:42 | Epoch: 1 | Step: 47300 | Dataset: 0-18112901 | Loss: 2.127 | 674 ms/step , 58361.23 GFLOP/s , 532993.1 tokens/s INFO:__main__:2024-10-27 04:18:49 | Epoch: 1 | Step: 47310 | Dataset: 0-18120901 | Loss: 2.192 | 673 ms/step , 58369.63 GFLOP/s , 533295.7 tokens/s INFO:__main__:2024-10-27 04:18:57 | Epoch: 1 | Step: 47320 | Dataset: 0-18128901 | Loss: 2.167 | 673 ms/step , 58394.97 GFLOP/s , 533275.9 tokens/s INFO:__main__:2024-10-27 04:19:05 | Epoch: 1 | Step: 47330 | Dataset: 0-18136901 | Loss: 2.184 | 676 ms/step , 58190.09 GFLOP/s , 532772.7 tokens/s INFO:__main__:2024-10-27 04:19:12 | Epoch: 1 | Step: 47340 | Dataset: 0-18144901 | Loss: 2.209 | 674 ms/step , 58313.35 GFLOP/s , 532797.8 tokens/s INFO:__main__:2024-10-27 04:19:20 | Epoch: 1 | Step: 47350 | Dataset: 0-18152901 | Loss: 2.140 | 675 ms/step , 58247.64 GFLOP/s , 532971.7 tokens/s INFO:__main__:2024-10-27 04:19:28 | 
Epoch: 1 | Step: 47360 | Dataset: 0-18160901 | Loss: 2.196 | 675 ms/step , 58200.41 GFLOP/s , 532622.8 tokens/s INFO:__main__:2024-10-27 04:19:35 | Epoch: 1 | Step: 47370 | Dataset: 0-18168901 | Loss: 2.205 | 674 ms/step , 58348.77 GFLOP/s , 532419.2 tokens/s INFO:__main__:2024-10-27 04:19:43 | Epoch: 1 | Step: 47380 | Dataset: 0-18176901 | Loss: 1.869 | 675 ms/step , 58267.53 GFLOP/s , 532849.6 tokens/s INFO:__main__:2024-10-27 04:19:51 | Epoch: 1 | Step: 47390 | Dataset: 0-18184901 | Loss: 1.790 | 675 ms/step , 58211.98 GFLOP/s , 532554.6 tokens/s INFO:__main__:2024-10-27 04:19:58 | Epoch: 1 | Step: 47400 | Dataset: 0-18192901 | Loss: 1.789 | 673 ms/step , 58379.71 GFLOP/s , 532825.2 tokens/s INFO:__main__:2024-10-27 04:20:06 | Epoch: 1 | Step: 47410 | Dataset: 0-18200901 | Loss: 1.759 | 674 ms/step , 58309.25 GFLOP/s , 532591.6 tokens/s INFO:__main__:2024-10-27 04:20:14 | Epoch: 1 | Step: 47420 | Dataset: 0-18208901 | Loss: 1.780 | 674 ms/step , 58284.24 GFLOP/s , 531692.4 tokens/s INFO:__main__:2024-10-27 04:20:22 | Epoch: 1 | Step: 47430 | Dataset: 0-18216901 | Loss: 1.779 | 675 ms/step , 58235.28 GFLOP/s , 531990.0 tokens/s INFO:__main__:2024-10-27 04:20:29 | Epoch: 1 | Step: 47440 | Dataset: 0-18224901 | Loss: 1.742 | 675 ms/step , 58271.68 GFLOP/s , 531797.4 tokens/s INFO:__main__:2024-10-27 04:20:37 | Epoch: 1 | Step: 47450 | Dataset: 0-18232901 | Loss: 1.754 | 675 ms/step , 58258.21 GFLOP/s , 531972.3 tokens/s INFO:__main__:2024-10-27 04:20:45 | Epoch: 1 | Step: 47460 | Dataset: 0-18240901 | Loss: 1.745 | 675 ms/step , 58277.62 GFLOP/s , 532325.1 tokens/s INFO:__main__:2024-10-27 04:20:52 | Epoch: 1 | Step: 47470 | Dataset: 0-18248901 | Loss: 2.272 | 675 ms/step , 58267.44 GFLOP/s , 532118.6 tokens/s INFO:__main__:2024-10-27 04:21:00 | Epoch: 1 | Step: 47480 | Dataset: 0-18256901 | Loss: 2.273 | 675 ms/step , 58223.37 GFLOP/s , 532417.5 tokens/s INFO:__main__:2024-10-27 04:21:08 | Epoch: 1 | Step: 47490 | Dataset: 0-18264901 | Loss: 2.218 | 674 ms/step , 
58342.29 GFLOP/s , 532622.4 tokens/s INFO:__main__:2024-10-27 04:21:15 | Epoch: 1 | Step: 47500 | Dataset: 0-18272901 | Loss: 2.215 | 676 ms/step , 58180.11 GFLOP/s , 532493.7 tokens/s INFO:__main__:2024-10-27 04:21:23 | Epoch: 1 | Step: 47510 | Dataset: 0-18280901 | Loss: 2.245 | 676 ms/step , 58177.86 GFLOP/s , 531383.0 tokens/s INFO:__main__:2024-10-27 04:21:31 | Epoch: 1 | Step: 47520 | Dataset: 0-18288901 | Loss: 2.146 | 675 ms/step , 58197.32 GFLOP/s , 531630.8 tokens/s INFO:__main__:2024-10-27 04:21:39 | Epoch: 1 | Step: 47530 | Dataset: 0-18296901 | Loss: 2.292 | 675 ms/step , 58219.95 GFLOP/s , 532876.3 tokens/s INFO:__main__:2024-10-27 04:21:46 | Epoch: 1 | Step: 47540 | Dataset: 0-18304901 | Loss: 2.173 | 676 ms/step , 58169.70 GFLOP/s , 532433.6 tokens/s INFO:__main__:2024-10-27 04:21:54 | Epoch: 1 | Step: 47550 | Dataset: 0-18312901 | Loss: 2.132 | 673 ms/step , 58390.59 GFLOP/s , 532815.5 tokens/s INFO:__main__:2024-10-27 04:22:02 | Epoch: 1 | Step: 47560 | Dataset: 0-18320901 | Loss: 2.154 | 674 ms/step , 58324.65 GFLOP/s , 532258.1 tokens/s INFO:__main__:2024-10-27 04:22:09 | Epoch: 1 | Step: 47570 | Dataset: 0-18328901 | Loss: 2.225 | 676 ms/step , 58165.96 GFLOP/s , 533005.2 tokens/s INFO:__main__:2024-10-27 04:22:17 | Epoch: 1 | Step: 47580 | Dataset: 0-18336901 | Loss: 2.206 | 675 ms/step , 58197.32 GFLOP/s , 532718.1 tokens/s INFO:__main__:2024-10-27 04:22:25 | Epoch: 1 | Step: 47590 | Dataset: 0-18344901 | Loss: 2.206 | 675 ms/step , 58272.07 GFLOP/s , 532858.2 tokens/s INFO:__main__:2024-10-27 04:22:32 | Epoch: 1 | Step: 47600 | Dataset: 0-18352901 | Loss: 2.161 | 676 ms/step , 58172.48 GFLOP/s , 532511.8 tokens/s INFO:__main__:2024-10-27 04:22:40 | Epoch: 1 | Step: 47610 | Dataset: 0-18360901 | Loss: 2.224 | 675 ms/step , 58203.34 GFLOP/s , 532446.8 tokens/s INFO:__main__:2024-10-27 04:22:48 | Epoch: 1 | Step: 47620 | Dataset: 0-18368901 | Loss: 2.174 | 676 ms/step , 58133.34 GFLOP/s , 532625.5 tokens/s INFO:__main__:2024-10-27 04:22:55 | 
Epoch: 1 | Step: 47630 | Dataset: 0-18376901 | Loss: 2.185 | 675 ms/step , 58227.03 GFLOP/s , 532033.9 tokens/s INFO:__main__:2024-10-27 04:23:03 | Epoch: 1 | Step: 47640 | Dataset: 0-18384901 | Loss: 2.222 | 676 ms/step , 58148.64 GFLOP/s , 532168.8 tokens/s INFO:__main__:2024-10-27 04:23:11 | Epoch: 1 | Step: 47650 | Dataset: 0-18392901 | Loss: 2.152 | 675 ms/step , 58253.54 GFLOP/s , 532174.2 tokens/s INFO:__main__:2024-10-27 04:23:19 | Epoch: 1 | Step: 47660 | Dataset: 0-18400901 | Loss: 2.175 | 676 ms/step , 58190.62 GFLOP/s , 532114.2 tokens/s INFO:__main__:2024-10-27 04:23:26 | Epoch: 1 | Step: 47670 | Dataset: 0-18408901 | Loss: 2.079 | 676 ms/step , 58171.08 GFLOP/s , 531867.0 tokens/s INFO:__main__:2024-10-27 04:23:34 | Epoch: 1 | Step: 47680 | Dataset: 0-18416901 | Loss: 2.194 | 676 ms/step , 58172.58 GFLOP/s , 531911.0 tokens/s INFO:__main__:2024-10-27 04:23:42 | Epoch: 1 | Step: 47690 | Dataset: 0-18424901 | Loss: 2.209 | 675 ms/step , 58255.65 GFLOP/s , 532013.6 tokens/s INFO:__main__:2024-10-27 04:23:49 | Epoch: 1 | Step: 47700 | Dataset: 0-18432901 | Loss: 2.176 | 676 ms/step , 58175.02 GFLOP/s , 532286.8 tokens/s INFO:__main__:2024-10-27 04:23:57 | Epoch: 1 | Step: 47710 | Dataset: 0-18440901 | Loss: 2.220 | 676 ms/step , 58181.85 GFLOP/s , 532334.6 tokens/s INFO:__main__:2024-10-27 04:24:05 | Epoch: 1 | Step: 47720 | Dataset: 0-18448901 | Loss: 2.169 | 675 ms/step , 58204.70 GFLOP/s , 531894.0 tokens/s INFO:__main__:2024-10-27 04:24:12 | Epoch: 1 | Step: 47730 | Dataset: 0-18456901 | Loss: 2.156 | 676 ms/step , 58168.08 GFLOP/s , 531939.2 tokens/s INFO:__main__:2024-10-27 04:24:20 | Epoch: 1 | Step: 47740 | Dataset: 0-18464901 | Loss: 2.190 | 675 ms/step , 58225.38 GFLOP/s , 532945.9 tokens/s INFO:__main__:2024-10-27 04:24:28 | Epoch: 1 | Step: 47750 | Dataset: 0-18472901 | Loss: 2.198 | 676 ms/step , 58190.75 GFLOP/s , 532442.4 tokens/s INFO:__main__:2024-10-27 04:24:36 | Epoch: 1 | Step: 47760 | Dataset: 0-18480901 | Loss: 2.243 | 675 ms/step , 
58263.51 GFLOP/s , 532465.8 tokens/s INFO:__main__:2024-10-27 04:24:43 | Epoch: 1 | Step: 47770 | Dataset: 0-18488901 | Loss: 2.120 | 676 ms/step , 58173.98 GFLOP/s , 532036.5 tokens/s INFO:__main__:2024-10-27 04:24:51 | Epoch: 1 | Step: 47780 | Dataset: 0-18496901 | Loss: 2.265 | 674 ms/step , 58308.29 GFLOP/s , 532230.9 tokens/s INFO:__main__:2024-10-27 04:24:59 | Epoch: 1 | Step: 47790 | Dataset: 0-18504901 | Loss: 2.115 | 675 ms/step , 58196.66 GFLOP/s , 532127.4 tokens/s INFO:__main__:2024-10-27 04:25:06 | Epoch: 1 | Step: 47800 | Dataset: 0-18512901 | Loss: 2.306 | 675 ms/step , 58222.14 GFLOP/s , 532383.0 tokens/s INFO:__main__:2024-10-27 04:25:14 | Epoch: 1 | Step: 47810 | Dataset: 0-18520901 | Loss: 2.158 | 675 ms/step , 58269.09 GFLOP/s , 532549.8 tokens/s INFO:__main__:2024-10-27 04:25:22 | Epoch: 1 | Step: 47820 | Dataset: 0-18528901 | Loss: 2.107 | 675 ms/step , 58265.35 GFLOP/s , 532412.0 tokens/s INFO:__main__:2024-10-27 04:25:29 | Epoch: 1 | Step: 47830 | Dataset: 0-18536901 | Loss: 2.134 | 676 ms/step , 58168.74 GFLOP/s , 531965.2 tokens/s INFO:__main__:2024-10-27 04:25:37 | Epoch: 1 | Step: 47840 | Dataset: 0-18544901 | Loss: 2.133 | 676 ms/step , 58164.55 GFLOP/s , 532116.8 tokens/s INFO:__main__:2024-10-27 04:25:45 | Epoch: 1 | Step: 47850 | Dataset: 0-18552901 | Loss: 2.181 | 674 ms/step , 58307.33 GFLOP/s , 532973.3 tokens/s INFO:__main__:2024-10-27 04:25:52 | Epoch: 1 | Step: 47860 | Dataset: 0-18560901 | Loss: 2.184 | 675 ms/step , 58263.75 GFLOP/s , 532607.5 tokens/s INFO:__main__:2024-10-27 04:26:00 | Epoch: 1 | Step: 47870 | Dataset: 0-18568901 | Loss: 2.089 | 676 ms/step , 58164.29 GFLOP/s , 531903.9 tokens/s INFO:__main__:2024-10-27 04:26:08 | Epoch: 1 | Step: 47880 | Dataset: 0-18576901 | Loss: 2.139 | 675 ms/step , 58199.00 GFLOP/s , 532267.6 tokens/s INFO:__main__:2024-10-27 04:26:16 | Epoch: 1 | Step: 47890 | Dataset: 0-18584901 | Loss: 2.127 | 674 ms/step , 58351.21 GFLOP/s , 532859.2 tokens/s INFO:__main__:2024-10-27 04:26:23 | 
Epoch: 1 | Step: 47900 | Dataset: 0-18592901 | Loss: 2.149 | 675 ms/step , 58274.84 GFLOP/s , 531983.8 tokens/s INFO:__main__:2024-10-27 04:26:31 | Epoch: 1 | Step: 47910 | Dataset: 0-18600901 | Loss: 2.161 | 674 ms/step , 58284.66 GFLOP/s , 532552.5 tokens/s INFO:__main__:2024-10-27 04:26:39 | Epoch: 1 | Step: 47920 | Dataset: 0-18608901 | Loss: 2.168 | 674 ms/step , 58287.00 GFLOP/s , 532575.6 tokens/s INFO:__main__:2024-10-27 04:26:46 | Epoch: 1 | Step: 47930 | Dataset: 0-18616901 | Loss: 2.092 | 675 ms/step , 58203.78 GFLOP/s , 531966.6 tokens/s INFO:__main__:2024-10-27 04:26:54 | Epoch: 1 | Step: 47940 | Dataset: 0-18624901 | Loss: 2.176 | 675 ms/step , 58258.44 GFLOP/s , 532280.5 tokens/s INFO:__main__:2024-10-27 04:27:02 | Epoch: 1 | Step: 47950 | Dataset: 0-18632901 | Loss: 2.211 | 674 ms/step , 58291.43 GFLOP/s , 532478.3 tokens/s INFO:__main__:2024-10-27 04:27:09 | Epoch: 1 | Step: 47960 | Dataset: 0-18640901 | Loss: 2.137 | 675 ms/step , 58275.57 GFLOP/s , 532141.1 tokens/s INFO:__main__:2024-10-27 04:27:17 | Epoch: 1 | Step: 47970 | Dataset: 0-18648901 | Loss: 2.208 | 673 ms/step , 58370.24 GFLOP/s , 533031.1 tokens/s INFO:__main__:2024-10-27 04:27:25 | Epoch: 1 | Step: 47980 | Dataset: 0-18656901 | Loss: 2.151 | 674 ms/step , 58339.25 GFLOP/s , 532745.5 tokens/s INFO:__main__:2024-10-27 04:27:32 | Epoch: 1 | Step: 47990 | Dataset: 0-18664901 | Loss: 2.133 | 675 ms/step , 58278.99 GFLOP/s , 532853.2 tokens/s INFO:__main__:2024-10-27 04:27:40 | Validation | Step: 48000 | Val_loss: 2.193 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 04:27:40 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_042740_step_48000.pt` INFO:__main__:2024-10-27 04:27:41 | Epoch: 1 | Step: 48000 | Dataset: 0-18672901 | Loss: 2.132 | 674 ms/step , 58334.11 GFLOP/s , 479065.9 tokens/s INFO:__main__:2024-10-27 04:27:49 | Epoch: 1 | Step: 48010 | Dataset: 0-18680901 | Loss: 2.132 | 675 ms/step , 58263.19 GFLOP/s , 532299.5 tokens/s INFO:__main__:2024-10-27 
04:27:56 | Epoch: 1 | Step: 48020 | Dataset: 0-18688901 | Loss: 2.123 | 675 ms/step , 58212.91 GFLOP/s , 532318.8 tokens/s INFO:__main__:2024-10-27 04:28:04 | Epoch: 1 | Step: 48030 | Dataset: 0-18696901 | Loss: 2.134 | 676 ms/step , 58175.71 GFLOP/s , 531800.7 tokens/s INFO:__main__:2024-10-27 04:28:12 | Epoch: 1 | Step: 48040 | Dataset: 0-18704901 | Loss: 2.102 | 675 ms/step , 58257.71 GFLOP/s , 531957.2 tokens/s INFO:__main__:2024-10-27 04:28:20 | Epoch: 1 | Step: 48050 | Dataset: 0-18712901 | Loss: 2.154 | 674 ms/step , 58286.73 GFLOP/s , 532291.9 tokens/s INFO:__main__:2024-10-27 04:28:27 | Epoch: 1 | Step: 48060 | Dataset: 0-18720901 | Loss: 2.104 | 675 ms/step , 58227.95 GFLOP/s , 532621.7 tokens/s INFO:__main__:2024-10-27 04:28:35 | Epoch: 1 | Step: 48070 | Dataset: 0-18728901 | Loss: 2.171 | 674 ms/step , 58281.57 GFLOP/s , 532465.6 tokens/s INFO:__main__:2024-10-27 04:28:43 | Epoch: 1 | Step: 48080 | Dataset: 0-18736901 | Loss: 2.226 | 674 ms/step , 58306.22 GFLOP/s , 532620.1 tokens/s INFO:__main__:2024-10-27 04:28:50 | Epoch: 1 | Step: 48090 | Dataset: 0-18744901 | Loss: 2.113 | 674 ms/step , 58300.29 GFLOP/s , 532463.2 tokens/s INFO:__main__:2024-10-27 04:28:58 | Epoch: 1 | Step: 48100 | Dataset: 0-18752901 | Loss: 2.054 | 675 ms/step , 58225.69 GFLOP/s , 532448.5 tokens/s INFO:__main__:2024-10-27 04:29:06 | Epoch: 1 | Step: 48110 | Dataset: 0-18760901 | Loss: 1.997 | 675 ms/step , 58226.95 GFLOP/s , 531515.5 tokens/s INFO:__main__:2024-10-27 04:29:13 | Epoch: 1 | Step: 48120 | Dataset: 0-18768901 | Loss: 1.849 | 674 ms/step , 58334.12 GFLOP/s , 531813.5 tokens/s INFO:__main__:2024-10-27 04:29:21 | Epoch: 1 | Step: 48130 | Dataset: 0-18776901 | Loss: 1.800 | 675 ms/step , 58196.90 GFLOP/s , 532075.3 tokens/s INFO:__main__:2024-10-27 04:29:29 | Epoch: 1 | Step: 48140 | Dataset: 0-18784901 | Loss: 1.791 | 675 ms/step , 58237.35 GFLOP/s , 532347.7 tokens/s INFO:__main__:2024-10-27 04:29:36 | Epoch: 1 | Step: 48150 | Dataset: 0-18792901 | Loss: 1.776 | 676 
ms/step , 58142.98 GFLOP/s , 531713.2 tokens/s INFO:__main__:2024-10-27 04:29:44 | Epoch: 1 | Step: 48160 | Dataset: 0-18800901 | Loss: 1.788 | 675 ms/step , 58244.88 GFLOP/s , 531983.8 tokens/s INFO:__main__:2024-10-27 04:29:52 | Epoch: 1 | Step: 48170 | Dataset: 0-18808901 | Loss: 1.766 | 675 ms/step , 58219.65 GFLOP/s , 531654.8 tokens/s INFO:__main__:2024-10-27 04:30:00 | Epoch: 1 | Step: 48180 | Dataset: 0-18816901 | Loss: 1.828 | 675 ms/step , 58258.74 GFLOP/s , 531368.9 tokens/s INFO:__main__:2024-10-27 04:30:07 | Epoch: 1 | Step: 48190 | Dataset: 0-18824901 | Loss: 1.762 | 676 ms/step , 58161.08 GFLOP/s , 531506.1 tokens/s INFO:__main__:2024-10-27 04:30:15 | Epoch: 1 | Step: 48200 | Dataset: 0-18832901 | Loss: 2.341 | 674 ms/step , 58306.17 GFLOP/s , 531417.0 tokens/s INFO:__main__:2024-10-27 04:30:23 | Epoch: 1 | Step: 48210 | Dataset: 0-18840901 | Loss: 2.276 | 674 ms/step , 58312.81 GFLOP/s , 532651.3 tokens/s INFO:__main__:2024-10-27 04:30:30 | Epoch: 1 | Step: 48220 | Dataset: 0-18848901 | Loss: 2.195 | 677 ms/step , 58094.53 GFLOP/s , 531507.8 tokens/s INFO:__main__:2024-10-27 04:30:38 | Epoch: 1 | Step: 48230 | Dataset: 0-18856901 | Loss: 2.170 | 675 ms/step , 58214.63 GFLOP/s , 531675.3 tokens/s INFO:__main__:2024-10-27 04:30:46 | Epoch: 1 | Step: 48240 | Dataset: 0-18864901 | Loss: 2.148 | 676 ms/step , 58124.89 GFLOP/s , 530316.0 tokens/s INFO:__main__:2024-10-27 04:30:54 | Epoch: 1 | Step: 48250 | Dataset: 0-18872901 | Loss: 2.191 | 674 ms/step , 58350.57 GFLOP/s , 532323.8 tokens/s INFO:__main__:2024-10-27 04:31:01 | Epoch: 1 | Step: 48260 | Dataset: 0-18880901 | Loss: 2.165 | 676 ms/step , 58135.91 GFLOP/s , 531444.3 tokens/s INFO:__main__:2024-10-27 04:31:09 | Epoch: 1 | Step: 48270 | Dataset: 0-18888901 | Loss: 2.132 | 675 ms/step , 58199.90 GFLOP/s , 531562.7 tokens/s INFO:__main__:2024-10-27 04:31:17 | Epoch: 1 | Step: 48280 | Dataset: 0-18896901 | Loss: 2.193 | 675 ms/step , 58238.23 GFLOP/s , 532120.0 tokens/s INFO:__main__:2024-10-27 
04:31:24 | Epoch: 1 | Step: 48290 | Dataset: 0-18904901 | Loss: 2.129 | 675 ms/step , 58270.00 GFLOP/s , 531627.2 tokens/s INFO:__main__:2024-10-27 04:31:32 | Epoch: 1 | Step: 48300 | Dataset: 0-18912901 | Loss: 2.209 | 676 ms/step , 58188.60 GFLOP/s , 531530.9 tokens/s INFO:__main__:2024-10-27 04:31:40 | Epoch: 1 | Step: 48310 | Dataset: 0-18920901 | Loss: 2.159 | 675 ms/step , 58240.67 GFLOP/s , 531313.1 tokens/s INFO:__main__:2024-10-27 04:31:47 | Epoch: 1 | Step: 48320 | Dataset: 0-18928901 | Loss: 2.193 | 675 ms/step , 58204.77 GFLOP/s , 532127.8 tokens/s INFO:__main__:2024-10-27 04:31:55 | Epoch: 1 | Step: 48330 | Dataset: 0-18936901 | Loss: 2.030 | 675 ms/step , 58250.52 GFLOP/s , 532293.0 tokens/s INFO:__main__:2024-10-27 04:32:03 | Epoch: 1 | Step: 48340 | Dataset: 0-18944901 | Loss: 2.102 | 684 ms/step , 57503.34 GFLOP/s , 530666.4 tokens/s INFO:__main__:2024-10-27 04:32:11 | Epoch: 1 | Step: 48350 | Dataset: 0-18952901 | Loss: 2.224 | 676 ms/step , 58123.57 GFLOP/s , 530677.1 tokens/s INFO:__main__:2024-10-27 04:32:18 | Epoch: 1 | Step: 48360 | Dataset: 0-18960901 | Loss: 2.291 | 676 ms/step , 58132.37 GFLOP/s , 529717.6 tokens/s INFO:__main__:2024-10-27 04:32:26 | Epoch: 1 | Step: 48370 | Dataset: 0-18968901 | Loss: 2.203 | 677 ms/step , 58095.54 GFLOP/s , 530646.3 tokens/s INFO:__main__:2024-10-27 04:32:34 | Epoch: 1 | Step: 48380 | Dataset: 0-18976901 | Loss: 2.061 | 676 ms/step , 58162.83 GFLOP/s , 531299.0 tokens/s INFO:__main__:2024-10-27 04:32:41 | Epoch: 1 | Step: 48390 | Dataset: 0-18984901 | Loss: 2.182 | 676 ms/step , 58132.25 GFLOP/s , 530967.7 tokens/s INFO:__main__:2024-10-27 04:32:49 | Epoch: 1 | Step: 48400 | Dataset: 0-18992901 | Loss: 2.210 | 676 ms/step , 58120.26 GFLOP/s , 530277.7 tokens/s INFO:__main__:2024-10-27 04:32:57 | Epoch: 1 | Step: 48410 | Dataset: 0-19000901 | Loss: 2.154 | 676 ms/step , 58164.60 GFLOP/s , 530984.7 tokens/s INFO:__main__:2024-10-27 04:33:05 | Epoch: 1 | Step: 48420 | Dataset: 0-19008901 | Loss: 2.234 | 677 
ms/step , 58028.79 GFLOP/s , 531013.8 tokens/s INFO:__main__:2024-10-27 04:33:12 | Epoch: 1 | Step: 48430 | Dataset: 0-19016901 | Loss: 2.193 | 676 ms/step , 58167.39 GFLOP/s , 530671.9 tokens/s INFO:__main__:2024-10-27 04:33:20 | Epoch: 1 | Step: 48440 | Dataset: 0-19024901 | Loss: 2.154 | 676 ms/step , 58144.36 GFLOP/s , 531097.1 tokens/s INFO:__main__:2024-10-27 04:33:28 | Epoch: 1 | Step: 48450 | Dataset: 0-19032901 | Loss: 2.157 | 677 ms/step , 58054.90 GFLOP/s , 531890.8 tokens/s INFO:__main__:2024-10-27 04:33:35 | Epoch: 1 | Step: 48460 | Dataset: 0-19040901 | Loss: 2.204 | 678 ms/step , 58009.53 GFLOP/s , 530902.0 tokens/s INFO:__main__:2024-10-27 04:33:43 | Epoch: 1 | Step: 48470 | Dataset: 0-19048901 | Loss: 2.179 | 675 ms/step , 58225.86 GFLOP/s , 532139.9 tokens/s INFO:__main__:2024-10-27 04:33:51 | Epoch: 1 | Step: 48480 | Dataset: 0-19056901 | Loss: 2.237 | 676 ms/step , 58122.16 GFLOP/s , 532220.5 tokens/s INFO:__main__:2024-10-27 04:33:59 | Epoch: 1 | Step: 48490 | Dataset: 0-19064901 | Loss: 2.084 | 676 ms/step , 58178.96 GFLOP/s , 532138.4 tokens/s INFO:__main__:2024-10-27 04:34:06 | Epoch: 1 | Step: 48500 | Dataset: 0-19072901 | Loss: 2.220 | 675 ms/step , 58274.50 GFLOP/s , 532229.8 tokens/s INFO:__main__:2024-10-27 04:34:14 | Epoch: 1 | Step: 48510 | Dataset: 0-19080901 | Loss: 2.201 | 676 ms/step , 58176.10 GFLOP/s , 531961.5 tokens/s INFO:__main__:2024-10-27 04:34:22 | Epoch: 1 | Step: 48520 | Dataset: 0-19088901 | Loss: 2.209 | 674 ms/step , 58308.14 GFLOP/s , 533235.2 tokens/s INFO:__main__:2024-10-27 04:34:29 | Epoch: 1 | Step: 48530 | Dataset: 0-19096901 | Loss: 2.181 | 674 ms/step , 58349.56 GFLOP/s , 532961.4 tokens/s INFO:__main__:2024-10-27 04:34:37 | Epoch: 1 | Step: 48540 | Dataset: 0-19104901 | Loss: 2.137 | 674 ms/step , 58311.22 GFLOP/s , 532563.8 tokens/s INFO:__main__:2024-10-27 04:34:45 | Epoch: 1 | Step: 48550 | Dataset: 0-19112901 | Loss: 2.240 | 675 ms/step , 58219.02 GFLOP/s , 532674.4 tokens/s INFO:__main__:2024-10-27 
04:34:52 | Epoch: 1 | Step: 48560 | Dataset: 0-19120901 | Loss: 2.186 | 676 ms/step , 58171.87 GFLOP/s , 532418.2 tokens/s INFO:__main__:2024-10-27 04:35:00 | Epoch: 1 | Step: 48570 | Dataset: 0-19128901 | Loss: 2.105 | 676 ms/step , 58170.59 GFLOP/s , 532489.2 tokens/s INFO:__main__:2024-10-27 04:35:08 | Epoch: 1 | Step: 48580 | Dataset: 0-19136901 | Loss: 2.240 | 676 ms/step , 58141.68 GFLOP/s , 531189.0 tokens/s INFO:__main__:2024-10-27 04:35:16 | Epoch: 1 | Step: 48590 | Dataset: 0-19144901 | Loss: 2.102 | 676 ms/step , 58113.13 GFLOP/s , 530601.2 tokens/s INFO:__main__:2024-10-27 04:35:23 | Epoch: 1 | Step: 48600 | Dataset: 0-19152901 | Loss: 2.139 | 676 ms/step , 58109.60 GFLOP/s , 531194.2 tokens/s INFO:__main__:2024-10-27 04:35:31 | Epoch: 1 | Step: 48610 | Dataset: 0-19160901 | Loss: 2.150 | 676 ms/step , 58127.59 GFLOP/s , 531212.7 tokens/s INFO:__main__:2024-10-27 04:35:39 | Epoch: 1 | Step: 48620 | Dataset: 0-19168901 | Loss: 2.168 | 676 ms/step , 58172.86 GFLOP/s , 531124.6 tokens/s INFO:__main__:2024-10-27 04:35:46 | Epoch: 1 | Step: 48630 | Dataset: 0-19176901 | Loss: 2.232 | 676 ms/step , 58143.46 GFLOP/s , 531364.7 tokens/s INFO:__main__:2024-10-27 04:35:54 | Epoch: 1 | Step: 48640 | Dataset: 0-19184901 | Loss: 2.213 | 675 ms/step , 58224.31 GFLOP/s , 531386.6 tokens/s INFO:__main__:2024-10-27 04:36:02 | Epoch: 1 | Step: 48650 | Dataset: 0-19192901 | Loss: 2.192 | 675 ms/step , 58258.75 GFLOP/s , 529669.7 tokens/s INFO:__main__:2024-10-27 04:36:09 | Epoch: 1 | Step: 48660 | Dataset: 0-19200901 | Loss: 2.168 | 675 ms/step , 58230.34 GFLOP/s , 532612.3 tokens/s INFO:__main__:2024-10-27 04:36:17 | Epoch: 1 | Step: 48670 | Dataset: 0-19208901 | Loss: 2.171 | 674 ms/step , 58307.30 GFLOP/s , 533059.2 tokens/s INFO:__main__:2024-10-27 04:36:25 | Epoch: 1 | Step: 48680 | Dataset: 0-19216901 | Loss: 2.803 | 674 ms/step , 58320.54 GFLOP/s , 533258.1 tokens/s INFO:__main__:2024-10-27 04:36:33 | Epoch: 1 | Step: 48690 | Dataset: 0-19224901 | Loss: 2.649 | 674 
ms/step , 58312.59 GFLOP/s , 533164.8 tokens/s INFO:__main__:2024-10-27 04:36:40 | Epoch: 1 | Step: 48700 | Dataset: 0-19232901 | Loss: 2.543 | 675 ms/step , 58264.57 GFLOP/s , 532836.8 tokens/s INFO:__main__:2024-10-27 04:36:48 | Epoch: 1 | Step: 48710 | Dataset: 0-19240901 | Loss: 2.581 | 674 ms/step , 58308.78 GFLOP/s , 532547.4 tokens/s INFO:__main__:2024-10-27 04:36:56 | Epoch: 1 | Step: 48720 | Dataset: 0-19248901 | Loss: 2.512 | 675 ms/step , 58258.76 GFLOP/s , 532902.4 tokens/s INFO:__main__:2024-10-27 04:37:03 | Epoch: 1 | Step: 48730 | Dataset: 0-19256901 | Loss: 2.562 | 674 ms/step , 58326.43 GFLOP/s , 532655.7 tokens/s INFO:__main__:2024-10-27 04:37:11 | Epoch: 1 | Step: 48740 | Dataset: 0-19264901 | Loss: 2.549 | 675 ms/step , 58277.95 GFLOP/s , 532807.3 tokens/s INFO:__main__:2024-10-27 04:37:19 | Epoch: 1 | Step: 48750 | Dataset: 0-19272901 | Loss: 2.422 | 675 ms/step , 58248.13 GFLOP/s , 532454.3 tokens/s INFO:__main__:2024-10-27 04:37:26 | Epoch: 1 | Step: 48760 | Dataset: 0-19280901 | Loss: 2.462 | 674 ms/step , 58319.11 GFLOP/s , 532945.0 tokens/s INFO:__main__:2024-10-27 04:37:34 | Epoch: 1 | Step: 48770 | Dataset: 0-19288901 | Loss: 2.496 | 674 ms/step , 58292.44 GFLOP/s , 532728.1 tokens/s INFO:__main__:2024-10-27 04:37:42 | Epoch: 1 | Step: 48780 | Dataset: 0-19296901 | Loss: 2.456 | 674 ms/step , 58300.21 GFLOP/s , 533627.9 tokens/s INFO:__main__:2024-10-27 04:37:49 | Epoch: 1 | Step: 48790 | Dataset: 0-19304901 | Loss: 2.441 | 676 ms/step , 58179.92 GFLOP/s , 533114.0 tokens/s INFO:__main__:2024-10-27 04:37:57 | Epoch: 1 | Step: 48800 | Dataset: 0-19312901 | Loss: 2.411 | 674 ms/step , 58339.60 GFLOP/s , 533318.2 tokens/s INFO:__main__:2024-10-27 04:38:05 | Epoch: 1 | Step: 48810 | Dataset: 0-19320901 | Loss: 2.448 | 674 ms/step , 58326.01 GFLOP/s , 533319.4 tokens/s INFO:__main__:2024-10-27 04:38:12 | Epoch: 1 | Step: 48820 | Dataset: 0-19328901 | Loss: 2.534 | 674 ms/step , 58303.69 GFLOP/s , 533530.0 tokens/s INFO:__main__:2024-10-27 
04:38:20 | Epoch: 1 | Step: 48830 | Dataset: 0-19336901 | Loss: 2.470 | 674 ms/step , 58294.73 GFLOP/s , 533319.4 tokens/s INFO:__main__:2024-10-27 04:38:28 | Epoch: 1 | Step: 48840 | Dataset: 0-19344901 | Loss: 2.416 | 674 ms/step , 58279.90 GFLOP/s , 533341.3 tokens/s INFO:__main__:2024-10-27 04:38:35 | Epoch: 1 | Step: 48850 | Dataset: 0-19352901 | Loss: 2.281 | 675 ms/step , 58271.38 GFLOP/s , 533791.3 tokens/s INFO:__main__:2024-10-27 04:38:43 | Epoch: 1 | Step: 48860 | Dataset: 0-19360901 | Loss: 2.207 | 675 ms/step , 58254.36 GFLOP/s , 533313.1 tokens/s INFO:__main__:2024-10-27 04:38:51 | Epoch: 1 | Step: 48870 | Dataset: 0-19368901 | Loss: 2.234 | 675 ms/step , 58253.63 GFLOP/s , 533232.1 tokens/s INFO:__main__:2024-10-27 04:38:59 | Epoch: 1 | Step: 48880 | Dataset: 0-19376901 | Loss: 2.129 | 674 ms/step , 58313.81 GFLOP/s , 533328.6 tokens/s INFO:__main__:2024-10-27 04:39:06 | Epoch: 1 | Step: 48890 | Dataset: 0-19384901 | Loss: 2.166 | 674 ms/step , 58362.73 GFLOP/s , 533611.6 tokens/s INFO:__main__:2024-10-27 04:39:14 | Epoch: 1 | Step: 48900 | Dataset: 0-19392901 | Loss: 2.258 | 675 ms/step , 58225.26 GFLOP/s , 533738.5 tokens/s INFO:__main__:2024-10-27 04:39:22 | Epoch: 1 | Step: 48910 | Dataset: 0-19400901 | Loss: 2.188 | 675 ms/step , 58193.67 GFLOP/s , 530524.2 tokens/s INFO:__main__:2024-10-27 04:39:29 | Epoch: 1 | Step: 48920 | Dataset: 0-19408901 | Loss: 2.205 | 674 ms/step , 58308.18 GFLOP/s , 533408.1 tokens/s INFO:__main__:2024-10-27 04:39:37 | Epoch: 1 | Step: 48930 | Dataset: 0-19416901 | Loss: 2.204 | 673 ms/step , 58366.95 GFLOP/s , 533401.2 tokens/s INFO:__main__:2024-10-27 04:39:45 | Epoch: 1 | Step: 48940 | Dataset: 0-19424901 | Loss: 2.214 | 674 ms/step , 58289.04 GFLOP/s , 533714.6 tokens/s INFO:__main__:2024-10-27 04:39:52 | Epoch: 1 | Step: 48950 | Dataset: 0-19432901 | Loss: 2.086 | 675 ms/step , 58246.17 GFLOP/s , 533288.7 tokens/s INFO:__main__:2024-10-27 04:40:00 | Epoch: 1 | Step: 48960 | Dataset: 0-19440901 | Loss: 2.224 | 675 
ms/step , 58255.00 GFLOP/s , 533274.9 tokens/s INFO:__main__:2024-10-27 04:40:08 | Epoch: 1 | Step: 48970 | Dataset: 0-19448901 | Loss: 2.189 | 675 ms/step , 58194.26 GFLOP/s , 532507.1 tokens/s INFO:__main__:2024-10-27 04:40:15 | Epoch: 1 | Step: 48980 | Dataset: 0-19456901 | Loss: 2.212 | 675 ms/step , 58261.18 GFLOP/s , 533123.2 tokens/s INFO:__main__:2024-10-27 04:40:23 | Epoch: 1 | Step: 48990 | Dataset: 0-19464901 | Loss: 2.187 | 675 ms/step , 58211.43 GFLOP/s , 533048.2 tokens/s INFO:__main__:2024-10-27 04:40:30 | Validation | Step: 49000 | Val_loss: 2.224 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 04:40:30 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_044030_step_49000.pt` INFO:__main__:2024-10-27 04:40:32 | Epoch: 1 | Step: 49000 | Dataset: 0-19472901 | Loss: 2.215 | 675 ms/step , 58245.14 GFLOP/s , 479595.4 tokens/s INFO:__main__:2024-10-27 04:40:39 | Epoch: 1 | Step: 49010 | Dataset: 0-19480901 | Loss: 2.170 | 675 ms/step , 58235.55 GFLOP/s , 532935.1 tokens/s INFO:__main__:2024-10-27 04:40:47 | Epoch: 1 | Step: 49020 | Dataset: 0-19488901 | Loss: 2.148 | 675 ms/step , 58229.08 GFLOP/s , 532750.9 tokens/s INFO:__main__:2024-10-27 04:40:55 | Epoch: 1 | Step: 49030 | Dataset: 0-19496901 | Loss: 2.065 | 675 ms/step , 58201.70 GFLOP/s , 532929.6 tokens/s INFO:__main__:2024-10-27 04:41:02 | Epoch: 1 | Step: 49040 | Dataset: 0-19504901 | Loss: 2.208 | 676 ms/step , 58172.13 GFLOP/s , 532687.3 tokens/s INFO:__main__:2024-10-27 04:41:10 | Epoch: 1 | Step: 49050 | Dataset: 0-19512901 | Loss: 2.209 | 675 ms/step , 58202.60 GFLOP/s , 533039.0 tokens/s INFO:__main__:2024-10-27 04:41:18 | Epoch: 1 | Step: 49060 | Dataset: 0-19520901 | Loss: 2.132 | 676 ms/step , 58157.32 GFLOP/s , 532150.2 tokens/s INFO:__main__:2024-10-27 04:41:25 | Epoch: 1 | Step: 49070 | Dataset: 0-19528901 | Loss: 2.122 | 676 ms/step , 58171.36 GFLOP/s , 532815.5 tokens/s INFO:__main__:2024-10-27 04:41:33 | Epoch: 1 | Step: 49080 | Dataset: 0-19536901 | Loss: 
2.125 | 675 ms/step , 58216.26 GFLOP/s , 532946.3 tokens/s INFO:__main__:2024-10-27 04:41:41 | Epoch: 1 | Step: 49090 | Dataset: 0-19544901 | Loss: 2.109 | 676 ms/step , 58117.35 GFLOP/s , 532388.2 tokens/s INFO:__main__:2024-10-27 04:41:48 | Epoch: 1 | Step: 49100 | Dataset: 0-19552901 | Loss: 2.164 | 677 ms/step , 58080.13 GFLOP/s , 532428.7 tokens/s INFO:__main__:2024-10-27 04:41:56 | Epoch: 1 | Step: 49110 | Dataset: 0-19560901 | Loss: 2.175 | 676 ms/step , 58185.96 GFLOP/s , 532599.9 tokens/s INFO:__main__:2024-10-27 04:42:04 | Epoch: 1 | Step: 49120 | Dataset: 0-19568901 | Loss: 2.157 | 675 ms/step , 58231.64 GFLOP/s , 532407.2 tokens/s INFO:__main__:2024-10-27 04:42:12 | Epoch: 1 | Step: 49130 | Dataset: 0-19576901 | Loss: 2.214 | 675 ms/step , 58223.51 GFLOP/s , 532366.9 tokens/s INFO:__main__:2024-10-27 04:42:19 | Epoch: 1 | Step: 49140 | Dataset: 0-19584901 | Loss: 2.100 | 675 ms/step , 58259.65 GFLOP/s , 532656.4 tokens/s INFO:__main__:2024-10-27 04:42:27 | Epoch: 1 | Step: 49150 | Dataset: 0-19592901 | Loss: 2.121 | 675 ms/step , 58227.86 GFLOP/s , 532609.8 tokens/s INFO:__main__:2024-10-27 04:42:35 | Epoch: 1 | Step: 49160 | Dataset: 0-19600901 | Loss: 2.202 | 674 ms/step , 58294.60 GFLOP/s , 532997.3 tokens/s INFO:__main__:2024-10-27 04:42:42 | Epoch: 1 | Step: 49170 | Dataset: 0-19608901 | Loss: 2.253 | 676 ms/step , 58141.64 GFLOP/s , 532787.1 tokens/s INFO:__main__:2024-10-27 04:42:50 | Epoch: 1 | Step: 49180 | Dataset: 0-19616901 | Loss: 2.191 | 675 ms/step , 58264.57 GFLOP/s , 532747.8 tokens/s INFO:__main__:2024-10-27 04:42:58 | Epoch: 1 | Step: 49190 | Dataset: 0-19624901 | Loss: 2.202 | 675 ms/step , 58213.23 GFLOP/s , 532497.8 tokens/s INFO:__main__:2024-10-27 04:43:05 | Epoch: 1 | Step: 49200 | Dataset: 0-19632901 | Loss: 2.219 | 675 ms/step , 58255.80 GFLOP/s , 532861.6 tokens/s INFO:__main__:2024-10-27 04:43:13 | Epoch: 1 | Step: 49210 | Dataset: 0-19640901 | Loss: 2.224 | 676 ms/step , 58174.14 GFLOP/s , 532325.8 tokens/s 
INFO:__main__:2024-10-27 04:43:21 | Epoch: 1 | Step: 49220 | Dataset: 0-19648901 | Loss: 2.182 | 675 ms/step , 58272.67 GFLOP/s , 532943.8 tokens/s INFO:__main__:2024-10-27 04:43:28 | Epoch: 1 | Step: 49230 | Dataset: 0-19656901 | Loss: 2.115 | 676 ms/step , 58185.22 GFLOP/s , 532713.6 tokens/s INFO:__main__:2024-10-27 04:43:36 | Epoch: 1 | Step: 49240 | Dataset: 0-19664901 | Loss: 2.144 | 675 ms/step , 58219.01 GFLOP/s , 532761.9 tokens/s INFO:__main__:2024-10-27 04:43:44 | Epoch: 1 | Step: 49250 | Dataset: 0-19672901 | Loss: 2.219 | 675 ms/step , 58265.18 GFLOP/s , 532983.9 tokens/s INFO:__main__:2024-10-27 04:43:51 | Epoch: 1 | Step: 49260 | Dataset: 0-19680901 | Loss: 2.183 | 675 ms/step , 58272.13 GFLOP/s , 532890.3 tokens/s INFO:__main__:2024-10-27 04:43:59 | Epoch: 1 | Step: 49270 | Dataset: 0-19688901 | Loss: 2.186 | 675 ms/step , 58201.99 GFLOP/s , 532687.5 tokens/s INFO:__main__:2024-10-27 04:44:07 | Epoch: 1 | Step: 49280 | Dataset: 0-19696901 | Loss: 2.173 | 675 ms/step , 58213.33 GFLOP/s , 532672.2 tokens/s INFO:__main__:2024-10-27 04:44:15 | Epoch: 1 | Step: 49290 | Dataset: 0-19704901 | Loss: 2.187 | 673 ms/step , 58372.15 GFLOP/s , 533160.7 tokens/s INFO:__main__:2024-10-27 04:44:22 | Epoch: 1 | Step: 49300 | Dataset: 0-19712901 | Loss: 2.210 | 675 ms/step , 58259.30 GFLOP/s , 533289.5 tokens/s INFO:__main__:2024-10-27 04:44:30 | Epoch: 1 | Step: 49310 | Dataset: 0-19720901 | Loss: 2.158 | 673 ms/step , 58377.06 GFLOP/s , 533759.1 tokens/s INFO:__main__:2024-10-27 04:44:38 | Epoch: 2 | Step: 49320 | Dataset: 0-802 | Loss: 2.152 | 675 ms/step , 58212.23 GFLOP/s , 533168.4 tokens/s INFO:__main__:2024-10-27 04:44:45 | Epoch: 2 | Step: 49330 | Dataset: 0-8802 | Loss: 1.992 | 676 ms/step , 58138.46 GFLOP/s , 531838.7 tokens/s INFO:__main__:2024-10-27 04:44:53 | Epoch: 2 | Step: 49340 | Dataset: 0-16802 | Loss: 1.886 | 675 ms/step , 58217.08 GFLOP/s , 531595.2 tokens/s INFO:__main__:2024-10-27 04:45:01 | Epoch: 2 | Step: 49350 | Dataset: 0-24802 | Loss: 
1.860 | 675 ms/step , 58197.65 GFLOP/s , 531515.9 tokens/s INFO:__main__:2024-10-27 04:45:08 | Epoch: 2 | Step: 49360 | Dataset: 0-32802 | Loss: 1.827 | 676 ms/step , 58181.14 GFLOP/s , 531166.5 tokens/s INFO:__main__:2024-10-27 04:45:16 | Epoch: 2 | Step: 49370 | Dataset: 0-40802 | Loss: 1.788 | 674 ms/step , 58350.53 GFLOP/s , 532003.6 tokens/s INFO:__main__:2024-10-27 04:45:24 | Epoch: 2 | Step: 49380 | Dataset: 0-48802 | Loss: 1.786 | 675 ms/step , 58278.20 GFLOP/s , 532213.0 tokens/s INFO:__main__:2024-10-27 04:45:32 | Epoch: 2 | Step: 49390 | Dataset: 0-56802 | Loss: 1.804 | 675 ms/step , 58218.56 GFLOP/s , 531529.4 tokens/s INFO:__main__:2024-10-27 04:45:39 | Epoch: 2 | Step: 49400 | Dataset: 0-64802 | Loss: 1.817 | 676 ms/step , 58190.31 GFLOP/s , 531992.1 tokens/s INFO:__main__:2024-10-27 04:45:47 | Epoch: 2 | Step: 49410 | Dataset: 0-72802 | Loss: 1.800 | 676 ms/step , 58150.64 GFLOP/s , 532160.1 tokens/s INFO:__main__:2024-10-27 04:45:55 | Epoch: 2 | Step: 49420 | Dataset: 0-80802 | Loss: 2.170 | 676 ms/step , 58164.23 GFLOP/s , 531328.5 tokens/s INFO:__main__:2024-10-27 04:46:02 | Epoch: 2 | Step: 49430 | Dataset: 0-88802 | Loss: 2.135 | 674 ms/step , 58331.81 GFLOP/s , 531903.6 tokens/s INFO:__main__:2024-10-27 04:46:10 | Epoch: 2 | Step: 49440 | Dataset: 0-96802 | Loss: 2.243 | 674 ms/step , 58347.30 GFLOP/s , 533725.5 tokens/s INFO:__main__:2024-10-27 04:46:18 | Epoch: 2 | Step: 49450 | Dataset: 0-104802 | Loss: 2.125 | 675 ms/step , 58260.09 GFLOP/s , 533357.2 tokens/s INFO:__main__:2024-10-27 04:46:25 | Epoch: 2 | Step: 49460 | Dataset: 0-112802 | Loss: 2.239 | 676 ms/step , 58150.42 GFLOP/s , 531973.1 tokens/s INFO:__main__:2024-10-27 04:46:33 | Epoch: 2 | Step: 49470 | Dataset: 0-120802 | Loss: 2.045 | 673 ms/step , 58388.86 GFLOP/s , 533268.6 tokens/s INFO:__main__:2024-10-27 04:46:41 | Epoch: 2 | Step: 49480 | Dataset: 0-128802 | Loss: 2.250 | 674 ms/step , 58298.74 GFLOP/s , 533620.6 tokens/s INFO:__main__:2024-10-27 04:46:48 | Epoch: 2 | 
Step: 49490 | Dataset: 0-136802 | Loss: 2.100 | 674 ms/step , 58282.64 GFLOP/s , 533127.5 tokens/s INFO:__main__:2024-10-27 04:46:56 | Epoch: 2 | Step: 49500 | Dataset: 0-144802 | Loss: 2.058 | 674 ms/step , 58320.15 GFLOP/s , 533269.3 tokens/s INFO:__main__:2024-10-27 04:47:04 | Epoch: 2 | Step: 49510 | Dataset: 0-152802 | Loss: 2.154 | 674 ms/step , 58343.68 GFLOP/s , 533098.4 tokens/s INFO:__main__:2024-10-27 04:47:11 | Epoch: 2 | Step: 49520 | Dataset: 0-160802 | Loss: 2.160 | 675 ms/step , 58252.68 GFLOP/s , 532989.7 tokens/s INFO:__main__:2024-10-27 04:47:19 | Epoch: 2 | Step: 49530 | Dataset: 0-168802 | Loss: 2.129 | 675 ms/step , 58259.64 GFLOP/s , 532971.2 tokens/s INFO:__main__:2024-10-27 04:47:27 | Epoch: 2 | Step: 49540 | Dataset: 0-176802 | Loss: 2.126 | 674 ms/step , 58335.22 GFLOP/s , 533131.5 tokens/s INFO:__main__:2024-10-27 04:47:35 | Epoch: 2 | Step: 49550 | Dataset: 0-184802 | Loss: 2.149 | 674 ms/step , 58296.24 GFLOP/s , 533309.5 tokens/s INFO:__main__:2024-10-27 04:47:42 | Epoch: 2 | Step: 49560 | Dataset: 0-192802 | Loss: 2.171 | 674 ms/step , 58328.74 GFLOP/s , 532855.0 tokens/s INFO:__main__:2024-10-27 04:47:50 | Epoch: 2 | Step: 49570 | Dataset: 0-200802 | Loss: 2.132 | 674 ms/step , 58312.42 GFLOP/s , 533201.7 tokens/s INFO:__main__:2024-10-27 04:47:58 | Epoch: 2 | Step: 49580 | Dataset: 0-208802 | Loss: 2.211 | 674 ms/step , 58346.52 GFLOP/s , 533560.6 tokens/s INFO:__main__:2024-10-27 04:48:05 | Epoch: 2 | Step: 49590 | Dataset: 0-216802 | Loss: 2.206 | 675 ms/step , 58236.87 GFLOP/s , 532982.0 tokens/s INFO:__main__:2024-10-27 04:48:13 | Epoch: 2 | Step: 49600 | Dataset: 0-224802 | Loss: 2.231 | 674 ms/step , 58285.46 GFLOP/s , 533082.6 tokens/s INFO:__main__:2024-10-27 04:48:21 | Epoch: 2 | Step: 49610 | Dataset: 0-232802 | Loss: 2.205 | 676 ms/step , 58156.94 GFLOP/s , 532532.9 tokens/s INFO:__main__:2024-10-27 04:48:28 | Epoch: 2 | Step: 49620 | Dataset: 0-240802 | Loss: 2.199 | 675 ms/step , 58271.42 GFLOP/s , 533210.8 tokens/s 
INFO:__main__:2024-10-27 04:48:36 | Epoch: 2 | Step: 49630 | Dataset: 0-248802 | Loss: 2.236 | 675 ms/step , 58250.05 GFLOP/s , 532754.5 tokens/s INFO:__main__:2024-10-27 04:48:44 | Epoch: 2 | Step: 49640 | Dataset: 0-256802 | Loss: 2.245 | 675 ms/step , 58237.47 GFLOP/s , 532883.0 tokens/s INFO:__main__:2024-10-27 04:48:51 | Epoch: 2 | Step: 49650 | Dataset: 0-264802 | Loss: 2.211 | 675 ms/step , 58269.48 GFLOP/s , 533079.6 tokens/s INFO:__main__:2024-10-27 04:48:59 | Epoch: 2 | Step: 49660 | Dataset: 0-272802 | Loss: 2.163 | 673 ms/step , 58381.86 GFLOP/s , 533372.3 tokens/s INFO:__main__:2024-10-27 04:49:07 | Epoch: 2 | Step: 49670 | Dataset: 0-280802 | Loss: 2.203 | 675 ms/step , 58274.77 GFLOP/s , 533386.3 tokens/s INFO:__main__:2024-10-27 04:49:14 | Epoch: 2 | Step: 49680 | Dataset: 0-288802 | Loss: 2.235 | 677 ms/step , 58053.37 GFLOP/s , 531455.8 tokens/s INFO:__main__:2024-10-27 04:49:22 | Epoch: 2 | Step: 49690 | Dataset: 0-296802 | Loss: 2.234 | 675 ms/step , 58232.43 GFLOP/s , 532162.7 tokens/s INFO:__main__:2024-10-27 04:49:30 | Epoch: 2 | Step: 49700 | Dataset: 0-304802 | Loss: 2.220 | 675 ms/step , 58211.90 GFLOP/s , 532686.2 tokens/s INFO:__main__:2024-10-27 04:49:37 | Epoch: 2 | Step: 49710 | Dataset: 0-312802 | Loss: 2.195 | 675 ms/step , 58223.10 GFLOP/s , 532641.8 tokens/s INFO:__main__:2024-10-27 04:49:45 | Epoch: 2 | Step: 49720 | Dataset: 0-320802 | Loss: 2.160 | 675 ms/step , 58236.79 GFLOP/s , 532935.8 tokens/s INFO:__main__:2024-10-27 04:49:53 | Epoch: 2 | Step: 49730 | Dataset: 0-328802 | Loss: 2.177 | 675 ms/step , 58237.35 GFLOP/s , 532826.0 tokens/s INFO:__main__:2024-10-27 04:50:01 | Epoch: 2 | Step: 49740 | Dataset: 0-336802 | Loss: 1.900 | 674 ms/step , 58303.83 GFLOP/s , 532333.4 tokens/s INFO:__main__:2024-10-27 04:50:08 | Epoch: 2 | Step: 49750 | Dataset: 0-344802 | Loss: 1.816 | 675 ms/step , 58273.84 GFLOP/s , 531993.6 tokens/s INFO:__main__:2024-10-27 04:50:16 | Epoch: 2 | Step: 49760 | Dataset: 0-352802 | Loss: 1.824 | 675 
ms/step , 58195.14 GFLOP/s , 532260.1 tokens/s INFO:__main__:2024-10-27 04:50:24 | Epoch: 2 | Step: 49770 | Dataset: 0-360802 | Loss: 1.815 | 675 ms/step , 58247.28 GFLOP/s , 532480.1 tokens/s INFO:__main__:2024-10-27 04:50:31 | Epoch: 2 | Step: 49780 | Dataset: 0-368802 | Loss: 1.798 | 675 ms/step , 58241.80 GFLOP/s , 532555.0 tokens/s INFO:__main__:2024-10-27 04:50:39 | Epoch: 2 | Step: 49790 | Dataset: 0-376802 | Loss: 1.777 | 677 ms/step , 58104.34 GFLOP/s , 531021.3 tokens/s INFO:__main__:2024-10-27 04:50:47 | Epoch: 2 | Step: 49800 | Dataset: 0-384802 | Loss: 1.784 | 678 ms/step , 57978.69 GFLOP/s , 531044.0 tokens/s INFO:__main__:2024-10-27 04:50:54 | Epoch: 2 | Step: 49810 | Dataset: 0-392802 | Loss: 1.790 | 677 ms/step , 58103.22 GFLOP/s , 530907.3 tokens/s INFO:__main__:2024-10-27 04:51:02 | Epoch: 2 | Step: 49820 | Dataset: 0-400802 | Loss: 1.766 | 674 ms/step , 58293.71 GFLOP/s , 531752.7 tokens/s INFO:__main__:2024-10-27 04:51:10 | Epoch: 2 | Step: 49830 | Dataset: 0-408802 | Loss: 2.235 | 676 ms/step , 58189.65 GFLOP/s , 533119.1 tokens/s INFO:__main__:2024-10-27 04:51:18 | Epoch: 2 | Step: 49840 | Dataset: 0-416802 | Loss: 2.168 | 675 ms/step , 58246.67 GFLOP/s , 532533.6 tokens/s INFO:__main__:2024-10-27 04:51:25 | Epoch: 2 | Step: 49850 | Dataset: 0-424802 | Loss: 2.167 | 674 ms/step , 58356.15 GFLOP/s , 533510.2 tokens/s INFO:__main__:2024-10-27 04:51:33 | Epoch: 2 | Step: 49860 | Dataset: 0-432802 | Loss: 2.183 | 675 ms/step , 58213.12 GFLOP/s , 532938.4 tokens/s INFO:__main__:2024-10-27 04:51:41 | Epoch: 2 | Step: 49870 | Dataset: 0-440802 | Loss: 2.182 | 676 ms/step , 58184.21 GFLOP/s , 532825.1 tokens/s INFO:__main__:2024-10-27 04:51:48 | Epoch: 2 | Step: 49880 | Dataset: 0-448802 | Loss: 2.136 | 675 ms/step , 58222.38 GFLOP/s , 532928.2 tokens/s INFO:__main__:2024-10-27 04:51:56 | Epoch: 2 | Step: 49890 | Dataset: 0-456802 | Loss: 2.150 | 675 ms/step , 58248.07 GFLOP/s , 532615.0 tokens/s INFO:__main__:2024-10-27 04:52:04 | Epoch: 2 | Step: 
49900 | Dataset: 0-464802 | Loss: 2.187 | 676 ms/step , 58161.21 GFLOP/s , 532077.3 tokens/s INFO:__main__:2024-10-27 04:52:11 | Epoch: 2 | Step: 49910 | Dataset: 0-472802 | Loss: 2.166 | 676 ms/step , 58188.76 GFLOP/s , 532588.0 tokens/s INFO:__main__:2024-10-27 04:52:19 | Epoch: 2 | Step: 49920 | Dataset: 0-480802 | Loss: 2.258 | 676 ms/step , 58166.97 GFLOP/s , 532744.5 tokens/s INFO:__main__:2024-10-27 04:52:27 | Epoch: 2 | Step: 49930 | Dataset: 0-488802 | Loss: 2.142 | 675 ms/step , 58257.91 GFLOP/s , 532627.4 tokens/s INFO:__main__:2024-10-27 04:52:34 | Epoch: 2 | Step: 49940 | Dataset: 0-496802 | Loss: 2.204 | 675 ms/step , 58258.28 GFLOP/s , 532656.0 tokens/s INFO:__main__:2024-10-27 04:52:42 | Epoch: 2 | Step: 49950 | Dataset: 0-504802 | Loss: 2.157 | 675 ms/step , 58235.46 GFLOP/s , 532868.4 tokens/s INFO:__main__:2024-10-27 04:52:50 | Epoch: 2 | Step: 49960 | Dataset: 0-512802 | Loss: 2.198 | 675 ms/step , 58243.02 GFLOP/s , 533017.2 tokens/s INFO:__main__:2024-10-27 04:52:58 | Epoch: 2 | Step: 49970 | Dataset: 0-520802 | Loss: 2.108 | 675 ms/step , 58252.20 GFLOP/s , 533020.9 tokens/s INFO:__main__:2024-10-27 04:53:05 | Epoch: 2 | Step: 49980 | Dataset: 0-528802 | Loss: 2.164 | 675 ms/step , 58248.27 GFLOP/s , 533122.3 tokens/s INFO:__main__:2024-10-27 04:53:13 | Epoch: 2 | Step: 49990 | Dataset: 0-536802 | Loss: 2.135 | 678 ms/step , 58013.18 GFLOP/s , 531509.3 tokens/s INFO:__main__:2024-10-27 04:53:20 | Validation | Step: 50000 | Val_loss: 2.000 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 04:53:20 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_045320_step_50000.pt` INFO:__main__:2024-10-27 04:53:21 | Epoch: 2 | Step: 50000 | Dataset: 0-544802 | Loss: 2.116 | 675 ms/step , 58231.01 GFLOP/s , 477668.6 tokens/s INFO:__main__:2024-10-27 04:53:29 | Epoch: 2 | Step: 50010 | Dataset: 0-552802 | Loss: 2.089 | 677 ms/step , 58092.55 GFLOP/s , 531046.9 tokens/s INFO:__main__:2024-10-27 04:53:37 | Epoch: 2 | Step: 50020 | Dataset: 
0-560802 | Loss: 2.089 | 677 ms/step , 58066.87 GFLOP/s , 531001.3 tokens/s INFO:__main__:2024-10-27 04:53:45 | Epoch: 2 | Step: 50030 | Dataset: 0-568802 | Loss: 2.089 | 676 ms/step , 58170.29 GFLOP/s , 531137.0 tokens/s INFO:__main__:2024-10-27 04:53:52 | Epoch: 2 | Step: 50040 | Dataset: 0-576802 | Loss: 2.044 | 676 ms/step , 58143.88 GFLOP/s , 531476.2 tokens/s INFO:__main__:2024-10-27 04:54:00 | Epoch: 2 | Step: 50050 | Dataset: 0-584802 | Loss: 2.071 | 675 ms/step , 58260.42 GFLOP/s , 530906.1 tokens/s INFO:__main__:2024-10-27 04:54:08 | Epoch: 2 | Step: 50060 | Dataset: 0-592802 | Loss: 2.127 | 675 ms/step , 58224.64 GFLOP/s , 529292.2 tokens/s INFO:__main__:2024-10-27 04:54:15 | Epoch: 2 | Step: 50070 | Dataset: 0-600802 | Loss: 1.999 | 675 ms/step , 58232.54 GFLOP/s , 531472.8 tokens/s INFO:__main__:2024-10-27 04:54:23 | Epoch: 2 | Step: 50080 | Dataset: 0-608802 | Loss: 2.122 | 675 ms/step , 58193.30 GFLOP/s , 532422.3 tokens/s INFO:__main__:2024-10-27 04:54:31 | Epoch: 2 | Step: 50090 | Dataset: 0-616802 | Loss: 2.071 | 674 ms/step , 58335.00 GFLOP/s , 532993.5 tokens/s INFO:__main__:2024-10-27 04:54:39 | Epoch: 2 | Step: 50100 | Dataset: 0-624802 | Loss: 2.128 | 677 ms/step , 58031.35 GFLOP/s , 532168.7 tokens/s INFO:__main__:2024-10-27 04:54:46 | Epoch: 2 | Step: 50110 | Dataset: 0-632802 | Loss: 2.092 | 677 ms/step , 58039.68 GFLOP/s , 531424.2 tokens/s INFO:__main__:2024-10-27 04:54:54 | Epoch: 2 | Step: 50120 | Dataset: 0-640802 | Loss: 2.107 | 674 ms/step , 58308.85 GFLOP/s , 533241.6 tokens/s INFO:__main__:2024-10-27 04:55:02 | Epoch: 2 | Step: 50130 | Dataset: 0-648802 | Loss: 2.155 | 675 ms/step , 58238.65 GFLOP/s , 533300.1 tokens/s INFO:__main__:2024-10-27 04:55:09 | Epoch: 2 | Step: 50140 | Dataset: 0-656802 | Loss: 2.071 | 675 ms/step , 58261.02 GFLOP/s , 533168.0 tokens/s INFO:__main__:2024-10-27 04:55:17 | Epoch: 2 | Step: 50150 | Dataset: 0-664802 | Loss: 2.195 | 674 ms/step , 58325.74 GFLOP/s , 533473.3 tokens/s INFO:__main__:2024-10-27 
04:55:25 | Epoch: 2 | Step: 50160 | Dataset: 0-672802 | Loss: 2.049 | 674 ms/step , 58312.76 GFLOP/s , 533389.4 tokens/s INFO:__main__:2024-10-27 04:55:32 | Epoch: 2 | Step: 50170 | Dataset: 0-680802 | Loss: 2.113 | 674 ms/step , 58310.69 GFLOP/s , 533511.2 tokens/s INFO:__main__:2024-10-27 04:55:40 | Epoch: 2 | Step: 50180 | Dataset: 0-688802 | Loss: 2.105 | 676 ms/step , 58147.66 GFLOP/s , 532825.0 tokens/s INFO:__main__:2024-10-27 04:55:48 | Epoch: 2 | Step: 50190 | Dataset: 0-696802 | Loss: 2.207 | 675 ms/step , 58243.98 GFLOP/s , 532934.2 tokens/s INFO:__main__:2024-10-27 04:55:55 | Epoch: 2 | Step: 50200 | Dataset: 0-704802 | Loss: 2.139 | 675 ms/step , 58249.65 GFLOP/s , 532938.5 tokens/s INFO:__main__:2024-10-27 04:56:03 | Epoch: 2 | Step: 50210 | Dataset: 0-712802 | Loss: 2.092 | 675 ms/step , 58237.08 GFLOP/s , 532728.1 tokens/s INFO:__main__:2024-10-27 04:56:11 | Epoch: 2 | Step: 50220 | Dataset: 0-720802 | Loss: 2.044 | 674 ms/step , 58290.49 GFLOP/s , 532864.9 tokens/s INFO:__main__:2024-10-27 04:56:18 | Epoch: 2 | Step: 50230 | Dataset: 0-728802 | Loss: 2.156 | 675 ms/step , 58263.24 GFLOP/s , 532789.3 tokens/s INFO:__main__:2024-10-27 04:56:26 | Epoch: 2 | Step: 50240 | Dataset: 0-736802 | Loss: 2.002 | 675 ms/step , 58212.17 GFLOP/s , 533573.7 tokens/s INFO:__main__:2024-10-27 04:56:34 | Epoch: 2 | Step: 50250 | Dataset: 0-744802 | Loss: 2.162 | 674 ms/step , 58291.04 GFLOP/s , 533624.0 tokens/s INFO:__main__:2024-10-27 04:56:41 | Epoch: 2 | Step: 50260 | Dataset: 0-752802 | Loss: 2.127 | 674 ms/step , 58332.42 GFLOP/s , 533190.3 tokens/s INFO:__main__:2024-10-27 04:56:49 | Epoch: 2 | Step: 50270 | Dataset: 0-760802 | Loss: 2.140 | 674 ms/step , 58351.61 GFLOP/s , 533822.1 tokens/s INFO:__main__:2024-10-27 04:56:57 | Epoch: 2 | Step: 50280 | Dataset: 0-768802 | Loss: 2.123 | 674 ms/step , 58353.35 GFLOP/s , 533889.3 tokens/s INFO:__main__:2024-10-27 04:57:05 | Epoch: 2 | Step: 50290 | Dataset: 0-776802 | Loss: 2.106 | 677 ms/step , 58039.61 GFLOP/s 
, 533262.8 tokens/s INFO:__main__:2024-10-27 04:57:12 | Epoch: 2 | Step: 50300 | Dataset: 0-784802 | Loss: 2.128 | 677 ms/step , 58067.07 GFLOP/s , 531177.6 tokens/s INFO:__main__:2024-10-27 04:57:20 | Epoch: 2 | Step: 50310 | Dataset: 0-792802 | Loss: 2.180 | 677 ms/step , 58075.81 GFLOP/s , 532580.3 tokens/s INFO:__main__:2024-10-27 04:57:28 | Epoch: 2 | Step: 50320 | Dataset: 0-800802 | Loss: 2.229 | 675 ms/step , 58262.88 GFLOP/s , 532073.2 tokens/s INFO:__main__:2024-10-27 04:57:35 | Epoch: 2 | Step: 50330 | Dataset: 0-808802 | Loss: 2.234 | 675 ms/step , 58267.74 GFLOP/s , 533303.5 tokens/s INFO:__main__:2024-10-27 04:57:43 | Epoch: 2 | Step: 50340 | Dataset: 0-816802 | Loss: 2.084 | 674 ms/step , 58315.35 GFLOP/s , 532810.9 tokens/s INFO:__main__:2024-10-27 04:57:51 | Epoch: 2 | Step: 50350 | Dataset: 0-824802 | Loss: 2.127 | 674 ms/step , 58315.53 GFLOP/s , 532835.9 tokens/s INFO:__main__:2024-10-27 04:57:58 | Epoch: 2 | Step: 50360 | Dataset: 0-832802 | Loss: 2.042 | 676 ms/step , 58128.25 GFLOP/s , 533153.1 tokens/s INFO:__main__:2024-10-27 04:58:06 | Epoch: 2 | Step: 50370 | Dataset: 0-840802 | Loss: 2.023 | 675 ms/step , 58274.17 GFLOP/s , 533095.2 tokens/s INFO:__main__:2024-10-27 04:58:14 | Epoch: 2 | Step: 50380 | Dataset: 0-848802 | Loss: 2.177 | 674 ms/step , 58338.64 GFLOP/s , 533067.3 tokens/s INFO:__main__:2024-10-27 04:58:21 | Epoch: 2 | Step: 50390 | Dataset: 0-856802 | Loss: 2.092 | 676 ms/step , 58168.61 GFLOP/s , 532517.3 tokens/s INFO:__main__:2024-10-27 04:58:29 | Epoch: 2 | Step: 50400 | Dataset: 0-864802 | Loss: 2.160 | 675 ms/step , 58277.46 GFLOP/s , 532543.6 tokens/s INFO:__main__:2024-10-27 04:58:37 | Epoch: 2 | Step: 50410 | Dataset: 0-872802 | Loss: 2.037 | 674 ms/step , 58348.86 GFLOP/s , 532737.9 tokens/s INFO:__main__:2024-10-27 04:58:44 | Epoch: 2 | Step: 50420 | Dataset: 0-880802 | Loss: 2.123 | 675 ms/step , 58250.97 GFLOP/s , 533429.7 tokens/s INFO:__main__:2024-10-27 04:58:52 | Epoch: 2 | Step: 50430 | Dataset: 0-888802 | 
Loss: 2.134 | 675 ms/step , 58236.45 GFLOP/s , 532771.6 tokens/s INFO:__main__:2024-10-27 04:59:00 | Epoch: 2 | Step: 50440 | Dataset: 0-896802 | Loss: 2.117 | 675 ms/step , 58272.71 GFLOP/s , 532293.2 tokens/s INFO:__main__:2024-10-27 04:59:08 | Epoch: 2 | Step: 50450 | Dataset: 0-904802 | Loss: 2.030 | 675 ms/step , 58200.25 GFLOP/s , 532644.7 tokens/s INFO:__main__:2024-10-27 04:59:15 | Epoch: 2 | Step: 50460 | Dataset: 0-912802 | Loss: 2.027 | 677 ms/step , 58093.25 GFLOP/s , 532026.8 tokens/s INFO:__main__:2024-10-27 04:59:23 | Epoch: 2 | Step: 50470 | Dataset: 0-920802 | Loss: 2.145 | 675 ms/step , 58269.60 GFLOP/s , 532167.0 tokens/s INFO:__main__:2024-10-27 04:59:31 | Epoch: 2 | Step: 50480 | Dataset: 0-928802 | Loss: 2.207 | 674 ms/step , 58287.62 GFLOP/s , 531873.2 tokens/s INFO:__main__:2024-10-27 04:59:38 | Epoch: 2 | Step: 50490 | Dataset: 0-936802 | Loss: 2.148 | 674 ms/step , 58340.58 GFLOP/s , 531446.3 tokens/s INFO:__main__:2024-10-27 04:59:46 | Epoch: 2 | Step: 50500 | Dataset: 0-944802 | Loss: 2.082 | 674 ms/step , 58280.31 GFLOP/s , 532703.5 tokens/s INFO:__main__:2024-10-27 04:59:54 | Epoch: 2 | Step: 50510 | Dataset: 0-952802 | Loss: 2.195 | 675 ms/step , 58210.22 GFLOP/s , 531849.8 tokens/s INFO:__main__:2024-10-27 05:00:01 | Epoch: 2 | Step: 50520 | Dataset: 0-960802 | Loss: 2.170 | 675 ms/step , 58202.46 GFLOP/s , 532348.5 tokens/s INFO:__main__:2024-10-27 05:00:08 | Epoch: 2 | Step: 50530 | Dataset: 0-968802 | Loss: 2.090 | 675 ms/step , 58266.26 GFLOP/s , 606884.8 tokens/s INFO:__main__:2024-10-27 05:00:16 | Epoch: 2 | Step: 50540 | Dataset: 0-976802 | Loss: 2.115 | 675 ms/step , 58228.88 GFLOP/s , 531799.5 tokens/s INFO:__main__:2024-10-27 05:00:24 | Epoch: 2 | Step: 50550 | Dataset: 0-984802 | Loss: 2.193 | 676 ms/step , 58144.54 GFLOP/s , 530105.5 tokens/s INFO:__main__:2024-10-27 05:00:31 | Epoch: 2 | Step: 50560 | Dataset: 0-992802 | Loss: 2.111 | 683 ms/step , 57539.69 GFLOP/s , 532059.1 tokens/s INFO:__main__:2024-10-27 05:00:39 | 
Epoch: 2 | Step: 50570 | Dataset: 0-1000802 | Loss: 2.117 | 683 ms/step , 57569.19 GFLOP/s , 526716.3 tokens/s INFO:__main__:2024-10-27 05:00:47 | Epoch: 2 | Step: 50580 | Dataset: 0-1008802 | Loss: 2.210 | 684 ms/step , 57465.00 GFLOP/s , 526262.7 tokens/s INFO:__main__:2024-10-27 05:00:55 | Epoch: 2 | Step: 50590 | Dataset: 0-1016802 | Loss: 2.137 | 683 ms/step , 57557.16 GFLOP/s , 526035.9 tokens/s INFO:__main__:2024-10-27 05:01:02 | Epoch: 2 | Step: 50600 | Dataset: 0-1024802 | Loss: 2.094 | 683 ms/step , 57518.16 GFLOP/s , 526578.0 tokens/s INFO:__main__:2024-10-27 05:01:10 | Epoch: 2 | Step: 50610 | Dataset: 0-1032802 | Loss: 2.135 | 684 ms/step , 57504.22 GFLOP/s , 525656.8 tokens/s INFO:__main__:2024-10-27 05:01:18 | Epoch: 2 | Step: 50620 | Dataset: 0-1040802 | Loss: 2.097 | 675 ms/step , 58208.08 GFLOP/s , 529013.7 tokens/s INFO:__main__:2024-10-27 05:01:26 | Epoch: 2 | Step: 50630 | Dataset: 0-1048802 | Loss: 2.117 | 675 ms/step , 58232.49 GFLOP/s , 532396.8 tokens/s INFO:__main__:2024-10-27 05:01:33 | Epoch: 2 | Step: 50640 | Dataset: 0-1056802 | Loss: 1.934 | 681 ms/step , 57758.94 GFLOP/s , 530298.6 tokens/s INFO:__main__:2024-10-27 05:01:41 | Epoch: 2 | Step: 50650 | Dataset: 0-1064802 | Loss: 1.874 | 681 ms/step , 57761.15 GFLOP/s , 528540.8 tokens/s INFO:__main__:2024-10-27 05:01:49 | Epoch: 2 | Step: 50660 | Dataset: 0-1072802 | Loss: 1.811 | 679 ms/step , 57915.46 GFLOP/s , 528733.3 tokens/s INFO:__main__:2024-10-27 05:01:57 | Epoch: 2 | Step: 50670 | Dataset: 0-1080802 | Loss: 1.833 | 679 ms/step , 57907.14 GFLOP/s , 529416.6 tokens/s INFO:__main__:2024-10-27 05:02:04 | Epoch: 2 | Step: 50680 | Dataset: 0-1088802 | Loss: 1.806 | 679 ms/step , 57933.38 GFLOP/s , 528834.9 tokens/s INFO:__main__:2024-10-27 05:02:12 | Epoch: 2 | Step: 50690 | Dataset: 0-1096802 | Loss: 1.806 | 679 ms/step , 57910.55 GFLOP/s , 529324.4 tokens/s INFO:__main__:2024-10-27 05:02:20 | Epoch: 2 | Step: 50700 | Dataset: 0-1104802 | Loss: 1.762 | 680 ms/step , 57832.42 
GFLOP/s , 528782.0 tokens/s INFO:__main__:2024-10-27 05:02:28 | Epoch: 2 | Step: 50710 | Dataset: 0-1112802 | Loss: 1.791 | 674 ms/step , 58283.48 GFLOP/s , 530315.6 tokens/s INFO:__main__:2024-10-27 05:02:35 | Epoch: 2 | Step: 50720 | Dataset: 0-1120802 | Loss: 2.342 | 677 ms/step , 58093.53 GFLOP/s , 532311.3 tokens/s INFO:__main__:2024-10-27 05:02:43 | Epoch: 2 | Step: 50730 | Dataset: 0-1128802 | Loss: 2.326 | 677 ms/step , 58093.90 GFLOP/s , 531829.4 tokens/s INFO:__main__:2024-10-27 05:02:51 | Epoch: 2 | Step: 50740 | Dataset: 0-1136802 | Loss: 2.228 | 676 ms/step , 58146.02 GFLOP/s , 531606.0 tokens/s INFO:__main__:2024-10-27 05:02:58 | Epoch: 2 | Step: 50750 | Dataset: 0-1144802 | Loss: 2.248 | 677 ms/step , 58041.85 GFLOP/s , 531689.5 tokens/s INFO:__main__:2024-10-27 05:03:06 | Epoch: 2 | Step: 50760 | Dataset: 0-1152802 | Loss: 2.279 | 676 ms/step , 58112.45 GFLOP/s , 531552.4 tokens/s INFO:__main__:2024-10-27 05:03:14 | Epoch: 2 | Step: 50770 | Dataset: 0-1160802 | Loss: 2.237 | 677 ms/step , 58086.55 GFLOP/s , 531406.9 tokens/s INFO:__main__:2024-10-27 05:03:21 | Epoch: 2 | Step: 50780 | Dataset: 0-1168802 | Loss: 2.192 | 677 ms/step , 58063.23 GFLOP/s , 531849.2 tokens/s INFO:__main__:2024-10-27 05:03:29 | Epoch: 2 | Step: 50790 | Dataset: 0-1176802 | Loss: 2.216 | 676 ms/step , 58113.64 GFLOP/s , 531777.2 tokens/s INFO:__main__:2024-10-27 05:03:37 | Epoch: 2 | Step: 50800 | Dataset: 0-1184802 | Loss: 2.214 | 678 ms/step , 57959.98 GFLOP/s , 531768.1 tokens/s INFO:__main__:2024-10-27 05:03:45 | Epoch: 2 | Step: 50810 | Dataset: 0-1192802 | Loss: 2.204 | 676 ms/step , 58151.45 GFLOP/s , 532095.6 tokens/s INFO:__main__:2024-10-27 05:03:52 | Epoch: 2 | Step: 50820 | Dataset: 0-1200802 | Loss: 2.135 | 676 ms/step , 58169.09 GFLOP/s , 532353.9 tokens/s INFO:__main__:2024-10-27 05:04:00 | Epoch: 2 | Step: 50830 | Dataset: 0-1208802 | Loss: 2.160 | 675 ms/step , 58213.88 GFLOP/s , 532095.5 tokens/s INFO:__main__:2024-10-27 05:04:08 | Epoch: 2 | Step: 50840 | 
Dataset: 0-1216802 | Loss: 2.196 | 675 ms/step , 58222.52 GFLOP/s , 532517.9 tokens/s INFO:__main__:2024-10-27 05:04:15 | Epoch: 2 | Step: 50850 | Dataset: 0-1224802 | Loss: 2.212 | 676 ms/step , 58179.28 GFLOP/s , 532310.7 tokens/s INFO:__main__:2024-10-27 05:04:23 | Epoch: 2 | Step: 50860 | Dataset: 0-1232802 | Loss: 2.085 | 676 ms/step , 58120.81 GFLOP/s , 532056.0 tokens/s INFO:__main__:2024-10-27 05:04:31 | Epoch: 2 | Step: 50870 | Dataset: 0-1240802 | Loss: 2.176 | 677 ms/step , 58088.90 GFLOP/s , 531848.7 tokens/s INFO:__main__:2024-10-27 05:04:38 | Epoch: 2 | Step: 50880 | Dataset: 0-1248802 | Loss: 2.220 | 676 ms/step , 58136.32 GFLOP/s , 532092.4 tokens/s INFO:__main__:2024-10-27 05:04:46 | Epoch: 2 | Step: 50890 | Dataset: 0-1256802 | Loss: 2.142 | 675 ms/step , 58227.22 GFLOP/s , 532076.0 tokens/s INFO:__main__:2024-10-27 05:04:54 | Epoch: 2 | Step: 50900 | Dataset: 0-1264802 | Loss: 2.121 | 675 ms/step , 58276.38 GFLOP/s , 532893.1 tokens/s INFO:__main__:2024-10-27 05:05:02 | Epoch: 2 | Step: 50910 | Dataset: 0-1272802 | Loss: 2.131 | 675 ms/step , 58229.02 GFLOP/s , 532639.1 tokens/s INFO:__main__:2024-10-27 05:05:09 | Epoch: 2 | Step: 50920 | Dataset: 0-1280802 | Loss: 2.188 | 676 ms/step , 58129.61 GFLOP/s , 532714.4 tokens/s INFO:__main__:2024-10-27 05:05:17 | Epoch: 2 | Step: 50930 | Dataset: 0-1288802 | Loss: 2.170 | 675 ms/step , 58207.73 GFLOP/s , 532918.7 tokens/s INFO:__main__:2024-10-27 05:05:25 | Epoch: 2 | Step: 50940 | Dataset: 0-1296802 | Loss: 2.163 | 675 ms/step , 58217.74 GFLOP/s , 532608.0 tokens/s INFO:__main__:2024-10-27 05:05:32 | Epoch: 2 | Step: 50950 | Dataset: 0-1304802 | Loss: 2.159 | 675 ms/step , 58264.59 GFLOP/s , 532964.0 tokens/s INFO:__main__:2024-10-27 05:05:40 | Epoch: 2 | Step: 50960 | Dataset: 0-1312802 | Loss: 2.147 | 676 ms/step , 58174.96 GFLOP/s , 532356.9 tokens/s INFO:__main__:2024-10-27 05:05:48 | Epoch: 2 | Step: 50970 | Dataset: 0-1320802 | Loss: 2.170 | 675 ms/step , 58271.95 GFLOP/s , 532876.0 tokens/s 
INFO:__main__:2024-10-27 05:05:55 | Epoch: 2 | Step: 50980 | Dataset: 0-1328802 | Loss: 2.186 | 675 ms/step , 58213.22 GFLOP/s , 549989.4 tokens/s INFO:__main__:2024-10-27 05:06:03 | Epoch: 2 | Step: 50990 | Dataset: 0-1336802 | Loss: 2.182 | 675 ms/step , 58211.90 GFLOP/s , 532163.1 tokens/s INFO:__main__:2024-10-27 05:06:10 | Validation | Step: 51000 | Val_loss: 2.235 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 05:06:10 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_050610_step_51000.pt` INFO:__main__:2024-10-27 05:06:11 | Epoch: 2 | Step: 51000 | Dataset: 0-1344802 | Loss: 2.124 | 674 ms/step , 58361.02 GFLOP/s , 479912.2 tokens/s INFO:__main__:2024-10-27 05:06:19 | Epoch: 2 | Step: 51010 | Dataset: 0-1352802 | Loss: 2.152 | 675 ms/step , 58212.02 GFLOP/s , 533015.3 tokens/s INFO:__main__:2024-10-27 05:06:27 | Epoch: 2 | Step: 51020 | Dataset: 0-1360802 | Loss: 2.160 | 675 ms/step , 58222.09 GFLOP/s , 532464.1 tokens/s INFO:__main__:2024-10-27 05:06:34 | Epoch: 2 | Step: 51030 | Dataset: 0-1368802 | Loss: 2.113 | 676 ms/step , 58180.67 GFLOP/s , 532721.5 tokens/s INFO:__main__:2024-10-27 05:06:42 | Epoch: 2 | Step: 51040 | Dataset: 0-1376802 | Loss: 2.166 | 675 ms/step , 58214.37 GFLOP/s , 533414.8 tokens/s INFO:__main__:2024-10-27 05:06:50 | Epoch: 2 | Step: 51050 | Dataset: 0-1384802 | Loss: 2.222 | 674 ms/step , 58301.79 GFLOP/s , 532815.4 tokens/s INFO:__main__:2024-10-27 05:06:57 | Epoch: 2 | Step: 51060 | Dataset: 0-1392802 | Loss: 2.224 | 675 ms/step , 58267.88 GFLOP/s , 532778.4 tokens/s INFO:__main__:2024-10-27 05:07:05 | Epoch: 2 | Step: 51070 | Dataset: 0-1400802 | Loss: 2.200 | 675 ms/step , 58214.74 GFLOP/s , 532804.0 tokens/s INFO:__main__:2024-10-27 05:07:13 | Epoch: 2 | Step: 51080 | Dataset: 0-1408802 | Loss: 2.205 | 676 ms/step , 58156.34 GFLOP/s , 532868.6 tokens/s INFO:__main__:2024-10-27 05:07:21 | Epoch: 2 | Step: 51090 | Dataset: 0-1416802 | Loss: 2.178 | 674 ms/step , 58307.92 GFLOP/s , 533009.6 tokens/s 
INFO:__main__:2024-10-27 05:07:28 | Epoch: 2 | Step: 51100 | Dataset: 0-1424802 | Loss: 2.156 | 675 ms/step , 58236.97 GFLOP/s , 532940.9 tokens/s INFO:__main__:2024-10-27 05:07:36 | Epoch: 2 | Step: 51110 | Dataset: 0-1432802 | Loss: 2.181 | 674 ms/step , 58292.39 GFLOP/s , 532569.2 tokens/s INFO:__main__:2024-10-27 05:07:44 | Epoch: 2 | Step: 51120 | Dataset: 0-1440802 | Loss: 2.154 | 675 ms/step , 58244.81 GFLOP/s , 532364.7 tokens/s INFO:__main__:2024-10-27 05:07:51 | Epoch: 2 | Step: 51130 | Dataset: 0-1448802 | Loss: 2.148 | 673 ms/step , 58376.40 GFLOP/s , 533014.5 tokens/s INFO:__main__:2024-10-27 05:07:59 | Epoch: 2 | Step: 51140 | Dataset: 0-1456802 | Loss: 2.268 | 675 ms/step , 58267.16 GFLOP/s , 533013.0 tokens/s INFO:__main__:2024-10-27 05:08:07 | Epoch: 2 | Step: 51150 | Dataset: 0-1464802 | Loss: 2.162 | 675 ms/step , 58255.65 GFLOP/s , 532902.3 tokens/s INFO:__main__:2024-10-27 05:08:14 | Epoch: 2 | Step: 51160 | Dataset: 0-1472802 | Loss: 2.117 | 675 ms/step , 58236.30 GFLOP/s , 532495.8 tokens/s INFO:__main__:2024-10-27 05:08:22 | Epoch: 2 | Step: 51170 | Dataset: 0-1480802 | Loss: 2.176 | 676 ms/step , 58181.37 GFLOP/s , 532462.9 tokens/s INFO:__main__:2024-10-27 05:08:30 | Epoch: 2 | Step: 51180 | Dataset: 0-1488802 | Loss: 2.177 | 675 ms/step , 58239.54 GFLOP/s , 532316.0 tokens/s INFO:__main__:2024-10-27 05:08:37 | Epoch: 2 | Step: 51190 | Dataset: 0-1496802 | Loss: 2.145 | 675 ms/step , 58276.11 GFLOP/s , 532663.6 tokens/s INFO:__main__:2024-10-27 05:08:45 | Epoch: 2 | Step: 51200 | Dataset: 0-1504802 | Loss: 2.104 | 674 ms/step , 58304.70 GFLOP/s , 532477.1 tokens/s INFO:__main__:2024-10-27 05:08:53 | Epoch: 2 | Step: 51210 | Dataset: 0-1512802 | Loss: 1.919 | 674 ms/step , 58290.35 GFLOP/s , 532257.9 tokens/s INFO:__main__:2024-10-27 05:09:01 | Epoch: 2 | Step: 51220 | Dataset: 0-1520802 | Loss: 1.842 | 675 ms/step , 58225.95 GFLOP/s , 531995.9 tokens/s INFO:__main__:2024-10-27 05:09:08 | Epoch: 2 | Step: 51230 | Dataset: 0-1528802 | Loss: 
1.850 | 674 ms/step , 58309.80 GFLOP/s , 532187.6 tokens/s INFO:__main__:2024-10-27 05:09:16 | Epoch: 2 | Step: 51240 | Dataset: 0-1536802 | Loss: 1.811 | 674 ms/step , 58309.81 GFLOP/s , 532459.2 tokens/s INFO:__main__:2024-10-27 05:09:24 | Epoch: 2 | Step: 51250 | Dataset: 0-1544802 | Loss: 1.812 | 674 ms/step , 58334.37 GFLOP/s , 532503.4 tokens/s INFO:__main__:2024-10-27 05:09:31 | Epoch: 2 | Step: 51260 | Dataset: 0-1552802 | Loss: 1.756 | 675 ms/step , 58193.61 GFLOP/s , 532198.5 tokens/s INFO:__main__:2024-10-27 05:09:39 | Epoch: 2 | Step: 51270 | Dataset: 0-1560802 | Loss: 1.794 | 675 ms/step , 58239.27 GFLOP/s , 532072.3 tokens/s INFO:__main__:2024-10-27 05:09:47 | Epoch: 2 | Step: 51280 | Dataset: 0-1568802 | Loss: 1.781 | 675 ms/step , 58271.10 GFLOP/s , 532790.6 tokens/s INFO:__main__:2024-10-27 05:09:54 | Epoch: 2 | Step: 51290 | Dataset: 0-1576802 | Loss: 2.419 | 676 ms/step , 58189.43 GFLOP/s , 532575.0 tokens/s INFO:__main__:2024-10-27 05:10:02 | Epoch: 2 | Step: 51300 | Dataset: 0-1584802 | Loss: 2.126 | 674 ms/step , 58286.79 GFLOP/s , 532828.3 tokens/s INFO:__main__:2024-10-27 05:10:10 | Epoch: 2 | Step: 51310 | Dataset: 0-1592802 | Loss: 2.192 | 674 ms/step , 58302.13 GFLOP/s , 532927.6 tokens/s INFO:__main__:2024-10-27 05:10:17 | Epoch: 2 | Step: 51320 | Dataset: 0-1600802 | Loss: 2.149 | 677 ms/step , 58041.42 GFLOP/s , 531359.1 tokens/s INFO:__main__:2024-10-27 05:10:25 | Epoch: 2 | Step: 51330 | Dataset: 0-1608802 | Loss: 2.222 | 674 ms/step , 58282.57 GFLOP/s , 531772.1 tokens/s INFO:__main__:2024-10-27 05:10:33 | Epoch: 2 | Step: 51340 | Dataset: 0-1616802 | Loss: 2.118 | 674 ms/step , 58300.16 GFLOP/s , 533240.2 tokens/s INFO:__main__:2024-10-27 05:10:41 | Epoch: 2 | Step: 51350 | Dataset: 0-1624802 | Loss: 2.094 | 675 ms/step , 58249.31 GFLOP/s , 532987.3 tokens/s INFO:__main__:2024-10-27 05:10:48 | Epoch: 2 | Step: 51360 | Dataset: 0-1632802 | Loss: 2.105 | 676 ms/step , 58114.24 GFLOP/s , 530901.9 tokens/s INFO:__main__:2024-10-27 
05:10:56 | Epoch: 2 | Step: 51370 | Dataset: 0-1640802 | Loss: 2.183 | 677 ms/step , 58076.95 GFLOP/s , 531391.1 tokens/s INFO:__main__:2024-10-27 05:11:04 | Epoch: 2 | Step: 51380 | Dataset: 0-1648802 | Loss: 2.118 | 674 ms/step , 58326.93 GFLOP/s , 533110.5 tokens/s INFO:__main__:2024-10-27 05:11:11 | Epoch: 2 | Step: 51390 | Dataset: 0-1656802 | Loss: 2.069 | 675 ms/step , 58217.97 GFLOP/s , 533237.9 tokens/s INFO:__main__:2024-10-27 05:11:19 | Epoch: 2 | Step: 51400 | Dataset: 0-1664802 | Loss: 2.164 | 675 ms/step , 58230.58 GFLOP/s , 532641.1 tokens/s INFO:__main__:2024-10-27 05:11:27 | Epoch: 2 | Step: 51410 | Dataset: 0-1672802 | Loss: 2.007 | 675 ms/step , 58223.89 GFLOP/s , 532633.5 tokens/s INFO:__main__:2024-10-27 05:11:34 | Epoch: 2 | Step: 51420 | Dataset: 0-1680802 | Loss: 2.193 | 676 ms/step , 58130.71 GFLOP/s , 531753.2 tokens/s INFO:__main__:2024-10-27 05:11:42 | Epoch: 2 | Step: 51430 | Dataset: 0-1688802 | Loss: 2.166 | 676 ms/step , 58141.56 GFLOP/s , 531067.4 tokens/s INFO:__main__:2024-10-27 05:11:50 | Epoch: 2 | Step: 51440 | Dataset: 0-1696802 | Loss: 2.144 | 676 ms/step , 58152.97 GFLOP/s , 530779.2 tokens/s INFO:__main__:2024-10-27 05:11:58 | Epoch: 2 | Step: 51450 | Dataset: 0-1704802 | Loss: 2.138 | 674 ms/step , 58281.69 GFLOP/s , 531541.6 tokens/s INFO:__main__:2024-10-27 05:12:05 | Epoch: 2 | Step: 51460 | Dataset: 0-1712802 | Loss: 2.208 | 676 ms/step , 58166.48 GFLOP/s , 531421.4 tokens/s INFO:__main__:2024-10-27 05:12:13 | Epoch: 2 | Step: 51470 | Dataset: 0-1720802 | Loss: 2.228 | 675 ms/step , 58251.78 GFLOP/s , 531847.6 tokens/s INFO:__main__:2024-10-27 05:12:21 | Epoch: 2 | Step: 51480 | Dataset: 0-1728802 | Loss: 2.230 | 677 ms/step , 58099.21 GFLOP/s , 531636.5 tokens/s INFO:__main__:2024-10-27 05:12:28 | Epoch: 2 | Step: 51490 | Dataset: 0-1736802 | Loss: 2.230 | 676 ms/step , 58191.36 GFLOP/s , 530428.6 tokens/s INFO:__main__:2024-10-27 05:12:36 | Epoch: 2 | Step: 51500 | Dataset: 0-1744802 | Loss: 2.152 | 675 ms/step , 
58227.38 GFLOP/s , 530631.9 tokens/s INFO:__main__:2024-10-27 05:12:44 | Epoch: 2 | Step: 51510 | Dataset: 0-1752802 | Loss: 2.152 | 676 ms/step , 58174.47 GFLOP/s , 531953.9 tokens/s INFO:__main__:2024-10-27 05:12:51 | Epoch: 2 | Step: 51520 | Dataset: 0-1760802 | Loss: 2.205 | 675 ms/step , 58236.04 GFLOP/s , 531593.9 tokens/s INFO:__main__:2024-10-27 05:12:59 | Epoch: 2 | Step: 51530 | Dataset: 0-1768802 | Loss: 2.276 | 676 ms/step , 58187.27 GFLOP/s , 532353.3 tokens/s INFO:__main__:2024-10-27 05:13:07 | Epoch: 2 | Step: 51540 | Dataset: 0-1776802 | Loss: 2.243 | 675 ms/step , 58252.64 GFLOP/s , 532158.8 tokens/s INFO:__main__:2024-10-27 05:13:15 | Epoch: 2 | Step: 51550 | Dataset: 0-1784802 | Loss: 2.185 | 675 ms/step , 58238.57 GFLOP/s , 532825.8 tokens/s INFO:__main__:2024-10-27 05:13:22 | Epoch: 2 | Step: 51560 | Dataset: 0-1792802 | Loss: 2.171 | 675 ms/step , 58204.21 GFLOP/s , 531955.5 tokens/s INFO:__main__:2024-10-27 05:13:30 | Epoch: 2 | Step: 51570 | Dataset: 0-1800802 | Loss: 2.134 | 675 ms/step , 58239.59 GFLOP/s , 532196.5 tokens/s INFO:__main__:2024-10-27 05:13:38 | Epoch: 2 | Step: 51580 | Dataset: 0-1808802 | Loss: 2.196 | 676 ms/step , 58192.83 GFLOP/s , 531883.3 tokens/s INFO:__main__:2024-10-27 05:13:45 | Epoch: 2 | Step: 51590 | Dataset: 0-1816802 | Loss: 2.193 | 676 ms/step , 58158.47 GFLOP/s , 532454.6 tokens/s INFO:__main__:2024-10-27 05:13:53 | Epoch: 2 | Step: 51600 | Dataset: 0-1824802 | Loss: 2.229 | 676 ms/step , 58178.68 GFLOP/s , 532002.9 tokens/s INFO:__main__:2024-10-27 05:14:01 | Epoch: 2 | Step: 51610 | Dataset: 0-1832802 | Loss: 2.142 | 674 ms/step , 58329.63 GFLOP/s , 531736.7 tokens/s INFO:__main__:2024-10-27 05:14:08 | Epoch: 2 | Step: 51620 | Dataset: 0-1840802 | Loss: 1.847 | 692 ms/step , 56798.55 GFLOP/s , 530013.2 tokens/s INFO:__main__:2024-10-27 05:14:16 | Epoch: 2 | Step: 51630 | Dataset: 0-1848802 | Loss: 1.794 | 675 ms/step , 58230.54 GFLOP/s , 528098.7 tokens/s INFO:__main__:2024-10-27 05:14:24 | Epoch: 2 | 
Step: 51640 | Dataset: 0-1856802 | Loss: 1.794 | 676 ms/step , 58152.23 GFLOP/s , 531064.7 tokens/s INFO:__main__:2024-10-27 05:14:32 | Epoch: 2 | Step: 51650 | Dataset: 0-1864802 | Loss: 1.749 | 676 ms/step , 58172.22 GFLOP/s , 531157.3 tokens/s INFO:__main__:2024-10-27 05:14:39 | Epoch: 2 | Step: 51660 | Dataset: 0-1872802 | Loss: 1.756 | 676 ms/step , 58154.08 GFLOP/s , 531527.5 tokens/s INFO:__main__:2024-10-27 05:14:47 | Epoch: 2 | Step: 51670 | Dataset: 0-1880802 | Loss: 1.732 | 676 ms/step , 58174.94 GFLOP/s , 530877.0 tokens/s INFO:__main__:2024-10-27 05:14:55 | Epoch: 2 | Step: 51680 | Dataset: 0-1888802 | Loss: 1.736 | 675 ms/step , 58225.96 GFLOP/s , 531539.6 tokens/s INFO:__main__:2024-10-27 05:15:03 | Epoch: 2 | Step: 51690 | Dataset: 0-1896802 | Loss: 1.744 | 676 ms/step , 58122.55 GFLOP/s , 531433.2 tokens/s INFO:__main__:2024-10-27 05:15:10 | Epoch: 2 | Step: 51700 | Dataset: 0-1904802 | Loss: 2.371 | 677 ms/step , 58070.97 GFLOP/s , 530454.1 tokens/s INFO:__main__:2024-10-27 05:15:18 | Epoch: 2 | Step: 51710 | Dataset: 0-1912802 | Loss: 2.330 | 675 ms/step , 58216.96 GFLOP/s , 532514.8 tokens/s INFO:__main__:2024-10-27 05:15:26 | Epoch: 2 | Step: 51720 | Dataset: 0-1920802 | Loss: 2.200 | 675 ms/step , 58212.23 GFLOP/s , 532938.2 tokens/s INFO:__main__:2024-10-27 05:15:33 | Epoch: 2 | Step: 51730 | Dataset: 0-1928802 | Loss: 2.162 | 675 ms/step , 58235.93 GFLOP/s , 532435.1 tokens/s INFO:__main__:2024-10-27 05:15:41 | Epoch: 2 | Step: 51740 | Dataset: 0-1936802 | Loss: 2.282 | 674 ms/step , 58295.91 GFLOP/s , 532668.9 tokens/s INFO:__main__:2024-10-27 05:15:49 | Epoch: 2 | Step: 51750 | Dataset: 0-1944802 | Loss: 2.148 | 675 ms/step , 58275.85 GFLOP/s , 533208.5 tokens/s INFO:__main__:2024-10-27 05:15:56 | Epoch: 2 | Step: 51760 | Dataset: 0-1952802 | Loss: 2.218 | 674 ms/step , 58307.45 GFLOP/s , 533306.2 tokens/s INFO:__main__:2024-10-27 05:16:04 | Epoch: 2 | Step: 51770 | Dataset: 0-1960802 | Loss: 2.050 | 675 ms/step , 58249.58 GFLOP/s , 
532587.1 tokens/s INFO:__main__:2024-10-27 05:16:12 | Epoch: 2 | Step: 51780 | Dataset: 0-1968802 | Loss: 2.105 | 675 ms/step , 58222.63 GFLOP/s , 531794.3 tokens/s INFO:__main__:2024-10-27 05:16:19 | Epoch: 2 | Step: 51790 | Dataset: 0-1976802 | Loss: 2.080 | 677 ms/step , 58075.17 GFLOP/s , 532296.1 tokens/s INFO:__main__:2024-10-27 05:16:27 | Epoch: 2 | Step: 51800 | Dataset: 0-1984802 | Loss: 2.175 | 676 ms/step , 58181.79 GFLOP/s , 531957.2 tokens/s INFO:__main__:2024-10-27 05:16:35 | Epoch: 2 | Step: 51810 | Dataset: 0-1992802 | Loss: 2.153 | 677 ms/step , 58067.24 GFLOP/s , 532189.7 tokens/s INFO:__main__:2024-10-27 05:16:43 | Epoch: 2 | Step: 51820 | Dataset: 0-2000802 | Loss: 2.133 | 676 ms/step , 58163.29 GFLOP/s , 532418.4 tokens/s INFO:__main__:2024-10-27 05:16:50 | Epoch: 2 | Step: 51830 | Dataset: 0-2008802 | Loss: 2.214 | 675 ms/step , 58241.75 GFLOP/s , 532520.4 tokens/s INFO:__main__:2024-10-27 05:16:58 | Epoch: 2 | Step: 51840 | Dataset: 0-2016802 | Loss: 2.144 | 675 ms/step , 58265.42 GFLOP/s , 532566.2 tokens/s INFO:__main__:2024-10-27 05:17:06 | Epoch: 2 | Step: 51850 | Dataset: 0-2024802 | Loss: 2.070 | 676 ms/step , 58191.14 GFLOP/s , 532526.3 tokens/s INFO:__main__:2024-10-27 05:17:13 | Epoch: 2 | Step: 51860 | Dataset: 0-2032802 | Loss: 2.034 | 675 ms/step , 58272.91 GFLOP/s , 532371.6 tokens/s INFO:__main__:2024-10-27 05:17:21 | Epoch: 2 | Step: 51870 | Dataset: 0-2040802 | Loss: 1.751 | 675 ms/step , 58244.56 GFLOP/s , 531775.2 tokens/s INFO:__main__:2024-10-27 05:17:29 | Epoch: 2 | Step: 51880 | Dataset: 0-2048802 | Loss: 1.719 | 673 ms/step , 58374.85 GFLOP/s , 532396.4 tokens/s INFO:__main__:2024-10-27 05:17:36 | Epoch: 2 | Step: 51890 | Dataset: 0-2056802 | Loss: 1.691 | 674 ms/step , 58285.26 GFLOP/s , 532126.9 tokens/s INFO:__main__:2024-10-27 05:17:44 | Epoch: 2 | Step: 51900 | Dataset: 0-2064802 | Loss: 1.641 | 674 ms/step , 58321.11 GFLOP/s , 532261.4 tokens/s INFO:__main__:2024-10-27 05:17:52 | Epoch: 2 | Step: 51910 | Dataset: 
0-2072802 | Loss: 1.683 | 676 ms/step , 58114.02 GFLOP/s , 532132.1 tokens/s INFO:__main__:2024-10-27 05:17:59 | Epoch: 2 | Step: 51920 | Dataset: 0-2080802 | Loss: 1.665 | 675 ms/step , 58254.05 GFLOP/s , 532652.8 tokens/s INFO:__main__:2024-10-27 05:18:07 | Epoch: 2 | Step: 51930 | Dataset: 0-2088802 | Loss: 1.640 | 674 ms/step , 58326.38 GFLOP/s , 532940.4 tokens/s INFO:__main__:2024-10-27 05:18:15 | Epoch: 2 | Step: 51940 | Dataset: 0-2096802 | Loss: 1.654 | 675 ms/step , 58254.12 GFLOP/s , 532263.6 tokens/s INFO:__main__:2024-10-27 05:18:23 | Epoch: 2 | Step: 51950 | Dataset: 0-2104802 | Loss: 2.230 | 675 ms/step , 58278.09 GFLOP/s , 532503.2 tokens/s INFO:__main__:2024-10-27 05:18:30 | Epoch: 2 | Step: 51960 | Dataset: 0-2112802 | Loss: 2.140 | 675 ms/step , 58261.45 GFLOP/s , 533283.5 tokens/s INFO:__main__:2024-10-27 05:18:38 | Epoch: 2 | Step: 51970 | Dataset: 0-2120802 | Loss: 2.146 | 677 ms/step , 58060.81 GFLOP/s , 532877.2 tokens/s INFO:__main__:2024-10-27 05:18:46 | Epoch: 2 | Step: 51980 | Dataset: 0-2128802 | Loss: 2.148 | 677 ms/step , 58070.78 GFLOP/s , 531009.7 tokens/s INFO:__main__:2024-10-27 05:18:53 | Epoch: 2 | Step: 51990 | Dataset: 0-2136802 | Loss: 2.188 | 674 ms/step , 58319.51 GFLOP/s , 532679.1 tokens/s INFO:__main__:2024-10-27 05:19:00 | Validation | Step: 52000 | Val_loss: 2.169 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 05:19:00 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_051900_step_52000.pt` INFO:__main__:2024-10-27 05:19:02 | Epoch: 2 | Step: 52000 | Dataset: 0-2144802 | Loss: 2.185 | 674 ms/step , 58287.57 GFLOP/s , 478716.4 tokens/s INFO:__main__:2024-10-27 05:19:10 | Epoch: 2 | Step: 52010 | Dataset: 0-2152802 | Loss: 2.116 | 675 ms/step , 58255.81 GFLOP/s , 532725.3 tokens/s INFO:__main__:2024-10-27 05:19:17 | Epoch: 2 | Step: 52020 | Dataset: 0-2160802 | Loss: 2.080 | 675 ms/step , 58261.66 GFLOP/s , 532312.1 tokens/s INFO:__main__:2024-10-27 05:19:25 | Epoch: 2 | Step: 52030 | Dataset: 
0-2168802 | Loss: 2.106 | 674 ms/step , 58346.13 GFLOP/s , 532890.3 tokens/s INFO:__main__:2024-10-27 05:19:33 | Epoch: 2 | Step: 52040 | Dataset: 0-2176802 | Loss: 2.113 | 675 ms/step , 58223.41 GFLOP/s , 533259.2 tokens/s INFO:__main__:2024-10-27 05:19:40 | Epoch: 2 | Step: 52050 | Dataset: 0-2184802 | Loss: 2.142 | 675 ms/step , 58272.37 GFLOP/s , 533038.2 tokens/s INFO:__main__:2024-10-27 05:19:48 | Epoch: 2 | Step: 52060 | Dataset: 0-2192802 | Loss: 2.132 | 674 ms/step , 58320.41 GFLOP/s , 533064.2 tokens/s INFO:__main__:2024-10-27 05:19:56 | Epoch: 2 | Step: 52070 | Dataset: 0-2200802 | Loss: 2.091 | 674 ms/step , 58302.00 GFLOP/s , 532945.0 tokens/s INFO:__main__:2024-10-27 05:20:03 | Epoch: 2 | Step: 52080 | Dataset: 0-2208802 | Loss: 2.076 | 674 ms/step , 58339.19 GFLOP/s , 532992.6 tokens/s INFO:__main__:2024-10-27 05:20:11 | Epoch: 2 | Step: 52090 | Dataset: 0-2216802 | Loss: 2.219 | 675 ms/step , 58274.90 GFLOP/s , 532761.3 tokens/s INFO:__main__:2024-10-27 05:20:19 | Epoch: 2 | Step: 52100 | Dataset: 0-2224802 | Loss: 2.152 | 675 ms/step , 58268.84 GFLOP/s , 533237.9 tokens/s INFO:__main__:2024-10-27 05:20:26 | Epoch: 2 | Step: 52110 | Dataset: 0-2232802 | Loss: 1.918 | 674 ms/step , 58290.87 GFLOP/s , 533099.3 tokens/s INFO:__main__:2024-10-27 05:20:34 | Epoch: 2 | Step: 52120 | Dataset: 0-2240802 | Loss: 1.825 | 674 ms/step , 58364.80 GFLOP/s , 532679.0 tokens/s INFO:__main__:2024-10-27 05:20:42 | Epoch: 2 | Step: 52130 | Dataset: 0-2248802 | Loss: 1.774 | 676 ms/step , 58141.42 GFLOP/s , 532520.3 tokens/s INFO:__main__:2024-10-27 05:20:49 | Epoch: 2 | Step: 52140 | Dataset: 0-2256802 | Loss: 1.765 | 678 ms/step , 58011.51 GFLOP/s , 532635.3 tokens/s INFO:__main__:2024-10-27 05:20:57 | Epoch: 2 | Step: 52150 | Dataset: 0-2264802 | Loss: 1.750 | 676 ms/step , 58182.51 GFLOP/s , 531988.8 tokens/s INFO:__main__:2024-10-27 05:21:05 | Epoch: 2 | Step: 52160 | Dataset: 0-2272802 | Loss: 1.750 | 674 ms/step , 58284.49 GFLOP/s , 532340.1 tokens/s 
INFO:__main__:2024-10-27 05:21:13 | Epoch: 2 | Step: 52170 | Dataset: 0-2280802 | Loss: 1.762 | 675 ms/step , 58208.39 GFLOP/s , 531918.8 tokens/s INFO:__main__:2024-10-27 05:21:20 | Epoch: 2 | Step: 52180 | Dataset: 0-2288802 | Loss: 1.748 | 676 ms/step , 58167.35 GFLOP/s , 531720.7 tokens/s INFO:__main__:2024-10-27 05:21:28 | Epoch: 2 | Step: 52190 | Dataset: 0-2296802 | Loss: 1.730 | 675 ms/step , 58231.72 GFLOP/s , 532078.8 tokens/s INFO:__main__:2024-10-27 05:21:36 | Epoch: 2 | Step: 52200 | Dataset: 0-2304802 | Loss: 2.218 | 675 ms/step , 58217.64 GFLOP/s , 531841.4 tokens/s INFO:__main__:2024-10-27 05:21:43 | Epoch: 2 | Step: 52210 | Dataset: 0-2312802 | Loss: 2.167 | 675 ms/step , 58225.98 GFLOP/s , 532499.7 tokens/s INFO:__main__:2024-10-27 05:21:51 | Epoch: 2 | Step: 52220 | Dataset: 0-2320802 | Loss: 2.213 | 675 ms/step , 58277.76 GFLOP/s , 532493.1 tokens/s INFO:__main__:2024-10-27 05:21:59 | Epoch: 2 | Step: 52230 | Dataset: 0-2328802 | Loss: 2.158 | 676 ms/step , 58192.10 GFLOP/s , 532222.4 tokens/s INFO:__main__:2024-10-27 05:22:06 | Epoch: 2 | Step: 52240 | Dataset: 0-2336802 | Loss: 2.128 | 674 ms/step , 58281.03 GFLOP/s , 532716.6 tokens/s INFO:__main__:2024-10-27 05:22:14 | Epoch: 2 | Step: 52250 | Dataset: 0-2344802 | Loss: 2.179 | 675 ms/step , 58209.21 GFLOP/s , 532785.2 tokens/s INFO:__main__:2024-10-27 05:22:22 | Epoch: 2 | Step: 52260 | Dataset: 0-2352802 | Loss: 2.135 | 675 ms/step , 58224.23 GFLOP/s , 532900.7 tokens/s INFO:__main__:2024-10-27 05:22:30 | Epoch: 2 | Step: 52270 | Dataset: 0-2360802 | Loss: 2.190 | 675 ms/step , 58208.26 GFLOP/s , 532718.9 tokens/s INFO:__main__:2024-10-27 05:22:37 | Epoch: 2 | Step: 52280 | Dataset: 0-2368802 | Loss: 2.124 | 676 ms/step , 58160.41 GFLOP/s , 532298.3 tokens/s INFO:__main__:2024-10-27 05:22:45 | Epoch: 2 | Step: 52290 | Dataset: 0-2376802 | Loss: 2.102 | 675 ms/step , 58220.06 GFLOP/s , 532130.0 tokens/s INFO:__main__:2024-10-27 05:22:53 | Epoch: 2 | Step: 52300 | Dataset: 0-2384802 | Loss: 
2.219 | 675 ms/step , 58211.47 GFLOP/s , 532217.6 tokens/s INFO:__main__:2024-10-27 05:23:00 | Epoch: 2 | Step: 52310 | Dataset: 0-2392802 | Loss: 2.171 | 676 ms/step , 58184.95 GFLOP/s , 531846.3 tokens/s INFO:__main__:2024-10-27 05:23:08 | Epoch: 2 | Step: 52320 | Dataset: 0-2400802 | Loss: 2.136 | 676 ms/step , 58182.76 GFLOP/s , 531503.5 tokens/s INFO:__main__:2024-10-27 05:23:16 | Epoch: 2 | Step: 52330 | Dataset: 0-2408802 | Loss: 2.073 | 676 ms/step , 58182.62 GFLOP/s , 532254.3 tokens/s INFO:__main__:2024-10-27 05:23:23 | Epoch: 2 | Step: 52340 | Dataset: 0-2416802 | Loss: 2.143 | 676 ms/step , 58119.73 GFLOP/s , 532444.7 tokens/s INFO:__main__:2024-10-27 05:23:31 | Epoch: 2 | Step: 52350 | Dataset: 0-2424802 | Loss: 2.124 | 675 ms/step , 58271.18 GFLOP/s , 532741.8 tokens/s INFO:__main__:2024-10-27 05:23:39 | Epoch: 2 | Step: 52360 | Dataset: 0-2432802 | Loss: 2.167 | 674 ms/step , 58322.45 GFLOP/s , 532894.2 tokens/s INFO:__main__:2024-10-27 05:23:46 | Epoch: 2 | Step: 52370 | Dataset: 0-2440802 | Loss: 2.205 | 675 ms/step , 58201.74 GFLOP/s , 531833.5 tokens/s INFO:__main__:2024-10-27 05:23:54 | Epoch: 2 | Step: 52380 | Dataset: 0-2448802 | Loss: 2.158 | 675 ms/step , 58226.11 GFLOP/s , 532415.5 tokens/s INFO:__main__:2024-10-27 05:24:02 | Epoch: 2 | Step: 52390 | Dataset: 0-2456802 | Loss: 2.269 | 675 ms/step , 58267.02 GFLOP/s , 532406.5 tokens/s INFO:__main__:2024-10-27 05:24:10 | Epoch: 2 | Step: 52400 | Dataset: 0-2464802 | Loss: 2.127 | 675 ms/step , 58266.81 GFLOP/s , 532585.9 tokens/s INFO:__main__:2024-10-27 05:24:17 | Epoch: 2 | Step: 52410 | Dataset: 0-2472802 | Loss: 2.154 | 675 ms/step , 58246.04 GFLOP/s , 532479.4 tokens/s INFO:__main__:2024-10-27 05:24:25 | Epoch: 2 | Step: 52420 | Dataset: 0-2480802 | Loss: 2.162 | 675 ms/step , 58260.00 GFLOP/s , 532962.1 tokens/s INFO:__main__:2024-10-27 05:24:33 | Epoch: 2 | Step: 52430 | Dataset: 0-2488802 | Loss: 2.130 | 675 ms/step , 58262.83 GFLOP/s , 532788.4 tokens/s INFO:__main__:2024-10-27 
05:24:40 | Epoch: 2 | Step: 52440 | Dataset: 0-2496802 | Loss: 2.156 | 677 ms/step , 58105.53 GFLOP/s , 532730.7 tokens/s INFO:__main__:2024-10-27 05:24:48 | Epoch: 2 | Step: 52450 | Dataset: 0-2504802 | Loss: 2.182 | 677 ms/step , 58059.85 GFLOP/s , 532934.0 tokens/s INFO:__main__:2024-10-27 05:24:56 | Epoch: 2 | Step: 52460 | Dataset: 0-2512802 | Loss: 2.240 | 676 ms/step , 58181.85 GFLOP/s , 532835.2 tokens/s INFO:__main__:2024-10-27 05:25:03 | Epoch: 2 | Step: 52470 | Dataset: 0-2520802 | Loss: 2.175 | 674 ms/step , 58295.94 GFLOP/s , 532408.3 tokens/s INFO:__main__:2024-10-27 05:25:11 | Epoch: 2 | Step: 52480 | Dataset: 0-2528802 | Loss: 2.163 | 675 ms/step , 58265.73 GFLOP/s , 531839.4 tokens/s INFO:__main__:2024-10-27 05:25:19 | Epoch: 2 | Step: 52490 | Dataset: 0-2536802 | Loss: 2.144 | 676 ms/step , 58167.73 GFLOP/s , 533082.8 tokens/s INFO:__main__:2024-10-27 05:25:26 | Epoch: 2 | Step: 52500 | Dataset: 0-2544802 | Loss: 2.133 | 675 ms/step , 58278.56 GFLOP/s , 531821.2 tokens/s INFO:__main__:2024-10-27 05:25:34 | Epoch: 2 | Step: 52510 | Dataset: 0-2552802 | Loss: 2.210 | 675 ms/step , 58232.16 GFLOP/s , 532809.5 tokens/s INFO:__main__:2024-10-27 05:25:42 | Epoch: 2 | Step: 52520 | Dataset: 0-2560802 | Loss: 2.187 | 675 ms/step , 58272.06 GFLOP/s , 533067.5 tokens/s INFO:__main__:2024-10-27 05:25:50 | Epoch: 2 | Step: 52530 | Dataset: 0-2568802 | Loss: 2.186 | 674 ms/step , 58349.55 GFLOP/s , 532881.4 tokens/s INFO:__main__:2024-10-27 05:25:57 | Epoch: 2 | Step: 52540 | Dataset: 0-2576802 | Loss: 2.201 | 677 ms/step , 58095.11 GFLOP/s , 532945.6 tokens/s INFO:__main__:2024-10-27 05:26:05 | Epoch: 2 | Step: 52550 | Dataset: 0-2584802 | Loss: 2.155 | 675 ms/step , 58215.12 GFLOP/s , 532892.6 tokens/s INFO:__main__:2024-10-27 05:26:13 | Epoch: 2 | Step: 52560 | Dataset: 0-2592802 | Loss: 2.116 | 675 ms/step , 58239.62 GFLOP/s , 532664.7 tokens/s INFO:__main__:2024-10-27 05:26:20 | Epoch: 2 | Step: 52570 | Dataset: 0-2600802 | Loss: 2.133 | 675 ms/step , 
58258.17 GFLOP/s , 533126.7 tokens/s INFO:__main__:2024-10-27 05:26:28 | Epoch: 2 | Step: 52580 | Dataset: 0-2608802 | Loss: 2.141 | 675 ms/step , 58268.41 GFLOP/s , 533041.0 tokens/s INFO:__main__:2024-10-27 05:26:36 | Epoch: 2 | Step: 52590 | Dataset: 0-2616802 | Loss: 2.178 | 676 ms/step , 58181.03 GFLOP/s , 533066.7 tokens/s INFO:__main__:2024-10-27 05:26:43 | Epoch: 2 | Step: 52600 | Dataset: 0-2624802 | Loss: 2.146 | 676 ms/step , 58186.82 GFLOP/s , 532853.6 tokens/s INFO:__main__:2024-10-27 05:26:51 | Epoch: 2 | Step: 52610 | Dataset: 0-2632802 | Loss: 2.101 | 675 ms/step , 58223.62 GFLOP/s , 533380.7 tokens/s INFO:__main__:2024-10-27 05:26:59 | Epoch: 2 | Step: 52620 | Dataset: 0-2640802 | Loss: 2.156 | 676 ms/step , 58171.18 GFLOP/s , 532950.3 tokens/s INFO:__main__:2024-10-27 05:27:06 | Epoch: 2 | Step: 52630 | Dataset: 0-2648802 | Loss: 2.122 | 675 ms/step , 58239.73 GFLOP/s , 533113.0 tokens/s INFO:__main__:2024-10-27 05:27:14 | Epoch: 2 | Step: 52640 | Dataset: 0-2656802 | Loss: 2.149 | 676 ms/step , 58165.37 GFLOP/s , 532537.4 tokens/s INFO:__main__:2024-10-27 05:27:22 | Epoch: 2 | Step: 52650 | Dataset: 0-2664802 | Loss: 2.152 | 674 ms/step , 58343.55 GFLOP/s , 533377.3 tokens/s INFO:__main__:2024-10-27 05:27:29 | Epoch: 2 | Step: 52660 | Dataset: 0-2672802 | Loss: 2.161 | 674 ms/step , 58305.23 GFLOP/s , 533445.0 tokens/s INFO:__main__:2024-10-27 05:27:37 | Epoch: 2 | Step: 52670 | Dataset: 0-2680802 | Loss: 2.065 | 674 ms/step , 58338.17 GFLOP/s , 533489.1 tokens/s INFO:__main__:2024-10-27 05:27:45 | Epoch: 2 | Step: 52680 | Dataset: 0-2688802 | Loss: 2.158 | 676 ms/step , 58183.66 GFLOP/s , 533439.4 tokens/s INFO:__main__:2024-10-27 05:27:52 | Epoch: 2 | Step: 52690 | Dataset: 0-2696802 | Loss: 2.299 | 674 ms/step , 58298.98 GFLOP/s , 533243.6 tokens/s INFO:__main__:2024-10-27 05:28:00 | Epoch: 2 | Step: 52700 | Dataset: 0-2704802 | Loss: 2.203 | 675 ms/step , 58217.98 GFLOP/s , 533204.7 tokens/s INFO:__main__:2024-10-27 05:28:08 | Epoch: 2 | 
Step: 52710 | Dataset: 0-2712802 | Loss: 2.185 | 675 ms/step , 58274.92 GFLOP/s , 532909.2 tokens/s INFO:__main__:2024-10-27 05:28:15 | Epoch: 2 | Step: 52720 | Dataset: 0-2720802 | Loss: 2.185 | 676 ms/step , 58147.58 GFLOP/s , 533562.1 tokens/s INFO:__main__:2024-10-27 05:28:23 | Epoch: 2 | Step: 52730 | Dataset: 0-2728802 | Loss: 2.081 | 674 ms/step , 58290.83 GFLOP/s , 533634.9 tokens/s INFO:__main__:2024-10-27 05:28:31 | Epoch: 2 | Step: 52740 | Dataset: 0-2736802 | Loss: 2.101 | 675 ms/step , 58267.13 GFLOP/s , 533143.9 tokens/s INFO:__main__:2024-10-27 05:28:39 | Epoch: 2 | Step: 52750 | Dataset: 0-2744802 | Loss: 2.097 | 675 ms/step , 58207.13 GFLOP/s , 533366.1 tokens/s INFO:__main__:2024-10-27 05:28:46 | Epoch: 2 | Step: 52760 | Dataset: 0-2752802 | Loss: 2.067 | 675 ms/step , 58227.88 GFLOP/s , 533437.4 tokens/s INFO:__main__:2024-10-27 05:28:54 | Epoch: 2 | Step: 52770 | Dataset: 0-2760802 | Loss: 2.037 | 674 ms/step , 58340.64 GFLOP/s , 533536.0 tokens/s INFO:__main__:2024-10-27 05:29:02 | Epoch: 2 | Step: 52780 | Dataset: 0-2768802 | Loss: 2.058 | 677 ms/step , 58074.32 GFLOP/s , 532981.7 tokens/s INFO:__main__:2024-10-27 05:29:09 | Epoch: 2 | Step: 52790 | Dataset: 0-2776802 | Loss: 2.007 | 676 ms/step , 58192.83 GFLOP/s , 532374.5 tokens/s INFO:__main__:2024-10-27 05:29:17 | Epoch: 2 | Step: 52800 | Dataset: 0-2784802 | Loss: 2.021 | 677 ms/step , 58045.54 GFLOP/s , 532842.4 tokens/s INFO:__main__:2024-10-27 05:29:25 | Epoch: 2 | Step: 52810 | Dataset: 0-2792802 | Loss: 2.021 | 675 ms/step , 58199.85 GFLOP/s , 531788.2 tokens/s INFO:__main__:2024-10-27 05:29:32 | Epoch: 2 | Step: 52820 | Dataset: 0-2800802 | Loss: 1.983 | 679 ms/step , 57929.99 GFLOP/s , 529702.8 tokens/s INFO:__main__:2024-10-27 05:29:40 | Epoch: 2 | Step: 52830 | Dataset: 0-2808802 | Loss: 1.997 | 674 ms/step , 58293.34 GFLOP/s , 531629.5 tokens/s INFO:__main__:2024-10-27 05:29:48 | Epoch: 2 | Step: 52840 | Dataset: 0-2816802 | Loss: 2.029 | 676 ms/step , 58120.32 GFLOP/s , 
531861.5 tokens/s INFO:__main__:2024-10-27 05:29:55 | Epoch: 2 | Step: 52850 | Dataset: 0-2824802 | Loss: 2.350 | 677 ms/step , 58077.05 GFLOP/s , 531727.7 tokens/s INFO:__main__:2024-10-27 05:30:03 | Epoch: 2 | Step: 52860 | Dataset: 0-2832802 | Loss: 2.269 | 677 ms/step , 58030.00 GFLOP/s , 531731.0 tokens/s INFO:__main__:2024-10-27 05:30:11 | Epoch: 2 | Step: 52870 | Dataset: 0-2840802 | Loss: 2.331 | 675 ms/step , 58254.98 GFLOP/s , 531910.4 tokens/s INFO:__main__:2024-10-27 05:30:19 | Epoch: 2 | Step: 52880 | Dataset: 0-2848802 | Loss: 2.204 | 676 ms/step , 58130.16 GFLOP/s , 530819.2 tokens/s INFO:__main__:2024-10-27 05:30:26 | Epoch: 2 | Step: 52890 | Dataset: 0-2856802 | Loss: 2.199 | 676 ms/step , 58174.51 GFLOP/s , 530171.5 tokens/s INFO:__main__:2024-10-27 05:30:34 | Epoch: 2 | Step: 52900 | Dataset: 0-2864802 | Loss: 2.156 | 675 ms/step , 58233.04 GFLOP/s , 531470.6 tokens/s INFO:__main__:2024-10-27 05:30:42 | Epoch: 2 | Step: 52910 | Dataset: 0-2872802 | Loss: 2.177 | 675 ms/step , 58256.41 GFLOP/s , 531823.2 tokens/s INFO:__main__:2024-10-27 05:30:49 | Epoch: 2 | Step: 52920 | Dataset: 0-2880802 | Loss: 2.166 | 676 ms/step , 58139.71 GFLOP/s , 532115.9 tokens/s INFO:__main__:2024-10-27 05:30:57 | Epoch: 2 | Step: 52930 | Dataset: 0-2888802 | Loss: 2.216 | 676 ms/step , 58178.21 GFLOP/s , 531955.4 tokens/s INFO:__main__:2024-10-27 05:31:05 | Epoch: 2 | Step: 52940 | Dataset: 0-2896802 | Loss: 2.189 | 674 ms/step , 58302.10 GFLOP/s , 532595.5 tokens/s INFO:__main__:2024-10-27 05:31:13 | Epoch: 2 | Step: 52950 | Dataset: 0-2904802 | Loss: 2.283 | 675 ms/step , 58275.29 GFLOP/s , 532940.4 tokens/s INFO:__main__:2024-10-27 05:31:20 | Epoch: 2 | Step: 52960 | Dataset: 0-2912802 | Loss: 2.115 | 678 ms/step , 57944.97 GFLOP/s , 531411.9 tokens/s INFO:__main__:2024-10-27 05:31:28 | Epoch: 2 | Step: 52970 | Dataset: 0-2920802 | Loss: 2.163 | 678 ms/step , 58005.96 GFLOP/s , 530493.9 tokens/s INFO:__main__:2024-10-27 05:31:36 | Epoch: 2 | Step: 52980 | Dataset: 
0-2928802 | Loss: 2.148 | 677 ms/step , 58088.33 GFLOP/s , 529685.6 tokens/s INFO:__main__:2024-10-27 05:31:43 | Epoch: 2 | Step: 52990 | Dataset: 0-2936802 | Loss: 2.153 | 677 ms/step , 58102.25 GFLOP/s , 529168.1 tokens/s INFO:__main__:2024-10-27 05:31:51 | Validation | Step: 53000 | Val_loss: 2.267 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 05:31:51 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_053151_step_53000.pt` INFO:__main__:2024-10-27 05:31:52 | Epoch: 2 | Step: 53000 | Dataset: 0-2944802 | Loss: 2.202 | 674 ms/step , 58279.39 GFLOP/s , 476328.0 tokens/s INFO:__main__:2024-10-27 05:32:00 | Epoch: 2 | Step: 53010 | Dataset: 0-2952802 | Loss: 2.220 | 681 ms/step , 57752.27 GFLOP/s , 528608.4 tokens/s INFO:__main__:2024-10-27 05:32:07 | Epoch: 2 | Step: 53020 | Dataset: 0-2960802 | Loss: 2.256 | 675 ms/step , 58224.16 GFLOP/s , 531525.7 tokens/s INFO:__main__:2024-10-27 05:32:15 | Epoch: 2 | Step: 53030 | Dataset: 0-2968802 | Loss: 2.199 | 676 ms/step , 58189.24 GFLOP/s , 531636.0 tokens/s INFO:__main__:2024-10-27 05:32:23 | Epoch: 2 | Step: 53040 | Dataset: 0-2976802 | Loss: 2.175 | 675 ms/step , 58251.35 GFLOP/s , 532397.0 tokens/s INFO:__main__:2024-10-27 05:32:31 | Epoch: 2 | Step: 53050 | Dataset: 0-2984802 | Loss: 2.246 | 675 ms/step , 58212.60 GFLOP/s , 532191.9 tokens/s INFO:__main__:2024-10-27 05:32:38 | Epoch: 2 | Step: 53060 | Dataset: 0-2992802 | Loss: 2.152 | 674 ms/step , 58323.90 GFLOP/s , 532080.5 tokens/s INFO:__main__:2024-10-27 05:32:46 | Epoch: 2 | Step: 53070 | Dataset: 0-3000802 | Loss: 2.187 | 675 ms/step , 58254.18 GFLOP/s , 532710.8 tokens/s INFO:__main__:2024-10-27 05:32:54 | Epoch: 2 | Step: 53080 | Dataset: 0-3008802 | Loss: 2.216 | 675 ms/step , 58214.95 GFLOP/s , 533216.4 tokens/s INFO:__main__:2024-10-27 05:33:01 | Epoch: 2 | Step: 53090 | Dataset: 0-3016802 | Loss: 2.194 | 675 ms/step , 58199.16 GFLOP/s , 532587.2 tokens/s INFO:__main__:2024-10-27 05:33:09 | Epoch: 2 | Step: 53100 | Dataset: 
0-3024802 | Loss: 2.256 | 676 ms/step , 58126.11 GFLOP/s , 532802.8 tokens/s INFO:__main__:2024-10-27 05:33:17 | Epoch: 2 | Step: 53110 | Dataset: 0-3032802 | Loss: 2.237 | 676 ms/step , 58153.91 GFLOP/s , 531222.1 tokens/s INFO:__main__:2024-10-27 05:33:24 | Epoch: 2 | Step: 53120 | Dataset: 0-3040802 | Loss: 2.160 | 676 ms/step , 58147.02 GFLOP/s , 531653.8 tokens/s INFO:__main__:2024-10-27 05:33:32 | Epoch: 2 | Step: 53130 | Dataset: 0-3048802 | Loss: 2.188 | 674 ms/step , 58279.32 GFLOP/s , 532167.2 tokens/s INFO:__main__:2024-10-27 05:33:40 | Epoch: 2 | Step: 53140 | Dataset: 0-3056802 | Loss: 2.172 | 675 ms/step , 58216.76 GFLOP/s , 532600.8 tokens/s INFO:__main__:2024-10-27 05:33:48 | Epoch: 2 | Step: 53150 | Dataset: 0-3064802 | Loss: 2.228 | 677 ms/step , 58103.65 GFLOP/s , 532138.2 tokens/s INFO:__main__:2024-10-27 05:33:55 | Epoch: 2 | Step: 53160 | Dataset: 0-3072802 | Loss: 2.226 | 675 ms/step , 58251.26 GFLOP/s , 532357.0 tokens/s INFO:__main__:2024-10-27 05:34:03 | Epoch: 2 | Step: 53170 | Dataset: 0-3080802 | Loss: 1.896 | 676 ms/step , 58113.90 GFLOP/s , 532810.3 tokens/s INFO:__main__:2024-10-27 05:34:11 | Epoch: 2 | Step: 53180 | Dataset: 0-3088802 | Loss: 1.787 | 675 ms/step , 58201.83 GFLOP/s , 532752.5 tokens/s INFO:__main__:2024-10-27 05:34:18 | Epoch: 2 | Step: 53190 | Dataset: 0-3096802 | Loss: 1.701 | 675 ms/step , 58253.54 GFLOP/s , 532188.2 tokens/s INFO:__main__:2024-10-27 05:34:26 | Epoch: 2 | Step: 53200 | Dataset: 0-3104802 | Loss: 1.713 | 674 ms/step , 58313.01 GFLOP/s , 532639.1 tokens/s INFO:__main__:2024-10-27 05:34:34 | Epoch: 2 | Step: 53210 | Dataset: 0-3112802 | Loss: 1.647 | 675 ms/step , 58194.42 GFLOP/s , 532603.9 tokens/s INFO:__main__:2024-10-27 05:34:41 | Epoch: 2 | Step: 53220 | Dataset: 0-3120802 | Loss: 1.677 | 675 ms/step , 58225.65 GFLOP/s , 532079.8 tokens/s INFO:__main__:2024-10-27 05:34:49 | Epoch: 2 | Step: 53230 | Dataset: 0-3128802 | Loss: 2.245 | 676 ms/step , 58160.88 GFLOP/s , 532238.1 tokens/s 
INFO:__main__:2024-10-27 05:34:57 | Epoch: 2 | Step: 53240 | Dataset: 0-3136802 | Loss: 2.239 | 676 ms/step , 58131.16 GFLOP/s , 531031.3 tokens/s INFO:__main__:2024-10-27 05:35:04 | Epoch: 2 | Step: 53250 | Dataset: 0-3144802 | Loss: 2.255 | 676 ms/step , 58190.71 GFLOP/s , 532567.1 tokens/s INFO:__main__:2024-10-27 05:35:12 | Epoch: 2 | Step: 53260 | Dataset: 0-3152802 | Loss: 2.160 | 675 ms/step , 58193.12 GFLOP/s , 532457.4 tokens/s INFO:__main__:2024-10-27 05:35:20 | Epoch: 2 | Step: 53270 | Dataset: 0-3160802 | Loss: 2.114 | 675 ms/step , 58242.49 GFLOP/s , 533003.0 tokens/s INFO:__main__:2024-10-27 05:35:28 | Epoch: 2 | Step: 53280 | Dataset: 0-3168802 | Loss: 2.194 | 677 ms/step , 58063.21 GFLOP/s , 532345.8 tokens/s INFO:__main__:2024-10-27 05:35:35 | Epoch: 2 | Step: 53290 | Dataset: 0-3176802 | Loss: 2.220 | 678 ms/step , 58003.15 GFLOP/s , 531204.3 tokens/s INFO:__main__:2024-10-27 05:35:43 | Epoch: 2 | Step: 53300 | Dataset: 0-3184802 | Loss: 2.167 | 674 ms/step , 58318.63 GFLOP/s , 533000.2 tokens/s INFO:__main__:2024-10-27 05:35:51 | Epoch: 2 | Step: 53310 | Dataset: 0-3192802 | Loss: 2.281 | 676 ms/step , 58159.19 GFLOP/s , 533019.9 tokens/s INFO:__main__:2024-10-27 05:35:58 | Epoch: 2 | Step: 53320 | Dataset: 0-3200802 | Loss: 2.166 | 674 ms/step , 58334.82 GFLOP/s , 532745.4 tokens/s INFO:__main__:2024-10-27 05:36:06 | Epoch: 2 | Step: 53330 | Dataset: 0-3208802 | Loss: 2.178 | 676 ms/step , 58176.91 GFLOP/s , 533306.8 tokens/s INFO:__main__:2024-10-27 05:36:14 | Epoch: 2 | Step: 53340 | Dataset: 0-3216802 | Loss: 2.243 | 676 ms/step , 58189.87 GFLOP/s , 532629.3 tokens/s INFO:__main__:2024-10-27 05:36:21 | Epoch: 2 | Step: 53350 | Dataset: 0-3224802 | Loss: 2.173 | 676 ms/step , 58192.21 GFLOP/s , 532513.7 tokens/s INFO:__main__:2024-10-27 05:36:29 | Epoch: 2 | Step: 53360 | Dataset: 0-3232802 | Loss: 2.091 | 675 ms/step , 58198.88 GFLOP/s , 532897.3 tokens/s INFO:__main__:2024-10-27 05:36:37 | Epoch: 2 | Step: 53370 | Dataset: 0-3240802 | Loss: 
2.206 | 675 ms/step , 58219.73 GFLOP/s , 532494.6 tokens/s INFO:__main__:2024-10-27 05:36:44 | Epoch: 2 | Step: 53380 | Dataset: 0-3248802 | Loss: 2.144 | 675 ms/step , 58256.86 GFLOP/s , 532872.8 tokens/s INFO:__main__:2024-10-27 05:36:52 | Epoch: 2 | Step: 53390 | Dataset: 0-3256802 | Loss: 2.303 | 674 ms/step , 58295.58 GFLOP/s , 532894.1 tokens/s INFO:__main__:2024-10-27 05:37:00 | Epoch: 2 | Step: 53400 | Dataset: 0-3264802 | Loss: 2.219 | 675 ms/step , 58211.75 GFLOP/s , 531644.4 tokens/s INFO:__main__:2024-10-27 05:37:08 | Epoch: 2 | Step: 53410 | Dataset: 0-3272802 | Loss: 2.221 | 675 ms/step , 58196.11 GFLOP/s , 532630.7 tokens/s INFO:__main__:2024-10-27 05:37:15 | Epoch: 2 | Step: 53420 | Dataset: 0-3280802 | Loss: 2.190 | 675 ms/step , 58209.63 GFLOP/s , 532224.1 tokens/s INFO:__main__:2024-10-27 05:37:23 | Epoch: 2 | Step: 53430 | Dataset: 0-3288802 | Loss: 2.229 | 675 ms/step , 58239.87 GFLOP/s , 532653.0 tokens/s INFO:__main__:2024-10-27 05:37:31 | Epoch: 2 | Step: 53440 | Dataset: 0-3296802 | Loss: 2.157 | 675 ms/step , 58195.80 GFLOP/s , 531995.3 tokens/s INFO:__main__:2024-10-27 05:37:38 | Epoch: 2 | Step: 53450 | Dataset: 0-3304802 | Loss: 2.200 | 674 ms/step , 58348.13 GFLOP/s , 532459.9 tokens/s INFO:__main__:2024-10-27 05:37:46 | Epoch: 2 | Step: 53460 | Dataset: 0-3312802 | Loss: 2.192 | 675 ms/step , 58219.43 GFLOP/s , 532401.6 tokens/s INFO:__main__:2024-10-27 05:37:54 | Epoch: 2 | Step: 53470 | Dataset: 0-3320802 | Loss: 2.203 | 676 ms/step , 58160.85 GFLOP/s , 532428.2 tokens/s INFO:__main__:2024-10-27 05:38:01 | Epoch: 2 | Step: 53480 | Dataset: 0-3328802 | Loss: 2.235 | 675 ms/step , 58270.82 GFLOP/s , 532598.1 tokens/s INFO:__main__:2024-10-27 05:38:09 | Epoch: 2 | Step: 53490 | Dataset: 0-3336802 | Loss: 2.119 | 675 ms/step , 58224.39 GFLOP/s , 532098.6 tokens/s INFO:__main__:2024-10-27 05:38:17 | Epoch: 2 | Step: 53500 | Dataset: 0-3344802 | Loss: 2.177 | 674 ms/step , 58323.52 GFLOP/s , 532648.1 tokens/s INFO:__main__:2024-10-27 
05:38:24 | Epoch: 2 | Step: 53510 | Dataset: 0-3352802 | Loss: 2.173 | 674 ms/step , 58302.86 GFLOP/s , 532827.0 tokens/s INFO:__main__:2024-10-27 05:38:32 | Epoch: 2 | Step: 53520 | Dataset: 0-3360802 | Loss: 2.202 | 675 ms/step , 58226.52 GFLOP/s , 533093.6 tokens/s INFO:__main__:2024-10-27 05:38:40 | Epoch: 2 | Step: 53530 | Dataset: 0-3368802 | Loss: 2.145 | 675 ms/step , 58196.66 GFLOP/s , 531077.6 tokens/s INFO:__main__:2024-10-27 05:38:48 | Epoch: 2 | Step: 53540 | Dataset: 0-3376802 | Loss: 2.229 | 675 ms/step , 58213.33 GFLOP/s , 531171.9 tokens/s INFO:__main__:2024-10-27 05:38:55 | Epoch: 2 | Step: 53550 | Dataset: 0-3384802 | Loss: 2.240 | 676 ms/step , 58158.81 GFLOP/s , 530441.1 tokens/s INFO:__main__:2024-10-27 05:39:03 | Epoch: 2 | Step: 53560 | Dataset: 0-3392802 | Loss: 2.201 | 675 ms/step , 58252.63 GFLOP/s , 531759.0 tokens/s INFO:__main__:2024-10-27 05:39:11 | Epoch: 2 | Step: 53570 | Dataset: 0-3400802 | Loss: 2.110 | 677 ms/step , 58099.59 GFLOP/s , 530569.3 tokens/s INFO:__main__:2024-10-27 05:39:18 | Epoch: 2 | Step: 53580 | Dataset: 0-3408802 | Loss: 2.120 | 676 ms/step , 58122.19 GFLOP/s , 530804.4 tokens/s INFO:__main__:2024-10-27 05:39:26 | Epoch: 2 | Step: 53590 | Dataset: 0-3416802 | Loss: 2.166 | 675 ms/step , 58212.62 GFLOP/s , 531214.1 tokens/s INFO:__main__:2024-10-27 05:39:34 | Epoch: 2 | Step: 53600 | Dataset: 0-3424802 | Loss: 2.165 | 678 ms/step , 57973.79 GFLOP/s , 531326.8 tokens/s INFO:__main__:2024-10-27 05:39:42 | Epoch: 2 | Step: 53610 | Dataset: 0-3432802 | Loss: 2.206 | 678 ms/step , 57961.95 GFLOP/s , 530698.0 tokens/s INFO:__main__:2024-10-27 05:39:49 | Epoch: 2 | Step: 53620 | Dataset: 0-3440802 | Loss: 2.165 | 675 ms/step , 58272.72 GFLOP/s , 530994.3 tokens/s INFO:__main__:2024-10-27 05:39:57 | Epoch: 2 | Step: 53630 | Dataset: 0-3448802 | Loss: 2.116 | 679 ms/step , 57859.08 GFLOP/s , 530610.2 tokens/s INFO:__main__:2024-10-27 05:40:05 | Epoch: 2 | Step: 53640 | Dataset: 0-3456802 | Loss: 2.177 | 675 ms/step , 
58232.64 GFLOP/s , 531517.1 tokens/s INFO:__main__:2024-10-27 05:40:12 | Epoch: 2 | Step: 53650 | Dataset: 0-3464802 | Loss: 2.188 | 675 ms/step , 58255.86 GFLOP/s , 532568.9 tokens/s INFO:__main__:2024-10-27 05:40:20 | Epoch: 2 | Step: 53660 | Dataset: 0-3472802 | Loss: 2.205 | 676 ms/step , 58167.27 GFLOP/s , 532624.6 tokens/s INFO:__main__:2024-10-27 05:40:28 | Epoch: 2 | Step: 53670 | Dataset: 0-3480802 | Loss: 2.119 | 675 ms/step , 58198.42 GFLOP/s , 532974.1 tokens/s INFO:__main__:2024-10-27 05:40:35 | Epoch: 2 | Step: 53680 | Dataset: 0-3488802 | Loss: 2.166 | 675 ms/step , 58205.25 GFLOP/s , 532060.8 tokens/s INFO:__main__:2024-10-27 05:40:43 | Epoch: 2 | Step: 53690 | Dataset: 0-3496802 | Loss: 2.105 | 675 ms/step , 58218.93 GFLOP/s , 532553.0 tokens/s INFO:__main__:2024-10-27 05:40:51 | Epoch: 2 | Step: 53700 | Dataset: 0-3504802 | Loss: 2.170 | 675 ms/step , 58195.85 GFLOP/s , 532030.5 tokens/s INFO:__main__:2024-10-27 05:40:59 | Epoch: 2 | Step: 53710 | Dataset: 0-3512802 | Loss: 2.129 | 675 ms/step , 58257.51 GFLOP/s , 532232.5 tokens/s INFO:__main__:2024-10-27 05:41:06 | Epoch: 2 | Step: 53720 | Dataset: 0-3520802 | Loss: 2.168 | 675 ms/step , 58218.41 GFLOP/s , 532296.6 tokens/s INFO:__main__:2024-10-27 05:41:14 | Epoch: 2 | Step: 53730 | Dataset: 0-3528802 | Loss: 2.182 | 675 ms/step , 58228.91 GFLOP/s , 532252.2 tokens/s INFO:__main__:2024-10-27 05:41:22 | Epoch: 2 | Step: 53740 | Dataset: 0-3536802 | Loss: 2.119 | 675 ms/step , 58200.38 GFLOP/s , 532253.6 tokens/s INFO:__main__:2024-10-27 05:41:29 | Epoch: 2 | Step: 53750 | Dataset: 0-3544802 | Loss: 2.120 | 674 ms/step , 58314.25 GFLOP/s , 532048.7 tokens/s INFO:__main__:2024-10-27 05:41:37 | Epoch: 2 | Step: 53760 | Dataset: 0-3552802 | Loss: 2.159 | 674 ms/step , 58280.89 GFLOP/s , 532881.0 tokens/s INFO:__main__:2024-10-27 05:41:45 | Epoch: 2 | Step: 53770 | Dataset: 0-3560802 | Loss: 2.156 | 675 ms/step , 58253.84 GFLOP/s , 532584.4 tokens/s INFO:__main__:2024-10-27 05:41:52 | Epoch: 2 | 
Step: 53780 | Dataset: 0-3568802 | Loss: 2.189 | 676 ms/step , 58174.96 GFLOP/s , 532895.7 tokens/s INFO:__main__:2024-10-27 05:42:00 | Epoch: 2 | Step: 53790 | Dataset: 0-3576802 | Loss: 2.214 | 676 ms/step , 58127.60 GFLOP/s , 531961.8 tokens/s INFO:__main__:2024-10-27 05:42:08 | Epoch: 2 | Step: 53800 | Dataset: 0-3584802 | Loss: 2.068 | 675 ms/step , 58194.73 GFLOP/s , 532241.9 tokens/s INFO:__main__:2024-10-27 05:42:15 | Epoch: 2 | Step: 53810 | Dataset: 0-3592802 | Loss: 2.121 | 675 ms/step , 58258.45 GFLOP/s , 532392.9 tokens/s INFO:__main__:2024-10-27 05:42:23 | Epoch: 2 | Step: 53820 | Dataset: 0-3600802 | Loss: 2.063 | 675 ms/step , 58225.69 GFLOP/s , 532208.6 tokens/s INFO:__main__:2024-10-27 05:42:31 | Epoch: 2 | Step: 53830 | Dataset: 0-3608802 | Loss: 2.133 | 675 ms/step , 58277.41 GFLOP/s , 532289.5 tokens/s INFO:__main__:2024-10-27 05:42:39 | Epoch: 2 | Step: 53840 | Dataset: 0-3616802 | Loss: 2.108 | 675 ms/step , 58202.06 GFLOP/s , 532692.4 tokens/s INFO:__main__:2024-10-27 05:42:46 | Epoch: 2 | Step: 53850 | Dataset: 0-3624802 | Loss: 2.109 | 675 ms/step , 58232.23 GFLOP/s , 532544.7 tokens/s INFO:__main__:2024-10-27 05:42:54 | Epoch: 2 | Step: 53860 | Dataset: 0-3632802 | Loss: 2.154 | 673 ms/step , 58367.83 GFLOP/s , 532764.0 tokens/s INFO:__main__:2024-10-27 05:43:02 | Epoch: 2 | Step: 53870 | Dataset: 0-3640802 | Loss: 2.205 | 675 ms/step , 58204.84 GFLOP/s , 532655.5 tokens/s INFO:__main__:2024-10-27 05:43:09 | Epoch: 2 | Step: 53880 | Dataset: 0-3648802 | Loss: 2.203 | 676 ms/step , 58189.29 GFLOP/s , 532037.2 tokens/s INFO:__main__:2024-10-27 05:43:17 | Epoch: 2 | Step: 53890 | Dataset: 0-3656802 | Loss: 2.262 | 675 ms/step , 58207.46 GFLOP/s , 532531.5 tokens/s INFO:__main__:2024-10-27 05:43:25 | Epoch: 2 | Step: 53900 | Dataset: 0-3664802 | Loss: 2.167 | 674 ms/step , 58301.90 GFLOP/s , 532866.3 tokens/s INFO:__main__:2024-10-27 05:43:32 | Epoch: 2 | Step: 53910 | Dataset: 0-3672802 | Loss: 2.178 | 674 ms/step , 58312.84 GFLOP/s , 
532857.7 tokens/s INFO:__main__:2024-10-27 05:43:40 | Epoch: 2 | Step: 53920 | Dataset: 0-3680802 | Loss: 2.124 | 675 ms/step , 58244.81 GFLOP/s , 532229.8 tokens/s INFO:__main__:2024-10-27 05:43:48 | Epoch: 2 | Step: 53930 | Dataset: 0-3688802 | Loss: 2.229 | 675 ms/step , 58217.23 GFLOP/s , 532567.9 tokens/s INFO:__main__:2024-10-27 05:43:55 | Epoch: 2 | Step: 53940 | Dataset: 0-3696802 | Loss: 2.160 | 676 ms/step , 58184.96 GFLOP/s , 532389.0 tokens/s INFO:__main__:2024-10-27 05:44:03 | Epoch: 2 | Step: 53950 | Dataset: 0-3704802 | Loss: 2.170 | 676 ms/step , 58185.36 GFLOP/s , 532374.3 tokens/s INFO:__main__:2024-10-27 05:44:11 | Epoch: 2 | Step: 53960 | Dataset: 0-3712802 | Loss: 2.187 | 675 ms/step , 58226.09 GFLOP/s , 532376.4 tokens/s INFO:__main__:2024-10-27 05:44:19 | Epoch: 2 | Step: 53970 | Dataset: 0-3720802 | Loss: 2.197 | 674 ms/step , 58350.20 GFLOP/s , 532366.4 tokens/s INFO:__main__:2024-10-27 05:44:26 | Epoch: 2 | Step: 53980 | Dataset: 0-3728802 | Loss: 2.135 | 676 ms/step , 58180.18 GFLOP/s , 532205.7 tokens/s INFO:__main__:2024-10-27 05:44:34 | Epoch: 2 | Step: 53990 | Dataset: 0-3736802 | Loss: 2.170 | 675 ms/step , 58250.28 GFLOP/s , 532237.1 tokens/s INFO:__main__:2024-10-27 05:44:41 | Validation | Step: 54000 | Val_loss: 2.235 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 05:44:41 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_054441_step_54000.pt` INFO:__main__:2024-10-27 05:44:42 | Epoch: 2 | Step: 54000 | Dataset: 0-3744802 | Loss: 2.163 | 674 ms/step , 58364.27 GFLOP/s , 479616.8 tokens/s INFO:__main__:2024-10-27 05:44:50 | Epoch: 2 | Step: 54010 | Dataset: 0-3752802 | Loss: 2.188 | 676 ms/step , 58127.06 GFLOP/s , 531483.4 tokens/s INFO:__main__:2024-10-27 05:44:58 | Epoch: 2 | Step: 54020 | Dataset: 0-3760802 | Loss: 2.148 | 675 ms/step , 58201.72 GFLOP/s , 532919.3 tokens/s INFO:__main__:2024-10-27 05:45:06 | Epoch: 2 | Step: 54030 | Dataset: 0-3768802 | Loss: 2.180 | 676 ms/step , 58182.97 GFLOP/s , 
532542.2 tokens/s INFO:__main__:2024-10-27 05:45:13 | Epoch: 2 | Step: 54040 | Dataset: 0-3776802 | Loss: 1.995 | 676 ms/step , 58137.05 GFLOP/s , 531927.5 tokens/s INFO:__main__:2024-10-27 05:45:21 | Epoch: 2 | Step: 54050 | Dataset: 0-3784802 | Loss: 1.874 | 674 ms/step , 58314.69 GFLOP/s , 532467.0 tokens/s INFO:__main__:2024-10-27 05:45:29 | Epoch: 2 | Step: 54060 | Dataset: 0-3792802 | Loss: 1.796 | 676 ms/step , 58154.86 GFLOP/s , 532123.3 tokens/s INFO:__main__:2024-10-27 05:45:36 | Epoch: 2 | Step: 54070 | Dataset: 0-3800802 | Loss: 1.765 | 675 ms/step , 58244.17 GFLOP/s , 531601.5 tokens/s INFO:__main__:2024-10-27 05:45:44 | Epoch: 2 | Step: 54080 | Dataset: 0-3808802 | Loss: 1.756 | 675 ms/step , 58274.66 GFLOP/s , 531870.8 tokens/s INFO:__main__:2024-10-27 05:45:52 | Epoch: 2 | Step: 54090 | Dataset: 0-3816802 | Loss: 1.750 | 675 ms/step , 58214.75 GFLOP/s , 531740.1 tokens/s INFO:__main__:2024-10-27 05:45:59 | Epoch: 2 | Step: 54100 | Dataset: 0-3824802 | Loss: 1.727 | 675 ms/step , 58218.68 GFLOP/s , 532121.3 tokens/s INFO:__main__:2024-10-27 05:46:07 | Epoch: 2 | Step: 54110 | Dataset: 0-3832802 | Loss: 1.724 | 675 ms/step , 58262.75 GFLOP/s , 532390.4 tokens/s INFO:__main__:2024-10-27 05:46:15 | Epoch: 2 | Step: 54120 | Dataset: 0-3840802 | Loss: 2.346 | 676 ms/step , 58189.98 GFLOP/s , 532250.5 tokens/s INFO:__main__:2024-10-27 05:46:23 | Epoch: 2 | Step: 54130 | Dataset: 0-3848802 | Loss: 2.208 | 674 ms/step , 58307.07 GFLOP/s , 533112.6 tokens/s INFO:__main__:2024-10-27 05:46:30 | Epoch: 2 | Step: 54140 | Dataset: 0-3856802 | Loss: 2.156 | 675 ms/step , 58207.50 GFLOP/s , 532883.6 tokens/s INFO:__main__:2024-10-27 05:46:38 | Epoch: 2 | Step: 54150 | Dataset: 0-3864802 | Loss: 2.091 | 675 ms/step , 58213.67 GFLOP/s , 532675.3 tokens/s INFO:__main__:2024-10-27 05:46:46 | Epoch: 2 | Step: 54160 | Dataset: 0-3872802 | Loss: 2.091 | 673 ms/step , 58373.66 GFLOP/s , 533340.0 tokens/s INFO:__main__:2024-10-27 05:46:53 | Epoch: 2 | Step: 54170 | Dataset: 
0-3880802 | Loss: 2.159 | 674 ms/step , 58354.27 GFLOP/s , 533127.5 tokens/s INFO:__main__:2024-10-27 05:47:01 | Epoch: 2 | Step: 54180 | Dataset: 0-3888802 | Loss: 2.143 | 675 ms/step , 58276.91 GFLOP/s , 532884.4 tokens/s INFO:__main__:2024-10-27 05:47:09 | Epoch: 2 | Step: 54190 | Dataset: 0-3896802 | Loss: 2.154 | 674 ms/step , 58334.86 GFLOP/s , 533223.3 tokens/s INFO:__main__:2024-10-27 05:47:16 | Epoch: 2 | Step: 54200 | Dataset: 0-3904802 | Loss: 2.088 | 674 ms/step , 58292.12 GFLOP/s , 532902.5 tokens/s INFO:__main__:2024-10-27 05:47:24 | Epoch: 2 | Step: 54210 | Dataset: 0-3912802 | Loss: 2.160 | 675 ms/step , 58228.83 GFLOP/s , 533165.2 tokens/s INFO:__main__:2024-10-27 05:47:32 | Epoch: 2 | Step: 54220 | Dataset: 0-3920802 | Loss: 2.096 | 677 ms/step , 58099.28 GFLOP/s , 528766.0 tokens/s INFO:__main__:2024-10-27 05:47:39 | Epoch: 2 | Step: 54230 | Dataset: 0-3928802 | Loss: 2.182 | 678 ms/step , 57988.30 GFLOP/s , 530699.8 tokens/s INFO:__main__:2024-10-27 05:47:47 | Epoch: 2 | Step: 54240 | Dataset: 0-3936802 | Loss: 2.075 | 675 ms/step , 58194.66 GFLOP/s , 531197.2 tokens/s INFO:__main__:2024-10-27 05:47:55 | Epoch: 2 | Step: 54250 | Dataset: 0-3944802 | Loss: 2.080 | 676 ms/step , 58109.21 GFLOP/s , 530437.6 tokens/s INFO:__main__:2024-10-27 05:48:03 | Epoch: 2 | Step: 54260 | Dataset: 0-3952802 | Loss: 2.229 | 675 ms/step , 58215.02 GFLOP/s , 531501.3 tokens/s INFO:__main__:2024-10-27 05:48:10 | Epoch: 2 | Step: 54270 | Dataset: 0-3960802 | Loss: 2.039 | 676 ms/step , 58136.10 GFLOP/s , 531478.9 tokens/s INFO:__main__:2024-10-27 05:48:18 | Epoch: 2 | Step: 54280 | Dataset: 0-3968802 | Loss: 2.171 | 678 ms/step , 57942.24 GFLOP/s , 530438.4 tokens/s INFO:__main__:2024-10-27 05:48:26 | Epoch: 2 | Step: 54290 | Dataset: 0-3976802 | Loss: 2.208 | 675 ms/step , 58199.56 GFLOP/s , 529294.3 tokens/s INFO:__main__:2024-10-27 05:48:33 | Epoch: 2 | Step: 54300 | Dataset: 0-3984802 | Loss: 2.212 | 675 ms/step , 58230.61 GFLOP/s , 532330.5 tokens/s 
INFO:__main__:2024-10-27 05:48:41 | Epoch: 2 | Step: 54310 | Dataset: 0-3992802 | Loss: 2.186 | 676 ms/step , 58108.20 GFLOP/s , 532149.1 tokens/s INFO:__main__:2024-10-27 05:48:49 | Epoch: 2 | Step: 54320 | Dataset: 0-4000802 | Loss: 2.237 | 675 ms/step , 58243.32 GFLOP/s , 532704.5 tokens/s INFO:__main__:2024-10-27 05:48:57 | Epoch: 2 | Step: 54330 | Dataset: 0-4008802 | Loss: 2.199 | 674 ms/step , 58301.33 GFLOP/s , 532722.2 tokens/s INFO:__main__:2024-10-27 05:49:04 | Epoch: 2 | Step: 54340 | Dataset: 0-4016802 | Loss: 2.192 | 675 ms/step , 58204.19 GFLOP/s , 532944.9 tokens/s INFO:__main__:2024-10-27 05:49:12 | Epoch: 2 | Step: 54350 | Dataset: 0-4024802 | Loss: 2.222 | 674 ms/step , 58325.85 GFLOP/s , 532986.6 tokens/s INFO:__main__:2024-10-27 05:49:20 | Epoch: 2 | Step: 54360 | Dataset: 0-4032802 | Loss: 2.144 | 676 ms/step , 58192.63 GFLOP/s , 532805.2 tokens/s INFO:__main__:2024-10-27 05:49:27 | Epoch: 2 | Step: 54370 | Dataset: 0-4040802 | Loss: 2.108 | 675 ms/step , 58243.38 GFLOP/s , 532692.7 tokens/s INFO:__main__:2024-10-27 05:49:35 | Epoch: 2 | Step: 54380 | Dataset: 0-4048802 | Loss: 2.118 | 675 ms/step , 58203.68 GFLOP/s , 532422.1 tokens/s INFO:__main__:2024-10-27 05:49:43 | Epoch: 2 | Step: 54390 | Dataset: 0-4056802 | Loss: 2.191 | 675 ms/step , 58232.34 GFLOP/s , 530692.6 tokens/s INFO:__main__:2024-10-27 05:49:50 | Epoch: 2 | Step: 54400 | Dataset: 0-4064802 | Loss: 2.148 | 674 ms/step , 58305.40 GFLOP/s , 533148.2 tokens/s INFO:__main__:2024-10-27 05:49:58 | Epoch: 2 | Step: 54410 | Dataset: 0-4072802 | Loss: 2.193 | 674 ms/step , 58295.24 GFLOP/s , 532828.7 tokens/s INFO:__main__:2024-10-27 05:50:06 | Epoch: 2 | Step: 54420 | Dataset: 0-4080802 | Loss: 2.140 | 674 ms/step , 58289.78 GFLOP/s , 532428.7 tokens/s INFO:__main__:2024-10-27 05:50:13 | Epoch: 2 | Step: 54430 | Dataset: 0-4088802 | Loss: 2.213 | 675 ms/step , 58267.23 GFLOP/s , 532617.9 tokens/s INFO:__main__:2024-10-27 05:50:21 | Epoch: 2 | Step: 54440 | Dataset: 0-4096802 | Loss: 
2.122 | 675 ms/step , 58254.26 GFLOP/s , 533374.9 tokens/s INFO:__main__:2024-10-27 05:50:29 | Epoch: 2 | Step: 54450 | Dataset: 0-4104802 | Loss: 2.174 | 676 ms/step , 58133.37 GFLOP/s , 532516.9 tokens/s INFO:__main__:2024-10-27 05:50:37 | Epoch: 2 | Step: 54460 | Dataset: 0-4112802 | Loss: 2.182 | 676 ms/step , 58147.54 GFLOP/s , 532371.4 tokens/s INFO:__main__:2024-10-27 05:50:44 | Epoch: 2 | Step: 54470 | Dataset: 0-4120802 | Loss: 2.188 | 674 ms/step , 58298.80 GFLOP/s , 532994.8 tokens/s INFO:__main__:2024-10-27 05:50:52 | Epoch: 2 | Step: 54480 | Dataset: 0-4128802 | Loss: 2.148 | 674 ms/step , 58355.26 GFLOP/s , 533083.7 tokens/s INFO:__main__:2024-10-27 05:51:00 | Epoch: 2 | Step: 54490 | Dataset: 0-4136802 | Loss: 2.268 | 674 ms/step , 58346.88 GFLOP/s , 532942.3 tokens/s INFO:__main__:2024-10-27 05:51:07 | Epoch: 2 | Step: 54500 | Dataset: 0-4144802 | Loss: 2.163 | 674 ms/step , 58307.33 GFLOP/s , 532835.6 tokens/s INFO:__main__:2024-10-27 05:51:15 | Epoch: 2 | Step: 54510 | Dataset: 0-4152802 | Loss: 2.211 | 676 ms/step , 58173.98 GFLOP/s , 532685.6 tokens/s INFO:__main__:2024-10-27 05:51:23 | Epoch: 2 | Step: 54520 | Dataset: 0-4160802 | Loss: 2.207 | 676 ms/step , 58170.72 GFLOP/s , 532706.7 tokens/s INFO:__main__:2024-10-27 05:51:30 | Epoch: 2 | Step: 54530 | Dataset: 0-4168802 | Loss: 2.177 | 675 ms/step , 58233.66 GFLOP/s , 532533.8 tokens/s INFO:__main__:2024-10-27 05:51:38 | Epoch: 2 | Step: 54540 | Dataset: 0-4176802 | Loss: 2.219 | 676 ms/step , 58177.30 GFLOP/s , 532593.2 tokens/s INFO:__main__:2024-10-27 05:51:46 | Epoch: 2 | Step: 54550 | Dataset: 0-4184802 | Loss: 2.230 | 676 ms/step , 58166.08 GFLOP/s , 532466.5 tokens/s INFO:__main__:2024-10-27 05:51:53 | Epoch: 2 | Step: 54560 | Dataset: 0-4192802 | Loss: 2.178 | 675 ms/step , 58196.73 GFLOP/s , 532794.1 tokens/s INFO:__main__:2024-10-27 05:52:01 | Epoch: 2 | Step: 54570 | Dataset: 0-4200802 | Loss: 2.125 | 674 ms/step , 58301.13 GFLOP/s , 532934.3 tokens/s INFO:__main__:2024-10-27 
05:52:09 | Epoch: 2 | Step: 54580 | Dataset: 0-4208802 | Loss: 2.173 | 674 ms/step , 58333.16 GFLOP/s , 533174.2 tokens/s INFO:__main__:2024-10-27 05:52:16 | Epoch: 2 | Step: 54590 | Dataset: 0-4216802 | Loss: 2.222 | 675 ms/step , 58255.31 GFLOP/s , 532325.6 tokens/s INFO:__main__:2024-10-27 05:52:24 | Epoch: 2 | Step: 54600 | Dataset: 0-4224802 | Loss: 2.246 | 675 ms/step , 58276.25 GFLOP/s , 532686.3 tokens/s INFO:__main__:2024-10-27 05:52:32 | Epoch: 2 | Step: 54610 | Dataset: 0-4232802 | Loss: 2.215 | 674 ms/step , 58310.31 GFLOP/s , 532868.3 tokens/s INFO:__main__:2024-10-27 05:52:40 | Epoch: 2 | Step: 54620 | Dataset: 0-4240802 | Loss: 2.215 | 675 ms/step , 58221.61 GFLOP/s , 532969.3 tokens/s INFO:__main__:2024-10-27 05:52:47 | Epoch: 2 | Step: 54630 | Dataset: 0-4248802 | Loss: 2.133 | 675 ms/step , 58196.26 GFLOP/s , 532587.3 tokens/s INFO:__main__:2024-10-27 05:52:55 | Epoch: 2 | Step: 54640 | Dataset: 0-4256802 | Loss: 2.262 | 684 ms/step , 57502.49 GFLOP/s , 532030.7 tokens/s INFO:__main__:2024-10-27 05:53:03 | Epoch: 2 | Step: 54650 | Dataset: 0-4264802 | Loss: 2.209 | 674 ms/step , 58297.80 GFLOP/s , 533042.1 tokens/s INFO:__main__:2024-10-27 05:53:10 | Epoch: 2 | Step: 54660 | Dataset: 0-4272802 | Loss: 2.213 | 674 ms/step , 58286.59 GFLOP/s , 533184.5 tokens/s INFO:__main__:2024-10-27 05:53:18 | Epoch: 2 | Step: 54670 | Dataset: 0-4280802 | Loss: 2.192 | 674 ms/step , 58312.20 GFLOP/s , 533161.3 tokens/s INFO:__main__:2024-10-27 05:53:26 | Epoch: 2 | Step: 54680 | Dataset: 0-4288802 | Loss: 2.151 | 674 ms/step , 58293.71 GFLOP/s , 533069.3 tokens/s INFO:__main__:2024-10-27 05:53:33 | Epoch: 2 | Step: 54690 | Dataset: 0-4296802 | Loss: 2.240 | 675 ms/step , 58275.52 GFLOP/s , 533174.7 tokens/s INFO:__main__:2024-10-27 05:53:41 | Epoch: 2 | Step: 54700 | Dataset: 0-4304802 | Loss: 2.155 | 675 ms/step , 58275.84 GFLOP/s , 532518.1 tokens/s INFO:__main__:2024-10-27 05:53:49 | Epoch: 2 | Step: 54710 | Dataset: 0-4312802 | Loss: 2.117 | 675 ms/step , 
58262.84 GFLOP/s , 533492.1 tokens/s INFO:__main__:2024-10-27 05:53:56 | Epoch: 2 | Step: 54720 | Dataset: 0-4320802 | Loss: 2.202 | 674 ms/step , 58327.90 GFLOP/s , 533179.1 tokens/s INFO:__main__:2024-10-27 05:54:04 | Epoch: 2 | Step: 54730 | Dataset: 0-4328802 | Loss: 2.244 | 674 ms/step , 58352.50 GFLOP/s , 533597.3 tokens/s INFO:__main__:2024-10-27 05:54:12 | Epoch: 2 | Step: 54740 | Dataset: 0-4336802 | Loss: 2.184 | 675 ms/step , 58226.13 GFLOP/s , 533124.4 tokens/s INFO:__main__:2024-10-27 05:54:19 | Epoch: 2 | Step: 54750 | Dataset: 0-4344802 | Loss: 2.114 | 675 ms/step , 58223.54 GFLOP/s , 532772.9 tokens/s INFO:__main__:2024-10-27 05:54:27 | Epoch: 2 | Step: 54760 | Dataset: 0-4352802 | Loss: 2.228 | 675 ms/step , 58201.61 GFLOP/s , 532840.8 tokens/s INFO:__main__:2024-10-27 05:54:35 | Epoch: 2 | Step: 54770 | Dataset: 0-4360802 | Loss: 1.995 | 675 ms/step , 58257.97 GFLOP/s , 532337.6 tokens/s INFO:__main__:2024-10-27 05:54:43 | Epoch: 2 | Step: 54780 | Dataset: 0-4368802 | Loss: 1.888 | 676 ms/step , 58166.93 GFLOP/s , 532678.3 tokens/s INFO:__main__:2024-10-27 05:54:50 | Epoch: 2 | Step: 54790 | Dataset: 0-4376802 | Loss: 1.862 | 676 ms/step , 58182.66 GFLOP/s , 532377.5 tokens/s INFO:__main__:2024-10-27 05:54:58 | Epoch: 2 | Step: 54800 | Dataset: 0-4384802 | Loss: 1.814 | 677 ms/step , 58030.12 GFLOP/s , 532800.5 tokens/s INFO:__main__:2024-10-27 05:55:06 | Epoch: 2 | Step: 54810 | Dataset: 0-4392802 | Loss: 1.792 | 674 ms/step , 58290.62 GFLOP/s , 532942.9 tokens/s INFO:__main__:2024-10-27 05:55:13 | Epoch: 2 | Step: 54820 | Dataset: 0-4400802 | Loss: 1.787 | 674 ms/step , 58311.40 GFLOP/s , 533056.0 tokens/s INFO:__main__:2024-10-27 05:55:21 | Epoch: 2 | Step: 54830 | Dataset: 0-4408802 | Loss: 1.754 | 675 ms/step , 58273.23 GFLOP/s , 532702.9 tokens/s INFO:__main__:2024-10-27 05:55:29 | Epoch: 2 | Step: 54840 | Dataset: 0-4416802 | Loss: 1.798 | 676 ms/step , 58185.52 GFLOP/s , 532242.6 tokens/s INFO:__main__:2024-10-27 05:55:36 | Epoch: 2 | 
Step: 54850 | Dataset: 0-4424802 | Loss: 1.795 | 675 ms/step , 58258.18 GFLOP/s , 532118.8 tokens/s INFO:__main__:2024-10-27 05:55:44 | Epoch: 2 | Step: 54860 | Dataset: 0-4432802 | Loss: 1.743 | 674 ms/step , 58329.16 GFLOP/s , 532171.6 tokens/s INFO:__main__:2024-10-27 05:55:52 | Epoch: 2 | Step: 54870 | Dataset: 0-4440802 | Loss: 1.707 | 675 ms/step , 58209.90 GFLOP/s , 532892.1 tokens/s INFO:__main__:2024-10-27 05:55:59 | Epoch: 2 | Step: 54880 | Dataset: 0-4448802 | Loss: 1.701 | 674 ms/step , 58328.97 GFLOP/s , 532815.9 tokens/s INFO:__main__:2024-10-27 05:56:07 | Epoch: 2 | Step: 54890 | Dataset: 0-4456802 | Loss: 1.730 | 675 ms/step , 58192.97 GFLOP/s , 532502.0 tokens/s INFO:__main__:2024-10-27 05:56:15 | Epoch: 2 | Step: 54900 | Dataset: 0-4464802 | Loss: 1.669 | 675 ms/step , 58257.37 GFLOP/s , 532289.0 tokens/s INFO:__main__:2024-10-27 05:56:22 | Epoch: 2 | Step: 54910 | Dataset: 0-4472802 | Loss: 1.703 | 675 ms/step , 58211.16 GFLOP/s , 532510.8 tokens/s INFO:__main__:2024-10-27 05:56:30 | Epoch: 2 | Step: 54920 | Dataset: 0-4480802 | Loss: 1.670 | 677 ms/step , 58044.81 GFLOP/s , 531695.5 tokens/s INFO:__main__:2024-10-27 05:56:38 | Epoch: 2 | Step: 54930 | Dataset: 0-4488802 | Loss: 1.693 | 674 ms/step , 58283.41 GFLOP/s , 532514.7 tokens/s INFO:__main__:2024-10-27 05:56:46 | Epoch: 2 | Step: 54940 | Dataset: 0-4496802 | Loss: 1.694 | 676 ms/step , 58173.23 GFLOP/s , 532199.3 tokens/s INFO:__main__:2024-10-27 05:56:53 | Epoch: 2 | Step: 54950 | Dataset: 0-4504802 | Loss: 2.298 | 675 ms/step , 58270.58 GFLOP/s , 532264.0 tokens/s INFO:__main__:2024-10-27 05:57:01 | Epoch: 2 | Step: 54960 | Dataset: 0-4512802 | Loss: 2.139 | 676 ms/step , 58134.17 GFLOP/s , 532770.9 tokens/s INFO:__main__:2024-10-27 05:57:09 | Epoch: 2 | Step: 54970 | Dataset: 0-4520802 | Loss: 2.252 | 675 ms/step , 58246.41 GFLOP/s , 532588.8 tokens/s INFO:__main__:2024-10-27 05:57:16 | Epoch: 2 | Step: 54980 | Dataset: 0-4528802 | Loss: 2.163 | 674 ms/step , 58339.94 GFLOP/s , 
533481.9 tokens/s INFO:__main__:2024-10-27 05:57:24 | Epoch: 2 | Step: 54990 | Dataset: 0-4536802 | Loss: 2.270 | 674 ms/step , 58330.39 GFLOP/s , 532979.4 tokens/s INFO:__main__:2024-10-27 05:57:31 | Validation | Step: 55000 | Val_loss: 2.269 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 05:57:31 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_055731_step_55000.pt` INFO:__main__:2024-10-27 05:57:33 | Epoch: 2 | Step: 55000 | Dataset: 0-4544802 | Loss: 2.219 | 674 ms/step , 58316.92 GFLOP/s , 480170.2 tokens/s INFO:__main__:2024-10-27 05:57:40 | Epoch: 2 | Step: 55010 | Dataset: 0-4552802 | Loss: 2.171 | 677 ms/step , 58082.28 GFLOP/s , 531689.5 tokens/s INFO:__main__:2024-10-27 05:57:48 | Epoch: 2 | Step: 55020 | Dataset: 0-4560802 | Loss: 2.229 | 676 ms/step , 58132.89 GFLOP/s , 532343.6 tokens/s INFO:__main__:2024-10-27 05:57:56 | Epoch: 2 | Step: 55030 | Dataset: 0-4568802 | Loss: 2.098 | 675 ms/step , 58225.52 GFLOP/s , 532504.0 tokens/s INFO:__main__:2024-10-27 05:58:03 | Epoch: 2 | Step: 55040 | Dataset: 0-4576802 | Loss: 2.190 | 676 ms/step , 58171.65 GFLOP/s , 532775.5 tokens/s INFO:__main__:2024-10-27 05:58:11 | Epoch: 2 | Step: 55050 | Dataset: 0-4584802 | Loss: 2.052 | 676 ms/step , 58134.60 GFLOP/s , 532949.5 tokens/s INFO:__main__:2024-10-27 05:58:19 | Epoch: 2 | Step: 55060 | Dataset: 0-4592802 | Loss: 2.089 | 677 ms/step , 58076.53 GFLOP/s , 532852.0 tokens/s INFO:__main__:2024-10-27 05:58:26 | Epoch: 2 | Step: 55070 | Dataset: 0-4600802 | Loss: 2.197 | 676 ms/step , 58138.30 GFLOP/s , 532794.9 tokens/s INFO:__main__:2024-10-27 05:58:34 | Epoch: 2 | Step: 55080 | Dataset: 0-4608802 | Loss: 2.150 | 675 ms/step , 58228.79 GFLOP/s , 532734.3 tokens/s INFO:__main__:2024-10-27 05:58:42 | Epoch: 2 | Step: 55090 | Dataset: 0-4616802 | Loss: 2.141 | 674 ms/step , 58284.72 GFLOP/s , 532764.9 tokens/s INFO:__main__:2024-10-27 05:58:49 | Epoch: 2 | Step: 55100 | Dataset: 0-4624802 | Loss: 2.205 | 675 ms/step , 58246.20 GFLOP/s , 
532696.8 tokens/s INFO:__main__:2024-10-27 05:58:57 | Epoch: 2 | Step: 55110 | Dataset: 0-4632802 | Loss: 2.257 | 675 ms/step , 58274.61 GFLOP/s , 532895.1 tokens/s INFO:__main__:2024-10-27 05:59:05 | Epoch: 2 | Step: 55120 | Dataset: 0-4640802 | Loss: 2.218 | 675 ms/step , 58209.24 GFLOP/s , 533137.0 tokens/s INFO:__main__:2024-10-27 05:59:13 | Epoch: 2 | Step: 55130 | Dataset: 0-4648802 | Loss: 2.228 | 675 ms/step , 58251.53 GFLOP/s , 533317.5 tokens/s INFO:__main__:2024-10-27 05:59:20 | Epoch: 2 | Step: 55140 | Dataset: 0-4656802 | Loss: 2.183 | 676 ms/step , 58189.49 GFLOP/s , 532933.9 tokens/s INFO:__main__:2024-10-27 05:59:28 | Epoch: 2 | Step: 55150 | Dataset: 0-4664802 | Loss: 2.019 | 677 ms/step , 58078.70 GFLOP/s , 533199.8 tokens/s INFO:__main__:2024-10-27 05:59:36 | Epoch: 2 | Step: 55160 | Dataset: 0-4672802 | Loss: 2.166 | 675 ms/step , 58219.06 GFLOP/s , 533031.8 tokens/s INFO:__main__:2024-10-27 05:59:43 | Epoch: 2 | Step: 55170 | Dataset: 0-4680802 | Loss: 2.200 | 676 ms/step , 58190.05 GFLOP/s , 531943.5 tokens/s INFO:__main__:2024-10-27 05:59:51 | Epoch: 2 | Step: 55180 | Dataset: 0-4688802 | Loss: 2.251 | 676 ms/step , 58175.61 GFLOP/s , 532554.1 tokens/s INFO:__main__:2024-10-27 05:59:59 | Epoch: 2 | Step: 55190 | Dataset: 0-4696802 | Loss: 2.122 | 675 ms/step , 58194.00 GFLOP/s , 532809.6 tokens/s INFO:__main__:2024-10-27 06:00:06 | Epoch: 2 | Step: 55200 | Dataset: 0-4704802 | Loss: 2.182 | 676 ms/step , 58144.58 GFLOP/s , 529772.7 tokens/s INFO:__main__:2024-10-27 06:00:14 | Epoch: 2 | Step: 55210 | Dataset: 0-4712802 | Loss: 2.127 | 674 ms/step , 58284.66 GFLOP/s , 532698.5 tokens/s INFO:__main__:2024-10-27 06:00:22 | Epoch: 2 | Step: 55220 | Dataset: 0-4720802 | Loss: 2.203 | 676 ms/step , 58164.81 GFLOP/s , 532668.4 tokens/s INFO:__main__:2024-10-27 06:00:29 | Epoch: 2 | Step: 55230 | Dataset: 0-4728802 | Loss: 2.182 | 676 ms/step , 58181.32 GFLOP/s , 532587.0 tokens/s INFO:__main__:2024-10-27 06:00:37 | Epoch: 2 | Step: 55240 | Dataset: 
0-4736802 | Loss: 2.119 | 675 ms/step , 58202.01 GFLOP/s , 532716.6 tokens/s INFO:__main__:2024-10-27 06:00:45 | Epoch: 2 | Step: 55250 | Dataset: 0-4744802 | Loss: 2.114 | 687 ms/step , 57231.32 GFLOP/s , 531694.9 tokens/s INFO:__main__:2024-10-27 06:00:53 | Epoch: 2 | Step: 55260 | Dataset: 0-4752802 | Loss: 2.163 | 676 ms/step , 58180.62 GFLOP/s , 532150.9 tokens/s INFO:__main__:2024-10-27 06:01:00 | Epoch: 2 | Step: 55270 | Dataset: 0-4760802 | Loss: 2.078 | 674 ms/step , 58316.66 GFLOP/s , 532758.8 tokens/s INFO:__main__:2024-10-27 06:01:08 | Epoch: 2 | Step: 55280 | Dataset: 0-4768802 | Loss: 2.197 | 675 ms/step , 58232.46 GFLOP/s , 533574.3 tokens/s INFO:__main__:2024-10-27 06:01:16 | Epoch: 2 | Step: 55290 | Dataset: 0-4776802 | Loss: 2.146 | 674 ms/step , 58292.33 GFLOP/s , 533647.1 tokens/s INFO:__main__:2024-10-27 06:01:23 | Epoch: 2 | Step: 55300 | Dataset: 0-4784802 | Loss: 2.109 | 675 ms/step , 58278.94 GFLOP/s , 532727.7 tokens/s INFO:__main__:2024-10-27 06:01:31 | Epoch: 2 | Step: 55310 | Dataset: 0-4792802 | Loss: 2.196 | 675 ms/step , 58250.38 GFLOP/s , 533021.9 tokens/s INFO:__main__:2024-10-27 06:01:39 | Epoch: 2 | Step: 55320 | Dataset: 0-4800802 | Loss: 2.106 | 675 ms/step , 58247.24 GFLOP/s , 532827.3 tokens/s INFO:__main__:2024-10-27 06:01:46 | Epoch: 2 | Step: 55330 | Dataset: 0-4808802 | Loss: 2.096 | 675 ms/step , 58234.87 GFLOP/s , 532985.1 tokens/s INFO:__main__:2024-10-27 06:01:54 | Epoch: 2 | Step: 55340 | Dataset: 0-4816802 | Loss: 2.200 | 675 ms/step , 58264.06 GFLOP/s , 533091.3 tokens/s INFO:__main__:2024-10-27 06:02:02 | Epoch: 2 | Step: 55350 | Dataset: 0-4824802 | Loss: 2.198 | 675 ms/step , 58244.41 GFLOP/s , 532827.2 tokens/s INFO:__main__:2024-10-27 06:02:09 | Epoch: 2 | Step: 55360 | Dataset: 0-4832802 | Loss: 2.135 | 675 ms/step , 58241.23 GFLOP/s , 532910.0 tokens/s INFO:__main__:2024-10-27 06:02:17 | Epoch: 2 | Step: 55370 | Dataset: 0-4840802 | Loss: 2.148 | 674 ms/step , 58285.12 GFLOP/s , 533645.3 tokens/s 
INFO:__main__:2024-10-27 06:02:25 | Epoch: 2 | Step: 55380 | Dataset: 0-4848802 | Loss: 2.089 | 674 ms/step , 58305.92 GFLOP/s , 533703.8 tokens/s INFO:__main__:2024-10-27 06:02:32 | Epoch: 2 | Step: 55390 | Dataset: 0-4856802 | Loss: 2.103 | 674 ms/step , 58357.43 GFLOP/s , 533693.9 tokens/s INFO:__main__:2024-10-27 06:02:40 | Epoch: 2 | Step: 55400 | Dataset: 0-4864802 | Loss: 2.238 | 675 ms/step , 58246.27 GFLOP/s , 533320.2 tokens/s INFO:__main__:2024-10-27 06:02:48 | Epoch: 2 | Step: 55410 | Dataset: 0-4872802 | Loss: 2.170 | 675 ms/step , 58223.81 GFLOP/s , 533005.7 tokens/s INFO:__main__:2024-10-27 06:02:55 | Epoch: 2 | Step: 55420 | Dataset: 0-4880802 | Loss: 2.225 | 675 ms/step , 58244.54 GFLOP/s , 532873.4 tokens/s INFO:__main__:2024-10-27 06:03:03 | Epoch: 2 | Step: 55430 | Dataset: 0-4888802 | Loss: 2.107 | 674 ms/step , 58319.43 GFLOP/s , 533409.0 tokens/s INFO:__main__:2024-10-27 06:03:11 | Epoch: 2 | Step: 55440 | Dataset: 0-4896802 | Loss: 2.166 | 674 ms/step , 58322.75 GFLOP/s , 533688.2 tokens/s INFO:__main__:2024-10-27 06:03:18 | Epoch: 2 | Step: 55450 | Dataset: 0-4904802 | Loss: 2.100 | 674 ms/step , 58332.97 GFLOP/s , 533893.8 tokens/s INFO:__main__:2024-10-27 06:03:26 | Epoch: 2 | Step: 55460 | Dataset: 0-4912802 | Loss: 2.110 | 676 ms/step , 58160.93 GFLOP/s , 533281.3 tokens/s INFO:__main__:2024-10-27 06:03:34 | Epoch: 2 | Step: 55470 | Dataset: 0-4920802 | Loss: 2.049 | 675 ms/step , 58272.86 GFLOP/s , 533099.1 tokens/s INFO:__main__:2024-10-27 06:03:42 | Epoch: 2 | Step: 55480 | Dataset: 0-4928802 | Loss: 2.209 | 675 ms/step , 58237.21 GFLOP/s , 532634.1 tokens/s INFO:__main__:2024-10-27 06:03:49 | Epoch: 2 | Step: 55490 | Dataset: 0-4936802 | Loss: 2.127 | 675 ms/step , 58202.71 GFLOP/s , 533302.1 tokens/s INFO:__main__:2024-10-27 06:03:57 | Epoch: 2 | Step: 55500 | Dataset: 0-4944802 | Loss: 2.064 | 674 ms/step , 58324.42 GFLOP/s , 533156.6 tokens/s INFO:__main__:2024-10-27 06:04:05 | Epoch: 2 | Step: 55510 | Dataset: 0-4952802 | Loss: 
2.154 | 673 ms/step , 58392.00 GFLOP/s , 533826.9 tokens/s INFO:__main__:2024-10-27 06:04:12 | Epoch: 2 | Step: 55520 | Dataset: 0-4960802 | Loss: 2.084 | 674 ms/step , 58317.79 GFLOP/s , 533270.4 tokens/s INFO:__main__:2024-10-27 06:04:20 | Epoch: 2 | Step: 55530 | Dataset: 0-4968802 | Loss: 2.056 | 673 ms/step , 58372.58 GFLOP/s , 533201.5 tokens/s INFO:__main__:2024-10-27 06:04:28 | Epoch: 2 | Step: 55540 | Dataset: 0-4976802 | Loss: 2.096 | 675 ms/step , 58239.33 GFLOP/s , 532783.6 tokens/s INFO:__main__:2024-10-27 06:04:35 | Epoch: 2 | Step: 55550 | Dataset: 0-4984802 | Loss: 2.193 | 675 ms/step , 58209.12 GFLOP/s , 532593.9 tokens/s INFO:__main__:2024-10-27 06:04:43 | Epoch: 2 | Step: 55560 | Dataset: 0-4992802 | Loss: 2.151 | 676 ms/step , 58181.95 GFLOP/s , 532615.1 tokens/s INFO:__main__:2024-10-27 06:04:51 | Epoch: 2 | Step: 55570 | Dataset: 0-5000802 | Loss: 2.102 | 675 ms/step , 58258.51 GFLOP/s , 532801.6 tokens/s INFO:__main__:2024-10-27 06:04:58 | Epoch: 2 | Step: 55580 | Dataset: 0-5008802 | Loss: 2.055 | 674 ms/step , 58293.08 GFLOP/s , 533296.5 tokens/s INFO:__main__:2024-10-27 06:05:06 | Epoch: 2 | Step: 55590 | Dataset: 0-5016802 | Loss: 2.312 | 675 ms/step , 58245.93 GFLOP/s , 533173.8 tokens/s INFO:__main__:2024-10-27 06:05:14 | Epoch: 2 | Step: 55600 | Dataset: 0-5024802 | Loss: 2.207 | 676 ms/step , 58107.83 GFLOP/s , 531395.0 tokens/s INFO:__main__:2024-10-27 06:05:21 | Epoch: 2 | Step: 55610 | Dataset: 0-5032802 | Loss: 2.215 | 675 ms/step , 58216.66 GFLOP/s , 531076.9 tokens/s INFO:__main__:2024-10-27 06:05:29 | Epoch: 2 | Step: 55620 | Dataset: 0-5040802 | Loss: 2.200 | 677 ms/step , 58067.91 GFLOP/s , 531749.5 tokens/s INFO:__main__:2024-10-27 06:05:37 | Epoch: 2 | Step: 55630 | Dataset: 0-5048802 | Loss: 2.204 | 677 ms/step , 58090.44 GFLOP/s , 531335.1 tokens/s INFO:__main__:2024-10-27 06:05:45 | Epoch: 2 | Step: 55640 | Dataset: 0-5056802 | Loss: 2.157 | 676 ms/step , 58167.60 GFLOP/s , 531531.9 tokens/s INFO:__main__:2024-10-27 
06:05:52 | Epoch: 2 | Step: 55650 | Dataset: 0-5064802 | Loss: 2.177 | 676 ms/step , 58153.03 GFLOP/s , 531652.9 tokens/s INFO:__main__:2024-10-27 06:06:00 | Epoch: 2 | Step: 55660 | Dataset: 0-5072802 | Loss: 2.120 | 675 ms/step , 58225.75 GFLOP/s , 531069.8 tokens/s INFO:__main__:2024-10-27 06:06:08 | Epoch: 2 | Step: 55670 | Dataset: 0-5080802 | Loss: 2.058 | 677 ms/step , 58064.01 GFLOP/s , 528944.9 tokens/s INFO:__main__:2024-10-27 06:06:15 | Epoch: 2 | Step: 55680 | Dataset: 0-5088802 | Loss: 2.110 | 675 ms/step , 58230.64 GFLOP/s , 532237.5 tokens/s INFO:__main__:2024-10-27 06:06:23 | Epoch: 2 | Step: 55690 | Dataset: 0-5096802 | Loss: 2.166 | 675 ms/step , 58212.14 GFLOP/s , 532279.1 tokens/s INFO:__main__:2024-10-27 06:06:31 | Epoch: 2 | Step: 55700 | Dataset: 0-5104802 | Loss: 2.023 | 675 ms/step , 58194.73 GFLOP/s , 531998.7 tokens/s INFO:__main__:2024-10-27 06:06:39 | Epoch: 2 | Step: 55710 | Dataset: 0-5112802 | Loss: 2.077 | 675 ms/step , 58260.90 GFLOP/s , 532793.8 tokens/s INFO:__main__:2024-10-27 06:06:46 | Epoch: 2 | Step: 55720 | Dataset: 0-5120802 | Loss: 2.141 | 676 ms/step , 58168.87 GFLOP/s , 532559.9 tokens/s INFO:__main__:2024-10-27 06:06:54 | Epoch: 2 | Step: 55730 | Dataset: 0-5128802 | Loss: 2.123 | 675 ms/step , 58224.17 GFLOP/s , 532966.0 tokens/s INFO:__main__:2024-10-27 06:07:02 | Epoch: 2 | Step: 55740 | Dataset: 0-5136802 | Loss: 2.100 | 676 ms/step , 58120.77 GFLOP/s , 532759.6 tokens/s INFO:__main__:2024-10-27 06:07:09 | Epoch: 2 | Step: 55750 | Dataset: 0-5144802 | Loss: 2.125 | 676 ms/step , 58121.99 GFLOP/s , 531386.2 tokens/s INFO:__main__:2024-10-27 06:07:17 | Epoch: 2 | Step: 55760 | Dataset: 0-5152802 | Loss: 1.870 | 676 ms/step , 58144.12 GFLOP/s , 530909.3 tokens/s INFO:__main__:2024-10-27 06:07:25 | Epoch: 2 | Step: 55770 | Dataset: 0-5160802 | Loss: 1.759 | 676 ms/step , 58180.83 GFLOP/s , 531664.3 tokens/s INFO:__main__:2024-10-27 06:07:32 | Epoch: 2 | Step: 55780 | Dataset: 0-5168802 | Loss: 1.750 | 675 ms/step , 
58260.40 GFLOP/s , 531636.6 tokens/s INFO:__main__:2024-10-27 06:07:40 | Epoch: 2 | Step: 55790 | Dataset: 0-5176802 | Loss: 1.718 | 676 ms/step , 58132.87 GFLOP/s , 531858.2 tokens/s INFO:__main__:2024-10-27 06:07:48 | Epoch: 2 | Step: 55800 | Dataset: 0-5184802 | Loss: 1.725 | 675 ms/step , 58208.65 GFLOP/s , 531747.8 tokens/s INFO:__main__:2024-10-27 06:07:56 | Epoch: 2 | Step: 55810 | Dataset: 0-5192802 | Loss: 1.695 | 675 ms/step , 58203.04 GFLOP/s , 531682.4 tokens/s INFO:__main__:2024-10-27 06:08:03 | Epoch: 2 | Step: 55820 | Dataset: 0-5200802 | Loss: 1.718 | 675 ms/step , 58199.57 GFLOP/s , 532478.2 tokens/s INFO:__main__:2024-10-27 06:08:11 | Epoch: 2 | Step: 55830 | Dataset: 0-5208802 | Loss: 1.662 | 675 ms/step , 58228.71 GFLOP/s , 531953.1 tokens/s INFO:__main__:2024-10-27 06:08:19 | Epoch: 2 | Step: 55840 | Dataset: 0-5216802 | Loss: 2.243 | 675 ms/step , 58204.49 GFLOP/s , 532457.5 tokens/s INFO:__main__:2024-10-27 06:08:26 | Epoch: 2 | Step: 55850 | Dataset: 0-5224802 | Loss: 2.229 | 675 ms/step , 58249.20 GFLOP/s , 532652.1 tokens/s INFO:__main__:2024-10-27 06:08:34 | Epoch: 2 | Step: 55860 | Dataset: 0-5232802 | Loss: 2.179 | 676 ms/step , 58167.32 GFLOP/s , 532352.9 tokens/s INFO:__main__:2024-10-27 06:08:42 | Epoch: 2 | Step: 55870 | Dataset: 0-5240802 | Loss: 2.095 | 677 ms/step , 58094.64 GFLOP/s , 532264.9 tokens/s INFO:__main__:2024-10-27 06:08:49 | Epoch: 2 | Step: 55880 | Dataset: 0-5248802 | Loss: 2.111 | 675 ms/step , 58198.87 GFLOP/s , 532473.1 tokens/s INFO:__main__:2024-10-27 06:08:57 | Epoch: 2 | Step: 55890 | Dataset: 0-5256802 | Loss: 2.222 | 675 ms/step , 58222.56 GFLOP/s , 532667.6 tokens/s INFO:__main__:2024-10-27 06:09:05 | Epoch: 2 | Step: 55900 | Dataset: 0-5264802 | Loss: 2.107 | 675 ms/step , 58232.88 GFLOP/s , 532538.5 tokens/s INFO:__main__:2024-10-27 06:09:12 | Epoch: 2 | Step: 55910 | Dataset: 0-5272802 | Loss: 2.139 | 675 ms/step , 58233.56 GFLOP/s , 532616.2 tokens/s INFO:__main__:2024-10-27 06:09:20 | Epoch: 2 | 
Step: 55920 | Dataset: 0-5280802 | Loss: 2.082 | 675 ms/step , 58202.07 GFLOP/s , 532384.2 tokens/s INFO:__main__:2024-10-27 06:09:28 | Epoch: 2 | Step: 55930 | Dataset: 0-5288802 | Loss: 2.109 | 675 ms/step , 58244.75 GFLOP/s , 532570.6 tokens/s INFO:__main__:2024-10-27 06:09:36 | Epoch: 2 | Step: 55940 | Dataset: 0-5296802 | Loss: 2.155 | 675 ms/step , 58249.67 GFLOP/s , 532363.7 tokens/s INFO:__main__:2024-10-27 06:09:43 | Epoch: 2 | Step: 55950 | Dataset: 0-5304802 | Loss: 2.015 | 675 ms/step , 58199.63 GFLOP/s , 532733.2 tokens/s INFO:__main__:2024-10-27 06:09:51 | Epoch: 2 | Step: 55960 | Dataset: 0-5312802 | Loss: 2.098 | 675 ms/step , 58225.55 GFLOP/s , 532589.4 tokens/s INFO:__main__:2024-10-27 06:09:59 | Epoch: 2 | Step: 55970 | Dataset: 0-5320802 | Loss: 2.201 | 675 ms/step , 58242.91 GFLOP/s , 532697.8 tokens/s INFO:__main__:2024-10-27 06:10:06 | Epoch: 2 | Step: 55980 | Dataset: 0-5328802 | Loss: 1.992 | 674 ms/step , 58331.77 GFLOP/s , 533445.3 tokens/s INFO:__main__:2024-10-27 06:10:14 | Epoch: 2 | Step: 55990 | Dataset: 0-5336802 | Loss: 2.046 | 675 ms/step , 58253.80 GFLOP/s , 532569.1 tokens/s INFO:__main__:2024-10-27 06:10:21 | Validation | Step: 56000 | Val_loss: 2.200 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 06:10:21 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_061021_step_56000.pt` INFO:__main__:2024-10-27 06:10:23 | Epoch: 2 | Step: 56000 | Dataset: 0-5344802 | Loss: 2.052 | 674 ms/step , 58312.11 GFLOP/s , 480279.1 tokens/s INFO:__main__:2024-10-27 06:10:30 | Epoch: 2 | Step: 56010 | Dataset: 0-5352802 | Loss: 2.009 | 674 ms/step , 58302.65 GFLOP/s , 532560.2 tokens/s INFO:__main__:2024-10-27 06:10:38 | Epoch: 2 | Step: 56020 | Dataset: 0-5360802 | Loss: 2.009 | 675 ms/step , 58250.10 GFLOP/s , 532473.6 tokens/s INFO:__main__:2024-10-27 06:10:46 | Epoch: 2 | Step: 56030 | Dataset: 0-5368802 | Loss: 2.013 | 675 ms/step , 58195.35 GFLOP/s , 532339.7 tokens/s INFO:__main__:2024-10-27 06:10:53 | Epoch: 2 | 
Step: 56040 | Dataset: 0-5376802 | Loss: 1.962 | 675 ms/step , 58261.10 GFLOP/s , 532836.1 tokens/s INFO:__main__:2024-10-27 06:11:01 | Epoch: 2 | Step: 56050 | Dataset: 0-5384802 | Loss: 1.941 | 675 ms/step , 58278.21 GFLOP/s , 532761.1 tokens/s INFO:__main__:2024-10-27 06:11:09 | Epoch: 2 | Step: 56060 | Dataset: 0-5392802 | Loss: 1.879 | 675 ms/step , 58242.21 GFLOP/s , 532439.3 tokens/s INFO:__main__:2024-10-27 06:11:16 | Epoch: 2 | Step: 56070 | Dataset: 0-5400802 | Loss: 1.930 | 677 ms/step , 58055.75 GFLOP/s , 531569.3 tokens/s INFO:__main__:2024-10-27 06:11:24 | Epoch: 2 | Step: 56080 | Dataset: 0-5408802 | Loss: 1.904 | 675 ms/step , 58222.45 GFLOP/s , 532126.7 tokens/s INFO:__main__:2024-10-27 06:11:32 | Epoch: 2 | Step: 56090 | Dataset: 0-5416802 | Loss: 1.873 | 677 ms/step , 58091.47 GFLOP/s , 531856.4 tokens/s INFO:__main__:2024-10-27 06:11:39 | Epoch: 2 | Step: 56100 | Dataset: 0-5424802 | Loss: 1.851 | 675 ms/step , 58207.97 GFLOP/s , 532163.1 tokens/s INFO:__main__:2024-10-27 06:11:47 | Epoch: 2 | Step: 56110 | Dataset: 0-5432802 | Loss: 1.778 | 674 ms/step , 58279.27 GFLOP/s , 532329.8 tokens/s INFO:__main__:2024-10-27 06:11:55 | Epoch: 2 | Step: 56120 | Dataset: 0-5440802 | Loss: 1.790 | 675 ms/step , 58223.30 GFLOP/s , 532032.5 tokens/s INFO:__main__:2024-10-27 06:12:03 | Epoch: 2 | Step: 56130 | Dataset: 0-5448802 | Loss: 1.746 | 674 ms/step , 58359.25 GFLOP/s , 532842.8 tokens/s INFO:__main__:2024-10-27 06:12:10 | Epoch: 2 | Step: 56140 | Dataset: 0-5456802 | Loss: 1.752 | 674 ms/step , 58339.02 GFLOP/s , 532618.8 tokens/s INFO:__main__:2024-10-27 06:12:18 | Epoch: 2 | Step: 56150 | Dataset: 0-5464802 | Loss: 1.751 | 674 ms/step , 58309.19 GFLOP/s , 532623.1 tokens/s INFO:__main__:2024-10-27 06:12:26 | Epoch: 2 | Step: 56160 | Dataset: 0-5472802 | Loss: 1.777 | 675 ms/step , 58215.12 GFLOP/s , 532344.3 tokens/s INFO:__main__:2024-10-27 06:12:33 | Epoch: 2 | Step: 56170 | Dataset: 0-5480802 | Loss: 1.771 | 675 ms/step , 58227.43 GFLOP/s , 
532252.5 tokens/s INFO:__main__:2024-10-27 06:12:41 | Epoch: 2 | Step: 56180 | Dataset: 0-5488802 | Loss: 1.774 | 673 ms/step , 58376.63 GFLOP/s , 532908.1 tokens/s INFO:__main__:2024-10-27 06:12:49 | Epoch: 2 | Step: 56190 | Dataset: 0-5496802 | Loss: 2.508 | 674 ms/step , 58322.19 GFLOP/s , 532815.0 tokens/s INFO:__main__:2024-10-27 06:12:56 | Epoch: 2 | Step: 56200 | Dataset: 0-5504802 | Loss: 2.290 | 674 ms/step , 58282.54 GFLOP/s , 533597.4 tokens/s INFO:__main__:2024-10-27 06:13:04 | Epoch: 2 | Step: 56210 | Dataset: 0-5512802 | Loss: 2.234 | 675 ms/step , 58250.93 GFLOP/s , 533067.0 tokens/s INFO:__main__:2024-10-27 06:13:12 | Epoch: 2 | Step: 56220 | Dataset: 0-5520802 | Loss: 2.241 | 674 ms/step , 58314.09 GFLOP/s , 533613.3 tokens/s INFO:__main__:2024-10-27 06:13:19 | Epoch: 2 | Step: 56230 | Dataset: 0-5528802 | Loss: 2.311 | 675 ms/step , 58279.01 GFLOP/s , 529938.9 tokens/s INFO:__main__:2024-10-27 06:13:27 | Epoch: 2 | Step: 56240 | Dataset: 0-5536802 | Loss: 2.143 | 674 ms/step , 58363.47 GFLOP/s , 533027.1 tokens/s INFO:__main__:2024-10-27 06:13:35 | Epoch: 2 | Step: 56250 | Dataset: 0-5544802 | Loss: 2.166 | 674 ms/step , 58317.26 GFLOP/s , 532932.4 tokens/s INFO:__main__:2024-10-27 06:13:42 | Epoch: 2 | Step: 56260 | Dataset: 0-5552802 | Loss: 2.204 | 674 ms/step , 58303.58 GFLOP/s , 533360.9 tokens/s INFO:__main__:2024-10-27 06:13:50 | Epoch: 2 | Step: 56270 | Dataset: 0-5560802 | Loss: 2.136 | 675 ms/step , 58215.13 GFLOP/s , 532453.9 tokens/s INFO:__main__:2024-10-27 06:13:58 | Epoch: 2 | Step: 56280 | Dataset: 0-5568802 | Loss: 2.169 | 674 ms/step , 58331.26 GFLOP/s , 533500.8 tokens/s INFO:__main__:2024-10-27 06:14:06 | Epoch: 2 | Step: 56290 | Dataset: 0-5576802 | Loss: 2.179 | 674 ms/step , 58306.86 GFLOP/s , 533054.1 tokens/s INFO:__main__:2024-10-27 06:14:13 | Epoch: 2 | Step: 56300 | Dataset: 0-5584802 | Loss: 2.192 | 675 ms/step , 58204.23 GFLOP/s , 532677.3 tokens/s INFO:__main__:2024-10-27 06:14:21 | Epoch: 2 | Step: 56310 | Dataset: 
0-5592802 | Loss: 2.167 | 674 ms/step , 58314.94 GFLOP/s , 532698.5 tokens/s INFO:__main__:2024-10-27 06:14:29 | Epoch: 2 | Step: 56320 | Dataset: 0-5600802 | Loss: 2.212 | 676 ms/step , 58147.87 GFLOP/s , 532720.0 tokens/s INFO:__main__:2024-10-27 06:14:36 | Epoch: 2 | Step: 56330 | Dataset: 0-5608802 | Loss: 2.163 | 676 ms/step , 58162.97 GFLOP/s , 532369.3 tokens/s INFO:__main__:2024-10-27 06:14:44 | Epoch: 2 | Step: 56340 | Dataset: 0-5616802 | Loss: 2.157 | 675 ms/step , 58207.30 GFLOP/s , 532311.1 tokens/s INFO:__main__:2024-10-27 06:14:52 | Epoch: 2 | Step: 56350 | Dataset: 0-5624802 | Loss: 2.097 | 675 ms/step , 58253.00 GFLOP/s , 532821.7 tokens/s INFO:__main__:2024-10-27 06:14:59 | Epoch: 2 | Step: 56360 | Dataset: 0-5632802 | Loss: 2.246 | 674 ms/step , 58348.48 GFLOP/s , 532695.8 tokens/s INFO:__main__:2024-10-27 06:15:07 | Epoch: 2 | Step: 56370 | Dataset: 0-5640802 | Loss: 2.226 | 675 ms/step , 58218.49 GFLOP/s , 532445.0 tokens/s INFO:__main__:2024-10-27 06:15:15 | Epoch: 2 | Step: 56380 | Dataset: 0-5648802 | Loss: 2.214 | 675 ms/step , 58218.40 GFLOP/s , 531109.1 tokens/s INFO:__main__:2024-10-27 06:15:22 | Epoch: 2 | Step: 56390 | Dataset: 0-5656802 | Loss: 2.268 | 675 ms/step , 58240.25 GFLOP/s , 531596.4 tokens/s INFO:__main__:2024-10-27 06:15:30 | Epoch: 2 | Step: 56400 | Dataset: 0-5664802 | Loss: 2.239 | 677 ms/step , 58093.52 GFLOP/s , 531445.2 tokens/s INFO:__main__:2024-10-27 06:15:38 | Epoch: 2 | Step: 56410 | Dataset: 0-5672802 | Loss: 2.186 | 675 ms/step , 58202.95 GFLOP/s , 531717.1 tokens/s INFO:__main__:2024-10-27 06:15:46 | Epoch: 2 | Step: 56420 | Dataset: 0-5680802 | Loss: 2.169 | 676 ms/step , 58135.62 GFLOP/s , 531785.0 tokens/s INFO:__main__:2024-10-27 06:15:53 | Epoch: 2 | Step: 56430 | Dataset: 0-5688802 | Loss: 2.170 | 677 ms/step , 58039.22 GFLOP/s , 531309.7 tokens/s INFO:__main__:2024-10-27 06:16:01 | Epoch: 2 | Step: 56440 | Dataset: 0-5696802 | Loss: 2.158 | 676 ms/step , 58185.12 GFLOP/s , 531755.9 tokens/s 
INFO:__main__:2024-10-27 06:16:09 | Epoch: 2 | Step: 56450 | Dataset: 0-5704802 | Loss: 2.189 | 677 ms/step , 58087.58 GFLOP/s , 531717.3 tokens/s INFO:__main__:2024-10-27 06:16:16 | Epoch: 2 | Step: 56460 | Dataset: 0-5712802 | Loss: 2.151 | 676 ms/step , 58111.08 GFLOP/s , 531260.0 tokens/s INFO:__main__:2024-10-27 06:16:24 | Epoch: 2 | Step: 56470 | Dataset: 0-5720802 | Loss: 2.180 | 676 ms/step , 58170.97 GFLOP/s , 531134.1 tokens/s INFO:__main__:2024-10-27 06:16:32 | Epoch: 2 | Step: 56480 | Dataset: 0-5728802 | Loss: 2.104 | 676 ms/step , 58123.47 GFLOP/s , 531349.4 tokens/s INFO:__main__:2024-10-27 06:16:40 | Epoch: 2 | Step: 56490 | Dataset: 0-5736802 | Loss: 2.117 | 676 ms/step , 58168.50 GFLOP/s , 531522.2 tokens/s INFO:__main__:2024-10-27 06:16:47 | Epoch: 2 | Step: 56500 | Dataset: 0-5744802 | Loss: 2.163 | 675 ms/step , 58231.36 GFLOP/s , 532067.1 tokens/s INFO:__main__:2024-10-27 06:16:55 | Epoch: 2 | Step: 56510 | Dataset: 0-5752802 | Loss: 2.138 | 675 ms/step , 58278.19 GFLOP/s , 532166.5 tokens/s INFO:__main__:2024-10-27 06:17:03 | Epoch: 2 | Step: 56520 | Dataset: 0-5760802 | Loss: 2.273 | 676 ms/step , 58170.10 GFLOP/s , 531876.2 tokens/s INFO:__main__:2024-10-27 06:17:10 | Epoch: 2 | Step: 56530 | Dataset: 0-5768802 | Loss: 2.230 | 675 ms/step , 58273.11 GFLOP/s , 532672.6 tokens/s INFO:__main__:2024-10-27 06:17:18 | Epoch: 2 | Step: 56540 | Dataset: 0-5776802 | Loss: 2.184 | 675 ms/step , 58236.24 GFLOP/s , 532542.9 tokens/s INFO:__main__:2024-10-27 06:17:26 | Epoch: 2 | Step: 56550 | Dataset: 0-5784802 | Loss: 2.137 | 675 ms/step , 58234.15 GFLOP/s , 532361.5 tokens/s INFO:__main__:2024-10-27 06:17:33 | Epoch: 2 | Step: 56560 | Dataset: 0-5792802 | Loss: 2.239 | 674 ms/step , 58340.60 GFLOP/s , 532444.5 tokens/s INFO:__main__:2024-10-27 06:17:41 | Epoch: 2 | Step: 56570 | Dataset: 0-5800802 | Loss: 2.213 | 675 ms/step , 58232.91 GFLOP/s , 532318.4 tokens/s INFO:__main__:2024-10-27 06:17:49 | Epoch: 2 | Step: 56580 | Dataset: 0-5808802 | Loss: 
2.176 | 674 ms/step , 58302.94 GFLOP/s , 532297.2 tokens/s INFO:__main__:2024-10-27 06:17:57 | Epoch: 2 | Step: 56590 | Dataset: 0-5816802 | Loss: 2.156 | 676 ms/step , 58148.02 GFLOP/s , 532184.3 tokens/s INFO:__main__:2024-10-27 06:18:04 | Epoch: 2 | Step: 56600 | Dataset: 0-5824802 | Loss: 2.186 | 675 ms/step , 58251.11 GFLOP/s , 531852.1 tokens/s INFO:__main__:2024-10-27 06:18:12 | Epoch: 2 | Step: 56610 | Dataset: 0-5832802 | Loss: 2.102 | 676 ms/step , 58174.15 GFLOP/s , 532293.9 tokens/s INFO:__main__:2024-10-27 06:18:20 | Epoch: 2 | Step: 56620 | Dataset: 0-5840802 | Loss: 2.116 | 675 ms/step , 58266.99 GFLOP/s , 532411.3 tokens/s INFO:__main__:2024-10-27 06:18:27 | Epoch: 2 | Step: 56630 | Dataset: 0-5848802 | Loss: 2.203 | 674 ms/step , 58279.73 GFLOP/s , 532468.1 tokens/s INFO:__main__:2024-10-27 06:18:35 | Epoch: 2 | Step: 56640 | Dataset: 0-5856802 | Loss: 2.236 | 674 ms/step , 58282.88 GFLOP/s , 532751.8 tokens/s INFO:__main__:2024-10-27 06:18:43 | Epoch: 2 | Step: 56650 | Dataset: 0-5864802 | Loss: 2.103 | 675 ms/step , 58252.56 GFLOP/s , 532520.0 tokens/s INFO:__main__:2024-10-27 06:18:50 | Epoch: 2 | Step: 56660 | Dataset: 0-5872802 | Loss: 2.177 | 675 ms/step , 58238.50 GFLOP/s , 532091.0 tokens/s INFO:__main__:2024-10-27 06:18:58 | Epoch: 2 | Step: 56670 | Dataset: 0-5880802 | Loss: 2.126 | 676 ms/step , 58153.85 GFLOP/s , 531941.0 tokens/s INFO:__main__:2024-10-27 06:19:06 | Epoch: 2 | Step: 56680 | Dataset: 0-5888802 | Loss: 2.196 | 676 ms/step , 58141.83 GFLOP/s , 531609.5 tokens/s INFO:__main__:2024-10-27 06:19:13 | Epoch: 2 | Step: 56690 | Dataset: 0-5896802 | Loss: 2.225 | 675 ms/step , 58262.32 GFLOP/s , 532621.7 tokens/s INFO:__main__:2024-10-27 06:19:21 | Epoch: 2 | Step: 56700 | Dataset: 0-5904802 | Loss: 2.089 | 676 ms/step , 58154.87 GFLOP/s , 532257.3 tokens/s INFO:__main__:2024-10-27 06:19:29 | Epoch: 2 | Step: 56710 | Dataset: 0-5912802 | Loss: 2.205 | 675 ms/step , 58227.14 GFLOP/s , 532060.3 tokens/s INFO:__main__:2024-10-27 
06:19:37 | Epoch: 2 | Step: 56720 | Dataset: 0-5920802 | Loss: 2.154 | 675 ms/step , 58262.15 GFLOP/s , 532308.2 tokens/s INFO:__main__:2024-10-27 06:19:44 | Epoch: 2 | Step: 56730 | Dataset: 0-5928802 | Loss: 2.186 | 674 ms/step , 58312.10 GFLOP/s , 532457.4 tokens/s INFO:__main__:2024-10-27 06:19:52 | Epoch: 2 | Step: 56740 | Dataset: 0-5936802 | Loss: 2.138 | 674 ms/step , 58302.14 GFLOP/s , 532508.3 tokens/s INFO:__main__:2024-10-27 06:20:00 | Epoch: 2 | Step: 56750 | Dataset: 0-5944802 | Loss: 2.175 | 674 ms/step , 58346.41 GFLOP/s , 532995.9 tokens/s INFO:__main__:2024-10-27 06:20:07 | Epoch: 2 | Step: 56760 | Dataset: 0-5952802 | Loss: 2.164 | 674 ms/step , 58331.23 GFLOP/s , 531050.4 tokens/s INFO:__main__:2024-10-27 06:20:15 | Epoch: 2 | Step: 56770 | Dataset: 0-5960802 | Loss: 2.142 | 675 ms/step , 58204.29 GFLOP/s , 532306.4 tokens/s INFO:__main__:2024-10-27 06:20:23 | Epoch: 2 | Step: 56780 | Dataset: 0-5968802 | Loss: 2.167 | 675 ms/step , 58261.09 GFLOP/s , 531059.0 tokens/s INFO:__main__:2024-10-27 06:20:30 | Epoch: 2 | Step: 56790 | Dataset: 0-5976802 | Loss: 2.184 | 675 ms/step , 58242.72 GFLOP/s , 532937.9 tokens/s INFO:__main__:2024-10-27 06:20:38 | Epoch: 2 | Step: 56800 | Dataset: 0-5984802 | Loss: 2.214 | 674 ms/step , 58318.08 GFLOP/s , 532599.1 tokens/s INFO:__main__:2024-10-27 06:20:46 | Epoch: 2 | Step: 56810 | Dataset: 0-5992802 | Loss: 2.188 | 673 ms/step , 58377.90 GFLOP/s , 532734.4 tokens/s INFO:__main__:2024-10-27 06:20:53 | Epoch: 2 | Step: 56820 | Dataset: 0-6000802 | Loss: 2.190 | 675 ms/step , 58203.55 GFLOP/s , 532454.2 tokens/s INFO:__main__:2024-10-27 06:21:01 | Epoch: 2 | Step: 56830 | Dataset: 0-6008802 | Loss: 2.241 | 674 ms/step , 58281.94 GFLOP/s , 532759.3 tokens/s INFO:__main__:2024-10-27 06:21:09 | Epoch: 2 | Step: 56840 | Dataset: 0-6016802 | Loss: 2.201 | 675 ms/step , 58261.48 GFLOP/s , 532639.5 tokens/s INFO:__main__:2024-10-27 06:21:17 | Epoch: 2 | Step: 56850 | Dataset: 0-6024802 | Loss: 2.226 | 674 ms/step , 
58342.49 GFLOP/s , 532642.8 tokens/s INFO:__main__:2024-10-27 06:21:24 | Epoch: 2 | Step: 56860 | Dataset: 0-6032802 | Loss: 2.258 | 674 ms/step , 58282.78 GFLOP/s , 533005.9 tokens/s INFO:__main__:2024-10-27 06:21:32 | Epoch: 2 | Step: 56870 | Dataset: 0-6040802 | Loss: 2.094 | 675 ms/step , 58246.42 GFLOP/s , 532476.3 tokens/s INFO:__main__:2024-10-27 06:21:40 | Epoch: 2 | Step: 56880 | Dataset: 0-6048802 | Loss: 2.112 | 675 ms/step , 58237.75 GFLOP/s , 532469.1 tokens/s INFO:__main__:2024-10-27 06:21:47 | Epoch: 2 | Step: 56890 | Dataset: 0-6056802 | Loss: 2.159 | 673 ms/step , 58384.14 GFLOP/s , 532428.1 tokens/s INFO:__main__:2024-10-27 06:21:55 | Epoch: 2 | Step: 56900 | Dataset: 0-6064802 | Loss: 2.120 | 674 ms/step , 58292.10 GFLOP/s , 533180.0 tokens/s INFO:__main__:2024-10-27 06:22:03 | Epoch: 2 | Step: 56910 | Dataset: 0-6072802 | Loss: 2.103 | 674 ms/step , 58344.19 GFLOP/s , 533023.9 tokens/s INFO:__main__:2024-10-27 06:22:10 | Epoch: 2 | Step: 56920 | Dataset: 0-6080802 | Loss: 2.105 | 675 ms/step , 58241.46 GFLOP/s , 532257.1 tokens/s INFO:__main__:2024-10-27 06:22:18 | Epoch: 2 | Step: 56930 | Dataset: 0-6088802 | Loss: 2.047 | 675 ms/step , 58243.12 GFLOP/s , 532181.9 tokens/s INFO:__main__:2024-10-27 06:22:26 | Epoch: 2 | Step: 56940 | Dataset: 0-6096802 | Loss: 2.136 | 674 ms/step , 58326.73 GFLOP/s , 532769.9 tokens/s INFO:__main__:2024-10-27 06:22:33 | Epoch: 2 | Step: 56950 | Dataset: 0-6104802 | Loss: 2.109 | 676 ms/step , 58129.78 GFLOP/s , 531092.4 tokens/s INFO:__main__:2024-10-27 06:22:41 | Epoch: 2 | Step: 56960 | Dataset: 0-6112802 | Loss: 2.104 | 673 ms/step , 58392.18 GFLOP/s , 531263.9 tokens/s INFO:__main__:2024-10-27 06:22:49 | Epoch: 2 | Step: 56970 | Dataset: 0-6120802 | Loss: 2.087 | 674 ms/step , 58356.04 GFLOP/s , 533473.5 tokens/s INFO:__main__:2024-10-27 06:22:57 | Epoch: 2 | Step: 56980 | Dataset: 0-6128802 | Loss: 2.167 | 675 ms/step , 58238.72 GFLOP/s , 532982.7 tokens/s INFO:__main__:2024-10-27 06:23:04 | Epoch: 2 | 
Step: 56990 | Dataset: 0-6136802 | Loss: 2.152 | 675 ms/step , 58234.57 GFLOP/s , 533043.8 tokens/s INFO:__main__:2024-10-27 06:23:11 | Validation | Step: 57000 | Val_loss: 2.202 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 06:23:11 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_062311_step_57000.pt` INFO:__main__:2024-10-27 06:23:13 | Epoch: 2 | Step: 57000 | Dataset: 0-6144802 | Loss: 2.114 | 675 ms/step , 58255.91 GFLOP/s , 478532.5 tokens/s INFO:__main__:2024-10-27 06:23:21 | Epoch: 2 | Step: 57010 | Dataset: 0-6152802 | Loss: 2.156 | 677 ms/step , 58100.80 GFLOP/s , 530070.6 tokens/s INFO:__main__:2024-10-27 06:23:28 | Epoch: 2 | Step: 57020 | Dataset: 0-6160802 | Loss: 2.174 | 674 ms/step , 58285.98 GFLOP/s , 532061.8 tokens/s INFO:__main__:2024-10-27 06:23:36 | Epoch: 2 | Step: 57030 | Dataset: 0-6168802 | Loss: 2.219 | 676 ms/step , 58181.83 GFLOP/s , 531526.2 tokens/s INFO:__main__:2024-10-27 06:23:44 | Epoch: 2 | Step: 57040 | Dataset: 0-6176802 | Loss: 2.108 | 676 ms/step , 58141.94 GFLOP/s , 531242.3 tokens/s INFO:__main__:2024-10-27 06:23:51 | Epoch: 2 | Step: 57050 | Dataset: 0-6184802 | Loss: 2.018 | 676 ms/step , 58143.86 GFLOP/s , 531846.1 tokens/s INFO:__main__:2024-10-27 06:23:59 | Epoch: 2 | Step: 57060 | Dataset: 0-6192802 | Loss: 2.196 | 677 ms/step , 58103.55 GFLOP/s , 531201.6 tokens/s INFO:__main__:2024-10-27 06:24:07 | Epoch: 2 | Step: 57070 | Dataset: 0-6200802 | Loss: 2.055 | 677 ms/step , 58043.54 GFLOP/s , 528958.2 tokens/s INFO:__main__:2024-10-27 06:24:15 | Epoch: 2 | Step: 57080 | Dataset: 0-6208802 | Loss: 2.163 | 675 ms/step , 58223.86 GFLOP/s , 531639.2 tokens/s INFO:__main__:2024-10-27 06:24:22 | Epoch: 2 | Step: 57090 | Dataset: 0-6216802 | Loss: 2.098 | 674 ms/step , 58281.08 GFLOP/s , 532365.1 tokens/s INFO:__main__:2024-10-27 06:24:30 | Epoch: 2 | Step: 57100 | Dataset: 0-6224802 | Loss: 2.140 | 675 ms/step , 58236.24 GFLOP/s , 532755.1 tokens/s INFO:__main__:2024-10-27 06:24:38 | Epoch: 2 | 
Step: 57110 | Dataset: 0-6232802 | Loss: 2.051 | 675 ms/step , 58215.42 GFLOP/s , 532474.3 tokens/s INFO:__main__:2024-10-27 06:24:45 | Epoch: 2 | Step: 57120 | Dataset: 0-6240802 | Loss: 2.225 | 676 ms/step , 58180.17 GFLOP/s , 532354.6 tokens/s INFO:__main__:2024-10-27 06:24:53 | Epoch: 2 | Step: 57130 | Dataset: 0-6248802 | Loss: 2.087 | 674 ms/step , 58283.19 GFLOP/s , 531775.0 tokens/s INFO:__main__:2024-10-27 06:25:01 | Epoch: 2 | Step: 57140 | Dataset: 0-6256802 | Loss: 2.124 | 674 ms/step , 58291.57 GFLOP/s , 532889.2 tokens/s INFO:__main__:2024-10-27 06:25:08 | Epoch: 2 | Step: 57150 | Dataset: 0-6264802 | Loss: 2.116 | 676 ms/step , 58173.72 GFLOP/s , 532285.3 tokens/s INFO:__main__:2024-10-27 06:25:16 | Epoch: 2 | Step: 57160 | Dataset: 0-6272802 | Loss: 2.167 | 676 ms/step , 58192.33 GFLOP/s , 532284.4 tokens/s INFO:__main__:2024-10-27 06:25:24 | Epoch: 2 | Step: 57170 | Dataset: 0-6280802 | Loss: 2.174 | 676 ms/step , 58165.29 GFLOP/s , 532035.5 tokens/s INFO:__main__:2024-10-27 06:25:31 | Epoch: 2 | Step: 57180 | Dataset: 0-6288802 | Loss: 2.140 | 676 ms/step , 58169.07 GFLOP/s , 532524.1 tokens/s INFO:__main__:2024-10-27 06:25:39 | Epoch: 2 | Step: 57190 | Dataset: 0-6296802 | Loss: 2.205 | 676 ms/step , 58172.46 GFLOP/s , 532485.1 tokens/s INFO:__main__:2024-10-27 06:25:47 | Epoch: 2 | Step: 57200 | Dataset: 0-6304802 | Loss: 2.214 | 676 ms/step , 58150.88 GFLOP/s , 532000.3 tokens/s INFO:__main__:2024-10-27 06:25:55 | Epoch: 2 | Step: 57210 | Dataset: 0-6312802 | Loss: 2.145 | 676 ms/step , 58146.14 GFLOP/s , 532281.1 tokens/s INFO:__main__:2024-10-27 06:26:02 | Epoch: 2 | Step: 57220 | Dataset: 0-6320802 | Loss: 2.106 | 675 ms/step , 58205.93 GFLOP/s , 532225.4 tokens/s INFO:__main__:2024-10-27 06:26:10 | Epoch: 2 | Step: 57230 | Dataset: 0-6328802 | Loss: 2.119 | 677 ms/step , 58103.11 GFLOP/s , 532264.1 tokens/s INFO:__main__:2024-10-27 06:26:18 | Epoch: 2 | Step: 57240 | Dataset: 0-6336802 | Loss: 2.229 | 679 ms/step , 57890.45 GFLOP/s , 
532104.6 tokens/s INFO:__main__:2024-10-27 06:26:25 | Epoch: 2 | Step: 57250 | Dataset: 0-6344802 | Loss: 2.262 | 674 ms/step , 58286.74 GFLOP/s , 533023.8 tokens/s INFO:__main__:2024-10-27 06:26:33 | Epoch: 2 | Step: 57260 | Dataset: 0-6352802 | Loss: 2.114 | 675 ms/step , 58261.15 GFLOP/s , 532957.6 tokens/s INFO:__main__:2024-10-27 06:26:41 | Epoch: 2 | Step: 57270 | Dataset: 0-6360802 | Loss: 2.220 | 675 ms/step , 58203.00 GFLOP/s , 532512.2 tokens/s INFO:__main__:2024-10-27 06:26:48 | Epoch: 2 | Step: 57280 | Dataset: 0-6368802 | Loss: 2.193 | 675 ms/step , 58262.26 GFLOP/s , 532617.5 tokens/s INFO:__main__:2024-10-27 06:26:56 | Epoch: 2 | Step: 57290 | Dataset: 0-6376802 | Loss: 2.125 | 675 ms/step , 58229.78 GFLOP/s , 532655.4 tokens/s INFO:__main__:2024-10-27 06:27:04 | Epoch: 2 | Step: 57300 | Dataset: 0-6384802 | Loss: 2.108 | 677 ms/step , 58097.62 GFLOP/s , 532933.2 tokens/s INFO:__main__:2024-10-27 06:27:11 | Epoch: 2 | Step: 57310 | Dataset: 0-6392802 | Loss: 2.112 | 675 ms/step , 58207.15 GFLOP/s , 532479.6 tokens/s INFO:__main__:2024-10-27 06:27:19 | Epoch: 2 | Step: 57320 | Dataset: 0-6400802 | Loss: 2.319 | 675 ms/step , 58268.29 GFLOP/s , 532619.8 tokens/s INFO:__main__:2024-10-27 06:27:27 | Epoch: 2 | Step: 57330 | Dataset: 0-6408802 | Loss: 2.236 | 678 ms/step , 57979.29 GFLOP/s , 532285.3 tokens/s INFO:__main__:2024-10-27 06:27:35 | Epoch: 2 | Step: 57340 | Dataset: 0-6416802 | Loss: 2.140 | 676 ms/step , 58155.59 GFLOP/s , 532838.2 tokens/s INFO:__main__:2024-10-27 06:27:42 | Epoch: 2 | Step: 57350 | Dataset: 0-6424802 | Loss: 2.147 | 676 ms/step , 58177.60 GFLOP/s , 532729.9 tokens/s INFO:__main__:2024-10-27 06:27:50 | Epoch: 2 | Step: 57360 | Dataset: 0-6432802 | Loss: 2.076 | 677 ms/step , 58092.07 GFLOP/s , 532864.9 tokens/s INFO:__main__:2024-10-27 06:27:58 | Epoch: 2 | Step: 57370 | Dataset: 0-6440802 | Loss: 2.095 | 675 ms/step , 58244.54 GFLOP/s , 533176.3 tokens/s INFO:__main__:2024-10-27 06:28:05 | Epoch: 2 | Step: 57380 | Dataset: 
0-6448802 | Loss: 2.048 | 675 ms/step , 58251.28 GFLOP/s , 533101.4 tokens/s INFO:__main__:2024-10-27 06:28:13 | Epoch: 2 | Step: 57390 | Dataset: 0-6456802 | Loss: 2.081 | 674 ms/step , 58289.80 GFLOP/s , 533277.2 tokens/s INFO:__main__:2024-10-27 06:28:21 | Epoch: 2 | Step: 57400 | Dataset: 0-6464802 | Loss: 2.050 | 675 ms/step , 58194.63 GFLOP/s , 533188.8 tokens/s INFO:__main__:2024-10-27 06:28:28 | Epoch: 2 | Step: 57410 | Dataset: 0-6472802 | Loss: 2.023 | 675 ms/step , 58270.09 GFLOP/s , 533263.1 tokens/s INFO:__main__:2024-10-27 06:28:36 | Epoch: 2 | Step: 57420 | Dataset: 0-6480802 | Loss: 1.993 | 674 ms/step , 58280.05 GFLOP/s , 532784.8 tokens/s INFO:__main__:2024-10-27 06:28:44 | Epoch: 2 | Step: 57430 | Dataset: 0-6488802 | Loss: 2.034 | 675 ms/step , 58194.54 GFLOP/s , 532792.6 tokens/s INFO:__main__:2024-10-27 06:28:51 | Epoch: 2 | Step: 57440 | Dataset: 0-6496802 | Loss: 2.004 | 676 ms/step , 58183.66 GFLOP/s , 532937.5 tokens/s INFO:__main__:2024-10-27 06:28:59 | Epoch: 2 | Step: 57450 | Dataset: 0-6504802 | Loss: 1.996 | 674 ms/step , 58294.50 GFLOP/s , 533415.5 tokens/s INFO:__main__:2024-10-27 06:29:07 | Epoch: 2 | Step: 57460 | Dataset: 0-6512802 | Loss: 1.977 | 674 ms/step , 58300.20 GFLOP/s , 533609.2 tokens/s INFO:__main__:2024-10-27 06:29:14 | Epoch: 2 | Step: 57470 | Dataset: 0-6520802 | Loss: 1.996 | 674 ms/step , 58311.15 GFLOP/s , 533378.1 tokens/s INFO:__main__:2024-10-27 06:29:22 | Epoch: 2 | Step: 57480 | Dataset: 0-6528802 | Loss: 2.202 | 674 ms/step , 58305.10 GFLOP/s , 533007.6 tokens/s INFO:__main__:2024-10-27 06:29:30 | Epoch: 2 | Step: 57490 | Dataset: 0-6536802 | Loss: 1.969 | 677 ms/step , 58057.55 GFLOP/s , 532055.0 tokens/s INFO:__main__:2024-10-27 06:29:37 | Epoch: 2 | Step: 57500 | Dataset: 0-6544802 | Loss: 1.891 | 677 ms/step , 58043.92 GFLOP/s , 530650.1 tokens/s INFO:__main__:2024-10-27 06:29:45 | Epoch: 2 | Step: 57510 | Dataset: 0-6552802 | Loss: 1.843 | 677 ms/step , 58050.73 GFLOP/s , 529538.6 tokens/s 
INFO:__main__:2024-10-27 06:29:53 | Epoch: 2 | Step: 57520 | Dataset: 0-6560802 | Loss: 1.842 | 674 ms/step , 58350.61 GFLOP/s , 531847.0 tokens/s INFO:__main__:2024-10-27 06:30:01 | Epoch: 2 | Step: 57530 | Dataset: 0-6568802 | Loss: 1.815 | 680 ms/step , 57820.08 GFLOP/s , 532186.2 tokens/s INFO:__main__:2024-10-27 06:30:08 | Epoch: 2 | Step: 57540 | Dataset: 0-6576802 | Loss: 1.815 | 675 ms/step , 58263.10 GFLOP/s , 532638.4 tokens/s INFO:__main__:2024-10-27 06:30:16 | Epoch: 2 | Step: 57550 | Dataset: 0-6584802 | Loss: 1.807 | 675 ms/step , 58250.99 GFLOP/s , 532673.8 tokens/s INFO:__main__:2024-10-27 06:30:24 | Epoch: 2 | Step: 57560 | Dataset: 0-6592802 | Loss: 1.800 | 675 ms/step , 58263.23 GFLOP/s , 532399.1 tokens/s INFO:__main__:2024-10-27 06:30:31 | Epoch: 2 | Step: 57570 | Dataset: 0-6600802 | Loss: 1.797 | 674 ms/step , 58333.04 GFLOP/s , 532852.6 tokens/s INFO:__main__:2024-10-27 06:30:39 | Epoch: 2 | Step: 57580 | Dataset: 0-6608802 | Loss: 1.806 | 677 ms/step , 58078.70 GFLOP/s , 532050.3 tokens/s INFO:__main__:2024-10-27 06:30:47 | Epoch: 2 | Step: 57590 | Dataset: 0-6616802 | Loss: 1.818 | 675 ms/step , 58278.16 GFLOP/s , 531501.7 tokens/s INFO:__main__:2024-10-27 06:30:54 | Epoch: 2 | Step: 57600 | Dataset: 0-6624802 | Loss: 1.800 | 675 ms/step , 58234.99 GFLOP/s , 532121.9 tokens/s INFO:__main__:2024-10-27 06:31:02 | Epoch: 2 | Step: 57610 | Dataset: 0-6632802 | Loss: 1.758 | 675 ms/step , 58221.29 GFLOP/s , 532006.2 tokens/s INFO:__main__:2024-10-27 06:31:10 | Epoch: 2 | Step: 57620 | Dataset: 0-6640802 | Loss: 1.771 | 674 ms/step , 58292.38 GFLOP/s , 532370.7 tokens/s INFO:__main__:2024-10-27 06:31:18 | Epoch: 2 | Step: 57630 | Dataset: 0-6648802 | Loss: 1.779 | 674 ms/step , 58325.98 GFLOP/s , 532499.4 tokens/s INFO:__main__:2024-10-27 06:31:25 | Epoch: 2 | Step: 57640 | Dataset: 0-6656802 | Loss: 1.792 | 675 ms/step , 58274.65 GFLOP/s , 531618.3 tokens/s INFO:__main__:2024-10-27 06:31:33 | Epoch: 2 | Step: 57650 | Dataset: 0-6664802 | Loss: 
1.739 | 675 ms/step , 58206.36 GFLOP/s , 531799.9 tokens/s INFO:__main__:2024-10-27 06:31:41 | Epoch: 2 | Step: 57660 | Dataset: 0-6672802 | Loss: 2.498 | 675 ms/step , 58207.05 GFLOP/s , 531819.2 tokens/s INFO:__main__:2024-10-27 06:31:48 | Epoch: 2 | Step: 57670 | Dataset: 0-6680802 | Loss: 2.188 | 676 ms/step , 58107.70 GFLOP/s , 530919.3 tokens/s INFO:__main__:2024-10-27 06:31:56 | Epoch: 2 | Step: 57680 | Dataset: 0-6688802 | Loss: 2.145 | 675 ms/step , 58244.32 GFLOP/s , 532173.3 tokens/s INFO:__main__:2024-10-27 06:32:04 | Epoch: 2 | Step: 57690 | Dataset: 0-6696802 | Loss: 2.137 | 675 ms/step , 58221.76 GFLOP/s , 532555.0 tokens/s INFO:__main__:2024-10-27 06:32:11 | Epoch: 2 | Step: 57700 | Dataset: 0-6704802 | Loss: 2.143 | 673 ms/step , 58368.01 GFLOP/s , 533336.8 tokens/s INFO:__main__:2024-10-27 06:32:19 | Epoch: 2 | Step: 57710 | Dataset: 0-6712802 | Loss: 2.086 | 675 ms/step , 58197.15 GFLOP/s , 532660.1 tokens/s INFO:__main__:2024-10-27 06:32:27 | Epoch: 2 | Step: 57720 | Dataset: 0-6720802 | Loss: 2.076 | 675 ms/step , 58266.22 GFLOP/s , 532783.5 tokens/s INFO:__main__:2024-10-27 06:32:35 | Epoch: 2 | Step: 57730 | Dataset: 0-6728802 | Loss: 2.034 | 675 ms/step , 58259.05 GFLOP/s , 532647.3 tokens/s INFO:__main__:2024-10-27 06:32:42 | Epoch: 2 | Step: 57740 | Dataset: 0-6736802 | Loss: 1.999 | 675 ms/step , 58246.45 GFLOP/s , 532946.1 tokens/s INFO:__main__:2024-10-27 06:32:50 | Epoch: 2 | Step: 57750 | Dataset: 0-6744802 | Loss: 2.015 | 676 ms/step , 58146.76 GFLOP/s , 532527.5 tokens/s INFO:__main__:2024-10-27 06:32:58 | Epoch: 2 | Step: 57760 | Dataset: 0-6752802 | Loss: 2.119 | 675 ms/step , 58194.95 GFLOP/s , 533238.9 tokens/s INFO:__main__:2024-10-27 06:33:05 | Epoch: 2 | Step: 57770 | Dataset: 0-6760802 | Loss: 2.072 | 676 ms/step , 58190.50 GFLOP/s , 532890.4 tokens/s INFO:__main__:2024-10-27 06:33:13 | Epoch: 2 | Step: 57780 | Dataset: 0-6768802 | Loss: 1.981 | 675 ms/step , 58214.79 GFLOP/s , 532879.8 tokens/s INFO:__main__:2024-10-27 
06:33:21 | Epoch: 2 | Step: 57790 | Dataset: 0-6776802 | Loss: 2.029 | 675 ms/step , 58215.60 GFLOP/s , 532456.2 tokens/s INFO:__main__:2024-10-27 06:33:28 | Epoch: 2 | Step: 57800 | Dataset: 0-6784802 | Loss: 1.981 | 675 ms/step , 58205.85 GFLOP/s , 532772.1 tokens/s INFO:__main__:2024-10-27 06:33:36 | Epoch: 2 | Step: 57810 | Dataset: 0-6792802 | Loss: 1.894 | 675 ms/step , 58224.53 GFLOP/s , 532798.2 tokens/s INFO:__main__:2024-10-27 06:33:44 | Epoch: 2 | Step: 57820 | Dataset: 0-6800802 | Loss: 1.906 | 675 ms/step , 58257.08 GFLOP/s , 532368.7 tokens/s INFO:__main__:2024-10-27 06:33:51 | Epoch: 2 | Step: 57830 | Dataset: 0-6808802 | Loss: 1.824 | 675 ms/step , 58256.15 GFLOP/s , 532163.4 tokens/s INFO:__main__:2024-10-27 06:33:59 | Epoch: 2 | Step: 57840 | Dataset: 0-6816802 | Loss: 1.767 | 674 ms/step , 58311.15 GFLOP/s , 530801.7 tokens/s INFO:__main__:2024-10-27 06:34:07 | Epoch: 2 | Step: 57850 | Dataset: 0-6824802 | Loss: 1.775 | 675 ms/step , 58208.57 GFLOP/s , 532370.4 tokens/s INFO:__main__:2024-10-27 06:34:15 | Epoch: 2 | Step: 57860 | Dataset: 0-6832802 | Loss: 1.772 | 675 ms/step , 58242.13 GFLOP/s , 532169.6 tokens/s INFO:__main__:2024-10-27 06:34:22 | Epoch: 2 | Step: 57870 | Dataset: 0-6840802 | Loss: 1.762 | 675 ms/step , 58222.44 GFLOP/s , 532371.8 tokens/s INFO:__main__:2024-10-27 06:34:30 | Epoch: 2 | Step: 57880 | Dataset: 0-6848802 | Loss: 1.737 | 676 ms/step , 58156.22 GFLOP/s , 531769.8 tokens/s INFO:__main__:2024-10-27 06:34:38 | Epoch: 2 | Step: 57890 | Dataset: 0-6856802 | Loss: 1.771 | 675 ms/step , 58232.67 GFLOP/s , 532004.8 tokens/s INFO:__main__:2024-10-27 06:34:45 | Epoch: 2 | Step: 57900 | Dataset: 0-6864802 | Loss: 1.765 | 676 ms/step , 58172.09 GFLOP/s , 532137.7 tokens/s INFO:__main__:2024-10-27 06:34:53 | Epoch: 2 | Step: 57910 | Dataset: 0-6872802 | Loss: 2.355 | 676 ms/step , 58147.29 GFLOP/s , 532122.1 tokens/s INFO:__main__:2024-10-27 06:35:01 | Epoch: 2 | Step: 57920 | Dataset: 0-6880802 | Loss: 2.277 | 675 ms/step , 
58209.86 GFLOP/s , 532670.3 tokens/s INFO:__main__:2024-10-27 06:35:08 | Epoch: 2 | Step: 57930 | Dataset: 0-6888802 | Loss: 2.287 | 675 ms/step , 58219.90 GFLOP/s , 532330.5 tokens/s INFO:__main__:2024-10-27 06:35:16 | Epoch: 2 | Step: 57940 | Dataset: 0-6896802 | Loss: 2.260 | 674 ms/step , 58313.24 GFLOP/s , 532796.9 tokens/s INFO:__main__:2024-10-27 06:35:24 | Epoch: 2 | Step: 57950 | Dataset: 0-6904802 | Loss: 2.289 | 676 ms/step , 58189.74 GFLOP/s , 533115.0 tokens/s INFO:__main__:2024-10-27 06:35:31 | Epoch: 2 | Step: 57960 | Dataset: 0-6912802 | Loss: 2.167 | 675 ms/step , 58214.78 GFLOP/s , 532575.1 tokens/s INFO:__main__:2024-10-27 06:35:39 | Epoch: 2 | Step: 57970 | Dataset: 0-6920802 | Loss: 2.245 | 678 ms/step , 58012.99 GFLOP/s , 532293.5 tokens/s INFO:__main__:2024-10-27 06:35:47 | Epoch: 2 | Step: 57980 | Dataset: 0-6928802 | Loss: 2.283 | 676 ms/step , 58125.31 GFLOP/s , 532379.8 tokens/s INFO:__main__:2024-10-27 06:35:55 | Epoch: 2 | Step: 57990 | Dataset: 0-6936802 | Loss: 2.221 | 676 ms/step , 58186.41 GFLOP/s , 532160.2 tokens/s INFO:__main__:2024-10-27 06:36:02 | Validation | Step: 58000 | Val_loss: 2.308 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 06:36:02 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_063602_step_58000.pt` INFO:__main__:2024-10-27 06:36:03 | Epoch: 2 | Step: 58000 | Dataset: 0-6944802 | Loss: 2.218 | 674 ms/step , 58336.70 GFLOP/s , 479747.7 tokens/s INFO:__main__:2024-10-27 06:36:11 | Epoch: 2 | Step: 58010 | Dataset: 0-6952802 | Loss: 2.187 | 678 ms/step , 57980.87 GFLOP/s , 532208.2 tokens/s INFO:__main__:2024-10-27 06:36:19 | Epoch: 2 | Step: 58020 | Dataset: 0-6960802 | Loss: 2.195 | 677 ms/step , 58076.91 GFLOP/s , 531031.8 tokens/s INFO:__main__:2024-10-27 06:36:26 | Epoch: 2 | Step: 58030 | Dataset: 0-6968802 | Loss: 2.240 | 677 ms/step , 58035.14 GFLOP/s , 532159.2 tokens/s INFO:__main__:2024-10-27 06:36:34 | Epoch: 2 | Step: 58040 | Dataset: 0-6976802 | Loss: 2.226 | 675 ms/step , 
58269.03 GFLOP/s , 533173.4 tokens/s INFO:__main__:2024-10-27 06:36:42 | Epoch: 2 | Step: 58050 | Dataset: 0-6984802 | Loss: 2.185 | 675 ms/step , 58217.99 GFLOP/s , 532776.6 tokens/s INFO:__main__:2024-10-27 06:36:49 | Epoch: 2 | Step: 58060 | Dataset: 0-6992802 | Loss: 2.255 | 676 ms/step , 58183.36 GFLOP/s , 532375.0 tokens/s INFO:__main__:2024-10-27 06:36:57 | Epoch: 2 | Step: 58070 | Dataset: 0-7000802 | Loss: 2.242 | 675 ms/step , 58229.94 GFLOP/s , 532639.5 tokens/s INFO:__main__:2024-10-27 06:37:05 | Epoch: 2 | Step: 58080 | Dataset: 0-7008802 | Loss: 2.219 | 677 ms/step , 58071.34 GFLOP/s , 532461.3 tokens/s INFO:__main__:2024-10-27 06:37:12 | Epoch: 2 | Step: 58090 | Dataset: 0-7016802 | Loss: 2.217 | 676 ms/step , 58190.46 GFLOP/s , 532430.3 tokens/s INFO:__main__:2024-10-27 06:37:20 | Epoch: 2 | Step: 58100 | Dataset: 0-7024802 | Loss: 2.220 | 675 ms/step , 58271.43 GFLOP/s , 532966.9 tokens/s INFO:__main__:2024-10-27 06:37:28 | Epoch: 2 | Step: 58110 | Dataset: 0-7032802 | Loss: 2.159 | 675 ms/step , 58243.74 GFLOP/s , 532472.0 tokens/s INFO:__main__:2024-10-27 06:37:35 | Epoch: 2 | Step: 58120 | Dataset: 0-7040802 | Loss: 2.120 | 675 ms/step , 58219.17 GFLOP/s , 532720.3 tokens/s INFO:__main__:2024-10-27 06:37:43 | Epoch: 2 | Step: 58130 | Dataset: 0-7048802 | Loss: 2.199 | 674 ms/step , 58285.35 GFLOP/s , 532660.8 tokens/s INFO:__main__:2024-10-27 06:37:51 | Epoch: 2 | Step: 58140 | Dataset: 0-7056802 | Loss: 2.186 | 676 ms/step , 58185.97 GFLOP/s , 532854.1 tokens/s INFO:__main__:2024-10-27 06:37:58 | Epoch: 2 | Step: 58150 | Dataset: 0-7064802 | Loss: 2.201 | 674 ms/step , 58303.96 GFLOP/s , 532788.7 tokens/s INFO:__main__:2024-10-27 06:38:06 | Epoch: 2 | Step: 58160 | Dataset: 0-7072802 | Loss: 2.188 | 674 ms/step , 58279.37 GFLOP/s , 532929.9 tokens/s INFO:__main__:2024-10-27 06:38:14 | Epoch: 2 | Step: 58170 | Dataset: 0-7080802 | Loss: 2.230 | 675 ms/step , 58222.81 GFLOP/s , 532909.0 tokens/s INFO:__main__:2024-10-27 06:38:22 | Epoch: 2 | 
Step: 58180 | Dataset: 0-7088802 | Loss: 2.142 | 675 ms/step , 58223.69 GFLOP/s , 533584.9 tokens/s INFO:__main__:2024-10-27 06:38:29 | Epoch: 2 | Step: 58190 | Dataset: 0-7096802 | Loss: 2.142 | 674 ms/step , 58357.97 GFLOP/s , 533461.5 tokens/s INFO:__main__:2024-10-27 06:38:37 | Epoch: 2 | Step: 58200 | Dataset: 0-7104802 | Loss: 2.149 | 674 ms/step , 58351.46 GFLOP/s , 532982.4 tokens/s INFO:__main__:2024-10-27 06:38:45 | Epoch: 2 | Step: 58210 | Dataset: 0-7112802 | Loss: 2.095 | 673 ms/step , 58375.10 GFLOP/s , 533600.1 tokens/s INFO:__main__:2024-10-27 06:38:52 | Epoch: 2 | Step: 58220 | Dataset: 0-7120802 | Loss: 2.182 | 675 ms/step , 58250.59 GFLOP/s , 533273.0 tokens/s INFO:__main__:2024-10-27 06:39:00 | Epoch: 2 | Step: 58230 | Dataset: 0-7128802 | Loss: 1.871 | 675 ms/step , 58261.77 GFLOP/s , 532740.1 tokens/s INFO:__main__:2024-10-27 06:39:08 | Epoch: 2 | Step: 58240 | Dataset: 0-7136802 | Loss: 1.722 | 675 ms/step , 58239.80 GFLOP/s , 532239.8 tokens/s INFO:__main__:2024-10-27 06:39:15 | Epoch: 2 | Step: 58250 | Dataset: 0-7144802 | Loss: 1.709 | 676 ms/step , 58164.76 GFLOP/s , 532233.9 tokens/s INFO:__main__:2024-10-27 06:39:23 | Epoch: 2 | Step: 58260 | Dataset: 0-7152802 | Loss: 1.708 | 674 ms/step , 58317.34 GFLOP/s , 532123.1 tokens/s INFO:__main__:2024-10-27 06:39:31 | Epoch: 2 | Step: 58270 | Dataset: 0-7160802 | Loss: 1.718 | 674 ms/step , 58289.47 GFLOP/s , 533028.0 tokens/s INFO:__main__:2024-10-27 06:39:38 | Epoch: 2 | Step: 58280 | Dataset: 0-7168802 | Loss: 1.712 | 675 ms/step , 58205.28 GFLOP/s , 532202.3 tokens/s INFO:__main__:2024-10-27 06:39:46 | Epoch: 2 | Step: 58290 | Dataset: 0-7176802 | Loss: 1.686 | 674 ms/step , 58304.24 GFLOP/s , 532483.0 tokens/s INFO:__main__:2024-10-27 06:39:54 | Epoch: 2 | Step: 58300 | Dataset: 0-7184802 | Loss: 1.640 | 676 ms/step , 58152.89 GFLOP/s , 532140.7 tokens/s INFO:__main__:2024-10-27 06:40:01 | Epoch: 2 | Step: 58310 | Dataset: 0-7192802 | Loss: 1.651 | 673 ms/step , 58379.35 GFLOP/s , 
532575.6 tokens/s INFO:__main__:2024-10-27 06:40:09 | Epoch: 2 | Step: 58320 | Dataset: 0-7200802 | Loss: 2.251 | 675 ms/step , 58249.94 GFLOP/s , 533151.1 tokens/s INFO:__main__:2024-10-27 06:40:17 | Epoch: 2 | Step: 58330 | Dataset: 0-7208802 | Loss: 2.274 | 675 ms/step , 58276.96 GFLOP/s , 532290.0 tokens/s INFO:__main__:2024-10-27 06:40:25 | Epoch: 2 | Step: 58340 | Dataset: 0-7216802 | Loss: 2.235 | 674 ms/step , 58309.90 GFLOP/s , 533230.6 tokens/s INFO:__main__:2024-10-27 06:40:32 | Epoch: 2 | Step: 58350 | Dataset: 0-7224802 | Loss: 2.205 | 675 ms/step , 58242.67 GFLOP/s , 533063.7 tokens/s INFO:__main__:2024-10-27 06:40:40 | Epoch: 2 | Step: 58360 | Dataset: 0-7232802 | Loss: 2.184 | 675 ms/step , 58255.48 GFLOP/s , 532782.6 tokens/s INFO:__main__:2024-10-27 06:40:48 | Epoch: 2 | Step: 58370 | Dataset: 0-7240802 | Loss: 2.214 | 675 ms/step , 58276.59 GFLOP/s , 532996.3 tokens/s INFO:__main__:2024-10-27 06:40:55 | Epoch: 2 | Step: 58380 | Dataset: 0-7248802 | Loss: 2.166 | 677 ms/step , 58056.44 GFLOP/s , 530597.5 tokens/s INFO:__main__:2024-10-27 06:41:03 | Epoch: 2 | Step: 58390 | Dataset: 0-7256802 | Loss: 2.176 | 677 ms/step , 58098.53 GFLOP/s , 529977.4 tokens/s INFO:__main__:2024-10-27 06:41:11 | Epoch: 2 | Step: 58400 | Dataset: 0-7264802 | Loss: 2.138 | 676 ms/step , 58140.70 GFLOP/s , 529547.9 tokens/s INFO:__main__:2024-10-27 06:41:18 | Epoch: 2 | Step: 58410 | Dataset: 0-7272802 | Loss: 2.128 | 677 ms/step , 58054.78 GFLOP/s , 530981.0 tokens/s INFO:__main__:2024-10-27 06:41:26 | Epoch: 2 | Step: 58420 | Dataset: 0-7280802 | Loss: 2.158 | 678 ms/step , 57966.53 GFLOP/s , 529070.8 tokens/s INFO:__main__:2024-10-27 06:41:34 | Epoch: 2 | Step: 58430 | Dataset: 0-7288802 | Loss: 2.081 | 678 ms/step , 57998.21 GFLOP/s , 530068.6 tokens/s INFO:__main__:2024-10-27 06:41:42 | Epoch: 2 | Step: 58440 | Dataset: 0-7296802 | Loss: 2.190 | 678 ms/step , 58018.01 GFLOP/s , 529783.1 tokens/s INFO:__main__:2024-10-27 06:41:49 | Epoch: 2 | Step: 58450 | Dataset: 
0-7304802 | Loss: 2.145 | 676 ms/step , 58186.81 GFLOP/s , 529787.5 tokens/s INFO:__main__:2024-10-27 06:41:57 | Epoch: 2 | Step: 58460 | Dataset: 0-7312802 | Loss: 2.109 | 677 ms/step , 58083.04 GFLOP/s , 529503.6 tokens/s INFO:__main__:2024-10-27 06:42:05 | Epoch: 2 | Step: 58470 | Dataset: 0-7320802 | Loss: 2.143 | 676 ms/step , 58175.54 GFLOP/s , 530096.1 tokens/s INFO:__main__:2024-10-27 06:42:13 | Epoch: 2 | Step: 58480 | Dataset: 0-7328802 | Loss: 1.879 | 681 ms/step , 57755.75 GFLOP/s , 529183.7 tokens/s INFO:__main__:2024-10-27 06:42:20 | Epoch: 2 | Step: 58490 | Dataset: 0-7336802 | Loss: 1.778 | 675 ms/step , 58224.03 GFLOP/s , 530234.3 tokens/s INFO:__main__:2024-10-27 06:42:28 | Epoch: 2 | Step: 58500 | Dataset: 0-7344802 | Loss: 1.766 | 677 ms/step , 58030.96 GFLOP/s , 530363.0 tokens/s INFO:__main__:2024-10-27 06:42:36 | Epoch: 2 | Step: 58510 | Dataset: 0-7352802 | Loss: 1.782 | 675 ms/step , 58265.79 GFLOP/s , 531073.2 tokens/s INFO:__main__:2024-10-27 06:42:43 | Epoch: 2 | Step: 58520 | Dataset: 0-7360802 | Loss: 1.766 | 675 ms/step , 58220.30 GFLOP/s , 531986.5 tokens/s INFO:__main__:2024-10-27 06:42:51 | Epoch: 2 | Step: 58530 | Dataset: 0-7368802 | Loss: 1.757 | 675 ms/step , 58204.26 GFLOP/s , 531554.7 tokens/s INFO:__main__:2024-10-27 06:42:59 | Epoch: 2 | Step: 58540 | Dataset: 0-7376802 | Loss: 1.773 | 675 ms/step , 58222.38 GFLOP/s , 531824.2 tokens/s INFO:__main__:2024-10-27 06:43:07 | Epoch: 2 | Step: 58550 | Dataset: 0-7384802 | Loss: 1.760 | 674 ms/step , 58298.94 GFLOP/s , 532035.4 tokens/s INFO:__main__:2024-10-27 06:43:14 | Epoch: 2 | Step: 58560 | Dataset: 0-7392802 | Loss: 1.757 | 674 ms/step , 58301.72 GFLOP/s , 532468.7 tokens/s INFO:__main__:2024-10-27 06:43:22 | Epoch: 2 | Step: 58570 | Dataset: 0-7400802 | Loss: 2.348 | 675 ms/step , 58258.12 GFLOP/s , 532254.1 tokens/s INFO:__main__:2024-10-27 06:43:30 | Epoch: 2 | Step: 58580 | Dataset: 0-7408802 | Loss: 2.235 | 676 ms/step , 58173.74 GFLOP/s , 532409.2 tokens/s 
INFO:__main__:2024-10-27 06:43:37 | Epoch: 2 | Step: 58590 | Dataset: 0-7416802 | Loss: 2.225 | 674 ms/step , 58282.25 GFLOP/s , 532403.5 tokens/s INFO:__main__:2024-10-27 06:43:45 | Epoch: 2 | Step: 58600 | Dataset: 0-7424802 | Loss: 2.191 | 676 ms/step , 58188.73 GFLOP/s , 532209.7 tokens/s INFO:__main__:2024-10-27 06:43:53 | Epoch: 2 | Step: 58610 | Dataset: 0-7432802 | Loss: 2.170 | 674 ms/step , 58339.03 GFLOP/s , 532409.4 tokens/s INFO:__main__:2024-10-27 06:44:00 | Epoch: 2 | Step: 58620 | Dataset: 0-7440802 | Loss: 2.163 | 675 ms/step , 58258.58 GFLOP/s , 532980.9 tokens/s INFO:__main__:2024-10-27 06:44:08 | Epoch: 2 | Step: 58630 | Dataset: 0-7448802 | Loss: 2.171 | 675 ms/step , 58274.77 GFLOP/s , 532786.7 tokens/s INFO:__main__:2024-10-27 06:44:16 | Epoch: 2 | Step: 58640 | Dataset: 0-7456802 | Loss: 2.170 | 674 ms/step , 58325.01 GFLOP/s , 532521.8 tokens/s INFO:__main__:2024-10-27 06:44:24 | Epoch: 2 | Step: 58650 | Dataset: 0-7464802 | Loss: 2.168 | 674 ms/step , 58295.33 GFLOP/s , 532711.7 tokens/s INFO:__main__:2024-10-27 06:44:31 | Epoch: 2 | Step: 58660 | Dataset: 0-7472802 | Loss: 2.236 | 675 ms/step , 58254.03 GFLOP/s , 532101.3 tokens/s INFO:__main__:2024-10-27 06:44:39 | Epoch: 2 | Step: 58670 | Dataset: 0-7480802 | Loss: 2.161 | 675 ms/step , 58228.31 GFLOP/s , 532175.8 tokens/s INFO:__main__:2024-10-27 06:44:47 | Epoch: 2 | Step: 58680 | Dataset: 0-7488802 | Loss: 2.212 | 676 ms/step , 58110.03 GFLOP/s , 532057.2 tokens/s INFO:__main__:2024-10-27 06:44:54 | Epoch: 2 | Step: 58690 | Dataset: 0-7496802 | Loss: 2.205 | 674 ms/step , 58341.15 GFLOP/s , 532242.6 tokens/s INFO:__main__:2024-10-27 06:45:02 | Epoch: 2 | Step: 58700 | Dataset: 0-7504802 | Loss: 2.126 | 676 ms/step , 58144.62 GFLOP/s , 532575.8 tokens/s INFO:__main__:2024-10-27 06:45:10 | Epoch: 2 | Step: 58710 | Dataset: 0-7512802 | Loss: 2.109 | 674 ms/step , 58299.20 GFLOP/s , 532455.6 tokens/s INFO:__main__:2024-10-27 06:45:17 | Epoch: 2 | Step: 58720 | Dataset: 0-7520802 | Loss: 
2.138 | 674 ms/step , 58314.23 GFLOP/s , 532581.4 tokens/s INFO:__main__:2024-10-27 06:45:25 | Epoch: 2 | Step: 58730 | Dataset: 0-7528802 | Loss: 2.186 | 676 ms/step , 58165.39 GFLOP/s , 532466.1 tokens/s INFO:__main__:2024-10-27 06:45:33 | Epoch: 2 | Step: 58740 | Dataset: 0-7536802 | Loss: 2.136 | 676 ms/step , 58185.45 GFLOP/s , 532022.7 tokens/s INFO:__main__:2024-10-27 06:45:40 | Epoch: 2 | Step: 58750 | Dataset: 0-7544802 | Loss: 2.138 | 675 ms/step , 58236.11 GFLOP/s , 532120.3 tokens/s INFO:__main__:2024-10-27 06:45:48 | Epoch: 2 | Step: 58760 | Dataset: 0-7552802 | Loss: 2.094 | 674 ms/step , 58293.29 GFLOP/s , 531690.7 tokens/s INFO:__main__:2024-10-27 06:45:56 | Epoch: 2 | Step: 58770 | Dataset: 0-7560802 | Loss: 2.128 | 675 ms/step , 58217.01 GFLOP/s , 532349.3 tokens/s INFO:__main__:2024-10-27 06:46:04 | Epoch: 2 | Step: 58780 | Dataset: 0-7568802 | Loss: 2.134 | 675 ms/step , 58240.40 GFLOP/s , 532245.0 tokens/s INFO:__main__:2024-10-27 06:46:11 | Epoch: 2 | Step: 58790 | Dataset: 0-7576802 | Loss: 2.157 | 677 ms/step , 58102.74 GFLOP/s , 531606.3 tokens/s INFO:__main__:2024-10-27 06:46:19 | Epoch: 2 | Step: 58800 | Dataset: 0-7584802 | Loss: 2.231 | 675 ms/step , 58218.46 GFLOP/s , 532063.1 tokens/s INFO:__main__:2024-10-27 06:46:27 | Epoch: 2 | Step: 58810 | Dataset: 0-7592802 | Loss: 2.194 | 676 ms/step , 58156.00 GFLOP/s , 531494.2 tokens/s INFO:__main__:2024-10-27 06:46:34 | Epoch: 2 | Step: 58820 | Dataset: 0-7600802 | Loss: 2.169 | 675 ms/step , 58196.34 GFLOP/s , 531876.2 tokens/s INFO:__main__:2024-10-27 06:46:42 | Epoch: 2 | Step: 58830 | Dataset: 0-7608802 | Loss: 2.135 | 674 ms/step , 58302.38 GFLOP/s , 532379.0 tokens/s INFO:__main__:2024-10-27 06:46:50 | Epoch: 2 | Step: 58840 | Dataset: 0-7616802 | Loss: 2.114 | 674 ms/step , 58283.79 GFLOP/s , 532527.5 tokens/s INFO:__main__:2024-10-27 06:46:57 | Epoch: 2 | Step: 58850 | Dataset: 0-7624802 | Loss: 2.144 | 675 ms/step , 58248.04 GFLOP/s , 532074.4 tokens/s INFO:__main__:2024-10-27 
06:47:05 | Epoch: 2 | Step: 58860 | Dataset: 0-7632802 | Loss: 2.166 | 676 ms/step , 58169.15 GFLOP/s , 532170.3 tokens/s INFO:__main__:2024-10-27 06:47:13 | Epoch: 2 | Step: 58870 | Dataset: 0-7640802 | Loss: 2.131 | 675 ms/step , 58221.39 GFLOP/s , 531989.2 tokens/s INFO:__main__:2024-10-27 06:47:21 | Epoch: 2 | Step: 58880 | Dataset: 0-7648802 | Loss: 2.144 | 675 ms/step , 58207.91 GFLOP/s , 531030.4 tokens/s INFO:__main__:2024-10-27 06:47:28 | Epoch: 2 | Step: 58890 | Dataset: 0-7656802 | Loss: 2.249 | 678 ms/step , 58003.49 GFLOP/s , 531028.1 tokens/s INFO:__main__:2024-10-27 06:47:36 | Epoch: 2 | Step: 58900 | Dataset: 0-7664802 | Loss: 2.161 | 677 ms/step , 58092.33 GFLOP/s , 531076.2 tokens/s INFO:__main__:2024-10-27 06:47:44 | Epoch: 2 | Step: 58910 | Dataset: 0-7672802 | Loss: 2.227 | 674 ms/step , 58293.76 GFLOP/s , 531130.9 tokens/s INFO:__main__:2024-10-27 06:47:51 | Epoch: 2 | Step: 58920 | Dataset: 0-7680802 | Loss: 2.240 | 674 ms/step , 58345.06 GFLOP/s , 533586.3 tokens/s INFO:__main__:2024-10-27 06:47:59 | Epoch: 2 | Step: 58930 | Dataset: 0-7688802 | Loss: 2.238 | 674 ms/step , 58339.54 GFLOP/s , 533104.5 tokens/s INFO:__main__:2024-10-27 06:48:07 | Epoch: 2 | Step: 58940 | Dataset: 0-7696802 | Loss: 2.277 | 674 ms/step , 58308.15 GFLOP/s , 533253.3 tokens/s INFO:__main__:2024-10-27 06:48:14 | Epoch: 2 | Step: 58950 | Dataset: 0-7704802 | Loss: 2.160 | 675 ms/step , 58232.27 GFLOP/s , 532810.1 tokens/s INFO:__main__:2024-10-27 06:48:22 | Epoch: 2 | Step: 58960 | Dataset: 0-7712802 | Loss: 2.176 | 675 ms/step , 58254.61 GFLOP/s , 532338.8 tokens/s INFO:__main__:2024-10-27 06:48:30 | Epoch: 2 | Step: 58970 | Dataset: 0-7720802 | Loss: 2.201 | 675 ms/step , 58234.48 GFLOP/s , 532346.0 tokens/s INFO:__main__:2024-10-27 06:48:38 | Epoch: 2 | Step: 58980 | Dataset: 0-7728802 | Loss: 2.085 | 675 ms/step , 58239.21 GFLOP/s , 532844.8 tokens/s INFO:__main__:2024-10-27 06:48:45 | Epoch: 2 | Step: 58990 | Dataset: 0-7736802 | Loss: 2.212 | 675 ms/step , 
58203.56 GFLOP/s , 532614.3 tokens/s INFO:__main__:2024-10-27 06:48:52 | Validation | Step: 59000 | Val_loss: 2.210 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 06:48:52 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_064852_step_59000.pt` INFO:__main__:2024-10-27 06:48:54 | Epoch: 2 | Step: 59000 | Dataset: 0-7744802 | Loss: 2.223 | 673 ms/step , 58366.01 GFLOP/s , 480378.1 tokens/s INFO:__main__:2024-10-27 06:49:01 | Epoch: 2 | Step: 59010 | Dataset: 0-7752802 | Loss: 2.136 | 675 ms/step , 58233.05 GFLOP/s , 532671.3 tokens/s INFO:__main__:2024-10-27 06:49:09 | Epoch: 2 | Step: 59020 | Dataset: 0-7760802 | Loss: 2.241 | 675 ms/step , 58278.39 GFLOP/s , 532866.1 tokens/s INFO:__main__:2024-10-27 06:49:17 | Epoch: 2 | Step: 59030 | Dataset: 0-7768802 | Loss: 2.199 | 674 ms/step , 58323.34 GFLOP/s , 533055.3 tokens/s INFO:__main__:2024-10-27 06:49:24 | Epoch: 2 | Step: 59040 | Dataset: 0-7776802 | Loss: 2.106 | 676 ms/step , 58174.80 GFLOP/s , 533065.2 tokens/s INFO:__main__:2024-10-27 06:49:32 | Epoch: 2 | Step: 59050 | Dataset: 0-7784802 | Loss: 2.159 | 675 ms/step , 58257.12 GFLOP/s , 533280.3 tokens/s INFO:__main__:2024-10-27 06:49:40 | Epoch: 2 | Step: 59060 | Dataset: 0-7792802 | Loss: 2.123 | 675 ms/step , 58277.83 GFLOP/s , 532763.6 tokens/s INFO:__main__:2024-10-27 06:49:48 | Epoch: 2 | Step: 59070 | Dataset: 0-7800802 | Loss: 2.161 | 674 ms/step , 58334.31 GFLOP/s , 533577.2 tokens/s INFO:__main__:2024-10-27 06:49:55 | Epoch: 2 | Step: 59080 | Dataset: 0-7808802 | Loss: 2.143 | 674 ms/step , 58326.14 GFLOP/s , 533304.8 tokens/s INFO:__main__:2024-10-27 06:50:03 | Epoch: 2 | Step: 59090 | Dataset: 0-7816802 | Loss: 2.153 | 675 ms/step , 58240.74 GFLOP/s , 533072.4 tokens/s INFO:__main__:2024-10-27 06:50:11 | Epoch: 2 | Step: 59100 | Dataset: 0-7824802 | Loss: 2.189 | 674 ms/step , 58297.01 GFLOP/s , 532838.3 tokens/s INFO:__main__:2024-10-27 06:50:18 | Epoch: 2 | Step: 59110 | Dataset: 0-7832802 | Loss: 2.131 | 675 ms/step , 
58260.37 GFLOP/s , 532871.7 tokens/s INFO:__main__:2024-10-27 06:50:26 | Epoch: 2 | Step: 59120 | Dataset: 0-7840802 | Loss: 2.099 | 675 ms/step , 58265.64 GFLOP/s , 532790.1 tokens/s INFO:__main__:2024-10-27 06:50:34 | Epoch: 2 | Step: 59130 | Dataset: 0-7848802 | Loss: 2.132 | 675 ms/step , 58203.56 GFLOP/s , 532669.0 tokens/s INFO:__main__:2024-10-27 06:50:41 | Epoch: 2 | Step: 59140 | Dataset: 0-7856802 | Loss: 2.108 | 675 ms/step , 58270.78 GFLOP/s , 532915.6 tokens/s INFO:__main__:2024-10-27 06:50:49 | Epoch: 2 | Step: 59150 | Dataset: 0-7864802 | Loss: 2.100 | 675 ms/step , 58262.06 GFLOP/s , 532974.0 tokens/s INFO:__main__:2024-10-27 06:50:57 | Epoch: 2 | Step: 59160 | Dataset: 0-7872802 | Loss: 2.046 | 674 ms/step , 58347.73 GFLOP/s , 532947.7 tokens/s INFO:__main__:2024-10-27 06:51:04 | Epoch: 2 | Step: 59170 | Dataset: 0-7880802 | Loss: 2.095 | 674 ms/step , 58308.75 GFLOP/s , 533617.6 tokens/s INFO:__main__:2024-10-27 06:51:12 | Epoch: 2 | Step: 59180 | Dataset: 0-7888802 | Loss: 2.142 | 674 ms/step , 58365.04 GFLOP/s , 533453.3 tokens/s INFO:__main__:2024-10-27 06:51:20 | Epoch: 2 | Step: 59190 | Dataset: 0-7896802 | Loss: 2.195 | 674 ms/step , 58311.08 GFLOP/s , 533647.3 tokens/s INFO:__main__:2024-10-27 06:51:27 | Epoch: 2 | Step: 59200 | Dataset: 0-7904802 | Loss: 2.140 | 676 ms/step , 58141.60 GFLOP/s , 533368.6 tokens/s INFO:__main__:2024-10-27 06:51:35 | Epoch: 2 | Step: 59210 | Dataset: 0-7912802 | Loss: 2.260 | 675 ms/step , 58200.28 GFLOP/s , 532706.3 tokens/s INFO:__main__:2024-10-27 06:51:43 | Epoch: 2 | Step: 59220 | Dataset: 0-7920802 | Loss: 2.275 | 675 ms/step , 58212.89 GFLOP/s , 532885.1 tokens/s INFO:__main__:2024-10-27 06:51:50 | Epoch: 2 | Step: 59230 | Dataset: 0-7928802 | Loss: 2.227 | 675 ms/step , 58275.49 GFLOP/s , 532984.4 tokens/s INFO:__main__:2024-10-27 06:51:58 | Epoch: 2 | Step: 59240 | Dataset: 0-7936802 | Loss: 2.201 | 676 ms/step , 58129.27 GFLOP/s , 532731.5 tokens/s INFO:__main__:2024-10-27 06:52:06 | Epoch: 2 | 
Step: 59250 | Dataset: 0-7944802 | Loss: 2.169 | 674 ms/step , 58287.20 GFLOP/s , 533018.8 tokens/s INFO:__main__:2024-10-27 06:52:14 | Epoch: 2 | Step: 59260 | Dataset: 0-7952802 | Loss: 2.202 | 673 ms/step , 58367.75 GFLOP/s , 533310.4 tokens/s INFO:__main__:2024-10-27 06:52:21 | Epoch: 2 | Step: 59270 | Dataset: 0-7960802 | Loss: 2.099 | 674 ms/step , 58308.52 GFLOP/s , 533409.6 tokens/s INFO:__main__:2024-10-27 06:52:29 | Epoch: 2 | Step: 59280 | Dataset: 0-7968802 | Loss: 2.167 | 674 ms/step , 58352.87 GFLOP/s , 532892.3 tokens/s INFO:__main__:2024-10-27 06:52:37 | Epoch: 2 | Step: 59290 | Dataset: 0-7976802 | Loss: 2.234 | 675 ms/step , 58244.22 GFLOP/s , 532930.9 tokens/s INFO:__main__:2024-10-27 06:52:44 | Epoch: 2 | Step: 59300 | Dataset: 0-7984802 | Loss: 2.132 | 674 ms/step , 58310.79 GFLOP/s , 532679.5 tokens/s INFO:__main__:2024-10-27 06:52:52 | Epoch: 2 | Step: 59310 | Dataset: 0-7992802 | Loss: 2.145 | 674 ms/step , 58324.57 GFLOP/s , 533177.2 tokens/s INFO:__main__:2024-10-27 06:53:00 | Epoch: 2 | Step: 59320 | Dataset: 0-8000802 | Loss: 2.138 | 675 ms/step , 58243.51 GFLOP/s , 532949.8 tokens/s INFO:__main__:2024-10-27 06:53:07 | Epoch: 2 | Step: 59330 | Dataset: 0-8008802 | Loss: 2.089 | 674 ms/step , 58345.04 GFLOP/s , 532913.0 tokens/s INFO:__main__:2024-10-27 06:53:15 | Epoch: 2 | Step: 59340 | Dataset: 0-8016802 | Loss: 2.062 | 674 ms/step , 58296.53 GFLOP/s , 533206.1 tokens/s INFO:__main__:2024-10-27 06:53:23 | Epoch: 2 | Step: 59350 | Dataset: 0-8024802 | Loss: 2.215 | 675 ms/step , 58275.46 GFLOP/s , 533027.7 tokens/s INFO:__main__:2024-10-27 06:53:30 | Epoch: 2 | Step: 59360 | Dataset: 0-8032802 | Loss: 2.084 | 675 ms/step , 58213.89 GFLOP/s , 532663.6 tokens/s INFO:__main__:2024-10-27 06:53:38 | Epoch: 2 | Step: 59370 | Dataset: 0-8040802 | Loss: 2.174 | 675 ms/step , 58244.16 GFLOP/s , 532624.4 tokens/s INFO:__main__:2024-10-27 06:53:46 | Epoch: 2 | Step: 59380 | Dataset: 0-8048802 | Loss: 2.166 | 675 ms/step , 58220.84 GFLOP/s , 
532629.2 tokens/s INFO:__main__:2024-10-27 06:53:53 | Epoch: 2 | Step: 59390 | Dataset: 0-8056802 | Loss: 2.095 | 675 ms/step , 58224.50 GFLOP/s , 532722.6 tokens/s INFO:__main__:2024-10-27 06:54:01 | Epoch: 2 | Step: 59400 | Dataset: 0-8064802 | Loss: 2.134 | 676 ms/step , 58168.26 GFLOP/s , 532728.6 tokens/s INFO:__main__:2024-10-27 06:54:09 | Epoch: 2 | Step: 59410 | Dataset: 0-8072802 | Loss: 2.142 | 674 ms/step , 58279.85 GFLOP/s , 533216.1 tokens/s INFO:__main__:2024-10-27 06:54:16 | Epoch: 2 | Step: 59420 | Dataset: 0-8080802 | Loss: 2.073 | 674 ms/step , 58302.56 GFLOP/s , 533062.3 tokens/s INFO:__main__:2024-10-27 06:54:24 | Epoch: 2 | Step: 59430 | Dataset: 0-8088802 | Loss: 2.103 | 676 ms/step , 58128.25 GFLOP/s , 532580.0 tokens/s INFO:__main__:2024-10-27 06:54:32 | Epoch: 2 | Step: 59440 | Dataset: 0-8096802 | Loss: 2.187 | 675 ms/step , 58197.87 GFLOP/s , 532701.3 tokens/s INFO:__main__:2024-10-27 06:54:40 | Epoch: 2 | Step: 59450 | Dataset: 0-8104802 | Loss: 2.230 | 675 ms/step , 58233.41 GFLOP/s , 532867.0 tokens/s INFO:__main__:2024-10-27 06:54:47 | Epoch: 2 | Step: 59460 | Dataset: 0-8112802 | Loss: 2.135 | 676 ms/step , 58189.12 GFLOP/s , 532303.5 tokens/s INFO:__main__:2024-10-27 06:54:55 | Epoch: 2 | Step: 59470 | Dataset: 0-8120802 | Loss: 2.120 | 676 ms/step , 58191.05 GFLOP/s , 532719.9 tokens/s INFO:__main__:2024-10-27 06:55:03 | Epoch: 2 | Step: 59480 | Dataset: 0-8128802 | Loss: 2.093 | 675 ms/step , 58250.06 GFLOP/s , 532541.0 tokens/s INFO:__main__:2024-10-27 06:55:10 | Epoch: 2 | Step: 59490 | Dataset: 0-8136802 | Loss: 2.136 | 675 ms/step , 58249.52 GFLOP/s , 532722.2 tokens/s INFO:__main__:2024-10-27 06:55:18 | Epoch: 2 | Step: 59500 | Dataset: 0-8144802 | Loss: 2.046 | 676 ms/step , 58177.63 GFLOP/s , 532456.3 tokens/s INFO:__main__:2024-10-27 06:55:26 | Epoch: 2 | Step: 59510 | Dataset: 0-8152802 | Loss: 2.157 | 675 ms/step , 58210.82 GFLOP/s , 532928.7 tokens/s INFO:__main__:2024-10-27 06:55:33 | Epoch: 2 | Step: 59520 | Dataset: 
0-8160802 | Loss: 2.037 | 675 ms/step , 58229.97 GFLOP/s , 532616.0 tokens/s INFO:__main__:2024-10-27 06:55:41 | Epoch: 2 | Step: 59530 | Dataset: 0-8168802 | Loss: 2.134 | 675 ms/step , 58225.75 GFLOP/s , 532788.7 tokens/s INFO:__main__:2024-10-27 06:55:49 | Epoch: 2 | Step: 59540 | Dataset: 0-8176802 | Loss: 2.133 | 675 ms/step , 58249.38 GFLOP/s , 532805.3 tokens/s INFO:__main__:2024-10-27 06:55:56 | Epoch: 2 | Step: 59550 | Dataset: 0-8184802 | Loss: 2.071 | 676 ms/step , 58174.22 GFLOP/s , 532702.4 tokens/s INFO:__main__:2024-10-27 06:56:04 | Epoch: 2 | Step: 59560 | Dataset: 0-8192802 | Loss: 2.135 | 675 ms/step , 58252.38 GFLOP/s , 532982.8 tokens/s INFO:__main__:2024-10-27 06:56:12 | Epoch: 2 | Step: 59570 | Dataset: 0-8200802 | Loss: 2.148 | 674 ms/step , 58295.90 GFLOP/s , 532733.2 tokens/s INFO:__main__:2024-10-27 06:56:19 | Epoch: 2 | Step: 59580 | Dataset: 0-8208802 | Loss: 2.169 | 674 ms/step , 58309.25 GFLOP/s , 533382.6 tokens/s INFO:__main__:2024-10-27 06:56:27 | Epoch: 2 | Step: 59590 | Dataset: 0-8216802 | Loss: 2.084 | 675 ms/step , 58230.80 GFLOP/s , 533094.4 tokens/s INFO:__main__:2024-10-27 06:56:35 | Epoch: 2 | Step: 59600 | Dataset: 0-8224802 | Loss: 2.134 | 676 ms/step , 58186.91 GFLOP/s , 532900.4 tokens/s INFO:__main__:2024-10-27 06:56:43 | Epoch: 2 | Step: 59610 | Dataset: 0-8232802 | Loss: 2.114 | 674 ms/step , 58344.36 GFLOP/s , 533307.6 tokens/s INFO:__main__:2024-10-27 06:56:50 | Epoch: 2 | Step: 59620 | Dataset: 0-8240802 | Loss: 2.071 | 676 ms/step , 58180.87 GFLOP/s , 532819.0 tokens/s INFO:__main__:2024-10-27 06:56:58 | Epoch: 2 | Step: 59630 | Dataset: 0-8248802 | Loss: 2.156 | 675 ms/step , 58265.37 GFLOP/s , 532582.4 tokens/s INFO:__main__:2024-10-27 06:57:06 | Epoch: 2 | Step: 59640 | Dataset: 0-8256802 | Loss: 2.208 | 677 ms/step , 58068.67 GFLOP/s , 532013.8 tokens/s INFO:__main__:2024-10-27 06:57:13 | Epoch: 2 | Step: 59650 | Dataset: 0-8264802 | Loss: 2.094 | 677 ms/step , 58069.09 GFLOP/s , 531145.6 tokens/s 
INFO:__main__:2024-10-27 06:57:21 | Epoch: 2 | Step: 59660 | Dataset: 0-8272802 | Loss: 2.134 | 674 ms/step , 58280.12 GFLOP/s , 532375.2 tokens/s INFO:__main__:2024-10-27 06:57:29 | Epoch: 2 | Step: 59670 | Dataset: 0-8280802 | Loss: 2.097 | 675 ms/step , 58237.24 GFLOP/s , 533253.6 tokens/s INFO:__main__:2024-10-27 06:57:36 | Epoch: 2 | Step: 59680 | Dataset: 0-8288802 | Loss: 2.171 | 675 ms/step , 58261.54 GFLOP/s , 532974.7 tokens/s INFO:__main__:2024-10-27 06:57:44 | Epoch: 2 | Step: 59690 | Dataset: 0-8296802 | Loss: 2.084 | 675 ms/step , 58224.15 GFLOP/s , 533062.6 tokens/s INFO:__main__:2024-10-27 06:57:52 | Epoch: 2 | Step: 59700 | Dataset: 0-8304802 | Loss: 1.931 | 675 ms/step , 58207.58 GFLOP/s , 531961.3 tokens/s INFO:__main__:2024-10-27 06:57:59 | Epoch: 2 | Step: 59710 | Dataset: 0-8312802 | Loss: 1.789 | 675 ms/step , 58249.45 GFLOP/s , 532091.0 tokens/s INFO:__main__:2024-10-27 06:58:07 | Epoch: 2 | Step: 59720 | Dataset: 0-8320802 | Loss: 1.772 | 674 ms/step , 58297.60 GFLOP/s , 532268.0 tokens/s INFO:__main__:2024-10-27 06:58:15 | Epoch: 2 | Step: 59730 | Dataset: 0-8328802 | Loss: 1.691 | 675 ms/step , 58261.52 GFLOP/s , 532467.9 tokens/s INFO:__main__:2024-10-27 06:58:23 | Epoch: 2 | Step: 59740 | Dataset: 0-8336802 | Loss: 1.731 | 675 ms/step , 58216.99 GFLOP/s , 531868.0 tokens/s INFO:__main__:2024-10-27 06:58:30 | Epoch: 2 | Step: 59750 | Dataset: 0-8344802 | Loss: 1.709 | 676 ms/step , 58184.34 GFLOP/s , 532327.3 tokens/s INFO:__main__:2024-10-27 06:58:38 | Epoch: 2 | Step: 59760 | Dataset: 0-8352802 | Loss: 1.688 | 677 ms/step , 58047.34 GFLOP/s , 530120.1 tokens/s INFO:__main__:2024-10-27 06:58:46 | Epoch: 2 | Step: 59770 | Dataset: 0-8360802 | Loss: 1.687 | 675 ms/step , 58255.18 GFLOP/s , 530417.4 tokens/s INFO:__main__:2024-10-27 06:58:53 | Epoch: 2 | Step: 59780 | Dataset: 0-8368802 | Loss: 2.375 | 675 ms/step , 58242.94 GFLOP/s , 530129.1 tokens/s INFO:__main__:2024-10-27 06:59:01 | Epoch: 2 | Step: 59790 | Dataset: 0-8376802 | Loss: 
2.218 | 676 ms/step , 58171.77 GFLOP/s , 531405.2 tokens/s INFO:__main__:2024-10-27 06:59:09 | Epoch: 2 | Step: 59800 | Dataset: 0-8384802 | Loss: 2.191 | 675 ms/step , 58230.31 GFLOP/s , 531741.8 tokens/s INFO:__main__:2024-10-27 06:59:17 | Epoch: 2 | Step: 59810 | Dataset: 0-8392802 | Loss: 2.288 | 678 ms/step , 57973.37 GFLOP/s , 530601.0 tokens/s INFO:__main__:2024-10-27 06:59:24 | Epoch: 2 | Step: 59820 | Dataset: 0-8400802 | Loss: 2.158 | 675 ms/step , 58253.91 GFLOP/s , 530959.8 tokens/s INFO:__main__:2024-10-27 06:59:32 | Epoch: 2 | Step: 59830 | Dataset: 0-8408802 | Loss: 2.208 | 678 ms/step , 57944.32 GFLOP/s , 530075.4 tokens/s INFO:__main__:2024-10-27 06:59:40 | Epoch: 2 | Step: 59840 | Dataset: 0-8416802 | Loss: 2.134 | 676 ms/step , 58191.51 GFLOP/s , 531296.7 tokens/s INFO:__main__:2024-10-27 06:59:47 | Epoch: 2 | Step: 59850 | Dataset: 0-8424802 | Loss: 2.161 | 676 ms/step , 58168.67 GFLOP/s , 528732.8 tokens/s INFO:__main__:2024-10-27 06:59:55 | Epoch: 2 | Step: 59860 | Dataset: 0-8432802 | Loss: 2.132 | 675 ms/step , 58262.92 GFLOP/s , 530671.4 tokens/s INFO:__main__:2024-10-27 07:00:03 | Epoch: 2 | Step: 59870 | Dataset: 0-8440802 | Loss: 2.165 | 677 ms/step , 58074.68 GFLOP/s , 531493.7 tokens/s INFO:__main__:2024-10-27 07:00:11 | Epoch: 2 | Step: 59880 | Dataset: 0-8448802 | Loss: 2.133 | 674 ms/step , 58302.41 GFLOP/s , 531723.6 tokens/s INFO:__main__:2024-10-27 07:00:18 | Epoch: 2 | Step: 59890 | Dataset: 0-8456802 | Loss: 2.142 | 675 ms/step , 58274.79 GFLOP/s , 532428.8 tokens/s INFO:__main__:2024-10-27 07:00:26 | Epoch: 2 | Step: 59900 | Dataset: 0-8464802 | Loss: 2.164 | 674 ms/step , 58343.68 GFLOP/s , 532776.9 tokens/s INFO:__main__:2024-10-27 07:00:34 | Epoch: 2 | Step: 59910 | Dataset: 0-8472802 | Loss: 2.139 | 674 ms/step , 58305.68 GFLOP/s , 532782.3 tokens/s INFO:__main__:2024-10-27 07:00:41 | Epoch: 2 | Step: 59920 | Dataset: 0-8480802 | Loss: 2.192 | 676 ms/step , 58186.35 GFLOP/s , 531751.3 tokens/s INFO:__main__:2024-10-27 
07:00:49 | Epoch: 2 | Step: 59930 | Dataset: 0-8488802 | Loss: 2.259 | 675 ms/step , 58242.13 GFLOP/s , 532942.7 tokens/s INFO:__main__:2024-10-27 07:00:57 | Epoch: 2 | Step: 59940 | Dataset: 0-8496802 | Loss: 1.970 | 675 ms/step , 58245.83 GFLOP/s , 532012.8 tokens/s INFO:__main__:2024-10-27 07:01:04 | Epoch: 2 | Step: 59950 | Dataset: 0-8504802 | Loss: 1.782 | 674 ms/step , 58339.70 GFLOP/s , 532612.4 tokens/s INFO:__main__:2024-10-27 07:01:12 | Epoch: 2 | Step: 59960 | Dataset: 0-8512802 | Loss: 1.715 | 674 ms/step , 58294.89 GFLOP/s , 532879.2 tokens/s INFO:__main__:2024-10-27 07:01:20 | Epoch: 2 | Step: 59970 | Dataset: 0-8520802 | Loss: 1.749 | 674 ms/step , 58297.74 GFLOP/s , 532811.2 tokens/s INFO:__main__:2024-10-27 07:01:28 | Epoch: 2 | Step: 59980 | Dataset: 0-8528802 | Loss: 1.719 | 675 ms/step , 58274.33 GFLOP/s , 531665.1 tokens/s INFO:__main__:2024-10-27 07:01:35 | Epoch: 2 | Step: 59990 | Dataset: 0-8536802 | Loss: 1.721 | 675 ms/step , 58276.40 GFLOP/s , 532925.7 tokens/s INFO:__main__:2024-10-27 07:01:42 | Validation | Step: 60000 | Val_loss: 2.448 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 07:01:42 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_070142_step_60000.pt` INFO:__main__:2024-10-27 07:01:44 | Epoch: 2 | Step: 60000 | Dataset: 0-8544802 | Loss: 1.721 | 674 ms/step , 58353.77 GFLOP/s , 480222.9 tokens/s INFO:__main__:2024-10-27 07:01:51 | Epoch: 2 | Step: 60010 | Dataset: 0-8552802 | Loss: 1.660 | 674 ms/step , 58316.11 GFLOP/s , 531838.1 tokens/s INFO:__main__:2024-10-27 07:01:59 | Epoch: 2 | Step: 60020 | Dataset: 0-8560802 | Loss: 1.698 | 675 ms/step , 58265.06 GFLOP/s , 532290.8 tokens/s INFO:__main__:2024-10-27 07:02:07 | Epoch: 2 | Step: 60030 | Dataset: 0-8568802 | Loss: 2.275 | 674 ms/step , 58288.34 GFLOP/s , 532454.8 tokens/s INFO:__main__:2024-10-27 07:02:14 | Epoch: 2 | Step: 60040 | Dataset: 0-8576802 | Loss: 2.322 | 674 ms/step , 58299.64 GFLOP/s , 533424.8 tokens/s INFO:__main__:2024-10-27 
07:02:22 | Epoch: 2 | Step: 60050 | Dataset: 0-8584802 | Loss: 2.170 | 682 ms/step , 57618.20 GFLOP/s , 532182.2 tokens/s INFO:__main__:2024-10-27 07:02:30 | Epoch: 2 | Step: 60060 | Dataset: 0-8592802 | Loss: 2.180 | 676 ms/step , 58177.46 GFLOP/s , 532532.9 tokens/s INFO:__main__:2024-10-27 07:02:38 | Epoch: 2 | Step: 60070 | Dataset: 0-8600802 | Loss: 2.121 | 675 ms/step , 58245.60 GFLOP/s , 532399.4 tokens/s INFO:__main__:2024-10-27 07:02:45 | Epoch: 2 | Step: 60080 | Dataset: 0-8608802 | Loss: 2.157 | 676 ms/step , 58160.82 GFLOP/s , 532557.9 tokens/s INFO:__main__:2024-10-27 07:02:53 | Epoch: 2 | Step: 60090 | Dataset: 0-8616802 | Loss: 2.233 | 675 ms/step , 58214.98 GFLOP/s , 532370.8 tokens/s INFO:__main__:2024-10-27 07:03:01 | Epoch: 2 | Step: 60100 | Dataset: 0-8624802 | Loss: 2.130 | 675 ms/step , 58224.30 GFLOP/s , 532464.4 tokens/s INFO:__main__:2024-10-27 07:03:08 | Epoch: 2 | Step: 60110 | Dataset: 0-8632802 | Loss: 2.182 | 674 ms/step , 58284.24 GFLOP/s , 532985.5 tokens/s INFO:__main__:2024-10-27 07:03:16 | Epoch: 2 | Step: 60120 | Dataset: 0-8640802 | Loss: 2.193 | 674 ms/step , 58304.65 GFLOP/s , 532556.8 tokens/s INFO:__main__:2024-10-27 07:03:24 | Epoch: 2 | Step: 60130 | Dataset: 0-8648802 | Loss: 2.160 | 675 ms/step , 58278.61 GFLOP/s , 533154.1 tokens/s INFO:__main__:2024-10-27 07:03:31 | Epoch: 2 | Step: 60140 | Dataset: 0-8656802 | Loss: 2.161 | 675 ms/step , 58246.10 GFLOP/s , 533180.6 tokens/s INFO:__main__:2024-10-27 07:03:39 | Epoch: 2 | Step: 60150 | Dataset: 0-8664802 | Loss: 2.125 | 674 ms/step , 58291.53 GFLOP/s , 533131.6 tokens/s INFO:__main__:2024-10-27 07:03:47 | Epoch: 2 | Step: 60160 | Dataset: 0-8672802 | Loss: 2.137 | 677 ms/step , 58043.66 GFLOP/s , 532959.2 tokens/s INFO:__main__:2024-10-27 07:03:54 | Epoch: 2 | Step: 60170 | Dataset: 0-8680802 | Loss: 2.108 | 674 ms/step , 58287.16 GFLOP/s , 532796.2 tokens/s INFO:__main__:2024-10-27 07:04:02 | Epoch: 2 | Step: 60180 | Dataset: 0-8688802 | Loss: 2.180 | 676 ms/step , 
58181.56 GFLOP/s , 532662.4 tokens/s INFO:__main__:2024-10-27 07:04:10 | Epoch: 2 | Step: 60190 | Dataset: 0-8696802 | Loss: 2.327 | 675 ms/step , 58248.58 GFLOP/s , 533194.2 tokens/s INFO:__main__:2024-10-27 07:04:18 | Epoch: 2 | Step: 60200 | Dataset: 0-8704802 | Loss: 2.350 | 675 ms/step , 58221.18 GFLOP/s , 532979.2 tokens/s INFO:__main__:2024-10-27 07:04:25 | Epoch: 2 | Step: 60210 | Dataset: 0-8712802 | Loss: 2.321 | 674 ms/step , 58308.94 GFLOP/s , 533244.2 tokens/s INFO:__main__:2024-10-27 07:04:33 | Epoch: 2 | Step: 60220 | Dataset: 0-8720802 | Loss: 2.297 | 675 ms/step , 58252.15 GFLOP/s , 533493.8 tokens/s INFO:__main__:2024-10-27 07:04:41 | Epoch: 2 | Step: 60230 | Dataset: 0-8728802 | Loss: 2.294 | 675 ms/step , 58226.48 GFLOP/s , 532742.1 tokens/s INFO:__main__:2024-10-27 07:04:48 | Epoch: 2 | Step: 60240 | Dataset: 0-8736802 | Loss: 2.196 | 675 ms/step , 58223.21 GFLOP/s , 533006.1 tokens/s INFO:__main__:2024-10-27 07:04:56 | Epoch: 2 | Step: 60250 | Dataset: 0-8744802 | Loss: 2.213 | 674 ms/step , 58291.92 GFLOP/s , 532925.6 tokens/s INFO:__main__:2024-10-27 07:05:04 | Epoch: 2 | Step: 60260 | Dataset: 0-8752802 | Loss: 2.204 | 674 ms/step , 58298.07 GFLOP/s , 532790.0 tokens/s INFO:__main__:2024-10-27 07:05:11 | Epoch: 2 | Step: 60270 | Dataset: 0-8760802 | Loss: 2.234 | 675 ms/step , 58233.42 GFLOP/s , 532946.8 tokens/s INFO:__main__:2024-10-27 07:05:19 | Epoch: 2 | Step: 60280 | Dataset: 0-8768802 | Loss: 2.178 | 674 ms/step , 58287.67 GFLOP/s , 533228.1 tokens/s INFO:__main__:2024-10-27 07:05:27 | Epoch: 2 | Step: 60290 | Dataset: 0-8776802 | Loss: 2.204 | 675 ms/step , 58230.89 GFLOP/s , 532779.4 tokens/s INFO:__main__:2024-10-27 07:05:34 | Epoch: 2 | Step: 60300 | Dataset: 0-8784802 | Loss: 2.154 | 675 ms/step , 58260.04 GFLOP/s , 532403.4 tokens/s INFO:__main__:2024-10-27 07:05:42 | Epoch: 2 | Step: 60310 | Dataset: 0-8792802 | Loss: 2.156 | 675 ms/step , 58273.11 GFLOP/s , 533241.1 tokens/s INFO:__main__:2024-10-27 07:05:50 | Epoch: 2 | 
Step: 60320 | Dataset: 0-8800802 | Loss: 2.168 | 674 ms/step , 58295.32 GFLOP/s , 533505.2 tokens/s INFO:__main__:2024-10-27 07:05:57 | Epoch: 2 | Step: 60330 | Dataset: 0-8808802 | Loss: 2.195 | 675 ms/step , 58271.81 GFLOP/s , 533189.8 tokens/s INFO:__main__:2024-10-27 07:06:05 | Epoch: 2 | Step: 60340 | Dataset: 0-8816802 | Loss: 2.178 | 674 ms/step , 58282.45 GFLOP/s , 533065.7 tokens/s INFO:__main__:2024-10-27 07:06:13 | Epoch: 2 | Step: 60350 | Dataset: 0-8824802 | Loss: 2.222 | 674 ms/step , 58291.30 GFLOP/s , 533098.8 tokens/s INFO:__main__:2024-10-27 07:06:20 | Epoch: 2 | Step: 60360 | Dataset: 0-8832802 | Loss: 1.889 | 675 ms/step , 58218.78 GFLOP/s , 532324.4 tokens/s INFO:__main__:2024-10-27 07:06:28 | Epoch: 2 | Step: 60370 | Dataset: 0-8840802 | Loss: 1.865 | 671 ms/step , 58596.90 GFLOP/s , 532885.6 tokens/s INFO:__main__:2024-10-27 07:06:36 | Epoch: 2 | Step: 60380 | Dataset: 0-8848802 | Loss: 1.838 | 674 ms/step , 58309.73 GFLOP/s , 532637.4 tokens/s INFO:__main__:2024-10-27 07:06:44 | Epoch: 2 | Step: 60390 | Dataset: 0-8856802 | Loss: 1.804 | 675 ms/step , 58232.70 GFLOP/s , 532192.0 tokens/s INFO:__main__:2024-10-27 07:06:51 | Epoch: 2 | Step: 60400 | Dataset: 0-8864802 | Loss: 1.826 | 675 ms/step , 58267.66 GFLOP/s , 532141.2 tokens/s INFO:__main__:2024-10-27 07:06:59 | Epoch: 2 | Step: 60410 | Dataset: 0-8872802 | Loss: 1.787 | 675 ms/step , 58193.30 GFLOP/s , 532163.7 tokens/s INFO:__main__:2024-10-27 07:07:07 | Epoch: 2 | Step: 60420 | Dataset: 0-8880802 | Loss: 1.767 | 675 ms/step , 58204.02 GFLOP/s , 531950.6 tokens/s INFO:__main__:2024-10-27 07:07:14 | Epoch: 2 | Step: 60430 | Dataset: 0-8888802 | Loss: 1.779 | 674 ms/step , 58282.70 GFLOP/s , 532076.0 tokens/s INFO:__main__:2024-10-27 07:07:22 | Epoch: 2 | Step: 60440 | Dataset: 0-8896802 | Loss: 2.385 | 675 ms/step , 58228.67 GFLOP/s , 532195.5 tokens/s INFO:__main__:2024-10-27 07:07:30 | Epoch: 2 | Step: 60450 | Dataset: 0-8904802 | Loss: 2.195 | 675 ms/step , 58264.92 GFLOP/s , 
532817.8 tokens/s INFO:__main__:2024-10-27 07:07:37 | Epoch: 2 | Step: 60460 | Dataset: 0-8912802 | Loss: 2.197 | 675 ms/step , 58235.44 GFLOP/s , 533089.6 tokens/s INFO:__main__:2024-10-27 07:07:45 | Epoch: 2 | Step: 60470 | Dataset: 0-8920802 | Loss: 2.215 | 675 ms/step , 58215.96 GFLOP/s , 532659.9 tokens/s INFO:__main__:2024-10-27 07:07:53 | Epoch: 2 | Step: 60480 | Dataset: 0-8928802 | Loss: 2.234 | 674 ms/step , 58315.82 GFLOP/s , 532839.4 tokens/s INFO:__main__:2024-10-27 07:08:00 | Epoch: 2 | Step: 60490 | Dataset: 0-8936802 | Loss: 2.192 | 674 ms/step , 58314.31 GFLOP/s , 533346.1 tokens/s INFO:__main__:2024-10-27 07:08:08 | Epoch: 2 | Step: 60500 | Dataset: 0-8944802 | Loss: 2.155 | 674 ms/step , 58319.20 GFLOP/s , 532988.3 tokens/s INFO:__main__:2024-10-27 07:08:16 | Epoch: 2 | Step: 60510 | Dataset: 0-8952802 | Loss: 2.228 | 675 ms/step , 58251.09 GFLOP/s , 533265.9 tokens/s INFO:__main__:2024-10-27 07:08:24 | Epoch: 2 | Step: 60520 | Dataset: 0-8960802 | Loss: 2.163 | 675 ms/step , 58264.85 GFLOP/s , 532680.8 tokens/s INFO:__main__:2024-10-27 07:08:31 | Epoch: 2 | Step: 60530 | Dataset: 0-8968802 | Loss: 2.209 | 674 ms/step , 58309.69 GFLOP/s , 533206.1 tokens/s INFO:__main__:2024-10-27 07:08:39 | Epoch: 2 | Step: 60540 | Dataset: 0-8976802 | Loss: 2.171 | 674 ms/step , 58342.16 GFLOP/s , 533117.1 tokens/s INFO:__main__:2024-10-27 07:08:47 | Epoch: 2 | Step: 60550 | Dataset: 0-8984802 | Loss: 2.214 | 675 ms/step , 58243.47 GFLOP/s , 532797.7 tokens/s INFO:__main__:2024-10-27 07:08:54 | Epoch: 2 | Step: 60560 | Dataset: 0-8992802 | Loss: 2.171 | 675 ms/step , 58277.41 GFLOP/s , 532514.0 tokens/s INFO:__main__:2024-10-27 07:09:02 | Epoch: 2 | Step: 60570 | Dataset: 0-9000802 | Loss: 2.135 | 673 ms/step , 58414.55 GFLOP/s , 533382.7 tokens/s INFO:__main__:2024-10-27 07:09:10 | Epoch: 2 | Step: 60580 | Dataset: 0-9008802 | Loss: 2.128 | 674 ms/step , 58303.59 GFLOP/s , 533285.3 tokens/s INFO:__main__:2024-10-27 07:09:17 | Epoch: 2 | Step: 60590 | Dataset: 
0-9016802 | Loss: 2.173 | 674 ms/step , 58294.49 GFLOP/s , 532741.6 tokens/s INFO:__main__:2024-10-27 07:09:25 | Epoch: 2 | Step: 60600 | Dataset: 0-9024802 | Loss: 2.189 | 675 ms/step , 58230.92 GFLOP/s , 532746.5 tokens/s INFO:__main__:2024-10-27 07:09:33 | Epoch: 2 | Step: 60610 | Dataset: 0-9032802 | Loss: 2.239 | 675 ms/step , 58214.90 GFLOP/s , 532387.2 tokens/s INFO:__main__:2024-10-27 07:09:40 | Epoch: 2 | Step: 60620 | Dataset: 0-9040802 | Loss: 2.253 | 674 ms/step , 58300.35 GFLOP/s , 533327.4 tokens/s INFO:__main__:2024-10-27 07:09:48 | Epoch: 2 | Step: 60630 | Dataset: 0-9048802 | Loss: 2.243 | 675 ms/step , 58258.99 GFLOP/s , 532837.8 tokens/s INFO:__main__:2024-10-27 07:09:56 | Epoch: 2 | Step: 60640 | Dataset: 0-9056802 | Loss: 2.267 | 675 ms/step , 58234.46 GFLOP/s , 533010.2 tokens/s INFO:__main__:2024-10-27 07:10:03 | Epoch: 2 | Step: 60650 | Dataset: 0-9064802 | Loss: 2.122 | 674 ms/step , 58355.12 GFLOP/s , 533183.7 tokens/s INFO:__main__:2024-10-27 07:10:11 | Epoch: 2 | Step: 60660 | Dataset: 0-9072802 | Loss: 2.252 | 674 ms/step , 58288.00 GFLOP/s , 533546.4 tokens/s INFO:__main__:2024-10-27 07:10:19 | Epoch: 2 | Step: 60670 | Dataset: 0-9080802 | Loss: 2.217 | 674 ms/step , 58285.94 GFLOP/s , 533230.9 tokens/s INFO:__main__:2024-10-27 07:10:26 | Epoch: 2 | Step: 60680 | Dataset: 0-9088802 | Loss: 2.166 | 674 ms/step , 58330.90 GFLOP/s , 532768.8 tokens/s INFO:__main__:2024-10-27 07:10:34 | Epoch: 2 | Step: 60690 | Dataset: 0-9096802 | Loss: 2.223 | 674 ms/step , 58284.54 GFLOP/s , 532607.0 tokens/s INFO:__main__:2024-10-27 07:10:42 | Epoch: 2 | Step: 60700 | Dataset: 0-9104802 | Loss: 2.155 | 676 ms/step , 58187.91 GFLOP/s , 530806.3 tokens/s INFO:__main__:2024-10-27 07:10:50 | Epoch: 2 | Step: 60710 | Dataset: 0-9112802 | Loss: 2.170 | 674 ms/step , 58315.27 GFLOP/s , 533149.1 tokens/s INFO:__main__:2024-10-27 07:10:57 | Epoch: 2 | Step: 60720 | Dataset: 0-9120802 | Loss: 2.177 | 674 ms/step , 58304.17 GFLOP/s , 532630.0 tokens/s 
INFO:__main__:2024-10-27 07:11:05 | Epoch: 2 | Step: 60730 | Dataset: 0-9128802 | Loss: 2.230 | 675 ms/step , 58276.25 GFLOP/s , 532524.0 tokens/s INFO:__main__:2024-10-27 07:11:13 | Epoch: 2 | Step: 60740 | Dataset: 0-9136802 | Loss: 2.228 | 674 ms/step , 58353.79 GFLOP/s , 532547.2 tokens/s INFO:__main__:2024-10-27 07:11:20 | Epoch: 2 | Step: 60750 | Dataset: 0-9144802 | Loss: 2.250 | 674 ms/step , 58328.91 GFLOP/s , 532682.3 tokens/s INFO:__main__:2024-10-27 07:11:28 | Epoch: 2 | Step: 60760 | Dataset: 0-9152802 | Loss: 2.168 | 675 ms/step , 58264.34 GFLOP/s , 532491.5 tokens/s INFO:__main__:2024-10-27 07:11:36 | Epoch: 2 | Step: 60770 | Dataset: 0-9160802 | Loss: 2.231 | 675 ms/step , 58195.38 GFLOP/s , 532565.6 tokens/s INFO:__main__:2024-10-27 07:11:43 | Epoch: 2 | Step: 60780 | Dataset: 0-9168802 | Loss: 2.248 | 674 ms/step , 58289.91 GFLOP/s , 532464.7 tokens/s INFO:__main__:2024-10-27 07:11:51 | Epoch: 2 | Step: 60790 | Dataset: 0-9176802 | Loss: 2.184 | 675 ms/step , 58261.57 GFLOP/s , 532823.2 tokens/s INFO:__main__:2024-10-27 07:11:59 | Epoch: 2 | Step: 60800 | Dataset: 0-9184802 | Loss: 2.078 | 675 ms/step , 58268.29 GFLOP/s , 532717.9 tokens/s INFO:__main__:2024-10-27 07:12:06 | Epoch: 2 | Step: 60810 | Dataset: 0-9192802 | Loss: 2.161 | 675 ms/step , 58266.94 GFLOP/s , 532485.2 tokens/s INFO:__main__:2024-10-27 07:12:14 | Epoch: 2 | Step: 60820 | Dataset: 0-9200802 | Loss: 2.087 | 674 ms/step , 58286.36 GFLOP/s , 532633.3 tokens/s INFO:__main__:2024-10-27 07:12:22 | Epoch: 2 | Step: 60830 | Dataset: 0-9208802 | Loss: 2.202 | 675 ms/step , 58221.42 GFLOP/s , 532274.6 tokens/s INFO:__main__:2024-10-27 07:12:30 | Epoch: 2 | Step: 60840 | Dataset: 0-9216802 | Loss: 2.122 | 674 ms/step , 58334.22 GFLOP/s , 533136.6 tokens/s INFO:__main__:2024-10-27 07:12:37 | Epoch: 2 | Step: 60850 | Dataset: 0-9224802 | Loss: 2.178 | 675 ms/step , 58262.34 GFLOP/s , 532648.9 tokens/s INFO:__main__:2024-10-27 07:12:45 | Epoch: 2 | Step: 60860 | Dataset: 0-9232802 | Loss: 
2.159 | 675 ms/step , 58215.21 GFLOP/s , 532638.3 tokens/s INFO:__main__:2024-10-27 07:12:53 | Epoch: 2 | Step: 60870 | Dataset: 0-9240802 | Loss: 2.131 | 675 ms/step , 58263.16 GFLOP/s , 532366.2 tokens/s INFO:__main__:2024-10-27 07:13:00 | Epoch: 2 | Step: 60880 | Dataset: 0-9248802 | Loss: 2.134 | 675 ms/step , 58225.45 GFLOP/s , 532250.8 tokens/s INFO:__main__:2024-10-27 07:13:08 | Epoch: 2 | Step: 60890 | Dataset: 0-9256802 | Loss: 2.127 | 675 ms/step , 58273.24 GFLOP/s , 532490.3 tokens/s INFO:__main__:2024-10-27 07:13:16 | Epoch: 2 | Step: 60900 | Dataset: 0-9264802 | Loss: 2.182 | 675 ms/step , 58255.53 GFLOP/s , 532324.5 tokens/s INFO:__main__:2024-10-27 07:13:23 | Epoch: 2 | Step: 60910 | Dataset: 0-9272802 | Loss: 2.133 | 674 ms/step , 58283.59 GFLOP/s , 532564.8 tokens/s INFO:__main__:2024-10-27 07:13:31 | Epoch: 2 | Step: 60920 | Dataset: 0-9280802 | Loss: 2.087 | 675 ms/step , 58262.75 GFLOP/s , 532603.7 tokens/s INFO:__main__:2024-10-27 07:13:39 | Epoch: 2 | Step: 60930 | Dataset: 0-9288802 | Loss: 1.884 | 674 ms/step , 58309.85 GFLOP/s , 532520.8 tokens/s INFO:__main__:2024-10-27 07:13:46 | Epoch: 2 | Step: 60940 | Dataset: 0-9296802 | Loss: 1.799 | 676 ms/step , 58186.35 GFLOP/s , 531649.9 tokens/s INFO:__main__:2024-10-27 07:13:54 | Epoch: 2 | Step: 60950 | Dataset: 0-9304802 | Loss: 1.802 | 675 ms/step , 58207.92 GFLOP/s , 531715.8 tokens/s INFO:__main__:2024-10-27 07:14:02 | Epoch: 2 | Step: 60960 | Dataset: 0-9312802 | Loss: 1.799 | 675 ms/step , 58232.34 GFLOP/s , 531278.0 tokens/s INFO:__main__:2024-10-27 07:14:10 | Epoch: 2 | Step: 60970 | Dataset: 0-9320802 | Loss: 1.785 | 675 ms/step , 58272.55 GFLOP/s , 532610.0 tokens/s INFO:__main__:2024-10-27 07:14:17 | Epoch: 2 | Step: 60980 | Dataset: 0-9328802 | Loss: 1.761 | 674 ms/step , 58313.68 GFLOP/s , 532199.3 tokens/s INFO:__main__:2024-10-27 07:14:25 | Epoch: 2 | Step: 60990 | Dataset: 0-9336802 | Loss: 1.786 | 675 ms/step , 58236.89 GFLOP/s , 531666.5 tokens/s INFO:__main__:2024-10-27 
07:14:32 | Validation | Step: 61000 | Val_loss: 2.464 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 07:14:32 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_071432_step_61000.pt` INFO:__main__:2024-10-27 07:14:34 | Epoch: 2 | Step: 61000 | Dataset: 0-9344802 | Loss: 1.753 | 673 ms/step , 58371.09 GFLOP/s , 479149.7 tokens/s INFO:__main__:2024-10-27 07:14:41 | Epoch: 2 | Step: 61010 | Dataset: 0-9352802 | Loss: 1.735 | 675 ms/step , 58246.82 GFLOP/s , 531952.2 tokens/s INFO:__main__:2024-10-27 07:14:49 | Epoch: 2 | Step: 61020 | Dataset: 0-9360802 | Loss: 2.242 | 675 ms/step , 58242.16 GFLOP/s , 532290.9 tokens/s INFO:__main__:2024-10-27 07:14:57 | Epoch: 2 | Step: 61030 | Dataset: 0-9368802 | Loss: 2.226 | 674 ms/step , 58318.87 GFLOP/s , 532408.3 tokens/s INFO:__main__:2024-10-27 07:15:04 | Epoch: 2 | Step: 61040 | Dataset: 0-9376802 | Loss: 2.103 | 676 ms/step , 58134.45 GFLOP/s , 532435.3 tokens/s INFO:__main__:2024-10-27 07:15:12 | Epoch: 2 | Step: 61050 | Dataset: 0-9384802 | Loss: 2.109 | 674 ms/step , 58281.50 GFLOP/s , 531925.2 tokens/s INFO:__main__:2024-10-27 07:15:20 | Epoch: 2 | Step: 61060 | Dataset: 0-9392802 | Loss: 2.187 | 675 ms/step , 58233.85 GFLOP/s , 532972.6 tokens/s INFO:__main__:2024-10-27 07:15:27 | Epoch: 2 | Step: 61070 | Dataset: 0-9400802 | Loss: 2.017 | 675 ms/step , 58252.20 GFLOP/s , 532772.6 tokens/s INFO:__main__:2024-10-27 07:15:35 | Epoch: 2 | Step: 61080 | Dataset: 0-9408802 | Loss: 2.139 | 674 ms/step , 58352.16 GFLOP/s , 532743.8 tokens/s INFO:__main__:2024-10-27 07:15:43 | Epoch: 2 | Step: 61090 | Dataset: 0-9416802 | Loss: 2.177 | 675 ms/step , 58266.47 GFLOP/s , 532948.5 tokens/s INFO:__main__:2024-10-27 07:15:50 | Epoch: 2 | Step: 61100 | Dataset: 0-9424802 | Loss: 2.116 | 674 ms/step , 58310.29 GFLOP/s , 533415.5 tokens/s INFO:__main__:2024-10-27 07:15:58 | Epoch: 2 | Step: 61110 | Dataset: 0-9432802 | Loss: 2.098 | 676 ms/step , 58189.39 GFLOP/s , 532484.1 tokens/s INFO:__main__:2024-10-27 
07:16:06 | Epoch: 2 | Step: 61120 | Dataset: 0-9440802 | Loss: 2.142 | 673 ms/step , 58404.14 GFLOP/s , 533267.4 tokens/s INFO:__main__:2024-10-27 07:16:13 | Epoch: 2 | Step: 61130 | Dataset: 0-9448802 | Loss: 2.085 | 675 ms/step , 58208.46 GFLOP/s , 532653.1 tokens/s INFO:__main__:2024-10-27 07:16:21 | Epoch: 2 | Step: 61140 | Dataset: 0-9456802 | Loss: 2.157 | 675 ms/step , 58272.39 GFLOP/s , 532828.9 tokens/s INFO:__main__:2024-10-27 07:16:29 | Epoch: 2 | Step: 61150 | Dataset: 0-9464802 | Loss: 2.167 | 674 ms/step , 58357.86 GFLOP/s , 532837.4 tokens/s INFO:__main__:2024-10-27 07:16:37 | Epoch: 2 | Step: 61160 | Dataset: 0-9472802 | Loss: 2.034 | 675 ms/step , 58258.99 GFLOP/s , 532789.7 tokens/s INFO:__main__:2024-10-27 07:16:44 | Epoch: 2 | Step: 61170 | Dataset: 0-9480802 | Loss: 2.089 | 675 ms/step , 58268.45 GFLOP/s , 532658.7 tokens/s INFO:__main__:2024-10-27 07:16:52 | Epoch: 2 | Step: 61180 | Dataset: 0-9488802 | Loss: 2.147 | 674 ms/step , 58289.27 GFLOP/s , 532746.8 tokens/s INFO:__main__:2024-10-27 07:17:00 | Epoch: 2 | Step: 61190 | Dataset: 0-9496802 | Loss: 2.081 | 674 ms/step , 58285.37 GFLOP/s , 532702.0 tokens/s INFO:__main__:2024-10-27 07:17:07 | Epoch: 2 | Step: 61200 | Dataset: 0-9504802 | Loss: 2.187 | 674 ms/step , 58325.42 GFLOP/s , 532594.9 tokens/s INFO:__main__:2024-10-27 07:17:15 | Epoch: 2 | Step: 61210 | Dataset: 0-9512802 | Loss: 2.146 | 676 ms/step , 58148.67 GFLOP/s , 531919.0 tokens/s INFO:__main__:2024-10-27 07:17:23 | Epoch: 2 | Step: 61220 | Dataset: 0-9520802 | Loss: 2.168 | 675 ms/step , 58218.17 GFLOP/s , 530834.8 tokens/s INFO:__main__:2024-10-27 07:17:30 | Epoch: 2 | Step: 61230 | Dataset: 0-9528802 | Loss: 2.191 | 675 ms/step , 58246.28 GFLOP/s , 531473.7 tokens/s INFO:__main__:2024-10-27 07:17:38 | Epoch: 2 | Step: 61240 | Dataset: 0-9536802 | Loss: 2.100 | 675 ms/step , 58244.40 GFLOP/s , 531936.1 tokens/s INFO:__main__:2024-10-27 07:17:46 | Epoch: 2 | Step: 61250 | Dataset: 0-9544802 | Loss: 2.087 | 675 ms/step , 
58241.20 GFLOP/s , 531714.1 tokens/s INFO:__main__:2024-10-27 07:17:54 | Epoch: 2 | Step: 61260 | Dataset: 0-9552802 | Loss: 2.162 | 676 ms/step , 58179.80 GFLOP/s , 532158.9 tokens/s INFO:__main__:2024-10-27 07:18:01 | Epoch: 2 | Step: 61270 | Dataset: 0-9560802 | Loss: 2.148 | 675 ms/step , 58217.77 GFLOP/s , 531616.4 tokens/s INFO:__main__:2024-10-27 07:18:09 | Epoch: 2 | Step: 61280 | Dataset: 0-9568802 | Loss: 2.103 | 676 ms/step , 58156.31 GFLOP/s , 530667.9 tokens/s INFO:__main__:2024-10-27 07:18:17 | Epoch: 2 | Step: 61290 | Dataset: 0-9576802 | Loss: 2.177 | 676 ms/step , 58145.42 GFLOP/s , 529756.6 tokens/s INFO:__main__:2024-10-27 07:18:24 | Epoch: 2 | Step: 61300 | Dataset: 0-9584802 | Loss: 2.079 | 675 ms/step , 58200.77 GFLOP/s , 532858.1 tokens/s INFO:__main__:2024-10-27 07:18:32 | Epoch: 2 | Step: 61310 | Dataset: 0-9592802 | Loss: 2.159 | 674 ms/step , 58337.26 GFLOP/s , 532482.2 tokens/s INFO:__main__:2024-10-27 07:18:40 | Epoch: 2 | Step: 61320 | Dataset: 0-9600802 | Loss: 2.086 | 674 ms/step , 58291.32 GFLOP/s , 532801.6 tokens/s INFO:__main__:2024-10-27 07:18:47 | Epoch: 2 | Step: 61330 | Dataset: 0-9608802 | Loss: 2.191 | 675 ms/step , 58212.88 GFLOP/s , 532752.8 tokens/s INFO:__main__:2024-10-27 07:18:55 | Epoch: 2 | Step: 61340 | Dataset: 0-9616802 | Loss: 2.176 | 674 ms/step , 58353.84 GFLOP/s , 532575.6 tokens/s INFO:__main__:2024-10-27 07:19:03 | Epoch: 2 | Step: 61350 | Dataset: 0-9624802 | Loss: 2.021 | 674 ms/step , 58354.04 GFLOP/s , 532963.0 tokens/s INFO:__main__:2024-10-27 07:19:10 | Epoch: 2 | Step: 61360 | Dataset: 0-9632802 | Loss: 2.105 | 673 ms/step , 58383.73 GFLOP/s , 533572.2 tokens/s INFO:__main__:2024-10-27 07:19:18 | Epoch: 2 | Step: 61370 | Dataset: 0-9640802 | Loss: 2.140 | 675 ms/step , 58271.63 GFLOP/s , 533089.2 tokens/s INFO:__main__:2024-10-27 07:19:26 | Epoch: 2 | Step: 61380 | Dataset: 0-9648802 | Loss: 2.144 | 674 ms/step , 58317.92 GFLOP/s , 532770.3 tokens/s INFO:__main__:2024-10-27 07:19:34 | Epoch: 2 | 
Step: 61390 | Dataset: 0-9656802 | Loss: 2.107 | 674 ms/step , 58308.28 GFLOP/s , 532390.7 tokens/s INFO:__main__:2024-10-27 07:19:41 | Epoch: 2 | Step: 61400 | Dataset: 0-9664802 | Loss: 2.133 | 674 ms/step , 58343.77 GFLOP/s , 532900.6 tokens/s INFO:__main__:2024-10-27 07:19:49 | Epoch: 2 | Step: 61410 | Dataset: 0-9672802 | Loss: 2.013 | 674 ms/step , 58300.11 GFLOP/s , 533288.6 tokens/s INFO:__main__:2024-10-27 07:19:57 | Epoch: 2 | Step: 61420 | Dataset: 0-9680802 | Loss: 2.097 | 676 ms/step , 58188.94 GFLOP/s , 532815.4 tokens/s INFO:__main__:2024-10-27 07:20:04 | Epoch: 2 | Step: 61430 | Dataset: 0-9688802 | Loss: 2.105 | 674 ms/step , 58309.06 GFLOP/s , 533047.9 tokens/s INFO:__main__:2024-10-27 07:20:12 | Epoch: 2 | Step: 61440 | Dataset: 0-9696802 | Loss: 2.132 | 674 ms/step , 58327.71 GFLOP/s , 533409.1 tokens/s INFO:__main__:2024-10-27 07:20:20 | Epoch: 2 | Step: 61450 | Dataset: 0-9704802 | Loss: 2.142 | 674 ms/step , 58306.85 GFLOP/s , 533121.3 tokens/s INFO:__main__:2024-10-27 07:20:27 | Epoch: 2 | Step: 61460 | Dataset: 0-9712802 | Loss: 2.160 | 674 ms/step , 58315.06 GFLOP/s , 533008.2 tokens/s INFO:__main__:2024-10-27 07:20:35 | Epoch: 2 | Step: 61470 | Dataset: 0-9720802 | Loss: 1.974 | 674 ms/step , 58314.07 GFLOP/s , 533088.0 tokens/s INFO:__main__:2024-10-27 07:20:43 | Epoch: 2 | Step: 61480 | Dataset: 0-9728802 | Loss: 2.160 | 674 ms/step , 58311.44 GFLOP/s , 533172.6 tokens/s INFO:__main__:2024-10-27 07:20:50 | Epoch: 2 | Step: 61490 | Dataset: 0-9736802 | Loss: 1.991 | 673 ms/step , 58379.43 GFLOP/s , 533125.8 tokens/s INFO:__main__:2024-10-27 07:20:58 | Epoch: 2 | Step: 61500 | Dataset: 0-9744802 | Loss: 1.827 | 677 ms/step , 58055.64 GFLOP/s , 530715.2 tokens/s INFO:__main__:2024-10-27 07:21:06 | Epoch: 2 | Step: 61510 | Dataset: 0-9752802 | Loss: 1.747 | 674 ms/step , 58307.80 GFLOP/s , 532079.0 tokens/s INFO:__main__:2024-10-27 07:21:14 | Epoch: 2 | Step: 61520 | Dataset: 0-9760802 | Loss: 1.722 | 675 ms/step , 58225.76 GFLOP/s , 
532390.0 tokens/s INFO:__main__:2024-10-27 07:21:21 | Epoch: 2 | Step: 61530 | Dataset: 0-9768802 | Loss: 1.734 | 674 ms/step , 58287.18 GFLOP/s , 532263.3 tokens/s INFO:__main__:2024-10-27 07:21:29 | Epoch: 2 | Step: 61540 | Dataset: 0-9776802 | Loss: 1.701 | 675 ms/step , 58254.63 GFLOP/s , 532084.2 tokens/s INFO:__main__:2024-10-27 07:21:37 | Epoch: 2 | Step: 61550 | Dataset: 0-9784802 | Loss: 1.689 | 677 ms/step , 58090.62 GFLOP/s , 531683.0 tokens/s INFO:__main__:2024-10-27 07:21:44 | Epoch: 2 | Step: 61560 | Dataset: 0-9792802 | Loss: 1.660 | 675 ms/step , 58218.51 GFLOP/s , 532009.9 tokens/s INFO:__main__:2024-10-27 07:21:52 | Epoch: 2 | Step: 61570 | Dataset: 0-9800802 | Loss: 1.685 | 675 ms/step , 58236.48 GFLOP/s , 532407.8 tokens/s INFO:__main__:2024-10-27 07:22:00 | Epoch: 2 | Step: 61580 | Dataset: 0-9808802 | Loss: 1.696 | 677 ms/step , 58081.15 GFLOP/s , 531802.0 tokens/s INFO:__main__:2024-10-27 07:22:07 | Epoch: 2 | Step: 61590 | Dataset: 0-9816802 | Loss: 2.219 | 675 ms/step , 58247.90 GFLOP/s , 532372.2 tokens/s INFO:__main__:2024-10-27 07:22:15 | Epoch: 2 | Step: 61600 | Dataset: 0-9824802 | Loss: 2.045 | 674 ms/step , 58308.93 GFLOP/s , 532439.2 tokens/s INFO:__main__:2024-10-27 07:22:23 | Epoch: 2 | Step: 61610 | Dataset: 0-9832802 | Loss: 2.181 | 674 ms/step , 58338.18 GFLOP/s , 532488.8 tokens/s INFO:__main__:2024-10-27 07:22:30 | Epoch: 2 | Step: 61620 | Dataset: 0-9840802 | Loss: 2.127 | 690 ms/step , 56955.40 GFLOP/s , 531472.0 tokens/s INFO:__main__:2024-10-27 07:22:38 | Epoch: 2 | Step: 61630 | Dataset: 0-9848802 | Loss: 2.089 | 675 ms/step , 58237.58 GFLOP/s , 532350.2 tokens/s INFO:__main__:2024-10-27 07:22:46 | Epoch: 2 | Step: 61640 | Dataset: 0-9856802 | Loss: 2.132 | 675 ms/step , 58246.52 GFLOP/s , 532329.4 tokens/s INFO:__main__:2024-10-27 07:22:54 | Epoch: 2 | Step: 61650 | Dataset: 0-9864802 | Loss: 2.164 | 675 ms/step , 58277.45 GFLOP/s , 532337.3 tokens/s INFO:__main__:2024-10-27 07:23:01 | Epoch: 2 | Step: 61660 | Dataset: 
0-9872802 | Loss: 2.169 | 674 ms/step , 58338.55 GFLOP/s , 532845.3 tokens/s INFO:__main__:2024-10-27 07:23:09 | Epoch: 2 | Step: 61670 | Dataset: 0-9880802 | Loss: 2.059 | 675 ms/step , 58257.18 GFLOP/s , 531545.7 tokens/s INFO:__main__:2024-10-27 07:23:17 | Epoch: 2 | Step: 61680 | Dataset: 0-9888802 | Loss: 2.089 | 675 ms/step , 58271.54 GFLOP/s , 533005.6 tokens/s INFO:__main__:2024-10-27 07:23:24 | Epoch: 2 | Step: 61690 | Dataset: 0-9896802 | Loss: 2.029 | 673 ms/step , 58369.26 GFLOP/s , 532763.7 tokens/s INFO:__main__:2024-10-27 07:23:32 | Epoch: 2 | Step: 61700 | Dataset: 0-9904802 | Loss: 2.142 | 674 ms/step , 58326.07 GFLOP/s , 533041.8 tokens/s INFO:__main__:2024-10-27 07:23:40 | Epoch: 2 | Step: 61710 | Dataset: 0-9912802 | Loss: 2.180 | 674 ms/step , 58317.48 GFLOP/s , 532798.2 tokens/s INFO:__main__:2024-10-27 07:23:47 | Epoch: 2 | Step: 61720 | Dataset: 0-9920802 | Loss: 2.111 | 674 ms/step , 58346.74 GFLOP/s , 533108.4 tokens/s INFO:__main__:2024-10-27 07:23:55 | Epoch: 2 | Step: 61730 | Dataset: 0-9928802 | Loss: 2.060 | 675 ms/step , 58223.61 GFLOP/s , 532606.9 tokens/s INFO:__main__:2024-10-27 07:24:03 | Epoch: 2 | Step: 61740 | Dataset: 0-9936802 | Loss: 2.106 | 675 ms/step , 58233.74 GFLOP/s , 532748.8 tokens/s INFO:__main__:2024-10-27 07:24:10 | Epoch: 2 | Step: 61750 | Dataset: 0-9944802 | Loss: 2.279 | 675 ms/step , 58267.81 GFLOP/s , 532172.7 tokens/s INFO:__main__:2024-10-27 07:24:18 | Epoch: 2 | Step: 61760 | Dataset: 0-9952802 | Loss: 2.205 | 674 ms/step , 58345.34 GFLOP/s , 532542.6 tokens/s INFO:__main__:2024-10-27 07:24:26 | Epoch: 2 | Step: 61770 | Dataset: 0-9960802 | Loss: 2.216 | 676 ms/step , 58163.26 GFLOP/s , 532692.1 tokens/s INFO:__main__:2024-10-27 07:24:34 | Epoch: 2 | Step: 61780 | Dataset: 0-9968802 | Loss: 2.191 | 676 ms/step , 58180.99 GFLOP/s , 531909.3 tokens/s INFO:__main__:2024-10-27 07:24:41 | Epoch: 2 | Step: 61790 | Dataset: 0-9976802 | Loss: 2.153 | 676 ms/step , 58165.80 GFLOP/s , 532239.4 tokens/s 
INFO:__main__:2024-10-27 07:24:49 | Epoch: 2 | Step: 61800 | Dataset: 0-9984802 | Loss: 2.233 | 675 ms/step , 58269.86 GFLOP/s , 531785.2 tokens/s INFO:__main__:2024-10-27 07:24:57 | Epoch: 2 | Step: 61810 | Dataset: 0-9992802 | Loss: 2.122 | 675 ms/step , 58197.29 GFLOP/s , 532661.5 tokens/s INFO:__main__:2024-10-27 07:25:04 | Epoch: 2 | Step: 61820 | Dataset: 0-10000802 | Loss: 2.225 | 675 ms/step , 58212.74 GFLOP/s , 532564.0 tokens/s INFO:__main__:2024-10-27 07:25:12 | Epoch: 2 | Step: 61830 | Dataset: 0-10008802 | Loss: 2.231 | 673 ms/step , 58395.67 GFLOP/s , 531984.7 tokens/s INFO:__main__:2024-10-27 07:25:20 | Epoch: 2 | Step: 61840 | Dataset: 0-10016802 | Loss: 2.086 | 675 ms/step , 58252.96 GFLOP/s , 532209.4 tokens/s INFO:__main__:2024-10-27 07:25:27 | Epoch: 2 | Step: 61850 | Dataset: 0-10024802 | Loss: 2.220 | 673 ms/step , 58367.90 GFLOP/s , 532582.2 tokens/s INFO:__main__:2024-10-27 07:25:35 | Epoch: 2 | Step: 61860 | Dataset: 0-10032802 | Loss: 2.185 | 675 ms/step , 58229.96 GFLOP/s , 532743.7 tokens/s INFO:__main__:2024-10-27 07:25:43 | Epoch: 2 | Step: 61870 | Dataset: 0-10040802 | Loss: 2.119 | 675 ms/step , 58246.57 GFLOP/s , 533198.4 tokens/s INFO:__main__:2024-10-27 07:25:50 | Epoch: 2 | Step: 61880 | Dataset: 0-10048802 | Loss: 2.134 | 674 ms/step , 58308.51 GFLOP/s , 532730.0 tokens/s INFO:__main__:2024-10-27 07:25:58 | Epoch: 2 | Step: 61890 | Dataset: 0-10056802 | Loss: 2.177 | 676 ms/step , 58150.09 GFLOP/s , 532149.7 tokens/s INFO:__main__:2024-10-27 07:26:06 | Epoch: 2 | Step: 61900 | Dataset: 0-10064802 | Loss: 2.158 | 675 ms/step , 58253.82 GFLOP/s , 532221.3 tokens/s INFO:__main__:2024-10-27 07:26:14 | Epoch: 2 | Step: 61910 | Dataset: 0-10072802 | Loss: 1.876 | 675 ms/step , 58198.61 GFLOP/s , 532246.5 tokens/s INFO:__main__:2024-10-27 07:26:21 | Epoch: 2 | Step: 61920 | Dataset: 0-10080802 | Loss: 1.826 | 676 ms/step , 58189.83 GFLOP/s , 531386.6 tokens/s INFO:__main__:2024-10-27 07:26:29 | Epoch: 2 | Step: 61930 | Dataset: 
0-10088802 | Loss: 1.803 | 674 ms/step , 58282.48 GFLOP/s , 532323.1 tokens/s INFO:__main__:2024-10-27 07:26:37 | Epoch: 2 | Step: 61940 | Dataset: 0-10096802 | Loss: 1.796 | 675 ms/step , 58260.51 GFLOP/s , 531990.9 tokens/s INFO:__main__:2024-10-27 07:26:44 | Epoch: 2 | Step: 61950 | Dataset: 0-10104802 | Loss: 1.796 | 674 ms/step , 58294.97 GFLOP/s , 532617.4 tokens/s INFO:__main__:2024-10-27 07:26:52 | Epoch: 2 | Step: 61960 | Dataset: 0-10112802 | Loss: 1.785 | 675 ms/step , 58243.15 GFLOP/s , 532583.6 tokens/s INFO:__main__:2024-10-27 07:27:00 | Epoch: 2 | Step: 61970 | Dataset: 0-10120802 | Loss: 1.765 | 675 ms/step , 58232.58 GFLOP/s , 531948.0 tokens/s INFO:__main__:2024-10-27 07:27:07 | Epoch: 2 | Step: 61980 | Dataset: 0-10128802 | Loss: 1.798 | 675 ms/step , 58217.93 GFLOP/s , 532872.0 tokens/s INFO:__main__:2024-10-27 07:27:15 | Epoch: 2 | Step: 61990 | Dataset: 0-10136802 | Loss: 1.756 | 674 ms/step , 58301.23 GFLOP/s , 532349.9 tokens/s INFO:__main__:2024-10-27 07:27:22 | Validation | Step: 62000 | Val_loss: 2.184 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 07:27:22 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_072722_step_62000.pt` INFO:__main__:2024-10-27 07:27:24 | Epoch: 2 | Step: 62000 | Dataset: 0-10144802 | Loss: 2.190 | 673 ms/step , 58371.96 GFLOP/s , 480979.3 tokens/s INFO:__main__:2024-10-27 07:27:31 | Epoch: 2 | Step: 62010 | Dataset: 0-10152802 | Loss: 2.159 | 675 ms/step , 58272.43 GFLOP/s , 531730.5 tokens/s INFO:__main__:2024-10-27 07:27:39 | Epoch: 2 | Step: 62020 | Dataset: 0-10160802 | Loss: 2.103 | 675 ms/step , 58238.23 GFLOP/s , 532963.7 tokens/s INFO:__main__:2024-10-27 07:27:47 | Epoch: 2 | Step: 62030 | Dataset: 0-10168802 | Loss: 2.170 | 674 ms/step , 58280.97 GFLOP/s , 532800.1 tokens/s INFO:__main__:2024-10-27 07:27:54 | Epoch: 2 | Step: 62040 | Dataset: 0-10176802 | Loss: 2.082 | 675 ms/step , 58216.63 GFLOP/s , 532978.0 tokens/s INFO:__main__:2024-10-27 07:28:02 | Epoch: 2 | Step: 62050 | 
Dataset: 0-10184802 | Loss: 2.079 | 675 ms/step , 58222.43 GFLOP/s , 532631.9 tokens/s INFO:__main__:2024-10-27 07:28:10 | Epoch: 2 | Step: 62060 | Dataset: 0-10192802 | Loss: 2.063 | 674 ms/step , 58280.27 GFLOP/s , 532611.6 tokens/s INFO:__main__:2024-10-27 07:28:17 | Epoch: 2 | Step: 62070 | Dataset: 0-10200802 | Loss: 2.123 | 673 ms/step , 58398.53 GFLOP/s , 533406.6 tokens/s INFO:__main__:2024-10-27 07:28:25 | Epoch: 2 | Step: 62080 | Dataset: 0-10208802 | Loss: 2.122 | 675 ms/step , 58254.34 GFLOP/s , 532784.8 tokens/s INFO:__main__:2024-10-27 07:28:33 | Epoch: 2 | Step: 62090 | Dataset: 0-10216802 | Loss: 2.142 | 675 ms/step , 58232.31 GFLOP/s , 532931.1 tokens/s INFO:__main__:2024-10-27 07:28:41 | Epoch: 2 | Step: 62100 | Dataset: 0-10224802 | Loss: 2.127 | 675 ms/step , 58241.86 GFLOP/s , 532524.8 tokens/s INFO:__main__:2024-10-27 07:28:48 | Epoch: 2 | Step: 62110 | Dataset: 0-10232802 | Loss: 2.190 | 675 ms/step , 58237.82 GFLOP/s , 532445.2 tokens/s INFO:__main__:2024-10-27 07:28:56 | Epoch: 2 | Step: 62120 | Dataset: 0-10240802 | Loss: 2.083 | 674 ms/step , 58288.18 GFLOP/s , 532541.6 tokens/s INFO:__main__:2024-10-27 07:29:04 | Epoch: 2 | Step: 62130 | Dataset: 0-10248802 | Loss: 2.156 | 675 ms/step , 58217.36 GFLOP/s , 532526.6 tokens/s INFO:__main__:2024-10-27 07:29:11 | Epoch: 2 | Step: 62140 | Dataset: 0-10256802 | Loss: 2.039 | 674 ms/step , 58352.04 GFLOP/s , 533326.4 tokens/s INFO:__main__:2024-10-27 07:29:19 | Epoch: 2 | Step: 62150 | Dataset: 0-10264802 | Loss: 2.128 | 676 ms/step , 58184.71 GFLOP/s , 532482.9 tokens/s INFO:__main__:2024-10-27 07:29:27 | Epoch: 2 | Step: 62160 | Dataset: 0-10272802 | Loss: 2.281 | 674 ms/step , 58284.55 GFLOP/s , 532743.9 tokens/s INFO:__main__:2024-10-27 07:29:34 | Epoch: 2 | Step: 62170 | Dataset: 0-10280802 | Loss: 2.232 | 675 ms/step , 58255.98 GFLOP/s , 532927.4 tokens/s INFO:__main__:2024-10-27 07:29:42 | Epoch: 2 | Step: 62180 | Dataset: 0-10288802 | Loss: 2.177 | 675 ms/step , 58228.98 GFLOP/s , 
532475.6 tokens/s INFO:__main__:2024-10-27 07:29:50 | Epoch: 2 | Step: 62190 | Dataset: 0-10296802 | Loss: 2.153 | 675 ms/step , 58204.84 GFLOP/s , 532937.4 tokens/s INFO:__main__:2024-10-27 07:29:57 | Epoch: 2 | Step: 62200 | Dataset: 0-10304802 | Loss: 2.183 | 675 ms/step , 58276.54 GFLOP/s , 532844.0 tokens/s INFO:__main__:2024-10-27 07:30:05 | Epoch: 2 | Step: 62210 | Dataset: 0-10312802 | Loss: 2.135 | 675 ms/step , 58224.03 GFLOP/s , 532479.5 tokens/s INFO:__main__:2024-10-27 07:30:13 | Epoch: 2 | Step: 62220 | Dataset: 0-10320802 | Loss: 2.102 | 675 ms/step , 58248.94 GFLOP/s , 532356.1 tokens/s INFO:__main__:2024-10-27 07:30:20 | Epoch: 2 | Step: 62230 | Dataset: 0-10328802 | Loss: 2.124 | 676 ms/step , 58148.74 GFLOP/s , 531848.6 tokens/s INFO:__main__:2024-10-27 07:30:28 | Epoch: 2 | Step: 62240 | Dataset: 0-10336802 | Loss: 2.208 | 675 ms/step , 58237.44 GFLOP/s , 532425.7 tokens/s INFO:__main__:2024-10-27 07:30:36 | Epoch: 2 | Step: 62250 | Dataset: 0-10344802 | Loss: 2.135 | 675 ms/step , 58237.68 GFLOP/s , 531929.5 tokens/s INFO:__main__:2024-10-27 07:30:44 | Epoch: 2 | Step: 62260 | Dataset: 0-10352802 | Loss: 2.080 | 675 ms/step , 58259.58 GFLOP/s , 532039.7 tokens/s INFO:__main__:2024-10-27 07:30:51 | Epoch: 2 | Step: 62270 | Dataset: 0-10360802 | Loss: 2.107 | 676 ms/step , 58153.83 GFLOP/s , 531771.3 tokens/s INFO:__main__:2024-10-27 07:30:59 | Epoch: 2 | Step: 62280 | Dataset: 0-10368802 | Loss: 2.105 | 675 ms/step , 58244.68 GFLOP/s , 532218.4 tokens/s INFO:__main__:2024-10-27 07:31:07 | Epoch: 2 | Step: 62290 | Dataset: 0-10376802 | Loss: 2.068 | 676 ms/step , 58130.25 GFLOP/s , 531894.2 tokens/s INFO:__main__:2024-10-27 07:31:14 | Epoch: 2 | Step: 62300 | Dataset: 0-10384802 | Loss: 2.145 | 676 ms/step , 58135.06 GFLOP/s , 532144.7 tokens/s INFO:__main__:2024-10-27 07:31:22 | Epoch: 2 | Step: 62310 | Dataset: 0-10392802 | Loss: 2.145 | 675 ms/step , 58239.93 GFLOP/s , 532318.0 tokens/s INFO:__main__:2024-10-27 07:31:30 | Epoch: 2 | Step: 
62320 | Dataset: 0-10400802 | Loss: 2.177 | 676 ms/step , 58156.44 GFLOP/s , 531733.1 tokens/s INFO:__main__:2024-10-27 07:31:37 | Epoch: 2 | Step: 62330 | Dataset: 0-10408802 | Loss: 2.222 | 674 ms/step , 58331.70 GFLOP/s , 531930.3 tokens/s INFO:__main__:2024-10-27 07:31:45 | Epoch: 2 | Step: 62340 | Dataset: 0-10416802 | Loss: 2.206 | 676 ms/step , 58154.85 GFLOP/s , 532672.1 tokens/s INFO:__main__:2024-10-27 07:31:53 | Epoch: 2 | Step: 62350 | Dataset: 0-10424802 | Loss: 2.188 | 676 ms/step , 58112.64 GFLOP/s , 532547.7 tokens/s INFO:__main__:2024-10-27 07:32:01 | Epoch: 2 | Step: 62360 | Dataset: 0-10432802 | Loss: 2.115 | 675 ms/step , 58227.21 GFLOP/s , 532361.3 tokens/s INFO:__main__:2024-10-27 07:32:08 | Epoch: 2 | Step: 62370 | Dataset: 0-10440802 | Loss: 2.190 | 674 ms/step , 58323.10 GFLOP/s , 532544.3 tokens/s INFO:__main__:2024-10-27 07:32:16 | Epoch: 2 | Step: 62380 | Dataset: 0-10448802 | Loss: 2.166 | 675 ms/step , 58213.04 GFLOP/s , 532700.3 tokens/s INFO:__main__:2024-10-27 07:32:24 | Epoch: 2 | Step: 62390 | Dataset: 0-10456802 | Loss: 2.160 | 676 ms/step , 58125.48 GFLOP/s , 532137.6 tokens/s INFO:__main__:2024-10-27 07:32:31 | Epoch: 2 | Step: 62400 | Dataset: 0-10464802 | Loss: 2.174 | 676 ms/step , 58184.39 GFLOP/s , 532735.9 tokens/s INFO:__main__:2024-10-27 07:32:39 | Epoch: 2 | Step: 62410 | Dataset: 0-10472802 | Loss: 2.190 | 675 ms/step , 58259.88 GFLOP/s , 532558.8 tokens/s INFO:__main__:2024-10-27 07:32:47 | Epoch: 2 | Step: 62420 | Dataset: 0-10480802 | Loss: 2.223 | 676 ms/step , 58184.31 GFLOP/s , 532309.5 tokens/s INFO:__main__:2024-10-27 07:32:54 | Epoch: 2 | Step: 62430 | Dataset: 0-10488802 | Loss: 2.193 | 675 ms/step , 58268.03 GFLOP/s , 532307.5 tokens/s INFO:__main__:2024-10-27 07:33:02 | Epoch: 2 | Step: 62440 | Dataset: 0-10496802 | Loss: 2.192 | 676 ms/step , 58157.23 GFLOP/s , 532277.6 tokens/s INFO:__main__:2024-10-27 07:33:10 | Epoch: 2 | Step: 62450 | Dataset: 0-10504802 | Loss: 2.201 | 676 ms/step , 58177.64 GFLOP/s 
, 532396.0 tokens/s INFO:__main__:2024-10-27 07:33:17 | Epoch: 2 | Step: 62460 | Dataset: 0-10512802 | Loss: 2.158 | 674 ms/step , 58282.69 GFLOP/s , 532613.1 tokens/s INFO:__main__:2024-10-27 07:33:25 | Epoch: 2 | Step: 62470 | Dataset: 0-10520802 | Loss: 2.228 | 675 ms/step , 58208.35 GFLOP/s , 532382.8 tokens/s INFO:__main__:2024-10-27 07:33:33 | Epoch: 2 | Step: 62480 | Dataset: 0-10528802 | Loss: 1.821 | 676 ms/step , 58177.62 GFLOP/s , 532147.4 tokens/s INFO:__main__:2024-10-27 07:33:41 | Epoch: 2 | Step: 62490 | Dataset: 0-10536802 | Loss: 1.742 | 675 ms/step , 58273.01 GFLOP/s , 531838.1 tokens/s INFO:__main__:2024-10-27 07:33:48 | Epoch: 2 | Step: 62500 | Dataset: 0-10544802 | Loss: 1.709 | 675 ms/step , 58275.62 GFLOP/s , 532422.4 tokens/s INFO:__main__:2024-10-27 07:33:56 | Epoch: 2 | Step: 62510 | Dataset: 0-10552802 | Loss: 1.736 | 675 ms/step , 58202.80 GFLOP/s , 532131.0 tokens/s INFO:__main__:2024-10-27 07:34:04 | Epoch: 2 | Step: 62520 | Dataset: 0-10560802 | Loss: 1.710 | 676 ms/step , 58188.46 GFLOP/s , 532220.2 tokens/s INFO:__main__:2024-10-27 07:34:11 | Epoch: 2 | Step: 62530 | Dataset: 0-10568802 | Loss: 1.686 | 676 ms/step , 58192.19 GFLOP/s , 532089.2 tokens/s INFO:__main__:2024-10-27 07:34:19 | Epoch: 2 | Step: 62540 | Dataset: 0-10576802 | Loss: 1.691 | 675 ms/step , 58235.18 GFLOP/s , 532380.0 tokens/s INFO:__main__:2024-10-27 07:34:27 | Epoch: 2 | Step: 62550 | Dataset: 0-10584802 | Loss: 1.661 | 675 ms/step , 58227.48 GFLOP/s , 531680.5 tokens/s INFO:__main__:2024-10-27 07:34:34 | Epoch: 2 | Step: 62560 | Dataset: 0-10592802 | Loss: 1.722 | 675 ms/step , 58242.32 GFLOP/s , 532291.4 tokens/s INFO:__main__:2024-10-27 07:34:42 | Epoch: 2 | Step: 62570 | Dataset: 0-10600802 | Loss: 1.663 | 674 ms/step , 58302.37 GFLOP/s , 532268.3 tokens/s INFO:__main__:2024-10-27 07:34:50 | Epoch: 2 | Step: 62580 | Dataset: 0-10608802 | Loss: 1.632 | 675 ms/step , 58262.79 GFLOP/s , 531973.9 tokens/s INFO:__main__:2024-10-27 07:34:58 | Epoch: 2 | Step: 
62590 | Dataset: 0-10616802 | Loss: 1.615 | 677 ms/step , 58100.67 GFLOP/s , 531867.6 tokens/s INFO:__main__:2024-10-27 07:35:05 | Epoch: 2 | Step: 62600 | Dataset: 0-10624802 | Loss: 1.645 | 677 ms/step , 58052.65 GFLOP/s , 530487.8 tokens/s INFO:__main__:2024-10-27 07:35:13 | Epoch: 2 | Step: 62610 | Dataset: 0-10632802 | Loss: 1.654 | 674 ms/step , 58338.57 GFLOP/s , 531832.5 tokens/s INFO:__main__:2024-10-27 07:35:21 | Epoch: 2 | Step: 62620 | Dataset: 0-10640802 | Loss: 1.647 | 674 ms/step , 58306.75 GFLOP/s , 533025.1 tokens/s INFO:__main__:2024-10-27 07:35:28 | Epoch: 2 | Step: 62630 | Dataset: 0-10648802 | Loss: 1.660 | 675 ms/step , 58254.41 GFLOP/s , 532524.5 tokens/s INFO:__main__:2024-10-27 07:35:36 | Epoch: 2 | Step: 62640 | Dataset: 0-10656802 | Loss: 1.644 | 674 ms/step , 58342.49 GFLOP/s , 532098.8 tokens/s INFO:__main__:2024-10-27 07:35:44 | Epoch: 2 | Step: 62650 | Dataset: 0-10664802 | Loss: 2.327 | 676 ms/step , 58188.75 GFLOP/s , 532893.9 tokens/s INFO:__main__:2024-10-27 07:35:51 | Epoch: 2 | Step: 62660 | Dataset: 0-10672802 | Loss: 2.184 | 674 ms/step , 58346.15 GFLOP/s , 533208.2 tokens/s INFO:__main__:2024-10-27 07:35:59 | Epoch: 2 | Step: 62670 | Dataset: 0-10680802 | Loss: 2.140 | 674 ms/step , 58287.50 GFLOP/s , 532838.3 tokens/s INFO:__main__:2024-10-27 07:36:07 | Epoch: 2 | Step: 62680 | Dataset: 0-10688802 | Loss: 2.210 | 675 ms/step , 58227.63 GFLOP/s , 533070.5 tokens/s INFO:__main__:2024-10-27 07:36:14 | Epoch: 2 | Step: 62690 | Dataset: 0-10696802 | Loss: 2.106 | 675 ms/step , 58270.29 GFLOP/s , 532833.2 tokens/s INFO:__main__:2024-10-27 07:36:22 | Epoch: 2 | Step: 62700 | Dataset: 0-10704802 | Loss: 2.137 | 675 ms/step , 58246.51 GFLOP/s , 532954.0 tokens/s INFO:__main__:2024-10-27 07:36:30 | Epoch: 2 | Step: 62710 | Dataset: 0-10712802 | Loss: 2.069 | 677 ms/step , 58060.69 GFLOP/s , 528007.9 tokens/s INFO:__main__:2024-10-27 07:36:38 | Epoch: 2 | Step: 62720 | Dataset: 0-10720802 | Loss: 2.133 | 677 ms/step , 58084.08 GFLOP/s 
, 530867.6 tokens/s INFO:__main__:2024-10-27 07:36:45 | Epoch: 2 | Step: 62730 | Dataset: 0-10728802 | Loss: 2.115 | 674 ms/step , 58309.47 GFLOP/s , 531253.5 tokens/s INFO:__main__:2024-10-27 07:36:53 | Epoch: 2 | Step: 62740 | Dataset: 0-10736802 | Loss: 2.083 | 674 ms/step , 58286.40 GFLOP/s , 531486.1 tokens/s INFO:__main__:2024-10-27 07:37:01 | Epoch: 2 | Step: 62750 | Dataset: 0-10744802 | Loss: 2.146 | 674 ms/step , 58347.89 GFLOP/s , 532519.9 tokens/s INFO:__main__:2024-10-27 07:37:08 | Epoch: 2 | Step: 62760 | Dataset: 0-10752802 | Loss: 2.116 | 676 ms/step , 58150.73 GFLOP/s , 531312.2 tokens/s INFO:__main__:2024-10-27 07:37:16 | Epoch: 2 | Step: 62770 | Dataset: 0-10760802 | Loss: 2.051 | 676 ms/step , 58184.36 GFLOP/s , 529955.9 tokens/s INFO:__main__:2024-10-27 07:37:24 | Epoch: 2 | Step: 62780 | Dataset: 0-10768802 | Loss: 2.167 | 674 ms/step , 58328.67 GFLOP/s , 532681.0 tokens/s INFO:__main__:2024-10-27 07:37:32 | Epoch: 2 | Step: 62790 | Dataset: 0-10776802 | Loss: 1.979 | 675 ms/step , 58260.32 GFLOP/s , 532293.7 tokens/s INFO:__main__:2024-10-27 07:37:39 | Epoch: 2 | Step: 62800 | Dataset: 0-10784802 | Loss: 2.136 | 675 ms/step , 58233.60 GFLOP/s , 532133.5 tokens/s INFO:__main__:2024-10-27 07:37:47 | Epoch: 2 | Step: 62810 | Dataset: 0-10792802 | Loss: 2.154 | 674 ms/step , 58324.33 GFLOP/s , 531343.2 tokens/s INFO:__main__:2024-10-27 07:37:55 | Epoch: 2 | Step: 62820 | Dataset: 0-10800802 | Loss: 1.696 | 676 ms/step , 58165.31 GFLOP/s , 532140.4 tokens/s INFO:__main__:2024-10-27 07:38:02 | Epoch: 2 | Step: 62830 | Dataset: 0-10808802 | Loss: 1.671 | 675 ms/step , 58210.21 GFLOP/s , 531616.3 tokens/s INFO:__main__:2024-10-27 07:38:10 | Epoch: 2 | Step: 62840 | Dataset: 0-10816802 | Loss: 1.673 | 675 ms/step , 58210.50 GFLOP/s , 531389.5 tokens/s INFO:__main__:2024-10-27 07:38:18 | Epoch: 2 | Step: 62850 | Dataset: 0-10824802 | Loss: 1.651 | 683 ms/step , 57576.69 GFLOP/s , 530436.6 tokens/s INFO:__main__:2024-10-27 07:38:26 | Epoch: 2 | Step: 
62860 | Dataset: 0-10832802 | Loss: 1.618 | 676 ms/step , 58165.30 GFLOP/s , 531071.3 tokens/s INFO:__main__:2024-10-27 07:38:33 | Epoch: 2 | Step: 62870 | Dataset: 0-10840802 | Loss: 1.655 | 675 ms/step , 58226.72 GFLOP/s , 531360.8 tokens/s INFO:__main__:2024-10-27 07:38:41 | Epoch: 2 | Step: 62880 | Dataset: 0-10848802 | Loss: 1.673 | 675 ms/step , 58272.09 GFLOP/s , 530714.1 tokens/s INFO:__main__:2024-10-27 07:38:49 | Epoch: 2 | Step: 62890 | Dataset: 0-10856802 | Loss: 1.651 | 675 ms/step , 58201.66 GFLOP/s , 530738.7 tokens/s INFO:__main__:2024-10-27 07:38:56 | Epoch: 2 | Step: 62900 | Dataset: 0-10864802 | Loss: 1.647 | 676 ms/step , 58191.83 GFLOP/s , 531264.1 tokens/s INFO:__main__:2024-10-27 07:39:04 | Epoch: 2 | Step: 62910 | Dataset: 0-10872802 | Loss: 1.653 | 675 ms/step , 58222.94 GFLOP/s , 532546.5 tokens/s INFO:__main__:2024-10-27 07:39:12 | Epoch: 2 | Step: 62920 | Dataset: 0-10880802 | Loss: 1.644 | 700 ms/step , 56183.61 GFLOP/s , 529967.3 tokens/s INFO:__main__:2024-10-27 07:39:19 | Epoch: 2 | Step: 62930 | Dataset: 0-10888802 | Loss: 1.655 | 674 ms/step , 58279.82 GFLOP/s , 532255.9 tokens/s INFO:__main__:2024-10-27 07:39:27 | Epoch: 2 | Step: 62940 | Dataset: 0-10896802 | Loss: 1.623 | 675 ms/step , 58277.21 GFLOP/s , 532578.5 tokens/s INFO:__main__:2024-10-27 07:39:35 | Epoch: 2 | Step: 62950 | Dataset: 0-10904802 | Loss: 1.651 | 676 ms/step , 58184.49 GFLOP/s , 531613.1 tokens/s INFO:__main__:2024-10-27 07:39:43 | Epoch: 2 | Step: 62960 | Dataset: 0-10912802 | Loss: 1.604 | 674 ms/step , 58291.52 GFLOP/s , 531906.5 tokens/s INFO:__main__:2024-10-27 07:39:50 | Epoch: 2 | Step: 62970 | Dataset: 0-10920802 | Loss: 1.629 | 675 ms/step , 58263.16 GFLOP/s , 531734.6 tokens/s INFO:__main__:2024-10-27 07:39:58 | Epoch: 2 | Step: 62980 | Dataset: 0-10928802 | Loss: 1.597 | 675 ms/step , 58207.40 GFLOP/s , 531857.3 tokens/s INFO:__main__:2024-10-27 07:40:06 | Epoch: 2 | Step: 62990 | Dataset: 0-10936802 | Loss: 2.288 | 676 ms/step , 58173.47 GFLOP/s 
, 531995.4 tokens/s INFO:__main__:2024-10-27 07:40:13 | Validation | Step: 63000 | Val_loss: 2.279 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 07:40:13 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_074013_step_63000.pt` INFO:__main__:2024-10-27 07:40:14 | Epoch: 2 | Step: 63000 | Dataset: 0-10944802 | Loss: 2.226 | 674 ms/step , 58340.75 GFLOP/s , 480527.5 tokens/s INFO:__main__:2024-10-27 07:40:22 | Epoch: 2 | Step: 63010 | Dataset: 0-10952802 | Loss: 2.174 | 676 ms/step , 58186.75 GFLOP/s , 531818.5 tokens/s INFO:__main__:2024-10-27 07:40:30 | Epoch: 2 | Step: 63020 | Dataset: 0-10960802 | Loss: 2.197 | 676 ms/step , 58127.59 GFLOP/s , 531856.5 tokens/s INFO:__main__:2024-10-27 07:40:37 | Epoch: 2 | Step: 63030 | Dataset: 0-10968802 | Loss: 2.175 | 675 ms/step , 58197.39 GFLOP/s , 532315.1 tokens/s INFO:__main__:2024-10-27 07:40:45 | Epoch: 2 | Step: 63040 | Dataset: 0-10976802 | Loss: 2.190 | 676 ms/step , 58143.85 GFLOP/s , 531817.6 tokens/s INFO:__main__:2024-10-27 07:40:53 | Epoch: 2 | Step: 63050 | Dataset: 0-10984802 | Loss: 2.117 | 675 ms/step , 58198.96 GFLOP/s , 532046.5 tokens/s INFO:__main__:2024-10-27 07:41:00 | Epoch: 2 | Step: 63060 | Dataset: 0-10992802 | Loss: 2.168 | 675 ms/step , 58217.68 GFLOP/s , 531999.6 tokens/s INFO:__main__:2024-10-27 07:41:08 | Epoch: 2 | Step: 63070 | Dataset: 0-11000802 | Loss: 2.186 | 675 ms/step , 58224.23 GFLOP/s , 532166.9 tokens/s INFO:__main__:2024-10-27 07:41:16 | Epoch: 2 | Step: 63080 | Dataset: 0-11008802 | Loss: 2.123 | 675 ms/step , 58264.06 GFLOP/s , 533156.4 tokens/s INFO:__main__:2024-10-27 07:41:23 | Epoch: 2 | Step: 63090 | Dataset: 0-11016802 | Loss: 2.149 | 676 ms/step , 58177.51 GFLOP/s , 532080.6 tokens/s INFO:__main__:2024-10-27 07:41:31 | Epoch: 2 | Step: 63100 | Dataset: 0-11024802 | Loss: 2.250 | 676 ms/step , 58181.04 GFLOP/s , 531928.7 tokens/s INFO:__main__:2024-10-27 07:41:39 | Epoch: 2 | Step: 63110 | Dataset: 0-11032802 | Loss: 2.132 | 676 ms/step , 58189.31 
GFLOP/s , 531930.0 tokens/s INFO:__main__:2024-10-27 07:41:47 | Epoch: 2 | Step: 63120 | Dataset: 0-11040802 | Loss: 2.185 | 674 ms/step , 58282.53 GFLOP/s , 532405.9 tokens/s INFO:__main__:2024-10-27 07:41:54 | Epoch: 2 | Step: 63130 | Dataset: 0-11048802 | Loss: 2.184 | 676 ms/step , 58133.59 GFLOP/s , 531808.9 tokens/s INFO:__main__:2024-10-27 07:42:02 | Epoch: 2 | Step: 63140 | Dataset: 0-11056802 | Loss: 2.183 | 673 ms/step , 58370.86 GFLOP/s , 532467.9 tokens/s INFO:__main__:2024-10-27 07:42:10 | Epoch: 2 | Step: 63150 | Dataset: 0-11064802 | Loss: 2.161 | 676 ms/step , 58125.39 GFLOP/s , 531986.6 tokens/s INFO:__main__:2024-10-27 07:42:17 | Epoch: 2 | Step: 63160 | Dataset: 0-11072802 | Loss: 2.098 | 675 ms/step , 58226.92 GFLOP/s , 532655.3 tokens/s INFO:__main__:2024-10-27 07:42:25 | Epoch: 2 | Step: 63170 | Dataset: 0-11080802 | Loss: 2.207 | 675 ms/step , 58249.18 GFLOP/s , 532726.0 tokens/s INFO:__main__:2024-10-27 07:42:33 | Epoch: 2 | Step: 63180 | Dataset: 0-11088802 | Loss: 2.071 | 676 ms/step , 58183.31 GFLOP/s , 532205.1 tokens/s INFO:__main__:2024-10-27 07:42:40 | Epoch: 2 | Step: 63190 | Dataset: 0-11096802 | Loss: 2.147 | 674 ms/step , 58301.01 GFLOP/s , 532456.4 tokens/s INFO:__main__:2024-10-27 07:42:48 | Epoch: 2 | Step: 63200 | Dataset: 0-11104802 | Loss: 2.082 | 675 ms/step , 58215.65 GFLOP/s , 532121.8 tokens/s INFO:__main__:2024-10-27 07:42:56 | Epoch: 2 | Step: 63210 | Dataset: 0-11112802 | Loss: 2.080 | 674 ms/step , 58291.97 GFLOP/s , 532801.5 tokens/s INFO:__main__:2024-10-27 07:43:04 | Epoch: 2 | Step: 63220 | Dataset: 0-11120802 | Loss: 2.117 | 674 ms/step , 58296.74 GFLOP/s , 532632.9 tokens/s INFO:__main__:2024-10-27 07:43:11 | Epoch: 2 | Step: 63230 | Dataset: 0-11128802 | Loss: 2.184 | 675 ms/step , 58265.90 GFLOP/s , 532489.9 tokens/s INFO:__main__:2024-10-27 07:43:19 | Epoch: 2 | Step: 63240 | Dataset: 0-11136802 | Loss: 2.122 | 674 ms/step , 58325.89 GFLOP/s , 533118.3 tokens/s INFO:__main__:2024-10-27 07:43:27 | Epoch: 2 | 
Step: 63250 | Dataset: 0-11144802 | Loss: 2.083 | 676 ms/step , 58146.32 GFLOP/s , 532185.5 tokens/s INFO:__main__:2024-10-27 07:43:34 | Epoch: 2 | Step: 63260 | Dataset: 0-11152802 | Loss: 2.074 | 675 ms/step , 58262.20 GFLOP/s , 532611.5 tokens/s INFO:__main__:2024-10-27 07:43:42 | Epoch: 2 | Step: 63270 | Dataset: 0-11160802 | Loss: 2.139 | 674 ms/step , 58312.71 GFLOP/s , 532670.4 tokens/s INFO:__main__:2024-10-27 07:43:50 | Epoch: 2 | Step: 63280 | Dataset: 0-11168802 | Loss: 2.138 | 676 ms/step , 58192.10 GFLOP/s , 532441.5 tokens/s INFO:__main__:2024-10-27 07:43:57 | Epoch: 2 | Step: 63290 | Dataset: 0-11176802 | Loss: 2.139 | 674 ms/step , 58326.41 GFLOP/s , 533132.1 tokens/s INFO:__main__:2024-10-27 07:44:05 | Epoch: 2 | Step: 63300 | Dataset: 0-11184802 | Loss: 2.047 | 674 ms/step , 58305.74 GFLOP/s , 532776.3 tokens/s INFO:__main__:2024-10-27 07:44:13 | Epoch: 2 | Step: 63310 | Dataset: 0-11192802 | Loss: 2.233 | 675 ms/step , 58257.10 GFLOP/s , 533123.9 tokens/s INFO:__main__:2024-10-27 07:44:20 | Epoch: 2 | Step: 63320 | Dataset: 0-11200802 | Loss: 2.197 | 675 ms/step , 58269.51 GFLOP/s , 532596.4 tokens/s INFO:__main__:2024-10-27 07:44:28 | Epoch: 2 | Step: 63330 | Dataset: 0-11208802 | Loss: 2.191 | 676 ms/step , 58174.70 GFLOP/s , 532248.8 tokens/s INFO:__main__:2024-10-27 07:44:36 | Epoch: 2 | Step: 63340 | Dataset: 0-11216802 | Loss: 2.210 | 675 ms/step , 58253.70 GFLOP/s , 532381.4 tokens/s INFO:__main__:2024-10-27 07:44:43 | Epoch: 2 | Step: 63350 | Dataset: 0-11224802 | Loss: 2.177 | 674 ms/step , 58350.00 GFLOP/s , 532733.8 tokens/s INFO:__main__:2024-10-27 07:44:51 | Epoch: 2 | Step: 63360 | Dataset: 0-11232802 | Loss: 2.221 | 673 ms/step , 58390.27 GFLOP/s , 533307.2 tokens/s INFO:__main__:2024-10-27 07:44:59 | Epoch: 2 | Step: 63370 | Dataset: 0-11240802 | Loss: 2.176 | 676 ms/step , 58188.31 GFLOP/s , 532307.5 tokens/s INFO:__main__:2024-10-27 07:45:07 | Epoch: 2 | Step: 63380 | Dataset: 0-11248802 | Loss: 2.203 | 674 ms/step , 58291.91 
GFLOP/s , 532904.8 tokens/s INFO:__main__:2024-10-27 07:45:14 | Epoch: 2 | Step: 63390 | Dataset: 0-11256802 | Loss: 2.257 | 675 ms/step , 58257.94 GFLOP/s , 532312.5 tokens/s INFO:__main__:2024-10-27 07:45:22 | Epoch: 2 | Step: 63400 | Dataset: 0-11264802 | Loss: 2.167 | 675 ms/step , 58245.95 GFLOP/s , 533095.7 tokens/s INFO:__main__:2024-10-27 07:45:30 | Epoch: 2 | Step: 63410 | Dataset: 0-11272802 | Loss: 2.142 | 675 ms/step , 58265.71 GFLOP/s , 533247.6 tokens/s INFO:__main__:2024-10-27 07:45:37 | Epoch: 2 | Step: 63420 | Dataset: 0-11280802 | Loss: 2.185 | 676 ms/step , 58168.14 GFLOP/s , 532964.4 tokens/s INFO:__main__:2024-10-27 07:45:45 | Epoch: 2 | Step: 63430 | Dataset: 0-11288802 | Loss: 2.102 | 674 ms/step , 58288.37 GFLOP/s , 533056.7 tokens/s INFO:__main__:2024-10-27 07:45:53 | Epoch: 2 | Step: 63440 | Dataset: 0-11296802 | Loss: 2.126 | 675 ms/step , 58264.92 GFLOP/s , 532952.9 tokens/s INFO:__main__:2024-10-27 07:46:00 | Epoch: 2 | Step: 63450 | Dataset: 0-11304802 | Loss: 2.149 | 673 ms/step , 58377.86 GFLOP/s , 533966.6 tokens/s INFO:__main__:2024-10-27 07:46:08 | Epoch: 2 | Step: 63460 | Dataset: 0-11312802 | Loss: 2.175 | 674 ms/step , 58303.72 GFLOP/s , 533485.7 tokens/s INFO:__main__:2024-10-27 07:46:16 | Epoch: 2 | Step: 63470 | Dataset: 0-11320802 | Loss: 2.189 | 674 ms/step , 58304.09 GFLOP/s , 533038.9 tokens/s INFO:__main__:2024-10-27 07:46:23 | Epoch: 2 | Step: 63480 | Dataset: 0-11328802 | Loss: 2.127 | 675 ms/step , 58249.09 GFLOP/s , 533023.4 tokens/s INFO:__main__:2024-10-27 07:46:31 | Epoch: 2 | Step: 63490 | Dataset: 0-11336802 | Loss: 2.152 | 674 ms/step , 58284.44 GFLOP/s , 532718.2 tokens/s INFO:__main__:2024-10-27 07:46:39 | Epoch: 2 | Step: 63500 | Dataset: 0-11344802 | Loss: 2.103 | 675 ms/step , 58265.40 GFLOP/s , 532815.8 tokens/s INFO:__main__:2024-10-27 07:46:46 | Epoch: 2 | Step: 63510 | Dataset: 0-11352802 | Loss: 2.143 | 675 ms/step , 58271.71 GFLOP/s , 533096.6 tokens/s INFO:__main__:2024-10-27 07:46:54 | Epoch: 2 | 
Step: 63520 | Dataset: 0-11360802 | Loss: 2.128 | 674 ms/step , 58325.02 GFLOP/s , 533628.1 tokens/s INFO:__main__:2024-10-27 07:47:02 | Epoch: 2 | Step: 63530 | Dataset: 0-11368802 | Loss: 2.115 | 674 ms/step , 58289.13 GFLOP/s , 533411.9 tokens/s INFO:__main__:2024-10-27 07:47:09 | Epoch: 2 | Step: 63540 | Dataset: 0-11376802 | Loss: 2.168 | 675 ms/step , 58199.59 GFLOP/s , 533019.4 tokens/s INFO:__main__:2024-10-27 07:47:17 | Epoch: 2 | Step: 63550 | Dataset: 0-11384802 | Loss: 2.106 | 675 ms/step , 58266.79 GFLOP/s , 533004.8 tokens/s INFO:__main__:2024-10-27 07:47:25 | Epoch: 2 | Step: 63560 | Dataset: 0-11392802 | Loss: 2.143 | 676 ms/step , 58129.12 GFLOP/s , 532153.2 tokens/s INFO:__main__:2024-10-27 07:47:33 | Epoch: 2 | Step: 63570 | Dataset: 0-11400802 | Loss: 2.176 | 675 ms/step , 58244.12 GFLOP/s , 532680.7 tokens/s INFO:__main__:2024-10-27 07:47:40 | Epoch: 2 | Step: 63580 | Dataset: 0-11408802 | Loss: 2.089 | 674 ms/step , 58312.60 GFLOP/s , 532200.8 tokens/s INFO:__main__:2024-10-27 07:47:48 | Epoch: 2 | Step: 63590 | Dataset: 0-11416802 | Loss: 2.170 | 677 ms/step , 58042.02 GFLOP/s , 531973.0 tokens/s INFO:__main__:2024-10-27 07:47:56 | Epoch: 2 | Step: 63600 | Dataset: 0-11424802 | Loss: 2.151 | 674 ms/step , 58311.39 GFLOP/s , 532619.1 tokens/s INFO:__main__:2024-10-27 07:48:03 | Epoch: 2 | Step: 63610 | Dataset: 0-11432802 | Loss: 2.156 | 674 ms/step , 58309.48 GFLOP/s , 533009.7 tokens/s INFO:__main__:2024-10-27 07:48:11 | Epoch: 2 | Step: 63620 | Dataset: 0-11440802 | Loss: 2.151 | 675 ms/step , 58264.08 GFLOP/s , 532988.3 tokens/s INFO:__main__:2024-10-27 07:48:19 | Epoch: 2 | Step: 63630 | Dataset: 0-11448802 | Loss: 1.896 | 674 ms/step , 58288.33 GFLOP/s , 532807.6 tokens/s INFO:__main__:2024-10-27 07:48:26 | Epoch: 2 | Step: 63640 | Dataset: 0-11456802 | Loss: 1.770 | 677 ms/step , 58067.93 GFLOP/s , 531457.2 tokens/s INFO:__main__:2024-10-27 07:48:34 | Epoch: 2 | Step: 63650 | Dataset: 0-11464802 | Loss: 1.721 | 675 ms/step , 58255.88 
GFLOP/s , 530308.5 tokens/s INFO:__main__:2024-10-27 07:48:42 | Epoch: 2 | Step: 63660 | Dataset: 0-11472802 | Loss: 1.706 | 675 ms/step , 58243.52 GFLOP/s , 531904.2 tokens/s INFO:__main__:2024-10-27 07:48:50 | Epoch: 2 | Step: 63670 | Dataset: 0-11480802 | Loss: 1.707 | 675 ms/step , 58207.21 GFLOP/s , 532019.4 tokens/s INFO:__main__:2024-10-27 07:48:57 | Epoch: 2 | Step: 63680 | Dataset: 0-11488802 | Loss: 1.708 | 675 ms/step , 58241.06 GFLOP/s , 531904.8 tokens/s INFO:__main__:2024-10-27 07:49:05 | Epoch: 2 | Step: 63690 | Dataset: 0-11496802 | Loss: 1.705 | 675 ms/step , 58269.02 GFLOP/s , 532120.5 tokens/s INFO:__main__:2024-10-27 07:49:13 | Epoch: 2 | Step: 63700 | Dataset: 0-11504802 | Loss: 1.655 | 677 ms/step , 58080.99 GFLOP/s , 531561.1 tokens/s INFO:__main__:2024-10-27 07:49:20 | Epoch: 2 | Step: 63710 | Dataset: 0-11512802 | Loss: 1.668 | 675 ms/step , 58218.00 GFLOP/s , 531346.0 tokens/s INFO:__main__:2024-10-27 07:49:28 | Epoch: 2 | Step: 63720 | Dataset: 0-11520802 | Loss: 2.288 | 676 ms/step , 58122.06 GFLOP/s , 531698.9 tokens/s INFO:__main__:2024-10-27 07:49:36 | Epoch: 2 | Step: 63730 | Dataset: 0-11528802 | Loss: 2.286 | 674 ms/step , 58315.87 GFLOP/s , 531757.8 tokens/s INFO:__main__:2024-10-27 07:49:43 | Epoch: 2 | Step: 63740 | Dataset: 0-11536802 | Loss: 2.329 | 676 ms/step , 58152.50 GFLOP/s , 531905.8 tokens/s INFO:__main__:2024-10-27 07:49:51 | Epoch: 2 | Step: 63750 | Dataset: 0-11544802 | Loss: 2.219 | 674 ms/step , 58294.40 GFLOP/s , 531964.3 tokens/s INFO:__main__:2024-10-27 07:49:59 | Epoch: 2 | Step: 63760 | Dataset: 0-11552802 | Loss: 2.180 | 676 ms/step , 58163.72 GFLOP/s , 531878.9 tokens/s INFO:__main__:2024-10-27 07:50:07 | Epoch: 2 | Step: 63770 | Dataset: 0-11560802 | Loss: 2.169 | 675 ms/step , 58222.20 GFLOP/s , 532136.8 tokens/s INFO:__main__:2024-10-27 07:50:14 | Epoch: 2 | Step: 63780 | Dataset: 0-11568802 | Loss: 2.260 | 674 ms/step , 58353.71 GFLOP/s , 532904.7 tokens/s INFO:__main__:2024-10-27 07:50:22 | Epoch: 2 | 
Step: 63790 | Dataset: 0-11576802 | Loss: 2.143 | 675 ms/step , 58198.33 GFLOP/s , 531762.5 tokens/s INFO:__main__:2024-10-27 07:50:30 | Epoch: 2 | Step: 63800 | Dataset: 0-11584802 | Loss: 2.187 | 676 ms/step , 58117.14 GFLOP/s , 531782.2 tokens/s INFO:__main__:2024-10-27 07:50:37 | Epoch: 2 | Step: 63810 | Dataset: 0-11592802 | Loss: 2.189 | 676 ms/step , 58171.78 GFLOP/s , 531943.3 tokens/s INFO:__main__:2024-10-27 07:50:45 | Epoch: 2 | Step: 63820 | Dataset: 0-11600802 | Loss: 2.121 | 675 ms/step , 58236.13 GFLOP/s , 532047.2 tokens/s INFO:__main__:2024-10-27 07:50:53 | Epoch: 2 | Step: 63830 | Dataset: 0-11608802 | Loss: 2.180 | 675 ms/step , 58217.61 GFLOP/s , 532464.6 tokens/s INFO:__main__:2024-10-27 07:51:00 | Epoch: 2 | Step: 63840 | Dataset: 0-11616802 | Loss: 2.202 | 675 ms/step , 58271.64 GFLOP/s , 532093.7 tokens/s INFO:__main__:2024-10-27 07:51:08 | Epoch: 2 | Step: 63850 | Dataset: 0-11624802 | Loss: 2.153 | 674 ms/step , 58360.66 GFLOP/s , 532499.7 tokens/s INFO:__main__:2024-10-27 07:51:16 | Epoch: 2 | Step: 63860 | Dataset: 0-11632802 | Loss: 2.154 | 675 ms/step , 58196.76 GFLOP/s , 532228.2 tokens/s INFO:__main__:2024-10-27 07:51:24 | Epoch: 2 | Step: 63870 | Dataset: 0-11640802 | Loss: 2.069 | 675 ms/step , 58228.00 GFLOP/s , 532095.2 tokens/s INFO:__main__:2024-10-27 07:51:31 | Epoch: 2 | Step: 63880 | Dataset: 0-11648802 | Loss: 2.141 | 675 ms/step , 58254.30 GFLOP/s , 532510.5 tokens/s INFO:__main__:2024-10-27 07:51:39 | Epoch: 2 | Step: 63890 | Dataset: 0-11656802 | Loss: 2.077 | 675 ms/step , 58249.37 GFLOP/s , 532331.6 tokens/s INFO:__main__:2024-10-27 07:51:47 | Epoch: 2 | Step: 63900 | Dataset: 0-11664802 | Loss: 2.140 | 675 ms/step , 58252.91 GFLOP/s , 532682.5 tokens/s INFO:__main__:2024-10-27 07:51:54 | Epoch: 2 | Step: 63910 | Dataset: 0-11672802 | Loss: 2.058 | 675 ms/step , 58274.96 GFLOP/s , 532325.1 tokens/s INFO:__main__:2024-10-27 07:52:02 | Epoch: 2 | Step: 63920 | Dataset: 0-11680802 | Loss: 2.069 | 676 ms/step , 58178.97 
GFLOP/s , 532008.9 tokens/s INFO:__main__:2024-10-27 07:52:10 | Epoch: 2 | Step: 63930 | Dataset: 0-11688802 | Loss: 2.126 | 675 ms/step , 58208.11 GFLOP/s , 532812.0 tokens/s INFO:__main__:2024-10-27 07:52:17 | Epoch: 2 | Step: 63940 | Dataset: 0-11696802 | Loss: 2.166 | 676 ms/step , 58187.66 GFLOP/s , 532965.5 tokens/s INFO:__main__:2024-10-27 07:52:25 | Epoch: 2 | Step: 63950 | Dataset: 0-11704802 | Loss: 2.193 | 675 ms/step , 58272.75 GFLOP/s , 532559.1 tokens/s INFO:__main__:2024-10-27 07:52:33 | Epoch: 2 | Step: 63960 | Dataset: 0-11712802 | Loss: 2.098 | 673 ms/step , 58372.76 GFLOP/s , 532512.2 tokens/s INFO:__main__:2024-10-27 07:52:40 | Epoch: 2 | Step: 63970 | Dataset: 0-11720802 | Loss: 2.078 | 674 ms/step , 58323.55 GFLOP/s , 532984.4 tokens/s INFO:__main__:2024-10-27 07:52:48 | Epoch: 2 | Step: 63980 | Dataset: 0-11728802 | Loss: 2.021 | 675 ms/step , 58214.48 GFLOP/s , 532793.5 tokens/s INFO:__main__:2024-10-27 07:52:56 | Epoch: 2 | Step: 63990 | Dataset: 0-11736802 | Loss: 2.201 | 675 ms/step , 58199.25 GFLOP/s , 532683.6 tokens/s INFO:__main__:2024-10-27 07:53:03 | Validation | Step: 64000 | Val_loss: 2.146 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 07:53:03 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_075303_step_64000.pt` INFO:__main__:2024-10-27 07:53:04 | Epoch: 2 | Step: 64000 | Dataset: 0-11744802 | Loss: 2.110 | 673 ms/step , 58432.15 GFLOP/s , 479331.8 tokens/s INFO:__main__:2024-10-27 07:53:12 | Epoch: 2 | Step: 64010 | Dataset: 0-11752802 | Loss: 2.129 | 674 ms/step , 58297.25 GFLOP/s , 532516.4 tokens/s INFO:__main__:2024-10-27 07:53:20 | Epoch: 2 | Step: 64020 | Dataset: 0-11760802 | Loss: 2.150 | 675 ms/step , 58253.19 GFLOP/s , 532624.6 tokens/s INFO:__main__:2024-10-27 07:53:27 | Epoch: 2 | Step: 64030 | Dataset: 0-11768802 | Loss: 2.136 | 674 ms/step , 58365.68 GFLOP/s , 533040.2 tokens/s INFO:__main__:2024-10-27 07:53:35 | Epoch: 2 | Step: 64040 | Dataset: 0-11776802 | Loss: 2.692 | 674 ms/step , 
58301.70 GFLOP/s , 533191.2 tokens/s INFO:__main__:2024-10-27 07:53:43 | Epoch: 2 | Step: 64050 | Dataset: 0-11784802 | Loss: 2.578 | 673 ms/step , 58381.98 GFLOP/s , 532510.9 tokens/s INFO:__main__:2024-10-27 07:53:50 | Epoch: 2 | Step: 64060 | Dataset: 0-11792802 | Loss: 2.578 | 674 ms/step , 58297.05 GFLOP/s , 533244.5 tokens/s INFO:__main__:2024-10-27 07:53:58 | Epoch: 2 | Step: 64070 | Dataset: 0-11800802 | Loss: 2.496 | 673 ms/step , 58397.95 GFLOP/s , 533339.4 tokens/s INFO:__main__:2024-10-27 07:54:06 | Epoch: 2 | Step: 64080 | Dataset: 0-11808802 | Loss: 2.548 | 673 ms/step , 58405.83 GFLOP/s , 533854.2 tokens/s INFO:__main__:2024-10-27 07:54:13 | Epoch: 2 | Step: 64090 | Dataset: 0-11816802 | Loss: 2.499 | 676 ms/step , 58169.30 GFLOP/s , 533005.4 tokens/s INFO:__main__:2024-10-27 07:54:21 | Epoch: 2 | Step: 64100 | Dataset: 0-11824802 | Loss: 2.495 | 675 ms/step , 58208.58 GFLOP/s , 532672.1 tokens/s INFO:__main__:2024-10-27 07:54:29 | Epoch: 2 | Step: 64110 | Dataset: 0-11832802 | Loss: 2.471 | 676 ms/step , 58160.01 GFLOP/s , 529436.6 tokens/s INFO:__main__:2024-10-27 07:54:37 | Epoch: 2 | Step: 64120 | Dataset: 0-11840802 | Loss: 2.456 | 677 ms/step , 58051.39 GFLOP/s , 530634.6 tokens/s INFO:__main__:2024-10-27 07:54:44 | Epoch: 2 | Step: 64130 | Dataset: 0-11848802 | Loss: 2.482 | 676 ms/step , 58120.28 GFLOP/s , 530211.1 tokens/s INFO:__main__:2024-10-27 07:54:52 | Epoch: 2 | Step: 64140 | Dataset: 0-11856802 | Loss: 2.464 | 675 ms/step , 58238.24 GFLOP/s , 531019.5 tokens/s INFO:__main__:2024-10-27 07:55:00 | Epoch: 2 | Step: 64150 | Dataset: 0-11864802 | Loss: 2.391 | 676 ms/step , 58141.39 GFLOP/s , 529991.7 tokens/s INFO:__main__:2024-10-27 07:55:08 | Epoch: 2 | Step: 64160 | Dataset: 0-11872802 | Loss: 2.449 | 677 ms/step , 58076.06 GFLOP/s , 529807.5 tokens/s INFO:__main__:2024-10-27 07:55:15 | Epoch: 2 | Step: 64170 | Dataset: 0-11880802 | Loss: 2.409 | 675 ms/step , 58245.51 GFLOP/s , 529407.0 tokens/s INFO:__main__:2024-10-27 07:55:23 | 
Epoch: 2 | Step: 64180 | Dataset: 0-11888802 | Loss: 2.439 | 676 ms/step , 58121.57 GFLOP/s , 530154.5 tokens/s INFO:__main__:2024-10-27 07:55:31 | Epoch: 2 | Step: 64190 | Dataset: 0-11896802 | Loss: 2.370 | 677 ms/step , 58095.45 GFLOP/s , 529120.8 tokens/s INFO:__main__:2024-10-27 07:55:38 | Epoch: 2 | Step: 64200 | Dataset: 0-11904802 | Loss: 2.267 | 675 ms/step , 58263.02 GFLOP/s , 531163.4 tokens/s INFO:__main__:2024-10-27 07:55:46 | Epoch: 2 | Step: 64210 | Dataset: 0-11912802 | Loss: 2.268 | 675 ms/step , 58195.57 GFLOP/s , 528359.3 tokens/s INFO:__main__:2024-10-27 07:55:54 | Epoch: 2 | Step: 64220 | Dataset: 0-11920802 | Loss: 2.249 | 674 ms/step , 58343.57 GFLOP/s , 532297.1 tokens/s INFO:__main__:2024-10-27 07:56:02 | Epoch: 2 | Step: 64230 | Dataset: 0-11928802 | Loss: 2.159 | 679 ms/step , 57851.89 GFLOP/s , 532089.1 tokens/s INFO:__main__:2024-10-27 07:56:09 | Epoch: 2 | Step: 64240 | Dataset: 0-11936802 | Loss: 2.199 | 676 ms/step , 58125.31 GFLOP/s , 530751.1 tokens/s INFO:__main__:2024-10-27 07:56:17 | Epoch: 2 | Step: 64250 | Dataset: 0-11944802 | Loss: 2.237 | 676 ms/step , 58134.10 GFLOP/s , 531519.5 tokens/s INFO:__main__:2024-10-27 07:56:25 | Epoch: 2 | Step: 64260 | Dataset: 0-11952802 | Loss: 2.171 | 677 ms/step , 58083.20 GFLOP/s , 531628.1 tokens/s INFO:__main__:2024-10-27 07:56:32 | Epoch: 2 | Step: 64270 | Dataset: 0-11960802 | Loss: 2.191 | 675 ms/step , 58264.35 GFLOP/s , 531499.0 tokens/s INFO:__main__:2024-10-27 07:56:40 | Epoch: 2 | Step: 64280 | Dataset: 0-11968802 | Loss: 2.227 | 676 ms/step , 58159.77 GFLOP/s , 531476.2 tokens/s INFO:__main__:2024-10-27 07:56:48 | Epoch: 2 | Step: 64290 | Dataset: 0-11976802 | Loss: 2.171 | 676 ms/step , 58171.52 GFLOP/s , 531707.2 tokens/s INFO:__main__:2024-10-27 07:56:56 | Epoch: 2 | Step: 64300 | Dataset: 0-11984802 | Loss: 2.118 | 675 ms/step , 58228.08 GFLOP/s , 531749.3 tokens/s INFO:__main__:2024-10-27 07:57:03 | Epoch: 2 | Step: 64310 | Dataset: 0-11992802 | Loss: 2.158 | 675 ms/step , 
58250.42 GFLOP/s , 531249.6 tokens/s INFO:__main__:2024-10-27 07:57:11 | Epoch: 2 | Step: 64320 | Dataset: 0-12000802 | Loss: 2.184 | 673 ms/step , 58373.19 GFLOP/s , 532507.5 tokens/s INFO:__main__:2024-10-27 07:57:19 | Epoch: 2 | Step: 64330 | Dataset: 0-12008802 | Loss: 2.064 | 673 ms/step , 58380.96 GFLOP/s , 533249.4 tokens/s INFO:__main__:2024-10-27 07:57:26 | Epoch: 2 | Step: 64340 | Dataset: 0-12016802 | Loss: 2.136 | 674 ms/step , 58323.35 GFLOP/s , 533300.8 tokens/s INFO:__main__:2024-10-27 07:57:34 | Epoch: 2 | Step: 64350 | Dataset: 0-12024802 | Loss: 2.176 | 675 ms/step , 58252.30 GFLOP/s , 531717.8 tokens/s INFO:__main__:2024-10-27 07:57:42 | Epoch: 2 | Step: 64360 | Dataset: 0-12032802 | Loss: 2.260 | 674 ms/step , 58342.27 GFLOP/s , 532641.5 tokens/s INFO:__main__:2024-10-27 07:57:49 | Epoch: 2 | Step: 64370 | Dataset: 0-12040802 | Loss: 2.163 | 674 ms/step , 58352.84 GFLOP/s , 532890.5 tokens/s INFO:__main__:2024-10-27 07:57:57 | Epoch: 2 | Step: 64380 | Dataset: 0-12048802 | Loss: 2.105 | 674 ms/step , 58313.66 GFLOP/s , 532921.1 tokens/s INFO:__main__:2024-10-27 07:58:05 | Epoch: 2 | Step: 64390 | Dataset: 0-12056802 | Loss: 2.075 | 675 ms/step , 58248.68 GFLOP/s , 532561.1 tokens/s INFO:__main__:2024-10-27 07:58:12 | Epoch: 2 | Step: 64400 | Dataset: 0-12064802 | Loss: 2.110 | 675 ms/step , 58210.66 GFLOP/s , 532747.7 tokens/s INFO:__main__:2024-10-27 07:58:20 | Epoch: 2 | Step: 64410 | Dataset: 0-12072802 | Loss: 2.142 | 676 ms/step , 58154.02 GFLOP/s , 533430.1 tokens/s INFO:__main__:2024-10-27 07:58:28 | Epoch: 2 | Step: 64420 | Dataset: 0-12080802 | Loss: 2.204 | 675 ms/step , 58223.67 GFLOP/s , 532598.0 tokens/s INFO:__main__:2024-10-27 07:58:36 | Epoch: 2 | Step: 64430 | Dataset: 0-12088802 | Loss: 2.155 | 675 ms/step , 58214.78 GFLOP/s , 531743.6 tokens/s INFO:__main__:2024-10-27 07:58:43 | Epoch: 2 | Step: 64440 | Dataset: 0-12096802 | Loss: 2.031 | 675 ms/step , 58206.78 GFLOP/s , 532676.3 tokens/s INFO:__main__:2024-10-27 07:58:51 | 
Epoch: 2 | Step: 64450 | Dataset: 0-12104802 | Loss: 2.047 | 674 ms/step , 58327.75 GFLOP/s , 532452.8 tokens/s INFO:__main__:2024-10-27 07:58:59 | Epoch: 2 | Step: 64460 | Dataset: 0-12112802 | Loss: 2.139 | 674 ms/step , 58353.26 GFLOP/s , 533195.0 tokens/s INFO:__main__:2024-10-27 07:59:06 | Epoch: 2 | Step: 64470 | Dataset: 0-12120802 | Loss: 2.177 | 676 ms/step , 58179.56 GFLOP/s , 532342.7 tokens/s INFO:__main__:2024-10-27 07:59:14 | Epoch: 2 | Step: 64480 | Dataset: 0-12128802 | Loss: 2.166 | 674 ms/step , 58283.83 GFLOP/s , 532310.9 tokens/s INFO:__main__:2024-10-27 07:59:22 | Epoch: 2 | Step: 64490 | Dataset: 0-12136802 | Loss: 2.129 | 676 ms/step , 58162.10 GFLOP/s , 531974.1 tokens/s INFO:__main__:2024-10-27 07:59:29 | Epoch: 2 | Step: 64500 | Dataset: 0-12144802 | Loss: 2.102 | 676 ms/step , 58178.33 GFLOP/s , 532823.4 tokens/s INFO:__main__:2024-10-27 07:59:37 | Epoch: 2 | Step: 64510 | Dataset: 0-12152802 | Loss: 2.189 | 676 ms/step , 58179.89 GFLOP/s , 532496.6 tokens/s INFO:__main__:2024-10-27 07:59:45 | Epoch: 2 | Step: 64520 | Dataset: 0-12160802 | Loss: 2.271 | 676 ms/step , 58116.35 GFLOP/s , 532229.0 tokens/s INFO:__main__:2024-10-27 07:59:52 | Epoch: 2 | Step: 64530 | Dataset: 0-12168802 | Loss: 2.221 | 676 ms/step , 58187.60 GFLOP/s , 532978.6 tokens/s INFO:__main__:2024-10-27 08:00:00 | Epoch: 2 | Step: 64540 | Dataset: 0-12176802 | Loss: 2.226 | 676 ms/step , 58130.00 GFLOP/s , 532334.7 tokens/s INFO:__main__:2024-10-27 08:00:07 | Epoch: 2 | Step: 64550 | Dataset: 0-12184802 | Loss: 2.177 | 675 ms/step , 58215.77 GFLOP/s , 613170.7 tokens/s INFO:__main__:2024-10-27 08:00:15 | Epoch: 2 | Step: 64560 | Dataset: 0-12192802 | Loss: 2.208 | 675 ms/step , 58236.58 GFLOP/s , 531711.0 tokens/s INFO:__main__:2024-10-27 08:00:22 | Epoch: 2 | Step: 64570 | Dataset: 0-12200802 | Loss: 2.189 | 676 ms/step , 58188.79 GFLOP/s , 532541.5 tokens/s INFO:__main__:2024-10-27 08:00:30 | Epoch: 2 | Step: 64580 | Dataset: 0-12208802 | Loss: 2.255 | 677 ms/step , 
58101.07 GFLOP/s , 531958.3 tokens/s INFO:__main__:2024-10-27 08:00:38 | Epoch: 2 | Step: 64590 | Dataset: 0-12216802 | Loss: 2.159 | 676 ms/step , 58153.29 GFLOP/s , 531573.9 tokens/s INFO:__main__:2024-10-27 08:00:45 | Epoch: 2 | Step: 64600 | Dataset: 0-12224802 | Loss: 2.152 | 676 ms/step , 58171.17 GFLOP/s , 531719.3 tokens/s INFO:__main__:2024-10-27 08:00:53 | Epoch: 2 | Step: 64610 | Dataset: 0-12232802 | Loss: 2.120 | 676 ms/step , 58137.16 GFLOP/s , 531580.5 tokens/s INFO:__main__:2024-10-27 08:01:01 | Epoch: 2 | Step: 64620 | Dataset: 0-12240802 | Loss: 2.114 | 676 ms/step , 58116.20 GFLOP/s , 531267.6 tokens/s INFO:__main__:2024-10-27 08:01:08 | Epoch: 2 | Step: 64630 | Dataset: 0-12248802 | Loss: 2.173 | 676 ms/step , 58129.96 GFLOP/s , 531639.9 tokens/s INFO:__main__:2024-10-27 08:01:16 | Epoch: 2 | Step: 64640 | Dataset: 0-12256802 | Loss: 2.191 | 676 ms/step , 58181.10 GFLOP/s , 532248.7 tokens/s INFO:__main__:2024-10-27 08:01:24 | Epoch: 2 | Step: 64650 | Dataset: 0-12264802 | Loss: 2.213 | 676 ms/step , 58109.31 GFLOP/s , 532032.6 tokens/s INFO:__main__:2024-10-27 08:01:32 | Epoch: 2 | Step: 64660 | Dataset: 0-12272802 | Loss: 2.223 | 707 ms/step , 55607.45 GFLOP/s , 510162.9 tokens/s INFO:__main__:2024-10-27 08:01:40 | Epoch: 2 | Step: 64670 | Dataset: 0-12280802 | Loss: 2.233 | 706 ms/step , 55675.63 GFLOP/s , 509091.0 tokens/s INFO:__main__:2024-10-27 08:01:48 | Epoch: 2 | Step: 64680 | Dataset: 0-12288802 | Loss: 2.137 | 676 ms/step , 58188.03 GFLOP/s , 517455.2 tokens/s INFO:__main__:2024-10-27 08:01:56 | Epoch: 2 | Step: 64690 | Dataset: 0-12296802 | Loss: 2.172 | 676 ms/step , 58186.83 GFLOP/s , 532723.9 tokens/s INFO:__main__:2024-10-27 08:02:03 | Epoch: 2 | Step: 64700 | Dataset: 0-12304802 | Loss: 2.142 | 675 ms/step , 58230.60 GFLOP/s , 532617.4 tokens/s INFO:__main__:2024-10-27 08:02:11 | Epoch: 2 | Step: 64710 | Dataset: 0-12312802 | Loss: 2.208 | 675 ms/step , 58246.13 GFLOP/s , 532701.4 tokens/s INFO:__main__:2024-10-27 08:02:19 | 
Epoch: 2 | Step: 64720 | Dataset: 0-12320802 | Loss: 2.121 | 675 ms/step , 58212.71 GFLOP/s , 532609.4 tokens/s INFO:__main__:2024-10-27 08:02:26 | Epoch: 2 | Step: 64730 | Dataset: 0-12328802 | Loss: 2.080 | 674 ms/step , 58311.67 GFLOP/s , 532556.2 tokens/s INFO:__main__:2024-10-27 08:02:34 | Epoch: 2 | Step: 64740 | Dataset: 0-12336802 | Loss: 2.091 | 675 ms/step , 58224.20 GFLOP/s , 533636.3 tokens/s INFO:__main__:2024-10-27 08:02:42 | Epoch: 2 | Step: 64750 | Dataset: 0-12344802 | Loss: 2.141 | 675 ms/step , 58243.17 GFLOP/s , 533118.9 tokens/s INFO:__main__:2024-10-27 08:02:49 | Epoch: 2 | Step: 64760 | Dataset: 0-12352802 | Loss: 2.065 | 674 ms/step , 58307.27 GFLOP/s , 533071.1 tokens/s INFO:__main__:2024-10-27 08:02:57 | Epoch: 2 | Step: 64770 | Dataset: 0-12360802 | Loss: 2.183 | 675 ms/step , 58232.04 GFLOP/s , 532723.7 tokens/s INFO:__main__:2024-10-27 08:03:05 | Epoch: 2 | Step: 64780 | Dataset: 0-12368802 | Loss: 2.055 | 674 ms/step , 58335.22 GFLOP/s , 533345.7 tokens/s INFO:__main__:2024-10-27 08:03:12 | Epoch: 2 | Step: 64790 | Dataset: 0-12376802 | Loss: 2.125 | 675 ms/step , 58214.82 GFLOP/s , 533023.2 tokens/s INFO:__main__:2024-10-27 08:03:20 | Epoch: 2 | Step: 64800 | Dataset: 0-12384802 | Loss: 2.137 | 675 ms/step , 58199.91 GFLOP/s , 532699.5 tokens/s INFO:__main__:2024-10-27 08:03:28 | Epoch: 2 | Step: 64810 | Dataset: 0-12392802 | Loss: 2.162 | 675 ms/step , 58223.99 GFLOP/s , 532930.2 tokens/s INFO:__main__:2024-10-27 08:03:35 | Epoch: 2 | Step: 64820 | Dataset: 0-12400802 | Loss: 2.161 | 678 ms/step , 57973.62 GFLOP/s , 531953.4 tokens/s INFO:__main__:2024-10-27 08:03:43 | Epoch: 2 | Step: 64830 | Dataset: 0-12408802 | Loss: 2.076 | 676 ms/step , 58169.60 GFLOP/s , 531045.1 tokens/s INFO:__main__:2024-10-27 08:03:51 | Epoch: 2 | Step: 64840 | Dataset: 0-12416802 | Loss: 2.161 | 678 ms/step , 57979.23 GFLOP/s , 530562.4 tokens/s INFO:__main__:2024-10-27 08:03:59 | Epoch: 2 | Step: 64850 | Dataset: 0-12424802 | Loss: 2.248 | 677 ms/step , 
58088.04 GFLOP/s , 531049.6 tokens/s INFO:__main__:2024-10-27 08:04:06 | Epoch: 2 | Step: 64860 | Dataset: 0-12432802 | Loss: 2.202 | 677 ms/step , 58023.29 GFLOP/s , 530694.0 tokens/s INFO:__main__:2024-10-27 08:04:14 | Epoch: 2 | Step: 64870 | Dataset: 0-12440802 | Loss: 2.161 | 678 ms/step , 58014.15 GFLOP/s , 529685.3 tokens/s INFO:__main__:2024-10-27 08:04:22 | Epoch: 2 | Step: 64880 | Dataset: 0-12448802 | Loss: 2.169 | 677 ms/step , 58046.28 GFLOP/s , 530409.5 tokens/s INFO:__main__:2024-10-27 08:04:29 | Epoch: 2 | Step: 64890 | Dataset: 0-12456802 | Loss: 2.162 | 678 ms/step , 58017.00 GFLOP/s , 530593.0 tokens/s INFO:__main__:2024-10-27 08:04:37 | Epoch: 2 | Step: 64900 | Dataset: 0-12464802 | Loss: 2.176 | 679 ms/step , 57875.81 GFLOP/s , 530469.2 tokens/s INFO:__main__:2024-10-27 08:04:45 | Epoch: 2 | Step: 64910 | Dataset: 0-12472802 | Loss: 2.107 | 675 ms/step , 58236.69 GFLOP/s , 532195.5 tokens/s INFO:__main__:2024-10-27 08:04:53 | Epoch: 2 | Step: 64920 | Dataset: 0-12480802 | Loss: 2.138 | 675 ms/step , 58261.84 GFLOP/s , 533336.0 tokens/s INFO:__main__:2024-10-27 08:05:00 | Epoch: 2 | Step: 64930 | Dataset: 0-12488802 | Loss: 2.184 | 676 ms/step , 58174.07 GFLOP/s , 533057.9 tokens/s INFO:__main__:2024-10-27 08:05:08 | Epoch: 2 | Step: 64940 | Dataset: 0-12496802 | Loss: 2.142 | 673 ms/step , 58384.28 GFLOP/s , 533482.3 tokens/s INFO:__main__:2024-10-27 08:05:16 | Epoch: 2 | Step: 64950 | Dataset: 0-12504802 | Loss: 2.088 | 673 ms/step , 58370.64 GFLOP/s , 533340.8 tokens/s INFO:__main__:2024-10-27 08:05:23 | Epoch: 2 | Step: 64960 | Dataset: 0-12512802 | Loss: 2.172 | 675 ms/step , 58212.99 GFLOP/s , 533299.9 tokens/s INFO:__main__:2024-10-27 08:05:31 | Epoch: 2 | Step: 64970 | Dataset: 0-12520802 | Loss: 2.188 | 675 ms/step , 58262.03 GFLOP/s , 532235.8 tokens/s INFO:__main__:2024-10-27 08:05:39 | Epoch: 2 | Step: 64980 | Dataset: 0-12528802 | Loss: 2.179 | 675 ms/step , 58197.15 GFLOP/s , 532681.0 tokens/s INFO:__main__:2024-10-27 08:05:46 | 
Epoch: 2 | Step: 64990 | Dataset: 0-12536802 | Loss: 2.121 | 619 ms/step , 63492.78 GFLOP/s , 551831.9 tokens/s INFO:__main__:2024-10-27 08:05:53 | Validation | Step: 65000 | Val_loss: 2.193 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 08:05:53 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_080553_step_65000.pt` INFO:__main__:2024-10-27 08:05:55 | Epoch: 2 | Step: 65000 | Dataset: 0-12544802 | Loss: 2.157 | 674 ms/step , 58333.64 GFLOP/s , 482033.5 tokens/s INFO:__main__:2024-10-27 08:06:02 | Epoch: 2 | Step: 65010 | Dataset: 0-12552802 | Loss: 2.149 | 675 ms/step , 58263.07 GFLOP/s , 533308.5 tokens/s INFO:__main__:2024-10-27 08:06:10 | Epoch: 2 | Step: 65020 | Dataset: 0-12560802 | Loss: 2.185 | 675 ms/step , 58234.43 GFLOP/s , 533322.8 tokens/s INFO:__main__:2024-10-27 08:06:18 | Epoch: 2 | Step: 65030 | Dataset: 0-12568802 | Loss: 2.148 | 675 ms/step , 58222.44 GFLOP/s , 532684.3 tokens/s INFO:__main__:2024-10-27 08:06:25 | Epoch: 2 | Step: 65040 | Dataset: 0-12576802 | Loss: 2.097 | 675 ms/step , 58257.17 GFLOP/s , 532830.7 tokens/s INFO:__main__:2024-10-27 08:06:33 | Epoch: 2 | Step: 65050 | Dataset: 0-12584802 | Loss: 2.125 | 676 ms/step , 58192.07 GFLOP/s , 532428.0 tokens/s INFO:__main__:2024-10-27 08:06:41 | Epoch: 2 | Step: 65060 | Dataset: 0-12592802 | Loss: 2.088 | 675 ms/step , 58194.03 GFLOP/s , 532811.6 tokens/s INFO:__main__:2024-10-27 08:06:48 | Epoch: 2 | Step: 65070 | Dataset: 0-12600802 | Loss: 2.091 | 673 ms/step , 58372.69 GFLOP/s , 533120.0 tokens/s INFO:__main__:2024-10-27 08:06:56 | Epoch: 2 | Step: 65080 | Dataset: 0-12608802 | Loss: 2.133 | 673 ms/step , 58400.32 GFLOP/s , 533088.3 tokens/s INFO:__main__:2024-10-27 08:07:04 | Epoch: 2 | Step: 65090 | Dataset: 0-12616802 | Loss: 2.163 | 676 ms/step , 58147.85 GFLOP/s , 532783.8 tokens/s INFO:__main__:2024-10-27 08:07:11 | Epoch: 2 | Step: 65100 | Dataset: 0-12624802 | Loss: 2.164 | 674 ms/step , 58303.03 GFLOP/s , 532768.3 tokens/s INFO:__main__:2024-10-27 
08:07:19 | Epoch: 2 | Step: 65110 | Dataset: 0-12632802 | Loss: 2.204 | 675 ms/step , 58269.68 GFLOP/s , 532899.3 tokens/s INFO:__main__:2024-10-27 08:07:27 | Epoch: 2 | Step: 65120 | Dataset: 0-12640802 | Loss: 2.127 | 675 ms/step , 58232.06 GFLOP/s , 532664.6 tokens/s INFO:__main__:2024-10-27 08:07:35 | Epoch: 2 | Step: 65130 | Dataset: 0-12648802 | Loss: 2.175 | 675 ms/step , 58265.39 GFLOP/s , 532649.0 tokens/s INFO:__main__:2024-10-27 08:07:42 | Epoch: 2 | Step: 65140 | Dataset: 0-12656802 | Loss: 2.068 | 675 ms/step , 58266.22 GFLOP/s , 532875.3 tokens/s INFO:__main__:2024-10-27 08:07:50 | Epoch: 2 | Step: 65150 | Dataset: 0-12664802 | Loss: 2.160 | 674 ms/step , 58348.02 GFLOP/s , 532950.3 tokens/s INFO:__main__:2024-10-27 08:07:58 | Epoch: 2 | Step: 65160 | Dataset: 0-12672802 | Loss: 2.214 | 674 ms/step , 58327.95 GFLOP/s , 532873.9 tokens/s INFO:__main__:2024-10-27 08:08:05 | Epoch: 2 | Step: 65170 | Dataset: 0-12680802 | Loss: 2.160 | 674 ms/step , 58281.00 GFLOP/s , 533033.7 tokens/s INFO:__main__:2024-10-27 08:08:13 | Epoch: 2 | Step: 65180 | Dataset: 0-12688802 | Loss: 2.140 | 675 ms/step , 58204.68 GFLOP/s , 532814.5 tokens/s INFO:__main__:2024-10-27 08:08:21 | Epoch: 2 | Step: 65190 | Dataset: 0-12696802 | Loss: 2.224 | 675 ms/step , 58259.89 GFLOP/s , 532631.4 tokens/s INFO:__main__:2024-10-27 08:08:28 | Epoch: 2 | Step: 65200 | Dataset: 0-12704802 | Loss: 2.098 | 674 ms/step , 58327.70 GFLOP/s , 533220.7 tokens/s INFO:__main__:2024-10-27 08:08:36 | Epoch: 2 | Step: 65210 | Dataset: 0-12712802 | Loss: 2.118 | 675 ms/step , 58202.57 GFLOP/s , 532848.7 tokens/s INFO:__main__:2024-10-27 08:08:44 | Epoch: 2 | Step: 65220 | Dataset: 0-12720802 | Loss: 2.123 | 676 ms/step , 58183.22 GFLOP/s , 532411.3 tokens/s INFO:__main__:2024-10-27 08:08:51 | Epoch: 2 | Step: 65230 | Dataset: 0-12728802 | Loss: 2.137 | 674 ms/step , 58285.39 GFLOP/s , 532607.5 tokens/s INFO:__main__:2024-10-27 08:08:59 | Epoch: 2 | Step: 65240 | Dataset: 0-12736802 | Loss: 2.191 | 674 
ms/step , 58279.41 GFLOP/s , 533138.6 tokens/s INFO:__main__:2024-10-27 08:09:07 | Epoch: 2 | Step: 65250 | Dataset: 0-12744802 | Loss: 2.120 | 676 ms/step , 58171.60 GFLOP/s , 533025.4 tokens/s INFO:__main__:2024-10-27 08:09:14 | Epoch: 2 | Step: 65260 | Dataset: 0-12752802 | Loss: 2.175 | 675 ms/step , 58255.30 GFLOP/s , 533244.2 tokens/s INFO:__main__:2024-10-27 08:09:22 | Epoch: 2 | Step: 65270 | Dataset: 0-12760802 | Loss: 2.162 | 673 ms/step , 58399.31 GFLOP/s , 532559.2 tokens/s INFO:__main__:2024-10-27 08:09:30 | Epoch: 2 | Step: 65280 | Dataset: 0-12768802 | Loss: 2.200 | 674 ms/step , 58296.64 GFLOP/s , 533157.3 tokens/s INFO:__main__:2024-10-27 08:09:38 | Epoch: 2 | Step: 65290 | Dataset: 0-12776802 | Loss: 2.206 | 674 ms/step , 58295.11 GFLOP/s , 532919.0 tokens/s INFO:__main__:2024-10-27 08:09:45 | Epoch: 2 | Step: 65300 | Dataset: 0-12784802 | Loss: 2.242 | 675 ms/step , 58228.01 GFLOP/s , 532747.9 tokens/s INFO:__main__:2024-10-27 08:09:53 | Epoch: 2 | Step: 65310 | Dataset: 0-12792802 | Loss: 2.118 | 675 ms/step , 58207.24 GFLOP/s , 532515.1 tokens/s INFO:__main__:2024-10-27 08:10:01 | Epoch: 2 | Step: 65320 | Dataset: 0-12800802 | Loss: 2.110 | 675 ms/step , 58241.81 GFLOP/s , 532935.9 tokens/s INFO:__main__:2024-10-27 08:10:08 | Epoch: 2 | Step: 65330 | Dataset: 0-12808802 | Loss: 2.418 | 675 ms/step , 58207.07 GFLOP/s , 532710.6 tokens/s INFO:__main__:2024-10-27 08:10:16 | Epoch: 2 | Step: 65340 | Dataset: 0-12816802 | Loss: 2.374 | 675 ms/step , 58193.02 GFLOP/s , 532418.6 tokens/s INFO:__main__:2024-10-27 08:10:24 | Epoch: 2 | Step: 65350 | Dataset: 0-12824802 | Loss: 2.367 | 676 ms/step , 58176.87 GFLOP/s , 532589.6 tokens/s INFO:__main__:2024-10-27 08:10:31 | Epoch: 2 | Step: 65360 | Dataset: 0-12832802 | Loss: 2.358 | 675 ms/step , 58198.52 GFLOP/s , 532334.6 tokens/s INFO:__main__:2024-10-27 08:10:39 | Epoch: 2 | Step: 65370 | Dataset: 0-12840802 | Loss: 2.335 | 675 ms/step , 58270.30 GFLOP/s , 532447.0 tokens/s INFO:__main__:2024-10-27 
08:10:47 | Epoch: 2 | Step: 65380 | Dataset: 0-12848802 | Loss: 2.266 | 675 ms/step , 58235.68 GFLOP/s , 532135.2 tokens/s INFO:__main__:2024-10-27 08:10:54 | Epoch: 2 | Step: 65390 | Dataset: 0-12856802 | Loss: 2.311 | 675 ms/step , 58245.85 GFLOP/s , 532595.0 tokens/s INFO:__main__:2024-10-27 08:11:02 | Epoch: 2 | Step: 65400 | Dataset: 0-12864802 | Loss: 2.353 | 675 ms/step , 58240.64 GFLOP/s , 532628.8 tokens/s INFO:__main__:2024-10-27 08:11:10 | Epoch: 2 | Step: 65410 | Dataset: 0-12872802 | Loss: 2.244 | 675 ms/step , 58241.83 GFLOP/s , 532688.0 tokens/s INFO:__main__:2024-10-27 08:11:17 | Epoch: 2 | Step: 65420 | Dataset: 0-12880802 | Loss: 2.310 | 675 ms/step , 58230.85 GFLOP/s , 532705.9 tokens/s INFO:__main__:2024-10-27 08:11:25 | Epoch: 2 | Step: 65430 | Dataset: 0-12888802 | Loss: 2.383 | 674 ms/step , 58326.71 GFLOP/s , 532856.5 tokens/s INFO:__main__:2024-10-27 08:11:33 | Epoch: 2 | Step: 65440 | Dataset: 0-12896802 | Loss: 2.317 | 674 ms/step , 58288.00 GFLOP/s , 532932.1 tokens/s INFO:__main__:2024-10-27 08:11:41 | Epoch: 2 | Step: 65450 | Dataset: 0-12904802 | Loss: 2.308 | 675 ms/step , 58246.37 GFLOP/s , 532764.8 tokens/s INFO:__main__:2024-10-27 08:11:48 | Epoch: 2 | Step: 65460 | Dataset: 0-12912802 | Loss: 2.277 | 675 ms/step , 58260.08 GFLOP/s , 533222.9 tokens/s INFO:__main__:2024-10-27 08:11:56 | Epoch: 2 | Step: 65470 | Dataset: 0-12920802 | Loss: 2.309 | 674 ms/step , 58288.77 GFLOP/s , 533450.2 tokens/s INFO:__main__:2024-10-27 08:12:04 | Epoch: 2 | Step: 65480 | Dataset: 0-12928802 | Loss: 2.362 | 675 ms/step , 58270.29 GFLOP/s , 533115.9 tokens/s INFO:__main__:2024-10-27 08:12:11 | Epoch: 2 | Step: 65490 | Dataset: 0-12936802 | Loss: 2.028 | 675 ms/step , 58244.93 GFLOP/s , 532058.3 tokens/s INFO:__main__:2024-10-27 08:12:19 | Epoch: 2 | Step: 65500 | Dataset: 0-12944802 | Loss: 1.939 | 675 ms/step , 58275.58 GFLOP/s , 532574.7 tokens/s INFO:__main__:2024-10-27 08:12:27 | Epoch: 2 | Step: 65510 | Dataset: 0-12952802 | Loss: 1.885 | 674 
ms/step , 58294.33 GFLOP/s , 532967.4 tokens/s INFO:__main__:2024-10-27 08:12:34 | Epoch: 2 | Step: 65520 | Dataset: 0-12960802 | Loss: 1.814 | 676 ms/step , 58140.97 GFLOP/s , 532341.0 tokens/s INFO:__main__:2024-10-27 08:12:42 | Epoch: 2 | Step: 65530 | Dataset: 0-12968802 | Loss: 1.838 | 675 ms/step , 58251.09 GFLOP/s , 532447.0 tokens/s INFO:__main__:2024-10-27 08:12:50 | Epoch: 2 | Step: 65540 | Dataset: 0-12976802 | Loss: 1.811 | 674 ms/step , 58306.27 GFLOP/s , 532713.8 tokens/s INFO:__main__:2024-10-27 08:12:57 | Epoch: 2 | Step: 65550 | Dataset: 0-12984802 | Loss: 1.815 | 676 ms/step , 58109.66 GFLOP/s , 530517.7 tokens/s INFO:__main__:2024-10-27 08:13:05 | Epoch: 2 | Step: 65560 | Dataset: 0-12992802 | Loss: 1.822 | 676 ms/step , 58171.10 GFLOP/s , 530163.6 tokens/s INFO:__main__:2024-10-27 08:13:13 | Epoch: 2 | Step: 65570 | Dataset: 0-13000802 | Loss: 1.788 | 675 ms/step , 58235.13 GFLOP/s , 530547.8 tokens/s INFO:__main__:2024-10-27 08:13:21 | Epoch: 2 | Step: 65580 | Dataset: 0-13008802 | Loss: 1.780 | 675 ms/step , 58212.45 GFLOP/s , 531691.1 tokens/s INFO:__main__:2024-10-27 08:13:28 | Epoch: 2 | Step: 65590 | Dataset: 0-13016802 | Loss: 1.796 | 675 ms/step , 58257.89 GFLOP/s , 531184.0 tokens/s INFO:__main__:2024-10-27 08:13:36 | Epoch: 2 | Step: 65600 | Dataset: 0-13024802 | Loss: 1.783 | 675 ms/step , 58232.26 GFLOP/s , 530028.1 tokens/s INFO:__main__:2024-10-27 08:13:44 | Epoch: 2 | Step: 65610 | Dataset: 0-13032802 | Loss: 1.773 | 676 ms/step , 58138.50 GFLOP/s , 530119.5 tokens/s INFO:__main__:2024-10-27 08:13:51 | Epoch: 2 | Step: 65620 | Dataset: 0-13040802 | Loss: 1.759 | 675 ms/step , 58239.01 GFLOP/s , 532361.1 tokens/s INFO:__main__:2024-10-27 08:13:59 | Epoch: 2 | Step: 65630 | Dataset: 0-13048802 | Loss: 1.786 | 675 ms/step , 58214.68 GFLOP/s , 532605.6 tokens/s INFO:__main__:2024-10-27 08:14:07 | Epoch: 2 | Step: 65640 | Dataset: 0-13056802 | Loss: 1.743 | 675 ms/step , 58241.54 GFLOP/s , 532501.3 tokens/s INFO:__main__:2024-10-27 
08:14:15 | Epoch: 2 | Step: 65650 | Dataset: 0-13064802 | Loss: 1.749 | 674 ms/step , 58342.05 GFLOP/s , 532684.7 tokens/s INFO:__main__:2024-10-27 08:14:22 | Epoch: 2 | Step: 65660 | Dataset: 0-13072802 | Loss: 2.563 | 675 ms/step , 58235.59 GFLOP/s , 532459.1 tokens/s INFO:__main__:2024-10-27 08:14:30 | Epoch: 2 | Step: 65670 | Dataset: 0-13080802 | Loss: 2.291 | 674 ms/step , 58312.87 GFLOP/s , 533076.1 tokens/s INFO:__main__:2024-10-27 08:14:38 | Epoch: 2 | Step: 65680 | Dataset: 0-13088802 | Loss: 2.202 | 675 ms/step , 58255.93 GFLOP/s , 532898.1 tokens/s INFO:__main__:2024-10-27 08:14:45 | Epoch: 2 | Step: 65690 | Dataset: 0-13096802 | Loss: 2.118 | 674 ms/step , 58306.67 GFLOP/s , 532856.7 tokens/s INFO:__main__:2024-10-27 08:14:53 | Epoch: 2 | Step: 65700 | Dataset: 0-13104802 | Loss: 2.112 | 675 ms/step , 58260.18 GFLOP/s , 532690.5 tokens/s INFO:__main__:2024-10-27 08:15:01 | Epoch: 2 | Step: 65710 | Dataset: 0-13112802 | Loss: 2.076 | 675 ms/step , 58267.87 GFLOP/s , 532394.7 tokens/s INFO:__main__:2024-10-27 08:15:08 | Epoch: 2 | Step: 65720 | Dataset: 0-13120802 | Loss: 2.022 | 675 ms/step , 58239.54 GFLOP/s , 533119.6 tokens/s INFO:__main__:2024-10-27 08:15:16 | Epoch: 2 | Step: 65730 | Dataset: 0-13128802 | Loss: 2.149 | 674 ms/step , 58293.80 GFLOP/s , 533017.5 tokens/s INFO:__main__:2024-10-27 08:15:24 | Epoch: 2 | Step: 65740 | Dataset: 0-13136802 | Loss: 2.017 | 675 ms/step , 58208.36 GFLOP/s , 533590.5 tokens/s INFO:__main__:2024-10-27 08:15:31 | Epoch: 2 | Step: 65750 | Dataset: 0-13144802 | Loss: 1.992 | 674 ms/step , 58300.00 GFLOP/s , 533675.7 tokens/s INFO:__main__:2024-10-27 08:15:39 | Epoch: 2 | Step: 65760 | Dataset: 0-13152802 | Loss: 2.125 | 674 ms/step , 58340.00 GFLOP/s , 533225.8 tokens/s INFO:__main__:2024-10-27 08:15:47 | Epoch: 2 | Step: 65770 | Dataset: 0-13160802 | Loss: 2.099 | 675 ms/step , 58237.97 GFLOP/s , 533362.6 tokens/s INFO:__main__:2024-10-27 08:15:54 | Epoch: 2 | Step: 65780 | Dataset: 0-13168802 | Loss: 2.040 | 674 
ms/step , 58308.41 GFLOP/s , 533076.0 tokens/s INFO:__main__:2024-10-27 08:16:02 | Epoch: 2 | Step: 65790 | Dataset: 0-13176802 | Loss: 2.020 | 674 ms/step , 58361.00 GFLOP/s , 533745.8 tokens/s INFO:__main__:2024-10-27 08:16:10 | Epoch: 2 | Step: 65800 | Dataset: 0-13184802 | Loss: 2.051 | 674 ms/step , 58317.55 GFLOP/s , 533784.1 tokens/s INFO:__main__:2024-10-27 08:16:17 | Epoch: 2 | Step: 65810 | Dataset: 0-13192802 | Loss: 2.026 | 674 ms/step , 58304.83 GFLOP/s , 533521.7 tokens/s INFO:__main__:2024-10-27 08:16:25 | Epoch: 2 | Step: 65820 | Dataset: 0-13200802 | Loss: 2.005 | 675 ms/step , 58231.44 GFLOP/s , 533035.8 tokens/s INFO:__main__:2024-10-27 08:16:33 | Epoch: 2 | Step: 65830 | Dataset: 0-13208802 | Loss: 2.360 | 675 ms/step , 58275.57 GFLOP/s , 532863.2 tokens/s INFO:__main__:2024-10-27 08:16:41 | Epoch: 2 | Step: 65840 | Dataset: 0-13216802 | Loss: 2.269 | 674 ms/step , 58280.67 GFLOP/s , 533182.6 tokens/s INFO:__main__:2024-10-27 08:16:48 | Epoch: 2 | Step: 65850 | Dataset: 0-13224802 | Loss: 2.238 | 675 ms/step , 58210.13 GFLOP/s , 533015.8 tokens/s INFO:__main__:2024-10-27 08:16:56 | Epoch: 2 | Step: 65860 | Dataset: 0-13232802 | Loss: 2.170 | 675 ms/step , 58251.00 GFLOP/s , 532934.8 tokens/s INFO:__main__:2024-10-27 08:17:04 | Epoch: 2 | Step: 65870 | Dataset: 0-13240802 | Loss: 2.203 | 675 ms/step , 58259.89 GFLOP/s , 532757.8 tokens/s INFO:__main__:2024-10-27 08:17:11 | Epoch: 2 | Step: 65880 | Dataset: 0-13248802 | Loss: 2.180 | 680 ms/step , 57806.28 GFLOP/s , 532461.2 tokens/s INFO:__main__:2024-10-27 08:17:19 | Epoch: 2 | Step: 65890 | Dataset: 0-13256802 | Loss: 2.191 | 675 ms/step , 58203.81 GFLOP/s , 532737.5 tokens/s INFO:__main__:2024-10-27 08:17:27 | Epoch: 2 | Step: 65900 | Dataset: 0-13264802 | Loss: 2.124 | 674 ms/step , 58325.51 GFLOP/s , 533118.0 tokens/s INFO:__main__:2024-10-27 08:17:34 | Epoch: 2 | Step: 65910 | Dataset: 0-13272802 | Loss: 2.209 | 674 ms/step , 58285.51 GFLOP/s , 533544.0 tokens/s INFO:__main__:2024-10-27 
08:17:42 | Epoch: 2 | Step: 65920 | Dataset: 0-13280802 | Loss: 2.102 | 675 ms/step , 58218.80 GFLOP/s , 532800.8 tokens/s INFO:__main__:2024-10-27 08:17:50 | Epoch: 2 | Step: 65930 | Dataset: 0-13288802 | Loss: 2.156 | 676 ms/step , 58170.39 GFLOP/s , 532889.0 tokens/s INFO:__main__:2024-10-27 08:17:57 | Epoch: 2 | Step: 65940 | Dataset: 0-13296802 | Loss: 2.213 | 675 ms/step , 58263.69 GFLOP/s , 533263.4 tokens/s INFO:__main__:2024-10-27 08:18:05 | Epoch: 2 | Step: 65950 | Dataset: 0-13304802 | Loss: 2.235 | 674 ms/step , 58321.15 GFLOP/s , 533478.2 tokens/s INFO:__main__:2024-10-27 08:18:13 | Epoch: 2 | Step: 65960 | Dataset: 0-13312802 | Loss: 2.124 | 674 ms/step , 58288.25 GFLOP/s , 533050.3 tokens/s INFO:__main__:2024-10-27 08:18:20 | Epoch: 2 | Step: 65970 | Dataset: 0-13320802 | Loss: 2.121 | 675 ms/step , 58229.28 GFLOP/s , 532218.5 tokens/s INFO:__main__:2024-10-27 08:18:28 | Epoch: 2 | Step: 65980 | Dataset: 0-13328802 | Loss: 2.125 | 674 ms/step , 58290.55 GFLOP/s , 532970.2 tokens/s INFO:__main__:2024-10-27 08:18:36 | Epoch: 2 | Step: 65990 | Dataset: 0-13336802 | Loss: 2.108 | 673 ms/step , 58375.96 GFLOP/s , 532931.7 tokens/s INFO:__main__:2024-10-27 08:18:43 | Validation | Step: 66000 | Val_loss: 2.126 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 08:18:43 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_081843_step_66000.pt` INFO:__main__:2024-10-27 08:18:44 | Epoch: 2 | Step: 66000 | Dataset: 0-13344802 | Loss: 2.198 | 674 ms/step , 58330.93 GFLOP/s , 480263.8 tokens/s INFO:__main__:2024-10-27 08:18:52 | Epoch: 2 | Step: 66010 | Dataset: 0-13352802 | Loss: 2.162 | 674 ms/step , 58330.08 GFLOP/s , 532950.7 tokens/s INFO:__main__:2024-10-27 08:19:00 | Epoch: 2 | Step: 66020 | Dataset: 0-13360802 | Loss: 2.145 | 674 ms/step , 58346.04 GFLOP/s , 533059.7 tokens/s INFO:__main__:2024-10-27 08:19:07 | Epoch: 2 | Step: 66030 | Dataset: 0-13368802 | Loss: 2.117 | 675 ms/step , 58220.46 GFLOP/s , 532765.9 tokens/s 
INFO:__main__:2024-10-27 08:19:15 | Epoch: 2 | Step: 66040 | Dataset: 0-13376802 | Loss: 2.150 | 675 ms/step , 58245.07 GFLOP/s , 532456.5 tokens/s INFO:__main__:2024-10-27 08:19:23 | Epoch: 2 | Step: 66050 | Dataset: 0-13384802 | Loss: 2.151 | 675 ms/step , 58267.04 GFLOP/s , 532772.2 tokens/s INFO:__main__:2024-10-27 08:19:30 | Epoch: 2 | Step: 66060 | Dataset: 0-13392802 | Loss: 2.049 | 674 ms/step , 58295.91 GFLOP/s , 533280.4 tokens/s INFO:__main__:2024-10-27 08:19:38 | Epoch: 2 | Step: 66070 | Dataset: 0-13400802 | Loss: 2.251 | 674 ms/step , 58360.82 GFLOP/s , 533241.9 tokens/s INFO:__main__:2024-10-27 08:19:46 | Epoch: 2 | Step: 66080 | Dataset: 0-13408802 | Loss: 2.064 | 674 ms/step , 58298.98 GFLOP/s , 532929.2 tokens/s INFO:__main__:2024-10-27 08:19:54 | Epoch: 2 | Step: 66090 | Dataset: 0-13416802 | Loss: 2.122 | 675 ms/step , 58251.08 GFLOP/s , 532661.4 tokens/s INFO:__main__:2024-10-27 08:20:01 | Epoch: 2 | Step: 66100 | Dataset: 0-13424802 | Loss: 2.054 | 675 ms/step , 58228.17 GFLOP/s , 532703.8 tokens/s INFO:__main__:2024-10-27 08:20:09 | Epoch: 2 | Step: 66110 | Dataset: 0-13432802 | Loss: 2.187 | 674 ms/step , 58283.31 GFLOP/s , 532508.8 tokens/s INFO:__main__:2024-10-27 08:20:17 | Epoch: 2 | Step: 66120 | Dataset: 0-13440802 | Loss: 2.131 | 675 ms/step , 58215.52 GFLOP/s , 532600.5 tokens/s INFO:__main__:2024-10-27 08:20:24 | Epoch: 2 | Step: 66130 | Dataset: 0-13448802 | Loss: 2.146 | 676 ms/step , 58135.43 GFLOP/s , 531967.9 tokens/s INFO:__main__:2024-10-27 08:20:32 | Epoch: 2 | Step: 66140 | Dataset: 0-13456802 | Loss: 2.076 | 675 ms/step , 58200.99 GFLOP/s , 532209.7 tokens/s INFO:__main__:2024-10-27 08:20:40 | Epoch: 2 | Step: 66150 | Dataset: 0-13464802 | Loss: 1.866 | 676 ms/step , 58157.26 GFLOP/s , 531815.7 tokens/s INFO:__main__:2024-10-27 08:20:47 | Epoch: 2 | Step: 66160 | Dataset: 0-13472802 | Loss: 1.772 | 675 ms/step , 58208.33 GFLOP/s , 531604.3 tokens/s INFO:__main__:2024-10-27 08:20:55 | Epoch: 2 | Step: 66170 | Dataset: 
0-13480802 | Loss: 1.729 | 676 ms/step , 58115.12 GFLOP/s , 531673.2 tokens/s INFO:__main__:2024-10-27 08:21:03 | Epoch: 2 | Step: 66180 | Dataset: 0-13488802 | Loss: 1.696 | 675 ms/step , 58261.19 GFLOP/s , 531596.1 tokens/s INFO:__main__:2024-10-27 08:21:10 | Epoch: 2 | Step: 66190 | Dataset: 0-13496802 | Loss: 1.689 | 675 ms/step , 58199.56 GFLOP/s , 532320.0 tokens/s INFO:__main__:2024-10-27 08:21:18 | Epoch: 2 | Step: 66200 | Dataset: 0-13504802 | Loss: 1.681 | 675 ms/step , 58245.61 GFLOP/s , 532152.9 tokens/s INFO:__main__:2024-10-27 08:21:26 | Epoch: 2 | Step: 66210 | Dataset: 0-13512802 | Loss: 1.664 | 676 ms/step , 58148.17 GFLOP/s , 530678.1 tokens/s INFO:__main__:2024-10-27 08:21:34 | Epoch: 2 | Step: 66220 | Dataset: 0-13520802 | Loss: 1.686 | 675 ms/step , 58197.24 GFLOP/s , 531793.9 tokens/s INFO:__main__:2024-10-27 08:21:41 | Epoch: 2 | Step: 66230 | Dataset: 0-13528802 | Loss: 1.679 | 675 ms/step , 58206.38 GFLOP/s , 532146.2 tokens/s INFO:__main__:2024-10-27 08:21:49 | Epoch: 2 | Step: 66240 | Dataset: 0-13536802 | Loss: 1.767 | 675 ms/step , 58221.08 GFLOP/s , 532111.1 tokens/s INFO:__main__:2024-10-27 08:21:57 | Epoch: 2 | Step: 66250 | Dataset: 0-13544802 | Loss: 1.750 | 675 ms/step , 58265.80 GFLOP/s , 532169.6 tokens/s INFO:__main__:2024-10-27 08:22:04 | Epoch: 2 | Step: 66260 | Dataset: 0-13552802 | Loss: 1.778 | 676 ms/step , 58192.84 GFLOP/s , 532611.1 tokens/s INFO:__main__:2024-10-27 08:22:12 | Epoch: 2 | Step: 66270 | Dataset: 0-13560802 | Loss: 1.780 | 676 ms/step , 58183.45 GFLOP/s , 532136.0 tokens/s INFO:__main__:2024-10-27 08:22:20 | Epoch: 2 | Step: 66280 | Dataset: 0-13568802 | Loss: 1.743 | 675 ms/step , 58225.05 GFLOP/s , 532151.2 tokens/s INFO:__main__:2024-10-27 08:22:27 | Epoch: 2 | Step: 66290 | Dataset: 0-13576802 | Loss: 1.732 | 675 ms/step , 58258.46 GFLOP/s , 532103.2 tokens/s INFO:__main__:2024-10-27 08:22:35 | Epoch: 2 | Step: 66300 | Dataset: 0-13584802 | Loss: 1.743 | 675 ms/step , 58213.57 GFLOP/s , 532013.4 
tokens/s INFO:__main__:2024-10-27 08:22:43 | Epoch: 2 | Step: 66310 | Dataset: 0-13592802 | Loss: 1.739 | 674 ms/step , 58279.77 GFLOP/s , 532355.2 tokens/s INFO:__main__:2024-10-27 08:22:51 | Epoch: 2 | Step: 66320 | Dataset: 0-13600802 | Loss: 1.773 | 675 ms/step , 58272.84 GFLOP/s , 532331.3 tokens/s INFO:__main__:2024-10-27 08:22:58 | Epoch: 2 | Step: 66330 | Dataset: 0-13608802 | Loss: 2.159 | 675 ms/step , 58238.15 GFLOP/s , 533068.7 tokens/s INFO:__main__:2024-10-27 08:23:06 | Epoch: 2 | Step: 66340 | Dataset: 0-13616802 | Loss: 2.150 | 675 ms/step , 58193.40 GFLOP/s , 532481.7 tokens/s INFO:__main__:2024-10-27 08:23:14 | Epoch: 2 | Step: 66350 | Dataset: 0-13624802 | Loss: 2.149 | 675 ms/step , 58209.89 GFLOP/s , 532464.4 tokens/s INFO:__main__:2024-10-27 08:23:21 | Epoch: 2 | Step: 66360 | Dataset: 0-13632802 | Loss: 2.160 | 674 ms/step , 58288.14 GFLOP/s , 532748.8 tokens/s INFO:__main__:2024-10-27 08:23:29 | Epoch: 2 | Step: 66370 | Dataset: 0-13640802 | Loss: 2.147 | 677 ms/step , 58097.86 GFLOP/s , 533103.8 tokens/s INFO:__main__:2024-10-27 08:23:37 | Epoch: 2 | Step: 66380 | Dataset: 0-13648802 | Loss: 2.173 | 675 ms/step , 58195.94 GFLOP/s , 533086.5 tokens/s INFO:__main__:2024-10-27 08:23:44 | Epoch: 2 | Step: 66390 | Dataset: 0-13656802 | Loss: 2.179 | 676 ms/step , 58189.38 GFLOP/s , 532384.8 tokens/s INFO:__main__:2024-10-27 08:23:52 | Epoch: 2 | Step: 66400 | Dataset: 0-13664802 | Loss: 2.256 | 674 ms/step , 58319.01 GFLOP/s , 532119.6 tokens/s INFO:__main__:2024-10-27 08:24:00 | Epoch: 2 | Step: 66410 | Dataset: 0-13672802 | Loss: 2.151 | 674 ms/step , 58290.35 GFLOP/s , 533073.0 tokens/s INFO:__main__:2024-10-27 08:24:07 | Epoch: 2 | Step: 66420 | Dataset: 0-13680802 | Loss: 2.153 | 674 ms/step , 58330.98 GFLOP/s , 532901.6 tokens/s INFO:__main__:2024-10-27 08:24:15 | Epoch: 2 | Step: 66430 | Dataset: 0-13688802 | Loss: 2.241 | 674 ms/step , 58326.57 GFLOP/s , 533041.6 tokens/s INFO:__main__:2024-10-27 08:24:23 | Epoch: 2 | Step: 66440 | 
Dataset: 0-13696802 | Loss: 2.164 | 675 ms/step , 58232.02 GFLOP/s , 532197.1 tokens/s INFO:__main__:2024-10-27 08:24:31 | Epoch: 2 | Step: 66450 | Dataset: 0-13704802 | Loss: 2.156 | 675 ms/step , 58208.94 GFLOP/s , 532629.9 tokens/s INFO:__main__:2024-10-27 08:24:38 | Epoch: 2 | Step: 66460 | Dataset: 0-13712802 | Loss: 2.157 | 674 ms/step , 58365.63 GFLOP/s , 533083.8 tokens/s INFO:__main__:2024-10-27 08:24:46 | Epoch: 2 | Step: 66470 | Dataset: 0-13720802 | Loss: 2.136 | 675 ms/step , 58267.71 GFLOP/s , 533255.8 tokens/s INFO:__main__:2024-10-27 08:24:54 | Epoch: 2 | Step: 66480 | Dataset: 0-13728802 | Loss: 2.176 | 674 ms/step , 58341.88 GFLOP/s , 533330.8 tokens/s INFO:__main__:2024-10-27 08:25:01 | Epoch: 2 | Step: 66490 | Dataset: 0-13736802 | Loss: 2.146 | 675 ms/step , 58270.31 GFLOP/s , 533076.8 tokens/s INFO:__main__:2024-10-27 08:25:09 | Epoch: 2 | Step: 66500 | Dataset: 0-13744802 | Loss: 2.200 | 673 ms/step , 58368.36 GFLOP/s , 533325.6 tokens/s INFO:__main__:2024-10-27 08:25:17 | Epoch: 2 | Step: 66510 | Dataset: 0-13752802 | Loss: 2.163 | 675 ms/step , 58268.05 GFLOP/s , 533448.1 tokens/s INFO:__main__:2024-10-27 08:25:24 | Epoch: 2 | Step: 66520 | Dataset: 0-13760802 | Loss: 2.243 | 675 ms/step , 58274.58 GFLOP/s , 533118.5 tokens/s INFO:__main__:2024-10-27 08:25:32 | Epoch: 2 | Step: 66530 | Dataset: 0-13768802 | Loss: 2.113 | 675 ms/step , 58233.29 GFLOP/s , 532812.8 tokens/s INFO:__main__:2024-10-27 08:25:40 | Epoch: 2 | Step: 66540 | Dataset: 0-13776802 | Loss: 2.232 | 675 ms/step , 58246.98 GFLOP/s , 532815.5 tokens/s INFO:__main__:2024-10-27 08:25:47 | Epoch: 2 | Step: 66550 | Dataset: 0-13784802 | Loss: 2.196 | 676 ms/step , 58184.08 GFLOP/s , 532974.0 tokens/s INFO:__main__:2024-10-27 08:25:55 | Epoch: 2 | Step: 66560 | Dataset: 0-13792802 | Loss: 2.132 | 675 ms/step , 58257.24 GFLOP/s , 533169.2 tokens/s INFO:__main__:2024-10-27 08:26:03 | Epoch: 2 | Step: 66570 | Dataset: 0-13800802 | Loss: 2.166 | 675 ms/step , 58233.36 GFLOP/s , 
532651.9 tokens/s INFO:__main__:2024-10-27 08:26:10 | Epoch: 2 | Step: 66580 | Dataset: 0-13808802 | Loss: 2.145 | 675 ms/step , 58244.09 GFLOP/s , 532716.1 tokens/s INFO:__main__:2024-10-27 08:26:18 | Epoch: 2 | Step: 66590 | Dataset: 0-13816802 | Loss: 2.100 | 675 ms/step , 58250.84 GFLOP/s , 532428.0 tokens/s INFO:__main__:2024-10-27 08:26:26 | Epoch: 2 | Step: 66600 | Dataset: 0-13824802 | Loss: 2.228 | 674 ms/step , 58315.31 GFLOP/s , 532847.7 tokens/s INFO:__main__:2024-10-27 08:26:33 | Epoch: 2 | Step: 66610 | Dataset: 0-13832802 | Loss: 2.231 | 674 ms/step , 58287.96 GFLOP/s , 532931.8 tokens/s INFO:__main__:2024-10-27 08:26:41 | Epoch: 2 | Step: 66620 | Dataset: 0-13840802 | Loss: 2.138 | 676 ms/step , 58145.49 GFLOP/s , 532650.9 tokens/s INFO:__main__:2024-10-27 08:26:49 | Epoch: 2 | Step: 66630 | Dataset: 0-13848802 | Loss: 2.155 | 674 ms/step , 58355.74 GFLOP/s , 533303.9 tokens/s INFO:__main__:2024-10-27 08:26:57 | Epoch: 2 | Step: 66640 | Dataset: 0-13856802 | Loss: 2.164 | 673 ms/step , 58376.29 GFLOP/s , 533412.2 tokens/s INFO:__main__:2024-10-27 08:27:04 | Epoch: 2 | Step: 66650 | Dataset: 0-13864802 | Loss: 2.229 | 675 ms/step , 58220.81 GFLOP/s , 532661.2 tokens/s INFO:__main__:2024-10-27 08:27:12 | Epoch: 2 | Step: 66660 | Dataset: 0-13872802 | Loss: 2.121 | 675 ms/step , 58205.62 GFLOP/s , 532917.3 tokens/s INFO:__main__:2024-10-27 08:27:20 | Epoch: 2 | Step: 66670 | Dataset: 0-13880802 | Loss: 2.155 | 674 ms/step , 58325.60 GFLOP/s , 533066.5 tokens/s INFO:__main__:2024-10-27 08:27:27 | Epoch: 2 | Step: 66680 | Dataset: 0-13888802 | Loss: 2.095 | 674 ms/step , 58314.76 GFLOP/s , 533192.2 tokens/s INFO:__main__:2024-10-27 08:27:35 | Epoch: 2 | Step: 66690 | Dataset: 0-13896802 | Loss: 2.171 | 674 ms/step , 58331.27 GFLOP/s , 533352.7 tokens/s INFO:__main__:2024-10-27 08:27:43 | Epoch: 2 | Step: 66700 | Dataset: 0-13904802 | Loss: 2.071 | 675 ms/step , 58258.18 GFLOP/s , 533144.9 tokens/s INFO:__main__:2024-10-27 08:27:50 | Epoch: 2 | Step: 
66710 | Dataset: 0-13912802 | Loss: 2.130 | 675 ms/step , 58274.13 GFLOP/s , 532152.5 tokens/s INFO:__main__:2024-10-27 08:27:58 | Epoch: 2 | Step: 66720 | Dataset: 0-13920802 | Loss: 2.113 | 674 ms/step , 58315.20 GFLOP/s , 532933.4 tokens/s INFO:__main__:2024-10-27 08:28:06 | Epoch: 2 | Step: 66730 | Dataset: 0-13928802 | Loss: 2.234 | 675 ms/step , 58199.70 GFLOP/s , 533001.4 tokens/s INFO:__main__:2024-10-27 08:28:13 | Epoch: 2 | Step: 66740 | Dataset: 0-13936802 | Loss: 2.126 | 674 ms/step , 58356.11 GFLOP/s , 533355.5 tokens/s INFO:__main__:2024-10-27 08:28:21 | Epoch: 2 | Step: 66750 | Dataset: 0-13944802 | Loss: 2.119 | 675 ms/step , 58233.11 GFLOP/s , 533170.5 tokens/s INFO:__main__:2024-10-27 08:28:29 | Epoch: 2 | Step: 66760 | Dataset: 0-13952802 | Loss: 2.133 | 674 ms/step , 58284.94 GFLOP/s , 533507.7 tokens/s INFO:__main__:2024-10-27 08:28:36 | Epoch: 2 | Step: 66770 | Dataset: 0-13960802 | Loss: 2.190 | 674 ms/step , 58335.68 GFLOP/s , 533116.2 tokens/s INFO:__main__:2024-10-27 08:28:44 | Epoch: 2 | Step: 66780 | Dataset: 0-13968802 | Loss: 2.169 | 675 ms/step , 58243.62 GFLOP/s , 532526.9 tokens/s INFO:__main__:2024-10-27 08:28:52 | Epoch: 2 | Step: 66790 | Dataset: 0-13976802 | Loss: 2.169 | 675 ms/step , 58208.60 GFLOP/s , 532644.9 tokens/s INFO:__main__:2024-10-27 08:28:59 | Epoch: 2 | Step: 66800 | Dataset: 0-13984802 | Loss: 2.155 | 674 ms/step , 58301.73 GFLOP/s , 533135.3 tokens/s INFO:__main__:2024-10-27 08:29:07 | Epoch: 2 | Step: 66810 | Dataset: 0-13992802 | Loss: 2.131 | 674 ms/step , 58318.24 GFLOP/s , 533421.9 tokens/s INFO:__main__:2024-10-27 08:29:15 | Epoch: 2 | Step: 66820 | Dataset: 0-14000802 | Loss: 2.083 | 674 ms/step , 58314.67 GFLOP/s , 533534.4 tokens/s INFO:__main__:2024-10-27 08:29:23 | Epoch: 2 | Step: 66830 | Dataset: 0-14008802 | Loss: 2.002 | 676 ms/step , 58143.21 GFLOP/s , 531970.8 tokens/s INFO:__main__:2024-10-27 08:29:30 | Epoch: 2 | Step: 66840 | Dataset: 0-14016802 | Loss: 2.036 | 675 ms/step , 58220.92 GFLOP/s 
, 532797.1 tokens/s INFO:__main__:2024-10-27 08:29:38 | Epoch: 2 | Step: 66850 | Dataset: 0-14024802 | Loss: 2.036 | 675 ms/step , 58211.28 GFLOP/s , 532917.1 tokens/s INFO:__main__:2024-10-27 08:29:46 | Epoch: 2 | Step: 66860 | Dataset: 0-14032802 | Loss: 1.969 | 674 ms/step , 58314.18 GFLOP/s , 532805.0 tokens/s INFO:__main__:2024-10-27 08:29:53 | Epoch: 2 | Step: 66870 | Dataset: 0-14040802 | Loss: 1.994 | 674 ms/step , 58325.81 GFLOP/s , 533651.1 tokens/s INFO:__main__:2024-10-27 08:30:01 | Epoch: 2 | Step: 66880 | Dataset: 0-14048802 | Loss: 2.015 | 674 ms/step , 58363.74 GFLOP/s , 533436.1 tokens/s INFO:__main__:2024-10-27 08:30:09 | Epoch: 2 | Step: 66890 | Dataset: 0-14056802 | Loss: 2.063 | 675 ms/step , 58196.44 GFLOP/s , 533347.9 tokens/s INFO:__main__:2024-10-27 08:30:16 | Epoch: 2 | Step: 66900 | Dataset: 0-14064802 | Loss: 2.020 | 674 ms/step , 58320.50 GFLOP/s , 532495.9 tokens/s INFO:__main__:2024-10-27 08:30:24 | Epoch: 2 | Step: 66910 | Dataset: 0-14072802 | Loss: 1.994 | 676 ms/step , 58129.75 GFLOP/s , 531669.9 tokens/s INFO:__main__:2024-10-27 08:30:32 | Epoch: 2 | Step: 66920 | Dataset: 0-14080802 | Loss: 2.005 | 676 ms/step , 58147.32 GFLOP/s , 530937.7 tokens/s INFO:__main__:2024-10-27 08:30:39 | Epoch: 2 | Step: 66930 | Dataset: 0-14088802 | Loss: 1.982 | 677 ms/step , 58087.15 GFLOP/s , 531514.2 tokens/s INFO:__main__:2024-10-27 08:30:47 | Epoch: 2 | Step: 66940 | Dataset: 0-14096802 | Loss: 1.963 | 675 ms/step , 58206.01 GFLOP/s , 530884.7 tokens/s INFO:__main__:2024-10-27 08:30:55 | Epoch: 2 | Step: 66950 | Dataset: 0-14104802 | Loss: 1.963 | 676 ms/step , 58120.36 GFLOP/s , 531240.0 tokens/s INFO:__main__:2024-10-27 08:31:03 | Epoch: 2 | Step: 66960 | Dataset: 0-14112802 | Loss: 1.966 | 675 ms/step , 58243.08 GFLOP/s , 531898.2 tokens/s INFO:__main__:2024-10-27 08:31:10 | Epoch: 2 | Step: 66970 | Dataset: 0-14120802 | Loss: 2.327 | 675 ms/step , 58225.32 GFLOP/s , 530933.2 tokens/s INFO:__main__:2024-10-27 08:31:18 | Epoch: 2 | Step: 
66980 | Dataset: 0-14128802 | Loss: 2.251 | 676 ms/step , 58116.59 GFLOP/s , 530851.0 tokens/s INFO:__main__:2024-10-27 08:31:26 | Epoch: 2 | Step: 66990 | Dataset: 0-14136802 | Loss: 2.111 | 674 ms/step , 58280.26 GFLOP/s , 530784.5 tokens/s INFO:__main__:2024-10-27 08:31:33 | Validation | Step: 67000 | Val_loss: 2.135 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 08:31:33 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_083133_step_67000.pt` INFO:__main__:2024-10-27 08:31:34 | Epoch: 2 | Step: 67000 | Dataset: 0-14144802 | Loss: 2.190 | 674 ms/step , 58337.15 GFLOP/s , 479549.1 tokens/s INFO:__main__:2024-10-27 08:31:42 | Epoch: 2 | Step: 67010 | Dataset: 0-14152802 | Loss: 2.229 | 674 ms/step , 58283.90 GFLOP/s , 532507.2 tokens/s INFO:__main__:2024-10-27 08:31:50 | Epoch: 2 | Step: 67020 | Dataset: 0-14160802 | Loss: 2.090 | 675 ms/step , 58203.53 GFLOP/s , 532419.0 tokens/s INFO:__main__:2024-10-27 08:31:57 | Epoch: 2 | Step: 67030 | Dataset: 0-14168802 | Loss: 2.141 | 674 ms/step , 58337.75 GFLOP/s , 533012.4 tokens/s INFO:__main__:2024-10-27 08:32:05 | Epoch: 2 | Step: 67040 | Dataset: 0-14176802 | Loss: 2.017 | 676 ms/step , 58159.50 GFLOP/s , 532375.8 tokens/s INFO:__main__:2024-10-27 08:32:13 | Epoch: 2 | Step: 67050 | Dataset: 0-14184802 | Loss: 2.124 | 675 ms/step , 58245.79 GFLOP/s , 532451.3 tokens/s INFO:__main__:2024-10-27 08:32:20 | Epoch: 2 | Step: 67060 | Dataset: 0-14192802 | Loss: 2.076 | 676 ms/step , 58172.61 GFLOP/s , 532248.1 tokens/s INFO:__main__:2024-10-27 08:32:28 | Epoch: 2 | Step: 67070 | Dataset: 0-14200802 | Loss: 2.147 | 675 ms/step , 58258.00 GFLOP/s , 532307.5 tokens/s INFO:__main__:2024-10-27 08:32:36 | Epoch: 2 | Step: 67080 | Dataset: 0-14208802 | Loss: 2.159 | 675 ms/step , 58197.31 GFLOP/s , 533044.8 tokens/s INFO:__main__:2024-10-27 08:32:44 | Epoch: 2 | Step: 67090 | Dataset: 0-14216802 | Loss: 2.072 | 700 ms/step , 56135.17 GFLOP/s , 530938.9 tokens/s INFO:__main__:2024-10-27 08:32:51 | Epoch: 2 
| Step: 67100 | Dataset: 0-14224802 | Loss: 2.157 | 675 ms/step , 58231.76 GFLOP/s , 532660.9 tokens/s INFO:__main__:2024-10-27 08:32:59 | Epoch: 2 | Step: 67110 | Dataset: 0-14232802 | Loss: 2.151 | 675 ms/step , 58260.89 GFLOP/s , 532276.7 tokens/s INFO:__main__:2024-10-27 08:33:07 | Epoch: 2 | Step: 67120 | Dataset: 0-14240802 | Loss: 2.139 | 675 ms/step , 58236.53 GFLOP/s , 531965.5 tokens/s INFO:__main__:2024-10-27 08:33:14 | Epoch: 2 | Step: 67130 | Dataset: 0-14248802 | Loss: 1.959 | 675 ms/step , 58269.04 GFLOP/s , 532670.2 tokens/s INFO:__main__:2024-10-27 08:33:22 | Epoch: 2 | Step: 67140 | Dataset: 0-14256802 | Loss: 1.814 | 675 ms/step , 58236.14 GFLOP/s , 531914.2 tokens/s INFO:__main__:2024-10-27 08:33:30 | Epoch: 2 | Step: 67150 | Dataset: 0-14264802 | Loss: 1.739 | 676 ms/step , 58173.50 GFLOP/s , 532143.1 tokens/s INFO:__main__:2024-10-27 08:33:37 | Epoch: 2 | Step: 67160 | Dataset: 0-14272802 | Loss: 1.702 | 675 ms/step , 58195.33 GFLOP/s , 531965.4 tokens/s INFO:__main__:2024-10-27 08:33:45 | Epoch: 2 | Step: 67170 | Dataset: 0-14280802 | Loss: 1.704 | 676 ms/step , 58174.25 GFLOP/s , 532138.1 tokens/s INFO:__main__:2024-10-27 08:33:53 | Epoch: 2 | Step: 67180 | Dataset: 0-14288802 | Loss: 1.693 | 674 ms/step , 58303.20 GFLOP/s , 532753.2 tokens/s INFO:__main__:2024-10-27 08:34:00 | Epoch: 2 | Step: 67190 | Dataset: 0-14296802 | Loss: 1.662 | 677 ms/step , 58050.16 GFLOP/s , 532405.5 tokens/s INFO:__main__:2024-10-27 08:34:08 | Epoch: 2 | Step: 67200 | Dataset: 0-14304802 | Loss: 1.692 | 676 ms/step , 58186.67 GFLOP/s , 532405.9 tokens/s INFO:__main__:2024-10-27 08:34:16 | Epoch: 2 | Step: 67210 | Dataset: 0-14312802 | Loss: 1.647 | 675 ms/step , 58206.81 GFLOP/s , 532406.4 tokens/s INFO:__main__:2024-10-27 08:34:24 | Epoch: 2 | Step: 67220 | Dataset: 0-14320802 | Loss: 2.232 | 675 ms/step , 58270.35 GFLOP/s , 532585.1 tokens/s INFO:__main__:2024-10-27 08:34:31 | Epoch: 2 | Step: 67230 | Dataset: 0-14328802 | Loss: 2.287 | 674 ms/step , 58311.26 
GFLOP/s , 532737.8 tokens/s INFO:__main__:2024-10-27 08:34:39 | Epoch: 2 | Step: 67240 | Dataset: 0-14336802 | Loss: 2.228 | 674 ms/step , 58291.11 GFLOP/s , 533619.4 tokens/s INFO:__main__:2024-10-27 08:34:47 | Epoch: 2 | Step: 67250 | Dataset: 0-14344802 | Loss: 2.168 | 675 ms/step , 58272.41 GFLOP/s , 532965.7 tokens/s INFO:__main__:2024-10-27 08:34:54 | Epoch: 2 | Step: 67260 | Dataset: 0-14352802 | Loss: 2.189 | 675 ms/step , 58250.29 GFLOP/s , 530860.0 tokens/s INFO:__main__:2024-10-27 08:35:02 | Epoch: 2 | Step: 67270 | Dataset: 0-14360802 | Loss: 2.238 | 674 ms/step , 58293.94 GFLOP/s , 532153.2 tokens/s INFO:__main__:2024-10-27 08:35:10 | Epoch: 2 | Step: 67280 | Dataset: 0-14368802 | Loss: 2.110 | 675 ms/step , 58254.77 GFLOP/s , 532249.4 tokens/s INFO:__main__:2024-10-27 08:35:17 | Epoch: 2 | Step: 67290 | Dataset: 0-14376802 | Loss: 2.180 | 674 ms/step , 58328.72 GFLOP/s , 533090.8 tokens/s INFO:__main__:2024-10-27 08:35:25 | Epoch: 2 | Step: 67300 | Dataset: 0-14384802 | Loss: 2.114 | 675 ms/step , 58268.35 GFLOP/s , 532703.6 tokens/s INFO:__main__:2024-10-27 08:35:33 | Epoch: 2 | Step: 67310 | Dataset: 0-14392802 | Loss: 2.228 | 673 ms/step , 58394.36 GFLOP/s , 533438.3 tokens/s INFO:__main__:2024-10-27 08:35:40 | Epoch: 2 | Step: 67320 | Dataset: 0-14400802 | Loss: 2.116 | 674 ms/step , 58302.90 GFLOP/s , 533114.9 tokens/s INFO:__main__:2024-10-27 08:35:48 | Epoch: 2 | Step: 67330 | Dataset: 0-14408802 | Loss: 2.163 | 674 ms/step , 58341.42 GFLOP/s , 532788.3 tokens/s INFO:__main__:2024-10-27 08:35:56 | Epoch: 2 | Step: 67340 | Dataset: 0-14416802 | Loss: 2.154 | 673 ms/step , 58372.98 GFLOP/s , 533517.3 tokens/s INFO:__main__:2024-10-27 08:36:03 | Epoch: 2 | Step: 67350 | Dataset: 0-14424802 | Loss: 2.171 | 674 ms/step , 58316.12 GFLOP/s , 533011.5 tokens/s INFO:__main__:2024-10-27 08:36:11 | Epoch: 2 | Step: 67360 | Dataset: 0-14432802 | Loss: 2.102 | 676 ms/step , 58110.26 GFLOP/s , 532190.4 tokens/s INFO:__main__:2024-10-27 08:36:19 | Epoch: 2 | 
Step: 67370 | Dataset: 0-14440802 | Loss: 2.240 | 679 ms/step , 57858.86 GFLOP/s , 531237.3 tokens/s INFO:__main__:2024-10-27 08:36:27 | Epoch: 2 | Step: 67380 | Dataset: 0-14448802 | Loss: 1.761 | 675 ms/step , 58248.99 GFLOP/s , 531224.3 tokens/s INFO:__main__:2024-10-27 08:36:34 | Epoch: 2 | Step: 67390 | Dataset: 0-14456802 | Loss: 1.680 | 675 ms/step , 58228.28 GFLOP/s , 532518.4 tokens/s INFO:__main__:2024-10-27 08:36:42 | Epoch: 2 | Step: 67400 | Dataset: 0-14464802 | Loss: 1.671 | 675 ms/step , 58197.02 GFLOP/s , 531664.6 tokens/s INFO:__main__:2024-10-27 08:36:50 | Epoch: 2 | Step: 67410 | Dataset: 0-14472802 | Loss: 1.645 | 676 ms/step , 58166.59 GFLOP/s , 532482.0 tokens/s INFO:__main__:2024-10-27 08:36:57 | Epoch: 2 | Step: 67420 | Dataset: 0-14480802 | Loss: 1.652 | 675 ms/step , 58266.10 GFLOP/s , 531579.7 tokens/s INFO:__main__:2024-10-27 08:37:05 | Epoch: 2 | Step: 67430 | Dataset: 0-14488802 | Loss: 1.643 | 676 ms/step , 58130.17 GFLOP/s , 531687.4 tokens/s INFO:__main__:2024-10-27 08:37:13 | Epoch: 2 | Step: 67440 | Dataset: 0-14496802 | Loss: 1.616 | 676 ms/step , 58126.83 GFLOP/s , 531742.0 tokens/s INFO:__main__:2024-10-27 08:37:21 | Epoch: 2 | Step: 67450 | Dataset: 0-14504802 | Loss: 1.621 | 675 ms/step , 58256.87 GFLOP/s , 531577.9 tokens/s INFO:__main__:2024-10-27 08:37:28 | Epoch: 2 | Step: 67460 | Dataset: 0-14512802 | Loss: 1.641 | 675 ms/step , 58236.79 GFLOP/s , 532206.7 tokens/s INFO:__main__:2024-10-27 08:37:36 | Epoch: 2 | Step: 67470 | Dataset: 0-14520802 | Loss: 2.276 | 676 ms/step , 58132.50 GFLOP/s , 532279.4 tokens/s INFO:__main__:2024-10-27 08:37:44 | Epoch: 2 | Step: 67480 | Dataset: 0-14528802 | Loss: 2.107 | 675 ms/step , 58254.21 GFLOP/s , 531550.6 tokens/s INFO:__main__:2024-10-27 08:37:51 | Epoch: 2 | Step: 67490 | Dataset: 0-14536802 | Loss: 2.136 | 674 ms/step , 58328.05 GFLOP/s , 531492.9 tokens/s INFO:__main__:2024-10-27 08:37:59 | Epoch: 2 | Step: 67500 | Dataset: 0-14544802 | Loss: 2.111 | 675 ms/step , 58200.72 
GFLOP/s , 533165.7 tokens/s INFO:__main__:2024-10-27 08:38:07 | Epoch: 2 | Step: 67510 | Dataset: 0-14552802 | Loss: 2.185 | 676 ms/step , 58131.50 GFLOP/s , 532783.2 tokens/s INFO:__main__:2024-10-27 08:38:14 | Epoch: 2 | Step: 67520 | Dataset: 0-14560802 | Loss: 2.126 | 677 ms/step , 58033.31 GFLOP/s , 532978.1 tokens/s INFO:__main__:2024-10-27 08:38:22 | Epoch: 2 | Step: 67530 | Dataset: 0-14568802 | Loss: 2.108 | 674 ms/step , 58339.95 GFLOP/s , 533303.4 tokens/s INFO:__main__:2024-10-27 08:38:30 | Epoch: 2 | Step: 67540 | Dataset: 0-14576802 | Loss: 2.194 | 674 ms/step , 58341.18 GFLOP/s , 533213.4 tokens/s INFO:__main__:2024-10-27 08:38:37 | Epoch: 2 | Step: 67550 | Dataset: 0-14584802 | Loss: 2.149 | 674 ms/step , 58348.23 GFLOP/s , 533478.8 tokens/s INFO:__main__:2024-10-27 08:38:45 | Epoch: 2 | Step: 67560 | Dataset: 0-14592802 | Loss: 2.174 | 675 ms/step , 58206.71 GFLOP/s , 532746.2 tokens/s INFO:__main__:2024-10-27 08:38:53 | Epoch: 2 | Step: 67570 | Dataset: 0-14600802 | Loss: 2.084 | 674 ms/step , 58335.83 GFLOP/s , 533536.3 tokens/s INFO:__main__:2024-10-27 08:39:00 | Epoch: 2 | Step: 67580 | Dataset: 0-14608802 | Loss: 2.107 | 677 ms/step , 58049.73 GFLOP/s , 533162.8 tokens/s INFO:__main__:2024-10-27 08:39:08 | Epoch: 2 | Step: 67590 | Dataset: 0-14616802 | Loss: 2.129 | 674 ms/step , 58330.74 GFLOP/s , 532795.3 tokens/s INFO:__main__:2024-10-27 08:39:16 | Epoch: 2 | Step: 67600 | Dataset: 0-14624802 | Loss: 2.211 | 677 ms/step , 58078.19 GFLOP/s , 532344.7 tokens/s INFO:__main__:2024-10-27 08:39:24 | Epoch: 2 | Step: 67610 | Dataset: 0-14632802 | Loss: 2.129 | 675 ms/step , 58234.52 GFLOP/s , 532378.6 tokens/s INFO:__main__:2024-10-27 08:39:31 | Epoch: 2 | Step: 67620 | Dataset: 0-14640802 | Loss: 2.114 | 675 ms/step , 58265.43 GFLOP/s , 533177.2 tokens/s INFO:__main__:2024-10-27 08:39:39 | Epoch: 2 | Step: 67630 | Dataset: 0-14648802 | Loss: 2.205 | 675 ms/step , 58237.05 GFLOP/s , 532572.7 tokens/s INFO:__main__:2024-10-27 08:39:47 | Epoch: 2 | 
Step: 67640 | Dataset: 0-14656802 | Loss: 2.176 | 674 ms/step , 58280.53 GFLOP/s , 532964.3 tokens/s INFO:__main__:2024-10-27 08:39:54 | Epoch: 2 | Step: 67650 | Dataset: 0-14664802 | Loss: 2.212 | 675 ms/step , 58206.66 GFLOP/s , 533485.7 tokens/s INFO:__main__:2024-10-27 08:40:02 | Epoch: 2 | Step: 67660 | Dataset: 0-14672802 | Loss: 2.194 | 676 ms/step , 58118.38 GFLOP/s , 533090.0 tokens/s INFO:__main__:2024-10-27 08:40:10 | Epoch: 2 | Step: 67670 | Dataset: 0-14680802 | Loss: 2.217 | 674 ms/step , 58315.14 GFLOP/s , 533383.6 tokens/s INFO:__main__:2024-10-27 08:40:17 | Epoch: 2 | Step: 67680 | Dataset: 0-14688802 | Loss: 2.098 | 674 ms/step , 58364.64 GFLOP/s , 533424.9 tokens/s INFO:__main__:2024-10-27 08:40:25 | Epoch: 2 | Step: 67690 | Dataset: 0-14696802 | Loss: 2.161 | 675 ms/step , 58222.52 GFLOP/s , 533249.2 tokens/s INFO:__main__:2024-10-27 08:40:33 | Epoch: 2 | Step: 67700 | Dataset: 0-14704802 | Loss: 2.190 | 675 ms/step , 58230.07 GFLOP/s , 532575.2 tokens/s INFO:__main__:2024-10-27 08:40:40 | Epoch: 2 | Step: 67710 | Dataset: 0-14712802 | Loss: 2.150 | 676 ms/step , 58140.38 GFLOP/s , 533322.6 tokens/s INFO:__main__:2024-10-27 08:40:48 | Epoch: 2 | Step: 67720 | Dataset: 0-14720802 | Loss: 2.236 | 676 ms/step , 58185.69 GFLOP/s , 532992.1 tokens/s INFO:__main__:2024-10-27 08:40:56 | Epoch: 2 | Step: 67730 | Dataset: 0-14728802 | Loss: 2.163 | 675 ms/step , 58247.44 GFLOP/s , 533103.7 tokens/s INFO:__main__:2024-10-27 08:41:03 | Epoch: 2 | Step: 67740 | Dataset: 0-14736802 | Loss: 2.120 | 676 ms/step , 58129.09 GFLOP/s , 533212.7 tokens/s INFO:__main__:2024-10-27 08:41:11 | Epoch: 2 | Step: 67750 | Dataset: 0-14744802 | Loss: 2.197 | 675 ms/step , 58266.12 GFLOP/s , 532757.4 tokens/s INFO:__main__:2024-10-27 08:41:19 | Epoch: 2 | Step: 67760 | Dataset: 0-14752802 | Loss: 2.167 | 676 ms/step , 58189.97 GFLOP/s , 532660.0 tokens/s INFO:__main__:2024-10-27 08:41:26 | Epoch: 2 | Step: 67770 | Dataset: 0-14760802 | Loss: 2.176 | 675 ms/step , 58229.30 
GFLOP/s , 531937.3 tokens/s INFO:__main__:2024-10-27 08:41:34 | Epoch: 2 | Step: 67780 | Dataset: 0-14768802 | Loss: 2.111 | 676 ms/step , 58153.76 GFLOP/s , 532210.6 tokens/s INFO:__main__:2024-10-27 08:41:42 | Epoch: 2 | Step: 67790 | Dataset: 0-14776802 | Loss: 2.248 | 676 ms/step , 58161.97 GFLOP/s , 532630.5 tokens/s INFO:__main__:2024-10-27 08:41:50 | Epoch: 2 | Step: 67800 | Dataset: 0-14784802 | Loss: 2.133 | 676 ms/step , 58181.15 GFLOP/s , 532436.9 tokens/s INFO:__main__:2024-10-27 08:41:57 | Epoch: 2 | Step: 67810 | Dataset: 0-14792802 | Loss: 2.196 | 675 ms/step , 58224.34 GFLOP/s , 532627.4 tokens/s INFO:__main__:2024-10-27 08:42:05 | Epoch: 2 | Step: 67820 | Dataset: 0-14800802 | Loss: 2.120 | 674 ms/step , 58353.90 GFLOP/s , 532897.8 tokens/s INFO:__main__:2024-10-27 08:42:13 | Epoch: 2 | Step: 67830 | Dataset: 0-14808802 | Loss: 2.175 | 675 ms/step , 58278.12 GFLOP/s , 533268.3 tokens/s INFO:__main__:2024-10-27 08:42:20 | Epoch: 2 | Step: 67840 | Dataset: 0-14816802 | Loss: 2.189 | 675 ms/step , 58263.82 GFLOP/s , 532739.9 tokens/s INFO:__main__:2024-10-27 08:42:28 | Epoch: 2 | Step: 67850 | Dataset: 0-14824802 | Loss: 2.134 | 674 ms/step , 58314.03 GFLOP/s , 533163.6 tokens/s INFO:__main__:2024-10-27 08:42:36 | Epoch: 2 | Step: 67860 | Dataset: 0-14832802 | Loss: 2.196 | 674 ms/step , 58324.83 GFLOP/s , 533149.5 tokens/s INFO:__main__:2024-10-27 08:42:43 | Epoch: 2 | Step: 67870 | Dataset: 0-14840802 | Loss: 2.052 | 675 ms/step , 58218.09 GFLOP/s , 532918.2 tokens/s INFO:__main__:2024-10-27 08:42:51 | Epoch: 2 | Step: 67880 | Dataset: 0-14848802 | Loss: 2.217 | 675 ms/step , 58203.70 GFLOP/s , 533293.0 tokens/s INFO:__main__:2024-10-27 08:42:59 | Epoch: 2 | Step: 67890 | Dataset: 0-14856802 | Loss: 2.114 | 675 ms/step , 58256.61 GFLOP/s , 532832.6 tokens/s INFO:__main__:2024-10-27 08:43:06 | Epoch: 2 | Step: 67900 | Dataset: 0-14864802 | Loss: 2.176 | 675 ms/step , 58267.82 GFLOP/s , 532999.4 tokens/s INFO:__main__:2024-10-27 08:43:14 | Epoch: 2 | 
Step: 67910 | Dataset: 0-14872802 | Loss: 2.174 | 675 ms/step , 58237.68 GFLOP/s , 532093.6 tokens/s INFO:__main__:2024-10-27 08:43:22 | Epoch: 2 | Step: 67920 | Dataset: 0-14880802 | Loss: 2.166 | 675 ms/step , 58252.91 GFLOP/s , 532908.4 tokens/s INFO:__main__:2024-10-27 08:43:29 | Epoch: 2 | Step: 67930 | Dataset: 0-14888802 | Loss: 2.210 | 675 ms/step , 58271.46 GFLOP/s , 533364.7 tokens/s INFO:__main__:2024-10-27 08:43:37 | Epoch: 2 | Step: 67940 | Dataset: 0-14896802 | Loss: 2.109 | 675 ms/step , 58268.84 GFLOP/s , 532916.9 tokens/s INFO:__main__:2024-10-27 08:43:45 | Epoch: 2 | Step: 67950 | Dataset: 0-14904802 | Loss: 2.166 | 674 ms/step , 58310.45 GFLOP/s , 533899.5 tokens/s INFO:__main__:2024-10-27 08:43:53 | Epoch: 2 | Step: 67960 | Dataset: 0-14912802 | Loss: 2.159 | 674 ms/step , 58359.79 GFLOP/s , 532992.5 tokens/s INFO:__main__:2024-10-27 08:44:00 | Epoch: 2 | Step: 67970 | Dataset: 0-14920802 | Loss: 2.250 | 675 ms/step , 58209.40 GFLOP/s , 533555.1 tokens/s INFO:__main__:2024-10-27 08:44:08 | Epoch: 2 | Step: 67980 | Dataset: 0-14928802 | Loss: 2.184 | 675 ms/step , 58223.22 GFLOP/s , 533031.5 tokens/s INFO:__main__:2024-10-27 08:44:16 | Epoch: 2 | Step: 67990 | Dataset: 0-14936802 | Loss: 2.171 | 674 ms/step , 58317.32 GFLOP/s , 533446.0 tokens/s INFO:__main__:2024-10-27 08:44:23 | Validation | Step: 68000 | Val_loss: 2.153 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 08:44:23 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_084423_step_68000.pt` INFO:__main__:2024-10-27 08:44:24 | Epoch: 2 | Step: 68000 | Dataset: 0-14944802 | Loss: 2.145 | 673 ms/step , 58380.71 GFLOP/s , 480374.8 tokens/s INFO:__main__:2024-10-27 08:44:32 | Epoch: 2 | Step: 68010 | Dataset: 0-14952802 | Loss: 2.164 | 675 ms/step , 58258.90 GFLOP/s , 532467.5 tokens/s INFO:__main__:2024-10-27 08:44:39 | Epoch: 2 | Step: 68020 | Dataset: 0-14960802 | Loss: 2.132 | 674 ms/step , 58321.16 GFLOP/s , 533434.6 tokens/s INFO:__main__:2024-10-27 08:44:47 | 
Epoch: 2 | Step: 68030 | Dataset: 0-14968802 | Loss: 2.132 | 674 ms/step , 58323.58 GFLOP/s , 532947.6 tokens/s INFO:__main__:2024-10-27 08:44:55 | Epoch: 2 | Step: 68040 | Dataset: 0-14976802 | Loss: 2.141 | 674 ms/step , 58299.14 GFLOP/s , 533735.3 tokens/s INFO:__main__:2024-10-27 08:45:03 | Epoch: 2 | Step: 68050 | Dataset: 0-14984802 | Loss: 2.194 | 675 ms/step , 58256.16 GFLOP/s , 533080.5 tokens/s INFO:__main__:2024-10-27 08:45:10 | Epoch: 2 | Step: 68060 | Dataset: 0-14992802 | Loss: 2.142 | 675 ms/step , 58256.18 GFLOP/s , 533031.6 tokens/s INFO:__main__:2024-10-27 08:45:18 | Epoch: 2 | Step: 68070 | Dataset: 0-15000802 | Loss: 2.134 | 674 ms/step , 58314.01 GFLOP/s , 533487.2 tokens/s INFO:__main__:2024-10-27 08:45:26 | Epoch: 2 | Step: 68080 | Dataset: 0-15008802 | Loss: 2.114 | 675 ms/step , 58240.23 GFLOP/s , 532608.0 tokens/s INFO:__main__:2024-10-27 08:45:33 | Epoch: 2 | Step: 68090 | Dataset: 0-15016802 | Loss: 2.209 | 674 ms/step , 58327.40 GFLOP/s , 533753.5 tokens/s INFO:__main__:2024-10-27 08:45:41 | Epoch: 2 | Step: 68100 | Dataset: 0-15024802 | Loss: 2.167 | 674 ms/step , 58343.28 GFLOP/s , 532687.1 tokens/s INFO:__main__:2024-10-27 08:45:49 | Epoch: 2 | Step: 68110 | Dataset: 0-15032802 | Loss: 2.259 | 675 ms/step , 58234.35 GFLOP/s , 532810.2 tokens/s INFO:__main__:2024-10-27 08:45:56 | Epoch: 2 | Step: 68120 | Dataset: 0-15040802 | Loss: 2.173 | 676 ms/step , 58187.93 GFLOP/s , 531852.8 tokens/s INFO:__main__:2024-10-27 08:46:04 | Epoch: 2 | Step: 68130 | Dataset: 0-15048802 | Loss: 2.167 | 675 ms/step , 58218.70 GFLOP/s , 532801.1 tokens/s INFO:__main__:2024-10-27 08:46:12 | Epoch: 2 | Step: 68140 | Dataset: 0-15056802 | Loss: 2.169 | 675 ms/step , 58239.67 GFLOP/s , 532950.8 tokens/s INFO:__main__:2024-10-27 08:46:19 | Epoch: 2 | Step: 68150 | Dataset: 0-15064802 | Loss: 2.165 | 674 ms/step , 58327.98 GFLOP/s , 532810.3 tokens/s INFO:__main__:2024-10-27 08:46:27 | Epoch: 2 | Step: 68160 | Dataset: 0-15072802 | Loss: 2.240 | 674 ms/step , 
58284.80 GFLOP/s , 533365.2 tokens/s INFO:__main__:2024-10-27 08:46:35 | Epoch: 2 | Step: 68170 | Dataset: 0-15080802 | Loss: 2.177 | 675 ms/step , 58276.19 GFLOP/s , 532679.6 tokens/s INFO:__main__:2024-10-27 08:46:42 | Epoch: 2 | Step: 68180 | Dataset: 0-15088802 | Loss: 2.187 | 675 ms/step , 58258.63 GFLOP/s , 532864.5 tokens/s INFO:__main__:2024-10-27 08:46:50 | Epoch: 2 | Step: 68190 | Dataset: 0-15096802 | Loss: 2.221 | 675 ms/step , 58219.20 GFLOP/s , 532739.7 tokens/s INFO:__main__:2024-10-27 08:46:58 | Epoch: 2 | Step: 68200 | Dataset: 0-15104802 | Loss: 2.187 | 675 ms/step , 58222.49 GFLOP/s , 532835.7 tokens/s INFO:__main__:2024-10-27 08:47:06 | Epoch: 2 | Step: 68210 | Dataset: 0-15112802 | Loss: 2.150 | 676 ms/step , 58155.44 GFLOP/s , 532528.3 tokens/s INFO:__main__:2024-10-27 08:47:13 | Epoch: 2 | Step: 68220 | Dataset: 0-15120802 | Loss: 2.220 | 675 ms/step , 58211.47 GFLOP/s , 532521.3 tokens/s INFO:__main__:2024-10-27 08:47:21 | Epoch: 2 | Step: 68230 | Dataset: 0-15128802 | Loss: 2.183 | 674 ms/step , 58315.56 GFLOP/s , 532892.0 tokens/s INFO:__main__:2024-10-27 08:47:29 | Epoch: 2 | Step: 68240 | Dataset: 0-15136802 | Loss: 2.125 | 676 ms/step , 58177.45 GFLOP/s , 532570.2 tokens/s INFO:__main__:2024-10-27 08:47:36 | Epoch: 2 | Step: 68250 | Dataset: 0-15144802 | Loss: 2.081 | 675 ms/step , 58259.39 GFLOP/s , 532637.7 tokens/s INFO:__main__:2024-10-27 08:47:44 | Epoch: 2 | Step: 68260 | Dataset: 0-15152802 | Loss: 2.201 | 677 ms/step , 58051.14 GFLOP/s , 532576.8 tokens/s INFO:__main__:2024-10-27 08:47:52 | Epoch: 2 | Step: 68270 | Dataset: 0-15160802 | Loss: 2.151 | 675 ms/step , 58247.51 GFLOP/s , 532908.9 tokens/s INFO:__main__:2024-10-27 08:47:59 | Epoch: 2 | Step: 68280 | Dataset: 0-15168802 | Loss: 2.195 | 674 ms/step , 58291.22 GFLOP/s , 533057.0 tokens/s INFO:__main__:2024-10-27 08:48:07 | Epoch: 2 | Step: 68290 | Dataset: 0-15176802 | Loss: 2.146 | 675 ms/step , 58272.58 GFLOP/s , 532687.5 tokens/s INFO:__main__:2024-10-27 08:48:15 | 
Epoch: 2 | Step: 68300 | Dataset: 0-15184802 | Loss: 2.155 | 675 ms/step , 58233.00 GFLOP/s , 532767.3 tokens/s INFO:__main__:2024-10-27 08:48:22 | Epoch: 2 | Step: 68310 | Dataset: 0-15192802 | Loss: 2.177 | 675 ms/step , 58260.66 GFLOP/s , 532562.1 tokens/s INFO:__main__:2024-10-27 08:48:30 | Epoch: 2 | Step: 68320 | Dataset: 0-15200802 | Loss: 2.110 | 676 ms/step , 58173.68 GFLOP/s , 532423.6 tokens/s INFO:__main__:2024-10-27 08:48:38 | Epoch: 2 | Step: 68330 | Dataset: 0-15208802 | Loss: 2.174 | 674 ms/step , 58306.55 GFLOP/s , 532523.5 tokens/s INFO:__main__:2024-10-27 08:48:46 | Epoch: 2 | Step: 68340 | Dataset: 0-15216802 | Loss: 2.145 | 676 ms/step , 58190.07 GFLOP/s , 528266.1 tokens/s INFO:__main__:2024-10-27 08:48:53 | Epoch: 2 | Step: 68350 | Dataset: 0-15224802 | Loss: 2.069 | 675 ms/step , 58201.85 GFLOP/s , 531394.5 tokens/s INFO:__main__:2024-10-27 08:49:01 | Epoch: 2 | Step: 68360 | Dataset: 0-15232802 | Loss: 2.074 | 675 ms/step , 58252.51 GFLOP/s , 531481.7 tokens/s INFO:__main__:2024-10-27 08:49:09 | Epoch: 2 | Step: 68370 | Dataset: 0-15240802 | Loss: 2.072 | 677 ms/step , 58092.75 GFLOP/s , 531647.7 tokens/s INFO:__main__:2024-10-27 08:49:16 | Epoch: 2 | Step: 68380 | Dataset: 0-15248802 | Loss: 2.098 | 676 ms/step , 58183.96 GFLOP/s , 530838.8 tokens/s INFO:__main__:2024-10-27 08:49:24 | Epoch: 2 | Step: 68390 | Dataset: 0-15256802 | Loss: 2.098 | 676 ms/step , 58139.96 GFLOP/s , 531730.1 tokens/s INFO:__main__:2024-10-27 08:49:32 | Epoch: 2 | Step: 68400 | Dataset: 0-15264802 | Loss: 2.099 | 675 ms/step , 58208.41 GFLOP/s , 531150.1 tokens/s INFO:__main__:2024-10-27 08:49:40 | Epoch: 2 | Step: 68410 | Dataset: 0-15272802 | Loss: 2.104 | 675 ms/step , 58236.14 GFLOP/s , 529662.9 tokens/s INFO:__main__:2024-10-27 08:49:47 | Epoch: 2 | Step: 68420 | Dataset: 0-15280802 | Loss: 2.048 | 675 ms/step , 58239.49 GFLOP/s , 531163.5 tokens/s INFO:__main__:2024-10-27 08:49:55 | Epoch: 2 | Step: 68430 | Dataset: 0-15288802 | Loss: 2.043 | 674 ms/step , 
58291.63 GFLOP/s , 532884.9 tokens/s INFO:__main__:2024-10-27 08:50:03 | Epoch: 2 | Step: 68440 | Dataset: 0-15296802 | Loss: 1.830 | 674 ms/step , 58294.75 GFLOP/s , 532744.7 tokens/s INFO:__main__:2024-10-27 08:50:10 | Epoch: 2 | Step: 68450 | Dataset: 0-15304802 | Loss: 1.785 | 675 ms/step , 58233.18 GFLOP/s , 532040.6 tokens/s INFO:__main__:2024-10-27 08:50:18 | Epoch: 2 | Step: 68460 | Dataset: 0-15312802 | Loss: 1.750 | 675 ms/step , 58236.76 GFLOP/s , 532338.6 tokens/s INFO:__main__:2024-10-27 08:50:26 | Epoch: 2 | Step: 68470 | Dataset: 0-15320802 | Loss: 1.731 | 676 ms/step , 58115.93 GFLOP/s , 531940.3 tokens/s INFO:__main__:2024-10-27 08:50:33 | Epoch: 2 | Step: 68480 | Dataset: 0-15328802 | Loss: 1.695 | 675 ms/step , 58257.03 GFLOP/s , 532111.2 tokens/s INFO:__main__:2024-10-27 08:50:41 | Epoch: 2 | Step: 68490 | Dataset: 0-15336802 | Loss: 1.701 | 674 ms/step , 58358.47 GFLOP/s , 532546.2 tokens/s INFO:__main__:2024-10-27 08:50:49 | Epoch: 2 | Step: 68500 | Dataset: 0-15344802 | Loss: 1.697 | 674 ms/step , 58303.50 GFLOP/s , 532663.3 tokens/s INFO:__main__:2024-10-27 08:50:56 | Epoch: 2 | Step: 68510 | Dataset: 0-15352802 | Loss: 1.673 | 674 ms/step , 58281.67 GFLOP/s , 532371.3 tokens/s INFO:__main__:2024-10-27 08:51:04 | Epoch: 2 | Step: 68520 | Dataset: 0-15360802 | Loss: 2.252 | 674 ms/step , 58309.60 GFLOP/s , 532643.2 tokens/s INFO:__main__:2024-10-27 08:51:12 | Epoch: 2 | Step: 68530 | Dataset: 0-15368802 | Loss: 2.201 | 675 ms/step , 58247.25 GFLOP/s , 532958.1 tokens/s INFO:__main__:2024-10-27 08:51:20 | Epoch: 2 | Step: 68540 | Dataset: 0-15376802 | Loss: 2.166 | 675 ms/step , 58242.37 GFLOP/s , 532747.9 tokens/s INFO:__main__:2024-10-27 08:51:27 | Epoch: 2 | Step: 68550 | Dataset: 0-15384802 | Loss: 2.160 | 674 ms/step , 58335.37 GFLOP/s , 533315.8 tokens/s INFO:__main__:2024-10-27 08:51:35 | Epoch: 2 | Step: 68560 | Dataset: 0-15392802 | Loss: 2.144 | 675 ms/step , 58214.77 GFLOP/s , 533273.9 tokens/s INFO:__main__:2024-10-27 08:51:43 | 
Epoch: 2 | Step: 68570 | Dataset: 0-15400802 | Loss: 2.213 | 675 ms/step , 58266.11 GFLOP/s , 533071.3 tokens/s INFO:__main__:2024-10-27 08:51:50 | Epoch: 2 | Step: 68580 | Dataset: 0-15408802 | Loss: 2.029 | 674 ms/step , 58333.11 GFLOP/s , 532878.5 tokens/s INFO:__main__:2024-10-27 08:51:58 | Epoch: 2 | Step: 68590 | Dataset: 0-15416802 | Loss: 2.118 | 675 ms/step , 58247.22 GFLOP/s , 532878.8 tokens/s INFO:__main__:2024-10-27 08:52:06 | Epoch: 2 | Step: 68600 | Dataset: 0-15424802 | Loss: 2.115 | 674 ms/step , 58290.30 GFLOP/s , 532762.6 tokens/s INFO:__main__:2024-10-27 08:52:13 | Epoch: 2 | Step: 68610 | Dataset: 0-15432802 | Loss: 2.120 | 675 ms/step , 58249.25 GFLOP/s , 532516.8 tokens/s INFO:__main__:2024-10-27 08:52:21 | Epoch: 2 | Step: 68620 | Dataset: 0-15440802 | Loss: 2.101 | 675 ms/step , 58212.40 GFLOP/s , 532623.8 tokens/s INFO:__main__:2024-10-27 08:52:29 | Epoch: 2 | Step: 68630 | Dataset: 0-15448802 | Loss: 2.083 | 675 ms/step , 58236.94 GFLOP/s , 532481.0 tokens/s INFO:__main__:2024-10-27 08:52:36 | Epoch: 2 | Step: 68640 | Dataset: 0-15456802 | Loss: 2.140 | 678 ms/step , 58008.98 GFLOP/s , 532244.5 tokens/s INFO:__main__:2024-10-27 08:52:44 | Epoch: 2 | Step: 68650 | Dataset: 0-15464802 | Loss: 2.122 | 675 ms/step , 58209.40 GFLOP/s , 532835.8 tokens/s INFO:__main__:2024-10-27 08:52:52 | Epoch: 2 | Step: 68660 | Dataset: 0-15472802 | Loss: 2.142 | 675 ms/step , 58246.59 GFLOP/s , 532696.8 tokens/s INFO:__main__:2024-10-27 08:52:59 | Epoch: 2 | Step: 68670 | Dataset: 0-15480802 | Loss: 2.180 | 676 ms/step , 58190.81 GFLOP/s , 533010.2 tokens/s INFO:__main__:2024-10-27 08:53:07 | Epoch: 2 | Step: 68680 | Dataset: 0-15488802 | Loss: 1.863 | 675 ms/step , 58264.03 GFLOP/s , 532957.1 tokens/s INFO:__main__:2024-10-27 08:53:15 | Epoch: 2 | Step: 68690 | Dataset: 0-15496802 | Loss: 1.826 | 675 ms/step , 58249.46 GFLOP/s , 531713.5 tokens/s INFO:__main__:2024-10-27 08:53:23 | Epoch: 2 | Step: 68700 | Dataset: 0-15504802 | Loss: 1.821 | 676 ms/step , 
58136.72 GFLOP/s , 531282.9 tokens/s INFO:__main__:2024-10-27 08:53:30 | Epoch: 2 | Step: 68710 | Dataset: 0-15512802 | Loss: 1.810 | 674 ms/step , 58312.83 GFLOP/s , 531676.6 tokens/s INFO:__main__:2024-10-27 08:53:38 | Epoch: 2 | Step: 68720 | Dataset: 0-15520802 | Loss: 1.812 | 675 ms/step , 58216.30 GFLOP/s , 532425.9 tokens/s INFO:__main__:2024-10-27 08:53:46 | Epoch: 2 | Step: 68730 | Dataset: 0-15528802 | Loss: 1.794 | 676 ms/step , 58188.50 GFLOP/s , 532426.5 tokens/s INFO:__main__:2024-10-27 08:53:53 | Epoch: 2 | Step: 68740 | Dataset: 0-15536802 | Loss: 1.781 | 675 ms/step , 58237.68 GFLOP/s , 532887.2 tokens/s INFO:__main__:2024-10-27 08:54:01 | Epoch: 2 | Step: 68750 | Dataset: 0-15544802 | Loss: 1.762 | 676 ms/step , 58127.38 GFLOP/s , 532469.8 tokens/s INFO:__main__:2024-10-27 08:54:09 | Epoch: 2 | Step: 68760 | Dataset: 0-15552802 | Loss: 1.789 | 674 ms/step , 58314.35 GFLOP/s , 533229.7 tokens/s INFO:__main__:2024-10-27 08:54:16 | Epoch: 2 | Step: 68770 | Dataset: 0-15560802 | Loss: 1.799 | 674 ms/step , 58283.86 GFLOP/s , 532842.0 tokens/s INFO:__main__:2024-10-27 08:54:24 | Epoch: 2 | Step: 68780 | Dataset: 0-15568802 | Loss: 1.759 | 676 ms/step , 58190.01 GFLOP/s , 532428.0 tokens/s INFO:__main__:2024-10-27 08:54:32 | Epoch: 2 | Step: 68790 | Dataset: 0-15576802 | Loss: 1.788 | 676 ms/step , 58137.98 GFLOP/s , 532428.2 tokens/s INFO:__main__:2024-10-27 08:54:39 | Epoch: 2 | Step: 68800 | Dataset: 0-15584802 | Loss: 1.730 | 674 ms/step , 58298.20 GFLOP/s , 531424.9 tokens/s INFO:__main__:2024-10-27 08:54:47 | Epoch: 2 | Step: 68810 | Dataset: 0-15592802 | Loss: 1.715 | 674 ms/step , 58302.33 GFLOP/s , 533440.0 tokens/s INFO:__main__:2024-10-27 08:54:55 | Epoch: 2 | Step: 68820 | Dataset: 0-15600802 | Loss: 1.737 | 673 ms/step , 58377.42 GFLOP/s , 533034.1 tokens/s INFO:__main__:2024-10-27 08:55:03 | Epoch: 2 | Step: 68830 | Dataset: 0-15608802 | Loss: 1.753 | 674 ms/step , 58280.78 GFLOP/s , 532524.3 tokens/s INFO:__main__:2024-10-27 08:55:10 | 
Epoch: 2 | Step: 68840 | Dataset: 0-15616802 | Loss: 1.726 | 673 ms/step , 58382.02 GFLOP/s , 533346.3 tokens/s INFO:__main__:2024-10-27 08:55:18 | Epoch: 2 | Step: 68850 | Dataset: 0-15624802 | Loss: 1.723 | 675 ms/step , 58268.95 GFLOP/s , 533079.1 tokens/s INFO:__main__:2024-10-27 08:55:26 | Epoch: 2 | Step: 68860 | Dataset: 0-15632802 | Loss: 2.315 | 674 ms/step , 58294.70 GFLOP/s , 533521.0 tokens/s INFO:__main__:2024-10-27 08:55:33 | Epoch: 2 | Step: 68870 | Dataset: 0-15640802 | Loss: 2.178 | 675 ms/step , 58244.92 GFLOP/s , 532878.1 tokens/s INFO:__main__:2024-10-27 08:55:41 | Epoch: 2 | Step: 68880 | Dataset: 0-15648802 | Loss: 2.145 | 677 ms/step , 58104.78 GFLOP/s , 532994.6 tokens/s INFO:__main__:2024-10-27 08:55:49 | Epoch: 2 | Step: 68890 | Dataset: 0-15656802 | Loss: 2.147 | 675 ms/step , 58256.47 GFLOP/s , 533015.3 tokens/s INFO:__main__:2024-10-27 08:55:56 | Epoch: 2 | Step: 68900 | Dataset: 0-15664802 | Loss: 2.185 | 675 ms/step , 58240.83 GFLOP/s , 532846.1 tokens/s INFO:__main__:2024-10-27 08:56:04 | Epoch: 2 | Step: 68910 | Dataset: 0-15672802 | Loss: 2.171 | 673 ms/step , 58418.29 GFLOP/s , 533173.0 tokens/s INFO:__main__:2024-10-27 08:56:12 | Epoch: 2 | Step: 68920 | Dataset: 0-15680802 | Loss: 2.218 | 675 ms/step , 58266.47 GFLOP/s , 532989.0 tokens/s INFO:__main__:2024-10-27 08:56:19 | Epoch: 2 | Step: 68930 | Dataset: 0-15688802 | Loss: 2.171 | 677 ms/step , 58031.85 GFLOP/s , 532965.9 tokens/s INFO:__main__:2024-10-27 08:56:27 | Epoch: 2 | Step: 68940 | Dataset: 0-15696802 | Loss: 2.147 | 674 ms/step , 58312.75 GFLOP/s , 532988.7 tokens/s INFO:__main__:2024-10-27 08:56:35 | Epoch: 2 | Step: 68950 | Dataset: 0-15704802 | Loss: 2.212 | 675 ms/step , 58194.88 GFLOP/s , 532982.7 tokens/s INFO:__main__:2024-10-27 08:56:42 | Epoch: 2 | Step: 68960 | Dataset: 0-15712802 | Loss: 2.131 | 674 ms/step , 58297.72 GFLOP/s , 533395.1 tokens/s INFO:__main__:2024-10-27 08:56:50 | Epoch: 2 | Step: 68970 | Dataset: 0-15720802 | Loss: 2.167 | 674 ms/step , 
58286.07 GFLOP/s , 533236.3 tokens/s INFO:__main__:2024-10-27 08:56:58 | Epoch: 2 | Step: 68980 | Dataset: 0-15728802 | Loss: 2.107 | 675 ms/step , 58232.38 GFLOP/s , 533101.2 tokens/s INFO:__main__:2024-10-27 08:57:05 | Epoch: 2 | Step: 68990 | Dataset: 0-15736802 | Loss: 2.198 | 675 ms/step , 58275.32 GFLOP/s , 533347.0 tokens/s INFO:__main__:2024-10-27 08:57:13 | Validation | Step: 69000 | Val_loss: 2.252 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 08:57:13 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_085713_step_69000.pt` INFO:__main__:2024-10-27 08:57:14 | Epoch: 2 | Step: 69000 | Dataset: 0-15744802 | Loss: 2.148 | 673 ms/step , 58403.75 GFLOP/s , 480916.8 tokens/s INFO:__main__:2024-10-27 08:57:22 | Epoch: 2 | Step: 69010 | Dataset: 0-15752802 | Loss: 2.190 | 675 ms/step , 58202.45 GFLOP/s , 532439.6 tokens/s INFO:__main__:2024-10-27 08:57:29 | Epoch: 2 | Step: 69020 | Dataset: 0-15760802 | Loss: 1.754 | 674 ms/step , 58317.22 GFLOP/s , 532935.2 tokens/s INFO:__main__:2024-10-27 08:57:37 | Epoch: 2 | Step: 69030 | Dataset: 0-15768802 | Loss: 1.718 | 675 ms/step , 58225.83 GFLOP/s , 532702.6 tokens/s INFO:__main__:2024-10-27 08:57:45 | Epoch: 2 | Step: 69040 | Dataset: 0-15776802 | Loss: 1.667 | 676 ms/step , 58125.08 GFLOP/s , 532202.0 tokens/s INFO:__main__:2024-10-27 08:57:52 | Epoch: 2 | Step: 69050 | Dataset: 0-15784802 | Loss: 1.662 | 674 ms/step , 58286.17 GFLOP/s , 532439.3 tokens/s INFO:__main__:2024-10-27 08:58:00 | Epoch: 2 | Step: 69060 | Dataset: 0-15792802 | Loss: 1.683 | 674 ms/step , 58291.36 GFLOP/s , 532425.0 tokens/s INFO:__main__:2024-10-27 08:58:08 | Epoch: 2 | Step: 69070 | Dataset: 0-15800802 | Loss: 1.639 | 674 ms/step , 58284.24 GFLOP/s , 532307.5 tokens/s INFO:__main__:2024-10-27 08:58:16 | Epoch: 2 | Step: 69080 | Dataset: 0-15808802 | Loss: 1.651 | 675 ms/step , 58207.87 GFLOP/s , 532361.4 tokens/s INFO:__main__:2024-10-27 08:58:23 | Epoch: 2 | Step: 69090 | Dataset: 0-15816802 | Loss: 1.664 | 675 
ms/step , 58224.81 GFLOP/s , 532644.5 tokens/s INFO:__main__:2024-10-27 08:58:31 | Epoch: 2 | Step: 69100 | Dataset: 0-15824802 | Loss: 1.657 | 675 ms/step , 58197.99 GFLOP/s , 532673.3 tokens/s INFO:__main__:2024-10-27 08:58:39 | Epoch: 2 | Step: 69110 | Dataset: 0-15832802 | Loss: 2.270 | 675 ms/step , 58238.60 GFLOP/s , 533291.2 tokens/s INFO:__main__:2024-10-27 08:58:46 | Epoch: 2 | Step: 69120 | Dataset: 0-15840802 | Loss: 2.141 | 674 ms/step , 58361.69 GFLOP/s , 533345.8 tokens/s INFO:__main__:2024-10-27 08:58:54 | Epoch: 2 | Step: 69130 | Dataset: 0-15848802 | Loss: 2.198 | 676 ms/step , 58189.46 GFLOP/s , 532643.4 tokens/s INFO:__main__:2024-10-27 08:59:02 | Epoch: 2 | Step: 69140 | Dataset: 0-15856802 | Loss: 2.216 | 676 ms/step , 58188.19 GFLOP/s , 533391.0 tokens/s INFO:__main__:2024-10-27 08:59:09 | Epoch: 2 | Step: 69150 | Dataset: 0-15864802 | Loss: 2.172 | 676 ms/step , 58137.47 GFLOP/s , 532997.8 tokens/s INFO:__main__:2024-10-27 08:59:17 | Epoch: 2 | Step: 69160 | Dataset: 0-15872802 | Loss: 2.097 | 675 ms/step , 58251.64 GFLOP/s , 533188.4 tokens/s INFO:__main__:2024-10-27 08:59:25 | Epoch: 2 | Step: 69170 | Dataset: 0-15880802 | Loss: 2.206 | 676 ms/step , 58190.97 GFLOP/s , 532799.4 tokens/s INFO:__main__:2024-10-27 08:59:32 | Epoch: 2 | Step: 69180 | Dataset: 0-15888802 | Loss: 2.156 | 675 ms/step , 58259.55 GFLOP/s , 533533.1 tokens/s INFO:__main__:2024-10-27 08:59:40 | Epoch: 2 | Step: 69190 | Dataset: 0-15896802 | Loss: 2.157 | 675 ms/step , 58242.15 GFLOP/s , 532866.2 tokens/s INFO:__main__:2024-10-27 08:59:48 | Epoch: 2 | Step: 69200 | Dataset: 0-15904802 | Loss: 2.111 | 676 ms/step , 58114.44 GFLOP/s , 531299.2 tokens/s INFO:__main__:2024-10-27 08:59:55 | Epoch: 2 | Step: 69210 | Dataset: 0-15912802 | Loss: 2.150 | 677 ms/step , 58052.31 GFLOP/s , 531831.9 tokens/s INFO:__main__:2024-10-27 09:00:03 | Epoch: 2 | Step: 69220 | Dataset: 0-15920802 | Loss: 2.116 | 674 ms/step , 58349.00 GFLOP/s , 533477.8 tokens/s INFO:__main__:2024-10-27 
09:00:11 | Epoch: 2 | Step: 69230 | Dataset: 0-15928802 | Loss: 2.200 | 674 ms/step , 58309.60 GFLOP/s , 533885.8 tokens/s INFO:__main__:2024-10-27 09:00:18 | Epoch: 2 | Step: 69240 | Dataset: 0-15936802 | Loss: 2.194 | 674 ms/step , 58317.95 GFLOP/s , 533724.1 tokens/s INFO:__main__:2024-10-27 09:00:26 | Epoch: 2 | Step: 69250 | Dataset: 0-15944802 | Loss: 2.182 | 676 ms/step , 58183.68 GFLOP/s , 533400.1 tokens/s INFO:__main__:2024-10-27 09:00:34 | Epoch: 2 | Step: 69260 | Dataset: 0-15952802 | Loss: 2.026 | 675 ms/step , 58221.19 GFLOP/s , 533155.0 tokens/s INFO:__main__:2024-10-27 09:00:42 | Epoch: 2 | Step: 69270 | Dataset: 0-15960802 | Loss: 2.196 | 674 ms/step , 58279.91 GFLOP/s , 533323.9 tokens/s INFO:__main__:2024-10-27 09:00:49 | Epoch: 2 | Step: 69280 | Dataset: 0-15968802 | Loss: 2.217 | 674 ms/step , 58317.51 GFLOP/s , 534084.1 tokens/s INFO:__main__:2024-10-27 09:00:57 | Epoch: 2 | Step: 69290 | Dataset: 0-15976802 | Loss: 2.237 | 674 ms/step , 58330.78 GFLOP/s , 533466.3 tokens/s INFO:__main__:2024-10-27 09:01:05 | Epoch: 2 | Step: 69300 | Dataset: 0-15984802 | Loss: 2.178 | 673 ms/step , 58377.79 GFLOP/s , 533765.2 tokens/s INFO:__main__:2024-10-27 09:01:12 | Epoch: 2 | Step: 69310 | Dataset: 0-15992802 | Loss: 2.142 | 675 ms/step , 58244.55 GFLOP/s , 532909.1 tokens/s INFO:__main__:2024-10-27 09:01:20 | Epoch: 2 | Step: 69320 | Dataset: 0-16000802 | Loss: 2.098 | 674 ms/step , 58309.52 GFLOP/s , 533473.3 tokens/s INFO:__main__:2024-10-27 09:01:28 | Epoch: 2 | Step: 69330 | Dataset: 0-16008802 | Loss: 2.177 | 675 ms/step , 58249.59 GFLOP/s , 532710.2 tokens/s INFO:__main__:2024-10-27 09:01:35 | Epoch: 2 | Step: 69340 | Dataset: 0-16016802 | Loss: 2.098 | 675 ms/step , 58246.17 GFLOP/s , 533271.5 tokens/s INFO:__main__:2024-10-27 09:01:43 | Epoch: 2 | Step: 69350 | Dataset: 0-16024802 | Loss: 2.092 | 676 ms/step , 58165.47 GFLOP/s , 532785.4 tokens/s INFO:__main__:2024-10-27 09:01:51 | Epoch: 2 | Step: 69360 | Dataset: 0-16032802 | Loss: 2.091 | 676 
ms/step , 58138.01 GFLOP/s , 533362.3 tokens/s INFO:__main__:2024-10-27 09:01:58 | Epoch: 2 | Step: 69370 | Dataset: 0-16040802 | Loss: 2.120 | 675 ms/step , 58263.19 GFLOP/s , 533668.0 tokens/s INFO:__main__:2024-10-27 09:02:06 | Epoch: 2 | Step: 69380 | Dataset: 0-16048802 | Loss: 2.185 | 674 ms/step , 58292.27 GFLOP/s , 533573.3 tokens/s INFO:__main__:2024-10-27 09:02:14 | Epoch: 2 | Step: 69390 | Dataset: 0-16056802 | Loss: 2.125 | 675 ms/step , 58276.33 GFLOP/s , 533493.0 tokens/s INFO:__main__:2024-10-27 09:02:21 | Epoch: 2 | Step: 69400 | Dataset: 0-16064802 | Loss: 2.144 | 675 ms/step , 58278.97 GFLOP/s , 533006.2 tokens/s INFO:__main__:2024-10-27 09:02:29 | Epoch: 2 | Step: 69410 | Dataset: 0-16072802 | Loss: 2.168 | 674 ms/step , 58279.90 GFLOP/s , 533157.0 tokens/s INFO:__main__:2024-10-27 09:02:37 | Epoch: 2 | Step: 69420 | Dataset: 0-16080802 | Loss: 2.154 | 675 ms/step , 58225.72 GFLOP/s , 532694.5 tokens/s INFO:__main__:2024-10-27 09:02:44 | Epoch: 2 | Step: 69430 | Dataset: 0-16088802 | Loss: 2.219 | 674 ms/step , 58334.21 GFLOP/s , 533622.6 tokens/s INFO:__main__:2024-10-27 09:02:52 | Epoch: 2 | Step: 69440 | Dataset: 0-16096802 | Loss: 2.138 | 674 ms/step , 58287.19 GFLOP/s , 532892.2 tokens/s INFO:__main__:2024-10-27 09:03:00 | Epoch: 2 | Step: 69450 | Dataset: 0-16104802 | Loss: 2.087 | 674 ms/step , 58353.39 GFLOP/s , 534171.0 tokens/s INFO:__main__:2024-10-27 09:03:07 | Epoch: 2 | Step: 69460 | Dataset: 0-16112802 | Loss: 2.177 | 675 ms/step , 58249.92 GFLOP/s , 533229.1 tokens/s INFO:__main__:2024-10-27 09:03:15 | Epoch: 2 | Step: 69470 | Dataset: 0-16120802 | Loss: 2.106 | 674 ms/step , 58327.56 GFLOP/s , 534055.2 tokens/s INFO:__main__:2024-10-27 09:03:23 | Epoch: 2 | Step: 69480 | Dataset: 0-16128802 | Loss: 2.122 | 673 ms/step , 58377.25 GFLOP/s , 533556.9 tokens/s INFO:__main__:2024-10-27 09:03:30 | Epoch: 2 | Step: 69490 | Dataset: 0-16136802 | Loss: 2.069 | 673 ms/step , 58385.91 GFLOP/s , 533691.1 tokens/s INFO:__main__:2024-10-27 
09:03:38 | Epoch: 2 | Step: 69500 | Dataset: 0-16144802 | Loss: 2.047 | 675 ms/step , 58235.69 GFLOP/s , 533047.1 tokens/s INFO:__main__:2024-10-27 09:03:46 | Epoch: 2 | Step: 69510 | Dataset: 0-16152802 | Loss: 2.110 | 675 ms/step , 58262.69 GFLOP/s , 533294.2 tokens/s INFO:__main__:2024-10-27 09:03:54 | Epoch: 2 | Step: 69520 | Dataset: 0-16160802 | Loss: 2.209 | 674 ms/step , 58282.68 GFLOP/s , 533330.5 tokens/s INFO:__main__:2024-10-27 09:04:01 | Epoch: 2 | Step: 69530 | Dataset: 0-16168802 | Loss: 2.107 | 675 ms/step , 58276.58 GFLOP/s , 532837.8 tokens/s INFO:__main__:2024-10-27 09:04:09 | Epoch: 2 | Step: 69540 | Dataset: 0-16176802 | Loss: 2.033 | 674 ms/step , 58284.37 GFLOP/s , 533744.0 tokens/s INFO:__main__:2024-10-27 09:04:17 | Epoch: 2 | Step: 69550 | Dataset: 0-16184802 | Loss: 2.145 | 673 ms/step , 58375.51 GFLOP/s , 533509.6 tokens/s INFO:__main__:2024-10-27 09:04:24 | Epoch: 2 | Step: 69560 | Dataset: 0-16192802 | Loss: 2.067 | 674 ms/step , 58297.90 GFLOP/s , 533756.8 tokens/s INFO:__main__:2024-10-27 09:04:32 | Epoch: 2 | Step: 69570 | Dataset: 0-16200802 | Loss: 2.088 | 675 ms/step , 58225.93 GFLOP/s , 532939.1 tokens/s INFO:__main__:2024-10-27 09:04:40 | Epoch: 2 | Step: 69580 | Dataset: 0-16208802 | Loss: 2.086 | 675 ms/step , 58261.74 GFLOP/s , 533002.7 tokens/s INFO:__main__:2024-10-27 09:04:47 | Epoch: 2 | Step: 69590 | Dataset: 0-16216802 | Loss: 2.166 | 676 ms/step , 58160.58 GFLOP/s , 532954.1 tokens/s INFO:__main__:2024-10-27 09:04:55 | Epoch: 2 | Step: 69600 | Dataset: 0-16224802 | Loss: 2.181 | 674 ms/step , 58333.89 GFLOP/s , 533183.0 tokens/s INFO:__main__:2024-10-27 09:05:03 | Epoch: 2 | Step: 69610 | Dataset: 0-16232802 | Loss: 2.132 | 674 ms/step , 58356.24 GFLOP/s , 533558.3 tokens/s INFO:__main__:2024-10-27 09:05:10 | Epoch: 2 | Step: 69620 | Dataset: 0-16240802 | Loss: 2.067 | 674 ms/step , 58300.00 GFLOP/s , 533370.2 tokens/s INFO:__main__:2024-10-27 09:05:18 | Epoch: 2 | Step: 69630 | Dataset: 0-16248802 | Loss: 2.090 | 675 
ms/step , 58218.20 GFLOP/s , 533114.9 tokens/s INFO:__main__:2024-10-27 09:05:26 | Epoch: 2 | Step: 69640 | Dataset: 0-16256802 | Loss: 2.091 | 675 ms/step , 58234.95 GFLOP/s , 532860.0 tokens/s INFO:__main__:2024-10-27 09:05:33 | Epoch: 2 | Step: 69650 | Dataset: 0-16264802 | Loss: 2.182 | 676 ms/step , 58182.73 GFLOP/s , 533037.8 tokens/s INFO:__main__:2024-10-27 09:05:41 | Epoch: 2 | Step: 69660 | Dataset: 0-16272802 | Loss: 2.081 | 676 ms/step , 58184.94 GFLOP/s , 532627.1 tokens/s INFO:__main__:2024-10-27 09:05:49 | Epoch: 2 | Step: 69670 | Dataset: 0-16280802 | Loss: 2.085 | 675 ms/step , 58195.95 GFLOP/s , 533254.3 tokens/s INFO:__main__:2024-10-27 09:05:56 | Epoch: 2 | Step: 69680 | Dataset: 0-16288802 | Loss: 2.133 | 675 ms/step , 58225.06 GFLOP/s , 532905.4 tokens/s INFO:__main__:2024-10-27 09:06:04 | Epoch: 2 | Step: 69690 | Dataset: 0-16296802 | Loss: 2.139 | 675 ms/step , 58204.22 GFLOP/s , 533218.0 tokens/s INFO:__main__:2024-10-27 09:06:12 | Epoch: 2 | Step: 69700 | Dataset: 0-16304802 | Loss: 2.124 | 675 ms/step , 58272.37 GFLOP/s , 532903.9 tokens/s INFO:__main__:2024-10-27 09:06:20 | Epoch: 2 | Step: 69710 | Dataset: 0-16312802 | Loss: 2.100 | 675 ms/step , 58209.70 GFLOP/s , 532811.3 tokens/s INFO:__main__:2024-10-27 09:06:27 | Epoch: 2 | Step: 69720 | Dataset: 0-16320802 | Loss: 2.068 | 677 ms/step , 58084.34 GFLOP/s , 532192.4 tokens/s INFO:__main__:2024-10-27 09:06:35 | Epoch: 2 | Step: 69730 | Dataset: 0-16328802 | Loss: 2.131 | 674 ms/step , 58294.97 GFLOP/s , 532723.6 tokens/s INFO:__main__:2024-10-27 09:06:43 | Epoch: 2 | Step: 69740 | Dataset: 0-16336802 | Loss: 2.207 | 674 ms/step , 58297.76 GFLOP/s , 532505.5 tokens/s INFO:__main__:2024-10-27 09:06:50 | Epoch: 2 | Step: 69750 | Dataset: 0-16344802 | Loss: 1.858 | 676 ms/step , 58115.19 GFLOP/s , 532540.8 tokens/s INFO:__main__:2024-10-27 09:06:58 | Epoch: 2 | Step: 69760 | Dataset: 0-16352802 | Loss: 1.717 | 675 ms/step , 58237.99 GFLOP/s , 532538.4 tokens/s INFO:__main__:2024-10-27 
09:07:06 | Epoch: 2 | Step: 69770 | Dataset: 0-16360802 | Loss: 1.705 | 675 ms/step , 58237.24 GFLOP/s , 531602.8 tokens/s INFO:__main__:2024-10-27 09:07:13 | Epoch: 2 | Step: 69780 | Dataset: 0-16368802 | Loss: 1.689 | 676 ms/step , 58147.01 GFLOP/s , 531349.9 tokens/s INFO:__main__:2024-10-27 09:07:21 | Epoch: 2 | Step: 69790 | Dataset: 0-16376802 | Loss: 1.661 | 674 ms/step , 58291.16 GFLOP/s , 531114.5 tokens/s INFO:__main__:2024-10-27 09:07:29 | Epoch: 2 | Step: 69800 | Dataset: 0-16384802 | Loss: 1.681 | 679 ms/step , 57909.01 GFLOP/s , 529476.0 tokens/s INFO:__main__:2024-10-27 09:07:37 | Epoch: 2 | Step: 69810 | Dataset: 0-16392802 | Loss: 1.675 | 675 ms/step , 58243.42 GFLOP/s , 530256.7 tokens/s INFO:__main__:2024-10-27 09:07:44 | Epoch: 2 | Step: 69820 | Dataset: 0-16400802 | Loss: 1.679 | 675 ms/step , 58252.03 GFLOP/s , 527575.8 tokens/s INFO:__main__:2024-10-27 09:07:52 | Epoch: 2 | Step: 69830 | Dataset: 0-16408802 | Loss: 1.665 | 677 ms/step , 58089.93 GFLOP/s , 531446.9 tokens/s INFO:__main__:2024-10-27 09:08:00 | Epoch: 2 | Step: 69840 | Dataset: 0-16416802 | Loss: 2.139 | 677 ms/step , 58028.60 GFLOP/s , 530921.9 tokens/s INFO:__main__:2024-10-27 09:08:07 | Epoch: 2 | Step: 69850 | Dataset: 0-16424802 | Loss: 2.161 | 678 ms/step , 58019.31 GFLOP/s , 530118.8 tokens/s INFO:__main__:2024-10-27 09:08:15 | Epoch: 2 | Step: 69860 | Dataset: 0-16432802 | Loss: 2.094 | 674 ms/step , 58295.56 GFLOP/s , 532196.1 tokens/s INFO:__main__:2024-10-27 09:08:23 | Epoch: 2 | Step: 69870 | Dataset: 0-16440802 | Loss: 2.167 | 675 ms/step , 58194.24 GFLOP/s , 532408.9 tokens/s INFO:__main__:2024-10-27 09:08:31 | Epoch: 2 | Step: 69880 | Dataset: 0-16448802 | Loss: 2.171 | 676 ms/step , 58191.94 GFLOP/s , 532196.2 tokens/s INFO:__main__:2024-10-27 09:08:38 | Epoch: 2 | Step: 69890 | Dataset: 0-16456802 | Loss: 2.174 | 676 ms/step , 58148.63 GFLOP/s , 532193.5 tokens/s INFO:__main__:2024-10-27 09:08:46 | Epoch: 2 | Step: 69900 | Dataset: 0-16464802 | Loss: 2.171 | 676 
ms/step , 58175.43 GFLOP/s , 531063.8 tokens/s INFO:__main__:2024-10-27 09:08:54 | Epoch: 2 | Step: 69910 | Dataset: 0-16472802 | Loss: 2.179 | 674 ms/step , 58307.41 GFLOP/s , 532599.0 tokens/s INFO:__main__:2024-10-27 09:09:01 | Epoch: 2 | Step: 69920 | Dataset: 0-16480802 | Loss: 2.113 | 674 ms/step , 58285.58 GFLOP/s , 532923.3 tokens/s INFO:__main__:2024-10-27 09:09:09 | Epoch: 2 | Step: 69930 | Dataset: 0-16488802 | Loss: 2.091 | 674 ms/step , 58353.18 GFLOP/s , 533049.7 tokens/s INFO:__main__:2024-10-27 09:09:17 | Epoch: 2 | Step: 69940 | Dataset: 0-16496802 | Loss: 2.084 | 673 ms/step , 58388.47 GFLOP/s , 533543.8 tokens/s INFO:__main__:2024-10-27 09:09:24 | Epoch: 2 | Step: 69950 | Dataset: 0-16504802 | Loss: 2.092 | 675 ms/step , 58238.76 GFLOP/s , 533276.0 tokens/s INFO:__main__:2024-10-27 09:09:32 | Epoch: 2 | Step: 69960 | Dataset: 0-16512802 | Loss: 2.126 | 675 ms/step , 58240.21 GFLOP/s , 533293.3 tokens/s INFO:__main__:2024-10-27 09:09:40 | Epoch: 2 | Step: 69970 | Dataset: 0-16520802 | Loss: 2.120 | 675 ms/step , 58265.78 GFLOP/s , 533007.2 tokens/s INFO:__main__:2024-10-27 09:09:47 | Epoch: 2 | Step: 69980 | Dataset: 0-16528802 | Loss: 2.226 | 673 ms/step , 58369.55 GFLOP/s , 533726.6 tokens/s INFO:__main__:2024-10-27 09:09:55 | Epoch: 2 | Step: 69990 | Dataset: 0-16536802 | Loss: 2.086 | 675 ms/step , 58265.04 GFLOP/s , 533111.9 tokens/s INFO:__main__:2024-10-27 09:10:02 | Validation | Step: 70000 | Val_loss: 2.154 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 09:10:02 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_091002_step_70000.pt` INFO:__main__:2024-10-27 09:10:04 | Epoch: 2 | Step: 70000 | Dataset: 0-16544802 | Loss: 2.149 | 674 ms/step , 58365.50 GFLOP/s , 477600.4 tokens/s INFO:__main__:2024-10-27 09:10:11 | Epoch: 2 | Step: 70010 | Dataset: 0-16552802 | Loss: 2.150 | 675 ms/step , 58272.01 GFLOP/s , 533261.4 tokens/s INFO:__main__:2024-10-27 09:10:19 | Epoch: 2 | Step: 70020 | Dataset: 0-16560802 | Loss: 
2.084 | 673 ms/step , 58397.37 GFLOP/s , 534014.2 tokens/s INFO:__main__:2024-10-27 09:10:27 | Epoch: 2 | Step: 70030 | Dataset: 0-16568802 | Loss: 2.148 | 674 ms/step , 58354.90 GFLOP/s , 533833.5 tokens/s INFO:__main__:2024-10-27 09:10:34 | Epoch: 2 | Step: 70040 | Dataset: 0-16576802 | Loss: 2.031 | 675 ms/step , 58265.15 GFLOP/s , 533880.2 tokens/s INFO:__main__:2024-10-27 09:10:42 | Epoch: 2 | Step: 70050 | Dataset: 0-16584802 | Loss: 2.118 | 674 ms/step , 58317.71 GFLOP/s , 533290.1 tokens/s INFO:__main__:2024-10-27 09:10:50 | Epoch: 2 | Step: 70060 | Dataset: 0-16592802 | Loss: 2.129 | 675 ms/step , 58233.77 GFLOP/s , 533442.2 tokens/s INFO:__main__:2024-10-27 09:10:57 | Epoch: 2 | Step: 70070 | Dataset: 0-16600802 | Loss: 2.166 | 675 ms/step , 58275.08 GFLOP/s , 533135.3 tokens/s INFO:__main__:2024-10-27 09:11:05 | Epoch: 2 | Step: 70080 | Dataset: 0-16608802 | Loss: 2.014 | 674 ms/step , 58336.88 GFLOP/s , 533151.7 tokens/s INFO:__main__:2024-10-27 09:11:13 | Epoch: 2 | Step: 70090 | Dataset: 0-16616802 | Loss: 2.128 | 673 ms/step , 58382.45 GFLOP/s , 533683.8 tokens/s INFO:__main__:2024-10-27 09:11:20 | Epoch: 2 | Step: 70100 | Dataset: 0-16624802 | Loss: 2.166 | 676 ms/step , 58159.19 GFLOP/s , 533202.6 tokens/s INFO:__main__:2024-10-27 09:11:28 | Epoch: 2 | Step: 70110 | Dataset: 0-16632802 | Loss: 2.134 | 675 ms/step , 58276.93 GFLOP/s , 533431.5 tokens/s INFO:__main__:2024-10-27 09:11:36 | Epoch: 2 | Step: 70120 | Dataset: 0-16640802 | Loss: 2.143 | 675 ms/step , 58259.15 GFLOP/s , 533214.6 tokens/s INFO:__main__:2024-10-27 09:11:44 | Epoch: 2 | Step: 70130 | Dataset: 0-16648802 | Loss: 2.071 | 675 ms/step , 58198.58 GFLOP/s , 532501.4 tokens/s INFO:__main__:2024-10-27 09:11:51 | Epoch: 2 | Step: 70140 | Dataset: 0-16656802 | Loss: 2.032 | 675 ms/step , 58241.32 GFLOP/s , 533225.6 tokens/s INFO:__main__:2024-10-27 09:11:59 | Epoch: 2 | Step: 70150 | Dataset: 0-16664802 | Loss: 2.067 | 674 ms/step , 58321.27 GFLOP/s , 533102.9 tokens/s 
INFO:__main__:2024-10-27 09:12:07 | Epoch: 2 | Step: 70160 | Dataset: 0-16672802 | Loss: 1.775 | 674 ms/step , 58295.84 GFLOP/s , 532680.0 tokens/s INFO:__main__:2024-10-27 09:12:14 | Epoch: 2 | Step: 70170 | Dataset: 0-16680802 | Loss: 1.708 | 674 ms/step , 58297.28 GFLOP/s , 532980.0 tokens/s INFO:__main__:2024-10-27 09:12:22 | Epoch: 2 | Step: 70180 | Dataset: 0-16688802 | Loss: 1.672 | 675 ms/step , 58277.96 GFLOP/s , 532727.5 tokens/s INFO:__main__:2024-10-27 09:12:30 | Epoch: 2 | Step: 70190 | Dataset: 0-16696802 | Loss: 1.700 | 674 ms/step , 58280.93 GFLOP/s , 532535.5 tokens/s INFO:__main__:2024-10-27 09:12:37 | Epoch: 2 | Step: 70200 | Dataset: 0-16704802 | Loss: 1.694 | 674 ms/step , 58305.37 GFLOP/s , 532702.7 tokens/s INFO:__main__:2024-10-27 09:12:45 | Epoch: 2 | Step: 70210 | Dataset: 0-16712802 | Loss: 1.670 | 674 ms/step , 58281.59 GFLOP/s , 533016.0 tokens/s INFO:__main__:2024-10-27 09:12:53 | Epoch: 2 | Step: 70220 | Dataset: 0-16720802 | Loss: 1.670 | 674 ms/step , 58285.88 GFLOP/s , 532549.1 tokens/s INFO:__main__:2024-10-27 09:13:00 | Epoch: 2 | Step: 70230 | Dataset: 0-16728802 | Loss: 1.654 | 675 ms/step , 58248.56 GFLOP/s , 532334.8 tokens/s INFO:__main__:2024-10-27 09:13:08 | Epoch: 2 | Step: 70240 | Dataset: 0-16736802 | Loss: 2.305 | 674 ms/step , 58316.16 GFLOP/s , 532528.8 tokens/s INFO:__main__:2024-10-27 09:13:16 | Epoch: 2 | Step: 70250 | Dataset: 0-16744802 | Loss: 2.100 | 675 ms/step , 58271.11 GFLOP/s , 533062.5 tokens/s INFO:__main__:2024-10-27 09:13:23 | Epoch: 2 | Step: 70260 | Dataset: 0-16752802 | Loss: 2.231 | 676 ms/step , 58173.62 GFLOP/s , 532974.7 tokens/s INFO:__main__:2024-10-27 09:13:31 | Epoch: 2 | Step: 70270 | Dataset: 0-16760802 | Loss: 2.175 | 676 ms/step , 58172.57 GFLOP/s , 533011.7 tokens/s INFO:__main__:2024-10-27 09:13:39 | Epoch: 2 | Step: 70280 | Dataset: 0-16768802 | Loss: 2.191 | 675 ms/step , 58268.38 GFLOP/s , 532686.1 tokens/s INFO:__main__:2024-10-27 09:13:47 | Epoch: 2 | Step: 70290 | Dataset: 
0-16776802 | Loss: 2.198 | 675 ms/step , 58273.29 GFLOP/s , 532817.4 tokens/s INFO:__main__:2024-10-27 09:13:54 | Epoch: 2 | Step: 70300 | Dataset: 0-16784802 | Loss: 2.174 | 675 ms/step , 58199.01 GFLOP/s , 532886.2 tokens/s INFO:__main__:2024-10-27 09:14:02 | Epoch: 2 | Step: 70310 | Dataset: 0-16792802 | Loss: 2.058 | 677 ms/step , 58092.76 GFLOP/s , 532614.0 tokens/s INFO:__main__:2024-10-27 09:14:10 | Epoch: 2 | Step: 70320 | Dataset: 0-16800802 | Loss: 2.085 | 675 ms/step , 58219.52 GFLOP/s , 532675.6 tokens/s INFO:__main__:2024-10-27 09:14:17 | Epoch: 2 | Step: 70330 | Dataset: 0-16808802 | Loss: 2.158 | 675 ms/step , 58254.27 GFLOP/s , 532796.5 tokens/s INFO:__main__:2024-10-27 09:14:25 | Epoch: 2 | Step: 70340 | Dataset: 0-16816802 | Loss: 2.216 | 674 ms/step , 58279.81 GFLOP/s , 532894.4 tokens/s INFO:__main__:2024-10-27 09:14:33 | Epoch: 2 | Step: 70350 | Dataset: 0-16824802 | Loss: 2.107 | 675 ms/step , 58249.03 GFLOP/s , 531899.6 tokens/s INFO:__main__:2024-10-27 09:14:40 | Epoch: 2 | Step: 70360 | Dataset: 0-16832802 | Loss: 2.049 | 675 ms/step , 58195.11 GFLOP/s , 532620.0 tokens/s INFO:__main__:2024-10-27 09:14:48 | Epoch: 2 | Step: 70370 | Dataset: 0-16840802 | Loss: 2.020 | 675 ms/step , 58271.23 GFLOP/s , 532670.8 tokens/s INFO:__main__:2024-10-27 09:14:56 | Epoch: 2 | Step: 70380 | Dataset: 0-16848802 | Loss: 2.148 | 675 ms/step , 58256.55 GFLOP/s , 532901.0 tokens/s INFO:__main__:2024-10-27 09:15:03 | Epoch: 2 | Step: 70390 | Dataset: 0-16856802 | Loss: 2.122 | 676 ms/step , 58127.01 GFLOP/s , 532768.2 tokens/s INFO:__main__:2024-10-27 09:15:11 | Epoch: 2 | Step: 70400 | Dataset: 0-16864802 | Loss: 2.006 | 675 ms/step , 58200.68 GFLOP/s , 532635.1 tokens/s INFO:__main__:2024-10-27 09:15:19 | Epoch: 2 | Step: 70410 | Dataset: 0-16872802 | Loss: 1.731 | 676 ms/step , 58144.01 GFLOP/s , 531832.1 tokens/s INFO:__main__:2024-10-27 09:15:26 | Epoch: 2 | Step: 70420 | Dataset: 0-16880802 | Loss: 1.647 | 674 ms/step , 58299.23 GFLOP/s , 532076.6 
tokens/s INFO:__main__:2024-10-27 09:15:34 | Epoch: 2 | Step: 70430 | Dataset: 0-16888802 | Loss: 1.664 | 676 ms/step , 58174.32 GFLOP/s , 531979.7 tokens/s INFO:__main__:2024-10-27 09:15:42 | Epoch: 2 | Step: 70440 | Dataset: 0-16896802 | Loss: 1.643 | 676 ms/step , 58160.70 GFLOP/s , 531852.2 tokens/s INFO:__main__:2024-10-27 09:15:50 | Epoch: 2 | Step: 70450 | Dataset: 0-16904802 | Loss: 1.669 | 676 ms/step , 58172.21 GFLOP/s , 531960.6 tokens/s INFO:__main__:2024-10-27 09:15:57 | Epoch: 2 | Step: 70460 | Dataset: 0-16912802 | Loss: 1.651 | 675 ms/step , 58247.57 GFLOP/s , 531777.5 tokens/s INFO:__main__:2024-10-27 09:16:05 | Epoch: 2 | Step: 70470 | Dataset: 0-16920802 | Loss: 1.646 | 675 ms/step , 58229.88 GFLOP/s , 532201.5 tokens/s INFO:__main__:2024-10-27 09:16:13 | Epoch: 2 | Step: 70480 | Dataset: 0-16928802 | Loss: 1.639 | 676 ms/step , 58189.61 GFLOP/s , 532104.1 tokens/s INFO:__main__:2024-10-27 09:16:20 | Epoch: 2 | Step: 70490 | Dataset: 0-16936802 | Loss: 2.327 | 673 ms/step , 58366.46 GFLOP/s , 532376.8 tokens/s INFO:__main__:2024-10-27 09:16:28 | Epoch: 2 | Step: 70500 | Dataset: 0-16944802 | Loss: 2.322 | 675 ms/step , 58273.35 GFLOP/s , 532870.9 tokens/s INFO:__main__:2024-10-27 09:16:36 | Epoch: 2 | Step: 70510 | Dataset: 0-16952802 | Loss: 2.264 | 674 ms/step , 58326.53 GFLOP/s , 533322.0 tokens/s INFO:__main__:2024-10-27 09:16:43 | Epoch: 2 | Step: 70520 | Dataset: 0-16960802 | Loss: 2.197 | 676 ms/step , 58179.73 GFLOP/s , 532940.1 tokens/s INFO:__main__:2024-10-27 09:16:51 | Epoch: 2 | Step: 70530 | Dataset: 0-16968802 | Loss: 2.197 | 675 ms/step , 58275.72 GFLOP/s , 532589.3 tokens/s INFO:__main__:2024-10-27 09:16:59 | Epoch: 2 | Step: 70540 | Dataset: 0-16976802 | Loss: 2.184 | 675 ms/step , 58258.21 GFLOP/s , 532907.9 tokens/s INFO:__main__:2024-10-27 09:17:07 | Epoch: 2 | Step: 70550 | Dataset: 0-16984802 | Loss: 2.190 | 673 ms/step , 58366.36 GFLOP/s , 532230.0 tokens/s INFO:__main__:2024-10-27 09:17:14 | Epoch: 2 | Step: 70560 | 
Dataset: 0-16992802 | Loss: 2.237 | 675 ms/step , 58209.86 GFLOP/s , 532818.7 tokens/s INFO:__main__:2024-10-27 09:17:22 | Epoch: 2 | Step: 70570 | Dataset: 0-17000802 | Loss: 2.143 | 674 ms/step , 58323.75 GFLOP/s , 532867.4 tokens/s INFO:__main__:2024-10-27 09:17:30 | Epoch: 2 | Step: 70580 | Dataset: 0-17008802 | Loss: 2.212 | 674 ms/step , 58292.40 GFLOP/s , 533776.8 tokens/s INFO:__main__:2024-10-27 09:17:37 | Epoch: 2 | Step: 70590 | Dataset: 0-17016802 | Loss: 2.227 | 674 ms/step , 58312.88 GFLOP/s , 533036.0 tokens/s INFO:__main__:2024-10-27 09:17:45 | Epoch: 2 | Step: 70600 | Dataset: 0-17024802 | Loss: 2.206 | 674 ms/step , 58364.55 GFLOP/s , 533729.5 tokens/s INFO:__main__:2024-10-27 09:17:53 | Epoch: 2 | Step: 70610 | Dataset: 0-17032802 | Loss: 2.218 | 675 ms/step , 58248.60 GFLOP/s , 533040.7 tokens/s INFO:__main__:2024-10-27 09:18:00 | Epoch: 2 | Step: 70620 | Dataset: 0-17040802 | Loss: 2.100 | 673 ms/step , 58411.28 GFLOP/s , 533393.8 tokens/s INFO:__main__:2024-10-27 09:18:08 | Epoch: 2 | Step: 70630 | Dataset: 0-17048802 | Loss: 2.200 | 674 ms/step , 58345.34 GFLOP/s , 533823.8 tokens/s INFO:__main__:2024-10-27 09:18:16 | Epoch: 2 | Step: 70640 | Dataset: 0-17056802 | Loss: 2.152 | 674 ms/step , 58356.35 GFLOP/s , 533349.3 tokens/s INFO:__main__:2024-10-27 09:18:23 | Epoch: 2 | Step: 70650 | Dataset: 0-17064802 | Loss: 2.257 | 674 ms/step , 58326.06 GFLOP/s , 533414.1 tokens/s INFO:__main__:2024-10-27 09:18:31 | Epoch: 2 | Step: 70660 | Dataset: 0-17072802 | Loss: 2.188 | 676 ms/step , 58141.56 GFLOP/s , 533136.5 tokens/s INFO:__main__:2024-10-27 09:18:39 | Epoch: 2 | Step: 70670 | Dataset: 0-17080802 | Loss: 2.220 | 675 ms/step , 58276.24 GFLOP/s , 533075.9 tokens/s INFO:__main__:2024-10-27 09:18:46 | Epoch: 2 | Step: 70680 | Dataset: 0-17088802 | Loss: 2.197 | 673 ms/step , 58406.18 GFLOP/s , 533383.0 tokens/s INFO:__main__:2024-10-27 09:18:54 | Epoch: 2 | Step: 70690 | Dataset: 0-17096802 | Loss: 2.251 | 674 ms/step , 58297.70 GFLOP/s , 
533319.0 tokens/s INFO:__main__:2024-10-27 09:19:02 | Epoch: 2 | Step: 70700 | Dataset: 0-17104802 | Loss: 2.174 | 675 ms/step , 58272.75 GFLOP/s , 533197.1 tokens/s INFO:__main__:2024-10-27 09:19:09 | Epoch: 2 | Step: 70710 | Dataset: 0-17112802 | Loss: 2.160 | 675 ms/step , 58258.96 GFLOP/s , 533241.1 tokens/s INFO:__main__:2024-10-27 09:19:17 | Epoch: 2 | Step: 70720 | Dataset: 0-17120802 | Loss: 2.166 | 674 ms/step , 58326.52 GFLOP/s , 533307.9 tokens/s INFO:__main__:2024-10-27 09:19:25 | Epoch: 2 | Step: 70730 | Dataset: 0-17128802 | Loss: 2.160 | 673 ms/step , 58388.96 GFLOP/s , 533806.1 tokens/s INFO:__main__:2024-10-27 09:19:32 | Epoch: 2 | Step: 70740 | Dataset: 0-17136802 | Loss: 2.206 | 674 ms/step , 58327.65 GFLOP/s , 533477.6 tokens/s INFO:__main__:2024-10-27 09:19:40 | Epoch: 2 | Step: 70750 | Dataset: 0-17144802 | Loss: 2.248 | 676 ms/step , 58170.26 GFLOP/s , 532578.2 tokens/s INFO:__main__:2024-10-27 09:19:48 | Epoch: 2 | Step: 70760 | Dataset: 0-17152802 | Loss: 2.137 | 673 ms/step , 58405.25 GFLOP/s , 533534.7 tokens/s INFO:__main__:2024-10-27 09:19:55 | Epoch: 2 | Step: 70770 | Dataset: 0-17160802 | Loss: 2.154 | 675 ms/step , 58250.83 GFLOP/s , 533114.1 tokens/s INFO:__main__:2024-10-27 09:20:03 | Epoch: 2 | Step: 70780 | Dataset: 0-17168802 | Loss: 2.168 | 673 ms/step , 58388.86 GFLOP/s , 533174.3 tokens/s INFO:__main__:2024-10-27 09:20:11 | Epoch: 2 | Step: 70790 | Dataset: 0-17176802 | Loss: 2.228 | 675 ms/step , 58218.80 GFLOP/s , 533149.4 tokens/s INFO:__main__:2024-10-27 09:20:19 | Epoch: 2 | Step: 70800 | Dataset: 0-17184802 | Loss: 2.204 | 674 ms/step , 58326.71 GFLOP/s , 533666.1 tokens/s INFO:__main__:2024-10-27 09:20:26 | Epoch: 2 | Step: 70810 | Dataset: 0-17192802 | Loss: 1.810 | 676 ms/step , 58185.33 GFLOP/s , 533473.6 tokens/s INFO:__main__:2024-10-27 09:20:34 | Epoch: 2 | Step: 70820 | Dataset: 0-17200802 | Loss: 1.710 | 674 ms/step , 58312.79 GFLOP/s , 532768.0 tokens/s INFO:__main__:2024-10-27 09:20:42 | Epoch: 2 | Step: 
70830 | Dataset: 0-17208802 | Loss: 1.700 | 675 ms/step , 58220.53 GFLOP/s , 533136.3 tokens/s INFO:__main__:2024-10-27 09:20:49 | Epoch: 2 | Step: 70840 | Dataset: 0-17216802 | Loss: 1.641 | 674 ms/step , 58331.41 GFLOP/s , 533100.7 tokens/s INFO:__main__:2024-10-27 09:20:57 | Epoch: 2 | Step: 70850 | Dataset: 0-17224802 | Loss: 1.657 | 675 ms/step , 58235.58 GFLOP/s , 533121.5 tokens/s INFO:__main__:2024-10-27 09:21:05 | Epoch: 2 | Step: 70860 | Dataset: 0-17232802 | Loss: 1.656 | 674 ms/step , 58328.02 GFLOP/s , 532724.9 tokens/s INFO:__main__:2024-10-27 09:21:12 | Epoch: 2 | Step: 70870 | Dataset: 0-17240802 | Loss: 1.660 | 673 ms/step , 58390.70 GFLOP/s , 533228.3 tokens/s INFO:__main__:2024-10-27 09:21:20 | Epoch: 2 | Step: 70880 | Dataset: 0-17248802 | Loss: 1.633 | 674 ms/step , 58292.54 GFLOP/s , 533556.7 tokens/s INFO:__main__:2024-10-27 09:21:28 | Epoch: 2 | Step: 70890 | Dataset: 0-17256802 | Loss: 1.684 | 674 ms/step , 58301.14 GFLOP/s , 532904.0 tokens/s INFO:__main__:2024-10-27 09:21:35 | Epoch: 2 | Step: 70900 | Dataset: 0-17264802 | Loss: 2.284 | 674 ms/step , 58360.48 GFLOP/s , 532914.5 tokens/s INFO:__main__:2024-10-27 09:21:43 | Epoch: 2 | Step: 70910 | Dataset: 0-17272802 | Loss: 2.148 | 674 ms/step , 58349.46 GFLOP/s , 533659.4 tokens/s INFO:__main__:2024-10-27 09:21:51 | Epoch: 2 | Step: 70920 | Dataset: 0-17280802 | Loss: 2.212 | 675 ms/step , 58228.93 GFLOP/s , 533288.7 tokens/s INFO:__main__:2024-10-27 09:21:58 | Epoch: 2 | Step: 70930 | Dataset: 0-17288802 | Loss: 2.217 | 675 ms/step , 58265.34 GFLOP/s , 533267.7 tokens/s INFO:__main__:2024-10-27 09:22:06 | Epoch: 2 | Step: 70940 | Dataset: 0-17296802 | Loss: 2.195 | 673 ms/step , 58401.91 GFLOP/s , 533575.9 tokens/s INFO:__main__:2024-10-27 09:22:14 | Epoch: 2 | Step: 70950 | Dataset: 0-17304802 | Loss: 2.146 | 675 ms/step , 58257.20 GFLOP/s , 533475.9 tokens/s INFO:__main__:2024-10-27 09:22:21 | Epoch: 2 | Step: 70960 | Dataset: 0-17312802 | Loss: 2.187 | 676 ms/step , 58129.91 GFLOP/s 
, 532671.5 tokens/s INFO:__main__:2024-10-27 09:22:29 | Epoch: 2 | Step: 70970 | Dataset: 0-17320802 | Loss: 2.161 | 675 ms/step , 58245.44 GFLOP/s , 532352.1 tokens/s INFO:__main__:2024-10-27 09:22:37 | Epoch: 2 | Step: 70980 | Dataset: 0-17328802 | Loss: 2.220 | 675 ms/step , 58251.65 GFLOP/s , 532635.0 tokens/s INFO:__main__:2024-10-27 09:22:45 | Epoch: 2 | Step: 70990 | Dataset: 0-17336802 | Loss: 2.164 | 674 ms/step , 58286.95 GFLOP/s , 532710.2 tokens/s INFO:__main__:2024-10-27 09:22:52 | Validation | Step: 71000 | Val_loss: 2.173 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 09:22:52 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_092252_step_71000.pt` INFO:__main__:2024-10-27 09:22:53 | Epoch: 2 | Step: 71000 | Dataset: 0-17344802 | Loss: 2.273 | 674 ms/step , 58353.92 GFLOP/s , 480414.1 tokens/s INFO:__main__:2024-10-27 09:23:01 | Epoch: 2 | Step: 71010 | Dataset: 0-17352802 | Loss: 2.220 | 675 ms/step , 58235.34 GFLOP/s , 532341.8 tokens/s INFO:__main__:2024-10-27 09:23:08 | Epoch: 2 | Step: 71020 | Dataset: 0-17360802 | Loss: 2.214 | 676 ms/step , 58138.95 GFLOP/s , 533314.0 tokens/s INFO:__main__:2024-10-27 09:23:16 | Epoch: 2 | Step: 71030 | Dataset: 0-17368802 | Loss: 2.123 | 675 ms/step , 58258.42 GFLOP/s , 533272.1 tokens/s INFO:__main__:2024-10-27 09:23:24 | Epoch: 2 | Step: 71040 | Dataset: 0-17376802 | Loss: 2.186 | 676 ms/step , 58167.97 GFLOP/s , 531725.5 tokens/s INFO:__main__:2024-10-27 09:23:32 | Epoch: 2 | Step: 71050 | Dataset: 0-17384802 | Loss: 2.093 | 678 ms/step , 57997.51 GFLOP/s , 531233.8 tokens/s INFO:__main__:2024-10-27 09:23:39 | Epoch: 2 | Step: 71060 | Dataset: 0-17392802 | Loss: 2.391 | 675 ms/step , 58229.45 GFLOP/s , 531737.0 tokens/s INFO:__main__:2024-10-27 09:23:47 | Epoch: 2 | Step: 71070 | Dataset: 0-17400802 | Loss: 2.297 | 676 ms/step , 58177.56 GFLOP/s , 532714.3 tokens/s INFO:__main__:2024-10-27 09:23:55 | Epoch: 2 | Step: 71080 | Dataset: 0-17408802 | Loss: 2.214 | 677 ms/step , 58096.64 
GFLOP/s , 532152.2 tokens/s INFO:__main__:2024-10-27 09:24:02 | Epoch: 2 | Step: 71090 | Dataset: 0-17416802 | Loss: 2.168 | 676 ms/step , 58118.89 GFLOP/s , 532320.0 tokens/s INFO:__main__:2024-10-27 09:24:10 | Epoch: 2 | Step: 71100 | Dataset: 0-17424802 | Loss: 2.148 | 676 ms/step , 58189.07 GFLOP/s , 532433.7 tokens/s INFO:__main__:2024-10-27 09:24:18 | Epoch: 2 | Step: 71110 | Dataset: 0-17432802 | Loss: 2.157 | 674 ms/step , 58302.39 GFLOP/s , 532592.8 tokens/s INFO:__main__:2024-10-27 09:24:25 | Epoch: 2 | Step: 71120 | Dataset: 0-17440802 | Loss: 2.191 | 676 ms/step , 58185.53 GFLOP/s , 532944.2 tokens/s INFO:__main__:2024-10-27 09:24:33 | Epoch: 2 | Step: 71130 | Dataset: 0-17448802 | Loss: 2.102 | 675 ms/step , 58261.78 GFLOP/s , 533017.4 tokens/s INFO:__main__:2024-10-27 09:24:41 | Epoch: 2 | Step: 71140 | Dataset: 0-17456802 | Loss: 2.108 | 675 ms/step , 58224.55 GFLOP/s , 532613.0 tokens/s INFO:__main__:2024-10-27 09:24:48 | Epoch: 2 | Step: 71150 | Dataset: 0-17464802 | Loss: 2.116 | 677 ms/step , 58102.03 GFLOP/s , 532514.8 tokens/s INFO:__main__:2024-10-27 09:24:56 | Epoch: 2 | Step: 71160 | Dataset: 0-17472802 | Loss: 2.101 | 676 ms/step , 58125.39 GFLOP/s , 531602.8 tokens/s INFO:__main__:2024-10-27 09:25:04 | Epoch: 2 | Step: 71170 | Dataset: 0-17480802 | Loss: 2.108 | 677 ms/step , 58102.47 GFLOP/s , 532422.7 tokens/s INFO:__main__:2024-10-27 09:25:12 | Epoch: 2 | Step: 71180 | Dataset: 0-17488802 | Loss: 2.090 | 675 ms/step , 58242.09 GFLOP/s , 532554.3 tokens/s INFO:__main__:2024-10-27 09:25:19 | Epoch: 2 | Step: 71190 | Dataset: 0-17496802 | Loss: 2.072 | 675 ms/step , 58236.55 GFLOP/s , 532633.3 tokens/s INFO:__main__:2024-10-27 09:25:27 | Epoch: 2 | Step: 71200 | Dataset: 0-17504802 | Loss: 2.130 | 676 ms/step , 58175.72 GFLOP/s , 532466.4 tokens/s INFO:__main__:2024-10-27 09:25:35 | Epoch: 2 | Step: 71210 | Dataset: 0-17512802 | Loss: 2.087 | 676 ms/step , 58167.13 GFLOP/s , 530992.4 tokens/s INFO:__main__:2024-10-27 09:25:42 | Epoch: 2 | 
Step: 71220 | Dataset: 0-17520802 | Loss: 2.250 | 676 ms/step , 58168.36 GFLOP/s , 530882.5 tokens/s INFO:__main__:2024-10-27 09:25:50 | Epoch: 2 | Step: 71230 | Dataset: 0-17528802 | Loss: 2.149 | 675 ms/step , 58199.29 GFLOP/s , 531771.4 tokens/s INFO:__main__:2024-10-27 09:25:58 | Epoch: 2 | Step: 71240 | Dataset: 0-17536802 | Loss: 2.246 | 677 ms/step , 58105.23 GFLOP/s , 531136.7 tokens/s INFO:__main__:2024-10-27 09:26:05 | Epoch: 2 | Step: 71250 | Dataset: 0-17544802 | Loss: 2.142 | 675 ms/step , 58262.47 GFLOP/s , 531800.8 tokens/s INFO:__main__:2024-10-27 09:26:13 | Epoch: 2 | Step: 71260 | Dataset: 0-17552802 | Loss: 2.120 | 675 ms/step , 58215.20 GFLOP/s , 532074.5 tokens/s INFO:__main__:2024-10-27 09:26:21 | Epoch: 2 | Step: 71270 | Dataset: 0-17560802 | Loss: 2.212 | 676 ms/step , 58132.52 GFLOP/s , 530788.3 tokens/s INFO:__main__:2024-10-27 09:26:29 | Epoch: 2 | Step: 71280 | Dataset: 0-17568802 | Loss: 2.098 | 676 ms/step , 58170.34 GFLOP/s , 529530.3 tokens/s INFO:__main__:2024-10-27 09:26:36 | Epoch: 2 | Step: 71290 | Dataset: 0-17576802 | Loss: 2.173 | 675 ms/step , 58257.73 GFLOP/s , 533345.0 tokens/s INFO:__main__:2024-10-27 09:26:44 | Epoch: 2 | Step: 71300 | Dataset: 0-17584802 | Loss: 2.102 | 675 ms/step , 58209.11 GFLOP/s , 532905.0 tokens/s INFO:__main__:2024-10-27 09:26:52 | Epoch: 2 | Step: 71310 | Dataset: 0-17592802 | Loss: 2.125 | 674 ms/step , 58347.91 GFLOP/s , 532866.1 tokens/s INFO:__main__:2024-10-27 09:26:59 | Epoch: 2 | Step: 71320 | Dataset: 0-17600802 | Loss: 2.180 | 674 ms/step , 58347.51 GFLOP/s , 533132.1 tokens/s INFO:__main__:2024-10-27 09:27:07 | Epoch: 2 | Step: 71330 | Dataset: 0-17608802 | Loss: 2.007 | 677 ms/step , 58031.84 GFLOP/s , 532370.7 tokens/s INFO:__main__:2024-10-27 09:27:15 | Epoch: 2 | Step: 71340 | Dataset: 0-17616802 | Loss: 2.154 | 674 ms/step , 58315.16 GFLOP/s , 532195.0 tokens/s INFO:__main__:2024-10-27 09:27:22 | Epoch: 2 | Step: 71350 | Dataset: 0-17624802 | Loss: 2.134 | 673 ms/step , 58430.93 
GFLOP/s , 533762.4 tokens/s INFO:__main__:2024-10-27 09:27:30 | Epoch: 2 | Step: 71360 | Dataset: 0-17632802 | Loss: 2.114 | 674 ms/step , 58358.10 GFLOP/s , 533990.0 tokens/s INFO:__main__:2024-10-27 09:27:38 | Epoch: 2 | Step: 71370 | Dataset: 0-17640802 | Loss: 2.092 | 674 ms/step , 58290.63 GFLOP/s , 533116.3 tokens/s INFO:__main__:2024-10-27 09:27:45 | Epoch: 2 | Step: 71380 | Dataset: 0-17648802 | Loss: 2.213 | 674 ms/step , 58294.09 GFLOP/s , 532882.8 tokens/s INFO:__main__:2024-10-27 09:27:53 | Epoch: 2 | Step: 71390 | Dataset: 0-17656802 | Loss: 2.175 | 675 ms/step , 58211.70 GFLOP/s , 532753.9 tokens/s INFO:__main__:2024-10-27 09:28:01 | Epoch: 2 | Step: 71400 | Dataset: 0-17664802 | Loss: 2.223 | 674 ms/step , 58315.95 GFLOP/s , 533595.1 tokens/s INFO:__main__:2024-10-27 09:28:08 | Epoch: 2 | Step: 71410 | Dataset: 0-17672802 | Loss: 2.185 | 674 ms/step , 58342.94 GFLOP/s , 533148.1 tokens/s INFO:__main__:2024-10-27 09:28:16 | Epoch: 2 | Step: 71420 | Dataset: 0-17680802 | Loss: 2.178 | 674 ms/step , 58293.37 GFLOP/s , 532906.0 tokens/s INFO:__main__:2024-10-27 09:28:24 | Epoch: 2 | Step: 71430 | Dataset: 0-17688802 | Loss: 2.130 | 675 ms/step , 58270.18 GFLOP/s , 533064.7 tokens/s INFO:__main__:2024-10-27 09:28:32 | Epoch: 2 | Step: 71440 | Dataset: 0-17696802 | Loss: 2.236 | 674 ms/step , 58316.74 GFLOP/s , 533103.2 tokens/s INFO:__main__:2024-10-27 09:28:39 | Epoch: 2 | Step: 71450 | Dataset: 0-17704802 | Loss: 2.224 | 674 ms/step , 58318.24 GFLOP/s , 533536.6 tokens/s INFO:__main__:2024-10-27 09:28:47 | Epoch: 2 | Step: 71460 | Dataset: 0-17712802 | Loss: 2.187 | 673 ms/step , 58382.55 GFLOP/s , 533439.6 tokens/s INFO:__main__:2024-10-27 09:28:55 | Epoch: 2 | Step: 71470 | Dataset: 0-17720802 | Loss: 2.231 | 674 ms/step , 58333.77 GFLOP/s , 533654.3 tokens/s INFO:__main__:2024-10-27 09:29:02 | Epoch: 2 | Step: 71480 | Dataset: 0-17728802 | Loss: 2.168 | 674 ms/step , 58308.45 GFLOP/s , 533399.5 tokens/s INFO:__main__:2024-10-27 09:29:10 | Epoch: 2 | 
Step: 71490 | Dataset: 0-17736802 | Loss: 2.052 | 674 ms/step , 58350.84 GFLOP/s , 533631.8 tokens/s INFO:__main__:2024-10-27 09:29:18 | Epoch: 2 | Step: 71500 | Dataset: 0-17744802 | Loss: 2.154 | 673 ms/step , 58384.21 GFLOP/s , 533294.8 tokens/s INFO:__main__:2024-10-27 09:29:25 | Epoch: 2 | Step: 71510 | Dataset: 0-17752802 | Loss: 2.182 | 673 ms/step , 58384.87 GFLOP/s , 532643.2 tokens/s INFO:__main__:2024-10-27 09:29:33 | Epoch: 2 | Step: 71520 | Dataset: 0-17760802 | Loss: 2.165 | 674 ms/step , 58329.95 GFLOP/s , 533458.3 tokens/s INFO:__main__:2024-10-27 09:29:41 | Epoch: 2 | Step: 71530 | Dataset: 0-17768802 | Loss: 2.170 | 675 ms/step , 58235.71 GFLOP/s , 533207.8 tokens/s INFO:__main__:2024-10-27 09:29:48 | Epoch: 2 | Step: 71540 | Dataset: 0-17776802 | Loss: 2.218 | 675 ms/step , 58248.21 GFLOP/s , 533408.6 tokens/s INFO:__main__:2024-10-27 09:29:56 | Epoch: 2 | Step: 71550 | Dataset: 0-17784802 | Loss: 2.215 | 675 ms/step , 58277.39 GFLOP/s , 533075.3 tokens/s INFO:__main__:2024-10-27 09:30:04 | Epoch: 2 | Step: 71560 | Dataset: 0-17792802 | Loss: 2.212 | 674 ms/step , 58296.72 GFLOP/s , 533472.7 tokens/s INFO:__main__:2024-10-27 09:30:11 | Epoch: 2 | Step: 71570 | Dataset: 0-17800802 | Loss: 2.217 | 673 ms/step , 58402.25 GFLOP/s , 533851.9 tokens/s INFO:__main__:2024-10-27 09:30:19 | Epoch: 2 | Step: 71580 | Dataset: 0-17808802 | Loss: 2.211 | 673 ms/step , 58442.40 GFLOP/s , 534197.4 tokens/s INFO:__main__:2024-10-27 09:30:27 | Epoch: 2 | Step: 71590 | Dataset: 0-17816802 | Loss: 2.114 | 674 ms/step , 58313.71 GFLOP/s , 533707.6 tokens/s INFO:__main__:2024-10-27 09:30:34 | Epoch: 2 | Step: 71600 | Dataset: 0-17824802 | Loss: 2.178 | 673 ms/step , 58405.57 GFLOP/s , 533776.0 tokens/s INFO:__main__:2024-10-27 09:30:42 | Epoch: 2 | Step: 71610 | Dataset: 0-17832802 | Loss: 2.249 | 675 ms/step , 58275.91 GFLOP/s , 533454.4 tokens/s INFO:__main__:2024-10-27 09:30:50 | Epoch: 2 | Step: 71620 | Dataset: 0-17840802 | Loss: 2.211 | 674 ms/step , 58314.99 
GFLOP/s , 533419.1 tokens/s INFO:__main__:2024-10-27 09:30:57 | Epoch: 2 | Step: 71630 | Dataset: 0-17848802 | Loss: 2.169 | 674 ms/step , 58285.04 GFLOP/s , 533068.1 tokens/s INFO:__main__:2024-10-27 09:31:05 | Epoch: 2 | Step: 71640 | Dataset: 0-17856802 | Loss: 2.171 | 675 ms/step , 58220.23 GFLOP/s , 533097.3 tokens/s INFO:__main__:2024-10-27 09:31:13 | Epoch: 2 | Step: 71650 | Dataset: 0-17864802 | Loss: 2.169 | 675 ms/step , 58269.44 GFLOP/s , 533378.6 tokens/s INFO:__main__:2024-10-27 09:31:20 | Epoch: 2 | Step: 71660 | Dataset: 0-17872802 | Loss: 2.168 | 675 ms/step , 58243.31 GFLOP/s , 533020.9 tokens/s INFO:__main__:2024-10-27 09:31:28 | Epoch: 2 | Step: 71670 | Dataset: 0-17880802 | Loss: 2.132 | 673 ms/step , 58410.08 GFLOP/s , 533442.9 tokens/s INFO:__main__:2024-10-27 09:31:36 | Epoch: 2 | Step: 71680 | Dataset: 0-17888802 | Loss: 2.138 | 675 ms/step , 58228.19 GFLOP/s , 532847.6 tokens/s INFO:__main__:2024-10-27 09:31:44 | Epoch: 2 | Step: 71690 | Dataset: 0-17896802 | Loss: 2.229 | 675 ms/step , 58245.85 GFLOP/s , 533267.4 tokens/s INFO:__main__:2024-10-27 09:31:51 | Epoch: 2 | Step: 71700 | Dataset: 0-17904802 | Loss: 2.103 | 674 ms/step , 58285.71 GFLOP/s , 532650.9 tokens/s INFO:__main__:2024-10-27 09:31:59 | Epoch: 2 | Step: 71710 | Dataset: 0-17912802 | Loss: 1.801 | 674 ms/step , 58301.53 GFLOP/s , 532425.0 tokens/s INFO:__main__:2024-10-27 09:32:07 | Epoch: 2 | Step: 71720 | Dataset: 0-17920802 | Loss: 1.737 | 676 ms/step , 58180.67 GFLOP/s , 532166.5 tokens/s INFO:__main__:2024-10-27 09:32:14 | Epoch: 2 | Step: 71730 | Dataset: 0-17928802 | Loss: 1.696 | 675 ms/step , 58203.43 GFLOP/s , 532221.9 tokens/s INFO:__main__:2024-10-27 09:32:22 | Epoch: 2 | Step: 71740 | Dataset: 0-17936802 | Loss: 1.690 | 675 ms/step , 58234.47 GFLOP/s , 532133.8 tokens/s INFO:__main__:2024-10-27 09:32:30 | Epoch: 2 | Step: 71750 | Dataset: 0-17944802 | Loss: 1.686 | 675 ms/step , 58215.24 GFLOP/s , 531697.1 tokens/s INFO:__main__:2024-10-27 09:32:37 | Epoch: 2 | 
Step: 71760 | Dataset: 0-17952802 | Loss: 1.662 | 674 ms/step , 58336.82 GFLOP/s , 532091.1 tokens/s INFO:__main__:2024-10-27 09:32:45 | Epoch: 2 | Step: 71770 | Dataset: 0-17960802 | Loss: 1.667 | 676 ms/step , 58147.87 GFLOP/s , 532319.3 tokens/s INFO:__main__:2024-10-27 09:32:53 | Epoch: 2 | Step: 71780 | Dataset: 0-17968802 | Loss: 1.687 | 675 ms/step , 58220.18 GFLOP/s , 532374.4 tokens/s INFO:__main__:2024-10-27 09:33:00 | Epoch: 2 | Step: 71790 | Dataset: 0-17976802 | Loss: 1.667 | 676 ms/step , 58170.47 GFLOP/s , 531963.5 tokens/s INFO:__main__:2024-10-27 09:33:08 | Epoch: 2 | Step: 71800 | Dataset: 0-17984802 | Loss: 1.679 | 676 ms/step , 58179.69 GFLOP/s , 531997.6 tokens/s INFO:__main__:2024-10-27 09:33:16 | Epoch: 2 | Step: 71810 | Dataset: 0-17992802 | Loss: 1.646 | 676 ms/step , 58168.20 GFLOP/s , 531727.4 tokens/s INFO:__main__:2024-10-27 09:33:24 | Epoch: 2 | Step: 71820 | Dataset: 0-18000802 | Loss: 1.650 | 675 ms/step , 58228.90 GFLOP/s , 531729.4 tokens/s INFO:__main__:2024-10-27 09:33:31 | Epoch: 2 | Step: 71830 | Dataset: 0-18008802 | Loss: 1.655 | 675 ms/step , 58194.90 GFLOP/s , 531592.8 tokens/s INFO:__main__:2024-10-27 09:33:39 | Epoch: 2 | Step: 71840 | Dataset: 0-18016802 | Loss: 1.624 | 675 ms/step , 58214.91 GFLOP/s , 531564.8 tokens/s INFO:__main__:2024-10-27 09:33:47 | Epoch: 2 | Step: 71850 | Dataset: 0-18024802 | Loss: 1.623 | 675 ms/step , 58219.75 GFLOP/s , 532035.1 tokens/s INFO:__main__:2024-10-27 09:33:54 | Epoch: 2 | Step: 71860 | Dataset: 0-18032802 | Loss: 1.664 | 678 ms/step , 57964.01 GFLOP/s , 531702.3 tokens/s INFO:__main__:2024-10-27 09:34:02 | Epoch: 2 | Step: 71870 | Dataset: 0-18040802 | Loss: 1.629 | 675 ms/step , 58226.78 GFLOP/s , 531801.6 tokens/s INFO:__main__:2024-10-27 09:34:10 | Epoch: 2 | Step: 71880 | Dataset: 0-18048802 | Loss: 2.228 | 677 ms/step , 58090.82 GFLOP/s , 532223.1 tokens/s INFO:__main__:2024-10-27 09:34:17 | Epoch: 2 | Step: 71890 | Dataset: 0-18056802 | Loss: 2.218 | 676 ms/step , 58155.86 
GFLOP/s , 532220.9 tokens/s INFO:__main__:2024-10-27 09:34:25 | Epoch: 2 | Step: 71900 | Dataset: 0-18064802 | Loss: 2.112 | 675 ms/step , 58217.29 GFLOP/s , 532413.1 tokens/s INFO:__main__:2024-10-27 09:34:33 | Epoch: 2 | Step: 71910 | Dataset: 0-18072802 | Loss: 2.174 | 676 ms/step , 58172.04 GFLOP/s , 532691.0 tokens/s INFO:__main__:2024-10-27 09:34:41 | Epoch: 2 | Step: 71920 | Dataset: 0-18080802 | Loss: 2.215 | 676 ms/step , 58190.23 GFLOP/s , 532204.8 tokens/s INFO:__main__:2024-10-27 09:34:48 | Epoch: 2 | Step: 71930 | Dataset: 0-18088802 | Loss: 2.116 | 675 ms/step , 58195.60 GFLOP/s , 532663.7 tokens/s INFO:__main__:2024-10-27 09:34:56 | Epoch: 2 | Step: 71940 | Dataset: 0-18096802 | Loss: 2.169 | 676 ms/step , 58135.05 GFLOP/s , 532448.7 tokens/s INFO:__main__:2024-10-27 09:35:04 | Epoch: 2 | Step: 71950 | Dataset: 0-18104802 | Loss: 2.208 | 675 ms/step , 58211.88 GFLOP/s , 532534.1 tokens/s INFO:__main__:2024-10-27 09:35:11 | Epoch: 2 | Step: 71960 | Dataset: 0-18112802 | Loss: 2.098 | 675 ms/step , 58250.62 GFLOP/s , 532566.3 tokens/s INFO:__main__:2024-10-27 09:35:19 | Epoch: 2 | Step: 71970 | Dataset: 0-18120802 | Loss: 2.194 | 675 ms/step , 58241.52 GFLOP/s , 532415.3 tokens/s INFO:__main__:2024-10-27 09:35:27 | Epoch: 2 | Step: 71980 | Dataset: 0-18128802 | Loss: 2.146 | 674 ms/step , 58285.22 GFLOP/s , 533213.4 tokens/s INFO:__main__:2024-10-27 09:35:34 | Epoch: 2 | Step: 71990 | Dataset: 0-18136802 | Loss: 2.219 | 675 ms/step , 58247.56 GFLOP/s , 532735.5 tokens/s INFO:__main__:2024-10-27 09:35:42 | Validation | Step: 72000 | Val_loss: 2.234 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 09:35:42 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_093542_step_72000.pt` INFO:__main__:2024-10-27 09:35:43 | Epoch: 2 | Step: 72000 | Dataset: 0-18144802 | Loss: 2.180 | 673 ms/step , 58439.27 GFLOP/s , 481054.6 tokens/s INFO:__main__:2024-10-27 09:35:51 | Epoch: 2 | Step: 72010 | Dataset: 0-18152802 | Loss: 2.174 | 675 ms/step , 
58230.13 GFLOP/s , 532864.4 tokens/s INFO:__main__:2024-10-27 09:35:58 | Epoch: 2 | Step: 72020 | Dataset: 0-18160802 | Loss: 2.145 | 675 ms/step , 58268.45 GFLOP/s , 533093.9 tokens/s INFO:__main__:2024-10-27 09:36:06 | Epoch: 2 | Step: 72030 | Dataset: 0-18168802 | Loss: 2.107 | 674 ms/step , 58308.35 GFLOP/s , 533211.2 tokens/s INFO:__main__:2024-10-27 09:36:14 | Epoch: 2 | Step: 72040 | Dataset: 0-18176802 | Loss: 1.845 | 674 ms/step , 58330.96 GFLOP/s , 532971.6 tokens/s INFO:__main__:2024-10-27 09:36:21 | Epoch: 2 | Step: 72050 | Dataset: 0-18184802 | Loss: 1.751 | 675 ms/step , 58246.76 GFLOP/s , 532583.2 tokens/s INFO:__main__:2024-10-27 09:36:29 | Epoch: 2 | Step: 72060 | Dataset: 0-18192802 | Loss: 1.762 | 674 ms/step , 58353.98 GFLOP/s , 532996.9 tokens/s INFO:__main__:2024-10-27 09:36:37 | Epoch: 2 | Step: 72070 | Dataset: 0-18200802 | Loss: 1.738 | 674 ms/step , 58348.48 GFLOP/s , 533062.3 tokens/s INFO:__main__:2024-10-27 09:36:44 | Epoch: 2 | Step: 72080 | Dataset: 0-18208802 | Loss: 1.754 | 673 ms/step , 58382.86 GFLOP/s , 533618.1 tokens/s INFO:__main__:2024-10-27 09:36:52 | Epoch: 2 | Step: 72090 | Dataset: 0-18216802 | Loss: 1.758 | 674 ms/step , 58294.47 GFLOP/s , 532983.0 tokens/s INFO:__main__:2024-10-27 09:37:00 | Epoch: 2 | Step: 72100 | Dataset: 0-18224802 | Loss: 1.750 | 674 ms/step , 58310.33 GFLOP/s , 532997.4 tokens/s INFO:__main__:2024-10-27 09:37:07 | Epoch: 2 | Step: 72110 | Dataset: 0-18232802 | Loss: 1.741 | 674 ms/step , 58312.75 GFLOP/s , 532995.9 tokens/s INFO:__main__:2024-10-27 09:37:15 | Epoch: 2 | Step: 72120 | Dataset: 0-18240802 | Loss: 1.727 | 674 ms/step , 58290.09 GFLOP/s , 532694.6 tokens/s INFO:__main__:2024-10-27 09:37:23 | Epoch: 2 | Step: 72130 | Dataset: 0-18248802 | Loss: 2.290 | 674 ms/step , 58297.46 GFLOP/s , 533281.0 tokens/s INFO:__main__:2024-10-27 09:37:30 | Epoch: 2 | Step: 72140 | Dataset: 0-18256802 | Loss: 2.268 | 675 ms/step , 58277.29 GFLOP/s , 533124.5 tokens/s INFO:__main__:2024-10-27 09:37:38 | 
Epoch: 2 | Step: 72150 | Dataset: 0-18264802 | Loss: 2.234 | 674 ms/step , 58327.22 GFLOP/s , 533168.8 tokens/s INFO:__main__:2024-10-27 09:37:46 | Epoch: 2 | Step: 72160 | Dataset: 0-18272802 | Loss: 2.156 | 674 ms/step , 58342.95 GFLOP/s , 533957.1 tokens/s INFO:__main__:2024-10-27 09:37:54 | Epoch: 2 | Step: 72170 | Dataset: 0-18280802 | Loss: 2.215 | 673 ms/step , 58376.03 GFLOP/s , 533401.1 tokens/s INFO:__main__:2024-10-27 09:38:01 | Epoch: 2 | Step: 72180 | Dataset: 0-18288802 | Loss: 2.113 | 674 ms/step , 58310.56 GFLOP/s , 533502.1 tokens/s INFO:__main__:2024-10-27 09:38:09 | Epoch: 2 | Step: 72190 | Dataset: 0-18296802 | Loss: 2.224 | 674 ms/step , 58317.83 GFLOP/s , 533070.0 tokens/s INFO:__main__:2024-10-27 09:38:17 | Epoch: 2 | Step: 72200 | Dataset: 0-18304802 | Loss: 2.178 | 676 ms/step , 58157.51 GFLOP/s , 533440.0 tokens/s INFO:__main__:2024-10-27 09:38:24 | Epoch: 2 | Step: 72210 | Dataset: 0-18312802 | Loss: 2.182 | 674 ms/step , 58332.26 GFLOP/s , 533181.0 tokens/s INFO:__main__:2024-10-27 09:38:32 | Epoch: 2 | Step: 72220 | Dataset: 0-18320802 | Loss: 2.193 | 674 ms/step , 58296.40 GFLOP/s , 533566.3 tokens/s INFO:__main__:2024-10-27 09:38:40 | Epoch: 2 | Step: 72230 | Dataset: 0-18328802 | Loss: 2.213 | 675 ms/step , 58273.16 GFLOP/s , 533142.9 tokens/s INFO:__main__:2024-10-27 09:38:47 | Epoch: 2 | Step: 72240 | Dataset: 0-18336802 | Loss: 2.151 | 674 ms/step , 58327.41 GFLOP/s , 533730.7 tokens/s INFO:__main__:2024-10-27 09:38:55 | Epoch: 2 | Step: 72250 | Dataset: 0-18344802 | Loss: 2.186 | 674 ms/step , 58294.39 GFLOP/s , 533213.7 tokens/s INFO:__main__:2024-10-27 09:39:03 | Epoch: 2 | Step: 72260 | Dataset: 0-18352802 | Loss: 2.109 | 674 ms/step , 58332.38 GFLOP/s , 533453.1 tokens/s INFO:__main__:2024-10-27 09:39:10 | Epoch: 2 | Step: 72270 | Dataset: 0-18360802 | Loss: 2.181 | 675 ms/step , 58276.05 GFLOP/s , 532799.7 tokens/s INFO:__main__:2024-10-27 09:39:18 | Epoch: 2 | Step: 72280 | Dataset: 0-18368802 | Loss: 2.152 | 674 ms/step , 
58310.12 GFLOP/s , 533358.4 tokens/s INFO:__main__:2024-10-27 09:39:26 | Epoch: 2 | Step: 72290 | Dataset: 0-18376802 | Loss: 2.177 | 676 ms/step , 58138.33 GFLOP/s , 533215.0 tokens/s INFO:__main__:2024-10-27 09:39:33 | Epoch: 2 | Step: 72300 | Dataset: 0-18384802 | Loss: 2.177 | 674 ms/step , 58292.76 GFLOP/s , 533220.6 tokens/s INFO:__main__:2024-10-27 09:39:41 | Epoch: 2 | Step: 72310 | Dataset: 0-18392802 | Loss: 2.168 | 674 ms/step , 58310.74 GFLOP/s , 533134.5 tokens/s INFO:__main__:2024-10-27 09:39:49 | Epoch: 2 | Step: 72320 | Dataset: 0-18400802 | Loss: 2.171 | 674 ms/step , 58317.57 GFLOP/s , 533345.0 tokens/s INFO:__main__:2024-10-27 09:39:56 | Epoch: 2 | Step: 72330 | Dataset: 0-18408802 | Loss: 2.103 | 673 ms/step , 58384.13 GFLOP/s , 533642.2 tokens/s INFO:__main__:2024-10-27 09:40:04 | Epoch: 2 | Step: 72340 | Dataset: 0-18416802 | Loss: 2.126 | 674 ms/step , 58329.88 GFLOP/s , 533735.1 tokens/s INFO:__main__:2024-10-27 09:40:12 | Epoch: 2 | Step: 72350 | Dataset: 0-18424802 | Loss: 2.228 | 674 ms/step , 58302.56 GFLOP/s , 533638.0 tokens/s INFO:__main__:2024-10-27 09:40:19 | Epoch: 2 | Step: 72360 | Dataset: 0-18432802 | Loss: 2.123 | 674 ms/step , 58301.24 GFLOP/s , 533639.7 tokens/s INFO:__main__:2024-10-27 09:40:27 | Epoch: 2 | Step: 72370 | Dataset: 0-18440802 | Loss: 2.213 | 675 ms/step , 58277.29 GFLOP/s , 533361.0 tokens/s INFO:__main__:2024-10-27 09:40:35 | Epoch: 2 | Step: 72380 | Dataset: 0-18448802 | Loss: 2.131 | 674 ms/step , 58308.04 GFLOP/s , 533490.1 tokens/s INFO:__main__:2024-10-27 09:40:42 | Epoch: 2 | Step: 72390 | Dataset: 0-18456802 | Loss: 2.168 | 674 ms/step , 58283.09 GFLOP/s , 533141.6 tokens/s INFO:__main__:2024-10-27 09:40:50 | Epoch: 2 | Step: 72400 | Dataset: 0-18464802 | Loss: 2.148 | 675 ms/step , 58273.18 GFLOP/s , 533510.7 tokens/s INFO:__main__:2024-10-27 09:40:58 | Epoch: 2 | Step: 72410 | Dataset: 0-18472802 | Loss: 2.234 | 674 ms/step , 58307.67 GFLOP/s , 533280.8 tokens/s INFO:__main__:2024-10-27 09:41:06 | 
Epoch: 2 | Step: 72420 | Dataset: 0-18480802 | Loss: 2.203 | 675 ms/step , 58198.72 GFLOP/s , 532641.1 tokens/s INFO:__main__:2024-10-27 09:41:13 | Epoch: 2 | Step: 72430 | Dataset: 0-18488802 | Loss: 2.086 | 675 ms/step , 58205.18 GFLOP/s , 533084.2 tokens/s INFO:__main__:2024-10-27 09:41:21 | Epoch: 2 | Step: 72440 | Dataset: 0-18496802 | Loss: 2.229 | 674 ms/step , 58293.10 GFLOP/s , 532941.8 tokens/s INFO:__main__:2024-10-27 09:41:29 | Epoch: 2 | Step: 72450 | Dataset: 0-18504802 | Loss: 2.143 | 676 ms/step , 58160.19 GFLOP/s , 533083.4 tokens/s INFO:__main__:2024-10-27 09:41:36 | Epoch: 2 | Step: 72460 | Dataset: 0-18512802 | Loss: 2.222 | 676 ms/step , 58181.24 GFLOP/s , 533166.1 tokens/s INFO:__main__:2024-10-27 09:41:44 | Epoch: 2 | Step: 72470 | Dataset: 0-18520802 | Loss: 2.090 | 675 ms/step , 58269.03 GFLOP/s , 533028.3 tokens/s INFO:__main__:2024-10-27 09:41:52 | Epoch: 2 | Step: 72480 | Dataset: 0-18528802 | Loss: 1.996 | 675 ms/step , 58254.43 GFLOP/s , 533045.2 tokens/s INFO:__main__:2024-10-27 09:41:59 | Epoch: 2 | Step: 72490 | Dataset: 0-18536802 | Loss: 2.056 | 674 ms/step , 58329.06 GFLOP/s , 533458.7 tokens/s INFO:__main__:2024-10-27 09:42:07 | Epoch: 2 | Step: 72500 | Dataset: 0-18544802 | Loss: 2.041 | 677 ms/step , 58030.41 GFLOP/s , 531886.9 tokens/s INFO:__main__:2024-10-27 09:42:15 | Epoch: 2 | Step: 72510 | Dataset: 0-18552802 | Loss: 2.143 | 674 ms/step , 58286.04 GFLOP/s , 532855.0 tokens/s INFO:__main__:2024-10-27 09:42:22 | Epoch: 2 | Step: 72520 | Dataset: 0-18560802 | Loss: 2.108 | 674 ms/step , 58344.89 GFLOP/s , 533641.4 tokens/s INFO:__main__:2024-10-27 09:42:30 | Epoch: 2 | Step: 72530 | Dataset: 0-18568802 | Loss: 2.100 | 677 ms/step , 58070.15 GFLOP/s , 531268.5 tokens/s INFO:__main__:2024-10-27 09:42:38 | Epoch: 2 | Step: 72540 | Dataset: 0-18576802 | Loss: 2.112 | 675 ms/step , 58203.10 GFLOP/s , 532869.2 tokens/s INFO:__main__:2024-10-27 09:42:45 | Epoch: 2 | Step: 72550 | Dataset: 0-18584802 | Loss: 2.133 | 675 ms/step , 
58236.55 GFLOP/s , 532683.7 tokens/s INFO:__main__:2024-10-27 09:42:53 | Epoch: 2 | Step: 72560 | Dataset: 0-18592802 | Loss: 2.133 | 677 ms/step , 58028.71 GFLOP/s , 532778.4 tokens/s INFO:__main__:2024-10-27 09:43:01 | Epoch: 2 | Step: 72570 | Dataset: 0-18600802 | Loss: 2.172 | 675 ms/step , 58239.37 GFLOP/s , 533053.7 tokens/s INFO:__main__:2024-10-27 09:43:09 | Epoch: 2 | Step: 72580 | Dataset: 0-18608802 | Loss: 2.073 | 676 ms/step , 58180.32 GFLOP/s , 532557.3 tokens/s INFO:__main__:2024-10-27 09:43:16 | Epoch: 2 | Step: 72590 | Dataset: 0-18616802 | Loss: 2.137 | 676 ms/step , 58182.08 GFLOP/s , 532680.0 tokens/s INFO:__main__:2024-10-27 09:43:24 | Epoch: 2 | Step: 72600 | Dataset: 0-18624802 | Loss: 2.152 | 677 ms/step , 58100.97 GFLOP/s , 532885.4 tokens/s INFO:__main__:2024-10-27 09:43:32 | Epoch: 2 | Step: 72610 | Dataset: 0-18632802 | Loss: 2.291 | 675 ms/step , 58193.38 GFLOP/s , 532893.7 tokens/s INFO:__main__:2024-10-27 09:43:39 | Epoch: 2 | Step: 72620 | Dataset: 0-18640802 | Loss: 2.066 | 678 ms/step , 57984.39 GFLOP/s , 530526.7 tokens/s INFO:__main__:2024-10-27 09:43:47 | Epoch: 2 | Step: 72630 | Dataset: 0-18648802 | Loss: 2.157 | 679 ms/step , 57861.02 GFLOP/s , 528984.3 tokens/s INFO:__main__:2024-10-27 09:43:55 | Epoch: 2 | Step: 72640 | Dataset: 0-18656802 | Loss: 2.127 | 678 ms/step , 57969.14 GFLOP/s , 530093.5 tokens/s INFO:__main__:2024-10-27 09:44:03 | Epoch: 2 | Step: 72650 | Dataset: 0-18664802 | Loss: 2.137 | 678 ms/step , 57994.70 GFLOP/s , 529606.6 tokens/s INFO:__main__:2024-10-27 09:44:10 | Epoch: 2 | Step: 72660 | Dataset: 0-18672802 | Loss: 2.154 | 678 ms/step , 58013.42 GFLOP/s , 529840.2 tokens/s INFO:__main__:2024-10-27 09:44:18 | Epoch: 2 | Step: 72670 | Dataset: 0-18680802 | Loss: 2.064 | 678 ms/step , 57988.50 GFLOP/s , 530264.4 tokens/s INFO:__main__:2024-10-27 09:44:26 | Epoch: 2 | Step: 72680 | Dataset: 0-18688802 | Loss: 2.085 | 678 ms/step , 57976.30 GFLOP/s , 529931.7 tokens/s INFO:__main__:2024-10-27 09:44:33 | 
Epoch: 2 | Step: 72690 | Dataset: 0-18696802 | Loss: 2.116 | 678 ms/step , 57936.72 GFLOP/s , 528457.5 tokens/s INFO:__main__:2024-10-27 09:44:41 | Epoch: 2 | Step: 72700 | Dataset: 0-18704802 | Loss: 2.085 | 678 ms/step , 58008.49 GFLOP/s , 529944.9 tokens/s INFO:__main__:2024-10-27 09:44:49 | Epoch: 2 | Step: 72710 | Dataset: 0-18712802 | Loss: 2.070 | 685 ms/step , 57395.49 GFLOP/s , 532471.2 tokens/s INFO:__main__:2024-10-27 09:44:57 | Epoch: 2 | Step: 72720 | Dataset: 0-18720802 | Loss: 2.113 | 674 ms/step , 58325.50 GFLOP/s , 532922.1 tokens/s INFO:__main__:2024-10-27 09:45:04 | Epoch: 2 | Step: 72730 | Dataset: 0-18728802 | Loss: 2.068 | 675 ms/step , 58267.83 GFLOP/s , 533708.1 tokens/s INFO:__main__:2024-10-27 09:45:12 | Epoch: 2 | Step: 72740 | Dataset: 0-18736802 | Loss: 2.175 | 674 ms/step , 58297.11 GFLOP/s , 532893.4 tokens/s INFO:__main__:2024-10-27 09:45:20 | Epoch: 2 | Step: 72750 | Dataset: 0-18744802 | Loss: 2.075 | 673 ms/step , 58380.15 GFLOP/s , 533566.8 tokens/s INFO:__main__:2024-10-27 09:45:27 | Epoch: 2 | Step: 72760 | Dataset: 0-18752802 | Loss: 2.070 | 674 ms/step , 58354.78 GFLOP/s , 533696.5 tokens/s INFO:__main__:2024-10-27 09:45:35 | Epoch: 2 | Step: 72770 | Dataset: 0-18760802 | Loss: 1.960 | 673 ms/step , 58410.52 GFLOP/s , 533292.2 tokens/s INFO:__main__:2024-10-27 09:45:43 | Epoch: 2 | Step: 72780 | Dataset: 0-18768802 | Loss: 1.827 | 674 ms/step , 58339.75 GFLOP/s , 532709.5 tokens/s INFO:__main__:2024-10-27 09:45:50 | Epoch: 2 | Step: 72790 | Dataset: 0-18776802 | Loss: 1.810 | 682 ms/step , 57625.40 GFLOP/s , 531779.9 tokens/s INFO:__main__:2024-10-27 09:45:58 | Epoch: 2 | Step: 72800 | Dataset: 0-18784802 | Loss: 1.807 | 674 ms/step , 58341.54 GFLOP/s , 532830.0 tokens/s INFO:__main__:2024-10-27 09:46:06 | Epoch: 2 | Step: 72810 | Dataset: 0-18792802 | Loss: 1.733 | 674 ms/step , 58318.61 GFLOP/s , 532854.5 tokens/s INFO:__main__:2024-10-27 09:46:13 | Epoch: 2 | Step: 72820 | Dataset: 0-18800802 | Loss: 1.763 | 674 ms/step , 
58342.74 GFLOP/s , 533112.3 tokens/s INFO:__main__:2024-10-27 09:46:21 | Epoch: 2 | Step: 72830 | Dataset: 0-18808802 | Loss: 1.758 | 674 ms/step , 58324.49 GFLOP/s , 532847.4 tokens/s INFO:__main__:2024-10-27 09:46:29 | Epoch: 2 | Step: 72840 | Dataset: 0-18816802 | Loss: 1.771 | 674 ms/step , 58351.16 GFLOP/s , 533104.9 tokens/s INFO:__main__:2024-10-27 09:46:36 | Epoch: 2 | Step: 72850 | Dataset: 0-18824802 | Loss: 1.745 | 674 ms/step , 58302.23 GFLOP/s , 532262.3 tokens/s INFO:__main__:2024-10-27 09:46:44 | Epoch: 2 | Step: 72860 | Dataset: 0-18832802 | Loss: 2.331 | 674 ms/step , 58333.09 GFLOP/s , 533163.3 tokens/s INFO:__main__:2024-10-27 09:46:52 | Epoch: 2 | Step: 72870 | Dataset: 0-18840802 | Loss: 2.275 | 673 ms/step , 58368.05 GFLOP/s , 533706.4 tokens/s INFO:__main__:2024-10-27 09:46:59 | Epoch: 2 | Step: 72880 | Dataset: 0-18848802 | Loss: 2.158 | 675 ms/step , 58222.42 GFLOP/s , 533341.9 tokens/s INFO:__main__:2024-10-27 09:47:07 | Epoch: 2 | Step: 72890 | Dataset: 0-18856802 | Loss: 2.166 | 675 ms/step , 58240.35 GFLOP/s , 533400.4 tokens/s INFO:__main__:2024-10-27 09:47:15 | Epoch: 2 | Step: 72900 | Dataset: 0-18864802 | Loss: 2.158 | 674 ms/step , 58327.19 GFLOP/s , 533636.2 tokens/s INFO:__main__:2024-10-27 09:47:23 | Epoch: 2 | Step: 72910 | Dataset: 0-18872802 | Loss: 2.174 | 674 ms/step , 58330.07 GFLOP/s , 533687.5 tokens/s INFO:__main__:2024-10-27 09:47:30 | Epoch: 2 | Step: 72920 | Dataset: 0-18880802 | Loss: 2.206 | 675 ms/step , 58277.32 GFLOP/s , 533527.8 tokens/s INFO:__main__:2024-10-27 09:47:38 | Epoch: 2 | Step: 72930 | Dataset: 0-18888802 | Loss: 2.079 | 674 ms/step , 58352.76 GFLOP/s , 533858.8 tokens/s INFO:__main__:2024-10-27 09:47:46 | Epoch: 2 | Step: 72940 | Dataset: 0-18896802 | Loss: 2.146 | 674 ms/step , 58318.37 GFLOP/s , 533530.5 tokens/s INFO:__main__:2024-10-27 09:47:53 | Epoch: 2 | Step: 72950 | Dataset: 0-18904802 | Loss: 2.076 | 675 ms/step , 58271.00 GFLOP/s , 533420.7 tokens/s INFO:__main__:2024-10-27 09:48:01 | 
Epoch: 2 | Step: 72960 | Dataset: 0-18912802 | Loss: 2.163 | 673 ms/step , 58369.11 GFLOP/s , 533283.7 tokens/s INFO:__main__:2024-10-27 09:48:09 | Epoch: 2 | Step: 72970 | Dataset: 0-18920802 | Loss: 2.191 | 674 ms/step , 58326.76 GFLOP/s , 533424.5 tokens/s INFO:__main__:2024-10-27 09:48:16 | Epoch: 2 | Step: 72980 | Dataset: 0-18928802 | Loss: 2.165 | 673 ms/step , 58385.94 GFLOP/s , 533502.1 tokens/s INFO:__main__:2024-10-27 09:48:24 | Epoch: 2 | Step: 72990 | Dataset: 0-18936802 | Loss: 2.091 | 674 ms/step , 58336.93 GFLOP/s , 533423.5 tokens/s INFO:__main__:2024-10-27 09:48:31 | Validation | Step: 73000 | Val_loss: 2.216 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 09:48:31 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_094831_step_73000.pt` INFO:__main__:2024-10-27 09:48:32 | Epoch: 2 | Step: 73000 | Dataset: 0-18944802 | Loss: 2.094 | 673 ms/step , 58416.33 GFLOP/s , 482008.4 tokens/s INFO:__main__:2024-10-27 09:48:40 | Epoch: 2 | Step: 73010 | Dataset: 0-18952802 | Loss: 2.165 | 674 ms/step , 58288.65 GFLOP/s , 533137.0 tokens/s INFO:__main__:2024-10-27 09:48:48 | Epoch: 2 | Step: 73020 | Dataset: 0-18960802 | Loss: 2.209 | 674 ms/step , 58327.34 GFLOP/s , 533642.6 tokens/s INFO:__main__:2024-10-27 09:48:55 | Epoch: 2 | Step: 73030 | Dataset: 0-18968802 | Loss: 2.107 | 675 ms/step , 58203.28 GFLOP/s , 532514.4 tokens/s INFO:__main__:2024-10-27 09:49:03 | Epoch: 2 | Step: 73040 | Dataset: 0-18976802 | Loss: 2.126 | 674 ms/step , 58306.15 GFLOP/s , 533226.9 tokens/s INFO:__main__:2024-10-27 09:49:11 | Epoch: 2 | Step: 73050 | Dataset: 0-18984802 | Loss: 2.134 | 673 ms/step , 58374.80 GFLOP/s , 533553.4 tokens/s INFO:__main__:2024-10-27 09:49:19 | Epoch: 2 | Step: 73060 | Dataset: 0-18992802 | Loss: 2.210 | 674 ms/step , 58296.89 GFLOP/s , 533841.6 tokens/s INFO:__main__:2024-10-27 09:49:26 | Epoch: 2 | Step: 73070 | Dataset: 0-19000802 | Loss: 2.141 | 674 ms/step , 58350.18 GFLOP/s , 532936.0 tokens/s INFO:__main__:2024-10-27 
09:49:34 | Epoch: 2 | Step: 73080 | Dataset: 0-19008802 | Loss: 2.171 | 674 ms/step , 58292.19 GFLOP/s , 533440.6 tokens/s INFO:__main__:2024-10-27 09:49:42 | Epoch: 2 | Step: 73090 | Dataset: 0-19016802 | Loss: 2.176 | 675 ms/step , 58272.01 GFLOP/s , 533034.0 tokens/s INFO:__main__:2024-10-27 09:49:49 | Epoch: 2 | Step: 73100 | Dataset: 0-19024802 | Loss: 2.177 | 675 ms/step , 58215.30 GFLOP/s , 533208.5 tokens/s INFO:__main__:2024-10-27 09:49:57 | Epoch: 2 | Step: 73110 | Dataset: 0-19032802 | Loss: 2.153 | 674 ms/step , 58293.36 GFLOP/s , 533155.9 tokens/s INFO:__main__:2024-10-27 09:50:05 | Epoch: 2 | Step: 73120 | Dataset: 0-19040802 | Loss: 2.122 | 674 ms/step , 58320.97 GFLOP/s , 533021.2 tokens/s INFO:__main__:2024-10-27 09:50:12 | Epoch: 2 | Step: 73130 | Dataset: 0-19048802 | Loss: 2.125 | 674 ms/step , 58289.96 GFLOP/s , 532854.6 tokens/s INFO:__main__:2024-10-27 09:50:20 | Epoch: 2 | Step: 73140 | Dataset: 0-19056802 | Loss: 2.195 | 675 ms/step , 58208.42 GFLOP/s , 533012.7 tokens/s INFO:__main__:2024-10-27 09:50:28 | Epoch: 2 | Step: 73150 | Dataset: 0-19064802 | Loss: 2.055 | 675 ms/step , 58243.11 GFLOP/s , 533241.4 tokens/s INFO:__main__:2024-10-27 09:50:35 | Epoch: 2 | Step: 73160 | Dataset: 0-19072802 | Loss: 2.194 | 674 ms/step , 58310.20 GFLOP/s , 533125.8 tokens/s INFO:__main__:2024-10-27 09:50:43 | Epoch: 2 | Step: 73170 | Dataset: 0-19080802 | Loss: 2.110 | 673 ms/step , 58391.61 GFLOP/s , 533803.3 tokens/s INFO:__main__:2024-10-27 09:50:51 | Epoch: 2 | Step: 73180 | Dataset: 0-19088802 | Loss: 2.163 | 674 ms/step , 58304.75 GFLOP/s , 533457.8 tokens/s INFO:__main__:2024-10-27 09:50:58 | Epoch: 2 | Step: 73190 | Dataset: 0-19096802 | Loss: 2.164 | 674 ms/step , 58280.49 GFLOP/s , 533163.2 tokens/s INFO:__main__:2024-10-27 09:51:06 | Epoch: 2 | Step: 73200 | Dataset: 0-19104802 | Loss: 2.147 | 675 ms/step , 58218.19 GFLOP/s , 533167.9 tokens/s INFO:__main__:2024-10-27 09:51:14 | Epoch: 2 | Step: 73210 | Dataset: 0-19112802 | Loss: 2.177 | 675 
ms/step , 58205.23 GFLOP/s , 532576.3 tokens/s INFO:__main__:2024-10-27 09:51:21 | Epoch: 2 | Step: 73220 | Dataset: 0-19120802 | Loss: 2.136 | 675 ms/step , 58265.26 GFLOP/s , 533212.5 tokens/s INFO:__main__:2024-10-27 09:51:29 | Epoch: 2 | Step: 73230 | Dataset: 0-19128802 | Loss: 2.137 | 676 ms/step , 58166.47 GFLOP/s , 533445.1 tokens/s INFO:__main__:2024-10-27 09:51:37 | Epoch: 2 | Step: 73240 | Dataset: 0-19136802 | Loss: 2.236 | 675 ms/step , 58248.61 GFLOP/s , 533019.2 tokens/s INFO:__main__:2024-10-27 09:51:44 | Epoch: 2 | Step: 73250 | Dataset: 0-19144802 | Loss: 2.145 | 675 ms/step , 58248.09 GFLOP/s , 533036.8 tokens/s INFO:__main__:2024-10-27 09:51:52 | Epoch: 2 | Step: 73260 | Dataset: 0-19152802 | Loss: 2.106 | 675 ms/step , 58196.97 GFLOP/s , 532994.2 tokens/s INFO:__main__:2024-10-27 09:52:00 | Epoch: 2 | Step: 73270 | Dataset: 0-19160802 | Loss: 2.154 | 675 ms/step , 58205.06 GFLOP/s , 532756.9 tokens/s INFO:__main__:2024-10-27 09:52:08 | Epoch: 2 | Step: 73280 | Dataset: 0-19168802 | Loss: 2.199 | 675 ms/step , 58216.28 GFLOP/s , 532921.3 tokens/s INFO:__main__:2024-10-27 09:52:15 | Epoch: 2 | Step: 73290 | Dataset: 0-19176802 | Loss: 2.225 | 676 ms/step , 58176.46 GFLOP/s , 532888.0 tokens/s INFO:__main__:2024-10-27 09:52:23 | Epoch: 2 | Step: 73300 | Dataset: 0-19184802 | Loss: 2.214 | 674 ms/step , 58320.88 GFLOP/s , 533075.6 tokens/s INFO:__main__:2024-10-27 09:52:31 | Epoch: 2 | Step: 73310 | Dataset: 0-19192802 | Loss: 2.210 | 676 ms/step , 58171.98 GFLOP/s , 532861.5 tokens/s INFO:__main__:2024-10-27 09:52:38 | Epoch: 2 | Step: 73320 | Dataset: 0-19200802 | Loss: 2.112 | 677 ms/step , 58102.15 GFLOP/s , 531740.4 tokens/s INFO:__main__:2024-10-27 09:52:46 | Epoch: 2 | Step: 73330 | Dataset: 0-19208802 | Loss: 2.132 | 676 ms/step , 58173.65 GFLOP/s , 532880.2 tokens/s INFO:__main__:2024-10-27 09:52:54 | Epoch: 2 | Step: 73340 | Dataset: 0-19216802 | Loss: 2.791 | 675 ms/step , 58237.20 GFLOP/s , 532781.8 tokens/s INFO:__main__:2024-10-27 
09:53:01 | Epoch: 2 | Step: 73350 | Dataset: 0-19224802 | Loss: 2.550 | 677 ms/step , 58085.07 GFLOP/s , 532050.3 tokens/s INFO:__main__:2024-10-27 09:53:09 | Epoch: 2 | Step: 73360 | Dataset: 0-19232802 | Loss: 2.588 | 678 ms/step , 58011.44 GFLOP/s , 531298.3 tokens/s INFO:__main__:2024-10-27 09:53:17 | Epoch: 2 | Step: 73370 | Dataset: 0-19240802 | Loss: 2.540 | 674 ms/step , 58310.10 GFLOP/s , 533390.1 tokens/s INFO:__main__:2024-10-27 09:53:24 | Epoch: 2 | Step: 73380 | Dataset: 0-19248802 | Loss: 2.510 | 674 ms/step , 58310.84 GFLOP/s , 533190.0 tokens/s INFO:__main__:2024-10-27 09:53:32 | Epoch: 2 | Step: 73390 | Dataset: 0-19256802 | Loss: 2.551 | 674 ms/step , 58333.19 GFLOP/s , 533143.1 tokens/s INFO:__main__:2024-10-27 09:53:40 | Epoch: 2 | Step: 73400 | Dataset: 0-19264802 | Loss: 2.551 | 679 ms/step , 57930.95 GFLOP/s , 532609.6 tokens/s INFO:__main__:2024-10-27 09:53:48 | Epoch: 2 | Step: 73410 | Dataset: 0-19272802 | Loss: 2.445 | 678 ms/step , 57971.49 GFLOP/s , 532036.1 tokens/s INFO:__main__:2024-10-27 09:53:55 | Epoch: 2 | Step: 73420 | Dataset: 0-19280802 | Loss: 2.465 | 675 ms/step , 58213.78 GFLOP/s , 531936.8 tokens/s INFO:__main__:2024-10-27 09:54:03 | Epoch: 2 | Step: 73430 | Dataset: 0-19288802 | Loss: 2.498 | 674 ms/step , 58337.15 GFLOP/s , 532840.0 tokens/s INFO:__main__:2024-10-27 09:54:11 | Epoch: 2 | Step: 73440 | Dataset: 0-19296802 | Loss: 2.481 | 676 ms/step , 58152.69 GFLOP/s , 531980.9 tokens/s INFO:__main__:2024-10-27 09:54:18 | Epoch: 2 | Step: 73450 | Dataset: 0-19304802 | Loss: 2.410 | 675 ms/step , 58274.98 GFLOP/s , 532829.1 tokens/s INFO:__main__:2024-10-27 09:54:26 | Epoch: 2 | Step: 73460 | Dataset: 0-19312802 | Loss: 2.463 | 674 ms/step , 58286.38 GFLOP/s , 533231.6 tokens/s INFO:__main__:2024-10-27 09:54:34 | Epoch: 2 | Step: 73470 | Dataset: 0-19320802 | Loss: 2.431 | 675 ms/step , 58242.93 GFLOP/s , 532681.7 tokens/s INFO:__main__:2024-10-27 09:54:41 | Epoch: 2 | Step: 73480 | Dataset: 0-19328802 | Loss: 2.539 | 675 
ms/step , 58246.43 GFLOP/s , 533275.9 tokens/s INFO:__main__:2024-10-27 09:54:49 | Epoch: 2 | Step: 73490 | Dataset: 0-19336802 | Loss: 2.424 | 674 ms/step , 58301.49 GFLOP/s , 532420.3 tokens/s INFO:__main__:2024-10-27 09:54:57 | Epoch: 2 | Step: 73500 | Dataset: 0-19344802 | Loss: 2.362 | 675 ms/step , 58206.76 GFLOP/s , 533212.0 tokens/s INFO:__main__:2024-10-27 09:55:04 | Epoch: 2 | Step: 73510 | Dataset: 0-19352802 | Loss: 2.256 | 674 ms/step , 58340.73 GFLOP/s , 531420.9 tokens/s INFO:__main__:2024-10-27 09:55:12 | Epoch: 2 | Step: 73520 | Dataset: 0-19360802 | Loss: 2.258 | 674 ms/step , 58331.38 GFLOP/s , 533488.4 tokens/s INFO:__main__:2024-10-27 09:55:20 | Epoch: 2 | Step: 73530 | Dataset: 0-19368802 | Loss: 2.247 | 674 ms/step , 58346.78 GFLOP/s , 533826.2 tokens/s INFO:__main__:2024-10-27 09:55:27 | Epoch: 2 | Step: 73540 | Dataset: 0-19376802 | Loss: 2.109 | 674 ms/step , 58309.09 GFLOP/s , 533032.9 tokens/s INFO:__main__:2024-10-27 09:55:35 | Epoch: 2 | Step: 73550 | Dataset: 0-19384802 | Loss: 2.208 | 675 ms/step , 58211.49 GFLOP/s , 533102.7 tokens/s INFO:__main__:2024-10-27 09:55:43 | Epoch: 2 | Step: 73560 | Dataset: 0-19392802 | Loss: 2.261 | 675 ms/step , 58227.67 GFLOP/s , 533196.3 tokens/s INFO:__main__:2024-10-27 09:55:51 | Epoch: 2 | Step: 73570 | Dataset: 0-19400802 | Loss: 2.211 | 674 ms/step , 58311.01 GFLOP/s , 533542.1 tokens/s INFO:__main__:2024-10-27 09:55:58 | Epoch: 2 | Step: 73580 | Dataset: 0-19408802 | Loss: 2.190 | 674 ms/step , 58294.21 GFLOP/s , 533679.7 tokens/s INFO:__main__:2024-10-27 09:56:06 | Epoch: 2 | Step: 73590 | Dataset: 0-19416802 | Loss: 2.186 | 675 ms/step , 58200.47 GFLOP/s , 533637.1 tokens/s INFO:__main__:2024-10-27 09:56:14 | Epoch: 2 | Step: 73600 | Dataset: 0-19424802 | Loss: 2.151 | 674 ms/step , 58287.95 GFLOP/s , 533209.4 tokens/s INFO:__main__:2024-10-27 09:56:21 | Epoch: 2 | Step: 73610 | Dataset: 0-19432802 | Loss: 2.119 | 674 ms/step , 58323.02 GFLOP/s , 533101.8 tokens/s INFO:__main__:2024-10-27 
09:56:29 | Epoch: 2 | Step: 73620 | Dataset: 0-19440802 | Loss: 2.140 | 675 ms/step , 58240.20 GFLOP/s , 533178.2 tokens/s INFO:__main__:2024-10-27 09:56:37 | Epoch: 2 | Step: 73630 | Dataset: 0-19448802 | Loss: 2.122 | 675 ms/step , 58234.69 GFLOP/s , 532711.1 tokens/s INFO:__main__:2024-10-27 09:56:44 | Epoch: 2 | Step: 73640 | Dataset: 0-19456802 | Loss: 2.132 | 673 ms/step , 58432.72 GFLOP/s , 533982.6 tokens/s INFO:__main__:2024-10-27 09:56:52 | Epoch: 2 | Step: 73650 | Dataset: 0-19464802 | Loss: 2.182 | 675 ms/step , 58222.00 GFLOP/s , 533221.9 tokens/s INFO:__main__:2024-10-27 09:57:00 | Epoch: 2 | Step: 73660 | Dataset: 0-19472802 | Loss: 2.110 | 674 ms/step , 58284.09 GFLOP/s , 533007.4 tokens/s INFO:__main__:2024-10-27 09:57:07 | Epoch: 2 | Step: 73670 | Dataset: 0-19480802 | Loss: 2.171 | 675 ms/step , 58226.47 GFLOP/s , 532864.5 tokens/s INFO:__main__:2024-10-27 09:57:15 | Epoch: 2 | Step: 73680 | Dataset: 0-19488802 | Loss: 2.146 | 675 ms/step , 58272.34 GFLOP/s , 533164.6 tokens/s INFO:__main__:2024-10-27 09:57:23 | Epoch: 2 | Step: 73690 | Dataset: 0-19496802 | Loss: 2.111 | 696 ms/step , 56514.14 GFLOP/s , 531504.3 tokens/s INFO:__main__:2024-10-27 09:57:30 | Epoch: 2 | Step: 73700 | Dataset: 0-19504802 | Loss: 2.199 | 674 ms/step , 58335.72 GFLOP/s , 534191.3 tokens/s INFO:__main__:2024-10-27 09:57:38 | Epoch: 2 | Step: 73710 | Dataset: 0-19512802 | Loss: 2.218 | 674 ms/step , 58337.56 GFLOP/s , 533667.6 tokens/s INFO:__main__:2024-10-27 09:57:46 | Epoch: 2 | Step: 73720 | Dataset: 0-19520802 | Loss: 2.010 | 675 ms/step , 58258.44 GFLOP/s , 533199.4 tokens/s INFO:__main__:2024-10-27 09:57:53 | Epoch: 2 | Step: 73730 | Dataset: 0-19528802 | Loss: 2.145 | 673 ms/step , 58387.81 GFLOP/s , 533785.8 tokens/s INFO:__main__:2024-10-27 09:58:01 | Epoch: 2 | Step: 73740 | Dataset: 0-19536802 | Loss: 2.114 | 674 ms/step , 58311.02 GFLOP/s , 533207.9 tokens/s INFO:__main__:2024-10-27 09:58:09 | Epoch: 2 | Step: 73750 | Dataset: 0-19544802 | Loss: 2.062 | 674 
ms/step , 58349.27 GFLOP/s , 533716.8 tokens/s INFO:__main__:2024-10-27 09:58:16 | Epoch: 2 | Step: 73760 | Dataset: 0-19552802 | Loss: 2.093 | 674 ms/step , 58348.72 GFLOP/s , 533472.4 tokens/s INFO:__main__:2024-10-27 09:58:24 | Epoch: 2 | Step: 73770 | Dataset: 0-19560802 | Loss: 2.130 | 674 ms/step , 58361.07 GFLOP/s , 533837.5 tokens/s INFO:__main__:2024-10-27 09:58:32 | Epoch: 2 | Step: 73780 | Dataset: 0-19568802 | Loss: 2.249 | 677 ms/step , 58071.06 GFLOP/s , 533487.9 tokens/s INFO:__main__:2024-10-27 09:58:39 | Epoch: 2 | Step: 73790 | Dataset: 0-19576802 | Loss: 2.138 | 675 ms/step , 58264.45 GFLOP/s , 533444.4 tokens/s INFO:__main__:2024-10-27 09:58:47 | Epoch: 2 | Step: 73800 | Dataset: 0-19584802 | Loss: 2.011 | 675 ms/step , 58213.82 GFLOP/s , 532542.4 tokens/s INFO:__main__:2024-10-27 09:58:55 | Epoch: 2 | Step: 73810 | Dataset: 0-19592802 | Loss: 2.142 | 674 ms/step , 58336.27 GFLOP/s , 533647.3 tokens/s INFO:__main__:2024-10-27 09:59:03 | Epoch: 2 | Step: 73820 | Dataset: 0-19600802 | Loss: 2.104 | 673 ms/step , 58397.76 GFLOP/s , 533502.1 tokens/s INFO:__main__:2024-10-27 09:59:10 | Epoch: 2 | Step: 73830 | Dataset: 0-19608802 | Loss: 2.225 | 675 ms/step , 58211.31 GFLOP/s , 531317.3 tokens/s INFO:__main__:2024-10-27 09:59:18 | Epoch: 2 | Step: 73840 | Dataset: 0-19616802 | Loss: 2.257 | 674 ms/step , 58328.26 GFLOP/s , 533401.8 tokens/s INFO:__main__:2024-10-27 09:59:26 | Epoch: 2 | Step: 73850 | Dataset: 0-19624802 | Loss: 2.177 | 676 ms/step , 58141.65 GFLOP/s , 532971.7 tokens/s INFO:__main__:2024-10-27 09:59:33 | Epoch: 2 | Step: 73860 | Dataset: 0-19632802 | Loss: 2.177 | 675 ms/step , 58205.98 GFLOP/s , 533004.2 tokens/s INFO:__main__:2024-10-27 09:59:41 | Epoch: 2 | Step: 73870 | Dataset: 0-19640802 | Loss: 2.160 | 675 ms/step , 58258.51 GFLOP/s , 533012.6 tokens/s INFO:__main__:2024-10-27 09:59:49 | Epoch: 2 | Step: 73880 | Dataset: 0-19648802 | Loss: 2.174 | 675 ms/step , 58205.59 GFLOP/s , 533199.5 tokens/s INFO:__main__:2024-10-27 
09:59:56 | Epoch: 2 | Step: 73890 | Dataset: 0-19656802 | Loss: 2.164 | 675 ms/step , 58272.18 GFLOP/s , 533051.0 tokens/s INFO:__main__:2024-10-27 10:00:04 | Epoch: 2 | Step: 73900 | Dataset: 0-19664802 | Loss: 2.218 | 674 ms/step , 58307.51 GFLOP/s , 549159.3 tokens/s INFO:__main__:2024-10-27 10:00:11 | Epoch: 2 | Step: 73910 | Dataset: 0-19672802 | Loss: 2.166 | 674 ms/step , 58300.38 GFLOP/s , 532896.0 tokens/s INFO:__main__:2024-10-27 10:00:19 | Epoch: 2 | Step: 73920 | Dataset: 0-19680802 | Loss: 2.192 | 674 ms/step , 58302.97 GFLOP/s , 533514.2 tokens/s INFO:__main__:2024-10-27 10:00:27 | Epoch: 2 | Step: 73930 | Dataset: 0-19688802 | Loss: 2.170 | 676 ms/step , 58177.58 GFLOP/s , 532811.3 tokens/s INFO:__main__:2024-10-27 10:00:35 | Epoch: 2 | Step: 73940 | Dataset: 0-19696802 | Loss: 2.148 | 675 ms/step , 58230.83 GFLOP/s , 533100.7 tokens/s INFO:__main__:2024-10-27 10:00:42 | Epoch: 2 | Step: 73950 | Dataset: 0-19704802 | Loss: 2.184 | 675 ms/step , 58199.93 GFLOP/s , 532952.9 tokens/s INFO:__main__:2024-10-27 10:00:50 | Epoch: 2 | Step: 73960 | Dataset: 0-19712802 | Loss: 2.150 | 675 ms/step , 58252.83 GFLOP/s , 531673.0 tokens/s INFO:__main__:2024-10-27 10:00:58 | Epoch: 2 | Step: 73970 | Dataset: 0-19720802 | Loss: 2.133 | 675 ms/step , 58257.91 GFLOP/s , 533441.8 tokens/s INFO:__main__:2024-10-27 10:01:05 | Epoch: 3 | Step: 73980 | Dataset: 0-703 | Loss: 2.109 | 677 ms/step , 58062.46 GFLOP/s , 532676.8 tokens/s INFO:__main__:2024-10-27 10:01:13 | Epoch: 3 | Step: 73990 | Dataset: 0-8703 | Loss: 1.959 | 675 ms/step , 58228.94 GFLOP/s , 532297.1 tokens/s INFO:__main__:2024-10-27 10:01:20 | Validation | Step: 74000 | Val_loss: 2.305 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 10:01:20 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_100120_step_74000.pt` INFO:__main__:2024-10-27 10:01:22 | Epoch: 3 | Step: 74000 | Dataset: 0-16703 | Loss: 1.859 | 675 ms/step , 58247.22 GFLOP/s , 478830.2 tokens/s INFO:__main__:2024-10-27 
10:01:29 | Epoch: 3 | Step: 74010 | Dataset: 0-24703 | Loss: 1.842 | 674 ms/step , 58302.03 GFLOP/s , 532389.5 tokens/s INFO:__main__:2024-10-27 10:01:37 | Epoch: 3 | Step: 74020 | Dataset: 0-32703 | Loss: 1.810 | 676 ms/step , 58187.52 GFLOP/s , 532179.2 tokens/s INFO:__main__:2024-10-27 10:01:45 | Epoch: 3 | Step: 74030 | Dataset: 0-40703 | Loss: 1.788 | 676 ms/step , 58185.76 GFLOP/s , 532099.3 tokens/s INFO:__main__:2024-10-27 10:01:52 | Epoch: 3 | Step: 74040 | Dataset: 0-48703 | Loss: 1.749 | 676 ms/step , 58188.27 GFLOP/s , 531335.6 tokens/s INFO:__main__:2024-10-27 10:02:00 | Epoch: 3 | Step: 74050 | Dataset: 0-56703 | Loss: 1.794 | 676 ms/step , 58164.40 GFLOP/s , 531633.4 tokens/s INFO:__main__:2024-10-27 10:02:08 | Epoch: 3 | Step: 74060 | Dataset: 0-64703 | Loss: 1.796 | 676 ms/step , 58176.24 GFLOP/s , 531090.2 tokens/s INFO:__main__:2024-10-27 10:02:15 | Epoch: 3 | Step: 74070 | Dataset: 0-72703 | Loss: 1.769 | 678 ms/step , 57951.12 GFLOP/s , 529394.5 tokens/s INFO:__main__:2024-10-27 10:02:23 | Epoch: 3 | Step: 74080 | Dataset: 0-80703 | Loss: 2.230 | 678 ms/step , 57990.26 GFLOP/s , 529615.9 tokens/s INFO:__main__:2024-10-27 10:02:31 | Epoch: 3 | Step: 74090 | Dataset: 0-88703 | Loss: 2.143 | 678 ms/step , 57953.73 GFLOP/s , 529459.3 tokens/s INFO:__main__:2024-10-27 10:02:39 | Epoch: 3 | Step: 74100 | Dataset: 0-96703 | Loss: 2.130 | 676 ms/step , 58116.44 GFLOP/s , 529689.0 tokens/s INFO:__main__:2024-10-27 10:02:46 | Epoch: 3 | Step: 74110 | Dataset: 0-104703 | Loss: 2.215 | 677 ms/step , 58042.75 GFLOP/s , 530376.7 tokens/s INFO:__main__:2024-10-27 10:02:54 | Epoch: 3 | Step: 74120 | Dataset: 0-112703 | Loss: 2.182 | 674 ms/step , 58287.99 GFLOP/s , 530459.0 tokens/s INFO:__main__:2024-10-27 10:03:02 | Epoch: 3 | Step: 74130 | Dataset: 0-120703 | Loss: 2.138 | 675 ms/step , 58194.69 GFLOP/s , 529746.2 tokens/s INFO:__main__:2024-10-27 10:03:10 | Epoch: 3 | Step: 74140 | Dataset: 0-128703 | Loss: 2.162 | 675 ms/step , 58252.65 GFLOP/s , 533279.1 
tokens/s INFO:__main__:2024-10-27 10:03:17 | Epoch: 3 | Step: 74150 | Dataset: 0-136703 | Loss: 2.133 | 675 ms/step , 58233.17 GFLOP/s , 532704.5 tokens/s INFO:__main__:2024-10-27 10:03:25 | Epoch: 3 | Step: 74160 | Dataset: 0-144703 | Loss: 2.143 | 676 ms/step , 58123.30 GFLOP/s , 530712.0 tokens/s INFO:__main__:2024-10-27 10:03:33 | Epoch: 3 | Step: 74170 | Dataset: 0-152703 | Loss: 2.114 | 675 ms/step , 58262.99 GFLOP/s , 532703.1 tokens/s INFO:__main__:2024-10-27 10:03:40 | Epoch: 3 | Step: 74180 | Dataset: 0-160703 | Loss: 2.099 | 675 ms/step , 58227.48 GFLOP/s , 532424.4 tokens/s INFO:__main__:2024-10-27 10:03:48 | Epoch: 3 | Step: 74190 | Dataset: 0-168703 | Loss: 2.119 | 676 ms/step , 58185.35 GFLOP/s , 532891.3 tokens/s INFO:__main__:2024-10-27 10:03:56 | Epoch: 3 | Step: 74200 | Dataset: 0-176703 | Loss: 2.106 | 676 ms/step , 58137.50 GFLOP/s , 532421.3 tokens/s INFO:__main__:2024-10-27 10:04:03 | Epoch: 3 | Step: 74210 | Dataset: 0-184703 | Loss: 2.125 | 675 ms/step , 58225.32 GFLOP/s , 532399.4 tokens/s INFO:__main__:2024-10-27 10:04:11 | Epoch: 3 | Step: 74220 | Dataset: 0-192703 | Loss: 2.144 | 675 ms/step , 58267.50 GFLOP/s , 532587.0 tokens/s INFO:__main__:2024-10-27 10:04:19 | Epoch: 3 | Step: 74230 | Dataset: 0-200703 | Loss: 2.022 | 675 ms/step , 58200.04 GFLOP/s , 532998.5 tokens/s INFO:__main__:2024-10-27 10:04:26 | Epoch: 3 | Step: 74240 | Dataset: 0-208703 | Loss: 2.277 | 675 ms/step , 58212.43 GFLOP/s , 532688.5 tokens/s INFO:__main__:2024-10-27 10:04:34 | Epoch: 3 | Step: 74250 | Dataset: 0-216703 | Loss: 2.183 | 675 ms/step , 58273.96 GFLOP/s , 532501.8 tokens/s INFO:__main__:2024-10-27 10:04:42 | Epoch: 3 | Step: 74260 | Dataset: 0-224703 | Loss: 2.169 | 676 ms/step , 58184.13 GFLOP/s , 532918.2 tokens/s INFO:__main__:2024-10-27 10:04:50 | Epoch: 3 | Step: 74270 | Dataset: 0-232703 | Loss: 2.155 | 675 ms/step , 58266.47 GFLOP/s , 533119.4 tokens/s INFO:__main__:2024-10-27 10:04:57 | Epoch: 3 | Step: 74280 | Dataset: 0-240703 | Loss: 2.177 
| 675 ms/step , 58277.70 GFLOP/s , 533123.4 tokens/s INFO:__main__:2024-10-27 10:05:05 | Epoch: 3 | Step: 74290 | Dataset: 0-248703 | Loss: 2.104 | 676 ms/step , 58171.88 GFLOP/s , 532116.0 tokens/s INFO:__main__:2024-10-27 10:05:13 | Epoch: 3 | Step: 74300 | Dataset: 0-256703 | Loss: 2.276 | 675 ms/step , 58197.47 GFLOP/s , 532531.6 tokens/s INFO:__main__:2024-10-27 10:05:20 | Epoch: 3 | Step: 74310 | Dataset: 0-264703 | Loss: 2.106 | 675 ms/step , 58213.94 GFLOP/s , 532887.1 tokens/s INFO:__main__:2024-10-27 10:05:28 | Epoch: 3 | Step: 74320 | Dataset: 0-272703 | Loss: 2.113 | 674 ms/step , 58315.36 GFLOP/s , 532851.7 tokens/s INFO:__main__:2024-10-27 10:05:36 | Epoch: 3 | Step: 74330 | Dataset: 0-280703 | Loss: 2.198 | 674 ms/step , 58343.81 GFLOP/s , 533062.5 tokens/s INFO:__main__:2024-10-27 10:05:43 | Epoch: 3 | Step: 74340 | Dataset: 0-288703 | Loss: 2.132 | 674 ms/step , 58304.64 GFLOP/s , 532568.3 tokens/s INFO:__main__:2024-10-27 10:05:51 | Epoch: 3 | Step: 74350 | Dataset: 0-296703 | Loss: 2.159 | 674 ms/step , 58317.00 GFLOP/s , 532896.4 tokens/s INFO:__main__:2024-10-27 10:05:59 | Epoch: 3 | Step: 74360 | Dataset: 0-304703 | Loss: 2.186 | 674 ms/step , 58306.47 GFLOP/s , 533317.8 tokens/s INFO:__main__:2024-10-27 10:06:06 | Epoch: 3 | Step: 74370 | Dataset: 0-312703 | Loss: 2.139 | 675 ms/step , 58258.01 GFLOP/s , 532201.4 tokens/s INFO:__main__:2024-10-27 10:06:14 | Epoch: 3 | Step: 74380 | Dataset: 0-320703 | Loss: 2.162 | 673 ms/step , 58367.84 GFLOP/s , 532989.9 tokens/s INFO:__main__:2024-10-27 10:06:22 | Epoch: 3 | Step: 74390 | Dataset: 0-328703 | Loss: 2.182 | 675 ms/step , 58253.23 GFLOP/s , 533330.7 tokens/s INFO:__main__:2024-10-27 10:06:29 | Epoch: 3 | Step: 74400 | Dataset: 0-336703 | Loss: 1.864 | 676 ms/step , 58171.35 GFLOP/s , 536273.9 tokens/s INFO:__main__:2024-10-27 10:06:37 | Epoch: 3 | Step: 74410 | Dataset: 0-344703 | Loss: 1.784 | 673 ms/step , 58386.26 GFLOP/s , 532616.8 tokens/s INFO:__main__:2024-10-27 10:06:45 | Epoch: 3 | 
Step: 74420 | Dataset: 0-352703 | Loss: 1.795 | 675 ms/step , 58197.77 GFLOP/s , 532696.1 tokens/s INFO:__main__:2024-10-27 10:06:52 | Epoch: 3 | Step: 74430 | Dataset: 0-360703 | Loss: 1.787 | 673 ms/step , 58374.77 GFLOP/s , 533000.8 tokens/s INFO:__main__:2024-10-27 10:07:00 | Epoch: 3 | Step: 74440 | Dataset: 0-368703 | Loss: 1.787 | 677 ms/step , 58083.57 GFLOP/s , 532288.0 tokens/s INFO:__main__:2024-10-27 10:07:08 | Epoch: 3 | Step: 74450 | Dataset: 0-376703 | Loss: 1.761 | 677 ms/step , 58104.37 GFLOP/s , 531930.0 tokens/s INFO:__main__:2024-10-27 10:07:16 | Epoch: 3 | Step: 74460 | Dataset: 0-384703 | Loss: 1.800 | 675 ms/step , 58269.73 GFLOP/s , 531341.6 tokens/s INFO:__main__:2024-10-27 10:07:23 | Epoch: 3 | Step: 74470 | Dataset: 0-392703 | Loss: 1.774 | 675 ms/step , 58224.03 GFLOP/s , 532475.7 tokens/s INFO:__main__:2024-10-27 10:07:31 | Epoch: 3 | Step: 74480 | Dataset: 0-400703 | Loss: 1.760 | 675 ms/step , 58262.65 GFLOP/s , 532664.3 tokens/s INFO:__main__:2024-10-27 10:07:39 | Epoch: 3 | Step: 74490 | Dataset: 0-408703 | Loss: 2.107 | 674 ms/step , 58317.58 GFLOP/s , 532970.4 tokens/s INFO:__main__:2024-10-27 10:07:46 | Epoch: 3 | Step: 74500 | Dataset: 0-416703 | Loss: 2.200 | 675 ms/step , 58276.84 GFLOP/s , 533187.9 tokens/s INFO:__main__:2024-10-27 10:07:54 | Epoch: 3 | Step: 74510 | Dataset: 0-424703 | Loss: 2.188 | 675 ms/step , 58268.78 GFLOP/s , 533310.3 tokens/s INFO:__main__:2024-10-27 10:08:02 | Epoch: 3 | Step: 74520 | Dataset: 0-432703 | Loss: 2.164 | 674 ms/step , 58343.96 GFLOP/s , 533246.7 tokens/s INFO:__main__:2024-10-27 10:08:09 | Epoch: 3 | Step: 74530 | Dataset: 0-440703 | Loss: 2.172 | 674 ms/step , 58319.03 GFLOP/s , 533599.7 tokens/s INFO:__main__:2024-10-27 10:08:17 | Epoch: 3 | Step: 74540 | Dataset: 0-448703 | Loss: 2.183 | 674 ms/step , 58321.41 GFLOP/s , 533385.3 tokens/s INFO:__main__:2024-10-27 10:08:25 | Epoch: 3 | Step: 74550 | Dataset: 0-456703 | Loss: 2.111 | 673 ms/step , 58370.76 GFLOP/s , 533473.0 tokens/s 
INFO:__main__:2024-10-27 10:08:32 | Epoch: 3 | Step: 74560 | Dataset: 0-464703 | Loss: 2.178 | 674 ms/step , 58298.82 GFLOP/s , 533405.5 tokens/s INFO:__main__:2024-10-27 10:08:40 | Epoch: 3 | Step: 74570 | Dataset: 0-472703 | Loss: 2.099 | 674 ms/step , 58314.78 GFLOP/s , 533316.6 tokens/s INFO:__main__:2024-10-27 10:08:48 | Epoch: 3 | Step: 74580 | Dataset: 0-480703 | Loss: 2.206 | 675 ms/step , 58264.53 GFLOP/s , 533352.1 tokens/s INFO:__main__:2024-10-27 10:08:55 | Epoch: 3 | Step: 74590 | Dataset: 0-488703 | Loss: 2.139 | 675 ms/step , 58241.27 GFLOP/s , 532824.5 tokens/s INFO:__main__:2024-10-27 10:09:03 | Epoch: 3 | Step: 74600 | Dataset: 0-496703 | Loss: 2.242 | 675 ms/step , 58227.81 GFLOP/s , 532819.2 tokens/s INFO:__main__:2024-10-27 10:09:11 | Epoch: 3 | Step: 74610 | Dataset: 0-504703 | Loss: 2.161 | 674 ms/step , 58294.44 GFLOP/s , 533189.4 tokens/s INFO:__main__:2024-10-27 10:09:19 | Epoch: 3 | Step: 74620 | Dataset: 0-512703 | Loss: 2.214 | 675 ms/step , 58228.66 GFLOP/s , 532574.1 tokens/s INFO:__main__:2024-10-27 10:09:26 | Epoch: 3 | Step: 74630 | Dataset: 0-520703 | Loss: 2.086 | 674 ms/step , 58319.19 GFLOP/s , 533360.0 tokens/s INFO:__main__:2024-10-27 10:09:34 | Epoch: 3 | Step: 74640 | Dataset: 0-528703 | Loss: 2.075 | 674 ms/step , 58359.03 GFLOP/s , 533099.9 tokens/s INFO:__main__:2024-10-27 10:09:42 | Epoch: 3 | Step: 74650 | Dataset: 0-536703 | Loss: 2.090 | 674 ms/step , 58290.77 GFLOP/s , 533532.6 tokens/s INFO:__main__:2024-10-27 10:09:49 | Epoch: 3 | Step: 74660 | Dataset: 0-544703 | Loss: 2.054 | 674 ms/step , 58304.22 GFLOP/s , 533436.1 tokens/s INFO:__main__:2024-10-27 10:09:57 | Epoch: 3 | Step: 74670 | Dataset: 0-552703 | Loss: 2.108 | 673 ms/step , 58376.41 GFLOP/s , 532924.8 tokens/s INFO:__main__:2024-10-27 10:10:05 | Epoch: 3 | Step: 74680 | Dataset: 0-560703 | Loss: 2.072 | 675 ms/step , 58263.36 GFLOP/s , 532876.8 tokens/s INFO:__main__:2024-10-27 10:10:12 | Epoch: 3 | Step: 74690 | Dataset: 0-568703 | Loss: 2.056 | 675 
ms/step , 58267.27 GFLOP/s , 532664.9 tokens/s INFO:__main__:2024-10-27 10:10:20 | Epoch: 3 | Step: 74700 | Dataset: 0-576703 | Loss: 2.058 | 673 ms/step , 58367.17 GFLOP/s , 532996.8 tokens/s INFO:__main__:2024-10-27 10:10:28 | Epoch: 3 | Step: 74710 | Dataset: 0-584703 | Loss: 2.152 | 675 ms/step , 58253.24 GFLOP/s , 532198.0 tokens/s INFO:__main__:2024-10-27 10:10:35 | Epoch: 3 | Step: 74720 | Dataset: 0-592703 | Loss: 2.149 | 675 ms/step , 58253.37 GFLOP/s , 533141.6 tokens/s INFO:__main__:2024-10-27 10:10:43 | Epoch: 3 | Step: 74730 | Dataset: 0-600703 | Loss: 1.997 | 674 ms/step , 58280.54 GFLOP/s , 532860.8 tokens/s INFO:__main__:2024-10-27 10:10:51 | Epoch: 3 | Step: 74740 | Dataset: 0-608703 | Loss: 2.101 | 674 ms/step , 58290.43 GFLOP/s , 533196.2 tokens/s INFO:__main__:2024-10-27 10:10:58 | Epoch: 3 | Step: 74750 | Dataset: 0-616703 | Loss: 2.062 | 674 ms/step , 58358.39 GFLOP/s , 533461.9 tokens/s INFO:__main__:2024-10-27 10:11:06 | Epoch: 3 | Step: 74760 | Dataset: 0-624703 | Loss: 2.091 | 673 ms/step , 58379.19 GFLOP/s , 533221.6 tokens/s INFO:__main__:2024-10-27 10:11:14 | Epoch: 3 | Step: 74770 | Dataset: 0-632703 | Loss: 2.158 | 675 ms/step , 58218.62 GFLOP/s , 532880.3 tokens/s INFO:__main__:2024-10-27 10:11:21 | Epoch: 3 | Step: 74780 | Dataset: 0-640703 | Loss: 2.042 | 674 ms/step , 58335.26 GFLOP/s , 532497.7 tokens/s INFO:__main__:2024-10-27 10:11:29 | Epoch: 3 | Step: 74790 | Dataset: 0-648703 | Loss: 2.079 | 674 ms/step , 58338.27 GFLOP/s , 532599.8 tokens/s INFO:__main__:2024-10-27 10:11:37 | Epoch: 3 | Step: 74800 | Dataset: 0-656703 | Loss: 2.050 | 674 ms/step , 58336.10 GFLOP/s , 533096.4 tokens/s INFO:__main__:2024-10-27 10:11:45 | Epoch: 3 | Step: 74810 | Dataset: 0-664703 | Loss: 2.118 | 675 ms/step , 58249.52 GFLOP/s , 532862.1 tokens/s INFO:__main__:2024-10-27 10:11:52 | Epoch: 3 | Step: 74820 | Dataset: 0-672703 | Loss: 2.148 | 673 ms/step , 58410.89 GFLOP/s , 533112.0 tokens/s INFO:__main__:2024-10-27 10:12:00 | Epoch: 3 | Step: 
74830 | Dataset: 0-680703 | Loss: 2.160 | 675 ms/step , 58254.53 GFLOP/s , 533232.0 tokens/s INFO:__main__:2024-10-27 10:12:08 | Epoch: 3 | Step: 74840 | Dataset: 0-688703 | Loss: 2.143 | 675 ms/step , 58267.06 GFLOP/s , 532650.4 tokens/s INFO:__main__:2024-10-27 10:12:15 | Epoch: 3 | Step: 74850 | Dataset: 0-696703 | Loss: 2.167 | 675 ms/step , 58226.62 GFLOP/s , 532175.6 tokens/s INFO:__main__:2024-10-27 10:12:23 | Epoch: 3 | Step: 74860 | Dataset: 0-704703 | Loss: 2.086 | 674 ms/step , 58353.56 GFLOP/s , 532841.1 tokens/s INFO:__main__:2024-10-27 10:12:31 | Epoch: 3 | Step: 74870 | Dataset: 0-712703 | Loss: 2.168 | 674 ms/step , 58364.41 GFLOP/s , 533203.6 tokens/s INFO:__main__:2024-10-27 10:12:38 | Epoch: 3 | Step: 74880 | Dataset: 0-720703 | Loss: 2.160 | 675 ms/step , 58221.19 GFLOP/s , 532933.7 tokens/s INFO:__main__:2024-10-27 10:12:46 | Epoch: 3 | Step: 74890 | Dataset: 0-728703 | Loss: 2.108 | 674 ms/step , 58343.42 GFLOP/s , 532970.6 tokens/s INFO:__main__:2024-10-27 10:12:54 | Epoch: 3 | Step: 74900 | Dataset: 0-736703 | Loss: 2.023 | 674 ms/step , 58329.27 GFLOP/s , 533577.4 tokens/s INFO:__main__:2024-10-27 10:13:01 | Epoch: 3 | Step: 74910 | Dataset: 0-744703 | Loss: 2.162 | 674 ms/step , 58305.98 GFLOP/s , 532329.2 tokens/s INFO:__main__:2024-10-27 10:13:09 | Epoch: 3 | Step: 74920 | Dataset: 0-752703 | Loss: 2.090 | 675 ms/step , 58228.02 GFLOP/s , 532787.9 tokens/s INFO:__main__:2024-10-27 10:13:17 | Epoch: 3 | Step: 74930 | Dataset: 0-760703 | Loss: 2.069 | 675 ms/step , 58208.19 GFLOP/s , 532407.6 tokens/s INFO:__main__:2024-10-27 10:13:24 | Epoch: 3 | Step: 74940 | Dataset: 0-768703 | Loss: 2.073 | 674 ms/step , 58290.23 GFLOP/s , 532997.4 tokens/s INFO:__main__:2024-10-27 10:13:32 | Epoch: 3 | Step: 74950 | Dataset: 0-776703 | Loss: 2.080 | 674 ms/step , 58322.29 GFLOP/s , 533188.1 tokens/s INFO:__main__:2024-10-27 10:13:40 | Epoch: 3 | Step: 74960 | Dataset: 0-784703 | Loss: 2.086 | 675 ms/step , 58272.67 GFLOP/s , 532625.9 tokens/s 
INFO:__main__:2024-10-27 10:13:48 | Epoch: 3 | Step: 74970 | Dataset: 0-792703 | Loss: 2.149 | 675 ms/step , 58270.77 GFLOP/s , 532737.9 tokens/s
INFO:__main__:2024-10-27 10:13:55 | Epoch: 3 | Step: 74980 | Dataset: 0-800703 | Loss: 2.202 | 675 ms/step , 58213.63 GFLOP/s , 533093.2 tokens/s
INFO:__main__:2024-10-27 10:14:03 | Epoch: 3 | Step: 74990 | Dataset: 0-808703 | Loss: 2.171 | 674 ms/step , 58352.23 GFLOP/s , 533104.3 tokens/s
INFO:__main__:2024-10-27 10:14:10 | Validation | Step: 75000 | Val_loss: 2.147 | Best_val_loss: 1.7829
INFO:__main__:2024-10-27 10:14:10 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_101410_step_75000.pt`
INFO:__main__:2024-10-27 10:14:11 | Epoch: 3 | Step: 75000 | Dataset: 0-816703 | Loss: 2.089 | 674 ms/step , 58356.86 GFLOP/s , 480274.2 tokens/s
INFO:__main__:2024-10-27 10:14:19 | Epoch: 3 | Step: 75010 | Dataset: 0-824703 | Loss: 2.081 | 674 ms/step , 58311.23 GFLOP/s , 533297.7 tokens/s
INFO:__main__:2024-10-27 10:14:27 | Epoch: 3 | Step: 75020 | Dataset: 0-832703 | Loss: 2.077 | 674 ms/step , 58338.24 GFLOP/s , 533273.1 tokens/s
INFO:__main__:2024-10-27 10:14:34 | Epoch: 3 | Step: 75030 | Dataset: 0-840703 | Loss: 2.046 | 674 ms/step , 58309.54 GFLOP/s , 533884.1 tokens/s
INFO:__main__:2024-10-27 10:14:42 | Epoch: 3 | Step: 75040 | Dataset: 0-848703 | Loss: 2.088 | 675 ms/step , 58268.21 GFLOP/s , 532841.4 tokens/s
INFO:__main__:2024-10-27 10:14:50 | Epoch: 3 | Step: 75050 | Dataset: 0-856703 | Loss: 2.140 | 675 ms/step , 58207.02 GFLOP/s , 532998.8 tokens/s
INFO:__main__:2024-10-27 10:14:57 | Epoch: 3 | Step: 75060 | Dataset: 0-864703 | Loss: 2.169 | 674 ms/step , 58352.02 GFLOP/s , 532933.5 tokens/s
INFO:__main__:2024-10-27 10:15:05 | Epoch: 3 | Step: 75070 | Dataset: 0-872703 | Loss: 2.022 | 675 ms/step , 58266.62 GFLOP/s , 533273.4 tokens/s
INFO:__main__:2024-10-27 10:15:13 | Epoch: 3 | Step: 75080 | Dataset: 0-880703 | Loss: 2.068 | 674 ms/step , 58310.29 GFLOP/s , 533509.7 tokens/s
INFO:__main__:2024-10-27 10:15:21 | Epoch: 3 | Step: 75090 | Dataset: 0-888703 | Loss: 2.059 | 674 ms/step , 58306.37 GFLOP/s , 532999.0 tokens/s INFO:__main__:2024-10-27 10:15:28 | Epoch: 3 | Step: 75100 | Dataset: 0-896703 | Loss: 2.051 | 675 ms/step , 58230.77 GFLOP/s , 533130.0 tokens/s INFO:__main__:2024-10-27 10:15:36 | Epoch: 3 | Step: 75110 | Dataset: 0-904703 | Loss: 2.052 | 675 ms/step , 58213.89 GFLOP/s , 532787.3 tokens/s INFO:__main__:2024-10-27 10:15:44 | Epoch: 3 | Step: 75120 | Dataset: 0-912703 | Loss: 2.078 | 676 ms/step , 58178.60 GFLOP/s , 532447.2 tokens/s INFO:__main__:2024-10-27 10:15:51 | Epoch: 3 | Step: 75130 | Dataset: 0-920703 | Loss: 2.158 | 675 ms/step , 58226.80 GFLOP/s , 532851.9 tokens/s INFO:__main__:2024-10-27 10:15:59 | Epoch: 3 | Step: 75140 | Dataset: 0-928703 | Loss: 2.267 | 673 ms/step , 58380.05 GFLOP/s , 533393.2 tokens/s INFO:__main__:2024-10-27 10:16:07 | Epoch: 3 | Step: 75150 | Dataset: 0-936703 | Loss: 2.195 | 677 ms/step , 58067.56 GFLOP/s , 532239.3 tokens/s INFO:__main__:2024-10-27 10:16:14 | Epoch: 3 | Step: 75160 | Dataset: 0-944703 | Loss: 2.122 | 677 ms/step , 58058.12 GFLOP/s , 532047.5 tokens/s INFO:__main__:2024-10-27 10:16:22 | Epoch: 3 | Step: 75170 | Dataset: 0-952703 | Loss: 2.061 | 677 ms/step , 58071.34 GFLOP/s , 531649.3 tokens/s INFO:__main__:2024-10-27 10:16:30 | Epoch: 3 | Step: 75180 | Dataset: 0-960703 | Loss: 2.163 | 677 ms/step , 58022.62 GFLOP/s , 531505.3 tokens/s INFO:__main__:2024-10-27 10:16:37 | Epoch: 3 | Step: 75190 | Dataset: 0-968703 | Loss: 2.049 | 674 ms/step , 58338.41 GFLOP/s , 532818.5 tokens/s INFO:__main__:2024-10-27 10:16:45 | Epoch: 3 | Step: 75200 | Dataset: 0-976703 | Loss: 2.134 | 674 ms/step , 58329.91 GFLOP/s , 533563.9 tokens/s INFO:__main__:2024-10-27 10:16:53 | Epoch: 3 | Step: 75210 | Dataset: 0-984703 | Loss: 2.108 | 673 ms/step , 58417.37 GFLOP/s , 534134.8 tokens/s INFO:__main__:2024-10-27 10:17:00 | Epoch: 3 | Step: 75220 | Dataset: 0-992703 | Loss: 2.085 | 675 
ms/step , 58275.81 GFLOP/s , 532838.3 tokens/s INFO:__main__:2024-10-27 10:17:08 | Epoch: 3 | Step: 75230 | Dataset: 0-1000703 | Loss: 2.018 | 673 ms/step , 58403.39 GFLOP/s , 533620.9 tokens/s INFO:__main__:2024-10-27 10:17:16 | Epoch: 3 | Step: 75240 | Dataset: 0-1008703 | Loss: 2.175 | 674 ms/step , 58352.70 GFLOP/s , 533441.6 tokens/s INFO:__main__:2024-10-27 10:17:24 | Epoch: 3 | Step: 75250 | Dataset: 0-1016703 | Loss: 2.102 | 675 ms/step , 58200.23 GFLOP/s , 533091.5 tokens/s INFO:__main__:2024-10-27 10:17:31 | Epoch: 3 | Step: 75260 | Dataset: 0-1024703 | Loss: 2.135 | 673 ms/step , 58370.42 GFLOP/s , 533129.5 tokens/s INFO:__main__:2024-10-27 10:17:39 | Epoch: 3 | Step: 75270 | Dataset: 0-1032703 | Loss: 2.159 | 674 ms/step , 58287.42 GFLOP/s , 532848.6 tokens/s INFO:__main__:2024-10-27 10:17:47 | Epoch: 3 | Step: 75280 | Dataset: 0-1040703 | Loss: 2.013 | 675 ms/step , 58278.78 GFLOP/s , 532867.8 tokens/s INFO:__main__:2024-10-27 10:17:54 | Epoch: 3 | Step: 75290 | Dataset: 0-1048703 | Loss: 2.062 | 673 ms/step , 58390.16 GFLOP/s , 533496.2 tokens/s INFO:__main__:2024-10-27 10:18:02 | Epoch: 3 | Step: 75300 | Dataset: 0-1056703 | Loss: 1.925 | 674 ms/step , 58295.75 GFLOP/s , 532987.6 tokens/s INFO:__main__:2024-10-27 10:18:10 | Epoch: 3 | Step: 75310 | Dataset: 0-1064703 | Loss: 1.861 | 676 ms/step , 58188.87 GFLOP/s , 532190.9 tokens/s INFO:__main__:2024-10-27 10:18:17 | Epoch: 3 | Step: 75320 | Dataset: 0-1072703 | Loss: 1.800 | 674 ms/step , 58321.75 GFLOP/s , 532996.9 tokens/s INFO:__main__:2024-10-27 10:18:25 | Epoch: 3 | Step: 75330 | Dataset: 0-1080703 | Loss: 1.781 | 676 ms/step , 58165.62 GFLOP/s , 532503.8 tokens/s INFO:__main__:2024-10-27 10:18:33 | Epoch: 3 | Step: 75340 | Dataset: 0-1088703 | Loss: 1.812 | 675 ms/step , 58244.77 GFLOP/s , 532900.0 tokens/s INFO:__main__:2024-10-27 10:18:40 | Epoch: 3 | Step: 75350 | Dataset: 0-1096703 | Loss: 1.754 | 675 ms/step , 58266.02 GFLOP/s , 532471.9 tokens/s INFO:__main__:2024-10-27 10:18:48 | 
Epoch: 3 | Step: 75360 | Dataset: 0-1104703 | Loss: 1.734 | 673 ms/step , 58380.37 GFLOP/s , 532430.5 tokens/s INFO:__main__:2024-10-27 10:18:56 | Epoch: 3 | Step: 75370 | Dataset: 0-1112703 | Loss: 1.778 | 675 ms/step , 58195.91 GFLOP/s , 532868.8 tokens/s INFO:__main__:2024-10-27 10:19:03 | Epoch: 3 | Step: 75380 | Dataset: 0-1120703 | Loss: 2.299 | 676 ms/step , 58182.61 GFLOP/s , 532537.8 tokens/s INFO:__main__:2024-10-27 10:19:11 | Epoch: 3 | Step: 75390 | Dataset: 0-1128703 | Loss: 2.232 | 674 ms/step , 58312.82 GFLOP/s , 532854.0 tokens/s INFO:__main__:2024-10-27 10:19:19 | Epoch: 3 | Step: 75400 | Dataset: 0-1136703 | Loss: 2.151 | 675 ms/step , 58231.97 GFLOP/s , 532759.2 tokens/s INFO:__main__:2024-10-27 10:19:27 | Epoch: 3 | Step: 75410 | Dataset: 0-1144703 | Loss: 2.254 | 676 ms/step , 58159.25 GFLOP/s , 532046.6 tokens/s INFO:__main__:2024-10-27 10:19:34 | Epoch: 3 | Step: 75420 | Dataset: 0-1152703 | Loss: 2.245 | 675 ms/step , 58274.89 GFLOP/s , 533087.3 tokens/s INFO:__main__:2024-10-27 10:19:42 | Epoch: 3 | Step: 75430 | Dataset: 0-1160703 | Loss: 2.202 | 675 ms/step , 58256.36 GFLOP/s , 533048.5 tokens/s INFO:__main__:2024-10-27 10:19:50 | Epoch: 3 | Step: 75440 | Dataset: 0-1168703 | Loss: 2.158 | 675 ms/step , 58204.97 GFLOP/s , 532755.8 tokens/s INFO:__main__:2024-10-27 10:19:57 | Epoch: 3 | Step: 75450 | Dataset: 0-1176703 | Loss: 2.181 | 674 ms/step , 58294.03 GFLOP/s , 532670.8 tokens/s INFO:__main__:2024-10-27 10:20:05 | Epoch: 3 | Step: 75460 | Dataset: 0-1184703 | Loss: 2.172 | 675 ms/step , 58216.07 GFLOP/s , 532782.2 tokens/s INFO:__main__:2024-10-27 10:20:13 | Epoch: 3 | Step: 75470 | Dataset: 0-1192703 | Loss: 2.147 | 674 ms/step , 58323.61 GFLOP/s , 533128.5 tokens/s INFO:__main__:2024-10-27 10:20:20 | Epoch: 3 | Step: 75480 | Dataset: 0-1200703 | Loss: 2.164 | 675 ms/step , 58199.87 GFLOP/s , 533085.6 tokens/s INFO:__main__:2024-10-27 10:20:28 | Epoch: 3 | Step: 75490 | Dataset: 0-1208703 | Loss: 2.168 | 677 ms/step , 58078.81 
GFLOP/s , 530871.6 tokens/s INFO:__main__:2024-10-27 10:20:36 | Epoch: 3 | Step: 75500 | Dataset: 0-1216703 | Loss: 2.168 | 675 ms/step , 58233.22 GFLOP/s , 531626.1 tokens/s INFO:__main__:2024-10-27 10:20:43 | Epoch: 3 | Step: 75510 | Dataset: 0-1224703 | Loss: 2.184 | 676 ms/step , 58187.88 GFLOP/s , 531598.0 tokens/s INFO:__main__:2024-10-27 10:20:51 | Epoch: 3 | Step: 75520 | Dataset: 0-1232703 | Loss: 2.127 | 675 ms/step , 58261.30 GFLOP/s , 532248.2 tokens/s INFO:__main__:2024-10-27 10:20:59 | Epoch: 3 | Step: 75530 | Dataset: 0-1240703 | Loss: 2.170 | 675 ms/step , 58238.83 GFLOP/s , 531781.1 tokens/s INFO:__main__:2024-10-27 10:21:07 | Epoch: 3 | Step: 75540 | Dataset: 0-1248703 | Loss: 2.230 | 675 ms/step , 58206.73 GFLOP/s , 532037.1 tokens/s INFO:__main__:2024-10-27 10:21:14 | Epoch: 3 | Step: 75550 | Dataset: 0-1256703 | Loss: 2.139 | 675 ms/step , 58255.05 GFLOP/s , 531209.8 tokens/s INFO:__main__:2024-10-27 10:21:22 | Epoch: 3 | Step: 75560 | Dataset: 0-1264703 | Loss: 2.146 | 675 ms/step , 58201.97 GFLOP/s , 530517.2 tokens/s INFO:__main__:2024-10-27 10:21:30 | Epoch: 3 | Step: 75570 | Dataset: 0-1272703 | Loss: 2.152 | 675 ms/step , 58198.40 GFLOP/s , 530352.0 tokens/s INFO:__main__:2024-10-27 10:21:37 | Epoch: 3 | Step: 75580 | Dataset: 0-1280703 | Loss: 2.185 | 675 ms/step , 58244.49 GFLOP/s , 532682.9 tokens/s INFO:__main__:2024-10-27 10:21:45 | Epoch: 3 | Step: 75590 | Dataset: 0-1288703 | Loss: 2.184 | 676 ms/step , 58158.46 GFLOP/s , 532811.5 tokens/s INFO:__main__:2024-10-27 10:21:53 | Epoch: 3 | Step: 75600 | Dataset: 0-1296703 | Loss: 2.142 | 674 ms/step , 58363.55 GFLOP/s , 533081.9 tokens/s INFO:__main__:2024-10-27 10:22:00 | Epoch: 3 | Step: 75610 | Dataset: 0-1304703 | Loss: 2.189 | 675 ms/step , 58266.93 GFLOP/s , 532849.3 tokens/s INFO:__main__:2024-10-27 10:22:08 | Epoch: 3 | Step: 75620 | Dataset: 0-1312703 | Loss: 2.157 | 675 ms/step , 58227.71 GFLOP/s , 532506.2 tokens/s INFO:__main__:2024-10-27 10:22:16 | Epoch: 3 | Step: 75630 | 
Dataset: 0-1320703 | Loss: 2.104 | 675 ms/step , 58264.43 GFLOP/s , 532708.5 tokens/s INFO:__main__:2024-10-27 10:22:24 | Epoch: 3 | Step: 75640 | Dataset: 0-1328703 | Loss: 2.178 | 674 ms/step , 58286.38 GFLOP/s , 532619.4 tokens/s INFO:__main__:2024-10-27 10:22:31 | Epoch: 3 | Step: 75650 | Dataset: 0-1336703 | Loss: 2.121 | 674 ms/step , 58308.42 GFLOP/s , 533471.0 tokens/s INFO:__main__:2024-10-27 10:22:39 | Epoch: 3 | Step: 75660 | Dataset: 0-1344703 | Loss: 2.146 | 675 ms/step , 58260.56 GFLOP/s , 533179.3 tokens/s INFO:__main__:2024-10-27 10:22:47 | Epoch: 3 | Step: 75670 | Dataset: 0-1352703 | Loss: 2.138 | 675 ms/step , 58245.07 GFLOP/s , 533238.0 tokens/s INFO:__main__:2024-10-27 10:22:54 | Epoch: 3 | Step: 75680 | Dataset: 0-1360703 | Loss: 2.168 | 674 ms/step , 58286.25 GFLOP/s , 533292.0 tokens/s INFO:__main__:2024-10-27 10:23:02 | Epoch: 3 | Step: 75690 | Dataset: 0-1368703 | Loss: 2.155 | 676 ms/step , 58167.58 GFLOP/s , 533002.6 tokens/s INFO:__main__:2024-10-27 10:23:10 | Epoch: 3 | Step: 75700 | Dataset: 0-1376703 | Loss: 2.132 | 675 ms/step , 58209.10 GFLOP/s , 533228.2 tokens/s INFO:__main__:2024-10-27 10:23:17 | Epoch: 3 | Step: 75710 | Dataset: 0-1384703 | Loss: 2.186 | 675 ms/step , 58272.80 GFLOP/s , 532992.0 tokens/s INFO:__main__:2024-10-27 10:23:25 | Epoch: 3 | Step: 75720 | Dataset: 0-1392703 | Loss: 2.175 | 674 ms/step , 58352.17 GFLOP/s , 533467.4 tokens/s INFO:__main__:2024-10-27 10:23:33 | Epoch: 3 | Step: 75730 | Dataset: 0-1400703 | Loss: 2.214 | 674 ms/step , 58363.99 GFLOP/s , 533409.6 tokens/s INFO:__main__:2024-10-27 10:23:40 | Epoch: 3 | Step: 75740 | Dataset: 0-1408703 | Loss: 2.167 | 675 ms/step , 58260.95 GFLOP/s , 533220.1 tokens/s INFO:__main__:2024-10-27 10:23:48 | Epoch: 3 | Step: 75750 | Dataset: 0-1416703 | Loss: 2.132 | 674 ms/step , 58279.44 GFLOP/s , 533041.5 tokens/s INFO:__main__:2024-10-27 10:23:56 | Epoch: 3 | Step: 75760 | Dataset: 0-1424703 | Loss: 2.118 | 675 ms/step , 58226.33 GFLOP/s , 532907.2 tokens/s 
INFO:__main__:2024-10-27 10:24:03 | Epoch: 3 | Step: 75770 | Dataset: 0-1432703 | Loss: 2.187 | 674 ms/step , 58322.87 GFLOP/s , 533484.0 tokens/s INFO:__main__:2024-10-27 10:24:11 | Epoch: 3 | Step: 75780 | Dataset: 0-1440703 | Loss: 2.169 | 674 ms/step , 58289.18 GFLOP/s , 533077.6 tokens/s INFO:__main__:2024-10-27 10:24:19 | Epoch: 3 | Step: 75790 | Dataset: 0-1448703 | Loss: 2.065 | 674 ms/step , 58284.01 GFLOP/s , 532942.5 tokens/s INFO:__main__:2024-10-27 10:24:26 | Epoch: 3 | Step: 75800 | Dataset: 0-1456703 | Loss: 2.171 | 676 ms/step , 58188.19 GFLOP/s , 532901.9 tokens/s INFO:__main__:2024-10-27 10:24:34 | Epoch: 3 | Step: 75810 | Dataset: 0-1464703 | Loss: 2.196 | 674 ms/step , 58317.90 GFLOP/s , 533010.9 tokens/s INFO:__main__:2024-10-27 10:24:42 | Epoch: 3 | Step: 75820 | Dataset: 0-1472703 | Loss: 2.176 | 674 ms/step , 58348.90 GFLOP/s , 533150.0 tokens/s INFO:__main__:2024-10-27 10:24:50 | Epoch: 3 | Step: 75830 | Dataset: 0-1480703 | Loss: 2.141 | 674 ms/step , 58362.94 GFLOP/s , 533200.3 tokens/s INFO:__main__:2024-10-27 10:24:57 | Epoch: 3 | Step: 75840 | Dataset: 0-1488703 | Loss: 2.128 | 674 ms/step , 58279.38 GFLOP/s , 533264.5 tokens/s INFO:__main__:2024-10-27 10:25:05 | Epoch: 3 | Step: 75850 | Dataset: 0-1496703 | Loss: 2.187 | 674 ms/step , 58306.01 GFLOP/s , 530077.6 tokens/s INFO:__main__:2024-10-27 10:25:13 | Epoch: 3 | Step: 75860 | Dataset: 0-1504703 | Loss: 2.101 | 675 ms/step , 58249.97 GFLOP/s , 533043.8 tokens/s INFO:__main__:2024-10-27 10:25:20 | Epoch: 3 | Step: 75870 | Dataset: 0-1512703 | Loss: 1.873 | 675 ms/step , 58251.29 GFLOP/s , 532475.5 tokens/s INFO:__main__:2024-10-27 10:25:28 | Epoch: 3 | Step: 75880 | Dataset: 0-1520703 | Loss: 1.824 | 674 ms/step , 58298.53 GFLOP/s , 532554.5 tokens/s INFO:__main__:2024-10-27 10:25:36 | Epoch: 3 | Step: 75890 | Dataset: 0-1528703 | Loss: 1.799 | 675 ms/step , 58278.57 GFLOP/s , 532302.5 tokens/s INFO:__main__:2024-10-27 10:25:43 | Epoch: 3 | Step: 75900 | Dataset: 0-1536703 | Loss: 
1.812 | 674 ms/step , 58294.58 GFLOP/s , 532562.2 tokens/s INFO:__main__:2024-10-27 10:25:51 | Epoch: 3 | Step: 75910 | Dataset: 0-1544703 | Loss: 1.791 | 674 ms/step , 58279.36 GFLOP/s , 532399.7 tokens/s INFO:__main__:2024-10-27 10:25:59 | Epoch: 3 | Step: 75920 | Dataset: 0-1552703 | Loss: 1.757 | 673 ms/step , 58398.68 GFLOP/s , 533174.6 tokens/s INFO:__main__:2024-10-27 10:26:06 | Epoch: 3 | Step: 75930 | Dataset: 0-1560703 | Loss: 1.770 | 673 ms/step , 58390.69 GFLOP/s , 532923.8 tokens/s INFO:__main__:2024-10-27 10:26:14 | Epoch: 3 | Step: 75940 | Dataset: 0-1568703 | Loss: 1.765 | 674 ms/step , 58302.08 GFLOP/s , 533072.5 tokens/s INFO:__main__:2024-10-27 10:26:22 | Epoch: 3 | Step: 75950 | Dataset: 0-1576703 | Loss: 2.380 | 674 ms/step , 58311.12 GFLOP/s , 532946.4 tokens/s INFO:__main__:2024-10-27 10:26:29 | Epoch: 3 | Step: 75960 | Dataset: 0-1584703 | Loss: 2.201 | 673 ms/step , 58419.38 GFLOP/s , 533940.7 tokens/s INFO:__main__:2024-10-27 10:26:37 | Epoch: 3 | Step: 75970 | Dataset: 0-1592703 | Loss: 2.198 | 674 ms/step , 58296.38 GFLOP/s , 533012.9 tokens/s INFO:__main__:2024-10-27 10:26:45 | Epoch: 3 | Step: 75980 | Dataset: 0-1600703 | Loss: 2.191 | 674 ms/step , 58301.01 GFLOP/s , 533297.9 tokens/s INFO:__main__:2024-10-27 10:26:53 | Epoch: 3 | Step: 75990 | Dataset: 0-1608703 | Loss: 2.143 | 673 ms/step , 58419.55 GFLOP/s , 533512.4 tokens/s INFO:__main__:2024-10-27 10:27:00 | Validation | Step: 76000 | Val_loss: 2.190 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 10:27:00 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_102700_step_76000.pt` INFO:__main__:2024-10-27 10:27:01 | Epoch: 3 | Step: 76000 | Dataset: 0-1616703 | Loss: 2.100 | 674 ms/step , 58344.66 GFLOP/s , 480318.8 tokens/s INFO:__main__:2024-10-27 10:27:09 | Epoch: 3 | Step: 76010 | Dataset: 0-1624703 | Loss: 2.082 | 676 ms/step , 58111.79 GFLOP/s , 533290.6 tokens/s INFO:__main__:2024-10-27 10:27:16 | Epoch: 3 | Step: 76020 | Dataset: 0-1632703 | Loss: 
2.129 | 674 ms/step , 58353.11 GFLOP/s , 533658.1 tokens/s INFO:__main__:2024-10-27 10:27:24 | Epoch: 3 | Step: 76030 | Dataset: 0-1640703 | Loss: 2.139 | 674 ms/step , 58351.65 GFLOP/s , 533435.1 tokens/s INFO:__main__:2024-10-27 10:27:32 | Epoch: 3 | Step: 76040 | Dataset: 0-1648703 | Loss: 2.119 | 674 ms/step , 58340.34 GFLOP/s , 532959.4 tokens/s INFO:__main__:2024-10-27 10:27:39 | Epoch: 3 | Step: 76050 | Dataset: 0-1656703 | Loss: 2.029 | 674 ms/step , 58323.53 GFLOP/s , 533877.0 tokens/s INFO:__main__:2024-10-27 10:27:47 | Epoch: 3 | Step: 76060 | Dataset: 0-1664703 | Loss: 2.091 | 674 ms/step , 58292.11 GFLOP/s , 532798.6 tokens/s INFO:__main__:2024-10-27 10:27:55 | Epoch: 3 | Step: 76070 | Dataset: 0-1672703 | Loss: 1.930 | 674 ms/step , 58312.92 GFLOP/s , 533472.1 tokens/s INFO:__main__:2024-10-27 10:28:02 | Epoch: 3 | Step: 76080 | Dataset: 0-1680703 | Loss: 2.163 | 675 ms/step , 58242.67 GFLOP/s , 533057.1 tokens/s INFO:__main__:2024-10-27 10:28:10 | Epoch: 3 | Step: 76090 | Dataset: 0-1688703 | Loss: 2.135 | 675 ms/step , 58240.36 GFLOP/s , 533295.5 tokens/s INFO:__main__:2024-10-27 10:28:18 | Epoch: 3 | Step: 76100 | Dataset: 0-1696703 | Loss: 2.182 | 675 ms/step , 58272.06 GFLOP/s , 533442.4 tokens/s INFO:__main__:2024-10-27 10:28:26 | Epoch: 3 | Step: 76110 | Dataset: 0-1704703 | Loss: 2.122 | 674 ms/step , 58313.81 GFLOP/s , 533248.9 tokens/s INFO:__main__:2024-10-27 10:28:33 | Epoch: 3 | Step: 76120 | Dataset: 0-1712703 | Loss: 2.208 | 676 ms/step , 58189.26 GFLOP/s , 532935.2 tokens/s INFO:__main__:2024-10-27 10:28:41 | Epoch: 3 | Step: 76130 | Dataset: 0-1720703 | Loss: 2.195 | 673 ms/step , 58389.31 GFLOP/s , 533813.4 tokens/s INFO:__main__:2024-10-27 10:28:49 | Epoch: 3 | Step: 76140 | Dataset: 0-1728703 | Loss: 2.193 | 673 ms/step , 58377.85 GFLOP/s , 533894.3 tokens/s INFO:__main__:2024-10-27 10:28:56 | Epoch: 3 | Step: 76150 | Dataset: 0-1736703 | Loss: 2.159 | 674 ms/step , 58316.69 GFLOP/s , 533066.0 tokens/s INFO:__main__:2024-10-27 
10:29:04 | Epoch: 3 | Step: 76160 | Dataset: 0-1744703 | Loss: 2.107 | 674 ms/step , 58322.29 GFLOP/s , 533139.2 tokens/s INFO:__main__:2024-10-27 10:29:12 | Epoch: 3 | Step: 76170 | Dataset: 0-1752703 | Loss: 2.124 | 675 ms/step , 58201.74 GFLOP/s , 532839.8 tokens/s INFO:__main__:2024-10-27 10:29:19 | Epoch: 3 | Step: 76180 | Dataset: 0-1760703 | Loss: 2.157 | 675 ms/step , 58268.33 GFLOP/s , 533413.8 tokens/s INFO:__main__:2024-10-27 10:29:27 | Epoch: 3 | Step: 76190 | Dataset: 0-1768703 | Loss: 2.155 | 675 ms/step , 58244.65 GFLOP/s , 532938.9 tokens/s INFO:__main__:2024-10-27 10:29:35 | Epoch: 3 | Step: 76200 | Dataset: 0-1776703 | Loss: 2.170 | 674 ms/step , 58301.03 GFLOP/s , 533275.2 tokens/s INFO:__main__:2024-10-27 10:29:42 | Epoch: 3 | Step: 76210 | Dataset: 0-1784703 | Loss: 2.177 | 675 ms/step , 58261.91 GFLOP/s , 532925.1 tokens/s INFO:__main__:2024-10-27 10:29:50 | Epoch: 3 | Step: 76220 | Dataset: 0-1792703 | Loss: 2.116 | 674 ms/step , 58300.72 GFLOP/s , 533469.3 tokens/s INFO:__main__:2024-10-27 10:29:58 | Epoch: 3 | Step: 76230 | Dataset: 0-1800703 | Loss: 2.155 | 676 ms/step , 58167.13 GFLOP/s , 533052.1 tokens/s INFO:__main__:2024-10-27 10:30:05 | Epoch: 3 | Step: 76240 | Dataset: 0-1808703 | Loss: 2.138 | 675 ms/step , 58196.92 GFLOP/s , 532851.3 tokens/s INFO:__main__:2024-10-27 10:30:13 | Epoch: 3 | Step: 76250 | Dataset: 0-1816703 | Loss: 2.155 | 675 ms/step , 58235.10 GFLOP/s , 532997.7 tokens/s INFO:__main__:2024-10-27 10:30:21 | Epoch: 3 | Step: 76260 | Dataset: 0-1824703 | Loss: 2.127 | 675 ms/step , 58252.57 GFLOP/s , 532776.1 tokens/s INFO:__main__:2024-10-27 10:30:28 | Epoch: 3 | Step: 76270 | Dataset: 0-1832703 | Loss: 2.231 | 675 ms/step , 58202.91 GFLOP/s , 533113.1 tokens/s INFO:__main__:2024-10-27 10:30:36 | Epoch: 3 | Step: 76280 | Dataset: 0-1840703 | Loss: 1.800 | 676 ms/step , 58138.29 GFLOP/s , 531802.4 tokens/s INFO:__main__:2024-10-27 10:30:44 | Epoch: 3 | Step: 76290 | Dataset: 0-1848703 | Loss: 1.752 | 674 ms/step , 
58332.92 GFLOP/s , 531860.9 tokens/s INFO:__main__:2024-10-27 10:30:52 | Epoch: 3 | Step: 76300 | Dataset: 0-1856703 | Loss: 1.775 | 676 ms/step , 58184.88 GFLOP/s , 532268.0 tokens/s INFO:__main__:2024-10-27 10:30:59 | Epoch: 3 | Step: 76310 | Dataset: 0-1864703 | Loss: 1.703 | 675 ms/step , 58226.20 GFLOP/s , 531785.7 tokens/s INFO:__main__:2024-10-27 10:31:07 | Epoch: 3 | Step: 76320 | Dataset: 0-1872703 | Loss: 1.709 | 676 ms/step , 58183.88 GFLOP/s , 531597.8 tokens/s INFO:__main__:2024-10-27 10:31:15 | Epoch: 3 | Step: 76330 | Dataset: 0-1880703 | Loss: 1.734 | 674 ms/step , 58281.54 GFLOP/s , 532092.1 tokens/s INFO:__main__:2024-10-27 10:31:22 | Epoch: 3 | Step: 76340 | Dataset: 0-1888703 | Loss: 1.714 | 677 ms/step , 58102.22 GFLOP/s , 532273.8 tokens/s INFO:__main__:2024-10-27 10:31:30 | Epoch: 3 | Step: 76350 | Dataset: 0-1896703 | Loss: 1.723 | 674 ms/step , 58298.98 GFLOP/s , 532297.4 tokens/s INFO:__main__:2024-10-27 10:31:38 | Epoch: 3 | Step: 76360 | Dataset: 0-1904703 | Loss: 2.284 | 675 ms/step , 58274.77 GFLOP/s , 532688.2 tokens/s INFO:__main__:2024-10-27 10:31:45 | Epoch: 3 | Step: 76370 | Dataset: 0-1912703 | Loss: 2.244 | 675 ms/step , 58235.53 GFLOP/s , 533122.6 tokens/s INFO:__main__:2024-10-27 10:31:53 | Epoch: 3 | Step: 76380 | Dataset: 0-1920703 | Loss: 2.205 | 676 ms/step , 58125.33 GFLOP/s , 532447.8 tokens/s INFO:__main__:2024-10-27 10:32:01 | Epoch: 3 | Step: 76390 | Dataset: 0-1928703 | Loss: 2.164 | 675 ms/step , 58250.70 GFLOP/s , 532950.1 tokens/s INFO:__main__:2024-10-27 10:32:08 | Epoch: 3 | Step: 76400 | Dataset: 0-1936703 | Loss: 2.240 | 674 ms/step , 58313.07 GFLOP/s , 533583.6 tokens/s INFO:__main__:2024-10-27 10:32:16 | Epoch: 3 | Step: 76410 | Dataset: 0-1944703 | Loss: 2.143 | 677 ms/step , 58097.23 GFLOP/s , 532555.6 tokens/s INFO:__main__:2024-10-27 10:32:24 | Epoch: 3 | Step: 76420 | Dataset: 0-1952703 | Loss: 2.156 | 677 ms/step , 58036.05 GFLOP/s , 531428.1 tokens/s INFO:__main__:2024-10-27 10:32:32 | Epoch: 3 | 
Step: 76430 | Dataset: 0-1960703 | Loss: 2.138 | 676 ms/step , 58131.61 GFLOP/s , 531334.1 tokens/s INFO:__main__:2024-10-27 10:32:39 | Epoch: 3 | Step: 76440 | Dataset: 0-1968703 | Loss: 2.105 | 676 ms/step , 58147.00 GFLOP/s , 532209.7 tokens/s INFO:__main__:2024-10-27 10:32:47 | Epoch: 3 | Step: 76450 | Dataset: 0-1976703 | Loss: 2.009 | 675 ms/step , 58208.04 GFLOP/s , 533306.5 tokens/s INFO:__main__:2024-10-27 10:32:55 | Epoch: 3 | Step: 76460 | Dataset: 0-1984703 | Loss: 2.025 | 676 ms/step , 58124.14 GFLOP/s , 531937.1 tokens/s INFO:__main__:2024-10-27 10:33:02 | Epoch: 3 | Step: 76470 | Dataset: 0-1992703 | Loss: 2.111 | 673 ms/step , 58382.30 GFLOP/s , 532676.9 tokens/s INFO:__main__:2024-10-27 10:33:10 | Epoch: 3 | Step: 76480 | Dataset: 0-2000703 | Loss: 2.192 | 676 ms/step , 58174.22 GFLOP/s , 532915.4 tokens/s INFO:__main__:2024-10-27 10:33:18 | Epoch: 3 | Step: 76490 | Dataset: 0-2008703 | Loss: 2.196 | 676 ms/step , 58188.87 GFLOP/s , 533103.3 tokens/s INFO:__main__:2024-10-27 10:33:25 | Epoch: 3 | Step: 76500 | Dataset: 0-2016703 | Loss: 2.226 | 674 ms/step , 58301.60 GFLOP/s , 533250.2 tokens/s INFO:__main__:2024-10-27 10:33:33 | Epoch: 3 | Step: 76510 | Dataset: 0-2024703 | Loss: 2.077 | 675 ms/step , 58277.33 GFLOP/s , 533230.8 tokens/s INFO:__main__:2024-10-27 10:33:41 | Epoch: 3 | Step: 76520 | Dataset: 0-2032703 | Loss: 2.132 | 675 ms/step , 58203.34 GFLOP/s , 533274.0 tokens/s INFO:__main__:2024-10-27 10:33:48 | Epoch: 3 | Step: 76530 | Dataset: 0-2040703 | Loss: 1.712 | 674 ms/step , 58365.01 GFLOP/s , 532599.5 tokens/s INFO:__main__:2024-10-27 10:33:56 | Epoch: 3 | Step: 76540 | Dataset: 0-2048703 | Loss: 1.672 | 676 ms/step , 58108.54 GFLOP/s , 532057.7 tokens/s INFO:__main__:2024-10-27 10:34:04 | Epoch: 3 | Step: 76550 | Dataset: 0-2056703 | Loss: 1.697 | 676 ms/step , 58188.46 GFLOP/s , 532567.2 tokens/s INFO:__main__:2024-10-27 10:34:12 | Epoch: 3 | Step: 76560 | Dataset: 0-2064703 | Loss: 1.673 | 676 ms/step , 58108.07 GFLOP/s , 
532426.5 tokens/s INFO:__main__:2024-10-27 10:34:19 | Epoch: 3 | Step: 76570 | Dataset: 0-2072703 | Loss: 1.672 | 674 ms/step , 58302.93 GFLOP/s , 532274.8 tokens/s INFO:__main__:2024-10-27 10:34:27 | Epoch: 3 | Step: 76580 | Dataset: 0-2080703 | Loss: 1.644 | 674 ms/step , 58305.62 GFLOP/s , 532588.9 tokens/s INFO:__main__:2024-10-27 10:34:35 | Epoch: 3 | Step: 76590 | Dataset: 0-2088703 | Loss: 1.645 | 674 ms/step , 58284.47 GFLOP/s , 532405.4 tokens/s INFO:__main__:2024-10-27 10:34:42 | Epoch: 3 | Step: 76600 | Dataset: 0-2096703 | Loss: 1.654 | 675 ms/step , 58227.19 GFLOP/s , 532369.5 tokens/s INFO:__main__:2024-10-27 10:34:50 | Epoch: 3 | Step: 76610 | Dataset: 0-2104703 | Loss: 2.212 | 675 ms/step , 58214.23 GFLOP/s , 533076.6 tokens/s INFO:__main__:2024-10-27 10:34:58 | Epoch: 3 | Step: 76620 | Dataset: 0-2112703 | Loss: 2.130 | 673 ms/step , 58380.95 GFLOP/s , 533330.7 tokens/s INFO:__main__:2024-10-27 10:35:05 | Epoch: 3 | Step: 76630 | Dataset: 0-2120703 | Loss: 2.088 | 675 ms/step , 58271.95 GFLOP/s , 533239.8 tokens/s INFO:__main__:2024-10-27 10:35:13 | Epoch: 3 | Step: 76640 | Dataset: 0-2128703 | Loss: 2.113 | 675 ms/step , 58238.17 GFLOP/s , 533002.7 tokens/s INFO:__main__:2024-10-27 10:35:21 | Epoch: 3 | Step: 76650 | Dataset: 0-2136703 | Loss: 2.138 | 674 ms/step , 58304.10 GFLOP/s , 533966.0 tokens/s INFO:__main__:2024-10-27 10:35:28 | Epoch: 3 | Step: 76660 | Dataset: 0-2144703 | Loss: 2.144 | 679 ms/step , 57883.13 GFLOP/s , 533454.2 tokens/s INFO:__main__:2024-10-27 10:35:36 | Epoch: 3 | Step: 76670 | Dataset: 0-2152703 | Loss: 2.124 | 673 ms/step , 58382.50 GFLOP/s , 533923.2 tokens/s INFO:__main__:2024-10-27 10:35:44 | Epoch: 3 | Step: 76680 | Dataset: 0-2160703 | Loss: 2.100 | 674 ms/step , 58295.36 GFLOP/s , 532993.2 tokens/s INFO:__main__:2024-10-27 10:35:51 | Epoch: 3 | Step: 76690 | Dataset: 0-2168703 | Loss: 2.112 | 675 ms/step , 58255.39 GFLOP/s , 532831.9 tokens/s INFO:__main__:2024-10-27 10:35:59 | Epoch: 3 | Step: 76700 | Dataset: 
0-2176703 | Loss: 2.091 | 674 ms/step , 58315.47 GFLOP/s , 533468.8 tokens/s INFO:__main__:2024-10-27 10:36:07 | Epoch: 3 | Step: 76710 | Dataset: 0-2184703 | Loss: 2.082 | 674 ms/step , 58296.16 GFLOP/s , 533296.2 tokens/s INFO:__main__:2024-10-27 10:36:14 | Epoch: 3 | Step: 76720 | Dataset: 0-2192703 | Loss: 2.122 | 674 ms/step , 58301.68 GFLOP/s , 533421.5 tokens/s INFO:__main__:2024-10-27 10:36:22 | Epoch: 3 | Step: 76730 | Dataset: 0-2200703 | Loss: 2.151 | 674 ms/step , 58291.42 GFLOP/s , 533546.2 tokens/s INFO:__main__:2024-10-27 10:36:30 | Epoch: 3 | Step: 76740 | Dataset: 0-2208703 | Loss: 2.082 | 675 ms/step , 58215.81 GFLOP/s , 533182.4 tokens/s INFO:__main__:2024-10-27 10:36:37 | Epoch: 3 | Step: 76750 | Dataset: 0-2216703 | Loss: 2.203 | 674 ms/step , 58351.98 GFLOP/s , 533585.4 tokens/s INFO:__main__:2024-10-27 10:36:45 | Epoch: 3 | Step: 76760 | Dataset: 0-2224703 | Loss: 2.130 | 675 ms/step , 58261.83 GFLOP/s , 533406.5 tokens/s INFO:__main__:2024-10-27 10:36:53 | Epoch: 3 | Step: 76770 | Dataset: 0-2232703 | Loss: 1.874 | 673 ms/step , 58385.92 GFLOP/s , 533032.7 tokens/s INFO:__main__:2024-10-27 10:37:01 | Epoch: 3 | Step: 76780 | Dataset: 0-2240703 | Loss: 1.805 | 673 ms/step , 58375.16 GFLOP/s , 533248.2 tokens/s INFO:__main__:2024-10-27 10:37:08 | Epoch: 3 | Step: 76790 | Dataset: 0-2248703 | Loss: 1.784 | 674 ms/step , 58333.43 GFLOP/s , 532992.3 tokens/s INFO:__main__:2024-10-27 10:37:16 | Epoch: 3 | Step: 76800 | Dataset: 0-2256703 | Loss: 1.768 | 673 ms/step , 58400.19 GFLOP/s , 533432.1 tokens/s INFO:__main__:2024-10-27 10:37:24 | Epoch: 3 | Step: 76810 | Dataset: 0-2264703 | Loss: 1.753 | 674 ms/step , 58363.61 GFLOP/s , 533301.9 tokens/s INFO:__main__:2024-10-27 10:37:31 | Epoch: 3 | Step: 76820 | Dataset: 0-2272703 | Loss: 1.723 | 674 ms/step , 58347.30 GFLOP/s , 532728.0 tokens/s INFO:__main__:2024-10-27 10:37:39 | Epoch: 3 | Step: 76830 | Dataset: 0-2280703 | Loss: 1.724 | 674 ms/step , 58354.19 GFLOP/s , 532858.0 tokens/s 
INFO:__main__:2024-10-27 10:37:47 | Epoch: 3 | Step: 76840 | Dataset: 0-2288703 | Loss: 1.741 | 674 ms/step , 58320.08 GFLOP/s , 532750.3 tokens/s
INFO:__main__:2024-10-27 10:37:54 | Epoch: 3 | Step: 76850 | Dataset: 0-2296703 | Loss: 1.746 | 678 ms/step , 57995.57 GFLOP/s , 531739.3 tokens/s
INFO:__main__:2024-10-27 10:38:02 | Epoch: 3 | Step: 76860 | Dataset: 0-2304703 | Loss: 2.173 | 677 ms/step , 58087.56 GFLOP/s , 530311.6 tokens/s
INFO:__main__:2024-10-27 10:38:10 | Epoch: 3 | Step: 76870 | Dataset: 0-2312703 | Loss: 2.145 | 676 ms/step , 58109.31 GFLOP/s , 531408.2 tokens/s
INFO:__main__:2024-10-27 10:38:17 | Epoch: 3 | Step: 76880 | Dataset: 0-2320703 | Loss: 2.174 | 679 ms/step , 57905.86 GFLOP/s , 530742.9 tokens/s
INFO:__main__:2024-10-27 10:38:25 | Epoch: 3 | Step: 76890 | Dataset: 0-2328703 | Loss: 2.128 | 675 ms/step , 58268.41 GFLOP/s , 530308.2 tokens/s
INFO:__main__:2024-10-27 10:38:33 | Epoch: 3 | Step: 76900 | Dataset: 0-2336703 | Loss: 2.085 | 675 ms/step , 58205.69 GFLOP/s , 531444.2 tokens/s
INFO:__main__:2024-10-27 10:38:41 | Epoch: 3 | Step: 76910 | Dataset: 0-2344703 | Loss: 2.193 | 676 ms/step , 58164.81 GFLOP/s , 531719.1 tokens/s
INFO:__main__:2024-10-27 10:38:48 | Epoch: 3 | Step: 76920 | Dataset: 0-2352703 | Loss: 2.125 | 676 ms/step , 58135.54 GFLOP/s , 531618.8 tokens/s
INFO:__main__:2024-10-27 10:38:56 | Epoch: 3 | Step: 76930 | Dataset: 0-2360703 | Loss: 2.101 | 675 ms/step , 58238.91 GFLOP/s , 532245.9 tokens/s
INFO:__main__:2024-10-27 10:39:04 | Epoch: 3 | Step: 76940 | Dataset: 0-2368703 | Loss: 2.123 | 676 ms/step , 58125.35 GFLOP/s , 531025.6 tokens/s
INFO:__main__:2024-10-27 10:39:11 | Epoch: 3 | Step: 76950 | Dataset: 0-2376703 | Loss: 2.151 | 676 ms/step , 58158.44 GFLOP/s , 530187.5 tokens/s
INFO:__main__:2024-10-27 10:39:19 | Epoch: 3 | Step: 76960 | Dataset: 0-2384703 | Loss: 2.140 | 676 ms/step , 58187.12 GFLOP/s , 530619.7 tokens/s
INFO:__main__:2024-10-27 10:39:27 | Epoch: 3 | Step: 76970 | Dataset: 0-2392703 | Loss: 2.156 | 674 ms/step , 58334.07 GFLOP/s , 532062.6 tokens/s
INFO:__main__:2024-10-27 10:39:35 | Epoch: 3 | Step: 76980 | Dataset: 0-2400703 | Loss: 2.088 | 674 ms/step , 58286.25 GFLOP/s , 532872.7 tokens/s
INFO:__main__:2024-10-27 10:39:42 | Epoch: 3 | Step: 76990 | Dataset: 0-2408703 | Loss: 2.071 | 675 ms/step , 58262.70 GFLOP/s , 532759.6 tokens/s
INFO:__main__:2024-10-27 10:39:49 | Validation | Step: 77000 | Val_loss: 2.128 | Best_val_loss: 1.7829
INFO:__main__:2024-10-27 10:39:49 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_103949_step_77000.pt`
INFO:__main__:2024-10-27 10:39:51 | Epoch: 3 | Step: 77000 | Dataset: 0-2416703 | Loss: 2.102 | 674 ms/step , 58283.01 GFLOP/s , 478446.2 tokens/s
INFO:__main__:2024-10-27 10:39:59 | Epoch: 3 | Step: 77010 | Dataset: 0-2424703 | Loss: 2.149 | 675 ms/step , 58222.36 GFLOP/s , 532437.5 tokens/s
INFO:__main__:2024-10-27 10:40:06 | Epoch: 3 | Step: 77020 | Dataset: 0-2432703 | Loss: 2.164 | 675 ms/step , 58265.34 GFLOP/s , 532236.4 tokens/s
INFO:__main__:2024-10-27 10:40:14 | Epoch: 3 | Step: 77030 | Dataset: 0-2440703 | Loss: 2.239 | 675 ms/step , 58221.06 GFLOP/s , 532547.5 tokens/s
INFO:__main__:2024-10-27 10:40:22 | Epoch: 3 | Step: 77040 | Dataset: 0-2448703 | Loss: 2.141 | 676 ms/step , 58148.53 GFLOP/s , 532247.0 tokens/s
INFO:__main__:2024-10-27 10:40:29 | Epoch: 3 | Step: 77050 | Dataset: 0-2456703 | Loss: 2.246 | 676 ms/step , 58168.16 GFLOP/s , 532674.9 tokens/s
INFO:__main__:2024-10-27 10:40:37 | Epoch: 3 | Step: 77060 | Dataset: 0-2464703 | Loss: 2.049 | 674 ms/step , 58310.32 GFLOP/s , 533209.0 tokens/s
INFO:__main__:2024-10-27 10:40:45 | Epoch: 3 | Step: 77070 | Dataset: 0-2472703 | Loss: 2.135 | 677 ms/step , 58056.99 GFLOP/s , 531597.2 tokens/s
INFO:__main__:2024-10-27 10:40:52 | Epoch: 3 | Step: 77080 | Dataset: 0-2480703 | Loss: 2.184 | 673 ms/step , 58419.82 GFLOP/s , 532146.2 tokens/s
INFO:__main__:2024-10-27 10:41:00 | Epoch: 3 | Step: 77090 | Dataset: 0-2488703 | Loss: 2.157 | 674 ms/step , 58285.73 GFLOP/s , 532839.7 tokens/s
INFO:__main__:2024-10-27 10:41:08 | Epoch: 3 | Step: 77100 | Dataset: 0-2496703 | Loss: 2.125 | 675 ms/step , 58270.65 GFLOP/s , 532923.4 tokens/s
INFO:__main__:2024-10-27 10:41:15 | Epoch: 3 | Step: 77110 | Dataset: 0-2504703 | Loss: 2.111 | 675 ms/step , 58203.31 GFLOP/s , 532651.2 tokens/s
INFO:__main__:2024-10-27 10:41:23 | Epoch: 3 | Step: 77120 | Dataset: 0-2512703 | Loss: 2.203 | 674 ms/step , 58340.46 GFLOP/s , 532194.8 tokens/s
INFO:__main__:2024-10-27 10:41:31 | Epoch: 3 | Step: 77130 | Dataset: 0-2520703 | Loss: 2.158 | 675 ms/step , 58265.05 GFLOP/s , 532634.6 tokens/s
INFO:__main__:2024-10-27 10:41:39 | Epoch: 3 | Step: 77140 | Dataset: 0-2528703 | Loss: 2.166 | 676 ms/step , 58149.19 GFLOP/s , 532458.6 tokens/s
INFO:__main__:2024-10-27 10:41:46 | Epoch: 3 | Step: 77150 | Dataset: 0-2536703 | Loss: 2.056 | 675 ms/step , 58237.74 GFLOP/s , 532680.9 tokens/s
INFO:__main__:2024-10-27 10:41:54 | Epoch: 3 | Step: 77160 | Dataset: 0-2544703 | Loss: 2.112 | 675 ms/step , 58259.47 GFLOP/s , 532340.6 tokens/s
INFO:__main__:2024-10-27 10:42:02 | Epoch: 3 | Step: 77170 | Dataset: 0-2552703 | Loss: 2.143 | 673 ms/step , 58390.62 GFLOP/s , 532823.0 tokens/s
INFO:__main__:2024-10-27 10:42:09 | Epoch: 3 | Step: 77180 | Dataset: 0-2560703 | Loss: 2.120 | 675 ms/step , 58250.77 GFLOP/s , 533073.8 tokens/s
INFO:__main__:2024-10-27 10:42:17 | Epoch: 3 | Step: 77190 | Dataset: 0-2568703 | Loss: 2.148 | 675 ms/step , 58221.22 GFLOP/s , 533208.8 tokens/s
INFO:__main__:2024-10-27 10:42:25 | Epoch: 3 | Step: 77200 | Dataset: 0-2576703 | Loss: 2.182 | 676 ms/step , 58111.82 GFLOP/s , 532673.0 tokens/s
INFO:__main__:2024-10-27 10:42:32 | Epoch: 3 | Step: 77210 | Dataset: 0-2584703 | Loss: 2.119 | 677 ms/step , 58068.35 GFLOP/s , 530857.1 tokens/s
INFO:__main__:2024-10-27 10:42:40 | Epoch: 3 | Step: 77220 | Dataset: 0-2592703 | Loss: 2.143 | 677 ms/step , 58044.45 GFLOP/s , 531466.1 tokens/s
INFO:__main__:2024-10-27 10:42:48 | Epoch: 3 | Step: 77230 | Dataset: 0-2600703 | Loss: 2.140 | 678 ms/step , 57948.36 GFLOP/s , 531243.9 tokens/s
INFO:__main__:2024-10-27 10:42:55 | Epoch: 3 | Step: 77240 | Dataset: 0-2608703 | Loss: 2.118 | 674 ms/step , 58302.58 GFLOP/s , 533257.1 tokens/s
INFO:__main__:2024-10-27 10:43:03 | Epoch: 3 | Step: 77250 | Dataset: 0-2616703 | Loss: 2.162 | 675 ms/step , 58251.12 GFLOP/s , 533249.3 tokens/s
INFO:__main__:2024-10-27 10:43:11 | Epoch: 3 | Step: 77260 | Dataset: 0-2624703 | Loss: 2.117 | 675 ms/step , 58257.65 GFLOP/s , 532761.4 tokens/s
INFO:__main__:2024-10-27 10:43:19 | Epoch: 3 | Step: 77270 | Dataset: 0-2632703 | Loss: 2.072 | 675 ms/step , 58201.24 GFLOP/s , 533225.0 tokens/s
INFO:__main__:2024-10-27 10:43:26 | Epoch: 3 | Step: 77280 | Dataset: 0-2640703 | Loss: 2.089 | 675 ms/step , 58270.85 GFLOP/s , 532996.2 tokens/s
INFO:__main__:2024-10-27 10:43:34 | Epoch: 3 | Step: 77290 | Dataset: 0-2648703 | Loss: 2.073 | 674 ms/step , 58338.44 GFLOP/s , 533192.1 tokens/s
INFO:__main__:2024-10-27 10:43:42 | Epoch: 3 | Step: 77300 | Dataset: 0-2656703 | Loss: 2.126 | 674 ms/step , 58350.28 GFLOP/s , 533691.3 tokens/s
INFO:__main__:2024-10-27 10:43:49 | Epoch: 3 | Step: 77310 | Dataset: 0-2664703 | Loss: 2.126 | 675 ms/step , 58207.66 GFLOP/s , 533064.8 tokens/s
INFO:__main__:2024-10-27 10:43:57 | Epoch: 3 | Step: 77320 | Dataset: 0-2672703 | Loss: 2.116 | 675 ms/step , 58230.56 GFLOP/s , 532651.7 tokens/s
INFO:__main__:2024-10-27 10:44:05 | Epoch: 3 | Step: 77330 | Dataset: 0-2680703 | Loss: 2.053 | 674 ms/step , 58352.06 GFLOP/s , 533398.4 tokens/s
INFO:__main__:2024-10-27 10:44:12 | Epoch: 3 | Step: 77340 | Dataset: 0-2688703 | Loss: 2.139 | 675 ms/step , 58198.40 GFLOP/s , 532156.9 tokens/s
INFO:__main__:2024-10-27 10:44:20 | Epoch: 3 | Step: 77350 | Dataset: 0-2696703 | Loss: 2.280 | 676 ms/step , 58110.03 GFLOP/s , 531935.3 tokens/s
INFO:__main__:2024-10-27 10:44:28 | Epoch: 3 | Step: 77360 | Dataset: 0-2704703 | Loss: 2.197 | 675 ms/step , 58202.60 GFLOP/s , 532164.3 tokens/s
INFO:__main__:2024-10-27 10:44:35 | Epoch: 3 | Step: 77370 | Dataset: 0-2712703 | Loss: 2.187 | 676 ms/step , 58135.86 GFLOP/s , 531988.1 tokens/s
INFO:__main__:2024-10-27 10:44:43 | Epoch: 3 | Step: 77380 | Dataset: 0-2720703 | Loss: 2.169 | 675 ms/step , 58215.76 GFLOP/s , 531457.1 tokens/s
INFO:__main__:2024-10-27 10:44:51 | Epoch: 3 | Step: 77390 | Dataset: 0-2728703 | Loss: 2.110 | 675 ms/step , 58236.78 GFLOP/s , 532505.2 tokens/s
INFO:__main__:2024-10-27 10:44:58 | Epoch: 3 | Step: 77400 | Dataset: 0-2736703 | Loss: 2.086 | 675 ms/step , 58203.30 GFLOP/s , 532233.8 tokens/s
INFO:__main__:2024-10-27 10:45:06 | Epoch: 3 | Step: 77410 | Dataset: 0-2744703 | Loss: 2.079 | 676 ms/step , 58192.58 GFLOP/s , 531212.0 tokens/s
INFO:__main__:2024-10-27 10:45:14 | Epoch: 3 | Step: 77420 | Dataset: 0-2752703 | Loss: 2.084 | 675 ms/step , 58271.17 GFLOP/s , 532839.7 tokens/s
INFO:__main__:2024-10-27 10:45:22 | Epoch: 3 | Step: 77430 | Dataset: 0-2760703 | Loss: 2.024 | 674 ms/step , 58310.81 GFLOP/s , 533519.7 tokens/s
INFO:__main__:2024-10-27 10:45:29 | Epoch: 3 | Step: 77440 | Dataset: 0-2768703 | Loss: 2.044 | 674 ms/step , 58311.83 GFLOP/s , 532984.2 tokens/s
INFO:__main__:2024-10-27 10:45:37 | Epoch: 3 | Step: 77450 | Dataset: 0-2776703 | Loss: 2.003 | 674 ms/step , 58313.99 GFLOP/s , 532665.1 tokens/s
INFO:__main__:2024-10-27 10:45:45 | Epoch: 3 | Step: 77460 | Dataset: 0-2784703 | Loss: 2.038 | 676 ms/step , 58167.57 GFLOP/s , 533066.6 tokens/s
INFO:__main__:2024-10-27 10:45:52 | Epoch: 3 | Step: 77470 | Dataset: 0-2792703 | Loss: 2.007 | 674 ms/step , 58312.36 GFLOP/s , 533105.0 tokens/s
INFO:__main__:2024-10-27 10:46:00 | Epoch: 3 | Step: 77480 | Dataset: 0-2800703 | Loss: 1.984 | 674 ms/step , 58282.53 GFLOP/s , 533540.4 tokens/s
INFO:__main__:2024-10-27 10:46:08 | Epoch: 3 | Step: 77490 | Dataset: 0-2808703 | Loss: 1.980 | 675 ms/step , 58196.19 GFLOP/s , 533165.9 tokens/s
INFO:__main__:2024-10-27 10:46:15 | Epoch: 3 | Step: 77500 | Dataset: 0-2816703 | Loss: 1.984 | 674 ms/step , 58279.50 GFLOP/s , 533210.4 tokens/s
INFO:__main__:2024-10-27 10:46:23 | Epoch: 3 | Step: 77510 | Dataset: 0-2824703 | Loss: 2.369 | 674 ms/step , 58292.37 GFLOP/s , 533194.2 tokens/s
INFO:__main__:2024-10-27 10:46:31 | Epoch: 3 | Step: 77520 | Dataset: 0-2832703 | Loss: 2.230 | 675 ms/step , 58246.35 GFLOP/s , 532579.4 tokens/s
INFO:__main__:2024-10-27 10:46:38 | Epoch: 3 | Step: 77530 | Dataset: 0-2840703 | Loss: 2.192 | 675 ms/step , 58233.72 GFLOP/s , 532764.5 tokens/s
INFO:__main__:2024-10-27 10:46:46 | Epoch: 3 | Step: 77540 | Dataset: 0-2848703 | Loss: 2.182 | 675 ms/step , 58269.03 GFLOP/s , 532882.0 tokens/s
INFO:__main__:2024-10-27 10:46:54 | Epoch: 3 | Step: 77550 | Dataset: 0-2856703 | Loss: 2.161 | 674 ms/step , 58320.35 GFLOP/s , 533241.6 tokens/s
INFO:__main__:2024-10-27 10:47:01 | Epoch: 3 | Step: 77560 | Dataset: 0-2864703 | Loss: 2.162 | 675 ms/step , 58228.75 GFLOP/s , 532972.9 tokens/s
INFO:__main__:2024-10-27 10:47:09 | Epoch: 3 | Step: 77570 | Dataset: 0-2872703 | Loss: 2.149 | 674 ms/step , 58364.99 GFLOP/s , 533258.4 tokens/s
INFO:__main__:2024-10-27 10:47:17 | Epoch: 3 | Step: 77580 | Dataset: 0-2880703 | Loss: 2.134 | 675 ms/step , 58266.60 GFLOP/s , 533040.0 tokens/s
INFO:__main__:2024-10-27 10:47:25 | Epoch: 3 | Step: 77590 | Dataset: 0-2888703 | Loss: 2.190 | 674 ms/step , 58300.45 GFLOP/s , 533299.1 tokens/s
INFO:__main__:2024-10-27 10:47:32 | Epoch: 3 | Step: 77600 | Dataset: 0-2896703 | Loss: 2.192 | 675 ms/step , 58232.13 GFLOP/s , 532879.2 tokens/s
INFO:__main__:2024-10-27 10:47:40 | Epoch: 3 | Step: 77610 | Dataset: 0-2904703 | Loss: 2.281 | 674 ms/step , 58282.41 GFLOP/s , 532803.3 tokens/s
INFO:__main__:2024-10-27 10:47:48 | Epoch: 3 | Step: 77620 | Dataset: 0-2912703 | Loss: 2.088 | 675 ms/step , 58246.35 GFLOP/s , 532939.4 tokens/s
INFO:__main__:2024-10-27 10:47:55 | Epoch: 3 | Step: 77630 | Dataset: 0-2920703 | Loss: 2.128 | 676 ms/step , 58138.86 GFLOP/s , 532413.1 tokens/s
INFO:__main__:2024-10-27 10:48:03 | Epoch: 3 | Step: 77640 | Dataset: 0-2928703 | Loss: 2.134 | 675 ms/step , 58223.84 GFLOP/s , 532860.6 tokens/s
INFO:__main__:2024-10-27 10:48:11 | Epoch: 3 | Step: 77650 | Dataset: 0-2936703 | Loss: 2.149 | 675 ms/step , 58247.67 GFLOP/s , 532887.9 tokens/s
INFO:__main__:2024-10-27 10:48:18 | Epoch: 3 | Step: 77660 | Dataset: 0-2944703 | Loss: 2.165 | 675 ms/step , 58223.90 GFLOP/s , 532188.3 tokens/s
INFO:__main__:2024-10-27 10:48:26 | Epoch: 3 | Step: 77670 | Dataset: 0-2952703 | Loss: 2.190 | 675 ms/step , 58205.02 GFLOP/s , 532372.2 tokens/s
INFO:__main__:2024-10-27 10:48:34 | Epoch: 3 | Step: 77680 | Dataset: 0-2960703 | Loss: 2.256 | 675 ms/step , 58227.10 GFLOP/s , 532420.9 tokens/s
INFO:__main__:2024-10-27 10:48:41 | Epoch: 3 | Step: 77690 | Dataset: 0-2968703 | Loss: 2.190 | 674 ms/step , 58321.98 GFLOP/s , 532572.3 tokens/s
INFO:__main__:2024-10-27 10:48:49 | Epoch: 3 | Step: 77700 | Dataset: 0-2976703 | Loss: 2.221 | 675 ms/step , 58218.38 GFLOP/s , 532886.8 tokens/s
INFO:__main__:2024-10-27 10:48:57 | Epoch: 3 | Step: 77710 | Dataset: 0-2984703 | Loss: 2.188 | 676 ms/step , 58172.10 GFLOP/s , 533108.7 tokens/s
INFO:__main__:2024-10-27 10:49:04 | Epoch: 3 | Step: 77720 | Dataset: 0-2992703 | Loss: 2.203 | 675 ms/step , 58234.11 GFLOP/s , 532567.0 tokens/s
INFO:__main__:2024-10-27 10:49:12 | Epoch: 3 | Step: 77730 | Dataset: 0-3000703 | Loss: 2.128 | 675 ms/step , 58205.10 GFLOP/s , 532317.6 tokens/s
INFO:__main__:2024-10-27 10:49:20 | Epoch: 3 | Step: 77740 | Dataset: 0-3008703 | Loss: 2.244 | 675 ms/step , 58208.05 GFLOP/s , 532019.1 tokens/s
INFO:__main__:2024-10-27 10:49:28 | Epoch: 3 | Step: 77750 | Dataset: 0-3016703 | Loss: 2.113 | 675 ms/step , 58214.50 GFLOP/s , 532681.0 tokens/s
INFO:__main__:2024-10-27 10:49:35 | Epoch: 3 | Step: 77760 | Dataset: 0-3024703 | Loss: 2.266 | 675 ms/step , 58195.29 GFLOP/s , 532595.5 tokens/s
INFO:__main__:2024-10-27 10:49:43 | Epoch: 3 | Step: 77770 | Dataset: 0-3032703 | Loss: 2.181 | 676 ms/step , 58162.31 GFLOP/s , 532421.1 tokens/s
INFO:__main__:2024-10-27 10:49:51 | Epoch: 3 | Step: 77780 | Dataset: 0-3040703 | Loss: 2.188 | 676 ms/step , 58150.83 GFLOP/s , 532416.1 tokens/s
INFO:__main__:2024-10-27 10:49:58 | Epoch: 3 | Step: 77790 | Dataset: 0-3048703 | Loss: 2.177 | 675 ms/step , 58238.45 GFLOP/s , 532760.7 tokens/s
INFO:__main__:2024-10-27 10:50:06 | Epoch: 3 | Step: 77800 | Dataset: 0-3056703 | Loss: 2.211 | 674 ms/step , 58305.64 GFLOP/s , 532532.8 tokens/s
INFO:__main__:2024-10-27 10:50:14 | Epoch: 3 | Step: 77810 | Dataset: 0-3064703 | Loss: 2.174 | 675 ms/step , 58264.30 GFLOP/s , 532594.5 tokens/s
INFO:__main__:2024-10-27 10:50:21 | Epoch: 3 | Step: 77820 | Dataset: 0-3072703 | Loss: 2.148 | 675 ms/step , 58272.26 GFLOP/s , 532950.7 tokens/s
INFO:__main__:2024-10-27 10:50:29 | Epoch: 3 | Step: 77830 | Dataset: 0-3080703 | Loss: 1.888 | 674 ms/step , 58353.49 GFLOP/s , 533905.4 tokens/s
INFO:__main__:2024-10-27 10:50:37 | Epoch: 3 | Step: 77840 | Dataset: 0-3088703 | Loss: 1.767 | 675 ms/step , 58223.07 GFLOP/s , 532682.4 tokens/s
INFO:__main__:2024-10-27 10:50:44 | Epoch: 3 | Step: 77850 | Dataset: 0-3096703 | Loss: 1.723 | 674 ms/step , 58291.69 GFLOP/s , 532946.9 tokens/s
INFO:__main__:2024-10-27 10:50:52 | Epoch: 3 | Step: 77860 | Dataset: 0-3104703 | Loss: 1.661 | 674 ms/step , 58281.93 GFLOP/s , 532990.5 tokens/s
INFO:__main__:2024-10-27 10:51:00 | Epoch: 3 | Step: 77870 | Dataset: 0-3112703 | Loss: 1.619 | 675 ms/step , 58272.94 GFLOP/s , 532872.1 tokens/s
INFO:__main__:2024-10-27 10:51:07 | Epoch: 3 | Step: 77880 | Dataset: 0-3120703 | Loss: 1.645 | 675 ms/step , 58269.43 GFLOP/s , 533368.9 tokens/s
INFO:__main__:2024-10-27 10:51:15 | Epoch: 3 | Step: 77890 | Dataset: 0-3128703 | Loss: 2.215 | 674 ms/step , 58361.68 GFLOP/s , 533462.3 tokens/s
INFO:__main__:2024-10-27 10:51:23 | Epoch: 3 | Step: 77900 | Dataset: 0-3136703 | Loss: 2.216 | 675 ms/step , 58245.53 GFLOP/s , 533252.7 tokens/s
INFO:__main__:2024-10-27 10:51:31 | Epoch: 3 | Step: 77910 | Dataset: 0-3144703 | Loss: 2.256 | 676 ms/step , 58191.13 GFLOP/s , 532985.1 tokens/s
INFO:__main__:2024-10-27 10:51:38 | Epoch: 3 | Step: 77920 | Dataset: 0-3152703 | Loss: 2.133 | 674 ms/step , 58354.80 GFLOP/s , 533459.8 tokens/s
INFO:__main__:2024-10-27 10:51:46 | Epoch: 3 | Step: 77930 | Dataset: 0-3160703 | Loss: 2.128 | 675 ms/step , 58259.65 GFLOP/s , 533065.6 tokens/s
INFO:__main__:2024-10-27 10:51:54 | Epoch: 3 | Step: 77940 | Dataset: 0-3168703 | Loss: 2.158 | 675 ms/step , 58266.13 GFLOP/s , 533478.8 tokens/s
INFO:__main__:2024-10-27 10:52:01 | Epoch: 3 | Step: 77950 | Dataset: 0-3176703 | Loss: 2.198 | 675 ms/step , 58250.58 GFLOP/s , 533552.7 tokens/s
INFO:__main__:2024-10-27 10:52:09 | Epoch: 3 | Step: 77960 | Dataset: 0-3184703 | Loss: 2.141 | 674 ms/step , 58299.27 GFLOP/s , 532959.8 tokens/s
INFO:__main__:2024-10-27 10:52:17 | Epoch: 3 | Step: 77970 | Dataset: 0-3192703 | Loss: 2.229 | 675 ms/step , 58232.77 GFLOP/s , 533176.4 tokens/s
INFO:__main__:2024-10-27 10:52:24 | Epoch: 3 | Step: 77980 | Dataset: 0-3200703 | Loss: 2.190 | 674 ms/step , 58296.50 GFLOP/s , 532993.9 tokens/s
INFO:__main__:2024-10-27 10:52:32 | Epoch: 3 | Step: 77990 | Dataset: 0-3208703 | Loss: 2.138 | 673 ms/step , 58380.91 GFLOP/s , 533213.2 tokens/s
INFO:__main__:2024-10-27 10:52:39 | Validation | Step: 78000 | Val_loss: 2.257 | Best_val_loss: 1.7829
INFO:__main__:2024-10-27 10:52:39 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_105239_step_78000.pt`
INFO:__main__:2024-10-27 10:52:40 | Epoch: 3 | Step: 78000 | Dataset: 0-3216703 | Loss: 2.148 | 673 ms/step , 58366.94 GFLOP/s , 480840.2 tokens/s
INFO:__main__:2024-10-27 10:52:48 | Epoch: 3 | Step: 78010 | Dataset: 0-3224703 | Loss: 2.160 | 674 ms/step , 58309.83 GFLOP/s , 533087.3 tokens/s
INFO:__main__:2024-10-27 10:52:56 | Epoch: 3 | Step: 78020 | Dataset: 0-3232703 | Loss: 2.055 | 674 ms/step , 58321.50 GFLOP/s , 532558.2 tokens/s
INFO:__main__:2024-10-27 10:53:04 | Epoch: 3 | Step: 78030 | Dataset: 0-3240703 | Loss: 2.162 | 675 ms/step , 58265.05 GFLOP/s , 532744.2 tokens/s
INFO:__main__:2024-10-27 10:53:11 | Epoch: 3 | Step: 78040 | Dataset: 0-3248703 | Loss: 2.089 | 676 ms/step , 58144.25 GFLOP/s , 533046.3 tokens/s
INFO:__main__:2024-10-27 10:53:19 | Epoch: 3 | Step: 78050 | Dataset: 0-3256703 | Loss: 2.321 | 674 ms/step , 58280.06 GFLOP/s , 533080.7 tokens/s
INFO:__main__:2024-10-27 10:53:27 | Epoch: 3 | Step: 78060 | Dataset: 0-3264703 | Loss: 2.190 | 675 ms/step , 58205.02 GFLOP/s , 533285.8 tokens/s
INFO:__main__:2024-10-27 10:53:34 | Epoch: 3 | Step: 78070 | Dataset: 0-3272703 | Loss: 2.125 | 675 ms/step , 58251.35 GFLOP/s , 533279.9 tokens/s
INFO:__main__:2024-10-27 10:53:42 | Epoch: 3 | Step: 78080 | Dataset: 0-3280703 | Loss: 2.207 | 675 ms/step , 58270.81 GFLOP/s , 532732.8 tokens/s
INFO:__main__:2024-10-27 10:53:50 | Epoch: 3 | Step: 78090 | Dataset: 0-3288703 | Loss: 2.200 | 674 ms/step , 58295.51 GFLOP/s , 532948.5 tokens/s
INFO:__main__:2024-10-27 10:53:57 | Epoch: 3 | Step: 78100 | Dataset: 0-3296703 | Loss: 2.119 | 675 ms/step , 58218.89 GFLOP/s , 532459.1 tokens/s
INFO:__main__:2024-10-27 10:54:05 | Epoch: 3 | Step: 78110 | Dataset: 0-3304703 | Loss: 2.122 | 674 ms/step , 58335.56 GFLOP/s , 533018.0 tokens/s
INFO:__main__:2024-10-27 10:54:13 | Epoch: 3 | Step: 78120 | Dataset: 0-3312703 | Loss: 2.239 | 674 ms/step , 58310.87 GFLOP/s , 533357.0 tokens/s
INFO:__main__:2024-10-27 10:54:20 | Epoch: 3 | Step: 78130 | Dataset: 0-3320703 | Loss: 2.224 | 675 ms/step , 58225.73 GFLOP/s , 532861.2 tokens/s
INFO:__main__:2024-10-27 10:54:28 | Epoch: 3 | Step: 78140 | Dataset: 0-3328703 | Loss: 2.155 | 675 ms/step , 58277.04 GFLOP/s , 532502.4 tokens/s
INFO:__main__:2024-10-27 10:54:36 | Epoch: 3 | Step: 78150 | Dataset: 0-3336703 | Loss: 2.124 | 675 ms/step , 58220.97 GFLOP/s , 532576.1 tokens/s
INFO:__main__:2024-10-27 10:54:43 | Epoch: 3 | Step: 78160 | Dataset: 0-3344703 | Loss: 2.082 | 675 ms/step , 58237.65 GFLOP/s , 532768.3 tokens/s
INFO:__main__:2024-10-27 10:54:51 | Epoch: 3 | Step: 78170 | Dataset: 0-3352703 | Loss: 2.172 | 675 ms/step , 58210.72 GFLOP/s , 532866.4 tokens/s
INFO:__main__:2024-10-27 10:54:59 | Epoch: 3 | Step: 78180 | Dataset: 0-3360703 | Loss: 2.161 | 674 ms/step , 58301.67 GFLOP/s , 532748.6 tokens/s
INFO:__main__:2024-10-27 10:55:07 | Epoch: 3 | Step: 78190 | Dataset: 0-3368703 | Loss: 2.071 | 675 ms/step , 58262.08 GFLOP/s , 532552.2 tokens/s
INFO:__main__:2024-10-27 10:55:14 | Epoch: 3 | Step: 78200 | Dataset: 0-3376703 | Loss: 2.158 | 674 ms/step , 58308.18 GFLOP/s , 532725.1 tokens/s
INFO:__main__:2024-10-27 10:55:22 | Epoch: 3 | Step: 78210 | Dataset: 0-3384703 | Loss: 2.144 | 675 ms/step , 58225.98 GFLOP/s , 532970.5 tokens/s
INFO:__main__:2024-10-27 10:55:30 | Epoch: 3 | Step: 78220 | Dataset: 0-3392703 | Loss: 2.244 | 675 ms/step , 58213.44 GFLOP/s , 532131.1 tokens/s
INFO:__main__:2024-10-27 10:55:37 | Epoch: 3 | Step: 78230 | Dataset: 0-3400703 | Loss: 2.075 | 675 ms/step , 58255.20 GFLOP/s , 532351.5 tokens/s
INFO:__main__:2024-10-27 10:55:45 | Epoch: 3 | Step: 78240 | Dataset: 0-3408703 | Loss: 2.113 | 676 ms/step , 58183.17 GFLOP/s , 532154.8 tokens/s
INFO:__main__:2024-10-27 10:55:53 | Epoch: 3 | Step: 78250 | Dataset: 0-3416703 | Loss: 2.148 | 675 ms/step , 58217.26 GFLOP/s , 532444.0 tokens/s
INFO:__main__:2024-10-27 10:56:00 | Epoch: 3 | Step: 78260 | Dataset: 0-3424703 | Loss: 2.242 | 675 ms/step , 58236.74 GFLOP/s , 532085.9 tokens/s
INFO:__main__:2024-10-27 10:56:08 | Epoch: 3 | Step: 78270 | Dataset: 0-3432703 | Loss: 2.200 | 677 ms/step , 58075.04 GFLOP/s , 531874.8 tokens/s
INFO:__main__:2024-10-27 10:56:16 | Epoch: 3 | Step: 78280 | Dataset: 0-3440703 | Loss: 2.170 | 675 ms/step , 58196.24 GFLOP/s , 531554.7 tokens/s
INFO:__main__:2024-10-27 10:56:24 | Epoch: 3 | Step: 78290 | Dataset: 0-3448703 | Loss: 2.162 | 677 ms/step , 58074.19 GFLOP/s , 530519.2 tokens/s
INFO:__main__:2024-10-27 10:56:31 | Epoch: 3 | Step: 78300 | Dataset: 0-3456703 | Loss: 2.079 | 676 ms/step , 58120.75 GFLOP/s , 530693.9 tokens/s
INFO:__main__:2024-10-27 10:56:39 | Epoch: 3 | Step: 78310 | Dataset: 0-3464703 | Loss: 2.120 | 677 ms/step , 58093.99 GFLOP/s , 530530.2 tokens/s
INFO:__main__:2024-10-27 10:56:47 | Epoch: 3 | Step: 78320 | Dataset: 0-3472703 | Loss: 2.157 | 676 ms/step , 58163.10 GFLOP/s , 530613.2 tokens/s
INFO:__main__:2024-10-27 10:56:54 | Epoch: 3 | Step: 78330 | Dataset: 0-3480703 | Loss: 2.099 | 676 ms/step , 58147.36 GFLOP/s , 530729.4 tokens/s
INFO:__main__:2024-10-27 10:57:02 | Epoch: 3 | Step: 78340 | Dataset: 0-3488703 | Loss: 2.102 | 676 ms/step , 58164.11 GFLOP/s , 530455.4 tokens/s
INFO:__main__:2024-10-27 10:57:10 | Epoch: 3 | Step: 78350 | Dataset: 0-3496703 | Loss: 2.083 | 677 ms/step , 58100.08 GFLOP/s , 529353.6 tokens/s
INFO:__main__:2024-10-27 10:57:18 | Epoch: 3 | Step: 78360 | Dataset: 0-3504703 | Loss: 2.122 | 676 ms/step , 58182.86 GFLOP/s , 529645.6 tokens/s
INFO:__main__:2024-10-27 10:57:25 | Epoch: 3 | Step: 78370 | Dataset: 0-3512703 | Loss: 2.106 | 678 ms/step , 58012.88 GFLOP/s , 531153.4 tokens/s
INFO:__main__:2024-10-27 10:57:33 | Epoch: 3 | Step: 78380 | Dataset: 0-3520703 | Loss: 2.172 | 676 ms/step , 58192.76 GFLOP/s , 531746.4 tokens/s
INFO:__main__:2024-10-27 10:57:41 | Epoch: 3 | Step: 78390 | Dataset: 0-3528703 | Loss: 2.204 | 675 ms/step , 58227.02 GFLOP/s , 532403.5 tokens/s
INFO:__main__:2024-10-27 10:57:48 | Epoch: 3 | Step: 78400 | Dataset: 0-3536703 | Loss: 2.060 | 675 ms/step , 58197.16 GFLOP/s , 531861.2 tokens/s
INFO:__main__:2024-10-27 10:57:56 | Epoch: 3 | Step: 78410 | Dataset: 0-3544703 | Loss: 2.134 | 675 ms/step , 58203.99 GFLOP/s , 532232.0 tokens/s
INFO:__main__:2024-10-27 10:58:04 | Epoch: 3 | Step: 78420 | Dataset: 0-3552703 | Loss: 2.079 | 676 ms/step , 58165.28 GFLOP/s , 531741.0 tokens/s
INFO:__main__:2024-10-27 10:58:11 | Epoch: 3 | Step: 78430 | Dataset: 0-3560703 | Loss: 2.104 | 676 ms/step , 58161.99 GFLOP/s , 532300.8 tokens/s
INFO:__main__:2024-10-27 10:58:19 | Epoch: 3 | Step: 78440 | Dataset: 0-3568703 | Loss: 2.146 | 675 ms/step , 58210.10 GFLOP/s , 531527.0 tokens/s
INFO:__main__:2024-10-27 10:58:27 | Epoch: 3 | Step: 78450 | Dataset: 0-3576703 | Loss: 2.104 | 676 ms/step , 58167.93 GFLOP/s , 531897.7 tokens/s
INFO:__main__:2024-10-27 10:58:35 | Epoch: 3 | Step: 78460 | Dataset: 0-3584703 | Loss: 2.109 | 675 ms/step , 58219.57 GFLOP/s , 532310.1 tokens/s
INFO:__main__:2024-10-27 10:58:42 | Epoch: 3 | Step: 78470 | Dataset: 0-3592703 | Loss: 2.079 | 676 ms/step , 58186.92 GFLOP/s , 532242.2 tokens/s
INFO:__main__:2024-10-27 10:58:50 | Epoch: 3 | Step: 78480 | Dataset: 0-3600703 | Loss: 2.076 | 676 ms/step , 58151.64 GFLOP/s , 532427.3 tokens/s
INFO:__main__:2024-10-27 10:58:58 | Epoch: 3 | Step: 78490 | Dataset: 0-3608703 | Loss: 2.147 | 675 ms/step , 58231.36 GFLOP/s , 532529.2 tokens/s
INFO:__main__:2024-10-27 10:59:05 | Epoch: 3 | Step: 78500 | Dataset: 0-3616703 | Loss: 2.132 | 676 ms/step , 58151.19 GFLOP/s , 532236.8 tokens/s
INFO:__main__:2024-10-27 10:59:13 | Epoch: 3 | Step: 78510 | Dataset: 0-3624703 | Loss: 2.068 | 675 ms/step , 58219.89 GFLOP/s , 532278.5 tokens/s
INFO:__main__:2024-10-27 10:59:21 | Epoch: 3 | Step: 78520 | Dataset: 0-3632703 | Loss: 2.152 | 675 ms/step , 58255.40 GFLOP/s , 532439.3 tokens/s
INFO:__main__:2024-10-27 10:59:28 | Epoch: 3 | Step: 78530 | Dataset: 0-3640703 | Loss: 2.156 | 676 ms/step , 58187.92 GFLOP/s , 532237.3 tokens/s
INFO:__main__:2024-10-27 10:59:36 | Epoch: 3 | Step: 78540 | Dataset: 0-3648703 | Loss: 2.130 | 675 ms/step , 58209.16 GFLOP/s , 532144.4 tokens/s
INFO:__main__:2024-10-27 10:59:44 | Epoch: 3 | Step: 78550 | Dataset: 0-3656703 | Loss: 2.235 | 675 ms/step , 58268.28 GFLOP/s , 532456.4 tokens/s
INFO:__main__:2024-10-27 10:59:52 | Epoch: 3 | Step: 78560 | Dataset: 0-3664703 | Loss: 2.145 | 675 ms/step , 58271.99 GFLOP/s , 532079.7 tokens/s
INFO:__main__:2024-10-27 10:59:59 | Epoch: 3 | Step: 78570 | Dataset: 0-3672703 | Loss: 2.132 | 674 ms/step , 58325.91 GFLOP/s , 532700.1 tokens/s
INFO:__main__:2024-10-27 11:00:07 | Epoch: 3 | Step: 78580 | Dataset: 0-3680703 | Loss: 2.097 | 675 ms/step , 58244.74 GFLOP/s , 532501.1 tokens/s
INFO:__main__:2024-10-27 11:00:15 | Epoch: 3 | Step: 78590 | Dataset: 0-3688703 | Loss: 2.192 | 674 ms/step , 58294.50 GFLOP/s , 532570.5 tokens/s
INFO:__main__:2024-10-27 11:00:22 | Epoch: 3 | Step: 78600 | Dataset: 0-3696703 | Loss: 2.133 | 675 ms/step , 58208.66 GFLOP/s , 532500.0 tokens/s
INFO:__main__:2024-10-27 11:00:30 | Epoch: 3 | Step: 78610 | Dataset: 0-3704703 | Loss: 2.129 | 675 ms/step , 58242.85 GFLOP/s , 532590.0 tokens/s
INFO:__main__:2024-10-27 11:00:38 | Epoch: 3 | Step: 78620 | Dataset: 0-3712703 | Loss: 2.208 | 675 ms/step , 58230.76 GFLOP/s , 532678.7 tokens/s
INFO:__main__:2024-10-27 11:00:45 | Epoch: 3 | Step: 78630 | Dataset: 0-3720703 | Loss: 2.217 | 674 ms/step , 58289.47 GFLOP/s , 532193.7 tokens/s
INFO:__main__:2024-10-27 11:00:53 | Epoch: 3 | Step: 78640 | Dataset: 0-3728703 | Loss: 2.104 | 675 ms/step , 58202.35 GFLOP/s , 532503.7 tokens/s
INFO:__main__:2024-10-27 11:01:01 | Epoch: 3 | Step: 78650 | Dataset: 0-3736703 | Loss: 2.207 | 675 ms/step , 58195.67 GFLOP/s , 532652.5 tokens/s
INFO:__main__:2024-10-27 11:01:08 | Epoch: 3 | Step: 78660 | Dataset: 0-3744703 | Loss: 2.174 | 676 ms/step , 58171.14 GFLOP/s , 532413.6 tokens/s
INFO:__main__:2024-10-27 11:01:16 | Epoch: 3 | Step: 78670 | Dataset: 0-3752703 | Loss: 2.163 | 674 ms/step , 58311.02 GFLOP/s , 532867.7 tokens/s
INFO:__main__:2024-10-27 11:01:24 | Epoch: 3 | Step: 78680 | Dataset: 0-3760703 | Loss: 2.114 | 673 ms/step , 58370.94 GFLOP/s , 533303.3 tokens/s
INFO:__main__:2024-10-27 11:01:32 | Epoch: 3 | Step: 78690 | Dataset: 0-3768703 | Loss: 2.170 | 677 ms/step , 58025.12 GFLOP/s , 532385.4 tokens/s
INFO:__main__:2024-10-27 11:01:39 | Epoch: 3 | Step: 78700 | Dataset: 0-3776703 | Loss: 1.992 | 678 ms/step , 58016.81 GFLOP/s , 530828.0 tokens/s
INFO:__main__:2024-10-27 11:01:47 | Epoch: 3 | Step: 78710 | Dataset: 0-3784703 | Loss: 1.857 | 675 ms/step , 58217.18 GFLOP/s , 531660.0 tokens/s
INFO:__main__:2024-10-27 11:01:55 | Epoch: 3 | Step: 78720 | Dataset: 0-3792703 | Loss: 1.771 | 674 ms/step , 58285.82 GFLOP/s , 532982.0 tokens/s
INFO:__main__:2024-10-27 11:02:02 | Epoch: 3 | Step: 78730 | Dataset: 0-3800703 | Loss: 1.763 | 675 ms/step , 58268.54 GFLOP/s , 532337.2 tokens/s
INFO:__main__:2024-10-27 11:02:10 | Epoch: 3 | Step: 78740 | Dataset: 0-3808703 | Loss: 1.756 | 675 ms/step , 58231.52 GFLOP/s , 532259.5 tokens/s
INFO:__main__:2024-10-27 11:02:18 | Epoch: 3 | Step: 78750 | Dataset: 0-3816703 | Loss: 1.751 | 674 ms/step , 58328.87 GFLOP/s , 532658.4 tokens/s
INFO:__main__:2024-10-27 11:02:25 | Epoch: 3 | Step: 78760 | Dataset: 0-3824703 | Loss: 1.719 | 675 ms/step , 58260.56 GFLOP/s , 532718.8 tokens/s
INFO:__main__:2024-10-27 11:02:33 | Epoch: 3 | Step: 78770 | Dataset: 0-3832703 | Loss: 1.714 | 675 ms/step , 58262.60 GFLOP/s , 532564.1 tokens/s
INFO:__main__:2024-10-27 11:02:41 | Epoch: 3 | Step: 78780 | Dataset: 0-3840703 | Loss: 2.256 | 676 ms/step , 58190.47 GFLOP/s , 532398.4 tokens/s
INFO:__main__:2024-10-27 11:02:48 | Epoch: 3 | Step: 78790 | Dataset: 0-3848703 | Loss: 2.135 | 675 ms/step , 58236.89 GFLOP/s , 532674.8 tokens/s
INFO:__main__:2024-10-27 11:02:56 | Epoch: 3 | Step: 78800 | Dataset: 0-3856703 | Loss: 2.193 | 675 ms/step , 58262.39 GFLOP/s , 533009.8 tokens/s
INFO:__main__:2024-10-27 11:03:04 | Epoch: 3 | Step: 78810 | Dataset: 0-3864703 | Loss: 2.131 | 674 ms/step , 58297.85 GFLOP/s , 533186.3 tokens/s
INFO:__main__:2024-10-27 11:03:12 | Epoch: 3 | Step: 78820 | Dataset: 0-3872703 | Loss: 2.097 | 675 ms/step , 58245.64 GFLOP/s , 533176.0 tokens/s
INFO:__main__:2024-10-27 11:03:19 | Epoch: 3 | Step: 78830 | Dataset: 0-3880703 | Loss: 2.112 | 674 ms/step , 58293.93 GFLOP/s , 532789.9 tokens/s
INFO:__main__:2024-10-27 11:03:27 | Epoch: 3 | Step: 78840 | Dataset: 0-3888703 | Loss: 2.141 | 675 ms/step , 58270.05 GFLOP/s , 532829.7 tokens/s
INFO:__main__:2024-10-27 11:03:35 | Epoch: 3 | Step: 78850 | Dataset: 0-3896703 | Loss: 2.124 | 676 ms/step , 58183.23 GFLOP/s , 532599.3 tokens/s
INFO:__main__:2024-10-27 11:03:42 | Epoch: 3 | Step: 78860 | Dataset: 0-3904703 | Loss: 2.094 | 674 ms/step , 58282.42 GFLOP/s , 532966.2 tokens/s
INFO:__main__:2024-10-27 11:03:50 | Epoch: 3 | Step: 78870 | Dataset: 0-3912703 | Loss: 2.137 | 675 ms/step , 58238.02 GFLOP/s , 532969.3 tokens/s
INFO:__main__:2024-10-27 11:03:58 | Epoch: 3 | Step: 78880 | Dataset: 0-3920703 | Loss: 1.979 | 675 ms/step , 58221.33 GFLOP/s , 532758.0 tokens/s
INFO:__main__:2024-10-27 11:04:05 | Epoch: 3 | Step: 78890 | Dataset: 0-3928703 | Loss: 2.091 | 673 ms/step , 58381.70 GFLOP/s , 533073.8 tokens/s
INFO:__main__:2024-10-27 11:04:13 | Epoch: 3 | Step: 78900 | Dataset: 0-3936703 | Loss: 2.087 | 674 ms/step , 58332.84 GFLOP/s , 532955.2 tokens/s
INFO:__main__:2024-10-27 11:04:21 | Epoch: 3 | Step: 78910 | Dataset: 0-3944703 | Loss: 2.079 | 675 ms/step , 58215.40 GFLOP/s , 533110.6 tokens/s
INFO:__main__:2024-10-27 11:04:28 | Epoch: 3 | Step: 78920 | Dataset: 0-3952703 | Loss: 2.168 | 675 ms/step , 58236.56 GFLOP/s , 532782.9 tokens/s
INFO:__main__:2024-10-27 11:04:36 | Epoch: 3 | Step: 78930 | Dataset: 0-3960703 | Loss: 2.051 | 675 ms/step , 58202.68 GFLOP/s , 532813.3 tokens/s
INFO:__main__:2024-10-27 11:04:44 | Epoch: 3 | Step: 78940 | Dataset: 0-3968703 | Loss: 2.105 | 677 ms/step , 58092.54 GFLOP/s , 532212.6 tokens/s
INFO:__main__:2024-10-27 11:04:51 | Epoch: 3 | Step: 78950 | Dataset: 0-3976703 | Loss: 2.185 | 675 ms/step , 58255.57 GFLOP/s , 532669.3 tokens/s
INFO:__main__:2024-10-27 11:04:59 | Epoch: 3 | Step: 78960 | Dataset: 0-3984703 | Loss: 2.126 | 676 ms/step , 58157.98 GFLOP/s , 532375.0 tokens/s
INFO:__main__:2024-10-27 11:05:07 | Epoch: 3 | Step: 78970 | Dataset: 0-3992703 | Loss: 2.182 | 675 ms/step , 58205.28 GFLOP/s , 532701.6 tokens/s
INFO:__main__:2024-10-27 11:05:15 | Epoch: 3 | Step: 78980 | Dataset: 0-4000703 | Loss: 2.226 | 675 ms/step , 58198.04 GFLOP/s , 532407.8 tokens/s
INFO:__main__:2024-10-27 11:05:22 | Epoch: 3 | Step: 78990 | Dataset: 0-4008703 | Loss: 2.141 | 674 ms/step , 58334.05 GFLOP/s , 531788.2 tokens/s
INFO:__main__:2024-10-27 11:05:29 | Validation | Step: 79000 | Val_loss: 2.104 | Best_val_loss: 1.7829
INFO:__main__:2024-10-27 11:05:29 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_110529_step_79000.pt`
INFO:__main__:2024-10-27 11:05:31 | Epoch: 3 | Step: 79000 | Dataset: 0-4016703 | Loss: 2.203 | 672 ms/step , 58452.63 GFLOP/s , 479352.0 tokens/s
INFO:__main__:2024-10-27 11:05:38 | Epoch: 3 | Step: 79010 | Dataset: 0-4024703 | Loss: 2.181 | 675 ms/step , 58221.74 GFLOP/s , 532280.6 tokens/s
INFO:__main__:2024-10-27 11:05:46 | Epoch: 3 | Step: 79020 | Dataset: 0-4032703 | Loss: 2.120 | 676 ms/step , 58130.97 GFLOP/s , 532403.1 tokens/s
INFO:__main__:2024-10-27 11:05:54 | Epoch: 3 | Step: 79030 | Dataset: 0-4040703 | Loss: 2.057 | 674 ms/step , 58285.58 GFLOP/s , 532730.3 tokens/s
INFO:__main__:2024-10-27 11:06:02 | Epoch: 3 | Step: 79040 | Dataset: 0-4048703 | Loss: 2.177 | 676 ms/step , 58141.72 GFLOP/s , 532304.0 tokens/s
INFO:__main__:2024-10-27 11:06:09 | Epoch: 3 | Step: 79050 | Dataset: 0-4056703 | Loss: 2.184 | 675 ms/step , 58242.68 GFLOP/s , 532488.7 tokens/s
INFO:__main__:2024-10-27 11:06:17 | Epoch: 3 | Step: 79060 | Dataset: 0-4064703 | Loss: 2.125 | 675 ms/step , 58272.61 GFLOP/s , 532774.6 tokens/s
INFO:__main__:2024-10-27 11:06:25 | Epoch: 3 | Step: 79070 | Dataset: 0-4072703 | Loss: 2.088 | 676 ms/step , 58185.03 GFLOP/s , 532080.0 tokens/s
INFO:__main__:2024-10-27 11:06:32 | Epoch: 3 | Step: 79080 | Dataset: 0-4080703 | Loss: 2.150 | 675 ms/step , 58231.07 GFLOP/s , 532169.9 tokens/s
INFO:__main__:2024-10-27 11:06:40 | Epoch: 3 | Step: 79090 | Dataset: 0-4088703 | Loss: 2.168 | 675 ms/step , 58203.48 GFLOP/s , 531882.0 tokens/s
INFO:__main__:2024-10-27 11:06:48 | Epoch: 3 | Step: 79100 | Dataset: 0-4096703 | Loss: 2.128 | 675 ms/step , 58246.37 GFLOP/s , 532109.2 tokens/s
INFO:__main__:2024-10-27 11:06:55 | Epoch: 3 | Step: 79110 | Dataset: 0-4104703 | Loss: 2.194 | 676 ms/step , 58118.95 GFLOP/s , 531926.7 tokens/s
INFO:__main__:2024-10-27 11:07:03 | Epoch: 3 | Step: 79120 | Dataset: 0-4112703 | Loss: 2.228 | 675 ms/step , 58243.64 GFLOP/s , 532170.8 tokens/s
INFO:__main__:2024-10-27 11:07:11 | Epoch: 3 | Step: 79130 | Dataset: 0-4120703 | Loss: 2.148 | 678 ms/step , 57999.46 GFLOP/s , 531924.6 tokens/s
INFO:__main__:2024-10-27 11:07:19 | Epoch: 3 | Step: 79140 | Dataset: 0-4128703 | Loss: 2.179 | 676 ms/step , 58181.25 GFLOP/s , 532023.4 tokens/s
INFO:__main__:2024-10-27 11:07:26 | Epoch: 3 | Step: 79150 | Dataset: 0-4136703 | Loss: 2.173 | 676 ms/step , 58164.55 GFLOP/s , 531962.3 tokens/s
INFO:__main__:2024-10-27 11:07:34 | Epoch: 3 | Step: 79160 | Dataset: 0-4144703 | Loss: 2.140 | 674 ms/step , 58307.47 GFLOP/s , 531975.8 tokens/s
INFO:__main__:2024-10-27 11:07:42 | Epoch: 3 | Step: 79170 | Dataset: 0-4152703 | Loss: 2.175 | 676 ms/step , 58170.96 GFLOP/s , 532345.3 tokens/s
INFO:__main__:2024-10-27 11:07:49 | Epoch: 3 | Step: 79180 | Dataset: 0-4160703 | Loss: 2.181 | 675 ms/step , 58258.33 GFLOP/s , 532164.1 tokens/s
INFO:__main__:2024-10-27 11:07:57 | Epoch: 3 | Step: 79190 | Dataset: 0-4168703 | Loss: 2.164 | 677 ms/step , 58074.89 GFLOP/s , 531650.2 tokens/s
INFO:__main__:2024-10-27 11:08:05 | Epoch: 3 | Step: 79200 | Dataset: 0-4176703 | Loss: 2.209 | 675 ms/step , 58242.17 GFLOP/s , 531788.4 tokens/s
INFO:__main__:2024-10-27 11:08:12 | Epoch: 3 | Step: 79210 | Dataset: 0-4184703 | Loss: 2.223 | 675 ms/step , 58196.56 GFLOP/s , 532230.1 tokens/s
INFO:__main__:2024-10-27 11:08:20 | Epoch: 3 | Step: 79220 | Dataset: 0-4192703 | Loss: 2.153 | 676 ms/step , 58170.32 GFLOP/s , 532146.7 tokens/s
INFO:__main__:2024-10-27 11:08:28 | Epoch: 3 | Step: 79230 | Dataset: 0-4200703 | Loss: 2.148 | 675 ms/step , 58215.74 GFLOP/s , 532256.9 tokens/s
INFO:__main__:2024-10-27 11:08:36 | Epoch: 3 | Step: 79240 | Dataset: 0-4208703 | Loss: 2.191 | 675 ms/step , 58193.91 GFLOP/s , 532243.9 tokens/s
INFO:__main__:2024-10-27 11:08:43 | Epoch: 3 | Step: 79250 | Dataset: 0-4216703 | Loss: 2.162 | 676 ms/step , 58166.06 GFLOP/s , 532158.9 tokens/s
INFO:__main__:2024-10-27 11:08:51 | Epoch: 3 | Step: 79260 | Dataset: 0-4224703 | Loss: 2.265 | 675 ms/step , 58206.57 GFLOP/s , 532290.4 tokens/s
INFO:__main__:2024-10-27 11:08:59 | Epoch: 3 | Step: 79270 | Dataset: 0-4232703 | Loss: 2.173 | 674 ms/step , 58303.17 GFLOP/s , 532814.4 tokens/s
INFO:__main__:2024-10-27 11:09:06 | Epoch: 3 | Step: 79280 | Dataset: 0-4240703 | Loss: 2.185 | 674 ms/step , 58319.91 GFLOP/s , 532858.4 tokens/s
INFO:__main__:2024-10-27 11:09:14 | Epoch: 3 | Step: 79290 | Dataset: 0-4248703 | Loss: 2.089 | 675 ms/step , 58223.38 GFLOP/s , 532427.0 tokens/s
INFO:__main__:2024-10-27 11:09:22 | Epoch: 3 | Step: 79300 | Dataset: 0-4256703 | Loss: 2.233 | 675 ms/step , 58271.89 GFLOP/s , 532649.3 tokens/s
INFO:__main__:2024-10-27 11:09:29 | Epoch: 3 | Step: 79310 | Dataset: 0-4264703 | Loss: 2.222 | 674 ms/step , 58316.78 GFLOP/s , 532650.4 tokens/s
INFO:__main__:2024-10-27 11:09:37 | Epoch: 3 | Step: 79320 | Dataset: 0-4272703 | Loss: 2.178 | 675 ms/step , 58221.84 GFLOP/s , 532903.9 tokens/s
INFO:__main__:2024-10-27 11:09:45 | Epoch: 3 | Step: 79330 | Dataset: 0-4280703 | Loss: 2.180 | 675 ms/step , 58240.40 GFLOP/s , 532234.2 tokens/s
INFO:__main__:2024-10-27 11:09:52 | Epoch: 3 | Step: 79340 | Dataset: 0-4288703 | Loss: 2.132 | 675 ms/step , 58230.69 GFLOP/s , 532541.4 tokens/s
INFO:__main__:2024-10-27 11:10:00 | Epoch: 3 | Step: 79350 | Dataset: 0-4296703 | Loss: 2.165 | 674 ms/step , 58334.63 GFLOP/s , 531525.0 tokens/s
INFO:__main__:2024-10-27 11:10:08 | Epoch: 3 | Step: 79360 | Dataset: 0-4304703 | Loss: 2.141 | 673 ms/step , 58366.00 GFLOP/s , 532670.6 tokens/s
INFO:__main__:2024-10-27 11:10:16 | Epoch: 3 | Step: 79370 | Dataset: 0-4312703 | Loss: 2.123 | 675 ms/step , 58218.21 GFLOP/s , 532857.1 tokens/s
INFO:__main__:2024-10-27 11:10:23 | Epoch: 3 | Step: 79380 | Dataset: 0-4320703 | Loss: 2.159 | 675 ms/step , 58244.01 GFLOP/s , 532528.3 tokens/s
INFO:__main__:2024-10-27 11:10:31 | Epoch: 3 | Step: 79390 | Dataset: 0-4328703 | Loss: 2.226 | 675 ms/step , 58215.90 GFLOP/s , 532595.5 tokens/s
INFO:__main__:2024-10-27 11:10:39 | Epoch: 3 | Step: 79400 | Dataset: 0-4336703 | Loss: 2.189 | 674 ms/step , 58294.78 GFLOP/s , 532112.6 tokens/s
INFO:__main__:2024-10-27 11:10:46 | Epoch: 3 | Step: 79410 | Dataset: 0-4344703 | Loss: 2.164 | 675 ms/step , 58220.13 GFLOP/s , 530056.2 tokens/s
INFO:__main__:2024-10-27 11:10:54 | Epoch: 3 | Step: 79420 | Dataset: 0-4352703 | Loss: 2.195 | 674 ms/step , 58353.15 GFLOP/s , 532126.0 tokens/s
INFO:__main__:2024-10-27 11:11:02 | Epoch: 3 | Step: 79430 | Dataset: 0-4360703 | Loss: 1.982 | 676 ms/step , 58175.90 GFLOP/s , 532581.3 tokens/s
INFO:__main__:2024-10-27 11:11:09 | Epoch: 3 | Step: 79440 | Dataset: 0-4368703 | Loss: 1.887 | 676 ms/step , 58156.78 GFLOP/s , 532340.5 tokens/s
INFO:__main__:2024-10-27 11:11:17 | Epoch: 3 | Step: 79450 | Dataset: 0-4376703 | Loss: 1.865 | 674 ms/step , 58287.97 GFLOP/s , 532402.6 tokens/s
INFO:__main__:2024-10-27 11:11:25 | Epoch: 3 | Step: 79460 | Dataset: 0-4384703 | Loss: 1.808 | 675 ms/step , 58223.50 GFLOP/s , 531931.9 tokens/s
INFO:__main__:2024-10-27 11:11:33 | Epoch: 3 | Step: 79470 | Dataset: 0-4392703 | Loss: 1.820 | 674 ms/step , 58350.00 GFLOP/s , 530174.4 tokens/s
INFO:__main__:2024-10-27 11:11:40 | Epoch: 3 | Step: 79480 | Dataset: 0-4400703 | Loss: 1.793 | 677 ms/step , 58062.51 GFLOP/s , 532348.7 tokens/s
INFO:__main__:2024-10-27 11:11:48 | Epoch: 3 | Step: 79490 | Dataset: 0-4408703 | Loss: 1.776 | 678 ms/step , 58019.75 GFLOP/s , 531852.0 tokens/s
INFO:__main__:2024-10-27 11:11:56 | Epoch: 3 | Step: 79500 | Dataset: 0-4416703 | Loss: 1.781 | 675 ms/step , 58240.95 GFLOP/s , 532019.2 tokens/s
INFO:__main__:2024-10-27 11:12:03 | Epoch: 3 | Step: 79510 | Dataset: 0-4424703 | Loss: 1.784 | 674 ms/step , 58288.00 GFLOP/s , 532319.0 tokens/s
INFO:__main__:2024-10-27 11:12:11 | Epoch: 3 | Step: 79520 | Dataset: 0-4432703 | Loss: 1.725 | 675 ms/step , 58242.62 GFLOP/s , 532429.2 tokens/s
INFO:__main__:2024-10-27 11:12:19 | Epoch: 3 | Step: 79530 | Dataset: 0-4440703 | Loss: 1.716 | 675 ms/step , 58242.51 GFLOP/s , 532290.0 tokens/s
INFO:__main__:2024-10-27 11:12:26 | Epoch: 3 | Step: 79540 | Dataset: 0-4448703 | Loss: 1.680 | 674 ms/step , 58323.22 GFLOP/s , 532585.9 tokens/s
INFO:__main__:2024-10-27 11:12:34 | Epoch: 3 | Step: 79550 | Dataset: 0-4456703 | Loss: 1.727 | 677 ms/step , 58042.40 GFLOP/s , 531478.0 tokens/s
INFO:__main__:2024-10-27 11:12:42 | Epoch: 3 | Step: 79560 | Dataset: 0-4464703 | Loss: 1.643 | 674 ms/step , 58311.37 GFLOP/s , 531467.4 tokens/s
INFO:__main__:2024-10-27 11:12:49 | Epoch: 3 | Step: 79570 | Dataset: 0-4472703 | Loss: 1.663 | 674 ms/step , 58325.72 GFLOP/s , 532855.9 tokens/s
INFO:__main__:2024-10-27 11:12:57 | Epoch: 3 | Step: 79580 | Dataset: 0-4480703 | Loss: 1.650 | 675 ms/step , 58250.79 GFLOP/s , 532740.8 tokens/s
INFO:__main__:2024-10-27 11:13:05 | Epoch: 3 | Step: 79590 | Dataset: 0-4488703 | Loss: 1.671 | 676 ms/step , 58188.84 GFLOP/s , 532103.7 tokens/s
INFO:__main__:2024-10-27 11:13:13 | Epoch: 3 | Step: 79600 | Dataset: 0-4496703 | Loss: 1.686 | 673 ms/step , 58378.65 GFLOP/s , 532421.7 tokens/s
INFO:__main__:2024-10-27 11:13:20 | Epoch: 3 | Step: 79610 | Dataset: 0-4504703 | Loss: 2.233 | 675 ms/step , 58261.05 GFLOP/s , 533157.0 tokens/s
INFO:__main__:2024-10-27 11:13:28 | Epoch: 3 | Step: 79620 | Dataset: 0-4512703 | Loss: 2.150 | 675 ms/step , 58221.51 GFLOP/s , 532239.8 tokens/s
INFO:__main__:2024-10-27 11:13:36 | Epoch: 3 | Step: 79630 | Dataset: 0-4520703 | Loss: 2.195 | 675 ms/step , 58225.22 GFLOP/s , 532609.4 tokens/s
INFO:__main__:2024-10-27 11:13:43 | Epoch: 3 | 
Step: 79640 | Dataset: 0-4528703 | Loss: 2.216 | 676 ms/step , 58177.96 GFLOP/s , 532459.7 tokens/s INFO:__main__:2024-10-27 11:13:51 | Epoch: 3 | Step: 79650 | Dataset: 0-4536703 | Loss: 2.251 | 675 ms/step , 58266.37 GFLOP/s , 532799.4 tokens/s INFO:__main__:2024-10-27 11:13:59 | Epoch: 3 | Step: 79660 | Dataset: 0-4544703 | Loss: 2.191 | 675 ms/step , 58276.91 GFLOP/s , 533223.3 tokens/s INFO:__main__:2024-10-27 11:14:06 | Epoch: 3 | Step: 79670 | Dataset: 0-4552703 | Loss: 2.153 | 675 ms/step , 58212.07 GFLOP/s , 533137.5 tokens/s INFO:__main__:2024-10-27 11:14:14 | Epoch: 3 | Step: 79680 | Dataset: 0-4560703 | Loss: 2.158 | 673 ms/step , 58366.14 GFLOP/s , 532817.0 tokens/s INFO:__main__:2024-10-27 11:14:22 | Epoch: 3 | Step: 79690 | Dataset: 0-4568703 | Loss: 2.071 | 675 ms/step , 58263.66 GFLOP/s , 532743.5 tokens/s INFO:__main__:2024-10-27 11:14:29 | Epoch: 3 | Step: 79700 | Dataset: 0-4576703 | Loss: 2.108 | 674 ms/step , 58293.20 GFLOP/s , 532801.2 tokens/s INFO:__main__:2024-10-27 11:14:37 | Epoch: 3 | Step: 79710 | Dataset: 0-4584703 | Loss: 2.017 | 675 ms/step , 58202.58 GFLOP/s , 532039.0 tokens/s INFO:__main__:2024-10-27 11:14:45 | Epoch: 3 | Step: 79720 | Dataset: 0-4592703 | Loss: 2.080 | 675 ms/step , 58251.19 GFLOP/s , 532668.5 tokens/s INFO:__main__:2024-10-27 11:14:53 | Epoch: 3 | Step: 79730 | Dataset: 0-4600703 | Loss: 2.162 | 676 ms/step , 58129.77 GFLOP/s , 531279.3 tokens/s INFO:__main__:2024-10-27 11:15:00 | Epoch: 3 | Step: 79740 | Dataset: 0-4608703 | Loss: 2.021 | 675 ms/step , 58199.06 GFLOP/s , 531010.5 tokens/s INFO:__main__:2024-10-27 11:15:08 | Epoch: 3 | Step: 79750 | Dataset: 0-4616703 | Loss: 2.133 | 677 ms/step , 58086.64 GFLOP/s , 531490.7 tokens/s INFO:__main__:2024-10-27 11:15:16 | Epoch: 3 | Step: 79760 | Dataset: 0-4624703 | Loss: 2.126 | 677 ms/step , 58097.85 GFLOP/s , 531393.9 tokens/s INFO:__main__:2024-10-27 11:15:23 | Epoch: 3 | Step: 79770 | Dataset: 0-4632703 | Loss: 2.177 | 675 ms/step , 58211.23 GFLOP/s , 
531771.9 tokens/s INFO:__main__:2024-10-27 11:15:31 | Epoch: 3 | Step: 79780 | Dataset: 0-4640703 | Loss: 2.192 | 675 ms/step , 58224.06 GFLOP/s , 531966.9 tokens/s INFO:__main__:2024-10-27 11:15:39 | Epoch: 3 | Step: 79790 | Dataset: 0-4648703 | Loss: 2.179 | 677 ms/step , 58105.94 GFLOP/s , 531872.4 tokens/s INFO:__main__:2024-10-27 11:15:46 | Epoch: 3 | Step: 79800 | Dataset: 0-4656703 | Loss: 2.193 | 675 ms/step , 58217.45 GFLOP/s , 529741.1 tokens/s INFO:__main__:2024-10-27 11:15:54 | Epoch: 3 | Step: 79810 | Dataset: 0-4664703 | Loss: 2.084 | 675 ms/step , 58271.47 GFLOP/s , 531666.7 tokens/s INFO:__main__:2024-10-27 11:16:02 | Epoch: 3 | Step: 79820 | Dataset: 0-4672703 | Loss: 2.162 | 676 ms/step , 58179.87 GFLOP/s , 532574.9 tokens/s INFO:__main__:2024-10-27 11:16:10 | Epoch: 3 | Step: 79830 | Dataset: 0-4680703 | Loss: 2.142 | 675 ms/step , 58257.11 GFLOP/s , 532466.0 tokens/s INFO:__main__:2024-10-27 11:16:17 | Epoch: 3 | Step: 79840 | Dataset: 0-4688703 | Loss: 2.134 | 675 ms/step , 58201.19 GFLOP/s , 531989.0 tokens/s INFO:__main__:2024-10-27 11:16:25 | Epoch: 3 | Step: 79850 | Dataset: 0-4696703 | Loss: 2.108 | 675 ms/step , 58201.97 GFLOP/s , 532723.6 tokens/s INFO:__main__:2024-10-27 11:16:33 | Epoch: 3 | Step: 79860 | Dataset: 0-4704703 | Loss: 2.095 | 675 ms/step , 58253.02 GFLOP/s , 532574.1 tokens/s INFO:__main__:2024-10-27 11:16:40 | Epoch: 3 | Step: 79870 | Dataset: 0-4712703 | Loss: 2.088 | 676 ms/step , 58116.13 GFLOP/s , 532158.1 tokens/s INFO:__main__:2024-10-27 11:16:48 | Epoch: 3 | Step: 79880 | Dataset: 0-4720703 | Loss: 2.143 | 675 ms/step , 58211.47 GFLOP/s , 532670.7 tokens/s INFO:__main__:2024-10-27 11:16:56 | Epoch: 3 | Step: 79890 | Dataset: 0-4728703 | Loss: 2.110 | 675 ms/step , 58254.84 GFLOP/s , 533146.6 tokens/s INFO:__main__:2024-10-27 11:17:03 | Epoch: 3 | Step: 79900 | Dataset: 0-4736703 | Loss: 2.108 | 675 ms/step , 58257.61 GFLOP/s , 533358.2 tokens/s INFO:__main__:2024-10-27 11:17:11 | Epoch: 3 | Step: 79910 | Dataset: 
0-4744703 | Loss: 2.127 | 674 ms/step , 58303.75 GFLOP/s , 532695.4 tokens/s INFO:__main__:2024-10-27 11:17:19 | Epoch: 3 | Step: 79920 | Dataset: 0-4752703 | Loss: 2.044 | 676 ms/step , 58156.90 GFLOP/s , 532387.1 tokens/s INFO:__main__:2024-10-27 11:17:26 | Epoch: 3 | Step: 79930 | Dataset: 0-4760703 | Loss: 2.134 | 676 ms/step , 58187.15 GFLOP/s , 532037.6 tokens/s INFO:__main__:2024-10-27 11:17:34 | Epoch: 3 | Step: 79940 | Dataset: 0-4768703 | Loss: 2.126 | 675 ms/step , 58242.94 GFLOP/s , 532517.8 tokens/s INFO:__main__:2024-10-27 11:17:42 | Epoch: 3 | Step: 79950 | Dataset: 0-4776703 | Loss: 2.088 | 674 ms/step , 58327.84 GFLOP/s , 532379.6 tokens/s INFO:__main__:2024-10-27 11:17:50 | Epoch: 3 | Step: 79960 | Dataset: 0-4784703 | Loss: 2.154 | 676 ms/step , 58187.56 GFLOP/s , 533187.1 tokens/s INFO:__main__:2024-10-27 11:17:57 | Epoch: 3 | Step: 79970 | Dataset: 0-4792703 | Loss: 2.113 | 675 ms/step , 58253.67 GFLOP/s , 532912.7 tokens/s INFO:__main__:2024-10-27 11:18:05 | Epoch: 3 | Step: 79980 | Dataset: 0-4800703 | Loss: 2.123 | 676 ms/step , 58126.81 GFLOP/s , 532591.4 tokens/s INFO:__main__:2024-10-27 11:18:13 | Epoch: 3 | Step: 79990 | Dataset: 0-4808703 | Loss: 2.120 | 674 ms/step , 58323.30 GFLOP/s , 533211.1 tokens/s INFO:__main__:2024-10-27 11:18:20 | Validation | Step: 80000 | Val_loss: 2.156 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 11:18:20 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_111820_step_80000.pt` INFO:__main__:2024-10-27 11:18:21 | Epoch: 3 | Step: 80000 | Dataset: 0-4816703 | Loss: 2.217 | 674 ms/step , 58310.98 GFLOP/s , 479461.8 tokens/s INFO:__main__:2024-10-27 11:18:29 | Epoch: 3 | Step: 80010 | Dataset: 0-4824703 | Loss: 2.181 | 675 ms/step , 58266.34 GFLOP/s , 531997.1 tokens/s INFO:__main__:2024-10-27 11:18:37 | Epoch: 3 | Step: 80020 | Dataset: 0-4832703 | Loss: 2.137 | 676 ms/step , 58189.89 GFLOP/s , 532145.8 tokens/s INFO:__main__:2024-10-27 11:18:44 | Epoch: 3 | Step: 80030 | Dataset: 
0-4840703 | Loss: 2.155 | 676 ms/step , 58185.96 GFLOP/s , 532775.4 tokens/s INFO:__main__:2024-10-27 11:18:52 | Epoch: 3 | Step: 80040 | Dataset: 0-4848703 | Loss: 2.087 | 675 ms/step , 58227.15 GFLOP/s , 532269.4 tokens/s INFO:__main__:2024-10-27 11:19:00 | Epoch: 3 | Step: 80050 | Dataset: 0-4856703 | Loss: 2.162 | 674 ms/step , 58305.95 GFLOP/s , 532546.9 tokens/s INFO:__main__:2024-10-27 11:19:07 | Epoch: 3 | Step: 80060 | Dataset: 0-4864703 | Loss: 2.156 | 675 ms/step , 58261.96 GFLOP/s , 532604.6 tokens/s INFO:__main__:2024-10-27 11:19:15 | Epoch: 3 | Step: 80070 | Dataset: 0-4872703 | Loss: 2.071 | 675 ms/step , 58246.55 GFLOP/s , 532531.3 tokens/s INFO:__main__:2024-10-27 11:19:23 | Epoch: 3 | Step: 80080 | Dataset: 0-4880703 | Loss: 2.138 | 675 ms/step , 58269.80 GFLOP/s , 532388.0 tokens/s INFO:__main__:2024-10-27 11:19:30 | Epoch: 3 | Step: 80090 | Dataset: 0-4888703 | Loss: 2.084 | 674 ms/step , 58307.97 GFLOP/s , 532718.3 tokens/s INFO:__main__:2024-10-27 11:19:38 | Epoch: 3 | Step: 80100 | Dataset: 0-4896703 | Loss: 2.161 | 674 ms/step , 58283.04 GFLOP/s , 532624.8 tokens/s INFO:__main__:2024-10-27 11:19:46 | Epoch: 3 | Step: 80110 | Dataset: 0-4904703 | Loss: 2.145 | 675 ms/step , 58231.13 GFLOP/s , 532758.1 tokens/s INFO:__main__:2024-10-27 11:19:53 | Epoch: 3 | Step: 80120 | Dataset: 0-4912703 | Loss: 2.109 | 675 ms/step , 58236.55 GFLOP/s , 532545.1 tokens/s INFO:__main__:2024-10-27 11:20:01 | Epoch: 3 | Step: 80130 | Dataset: 0-4920703 | Loss: 2.062 | 674 ms/step , 58296.63 GFLOP/s , 532422.6 tokens/s INFO:__main__:2024-10-27 11:20:09 | Epoch: 3 | Step: 80140 | Dataset: 0-4928703 | Loss: 2.186 | 675 ms/step , 58226.30 GFLOP/s , 532158.8 tokens/s INFO:__main__:2024-10-27 11:20:17 | Epoch: 3 | Step: 80150 | Dataset: 0-4936703 | Loss: 2.060 | 676 ms/step , 58183.51 GFLOP/s , 532097.7 tokens/s INFO:__main__:2024-10-27 11:20:24 | Epoch: 3 | Step: 80160 | Dataset: 0-4944703 | Loss: 2.040 | 674 ms/step , 58280.91 GFLOP/s , 532734.2 tokens/s 
INFO:__main__:2024-10-27 11:20:32 | Epoch: 3 | Step: 80170 | Dataset: 0-4952703 | Loss: 2.080 | 677 ms/step , 58088.85 GFLOP/s , 532915.4 tokens/s INFO:__main__:2024-10-27 11:20:40 | Epoch: 3 | Step: 80180 | Dataset: 0-4960703 | Loss: 2.126 | 675 ms/step , 58266.68 GFLOP/s , 532116.0 tokens/s INFO:__main__:2024-10-27 11:20:47 | Epoch: 3 | Step: 80190 | Dataset: 0-4968703 | Loss: 2.020 | 675 ms/step , 58271.21 GFLOP/s , 532466.6 tokens/s INFO:__main__:2024-10-27 11:20:55 | Epoch: 3 | Step: 80200 | Dataset: 0-4976703 | Loss: 2.141 | 676 ms/step , 58186.26 GFLOP/s , 532256.5 tokens/s INFO:__main__:2024-10-27 11:21:03 | Epoch: 3 | Step: 80210 | Dataset: 0-4984703 | Loss: 2.119 | 675 ms/step , 58252.85 GFLOP/s , 532351.2 tokens/s INFO:__main__:2024-10-27 11:21:10 | Epoch: 3 | Step: 80220 | Dataset: 0-4992703 | Loss: 2.166 | 675 ms/step , 58250.55 GFLOP/s , 532234.6 tokens/s INFO:__main__:2024-10-27 11:21:18 | Epoch: 3 | Step: 80230 | Dataset: 0-5000703 | Loss: 2.116 | 674 ms/step , 58343.62 GFLOP/s , 532770.0 tokens/s INFO:__main__:2024-10-27 11:21:26 | Epoch: 3 | Step: 80240 | Dataset: 0-5008703 | Loss: 2.014 | 675 ms/step , 58256.84 GFLOP/s , 532268.8 tokens/s INFO:__main__:2024-10-27 11:21:33 | Epoch: 3 | Step: 80250 | Dataset: 0-5016703 | Loss: 2.332 | 676 ms/step , 58154.52 GFLOP/s , 532357.4 tokens/s INFO:__main__:2024-10-27 11:21:41 | Epoch: 3 | Step: 80260 | Dataset: 0-5024703 | Loss: 2.205 | 675 ms/step , 58234.85 GFLOP/s , 532230.8 tokens/s INFO:__main__:2024-10-27 11:21:49 | Epoch: 3 | Step: 80270 | Dataset: 0-5032703 | Loss: 2.236 | 675 ms/step , 58215.40 GFLOP/s , 532639.4 tokens/s INFO:__main__:2024-10-27 11:21:57 | Epoch: 3 | Step: 80280 | Dataset: 0-5040703 | Loss: 2.167 | 676 ms/step , 58186.72 GFLOP/s , 532535.9 tokens/s INFO:__main__:2024-10-27 11:22:04 | Epoch: 3 | Step: 80290 | Dataset: 0-5048703 | Loss: 2.152 | 675 ms/step , 58274.16 GFLOP/s , 531972.0 tokens/s INFO:__main__:2024-10-27 11:22:12 | Epoch: 3 | Step: 80300 | Dataset: 0-5056703 | Loss: 
2.137 | 675 ms/step , 58224.77 GFLOP/s , 532565.9 tokens/s INFO:__main__:2024-10-27 11:22:20 | Epoch: 3 | Step: 80310 | Dataset: 0-5064703 | Loss: 2.185 | 676 ms/step , 58163.45 GFLOP/s , 532275.0 tokens/s INFO:__main__:2024-10-27 11:22:27 | Epoch: 3 | Step: 80320 | Dataset: 0-5072703 | Loss: 2.086 | 675 ms/step , 58226.52 GFLOP/s , 532602.9 tokens/s INFO:__main__:2024-10-27 11:22:35 | Epoch: 3 | Step: 80330 | Dataset: 0-5080703 | Loss: 2.064 | 675 ms/step , 58196.64 GFLOP/s , 532324.2 tokens/s INFO:__main__:2024-10-27 11:22:43 | Epoch: 3 | Step: 80340 | Dataset: 0-5088703 | Loss: 2.075 | 674 ms/step , 58348.67 GFLOP/s , 532947.5 tokens/s INFO:__main__:2024-10-27 11:22:50 | Epoch: 3 | Step: 80350 | Dataset: 0-5096703 | Loss: 2.157 | 676 ms/step , 58151.60 GFLOP/s , 532413.3 tokens/s INFO:__main__:2024-10-27 11:22:58 | Epoch: 3 | Step: 80360 | Dataset: 0-5104703 | Loss: 2.035 | 675 ms/step , 58258.59 GFLOP/s , 532032.3 tokens/s INFO:__main__:2024-10-27 11:23:06 | Epoch: 3 | Step: 80370 | Dataset: 0-5112703 | Loss: 2.083 | 675 ms/step , 58253.20 GFLOP/s , 532683.0 tokens/s INFO:__main__:2024-10-27 11:23:13 | Epoch: 3 | Step: 80380 | Dataset: 0-5120703 | Loss: 2.042 | 676 ms/step , 58184.51 GFLOP/s , 532243.7 tokens/s INFO:__main__:2024-10-27 11:23:21 | Epoch: 3 | Step: 80390 | Dataset: 0-5128703 | Loss: 2.015 | 675 ms/step , 58220.32 GFLOP/s , 532706.6 tokens/s INFO:__main__:2024-10-27 11:23:29 | Epoch: 3 | Step: 80400 | Dataset: 0-5136703 | Loss: 2.073 | 676 ms/step , 58168.21 GFLOP/s , 531968.1 tokens/s INFO:__main__:2024-10-27 11:23:37 | Epoch: 3 | Step: 80410 | Dataset: 0-5144703 | Loss: 2.083 | 677 ms/step , 58021.29 GFLOP/s , 532346.8 tokens/s INFO:__main__:2024-10-27 11:23:44 | Epoch: 3 | Step: 80420 | Dataset: 0-5152703 | Loss: 1.823 | 675 ms/step , 58201.92 GFLOP/s , 531778.4 tokens/s INFO:__main__:2024-10-27 11:23:52 | Epoch: 3 | Step: 80430 | Dataset: 0-5160703 | Loss: 1.756 | 675 ms/step , 58245.14 GFLOP/s , 531811.4 tokens/s INFO:__main__:2024-10-27 
11:24:00 | Epoch: 3 | Step: 80440 | Dataset: 0-5168703 | Loss: 1.732 | 675 ms/step , 58224.90 GFLOP/s , 532103.0 tokens/s INFO:__main__:2024-10-27 11:24:07 | Epoch: 3 | Step: 80450 | Dataset: 0-5176703 | Loss: 1.718 | 678 ms/step , 58011.35 GFLOP/s , 530924.1 tokens/s INFO:__main__:2024-10-27 11:24:15 | Epoch: 3 | Step: 80460 | Dataset: 0-5184703 | Loss: 1.717 | 676 ms/step , 58166.03 GFLOP/s , 531718.8 tokens/s INFO:__main__:2024-10-27 11:24:23 | Epoch: 3 | Step: 80470 | Dataset: 0-5192703 | Loss: 1.643 | 675 ms/step , 58221.06 GFLOP/s , 532259.2 tokens/s INFO:__main__:2024-10-27 11:24:30 | Epoch: 3 | Step: 80480 | Dataset: 0-5200703 | Loss: 1.703 | 675 ms/step , 58195.79 GFLOP/s , 531910.6 tokens/s INFO:__main__:2024-10-27 11:24:38 | Epoch: 3 | Step: 80490 | Dataset: 0-5208703 | Loss: 1.662 | 675 ms/step , 58251.80 GFLOP/s , 532225.4 tokens/s INFO:__main__:2024-10-27 11:24:46 | Epoch: 3 | Step: 80500 | Dataset: 0-5216703 | Loss: 2.221 | 676 ms/step , 58167.69 GFLOP/s , 532276.3 tokens/s INFO:__main__:2024-10-27 11:24:54 | Epoch: 3 | Step: 80510 | Dataset: 0-5224703 | Loss: 2.225 | 675 ms/step , 58212.12 GFLOP/s , 532219.6 tokens/s INFO:__main__:2024-10-27 11:25:01 | Epoch: 3 | Step: 80520 | Dataset: 0-5232703 | Loss: 2.144 | 676 ms/step , 58130.66 GFLOP/s , 532213.9 tokens/s INFO:__main__:2024-10-27 11:25:09 | Epoch: 3 | Step: 80530 | Dataset: 0-5240703 | Loss: 2.062 | 677 ms/step , 58097.53 GFLOP/s , 532207.5 tokens/s INFO:__main__:2024-10-27 11:25:17 | Epoch: 3 | Step: 80540 | Dataset: 0-5248703 | Loss: 2.076 | 675 ms/step , 58261.86 GFLOP/s , 532736.5 tokens/s INFO:__main__:2024-10-27 11:25:24 | Epoch: 3 | Step: 80550 | Dataset: 0-5256703 | Loss: 2.203 | 675 ms/step , 58254.43 GFLOP/s , 532651.9 tokens/s INFO:__main__:2024-10-27 11:25:32 | Epoch: 3 | Step: 80560 | Dataset: 0-5264703 | Loss: 2.089 | 676 ms/step , 58192.18 GFLOP/s , 532774.4 tokens/s INFO:__main__:2024-10-27 11:25:40 | Epoch: 3 | Step: 80570 | Dataset: 0-5272703 | Loss: 2.079 | 676 ms/step , 
58144.97 GFLOP/s , 530858.7 tokens/s INFO:__main__:2024-10-27 11:25:47 | Epoch: 3 | Step: 80580 | Dataset: 0-5280703 | Loss: 2.030 | 674 ms/step , 58290.84 GFLOP/s , 532798.9 tokens/s INFO:__main__:2024-10-27 11:25:55 | Epoch: 3 | Step: 80590 | Dataset: 0-5288703 | Loss: 2.123 | 676 ms/step , 58179.29 GFLOP/s , 532161.9 tokens/s INFO:__main__:2024-10-27 11:26:03 | Epoch: 3 | Step: 80600 | Dataset: 0-5296703 | Loss: 2.108 | 676 ms/step , 58185.53 GFLOP/s , 532099.0 tokens/s INFO:__main__:2024-10-27 11:26:11 | Epoch: 3 | Step: 80610 | Dataset: 0-5304703 | Loss: 2.046 | 675 ms/step , 58240.01 GFLOP/s , 532071.7 tokens/s INFO:__main__:2024-10-27 11:26:18 | Epoch: 3 | Step: 80620 | Dataset: 0-5312703 | Loss: 2.212 | 677 ms/step , 58072.19 GFLOP/s , 532108.2 tokens/s INFO:__main__:2024-10-27 11:26:26 | Epoch: 3 | Step: 80630 | Dataset: 0-5320703 | Loss: 2.104 | 676 ms/step , 58157.97 GFLOP/s , 532129.2 tokens/s INFO:__main__:2024-10-27 11:26:34 | Epoch: 3 | Step: 80640 | Dataset: 0-5328703 | Loss: 2.004 | 675 ms/step , 58218.64 GFLOP/s , 531749.1 tokens/s INFO:__main__:2024-10-27 11:26:41 | Epoch: 3 | Step: 80650 | Dataset: 0-5336703 | Loss: 1.984 | 675 ms/step , 58224.22 GFLOP/s , 532196.5 tokens/s INFO:__main__:2024-10-27 11:26:49 | Epoch: 3 | Step: 80660 | Dataset: 0-5344703 | Loss: 2.035 | 676 ms/step , 58162.59 GFLOP/s , 532107.5 tokens/s INFO:__main__:2024-10-27 11:26:57 | Epoch: 3 | Step: 80670 | Dataset: 0-5352703 | Loss: 1.982 | 676 ms/step , 58183.66 GFLOP/s , 532275.4 tokens/s INFO:__main__:2024-10-27 11:27:04 | Epoch: 3 | Step: 80680 | Dataset: 0-5360703 | Loss: 1.992 | 676 ms/step , 58170.33 GFLOP/s , 531905.0 tokens/s INFO:__main__:2024-10-27 11:27:12 | Epoch: 3 | Step: 80690 | Dataset: 0-5368703 | Loss: 1.996 | 675 ms/step , 58247.31 GFLOP/s , 532401.0 tokens/s INFO:__main__:2024-10-27 11:27:20 | Epoch: 3 | Step: 80700 | Dataset: 0-5376703 | Loss: 1.929 | 676 ms/step , 58110.33 GFLOP/s , 532093.6 tokens/s INFO:__main__:2024-10-27 11:27:28 | Epoch: 3 | 
Step: 80710 | Dataset: 0-5384703 | Loss: 1.923 | 676 ms/step , 58188.92 GFLOP/s , 531748.8 tokens/s INFO:__main__:2024-10-27 11:27:35 | Epoch: 3 | Step: 80720 | Dataset: 0-5392703 | Loss: 1.875 | 676 ms/step , 58186.76 GFLOP/s , 531973.9 tokens/s INFO:__main__:2024-10-27 11:27:43 | Epoch: 3 | Step: 80730 | Dataset: 0-5400703 | Loss: 1.887 | 674 ms/step , 58316.44 GFLOP/s , 532471.5 tokens/s INFO:__main__:2024-10-27 11:27:51 | Epoch: 3 | Step: 80740 | Dataset: 0-5408703 | Loss: 1.884 | 675 ms/step , 58262.34 GFLOP/s , 532464.1 tokens/s INFO:__main__:2024-10-27 11:27:58 | Epoch: 3 | Step: 80750 | Dataset: 0-5416703 | Loss: 1.850 | 676 ms/step , 58162.46 GFLOP/s , 532300.0 tokens/s INFO:__main__:2024-10-27 11:28:06 | Epoch: 3 | Step: 80760 | Dataset: 0-5424703 | Loss: 1.899 | 675 ms/step , 58240.49 GFLOP/s , 532064.3 tokens/s INFO:__main__:2024-10-27 11:28:14 | Epoch: 3 | Step: 80770 | Dataset: 0-5432703 | Loss: 1.773 | 677 ms/step , 58087.92 GFLOP/s , 531922.7 tokens/s INFO:__main__:2024-10-27 11:28:21 | Epoch: 3 | Step: 80780 | Dataset: 0-5440703 | Loss: 1.733 | 677 ms/step , 58049.25 GFLOP/s , 530830.3 tokens/s INFO:__main__:2024-10-27 11:28:29 | Epoch: 3 | Step: 80790 | Dataset: 0-5448703 | Loss: 1.748 | 675 ms/step , 58264.47 GFLOP/s , 531656.4 tokens/s INFO:__main__:2024-10-27 11:28:37 | Epoch: 3 | Step: 80800 | Dataset: 0-5456703 | Loss: 1.739 | 674 ms/step , 58297.40 GFLOP/s , 532445.1 tokens/s INFO:__main__:2024-10-27 11:28:45 | Epoch: 3 | Step: 80810 | Dataset: 0-5464703 | Loss: 1.741 | 675 ms/step , 58230.72 GFLOP/s , 531953.9 tokens/s INFO:__main__:2024-10-27 11:28:52 | Epoch: 3 | Step: 80820 | Dataset: 0-5472703 | Loss: 1.749 | 675 ms/step , 58214.61 GFLOP/s , 531844.1 tokens/s INFO:__main__:2024-10-27 11:29:00 | Epoch: 3 | Step: 80830 | Dataset: 0-5480703 | Loss: 1.733 | 674 ms/step , 58307.89 GFLOP/s , 532319.4 tokens/s INFO:__main__:2024-10-27 11:29:08 | Epoch: 3 | Step: 80840 | Dataset: 0-5488703 | Loss: 1.761 | 675 ms/step , 58224.29 GFLOP/s , 
532152.3 tokens/s INFO:__main__:2024-10-27 11:29:15 | Epoch: 3 | Step: 80850 | Dataset: 0-5496703 | Loss: 2.278 | 674 ms/step , 58280.29 GFLOP/s , 532845.5 tokens/s INFO:__main__:2024-10-27 11:29:23 | Epoch: 3 | Step: 80860 | Dataset: 0-5504703 | Loss: 2.263 | 675 ms/step , 58274.49 GFLOP/s , 532723.2 tokens/s INFO:__main__:2024-10-27 11:29:31 | Epoch: 3 | Step: 80870 | Dataset: 0-5512703 | Loss: 2.233 | 674 ms/step , 58294.13 GFLOP/s , 532724.3 tokens/s INFO:__main__:2024-10-27 11:29:38 | Epoch: 3 | Step: 80880 | Dataset: 0-5520703 | Loss: 2.246 | 675 ms/step , 58276.98 GFLOP/s , 532494.6 tokens/s INFO:__main__:2024-10-27 11:29:46 | Epoch: 3 | Step: 80890 | Dataset: 0-5528703 | Loss: 2.248 | 674 ms/step , 58312.89 GFLOP/s , 532870.4 tokens/s INFO:__main__:2024-10-27 11:29:54 | Epoch: 3 | Step: 80900 | Dataset: 0-5536703 | Loss: 2.135 | 676 ms/step , 58184.77 GFLOP/s , 532619.4 tokens/s INFO:__main__:2024-10-27 11:30:01 | Epoch: 3 | Step: 80910 | Dataset: 0-5544703 | Loss: 2.122 | 676 ms/step , 58174.57 GFLOP/s , 531900.1 tokens/s INFO:__main__:2024-10-27 11:30:09 | Epoch: 3 | Step: 80920 | Dataset: 0-5552703 | Loss: 2.154 | 676 ms/step , 58112.17 GFLOP/s , 531765.4 tokens/s INFO:__main__:2024-10-27 11:30:17 | Epoch: 3 | Step: 80930 | Dataset: 0-5560703 | Loss: 2.054 | 676 ms/step , 58174.13 GFLOP/s , 532337.2 tokens/s INFO:__main__:2024-10-27 11:30:25 | Epoch: 3 | Step: 80940 | Dataset: 0-5568703 | Loss: 2.197 | 675 ms/step , 58242.40 GFLOP/s , 531755.5 tokens/s INFO:__main__:2024-10-27 11:30:32 | Epoch: 3 | Step: 80950 | Dataset: 0-5576703 | Loss: 2.152 | 676 ms/step , 58190.21 GFLOP/s , 531901.4 tokens/s INFO:__main__:2024-10-27 11:30:40 | Epoch: 3 | Step: 80960 | Dataset: 0-5584703 | Loss: 2.199 | 677 ms/step , 58052.08 GFLOP/s , 532421.5 tokens/s INFO:__main__:2024-10-27 11:30:48 | Epoch: 3 | Step: 80970 | Dataset: 0-5592703 | Loss: 2.108 | 677 ms/step , 58039.76 GFLOP/s , 530484.5 tokens/s INFO:__main__:2024-10-27 11:30:55 | Epoch: 3 | Step: 80980 | Dataset: 
0-5600703 | Loss: 2.229 | 678 ms/step , 57981.72 GFLOP/s , 530336.7 tokens/s INFO:__main__:2024-10-27 11:31:03 | Epoch: 3 | Step: 80990 | Dataset: 0-5608703 | Loss: 2.059 | 675 ms/step , 58207.07 GFLOP/s , 530839.4 tokens/s INFO:__main__:2024-10-27 11:31:10 | Validation | Step: 81000 | Val_loss: 2.214 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 11:31:10 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_113110_step_81000.pt` INFO:__main__:2024-10-27 11:31:12 | Epoch: 3 | Step: 81000 | Dataset: 0-5616703 | Loss: 2.173 | 674 ms/step , 58328.70 GFLOP/s , 479529.6 tokens/s INFO:__main__:2024-10-27 11:31:19 | Epoch: 3 | Step: 81010 | Dataset: 0-5624703 | Loss: 2.140 | 675 ms/step , 58201.56 GFLOP/s , 532065.5 tokens/s INFO:__main__:2024-10-27 11:31:27 | Epoch: 3 | Step: 81020 | Dataset: 0-5632703 | Loss: 2.271 | 675 ms/step , 58243.06 GFLOP/s , 532557.8 tokens/s INFO:__main__:2024-10-27 11:31:35 | Epoch: 3 | Step: 81030 | Dataset: 0-5640703 | Loss: 2.223 | 674 ms/step , 58307.03 GFLOP/s , 533118.7 tokens/s INFO:__main__:2024-10-27 11:31:42 | Epoch: 3 | Step: 81040 | Dataset: 0-5648703 | Loss: 2.204 | 674 ms/step , 58286.40 GFLOP/s , 532707.2 tokens/s INFO:__main__:2024-10-27 11:31:50 | Epoch: 3 | Step: 81050 | Dataset: 0-5656703 | Loss: 2.196 | 675 ms/step , 58224.63 GFLOP/s , 532750.2 tokens/s INFO:__main__:2024-10-27 11:31:58 | Epoch: 3 | Step: 81060 | Dataset: 0-5664703 | Loss: 2.248 | 675 ms/step , 58239.47 GFLOP/s , 532095.8 tokens/s INFO:__main__:2024-10-27 11:32:05 | Epoch: 3 | Step: 81070 | Dataset: 0-5672703 | Loss: 2.178 | 675 ms/step , 58257.14 GFLOP/s , 532349.3 tokens/s INFO:__main__:2024-10-27 11:32:13 | Epoch: 3 | Step: 81080 | Dataset: 0-5680703 | Loss: 2.149 | 676 ms/step , 58176.66 GFLOP/s , 532126.4 tokens/s INFO:__main__:2024-10-27 11:32:21 | Epoch: 3 | Step: 81090 | Dataset: 0-5688703 | Loss: 2.152 | 675 ms/step , 58193.19 GFLOP/s , 532147.2 tokens/s INFO:__main__:2024-10-27 11:32:29 | Epoch: 3 | Step: 81100 | Dataset: 
0-5696703 | Loss: 2.129 | 675 ms/step , 58220.73 GFLOP/s , 532174.4 tokens/s INFO:__main__:2024-10-27 11:32:36 | Epoch: 3 | Step: 81110 | Dataset: 0-5704703 | Loss: 2.227 | 674 ms/step , 58282.23 GFLOP/s , 531824.1 tokens/s INFO:__main__:2024-10-27 11:32:44 | Epoch: 3 | Step: 81120 | Dataset: 0-5712703 | Loss: 2.117 | 676 ms/step , 58125.31 GFLOP/s , 530774.3 tokens/s INFO:__main__:2024-10-27 11:32:52 | Epoch: 3 | Step: 81130 | Dataset: 0-5720703 | Loss: 2.179 | 675 ms/step , 58211.06 GFLOP/s , 530902.6 tokens/s INFO:__main__:2024-10-27 11:32:59 | Epoch: 3 | Step: 81140 | Dataset: 0-5728703 | Loss: 2.092 | 674 ms/step , 58280.63 GFLOP/s , 531589.6 tokens/s INFO:__main__:2024-10-27 11:33:07 | Epoch: 3 | Step: 81150 | Dataset: 0-5736703 | Loss: 2.080 | 676 ms/step , 58136.94 GFLOP/s , 531343.5 tokens/s INFO:__main__:2024-10-27 11:33:15 | Epoch: 3 | Step: 81160 | Dataset: 0-5744703 | Loss: 2.144 | 675 ms/step , 58276.88 GFLOP/s , 531475.9 tokens/s INFO:__main__:2024-10-27 11:33:23 | Epoch: 3 | Step: 81170 | Dataset: 0-5752703 | Loss: 2.169 | 677 ms/step , 58031.34 GFLOP/s , 530863.7 tokens/s INFO:__main__:2024-10-27 11:33:30 | Epoch: 3 | Step: 81180 | Dataset: 0-5760703 | Loss: 2.189 | 679 ms/step , 57917.39 GFLOP/s , 529310.9 tokens/s INFO:__main__:2024-10-27 11:33:38 | Epoch: 3 | Step: 81190 | Dataset: 0-5768703 | Loss: 2.188 | 676 ms/step , 58188.73 GFLOP/s , 528395.3 tokens/s INFO:__main__:2024-10-27 11:33:46 | Epoch: 3 | Step: 81200 | Dataset: 0-5776703 | Loss: 2.229 | 676 ms/step , 58186.97 GFLOP/s , 531424.8 tokens/s INFO:__main__:2024-10-27 11:33:53 | Epoch: 3 | Step: 81210 | Dataset: 0-5784703 | Loss: 2.162 | 675 ms/step , 58201.31 GFLOP/s , 532135.7 tokens/s INFO:__main__:2024-10-27 11:34:01 | Epoch: 3 | Step: 81220 | Dataset: 0-5792703 | Loss: 2.183 | 675 ms/step , 58204.89 GFLOP/s , 532319.0 tokens/s INFO:__main__:2024-10-27 11:34:09 | Epoch: 3 | Step: 81230 | Dataset: 0-5800703 | Loss: 2.167 | 675 ms/step , 58238.45 GFLOP/s , 532547.3 tokens/s 
INFO:__main__:2024-10-27 11:34:17 | Epoch: 3 | Step: 81240 | Dataset: 0-5808703 | Loss: 2.175 | 676 ms/step , 58182.31 GFLOP/s , 532461.4 tokens/s INFO:__main__:2024-10-27 11:34:24 | Epoch: 3 | Step: 81250 | Dataset: 0-5816703 | Loss: 2.149 | 674 ms/step , 58325.95 GFLOP/s , 532396.2 tokens/s INFO:__main__:2024-10-27 11:34:32 | Epoch: 3 | Step: 81260 | Dataset: 0-5824703 | Loss: 2.170 | 675 ms/step , 58261.06 GFLOP/s , 532343.1 tokens/s INFO:__main__:2024-10-27 11:34:40 | Epoch: 3 | Step: 81270 | Dataset: 0-5832703 | Loss: 2.049 | 675 ms/step , 58245.91 GFLOP/s , 532664.1 tokens/s INFO:__main__:2024-10-27 11:34:47 | Epoch: 3 | Step: 81280 | Dataset: 0-5840703 | Loss: 2.197 | 674 ms/step , 58332.55 GFLOP/s , 532434.0 tokens/s INFO:__main__:2024-10-27 11:34:55 | Epoch: 3 | Step: 81290 | Dataset: 0-5848703 | Loss: 2.179 | 674 ms/step , 58289.57 GFLOP/s , 532898.8 tokens/s INFO:__main__:2024-10-27 11:35:03 | Epoch: 3 | Step: 81300 | Dataset: 0-5856703 | Loss: 2.203 | 675 ms/step , 58193.31 GFLOP/s , 532175.3 tokens/s INFO:__main__:2024-10-27 11:35:10 | Epoch: 3 | Step: 81310 | Dataset: 0-5864703 | Loss: 2.089 | 676 ms/step , 58174.55 GFLOP/s , 532293.4 tokens/s INFO:__main__:2024-10-27 11:35:18 | Epoch: 3 | Step: 81320 | Dataset: 0-5872703 | Loss: 2.133 | 675 ms/step , 58203.73 GFLOP/s , 532297.6 tokens/s INFO:__main__:2024-10-27 11:35:26 | Epoch: 3 | Step: 81330 | Dataset: 0-5880703 | Loss: 2.136 | 675 ms/step , 58213.56 GFLOP/s , 532373.8 tokens/s INFO:__main__:2024-10-27 11:35:33 | Epoch: 3 | Step: 81340 | Dataset: 0-5888703 | Loss: 2.185 | 678 ms/step , 57998.30 GFLOP/s , 531482.7 tokens/s INFO:__main__:2024-10-27 11:35:41 | Epoch: 3 | Step: 81350 | Dataset: 0-5896703 | Loss: 2.185 | 678 ms/step , 57978.93 GFLOP/s , 530520.2 tokens/s INFO:__main__:2024-10-27 11:35:49 | Epoch: 3 | Step: 81360 | Dataset: 0-5904703 | Loss: 2.106 | 677 ms/step , 58094.74 GFLOP/s , 532483.2 tokens/s INFO:__main__:2024-10-27 11:35:57 | Epoch: 3 | Step: 81370 | Dataset: 0-5912703 | Loss: 
2.141 | 675 ms/step , 58271.31 GFLOP/s , 532563.6 tokens/s INFO:__main__:2024-10-27 11:36:04 | Epoch: 3 | Step: 81380 | Dataset: 0-5920703 | Loss: 2.144 | 675 ms/step , 58235.78 GFLOP/s , 532939.6 tokens/s INFO:__main__:2024-10-27 11:36:12 | Epoch: 3 | Step: 81390 | Dataset: 0-5928703 | Loss: 2.176 | 675 ms/step , 58276.91 GFLOP/s , 532293.1 tokens/s INFO:__main__:2024-10-27 11:36:20 | Epoch: 3 | Step: 81400 | Dataset: 0-5936703 | Loss: 2.096 | 676 ms/step , 58116.09 GFLOP/s , 532404.8 tokens/s INFO:__main__:2024-10-27 11:36:27 | Epoch: 3 | Step: 81410 | Dataset: 0-5944703 | Loss: 2.277 | 675 ms/step , 58212.34 GFLOP/s , 532336.1 tokens/s INFO:__main__:2024-10-27 11:36:35 | Epoch: 3 | Step: 81420 | Dataset: 0-5952703 | Loss: 2.157 | 676 ms/step , 58144.91 GFLOP/s , 532011.5 tokens/s INFO:__main__:2024-10-27 11:36:43 | Epoch: 3 | Step: 81430 | Dataset: 0-5960703 | Loss: 2.138 | 676 ms/step , 58164.19 GFLOP/s , 532213.8 tokens/s INFO:__main__:2024-10-27 11:36:50 | Epoch: 3 | Step: 81440 | Dataset: 0-5968703 | Loss: 2.157 | 676 ms/step , 58116.15 GFLOP/s , 531337.8 tokens/s INFO:__main__:2024-10-27 11:36:58 | Epoch: 3 | Step: 81450 | Dataset: 0-5976703 | Loss: 2.135 | 676 ms/step , 58109.01 GFLOP/s , 531767.1 tokens/s INFO:__main__:2024-10-27 11:37:06 | Epoch: 3 | Step: 81460 | Dataset: 0-5984703 | Loss: 2.241 | 675 ms/step , 58206.01 GFLOP/s , 532563.7 tokens/s INFO:__main__:2024-10-27 11:37:14 | Epoch: 3 | Step: 81470 | Dataset: 0-5992703 | Loss: 2.126 | 676 ms/step , 58166.65 GFLOP/s , 531984.0 tokens/s INFO:__main__:2024-10-27 11:37:21 | Epoch: 3 | Step: 81480 | Dataset: 0-6000703 | Loss: 2.179 | 676 ms/step , 58162.22 GFLOP/s , 532200.9 tokens/s INFO:__main__:2024-10-27 11:37:29 | Epoch: 3 | Step: 81490 | Dataset: 0-6008703 | Loss: 2.289 | 676 ms/step , 58152.30 GFLOP/s , 532415.8 tokens/s INFO:__main__:2024-10-27 11:37:37 | Epoch: 3 | Step: 81500 | Dataset: 0-6016703 | Loss: 2.199 | 676 ms/step , 58136.42 GFLOP/s , 531700.8 tokens/s INFO:__main__:2024-10-27 
11:37:44 | Epoch: 3 | Step: 81510 | Dataset: 0-6024703 | Loss: 2.072 | 675 ms/step , 58254.16 GFLOP/s , 532213.2 tokens/s INFO:__main__:2024-10-27 11:37:52 | Epoch: 3 | Step: 81520 | Dataset: 0-6032703 | Loss: 2.197 | 676 ms/step , 58169.06 GFLOP/s , 532007.2 tokens/s INFO:__main__:2024-10-27 11:38:00 | Epoch: 3 | Step: 81530 | Dataset: 0-6040703 | Loss: 2.095 | 681 ms/step , 57747.91 GFLOP/s , 530528.2 tokens/s INFO:__main__:2024-10-27 11:38:07 | Epoch: 3 | Step: 81540 | Dataset: 0-6048703 | Loss: 2.152 | 676 ms/step , 58175.08 GFLOP/s , 532065.1 tokens/s INFO:__main__:2024-10-27 11:38:15 | Epoch: 3 | Step: 81550 | Dataset: 0-6056703 | Loss: 2.137 | 675 ms/step , 58193.22 GFLOP/s , 532223.7 tokens/s INFO:__main__:2024-10-27 11:38:23 | Epoch: 3 | Step: 81560 | Dataset: 0-6064703 | Loss: 2.121 | 675 ms/step , 58193.02 GFLOP/s , 532086.4 tokens/s INFO:__main__:2024-10-27 11:38:31 | Epoch: 3 | Step: 81570 | Dataset: 0-6072703 | Loss: 2.109 | 674 ms/step , 58285.41 GFLOP/s , 532265.3 tokens/s INFO:__main__:2024-10-27 11:38:38 | Epoch: 3 | Step: 81580 | Dataset: 0-6080703 | Loss: 2.051 | 675 ms/step , 58211.66 GFLOP/s , 532046.8 tokens/s INFO:__main__:2024-10-27 11:38:46 | Epoch: 3 | Step: 81590 | Dataset: 0-6088703 | Loss: 2.120 | 674 ms/step , 58292.15 GFLOP/s , 532176.1 tokens/s INFO:__main__:2024-10-27 11:38:54 | Epoch: 3 | Step: 81600 | Dataset: 0-6096703 | Loss: 2.101 | 676 ms/step , 58189.26 GFLOP/s , 532161.7 tokens/s INFO:__main__:2024-10-27 11:39:01 | Epoch: 3 | Step: 81610 | Dataset: 0-6104703 | Loss: 2.078 | 675 ms/step , 58205.02 GFLOP/s , 531651.1 tokens/s INFO:__main__:2024-10-27 11:39:09 | Epoch: 3 | Step: 81620 | Dataset: 0-6112703 | Loss: 2.041 | 676 ms/step , 58189.61 GFLOP/s , 532055.4 tokens/s INFO:__main__:2024-10-27 11:39:17 | Epoch: 3 | Step: 81630 | Dataset: 0-6120703 | Loss: 2.079 | 674 ms/step , 58327.82 GFLOP/s , 532133.4 tokens/s INFO:__main__:2024-10-27 11:39:24 | Epoch: 3 | Step: 81640 | Dataset: 0-6128703 | Loss: 2.135 | 675 ms/step , 
58235.79 GFLOP/s , 532251.9 tokens/s INFO:__main__:2024-10-27 11:39:32 | Epoch: 3 | Step: 81650 | Dataset: 0-6136703 | Loss: 2.223 | 675 ms/step , 58245.32 GFLOP/s , 531778.6 tokens/s INFO:__main__:2024-10-27 11:39:40 | Epoch: 3 | Step: 81660 | Dataset: 0-6144703 | Loss: 2.161 | 674 ms/step , 58288.33 GFLOP/s , 532315.0 tokens/s INFO:__main__:2024-10-27 11:39:48 | Epoch: 3 | Step: 81670 | Dataset: 0-6152703 | Loss: 2.143 | 676 ms/step , 58132.06 GFLOP/s , 532349.0 tokens/s INFO:__main__:2024-10-27 11:39:55 | Epoch: 3 | Step: 81680 | Dataset: 0-6160703 | Loss: 2.102 | 676 ms/step , 58160.08 GFLOP/s , 531865.3 tokens/s INFO:__main__:2024-10-27 11:40:03 | Epoch: 3 | Step: 81690 | Dataset: 0-6168703 | Loss: 2.115 | 676 ms/step , 58175.76 GFLOP/s , 531874.1 tokens/s INFO:__main__:2024-10-27 11:40:11 | Epoch: 3 | Step: 81700 | Dataset: 0-6176703 | Loss: 2.223 | 676 ms/step , 58156.51 GFLOP/s , 531128.3 tokens/s INFO:__main__:2024-10-27 11:40:18 | Epoch: 3 | Step: 81710 | Dataset: 0-6184703 | Loss: 2.174 | 676 ms/step , 58148.57 GFLOP/s , 532029.7 tokens/s INFO:__main__:2024-10-27 11:40:26 | Epoch: 3 | Step: 81720 | Dataset: 0-6192703 | Loss: 2.179 | 675 ms/step , 58265.17 GFLOP/s , 532258.9 tokens/s INFO:__main__:2024-10-27 11:40:34 | Epoch: 3 | Step: 81730 | Dataset: 0-6200703 | Loss: 2.087 | 674 ms/step , 58329.08 GFLOP/s , 532435.1 tokens/s INFO:__main__:2024-10-27 11:40:41 | Epoch: 3 | Step: 81740 | Dataset: 0-6208703 | Loss: 2.090 | 675 ms/step , 58225.36 GFLOP/s , 532418.4 tokens/s INFO:__main__:2024-10-27 11:40:49 | Epoch: 3 | Step: 81750 | Dataset: 0-6216703 | Loss: 2.049 | 676 ms/step , 58177.14 GFLOP/s , 532221.8 tokens/s INFO:__main__:2024-10-27 11:40:57 | Epoch: 3 | Step: 81760 | Dataset: 0-6224703 | Loss: 2.128 | 676 ms/step , 58187.85 GFLOP/s , 531974.8 tokens/s INFO:__main__:2024-10-27 11:41:04 | Epoch: 3 | Step: 81770 | Dataset: 0-6232703 | Loss: 2.045 | 675 ms/step , 58251.13 GFLOP/s , 531877.2 tokens/s INFO:__main__:2024-10-27 11:41:12 | Epoch: 3 | 
Step: 81780 | Dataset: 0-6240703 | Loss: 2.169 | 675 ms/step , 58218.32 GFLOP/s , 532056.1 tokens/s INFO:__main__:2024-10-27 11:41:20 | Epoch: 3 | Step: 81790 | Dataset: 0-6248703 | Loss: 2.040 | 674 ms/step , 58284.53 GFLOP/s , 532090.5 tokens/s INFO:__main__:2024-10-27 11:41:28 | Epoch: 3 | Step: 81800 | Dataset: 0-6256703 | Loss: 2.085 | 674 ms/step , 58319.65 GFLOP/s , 532448.4 tokens/s INFO:__main__:2024-10-27 11:41:35 | Epoch: 3 | Step: 81810 | Dataset: 0-6264703 | Loss: 2.117 | 676 ms/step , 58167.63 GFLOP/s , 532211.5 tokens/s INFO:__main__:2024-10-27 11:41:43 | Epoch: 3 | Step: 81820 | Dataset: 0-6272703 | Loss: 2.223 | 676 ms/step , 58131.00 GFLOP/s , 532185.4 tokens/s INFO:__main__:2024-10-27 11:41:51 | Epoch: 3 | Step: 81830 | Dataset: 0-6280703 | Loss: 2.142 | 674 ms/step , 58329.51 GFLOP/s , 532099.3 tokens/s INFO:__main__:2024-10-27 11:41:58 | Epoch: 3 | Step: 81840 | Dataset: 0-6288703 | Loss: 2.151 | 675 ms/step , 58196.08 GFLOP/s , 532241.2 tokens/s INFO:__main__:2024-10-27 11:42:06 | Epoch: 3 | Step: 81850 | Dataset: 0-6296703 | Loss: 2.119 | 675 ms/step , 58248.54 GFLOP/s , 532126.4 tokens/s INFO:__main__:2024-10-27 11:42:14 | Epoch: 3 | Step: 81860 | Dataset: 0-6304703 | Loss: 2.229 | 676 ms/step , 58181.10 GFLOP/s , 531748.4 tokens/s INFO:__main__:2024-10-27 11:42:21 | Epoch: 3 | Step: 81870 | Dataset: 0-6312703 | Loss: 2.173 | 675 ms/step , 58258.12 GFLOP/s , 532157.5 tokens/s INFO:__main__:2024-10-27 11:42:29 | Epoch: 3 | Step: 81880 | Dataset: 0-6320703 | Loss: 2.154 | 674 ms/step , 58309.00 GFLOP/s , 532154.1 tokens/s INFO:__main__:2024-10-27 11:42:37 | Epoch: 3 | Step: 81890 | Dataset: 0-6328703 | Loss: 2.154 | 675 ms/step , 58224.36 GFLOP/s , 532584.0 tokens/s INFO:__main__:2024-10-27 11:42:45 | Epoch: 3 | Step: 81900 | Dataset: 0-6336703 | Loss: 2.152 | 678 ms/step , 57997.64 GFLOP/s , 531873.6 tokens/s INFO:__main__:2024-10-27 11:42:52 | Epoch: 3 | Step: 81910 | Dataset: 0-6344703 | Loss: 2.208 | 677 ms/step , 58031.61 GFLOP/s , 
530751.6 tokens/s INFO:__main__:2024-10-27 11:43:00 | Epoch: 3 | Step: 81920 | Dataset: 0-6352703 | Loss: 2.038 | 678 ms/step , 58018.81 GFLOP/s , 530306.7 tokens/s INFO:__main__:2024-10-27 11:43:08 | Epoch: 3 | Step: 81930 | Dataset: 0-6360703 | Loss: 2.142 | 674 ms/step , 58353.19 GFLOP/s , 531827.6 tokens/s INFO:__main__:2024-10-27 11:43:15 | Epoch: 3 | Step: 81940 | Dataset: 0-6368703 | Loss: 2.133 | 675 ms/step , 58256.15 GFLOP/s , 532538.5 tokens/s INFO:__main__:2024-10-27 11:43:23 | Epoch: 3 | Step: 81950 | Dataset: 0-6376703 | Loss: 2.160 | 677 ms/step , 58078.89 GFLOP/s , 530655.7 tokens/s INFO:__main__:2024-10-27 11:43:31 | Epoch: 3 | Step: 81960 | Dataset: 0-6384703 | Loss: 2.147 | 674 ms/step , 58310.59 GFLOP/s , 531174.5 tokens/s INFO:__main__:2024-10-27 11:43:39 | Epoch: 3 | Step: 81970 | Dataset: 0-6392703 | Loss: 2.170 | 675 ms/step , 58264.34 GFLOP/s , 532617.0 tokens/s INFO:__main__:2024-10-27 11:43:46 | Epoch: 3 | Step: 81980 | Dataset: 0-6400703 | Loss: 2.294 | 674 ms/step , 58280.52 GFLOP/s , 532612.7 tokens/s INFO:__main__:2024-10-27 11:43:54 | Epoch: 3 | Step: 81990 | Dataset: 0-6408703 | Loss: 2.253 | 676 ms/step , 58183.25 GFLOP/s , 531703.3 tokens/s INFO:__main__:2024-10-27 11:44:01 | Validation | Step: 82000 | Val_loss: 2.274 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 11:44:01 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_114401_step_82000.pt` INFO:__main__:2024-10-27 11:44:02 | Epoch: 3 | Step: 82000 | Dataset: 0-6416703 | Loss: 2.135 | 675 ms/step , 58232.18 GFLOP/s , 479568.1 tokens/s INFO:__main__:2024-10-27 11:44:10 | Epoch: 3 | Step: 82010 | Dataset: 0-6424703 | Loss: 2.106 | 676 ms/step , 58151.64 GFLOP/s , 531545.8 tokens/s INFO:__main__:2024-10-27 11:44:18 | Epoch: 3 | Step: 82020 | Dataset: 0-6432703 | Loss: 2.101 | 675 ms/step , 58277.60 GFLOP/s , 532170.4 tokens/s INFO:__main__:2024-10-27 11:44:26 | Epoch: 3 | Step: 82030 | Dataset: 0-6440703 | Loss: 2.080 | 677 ms/step , 58101.81 GFLOP/s , 
531646.4 tokens/s INFO:__main__:2024-10-27 11:44:33 | Epoch: 3 | Step: 82040 | Dataset: 0-6448703 | Loss: 2.055 | 674 ms/step , 58339.53 GFLOP/s , 532219.3 tokens/s INFO:__main__:2024-10-27 11:44:41 | Epoch: 3 | Step: 82050 | Dataset: 0-6456703 | Loss: 2.072 | 679 ms/step , 57907.79 GFLOP/s , 531393.7 tokens/s INFO:__main__:2024-10-27 11:44:49 | Epoch: 3 | Step: 82060 | Dataset: 0-6464703 | Loss: 1.986 | 678 ms/step , 58010.39 GFLOP/s , 530353.9 tokens/s INFO:__main__:2024-10-27 11:44:56 | Epoch: 3 | Step: 82070 | Dataset: 0-6472703 | Loss: 2.025 | 678 ms/step , 57978.91 GFLOP/s , 530149.6 tokens/s INFO:__main__:2024-10-27 11:45:04 | Epoch: 3 | Step: 82080 | Dataset: 0-6480703 | Loss: 1.993 | 678 ms/step , 57986.04 GFLOP/s , 530059.7 tokens/s INFO:__main__:2024-10-27 11:45:12 | Epoch: 3 | Step: 82090 | Dataset: 0-6488703 | Loss: 2.026 | 674 ms/step , 58294.81 GFLOP/s , 531427.2 tokens/s INFO:__main__:2024-10-27 11:45:20 | Epoch: 3 | Step: 82100 | Dataset: 0-6496703 | Loss: 2.003 | 675 ms/step , 58201.15 GFLOP/s , 531824.5 tokens/s INFO:__main__:2024-10-27 11:45:27 | Epoch: 3 | Step: 82110 | Dataset: 0-6504703 | Loss: 1.993 | 675 ms/step , 58219.12 GFLOP/s , 531857.1 tokens/s INFO:__main__:2024-10-27 11:45:35 | Epoch: 3 | Step: 82120 | Dataset: 0-6512703 | Loss: 1.930 | 676 ms/step , 58166.19 GFLOP/s , 531054.6 tokens/s INFO:__main__:2024-10-27 11:45:43 | Epoch: 3 | Step: 82130 | Dataset: 0-6520703 | Loss: 1.991 | 674 ms/step , 58312.95 GFLOP/s , 531499.7 tokens/s INFO:__main__:2024-10-27 11:45:50 | Epoch: 3 | Step: 82140 | Dataset: 0-6528703 | Loss: 2.154 | 675 ms/step , 58225.84 GFLOP/s , 531955.2 tokens/s INFO:__main__:2024-10-27 11:45:58 | Epoch: 3 | Step: 82150 | Dataset: 0-6536703 | Loss: 1.940 | 676 ms/step , 58125.24 GFLOP/s , 531953.3 tokens/s INFO:__main__:2024-10-27 11:46:06 | Epoch: 3 | Step: 82160 | Dataset: 0-6544703 | Loss: 1.869 | 675 ms/step , 58207.05 GFLOP/s , 532120.0 tokens/s INFO:__main__:2024-10-27 11:46:13 | Epoch: 3 | Step: 82170 | Dataset: 
0-6552703 | Loss: 1.831 | 676 ms/step , 58179.28 GFLOP/s , 531719.5 tokens/s INFO:__main__:2024-10-27 11:46:21 | Epoch: 3 | Step: 82180 | Dataset: 0-6560703 | Loss: 1.831 | 677 ms/step , 58103.10 GFLOP/s , 531700.2 tokens/s INFO:__main__:2024-10-27 11:46:29 | Epoch: 3 | Step: 82190 | Dataset: 0-6568703 | Loss: 1.824 | 676 ms/step , 58187.70 GFLOP/s , 531668.7 tokens/s INFO:__main__:2024-10-27 11:46:37 | Epoch: 3 | Step: 82200 | Dataset: 0-6576703 | Loss: 1.793 | 676 ms/step , 58178.72 GFLOP/s , 531876.8 tokens/s INFO:__main__:2024-10-27 11:46:44 | Epoch: 3 | Step: 82210 | Dataset: 0-6584703 | Loss: 1.789 | 675 ms/step , 58267.23 GFLOP/s , 531689.7 tokens/s INFO:__main__:2024-10-27 11:46:52 | Epoch: 3 | Step: 82220 | Dataset: 0-6592703 | Loss: 1.774 | 676 ms/step , 58169.46 GFLOP/s , 531756.3 tokens/s INFO:__main__:2024-10-27 11:47:00 | Epoch: 3 | Step: 82230 | Dataset: 0-6600703 | Loss: 1.801 | 675 ms/step , 58229.72 GFLOP/s , 532078.2 tokens/s INFO:__main__:2024-10-27 11:47:07 | Epoch: 3 | Step: 82240 | Dataset: 0-6608703 | Loss: 1.778 | 675 ms/step , 58233.48 GFLOP/s , 532106.6 tokens/s INFO:__main__:2024-10-27 11:47:15 | Epoch: 3 | Step: 82250 | Dataset: 0-6616703 | Loss: 1.780 | 676 ms/step , 58111.05 GFLOP/s , 531616.0 tokens/s INFO:__main__:2024-10-27 11:47:23 | Epoch: 3 | Step: 82260 | Dataset: 0-6624703 | Loss: 1.787 | 675 ms/step , 58217.00 GFLOP/s , 531791.0 tokens/s INFO:__main__:2024-10-27 11:47:30 | Epoch: 3 | Step: 82270 | Dataset: 0-6632703 | Loss: 1.777 | 676 ms/step , 58116.71 GFLOP/s , 531788.1 tokens/s INFO:__main__:2024-10-27 11:47:38 | Epoch: 3 | Step: 82280 | Dataset: 0-6640703 | Loss: 1.725 | 676 ms/step , 58166.82 GFLOP/s , 531884.6 tokens/s INFO:__main__:2024-10-27 11:47:46 | Epoch: 3 | Step: 82290 | Dataset: 0-6648703 | Loss: 1.769 | 675 ms/step , 58273.91 GFLOP/s , 532274.3 tokens/s INFO:__main__:2024-10-27 11:47:54 | Epoch: 3 | Step: 82300 | Dataset: 0-6656703 | Loss: 1.783 | 676 ms/step , 58134.45 GFLOP/s , 531694.7 tokens/s 
INFO:__main__:2024-10-27 11:48:01 | Epoch: 3 | Step: 82310 | Dataset: 0-6664703 | Loss: 1.723 | 676 ms/step , 58179.87 GFLOP/s , 531939.1 tokens/s INFO:__main__:2024-10-27 11:48:09 | Epoch: 3 | Step: 82320 | Dataset: 0-6672703 | Loss: 2.457 | 675 ms/step , 58225.07 GFLOP/s , 532120.3 tokens/s INFO:__main__:2024-10-27 11:48:17 | Epoch: 3 | Step: 82330 | Dataset: 0-6680703 | Loss: 2.160 | 676 ms/step , 58182.60 GFLOP/s , 532731.5 tokens/s INFO:__main__:2024-10-27 11:48:24 | Epoch: 3 | Step: 82340 | Dataset: 0-6688703 | Loss: 2.113 | 676 ms/step , 58190.28 GFLOP/s , 532597.3 tokens/s INFO:__main__:2024-10-27 11:48:32 | Epoch: 3 | Step: 82350 | Dataset: 0-6696703 | Loss: 2.075 | 676 ms/step , 58174.32 GFLOP/s , 532116.4 tokens/s INFO:__main__:2024-10-27 11:48:40 | Epoch: 3 | Step: 82360 | Dataset: 0-6704703 | Loss: 2.080 | 676 ms/step , 58190.77 GFLOP/s , 532257.2 tokens/s INFO:__main__:2024-10-27 11:48:47 | Epoch: 3 | Step: 82370 | Dataset: 0-6712703 | Loss: 2.053 | 675 ms/step , 58256.92 GFLOP/s , 532832.2 tokens/s INFO:__main__:2024-10-27 11:48:55 | Epoch: 3 | Step: 82380 | Dataset: 0-6720703 | Loss: 2.001 | 675 ms/step , 58251.87 GFLOP/s , 532889.4 tokens/s INFO:__main__:2024-10-27 11:49:03 | Epoch: 3 | Step: 82390 | Dataset: 0-6728703 | Loss: 2.000 | 676 ms/step , 58156.01 GFLOP/s , 532074.0 tokens/s INFO:__main__:2024-10-27 11:49:11 | Epoch: 3 | Step: 82400 | Dataset: 0-6736703 | Loss: 1.952 | 675 ms/step , 58221.35 GFLOP/s , 532267.4 tokens/s INFO:__main__:2024-10-27 11:49:18 | Epoch: 3 | Step: 82410 | Dataset: 0-6744703 | Loss: 1.949 | 675 ms/step , 58194.37 GFLOP/s , 532188.9 tokens/s INFO:__main__:2024-10-27 11:49:26 | Epoch: 3 | Step: 82420 | Dataset: 0-6752703 | Loss: 2.080 | 676 ms/step , 58149.92 GFLOP/s , 532522.5 tokens/s INFO:__main__:2024-10-27 11:49:34 | Epoch: 3 | Step: 82430 | Dataset: 0-6760703 | Loss: 2.066 | 676 ms/step , 58113.03 GFLOP/s , 532440.7 tokens/s INFO:__main__:2024-10-27 11:49:41 | Epoch: 3 | Step: 82440 | Dataset: 0-6768703 | Loss: 
1.990 | 675 ms/step , 58232.87 GFLOP/s , 532620.8 tokens/s INFO:__main__:2024-10-27 11:49:49 | Epoch: 3 | Step: 82450 | Dataset: 0-6776703 | Loss: 2.059 | 675 ms/step , 58269.61 GFLOP/s , 532697.2 tokens/s INFO:__main__:2024-10-27 11:49:57 | Epoch: 3 | Step: 82460 | Dataset: 0-6784703 | Loss: 1.928 | 676 ms/step , 58188.27 GFLOP/s , 533198.7 tokens/s INFO:__main__:2024-10-27 11:50:04 | Epoch: 3 | Step: 82470 | Dataset: 0-6792703 | Loss: 1.984 | 677 ms/step , 58081.27 GFLOP/s , 531331.1 tokens/s INFO:__main__:2024-10-27 11:50:12 | Epoch: 3 | Step: 82480 | Dataset: 0-6800703 | Loss: 1.857 | 676 ms/step , 58146.55 GFLOP/s , 530692.5 tokens/s INFO:__main__:2024-10-27 11:50:20 | Epoch: 3 | Step: 82490 | Dataset: 0-6808703 | Loss: 1.810 | 677 ms/step , 58028.73 GFLOP/s , 530997.2 tokens/s INFO:__main__:2024-10-27 11:50:28 | Epoch: 3 | Step: 82500 | Dataset: 0-6816703 | Loss: 1.763 | 676 ms/step , 58191.84 GFLOP/s , 529342.9 tokens/s INFO:__main__:2024-10-27 11:50:35 | Epoch: 3 | Step: 82510 | Dataset: 0-6824703 | Loss: 1.781 | 675 ms/step , 58197.51 GFLOP/s , 531491.2 tokens/s INFO:__main__:2024-10-27 11:50:43 | Epoch: 3 | Step: 82520 | Dataset: 0-6832703 | Loss: 1.777 | 676 ms/step , 58154.53 GFLOP/s , 531019.5 tokens/s INFO:__main__:2024-10-27 11:50:51 | Epoch: 3 | Step: 82530 | Dataset: 0-6840703 | Loss: 1.767 | 677 ms/step , 58105.09 GFLOP/s , 530625.9 tokens/s INFO:__main__:2024-10-27 11:50:58 | Epoch: 3 | Step: 82540 | Dataset: 0-6848703 | Loss: 1.725 | 676 ms/step , 58112.83 GFLOP/s , 529336.5 tokens/s INFO:__main__:2024-10-27 11:51:06 | Epoch: 3 | Step: 82550 | Dataset: 0-6856703 | Loss: 1.742 | 675 ms/step , 58237.65 GFLOP/s , 532153.0 tokens/s INFO:__main__:2024-10-27 11:51:14 | Epoch: 3 | Step: 82560 | Dataset: 0-6864703 | Loss: 1.726 | 675 ms/step , 58265.20 GFLOP/s , 532169.8 tokens/s INFO:__main__:2024-10-27 11:51:22 | Epoch: 3 | Step: 82570 | Dataset: 0-6872703 | Loss: 2.359 | 675 ms/step , 58198.73 GFLOP/s , 532052.3 tokens/s INFO:__main__:2024-10-27 
11:51:29 | Epoch: 3 | Step: 82580 | Dataset: 0-6880703 | Loss: 2.261 | 676 ms/step , 58163.89 GFLOP/s , 532108.7 tokens/s INFO:__main__:2024-10-27 11:51:37 | Epoch: 3 | Step: 82590 | Dataset: 0-6888703 | Loss: 2.319 | 677 ms/step , 58091.17 GFLOP/s , 531601.0 tokens/s INFO:__main__:2024-10-27 11:51:45 | Epoch: 3 | Step: 82600 | Dataset: 0-6896703 | Loss: 2.252 | 675 ms/step , 58216.90 GFLOP/s , 532475.6 tokens/s INFO:__main__:2024-10-27 11:51:52 | Epoch: 3 | Step: 82610 | Dataset: 0-6904703 | Loss: 2.236 | 674 ms/step , 58335.30 GFLOP/s , 532894.6 tokens/s INFO:__main__:2024-10-27 11:52:00 | Epoch: 3 | Step: 82620 | Dataset: 0-6912703 | Loss: 2.226 | 678 ms/step , 57993.11 GFLOP/s , 532551.0 tokens/s INFO:__main__:2024-10-27 11:52:08 | Epoch: 3 | Step: 82630 | Dataset: 0-6920703 | Loss: 2.225 | 677 ms/step , 58080.06 GFLOP/s , 531123.7 tokens/s INFO:__main__:2024-10-27 11:52:15 | Epoch: 3 | Step: 82640 | Dataset: 0-6928703 | Loss: 2.217 | 678 ms/step , 57938.64 GFLOP/s , 531031.0 tokens/s INFO:__main__:2024-10-27 11:52:23 | Epoch: 3 | Step: 82650 | Dataset: 0-6936703 | Loss: 2.170 | 674 ms/step , 58320.56 GFLOP/s , 531751.4 tokens/s INFO:__main__:2024-10-27 11:52:31 | Epoch: 3 | Step: 82660 | Dataset: 0-6944703 | Loss: 2.172 | 675 ms/step , 58276.51 GFLOP/s , 533257.0 tokens/s INFO:__main__:2024-10-27 11:52:38 | Epoch: 3 | Step: 82670 | Dataset: 0-6952703 | Loss: 2.221 | 676 ms/step , 58163.98 GFLOP/s , 532574.3 tokens/s INFO:__main__:2024-10-27 11:52:46 | Epoch: 3 | Step: 82680 | Dataset: 0-6960703 | Loss: 2.212 | 674 ms/step , 58281.67 GFLOP/s , 532896.3 tokens/s INFO:__main__:2024-10-27 11:52:54 | Epoch: 3 | Step: 82690 | Dataset: 0-6968703 | Loss: 2.160 | 675 ms/step , 58240.98 GFLOP/s , 533091.2 tokens/s INFO:__main__:2024-10-27 11:53:02 | Epoch: 3 | Step: 82700 | Dataset: 0-6976703 | Loss: 2.153 | 676 ms/step , 58171.21 GFLOP/s , 532672.5 tokens/s INFO:__main__:2024-10-27 11:53:09 | Epoch: 3 | Step: 82710 | Dataset: 0-6984703 | Loss: 2.193 | 675 ms/step , 
58234.56 GFLOP/s , 532186.1 tokens/s INFO:__main__:2024-10-27 11:53:17 | Epoch: 3 | Step: 82720 | Dataset: 0-6992703 | Loss: 2.162 | 676 ms/step , 58116.40 GFLOP/s , 532300.3 tokens/s INFO:__main__:2024-10-27 11:53:25 | Epoch: 3 | Step: 82730 | Dataset: 0-7000703 | Loss: 2.139 | 676 ms/step , 58138.68 GFLOP/s , 532481.6 tokens/s INFO:__main__:2024-10-27 11:53:32 | Epoch: 3 | Step: 82740 | Dataset: 0-7008703 | Loss: 2.192 | 675 ms/step , 58225.77 GFLOP/s , 532437.5 tokens/s INFO:__main__:2024-10-27 11:53:40 | Epoch: 3 | Step: 82750 | Dataset: 0-7016703 | Loss: 2.179 | 674 ms/step , 58300.96 GFLOP/s , 533011.3 tokens/s INFO:__main__:2024-10-27 11:53:48 | Epoch: 3 | Step: 82760 | Dataset: 0-7024703 | Loss: 2.131 | 674 ms/step , 58289.86 GFLOP/s , 533157.5 tokens/s INFO:__main__:2024-10-27 11:53:55 | Epoch: 3 | Step: 82770 | Dataset: 0-7032703 | Loss: 2.175 | 675 ms/step , 58219.19 GFLOP/s , 533054.2 tokens/s INFO:__main__:2024-10-27 11:54:03 | Epoch: 3 | Step: 82780 | Dataset: 0-7040703 | Loss: 2.199 | 675 ms/step , 58213.03 GFLOP/s , 531559.7 tokens/s INFO:__main__:2024-10-27 11:54:11 | Epoch: 3 | Step: 82790 | Dataset: 0-7048703 | Loss: 2.186 | 676 ms/step , 58142.79 GFLOP/s , 532309.0 tokens/s INFO:__main__:2024-10-27 11:54:18 | Epoch: 3 | Step: 82800 | Dataset: 0-7056703 | Loss: 2.151 | 676 ms/step , 58115.45 GFLOP/s , 532150.5 tokens/s INFO:__main__:2024-10-27 11:54:26 | Epoch: 3 | Step: 82810 | Dataset: 0-7064703 | Loss: 2.211 | 675 ms/step , 58265.42 GFLOP/s , 532869.6 tokens/s INFO:__main__:2024-10-27 11:54:34 | Epoch: 3 | Step: 82820 | Dataset: 0-7072703 | Loss: 2.086 | 677 ms/step , 58086.55 GFLOP/s , 532665.1 tokens/s INFO:__main__:2024-10-27 11:54:42 | Epoch: 3 | Step: 82830 | Dataset: 0-7080703 | Loss: 2.167 | 676 ms/step , 58152.85 GFLOP/s , 532191.7 tokens/s INFO:__main__:2024-10-27 11:54:49 | Epoch: 3 | Step: 82840 | Dataset: 0-7088703 | Loss: 2.152 | 675 ms/step , 58241.91 GFLOP/s , 533055.0 tokens/s INFO:__main__:2024-10-27 11:54:57 | Epoch: 3 | 
Step: 82850 | Dataset: 0-7096703 | Loss: 2.167 | 675 ms/step , 58277.48 GFLOP/s , 532530.2 tokens/s INFO:__main__:2024-10-27 11:55:05 | Epoch: 3 | Step: 82860 | Dataset: 0-7104703 | Loss: 2.151 | 676 ms/step , 58138.34 GFLOP/s , 532036.3 tokens/s INFO:__main__:2024-10-27 11:55:12 | Epoch: 3 | Step: 82870 | Dataset: 0-7112703 | Loss: 2.075 | 676 ms/step , 58149.51 GFLOP/s , 532066.0 tokens/s INFO:__main__:2024-10-27 11:55:20 | Epoch: 3 | Step: 82880 | Dataset: 0-7120703 | Loss: 2.207 | 676 ms/step , 58132.51 GFLOP/s , 532085.8 tokens/s INFO:__main__:2024-10-27 11:55:28 | Epoch: 3 | Step: 82890 | Dataset: 0-7128703 | Loss: 1.897 | 677 ms/step , 58065.43 GFLOP/s , 531778.4 tokens/s INFO:__main__:2024-10-27 11:55:35 | Epoch: 3 | Step: 82900 | Dataset: 0-7136703 | Loss: 1.730 | 676 ms/step , 58134.99 GFLOP/s , 531506.4 tokens/s INFO:__main__:2024-10-27 11:55:43 | Epoch: 3 | Step: 82910 | Dataset: 0-7144703 | Loss: 1.681 | 676 ms/step , 58187.21 GFLOP/s , 531755.7 tokens/s INFO:__main__:2024-10-27 11:55:51 | Epoch: 3 | Step: 82920 | Dataset: 0-7152703 | Loss: 1.694 | 675 ms/step , 58212.89 GFLOP/s , 531837.9 tokens/s INFO:__main__:2024-10-27 11:55:59 | Epoch: 3 | Step: 82930 | Dataset: 0-7160703 | Loss: 1.666 | 676 ms/step , 58176.29 GFLOP/s , 531647.7 tokens/s INFO:__main__:2024-10-27 11:56:06 | Epoch: 3 | Step: 82940 | Dataset: 0-7168703 | Loss: 1.678 | 677 ms/step , 58091.84 GFLOP/s , 531533.2 tokens/s INFO:__main__:2024-10-27 11:56:14 | Epoch: 3 | Step: 82950 | Dataset: 0-7176703 | Loss: 1.662 | 674 ms/step , 58299.96 GFLOP/s , 532098.8 tokens/s INFO:__main__:2024-10-27 11:56:22 | Epoch: 3 | Step: 82960 | Dataset: 0-7184703 | Loss: 1.633 | 676 ms/step , 58125.22 GFLOP/s , 531839.0 tokens/s INFO:__main__:2024-10-27 11:56:29 | Epoch: 3 | Step: 82970 | Dataset: 0-7192703 | Loss: 1.636 | 676 ms/step , 58183.00 GFLOP/s , 531859.0 tokens/s INFO:__main__:2024-10-27 11:56:37 | Epoch: 3 | Step: 82980 | Dataset: 0-7200703 | Loss: 2.247 | 675 ms/step , 58217.27 GFLOP/s , 
531850.7 tokens/s INFO:__main__:2024-10-27 11:56:45 | Epoch: 3 | Step: 82990 | Dataset: 0-7208703 | Loss: 2.186 | 677 ms/step , 58086.95 GFLOP/s , 532182.5 tokens/s INFO:__main__:2024-10-27 11:56:52 | Validation | Step: 83000 | Val_loss: 2.272 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 11:56:52 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_115652_step_83000.pt` INFO:__main__:2024-10-27 11:56:53 | Epoch: 3 | Step: 83000 | Dataset: 0-7216703 | Loss: 2.204 | 675 ms/step , 58200.04 GFLOP/s , 479697.6 tokens/s INFO:__main__:2024-10-27 11:57:01 | Epoch: 3 | Step: 83010 | Dataset: 0-7224703 | Loss: 2.241 | 677 ms/step , 58022.82 GFLOP/s , 530563.7 tokens/s INFO:__main__:2024-10-27 11:57:09 | Epoch: 3 | Step: 83020 | Dataset: 0-7232703 | Loss: 2.182 | 676 ms/step , 58156.19 GFLOP/s , 532066.7 tokens/s INFO:__main__:2024-10-27 11:57:16 | Epoch: 3 | Step: 83030 | Dataset: 0-7240703 | Loss: 2.206 | 676 ms/step , 58181.63 GFLOP/s , 532226.5 tokens/s INFO:__main__:2024-10-27 11:57:24 | Epoch: 3 | Step: 83040 | Dataset: 0-7248703 | Loss: 2.121 | 674 ms/step , 58297.15 GFLOP/s , 533163.0 tokens/s INFO:__main__:2024-10-27 11:57:32 | Epoch: 3 | Step: 83050 | Dataset: 0-7256703 | Loss: 2.143 | 676 ms/step , 58138.75 GFLOP/s , 532212.5 tokens/s INFO:__main__:2024-10-27 11:57:39 | Epoch: 3 | Step: 83060 | Dataset: 0-7264703 | Loss: 2.185 | 676 ms/step , 58155.09 GFLOP/s , 532562.5 tokens/s INFO:__main__:2024-10-27 11:57:47 | Epoch: 3 | Step: 83070 | Dataset: 0-7272703 | Loss: 2.122 | 675 ms/step , 58245.32 GFLOP/s , 532338.0 tokens/s INFO:__main__:2024-10-27 11:57:55 | Epoch: 3 | Step: 83080 | Dataset: 0-7280703 | Loss: 2.163 | 675 ms/step , 58199.41 GFLOP/s , 532505.0 tokens/s INFO:__main__:2024-10-27 11:58:03 | Epoch: 3 | Step: 83090 | Dataset: 0-7288703 | Loss: 2.083 | 675 ms/step , 58216.83 GFLOP/s , 532195.5 tokens/s INFO:__main__:2024-10-27 11:58:10 | Epoch: 3 | Step: 83100 | Dataset: 0-7296703 | Loss: 2.187 | 676 ms/step , 58188.93 GFLOP/s , 
532494.7 tokens/s INFO:__main__:2024-10-27 11:58:18 | Epoch: 3 | Step: 83110 | Dataset: 0-7304703 | Loss: 2.071 | 675 ms/step , 58240.96 GFLOP/s , 532582.0 tokens/s INFO:__main__:2024-10-27 11:58:26 | Epoch: 3 | Step: 83120 | Dataset: 0-7312703 | Loss: 2.098 | 676 ms/step , 58174.21 GFLOP/s , 532603.0 tokens/s INFO:__main__:2024-10-27 11:58:33 | Epoch: 3 | Step: 83130 | Dataset: 0-7320703 | Loss: 2.130 | 674 ms/step , 58296.64 GFLOP/s , 532911.7 tokens/s INFO:__main__:2024-10-27 11:58:41 | Epoch: 3 | Step: 83140 | Dataset: 0-7328703 | Loss: 1.841 | 675 ms/step , 58214.98 GFLOP/s , 532184.6 tokens/s INFO:__main__:2024-10-27 11:58:49 | Epoch: 3 | Step: 83150 | Dataset: 0-7336703 | Loss: 1.770 | 676 ms/step , 58134.24 GFLOP/s , 531826.4 tokens/s INFO:__main__:2024-10-27 11:58:56 | Epoch: 3 | Step: 83160 | Dataset: 0-7344703 | Loss: 1.775 | 678 ms/step , 57973.40 GFLOP/s , 531615.9 tokens/s INFO:__main__:2024-10-27 11:59:04 | Epoch: 3 | Step: 83170 | Dataset: 0-7352703 | Loss: 1.788 | 676 ms/step , 58158.22 GFLOP/s , 532387.7 tokens/s INFO:__main__:2024-10-27 11:59:12 | Epoch: 3 | Step: 83180 | Dataset: 0-7360703 | Loss: 1.771 | 674 ms/step , 58291.72 GFLOP/s , 532361.7 tokens/s INFO:__main__:2024-10-27 11:59:20 | Epoch: 3 | Step: 83190 | Dataset: 0-7368703 | Loss: 1.749 | 678 ms/step , 58010.34 GFLOP/s , 530709.8 tokens/s INFO:__main__:2024-10-27 11:59:27 | Epoch: 3 | Step: 83200 | Dataset: 0-7376703 | Loss: 1.730 | 677 ms/step , 58029.28 GFLOP/s , 530616.2 tokens/s INFO:__main__:2024-10-27 11:59:35 | Epoch: 3 | Step: 83210 | Dataset: 0-7384703 | Loss: 1.744 | 676 ms/step , 58159.12 GFLOP/s , 532143.4 tokens/s INFO:__main__:2024-10-27 11:59:43 | Epoch: 3 | Step: 83220 | Dataset: 0-7392703 | Loss: 1.750 | 674 ms/step , 58289.97 GFLOP/s , 532372.7 tokens/s INFO:__main__:2024-10-27 11:59:50 | Epoch: 3 | Step: 83230 | Dataset: 0-7400703 | Loss: 2.295 | 677 ms/step , 58103.31 GFLOP/s , 532468.3 tokens/s INFO:__main__:2024-10-27 11:59:58 | Epoch: 3 | Step: 83240 | Dataset: 
0-7408703 | Loss: 2.177 | 676 ms/step , 58190.33 GFLOP/s , 532923.7 tokens/s INFO:__main__:2024-10-27 12:00:05 | Epoch: 3 | Step: 83250 | Dataset: 0-7416703 | Loss: 2.185 | 675 ms/step , 58201.15 GFLOP/s , 611072.9 tokens/s INFO:__main__:2024-10-27 12:00:12 | Epoch: 3 | Step: 83260 | Dataset: 0-7424703 | Loss: 2.157 | 676 ms/step , 58168.35 GFLOP/s , 532523.5 tokens/s INFO:__main__:2024-10-27 12:00:20 | Epoch: 3 | Step: 83270 | Dataset: 0-7432703 | Loss: 2.200 | 674 ms/step , 58312.73 GFLOP/s , 532811.8 tokens/s INFO:__main__:2024-10-27 12:00:28 | Epoch: 3 | Step: 83280 | Dataset: 0-7440703 | Loss: 2.166 | 675 ms/step , 58221.67 GFLOP/s , 532811.2 tokens/s INFO:__main__:2024-10-27 12:00:35 | Epoch: 3 | Step: 83290 | Dataset: 0-7448703 | Loss: 2.130 | 674 ms/step , 58320.17 GFLOP/s , 532591.1 tokens/s INFO:__main__:2024-10-27 12:00:43 | Epoch: 3 | Step: 83300 | Dataset: 0-7456703 | Loss: 2.208 | 675 ms/step , 58210.78 GFLOP/s , 532780.0 tokens/s INFO:__main__:2024-10-27 12:00:51 | Epoch: 3 | Step: 83310 | Dataset: 0-7464703 | Loss: 2.101 | 681 ms/step , 57708.43 GFLOP/s , 530962.5 tokens/s INFO:__main__:2024-10-27 12:00:59 | Epoch: 3 | Step: 83320 | Dataset: 0-7472703 | Loss: 2.237 | 679 ms/step , 57878.13 GFLOP/s , 529678.0 tokens/s INFO:__main__:2024-10-27 12:01:06 | Epoch: 3 | Step: 83330 | Dataset: 0-7480703 | Loss: 2.149 | 678 ms/step , 57946.92 GFLOP/s , 529877.5 tokens/s INFO:__main__:2024-10-27 12:01:14 | Epoch: 3 | Step: 83340 | Dataset: 0-7488703 | Loss: 2.082 | 680 ms/step , 57817.75 GFLOP/s , 529914.9 tokens/s INFO:__main__:2024-10-27 12:01:22 | Epoch: 3 | Step: 83350 | Dataset: 0-7496703 | Loss: 2.174 | 679 ms/step , 57889.49 GFLOP/s , 529606.9 tokens/s INFO:__main__:2024-10-27 12:01:30 | Epoch: 3 | Step: 83360 | Dataset: 0-7504703 | Loss: 2.014 | 679 ms/step , 57911.86 GFLOP/s , 529205.5 tokens/s INFO:__main__:2024-10-27 12:01:37 | Epoch: 3 | Step: 83370 | Dataset: 0-7512703 | Loss: 2.102 | 678 ms/step , 57971.13 GFLOP/s , 529603.0 tokens/s 
INFO:__main__:2024-10-27 12:01:45 | Epoch: 3 | Step: 83380 | Dataset: 0-7520703 | Loss: 2.121 | 679 ms/step , 57878.66 GFLOP/s , 529548.3 tokens/s INFO:__main__:2024-10-27 12:01:53 | Epoch: 3 | Step: 83390 | Dataset: 0-7528703 | Loss: 2.248 | 678 ms/step , 57983.41 GFLOP/s , 530116.6 tokens/s INFO:__main__:2024-10-27 12:02:00 | Epoch: 3 | Step: 83400 | Dataset: 0-7536703 | Loss: 2.139 | 677 ms/step , 58069.72 GFLOP/s , 530278.4 tokens/s INFO:__main__:2024-10-27 12:02:08 | Epoch: 3 | Step: 83410 | Dataset: 0-7544703 | Loss: 2.173 | 677 ms/step , 58074.79 GFLOP/s , 531234.8 tokens/s INFO:__main__:2024-10-27 12:02:16 | Epoch: 3 | Step: 83420 | Dataset: 0-7552703 | Loss: 2.097 | 678 ms/step , 58015.11 GFLOP/s , 531210.3 tokens/s INFO:__main__:2024-10-27 12:02:24 | Epoch: 3 | Step: 83430 | Dataset: 0-7560703 | Loss: 2.086 | 678 ms/step , 57951.41 GFLOP/s , 530921.5 tokens/s INFO:__main__:2024-10-27 12:02:31 | Epoch: 3 | Step: 83440 | Dataset: 0-7568703 | Loss: 2.123 | 679 ms/step , 57930.51 GFLOP/s , 530543.6 tokens/s INFO:__main__:2024-10-27 12:02:39 | Epoch: 3 | Step: 83450 | Dataset: 0-7576703 | Loss: 2.136 | 677 ms/step , 58042.97 GFLOP/s , 530535.3 tokens/s INFO:__main__:2024-10-27 12:02:47 | Epoch: 3 | Step: 83460 | Dataset: 0-7584703 | Loss: 2.216 | 678 ms/step , 57997.97 GFLOP/s , 530852.5 tokens/s INFO:__main__:2024-10-27 12:02:54 | Epoch: 3 | Step: 83470 | Dataset: 0-7592703 | Loss: 2.116 | 679 ms/step , 57888.42 GFLOP/s , 530655.6 tokens/s INFO:__main__:2024-10-27 12:03:02 | Epoch: 3 | Step: 83480 | Dataset: 0-7600703 | Loss: 2.132 | 680 ms/step , 57784.92 GFLOP/s , 528095.0 tokens/s INFO:__main__:2024-10-27 12:03:10 | Epoch: 3 | Step: 83490 | Dataset: 0-7608703 | Loss: 2.078 | 680 ms/step , 57772.96 GFLOP/s , 528398.3 tokens/s INFO:__main__:2024-10-27 12:03:18 | Epoch: 3 | Step: 83500 | Dataset: 0-7616703 | Loss: 2.157 | 681 ms/step , 57687.48 GFLOP/s , 528429.9 tokens/s INFO:__main__:2024-10-27 12:03:25 | Epoch: 3 | Step: 83510 | Dataset: 0-7624703 | Loss: 
2.166 | 681 ms/step , 57726.93 GFLOP/s , 528423.3 tokens/s INFO:__main__:2024-10-27 12:03:33 | Epoch: 3 | Step: 83520 | Dataset: 0-7632703 | Loss: 2.144 | 681 ms/step , 57752.97 GFLOP/s , 528681.1 tokens/s INFO:__main__:2024-10-27 12:03:41 | Epoch: 3 | Step: 83530 | Dataset: 0-7640703 | Loss: 2.201 | 681 ms/step , 57762.19 GFLOP/s , 528435.1 tokens/s INFO:__main__:2024-10-27 12:03:49 | Epoch: 3 | Step: 83540 | Dataset: 0-7648703 | Loss: 2.155 | 678 ms/step , 58013.92 GFLOP/s , 528694.6 tokens/s INFO:__main__:2024-10-27 12:03:56 | Epoch: 3 | Step: 83550 | Dataset: 0-7656703 | Loss: 2.255 | 674 ms/step , 58288.50 GFLOP/s , 532711.3 tokens/s INFO:__main__:2024-10-27 12:04:04 | Epoch: 3 | Step: 83560 | Dataset: 0-7664703 | Loss: 2.192 | 679 ms/step , 57920.32 GFLOP/s , 531337.1 tokens/s INFO:__main__:2024-10-27 12:04:12 | Epoch: 3 | Step: 83570 | Dataset: 0-7672703 | Loss: 2.183 | 678 ms/step , 58016.41 GFLOP/s , 531033.9 tokens/s INFO:__main__:2024-10-27 12:04:20 | Epoch: 3 | Step: 83580 | Dataset: 0-7680703 | Loss: 2.236 | 678 ms/step , 57950.26 GFLOP/s , 530903.4 tokens/s INFO:__main__:2024-10-27 12:04:27 | Epoch: 3 | Step: 83590 | Dataset: 0-7688703 | Loss: 2.273 | 677 ms/step , 58036.57 GFLOP/s , 530580.6 tokens/s INFO:__main__:2024-10-27 12:04:35 | Epoch: 3 | Step: 83600 | Dataset: 0-7696703 | Loss: 2.189 | 678 ms/step , 58008.41 GFLOP/s , 529690.7 tokens/s INFO:__main__:2024-10-27 12:04:43 | Epoch: 3 | Step: 83610 | Dataset: 0-7704703 | Loss: 2.154 | 678 ms/step , 58009.50 GFLOP/s , 530588.9 tokens/s INFO:__main__:2024-10-27 12:04:50 | Epoch: 3 | Step: 83620 | Dataset: 0-7712703 | Loss: 2.178 | 678 ms/step , 57993.20 GFLOP/s , 530382.6 tokens/s INFO:__main__:2024-10-27 12:04:58 | Epoch: 3 | Step: 83630 | Dataset: 0-7720703 | Loss: 2.173 | 676 ms/step , 58162.24 GFLOP/s , 531428.9 tokens/s INFO:__main__:2024-10-27 12:05:06 | Epoch: 3 | Step: 83640 | Dataset: 0-7728703 | Loss: 2.141 | 676 ms/step , 58131.60 GFLOP/s , 531106.9 tokens/s INFO:__main__:2024-10-27 
12:05:14 | Epoch: 3 | Step: 83650 | Dataset: 0-7736703 | Loss: 2.220 | 677 ms/step , 58100.64 GFLOP/s , 531422.2 tokens/s INFO:__main__:2024-10-27 12:05:21 | Epoch: 3 | Step: 83660 | Dataset: 0-7744703 | Loss: 2.201 | 676 ms/step , 58182.16 GFLOP/s , 528399.7 tokens/s INFO:__main__:2024-10-27 12:05:29 | Epoch: 3 | Step: 83670 | Dataset: 0-7752703 | Loss: 2.207 | 677 ms/step , 58058.73 GFLOP/s , 531398.7 tokens/s INFO:__main__:2024-10-27 12:05:37 | Epoch: 3 | Step: 83680 | Dataset: 0-7760703 | Loss: 2.191 | 677 ms/step , 58044.46 GFLOP/s , 531001.8 tokens/s INFO:__main__:2024-10-27 12:05:44 | Epoch: 3 | Step: 83690 | Dataset: 0-7768703 | Loss: 2.162 | 678 ms/step , 57990.94 GFLOP/s , 530476.0 tokens/s INFO:__main__:2024-10-27 12:05:52 | Epoch: 3 | Step: 83700 | Dataset: 0-7776703 | Loss: 2.114 | 677 ms/step , 58080.31 GFLOP/s , 531126.3 tokens/s INFO:__main__:2024-10-27 12:06:00 | Epoch: 3 | Step: 83710 | Dataset: 0-7784703 | Loss: 2.181 | 676 ms/step , 58108.86 GFLOP/s , 531044.6 tokens/s INFO:__main__:2024-10-27 12:06:08 | Epoch: 3 | Step: 83720 | Dataset: 0-7792703 | Loss: 2.131 | 678 ms/step , 57978.26 GFLOP/s , 530766.6 tokens/s INFO:__main__:2024-10-27 12:06:15 | Epoch: 3 | Step: 83730 | Dataset: 0-7800703 | Loss: 2.139 | 678 ms/step , 57997.23 GFLOP/s , 554195.7 tokens/s INFO:__main__:2024-10-27 12:06:23 | Epoch: 3 | Step: 83740 | Dataset: 0-7808703 | Loss: 2.059 | 677 ms/step , 58077.98 GFLOP/s , 531484.5 tokens/s INFO:__main__:2024-10-27 12:06:30 | Epoch: 3 | Step: 83750 | Dataset: 0-7816703 | Loss: 2.158 | 676 ms/step , 58163.80 GFLOP/s , 531323.6 tokens/s INFO:__main__:2024-10-27 12:06:38 | Epoch: 3 | Step: 83760 | Dataset: 0-7824703 | Loss: 2.096 | 675 ms/step , 58194.12 GFLOP/s , 532018.6 tokens/s INFO:__main__:2024-10-27 12:06:46 | Epoch: 3 | Step: 83770 | Dataset: 0-7832703 | Loss: 2.148 | 675 ms/step , 58203.33 GFLOP/s , 532150.0 tokens/s INFO:__main__:2024-10-27 12:06:54 | Epoch: 3 | Step: 83780 | Dataset: 0-7840703 | Loss: 2.132 | 676 ms/step , 
58136.89 GFLOP/s , 531987.2 tokens/s INFO:__main__:2024-10-27 12:07:01 | Epoch: 3 | Step: 83790 | Dataset: 0-7848703 | Loss: 2.141 | 675 ms/step , 58195.12 GFLOP/s , 532546.5 tokens/s INFO:__main__:2024-10-27 12:07:09 | Epoch: 3 | Step: 83800 | Dataset: 0-7856703 | Loss: 2.092 | 676 ms/step , 58136.83 GFLOP/s , 532251.9 tokens/s INFO:__main__:2024-10-27 12:07:17 | Epoch: 3 | Step: 83810 | Dataset: 0-7864703 | Loss: 2.083 | 675 ms/step , 58278.39 GFLOP/s , 532872.9 tokens/s INFO:__main__:2024-10-27 12:07:24 | Epoch: 3 | Step: 83820 | Dataset: 0-7872703 | Loss: 2.083 | 675 ms/step , 58204.94 GFLOP/s , 532535.4 tokens/s INFO:__main__:2024-10-27 12:07:32 | Epoch: 3 | Step: 83830 | Dataset: 0-7880703 | Loss: 2.050 | 674 ms/step , 58291.21 GFLOP/s , 533263.5 tokens/s INFO:__main__:2024-10-27 12:07:40 | Epoch: 3 | Step: 83840 | Dataset: 0-7888703 | Loss: 2.124 | 675 ms/step , 58239.39 GFLOP/s , 532927.9 tokens/s INFO:__main__:2024-10-27 12:07:47 | Epoch: 3 | Step: 83850 | Dataset: 0-7896703 | Loss: 2.113 | 675 ms/step , 58251.73 GFLOP/s , 533098.0 tokens/s INFO:__main__:2024-10-27 12:07:55 | Epoch: 3 | Step: 83860 | Dataset: 0-7904703 | Loss: 2.065 | 675 ms/step , 58269.86 GFLOP/s , 532782.7 tokens/s INFO:__main__:2024-10-27 12:08:03 | Epoch: 3 | Step: 83870 | Dataset: 0-7912703 | Loss: 2.203 | 675 ms/step , 58228.99 GFLOP/s , 532217.3 tokens/s INFO:__main__:2024-10-27 12:08:10 | Epoch: 3 | Step: 83880 | Dataset: 0-7920703 | Loss: 2.234 | 675 ms/step , 58246.67 GFLOP/s , 532837.9 tokens/s INFO:__main__:2024-10-27 12:08:18 | Epoch: 3 | Step: 83890 | Dataset: 0-7928703 | Loss: 2.247 | 675 ms/step , 58215.73 GFLOP/s , 532394.9 tokens/s INFO:__main__:2024-10-27 12:08:26 | Epoch: 3 | Step: 83900 | Dataset: 0-7936703 | Loss: 2.148 | 675 ms/step , 58196.10 GFLOP/s , 532507.1 tokens/s INFO:__main__:2024-10-27 12:08:33 | Epoch: 3 | Step: 83910 | Dataset: 0-7944703 | Loss: 2.093 | 675 ms/step , 58245.91 GFLOP/s , 532519.6 tokens/s INFO:__main__:2024-10-27 12:08:41 | Epoch: 3 | 
Step: 83920 | Dataset: 0-7952703 | Loss: 2.126 | 675 ms/step , 58262.72 GFLOP/s , 532704.3 tokens/s INFO:__main__:2024-10-27 12:08:49 | Epoch: 3 | Step: 83930 | Dataset: 0-7960703 | Loss: 2.096 | 676 ms/step , 58130.74 GFLOP/s , 530837.5 tokens/s INFO:__main__:2024-10-27 12:08:57 | Epoch: 3 | Step: 83940 | Dataset: 0-7968703 | Loss: 2.191 | 676 ms/step , 58163.94 GFLOP/s , 530794.5 tokens/s INFO:__main__:2024-10-27 12:09:04 | Epoch: 3 | Step: 83950 | Dataset: 0-7976703 | Loss: 2.138 | 675 ms/step , 58263.23 GFLOP/s , 531654.0 tokens/s INFO:__main__:2024-10-27 12:09:12 | Epoch: 3 | Step: 83960 | Dataset: 0-7984703 | Loss: 2.143 | 677 ms/step , 58101.08 GFLOP/s , 531345.5 tokens/s INFO:__main__:2024-10-27 12:09:20 | Epoch: 3 | Step: 83970 | Dataset: 0-7992703 | Loss: 2.171 | 676 ms/step , 58108.19 GFLOP/s , 531306.4 tokens/s INFO:__main__:2024-10-27 12:09:27 | Epoch: 3 | Step: 83980 | Dataset: 0-8000703 | Loss: 2.053 | 676 ms/step , 58189.81 GFLOP/s , 531423.9 tokens/s INFO:__main__:2024-10-27 12:09:35 | Epoch: 3 | Step: 83990 | Dataset: 0-8008703 | Loss: 2.119 | 676 ms/step , 58130.03 GFLOP/s , 530007.9 tokens/s INFO:__main__:2024-10-27 12:09:42 | Validation | Step: 84000 | Val_loss: 2.124 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 12:09:42 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_120942_step_84000.pt` INFO:__main__:2024-10-27 12:09:44 | Epoch: 3 | Step: 84000 | Dataset: 0-8016703 | Loss: 2.087 | 674 ms/step , 58364.97 GFLOP/s , 477618.9 tokens/s INFO:__main__:2024-10-27 12:09:51 | Epoch: 3 | Step: 84010 | Dataset: 0-8024703 | Loss: 2.172 | 675 ms/step , 58222.52 GFLOP/s , 531397.4 tokens/s INFO:__main__:2024-10-27 12:09:59 | Epoch: 3 | Step: 84020 | Dataset: 0-8032703 | Loss: 2.116 | 676 ms/step , 58152.12 GFLOP/s , 532022.6 tokens/s INFO:__main__:2024-10-27 12:10:07 | Epoch: 3 | Step: 84030 | Dataset: 0-8040703 | Loss: 2.212 | 675 ms/step , 58214.18 GFLOP/s , 532416.8 tokens/s INFO:__main__:2024-10-27 12:10:15 | Epoch: 3 | 
Step: 84040 | Dataset: 0-8048703 | Loss: 2.158 | 681 ms/step , 57723.63 GFLOP/s , 531758.0 tokens/s INFO:__main__:2024-10-27 12:10:22 | Epoch: 3 | Step: 84050 | Dataset: 0-8056703 | Loss: 2.086 | 675 ms/step , 58250.79 GFLOP/s , 531541.5 tokens/s INFO:__main__:2024-10-27 12:10:30 | Epoch: 3 | Step: 84060 | Dataset: 0-8064703 | Loss: 2.093 | 676 ms/step , 58154.35 GFLOP/s , 532461.9 tokens/s INFO:__main__:2024-10-27 12:10:38 | Epoch: 3 | Step: 84070 | Dataset: 0-8072703 | Loss: 2.191 | 674 ms/step , 58310.06 GFLOP/s , 532164.7 tokens/s INFO:__main__:2024-10-27 12:10:45 | Epoch: 3 | Step: 84080 | Dataset: 0-8080703 | Loss: 2.107 | 675 ms/step , 58218.50 GFLOP/s , 532919.6 tokens/s INFO:__main__:2024-10-27 12:10:53 | Epoch: 3 | Step: 84090 | Dataset: 0-8088703 | Loss: 2.071 | 676 ms/step , 58150.27 GFLOP/s , 532026.0 tokens/s INFO:__main__:2024-10-27 12:11:01 | Epoch: 3 | Step: 84100 | Dataset: 0-8096703 | Loss: 2.150 | 675 ms/step , 58217.51 GFLOP/s , 532508.3 tokens/s INFO:__main__:2024-10-27 12:11:08 | Epoch: 3 | Step: 84110 | Dataset: 0-8104703 | Loss: 2.129 | 676 ms/step , 58178.13 GFLOP/s , 532149.6 tokens/s INFO:__main__:2024-10-27 12:11:16 | Epoch: 3 | Step: 84120 | Dataset: 0-8112703 | Loss: 2.116 | 675 ms/step , 58255.41 GFLOP/s , 532135.8 tokens/s INFO:__main__:2024-10-27 12:11:24 | Epoch: 3 | Step: 84130 | Dataset: 0-8120703 | Loss: 2.156 | 675 ms/step , 58223.90 GFLOP/s , 532233.0 tokens/s INFO:__main__:2024-10-27 12:11:31 | Epoch: 3 | Step: 84140 | Dataset: 0-8128703 | Loss: 2.119 | 675 ms/step , 58199.88 GFLOP/s , 532464.7 tokens/s INFO:__main__:2024-10-27 12:11:39 | Epoch: 3 | Step: 84150 | Dataset: 0-8136703 | Loss: 2.109 | 675 ms/step , 58247.51 GFLOP/s , 531937.4 tokens/s INFO:__main__:2024-10-27 12:11:47 | Epoch: 3 | Step: 84160 | Dataset: 0-8144703 | Loss: 2.109 | 674 ms/step , 58325.35 GFLOP/s , 532251.8 tokens/s INFO:__main__:2024-10-27 12:11:55 | Epoch: 3 | Step: 84170 | Dataset: 0-8152703 | Loss: 2.099 | 676 ms/step , 58169.53 GFLOP/s , 
532700.2 tokens/s INFO:__main__:2024-10-27 12:12:02 | Epoch: 3 | Step: 84180 | Dataset: 0-8160703 | Loss: 2.032 | 675 ms/step , 58198.71 GFLOP/s , 532274.6 tokens/s INFO:__main__:2024-10-27 12:12:10 | Epoch: 3 | Step: 84190 | Dataset: 0-8168703 | Loss: 2.113 | 676 ms/step , 58127.73 GFLOP/s , 531787.4 tokens/s INFO:__main__:2024-10-27 12:12:18 | Epoch: 3 | Step: 84200 | Dataset: 0-8176703 | Loss: 2.131 | 676 ms/step , 58138.81 GFLOP/s , 532391.4 tokens/s INFO:__main__:2024-10-27 12:12:25 | Epoch: 3 | Step: 84210 | Dataset: 0-8184703 | Loss: 2.069 | 677 ms/step , 58087.27 GFLOP/s , 531627.8 tokens/s INFO:__main__:2024-10-27 12:12:33 | Epoch: 3 | Step: 84220 | Dataset: 0-8192703 | Loss: 2.117 | 676 ms/step , 58144.52 GFLOP/s , 531609.2 tokens/s INFO:__main__:2024-10-27 12:12:41 | Epoch: 3 | Step: 84230 | Dataset: 0-8200703 | Loss: 2.083 | 675 ms/step , 58240.04 GFLOP/s , 532268.0 tokens/s INFO:__main__:2024-10-27 12:12:48 | Epoch: 3 | Step: 84240 | Dataset: 0-8208703 | Loss: 2.187 | 675 ms/step , 58264.23 GFLOP/s , 532716.9 tokens/s INFO:__main__:2024-10-27 12:12:56 | Epoch: 3 | Step: 84250 | Dataset: 0-8216703 | Loss: 2.134 | 676 ms/step , 58169.22 GFLOP/s , 531972.0 tokens/s INFO:__main__:2024-10-27 12:13:04 | Epoch: 3 | Step: 84260 | Dataset: 0-8224703 | Loss: 2.157 | 676 ms/step , 58179.34 GFLOP/s , 532197.2 tokens/s INFO:__main__:2024-10-27 12:13:12 | Epoch: 3 | Step: 84270 | Dataset: 0-8232703 | Loss: 2.118 | 676 ms/step , 58112.33 GFLOP/s , 531691.1 tokens/s INFO:__main__:2024-10-27 12:13:19 | Epoch: 3 | Step: 84280 | Dataset: 0-8240703 | Loss: 2.087 | 676 ms/step , 58122.39 GFLOP/s , 531430.9 tokens/s INFO:__main__:2024-10-27 12:13:27 | Epoch: 3 | Step: 84290 | Dataset: 0-8248703 | Loss: 2.127 | 676 ms/step , 58148.09 GFLOP/s , 531630.7 tokens/s INFO:__main__:2024-10-27 12:13:35 | Epoch: 3 | Step: 84300 | Dataset: 0-8256703 | Loss: 2.150 | 676 ms/step , 58153.91 GFLOP/s , 531903.7 tokens/s INFO:__main__:2024-10-27 12:13:42 | Epoch: 3 | Step: 84310 | Dataset: 
0-8264703 | Loss: 2.075 | 677 ms/step , 58058.38 GFLOP/s , 531710.1 tokens/s INFO:__main__:2024-10-27 12:13:50 | Epoch: 3 | Step: 84320 | Dataset: 0-8272703 | Loss: 2.108 | 677 ms/step , 58095.12 GFLOP/s , 531624.8 tokens/s INFO:__main__:2024-10-27 12:13:58 | Epoch: 3 | Step: 84330 | Dataset: 0-8280703 | Loss: 2.115 | 677 ms/step , 58098.38 GFLOP/s , 531199.0 tokens/s INFO:__main__:2024-10-27 12:14:05 | Epoch: 3 | Step: 84340 | Dataset: 0-8288703 | Loss: 2.213 | 676 ms/step , 58122.06 GFLOP/s , 531763.1 tokens/s INFO:__main__:2024-10-27 12:14:13 | Epoch: 3 | Step: 84350 | Dataset: 0-8296703 | Loss: 2.106 | 676 ms/step , 58147.77 GFLOP/s , 531794.4 tokens/s INFO:__main__:2024-10-27 12:14:21 | Epoch: 3 | Step: 84360 | Dataset: 0-8304703 | Loss: 1.873 | 678 ms/step , 57970.69 GFLOP/s , 531166.7 tokens/s INFO:__main__:2024-10-27 12:14:29 | Epoch: 3 | Step: 84370 | Dataset: 0-8312703 | Loss: 1.781 | 679 ms/step , 57925.40 GFLOP/s , 529649.3 tokens/s INFO:__main__:2024-10-27 12:14:36 | Epoch: 3 | Step: 84380 | Dataset: 0-8320703 | Loss: 1.740 | 678 ms/step , 57981.77 GFLOP/s , 530157.0 tokens/s INFO:__main__:2024-10-27 12:14:44 | Epoch: 3 | Step: 84390 | Dataset: 0-8328703 | Loss: 1.715 | 677 ms/step , 58068.06 GFLOP/s , 529904.1 tokens/s INFO:__main__:2024-10-27 12:14:52 | Epoch: 3 | Step: 84400 | Dataset: 0-8336703 | Loss: 1.726 | 678 ms/step , 58005.83 GFLOP/s , 531362.2 tokens/s INFO:__main__:2024-10-27 12:15:00 | Epoch: 3 | Step: 84410 | Dataset: 0-8344703 | Loss: 1.701 | 678 ms/step , 57939.79 GFLOP/s , 530133.6 tokens/s INFO:__main__:2024-10-27 12:15:07 | Epoch: 3 | Step: 84420 | Dataset: 0-8352703 | Loss: 1.697 | 677 ms/step , 58040.12 GFLOP/s , 530967.5 tokens/s INFO:__main__:2024-10-27 12:15:15 | Epoch: 3 | Step: 84430 | Dataset: 0-8360703 | Loss: 1.680 | 677 ms/step , 58091.71 GFLOP/s , 530898.7 tokens/s INFO:__main__:2024-10-27 12:15:23 | Epoch: 3 | Step: 84440 | Dataset: 0-8368703 | Loss: 2.409 | 675 ms/step , 58201.63 GFLOP/s , 531603.8 tokens/s 
INFO:__main__:2024-10-27 12:15:30 | Epoch: 3 | Step: 84450 | Dataset: 0-8376703 | Loss: 2.180 | 676 ms/step , 58129.12 GFLOP/s , 531950.8 tokens/s INFO:__main__:2024-10-27 12:15:38 | Epoch: 3 | Step: 84460 | Dataset: 0-8384703 | Loss: 2.273 | 675 ms/step , 58250.26 GFLOP/s , 532672.9 tokens/s INFO:__main__:2024-10-27 12:15:46 | Epoch: 3 | Step: 84470 | Dataset: 0-8392703 | Loss: 2.239 | 676 ms/step , 58185.22 GFLOP/s , 531916.6 tokens/s INFO:__main__:2024-10-27 12:15:53 | Epoch: 3 | Step: 84480 | Dataset: 0-8400703 | Loss: 2.121 | 676 ms/step , 58189.38 GFLOP/s , 532409.6 tokens/s INFO:__main__:2024-10-27 12:16:01 | Epoch: 3 | Step: 84490 | Dataset: 0-8408703 | Loss: 2.177 | 676 ms/step , 58148.96 GFLOP/s , 531799.3 tokens/s INFO:__main__:2024-10-27 12:16:09 | Epoch: 3 | Step: 84500 | Dataset: 0-8416703 | Loss: 2.134 | 675 ms/step , 58231.85 GFLOP/s , 532522.8 tokens/s INFO:__main__:2024-10-27 12:16:17 | Epoch: 3 | Step: 84510 | Dataset: 0-8424703 | Loss: 2.091 | 675 ms/step , 58273.82 GFLOP/s , 532076.5 tokens/s INFO:__main__:2024-10-27 12:16:24 | Epoch: 3 | Step: 84520 | Dataset: 0-8432703 | Loss: 2.134 | 676 ms/step , 58171.31 GFLOP/s , 532513.5 tokens/s INFO:__main__:2024-10-27 12:16:32 | Epoch: 3 | Step: 84530 | Dataset: 0-8440703 | Loss: 2.084 | 676 ms/step , 58122.75 GFLOP/s , 532239.8 tokens/s INFO:__main__:2024-10-27 12:16:40 | Epoch: 3 | Step: 84540 | Dataset: 0-8448703 | Loss: 2.157 | 676 ms/step , 58125.61 GFLOP/s , 531855.5 tokens/s INFO:__main__:2024-10-27 12:16:47 | Epoch: 3 | Step: 84550 | Dataset: 0-8456703 | Loss: 2.120 | 675 ms/step , 58230.32 GFLOP/s , 532319.9 tokens/s INFO:__main__:2024-10-27 12:16:55 | Epoch: 3 | Step: 84560 | Dataset: 0-8464703 | Loss: 2.117 | 676 ms/step , 58130.32 GFLOP/s , 531744.6 tokens/s INFO:__main__:2024-10-27 12:17:03 | Epoch: 3 | Step: 84570 | Dataset: 0-8472703 | Loss: 2.075 | 676 ms/step , 58187.15 GFLOP/s , 532480.0 tokens/s INFO:__main__:2024-10-27 12:17:10 | Epoch: 3 | Step: 84580 | Dataset: 0-8480703 | Loss: 
2.209 | 676 ms/step , 58134.99 GFLOP/s , 531898.5 tokens/s INFO:__main__:2024-10-27 12:17:18 | Epoch: 3 | Step: 84590 | Dataset: 0-8488703 | Loss: 2.183 | 676 ms/step , 58141.97 GFLOP/s , 532161.8 tokens/s INFO:__main__:2024-10-27 12:17:26 | Epoch: 3 | Step: 84600 | Dataset: 0-8496703 | Loss: 2.087 | 675 ms/step , 58225.79 GFLOP/s , 532305.9 tokens/s INFO:__main__:2024-10-27 12:17:34 | Epoch: 3 | Step: 84610 | Dataset: 0-8504703 | Loss: 1.752 | 675 ms/step , 58256.30 GFLOP/s , 531611.8 tokens/s INFO:__main__:2024-10-27 12:17:41 | Epoch: 3 | Step: 84620 | Dataset: 0-8512703 | Loss: 1.723 | 676 ms/step , 58178.63 GFLOP/s , 531971.9 tokens/s INFO:__main__:2024-10-27 12:17:49 | Epoch: 3 | Step: 84630 | Dataset: 0-8520703 | Loss: 1.702 | 677 ms/step , 58060.40 GFLOP/s , 531450.9 tokens/s INFO:__main__:2024-10-27 12:17:57 | Epoch: 3 | Step: 84640 | Dataset: 0-8528703 | Loss: 1.692 | 677 ms/step , 58103.57 GFLOP/s , 531849.1 tokens/s INFO:__main__:2024-10-27 12:18:04 | Epoch: 3 | Step: 84650 | Dataset: 0-8536703 | Loss: 1.698 | 677 ms/step , 58083.11 GFLOP/s , 531685.4 tokens/s INFO:__main__:2024-10-27 12:18:12 | Epoch: 3 | Step: 84660 | Dataset: 0-8544703 | Loss: 1.697 | 675 ms/step , 58209.79 GFLOP/s , 531772.4 tokens/s INFO:__main__:2024-10-27 12:18:20 | Epoch: 3 | Step: 84670 | Dataset: 0-8552703 | Loss: 1.670 | 675 ms/step , 58229.61 GFLOP/s , 531592.8 tokens/s INFO:__main__:2024-10-27 12:18:27 | Epoch: 3 | Step: 84680 | Dataset: 0-8560703 | Loss: 1.673 | 675 ms/step , 58271.60 GFLOP/s , 532514.1 tokens/s INFO:__main__:2024-10-27 12:18:35 | Epoch: 3 | Step: 84690 | Dataset: 0-8568703 | Loss: 2.261 | 674 ms/step , 58323.37 GFLOP/s , 532426.5 tokens/s INFO:__main__:2024-10-27 12:18:43 | Epoch: 3 | Step: 84700 | Dataset: 0-8576703 | Loss: 2.200 | 676 ms/step , 58184.34 GFLOP/s , 532146.8 tokens/s INFO:__main__:2024-10-27 12:18:51 | Epoch: 3 | Step: 84710 | Dataset: 0-8584703 | Loss: 2.111 | 676 ms/step , 58154.93 GFLOP/s , 532154.7 tokens/s INFO:__main__:2024-10-27 
12:18:58 | Epoch: 3 | Step: 84720 | Dataset: 0-8592703 | Loss: 2.155 | 676 ms/step , 58179.87 GFLOP/s , 532136.6 tokens/s INFO:__main__:2024-10-27 12:19:06 | Epoch: 3 | Step: 84730 | Dataset: 0-8600703 | Loss: 2.173 | 676 ms/step , 58182.86 GFLOP/s , 532282.9 tokens/s INFO:__main__:2024-10-27 12:19:14 | Epoch: 3 | Step: 84740 | Dataset: 0-8608703 | Loss: 2.106 | 675 ms/step , 58199.38 GFLOP/s , 532210.4 tokens/s INFO:__main__:2024-10-27 12:19:21 | Epoch: 3 | Step: 84750 | Dataset: 0-8616703 | Loss: 2.171 | 675 ms/step , 58243.87 GFLOP/s , 532063.6 tokens/s INFO:__main__:2024-10-27 12:19:29 | Epoch: 3 | Step: 84760 | Dataset: 0-8624703 | Loss: 2.075 | 675 ms/step , 58259.66 GFLOP/s , 532353.9 tokens/s INFO:__main__:2024-10-27 12:19:37 | Epoch: 3 | Step: 84770 | Dataset: 0-8632703 | Loss: 2.141 | 678 ms/step , 58005.54 GFLOP/s , 531998.9 tokens/s INFO:__main__:2024-10-27 12:19:44 | Epoch: 3 | Step: 84780 | Dataset: 0-8640703 | Loss: 2.167 | 675 ms/step , 58230.83 GFLOP/s , 532300.6 tokens/s INFO:__main__:2024-10-27 12:19:52 | Epoch: 3 | Step: 84790 | Dataset: 0-8648703 | Loss: 2.161 | 675 ms/step , 58204.32 GFLOP/s , 532292.8 tokens/s INFO:__main__:2024-10-27 12:20:00 | Epoch: 3 | Step: 84800 | Dataset: 0-8656703 | Loss: 2.163 | 675 ms/step , 58211.68 GFLOP/s , 532066.4 tokens/s INFO:__main__:2024-10-27 12:20:07 | Epoch: 3 | Step: 84810 | Dataset: 0-8664703 | Loss: 2.068 | 676 ms/step , 58171.17 GFLOP/s , 532198.2 tokens/s INFO:__main__:2024-10-27 12:20:15 | Epoch: 3 | Step: 84820 | Dataset: 0-8672703 | Loss: 2.205 | 676 ms/step , 58184.15 GFLOP/s , 532082.5 tokens/s INFO:__main__:2024-10-27 12:20:23 | Epoch: 3 | Step: 84830 | Dataset: 0-8680703 | Loss: 2.098 | 675 ms/step , 58200.93 GFLOP/s , 531200.3 tokens/s INFO:__main__:2024-10-27 12:20:31 | Epoch: 3 | Step: 84840 | Dataset: 0-8688703 | Loss: 2.132 | 675 ms/step , 58217.60 GFLOP/s , 531604.9 tokens/s INFO:__main__:2024-10-27 12:20:38 | Epoch: 3 | Step: 84850 | Dataset: 0-8696703 | Loss: 2.289 | 675 ms/step , 
58271.25 GFLOP/s , 532097.8 tokens/s INFO:__main__:2024-10-27 12:20:46 | Epoch: 3 | Step: 84860 | Dataset: 0-8704703 | Loss: 2.332 | 675 ms/step , 58246.09 GFLOP/s , 531932.0 tokens/s INFO:__main__:2024-10-27 12:20:54 | Epoch: 3 | Step: 84870 | Dataset: 0-8712703 | Loss: 2.256 | 676 ms/step , 58149.16 GFLOP/s , 532568.1 tokens/s INFO:__main__:2024-10-27 12:21:01 | Epoch: 3 | Step: 84880 | Dataset: 0-8720703 | Loss: 2.225 | 675 ms/step , 58212.21 GFLOP/s , 532337.0 tokens/s INFO:__main__:2024-10-27 12:21:09 | Epoch: 3 | Step: 84890 | Dataset: 0-8728703 | Loss: 2.234 | 676 ms/step , 58134.55 GFLOP/s , 531649.4 tokens/s INFO:__main__:2024-10-27 12:21:17 | Epoch: 3 | Step: 84900 | Dataset: 0-8736703 | Loss: 2.198 | 676 ms/step , 58121.28 GFLOP/s , 532220.2 tokens/s INFO:__main__:2024-10-27 12:21:24 | Epoch: 3 | Step: 84910 | Dataset: 0-8744703 | Loss: 2.175 | 677 ms/step , 58045.88 GFLOP/s , 531803.3 tokens/s INFO:__main__:2024-10-27 12:21:32 | Epoch: 3 | Step: 84920 | Dataset: 0-8752703 | Loss: 2.206 | 676 ms/step , 58120.35 GFLOP/s , 531002.1 tokens/s INFO:__main__:2024-10-27 12:21:40 | Epoch: 3 | Step: 84930 | Dataset: 0-8760703 | Loss: 2.265 | 676 ms/step , 58158.80 GFLOP/s , 532477.3 tokens/s INFO:__main__:2024-10-27 12:21:48 | Epoch: 3 | Step: 84940 | Dataset: 0-8768703 | Loss: 2.179 | 678 ms/step , 57953.73 GFLOP/s , 530584.7 tokens/s INFO:__main__:2024-10-27 12:21:55 | Epoch: 3 | Step: 84950 | Dataset: 0-8776703 | Loss: 2.178 | 674 ms/step , 58284.05 GFLOP/s , 531886.8 tokens/s INFO:__main__:2024-10-27 12:22:03 | Epoch: 3 | Step: 84960 | Dataset: 0-8784703 | Loss: 2.146 | 675 ms/step , 58231.17 GFLOP/s , 532375.8 tokens/s INFO:__main__:2024-10-27 12:22:11 | Epoch: 3 | Step: 84970 | Dataset: 0-8792703 | Loss: 2.242 | 676 ms/step , 58173.50 GFLOP/s , 532094.4 tokens/s INFO:__main__:2024-10-27 12:22:18 | Epoch: 3 | Step: 84980 | Dataset: 0-8800703 | Loss: 2.139 | 675 ms/step , 58256.50 GFLOP/s , 532012.6 tokens/s INFO:__main__:2024-10-27 12:22:26 | Epoch: 3 | 
Step: 84990 | Dataset: 0-8808703 | Loss: 2.197 | 676 ms/step , 58131.08 GFLOP/s , 532320.8 tokens/s INFO:__main__:2024-10-27 12:22:33 | Validation | Step: 85000 | Val_loss: 2.438 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 12:22:33 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_122233_step_85000.pt` INFO:__main__:2024-10-27 12:22:35 | Epoch: 3 | Step: 85000 | Dataset: 0-8816703 | Loss: 2.184 | 676 ms/step , 58178.52 GFLOP/s , 477872.3 tokens/s INFO:__main__:2024-10-27 12:22:42 | Epoch: 3 | Step: 85010 | Dataset: 0-8824703 | Loss: 2.235 | 676 ms/step , 58107.83 GFLOP/s , 531398.2 tokens/s INFO:__main__:2024-10-27 12:22:50 | Epoch: 3 | Step: 85020 | Dataset: 0-8832703 | Loss: 1.864 | 678 ms/step , 58002.48 GFLOP/s , 531517.7 tokens/s INFO:__main__:2024-10-27 12:22:58 | Epoch: 3 | Step: 85030 | Dataset: 0-8840703 | Loss: 1.831 | 676 ms/step , 58151.95 GFLOP/s , 530837.2 tokens/s INFO:__main__:2024-10-27 12:23:05 | Epoch: 3 | Step: 85040 | Dataset: 0-8848703 | Loss: 1.829 | 676 ms/step , 58163.56 GFLOP/s , 531357.9 tokens/s INFO:__main__:2024-10-27 12:23:13 | Epoch: 3 | Step: 85050 | Dataset: 0-8856703 | Loss: 1.797 | 677 ms/step , 58099.10 GFLOP/s , 531146.0 tokens/s INFO:__main__:2024-10-27 12:23:21 | Epoch: 3 | Step: 85060 | Dataset: 0-8864703 | Loss: 1.785 | 676 ms/step , 58128.50 GFLOP/s , 531183.8 tokens/s INFO:__main__:2024-10-27 12:23:29 | Epoch: 3 | Step: 85070 | Dataset: 0-8872703 | Loss: 1.781 | 678 ms/step , 57980.43 GFLOP/s , 531107.7 tokens/s INFO:__main__:2024-10-27 12:23:36 | Epoch: 3 | Step: 85080 | Dataset: 0-8880703 | Loss: 1.769 | 676 ms/step , 58117.41 GFLOP/s , 531668.7 tokens/s INFO:__main__:2024-10-27 12:23:44 | Epoch: 3 | Step: 85090 | Dataset: 0-8888703 | Loss: 1.792 | 675 ms/step , 58231.55 GFLOP/s , 531044.4 tokens/s INFO:__main__:2024-10-27 12:23:52 | Epoch: 3 | Step: 85100 | Dataset: 0-8896703 | Loss: 2.430 | 675 ms/step , 58208.13 GFLOP/s , 531487.4 tokens/s INFO:__main__:2024-10-27 12:23:59 | Epoch: 3 | 
Step: 85110 | Dataset: 0-8904703 | Loss: 2.278 | 676 ms/step , 58158.21 GFLOP/s , 531739.8 tokens/s INFO:__main__:2024-10-27 12:24:07 | Epoch: 3 | Step: 85120 | Dataset: 0-8912703 | Loss: 2.208 | 675 ms/step , 58203.18 GFLOP/s , 531871.3 tokens/s INFO:__main__:2024-10-27 12:24:15 | Epoch: 3 | Step: 85130 | Dataset: 0-8920703 | Loss: 2.292 | 677 ms/step , 58031.94 GFLOP/s , 532502.5 tokens/s INFO:__main__:2024-10-27 12:24:23 | Epoch: 3 | Step: 85140 | Dataset: 0-8928703 | Loss: 2.264 | 676 ms/step , 58129.65 GFLOP/s , 531834.7 tokens/s INFO:__main__:2024-10-27 12:24:30 | Epoch: 3 | Step: 85150 | Dataset: 0-8936703 | Loss: 2.191 | 678 ms/step , 58019.37 GFLOP/s , 531878.9 tokens/s INFO:__main__:2024-10-27 12:24:38 | Epoch: 3 | Step: 85160 | Dataset: 0-8944703 | Loss: 2.139 | 676 ms/step , 58125.60 GFLOP/s , 531882.3 tokens/s INFO:__main__:2024-10-27 12:24:46 | Epoch: 3 | Step: 85170 | Dataset: 0-8952703 | Loss: 2.243 | 676 ms/step , 58186.42 GFLOP/s , 532040.1 tokens/s INFO:__main__:2024-10-27 12:24:53 | Epoch: 3 | Step: 85180 | Dataset: 0-8960703 | Loss: 2.165 | 677 ms/step , 58060.97 GFLOP/s , 532303.1 tokens/s INFO:__main__:2024-10-27 12:25:01 | Epoch: 3 | Step: 85190 | Dataset: 0-8968703 | Loss: 2.177 | 675 ms/step , 58196.70 GFLOP/s , 532174.4 tokens/s INFO:__main__:2024-10-27 12:25:09 | Epoch: 3 | Step: 85200 | Dataset: 0-8976703 | Loss: 2.130 | 675 ms/step , 58200.75 GFLOP/s , 532199.8 tokens/s INFO:__main__:2024-10-27 12:25:16 | Epoch: 3 | Step: 85210 | Dataset: 0-8984703 | Loss: 2.173 | 676 ms/step , 58114.16 GFLOP/s , 531618.1 tokens/s INFO:__main__:2024-10-27 12:25:24 | Epoch: 3 | Step: 85220 | Dataset: 0-8992703 | Loss: 2.143 | 674 ms/step , 58281.09 GFLOP/s , 532512.3 tokens/s INFO:__main__:2024-10-27 12:25:32 | Epoch: 3 | Step: 85230 | Dataset: 0-9000703 | Loss: 2.114 | 676 ms/step , 58174.56 GFLOP/s , 531794.7 tokens/s INFO:__main__:2024-10-27 12:25:40 | Epoch: 3 | Step: 85240 | Dataset: 0-9008703 | Loss: 2.118 | 676 ms/step , 58119.68 GFLOP/s , 
531368.8 tokens/s INFO:__main__:2024-10-27 12:25:47 | Epoch: 3 | Step: 85250 | Dataset: 0-9016703 | Loss: 2.163 | 676 ms/step , 58185.38 GFLOP/s , 531943.5 tokens/s INFO:__main__:2024-10-27 12:25:55 | Epoch: 3 | Step: 85260 | Dataset: 0-9024703 | Loss: 2.201 | 676 ms/step , 58180.86 GFLOP/s , 532203.5 tokens/s INFO:__main__:2024-10-27 12:26:03 | Epoch: 3 | Step: 85270 | Dataset: 0-9032703 | Loss: 2.254 | 676 ms/step , 58182.23 GFLOP/s , 531964.2 tokens/s INFO:__main__:2024-10-27 12:26:10 | Epoch: 3 | Step: 85280 | Dataset: 0-9040703 | Loss: 2.213 | 677 ms/step , 58103.37 GFLOP/s , 531652.0 tokens/s INFO:__main__:2024-10-27 12:26:18 | Epoch: 3 | Step: 85290 | Dataset: 0-9048703 | Loss: 2.161 | 676 ms/step , 58128.01 GFLOP/s , 532279.4 tokens/s INFO:__main__:2024-10-27 12:26:26 | Epoch: 3 | Step: 85300 | Dataset: 0-9056703 | Loss: 2.235 | 674 ms/step , 58348.35 GFLOP/s , 532136.0 tokens/s INFO:__main__:2024-10-27 12:26:33 | Epoch: 3 | Step: 85310 | Dataset: 0-9064703 | Loss: 2.166 | 676 ms/step , 58176.60 GFLOP/s , 530390.5 tokens/s INFO:__main__:2024-10-27 12:26:41 | Epoch: 3 | Step: 85320 | Dataset: 0-9072703 | Loss: 2.171 | 676 ms/step , 58125.79 GFLOP/s , 530661.2 tokens/s INFO:__main__:2024-10-27 12:26:49 | Epoch: 3 | Step: 85330 | Dataset: 0-9080703 | Loss: 2.199 | 677 ms/step , 58056.05 GFLOP/s , 530377.4 tokens/s INFO:__main__:2024-10-27 12:26:57 | Epoch: 3 | Step: 85340 | Dataset: 0-9088703 | Loss: 2.238 | 676 ms/step , 58173.21 GFLOP/s , 531119.0 tokens/s INFO:__main__:2024-10-27 12:27:04 | Epoch: 3 | Step: 85350 | Dataset: 0-9096703 | Loss: 2.203 | 678 ms/step , 58015.17 GFLOP/s , 530811.3 tokens/s INFO:__main__:2024-10-27 12:27:12 | Epoch: 3 | Step: 85360 | Dataset: 0-9104703 | Loss: 2.144 | 676 ms/step , 58115.62 GFLOP/s , 530739.4 tokens/s INFO:__main__:2024-10-27 12:27:20 | Epoch: 3 | Step: 85370 | Dataset: 0-9112703 | Loss: 2.206 | 676 ms/step , 58110.40 GFLOP/s , 530492.8 tokens/s INFO:__main__:2024-10-27 12:27:28 | Epoch: 3 | Step: 85380 | Dataset: 
0-9120703 | Loss: 2.177 | 677 ms/step , 58097.36 GFLOP/s , 528739.9 tokens/s INFO:__main__:2024-10-27 12:27:35 | Epoch: 3 | Step: 85390 | Dataset: 0-9128703 | Loss: 2.152 | 677 ms/step , 58091.04 GFLOP/s , 531624.7 tokens/s INFO:__main__:2024-10-27 12:27:43 | Epoch: 3 | Step: 85400 | Dataset: 0-9136703 | Loss: 2.154 | 676 ms/step , 58143.21 GFLOP/s , 531336.5 tokens/s INFO:__main__:2024-10-27 12:27:51 | Epoch: 3 | Step: 85410 | Dataset: 0-9144703 | Loss: 2.188 | 675 ms/step , 58198.77 GFLOP/s , 532068.6 tokens/s INFO:__main__:2024-10-27 12:27:58 | Epoch: 3 | Step: 85420 | Dataset: 0-9152703 | Loss: 2.192 | 675 ms/step , 58201.67 GFLOP/s , 532063.0 tokens/s INFO:__main__:2024-10-27 12:28:06 | Epoch: 3 | Step: 85430 | Dataset: 0-9160703 | Loss: 2.203 | 674 ms/step , 58291.25 GFLOP/s , 532449.0 tokens/s INFO:__main__:2024-10-27 12:28:14 | Epoch: 3 | Step: 85440 | Dataset: 0-9168703 | Loss: 2.237 | 676 ms/step , 58170.83 GFLOP/s , 532571.1 tokens/s INFO:__main__:2024-10-27 12:28:21 | Epoch: 3 | Step: 85450 | Dataset: 0-9176703 | Loss: 2.142 | 675 ms/step , 58215.88 GFLOP/s , 531709.1 tokens/s INFO:__main__:2024-10-27 12:28:29 | Epoch: 3 | Step: 85460 | Dataset: 0-9184703 | Loss: 2.096 | 676 ms/step , 58179.16 GFLOP/s , 532429.6 tokens/s INFO:__main__:2024-10-27 12:28:37 | Epoch: 3 | Step: 85470 | Dataset: 0-9192703 | Loss: 2.120 | 676 ms/step , 58167.25 GFLOP/s , 531741.7 tokens/s INFO:__main__:2024-10-27 12:28:45 | Epoch: 3 | Step: 85480 | Dataset: 0-9200703 | Loss: 2.127 | 675 ms/step , 58194.78 GFLOP/s , 531868.5 tokens/s INFO:__main__:2024-10-27 12:28:52 | Epoch: 3 | Step: 85490 | Dataset: 0-9208703 | Loss: 2.165 | 675 ms/step , 58193.21 GFLOP/s , 531860.7 tokens/s INFO:__main__:2024-10-27 12:29:00 | Epoch: 3 | Step: 85500 | Dataset: 0-9216703 | Loss: 2.127 | 675 ms/step , 58196.01 GFLOP/s , 531596.8 tokens/s INFO:__main__:2024-10-27 12:29:08 | Epoch: 3 | Step: 85510 | Dataset: 0-9224703 | Loss: 2.113 | 675 ms/step , 58242.92 GFLOP/s , 531844.5 tokens/s 
INFO:__main__:2024-10-27 12:29:15 | Epoch: 3 | Step: 85520 | Dataset: 0-9232703 | Loss: 2.115 | 676 ms/step , 58164.37 GFLOP/s , 532392.2 tokens/s INFO:__main__:2024-10-27 12:29:23 | Epoch: 3 | Step: 85530 | Dataset: 0-9240703 | Loss: 2.186 | 677 ms/step , 58066.93 GFLOP/s , 531263.0 tokens/s INFO:__main__:2024-10-27 12:29:31 | Epoch: 3 | Step: 85540 | Dataset: 0-9248703 | Loss: 2.160 | 678 ms/step , 57940.87 GFLOP/s , 530519.1 tokens/s INFO:__main__:2024-10-27 12:29:38 | Epoch: 3 | Step: 85550 | Dataset: 0-9256703 | Loss: 2.113 | 678 ms/step , 57962.09 GFLOP/s , 530519.5 tokens/s INFO:__main__:2024-10-27 12:29:46 | Epoch: 3 | Step: 85560 | Dataset: 0-9264703 | Loss: 2.062 | 675 ms/step , 58252.31 GFLOP/s , 532586.5 tokens/s INFO:__main__:2024-10-27 12:29:54 | Epoch: 3 | Step: 85570 | Dataset: 0-9272703 | Loss: 2.065 | 674 ms/step , 58350.10 GFLOP/s , 532437.9 tokens/s INFO:__main__:2024-10-27 12:30:02 | Epoch: 3 | Step: 85580 | Dataset: 0-9280703 | Loss: 2.074 | 674 ms/step , 58317.42 GFLOP/s , 532134.1 tokens/s INFO:__main__:2024-10-27 12:30:09 | Epoch: 3 | Step: 85590 | Dataset: 0-9288703 | Loss: 1.862 | 676 ms/step , 58159.26 GFLOP/s , 531408.7 tokens/s INFO:__main__:2024-10-27 12:30:17 | Epoch: 3 | Step: 85600 | Dataset: 0-9296703 | Loss: 1.787 | 675 ms/step , 58194.04 GFLOP/s , 531556.4 tokens/s INFO:__main__:2024-10-27 12:30:25 | Epoch: 3 | Step: 85610 | Dataset: 0-9304703 | Loss: 1.775 | 676 ms/step , 58108.98 GFLOP/s , 531197.4 tokens/s INFO:__main__:2024-10-27 12:30:32 | Epoch: 3 | Step: 85620 | Dataset: 0-9312703 | Loss: 1.791 | 675 ms/step , 58241.57 GFLOP/s , 531979.0 tokens/s INFO:__main__:2024-10-27 12:30:40 | Epoch: 3 | Step: 85630 | Dataset: 0-9320703 | Loss: 1.780 | 676 ms/step , 58126.28 GFLOP/s , 531562.5 tokens/s INFO:__main__:2024-10-27 12:30:48 | Epoch: 3 | Step: 85640 | Dataset: 0-9328703 | Loss: 1.746 | 676 ms/step , 58182.49 GFLOP/s , 531465.6 tokens/s INFO:__main__:2024-10-27 12:30:55 | Epoch: 3 | Step: 85650 | Dataset: 0-9336703 | Loss: 
1.769 | 675 ms/step , 58196.96 GFLOP/s , 531658.1 tokens/s INFO:__main__:2024-10-27 12:31:03 | Epoch: 3 | Step: 85660 | Dataset: 0-9344703 | Loss: 1.735 | 676 ms/step , 58153.70 GFLOP/s , 530364.4 tokens/s INFO:__main__:2024-10-27 12:31:11 | Epoch: 3 | Step: 85670 | Dataset: 0-9352703 | Loss: 1.735 | 675 ms/step , 58211.03 GFLOP/s , 531514.9 tokens/s INFO:__main__:2024-10-27 12:31:19 | Epoch: 3 | Step: 85680 | Dataset: 0-9360703 | Loss: 2.161 | 676 ms/step , 58182.71 GFLOP/s , 532317.1 tokens/s INFO:__main__:2024-10-27 12:31:26 | Epoch: 3 | Step: 85690 | Dataset: 0-9368703 | Loss: 2.165 | 676 ms/step , 58176.83 GFLOP/s , 531782.5 tokens/s INFO:__main__:2024-10-27 12:31:34 | Epoch: 3 | Step: 85700 | Dataset: 0-9376703 | Loss: 2.083 | 676 ms/step , 58173.19 GFLOP/s , 532299.3 tokens/s INFO:__main__:2024-10-27 12:31:42 | Epoch: 3 | Step: 85710 | Dataset: 0-9384703 | Loss: 2.095 | 676 ms/step , 58161.42 GFLOP/s , 531520.3 tokens/s INFO:__main__:2024-10-27 12:31:49 | Epoch: 3 | Step: 85720 | Dataset: 0-9392703 | Loss: 2.164 | 676 ms/step , 58184.09 GFLOP/s , 531585.6 tokens/s INFO:__main__:2024-10-27 12:31:57 | Epoch: 3 | Step: 85730 | Dataset: 0-9400703 | Loss: 1.919 | 676 ms/step , 58125.52 GFLOP/s , 531491.6 tokens/s INFO:__main__:2024-10-27 12:32:05 | Epoch: 3 | Step: 85740 | Dataset: 0-9408703 | Loss: 2.143 | 676 ms/step , 58108.47 GFLOP/s , 531967.4 tokens/s INFO:__main__:2024-10-27 12:32:13 | Epoch: 3 | Step: 85750 | Dataset: 0-9416703 | Loss: 2.113 | 675 ms/step , 58219.56 GFLOP/s , 532171.7 tokens/s INFO:__main__:2024-10-27 12:32:20 | Epoch: 3 | Step: 85760 | Dataset: 0-9424703 | Loss: 2.117 | 675 ms/step , 58265.47 GFLOP/s , 532266.9 tokens/s INFO:__main__:2024-10-27 12:32:28 | Epoch: 3 | Step: 85770 | Dataset: 0-9432703 | Loss: 2.067 | 676 ms/step , 58120.66 GFLOP/s , 531477.4 tokens/s INFO:__main__:2024-10-27 12:32:36 | Epoch: 3 | Step: 85780 | Dataset: 0-9440703 | Loss: 2.089 | 677 ms/step , 58087.07 GFLOP/s , 531593.8 tokens/s INFO:__main__:2024-10-27 
12:32:43 | Epoch: 3 | Step: 85790 | Dataset: 0-9448703 | Loss: 2.071 | 677 ms/step , 58106.48 GFLOP/s , 531514.0 tokens/s INFO:__main__:2024-10-27 12:32:51 | Epoch: 3 | Step: 85800 | Dataset: 0-9456703 | Loss: 2.112 | 677 ms/step , 58035.73 GFLOP/s , 529951.5 tokens/s INFO:__main__:2024-10-27 12:32:59 | Epoch: 3 | Step: 85810 | Dataset: 0-9464703 | Loss: 2.138 | 677 ms/step , 58070.64 GFLOP/s , 531043.4 tokens/s INFO:__main__:2024-10-27 12:33:06 | Epoch: 3 | Step: 85820 | Dataset: 0-9472703 | Loss: 1.995 | 677 ms/step , 58064.00 GFLOP/s , 531308.7 tokens/s INFO:__main__:2024-10-27 12:33:14 | Epoch: 3 | Step: 85830 | Dataset: 0-9480703 | Loss: 2.037 | 676 ms/step , 58161.82 GFLOP/s , 531939.7 tokens/s INFO:__main__:2024-10-27 12:33:22 | Epoch: 3 | Step: 85840 | Dataset: 0-9488703 | Loss: 2.204 | 677 ms/step , 58080.47 GFLOP/s , 531686.2 tokens/s INFO:__main__:2024-10-27 12:33:30 | Epoch: 3 | Step: 85850 | Dataset: 0-9496703 | Loss: 2.156 | 675 ms/step , 58257.95 GFLOP/s , 532053.9 tokens/s INFO:__main__:2024-10-27 12:33:37 | Epoch: 3 | Step: 85860 | Dataset: 0-9504703 | Loss: 2.142 | 675 ms/step , 58251.87 GFLOP/s , 532339.9 tokens/s INFO:__main__:2024-10-27 12:33:45 | Epoch: 3 | Step: 85870 | Dataset: 0-9512703 | Loss: 2.078 | 675 ms/step , 58223.32 GFLOP/s , 532108.6 tokens/s INFO:__main__:2024-10-27 12:33:53 | Epoch: 3 | Step: 85880 | Dataset: 0-9520703 | Loss: 2.124 | 679 ms/step , 57873.40 GFLOP/s , 530894.6 tokens/s INFO:__main__:2024-10-27 12:34:00 | Epoch: 3 | Step: 85890 | Dataset: 0-9528703 | Loss: 2.095 | 675 ms/step , 58223.62 GFLOP/s , 532543.3 tokens/s INFO:__main__:2024-10-27 12:34:08 | Epoch: 3 | Step: 85900 | Dataset: 0-9536703 | Loss: 2.132 | 675 ms/step , 58264.72 GFLOP/s , 532336.7 tokens/s INFO:__main__:2024-10-27 12:34:16 | Epoch: 3 | Step: 85910 | Dataset: 0-9544703 | Loss: 2.057 | 676 ms/step , 58174.83 GFLOP/s , 532263.7 tokens/s INFO:__main__:2024-10-27 12:34:23 | Epoch: 3 | Step: 85920 | Dataset: 0-9552703 | Loss: 2.227 | 675 ms/step , 
58229.43 GFLOP/s , 531274.8 tokens/s INFO:__main__:2024-10-27 12:34:31 | Epoch: 3 | Step: 85930 | Dataset: 0-9560703 | Loss: 2.108 | 676 ms/step , 58126.98 GFLOP/s , 531420.7 tokens/s INFO:__main__:2024-10-27 12:34:39 | Epoch: 3 | Step: 85940 | Dataset: 0-9568703 | Loss: 2.077 | 677 ms/step , 58079.07 GFLOP/s , 531908.3 tokens/s INFO:__main__:2024-10-27 12:34:47 | Epoch: 3 | Step: 85950 | Dataset: 0-9576703 | Loss: 2.179 | 676 ms/step , 58183.64 GFLOP/s , 531561.0 tokens/s INFO:__main__:2024-10-27 12:34:54 | Epoch: 3 | Step: 85960 | Dataset: 0-9584703 | Loss: 2.107 | 676 ms/step , 58128.63 GFLOP/s , 531710.4 tokens/s INFO:__main__:2024-10-27 12:35:02 | Epoch: 3 | Step: 85970 | Dataset: 0-9592703 | Loss: 2.140 | 675 ms/step , 58244.65 GFLOP/s , 532307.3 tokens/s INFO:__main__:2024-10-27 12:35:10 | Epoch: 3 | Step: 85980 | Dataset: 0-9600703 | Loss: 2.122 | 675 ms/step , 58263.11 GFLOP/s , 532964.8 tokens/s INFO:__main__:2024-10-27 12:35:17 | Epoch: 3 | Step: 85990 | Dataset: 0-9608703 | Loss: 2.075 | 674 ms/step , 58283.06 GFLOP/s , 532414.9 tokens/s INFO:__main__:2024-10-27 12:35:25 | Validation | Step: 86000 | Val_loss: 2.090 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 12:35:25 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_123525_step_86000.pt` INFO:__main__:2024-10-27 12:35:26 | Epoch: 3 | Step: 86000 | Dataset: 0-9616703 | Loss: 2.154 | 675 ms/step , 58267.18 GFLOP/s , 478837.5 tokens/s INFO:__main__:2024-10-27 12:35:34 | Epoch: 3 | Step: 86010 | Dataset: 0-9624703 | Loss: 2.033 | 675 ms/step , 58242.48 GFLOP/s , 531742.7 tokens/s INFO:__main__:2024-10-27 12:35:41 | Epoch: 3 | Step: 86020 | Dataset: 0-9632703 | Loss: 2.157 | 676 ms/step , 58151.75 GFLOP/s , 531795.0 tokens/s INFO:__main__:2024-10-27 12:35:49 | Epoch: 3 | Step: 86030 | Dataset: 0-9640703 | Loss: 2.095 | 676 ms/step , 58157.71 GFLOP/s , 532252.7 tokens/s INFO:__main__:2024-10-27 12:35:57 | Epoch: 3 | Step: 86040 | Dataset: 0-9648703 | Loss: 2.088 | 675 ms/step , 
58247.62 GFLOP/s , 532361.3 tokens/s INFO:__main__:2024-10-27 12:36:04 | Epoch: 3 | Step: 86050 | Dataset: 0-9656703 | Loss: 2.099 | 675 ms/step , 58243.76 GFLOP/s , 532723.0 tokens/s INFO:__main__:2024-10-27 12:36:12 | Epoch: 3 | Step: 86060 | Dataset: 0-9664703 | Loss: 2.088 | 675 ms/step , 58223.97 GFLOP/s , 532588.1 tokens/s INFO:__main__:2024-10-27 12:36:20 | Epoch: 3 | Step: 86070 | Dataset: 0-9672703 | Loss: 2.092 | 676 ms/step , 58126.38 GFLOP/s , 532024.2 tokens/s INFO:__main__:2024-10-27 12:36:28 | Epoch: 3 | Step: 86080 | Dataset: 0-9680703 | Loss: 2.052 | 676 ms/step , 58174.71 GFLOP/s , 531967.7 tokens/s INFO:__main__:2024-10-27 12:36:35 | Epoch: 3 | Step: 86090 | Dataset: 0-9688703 | Loss: 2.090 | 676 ms/step , 58153.85 GFLOP/s , 532138.9 tokens/s INFO:__main__:2024-10-27 12:36:43 | Epoch: 3 | Step: 86100 | Dataset: 0-9696703 | Loss: 1.992 | 676 ms/step , 58128.69 GFLOP/s , 532211.5 tokens/s INFO:__main__:2024-10-27 12:36:51 | Epoch: 3 | Step: 86110 | Dataset: 0-9704703 | Loss: 2.018 | 676 ms/step , 58162.28 GFLOP/s , 532489.4 tokens/s INFO:__main__:2024-10-27 12:36:58 | Epoch: 3 | Step: 86120 | Dataset: 0-9712703 | Loss: 2.153 | 674 ms/step , 58306.81 GFLOP/s , 532822.9 tokens/s INFO:__main__:2024-10-27 12:37:06 | Epoch: 3 | Step: 86130 | Dataset: 0-9720703 | Loss: 2.031 | 676 ms/step , 58172.43 GFLOP/s , 532234.8 tokens/s INFO:__main__:2024-10-27 12:37:14 | Epoch: 3 | Step: 86140 | Dataset: 0-9728703 | Loss: 2.174 | 675 ms/step , 58247.58 GFLOP/s , 532338.4 tokens/s INFO:__main__:2024-10-27 12:37:21 | Epoch: 3 | Step: 86150 | Dataset: 0-9736703 | Loss: 2.030 | 676 ms/step , 58126.04 GFLOP/s , 531925.1 tokens/s INFO:__main__:2024-10-27 12:37:29 | Epoch: 3 | Step: 86160 | Dataset: 0-9744703 | Loss: 1.827 | 675 ms/step , 58212.19 GFLOP/s , 532079.8 tokens/s INFO:__main__:2024-10-27 12:37:37 | Epoch: 3 | Step: 86170 | Dataset: 0-9752703 | Loss: 1.746 | 677 ms/step , 58101.88 GFLOP/s , 531435.2 tokens/s INFO:__main__:2024-10-27 12:37:44 | Epoch: 3 | 
Step: 86180 | Dataset: 0-9760703 | Loss: 1.712 | 675 ms/step , 58232.89 GFLOP/s , 531794.5 tokens/s INFO:__main__:2024-10-27 12:37:52 | Epoch: 3 | Step: 86190 | Dataset: 0-9768703 | Loss: 1.660 | 676 ms/step , 58152.09 GFLOP/s , 531635.0 tokens/s INFO:__main__:2024-10-27 12:38:00 | Epoch: 3 | Step: 86200 | Dataset: 0-9776703 | Loss: 1.686 | 676 ms/step , 58166.51 GFLOP/s , 531394.1 tokens/s INFO:__main__:2024-10-27 12:38:08 | Epoch: 3 | Step: 86210 | Dataset: 0-9784703 | Loss: 1.670 | 676 ms/step , 58171.08 GFLOP/s , 531442.4 tokens/s INFO:__main__:2024-10-27 12:38:15 | Epoch: 3 | Step: 86220 | Dataset: 0-9792703 | Loss: 1.675 | 675 ms/step , 58220.63 GFLOP/s , 531621.8 tokens/s INFO:__main__:2024-10-27 12:38:23 | Epoch: 3 | Step: 86230 | Dataset: 0-9800703 | Loss: 1.652 | 678 ms/step , 57975.44 GFLOP/s , 530410.7 tokens/s INFO:__main__:2024-10-27 12:38:31 | Epoch: 3 | Step: 86240 | Dataset: 0-9808703 | Loss: 1.689 | 675 ms/step , 58271.14 GFLOP/s , 530190.3 tokens/s INFO:__main__:2024-10-27 12:38:38 | Epoch: 3 | Step: 86250 | Dataset: 0-9816703 | Loss: 2.197 | 675 ms/step , 58232.44 GFLOP/s , 533168.6 tokens/s INFO:__main__:2024-10-27 12:38:46 | Epoch: 3 | Step: 86260 | Dataset: 0-9824703 | Loss: 2.101 | 676 ms/step , 58167.52 GFLOP/s , 532488.4 tokens/s INFO:__main__:2024-10-27 12:38:54 | Epoch: 3 | Step: 86270 | Dataset: 0-9832703 | Loss: 2.133 | 675 ms/step , 58269.07 GFLOP/s , 532074.4 tokens/s INFO:__main__:2024-10-27 12:39:02 | Epoch: 3 | Step: 86280 | Dataset: 0-9840703 | Loss: 2.116 | 675 ms/step , 58245.92 GFLOP/s , 532683.9 tokens/s INFO:__main__:2024-10-27 12:39:09 | Epoch: 3 | Step: 86290 | Dataset: 0-9848703 | Loss: 2.129 | 674 ms/step , 58319.90 GFLOP/s , 532459.0 tokens/s INFO:__main__:2024-10-27 12:39:17 | Epoch: 3 | Step: 86300 | Dataset: 0-9856703 | Loss: 2.079 | 676 ms/step , 58171.60 GFLOP/s , 532320.5 tokens/s INFO:__main__:2024-10-27 12:39:25 | Epoch: 3 | Step: 86310 | Dataset: 0-9864703 | Loss: 2.085 | 676 ms/step , 58137.85 GFLOP/s , 
531768.8 tokens/s INFO:__main__:2024-10-27 12:39:32 | Epoch: 3 | Step: 86320 | Dataset: 0-9872703 | Loss: 2.118 | 675 ms/step , 58251.93 GFLOP/s , 532414.7 tokens/s INFO:__main__:2024-10-27 12:39:40 | Epoch: 3 | Step: 86330 | Dataset: 0-9880703 | Loss: 2.076 | 675 ms/step , 58199.41 GFLOP/s , 531874.1 tokens/s INFO:__main__:2024-10-27 12:39:48 | Epoch: 3 | Step: 86340 | Dataset: 0-9888703 | Loss: 2.101 | 675 ms/step , 58205.16 GFLOP/s , 531968.6 tokens/s INFO:__main__:2024-10-27 12:39:55 | Epoch: 3 | Step: 86350 | Dataset: 0-9896703 | Loss: 2.054 | 675 ms/step , 58235.15 GFLOP/s , 532232.4 tokens/s INFO:__main__:2024-10-27 12:40:03 | Epoch: 3 | Step: 86360 | Dataset: 0-9904703 | Loss: 2.160 | 674 ms/step , 58292.87 GFLOP/s , 532582.8 tokens/s INFO:__main__:2024-10-27 12:40:11 | Epoch: 3 | Step: 86370 | Dataset: 0-9912703 | Loss: 2.066 | 676 ms/step , 58159.51 GFLOP/s , 532909.8 tokens/s INFO:__main__:2024-10-27 12:40:18 | Epoch: 3 | Step: 86380 | Dataset: 0-9920703 | Loss: 2.025 | 675 ms/step , 58219.10 GFLOP/s , 532429.2 tokens/s INFO:__main__:2024-10-27 12:40:26 | Epoch: 3 | Step: 86390 | Dataset: 0-9928703 | Loss: 2.044 | 674 ms/step , 58293.75 GFLOP/s , 532130.8 tokens/s INFO:__main__:2024-10-27 12:40:34 | Epoch: 3 | Step: 86400 | Dataset: 0-9936703 | Loss: 2.081 | 675 ms/step , 58231.14 GFLOP/s , 532513.5 tokens/s INFO:__main__:2024-10-27 12:40:42 | Epoch: 3 | Step: 86410 | Dataset: 0-9944703 | Loss: 2.232 | 678 ms/step , 57968.35 GFLOP/s , 531391.4 tokens/s INFO:__main__:2024-10-27 12:40:49 | Epoch: 3 | Step: 86420 | Dataset: 0-9952703 | Loss: 2.203 | 679 ms/step , 57892.73 GFLOP/s , 530394.6 tokens/s INFO:__main__:2024-10-27 12:40:57 | Epoch: 3 | Step: 86430 | Dataset: 0-9960703 | Loss: 2.252 | 676 ms/step , 58176.12 GFLOP/s , 532404.8 tokens/s INFO:__main__:2024-10-27 12:41:05 | Epoch: 3 | Step: 86440 | Dataset: 0-9968703 | Loss: 2.164 | 675 ms/step , 58220.36 GFLOP/s , 532630.4 tokens/s INFO:__main__:2024-10-27 12:41:12 | Epoch: 3 | Step: 86450 | Dataset: 
0-9976703 | Loss: 2.152 | 674 ms/step , 58300.86 GFLOP/s , 532419.3 tokens/s INFO:__main__:2024-10-27 12:41:20 | Epoch: 3 | Step: 86460 | Dataset: 0-9984703 | Loss: 2.216 | 677 ms/step , 58062.14 GFLOP/s , 532800.3 tokens/s INFO:__main__:2024-10-27 12:41:28 | Epoch: 3 | Step: 86470 | Dataset: 0-9992703 | Loss: 2.139 | 676 ms/step , 58182.57 GFLOP/s , 532143.8 tokens/s INFO:__main__:2024-10-27 12:41:35 | Epoch: 3 | Step: 86480 | Dataset: 0-10000703 | Loss: 2.127 | 676 ms/step , 58153.60 GFLOP/s , 532537.6 tokens/s INFO:__main__:2024-10-27 12:41:43 | Epoch: 3 | Step: 86490 | Dataset: 0-10008703 | Loss: 2.271 | 678 ms/step , 57947.82 GFLOP/s , 530627.9 tokens/s INFO:__main__:2024-10-27 12:41:51 | Epoch: 3 | Step: 86500 | Dataset: 0-10016703 | Loss: 2.088 | 678 ms/step , 57967.26 GFLOP/s , 530256.0 tokens/s INFO:__main__:2024-10-27 12:41:59 | Epoch: 3 | Step: 86510 | Dataset: 0-10024703 | Loss: 2.247 | 674 ms/step , 58289.41 GFLOP/s , 530965.7 tokens/s INFO:__main__:2024-10-27 12:42:06 | Epoch: 3 | Step: 86520 | Dataset: 0-10032703 | Loss: 2.129 | 676 ms/step , 58188.56 GFLOP/s , 532536.9 tokens/s INFO:__main__:2024-10-27 12:42:14 | Epoch: 3 | Step: 86530 | Dataset: 0-10040703 | Loss: 2.148 | 676 ms/step , 58188.67 GFLOP/s , 532363.5 tokens/s INFO:__main__:2024-10-27 12:42:22 | Epoch: 3 | Step: 86540 | Dataset: 0-10048703 | Loss: 2.134 | 675 ms/step , 58193.37 GFLOP/s , 532215.2 tokens/s INFO:__main__:2024-10-27 12:42:29 | Epoch: 3 | Step: 86550 | Dataset: 0-10056703 | Loss: 2.243 | 676 ms/step , 58177.32 GFLOP/s , 532294.7 tokens/s INFO:__main__:2024-10-27 12:42:37 | Epoch: 3 | Step: 86560 | Dataset: 0-10064703 | Loss: 2.155 | 677 ms/step , 58087.39 GFLOP/s , 531679.4 tokens/s INFO:__main__:2024-10-27 12:42:45 | Epoch: 3 | Step: 86570 | Dataset: 0-10072703 | Loss: 1.875 | 677 ms/step , 58077.38 GFLOP/s , 531490.0 tokens/s INFO:__main__:2024-10-27 12:42:52 | Epoch: 3 | Step: 86580 | Dataset: 0-10080703 | Loss: 1.827 | 676 ms/step , 58178.97 GFLOP/s , 531834.1 tokens/s 
INFO:__main__:2024-10-27 12:43:00 | Epoch: 3 | Step: 86590 | Dataset: 0-10088703 | Loss: 1.787 | 676 ms/step , 58112.30 GFLOP/s , 531931.4 tokens/s INFO:__main__:2024-10-27 12:43:08 | Epoch: 3 | Step: 86600 | Dataset: 0-10096703 | Loss: 1.772 | 675 ms/step , 58260.14 GFLOP/s , 532103.1 tokens/s INFO:__main__:2024-10-27 12:43:16 | Epoch: 3 | Step: 86610 | Dataset: 0-10104703 | Loss: 1.757 | 675 ms/step , 58262.12 GFLOP/s , 532014.7 tokens/s INFO:__main__:2024-10-27 12:43:23 | Epoch: 3 | Step: 86620 | Dataset: 0-10112703 | Loss: 1.770 | 675 ms/step , 58240.28 GFLOP/s , 532214.3 tokens/s INFO:__main__:2024-10-27 12:43:31 | Epoch: 3 | Step: 86630 | Dataset: 0-10120703 | Loss: 1.765 | 676 ms/step , 58145.30 GFLOP/s , 531762.0 tokens/s INFO:__main__:2024-10-27 12:43:39 | Epoch: 3 | Step: 86640 | Dataset: 0-10128703 | Loss: 1.781 | 675 ms/step , 58250.60 GFLOP/s , 531746.8 tokens/s INFO:__main__:2024-10-27 12:43:46 | Epoch: 3 | Step: 86650 | Dataset: 0-10136703 | Loss: 1.752 | 675 ms/step , 58251.89 GFLOP/s , 531837.1 tokens/s INFO:__main__:2024-10-27 12:43:54 | Epoch: 3 | Step: 86660 | Dataset: 0-10144703 | Loss: 2.180 | 677 ms/step , 58051.20 GFLOP/s , 532465.3 tokens/s INFO:__main__:2024-10-27 12:44:02 | Epoch: 3 | Step: 86670 | Dataset: 0-10152703 | Loss: 2.159 | 675 ms/step , 58219.59 GFLOP/s , 532190.6 tokens/s INFO:__main__:2024-10-27 12:44:09 | Epoch: 3 | Step: 86680 | Dataset: 0-10160703 | Loss: 2.116 | 676 ms/step , 58177.53 GFLOP/s , 532117.4 tokens/s INFO:__main__:2024-10-27 12:44:17 | Epoch: 3 | Step: 86690 | Dataset: 0-10168703 | Loss: 2.179 | 676 ms/step , 58165.38 GFLOP/s , 532281.4 tokens/s INFO:__main__:2024-10-27 12:44:25 | Epoch: 3 | Step: 86700 | Dataset: 0-10176703 | Loss: 2.079 | 676 ms/step , 58184.53 GFLOP/s , 532224.1 tokens/s INFO:__main__:2024-10-27 12:44:33 | Epoch: 3 | Step: 86710 | Dataset: 0-10184703 | Loss: 2.134 | 675 ms/step , 58220.44 GFLOP/s , 532468.2 tokens/s INFO:__main__:2024-10-27 12:44:40 | Epoch: 3 | Step: 86720 | Dataset: 
0-10192703 | Loss: 2.095 | 675 ms/step , 58226.04 GFLOP/s , 532686.8 tokens/s INFO:__main__:2024-10-27 12:44:48 | Epoch: 3 | Step: 86730 | Dataset: 0-10200703 | Loss: 2.127 | 674 ms/step , 58295.03 GFLOP/s , 532868.1 tokens/s INFO:__main__:2024-10-27 12:44:56 | Epoch: 3 | Step: 86740 | Dataset: 0-10208703 | Loss: 2.168 | 676 ms/step , 58160.99 GFLOP/s , 532566.8 tokens/s INFO:__main__:2024-10-27 12:45:03 | Epoch: 3 | Step: 86750 | Dataset: 0-10216703 | Loss: 2.173 | 676 ms/step , 58158.82 GFLOP/s , 530858.9 tokens/s INFO:__main__:2024-10-27 12:45:11 | Epoch: 3 | Step: 86760 | Dataset: 0-10224703 | Loss: 2.165 | 676 ms/step , 58159.25 GFLOP/s , 531078.1 tokens/s INFO:__main__:2024-10-27 12:45:19 | Epoch: 3 | Step: 86770 | Dataset: 0-10232703 | Loss: 2.204 | 675 ms/step , 58197.78 GFLOP/s , 531842.4 tokens/s INFO:__main__:2024-10-27 12:45:26 | Epoch: 3 | Step: 86780 | Dataset: 0-10240703 | Loss: 2.151 | 675 ms/step , 58240.56 GFLOP/s , 531479.2 tokens/s INFO:__main__:2024-10-27 12:45:34 | Epoch: 3 | Step: 86790 | Dataset: 0-10248703 | Loss: 2.117 | 675 ms/step , 58202.74 GFLOP/s , 531327.2 tokens/s INFO:__main__:2024-10-27 12:45:42 | Epoch: 3 | Step: 86800 | Dataset: 0-10256703 | Loss: 1.993 | 676 ms/step , 58123.85 GFLOP/s , 531178.6 tokens/s INFO:__main__:2024-10-27 12:45:50 | Epoch: 3 | Step: 86810 | Dataset: 0-10264703 | Loss: 2.093 | 677 ms/step , 58094.36 GFLOP/s , 531193.4 tokens/s INFO:__main__:2024-10-27 12:45:57 | Epoch: 3 | Step: 86820 | Dataset: 0-10272703 | Loss: 2.230 | 675 ms/step , 58198.88 GFLOP/s , 529796.3 tokens/s INFO:__main__:2024-10-27 12:46:05 | Epoch: 3 | Step: 86830 | Dataset: 0-10280703 | Loss: 2.229 | 676 ms/step , 58152.71 GFLOP/s , 529970.2 tokens/s INFO:__main__:2024-10-27 12:46:13 | Epoch: 3 | Step: 86840 | Dataset: 0-10288703 | Loss: 2.135 | 675 ms/step , 58201.90 GFLOP/s , 532642.6 tokens/s INFO:__main__:2024-10-27 12:46:20 | Epoch: 3 | Step: 86850 | Dataset: 0-10296703 | Loss: 2.163 | 677 ms/step , 58086.03 GFLOP/s , 532103.5 
tokens/s INFO:__main__:2024-10-27 12:46:28 | Epoch: 3 | Step: 86860 | Dataset: 0-10304703 | Loss: 2.231 | 675 ms/step , 58199.12 GFLOP/s , 531597.4 tokens/s INFO:__main__:2024-10-27 12:46:36 | Epoch: 3 | Step: 86870 | Dataset: 0-10312703 | Loss: 2.182 | 676 ms/step , 58190.22 GFLOP/s , 531986.9 tokens/s INFO:__main__:2024-10-27 12:46:44 | Epoch: 3 | Step: 86880 | Dataset: 0-10320703 | Loss: 2.144 | 675 ms/step , 58233.08 GFLOP/s , 532192.4 tokens/s INFO:__main__:2024-10-27 12:46:51 | Epoch: 3 | Step: 86890 | Dataset: 0-10328703 | Loss: 2.125 | 674 ms/step , 58316.39 GFLOP/s , 532558.0 tokens/s INFO:__main__:2024-10-27 12:46:59 | Epoch: 3 | Step: 86900 | Dataset: 0-10336703 | Loss: 2.231 | 674 ms/step , 58281.43 GFLOP/s , 532734.5 tokens/s INFO:__main__:2024-10-27 12:47:07 | Epoch: 3 | Step: 86910 | Dataset: 0-10344703 | Loss: 2.147 | 675 ms/step , 58233.44 GFLOP/s , 532197.5 tokens/s INFO:__main__:2024-10-27 12:47:14 | Epoch: 3 | Step: 86920 | Dataset: 0-10352703 | Loss: 2.100 | 675 ms/step , 58227.52 GFLOP/s , 532353.1 tokens/s INFO:__main__:2024-10-27 12:47:22 | Epoch: 3 | Step: 86930 | Dataset: 0-10360703 | Loss: 2.123 | 677 ms/step , 58099.53 GFLOP/s , 532199.7 tokens/s INFO:__main__:2024-10-27 12:47:30 | Epoch: 3 | Step: 86940 | Dataset: 0-10368703 | Loss: 2.103 | 676 ms/step , 58175.11 GFLOP/s , 532326.9 tokens/s INFO:__main__:2024-10-27 12:47:37 | Epoch: 3 | Step: 86950 | Dataset: 0-10376703 | Loss: 2.145 | 675 ms/step , 58270.25 GFLOP/s , 532060.5 tokens/s INFO:__main__:2024-10-27 12:47:45 | Epoch: 3 | Step: 86960 | Dataset: 0-10384703 | Loss: 2.090 | 674 ms/step , 58343.00 GFLOP/s , 532446.0 tokens/s INFO:__main__:2024-10-27 12:47:53 | Epoch: 3 | Step: 86970 | Dataset: 0-10392703 | Loss: 2.090 | 674 ms/step , 58312.16 GFLOP/s , 532186.2 tokens/s INFO:__main__:2024-10-27 12:48:00 | Epoch: 3 | Step: 86980 | Dataset: 0-10400703 | Loss: 2.217 | 676 ms/step , 58140.02 GFLOP/s , 532071.4 tokens/s INFO:__main__:2024-10-27 12:48:08 | Epoch: 3 | Step: 86990 | 
Dataset: 0-10408703 | Loss: 2.228 | 676 ms/step , 58158.03 GFLOP/s , 532325.2 tokens/s INFO:__main__:2024-10-27 12:48:15 | Validation | Step: 87000 | Val_loss: 2.185 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 12:48:15 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_124815_step_87000.pt` INFO:__main__:2024-10-27 12:48:17 | Epoch: 3 | Step: 87000 | Dataset: 0-10416703 | Loss: 2.218 | 674 ms/step , 58325.13 GFLOP/s , 479127.2 tokens/s INFO:__main__:2024-10-27 12:48:24 | Epoch: 3 | Step: 87010 | Dataset: 0-10424703 | Loss: 2.166 | 675 ms/step , 58198.54 GFLOP/s , 532043.4 tokens/s INFO:__main__:2024-10-27 12:48:32 | Epoch: 3 | Step: 87020 | Dataset: 0-10432703 | Loss: 2.162 | 675 ms/step , 58198.72 GFLOP/s , 531898.9 tokens/s INFO:__main__:2024-10-27 12:48:40 | Epoch: 3 | Step: 87030 | Dataset: 0-10440703 | Loss: 2.173 | 677 ms/step , 58088.69 GFLOP/s , 532095.4 tokens/s INFO:__main__:2024-10-27 12:48:48 | Epoch: 3 | Step: 87040 | Dataset: 0-10448703 | Loss: 2.186 | 676 ms/step , 58149.39 GFLOP/s , 532236.9 tokens/s INFO:__main__:2024-10-27 12:48:55 | Epoch: 3 | Step: 87050 | Dataset: 0-10456703 | Loss: 2.191 | 676 ms/step , 58166.04 GFLOP/s , 532201.2 tokens/s INFO:__main__:2024-10-27 12:49:03 | Epoch: 3 | Step: 87060 | Dataset: 0-10464703 | Loss: 2.206 | 676 ms/step , 58185.43 GFLOP/s , 532411.5 tokens/s INFO:__main__:2024-10-27 12:49:11 | Epoch: 3 | Step: 87070 | Dataset: 0-10472703 | Loss: 2.208 | 678 ms/step , 57997.43 GFLOP/s , 531854.4 tokens/s INFO:__main__:2024-10-27 12:49:18 | Epoch: 3 | Step: 87080 | Dataset: 0-10480703 | Loss: 2.241 | 675 ms/step , 58216.74 GFLOP/s , 532415.1 tokens/s INFO:__main__:2024-10-27 12:49:26 | Epoch: 3 | Step: 87090 | Dataset: 0-10488703 | Loss: 2.182 | 676 ms/step , 58166.61 GFLOP/s , 532015.1 tokens/s INFO:__main__:2024-10-27 12:49:34 | Epoch: 3 | Step: 87100 | Dataset: 0-10496703 | Loss: 2.130 | 676 ms/step , 58184.98 GFLOP/s , 531953.5 tokens/s INFO:__main__:2024-10-27 12:49:41 | Epoch: 3 | Step: 
87110 | Dataset: 0-10504703 | Loss: 2.211 | 675 ms/step , 58260.84 GFLOP/s , 531912.8 tokens/s INFO:__main__:2024-10-27 12:49:49 | Epoch: 3 | Step: 87120 | Dataset: 0-10512703 | Loss: 2.150 | 678 ms/step , 58020.68 GFLOP/s , 532386.2 tokens/s INFO:__main__:2024-10-27 12:49:57 | Epoch: 3 | Step: 87130 | Dataset: 0-10520703 | Loss: 2.157 | 674 ms/step , 58364.59 GFLOP/s , 532715.9 tokens/s INFO:__main__:2024-10-27 12:50:04 | Epoch: 3 | Step: 87140 | Dataset: 0-10528703 | Loss: 1.863 | 675 ms/step , 58248.87 GFLOP/s , 532047.9 tokens/s INFO:__main__:2024-10-27 12:50:12 | Epoch: 3 | Step: 87150 | Dataset: 0-10536703 | Loss: 1.762 | 676 ms/step , 58165.02 GFLOP/s , 531649.6 tokens/s INFO:__main__:2024-10-27 12:50:20 | Epoch: 3 | Step: 87160 | Dataset: 0-10544703 | Loss: 1.713 | 676 ms/step , 58137.56 GFLOP/s , 531502.3 tokens/s INFO:__main__:2024-10-27 12:50:28 | Epoch: 3 | Step: 87170 | Dataset: 0-10552703 | Loss: 1.694 | 676 ms/step , 58121.52 GFLOP/s , 530852.1 tokens/s INFO:__main__:2024-10-27 12:50:35 | Epoch: 3 | Step: 87180 | Dataset: 0-10560703 | Loss: 1.686 | 675 ms/step , 58202.42 GFLOP/s , 531319.7 tokens/s INFO:__main__:2024-10-27 12:50:43 | Epoch: 3 | Step: 87190 | Dataset: 0-10568703 | Loss: 1.657 | 675 ms/step , 58194.70 GFLOP/s , 531556.9 tokens/s INFO:__main__:2024-10-27 12:50:51 | Epoch: 3 | Step: 87200 | Dataset: 0-10576703 | Loss: 1.661 | 676 ms/step , 58148.46 GFLOP/s , 531494.9 tokens/s INFO:__main__:2024-10-27 12:50:58 | Epoch: 3 | Step: 87210 | Dataset: 0-10584703 | Loss: 1.647 | 676 ms/step , 58181.64 GFLOP/s , 531836.1 tokens/s INFO:__main__:2024-10-27 12:51:06 | Epoch: 3 | Step: 87220 | Dataset: 0-10592703 | Loss: 1.709 | 675 ms/step , 58248.41 GFLOP/s , 531451.3 tokens/s INFO:__main__:2024-10-27 12:51:14 | Epoch: 3 | Step: 87230 | Dataset: 0-10600703 | Loss: 1.627 | 676 ms/step , 58183.32 GFLOP/s , 531736.5 tokens/s INFO:__main__:2024-10-27 12:51:22 | Epoch: 3 | Step: 87240 | Dataset: 0-10608703 | Loss: 1.618 | 675 ms/step , 58212.95 GFLOP/s 
, 531609.6 tokens/s INFO:__main__:2024-10-27 12:51:29 | Epoch: 3 | Step: 87250 | Dataset: 0-10616703 | Loss: 1.640 | 676 ms/step , 58179.75 GFLOP/s , 531444.0 tokens/s INFO:__main__:2024-10-27 12:51:37 | Epoch: 3 | Step: 87260 | Dataset: 0-10624703 | Loss: 1.638 | 676 ms/step , 58142.01 GFLOP/s , 531339.0 tokens/s INFO:__main__:2024-10-27 12:51:45 | Epoch: 3 | Step: 87270 | Dataset: 0-10632703 | Loss: 1.628 | 676 ms/step , 58165.29 GFLOP/s , 531445.2 tokens/s INFO:__main__:2024-10-27 12:51:52 | Epoch: 3 | Step: 87280 | Dataset: 0-10640703 | Loss: 1.649 | 675 ms/step , 58215.28 GFLOP/s , 531991.3 tokens/s INFO:__main__:2024-10-27 12:52:00 | Epoch: 3 | Step: 87290 | Dataset: 0-10648703 | Loss: 1.649 | 676 ms/step , 58116.48 GFLOP/s , 531611.2 tokens/s INFO:__main__:2024-10-27 12:52:08 | Epoch: 3 | Step: 87300 | Dataset: 0-10656703 | Loss: 1.639 | 675 ms/step , 58234.54 GFLOP/s , 531999.3 tokens/s INFO:__main__:2024-10-27 12:52:15 | Epoch: 3 | Step: 87310 | Dataset: 0-10664703 | Loss: 2.331 | 675 ms/step , 58267.53 GFLOP/s , 532268.9 tokens/s INFO:__main__:2024-10-27 12:52:23 | Epoch: 3 | Step: 87320 | Dataset: 0-10672703 | Loss: 2.189 | 674 ms/step , 58281.61 GFLOP/s , 532930.3 tokens/s INFO:__main__:2024-10-27 12:52:31 | Epoch: 3 | Step: 87330 | Dataset: 0-10680703 | Loss: 2.120 | 676 ms/step , 58192.31 GFLOP/s , 531656.3 tokens/s INFO:__main__:2024-10-27 12:52:39 | Epoch: 3 | Step: 87340 | Dataset: 0-10688703 | Loss: 2.203 | 674 ms/step , 58306.85 GFLOP/s , 532624.7 tokens/s INFO:__main__:2024-10-27 12:52:46 | Epoch: 3 | Step: 87350 | Dataset: 0-10696703 | Loss: 2.166 | 675 ms/step , 58206.91 GFLOP/s , 532155.3 tokens/s INFO:__main__:2024-10-27 12:52:54 | Epoch: 3 | Step: 87360 | Dataset: 0-10704703 | Loss: 2.114 | 675 ms/step , 58236.74 GFLOP/s , 532018.8 tokens/s INFO:__main__:2024-10-27 12:53:02 | Epoch: 3 | Step: 87370 | Dataset: 0-10712703 | Loss: 2.103 | 674 ms/step , 58308.16 GFLOP/s , 532423.9 tokens/s INFO:__main__:2024-10-27 12:53:09 | Epoch: 3 | Step: 
87380 | Dataset: 0-10720703 | Loss: 2.116 | 675 ms/step , 58209.43 GFLOP/s , 532496.4 tokens/s INFO:__main__:2024-10-27 12:53:17 | Epoch: 3 | Step: 87390 | Dataset: 0-10728703 | Loss: 2.102 | 674 ms/step , 58287.87 GFLOP/s , 532145.0 tokens/s INFO:__main__:2024-10-27 12:53:25 | Epoch: 3 | Step: 87400 | Dataset: 0-10736703 | Loss: 2.129 | 675 ms/step , 58205.17 GFLOP/s , 532207.3 tokens/s INFO:__main__:2024-10-27 12:53:32 | Epoch: 3 | Step: 87410 | Dataset: 0-10744703 | Loss: 2.071 | 676 ms/step , 58141.02 GFLOP/s , 532298.8 tokens/s INFO:__main__:2024-10-27 12:53:40 | Epoch: 3 | Step: 87420 | Dataset: 0-10752703 | Loss: 2.075 | 675 ms/step , 58254.88 GFLOP/s , 532715.7 tokens/s INFO:__main__:2024-10-27 12:53:48 | Epoch: 3 | Step: 87430 | Dataset: 0-10760703 | Loss: 2.038 | 676 ms/step , 58167.38 GFLOP/s , 532193.4 tokens/s INFO:__main__:2024-10-27 12:53:56 | Epoch: 3 | Step: 87440 | Dataset: 0-10768703 | Loss: 2.182 | 676 ms/step , 58161.22 GFLOP/s , 532153.0 tokens/s INFO:__main__:2024-10-27 12:54:03 | Epoch: 3 | Step: 87450 | Dataset: 0-10776703 | Loss: 2.049 | 676 ms/step , 58164.49 GFLOP/s , 532123.2 tokens/s INFO:__main__:2024-10-27 12:54:11 | Epoch: 3 | Step: 87460 | Dataset: 0-10784703 | Loss: 2.071 | 676 ms/step , 58134.33 GFLOP/s , 532156.8 tokens/s INFO:__main__:2024-10-27 12:54:19 | Epoch: 3 | Step: 87470 | Dataset: 0-10792703 | Loss: 2.150 | 674 ms/step , 58301.96 GFLOP/s , 532476.0 tokens/s INFO:__main__:2024-10-27 12:54:26 | Epoch: 3 | Step: 87480 | Dataset: 0-10800703 | Loss: 1.642 | 675 ms/step , 58209.59 GFLOP/s , 531734.5 tokens/s INFO:__main__:2024-10-27 12:54:34 | Epoch: 3 | Step: 87490 | Dataset: 0-10808703 | Loss: 1.633 | 675 ms/step , 58238.76 GFLOP/s , 531618.0 tokens/s INFO:__main__:2024-10-27 12:54:42 | Epoch: 3 | Step: 87500 | Dataset: 0-10816703 | Loss: 1.656 | 676 ms/step , 58134.78 GFLOP/s , 531822.9 tokens/s INFO:__main__:2024-10-27 12:54:49 | Epoch: 3 | Step: 87510 | Dataset: 0-10824703 | Loss: 1.648 | 676 ms/step , 58156.43 GFLOP/s 
, 532030.3 tokens/s INFO:__main__:2024-10-27 12:54:57 | Epoch: 3 | Step: 87520 | Dataset: 0-10832703 | Loss: 1.614 | 676 ms/step , 58162.39 GFLOP/s , 531799.0 tokens/s INFO:__main__:2024-10-27 12:55:05 | Epoch: 3 | Step: 87530 | Dataset: 0-10840703 | Loss: 1.642 | 676 ms/step , 58142.87 GFLOP/s , 531651.6 tokens/s INFO:__main__:2024-10-27 12:55:13 | Epoch: 3 | Step: 87540 | Dataset: 0-10848703 | Loss: 1.640 | 675 ms/step , 58274.90 GFLOP/s , 531724.2 tokens/s INFO:__main__:2024-10-27 12:55:20 | Epoch: 3 | Step: 87550 | Dataset: 0-10856703 | Loss: 1.637 | 678 ms/step , 57978.59 GFLOP/s , 531681.0 tokens/s INFO:__main__:2024-10-27 12:55:28 | Epoch: 3 | Step: 87560 | Dataset: 0-10864703 | Loss: 1.661 | 675 ms/step , 58192.95 GFLOP/s , 531883.6 tokens/s INFO:__main__:2024-10-27 12:55:36 | Epoch: 3 | Step: 87570 | Dataset: 0-10872703 | Loss: 1.669 | 674 ms/step , 58301.41 GFLOP/s , 531957.2 tokens/s INFO:__main__:2024-10-27 12:55:43 | Epoch: 3 | Step: 87580 | Dataset: 0-10880703 | Loss: 1.643 | 676 ms/step , 58106.93 GFLOP/s , 531452.6 tokens/s INFO:__main__:2024-10-27 12:55:51 | Epoch: 3 | Step: 87590 | Dataset: 0-10888703 | Loss: 1.638 | 676 ms/step , 58137.68 GFLOP/s , 531783.8 tokens/s INFO:__main__:2024-10-27 12:55:59 | Epoch: 3 | Step: 87600 | Dataset: 0-10896703 | Loss: 1.620 | 675 ms/step , 58239.93 GFLOP/s , 529937.7 tokens/s INFO:__main__:2024-10-27 12:56:06 | Epoch: 3 | Step: 87610 | Dataset: 0-10904703 | Loss: 1.645 | 674 ms/step , 58296.62 GFLOP/s , 531976.7 tokens/s INFO:__main__:2024-10-27 12:56:14 | Epoch: 3 | Step: 87620 | Dataset: 0-10912703 | Loss: 1.613 | 674 ms/step , 58307.05 GFLOP/s , 532382.6 tokens/s INFO:__main__:2024-10-27 12:56:22 | Epoch: 3 | Step: 87630 | Dataset: 0-10920703 | Loss: 1.631 | 675 ms/step , 58276.03 GFLOP/s , 532161.7 tokens/s INFO:__main__:2024-10-27 12:56:30 | Epoch: 3 | Step: 87640 | Dataset: 0-10928703 | Loss: 1.612 | 675 ms/step , 58202.92 GFLOP/s , 532214.8 tokens/s INFO:__main__:2024-10-27 12:56:37 | Epoch: 3 | Step: 
87650 | Dataset: 0-10936703 | Loss: 2.308 | 676 ms/step , 58173.92 GFLOP/s , 532191.7 tokens/s INFO:__main__:2024-10-27 12:56:45 | Epoch: 3 | Step: 87660 | Dataset: 0-10944703 | Loss: 2.271 | 674 ms/step , 58325.28 GFLOP/s , 532459.9 tokens/s INFO:__main__:2024-10-27 12:56:53 | Epoch: 3 | Step: 87670 | Dataset: 0-10952703 | Loss: 2.217 | 675 ms/step , 58259.61 GFLOP/s , 532866.5 tokens/s INFO:__main__:2024-10-27 12:57:00 | Epoch: 3 | Step: 87680 | Dataset: 0-10960703 | Loss: 2.193 | 676 ms/step , 58178.33 GFLOP/s , 532171.3 tokens/s INFO:__main__:2024-10-27 12:57:08 | Epoch: 3 | Step: 87690 | Dataset: 0-10968703 | Loss: 2.117 | 676 ms/step , 58174.81 GFLOP/s , 531818.2 tokens/s INFO:__main__:2024-10-27 12:57:16 | Epoch: 3 | Step: 87700 | Dataset: 0-10976703 | Loss: 2.171 | 675 ms/step , 58239.33 GFLOP/s , 532752.5 tokens/s INFO:__main__:2024-10-27 12:57:23 | Epoch: 3 | Step: 87710 | Dataset: 0-10984703 | Loss: 2.135 | 676 ms/step , 58126.27 GFLOP/s , 532102.9 tokens/s INFO:__main__:2024-10-27 12:57:31 | Epoch: 3 | Step: 87720 | Dataset: 0-10992703 | Loss: 2.191 | 676 ms/step , 58179.87 GFLOP/s , 532320.8 tokens/s INFO:__main__:2024-10-27 12:57:39 | Epoch: 3 | Step: 87730 | Dataset: 0-11000703 | Loss: 2.139 | 676 ms/step , 58151.29 GFLOP/s , 531827.6 tokens/s INFO:__main__:2024-10-27 12:57:46 | Epoch: 3 | Step: 87740 | Dataset: 0-11008703 | Loss: 2.109 | 674 ms/step , 58308.94 GFLOP/s , 532306.1 tokens/s INFO:__main__:2024-10-27 12:57:54 | Epoch: 3 | Step: 87750 | Dataset: 0-11016703 | Loss: 2.131 | 675 ms/step , 58275.33 GFLOP/s , 533061.4 tokens/s INFO:__main__:2024-10-27 12:58:02 | Epoch: 3 | Step: 87760 | Dataset: 0-11024703 | Loss: 2.131 | 674 ms/step , 58281.11 GFLOP/s , 532417.8 tokens/s INFO:__main__:2024-10-27 12:58:10 | Epoch: 3 | Step: 87770 | Dataset: 0-11032703 | Loss: 2.078 | 675 ms/step , 58229.75 GFLOP/s , 533033.8 tokens/s INFO:__main__:2024-10-27 12:58:17 | Epoch: 3 | Step: 87780 | Dataset: 0-11040703 | Loss: 2.201 | 677 ms/step , 58099.04 GFLOP/s 
, 532249.6 tokens/s INFO:__main__:2024-10-27 12:58:25 | Epoch: 3 | Step: 87790 | Dataset: 0-11048703 | Loss: 2.113 | 675 ms/step , 58241.27 GFLOP/s , 532621.1 tokens/s INFO:__main__:2024-10-27 12:58:33 | Epoch: 3 | Step: 87800 | Dataset: 0-11056703 | Loss: 2.138 | 676 ms/step , 58183.73 GFLOP/s , 532460.3 tokens/s INFO:__main__:2024-10-27 12:58:40 | Epoch: 3 | Step: 87810 | Dataset: 0-11064703 | Loss: 2.128 | 676 ms/step , 58150.92 GFLOP/s , 532109.5 tokens/s INFO:__main__:2024-10-27 12:58:48 | Epoch: 3 | Step: 87820 | Dataset: 0-11072703 | Loss: 2.119 | 677 ms/step , 58080.90 GFLOP/s , 532039.2 tokens/s INFO:__main__:2024-10-27 12:58:56 | Epoch: 3 | Step: 87830 | Dataset: 0-11080703 | Loss: 2.130 | 676 ms/step , 58126.49 GFLOP/s , 532054.9 tokens/s INFO:__main__:2024-10-27 12:59:03 | Epoch: 3 | Step: 87840 | Dataset: 0-11088703 | Loss: 2.167 | 676 ms/step , 58129.33 GFLOP/s , 532275.1 tokens/s INFO:__main__:2024-10-27 12:59:11 | Epoch: 3 | Step: 87850 | Dataset: 0-11096703 | Loss: 2.067 | 676 ms/step , 58166.76 GFLOP/s , 532178.7 tokens/s INFO:__main__:2024-10-27 12:59:19 | Epoch: 3 | Step: 87860 | Dataset: 0-11104703 | Loss: 1.999 | 676 ms/step , 58162.93 GFLOP/s , 532147.8 tokens/s INFO:__main__:2024-10-27 12:59:27 | Epoch: 3 | Step: 87870 | Dataset: 0-11112703 | Loss: 2.027 | 676 ms/step , 58138.45 GFLOP/s , 531945.7 tokens/s INFO:__main__:2024-10-27 12:59:34 | Epoch: 3 | Step: 87880 | Dataset: 0-11120703 | Loss: 2.107 | 676 ms/step , 58124.22 GFLOP/s , 532192.3 tokens/s INFO:__main__:2024-10-27 12:59:42 | Epoch: 3 | Step: 87890 | Dataset: 0-11128703 | Loss: 2.097 | 676 ms/step , 58148.75 GFLOP/s , 531643.0 tokens/s INFO:__main__:2024-10-27 12:59:50 | Epoch: 3 | Step: 87900 | Dataset: 0-11136703 | Loss: 2.077 | 676 ms/step , 58152.59 GFLOP/s , 532047.8 tokens/s INFO:__main__:2024-10-27 12:59:57 | Epoch: 3 | Step: 87910 | Dataset: 0-11144703 | Loss: 1.993 | 676 ms/step , 58157.37 GFLOP/s , 532082.6 tokens/s INFO:__main__:2024-10-27 13:00:04 | Epoch: 3 | Step: 
87920 | Dataset: 0-11152703 | Loss: 2.128 | 679 ms/step , 57875.11 GFLOP/s , 571835.4 tokens/s INFO:__main__:2024-10-27 13:00:12 | Epoch: 3 | Step: 87930 | Dataset: 0-11160703 | Loss: 2.047 | 678 ms/step , 57978.08 GFLOP/s , 530360.7 tokens/s INFO:__main__:2024-10-27 13:00:20 | Epoch: 3 | Step: 87940 | Dataset: 0-11168703 | Loss: 2.101 | 677 ms/step , 58036.84 GFLOP/s , 530409.3 tokens/s INFO:__main__:2024-10-27 13:00:28 | Epoch: 3 | Step: 87950 | Dataset: 0-11176703 | Loss: 2.049 | 679 ms/step , 57857.55 GFLOP/s , 530317.5 tokens/s INFO:__main__:2024-10-27 13:00:35 | Epoch: 3 | Step: 87960 | Dataset: 0-11184703 | Loss: 1.984 | 678 ms/step , 57969.61 GFLOP/s , 529849.9 tokens/s INFO:__main__:2024-10-27 13:00:43 | Epoch: 3 | Step: 87970 | Dataset: 0-11192703 | Loss: 2.242 | 678 ms/step , 57975.78 GFLOP/s , 530102.9 tokens/s INFO:__main__:2024-10-27 13:00:51 | Epoch: 3 | Step: 87980 | Dataset: 0-11200703 | Loss: 2.187 | 679 ms/step , 57934.60 GFLOP/s , 529811.1 tokens/s INFO:__main__:2024-10-27 13:00:59 | Epoch: 3 | Step: 87990 | Dataset: 0-11208703 | Loss: 2.207 | 678 ms/step , 57960.86 GFLOP/s , 530249.6 tokens/s INFO:__main__:2024-10-27 13:01:06 | Validation | Step: 88000 | Val_loss: 2.169 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 13:01:06 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_130106_step_88000.pt` INFO:__main__:2024-10-27 13:01:07 | Epoch: 3 | Step: 88000 | Dataset: 0-11216703 | Loss: 2.176 | 674 ms/step , 58302.72 GFLOP/s , 478152.7 tokens/s INFO:__main__:2024-10-27 13:01:15 | Epoch: 3 | Step: 88010 | Dataset: 0-11224703 | Loss: 2.159 | 687 ms/step , 57220.57 GFLOP/s , 526136.7 tokens/s INFO:__main__:2024-10-27 13:01:23 | Epoch: 3 | Step: 88020 | Dataset: 0-11232703 | Loss: 2.200 | 682 ms/step , 57604.61 GFLOP/s , 524449.2 tokens/s INFO:__main__:2024-10-27 13:01:30 | Epoch: 3 | Step: 88030 | Dataset: 0-11240703 | Loss: 2.185 | 684 ms/step , 57507.33 GFLOP/s , 526991.4 tokens/s INFO:__main__:2024-10-27 13:01:38 | Epoch: 3 
| Step: 88040 | Dataset: 0-11248703 | Loss: 2.123 | 676 ms/step , 58125.80 GFLOP/s , 527473.4 tokens/s INFO:__main__:2024-10-27 13:01:46 | Epoch: 3 | Step: 88050 | Dataset: 0-11256703 | Loss: 2.190 | 675 ms/step , 58223.18 GFLOP/s , 532154.6 tokens/s INFO:__main__:2024-10-27 13:01:54 | Epoch: 3 | Step: 88060 | Dataset: 0-11264703 | Loss: 2.221 | 675 ms/step , 58230.63 GFLOP/s , 532470.1 tokens/s INFO:__main__:2024-10-27 13:02:01 | Epoch: 3 | Step: 88070 | Dataset: 0-11272703 | Loss: 2.089 | 675 ms/step , 58205.52 GFLOP/s , 532464.9 tokens/s INFO:__main__:2024-10-27 13:02:09 | Epoch: 3 | Step: 88080 | Dataset: 0-11280703 | Loss: 2.113 | 677 ms/step , 58099.87 GFLOP/s , 531993.6 tokens/s INFO:__main__:2024-10-27 13:02:17 | Epoch: 3 | Step: 88090 | Dataset: 0-11288703 | Loss: 2.119 | 677 ms/step , 58034.36 GFLOP/s , 531535.9 tokens/s INFO:__main__:2024-10-27 13:02:24 | Epoch: 3 | Step: 88100 | Dataset: 0-11296703 | Loss: 2.151 | 677 ms/step , 58026.72 GFLOP/s , 530751.6 tokens/s INFO:__main__:2024-10-27 13:02:32 | Epoch: 3 | Step: 88110 | Dataset: 0-11304703 | Loss: 2.151 | 678 ms/step , 58015.78 GFLOP/s , 530445.4 tokens/s INFO:__main__:2024-10-27 13:02:40 | Epoch: 3 | Step: 88120 | Dataset: 0-11312703 | Loss: 2.156 | 677 ms/step , 58026.69 GFLOP/s , 530306.1 tokens/s INFO:__main__:2024-10-27 13:02:48 | Epoch: 3 | Step: 88130 | Dataset: 0-11320703 | Loss: 2.173 | 677 ms/step , 58098.75 GFLOP/s , 530843.1 tokens/s INFO:__main__:2024-10-27 13:02:55 | Epoch: 3 | Step: 88140 | Dataset: 0-11328703 | Loss: 2.070 | 676 ms/step , 58135.24 GFLOP/s , 531907.2 tokens/s INFO:__main__:2024-10-27 13:03:03 | Epoch: 3 | Step: 88150 | Dataset: 0-11336703 | Loss: 2.164 | 678 ms/step , 57949.36 GFLOP/s , 529558.3 tokens/s INFO:__main__:2024-10-27 13:03:11 | Epoch: 3 | Step: 88160 | Dataset: 0-11344703 | Loss: 2.113 | 677 ms/step , 58023.30 GFLOP/s , 529542.1 tokens/s INFO:__main__:2024-10-27 13:03:19 | Epoch: 3 | Step: 88170 | Dataset: 0-11352703 | Loss: 2.115 | 676 ms/step , 58136.93 
GFLOP/s , 529969.2 tokens/s INFO:__main__:2024-10-27 13:03:26 | Epoch: 3 | Step: 88180 | Dataset: 0-11360703 | Loss: 2.002 | 677 ms/step , 58096.41 GFLOP/s , 530714.1 tokens/s INFO:__main__:2024-10-27 13:03:34 | Epoch: 3 | Step: 88190 | Dataset: 0-11368703 | Loss: 2.158 | 675 ms/step , 58208.25 GFLOP/s , 531265.6 tokens/s INFO:__main__:2024-10-27 13:03:42 | Epoch: 3 | Step: 88200 | Dataset: 0-11376703 | Loss: 2.190 | 677 ms/step , 58024.72 GFLOP/s , 530405.1 tokens/s INFO:__main__:2024-10-27 13:03:49 | Epoch: 3 | Step: 88210 | Dataset: 0-11384703 | Loss: 2.070 | 677 ms/step , 58102.22 GFLOP/s , 528244.1 tokens/s INFO:__main__:2024-10-27 13:03:57 | Epoch: 3 | Step: 88220 | Dataset: 0-11392703 | Loss: 2.122 | 676 ms/step , 58179.02 GFLOP/s , 531836.8 tokens/s INFO:__main__:2024-10-27 13:04:05 | Epoch: 3 | Step: 88230 | Dataset: 0-11400703 | Loss: 2.057 | 676 ms/step , 58174.83 GFLOP/s , 532059.1 tokens/s INFO:__main__:2024-10-27 13:04:13 | Epoch: 3 | Step: 88240 | Dataset: 0-11408703 | Loss: 2.081 | 676 ms/step , 58190.64 GFLOP/s , 531684.5 tokens/s INFO:__main__:2024-10-27 13:04:20 | Epoch: 3 | Step: 88250 | Dataset: 0-11416703 | Loss: 2.185 | 676 ms/step , 58186.89 GFLOP/s , 531623.9 tokens/s INFO:__main__:2024-10-27 13:04:28 | Epoch: 3 | Step: 88260 | Dataset: 0-11424703 | Loss: 2.062 | 676 ms/step , 58123.97 GFLOP/s , 531898.9 tokens/s INFO:__main__:2024-10-27 13:04:36 | Epoch: 3 | Step: 88270 | Dataset: 0-11432703 | Loss: 2.094 | 676 ms/step , 58167.36 GFLOP/s , 531667.2 tokens/s INFO:__main__:2024-10-27 13:04:43 | Epoch: 3 | Step: 88280 | Dataset: 0-11440703 | Loss: 2.098 | 676 ms/step , 58161.34 GFLOP/s , 531532.4 tokens/s INFO:__main__:2024-10-27 13:04:51 | Epoch: 3 | Step: 88290 | Dataset: 0-11448703 | Loss: 1.896 | 676 ms/step , 58189.01 GFLOP/s , 531926.3 tokens/s INFO:__main__:2024-10-27 13:04:59 | Epoch: 3 | Step: 88300 | Dataset: 0-11456703 | Loss: 1.739 | 676 ms/step , 58189.49 GFLOP/s , 531914.6 tokens/s INFO:__main__:2024-10-27 13:05:06 | Epoch: 3 | 
Step: 88310 | Dataset: 0-11464703 | Loss: 1.696 | 675 ms/step , 58223.54 GFLOP/s , 531680.2 tokens/s INFO:__main__:2024-10-27 13:05:14 | Epoch: 3 | Step: 88320 | Dataset: 0-11472703 | Loss: 1.696 | 675 ms/step , 58223.19 GFLOP/s , 531767.2 tokens/s INFO:__main__:2024-10-27 13:05:22 | Epoch: 3 | Step: 88330 | Dataset: 0-11480703 | Loss: 1.676 | 676 ms/step , 58134.21 GFLOP/s , 531503.5 tokens/s INFO:__main__:2024-10-27 13:05:29 | Epoch: 3 | Step: 88340 | Dataset: 0-11488703 | Loss: 1.680 | 678 ms/step , 57985.21 GFLOP/s , 542830.3 tokens/s INFO:__main__:2024-10-27 13:05:37 | Epoch: 3 | Step: 88350 | Dataset: 0-11496703 | Loss: 1.684 | 676 ms/step , 58190.76 GFLOP/s , 531633.2 tokens/s INFO:__main__:2024-10-27 13:05:45 | Epoch: 3 | Step: 88360 | Dataset: 0-11504703 | Loss: 1.666 | 676 ms/step , 58121.99 GFLOP/s , 531726.7 tokens/s INFO:__main__:2024-10-27 13:05:53 | Epoch: 3 | Step: 88370 | Dataset: 0-11512703 | Loss: 1.669 | 676 ms/step , 58158.74 GFLOP/s , 531803.9 tokens/s INFO:__main__:2024-10-27 13:06:00 | Epoch: 3 | Step: 88380 | Dataset: 0-11520703 | Loss: 2.253 | 677 ms/step , 58033.53 GFLOP/s , 531441.0 tokens/s INFO:__main__:2024-10-27 13:06:08 | Epoch: 3 | Step: 88390 | Dataset: 0-11528703 | Loss: 2.241 | 674 ms/step , 58304.74 GFLOP/s , 532483.9 tokens/s INFO:__main__:2024-10-27 13:06:16 | Epoch: 3 | Step: 88400 | Dataset: 0-11536703 | Loss: 2.303 | 675 ms/step , 58206.63 GFLOP/s , 532669.1 tokens/s INFO:__main__:2024-10-27 13:06:23 | Epoch: 3 | Step: 88410 | Dataset: 0-11544703 | Loss: 2.109 | 675 ms/step , 58231.33 GFLOP/s , 532237.9 tokens/s INFO:__main__:2024-10-27 13:06:31 | Epoch: 3 | Step: 88420 | Dataset: 0-11552703 | Loss: 2.225 | 674 ms/step , 58322.81 GFLOP/s , 532356.5 tokens/s INFO:__main__:2024-10-27 13:06:39 | Epoch: 3 | Step: 88430 | Dataset: 0-11560703 | Loss: 2.169 | 675 ms/step , 58255.32 GFLOP/s , 532171.7 tokens/s INFO:__main__:2024-10-27 13:06:46 | Epoch: 3 | Step: 88440 | Dataset: 0-11568703 | Loss: 2.204 | 675 ms/step , 58216.80 
GFLOP/s , 532462.3 tokens/s INFO:__main__:2024-10-27 13:06:54 | Epoch: 3 | Step: 88450 | Dataset: 0-11576703 | Loss: 2.164 | 675 ms/step , 58228.12 GFLOP/s , 532299.8 tokens/s INFO:__main__:2024-10-27 13:07:02 | Epoch: 3 | Step: 88460 | Dataset: 0-11584703 | Loss: 2.173 | 674 ms/step , 58340.06 GFLOP/s , 532522.5 tokens/s INFO:__main__:2024-10-27 13:07:09 | Epoch: 3 | Step: 88470 | Dataset: 0-11592703 | Loss: 2.185 | 675 ms/step , 58247.55 GFLOP/s , 532177.1 tokens/s INFO:__main__:2024-10-27 13:07:17 | Epoch: 3 | Step: 88480 | Dataset: 0-11600703 | Loss: 2.138 | 675 ms/step , 58270.47 GFLOP/s , 532274.5 tokens/s INFO:__main__:2024-10-27 13:07:25 | Epoch: 3 | Step: 88490 | Dataset: 0-11608703 | Loss: 2.195 | 676 ms/step , 58151.57 GFLOP/s , 532031.8 tokens/s INFO:__main__:2024-10-27 13:07:33 | Epoch: 3 | Step: 88500 | Dataset: 0-11616703 | Loss: 2.122 | 674 ms/step , 58319.31 GFLOP/s , 533026.9 tokens/s INFO:__main__:2024-10-27 13:07:40 | Epoch: 3 | Step: 88510 | Dataset: 0-11624703 | Loss: 2.171 | 674 ms/step , 58317.03 GFLOP/s , 532968.8 tokens/s INFO:__main__:2024-10-27 13:07:48 | Epoch: 3 | Step: 88520 | Dataset: 0-11632703 | Loss: 2.190 | 674 ms/step , 58343.45 GFLOP/s , 533060.1 tokens/s INFO:__main__:2024-10-27 13:07:56 | Epoch: 3 | Step: 88530 | Dataset: 0-11640703 | Loss: 2.083 | 675 ms/step , 58260.49 GFLOP/s , 532767.4 tokens/s INFO:__main__:2024-10-27 13:08:03 | Epoch: 3 | Step: 88540 | Dataset: 0-11648703 | Loss: 2.209 | 674 ms/step , 58338.15 GFLOP/s , 532966.5 tokens/s INFO:__main__:2024-10-27 13:08:11 | Epoch: 3 | Step: 88550 | Dataset: 0-11656703 | Loss: 2.119 | 674 ms/step , 58324.73 GFLOP/s , 532969.7 tokens/s INFO:__main__:2024-10-27 13:08:19 | Epoch: 3 | Step: 88560 | Dataset: 0-11664703 | Loss: 2.224 | 675 ms/step , 58244.41 GFLOP/s , 532476.9 tokens/s INFO:__main__:2024-10-27 13:08:26 | Epoch: 3 | Step: 88570 | Dataset: 0-11672703 | Loss: 2.097 | 676 ms/step , 58171.24 GFLOP/s , 532122.3 tokens/s INFO:__main__:2024-10-27 13:08:34 | Epoch: 3 | 
Step: 88580 | Dataset: 0-11680703 | Loss: 2.115 | 675 ms/step , 58205.23 GFLOP/s , 532105.6 tokens/s INFO:__main__:2024-10-27 13:08:42 | Epoch: 3 | Step: 88590 | Dataset: 0-11688703 | Loss: 2.154 | 676 ms/step , 58187.28 GFLOP/s , 531999.0 tokens/s INFO:__main__:2024-10-27 13:08:49 | Epoch: 3 | Step: 88600 | Dataset: 0-11696703 | Loss: 2.165 | 674 ms/step , 58289.06 GFLOP/s , 532878.5 tokens/s INFO:__main__:2024-10-27 13:08:57 | Epoch: 3 | Step: 88610 | Dataset: 0-11704703 | Loss: 2.122 | 675 ms/step , 58208.78 GFLOP/s , 532605.9 tokens/s INFO:__main__:2024-10-27 13:09:05 | Epoch: 3 | Step: 88620 | Dataset: 0-11712703 | Loss: 2.143 | 675 ms/step , 58275.44 GFLOP/s , 532385.5 tokens/s INFO:__main__:2024-10-27 13:09:13 | Epoch: 3 | Step: 88630 | Dataset: 0-11720703 | Loss: 2.126 | 675 ms/step , 58205.51 GFLOP/s , 532337.9 tokens/s INFO:__main__:2024-10-27 13:09:20 | Epoch: 3 | Step: 88640 | Dataset: 0-11728703 | Loss: 2.040 | 675 ms/step , 58258.27 GFLOP/s , 532083.8 tokens/s INFO:__main__:2024-10-27 13:09:28 | Epoch: 3 | Step: 88650 | Dataset: 0-11736703 | Loss: 2.315 | 675 ms/step , 58250.00 GFLOP/s , 533146.4 tokens/s INFO:__main__:2024-10-27 13:09:36 | Epoch: 3 | Step: 88660 | Dataset: 0-11744703 | Loss: 2.160 | 675 ms/step , 58214.50 GFLOP/s , 531863.1 tokens/s INFO:__main__:2024-10-27 13:09:43 | Epoch: 3 | Step: 88670 | Dataset: 0-11752703 | Loss: 2.177 | 675 ms/step , 58221.54 GFLOP/s , 531984.4 tokens/s INFO:__main__:2024-10-27 13:09:51 | Epoch: 3 | Step: 88680 | Dataset: 0-11760703 | Loss: 2.154 | 676 ms/step , 58143.96 GFLOP/s , 531918.3 tokens/s INFO:__main__:2024-10-27 13:09:59 | Epoch: 3 | Step: 88690 | Dataset: 0-11768703 | Loss: 2.153 | 674 ms/step , 58288.74 GFLOP/s , 532231.3 tokens/s INFO:__main__:2024-10-27 13:10:06 | Epoch: 3 | Step: 88700 | Dataset: 0-11776703 | Loss: 2.702 | 676 ms/step , 58146.82 GFLOP/s , 532128.0 tokens/s INFO:__main__:2024-10-27 13:10:14 | Epoch: 3 | Step: 88710 | Dataset: 0-11784703 | Loss: 2.602 | 676 ms/step , 58150.63 
GFLOP/s , 532159.6 tokens/s INFO:__main__:2024-10-27 13:10:22 | Epoch: 3 | Step: 88720 | Dataset: 0-11792703 | Loss: 2.526 | 676 ms/step , 58124.14 GFLOP/s , 531603.8 tokens/s INFO:__main__:2024-10-27 13:10:30 | Epoch: 3 | Step: 88730 | Dataset: 0-11800703 | Loss: 2.451 | 675 ms/step , 58207.07 GFLOP/s , 531909.1 tokens/s INFO:__main__:2024-10-27 13:10:37 | Epoch: 3 | Step: 88740 | Dataset: 0-11808703 | Loss: 2.536 | 676 ms/step , 58164.70 GFLOP/s , 531775.1 tokens/s INFO:__main__:2024-10-27 13:10:45 | Epoch: 3 | Step: 88750 | Dataset: 0-11816703 | Loss: 2.512 | 675 ms/step , 58199.50 GFLOP/s , 530283.2 tokens/s INFO:__main__:2024-10-27 13:10:53 | Epoch: 3 | Step: 88760 | Dataset: 0-11824703 | Loss: 2.540 | 675 ms/step , 58201.98 GFLOP/s , 532671.2 tokens/s INFO:__main__:2024-10-27 13:11:00 | Epoch: 3 | Step: 88770 | Dataset: 0-11832703 | Loss: 2.468 | 675 ms/step , 58248.04 GFLOP/s , 531958.9 tokens/s INFO:__main__:2024-10-27 13:11:08 | Epoch: 3 | Step: 88780 | Dataset: 0-11840703 | Loss: 2.488 | 676 ms/step , 58187.26 GFLOP/s , 532174.1 tokens/s INFO:__main__:2024-10-27 13:11:16 | Epoch: 3 | Step: 88790 | Dataset: 0-11848703 | Loss: 2.472 | 676 ms/step , 58116.93 GFLOP/s , 532027.0 tokens/s INFO:__main__:2024-10-27 13:11:23 | Epoch: 3 | Step: 88800 | Dataset: 0-11856703 | Loss: 2.480 | 674 ms/step , 58289.93 GFLOP/s , 532016.6 tokens/s INFO:__main__:2024-10-27 13:11:31 | Epoch: 3 | Step: 88810 | Dataset: 0-11864703 | Loss: 2.427 | 675 ms/step , 58236.63 GFLOP/s , 532044.8 tokens/s INFO:__main__:2024-10-27 13:11:39 | Epoch: 3 | Step: 88820 | Dataset: 0-11872703 | Loss: 2.433 | 676 ms/step , 58177.23 GFLOP/s , 531709.8 tokens/s INFO:__main__:2024-10-27 13:11:47 | Epoch: 3 | Step: 88830 | Dataset: 0-11880703 | Loss: 2.464 | 676 ms/step , 58144.49 GFLOP/s , 531757.0 tokens/s INFO:__main__:2024-10-27 13:11:54 | Epoch: 3 | Step: 88840 | Dataset: 0-11888703 | Loss: 2.391 | 676 ms/step , 58159.79 GFLOP/s , 532200.9 tokens/s INFO:__main__:2024-10-27 13:12:02 | Epoch: 3 | 
Step: 88850 | Dataset: 0-11896703 | Loss: 2.357 | 676 ms/step , 58120.65 GFLOP/s , 531891.7 tokens/s INFO:__main__:2024-10-27 13:12:10 | Epoch: 3 | Step: 88860 | Dataset: 0-11904703 | Loss: 2.281 | 674 ms/step , 58304.86 GFLOP/s , 532460.2 tokens/s INFO:__main__:2024-10-27 13:12:17 | Epoch: 3 | Step: 88870 | Dataset: 0-11912703 | Loss: 2.217 | 676 ms/step , 58132.92 GFLOP/s , 532141.7 tokens/s INFO:__main__:2024-10-27 13:12:25 | Epoch: 3 | Step: 88880 | Dataset: 0-11920703 | Loss: 2.266 | 676 ms/step , 58161.02 GFLOP/s , 532018.8 tokens/s INFO:__main__:2024-10-27 13:12:33 | Epoch: 3 | Step: 88890 | Dataset: 0-11928703 | Loss: 2.146 | 675 ms/step , 58235.20 GFLOP/s , 532107.9 tokens/s INFO:__main__:2024-10-27 13:12:40 | Epoch: 3 | Step: 88900 | Dataset: 0-11936703 | Loss: 2.138 | 676 ms/step , 58122.37 GFLOP/s , 532727.4 tokens/s INFO:__main__:2024-10-27 13:12:48 | Epoch: 3 | Step: 88910 | Dataset: 0-11944703 | Loss: 2.224 | 674 ms/step , 58306.57 GFLOP/s , 532219.3 tokens/s INFO:__main__:2024-10-27 13:12:56 | Epoch: 3 | Step: 88920 | Dataset: 0-11952703 | Loss: 2.159 | 676 ms/step , 58144.27 GFLOP/s , 532067.6 tokens/s INFO:__main__:2024-10-27 13:13:03 | Epoch: 3 | Step: 88930 | Dataset: 0-11960703 | Loss: 2.133 | 675 ms/step , 58216.58 GFLOP/s , 532218.0 tokens/s INFO:__main__:2024-10-27 13:13:11 | Epoch: 3 | Step: 88940 | Dataset: 0-11968703 | Loss: 2.196 | 674 ms/step , 58299.61 GFLOP/s , 532166.7 tokens/s INFO:__main__:2024-10-27 13:13:19 | Epoch: 3 | Step: 88950 | Dataset: 0-11976703 | Loss: 2.166 | 675 ms/step , 58246.77 GFLOP/s , 532898.1 tokens/s INFO:__main__:2024-10-27 13:13:27 | Epoch: 3 | Step: 88960 | Dataset: 0-11984703 | Loss: 2.124 | 676 ms/step , 58128.86 GFLOP/s , 532350.4 tokens/s INFO:__main__:2024-10-27 13:13:34 | Epoch: 3 | Step: 88970 | Dataset: 0-11992703 | Loss: 2.133 | 674 ms/step , 58337.42 GFLOP/s , 532758.2 tokens/s INFO:__main__:2024-10-27 13:13:42 | Epoch: 3 | Step: 88980 | Dataset: 0-12000703 | Loss: 2.167 | 675 ms/step , 58269.04 
GFLOP/s , 532945.9 tokens/s INFO:__main__:2024-10-27 13:13:50 | Epoch: 3 | Step: 88990 | Dataset: 0-12008703 | Loss: 2.129 | 675 ms/step , 58221.52 GFLOP/s , 532420.2 tokens/s INFO:__main__:2024-10-27 13:13:57 | Validation | Step: 89000 | Val_loss: 2.201 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 13:13:57 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_131357_step_89000.pt` INFO:__main__:2024-10-27 13:13:58 | Epoch: 3 | Step: 89000 | Dataset: 0-12016703 | Loss: 2.081 | 674 ms/step , 58316.97 GFLOP/s , 479517.8 tokens/s INFO:__main__:2024-10-27 13:14:06 | Epoch: 3 | Step: 89010 | Dataset: 0-12024703 | Loss: 2.216 | 675 ms/step , 58230.39 GFLOP/s , 532592.7 tokens/s INFO:__main__:2024-10-27 13:14:14 | Epoch: 3 | Step: 89020 | Dataset: 0-12032703 | Loss: 2.204 | 675 ms/step , 58226.03 GFLOP/s , 532239.8 tokens/s INFO:__main__:2024-10-27 13:14:21 | Epoch: 3 | Step: 89030 | Dataset: 0-12040703 | Loss: 2.106 | 674 ms/step , 58280.99 GFLOP/s , 532709.9 tokens/s INFO:__main__:2024-10-27 13:14:29 | Epoch: 3 | Step: 89040 | Dataset: 0-12048703 | Loss: 2.128 | 676 ms/step , 58180.66 GFLOP/s , 532163.5 tokens/s INFO:__main__:2024-10-27 13:14:37 | Epoch: 3 | Step: 89050 | Dataset: 0-12056703 | Loss: 2.053 | 676 ms/step , 58138.68 GFLOP/s , 532412.1 tokens/s INFO:__main__:2024-10-27 13:14:44 | Epoch: 3 | Step: 89060 | Dataset: 0-12064703 | Loss: 2.089 | 674 ms/step , 58302.34 GFLOP/s , 532185.8 tokens/s INFO:__main__:2024-10-27 13:14:52 | Epoch: 3 | Step: 89070 | Dataset: 0-12072703 | Loss: 2.080 | 676 ms/step , 58185.06 GFLOP/s , 532395.5 tokens/s INFO:__main__:2024-10-27 13:15:00 | Epoch: 3 | Step: 89080 | Dataset: 0-12080703 | Loss: 2.144 | 676 ms/step , 58130.30 GFLOP/s , 531769.7 tokens/s INFO:__main__:2024-10-27 13:15:07 | Epoch: 3 | Step: 89090 | Dataset: 0-12088703 | Loss: 2.086 | 676 ms/step , 58173.59 GFLOP/s , 532486.6 tokens/s INFO:__main__:2024-10-27 13:15:15 | Epoch: 3 | Step: 89100 | Dataset: 0-12096703 | Loss: 2.015 | 675 ms/step , 
58263.66 GFLOP/s , 532791.0 tokens/s INFO:__main__:2024-10-27 13:15:23 | Epoch: 3 | Step: 89110 | Dataset: 0-12104703 | Loss: 2.095 | 675 ms/step , 58261.06 GFLOP/s , 533133.1 tokens/s INFO:__main__:2024-10-27 13:15:30 | Epoch: 3 | Step: 89120 | Dataset: 0-12112703 | Loss: 2.101 | 676 ms/step , 58166.48 GFLOP/s , 532943.1 tokens/s INFO:__main__:2024-10-27 13:15:38 | Epoch: 3 | Step: 89130 | Dataset: 0-12120703 | Loss: 2.160 | 675 ms/step , 58237.64 GFLOP/s , 532252.4 tokens/s INFO:__main__:2024-10-27 13:15:46 | Epoch: 3 | Step: 89140 | Dataset: 0-12128703 | Loss: 2.169 | 675 ms/step , 58194.00 GFLOP/s , 531455.1 tokens/s INFO:__main__:2024-10-27 13:15:54 | Epoch: 3 | Step: 89150 | Dataset: 0-12136703 | Loss: 2.061 | 676 ms/step , 58177.79 GFLOP/s , 531328.2 tokens/s INFO:__main__:2024-10-27 13:16:01 | Epoch: 3 | Step: 89160 | Dataset: 0-12144703 | Loss: 2.152 | 674 ms/step , 58291.68 GFLOP/s , 532526.6 tokens/s INFO:__main__:2024-10-27 13:16:09 | Epoch: 3 | Step: 89170 | Dataset: 0-12152703 | Loss: 2.131 | 675 ms/step , 58221.85 GFLOP/s , 532309.5 tokens/s INFO:__main__:2024-10-27 13:16:17 | Epoch: 3 | Step: 89180 | Dataset: 0-12160703 | Loss: 2.193 | 675 ms/step , 58249.36 GFLOP/s , 532922.6 tokens/s INFO:__main__:2024-10-27 13:16:24 | Epoch: 3 | Step: 89190 | Dataset: 0-12168703 | Loss: 2.238 | 674 ms/step , 58287.53 GFLOP/s , 532295.4 tokens/s INFO:__main__:2024-10-27 13:16:32 | Epoch: 3 | Step: 89200 | Dataset: 0-12176703 | Loss: 2.161 | 675 ms/step , 58195.38 GFLOP/s , 532408.1 tokens/s INFO:__main__:2024-10-27 13:16:40 | Epoch: 3 | Step: 89210 | Dataset: 0-12184703 | Loss: 2.198 | 675 ms/step , 58235.74 GFLOP/s , 532353.8 tokens/s INFO:__main__:2024-10-27 13:16:47 | Epoch: 3 | Step: 89220 | Dataset: 0-12192703 | Loss: 2.169 | 675 ms/step , 58203.72 GFLOP/s , 532113.5 tokens/s INFO:__main__:2024-10-27 13:16:55 | Epoch: 3 | Step: 89230 | Dataset: 0-12200703 | Loss: 2.221 | 675 ms/step , 58248.34 GFLOP/s , 532464.3 tokens/s INFO:__main__:2024-10-27 13:17:03 | 
Epoch: 3 | Step: 89240 | Dataset: 0-12208703 | Loss: 2.184 | 675 ms/step , 58225.15 GFLOP/s , 532205.4 tokens/s INFO:__main__:2024-10-27 13:17:11 | Epoch: 3 | Step: 89250 | Dataset: 0-12216703 | Loss: 2.180 | 674 ms/step , 58296.56 GFLOP/s , 532649.6 tokens/s INFO:__main__:2024-10-27 13:17:18 | Epoch: 3 | Step: 89260 | Dataset: 0-12224703 | Loss: 2.124 | 674 ms/step , 58322.82 GFLOP/s , 532952.1 tokens/s INFO:__main__:2024-10-27 13:17:26 | Epoch: 3 | Step: 89270 | Dataset: 0-12232703 | Loss: 2.166 | 675 ms/step , 58218.48 GFLOP/s , 533009.4 tokens/s INFO:__main__:2024-10-27 13:17:34 | Epoch: 3 | Step: 89280 | Dataset: 0-12240703 | Loss: 2.109 | 674 ms/step , 58295.57 GFLOP/s , 532231.6 tokens/s INFO:__main__:2024-10-27 13:17:41 | Epoch: 3 | Step: 89290 | Dataset: 0-12248703 | Loss: 2.163 | 674 ms/step , 58287.88 GFLOP/s , 532775.0 tokens/s INFO:__main__:2024-10-27 13:17:49 | Epoch: 3 | Step: 89300 | Dataset: 0-12256703 | Loss: 2.190 | 675 ms/step , 58249.14 GFLOP/s , 532381.2 tokens/s INFO:__main__:2024-10-27 13:17:57 | Epoch: 3 | Step: 89310 | Dataset: 0-12264703 | Loss: 2.210 | 676 ms/step , 58131.56 GFLOP/s , 532405.0 tokens/s INFO:__main__:2024-10-27 13:18:04 | Epoch: 3 | Step: 89320 | Dataset: 0-12272703 | Loss: 2.166 | 676 ms/step , 58158.01 GFLOP/s , 532136.9 tokens/s INFO:__main__:2024-10-27 13:18:12 | Epoch: 3 | Step: 89330 | Dataset: 0-12280703 | Loss: 2.120 | 676 ms/step , 58168.86 GFLOP/s , 532210.9 tokens/s INFO:__main__:2024-10-27 13:18:20 | Epoch: 3 | Step: 89340 | Dataset: 0-12288703 | Loss: 2.135 | 674 ms/step , 58341.07 GFLOP/s , 532978.8 tokens/s INFO:__main__:2024-10-27 13:18:27 | Epoch: 3 | Step: 89350 | Dataset: 0-12296703 | Loss: 2.100 | 675 ms/step , 58250.16 GFLOP/s , 532987.5 tokens/s INFO:__main__:2024-10-27 13:18:35 | Epoch: 3 | Step: 89360 | Dataset: 0-12304703 | Loss: 2.177 | 676 ms/step , 58182.29 GFLOP/s , 533133.4 tokens/s INFO:__main__:2024-10-27 13:18:43 | Epoch: 3 | Step: 89370 | Dataset: 0-12312703 | Loss: 2.143 | 675 ms/step , 
58209.04 GFLOP/s , 532110.6 tokens/s INFO:__main__:2024-10-27 13:18:50 | Epoch: 3 | Step: 89380 | Dataset: 0-12320703 | Loss: 2.088 | 674 ms/step , 58295.74 GFLOP/s , 532306.0 tokens/s INFO:__main__:2024-10-27 13:18:58 | Epoch: 3 | Step: 89390 | Dataset: 0-12328703 | Loss: 2.097 | 675 ms/step , 58254.89 GFLOP/s , 532777.5 tokens/s INFO:__main__:2024-10-27 13:19:06 | Epoch: 3 | Step: 89400 | Dataset: 0-12336703 | Loss: 2.116 | 677 ms/step , 58078.82 GFLOP/s , 532368.4 tokens/s INFO:__main__:2024-10-27 13:19:14 | Epoch: 3 | Step: 89410 | Dataset: 0-12344703 | Loss: 2.093 | 676 ms/step , 58188.52 GFLOP/s , 532384.8 tokens/s INFO:__main__:2024-10-27 13:19:21 | Epoch: 3 | Step: 89420 | Dataset: 0-12352703 | Loss: 2.045 | 674 ms/step , 58329.21 GFLOP/s , 532552.8 tokens/s INFO:__main__:2024-10-27 13:19:29 | Epoch: 3 | Step: 89430 | Dataset: 0-12360703 | Loss: 2.097 | 676 ms/step , 58118.91 GFLOP/s , 532216.3 tokens/s INFO:__main__:2024-10-27 13:19:37 | Epoch: 3 | Step: 89440 | Dataset: 0-12368703 | Loss: 2.041 | 676 ms/step , 58126.20 GFLOP/s , 531574.8 tokens/s INFO:__main__:2024-10-27 13:19:44 | Epoch: 3 | Step: 89450 | Dataset: 0-12376703 | Loss: 2.118 | 675 ms/step , 58240.29 GFLOP/s , 532090.9 tokens/s INFO:__main__:2024-10-27 13:19:52 | Epoch: 3 | Step: 89460 | Dataset: 0-12384703 | Loss: 2.154 | 675 ms/step , 58248.03 GFLOP/s , 532563.8 tokens/s INFO:__main__:2024-10-27 13:20:00 | Epoch: 3 | Step: 89470 | Dataset: 0-12392703 | Loss: 2.133 | 676 ms/step , 58160.96 GFLOP/s , 532058.3 tokens/s INFO:__main__:2024-10-27 13:20:07 | Epoch: 3 | Step: 89480 | Dataset: 0-12400703 | Loss: 2.170 | 675 ms/step , 58278.20 GFLOP/s , 532357.2 tokens/s INFO:__main__:2024-10-27 13:20:15 | Epoch: 3 | Step: 89490 | Dataset: 0-12408703 | Loss: 2.041 | 676 ms/step , 58176.94 GFLOP/s , 532434.6 tokens/s INFO:__main__:2024-10-27 13:20:23 | Epoch: 3 | Step: 89500 | Dataset: 0-12416703 | Loss: 2.149 | 676 ms/step , 58151.02 GFLOP/s , 531336.7 tokens/s INFO:__main__:2024-10-27 13:20:31 | 
Epoch: 3 | Step: 89510 | Dataset: 0-12424703 | Loss: 2.294 | 677 ms/step , 58079.85 GFLOP/s , 532012.4 tokens/s INFO:__main__:2024-10-27 13:20:38 | Epoch: 3 | Step: 89520 | Dataset: 0-12432703 | Loss: 2.186 | 678 ms/step , 57985.86 GFLOP/s , 529852.4 tokens/s INFO:__main__:2024-10-27 13:20:46 | Epoch: 3 | Step: 89530 | Dataset: 0-12440703 | Loss: 2.179 | 677 ms/step , 58096.87 GFLOP/s , 530477.7 tokens/s INFO:__main__:2024-10-27 13:20:54 | Epoch: 3 | Step: 89540 | Dataset: 0-12448703 | Loss: 2.252 | 677 ms/step , 58091.94 GFLOP/s , 530913.5 tokens/s INFO:__main__:2024-10-27 13:21:01 | Epoch: 3 | Step: 89550 | Dataset: 0-12456703 | Loss: 2.166 | 677 ms/step , 58080.04 GFLOP/s , 530527.3 tokens/s INFO:__main__:2024-10-27 13:21:09 | Epoch: 3 | Step: 89560 | Dataset: 0-12464703 | Loss: 2.138 | 676 ms/step , 58151.08 GFLOP/s , 530412.6 tokens/s INFO:__main__:2024-10-27 13:21:17 | Epoch: 3 | Step: 89570 | Dataset: 0-12472703 | Loss: 2.158 | 677 ms/step , 58047.34 GFLOP/s , 531115.4 tokens/s INFO:__main__:2024-10-27 13:21:25 | Epoch: 3 | Step: 89580 | Dataset: 0-12480703 | Loss: 2.098 | 677 ms/step , 58061.72 GFLOP/s , 530757.4 tokens/s INFO:__main__:2024-10-27 13:21:32 | Epoch: 3 | Step: 89590 | Dataset: 0-12488703 | Loss: 2.143 | 678 ms/step , 57997.13 GFLOP/s , 529000.4 tokens/s INFO:__main__:2024-10-27 13:21:40 | Epoch: 3 | Step: 89600 | Dataset: 0-12496703 | Loss: 2.092 | 675 ms/step , 58246.15 GFLOP/s , 530395.3 tokens/s INFO:__main__:2024-10-27 13:21:48 | Epoch: 3 | Step: 89610 | Dataset: 0-12504703 | Loss: 2.122 | 677 ms/step , 58100.80 GFLOP/s , 532554.0 tokens/s INFO:__main__:2024-10-27 13:21:55 | Epoch: 3 | Step: 89620 | Dataset: 0-12512703 | Loss: 2.158 | 677 ms/step , 58036.18 GFLOP/s , 532271.0 tokens/s INFO:__main__:2024-10-27 13:22:03 | Epoch: 3 | Step: 89630 | Dataset: 0-12520703 | Loss: 2.144 | 676 ms/step , 58132.77 GFLOP/s , 532290.9 tokens/s INFO:__main__:2024-10-27 13:22:11 | Epoch: 3 | Step: 89640 | Dataset: 0-12528703 | Loss: 2.171 | 675 ms/step , 
58256.96 GFLOP/s , 532614.1 tokens/s INFO:__main__:2024-10-27 13:22:19 | Epoch: 3 | Step: 89650 | Dataset: 0-12536703 | Loss: 2.122 | 675 ms/step , 58207.01 GFLOP/s , 532335.3 tokens/s INFO:__main__:2024-10-27 13:22:26 | Epoch: 3 | Step: 89660 | Dataset: 0-12544703 | Loss: 2.159 | 676 ms/step , 58142.86 GFLOP/s , 531898.8 tokens/s INFO:__main__:2024-10-27 13:22:34 | Epoch: 3 | Step: 89670 | Dataset: 0-12552703 | Loss: 2.131 | 676 ms/step , 58155.91 GFLOP/s , 531917.9 tokens/s INFO:__main__:2024-10-27 13:22:42 | Epoch: 3 | Step: 89680 | Dataset: 0-12560703 | Loss: 2.171 | 675 ms/step , 58251.41 GFLOP/s , 531845.9 tokens/s INFO:__main__:2024-10-27 13:22:49 | Epoch: 3 | Step: 89690 | Dataset: 0-12568703 | Loss: 2.197 | 675 ms/step , 58220.24 GFLOP/s , 532179.1 tokens/s INFO:__main__:2024-10-27 13:22:57 | Epoch: 3 | Step: 89700 | Dataset: 0-12576703 | Loss: 2.127 | 676 ms/step , 58170.54 GFLOP/s , 532297.8 tokens/s INFO:__main__:2024-10-27 13:23:05 | Epoch: 3 | Step: 89710 | Dataset: 0-12584703 | Loss: 2.143 | 674 ms/step , 58327.95 GFLOP/s , 533209.7 tokens/s INFO:__main__:2024-10-27 13:23:12 | Epoch: 3 | Step: 89720 | Dataset: 0-12592703 | Loss: 2.058 | 684 ms/step , 57499.31 GFLOP/s , 531475.5 tokens/s INFO:__main__:2024-10-27 13:23:20 | Epoch: 3 | Step: 89730 | Dataset: 0-12600703 | Loss: 2.145 | 676 ms/step , 58185.73 GFLOP/s , 532882.3 tokens/s INFO:__main__:2024-10-27 13:23:28 | Epoch: 3 | Step: 89740 | Dataset: 0-12608703 | Loss: 2.157 | 675 ms/step , 58264.39 GFLOP/s , 533184.8 tokens/s INFO:__main__:2024-10-27 13:23:35 | Epoch: 3 | Step: 89750 | Dataset: 0-12616703 | Loss: 2.135 | 675 ms/step , 58273.51 GFLOP/s , 532934.5 tokens/s INFO:__main__:2024-10-27 13:23:43 | Epoch: 3 | Step: 89760 | Dataset: 0-12624703 | Loss: 2.071 | 674 ms/step , 58283.49 GFLOP/s , 533236.6 tokens/s INFO:__main__:2024-10-27 13:23:51 | Epoch: 3 | Step: 89770 | Dataset: 0-12632703 | Loss: 2.094 | 676 ms/step , 58176.67 GFLOP/s , 532928.8 tokens/s INFO:__main__:2024-10-27 13:23:59 | 
Epoch: 3 | Step: 89780 | Dataset: 0-12640703 | Loss: 2.108 | 675 ms/step , 58256.59 GFLOP/s , 532537.5 tokens/s INFO:__main__:2024-10-27 13:24:06 | Epoch: 3 | Step: 89790 | Dataset: 0-12648703 | Loss: 2.120 | 675 ms/step , 58196.86 GFLOP/s , 532416.8 tokens/s INFO:__main__:2024-10-27 13:24:14 | Epoch: 3 | Step: 89800 | Dataset: 0-12656703 | Loss: 2.054 | 674 ms/step , 58295.74 GFLOP/s , 532981.3 tokens/s INFO:__main__:2024-10-27 13:24:22 | Epoch: 3 | Step: 89810 | Dataset: 0-12664703 | Loss: 2.102 | 675 ms/step , 58210.24 GFLOP/s , 532567.0 tokens/s INFO:__main__:2024-10-27 13:24:29 | Epoch: 3 | Step: 89820 | Dataset: 0-12672703 | Loss: 2.147 | 675 ms/step , 58253.51 GFLOP/s , 532841.9 tokens/s INFO:__main__:2024-10-27 13:24:37 | Epoch: 3 | Step: 89830 | Dataset: 0-12680703 | Loss: 2.196 | 677 ms/step , 58090.60 GFLOP/s , 532422.9 tokens/s INFO:__main__:2024-10-27 13:24:45 | Epoch: 3 | Step: 89840 | Dataset: 0-12688703 | Loss: 2.264 | 675 ms/step , 58269.00 GFLOP/s , 532838.4 tokens/s INFO:__main__:2024-10-27 13:24:52 | Epoch: 3 | Step: 89850 | Dataset: 0-12696703 | Loss: 2.149 | 675 ms/step , 58253.08 GFLOP/s , 533309.1 tokens/s INFO:__main__:2024-10-27 13:25:00 | Epoch: 3 | Step: 89860 | Dataset: 0-12704703 | Loss: 2.089 | 676 ms/step , 58183.61 GFLOP/s , 532515.8 tokens/s INFO:__main__:2024-10-27 13:25:08 | Epoch: 3 | Step: 89870 | Dataset: 0-12712703 | Loss: 2.123 | 675 ms/step , 58222.35 GFLOP/s , 532815.4 tokens/s INFO:__main__:2024-10-27 13:25:15 | Epoch: 3 | Step: 89880 | Dataset: 0-12720703 | Loss: 2.108 | 675 ms/step , 58204.59 GFLOP/s , 532624.9 tokens/s INFO:__main__:2024-10-27 13:25:23 | Epoch: 3 | Step: 89890 | Dataset: 0-12728703 | Loss: 2.114 | 676 ms/step , 58191.50 GFLOP/s , 533086.3 tokens/s INFO:__main__:2024-10-27 13:25:31 | Epoch: 3 | Step: 89900 | Dataset: 0-12736703 | Loss: 2.125 | 675 ms/step , 58251.51 GFLOP/s , 532668.7 tokens/s INFO:__main__:2024-10-27 13:25:38 | Epoch: 3 | Step: 89910 | Dataset: 0-12744703 | Loss: 2.122 | 674 ms/step , 
58304.70 GFLOP/s , 533201.1 tokens/s INFO:__main__:2024-10-27 13:25:46 | Epoch: 3 | Step: 89920 | Dataset: 0-12752703 | Loss: 2.177 | 675 ms/step , 58269.32 GFLOP/s , 533180.5 tokens/s INFO:__main__:2024-10-27 13:25:54 | Epoch: 3 | Step: 89930 | Dataset: 0-12760703 | Loss: 2.106 | 675 ms/step , 58274.89 GFLOP/s , 533164.6 tokens/s INFO:__main__:2024-10-27 13:26:02 | Epoch: 3 | Step: 89940 | Dataset: 0-12768703 | Loss: 2.160 | 675 ms/step , 58211.20 GFLOP/s , 532998.5 tokens/s INFO:__main__:2024-10-27 13:26:09 | Epoch: 3 | Step: 89950 | Dataset: 0-12776703 | Loss: 2.097 | 675 ms/step , 58216.26 GFLOP/s , 532615.2 tokens/s INFO:__main__:2024-10-27 13:26:17 | Epoch: 3 | Step: 89960 | Dataset: 0-12784703 | Loss: 2.178 | 675 ms/step , 58234.68 GFLOP/s , 532687.7 tokens/s INFO:__main__:2024-10-27 13:26:25 | Epoch: 3 | Step: 89970 | Dataset: 0-12792703 | Loss: 2.141 | 674 ms/step , 58301.09 GFLOP/s , 532921.9 tokens/s INFO:__main__:2024-10-27 13:26:32 | Epoch: 3 | Step: 89980 | Dataset: 0-12800703 | Loss: 2.048 | 675 ms/step , 58276.11 GFLOP/s , 533420.0 tokens/s INFO:__main__:2024-10-27 13:26:40 | Epoch: 3 | Step: 89990 | Dataset: 0-12808703 | Loss: 2.458 | 675 ms/step , 58234.24 GFLOP/s , 532102.3 tokens/s INFO:__main__:2024-10-27 13:26:47 | Validation | Step: 90000 | Val_loss: 2.207 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 13:26:47 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_132647_step_90000.pt` INFO:__main__:2024-10-27 13:26:48 | Epoch: 3 | Step: 90000 | Dataset: 0-12816703 | Loss: 2.340 | 674 ms/step , 58327.62 GFLOP/s , 480168.9 tokens/s INFO:__main__:2024-10-27 13:26:56 | Epoch: 3 | Step: 90010 | Dataset: 0-12824703 | Loss: 2.371 | 675 ms/step , 58269.53 GFLOP/s , 532506.4 tokens/s INFO:__main__:2024-10-27 13:27:04 | Epoch: 3 | Step: 90020 | Dataset: 0-12832703 | Loss: 2.257 | 675 ms/step , 58254.27 GFLOP/s , 532896.6 tokens/s INFO:__main__:2024-10-27 13:27:12 | Epoch: 3 | Step: 90030 | Dataset: 0-12840703 | Loss: 2.350 | 675 
ms/step , 58255.24 GFLOP/s , 532768.2 tokens/s INFO:__main__:2024-10-27 13:27:19 | Epoch: 3 | Step: 90040 | Dataset: 0-12848703 | Loss: 2.254 | 674 ms/step , 58297.17 GFLOP/s , 532756.3 tokens/s INFO:__main__:2024-10-27 13:27:27 | Epoch: 3 | Step: 90050 | Dataset: 0-12856703 | Loss: 2.288 | 675 ms/step , 58196.20 GFLOP/s , 532330.3 tokens/s INFO:__main__:2024-10-27 13:27:35 | Epoch: 3 | Step: 90060 | Dataset: 0-12864703 | Loss: 2.317 | 675 ms/step , 58237.05 GFLOP/s , 532529.5 tokens/s INFO:__main__:2024-10-27 13:27:42 | Epoch: 3 | Step: 90070 | Dataset: 0-12872703 | Loss: 2.260 | 676 ms/step , 58174.94 GFLOP/s , 532599.7 tokens/s INFO:__main__:2024-10-27 13:27:50 | Epoch: 3 | Step: 90080 | Dataset: 0-12880703 | Loss: 2.270 | 687 ms/step , 57212.81 GFLOP/s , 532147.7 tokens/s INFO:__main__:2024-10-27 13:27:58 | Epoch: 3 | Step: 90090 | Dataset: 0-12888703 | Loss: 2.345 | 675 ms/step , 58199.29 GFLOP/s , 532184.0 tokens/s INFO:__main__:2024-10-27 13:28:05 | Epoch: 3 | Step: 90100 | Dataset: 0-12896703 | Loss: 2.308 | 675 ms/step , 58222.69 GFLOP/s , 532505.8 tokens/s INFO:__main__:2024-10-27 13:28:13 | Epoch: 3 | Step: 90110 | Dataset: 0-12904703 | Loss: 2.320 | 675 ms/step , 58233.35 GFLOP/s , 532678.8 tokens/s INFO:__main__:2024-10-27 13:28:21 | Epoch: 3 | Step: 90120 | Dataset: 0-12912703 | Loss: 2.353 | 675 ms/step , 58232.38 GFLOP/s , 531836.2 tokens/s INFO:__main__:2024-10-27 13:28:28 | Epoch: 3 | Step: 90130 | Dataset: 0-12920703 | Loss: 2.346 | 674 ms/step , 58311.37 GFLOP/s , 532082.3 tokens/s INFO:__main__:2024-10-27 13:28:36 | Epoch: 3 | Step: 90140 | Dataset: 0-12928703 | Loss: 2.345 | 674 ms/step , 58282.08 GFLOP/s , 532745.3 tokens/s INFO:__main__:2024-10-27 13:28:44 | Epoch: 3 | Step: 90150 | Dataset: 0-12936703 | Loss: 2.017 | 675 ms/step , 58215.27 GFLOP/s , 532498.2 tokens/s INFO:__main__:2024-10-27 13:28:52 | Epoch: 3 | Step: 90160 | Dataset: 0-12944703 | Loss: 1.908 | 674 ms/step , 58283.71 GFLOP/s , 532662.7 tokens/s INFO:__main__:2024-10-27 
13:28:59 | Epoch: 3 | Step: 90170 | Dataset: 0-12952703 | Loss: 1.869 | 677 ms/step , 58076.77 GFLOP/s , 531666.0 tokens/s INFO:__main__:2024-10-27 13:29:07 | Epoch: 3 | Step: 90180 | Dataset: 0-12960703 | Loss: 1.804 | 676 ms/step , 58126.64 GFLOP/s , 531744.4 tokens/s INFO:__main__:2024-10-27 13:29:15 | Epoch: 3 | Step: 90190 | Dataset: 0-12968703 | Loss: 1.826 | 677 ms/step , 58060.65 GFLOP/s , 532134.9 tokens/s INFO:__main__:2024-10-27 13:29:22 | Epoch: 3 | Step: 90200 | Dataset: 0-12976703 | Loss: 1.810 | 676 ms/step , 58157.91 GFLOP/s , 531783.1 tokens/s INFO:__main__:2024-10-27 13:29:30 | Epoch: 3 | Step: 90210 | Dataset: 0-12984703 | Loss: 1.809 | 677 ms/step , 58086.36 GFLOP/s , 531642.6 tokens/s INFO:__main__:2024-10-27 13:29:38 | Epoch: 3 | Step: 90220 | Dataset: 0-12992703 | Loss: 1.818 | 677 ms/step , 58025.52 GFLOP/s , 531394.9 tokens/s INFO:__main__:2024-10-27 13:29:45 | Epoch: 3 | Step: 90230 | Dataset: 0-13000703 | Loss: 1.765 | 676 ms/step , 58187.66 GFLOP/s , 531361.4 tokens/s INFO:__main__:2024-10-27 13:29:53 | Epoch: 3 | Step: 90240 | Dataset: 0-13008703 | Loss: 1.769 | 675 ms/step , 58212.55 GFLOP/s , 531597.1 tokens/s INFO:__main__:2024-10-27 13:30:01 | Epoch: 3 | Step: 90250 | Dataset: 0-13016703 | Loss: 1.779 | 676 ms/step , 58177.55 GFLOP/s , 531773.9 tokens/s INFO:__main__:2024-10-27 13:30:09 | Epoch: 3 | Step: 90260 | Dataset: 0-13024703 | Loss: 1.781 | 681 ms/step , 57740.87 GFLOP/s , 531197.6 tokens/s INFO:__main__:2024-10-27 13:30:16 | Epoch: 3 | Step: 90270 | Dataset: 0-13032703 | Loss: 1.755 | 677 ms/step , 58028.12 GFLOP/s , 530587.0 tokens/s INFO:__main__:2024-10-27 13:30:24 | Epoch: 3 | Step: 90280 | Dataset: 0-13040703 | Loss: 1.742 | 676 ms/step , 58155.65 GFLOP/s , 531768.1 tokens/s INFO:__main__:2024-10-27 13:30:32 | Epoch: 3 | Step: 90290 | Dataset: 0-13048703 | Loss: 1.757 | 676 ms/step , 58171.23 GFLOP/s , 531770.9 tokens/s INFO:__main__:2024-10-27 13:30:39 | Epoch: 3 | Step: 90300 | Dataset: 0-13056703 | Loss: 1.752 | 676 
ms/step , 58166.62 GFLOP/s , 530961.5 tokens/s INFO:__main__:2024-10-27 13:30:47 | Epoch: 3 | Step: 90310 | Dataset: 0-13064703 | Loss: 1.734 | 677 ms/step , 58087.42 GFLOP/s , 531822.1 tokens/s INFO:__main__:2024-10-27 13:30:55 | Epoch: 3 | Step: 90320 | Dataset: 0-13072703 | Loss: 2.162 | 676 ms/step , 58170.63 GFLOP/s , 530593.2 tokens/s INFO:__main__:2024-10-27 13:31:03 | Epoch: 3 | Step: 90330 | Dataset: 0-13080703 | Loss: 2.265 | 675 ms/step , 58239.79 GFLOP/s , 532756.4 tokens/s INFO:__main__:2024-10-27 13:31:10 | Epoch: 3 | Step: 90340 | Dataset: 0-13088703 | Loss: 2.171 | 675 ms/step , 58241.28 GFLOP/s , 532812.5 tokens/s INFO:__main__:2024-10-27 13:31:18 | Epoch: 3 | Step: 90350 | Dataset: 0-13096703 | Loss: 2.155 | 675 ms/step , 58193.34 GFLOP/s , 532682.9 tokens/s INFO:__main__:2024-10-27 13:31:26 | Epoch: 3 | Step: 90360 | Dataset: 0-13104703 | Loss: 2.104 | 675 ms/step , 58230.79 GFLOP/s , 532488.4 tokens/s INFO:__main__:2024-10-27 13:31:33 | Epoch: 3 | Step: 90370 | Dataset: 0-13112703 | Loss: 2.107 | 675 ms/step , 58211.83 GFLOP/s , 532569.1 tokens/s INFO:__main__:2024-10-27 13:31:41 | Epoch: 3 | Step: 90380 | Dataset: 0-13120703 | Loss: 2.043 | 674 ms/step , 58293.59 GFLOP/s , 533103.0 tokens/s INFO:__main__:2024-10-27 13:31:49 | Epoch: 3 | Step: 90390 | Dataset: 0-13128703 | Loss: 2.093 | 675 ms/step , 58252.62 GFLOP/s , 532920.5 tokens/s INFO:__main__:2024-10-27 13:31:56 | Epoch: 3 | Step: 90400 | Dataset: 0-13136703 | Loss: 1.969 | 675 ms/step , 58199.48 GFLOP/s , 532940.9 tokens/s INFO:__main__:2024-10-27 13:32:04 | Epoch: 3 | Step: 90410 | Dataset: 0-13144703 | Loss: 2.073 | 675 ms/step , 58266.71 GFLOP/s , 533105.8 tokens/s INFO:__main__:2024-10-27 13:32:12 | Epoch: 3 | Step: 90420 | Dataset: 0-13152703 | Loss: 2.095 | 674 ms/step , 58307.40 GFLOP/s , 533232.5 tokens/s INFO:__main__:2024-10-27 13:32:19 | Epoch: 3 | Step: 90430 | Dataset: 0-13160703 | Loss: 2.077 | 677 ms/step , 58036.44 GFLOP/s , 531480.7 tokens/s INFO:__main__:2024-10-27 
13:32:27 | Epoch: 3 | Step: 90440 | Dataset: 0-13168703 | Loss: 1.972 | 675 ms/step , 58271.66 GFLOP/s , 532428.8 tokens/s INFO:__main__:2024-10-27 13:32:35 | Epoch: 3 | Step: 90450 | Dataset: 0-13176703 | Loss: 2.100 | 677 ms/step , 58098.77 GFLOP/s , 532831.5 tokens/s INFO:__main__:2024-10-27 13:32:42 | Epoch: 3 | Step: 90460 | Dataset: 0-13184703 | Loss: 1.974 | 675 ms/step , 58255.74 GFLOP/s , 533091.5 tokens/s INFO:__main__:2024-10-27 13:32:50 | Epoch: 3 | Step: 90470 | Dataset: 0-13192703 | Loss: 2.054 | 675 ms/step , 58264.75 GFLOP/s , 532685.8 tokens/s INFO:__main__:2024-10-27 13:32:58 | Epoch: 3 | Step: 90480 | Dataset: 0-13200703 | Loss: 2.019 | 676 ms/step , 58189.02 GFLOP/s , 532608.0 tokens/s INFO:__main__:2024-10-27 13:33:06 | Epoch: 3 | Step: 90490 | Dataset: 0-13208703 | Loss: 2.361 | 674 ms/step , 58312.77 GFLOP/s , 532773.0 tokens/s INFO:__main__:2024-10-27 13:33:13 | Epoch: 3 | Step: 90500 | Dataset: 0-13216703 | Loss: 2.281 | 675 ms/step , 58220.62 GFLOP/s , 532440.1 tokens/s INFO:__main__:2024-10-27 13:33:21 | Epoch: 3 | Step: 90510 | Dataset: 0-13224703 | Loss: 2.259 | 674 ms/step , 58288.05 GFLOP/s , 533008.8 tokens/s INFO:__main__:2024-10-27 13:33:29 | Epoch: 3 | Step: 90520 | Dataset: 0-13232703 | Loss: 2.228 | 675 ms/step , 58231.93 GFLOP/s , 532399.4 tokens/s INFO:__main__:2024-10-27 13:33:36 | Epoch: 3 | Step: 90530 | Dataset: 0-13240703 | Loss: 2.161 | 675 ms/step , 58217.59 GFLOP/s , 532835.8 tokens/s INFO:__main__:2024-10-27 13:33:44 | Epoch: 3 | Step: 90540 | Dataset: 0-13248703 | Loss: 2.225 | 675 ms/step , 58209.95 GFLOP/s , 532696.8 tokens/s INFO:__main__:2024-10-27 13:33:52 | Epoch: 3 | Step: 90550 | Dataset: 0-13256703 | Loss: 2.169 | 673 ms/step , 58380.10 GFLOP/s , 532964.4 tokens/s INFO:__main__:2024-10-27 13:33:59 | Epoch: 3 | Step: 90560 | Dataset: 0-13264703 | Loss: 2.106 | 675 ms/step , 58267.51 GFLOP/s , 533347.8 tokens/s INFO:__main__:2024-10-27 13:34:07 | Epoch: 3 | Step: 90570 | Dataset: 0-13272703 | Loss: 2.171 | 674 
ms/step , 58329.34 GFLOP/s , 533441.7 tokens/s INFO:__main__:2024-10-27 13:34:15 | Epoch: 3 | Step: 90580 | Dataset: 0-13280703 | Loss: 2.079 | 674 ms/step , 58352.05 GFLOP/s , 533161.9 tokens/s INFO:__main__:2024-10-27 13:34:22 | Epoch: 3 | Step: 90590 | Dataset: 0-13288703 | Loss: 2.167 | 675 ms/step , 58269.19 GFLOP/s , 531747.1 tokens/s INFO:__main__:2024-10-27 13:34:30 | Epoch: 3 | Step: 90600 | Dataset: 0-13296703 | Loss: 2.145 | 675 ms/step , 58195.54 GFLOP/s , 532647.6 tokens/s INFO:__main__:2024-10-27 13:34:38 | Epoch: 3 | Step: 90610 | Dataset: 0-13304703 | Loss: 2.184 | 673 ms/step , 58369.66 GFLOP/s , 533160.6 tokens/s INFO:__main__:2024-10-27 13:34:46 | Epoch: 3 | Step: 90620 | Dataset: 0-13312703 | Loss: 2.155 | 675 ms/step , 58265.06 GFLOP/s , 532490.9 tokens/s INFO:__main__:2024-10-27 13:34:53 | Epoch: 3 | Step: 90630 | Dataset: 0-13320703 | Loss: 2.147 | 674 ms/step , 58289.45 GFLOP/s , 532638.6 tokens/s INFO:__main__:2024-10-27 13:35:01 | Epoch: 3 | Step: 90640 | Dataset: 0-13328703 | Loss: 2.121 | 675 ms/step , 58203.42 GFLOP/s , 533274.5 tokens/s INFO:__main__:2024-10-27 13:35:09 | Epoch: 3 | Step: 90650 | Dataset: 0-13336703 | Loss: 2.140 | 674 ms/step , 58304.98 GFLOP/s , 533294.6 tokens/s INFO:__main__:2024-10-27 13:35:16 | Epoch: 3 | Step: 90660 | Dataset: 0-13344703 | Loss: 2.141 | 674 ms/step , 58306.73 GFLOP/s , 533344.8 tokens/s INFO:__main__:2024-10-27 13:35:24 | Epoch: 3 | Step: 90670 | Dataset: 0-13352703 | Loss: 2.119 | 674 ms/step , 58301.65 GFLOP/s , 533367.8 tokens/s INFO:__main__:2024-10-27 13:35:32 | Epoch: 3 | Step: 90680 | Dataset: 0-13360703 | Loss: 2.174 | 674 ms/step , 58281.85 GFLOP/s , 533373.7 tokens/s INFO:__main__:2024-10-27 13:35:39 | Epoch: 3 | Step: 90690 | Dataset: 0-13368703 | Loss: 2.169 | 675 ms/step , 58236.27 GFLOP/s , 532937.6 tokens/s INFO:__main__:2024-10-27 13:35:47 | Epoch: 3 | Step: 90700 | Dataset: 0-13376703 | Loss: 2.126 | 675 ms/step , 58265.36 GFLOP/s , 532999.0 tokens/s INFO:__main__:2024-10-27 
13:35:55 | Epoch: 3 | Step: 90710 | Dataset: 0-13384703 | Loss: 2.145 | 676 ms/step , 58190.79 GFLOP/s , 533239.7 tokens/s INFO:__main__:2024-10-27 13:36:02 | Epoch: 3 | Step: 90720 | Dataset: 0-13392703 | Loss: 2.029 | 675 ms/step , 58263.45 GFLOP/s , 532669.7 tokens/s INFO:__main__:2024-10-27 13:36:10 | Epoch: 3 | Step: 90730 | Dataset: 0-13400703 | Loss: 2.195 | 675 ms/step , 58210.77 GFLOP/s , 533059.9 tokens/s INFO:__main__:2024-10-27 13:36:18 | Epoch: 3 | Step: 90740 | Dataset: 0-13408703 | Loss: 2.151 | 674 ms/step , 58307.54 GFLOP/s , 532634.3 tokens/s INFO:__main__:2024-10-27 13:36:25 | Epoch: 3 | Step: 90750 | Dataset: 0-13416703 | Loss: 2.175 | 676 ms/step , 58161.83 GFLOP/s , 532973.5 tokens/s INFO:__main__:2024-10-27 13:36:33 | Epoch: 3 | Step: 90760 | Dataset: 0-13424703 | Loss: 2.087 | 676 ms/step , 58165.76 GFLOP/s , 532933.5 tokens/s INFO:__main__:2024-10-27 13:36:41 | Epoch: 3 | Step: 90770 | Dataset: 0-13432703 | Loss: 2.122 | 676 ms/step , 58132.55 GFLOP/s , 532707.5 tokens/s INFO:__main__:2024-10-27 13:36:48 | Epoch: 3 | Step: 90780 | Dataset: 0-13440703 | Loss: 2.122 | 675 ms/step , 58252.12 GFLOP/s , 532578.2 tokens/s INFO:__main__:2024-10-27 13:36:56 | Epoch: 3 | Step: 90790 | Dataset: 0-13448703 | Loss: 2.126 | 675 ms/step , 58264.50 GFLOP/s , 532809.7 tokens/s INFO:__main__:2024-10-27 13:37:04 | Epoch: 3 | Step: 90800 | Dataset: 0-13456703 | Loss: 2.050 | 675 ms/step , 58233.71 GFLOP/s , 533002.7 tokens/s INFO:__main__:2024-10-27 13:37:12 | Epoch: 3 | Step: 90810 | Dataset: 0-13464703 | Loss: 1.900 | 675 ms/step , 58228.49 GFLOP/s , 532231.6 tokens/s INFO:__main__:2024-10-27 13:37:19 | Epoch: 3 | Step: 90820 | Dataset: 0-13472703 | Loss: 1.758 | 675 ms/step , 58230.09 GFLOP/s , 532097.4 tokens/s INFO:__main__:2024-10-27 13:37:27 | Epoch: 3 | Step: 90830 | Dataset: 0-13480703 | Loss: 1.713 | 674 ms/step , 58298.67 GFLOP/s , 532165.9 tokens/s INFO:__main__:2024-10-27 13:37:35 | Epoch: 3 | Step: 90840 | Dataset: 0-13488703 | Loss: 1.703 | 675 
ms/step , 58276.75 GFLOP/s , 532053.8 tokens/s INFO:__main__:2024-10-27 13:37:42 | Epoch: 3 | Step: 90850 | Dataset: 0-13496703 | Loss: 1.680 | 674 ms/step , 58320.97 GFLOP/s , 531658.6 tokens/s INFO:__main__:2024-10-27 13:37:50 | Epoch: 3 | Step: 90860 | Dataset: 0-13504703 | Loss: 1.697 | 675 ms/step , 58220.46 GFLOP/s , 532359.5 tokens/s INFO:__main__:2024-10-27 13:37:58 | Epoch: 3 | Step: 90870 | Dataset: 0-13512703 | Loss: 1.670 | 676 ms/step , 58118.83 GFLOP/s , 531902.9 tokens/s INFO:__main__:2024-10-27 13:38:05 | Epoch: 3 | Step: 90880 | Dataset: 0-13520703 | Loss: 1.674 | 675 ms/step , 58259.04 GFLOP/s , 528582.8 tokens/s INFO:__main__:2024-10-27 13:38:13 | Epoch: 3 | Step: 90890 | Dataset: 0-13528703 | Loss: 1.666 | 675 ms/step , 58217.12 GFLOP/s , 530872.7 tokens/s INFO:__main__:2024-10-27 13:38:21 | Epoch: 3 | Step: 90900 | Dataset: 0-13536703 | Loss: 1.806 | 677 ms/step , 58102.14 GFLOP/s , 530921.1 tokens/s INFO:__main__:2024-10-27 13:38:29 | Epoch: 3 | Step: 90910 | Dataset: 0-13544703 | Loss: 1.748 | 676 ms/step , 58171.21 GFLOP/s , 531237.7 tokens/s INFO:__main__:2024-10-27 13:38:36 | Epoch: 3 | Step: 90920 | Dataset: 0-13552703 | Loss: 1.762 | 677 ms/step , 58086.70 GFLOP/s , 530665.2 tokens/s INFO:__main__:2024-10-27 13:38:44 | Epoch: 3 | Step: 90930 | Dataset: 0-13560703 | Loss: 1.727 | 676 ms/step , 58188.91 GFLOP/s , 530922.6 tokens/s INFO:__main__:2024-10-27 13:38:52 | Epoch: 3 | Step: 90940 | Dataset: 0-13568703 | Loss: 1.737 | 677 ms/step , 58058.30 GFLOP/s , 530923.9 tokens/s INFO:__main__:2024-10-27 13:38:59 | Epoch: 3 | Step: 90950 | Dataset: 0-13576703 | Loss: 1.722 | 677 ms/step , 58070.92 GFLOP/s , 531010.2 tokens/s INFO:__main__:2024-10-27 13:39:07 | Epoch: 3 | Step: 90960 | Dataset: 0-13584703 | Loss: 1.725 | 677 ms/step , 58072.63 GFLOP/s , 528758.2 tokens/s INFO:__main__:2024-10-27 13:39:15 | Epoch: 3 | Step: 90970 | Dataset: 0-13592703 | Loss: 1.730 | 677 ms/step , 58095.82 GFLOP/s , 530361.5 tokens/s INFO:__main__:2024-10-27 
13:39:23 | Epoch: 3 | Step: 90980 | Dataset: 0-13600703 | Loss: 1.739 | 675 ms/step , 58237.43 GFLOP/s , 532073.2 tokens/s INFO:__main__:2024-10-27 13:39:30 | Epoch: 3 | Step: 90990 | Dataset: 0-13608703 | Loss: 2.182 | 676 ms/step , 58137.12 GFLOP/s , 532370.8 tokens/s INFO:__main__:2024-10-27 13:39:38 | Validation | Step: 91000 | Val_loss: 2.231 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 13:39:38 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_133938_step_91000.pt` INFO:__main__:2024-10-27 13:39:39 | Epoch: 3 | Step: 91000 | Dataset: 0-13616703 | Loss: 2.189 | 674 ms/step , 58342.12 GFLOP/s , 480125.3 tokens/s INFO:__main__:2024-10-27 13:39:47 | Epoch: 3 | Step: 91010 | Dataset: 0-13624703 | Loss: 2.171 | 675 ms/step , 58200.52 GFLOP/s , 532545.7 tokens/s INFO:__main__:2024-10-27 13:39:54 | Epoch: 3 | Step: 91020 | Dataset: 0-13632703 | Loss: 2.208 | 675 ms/step , 58211.85 GFLOP/s , 532369.6 tokens/s INFO:__main__:2024-10-27 13:40:02 | Epoch: 3 | Step: 91030 | Dataset: 0-13640703 | Loss: 2.161 | 676 ms/step , 58118.76 GFLOP/s , 532287.8 tokens/s INFO:__main__:2024-10-27 13:40:10 | Epoch: 3 | Step: 91040 | Dataset: 0-13648703 | Loss: 2.097 | 675 ms/step , 58213.87 GFLOP/s , 532668.6 tokens/s INFO:__main__:2024-10-27 13:40:17 | Epoch: 3 | Step: 91050 | Dataset: 0-13656703 | Loss: 2.087 | 675 ms/step , 58241.54 GFLOP/s , 531944.6 tokens/s INFO:__main__:2024-10-27 13:40:25 | Epoch: 3 | Step: 91060 | Dataset: 0-13664703 | Loss: 2.188 | 676 ms/step , 58173.96 GFLOP/s , 532540.2 tokens/s INFO:__main__:2024-10-27 13:40:33 | Epoch: 3 | Step: 91070 | Dataset: 0-13672703 | Loss: 2.129 | 675 ms/step , 58220.46 GFLOP/s , 532204.0 tokens/s INFO:__main__:2024-10-27 13:40:40 | Epoch: 3 | Step: 91080 | Dataset: 0-13680703 | Loss: 2.125 | 675 ms/step , 58227.58 GFLOP/s , 532569.0 tokens/s INFO:__main__:2024-10-27 13:40:48 | Epoch: 3 | Step: 91090 | Dataset: 0-13688703 | Loss: 2.161 | 675 ms/step , 58205.10 GFLOP/s , 532461.6 tokens/s 
INFO:__main__:2024-10-27 13:40:56 | Epoch: 3 | Step: 91100 | Dataset: 0-13696703 | Loss: 2.132 | 674 ms/step , 58301.80 GFLOP/s , 532648.9 tokens/s INFO:__main__:2024-10-27 13:41:03 | Epoch: 3 | Step: 91110 | Dataset: 0-13704703 | Loss: 2.159 | 675 ms/step , 58257.64 GFLOP/s , 533074.4 tokens/s INFO:__main__:2024-10-27 13:41:11 | Epoch: 3 | Step: 91120 | Dataset: 0-13712703 | Loss: 2.082 | 675 ms/step , 58255.70 GFLOP/s , 532865.9 tokens/s INFO:__main__:2024-10-27 13:41:19 | Epoch: 3 | Step: 91130 | Dataset: 0-13720703 | Loss: 2.129 | 676 ms/step , 58162.78 GFLOP/s , 532464.5 tokens/s INFO:__main__:2024-10-27 13:41:27 | Epoch: 3 | Step: 91140 | Dataset: 0-13728703 | Loss: 2.110 | 678 ms/step , 57951.83 GFLOP/s , 531758.8 tokens/s INFO:__main__:2024-10-27 13:41:34 | Epoch: 3 | Step: 91150 | Dataset: 0-13736703 | Loss: 2.147 | 677 ms/step , 58030.91 GFLOP/s , 531148.5 tokens/s INFO:__main__:2024-10-27 13:41:42 | Epoch: 3 | Step: 91160 | Dataset: 0-13744703 | Loss: 2.186 | 679 ms/step , 57863.02 GFLOP/s , 529987.8 tokens/s INFO:__main__:2024-10-27 13:41:50 | Epoch: 3 | Step: 91170 | Dataset: 0-13752703 | Loss: 2.169 | 675 ms/step , 58208.17 GFLOP/s , 531843.5 tokens/s INFO:__main__:2024-10-27 13:41:57 | Epoch: 3 | Step: 91180 | Dataset: 0-13760703 | Loss: 2.197 | 675 ms/step , 58235.53 GFLOP/s , 532719.4 tokens/s INFO:__main__:2024-10-27 13:42:05 | Epoch: 3 | Step: 91190 | Dataset: 0-13768703 | Loss: 2.202 | 676 ms/step , 58179.38 GFLOP/s , 532681.9 tokens/s INFO:__main__:2024-10-27 13:42:13 | Epoch: 3 | Step: 91200 | Dataset: 0-13776703 | Loss: 2.164 | 674 ms/step , 58362.11 GFLOP/s , 533203.0 tokens/s INFO:__main__:2024-10-27 13:42:20 | Epoch: 3 | Step: 91210 | Dataset: 0-13784703 | Loss: 2.189 | 675 ms/step , 58213.98 GFLOP/s , 532179.5 tokens/s INFO:__main__:2024-10-27 13:42:28 | Epoch: 3 | Step: 91220 | Dataset: 0-13792703 | Loss: 2.182 | 675 ms/step , 58263.64 GFLOP/s , 532654.4 tokens/s INFO:__main__:2024-10-27 13:42:36 | Epoch: 3 | Step: 91230 | Dataset: 
0-13800703 | Loss: 2.195 | 676 ms/step , 58161.20 GFLOP/s , 532440.8 tokens/s INFO:__main__:2024-10-27 13:42:44 | Epoch: 3 | Step: 91240 | Dataset: 0-13808703 | Loss: 2.119 | 674 ms/step , 58284.66 GFLOP/s , 533150.8 tokens/s INFO:__main__:2024-10-27 13:42:51 | Epoch: 3 | Step: 91250 | Dataset: 0-13816703 | Loss: 2.098 | 674 ms/step , 58312.32 GFLOP/s , 533165.5 tokens/s INFO:__main__:2024-10-27 13:42:59 | Epoch: 3 | Step: 91260 | Dataset: 0-13824703 | Loss: 2.155 | 677 ms/step , 58035.32 GFLOP/s , 532808.0 tokens/s INFO:__main__:2024-10-27 13:43:07 | Epoch: 3 | Step: 91270 | Dataset: 0-13832703 | Loss: 2.191 | 677 ms/step , 58029.98 GFLOP/s , 530762.8 tokens/s INFO:__main__:2024-10-27 13:43:14 | Epoch: 3 | Step: 91280 | Dataset: 0-13840703 | Loss: 2.124 | 674 ms/step , 58331.93 GFLOP/s , 532804.5 tokens/s INFO:__main__:2024-10-27 13:43:22 | Epoch: 3 | Step: 91290 | Dataset: 0-13848703 | Loss: 2.099 | 674 ms/step , 58325.03 GFLOP/s , 533114.3 tokens/s INFO:__main__:2024-10-27 13:43:30 | Epoch: 3 | Step: 91300 | Dataset: 0-13856703 | Loss: 2.134 | 674 ms/step , 58305.69 GFLOP/s , 533138.8 tokens/s INFO:__main__:2024-10-27 13:43:37 | Epoch: 3 | Step: 91310 | Dataset: 0-13864703 | Loss: 2.224 | 674 ms/step , 58288.48 GFLOP/s , 533116.1 tokens/s INFO:__main__:2024-10-27 13:43:45 | Epoch: 3 | Step: 91320 | Dataset: 0-13872703 | Loss: 2.108 | 675 ms/step , 58276.70 GFLOP/s , 532542.6 tokens/s INFO:__main__:2024-10-27 13:43:53 | Epoch: 3 | Step: 91330 | Dataset: 0-13880703 | Loss: 2.163 | 676 ms/step , 58171.54 GFLOP/s , 532552.5 tokens/s INFO:__main__:2024-10-27 13:44:00 | Epoch: 3 | Step: 91340 | Dataset: 0-13888703 | Loss: 2.047 | 674 ms/step , 58279.86 GFLOP/s , 532489.4 tokens/s INFO:__main__:2024-10-27 13:44:08 | Epoch: 3 | Step: 91350 | Dataset: 0-13896703 | Loss: 2.171 | 675 ms/step , 58210.67 GFLOP/s , 532589.5 tokens/s INFO:__main__:2024-10-27 13:44:16 | Epoch: 3 | Step: 91360 | Dataset: 0-13904703 | Loss: 2.137 | 674 ms/step , 58337.91 GFLOP/s , 532932.0 
tokens/s INFO:__main__:2024-10-27 13:44:23 | Epoch: 3 | Step: 91370 | Dataset: 0-13912703 | Loss: 2.136 | 675 ms/step , 58204.47 GFLOP/s , 532734.7 tokens/s INFO:__main__:2024-10-27 13:44:31 | Epoch: 3 | Step: 91380 | Dataset: 0-13920703 | Loss: 2.160 | 675 ms/step , 58247.82 GFLOP/s , 532858.7 tokens/s INFO:__main__:2024-10-27 13:44:39 | Epoch: 3 | Step: 91390 | Dataset: 0-13928703 | Loss: 2.142 | 675 ms/step , 58244.39 GFLOP/s , 532258.0 tokens/s INFO:__main__:2024-10-27 13:44:47 | Epoch: 3 | Step: 91400 | Dataset: 0-13936703 | Loss: 2.095 | 675 ms/step , 58244.23 GFLOP/s , 532622.9 tokens/s INFO:__main__:2024-10-27 13:44:54 | Epoch: 3 | Step: 91410 | Dataset: 0-13944703 | Loss: 2.131 | 675 ms/step , 58215.81 GFLOP/s , 532745.8 tokens/s INFO:__main__:2024-10-27 13:45:02 | Epoch: 3 | Step: 91420 | Dataset: 0-13952703 | Loss: 2.115 | 675 ms/step , 58267.92 GFLOP/s , 532576.6 tokens/s INFO:__main__:2024-10-27 13:45:10 | Epoch: 3 | Step: 91430 | Dataset: 0-13960703 | Loss: 2.140 | 675 ms/step , 58261.20 GFLOP/s , 532433.1 tokens/s INFO:__main__:2024-10-27 13:45:17 | Epoch: 3 | Step: 91440 | Dataset: 0-13968703 | Loss: 2.092 | 676 ms/step , 58134.85 GFLOP/s , 532245.1 tokens/s INFO:__main__:2024-10-27 13:45:25 | Epoch: 3 | Step: 91450 | Dataset: 0-13976703 | Loss: 2.105 | 675 ms/step , 58201.58 GFLOP/s , 532399.4 tokens/s INFO:__main__:2024-10-27 13:45:33 | Epoch: 3 | Step: 91460 | Dataset: 0-13984703 | Loss: 2.127 | 674 ms/step , 58287.92 GFLOP/s , 532596.7 tokens/s INFO:__main__:2024-10-27 13:45:40 | Epoch: 3 | Step: 91470 | Dataset: 0-13992703 | Loss: 2.131 | 675 ms/step , 58215.98 GFLOP/s , 532306.3 tokens/s INFO:__main__:2024-10-27 13:45:48 | Epoch: 3 | Step: 91480 | Dataset: 0-14000703 | Loss: 2.004 | 674 ms/step , 58351.42 GFLOP/s , 532236.1 tokens/s INFO:__main__:2024-10-27 13:45:56 | Epoch: 3 | Step: 91490 | Dataset: 0-14008703 | Loss: 2.030 | 675 ms/step , 58233.88 GFLOP/s , 532812.8 tokens/s INFO:__main__:2024-10-27 13:46:03 | Epoch: 3 | Step: 91500 | 
Dataset: 0-14016703 | Loss: 2.015 | 675 ms/step , 58229.03 GFLOP/s , 532817.7 tokens/s INFO:__main__:2024-10-27 13:46:11 | Epoch: 3 | Step: 91510 | Dataset: 0-14024703 | Loss: 2.010 | 674 ms/step , 58325.34 GFLOP/s , 532746.8 tokens/s INFO:__main__:2024-10-27 13:46:19 | Epoch: 3 | Step: 91520 | Dataset: 0-14032703 | Loss: 1.973 | 674 ms/step , 58316.54 GFLOP/s , 533381.8 tokens/s INFO:__main__:2024-10-27 13:46:27 | Epoch: 3 | Step: 91530 | Dataset: 0-14040703 | Loss: 1.979 | 675 ms/step , 58275.83 GFLOP/s , 532794.3 tokens/s INFO:__main__:2024-10-27 13:46:34 | Epoch: 3 | Step: 91540 | Dataset: 0-14048703 | Loss: 1.970 | 674 ms/step , 58280.85 GFLOP/s , 532910.5 tokens/s INFO:__main__:2024-10-27 13:46:42 | Epoch: 3 | Step: 91550 | Dataset: 0-14056703 | Loss: 2.029 | 675 ms/step , 58228.22 GFLOP/s , 533102.6 tokens/s INFO:__main__:2024-10-27 13:46:50 | Epoch: 3 | Step: 91560 | Dataset: 0-14064703 | Loss: 1.959 | 676 ms/step , 58191.39 GFLOP/s , 532271.3 tokens/s INFO:__main__:2024-10-27 13:46:57 | Epoch: 3 | Step: 91570 | Dataset: 0-14072703 | Loss: 1.991 | 675 ms/step , 58208.09 GFLOP/s , 532366.5 tokens/s INFO:__main__:2024-10-27 13:47:05 | Epoch: 3 | Step: 91580 | Dataset: 0-14080703 | Loss: 1.948 | 674 ms/step , 58286.66 GFLOP/s , 532457.7 tokens/s INFO:__main__:2024-10-27 13:47:13 | Epoch: 3 | Step: 91590 | Dataset: 0-14088703 | Loss: 1.987 | 676 ms/step , 58138.41 GFLOP/s , 532471.5 tokens/s INFO:__main__:2024-10-27 13:47:20 | Epoch: 3 | Step: 91600 | Dataset: 0-14096703 | Loss: 1.917 | 674 ms/step , 58283.54 GFLOP/s , 532843.0 tokens/s INFO:__main__:2024-10-27 13:47:28 | Epoch: 3 | Step: 91610 | Dataset: 0-14104703 | Loss: 2.014 | 675 ms/step , 58195.47 GFLOP/s , 532229.5 tokens/s INFO:__main__:2024-10-27 13:47:36 | Epoch: 3 | Step: 91620 | Dataset: 0-14112703 | Loss: 1.967 | 676 ms/step , 58160.87 GFLOP/s , 532080.1 tokens/s INFO:__main__:2024-10-27 13:47:43 | Epoch: 3 | Step: 91630 | Dataset: 0-14120703 | Loss: 2.328 | 676 ms/step , 58185.17 GFLOP/s , 
532100.9 tokens/s INFO:__main__:2024-10-27 13:47:51 | Epoch: 3 | Step: 91640 | Dataset: 0-14128703 | Loss: 2.203 | 676 ms/step , 58116.15 GFLOP/s , 532449.3 tokens/s INFO:__main__:2024-10-27 13:47:59 | Epoch: 3 | Step: 91650 | Dataset: 0-14136703 | Loss: 2.181 | 676 ms/step , 58146.53 GFLOP/s , 532379.4 tokens/s INFO:__main__:2024-10-27 13:48:07 | Epoch: 3 | Step: 91660 | Dataset: 0-14144703 | Loss: 2.168 | 677 ms/step , 58078.33 GFLOP/s , 531925.0 tokens/s INFO:__main__:2024-10-27 13:48:14 | Epoch: 3 | Step: 91670 | Dataset: 0-14152703 | Loss: 2.234 | 676 ms/step , 58182.99 GFLOP/s , 531915.4 tokens/s INFO:__main__:2024-10-27 13:48:22 | Epoch: 3 | Step: 91680 | Dataset: 0-14160703 | Loss: 2.158 | 675 ms/step , 58203.76 GFLOP/s , 532146.9 tokens/s INFO:__main__:2024-10-27 13:48:30 | Epoch: 3 | Step: 91690 | Dataset: 0-14168703 | Loss: 2.130 | 676 ms/step , 58107.96 GFLOP/s , 532170.8 tokens/s INFO:__main__:2024-10-27 13:48:37 | Epoch: 3 | Step: 91700 | Dataset: 0-14176703 | Loss: 2.088 | 676 ms/step , 58171.12 GFLOP/s , 532146.0 tokens/s INFO:__main__:2024-10-27 13:48:45 | Epoch: 3 | Step: 91710 | Dataset: 0-14184703 | Loss: 2.177 | 675 ms/step , 58233.17 GFLOP/s , 532215.2 tokens/s INFO:__main__:2024-10-27 13:48:53 | Epoch: 3 | Step: 91720 | Dataset: 0-14192703 | Loss: 2.087 | 675 ms/step , 58252.60 GFLOP/s , 532306.6 tokens/s INFO:__main__:2024-10-27 13:49:00 | Epoch: 3 | Step: 91730 | Dataset: 0-14200703 | Loss: 2.100 | 675 ms/step , 58248.24 GFLOP/s , 532542.6 tokens/s INFO:__main__:2024-10-27 13:49:08 | Epoch: 3 | Step: 91740 | Dataset: 0-14208703 | Loss: 2.071 | 675 ms/step , 58211.27 GFLOP/s , 532093.5 tokens/s INFO:__main__:2024-10-27 13:49:16 | Epoch: 3 | Step: 91750 | Dataset: 0-14216703 | Loss: 2.095 | 674 ms/step , 58299.68 GFLOP/s , 532607.5 tokens/s INFO:__main__:2024-10-27 13:49:23 | Epoch: 3 | Step: 91760 | Dataset: 0-14224703 | Loss: 2.066 | 675 ms/step , 58252.91 GFLOP/s , 532800.9 tokens/s INFO:__main__:2024-10-27 13:49:31 | Epoch: 3 | Step: 
91770 | Dataset: 0-14232703 | Loss: 2.117 | 675 ms/step , 58193.89 GFLOP/s , 532639.1 tokens/s INFO:__main__:2024-10-27 13:49:39 | Epoch: 3 | Step: 91780 | Dataset: 0-14240703 | Loss: 2.132 | 675 ms/step , 58242.70 GFLOP/s , 532180.5 tokens/s INFO:__main__:2024-10-27 13:49:47 | Epoch: 3 | Step: 91790 | Dataset: 0-14248703 | Loss: 1.923 | 675 ms/step , 58235.31 GFLOP/s , 531959.0 tokens/s INFO:__main__:2024-10-27 13:49:54 | Epoch: 3 | Step: 91800 | Dataset: 0-14256703 | Loss: 1.772 | 674 ms/step , 58281.94 GFLOP/s , 531988.4 tokens/s INFO:__main__:2024-10-27 13:50:02 | Epoch: 3 | Step: 91810 | Dataset: 0-14264703 | Loss: 1.705 | 676 ms/step , 58141.85 GFLOP/s , 532193.5 tokens/s INFO:__main__:2024-10-27 13:50:10 | Epoch: 3 | Step: 91820 | Dataset: 0-14272703 | Loss: 1.679 | 674 ms/step , 58282.74 GFLOP/s , 532126.2 tokens/s INFO:__main__:2024-10-27 13:50:17 | Epoch: 3 | Step: 91830 | Dataset: 0-14280703 | Loss: 1.710 | 677 ms/step , 58023.55 GFLOP/s , 530834.6 tokens/s INFO:__main__:2024-10-27 13:50:25 | Epoch: 3 | Step: 91840 | Dataset: 0-14288703 | Loss: 1.676 | 677 ms/step , 58029.41 GFLOP/s , 530669.0 tokens/s INFO:__main__:2024-10-27 13:50:33 | Epoch: 3 | Step: 91850 | Dataset: 0-14296703 | Loss: 1.688 | 678 ms/step , 57999.63 GFLOP/s , 530553.4 tokens/s INFO:__main__:2024-10-27 13:50:41 | Epoch: 3 | Step: 91860 | Dataset: 0-14304703 | Loss: 1.695 | 677 ms/step , 58031.08 GFLOP/s , 530581.9 tokens/s INFO:__main__:2024-10-27 13:50:48 | Epoch: 3 | Step: 91870 | Dataset: 0-14312703 | Loss: 1.643 | 673 ms/step , 58370.91 GFLOP/s , 532634.4 tokens/s INFO:__main__:2024-10-27 13:50:56 | Epoch: 3 | Step: 91880 | Dataset: 0-14320703 | Loss: 2.264 | 674 ms/step , 58302.30 GFLOP/s , 533395.9 tokens/s INFO:__main__:2024-10-27 13:51:04 | Epoch: 3 | Step: 91890 | Dataset: 0-14328703 | Loss: 2.246 | 674 ms/step , 58320.19 GFLOP/s , 533455.7 tokens/s INFO:__main__:2024-10-27 13:51:11 | Epoch: 3 | Step: 91900 | Dataset: 0-14336703 | Loss: 2.250 | 674 ms/step , 58338.34 GFLOP/s 
, 533562.0 tokens/s INFO:__main__:2024-10-27 13:51:19 | Epoch: 3 | Step: 91910 | Dataset: 0-14344703 | Loss: 2.245 | 675 ms/step , 58237.10 GFLOP/s , 532281.9 tokens/s INFO:__main__:2024-10-27 13:51:27 | Epoch: 3 | Step: 91920 | Dataset: 0-14352703 | Loss: 2.189 | 676 ms/step , 58188.99 GFLOP/s , 532257.3 tokens/s INFO:__main__:2024-10-27 13:51:34 | Epoch: 3 | Step: 91930 | Dataset: 0-14360703 | Loss: 2.192 | 674 ms/step , 58282.12 GFLOP/s , 532572.6 tokens/s INFO:__main__:2024-10-27 13:51:42 | Epoch: 3 | Step: 91940 | Dataset: 0-14368703 | Loss: 2.104 | 674 ms/step , 58328.87 GFLOP/s , 532941.4 tokens/s INFO:__main__:2024-10-27 13:51:50 | Epoch: 3 | Step: 91950 | Dataset: 0-14376703 | Loss: 2.183 | 675 ms/step , 58195.65 GFLOP/s , 532744.9 tokens/s INFO:__main__:2024-10-27 13:51:57 | Epoch: 3 | Step: 91960 | Dataset: 0-14384703 | Loss: 2.130 | 676 ms/step , 58139.30 GFLOP/s , 532444.5 tokens/s INFO:__main__:2024-10-27 13:52:05 | Epoch: 3 | Step: 91970 | Dataset: 0-14392703 | Loss: 2.155 | 676 ms/step , 58118.41 GFLOP/s , 532648.4 tokens/s INFO:__main__:2024-10-27 13:52:13 | Epoch: 3 | Step: 91980 | Dataset: 0-14400703 | Loss: 2.105 | 674 ms/step , 58305.99 GFLOP/s , 532441.5 tokens/s INFO:__main__:2024-10-27 13:52:20 | Epoch: 3 | Step: 91990 | Dataset: 0-14408703 | Loss: 2.189 | 675 ms/step , 58219.24 GFLOP/s , 533041.4 tokens/s INFO:__main__:2024-10-27 13:52:28 | Validation | Step: 92000 | Val_loss: 2.114 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 13:52:28 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_135228_step_92000.pt` INFO:__main__:2024-10-27 13:52:29 | Epoch: 3 | Step: 92000 | Dataset: 0-14416703 | Loss: 2.142 | 674 ms/step , 58352.49 GFLOP/s , 479578.1 tokens/s INFO:__main__:2024-10-27 13:52:37 | Epoch: 3 | Step: 92010 | Dataset: 0-14424703 | Loss: 2.107 | 675 ms/step , 58240.73 GFLOP/s , 532108.4 tokens/s INFO:__main__:2024-10-27 13:52:44 | Epoch: 3 | Step: 92020 | Dataset: 0-14432703 | Loss: 2.113 | 673 ms/step , 58393.73 
GFLOP/s , 532943.3 tokens/s INFO:__main__:2024-10-27 13:52:52 | Epoch: 3 | Step: 92030 | Dataset: 0-14440703 | Loss: 2.183 | 675 ms/step , 58257.11 GFLOP/s , 532925.1 tokens/s INFO:__main__:2024-10-27 13:53:00 | Epoch: 3 | Step: 92040 | Dataset: 0-14448703 | Loss: 1.736 | 676 ms/step , 58120.92 GFLOP/s , 532666.8 tokens/s INFO:__main__:2024-10-27 13:53:07 | Epoch: 3 | Step: 92050 | Dataset: 0-14456703 | Loss: 1.658 | 674 ms/step , 58286.73 GFLOP/s , 532401.0 tokens/s INFO:__main__:2024-10-27 13:53:15 | Epoch: 3 | Step: 92060 | Dataset: 0-14464703 | Loss: 1.676 | 675 ms/step , 58224.64 GFLOP/s , 532792.3 tokens/s INFO:__main__:2024-10-27 13:53:23 | Epoch: 3 | Step: 92070 | Dataset: 0-14472703 | Loss: 1.642 | 674 ms/step , 58303.68 GFLOP/s , 532648.0 tokens/s INFO:__main__:2024-10-27 13:53:31 | Epoch: 3 | Step: 92080 | Dataset: 0-14480703 | Loss: 1.657 | 674 ms/step , 58330.89 GFLOP/s , 532787.2 tokens/s INFO:__main__:2024-10-27 13:53:38 | Epoch: 3 | Step: 92090 | Dataset: 0-14488703 | Loss: 1.636 | 677 ms/step , 58031.69 GFLOP/s , 530286.3 tokens/s INFO:__main__:2024-10-27 13:53:46 | Epoch: 3 | Step: 92100 | Dataset: 0-14496703 | Loss: 1.628 | 674 ms/step , 58314.32 GFLOP/s , 532761.1 tokens/s INFO:__main__:2024-10-27 13:53:54 | Epoch: 3 | Step: 92110 | Dataset: 0-14504703 | Loss: 1.653 | 676 ms/step , 58139.12 GFLOP/s , 532291.2 tokens/s INFO:__main__:2024-10-27 13:54:01 | Epoch: 3 | Step: 92120 | Dataset: 0-14512703 | Loss: 1.647 | 675 ms/step , 58240.74 GFLOP/s , 532333.4 tokens/s INFO:__main__:2024-10-27 13:54:09 | Epoch: 3 | Step: 92130 | Dataset: 0-14520703 | Loss: 2.264 | 675 ms/step , 58227.62 GFLOP/s , 532857.8 tokens/s INFO:__main__:2024-10-27 13:54:17 | Epoch: 3 | Step: 92140 | Dataset: 0-14528703 | Loss: 2.177 | 676 ms/step , 58172.67 GFLOP/s , 532711.1 tokens/s INFO:__main__:2024-10-27 13:54:24 | Epoch: 3 | Step: 92150 | Dataset: 0-14536703 | Loss: 2.055 | 674 ms/step , 58289.02 GFLOP/s , 533316.9 tokens/s INFO:__main__:2024-10-27 13:54:32 | Epoch: 3 | 
Step: 92160 | Dataset: 0-14544703 | Loss: 2.158 | 674 ms/step , 58292.93 GFLOP/s , 533137.1 tokens/s INFO:__main__:2024-10-27 13:54:40 | Epoch: 3 | Step: 92170 | Dataset: 0-14552703 | Loss: 2.174 | 676 ms/step , 58159.61 GFLOP/s , 532772.3 tokens/s INFO:__main__:2024-10-27 13:54:47 | Epoch: 3 | Step: 92180 | Dataset: 0-14560703 | Loss: 2.109 | 675 ms/step , 58271.38 GFLOP/s , 532544.7 tokens/s INFO:__main__:2024-10-27 13:54:55 | Epoch: 3 | Step: 92190 | Dataset: 0-14568703 | Loss: 2.082 | 675 ms/step , 58259.85 GFLOP/s , 532916.7 tokens/s INFO:__main__:2024-10-27 13:55:03 | Epoch: 3 | Step: 92200 | Dataset: 0-14576703 | Loss: 2.167 | 675 ms/step , 58233.49 GFLOP/s , 532311.9 tokens/s INFO:__main__:2024-10-27 13:55:11 | Epoch: 3 | Step: 92210 | Dataset: 0-14584703 | Loss: 2.134 | 675 ms/step , 58252.43 GFLOP/s , 532309.5 tokens/s INFO:__main__:2024-10-27 13:55:18 | Epoch: 3 | Step: 92220 | Dataset: 0-14592703 | Loss: 2.173 | 676 ms/step , 58146.38 GFLOP/s , 532180.8 tokens/s INFO:__main__:2024-10-27 13:55:26 | Epoch: 3 | Step: 92230 | Dataset: 0-14600703 | Loss: 2.173 | 674 ms/step , 58304.42 GFLOP/s , 532448.0 tokens/s INFO:__main__:2024-10-27 13:55:34 | Epoch: 3 | Step: 92240 | Dataset: 0-14608703 | Loss: 2.041 | 676 ms/step , 58157.37 GFLOP/s , 531723.9 tokens/s INFO:__main__:2024-10-27 13:55:41 | Epoch: 3 | Step: 92250 | Dataset: 0-14616703 | Loss: 2.122 | 676 ms/step , 58123.31 GFLOP/s , 530974.8 tokens/s INFO:__main__:2024-10-27 13:55:49 | Epoch: 3 | Step: 92260 | Dataset: 0-14624703 | Loss: 2.148 | 675 ms/step , 58255.98 GFLOP/s , 531198.9 tokens/s INFO:__main__:2024-10-27 13:55:57 | Epoch: 3 | Step: 92270 | Dataset: 0-14632703 | Loss: 2.123 | 675 ms/step , 58214.27 GFLOP/s , 531528.2 tokens/s INFO:__main__:2024-10-27 13:56:04 | Epoch: 3 | Step: 92280 | Dataset: 0-14640703 | Loss: 2.059 | 674 ms/step , 58297.09 GFLOP/s , 531822.6 tokens/s INFO:__main__:2024-10-27 13:56:12 | Epoch: 3 | Step: 92290 | Dataset: 0-14648703 | Loss: 2.174 | 675 ms/step , 58222.95 
GFLOP/s , 531827.7 tokens/s INFO:__main__:2024-10-27 13:56:20 | Epoch: 3 | Step: 92300 | Dataset: 0-14656703 | Loss: 2.221 | 686 ms/step , 57298.09 GFLOP/s , 530569.9 tokens/s INFO:__main__:2024-10-27 13:56:28 | Epoch: 3 | Step: 92310 | Dataset: 0-14664703 | Loss: 2.199 | 677 ms/step , 58095.17 GFLOP/s , 528990.5 tokens/s INFO:__main__:2024-10-27 13:56:35 | Epoch: 3 | Step: 92320 | Dataset: 0-14672703 | Loss: 2.143 | 677 ms/step , 58067.70 GFLOP/s , 531302.0 tokens/s INFO:__main__:2024-10-27 13:56:43 | Epoch: 3 | Step: 92330 | Dataset: 0-14680703 | Loss: 2.196 | 674 ms/step , 58288.11 GFLOP/s , 532460.9 tokens/s INFO:__main__:2024-10-27 13:56:51 | Epoch: 3 | Step: 92340 | Dataset: 0-14688703 | Loss: 2.101 | 675 ms/step , 58199.09 GFLOP/s , 532504.2 tokens/s INFO:__main__:2024-10-27 13:56:58 | Epoch: 3 | Step: 92350 | Dataset: 0-14696703 | Loss: 2.238 | 675 ms/step , 58217.66 GFLOP/s , 532231.8 tokens/s INFO:__main__:2024-10-27 13:57:06 | Epoch: 3 | Step: 92360 | Dataset: 0-14704703 | Loss: 2.090 | 674 ms/step , 58313.84 GFLOP/s , 532376.9 tokens/s INFO:__main__:2024-10-27 13:57:14 | Epoch: 3 | Step: 92370 | Dataset: 0-14712703 | Loss: 2.113 | 675 ms/step , 58274.57 GFLOP/s , 532986.8 tokens/s INFO:__main__:2024-10-27 13:57:22 | Epoch: 3 | Step: 92380 | Dataset: 0-14720703 | Loss: 2.158 | 675 ms/step , 58263.02 GFLOP/s , 532024.3 tokens/s INFO:__main__:2024-10-27 13:57:29 | Epoch: 3 | Step: 92390 | Dataset: 0-14728703 | Loss: 2.094 | 675 ms/step , 58198.84 GFLOP/s , 532067.0 tokens/s INFO:__main__:2024-10-27 13:57:37 | Epoch: 3 | Step: 92400 | Dataset: 0-14736703 | Loss: 2.161 | 677 ms/step , 58098.41 GFLOP/s , 530936.5 tokens/s INFO:__main__:2024-10-27 13:57:45 | Epoch: 3 | Step: 92410 | Dataset: 0-14744703 | Loss: 2.138 | 676 ms/step , 58126.45 GFLOP/s , 532139.5 tokens/s INFO:__main__:2024-10-27 13:57:52 | Epoch: 3 | Step: 92420 | Dataset: 0-14752703 | Loss: 2.026 | 676 ms/step , 58128.37 GFLOP/s , 531987.2 tokens/s INFO:__main__:2024-10-27 13:58:00 | Epoch: 3 | 
Step: 92430 | Dataset: 0-14760703 | Loss: 2.155 | 675 ms/step , 58194.05 GFLOP/s , 532109.0 tokens/s INFO:__main__:2024-10-27 13:58:08 | Epoch: 3 | Step: 92440 | Dataset: 0-14768703 | Loss: 2.125 | 675 ms/step , 58193.06 GFLOP/s , 532262.6 tokens/s INFO:__main__:2024-10-27 13:58:15 | Epoch: 3 | Step: 92450 | Dataset: 0-14776703 | Loss: 2.218 | 675 ms/step , 58197.92 GFLOP/s , 532743.0 tokens/s INFO:__main__:2024-10-27 13:58:23 | Epoch: 3 | Step: 92460 | Dataset: 0-14784703 | Loss: 2.128 | 675 ms/step , 58266.28 GFLOP/s , 532846.3 tokens/s INFO:__main__:2024-10-27 13:58:31 | Epoch: 3 | Step: 92470 | Dataset: 0-14792703 | Loss: 2.140 | 674 ms/step , 58299.56 GFLOP/s , 532621.1 tokens/s INFO:__main__:2024-10-27 13:58:38 | Epoch: 3 | Step: 92480 | Dataset: 0-14800703 | Loss: 2.070 | 674 ms/step , 58303.18 GFLOP/s , 533073.3 tokens/s INFO:__main__:2024-10-27 13:58:46 | Epoch: 3 | Step: 92490 | Dataset: 0-14808703 | Loss: 2.181 | 674 ms/step , 58312.92 GFLOP/s , 533040.9 tokens/s INFO:__main__:2024-10-27 13:58:54 | Epoch: 3 | Step: 92500 | Dataset: 0-14816703 | Loss: 2.133 | 675 ms/step , 58218.69 GFLOP/s , 532447.2 tokens/s INFO:__main__:2024-10-27 13:59:02 | Epoch: 3 | Step: 92510 | Dataset: 0-14824703 | Loss: 2.130 | 676 ms/step , 58129.08 GFLOP/s , 532426.6 tokens/s INFO:__main__:2024-10-27 13:59:09 | Epoch: 3 | Step: 92520 | Dataset: 0-14832703 | Loss: 2.173 | 676 ms/step , 58180.03 GFLOP/s , 532291.3 tokens/s INFO:__main__:2024-10-27 13:59:17 | Epoch: 3 | Step: 92530 | Dataset: 0-14840703 | Loss: 2.037 | 676 ms/step , 58168.77 GFLOP/s , 532580.0 tokens/s INFO:__main__:2024-10-27 13:59:25 | Epoch: 3 | Step: 92540 | Dataset: 0-14848703 | Loss: 2.145 | 675 ms/step , 58227.62 GFLOP/s , 532337.7 tokens/s INFO:__main__:2024-10-27 13:59:32 | Epoch: 3 | Step: 92550 | Dataset: 0-14856703 | Loss: 2.159 | 675 ms/step , 58241.16 GFLOP/s , 532284.0 tokens/s INFO:__main__:2024-10-27 13:59:40 | Epoch: 3 | Step: 92560 | Dataset: 0-14864703 | Loss: 2.162 | 674 ms/step , 58283.67 
GFLOP/s , 531855.5 tokens/s INFO:__main__:2024-10-27 13:59:48 | Epoch: 3 | Step: 92570 | Dataset: 0-14872703 | Loss: 2.146 | 675 ms/step , 58229.59 GFLOP/s , 532390.2 tokens/s INFO:__main__:2024-10-27 13:59:55 | Epoch: 3 | Step: 92580 | Dataset: 0-14880703 | Loss: 2.070 | 674 ms/step , 58280.19 GFLOP/s , 532851.9 tokens/s INFO:__main__:2024-10-27 14:00:03 | Epoch: 3 | Step: 92590 | Dataset: 0-14888703 | Loss: 2.195 | 674 ms/step , 58307.26 GFLOP/s , 534772.1 tokens/s INFO:__main__:2024-10-27 14:00:11 | Epoch: 3 | Step: 92600 | Dataset: 0-14896703 | Loss: 2.111 | 675 ms/step , 58265.90 GFLOP/s , 532883.2 tokens/s INFO:__main__:2024-10-27 14:00:18 | Epoch: 3 | Step: 92610 | Dataset: 0-14904703 | Loss: 2.208 | 674 ms/step , 58314.36 GFLOP/s , 533022.4 tokens/s INFO:__main__:2024-10-27 14:00:26 | Epoch: 3 | Step: 92620 | Dataset: 0-14912703 | Loss: 2.147 | 675 ms/step , 58260.81 GFLOP/s , 532674.2 tokens/s INFO:__main__:2024-10-27 14:00:34 | Epoch: 3 | Step: 92630 | Dataset: 0-14920703 | Loss: 2.220 | 675 ms/step , 58229.39 GFLOP/s , 532588.8 tokens/s INFO:__main__:2024-10-27 14:00:41 | Epoch: 3 | Step: 92640 | Dataset: 0-14928703 | Loss: 2.167 | 675 ms/step , 58227.59 GFLOP/s , 532386.9 tokens/s INFO:__main__:2024-10-27 14:00:49 | Epoch: 3 | Step: 92650 | Dataset: 0-14936703 | Loss: 2.213 | 676 ms/step , 58154.64 GFLOP/s , 532423.4 tokens/s INFO:__main__:2024-10-27 14:00:57 | Epoch: 3 | Step: 92660 | Dataset: 0-14944703 | Loss: 2.109 | 676 ms/step , 58183.28 GFLOP/s , 532456.8 tokens/s INFO:__main__:2024-10-27 14:01:05 | Epoch: 3 | Step: 92670 | Dataset: 0-14952703 | Loss: 2.142 | 675 ms/step , 58249.58 GFLOP/s , 532325.5 tokens/s INFO:__main__:2024-10-27 14:01:12 | Epoch: 3 | Step: 92680 | Dataset: 0-14960703 | Loss: 2.161 | 676 ms/step , 58183.47 GFLOP/s , 532679.5 tokens/s INFO:__main__:2024-10-27 14:01:20 | Epoch: 3 | Step: 92690 | Dataset: 0-14968703 | Loss: 2.096 | 675 ms/step , 58247.03 GFLOP/s , 532489.9 tokens/s INFO:__main__:2024-10-27 14:01:28 | Epoch: 3 | 
Step: 92700 | Dataset: 0-14976703 | Loss: 2.128 | 674 ms/step , 58294.10 GFLOP/s , 532644.9 tokens/s INFO:__main__:2024-10-27 14:01:35 | Epoch: 3 | Step: 92710 | Dataset: 0-14984703 | Loss: 2.192 | 675 ms/step , 58270.00 GFLOP/s , 532669.3 tokens/s INFO:__main__:2024-10-27 14:01:43 | Epoch: 3 | Step: 92720 | Dataset: 0-14992703 | Loss: 2.131 | 674 ms/step , 58313.52 GFLOP/s , 532866.6 tokens/s INFO:__main__:2024-10-27 14:01:51 | Epoch: 3 | Step: 92730 | Dataset: 0-15000703 | Loss: 2.115 | 675 ms/step , 58226.20 GFLOP/s , 532468.4 tokens/s INFO:__main__:2024-10-27 14:01:58 | Epoch: 3 | Step: 92740 | Dataset: 0-15008703 | Loss: 2.166 | 674 ms/step , 58344.30 GFLOP/s , 532765.7 tokens/s INFO:__main__:2024-10-27 14:02:06 | Epoch: 3 | Step: 92750 | Dataset: 0-15016703 | Loss: 2.150 | 674 ms/step , 58287.10 GFLOP/s , 533109.0 tokens/s INFO:__main__:2024-10-27 14:02:14 | Epoch: 3 | Step: 92760 | Dataset: 0-15024703 | Loss: 2.093 | 676 ms/step , 58146.56 GFLOP/s , 532611.8 tokens/s INFO:__main__:2024-10-27 14:02:21 | Epoch: 3 | Step: 92770 | Dataset: 0-15032703 | Loss: 2.235 | 675 ms/step , 58277.69 GFLOP/s , 532421.5 tokens/s INFO:__main__:2024-10-27 14:02:29 | Epoch: 3 | Step: 92780 | Dataset: 0-15040703 | Loss: 2.202 | 675 ms/step , 58242.43 GFLOP/s , 532281.4 tokens/s INFO:__main__:2024-10-27 14:02:37 | Epoch: 3 | Step: 92790 | Dataset: 0-15048703 | Loss: 2.186 | 676 ms/step , 58146.58 GFLOP/s , 532971.2 tokens/s INFO:__main__:2024-10-27 14:02:45 | Epoch: 3 | Step: 92800 | Dataset: 0-15056703 | Loss: 2.226 | 674 ms/step , 58332.56 GFLOP/s , 532817.4 tokens/s INFO:__main__:2024-10-27 14:02:52 | Epoch: 3 | Step: 92810 | Dataset: 0-15064703 | Loss: 2.203 | 674 ms/step , 58316.91 GFLOP/s , 532951.2 tokens/s INFO:__main__:2024-10-27 14:03:00 | Epoch: 3 | Step: 92820 | Dataset: 0-15072703 | Loss: 2.196 | 676 ms/step , 58148.81 GFLOP/s , 531961.0 tokens/s INFO:__main__:2024-10-27 14:03:08 | Epoch: 3 | Step: 92830 | Dataset: 0-15080703 | Loss: 2.166 | 675 ms/step , 58211.46 
GFLOP/s , 532306.4 tokens/s INFO:__main__:2024-10-27 14:03:15 | Epoch: 3 | Step: 92840 | Dataset: 0-15088703 | Loss: 2.172 | 675 ms/step , 58226.83 GFLOP/s , 532152.0 tokens/s INFO:__main__:2024-10-27 14:03:23 | Epoch: 3 | Step: 92850 | Dataset: 0-15096703 | Loss: 2.149 | 675 ms/step , 58204.76 GFLOP/s , 531924.7 tokens/s INFO:__main__:2024-10-27 14:03:31 | Epoch: 3 | Step: 92860 | Dataset: 0-15104703 | Loss: 2.146 | 675 ms/step , 58254.01 GFLOP/s , 532598.9 tokens/s INFO:__main__:2024-10-27 14:03:38 | Epoch: 3 | Step: 92870 | Dataset: 0-15112703 | Loss: 2.155 | 675 ms/step , 58256.79 GFLOP/s , 532326.6 tokens/s INFO:__main__:2024-10-27 14:03:46 | Epoch: 3 | Step: 92880 | Dataset: 0-15120703 | Loss: 2.216 | 675 ms/step , 58234.26 GFLOP/s , 532380.8 tokens/s INFO:__main__:2024-10-27 14:03:54 | Epoch: 3 | Step: 92890 | Dataset: 0-15128703 | Loss: 2.189 | 675 ms/step , 58227.11 GFLOP/s , 531842.7 tokens/s INFO:__main__:2024-10-27 14:04:01 | Epoch: 3 | Step: 92900 | Dataset: 0-15136703 | Loss: 2.138 | 676 ms/step , 58154.07 GFLOP/s , 532364.5 tokens/s INFO:__main__:2024-10-27 14:04:09 | Epoch: 3 | Step: 92910 | Dataset: 0-15144703 | Loss: 2.084 | 675 ms/step , 58227.08 GFLOP/s , 532268.9 tokens/s INFO:__main__:2024-10-27 14:04:17 | Epoch: 3 | Step: 92920 | Dataset: 0-15152703 | Loss: 2.171 | 675 ms/step , 58211.74 GFLOP/s , 532524.3 tokens/s INFO:__main__:2024-10-27 14:04:25 | Epoch: 3 | Step: 92930 | Dataset: 0-15160703 | Loss: 2.235 | 675 ms/step , 58225.11 GFLOP/s , 532086.0 tokens/s INFO:__main__:2024-10-27 14:04:32 | Epoch: 3 | Step: 92940 | Dataset: 0-15168703 | Loss: 2.162 | 674 ms/step , 58314.18 GFLOP/s , 532248.5 tokens/s INFO:__main__:2024-10-27 14:04:40 | Epoch: 3 | Step: 92950 | Dataset: 0-15176703 | Loss: 2.159 | 675 ms/step , 58264.11 GFLOP/s , 533143.4 tokens/s INFO:__main__:2024-10-27 14:04:48 | Epoch: 3 | Step: 92960 | Dataset: 0-15184703 | Loss: 2.177 | 674 ms/step , 58328.11 GFLOP/s , 532719.0 tokens/s INFO:__main__:2024-10-27 14:04:55 | Epoch: 3 | 
Step: 92970 | Dataset: 0-15192703 | Loss: 2.131 | 677 ms/step , 58057.83 GFLOP/s , 531243.7 tokens/s INFO:__main__:2024-10-27 14:05:03 | Epoch: 3 | Step: 92980 | Dataset: 0-15200703 | Loss: 2.085 | 675 ms/step , 58242.74 GFLOP/s , 531932.6 tokens/s INFO:__main__:2024-10-27 14:05:11 | Epoch: 3 | Step: 92990 | Dataset: 0-15208703 | Loss: 2.172 | 675 ms/step , 58273.01 GFLOP/s , 532374.1 tokens/s INFO:__main__:2024-10-27 14:05:18 | Validation | Step: 93000 | Val_loss: 2.192 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 14:05:18 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_140518_step_93000.pt` INFO:__main__:2024-10-27 14:05:19 | Epoch: 3 | Step: 93000 | Dataset: 0-15216703 | Loss: 2.126 | 673 ms/step , 58366.89 GFLOP/s , 478060.6 tokens/s INFO:__main__:2024-10-27 14:05:27 | Epoch: 3 | Step: 93010 | Dataset: 0-15224703 | Loss: 2.174 | 675 ms/step , 58239.41 GFLOP/s , 532444.4 tokens/s INFO:__main__:2024-10-27 14:05:35 | Epoch: 3 | Step: 93020 | Dataset: 0-15232703 | Loss: 2.018 | 675 ms/step , 58224.14 GFLOP/s , 532450.6 tokens/s INFO:__main__:2024-10-27 14:05:42 | Epoch: 3 | Step: 93030 | Dataset: 0-15240703 | Loss: 2.048 | 675 ms/step , 58199.69 GFLOP/s , 532499.5 tokens/s INFO:__main__:2024-10-27 14:05:50 | Epoch: 3 | Step: 93040 | Dataset: 0-15248703 | Loss: 2.106 | 676 ms/step , 58172.43 GFLOP/s , 532119.8 tokens/s INFO:__main__:2024-10-27 14:05:58 | Epoch: 3 | Step: 93050 | Dataset: 0-15256703 | Loss: 2.107 | 675 ms/step , 58271.50 GFLOP/s , 532591.1 tokens/s INFO:__main__:2024-10-27 14:06:05 | Epoch: 3 | Step: 93060 | Dataset: 0-15264703 | Loss: 2.140 | 675 ms/step , 58218.10 GFLOP/s , 532790.4 tokens/s INFO:__main__:2024-10-27 14:06:13 | Epoch: 3 | Step: 93070 | Dataset: 0-15272703 | Loss: 2.108 | 674 ms/step , 58296.94 GFLOP/s , 532461.6 tokens/s INFO:__main__:2024-10-27 14:06:21 | Epoch: 3 | Step: 93080 | Dataset: 0-15280703 | Loss: 2.046 | 675 ms/step , 58206.35 GFLOP/s , 531859.6 tokens/s INFO:__main__:2024-10-27 14:06:29 | 
Epoch: 3 | Step: 93090 | Dataset: 0-15288703 | Loss: 2.070 | 675 ms/step , 58211.38 GFLOP/s , 531496.9 tokens/s INFO:__main__:2024-10-27 14:06:36 | Epoch: 3 | Step: 93100 | Dataset: 0-15296703 | Loss: 1.784 | 676 ms/step , 58145.64 GFLOP/s , 530989.8 tokens/s INFO:__main__:2024-10-27 14:06:44 | Epoch: 3 | Step: 93110 | Dataset: 0-15304703 | Loss: 1.766 | 677 ms/step , 58104.17 GFLOP/s , 531747.3 tokens/s INFO:__main__:2024-10-27 14:06:52 | Epoch: 3 | Step: 93120 | Dataset: 0-15312703 | Loss: 1.741 | 676 ms/step , 58165.32 GFLOP/s , 531733.7 tokens/s INFO:__main__:2024-10-27 14:06:59 | Epoch: 3 | Step: 93130 | Dataset: 0-15320703 | Loss: 1.695 | 675 ms/step , 58243.72 GFLOP/s , 531936.5 tokens/s INFO:__main__:2024-10-27 14:07:07 | Epoch: 3 | Step: 93140 | Dataset: 0-15328703 | Loss: 1.708 | 676 ms/step , 58168.11 GFLOP/s , 531361.7 tokens/s INFO:__main__:2024-10-27 14:07:15 | Epoch: 3 | Step: 93150 | Dataset: 0-15336703 | Loss: 1.679 | 676 ms/step , 58164.08 GFLOP/s , 531465.4 tokens/s INFO:__main__:2024-10-27 14:07:22 | Epoch: 3 | Step: 93160 | Dataset: 0-15344703 | Loss: 1.673 | 677 ms/step , 58095.81 GFLOP/s , 531734.5 tokens/s INFO:__main__:2024-10-27 14:07:30 | Epoch: 3 | Step: 93170 | Dataset: 0-15352703 | Loss: 1.655 | 677 ms/step , 58044.47 GFLOP/s , 530806.0 tokens/s INFO:__main__:2024-10-27 14:07:38 | Epoch: 3 | Step: 93180 | Dataset: 0-15360703 | Loss: 2.234 | 675 ms/step , 58251.69 GFLOP/s , 530962.1 tokens/s INFO:__main__:2024-10-27 14:07:46 | Epoch: 3 | Step: 93190 | Dataset: 0-15368703 | Loss: 2.186 | 674 ms/step , 58282.89 GFLOP/s , 532546.4 tokens/s INFO:__main__:2024-10-27 14:07:53 | Epoch: 3 | Step: 93200 | Dataset: 0-15376703 | Loss: 2.226 | 674 ms/step , 58320.75 GFLOP/s , 532798.0 tokens/s INFO:__main__:2024-10-27 14:08:01 | Epoch: 3 | Step: 93210 | Dataset: 0-15384703 | Loss: 2.108 | 675 ms/step , 58196.01 GFLOP/s , 532384.1 tokens/s INFO:__main__:2024-10-27 14:08:09 | Epoch: 3 | Step: 93220 | Dataset: 0-15392703 | Loss: 2.148 | 675 ms/step , 
58220.43 GFLOP/s , 532212.4 tokens/s INFO:__main__:2024-10-27 14:08:16 | Epoch: 3 | Step: 93230 | Dataset: 0-15400703 | Loss: 2.147 | 675 ms/step , 58203.24 GFLOP/s , 532378.5 tokens/s INFO:__main__:2024-10-27 14:08:24 | Epoch: 3 | Step: 93240 | Dataset: 0-15408703 | Loss: 2.079 | 674 ms/step , 58292.09 GFLOP/s , 532728.8 tokens/s INFO:__main__:2024-10-27 14:08:32 | Epoch: 3 | Step: 93250 | Dataset: 0-15416703 | Loss: 2.074 | 675 ms/step , 58224.26 GFLOP/s , 532933.9 tokens/s INFO:__main__:2024-10-27 14:08:39 | Epoch: 3 | Step: 93260 | Dataset: 0-15424703 | Loss: 2.087 | 675 ms/step , 58196.50 GFLOP/s , 532527.3 tokens/s INFO:__main__:2024-10-27 14:08:47 | Epoch: 3 | Step: 93270 | Dataset: 0-15432703 | Loss: 2.049 | 675 ms/step , 58234.68 GFLOP/s , 532863.7 tokens/s INFO:__main__:2024-10-27 14:08:55 | Epoch: 3 | Step: 93280 | Dataset: 0-15440703 | Loss: 2.153 | 674 ms/step , 58289.12 GFLOP/s , 532973.9 tokens/s INFO:__main__:2024-10-27 14:09:03 | Epoch: 3 | Step: 93290 | Dataset: 0-15448703 | Loss: 2.104 | 674 ms/step , 58295.96 GFLOP/s , 532916.0 tokens/s INFO:__main__:2024-10-27 14:09:10 | Epoch: 3 | Step: 93300 | Dataset: 0-15456703 | Loss: 2.055 | 674 ms/step , 58296.44 GFLOP/s , 533286.8 tokens/s INFO:__main__:2024-10-27 14:09:18 | Epoch: 3 | Step: 93310 | Dataset: 0-15464703 | Loss: 2.137 | 674 ms/step , 58285.63 GFLOP/s , 533233.9 tokens/s INFO:__main__:2024-10-27 14:09:26 | Epoch: 3 | Step: 93320 | Dataset: 0-15472703 | Loss: 2.100 | 675 ms/step , 58274.49 GFLOP/s , 533134.9 tokens/s INFO:__main__:2024-10-27 14:09:33 | Epoch: 3 | Step: 93330 | Dataset: 0-15480703 | Loss: 2.133 | 676 ms/step , 58176.29 GFLOP/s , 532662.6 tokens/s INFO:__main__:2024-10-27 14:09:41 | Epoch: 3 | Step: 93340 | Dataset: 0-15488703 | Loss: 1.888 | 674 ms/step , 58286.12 GFLOP/s , 529950.1 tokens/s INFO:__main__:2024-10-27 14:09:49 | Epoch: 3 | Step: 93350 | Dataset: 0-15496703 | Loss: 1.829 | 674 ms/step , 58319.39 GFLOP/s , 532509.5 tokens/s INFO:__main__:2024-10-27 14:09:56 | 
Epoch: 3 | Step: 93360 | Dataset: 0-15504703 | Loss: 1.802 | 675 ms/step , 58270.45 GFLOP/s , 532294.8 tokens/s INFO:__main__:2024-10-27 14:10:04 | Epoch: 3 | Step: 93370 | Dataset: 0-15512703 | Loss: 1.774 | 675 ms/step , 58246.49 GFLOP/s , 531876.4 tokens/s INFO:__main__:2024-10-27 14:10:12 | Epoch: 3 | Step: 93380 | Dataset: 0-15520703 | Loss: 1.761 | 675 ms/step , 58228.55 GFLOP/s , 532211.1 tokens/s INFO:__main__:2024-10-27 14:10:19 | Epoch: 3 | Step: 93390 | Dataset: 0-15528703 | Loss: 1.803 | 674 ms/step , 58294.38 GFLOP/s , 533004.2 tokens/s INFO:__main__:2024-10-27 14:10:27 | Epoch: 3 | Step: 93400 | Dataset: 0-15536703 | Loss: 1.756 | 677 ms/step , 58057.32 GFLOP/s , 531435.9 tokens/s INFO:__main__:2024-10-27 14:10:35 | Epoch: 3 | Step: 93410 | Dataset: 0-15544703 | Loss: 1.753 | 678 ms/step , 57965.82 GFLOP/s , 530659.4 tokens/s INFO:__main__:2024-10-27 14:10:43 | Epoch: 3 | Step: 93420 | Dataset: 0-15552703 | Loss: 1.771 | 676 ms/step , 58113.37 GFLOP/s , 530695.7 tokens/s INFO:__main__:2024-10-27 14:10:50 | Epoch: 3 | Step: 93430 | Dataset: 0-15560703 | Loss: 1.764 | 677 ms/step , 58036.80 GFLOP/s , 530343.4 tokens/s INFO:__main__:2024-10-27 14:10:58 | Epoch: 3 | Step: 93440 | Dataset: 0-15568703 | Loss: 1.750 | 674 ms/step , 58324.42 GFLOP/s , 531820.6 tokens/s INFO:__main__:2024-10-27 14:11:06 | Epoch: 3 | Step: 93450 | Dataset: 0-15576703 | Loss: 1.784 | 674 ms/step , 58305.62 GFLOP/s , 532937.6 tokens/s INFO:__main__:2024-10-27 14:11:13 | Epoch: 3 | Step: 93460 | Dataset: 0-15584703 | Loss: 1.740 | 675 ms/step , 58279.07 GFLOP/s , 532184.4 tokens/s INFO:__main__:2024-10-27 14:11:21 | Epoch: 3 | Step: 93470 | Dataset: 0-15592703 | Loss: 1.712 | 676 ms/step , 58153.37 GFLOP/s , 531961.7 tokens/s INFO:__main__:2024-10-27 14:11:29 | Epoch: 3 | Step: 93480 | Dataset: 0-15600703 | Loss: 1.740 | 675 ms/step , 58239.98 GFLOP/s , 532370.2 tokens/s INFO:__main__:2024-10-27 14:11:36 | Epoch: 3 | Step: 93490 | Dataset: 0-15608703 | Loss: 1.726 | 675 ms/step , 
58274.53 GFLOP/s , 532506.9 tokens/s INFO:__main__:2024-10-27 14:11:44 | Epoch: 3 | Step: 93500 | Dataset: 0-15616703 | Loss: 1.741 | 676 ms/step , 58176.61 GFLOP/s , 532176.3 tokens/s INFO:__main__:2024-10-27 14:11:52 | Epoch: 3 | Step: 93510 | Dataset: 0-15624703 | Loss: 1.733 | 678 ms/step , 57994.60 GFLOP/s , 531043.9 tokens/s INFO:__main__:2024-10-27 14:12:00 | Epoch: 3 | Step: 93520 | Dataset: 0-15632703 | Loss: 2.342 | 675 ms/step , 58275.57 GFLOP/s , 530639.4 tokens/s INFO:__main__:2024-10-27 14:12:07 | Epoch: 3 | Step: 93530 | Dataset: 0-15640703 | Loss: 2.183 | 676 ms/step , 58163.87 GFLOP/s , 530741.8 tokens/s INFO:__main__:2024-10-27 14:12:15 | Epoch: 3 | Step: 93540 | Dataset: 0-15648703 | Loss: 2.176 | 676 ms/step , 58186.13 GFLOP/s , 532000.8 tokens/s INFO:__main__:2024-10-27 14:12:23 | Epoch: 3 | Step: 93550 | Dataset: 0-15656703 | Loss: 2.167 | 675 ms/step , 58203.65 GFLOP/s , 531713.9 tokens/s INFO:__main__:2024-10-27 14:12:30 | Epoch: 3 | Step: 93560 | Dataset: 0-15664703 | Loss: 2.145 | 676 ms/step , 58108.05 GFLOP/s , 532069.5 tokens/s INFO:__main__:2024-10-27 14:12:38 | Epoch: 3 | Step: 93570 | Dataset: 0-15672703 | Loss: 2.116 | 675 ms/step , 58267.38 GFLOP/s , 532060.6 tokens/s INFO:__main__:2024-10-27 14:12:46 | Epoch: 3 | Step: 93580 | Dataset: 0-15680703 | Loss: 2.243 | 676 ms/step , 58122.27 GFLOP/s , 532434.7 tokens/s INFO:__main__:2024-10-27 14:12:54 | Epoch: 3 | Step: 93590 | Dataset: 0-15688703 | Loss: 2.151 | 675 ms/step , 58236.04 GFLOP/s , 532038.4 tokens/s INFO:__main__:2024-10-27 14:13:01 | Epoch: 3 | Step: 93600 | Dataset: 0-15696703 | Loss: 2.164 | 678 ms/step , 57980.98 GFLOP/s , 531834.4 tokens/s INFO:__main__:2024-10-27 14:13:09 | Epoch: 3 | Step: 93610 | Dataset: 0-15704703 | Loss: 2.173 | 675 ms/step , 58199.40 GFLOP/s , 531813.3 tokens/s INFO:__main__:2024-10-27 14:13:17 | Epoch: 3 | Step: 93620 | Dataset: 0-15712703 | Loss: 2.183 | 677 ms/step , 58021.26 GFLOP/s , 532081.9 tokens/s INFO:__main__:2024-10-27 14:13:24 | 
Epoch: 3 | Step: 93630 | Dataset: 0-15720703 | Loss: 2.146 | 676 ms/step , 58115.33 GFLOP/s , 532023.1 tokens/s INFO:__main__:2024-10-27 14:13:32 | Epoch: 3 | Step: 93640 | Dataset: 0-15728703 | Loss: 2.072 | 675 ms/step , 58226.66 GFLOP/s , 531034.7 tokens/s INFO:__main__:2024-10-27 14:13:40 | Epoch: 3 | Step: 93650 | Dataset: 0-15736703 | Loss: 2.184 | 675 ms/step , 58230.75 GFLOP/s , 531165.9 tokens/s INFO:__main__:2024-10-27 14:13:47 | Epoch: 3 | Step: 93660 | Dataset: 0-15744703 | Loss: 2.203 | 676 ms/step , 58168.17 GFLOP/s , 530934.4 tokens/s INFO:__main__:2024-10-27 14:13:55 | Epoch: 3 | Step: 93670 | Dataset: 0-15752703 | Loss: 2.135 | 676 ms/step , 58167.46 GFLOP/s , 531465.3 tokens/s INFO:__main__:2024-10-27 14:14:03 | Epoch: 3 | Step: 93680 | Dataset: 0-15760703 | Loss: 1.750 | 677 ms/step , 58092.43 GFLOP/s , 531042.7 tokens/s INFO:__main__:2024-10-27 14:14:11 | Epoch: 3 | Step: 93690 | Dataset: 0-15768703 | Loss: 1.697 | 676 ms/step , 58127.85 GFLOP/s , 531104.4 tokens/s INFO:__main__:2024-10-27 14:14:18 | Epoch: 3 | Step: 93700 | Dataset: 0-15776703 | Loss: 1.632 | 675 ms/step , 58200.87 GFLOP/s , 529600.7 tokens/s INFO:__main__:2024-10-27 14:14:26 | Epoch: 3 | Step: 93710 | Dataset: 0-15784703 | Loss: 1.673 | 677 ms/step , 58033.22 GFLOP/s , 530541.4 tokens/s INFO:__main__:2024-10-27 14:14:34 | Epoch: 3 | Step: 93720 | Dataset: 0-15792703 | Loss: 1.675 | 676 ms/step , 58127.41 GFLOP/s , 532023.8 tokens/s INFO:__main__:2024-10-27 14:14:41 | Epoch: 3 | Step: 93730 | Dataset: 0-15800703 | Loss: 1.647 | 675 ms/step , 58193.90 GFLOP/s , 531345.1 tokens/s INFO:__main__:2024-10-27 14:14:49 | Epoch: 3 | Step: 93740 | Dataset: 0-15808703 | Loss: 1.660 | 675 ms/step , 58222.26 GFLOP/s , 531702.4 tokens/s INFO:__main__:2024-10-27 14:14:57 | Epoch: 3 | Step: 93750 | Dataset: 0-15816703 | Loss: 1.650 | 675 ms/step , 58202.36 GFLOP/s , 531438.5 tokens/s INFO:__main__:2024-10-27 14:15:05 | Epoch: 3 | Step: 93760 | Dataset: 0-15824703 | Loss: 1.654 | 675 ms/step , 
58207.24 GFLOP/s , 531623.8 tokens/s INFO:__main__:2024-10-27 14:15:12 | Epoch: 3 | Step: 93770 | Dataset: 0-15832703 | Loss: 2.227 | 676 ms/step , 58163.54 GFLOP/s , 532028.9 tokens/s INFO:__main__:2024-10-27 14:15:20 | Epoch: 3 | Step: 93780 | Dataset: 0-15840703 | Loss: 2.179 | 676 ms/step , 58157.76 GFLOP/s , 532357.4 tokens/s INFO:__main__:2024-10-27 14:15:28 | Epoch: 3 | Step: 93790 | Dataset: 0-15848703 | Loss: 2.209 | 675 ms/step , 58225.41 GFLOP/s , 532251.0 tokens/s INFO:__main__:2024-10-27 14:15:35 | Epoch: 3 | Step: 93800 | Dataset: 0-15856703 | Loss: 2.162 | 675 ms/step , 58249.15 GFLOP/s , 532554.8 tokens/s INFO:__main__:2024-10-27 14:15:43 | Epoch: 3 | Step: 93810 | Dataset: 0-15864703 | Loss: 2.182 | 676 ms/step , 58148.26 GFLOP/s , 531709.4 tokens/s INFO:__main__:2024-10-27 14:15:51 | Epoch: 3 | Step: 93820 | Dataset: 0-15872703 | Loss: 2.111 | 676 ms/step , 58162.91 GFLOP/s , 531894.8 tokens/s INFO:__main__:2024-10-27 14:15:58 | Epoch: 3 | Step: 93830 | Dataset: 0-15880703 | Loss: 2.133 | 675 ms/step , 58257.71 GFLOP/s , 532210.5 tokens/s INFO:__main__:2024-10-27 14:16:06 | Epoch: 3 | Step: 93840 | Dataset: 0-15888703 | Loss: 2.186 | 676 ms/step , 58178.05 GFLOP/s , 529780.9 tokens/s INFO:__main__:2024-10-27 14:16:14 | Epoch: 3 | Step: 93850 | Dataset: 0-15896703 | Loss: 2.183 | 677 ms/step , 58062.47 GFLOP/s , 531628.6 tokens/s INFO:__main__:2024-10-27 14:16:22 | Epoch: 3 | Step: 93860 | Dataset: 0-15904703 | Loss: 2.107 | 676 ms/step , 58141.44 GFLOP/s , 531090.2 tokens/s INFO:__main__:2024-10-27 14:16:29 | Epoch: 3 | Step: 93870 | Dataset: 0-15912703 | Loss: 2.178 | 675 ms/step , 58196.00 GFLOP/s , 530840.2 tokens/s INFO:__main__:2024-10-27 14:16:37 | Epoch: 3 | Step: 93880 | Dataset: 0-15920703 | Loss: 2.140 | 676 ms/step , 58190.82 GFLOP/s , 531041.1 tokens/s INFO:__main__:2024-10-27 14:16:45 | Epoch: 3 | Step: 93890 | Dataset: 0-15928703 | Loss: 2.180 | 675 ms/step , 58255.95 GFLOP/s , 531574.7 tokens/s INFO:__main__:2024-10-27 14:16:52 | 
Epoch: 3 | Step: 93900 | Dataset: 0-15936703 | Loss: 2.101 | 676 ms/step , 58127.22 GFLOP/s , 530592.8 tokens/s INFO:__main__:2024-10-27 14:17:00 | Epoch: 3 | Step: 93910 | Dataset: 0-15944703 | Loss: 2.134 | 676 ms/step , 58148.36 GFLOP/s , 530831.9 tokens/s INFO:__main__:2024-10-27 14:17:08 | Epoch: 3 | Step: 93920 | Dataset: 0-15952703 | Loss: 2.141 | 674 ms/step , 58334.14 GFLOP/s , 531093.3 tokens/s INFO:__main__:2024-10-27 14:17:16 | Epoch: 3 | Step: 93930 | Dataset: 0-15960703 | Loss: 2.223 | 675 ms/step , 58271.86 GFLOP/s , 531371.1 tokens/s INFO:__main__:2024-10-27 14:17:23 | Epoch: 3 | Step: 93940 | Dataset: 0-15968703 | Loss: 2.145 | 676 ms/step , 58148.37 GFLOP/s , 531107.4 tokens/s INFO:__main__:2024-10-27 14:17:31 | Epoch: 3 | Step: 93950 | Dataset: 0-15976703 | Loss: 2.151 | 674 ms/step , 58317.33 GFLOP/s , 528470.3 tokens/s INFO:__main__:2024-10-27 14:17:39 | Epoch: 3 | Step: 93960 | Dataset: 0-15984703 | Loss: 2.144 | 676 ms/step , 58177.98 GFLOP/s , 532396.8 tokens/s INFO:__main__:2024-10-27 14:17:46 | Epoch: 3 | Step: 93970 | Dataset: 0-15992703 | Loss: 2.095 | 675 ms/step , 58217.24 GFLOP/s , 531780.8 tokens/s INFO:__main__:2024-10-27 14:17:54 | Epoch: 3 | Step: 93980 | Dataset: 0-16000703 | Loss: 2.116 | 675 ms/step , 58272.00 GFLOP/s , 532409.8 tokens/s INFO:__main__:2024-10-27 14:18:02 | Epoch: 3 | Step: 93990 | Dataset: 0-16008703 | Loss: 2.136 | 675 ms/step , 58232.51 GFLOP/s , 532849.8 tokens/s INFO:__main__:2024-10-27 14:18:09 | Validation | Step: 94000 | Val_loss: 2.100 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 14:18:09 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_141809_step_94000.pt` INFO:__main__:2024-10-27 14:18:10 | Epoch: 3 | Step: 94000 | Dataset: 0-16016703 | Loss: 2.140 | 674 ms/step , 58313.25 GFLOP/s , 479288.2 tokens/s INFO:__main__:2024-10-27 14:18:18 | Epoch: 3 | Step: 94010 | Dataset: 0-16024703 | Loss: 2.131 | 676 ms/step , 58153.79 GFLOP/s , 531776.5 tokens/s INFO:__main__:2024-10-27 
14:18:26 | Epoch: 3 | Step: 94020 | Dataset: 0-16032703 | Loss: 2.085 | 675 ms/step , 58263.41 GFLOP/s , 532312.8 tokens/s INFO:__main__:2024-10-27 14:18:33 | Epoch: 3 | Step: 94030 | Dataset: 0-16040703 | Loss: 2.134 | 675 ms/step , 58239.92 GFLOP/s , 532564.2 tokens/s INFO:__main__:2024-10-27 14:18:41 | Epoch: 3 | Step: 94040 | Dataset: 0-16048703 | Loss: 2.155 | 674 ms/step , 58314.69 GFLOP/s , 532628.5 tokens/s INFO:__main__:2024-10-27 14:18:49 | Epoch: 3 | Step: 94050 | Dataset: 0-16056703 | Loss: 2.174 | 675 ms/step , 58228.93 GFLOP/s , 532533.2 tokens/s INFO:__main__:2024-10-27 14:18:57 | Epoch: 3 | Step: 94060 | Dataset: 0-16064703 | Loss: 2.107 | 675 ms/step , 58239.64 GFLOP/s , 531921.5 tokens/s INFO:__main__:2024-10-27 14:19:04 | Epoch: 3 | Step: 94070 | Dataset: 0-16072703 | Loss: 2.159 | 674 ms/step , 58342.33 GFLOP/s , 532012.4 tokens/s INFO:__main__:2024-10-27 14:19:12 | Epoch: 3 | Step: 94080 | Dataset: 0-16080703 | Loss: 2.144 | 676 ms/step , 58134.56 GFLOP/s , 531697.5 tokens/s INFO:__main__:2024-10-27 14:19:20 | Epoch: 3 | Step: 94090 | Dataset: 0-16088703 | Loss: 2.179 | 675 ms/step , 58212.15 GFLOP/s , 531055.0 tokens/s INFO:__main__:2024-10-27 14:19:27 | Epoch: 3 | Step: 94100 | Dataset: 0-16096703 | Loss: 2.158 | 675 ms/step , 58259.88 GFLOP/s , 532311.4 tokens/s INFO:__main__:2024-10-27 14:19:35 | Epoch: 3 | Step: 94110 | Dataset: 0-16104703 | Loss: 2.126 | 674 ms/step , 58283.36 GFLOP/s , 532630.9 tokens/s INFO:__main__:2024-10-27 14:19:43 | Epoch: 3 | Step: 94120 | Dataset: 0-16112703 | Loss: 2.158 | 676 ms/step , 58130.18 GFLOP/s , 532014.0 tokens/s INFO:__main__:2024-10-27 14:19:50 | Epoch: 3 | Step: 94130 | Dataset: 0-16120703 | Loss: 2.049 | 674 ms/step , 58298.02 GFLOP/s , 532869.7 tokens/s INFO:__main__:2024-10-27 14:19:58 | Epoch: 3 | Step: 94140 | Dataset: 0-16128703 | Loss: 2.140 | 677 ms/step , 58024.36 GFLOP/s , 531788.8 tokens/s INFO:__main__:2024-10-27 14:20:06 | Epoch: 3 | Step: 94150 | Dataset: 0-16136703 | Loss: 2.084 | 678 
ms/step , 57995.00 GFLOP/s , 529658.8 tokens/s INFO:__main__:2024-10-27 14:20:14 | Epoch: 3 | Step: 94160 | Dataset: 0-16144703 | Loss: 2.079 | 677 ms/step , 58076.30 GFLOP/s , 530667.6 tokens/s INFO:__main__:2024-10-27 14:20:21 | Epoch: 3 | Step: 94170 | Dataset: 0-16152703 | Loss: 2.048 | 674 ms/step , 58301.89 GFLOP/s , 531331.9 tokens/s INFO:__main__:2024-10-27 14:20:29 | Epoch: 3 | Step: 94180 | Dataset: 0-16160703 | Loss: 2.159 | 676 ms/step , 58130.33 GFLOP/s , 532072.3 tokens/s INFO:__main__:2024-10-27 14:20:37 | Epoch: 3 | Step: 94190 | Dataset: 0-16168703 | Loss: 2.046 | 675 ms/step , 58229.18 GFLOP/s , 528645.8 tokens/s INFO:__main__:2024-10-27 14:20:44 | Epoch: 3 | Step: 94200 | Dataset: 0-16176703 | Loss: 2.032 | 675 ms/step , 58258.10 GFLOP/s , 532128.5 tokens/s INFO:__main__:2024-10-27 14:20:52 | Epoch: 3 | Step: 94210 | Dataset: 0-16184703 | Loss: 2.060 | 676 ms/step , 58182.59 GFLOP/s , 531678.1 tokens/s INFO:__main__:2024-10-27 14:21:00 | Epoch: 3 | Step: 94220 | Dataset: 0-16192703 | Loss: 2.029 | 675 ms/step , 58233.61 GFLOP/s , 532093.0 tokens/s INFO:__main__:2024-10-27 14:21:08 | Epoch: 3 | Step: 94230 | Dataset: 0-16200703 | Loss: 2.094 | 675 ms/step , 58276.44 GFLOP/s , 532357.5 tokens/s INFO:__main__:2024-10-27 14:21:15 | Epoch: 3 | Step: 94240 | Dataset: 0-16208703 | Loss: 2.084 | 676 ms/step , 58166.88 GFLOP/s , 532591.8 tokens/s INFO:__main__:2024-10-27 14:21:23 | Epoch: 3 | Step: 94250 | Dataset: 0-16216703 | Loss: 2.106 | 675 ms/step , 58271.31 GFLOP/s , 531817.6 tokens/s INFO:__main__:2024-10-27 14:21:31 | Epoch: 3 | Step: 94260 | Dataset: 0-16224703 | Loss: 2.114 | 675 ms/step , 58243.95 GFLOP/s , 531476.9 tokens/s INFO:__main__:2024-10-27 14:21:38 | Epoch: 3 | Step: 94270 | Dataset: 0-16232703 | Loss: 2.166 | 676 ms/step , 58171.46 GFLOP/s , 530324.6 tokens/s INFO:__main__:2024-10-27 14:21:46 | Epoch: 3 | Step: 94280 | Dataset: 0-16240703 | Loss: 2.041 | 677 ms/step , 58095.22 GFLOP/s , 531711.4 tokens/s INFO:__main__:2024-10-27 
14:21:54 | Epoch: 3 | Step: 94290 | Dataset: 0-16248703 | Loss: 2.133 | 677 ms/step , 58095.38 GFLOP/s , 531755.3 tokens/s INFO:__main__:2024-10-27 14:22:01 | Epoch: 3 | Step: 94300 | Dataset: 0-16256703 | Loss: 2.134 | 676 ms/step , 58148.29 GFLOP/s , 531870.5 tokens/s INFO:__main__:2024-10-27 14:22:09 | Epoch: 3 | Step: 94310 | Dataset: 0-16264703 | Loss: 2.079 | 675 ms/step , 58233.41 GFLOP/s , 531883.5 tokens/s INFO:__main__:2024-10-27 14:22:17 | Epoch: 3 | Step: 94320 | Dataset: 0-16272703 | Loss: 2.047 | 679 ms/step , 57915.88 GFLOP/s , 532238.2 tokens/s INFO:__main__:2024-10-27 14:22:25 | Epoch: 3 | Step: 94330 | Dataset: 0-16280703 | Loss: 2.098 | 675 ms/step , 58240.55 GFLOP/s , 532138.8 tokens/s INFO:__main__:2024-10-27 14:22:32 | Epoch: 3 | Step: 94340 | Dataset: 0-16288703 | Loss: 2.167 | 676 ms/step , 58172.57 GFLOP/s , 532213.3 tokens/s INFO:__main__:2024-10-27 14:22:40 | Epoch: 3 | Step: 94350 | Dataset: 0-16296703 | Loss: 2.157 | 675 ms/step , 58242.48 GFLOP/s , 531355.0 tokens/s INFO:__main__:2024-10-27 14:22:48 | Epoch: 3 | Step: 94360 | Dataset: 0-16304703 | Loss: 2.064 | 676 ms/step , 58138.24 GFLOP/s , 531784.3 tokens/s INFO:__main__:2024-10-27 14:22:55 | Epoch: 3 | Step: 94370 | Dataset: 0-16312703 | Loss: 2.053 | 675 ms/step , 58209.43 GFLOP/s , 531855.9 tokens/s INFO:__main__:2024-10-27 14:23:03 | Epoch: 3 | Step: 94380 | Dataset: 0-16320703 | Loss: 2.109 | 674 ms/step , 58349.07 GFLOP/s , 532109.1 tokens/s INFO:__main__:2024-10-27 14:23:11 | Epoch: 3 | Step: 94390 | Dataset: 0-16328703 | Loss: 2.110 | 674 ms/step , 58319.08 GFLOP/s , 532481.8 tokens/s INFO:__main__:2024-10-27 14:23:18 | Epoch: 3 | Step: 94400 | Dataset: 0-16336703 | Loss: 2.172 | 675 ms/step , 58204.76 GFLOP/s , 531960.3 tokens/s INFO:__main__:2024-10-27 14:23:26 | Epoch: 3 | Step: 94410 | Dataset: 0-16344703 | Loss: 1.840 | 674 ms/step , 58285.99 GFLOP/s , 532027.1 tokens/s INFO:__main__:2024-10-27 14:23:34 | Epoch: 3 | Step: 94420 | Dataset: 0-16352703 | Loss: 1.713 | 676 
ms/step , 58148.70 GFLOP/s , 531267.9 tokens/s INFO:__main__:2024-10-27 14:23:42 | Epoch: 3 | Step: 94430 | Dataset: 0-16360703 | Loss: 1.707 | 675 ms/step , 58234.87 GFLOP/s , 531857.4 tokens/s INFO:__main__:2024-10-27 14:23:49 | Epoch: 3 | Step: 94440 | Dataset: 0-16368703 | Loss: 1.684 | 675 ms/step , 58222.11 GFLOP/s , 531631.8 tokens/s INFO:__main__:2024-10-27 14:23:57 | Epoch: 3 | Step: 94450 | Dataset: 0-16376703 | Loss: 1.633 | 675 ms/step , 58274.31 GFLOP/s , 532108.8 tokens/s INFO:__main__:2024-10-27 14:24:05 | Epoch: 3 | Step: 94460 | Dataset: 0-16384703 | Loss: 1.689 | 675 ms/step , 58198.76 GFLOP/s , 531584.9 tokens/s INFO:__main__:2024-10-27 14:24:12 | Epoch: 3 | Step: 94470 | Dataset: 0-16392703 | Loss: 1.655 | 674 ms/step , 58304.60 GFLOP/s , 532008.2 tokens/s INFO:__main__:2024-10-27 14:24:20 | Epoch: 3 | Step: 94480 | Dataset: 0-16400703 | Loss: 1.660 | 675 ms/step , 58267.47 GFLOP/s , 532162.7 tokens/s INFO:__main__:2024-10-27 14:24:28 | Epoch: 3 | Step: 94490 | Dataset: 0-16408703 | Loss: 1.654 | 675 ms/step , 58212.67 GFLOP/s , 531825.9 tokens/s INFO:__main__:2024-10-27 14:24:35 | Epoch: 3 | Step: 94500 | Dataset: 0-16416703 | Loss: 2.222 | 675 ms/step , 58234.88 GFLOP/s , 531978.3 tokens/s INFO:__main__:2024-10-27 14:24:43 | Epoch: 3 | Step: 94510 | Dataset: 0-16424703 | Loss: 2.155 | 675 ms/step , 58207.99 GFLOP/s , 531969.4 tokens/s INFO:__main__:2024-10-27 14:24:51 | Epoch: 3 | Step: 94520 | Dataset: 0-16432703 | Loss: 2.142 | 675 ms/step , 58193.73 GFLOP/s , 531993.5 tokens/s INFO:__main__:2024-10-27 14:24:59 | Epoch: 3 | Step: 94530 | Dataset: 0-16440703 | Loss: 2.136 | 675 ms/step , 58197.57 GFLOP/s , 531664.4 tokens/s INFO:__main__:2024-10-27 14:25:06 | Epoch: 3 | Step: 94540 | Dataset: 0-16448703 | Loss: 2.178 | 675 ms/step , 58210.80 GFLOP/s , 531695.7 tokens/s INFO:__main__:2024-10-27 14:25:14 | Epoch: 3 | Step: 94550 | Dataset: 0-16456703 | Loss: 2.126 | 676 ms/step , 58177.11 GFLOP/s , 532185.9 tokens/s INFO:__main__:2024-10-27 
14:25:22 | Epoch: 3 | Step: 94560 | Dataset: 0-16464703 | Loss: 2.115 | 675 ms/step , 58223.33 GFLOP/s , 531715.8 tokens/s INFO:__main__:2024-10-27 14:25:29 | Epoch: 3 | Step: 94570 | Dataset: 0-16472703 | Loss: 2.181 | 675 ms/step , 58275.58 GFLOP/s , 532619.1 tokens/s INFO:__main__:2024-10-27 14:25:37 | Epoch: 3 | Step: 94580 | Dataset: 0-16480703 | Loss: 2.085 | 675 ms/step , 58252.21 GFLOP/s , 532397.5 tokens/s INFO:__main__:2024-10-27 14:25:45 | Epoch: 3 | Step: 94590 | Dataset: 0-16488703 | Loss: 2.131 | 674 ms/step , 58302.94 GFLOP/s , 532365.1 tokens/s INFO:__main__:2024-10-27 14:25:52 | Epoch: 3 | Step: 94600 | Dataset: 0-16496703 | Loss: 2.152 | 675 ms/step , 58250.65 GFLOP/s , 532209.8 tokens/s INFO:__main__:2024-10-27 14:26:00 | Epoch: 3 | Step: 94610 | Dataset: 0-16504703 | Loss: 2.108 | 676 ms/step , 58160.84 GFLOP/s , 532188.1 tokens/s INFO:__main__:2024-10-27 14:26:08 | Epoch: 3 | Step: 94620 | Dataset: 0-16512703 | Loss: 2.122 | 675 ms/step , 58192.95 GFLOP/s , 531424.3 tokens/s INFO:__main__:2024-10-27 14:26:16 | Epoch: 3 | Step: 94630 | Dataset: 0-16520703 | Loss: 2.124 | 674 ms/step , 58335.21 GFLOP/s , 530565.6 tokens/s INFO:__main__:2024-10-27 14:26:23 | Epoch: 3 | Step: 94640 | Dataset: 0-16528703 | Loss: 2.127 | 675 ms/step , 58210.20 GFLOP/s , 532316.3 tokens/s INFO:__main__:2024-10-27 14:26:31 | Epoch: 3 | Step: 94650 | Dataset: 0-16536703 | Loss: 2.066 | 675 ms/step , 58233.48 GFLOP/s , 532166.9 tokens/s INFO:__main__:2024-10-27 14:26:39 | Epoch: 3 | Step: 94660 | Dataset: 0-16544703 | Loss: 2.019 | 674 ms/step , 58303.57 GFLOP/s , 532671.2 tokens/s INFO:__main__:2024-10-27 14:26:46 | Epoch: 3 | Step: 94670 | Dataset: 0-16552703 | Loss: 2.171 | 676 ms/step , 58148.21 GFLOP/s , 531950.9 tokens/s INFO:__main__:2024-10-27 14:26:54 | Epoch: 3 | Step: 94680 | Dataset: 0-16560703 | Loss: 2.085 | 676 ms/step , 58173.54 GFLOP/s , 531801.5 tokens/s INFO:__main__:2024-10-27 14:27:02 | Epoch: 3 | Step: 94690 | Dataset: 0-16568703 | Loss: 2.203 | 676 
ms/step , 58185.71 GFLOP/s , 532055.5 tokens/s INFO:__main__:2024-10-27 14:27:09 | Epoch: 3 | Step: 94700 | Dataset: 0-16576703 | Loss: 2.012 | 675 ms/step , 58193.15 GFLOP/s , 531728.8 tokens/s INFO:__main__:2024-10-27 14:27:17 | Epoch: 3 | Step: 94710 | Dataset: 0-16584703 | Loss: 2.163 | 675 ms/step , 58271.47 GFLOP/s , 532110.1 tokens/s INFO:__main__:2024-10-27 14:27:25 | Epoch: 3 | Step: 94720 | Dataset: 0-16592703 | Loss: 2.035 | 677 ms/step , 58074.97 GFLOP/s , 531810.7 tokens/s INFO:__main__:2024-10-27 14:27:33 | Epoch: 3 | Step: 94730 | Dataset: 0-16600703 | Loss: 2.128 | 674 ms/step , 58280.53 GFLOP/s , 532481.9 tokens/s INFO:__main__:2024-10-27 14:27:40 | Epoch: 3 | Step: 94740 | Dataset: 0-16608703 | Loss: 2.048 | 675 ms/step , 58265.19 GFLOP/s , 532324.4 tokens/s INFO:__main__:2024-10-27 14:27:48 | Epoch: 3 | Step: 94750 | Dataset: 0-16616703 | Loss: 2.084 | 675 ms/step , 58219.56 GFLOP/s , 532359.3 tokens/s INFO:__main__:2024-10-27 14:27:56 | Epoch: 3 | Step: 94760 | Dataset: 0-16624703 | Loss: 2.153 | 675 ms/step , 58227.41 GFLOP/s , 532030.6 tokens/s INFO:__main__:2024-10-27 14:28:03 | Epoch: 3 | Step: 94770 | Dataset: 0-16632703 | Loss: 2.092 | 676 ms/step , 58190.14 GFLOP/s , 531821.8 tokens/s INFO:__main__:2024-10-27 14:28:11 | Epoch: 3 | Step: 94780 | Dataset: 0-16640703 | Loss: 2.028 | 675 ms/step , 58223.78 GFLOP/s , 531939.9 tokens/s INFO:__main__:2024-10-27 14:28:19 | Epoch: 3 | Step: 94790 | Dataset: 0-16648703 | Loss: 2.107 | 675 ms/step , 58234.95 GFLOP/s , 532248.0 tokens/s INFO:__main__:2024-10-27 14:28:26 | Epoch: 3 | Step: 94800 | Dataset: 0-16656703 | Loss: 2.108 | 674 ms/step , 58286.18 GFLOP/s , 531798.5 tokens/s INFO:__main__:2024-10-27 14:28:34 | Epoch: 3 | Step: 94810 | Dataset: 0-16664703 | Loss: 2.065 | 675 ms/step , 58205.71 GFLOP/s , 532041.8 tokens/s INFO:__main__:2024-10-27 14:28:42 | Epoch: 3 | Step: 94820 | Dataset: 0-16672703 | Loss: 1.778 | 676 ms/step , 58171.12 GFLOP/s , 531858.9 tokens/s INFO:__main__:2024-10-27 
14:28:50 | Epoch: 3 | Step: 94830 | Dataset: 0-16680703 | Loss: 1.696 | 676 ms/step , 58187.80 GFLOP/s , 529607.1 tokens/s INFO:__main__:2024-10-27 14:28:57 | Epoch: 3 | Step: 94840 | Dataset: 0-16688703 | Loss: 1.686 | 676 ms/step , 58168.07 GFLOP/s , 531903.7 tokens/s INFO:__main__:2024-10-27 14:29:05 | Epoch: 3 | Step: 94850 | Dataset: 0-16696703 | Loss: 1.679 | 677 ms/step , 58080.66 GFLOP/s , 531461.7 tokens/s INFO:__main__:2024-10-27 14:29:13 | Epoch: 3 | Step: 94860 | Dataset: 0-16704703 | Loss: 1.656 | 675 ms/step , 58270.38 GFLOP/s , 531628.8 tokens/s INFO:__main__:2024-10-27 14:29:20 | Epoch: 3 | Step: 94870 | Dataset: 0-16712703 | Loss: 1.685 | 676 ms/step , 58181.12 GFLOP/s , 531642.7 tokens/s INFO:__main__:2024-10-27 14:29:28 | Epoch: 3 | Step: 94880 | Dataset: 0-16720703 | Loss: 1.636 | 679 ms/step , 57930.67 GFLOP/s , 530750.5 tokens/s INFO:__main__:2024-10-27 14:29:36 | Epoch: 3 | Step: 94890 | Dataset: 0-16728703 | Loss: 1.666 | 674 ms/step , 58352.89 GFLOP/s , 531375.2 tokens/s INFO:__main__:2024-10-27 14:29:44 | Epoch: 3 | Step: 94900 | Dataset: 0-16736703 | Loss: 1.934 | 674 ms/step , 58314.92 GFLOP/s , 531300.9 tokens/s INFO:__main__:2024-10-27 14:29:51 | Epoch: 3 | Step: 94910 | Dataset: 0-16744703 | Loss: 2.041 | 677 ms/step , 58098.50 GFLOP/s , 532199.4 tokens/s INFO:__main__:2024-10-27 14:29:59 | Epoch: 3 | Step: 94920 | Dataset: 0-16752703 | Loss: 2.201 | 676 ms/step , 58179.36 GFLOP/s , 531510.0 tokens/s INFO:__main__:2024-10-27 14:30:07 | Epoch: 3 | Step: 94930 | Dataset: 0-16760703 | Loss: 2.129 | 676 ms/step , 58189.34 GFLOP/s , 531908.5 tokens/s INFO:__main__:2024-10-27 14:30:14 | Epoch: 3 | Step: 94940 | Dataset: 0-16768703 | Loss: 2.137 | 675 ms/step , 58247.60 GFLOP/s , 531755.2 tokens/s INFO:__main__:2024-10-27 14:30:22 | Epoch: 3 | Step: 94950 | Dataset: 0-16776703 | Loss: 2.150 | 676 ms/step , 58107.01 GFLOP/s , 532271.7 tokens/s INFO:__main__:2024-10-27 14:30:30 | Epoch: 3 | Step: 94960 | Dataset: 0-16784703 | Loss: 2.189 | 675 
ms/step , 58209.31 GFLOP/s , 529517.2 tokens/s INFO:__main__:2024-10-27 14:30:37 | Epoch: 3 | Step: 94970 | Dataset: 0-16792703 | Loss: 2.066 | 676 ms/step , 58139.70 GFLOP/s , 531975.0 tokens/s INFO:__main__:2024-10-27 14:30:45 | Epoch: 3 | Step: 94980 | Dataset: 0-16800703 | Loss: 2.103 | 676 ms/step , 58178.75 GFLOP/s , 531485.5 tokens/s INFO:__main__:2024-10-27 14:30:53 | Epoch: 3 | Step: 94990 | Dataset: 0-16808703 | Loss: 2.185 | 675 ms/step , 58224.59 GFLOP/s , 531750.6 tokens/s INFO:__main__:2024-10-27 14:31:00 | Validation | Step: 95000 | Val_loss: 2.087 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 14:31:00 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_143100_step_95000.pt` INFO:__main__:2024-10-27 14:31:01 | Epoch: 3 | Step: 95000 | Dataset: 0-16816703 | Loss: 2.187 | 674 ms/step , 58282.13 GFLOP/s , 479143.7 tokens/s INFO:__main__:2024-10-27 14:31:09 | Epoch: 3 | Step: 95010 | Dataset: 0-16824703 | Loss: 2.047 | 675 ms/step , 58244.30 GFLOP/s , 529403.8 tokens/s INFO:__main__:2024-10-27 14:31:17 | Epoch: 3 | Step: 95020 | Dataset: 0-16832703 | Loss: 2.024 | 676 ms/step , 58186.70 GFLOP/s , 531468.0 tokens/s INFO:__main__:2024-10-27 14:31:25 | Epoch: 3 | Step: 95030 | Dataset: 0-16840703 | Loss: 2.069 | 676 ms/step , 58123.81 GFLOP/s , 531166.4 tokens/s INFO:__main__:2024-10-27 14:31:32 | Epoch: 3 | Step: 95040 | Dataset: 0-16848703 | Loss: 2.128 | 675 ms/step , 58261.22 GFLOP/s , 531724.9 tokens/s INFO:__main__:2024-10-27 14:31:40 | Epoch: 3 | Step: 95050 | Dataset: 0-16856703 | Loss: 2.121 | 675 ms/step , 58212.08 GFLOP/s , 531135.6 tokens/s INFO:__main__:2024-10-27 14:31:48 | Epoch: 3 | Step: 95060 | Dataset: 0-16864703 | Loss: 2.111 | 676 ms/step , 58125.41 GFLOP/s , 531204.6 tokens/s INFO:__main__:2024-10-27 14:31:55 | Epoch: 3 | Step: 95070 | Dataset: 0-16872703 | Loss: 1.715 | 675 ms/step , 58222.42 GFLOP/s , 531499.2 tokens/s INFO:__main__:2024-10-27 14:32:03 | Epoch: 3 | Step: 95080 | Dataset: 0-16880703 | Loss: 
1.670 | 675 ms/step , 58211.86 GFLOP/s , 531232.5 tokens/s INFO:__main__:2024-10-27 14:32:11 | Epoch: 3 | Step: 95090 | Dataset: 0-16888703 | Loss: 1.659 | 676 ms/step , 58160.41 GFLOP/s , 530045.8 tokens/s INFO:__main__:2024-10-27 14:32:19 | Epoch: 3 | Step: 95100 | Dataset: 0-16896703 | Loss: 1.619 | 676 ms/step , 58152.47 GFLOP/s , 531800.4 tokens/s INFO:__main__:2024-10-27 14:32:26 | Epoch: 3 | Step: 95110 | Dataset: 0-16904703 | Loss: 1.661 | 675 ms/step , 58271.79 GFLOP/s , 531774.1 tokens/s INFO:__main__:2024-10-27 14:32:34 | Epoch: 3 | Step: 95120 | Dataset: 0-16912703 | Loss: 1.654 | 675 ms/step , 58202.88 GFLOP/s , 531575.2 tokens/s INFO:__main__:2024-10-27 14:32:42 | Epoch: 3 | Step: 95130 | Dataset: 0-16920703 | Loss: 1.645 | 676 ms/step , 58187.89 GFLOP/s , 532073.6 tokens/s INFO:__main__:2024-10-27 14:32:49 | Epoch: 3 | Step: 95140 | Dataset: 0-16928703 | Loss: 1.619 | 676 ms/step , 58158.91 GFLOP/s , 531720.8 tokens/s INFO:__main__:2024-10-27 14:32:57 | Epoch: 3 | Step: 95150 | Dataset: 0-16936703 | Loss: 2.393 | 676 ms/step , 58177.80 GFLOP/s , 532203.9 tokens/s INFO:__main__:2024-10-27 14:33:05 | Epoch: 3 | Step: 95160 | Dataset: 0-16944703 | Loss: 2.290 | 675 ms/step , 58199.69 GFLOP/s , 532135.3 tokens/s INFO:__main__:2024-10-27 14:33:12 | Epoch: 3 | Step: 95170 | Dataset: 0-16952703 | Loss: 2.216 | 677 ms/step , 58065.76 GFLOP/s , 531430.4 tokens/s INFO:__main__:2024-10-27 14:33:20 | Epoch: 3 | Step: 95180 | Dataset: 0-16960703 | Loss: 2.239 | 676 ms/step , 58170.11 GFLOP/s , 531582.9 tokens/s INFO:__main__:2024-10-27 14:33:28 | Epoch: 3 | Step: 95190 | Dataset: 0-16968703 | Loss: 2.235 | 676 ms/step , 58177.27 GFLOP/s , 532081.0 tokens/s INFO:__main__:2024-10-27 14:33:36 | Epoch: 3 | Step: 95200 | Dataset: 0-16976703 | Loss: 2.105 | 675 ms/step , 58214.27 GFLOP/s , 531768.4 tokens/s INFO:__main__:2024-10-27 14:33:43 | Epoch: 3 | Step: 95210 | Dataset: 0-16984703 | Loss: 2.103 | 675 ms/step , 58208.75 GFLOP/s , 532029.1 tokens/s 
INFO:__main__:2024-10-27 14:33:51 | Epoch: 3 | Step: 95220 | Dataset: 0-16992703 | Loss: 2.157 | 675 ms/step , 58263.96 GFLOP/s , 531857.0 tokens/s INFO:__main__:2024-10-27 14:33:59 | Epoch: 3 | Step: 95230 | Dataset: 0-17000703 | Loss: 2.152 | 675 ms/step , 58230.03 GFLOP/s , 531320.1 tokens/s INFO:__main__:2024-10-27 14:34:06 | Epoch: 3 | Step: 95240 | Dataset: 0-17008703 | Loss: 2.197 | 675 ms/step , 58242.96 GFLOP/s , 531436.5 tokens/s INFO:__main__:2024-10-27 14:34:14 | Epoch: 3 | Step: 95250 | Dataset: 0-17016703 | Loss: 2.166 | 675 ms/step , 58215.10 GFLOP/s , 531525.3 tokens/s INFO:__main__:2024-10-27 14:34:22 | Epoch: 3 | Step: 95260 | Dataset: 0-17024703 | Loss: 2.160 | 674 ms/step , 58295.38 GFLOP/s , 532082.3 tokens/s INFO:__main__:2024-10-27 14:34:29 | Epoch: 3 | Step: 95270 | Dataset: 0-17032703 | Loss: 2.153 | 674 ms/step , 58284.38 GFLOP/s , 532419.0 tokens/s INFO:__main__:2024-10-27 14:34:37 | Epoch: 3 | Step: 95280 | Dataset: 0-17040703 | Loss: 2.063 | 675 ms/step , 58260.00 GFLOP/s , 531888.3 tokens/s INFO:__main__:2024-10-27 14:34:45 | Epoch: 3 | Step: 95290 | Dataset: 0-17048703 | Loss: 2.175 | 675 ms/step , 58195.58 GFLOP/s , 532096.1 tokens/s INFO:__main__:2024-10-27 14:34:53 | Epoch: 3 | Step: 95300 | Dataset: 0-17056703 | Loss: 2.131 | 676 ms/step , 58177.08 GFLOP/s , 531949.1 tokens/s INFO:__main__:2024-10-27 14:35:00 | Epoch: 3 | Step: 95310 | Dataset: 0-17064703 | Loss: 2.292 | 676 ms/step , 58156.34 GFLOP/s , 531752.5 tokens/s INFO:__main__:2024-10-27 14:35:08 | Epoch: 3 | Step: 95320 | Dataset: 0-17072703 | Loss: 2.146 | 674 ms/step , 58339.73 GFLOP/s , 532004.8 tokens/s INFO:__main__:2024-10-27 14:35:16 | Epoch: 3 | Step: 95330 | Dataset: 0-17080703 | Loss: 2.169 | 675 ms/step , 58259.97 GFLOP/s , 532321.4 tokens/s INFO:__main__:2024-10-27 14:35:23 | Epoch: 3 | Step: 95340 | Dataset: 0-17088703 | Loss: 2.179 | 675 ms/step , 58224.39 GFLOP/s , 531683.8 tokens/s INFO:__main__:2024-10-27 14:35:31 | Epoch: 3 | Step: 95350 | Dataset: 
0-17096703 | Loss: 2.189 | 676 ms/step , 58171.47 GFLOP/s , 531611.4 tokens/s INFO:__main__:2024-10-27 14:35:39 | Epoch: 3 | Step: 95360 | Dataset: 0-17104703 | Loss: 2.242 | 675 ms/step , 58230.96 GFLOP/s , 531521.9 tokens/s INFO:__main__:2024-10-27 14:35:46 | Epoch: 3 | Step: 95370 | Dataset: 0-17112703 | Loss: 2.234 | 674 ms/step , 58299.54 GFLOP/s , 531642.7 tokens/s INFO:__main__:2024-10-27 14:35:54 | Epoch: 3 | Step: 95380 | Dataset: 0-17120703 | Loss: 2.179 | 677 ms/step , 58106.17 GFLOP/s , 531646.3 tokens/s INFO:__main__:2024-10-27 14:36:02 | Epoch: 3 | Step: 95390 | Dataset: 0-17128703 | Loss: 2.215 | 676 ms/step , 58172.49 GFLOP/s , 531693.8 tokens/s INFO:__main__:2024-10-27 14:36:10 | Epoch: 3 | Step: 95400 | Dataset: 0-17136703 | Loss: 2.129 | 676 ms/step , 58140.02 GFLOP/s , 531646.3 tokens/s INFO:__main__:2024-10-27 14:36:17 | Epoch: 3 | Step: 95410 | Dataset: 0-17144703 | Loss: 2.254 | 676 ms/step , 58131.16 GFLOP/s , 531554.2 tokens/s INFO:__main__:2024-10-27 14:36:25 | Epoch: 3 | Step: 95420 | Dataset: 0-17152703 | Loss: 2.175 | 676 ms/step , 58135.43 GFLOP/s , 531836.7 tokens/s INFO:__main__:2024-10-27 14:36:33 | Epoch: 3 | Step: 95430 | Dataset: 0-17160703 | Loss: 2.098 | 675 ms/step , 58197.65 GFLOP/s , 531799.8 tokens/s INFO:__main__:2024-10-27 14:36:40 | Epoch: 3 | Step: 95440 | Dataset: 0-17168703 | Loss: 2.207 | 676 ms/step , 58150.48 GFLOP/s , 531657.1 tokens/s INFO:__main__:2024-10-27 14:36:48 | Epoch: 3 | Step: 95450 | Dataset: 0-17176703 | Loss: 2.232 | 676 ms/step , 58179.11 GFLOP/s , 531653.9 tokens/s INFO:__main__:2024-10-27 14:36:56 | Epoch: 3 | Step: 95460 | Dataset: 0-17184703 | Loss: 2.167 | 675 ms/step , 58219.35 GFLOP/s , 531731.2 tokens/s INFO:__main__:2024-10-27 14:37:04 | Epoch: 3 | Step: 95470 | Dataset: 0-17192703 | Loss: 1.846 | 676 ms/step , 58153.96 GFLOP/s , 531367.2 tokens/s INFO:__main__:2024-10-27 14:37:11 | Epoch: 3 | Step: 95480 | Dataset: 0-17200703 | Loss: 1.707 | 675 ms/step , 58263.12 GFLOP/s , 531866.3 
tokens/s INFO:__main__:2024-10-27 14:37:19 | Epoch: 3 | Step: 95490 | Dataset: 0-17208703 | Loss: 1.694 | 675 ms/step , 58226.63 GFLOP/s , 531371.9 tokens/s INFO:__main__:2024-10-27 14:37:27 | Epoch: 3 | Step: 95500 | Dataset: 0-17216703 | Loss: 1.609 | 674 ms/step , 58303.19 GFLOP/s , 531924.5 tokens/s INFO:__main__:2024-10-27 14:37:34 | Epoch: 3 | Step: 95510 | Dataset: 0-17224703 | Loss: 1.648 | 676 ms/step , 58171.33 GFLOP/s , 531739.6 tokens/s INFO:__main__:2024-10-27 14:37:42 | Epoch: 3 | Step: 95520 | Dataset: 0-17232703 | Loss: 1.629 | 678 ms/step , 57956.02 GFLOP/s , 531391.1 tokens/s INFO:__main__:2024-10-27 14:37:50 | Epoch: 3 | Step: 95530 | Dataset: 0-17240703 | Loss: 1.654 | 677 ms/step , 58079.34 GFLOP/s , 531669.8 tokens/s INFO:__main__:2024-10-27 14:37:57 | Epoch: 3 | Step: 95540 | Dataset: 0-17248703 | Loss: 1.641 | 675 ms/step , 58223.51 GFLOP/s , 531185.2 tokens/s INFO:__main__:2024-10-27 14:38:05 | Epoch: 3 | Step: 95550 | Dataset: 0-17256703 | Loss: 1.665 | 676 ms/step , 58179.40 GFLOP/s , 531385.6 tokens/s INFO:__main__:2024-10-27 14:38:13 | Epoch: 3 | Step: 95560 | Dataset: 0-17264703 | Loss: 2.243 | 676 ms/step , 58120.57 GFLOP/s , 531311.3 tokens/s INFO:__main__:2024-10-27 14:38:21 | Epoch: 3 | Step: 95570 | Dataset: 0-17272703 | Loss: 2.180 | 676 ms/step , 58165.85 GFLOP/s , 531571.6 tokens/s INFO:__main__:2024-10-27 14:38:28 | Epoch: 3 | Step: 95580 | Dataset: 0-17280703 | Loss: 2.203 | 675 ms/step , 58268.54 GFLOP/s , 531648.9 tokens/s INFO:__main__:2024-10-27 14:38:36 | Epoch: 3 | Step: 95590 | Dataset: 0-17288703 | Loss: 2.206 | 675 ms/step , 58232.71 GFLOP/s , 532079.1 tokens/s INFO:__main__:2024-10-27 14:38:44 | Epoch: 3 | Step: 95600 | Dataset: 0-17296703 | Loss: 2.176 | 675 ms/step , 58196.54 GFLOP/s , 531410.1 tokens/s INFO:__main__:2024-10-27 14:38:51 | Epoch: 3 | Step: 95610 | Dataset: 0-17304703 | Loss: 2.159 | 680 ms/step , 57822.74 GFLOP/s , 530923.0 tokens/s INFO:__main__:2024-10-27 14:38:59 | Epoch: 3 | Step: 95620 | 
Dataset: 0-17312703 | Loss: 2.176 | 676 ms/step , 58168.60 GFLOP/s , 531273.6 tokens/s INFO:__main__:2024-10-27 14:39:07 | Epoch: 3 | Step: 95630 | Dataset: 0-17320703 | Loss: 2.143 | 674 ms/step , 58284.05 GFLOP/s , 530345.7 tokens/s INFO:__main__:2024-10-27 14:39:15 | Epoch: 3 | Step: 95640 | Dataset: 0-17328703 | Loss: 2.194 | 676 ms/step , 58169.56 GFLOP/s , 531568.3 tokens/s INFO:__main__:2024-10-27 14:39:22 | Epoch: 3 | Step: 95650 | Dataset: 0-17336703 | Loss: 2.148 | 676 ms/step , 58146.51 GFLOP/s , 530807.4 tokens/s INFO:__main__:2024-10-27 14:39:30 | Epoch: 3 | Step: 95660 | Dataset: 0-17344703 | Loss: 2.156 | 675 ms/step , 58236.64 GFLOP/s , 531440.1 tokens/s INFO:__main__:2024-10-27 14:39:38 | Epoch: 3 | Step: 95670 | Dataset: 0-17352703 | Loss: 2.182 | 674 ms/step , 58295.00 GFLOP/s , 532059.6 tokens/s INFO:__main__:2024-10-27 14:39:45 | Epoch: 3 | Step: 95680 | Dataset: 0-17360703 | Loss: 2.147 | 678 ms/step , 57994.04 GFLOP/s , 531253.9 tokens/s INFO:__main__:2024-10-27 14:39:53 | Epoch: 3 | Step: 95690 | Dataset: 0-17368703 | Loss: 2.095 | 678 ms/step , 58006.62 GFLOP/s , 529987.4 tokens/s INFO:__main__:2024-10-27 14:40:01 | Epoch: 3 | Step: 95700 | Dataset: 0-17376703 | Loss: 2.248 | 674 ms/step , 58294.68 GFLOP/s , 531159.9 tokens/s INFO:__main__:2024-10-27 14:40:09 | Epoch: 3 | Step: 95710 | Dataset: 0-17384703 | Loss: 2.159 | 674 ms/step , 58298.43 GFLOP/s , 531948.7 tokens/s INFO:__main__:2024-10-27 14:40:16 | Epoch: 3 | Step: 95720 | Dataset: 0-17392703 | Loss: 2.417 | 674 ms/step , 58279.60 GFLOP/s , 532173.8 tokens/s INFO:__main__:2024-10-27 14:40:24 | Epoch: 3 | Step: 95730 | Dataset: 0-17400703 | Loss: 2.282 | 675 ms/step , 58201.41 GFLOP/s , 532331.0 tokens/s INFO:__main__:2024-10-27 14:40:32 | Epoch: 3 | Step: 95740 | Dataset: 0-17408703 | Loss: 2.231 | 674 ms/step , 58283.54 GFLOP/s , 532111.0 tokens/s INFO:__main__:2024-10-27 14:40:39 | Epoch: 3 | Step: 95750 | Dataset: 0-17416703 | Loss: 2.157 | 675 ms/step , 58253.26 GFLOP/s , 
532296.5 tokens/s INFO:__main__:2024-10-27 14:40:47 | Epoch: 3 | Step: 95760 | Dataset: 0-17424703 | Loss: 2.137 | 675 ms/step , 58211.58 GFLOP/s , 531780.0 tokens/s INFO:__main__:2024-10-27 14:40:55 | Epoch: 3 | Step: 95770 | Dataset: 0-17432703 | Loss: 2.126 | 675 ms/step , 58265.65 GFLOP/s , 531894.4 tokens/s INFO:__main__:2024-10-27 14:41:02 | Epoch: 3 | Step: 95780 | Dataset: 0-17440703 | Loss: 2.139 | 676 ms/step , 58186.31 GFLOP/s , 531997.9 tokens/s INFO:__main__:2024-10-27 14:41:10 | Epoch: 3 | Step: 95790 | Dataset: 0-17448703 | Loss: 2.155 | 675 ms/step , 58194.71 GFLOP/s , 531345.2 tokens/s INFO:__main__:2024-10-27 14:41:18 | Epoch: 3 | Step: 95800 | Dataset: 0-17456703 | Loss: 2.086 | 675 ms/step , 58211.10 GFLOP/s , 531022.3 tokens/s INFO:__main__:2024-10-27 14:41:26 | Epoch: 3 | Step: 95810 | Dataset: 0-17464703 | Loss: 2.109 | 674 ms/step , 58308.16 GFLOP/s , 532094.0 tokens/s INFO:__main__:2024-10-27 14:41:33 | Epoch: 3 | Step: 95820 | Dataset: 0-17472703 | Loss: 2.115 | 675 ms/step , 58200.73 GFLOP/s , 531978.5 tokens/s INFO:__main__:2024-10-27 14:41:41 | Epoch: 3 | Step: 95830 | Dataset: 0-17480703 | Loss: 2.081 | 675 ms/step , 58249.85 GFLOP/s , 531948.2 tokens/s INFO:__main__:2024-10-27 14:41:49 | Epoch: 3 | Step: 95840 | Dataset: 0-17488703 | Loss: 2.080 | 675 ms/step , 58242.98 GFLOP/s , 531960.7 tokens/s INFO:__main__:2024-10-27 14:41:56 | Epoch: 3 | Step: 95850 | Dataset: 0-17496703 | Loss: 2.052 | 674 ms/step , 58283.81 GFLOP/s , 531979.9 tokens/s INFO:__main__:2024-10-27 14:42:04 | Epoch: 3 | Step: 95860 | Dataset: 0-17504703 | Loss: 2.066 | 675 ms/step , 58226.49 GFLOP/s , 531963.5 tokens/s INFO:__main__:2024-10-27 14:42:12 | Epoch: 3 | Step: 95870 | Dataset: 0-17512703 | Loss: 2.070 | 682 ms/step , 57621.10 GFLOP/s , 531858.9 tokens/s INFO:__main__:2024-10-27 14:42:19 | Epoch: 3 | Step: 95880 | Dataset: 0-17520703 | Loss: 2.212 | 674 ms/step , 58329.80 GFLOP/s , 532854.4 tokens/s INFO:__main__:2024-10-27 14:42:27 | Epoch: 3 | Step: 
95890 | Dataset: 0-17528703 | Loss: 2.185 | 675 ms/step , 58257.08 GFLOP/s , 532485.6 tokens/s INFO:__main__:2024-10-27 14:42:35 | Epoch: 3 | Step: 95900 | Dataset: 0-17536703 | Loss: 2.166 | 675 ms/step , 58226.70 GFLOP/s , 532488.1 tokens/s INFO:__main__:2024-10-27 14:42:43 | Epoch: 3 | Step: 95910 | Dataset: 0-17544703 | Loss: 2.165 | 675 ms/step , 58194.81 GFLOP/s , 532309.4 tokens/s INFO:__main__:2024-10-27 14:42:50 | Epoch: 3 | Step: 95920 | Dataset: 0-17552703 | Loss: 2.116 | 675 ms/step , 58276.93 GFLOP/s , 532228.8 tokens/s INFO:__main__:2024-10-27 14:42:58 | Epoch: 3 | Step: 95930 | Dataset: 0-17560703 | Loss: 2.106 | 674 ms/step , 58337.08 GFLOP/s , 532384.2 tokens/s INFO:__main__:2024-10-27 14:43:06 | Epoch: 3 | Step: 95940 | Dataset: 0-17568703 | Loss: 2.114 | 673 ms/step , 58376.30 GFLOP/s , 531928.3 tokens/s INFO:__main__:2024-10-27 14:43:13 | Epoch: 3 | Step: 95950 | Dataset: 0-17576703 | Loss: 2.175 | 675 ms/step , 58278.14 GFLOP/s , 532547.6 tokens/s INFO:__main__:2024-10-27 14:43:21 | Epoch: 3 | Step: 95960 | Dataset: 0-17584703 | Loss: 2.039 | 675 ms/step , 58248.99 GFLOP/s , 531929.7 tokens/s INFO:__main__:2024-10-27 14:43:29 | Epoch: 3 | Step: 95970 | Dataset: 0-17592703 | Loss: 2.072 | 675 ms/step , 58265.48 GFLOP/s , 532409.3 tokens/s INFO:__main__:2024-10-27 14:43:36 | Epoch: 3 | Step: 95980 | Dataset: 0-17600703 | Loss: 2.102 | 675 ms/step , 58213.09 GFLOP/s , 532038.9 tokens/s INFO:__main__:2024-10-27 14:43:44 | Epoch: 3 | Step: 95990 | Dataset: 0-17608703 | Loss: 2.062 | 675 ms/step , 58215.55 GFLOP/s , 532185.1 tokens/s INFO:__main__:2024-10-27 14:43:51 | Validation | Step: 96000 | Val_loss: 2.162 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 14:43:51 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_144351_step_96000.pt` INFO:__main__:2024-10-27 14:43:53 | Epoch: 3 | Step: 96000 | Dataset: 0-17616703 | Loss: 2.154 | 673 ms/step , 58376.25 GFLOP/s , 479284.1 tokens/s INFO:__main__:2024-10-27 14:44:00 | Epoch: 3 
| Step: 96010 | Dataset: 0-17624703 | Loss: 2.078 | 675 ms/step , 58266.37 GFLOP/s , 532374.8 tokens/s INFO:__main__:2024-10-27 14:44:08 | Epoch: 3 | Step: 96020 | Dataset: 0-17632703 | Loss: 2.104 | 675 ms/step , 58238.37 GFLOP/s , 531892.2 tokens/s INFO:__main__:2024-10-27 14:44:16 | Epoch: 3 | Step: 96030 | Dataset: 0-17640703 | Loss: 2.126 | 675 ms/step , 58242.71 GFLOP/s , 532540.3 tokens/s INFO:__main__:2024-10-27 14:44:23 | Epoch: 3 | Step: 96040 | Dataset: 0-17648703 | Loss: 2.172 | 676 ms/step , 58189.07 GFLOP/s , 532350.6 tokens/s INFO:__main__:2024-10-27 14:44:31 | Epoch: 3 | Step: 96050 | Dataset: 0-17656703 | Loss: 2.129 | 675 ms/step , 58266.06 GFLOP/s , 532509.8 tokens/s INFO:__main__:2024-10-27 14:44:39 | Epoch: 3 | Step: 96060 | Dataset: 0-17664703 | Loss: 2.236 | 676 ms/step , 58191.99 GFLOP/s , 532676.7 tokens/s INFO:__main__:2024-10-27 14:44:47 | Epoch: 3 | Step: 96070 | Dataset: 0-17672703 | Loss: 2.177 | 674 ms/step , 58290.50 GFLOP/s , 530041.5 tokens/s INFO:__main__:2024-10-27 14:44:54 | Epoch: 3 | Step: 96080 | Dataset: 0-17680703 | Loss: 2.159 | 675 ms/step , 58236.84 GFLOP/s , 532106.3 tokens/s INFO:__main__:2024-10-27 14:45:02 | Epoch: 3 | Step: 96090 | Dataset: 0-17688703 | Loss: 2.124 | 675 ms/step , 58234.68 GFLOP/s , 532107.7 tokens/s INFO:__main__:2024-10-27 14:45:10 | Epoch: 3 | Step: 96100 | Dataset: 0-17696703 | Loss: 2.228 | 675 ms/step , 58252.24 GFLOP/s , 532426.8 tokens/s INFO:__main__:2024-10-27 14:45:17 | Epoch: 3 | Step: 96110 | Dataset: 0-17704703 | Loss: 2.156 | 676 ms/step , 58190.09 GFLOP/s , 531146.2 tokens/s INFO:__main__:2024-10-27 14:45:25 | Epoch: 3 | Step: 96120 | Dataset: 0-17712703 | Loss: 2.159 | 675 ms/step , 58270.59 GFLOP/s , 532059.6 tokens/s INFO:__main__:2024-10-27 14:45:33 | Epoch: 3 | Step: 96130 | Dataset: 0-17720703 | Loss: 2.160 | 675 ms/step , 58237.04 GFLOP/s , 531894.1 tokens/s INFO:__main__:2024-10-27 14:45:40 | Epoch: 3 | Step: 96140 | Dataset: 0-17728703 | Loss: 2.110 | 675 ms/step , 58204.01 
GFLOP/s , 532173.1 tokens/s INFO:__main__:2024-10-27 14:45:48 | Epoch: 3 | Step: 96150 | Dataset: 0-17736703 | Loss: 2.084 | 674 ms/step , 58318.68 GFLOP/s , 531072.6 tokens/s INFO:__main__:2024-10-27 14:45:56 | Epoch: 3 | Step: 96160 | Dataset: 0-17744703 | Loss: 2.134 | 675 ms/step , 58256.41 GFLOP/s , 532071.8 tokens/s INFO:__main__:2024-10-27 14:46:04 | Epoch: 3 | Step: 96170 | Dataset: 0-17752703 | Loss: 2.143 | 676 ms/step , 58163.34 GFLOP/s , 532425.8 tokens/s INFO:__main__:2024-10-27 14:46:11 | Epoch: 3 | Step: 96180 | Dataset: 0-17760703 | Loss: 2.115 | 675 ms/step , 58252.24 GFLOP/s , 532252.5 tokens/s INFO:__main__:2024-10-27 14:46:19 | Epoch: 3 | Step: 96190 | Dataset: 0-17768703 | Loss: 2.143 | 675 ms/step , 58277.76 GFLOP/s , 531763.9 tokens/s INFO:__main__:2024-10-27 14:46:27 | Epoch: 3 | Step: 96200 | Dataset: 0-17776703 | Loss: 2.186 | 675 ms/step , 58227.64 GFLOP/s , 531689.2 tokens/s INFO:__main__:2024-10-27 14:46:34 | Epoch: 3 | Step: 96210 | Dataset: 0-17784703 | Loss: 2.119 | 676 ms/step , 58152.72 GFLOP/s , 532367.3 tokens/s INFO:__main__:2024-10-27 14:46:42 | Epoch: 3 | Step: 96220 | Dataset: 0-17792703 | Loss: 2.172 | 676 ms/step , 58186.08 GFLOP/s , 531912.4 tokens/s INFO:__main__:2024-10-27 14:46:50 | Epoch: 3 | Step: 96230 | Dataset: 0-17800703 | Loss: 2.143 | 676 ms/step , 58132.15 GFLOP/s , 531812.2 tokens/s INFO:__main__:2024-10-27 14:46:57 | Epoch: 3 | Step: 96240 | Dataset: 0-17808703 | Loss: 2.159 | 675 ms/step , 58222.60 GFLOP/s , 531959.3 tokens/s INFO:__main__:2024-10-27 14:47:05 | Epoch: 3 | Step: 96250 | Dataset: 0-17816703 | Loss: 2.201 | 675 ms/step , 58236.89 GFLOP/s , 532449.2 tokens/s INFO:__main__:2024-10-27 14:47:13 | Epoch: 3 | Step: 96260 | Dataset: 0-17824703 | Loss: 2.202 | 675 ms/step , 58261.87 GFLOP/s , 532245.3 tokens/s INFO:__main__:2024-10-27 14:47:20 | Epoch: 3 | Step: 96270 | Dataset: 0-17832703 | Loss: 2.213 | 675 ms/step , 58211.78 GFLOP/s , 532518.6 tokens/s INFO:__main__:2024-10-27 14:47:28 | Epoch: 3 | 
Step: 96280 | Dataset: 0-17840703 | Loss: 2.180 | 677 ms/step , 58049.71 GFLOP/s , 531647.9 tokens/s INFO:__main__:2024-10-27 14:47:36 | Epoch: 3 | Step: 96290 | Dataset: 0-17848703 | Loss: 2.204 | 676 ms/step , 58175.03 GFLOP/s , 531185.1 tokens/s INFO:__main__:2024-10-27 14:47:44 | Epoch: 3 | Step: 96300 | Dataset: 0-17856703 | Loss: 2.170 | 675 ms/step , 58225.49 GFLOP/s , 532001.5 tokens/s INFO:__main__:2024-10-27 14:47:51 | Epoch: 3 | Step: 96310 | Dataset: 0-17864703 | Loss: 2.137 | 676 ms/step , 58134.02 GFLOP/s , 531590.2 tokens/s INFO:__main__:2024-10-27 14:47:59 | Epoch: 3 | Step: 96320 | Dataset: 0-17872703 | Loss: 2.136 | 674 ms/step , 58339.33 GFLOP/s , 532594.8 tokens/s INFO:__main__:2024-10-27 14:48:07 | Epoch: 3 | Step: 96330 | Dataset: 0-17880703 | Loss: 2.127 | 676 ms/step , 58165.87 GFLOP/s , 531776.2 tokens/s INFO:__main__:2024-10-27 14:48:14 | Epoch: 3 | Step: 96340 | Dataset: 0-17888703 | Loss: 2.145 | 675 ms/step , 58266.03 GFLOP/s , 532328.2 tokens/s INFO:__main__:2024-10-27 14:48:22 | Epoch: 3 | Step: 96350 | Dataset: 0-17896703 | Loss: 2.178 | 674 ms/step , 58285.22 GFLOP/s , 532129.0 tokens/s INFO:__main__:2024-10-27 14:48:30 | Epoch: 3 | Step: 96360 | Dataset: 0-17904703 | Loss: 2.177 | 677 ms/step , 58070.37 GFLOP/s , 531567.8 tokens/s INFO:__main__:2024-10-27 14:48:38 | Epoch: 3 | Step: 96370 | Dataset: 0-17912703 | Loss: 1.780 | 675 ms/step , 58259.87 GFLOP/s , 531171.9 tokens/s INFO:__main__:2024-10-27 14:48:45 | Epoch: 3 | Step: 96380 | Dataset: 0-17920703 | Loss: 1.746 | 674 ms/step , 58328.56 GFLOP/s , 531780.3 tokens/s INFO:__main__:2024-10-27 14:48:53 | Epoch: 3 | Step: 96390 | Dataset: 0-17928703 | Loss: 1.689 | 675 ms/step , 58235.12 GFLOP/s , 532063.7 tokens/s INFO:__main__:2024-10-27 14:49:01 | Epoch: 3 | Step: 96400 | Dataset: 0-17936703 | Loss: 1.677 | 675 ms/step , 58258.77 GFLOP/s , 531493.0 tokens/s INFO:__main__:2024-10-27 14:49:08 | Epoch: 3 | Step: 96410 | Dataset: 0-17944703 | Loss: 1.665 | 674 ms/step , 58345.11 
GFLOP/s , 532241.3 tokens/s INFO:__main__:2024-10-27 14:49:16 | Epoch: 3 | Step: 96420 | Dataset: 0-17952703 | Loss: 1.656 | 674 ms/step , 58302.64 GFLOP/s , 532094.2 tokens/s INFO:__main__:2024-10-27 14:49:24 | Epoch: 3 | Step: 96430 | Dataset: 0-17960703 | Loss: 1.673 | 677 ms/step , 58087.37 GFLOP/s , 531620.4 tokens/s INFO:__main__:2024-10-27 14:49:31 | Epoch: 3 | Step: 96440 | Dataset: 0-17968703 | Loss: 1.662 | 676 ms/step , 58182.20 GFLOP/s , 531443.8 tokens/s INFO:__main__:2024-10-27 14:49:39 | Epoch: 3 | Step: 96450 | Dataset: 0-17976703 | Loss: 1.677 | 676 ms/step , 58168.05 GFLOP/s , 531108.1 tokens/s INFO:__main__:2024-10-27 14:49:47 | Epoch: 3 | Step: 96460 | Dataset: 0-17984703 | Loss: 1.664 | 674 ms/step , 58354.50 GFLOP/s , 531240.2 tokens/s INFO:__main__:2024-10-27 14:49:55 | Epoch: 3 | Step: 96470 | Dataset: 0-17992703 | Loss: 1.633 | 675 ms/step , 58267.07 GFLOP/s , 531825.6 tokens/s INFO:__main__:2024-10-27 14:50:02 | Epoch: 3 | Step: 96480 | Dataset: 0-18000703 | Loss: 1.646 | 675 ms/step , 58205.28 GFLOP/s , 531922.4 tokens/s INFO:__main__:2024-10-27 14:50:10 | Epoch: 3 | Step: 96490 | Dataset: 0-18008703 | Loss: 1.604 | 676 ms/step , 58183.37 GFLOP/s , 531538.7 tokens/s INFO:__main__:2024-10-27 14:50:18 | Epoch: 3 | Step: 96500 | Dataset: 0-18016703 | Loss: 1.653 | 676 ms/step , 58167.12 GFLOP/s , 531443.0 tokens/s INFO:__main__:2024-10-27 14:50:25 | Epoch: 3 | Step: 96510 | Dataset: 0-18024703 | Loss: 1.648 | 674 ms/step , 58281.13 GFLOP/s , 531530.9 tokens/s INFO:__main__:2024-10-27 14:50:33 | Epoch: 3 | Step: 96520 | Dataset: 0-18032703 | Loss: 1.642 | 675 ms/step , 58243.86 GFLOP/s , 531840.3 tokens/s INFO:__main__:2024-10-27 14:50:41 | Epoch: 3 | Step: 96530 | Dataset: 0-18040703 | Loss: 1.612 | 676 ms/step , 58143.79 GFLOP/s , 531060.3 tokens/s INFO:__main__:2024-10-27 14:50:48 | Epoch: 3 | Step: 96540 | Dataset: 0-18048703 | Loss: 2.293 | 676 ms/step , 58138.30 GFLOP/s , 531195.6 tokens/s INFO:__main__:2024-10-27 14:50:56 | Epoch: 3 | 
Step: 96550 | Dataset: 0-18056703 | Loss: 2.187 | 675 ms/step , 58221.71 GFLOP/s , 532370.1 tokens/s INFO:__main__:2024-10-27 14:51:04 | Epoch: 3 | Step: 96560 | Dataset: 0-18064703 | Loss: 2.110 | 675 ms/step , 58200.11 GFLOP/s , 532263.0 tokens/s INFO:__main__:2024-10-27 14:51:12 | Epoch: 3 | Step: 96570 | Dataset: 0-18072703 | Loss: 2.154 | 675 ms/step , 58272.37 GFLOP/s , 532273.4 tokens/s INFO:__main__:2024-10-27 14:51:19 | Epoch: 3 | Step: 96580 | Dataset: 0-18080703 | Loss: 2.227 | 674 ms/step , 58281.96 GFLOP/s , 532265.0 tokens/s INFO:__main__:2024-10-27 14:51:27 | Epoch: 3 | Step: 96590 | Dataset: 0-18088703 | Loss: 2.075 | 675 ms/step , 58242.01 GFLOP/s , 532177.9 tokens/s INFO:__main__:2024-10-27 14:51:35 | Epoch: 3 | Step: 96600 | Dataset: 0-18096703 | Loss: 2.186 | 675 ms/step , 58271.45 GFLOP/s , 532382.8 tokens/s INFO:__main__:2024-10-27 14:51:42 | Epoch: 3 | Step: 96610 | Dataset: 0-18104703 | Loss: 2.153 | 675 ms/step , 58261.51 GFLOP/s , 532576.3 tokens/s INFO:__main__:2024-10-27 14:51:50 | Epoch: 3 | Step: 96620 | Dataset: 0-18112703 | Loss: 2.136 | 675 ms/step , 58232.94 GFLOP/s , 532140.0 tokens/s INFO:__main__:2024-10-27 14:51:58 | Epoch: 3 | Step: 96630 | Dataset: 0-18120703 | Loss: 2.161 | 674 ms/step , 58284.52 GFLOP/s , 532694.9 tokens/s INFO:__main__:2024-10-27 14:52:05 | Epoch: 3 | Step: 96640 | Dataset: 0-18128703 | Loss: 2.120 | 674 ms/step , 58330.06 GFLOP/s , 531967.6 tokens/s INFO:__main__:2024-10-27 14:52:13 | Epoch: 3 | Step: 96650 | Dataset: 0-18136703 | Loss: 2.127 | 676 ms/step , 58161.78 GFLOP/s , 532148.4 tokens/s INFO:__main__:2024-10-27 14:52:21 | Epoch: 3 | Step: 96660 | Dataset: 0-18144703 | Loss: 2.127 | 675 ms/step , 58196.50 GFLOP/s , 532142.2 tokens/s INFO:__main__:2024-10-27 14:52:29 | Epoch: 3 | Step: 96670 | Dataset: 0-18152703 | Loss: 2.106 | 675 ms/step , 58222.99 GFLOP/s , 532370.5 tokens/s INFO:__main__:2024-10-27 14:52:36 | Epoch: 3 | Step: 96680 | Dataset: 0-18160703 | Loss: 2.039 | 674 ms/step , 58333.78 
GFLOP/s , 532080.6 tokens/s INFO:__main__:2024-10-27 14:52:44 | Epoch: 3 | Step: 96690 | Dataset: 0-18168703 | Loss: 2.103 | 675 ms/step , 58276.96 GFLOP/s , 532746.0 tokens/s INFO:__main__:2024-10-27 14:52:52 | Epoch: 3 | Step: 96700 | Dataset: 0-18176703 | Loss: 1.843 | 675 ms/step , 58210.46 GFLOP/s , 532358.3 tokens/s INFO:__main__:2024-10-27 14:52:59 | Epoch: 3 | Step: 96710 | Dataset: 0-18184703 | Loss: 1.752 | 674 ms/step , 58300.95 GFLOP/s , 531759.1 tokens/s INFO:__main__:2024-10-27 14:53:07 | Epoch: 3 | Step: 96720 | Dataset: 0-18192703 | Loss: 1.769 | 675 ms/step , 58229.78 GFLOP/s , 532005.7 tokens/s INFO:__main__:2024-10-27 14:53:15 | Epoch: 3 | Step: 96730 | Dataset: 0-18200703 | Loss: 1.770 | 675 ms/step , 58200.28 GFLOP/s , 531415.6 tokens/s INFO:__main__:2024-10-27 14:53:22 | Epoch: 3 | Step: 96740 | Dataset: 0-18208703 | Loss: 1.758 | 675 ms/step , 58267.02 GFLOP/s , 531918.5 tokens/s INFO:__main__:2024-10-27 14:53:30 | Epoch: 3 | Step: 96750 | Dataset: 0-18216703 | Loss: 1.737 | 676 ms/step , 58180.44 GFLOP/s , 531368.7 tokens/s INFO:__main__:2024-10-27 14:53:38 | Epoch: 3 | Step: 96760 | Dataset: 0-18224703 | Loss: 1.761 | 675 ms/step , 58249.86 GFLOP/s , 531607.8 tokens/s INFO:__main__:2024-10-27 14:53:46 | Epoch: 3 | Step: 96770 | Dataset: 0-18232703 | Loss: 1.742 | 675 ms/step , 58271.80 GFLOP/s , 531719.4 tokens/s INFO:__main__:2024-10-27 14:53:53 | Epoch: 3 | Step: 96780 | Dataset: 0-18240703 | Loss: 1.701 | 676 ms/step , 58137.97 GFLOP/s , 531297.8 tokens/s INFO:__main__:2024-10-27 14:54:01 | Epoch: 3 | Step: 96790 | Dataset: 0-18248703 | Loss: 2.295 | 678 ms/step , 58002.05 GFLOP/s , 531795.8 tokens/s INFO:__main__:2024-10-27 14:54:09 | Epoch: 3 | Step: 96800 | Dataset: 0-18256703 | Loss: 2.216 | 674 ms/step , 58297.85 GFLOP/s , 532168.8 tokens/s INFO:__main__:2024-10-27 14:54:16 | Epoch: 3 | Step: 96810 | Dataset: 0-18264703 | Loss: 2.229 | 675 ms/step , 58199.54 GFLOP/s , 531800.0 tokens/s INFO:__main__:2024-10-27 14:54:24 | Epoch: 3 | 
Step: 96820 | Dataset: 0-18272703 | Loss: 2.163 | 675 ms/step , 58276.14 GFLOP/s , 531570.2 tokens/s INFO:__main__:2024-10-27 14:54:32 | Epoch: 3 | Step: 96830 | Dataset: 0-18280703 | Loss: 2.189 | 675 ms/step , 58201.07 GFLOP/s , 531667.1 tokens/s INFO:__main__:2024-10-27 14:54:39 | Epoch: 3 | Step: 96840 | Dataset: 0-18288703 | Loss: 2.105 | 675 ms/step , 58235.23 GFLOP/s , 531678.0 tokens/s INFO:__main__:2024-10-27 14:54:47 | Epoch: 3 | Step: 96850 | Dataset: 0-18296703 | Loss: 2.215 | 674 ms/step , 58294.19 GFLOP/s , 531619.6 tokens/s INFO:__main__:2024-10-27 14:54:55 | Epoch: 3 | Step: 96860 | Dataset: 0-18304703 | Loss: 2.225 | 676 ms/step , 58186.32 GFLOP/s , 532495.8 tokens/s INFO:__main__:2024-10-27 14:55:03 | Epoch: 3 | Step: 96870 | Dataset: 0-18312703 | Loss: 2.155 | 676 ms/step , 58163.41 GFLOP/s , 531785.5 tokens/s INFO:__main__:2024-10-27 14:55:10 | Epoch: 3 | Step: 96880 | Dataset: 0-18320703 | Loss: 2.150 | 675 ms/step , 58208.18 GFLOP/s , 531586.9 tokens/s INFO:__main__:2024-10-27 14:55:18 | Epoch: 3 | Step: 96890 | Dataset: 0-18328703 | Loss: 2.194 | 674 ms/step , 58314.84 GFLOP/s , 531726.2 tokens/s INFO:__main__:2024-10-27 14:55:26 | Epoch: 3 | Step: 96900 | Dataset: 0-18336703 | Loss: 2.084 | 676 ms/step , 58148.77 GFLOP/s , 531571.0 tokens/s INFO:__main__:2024-10-27 14:55:33 | Epoch: 3 | Step: 96910 | Dataset: 0-18344703 | Loss: 2.097 | 675 ms/step , 58249.13 GFLOP/s , 531652.8 tokens/s INFO:__main__:2024-10-27 14:55:41 | Epoch: 3 | Step: 96920 | Dataset: 0-18352703 | Loss: 2.176 | 675 ms/step , 58270.47 GFLOP/s , 532030.5 tokens/s INFO:__main__:2024-10-27 14:55:49 | Epoch: 3 | Step: 96930 | Dataset: 0-18360703 | Loss: 2.137 | 675 ms/step , 58198.07 GFLOP/s , 531530.8 tokens/s INFO:__main__:2024-10-27 14:55:56 | Epoch: 3 | Step: 96940 | Dataset: 0-18368703 | Loss: 2.196 | 674 ms/step , 58308.39 GFLOP/s , 531905.4 tokens/s INFO:__main__:2024-10-27 14:56:04 | Epoch: 3 | Step: 96950 | Dataset: 0-18376703 | Loss: 2.181 | 674 ms/step , 58284.12 
GFLOP/s , 532169.0 tokens/s INFO:__main__:2024-10-27 14:56:12 | Epoch: 3 | Step: 96960 | Dataset: 0-18384703 | Loss: 2.192 | 675 ms/step , 58209.78 GFLOP/s , 532201.0 tokens/s INFO:__main__:2024-10-27 14:56:20 | Epoch: 3 | Step: 96970 | Dataset: 0-18392703 | Loss: 2.245 | 676 ms/step , 58168.17 GFLOP/s , 531846.5 tokens/s INFO:__main__:2024-10-27 14:56:27 | Epoch: 3 | Step: 96980 | Dataset: 0-18400703 | Loss: 2.173 | 675 ms/step , 58218.24 GFLOP/s , 531881.7 tokens/s INFO:__main__:2024-10-27 14:56:35 | Epoch: 3 | Step: 96990 | Dataset: 0-18408703 | Loss: 2.173 | 675 ms/step , 58214.34 GFLOP/s , 531920.7 tokens/s INFO:__main__:2024-10-27 14:56:42 | Validation | Step: 97000 | Val_loss: 2.154 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 14:56:42 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_145642_step_97000.pt` INFO:__main__:2024-10-27 14:56:44 | Epoch: 3 | Step: 97000 | Dataset: 0-18416703 | Loss: 2.152 | 674 ms/step , 58344.24 GFLOP/s , 479579.1 tokens/s INFO:__main__:2024-10-27 14:56:51 | Epoch: 3 | Step: 97010 | Dataset: 0-18424703 | Loss: 2.199 | 676 ms/step , 58172.73 GFLOP/s , 532039.1 tokens/s INFO:__main__:2024-10-27 14:56:59 | Epoch: 3 | Step: 97020 | Dataset: 0-18432703 | Loss: 2.084 | 675 ms/step , 58219.35 GFLOP/s , 532374.1 tokens/s INFO:__main__:2024-10-27 14:57:07 | Epoch: 3 | Step: 97030 | Dataset: 0-18440703 | Loss: 2.165 | 675 ms/step , 58197.88 GFLOP/s , 532512.8 tokens/s INFO:__main__:2024-10-27 14:57:14 | Epoch: 3 | Step: 97040 | Dataset: 0-18448703 | Loss: 2.112 | 675 ms/step , 58214.35 GFLOP/s , 531965.5 tokens/s INFO:__main__:2024-10-27 14:57:22 | Epoch: 3 | Step: 97050 | Dataset: 0-18456703 | Loss: 2.179 | 675 ms/step , 58234.63 GFLOP/s , 532443.8 tokens/s INFO:__main__:2024-10-27 14:57:30 | Epoch: 3 | Step: 97060 | Dataset: 0-18464703 | Loss: 2.113 | 674 ms/step , 58293.98 GFLOP/s , 532601.7 tokens/s INFO:__main__:2024-10-27 14:57:37 | Epoch: 3 | Step: 97070 | Dataset: 0-18472703 | Loss: 2.180 | 674 ms/step , 
58289.80 GFLOP/s , 532382.1 tokens/s INFO:__main__:2024-10-27 14:57:45 | Epoch: 3 | Step: 97080 | Dataset: 0-18480703 | Loss: 2.153 | 675 ms/step , 58207.56 GFLOP/s , 532444.4 tokens/s INFO:__main__:2024-10-27 14:57:53 | Epoch: 3 | Step: 97090 | Dataset: 0-18488703 | Loss: 2.149 | 675 ms/step , 58269.29 GFLOP/s , 532017.5 tokens/s INFO:__main__:2024-10-27 14:58:00 | Epoch: 3 | Step: 97100 | Dataset: 0-18496703 | Loss: 2.138 | 674 ms/step , 58350.74 GFLOP/s , 532578.7 tokens/s INFO:__main__:2024-10-27 14:58:08 | Epoch: 3 | Step: 97110 | Dataset: 0-18504703 | Loss: 2.121 | 675 ms/step , 58226.10 GFLOP/s , 532224.3 tokens/s INFO:__main__:2024-10-27 14:58:16 | Epoch: 3 | Step: 97120 | Dataset: 0-18512703 | Loss: 2.146 | 675 ms/step , 58231.75 GFLOP/s , 531878.3 tokens/s INFO:__main__:2024-10-27 14:58:24 | Epoch: 3 | Step: 97130 | Dataset: 0-18520703 | Loss: 2.068 | 677 ms/step , 58075.38 GFLOP/s , 531460.2 tokens/s INFO:__main__:2024-10-27 14:58:31 | Epoch: 3 | Step: 97140 | Dataset: 0-18528703 | Loss: 2.002 | 675 ms/step , 58217.87 GFLOP/s , 532043.2 tokens/s INFO:__main__:2024-10-27 14:58:39 | Epoch: 3 | Step: 97150 | Dataset: 0-18536703 | Loss: 2.092 | 674 ms/step , 58291.30 GFLOP/s , 532274.8 tokens/s INFO:__main__:2024-10-27 14:58:47 | Epoch: 3 | Step: 97160 | Dataset: 0-18544703 | Loss: 2.097 | 674 ms/step , 58313.80 GFLOP/s , 532601.9 tokens/s INFO:__main__:2024-10-27 14:58:54 | Epoch: 3 | Step: 97170 | Dataset: 0-18552703 | Loss: 2.167 | 675 ms/step , 58254.47 GFLOP/s , 532515.4 tokens/s INFO:__main__:2024-10-27 14:59:02 | Epoch: 3 | Step: 97180 | Dataset: 0-18560703 | Loss: 2.127 | 674 ms/step , 58321.22 GFLOP/s , 532888.4 tokens/s INFO:__main__:2024-10-27 14:59:10 | Epoch: 3 | Step: 97190 | Dataset: 0-18568703 | Loss: 2.064 | 675 ms/step , 58261.26 GFLOP/s , 532253.5 tokens/s INFO:__main__:2024-10-27 14:59:17 | Epoch: 3 | Step: 97200 | Dataset: 0-18576703 | Loss: 2.041 | 675 ms/step , 58243.71 GFLOP/s , 530916.7 tokens/s INFO:__main__:2024-10-27 14:59:25 | 
Epoch: 3 | Step: 97210 | Dataset: 0-18584703 | Loss: 2.128 | 675 ms/step , 58223.33 GFLOP/s , 533109.7 tokens/s INFO:__main__:2024-10-27 14:59:33 | Epoch: 3 | Step: 97220 | Dataset: 0-18592703 | Loss: 2.095 | 674 ms/step , 58293.80 GFLOP/s , 532718.5 tokens/s INFO:__main__:2024-10-27 14:59:41 | Epoch: 3 | Step: 97230 | Dataset: 0-18600703 | Loss: 2.190 | 676 ms/step , 58151.43 GFLOP/s , 532241.0 tokens/s INFO:__main__:2024-10-27 14:59:48 | Epoch: 3 | Step: 97240 | Dataset: 0-18608703 | Loss: 2.075 | 675 ms/step , 58239.90 GFLOP/s , 532299.6 tokens/s INFO:__main__:2024-10-27 14:59:56 | Epoch: 3 | Step: 97250 | Dataset: 0-18616703 | Loss: 2.173 | 674 ms/step , 58293.41 GFLOP/s , 532572.9 tokens/s INFO:__main__:2024-10-27 15:00:03 | Epoch: 3 | Step: 97260 | Dataset: 0-18624703 | Loss: 2.014 | 675 ms/step , 58252.64 GFLOP/s , 546116.7 tokens/s INFO:__main__:2024-10-27 15:00:11 | Epoch: 3 | Step: 97270 | Dataset: 0-18632703 | Loss: 2.244 | 676 ms/step , 58140.00 GFLOP/s , 532356.0 tokens/s INFO:__main__:2024-10-27 15:00:19 | Epoch: 3 | Step: 97280 | Dataset: 0-18640703 | Loss: 2.068 | 675 ms/step , 58269.14 GFLOP/s , 532223.5 tokens/s INFO:__main__:2024-10-27 15:00:26 | Epoch: 3 | Step: 97290 | Dataset: 0-18648703 | Loss: 2.120 | 674 ms/step , 58321.21 GFLOP/s , 533038.3 tokens/s INFO:__main__:2024-10-27 15:00:34 | Epoch: 3 | Step: 97300 | Dataset: 0-18656703 | Loss: 2.204 | 675 ms/step , 58264.51 GFLOP/s , 532703.6 tokens/s INFO:__main__:2024-10-27 15:00:42 | Epoch: 3 | Step: 97310 | Dataset: 0-18664703 | Loss: 2.104 | 674 ms/step , 58323.75 GFLOP/s , 533021.7 tokens/s INFO:__main__:2024-10-27 15:00:50 | Epoch: 3 | Step: 97320 | Dataset: 0-18672703 | Loss: 2.093 | 676 ms/step , 58144.32 GFLOP/s , 531996.0 tokens/s INFO:__main__:2024-10-27 15:00:57 | Epoch: 3 | Step: 97330 | Dataset: 0-18680703 | Loss: 2.071 | 674 ms/step , 58335.68 GFLOP/s , 532430.7 tokens/s INFO:__main__:2024-10-27 15:01:05 | Epoch: 3 | Step: 97340 | Dataset: 0-18688703 | Loss: 2.135 | 676 ms/step , 
58159.39 GFLOP/s , 532042.6 tokens/s INFO:__main__:2024-10-27 15:01:13 | Epoch: 3 | Step: 97350 | Dataset: 0-18696703 | Loss: 2.125 | 675 ms/step , 58224.16 GFLOP/s , 531647.1 tokens/s INFO:__main__:2024-10-27 15:01:20 | Epoch: 3 | Step: 97360 | Dataset: 0-18704703 | Loss: 2.032 | 674 ms/step , 58312.69 GFLOP/s , 532537.1 tokens/s INFO:__main__:2024-10-27 15:01:28 | Epoch: 3 | Step: 97370 | Dataset: 0-18712703 | Loss: 2.006 | 676 ms/step , 58162.74 GFLOP/s , 531848.5 tokens/s INFO:__main__:2024-10-27 15:01:36 | Epoch: 3 | Step: 97380 | Dataset: 0-18720703 | Loss: 2.082 | 675 ms/step , 58210.30 GFLOP/s , 532064.6 tokens/s INFO:__main__:2024-10-27 15:01:43 | Epoch: 3 | Step: 97390 | Dataset: 0-18728703 | Loss: 2.066 | 675 ms/step , 58231.29 GFLOP/s , 531740.2 tokens/s INFO:__main__:2024-10-27 15:01:51 | Epoch: 3 | Step: 97400 | Dataset: 0-18736703 | Loss: 2.087 | 677 ms/step , 58026.28 GFLOP/s , 530737.8 tokens/s INFO:__main__:2024-10-27 15:01:59 | Epoch: 3 | Step: 97410 | Dataset: 0-18744703 | Loss: 2.109 | 677 ms/step , 58039.17 GFLOP/s , 530780.8 tokens/s INFO:__main__:2024-10-27 15:02:07 | Epoch: 3 | Step: 97420 | Dataset: 0-18752703 | Loss: 2.077 | 678 ms/step , 58019.27 GFLOP/s , 530765.2 tokens/s INFO:__main__:2024-10-27 15:02:14 | Epoch: 3 | Step: 97430 | Dataset: 0-18760703 | Loss: 2.000 | 679 ms/step , 57859.47 GFLOP/s , 530399.9 tokens/s INFO:__main__:2024-10-27 15:02:22 | Epoch: 3 | Step: 97440 | Dataset: 0-18768703 | Loss: 1.810 | 677 ms/step , 58056.04 GFLOP/s , 529424.0 tokens/s INFO:__main__:2024-10-27 15:02:30 | Epoch: 3 | Step: 97450 | Dataset: 0-18776703 | Loss: 1.808 | 675 ms/step , 58267.74 GFLOP/s , 531986.1 tokens/s INFO:__main__:2024-10-27 15:02:37 | Epoch: 3 | Step: 97460 | Dataset: 0-18784703 | Loss: 1.793 | 674 ms/step , 58350.20 GFLOP/s , 532304.3 tokens/s INFO:__main__:2024-10-27 15:02:45 | Epoch: 3 | Step: 97470 | Dataset: 0-18792703 | Loss: 1.730 | 675 ms/step , 58202.20 GFLOP/s , 532736.3 tokens/s INFO:__main__:2024-10-27 15:02:53 | 
Epoch: 3 | Step: 97480 | Dataset: 0-18800703 | Loss: 1.767 | 676 ms/step , 58167.38 GFLOP/s , 531557.1 tokens/s INFO:__main__:2024-10-27 15:03:01 | Epoch: 3 | Step: 97490 | Dataset: 0-18808703 | Loss: 1.774 | 676 ms/step , 58151.22 GFLOP/s , 531432.8 tokens/s INFO:__main__:2024-10-27 15:03:08 | Epoch: 3 | Step: 97500 | Dataset: 0-18816703 | Loss: 1.735 | 677 ms/step , 58078.76 GFLOP/s , 531382.7 tokens/s INFO:__main__:2024-10-27 15:03:16 | Epoch: 3 | Step: 97510 | Dataset: 0-18824703 | Loss: 1.730 | 676 ms/step , 58144.80 GFLOP/s , 531124.2 tokens/s INFO:__main__:2024-10-27 15:03:24 | Epoch: 3 | Step: 97520 | Dataset: 0-18832703 | Loss: 2.280 | 676 ms/step , 58154.02 GFLOP/s , 531455.4 tokens/s INFO:__main__:2024-10-27 15:03:31 | Epoch: 3 | Step: 97530 | Dataset: 0-18840703 | Loss: 2.189 | 676 ms/step , 58122.58 GFLOP/s , 532238.4 tokens/s INFO:__main__:2024-10-27 15:03:39 | Epoch: 3 | Step: 97540 | Dataset: 0-18848703 | Loss: 2.129 | 676 ms/step , 58192.38 GFLOP/s , 532146.8 tokens/s INFO:__main__:2024-10-27 15:03:47 | Epoch: 3 | Step: 97550 | Dataset: 0-18856703 | Loss: 2.109 | 677 ms/step , 58083.75 GFLOP/s , 531871.6 tokens/s INFO:__main__:2024-10-27 15:03:54 | Epoch: 3 | Step: 97560 | Dataset: 0-18864703 | Loss: 2.113 | 677 ms/step , 58102.46 GFLOP/s , 531845.2 tokens/s INFO:__main__:2024-10-27 15:04:02 | Epoch: 3 | Step: 97570 | Dataset: 0-18872703 | Loss: 2.154 | 677 ms/step , 58098.49 GFLOP/s , 531993.3 tokens/s INFO:__main__:2024-10-27 15:04:10 | Epoch: 3 | Step: 97580 | Dataset: 0-18880703 | Loss: 2.204 | 676 ms/step , 58171.83 GFLOP/s , 531905.8 tokens/s INFO:__main__:2024-10-27 15:04:18 | Epoch: 3 | Step: 97590 | Dataset: 0-18888703 | Loss: 2.130 | 675 ms/step , 58204.97 GFLOP/s , 531960.8 tokens/s INFO:__main__:2024-10-27 15:04:25 | Epoch: 3 | Step: 97600 | Dataset: 0-18896703 | Loss: 2.131 | 675 ms/step , 58202.60 GFLOP/s , 532099.2 tokens/s INFO:__main__:2024-10-27 15:04:33 | Epoch: 3 | Step: 97610 | Dataset: 0-18904703 | Loss: 2.138 | 676 ms/step , 
58117.68 GFLOP/s , 531864.8 tokens/s INFO:__main__:2024-10-27 15:04:41 | Epoch: 3 | Step: 97620 | Dataset: 0-18912703 | Loss: 2.119 | 676 ms/step , 58145.52 GFLOP/s , 531783.8 tokens/s INFO:__main__:2024-10-27 15:04:48 | Epoch: 3 | Step: 97630 | Dataset: 0-18920703 | Loss: 2.179 | 675 ms/step , 58198.45 GFLOP/s , 531972.8 tokens/s INFO:__main__:2024-10-27 15:04:56 | Epoch: 3 | Step: 97640 | Dataset: 0-18928703 | Loss: 2.124 | 675 ms/step , 58196.86 GFLOP/s , 531829.9 tokens/s INFO:__main__:2024-10-27 15:05:04 | Epoch: 3 | Step: 97650 | Dataset: 0-18936703 | Loss: 2.095 | 674 ms/step , 58309.97 GFLOP/s , 531919.9 tokens/s INFO:__main__:2024-10-27 15:05:11 | Epoch: 3 | Step: 97660 | Dataset: 0-18944703 | Loss: 2.008 | 676 ms/step , 58152.45 GFLOP/s , 530461.8 tokens/s INFO:__main__:2024-10-27 15:05:19 | Epoch: 3 | Step: 97670 | Dataset: 0-18952703 | Loss: 2.147 | 676 ms/step , 58125.71 GFLOP/s , 531372.6 tokens/s INFO:__main__:2024-10-27 15:05:27 | Epoch: 3 | Step: 97680 | Dataset: 0-18960703 | Loss: 2.227 | 676 ms/step , 58172.92 GFLOP/s , 531617.5 tokens/s INFO:__main__:2024-10-27 15:05:35 | Epoch: 3 | Step: 97690 | Dataset: 0-18968703 | Loss: 2.154 | 676 ms/step , 58168.62 GFLOP/s , 531751.7 tokens/s INFO:__main__:2024-10-27 15:05:42 | Epoch: 3 | Step: 97700 | Dataset: 0-18976703 | Loss: 2.172 | 677 ms/step , 58099.12 GFLOP/s , 531458.2 tokens/s INFO:__main__:2024-10-27 15:05:50 | Epoch: 3 | Step: 97710 | Dataset: 0-18984703 | Loss: 2.182 | 677 ms/step , 58101.24 GFLOP/s , 531794.4 tokens/s INFO:__main__:2024-10-27 15:05:58 | Epoch: 3 | Step: 97720 | Dataset: 0-18992703 | Loss: 2.163 | 675 ms/step , 58215.06 GFLOP/s , 531855.5 tokens/s INFO:__main__:2024-10-27 15:06:05 | Epoch: 3 | Step: 97730 | Dataset: 0-19000703 | Loss: 2.175 | 676 ms/step , 58172.21 GFLOP/s , 534808.4 tokens/s INFO:__main__:2024-10-27 15:06:13 | Epoch: 3 | Step: 97740 | Dataset: 0-19008703 | Loss: 2.081 | 676 ms/step , 58109.19 GFLOP/s , 531762.0 tokens/s INFO:__main__:2024-10-27 15:06:21 | 
Epoch: 3 | Step: 97750 | Dataset: 0-19016703 | Loss: 2.123 | 676 ms/step , 58130.93 GFLOP/s , 531365.9 tokens/s INFO:__main__:2024-10-27 15:06:29 | Epoch: 3 | Step: 97760 | Dataset: 0-19024703 | Loss: 2.190 | 676 ms/step , 58116.28 GFLOP/s , 530861.0 tokens/s INFO:__main__:2024-10-27 15:06:36 | Epoch: 3 | Step: 97770 | Dataset: 0-19032703 | Loss: 2.174 | 674 ms/step , 58304.85 GFLOP/s , 531432.0 tokens/s INFO:__main__:2024-10-27 15:06:44 | Epoch: 3 | Step: 97780 | Dataset: 0-19040703 | Loss: 2.112 | 675 ms/step , 58229.69 GFLOP/s , 531861.9 tokens/s INFO:__main__:2024-10-27 15:06:52 | Epoch: 3 | Step: 97790 | Dataset: 0-19048703 | Loss: 2.138 | 675 ms/step , 58274.76 GFLOP/s , 531918.8 tokens/s INFO:__main__:2024-10-27 15:06:59 | Epoch: 3 | Step: 97800 | Dataset: 0-19056703 | Loss: 2.176 | 675 ms/step , 58218.58 GFLOP/s , 531790.9 tokens/s INFO:__main__:2024-10-27 15:07:07 | Epoch: 3 | Step: 97810 | Dataset: 0-19064703 | Loss: 2.129 | 675 ms/step , 58242.38 GFLOP/s , 531759.0 tokens/s INFO:__main__:2024-10-27 15:07:15 | Epoch: 3 | Step: 97820 | Dataset: 0-19072703 | Loss: 2.114 | 677 ms/step , 58052.36 GFLOP/s , 531058.8 tokens/s INFO:__main__:2024-10-27 15:07:22 | Epoch: 3 | Step: 97830 | Dataset: 0-19080703 | Loss: 2.101 | 675 ms/step , 58210.27 GFLOP/s , 529866.5 tokens/s INFO:__main__:2024-10-27 15:07:30 | Epoch: 3 | Step: 97840 | Dataset: 0-19088703 | Loss: 2.157 | 674 ms/step , 58280.99 GFLOP/s , 532355.6 tokens/s INFO:__main__:2024-10-27 15:07:38 | Epoch: 3 | Step: 97850 | Dataset: 0-19096703 | Loss: 2.171 | 675 ms/step , 58269.30 GFLOP/s , 532533.7 tokens/s INFO:__main__:2024-10-27 15:07:46 | Epoch: 3 | Step: 97860 | Dataset: 0-19104703 | Loss: 2.207 | 674 ms/step , 58305.29 GFLOP/s , 532314.5 tokens/s INFO:__main__:2024-10-27 15:07:53 | Epoch: 3 | Step: 97870 | Dataset: 0-19112703 | Loss: 2.173 | 676 ms/step , 58135.07 GFLOP/s , 532034.4 tokens/s INFO:__main__:2024-10-27 15:08:01 | Epoch: 3 | Step: 97880 | Dataset: 0-19120703 | Loss: 2.114 | 675 ms/step , 
58253.73 GFLOP/s , 532353.2 tokens/s INFO:__main__:2024-10-27 15:08:09 | Epoch: 3 | Step: 97890 | Dataset: 0-19128703 | Loss: 2.092 | 674 ms/step , 58352.03 GFLOP/s , 532441.4 tokens/s INFO:__main__:2024-10-27 15:08:16 | Epoch: 3 | Step: 97900 | Dataset: 0-19136703 | Loss: 2.161 | 676 ms/step , 58176.42 GFLOP/s , 532182.0 tokens/s INFO:__main__:2024-10-27 15:08:24 | Epoch: 3 | Step: 97910 | Dataset: 0-19144703 | Loss: 2.181 | 674 ms/step , 58292.54 GFLOP/s , 531907.4 tokens/s INFO:__main__:2024-10-27 15:08:32 | Epoch: 3 | Step: 97920 | Dataset: 0-19152703 | Loss: 2.088 | 675 ms/step , 58244.41 GFLOP/s , 532083.5 tokens/s INFO:__main__:2024-10-27 15:08:39 | Epoch: 3 | Step: 97930 | Dataset: 0-19160703 | Loss: 2.156 | 675 ms/step , 58229.66 GFLOP/s , 532041.5 tokens/s INFO:__main__:2024-10-27 15:08:47 | Epoch: 3 | Step: 97940 | Dataset: 0-19168703 | Loss: 2.183 | 674 ms/step , 58284.27 GFLOP/s , 532174.4 tokens/s INFO:__main__:2024-10-27 15:08:55 | Epoch: 3 | Step: 97950 | Dataset: 0-19176703 | Loss: 2.157 | 675 ms/step , 58272.68 GFLOP/s , 532001.3 tokens/s INFO:__main__:2024-10-27 15:09:03 | Epoch: 3 | Step: 97960 | Dataset: 0-19184703 | Loss: 2.162 | 675 ms/step , 58229.88 GFLOP/s , 531917.6 tokens/s INFO:__main__:2024-10-27 15:09:10 | Epoch: 3 | Step: 97970 | Dataset: 0-19192703 | Loss: 2.205 | 676 ms/step , 58164.99 GFLOP/s , 532255.4 tokens/s INFO:__main__:2024-10-27 15:09:18 | Epoch: 3 | Step: 97980 | Dataset: 0-19200703 | Loss: 2.092 | 676 ms/step , 58182.64 GFLOP/s , 531985.7 tokens/s INFO:__main__:2024-10-27 15:09:26 | Epoch: 3 | Step: 97990 | Dataset: 0-19208703 | Loss: 2.169 | 675 ms/step , 58247.53 GFLOP/s , 532388.6 tokens/s INFO:__main__:2024-10-27 15:09:33 | Validation | Step: 98000 | Val_loss: 2.224 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 15:09:33 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_150933_step_98000.pt` INFO:__main__:2024-10-27 15:09:34 | Epoch: 3 | Step: 98000 | Dataset: 0-19216703 | Loss: 2.818 | 673 
ms/step , 58408.82 GFLOP/s , 479709.1 tokens/s INFO:__main__:2024-10-27 15:09:42 | Epoch: 3 | Step: 98010 | Dataset: 0-19224703 | Loss: 2.611 | 674 ms/step , 58282.30 GFLOP/s , 532303.9 tokens/s INFO:__main__:2024-10-27 15:09:50 | Epoch: 3 | Step: 98020 | Dataset: 0-19232703 | Loss: 2.607 | 675 ms/step , 58257.93 GFLOP/s , 532765.7 tokens/s INFO:__main__:2024-10-27 15:09:57 | Epoch: 3 | Step: 98030 | Dataset: 0-19240703 | Loss: 2.488 | 673 ms/step , 58369.65 GFLOP/s , 532631.8 tokens/s INFO:__main__:2024-10-27 15:10:05 | Epoch: 3 | Step: 98040 | Dataset: 0-19248703 | Loss: 2.470 | 675 ms/step , 58272.43 GFLOP/s , 533003.2 tokens/s INFO:__main__:2024-10-27 15:10:13 | Epoch: 3 | Step: 98050 | Dataset: 0-19256703 | Loss: 2.533 | 677 ms/step , 58085.95 GFLOP/s , 532022.1 tokens/s INFO:__main__:2024-10-27 15:10:20 | Epoch: 3 | Step: 98060 | Dataset: 0-19264703 | Loss: 2.574 | 675 ms/step , 58252.90 GFLOP/s , 532173.9 tokens/s INFO:__main__:2024-10-27 15:10:28 | Epoch: 3 | Step: 98070 | Dataset: 0-19272703 | Loss: 2.491 | 676 ms/step , 58143.76 GFLOP/s , 531769.7 tokens/s INFO:__main__:2024-10-27 15:10:36 | Epoch: 3 | Step: 98080 | Dataset: 0-19280703 | Loss: 2.487 | 675 ms/step , 58269.37 GFLOP/s , 531859.3 tokens/s INFO:__main__:2024-10-27 15:10:43 | Epoch: 3 | Step: 98090 | Dataset: 0-19288703 | Loss: 2.439 | 676 ms/step , 58142.19 GFLOP/s , 531478.4 tokens/s INFO:__main__:2024-10-27 15:10:51 | Epoch: 3 | Step: 98100 | Dataset: 0-19296703 | Loss: 2.448 | 675 ms/step , 58262.27 GFLOP/s , 532153.3 tokens/s INFO:__main__:2024-10-27 15:10:59 | Epoch: 3 | Step: 98110 | Dataset: 0-19304703 | Loss: 2.445 | 674 ms/step , 58348.85 GFLOP/s , 532557.4 tokens/s INFO:__main__:2024-10-27 15:11:06 | Epoch: 3 | Step: 98120 | Dataset: 0-19312703 | Loss: 2.455 | 675 ms/step , 58276.30 GFLOP/s , 532049.0 tokens/s INFO:__main__:2024-10-27 15:11:14 | Epoch: 3 | Step: 98130 | Dataset: 0-19320703 | Loss: 2.385 | 675 ms/step , 58234.50 GFLOP/s , 532568.7 tokens/s INFO:__main__:2024-10-27 
15:11:22 | Epoch: 3 | Step: 98140 | Dataset: 0-19328703 | Loss: 2.495 | 675 ms/step , 58248.85 GFLOP/s , 532022.1 tokens/s INFO:__main__:2024-10-27 15:11:30 | Epoch: 3 | Step: 98150 | Dataset: 0-19336703 | Loss: 2.396 | 674 ms/step , 58290.27 GFLOP/s , 532307.4 tokens/s INFO:__main__:2024-10-27 15:11:37 | Epoch: 3 | Step: 98160 | Dataset: 0-19344703 | Loss: 2.281 | 675 ms/step , 58248.08 GFLOP/s , 531734.5 tokens/s INFO:__main__:2024-10-27 15:11:45 | Epoch: 3 | Step: 98170 | Dataset: 0-19352703 | Loss: 2.187 | 675 ms/step , 58261.36 GFLOP/s , 532420.5 tokens/s INFO:__main__:2024-10-27 15:11:53 | Epoch: 3 | Step: 98180 | Dataset: 0-19360703 | Loss: 2.259 | 675 ms/step , 58243.55 GFLOP/s , 532024.0 tokens/s INFO:__main__:2024-10-27 15:12:00 | Epoch: 3 | Step: 98190 | Dataset: 0-19368703 | Loss: 2.174 | 674 ms/step , 58286.38 GFLOP/s , 532013.2 tokens/s INFO:__main__:2024-10-27 15:12:08 | Epoch: 3 | Step: 98200 | Dataset: 0-19376703 | Loss: 2.133 | 681 ms/step , 57706.75 GFLOP/s , 531778.0 tokens/s INFO:__main__:2024-10-27 15:12:16 | Epoch: 3 | Step: 98210 | Dataset: 0-19384703 | Loss: 2.213 | 676 ms/step , 58167.64 GFLOP/s , 531783.6 tokens/s INFO:__main__:2024-10-27 15:12:23 | Epoch: 3 | Step: 98220 | Dataset: 0-19392703 | Loss: 2.163 | 676 ms/step , 58187.35 GFLOP/s , 532473.5 tokens/s INFO:__main__:2024-10-27 15:12:31 | Epoch: 3 | Step: 98230 | Dataset: 0-19400703 | Loss: 2.176 | 676 ms/step , 58127.36 GFLOP/s , 531999.4 tokens/s INFO:__main__:2024-10-27 15:12:39 | Epoch: 3 | Step: 98240 | Dataset: 0-19408703 | Loss: 2.146 | 676 ms/step , 58186.64 GFLOP/s , 532005.8 tokens/s INFO:__main__:2024-10-27 15:12:47 | Epoch: 3 | Step: 98250 | Dataset: 0-19416703 | Loss: 2.167 | 676 ms/step , 58173.51 GFLOP/s , 531927.8 tokens/s INFO:__main__:2024-10-27 15:12:54 | Epoch: 3 | Step: 98260 | Dataset: 0-19424703 | Loss: 2.074 | 675 ms/step , 58205.93 GFLOP/s , 532059.9 tokens/s INFO:__main__:2024-10-27 15:13:02 | Epoch: 3 | Step: 98270 | Dataset: 0-19432703 | Loss: 2.156 | 675 
ms/step , 58265.07 GFLOP/s , 532078.3 tokens/s INFO:__main__:2024-10-27 15:13:10 | Epoch: 3 | Step: 98280 | Dataset: 0-19440703 | Loss: 2.083 | 677 ms/step , 58101.38 GFLOP/s , 531845.1 tokens/s INFO:__main__:2024-10-27 15:13:17 | Epoch: 3 | Step: 98290 | Dataset: 0-19448703 | Loss: 2.131 | 676 ms/step , 58152.98 GFLOP/s , 531683.0 tokens/s INFO:__main__:2024-10-27 15:13:25 | Epoch: 3 | Step: 98300 | Dataset: 0-19456703 | Loss: 2.106 | 675 ms/step , 58222.25 GFLOP/s , 531827.6 tokens/s INFO:__main__:2024-10-27 15:13:33 | Epoch: 3 | Step: 98310 | Dataset: 0-19464703 | Loss: 2.166 | 675 ms/step , 58201.00 GFLOP/s , 532263.4 tokens/s INFO:__main__:2024-10-27 15:13:40 | Epoch: 3 | Step: 98320 | Dataset: 0-19472703 | Loss: 2.022 | 675 ms/step , 58212.55 GFLOP/s , 532314.7 tokens/s INFO:__main__:2024-10-27 15:13:48 | Epoch: 3 | Step: 98330 | Dataset: 0-19480703 | Loss: 2.149 | 674 ms/step , 58288.33 GFLOP/s , 532287.8 tokens/s INFO:__main__:2024-10-27 15:13:56 | Epoch: 3 | Step: 98340 | Dataset: 0-19488703 | Loss: 2.159 | 675 ms/step , 58254.27 GFLOP/s , 532311.4 tokens/s INFO:__main__:2024-10-27 15:14:04 | Epoch: 3 | Step: 98350 | Dataset: 0-19496703 | Loss: 2.245 | 675 ms/step , 58194.05 GFLOP/s , 532128.0 tokens/s INFO:__main__:2024-10-27 15:14:11 | Epoch: 3 | Step: 98360 | Dataset: 0-19504703 | Loss: 2.136 | 677 ms/step , 58076.44 GFLOP/s , 531438.2 tokens/s INFO:__main__:2024-10-27 15:14:19 | Epoch: 3 | Step: 98370 | Dataset: 0-19512703 | Loss: 2.199 | 675 ms/step , 58264.49 GFLOP/s , 531900.2 tokens/s INFO:__main__:2024-10-27 15:14:27 | Epoch: 3 | Step: 98380 | Dataset: 0-19520703 | Loss: 2.085 | 676 ms/step , 58152.11 GFLOP/s , 532059.7 tokens/s INFO:__main__:2024-10-27 15:14:34 | Epoch: 3 | Step: 98390 | Dataset: 0-19528703 | Loss: 2.104 | 677 ms/step , 58069.91 GFLOP/s , 530598.8 tokens/s INFO:__main__:2024-10-27 15:14:42 | Epoch: 3 | Step: 98400 | Dataset: 0-19536703 | Loss: 2.061 | 674 ms/step , 58294.37 GFLOP/s , 532120.8 tokens/s INFO:__main__:2024-10-27 
15:14:50 | Epoch: 3 | Step: 98410 | Dataset: 0-19544703 | Loss: 2.077 | 675 ms/step , 58261.79 GFLOP/s , 532208.0 tokens/s INFO:__main__:2024-10-27 15:14:57 | Epoch: 3 | Step: 98420 | Dataset: 0-19552703 | Loss: 2.155 | 676 ms/step , 58170.64 GFLOP/s , 532228.7 tokens/s INFO:__main__:2024-10-27 15:15:05 | Epoch: 3 | Step: 98430 | Dataset: 0-19560703 | Loss: 2.132 | 674 ms/step , 58306.21 GFLOP/s , 532090.7 tokens/s INFO:__main__:2024-10-27 15:15:13 | Epoch: 3 | Step: 98440 | Dataset: 0-19568703 | Loss: 2.184 | 675 ms/step , 58269.12 GFLOP/s , 532530.2 tokens/s INFO:__main__:2024-10-27 15:15:21 | Epoch: 3 | Step: 98450 | Dataset: 0-19576703 | Loss: 2.124 | 675 ms/step , 58199.52 GFLOP/s , 531804.2 tokens/s INFO:__main__:2024-10-27 15:15:28 | Epoch: 3 | Step: 98460 | Dataset: 0-19584703 | Loss: 2.046 | 675 ms/step , 58211.20 GFLOP/s , 532042.0 tokens/s INFO:__main__:2024-10-27 15:15:36 | Epoch: 3 | Step: 98470 | Dataset: 0-19592703 | Loss: 2.072 | 676 ms/step , 58176.37 GFLOP/s , 531605.2 tokens/s INFO:__main__:2024-10-27 15:15:44 | Epoch: 3 | Step: 98480 | Dataset: 0-19600703 | Loss: 2.210 | 675 ms/step , 58201.93 GFLOP/s , 531919.3 tokens/s INFO:__main__:2024-10-27 15:15:51 | Epoch: 3 | Step: 98490 | Dataset: 0-19608703 | Loss: 2.195 | 674 ms/step , 58285.79 GFLOP/s , 532144.7 tokens/s INFO:__main__:2024-10-27 15:15:59 | Epoch: 3 | Step: 98500 | Dataset: 0-19616703 | Loss: 2.226 | 675 ms/step , 58228.95 GFLOP/s , 532687.6 tokens/s INFO:__main__:2024-10-27 15:16:07 | Epoch: 3 | Step: 98510 | Dataset: 0-19624703 | Loss: 2.139 | 675 ms/step , 58254.69 GFLOP/s , 532414.1 tokens/s INFO:__main__:2024-10-27 15:16:14 | Epoch: 3 | Step: 98520 | Dataset: 0-19632703 | Loss: 2.218 | 674 ms/step , 58318.14 GFLOP/s , 532148.3 tokens/s INFO:__main__:2024-10-27 15:16:22 | Epoch: 3 | Step: 98530 | Dataset: 0-19640703 | Loss: 2.176 | 674 ms/step , 58295.16 GFLOP/s , 532525.3 tokens/s INFO:__main__:2024-10-27 15:16:30 | Epoch: 3 | Step: 98540 | Dataset: 0-19648703 | Loss: 2.139 | 675 
ms/step , 58228.25 GFLOP/s , 532471.4 tokens/s INFO:__main__:2024-10-27 15:16:38 | Epoch: 3 | Step: 98550 | Dataset: 0-19656703 | Loss: 2.184 | 673 ms/step , 58384.86 GFLOP/s , 532807.9 tokens/s INFO:__main__:2024-10-27 15:16:45 | Epoch: 3 | Step: 98560 | Dataset: 0-19664703 | Loss: 2.164 | 677 ms/step , 58064.13 GFLOP/s , 530510.2 tokens/s INFO:__main__:2024-10-27 15:16:53 | Epoch: 3 | Step: 98570 | Dataset: 0-19672703 | Loss: 2.152 | 675 ms/step , 58250.17 GFLOP/s , 532765.5 tokens/s INFO:__main__:2024-10-27 15:17:01 | Epoch: 3 | Step: 98580 | Dataset: 0-19680703 | Loss: 2.188 | 676 ms/step , 58170.52 GFLOP/s , 532702.0 tokens/s INFO:__main__:2024-10-27 15:17:08 | Epoch: 3 | Step: 98590 | Dataset: 0-19688703 | Loss: 2.197 | 674 ms/step , 58326.47 GFLOP/s , 532520.8 tokens/s INFO:__main__:2024-10-27 15:17:16 | Epoch: 3 | Step: 98600 | Dataset: 0-19696703 | Loss: 2.131 | 674 ms/step , 58322.93 GFLOP/s , 533121.7 tokens/s INFO:__main__:2024-10-27 15:17:24 | Epoch: 3 | Step: 98610 | Dataset: 0-19704703 | Loss: 2.134 | 675 ms/step , 58228.22 GFLOP/s , 532809.3 tokens/s INFO:__main__:2024-10-27 15:17:31 | Epoch: 3 | Step: 98620 | Dataset: 0-19712703 | Loss: 2.137 | 675 ms/step , 58241.15 GFLOP/s , 532400.6 tokens/s INFO:__main__:2024-10-27 15:17:39 | Epoch: 3 | Step: 98630 | Dataset: 0-19720703 | Loss: 2.107 | 675 ms/step , 58230.00 GFLOP/s , 532274.3 tokens/s INFO:__main__:2024-10-27 15:17:47 | Epoch: 4 | Step: 98640 | Dataset: 0-604 | Loss: 2.147 | 675 ms/step , 58269.43 GFLOP/s , 532252.9 tokens/s INFO:__main__:2024-10-27 15:17:54 | Epoch: 4 | Step: 98650 | Dataset: 0-8604 | Loss: 1.955 | 674 ms/step , 58300.57 GFLOP/s , 531701.4 tokens/s INFO:__main__:2024-10-27 15:18:02 | Epoch: 4 | Step: 98660 | Dataset: 0-16604 | Loss: 1.839 | 674 ms/step , 58333.07 GFLOP/s , 531890.9 tokens/s INFO:__main__:2024-10-27 15:18:10 | Epoch: 4 | Step: 98670 | Dataset: 0-24604 | Loss: 1.826 | 675 ms/step , 58272.64 GFLOP/s , 531588.4 tokens/s INFO:__main__:2024-10-27 15:18:18 | Epoch: 
4 | Step: 98680 | Dataset: 0-32604 | Loss: 1.838 | 674 ms/step , 58306.75 GFLOP/s , 532425.9 tokens/s INFO:__main__:2024-10-27 15:18:25 | Epoch: 4 | Step: 98690 | Dataset: 0-40604 | Loss: 1.799 | 675 ms/step , 58220.94 GFLOP/s , 532188.0 tokens/s INFO:__main__:2024-10-27 15:18:33 | Epoch: 4 | Step: 98700 | Dataset: 0-48604 | Loss: 1.783 | 675 ms/step , 58227.55 GFLOP/s , 531592.3 tokens/s INFO:__main__:2024-10-27 15:18:41 | Epoch: 4 | Step: 98710 | Dataset: 0-56604 | Loss: 1.769 | 675 ms/step , 58220.44 GFLOP/s , 531404.0 tokens/s INFO:__main__:2024-10-27 15:18:48 | Epoch: 4 | Step: 98720 | Dataset: 0-64604 | Loss: 1.780 | 675 ms/step , 58208.23 GFLOP/s , 531164.0 tokens/s INFO:__main__:2024-10-27 15:18:56 | Epoch: 4 | Step: 98730 | Dataset: 0-72604 | Loss: 1.767 | 676 ms/step , 58167.13 GFLOP/s , 531781.2 tokens/s INFO:__main__:2024-10-27 15:19:04 | Epoch: 4 | Step: 98740 | Dataset: 0-80604 | Loss: 2.248 | 675 ms/step , 58278.74 GFLOP/s , 532525.7 tokens/s INFO:__main__:2024-10-27 15:19:11 | Epoch: 4 | Step: 98750 | Dataset: 0-88604 | Loss: 2.172 | 675 ms/step , 58266.26 GFLOP/s , 532048.9 tokens/s INFO:__main__:2024-10-27 15:19:19 | Epoch: 4 | Step: 98760 | Dataset: 0-96604 | Loss: 2.040 | 674 ms/step , 58325.48 GFLOP/s , 532631.2 tokens/s INFO:__main__:2024-10-27 15:19:27 | Epoch: 4 | Step: 98770 | Dataset: 0-104604 | Loss: 2.129 | 673 ms/step , 58400.11 GFLOP/s , 532782.3 tokens/s INFO:__main__:2024-10-27 15:19:35 | Epoch: 4 | Step: 98780 | Dataset: 0-112604 | Loss: 2.169 | 675 ms/step , 58221.44 GFLOP/s , 531914.4 tokens/s INFO:__main__:2024-10-27 15:19:42 | Epoch: 4 | Step: 98790 | Dataset: 0-120604 | Loss: 2.115 | 675 ms/step , 58246.26 GFLOP/s , 532081.7 tokens/s INFO:__main__:2024-10-27 15:19:50 | Epoch: 4 | Step: 98800 | Dataset: 0-128604 | Loss: 2.079 | 675 ms/step , 58270.81 GFLOP/s , 531994.8 tokens/s INFO:__main__:2024-10-27 15:19:58 | Epoch: 4 | Step: 98810 | Dataset: 0-136604 | Loss: 2.136 | 676 ms/step , 58189.56 GFLOP/s , 532302.9 tokens/s 
INFO:__main__:2024-10-27 15:20:05 | Epoch: 4 | Step: 98820 | Dataset: 0-144604 | Loss: 2.125 | 676 ms/step , 58143.48 GFLOP/s , 532013.1 tokens/s INFO:__main__:2024-10-27 15:20:13 | Epoch: 4 | Step: 98830 | Dataset: 0-152604 | Loss: 2.041 | 674 ms/step , 58356.84 GFLOP/s , 531795.4 tokens/s INFO:__main__:2024-10-27 15:20:21 | Epoch: 4 | Step: 98840 | Dataset: 0-160604 | Loss: 1.999 | 674 ms/step , 58326.69 GFLOP/s , 532645.9 tokens/s INFO:__main__:2024-10-27 15:20:28 | Epoch: 4 | Step: 98850 | Dataset: 0-168604 | Loss: 2.093 | 674 ms/step , 58317.50 GFLOP/s , 532314.8 tokens/s INFO:__main__:2024-10-27 15:20:36 | Epoch: 4 | Step: 98860 | Dataset: 0-176604 | Loss: 2.133 | 675 ms/step , 58248.38 GFLOP/s , 532285.0 tokens/s INFO:__main__:2024-10-27 15:20:44 | Epoch: 4 | Step: 98870 | Dataset: 0-184604 | Loss: 2.146 | 677 ms/step , 58096.66 GFLOP/s , 531586.9 tokens/s INFO:__main__:2024-10-27 15:20:52 | Epoch: 4 | Step: 98880 | Dataset: 0-192604 | Loss: 2.083 | 675 ms/step , 58240.35 GFLOP/s , 531642.8 tokens/s INFO:__main__:2024-10-27 15:20:59 | Epoch: 4 | Step: 98890 | Dataset: 0-200604 | Loss: 2.034 | 675 ms/step , 58231.78 GFLOP/s , 531082.7 tokens/s INFO:__main__:2024-10-27 15:21:07 | Epoch: 4 | Step: 98900 | Dataset: 0-208604 | Loss: 2.260 | 676 ms/step , 58148.32 GFLOP/s , 531515.0 tokens/s INFO:__main__:2024-10-27 15:21:15 | Epoch: 4 | Step: 98910 | Dataset: 0-216604 | Loss: 2.157 | 676 ms/step , 58163.62 GFLOP/s , 531714.4 tokens/s INFO:__main__:2024-10-27 15:21:22 | Epoch: 4 | Step: 98920 | Dataset: 0-224604 | Loss: 2.186 | 674 ms/step , 58332.82 GFLOP/s , 532079.5 tokens/s INFO:__main__:2024-10-27 15:21:30 | Epoch: 4 | Step: 98930 | Dataset: 0-232604 | Loss: 2.127 | 674 ms/step , 58311.30 GFLOP/s , 532025.5 tokens/s INFO:__main__:2024-10-27 15:21:38 | Epoch: 4 | Step: 98940 | Dataset: 0-240604 | Loss: 2.122 | 676 ms/step , 58126.44 GFLOP/s , 531642.7 tokens/s INFO:__main__:2024-10-27 15:21:45 | Epoch: 4 | Step: 98950 | Dataset: 0-248604 | Loss: 2.072 | 676 
ms/step , 58190.57 GFLOP/s , 531566.0 tokens/s INFO:__main__:2024-10-27 15:21:53 | Epoch: 4 | Step: 98960 | Dataset: 0-256604 | Loss: 2.232 | 675 ms/step , 58232.61 GFLOP/s , 531651.7 tokens/s INFO:__main__:2024-10-27 15:22:01 | Epoch: 4 | Step: 98970 | Dataset: 0-264604 | Loss: 2.136 | 675 ms/step , 58212.36 GFLOP/s , 530726.6 tokens/s INFO:__main__:2024-10-27 15:22:09 | Epoch: 4 | Step: 98980 | Dataset: 0-272604 | Loss: 2.128 | 676 ms/step , 58153.20 GFLOP/s , 531324.4 tokens/s INFO:__main__:2024-10-27 15:22:16 | Epoch: 4 | Step: 98990 | Dataset: 0-280604 | Loss: 2.200 | 676 ms/step , 58140.44 GFLOP/s , 531294.0 tokens/s INFO:__main__:2024-10-27 15:22:24 | Validation | Step: 99000 | Val_loss: 2.141 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 15:22:24 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_152224_step_99000.pt` INFO:__main__:2024-10-27 15:22:25 | Epoch: 4 | Step: 99000 | Dataset: 0-288604 | Loss: 2.127 | 675 ms/step , 58203.14 GFLOP/s , 477974.3 tokens/s INFO:__main__:2024-10-27 15:22:33 | Epoch: 4 | Step: 99010 | Dataset: 0-296604 | Loss: 2.146 | 676 ms/step , 58149.85 GFLOP/s , 531132.5 tokens/s INFO:__main__:2024-10-27 15:22:40 | Epoch: 4 | Step: 99020 | Dataset: 0-304604 | Loss: 2.122 | 677 ms/step , 58095.95 GFLOP/s , 531053.3 tokens/s INFO:__main__:2024-10-27 15:22:48 | Epoch: 4 | Step: 99030 | Dataset: 0-312604 | Loss: 2.150 | 676 ms/step , 58124.88 GFLOP/s , 531354.0 tokens/s INFO:__main__:2024-10-27 15:22:56 | Epoch: 4 | Step: 99040 | Dataset: 0-320604 | Loss: 2.164 | 676 ms/step , 58119.93 GFLOP/s , 531086.3 tokens/s INFO:__main__:2024-10-27 15:23:03 | Epoch: 4 | Step: 99050 | Dataset: 0-328604 | Loss: 2.187 | 677 ms/step , 58055.21 GFLOP/s , 530844.7 tokens/s INFO:__main__:2024-10-27 15:23:11 | Epoch: 4 | Step: 99060 | Dataset: 0-336604 | Loss: 1.865 | 676 ms/step , 58115.24 GFLOP/s , 530960.4 tokens/s INFO:__main__:2024-10-27 15:23:19 | Epoch: 4 | Step: 99070 | Dataset: 0-344604 | Loss: 1.796 | 676 ms/step , 
58178.77 GFLOP/s , 531045.7 tokens/s INFO:__main__:2024-10-27 15:23:27 | Epoch: 4 | Step: 99080 | Dataset: 0-352604 | Loss: 1.799 | 675 ms/step , 58252.26 GFLOP/s , 531627.6 tokens/s INFO:__main__:2024-10-27 15:23:34 | Epoch: 4 | Step: 99090 | Dataset: 0-360604 | Loss: 1.783 | 678 ms/step , 58005.82 GFLOP/s , 530501.8 tokens/s INFO:__main__:2024-10-27 15:23:42 | Epoch: 4 | Step: 99100 | Dataset: 0-368604 | Loss: 1.768 | 679 ms/step , 57935.40 GFLOP/s , 529371.9 tokens/s INFO:__main__:2024-10-27 15:23:50 | Epoch: 4 | Step: 99110 | Dataset: 0-376604 | Loss: 1.762 | 680 ms/step , 57820.33 GFLOP/s , 529480.6 tokens/s INFO:__main__:2024-10-27 15:23:57 | Epoch: 4 | Step: 99120 | Dataset: 0-384604 | Loss: 1.761 | 680 ms/step , 57840.82 GFLOP/s , 529174.8 tokens/s INFO:__main__:2024-10-27 15:24:05 | Epoch: 4 | Step: 99130 | Dataset: 0-392604 | Loss: 1.764 | 676 ms/step , 58112.99 GFLOP/s , 531031.5 tokens/s INFO:__main__:2024-10-27 15:24:13 | Epoch: 4 | Step: 99140 | Dataset: 0-400604 | Loss: 1.766 | 675 ms/step , 58216.43 GFLOP/s , 531213.2 tokens/s INFO:__main__:2024-10-27 15:24:21 | Epoch: 4 | Step: 99150 | Dataset: 0-408604 | Loss: 2.091 | 675 ms/step , 58197.86 GFLOP/s , 531427.9 tokens/s INFO:__main__:2024-10-27 15:24:28 | Epoch: 4 | Step: 99160 | Dataset: 0-416604 | Loss: 2.127 | 677 ms/step , 58101.72 GFLOP/s , 531508.5 tokens/s INFO:__main__:2024-10-27 15:24:36 | Epoch: 4 | Step: 99170 | Dataset: 0-424604 | Loss: 2.189 | 676 ms/step , 58139.83 GFLOP/s , 530753.8 tokens/s INFO:__main__:2024-10-27 15:24:44 | Epoch: 4 | Step: 99180 | Dataset: 0-432604 | Loss: 2.217 | 675 ms/step , 58205.50 GFLOP/s , 530713.8 tokens/s INFO:__main__:2024-10-27 15:24:51 | Epoch: 4 | Step: 99190 | Dataset: 0-440604 | Loss: 2.168 | 676 ms/step , 58117.23 GFLOP/s , 531227.7 tokens/s INFO:__main__:2024-10-27 15:24:59 | Epoch: 4 | Step: 99200 | Dataset: 0-448604 | Loss: 2.200 | 680 ms/step , 57798.29 GFLOP/s , 530572.4 tokens/s INFO:__main__:2024-10-27 15:25:07 | Epoch: 4 | Step: 99210 | 
Dataset: 0-456604 | Loss: 2.084 | 676 ms/step , 58158.84 GFLOP/s , 530550.1 tokens/s INFO:__main__:2024-10-27 15:25:15 | Epoch: 4 | Step: 99220 | Dataset: 0-464604 | Loss: 2.134 | 677 ms/step , 58088.20 GFLOP/s , 531130.9 tokens/s INFO:__main__:2024-10-27 15:25:22 | Epoch: 4 | Step: 99230 | Dataset: 0-472604 | Loss: 2.121 | 676 ms/step , 58129.21 GFLOP/s , 530807.5 tokens/s INFO:__main__:2024-10-27 15:25:30 | Epoch: 4 | Step: 99240 | Dataset: 0-480604 | Loss: 2.198 | 677 ms/step , 58091.61 GFLOP/s , 529968.6 tokens/s INFO:__main__:2024-10-27 15:25:38 | Epoch: 4 | Step: 99250 | Dataset: 0-488604 | Loss: 2.091 | 676 ms/step , 58192.25 GFLOP/s , 531481.7 tokens/s INFO:__main__:2024-10-27 15:25:45 | Epoch: 4 | Step: 99260 | Dataset: 0-496604 | Loss: 2.202 | 674 ms/step , 58318.02 GFLOP/s , 532346.2 tokens/s INFO:__main__:2024-10-27 15:25:53 | Epoch: 4 | Step: 99270 | Dataset: 0-504604 | Loss: 2.127 | 675 ms/step , 58213.12 GFLOP/s , 532435.7 tokens/s INFO:__main__:2024-10-27 15:26:01 | Epoch: 4 | Step: 99280 | Dataset: 0-512604 | Loss: 2.159 | 675 ms/step , 58271.85 GFLOP/s , 532017.0 tokens/s INFO:__main__:2024-10-27 15:26:09 | Epoch: 4 | Step: 99290 | Dataset: 0-520604 | Loss: 2.051 | 675 ms/step , 58230.87 GFLOP/s , 532326.4 tokens/s INFO:__main__:2024-10-27 15:26:16 | Epoch: 4 | Step: 99300 | Dataset: 0-528604 | Loss: 2.070 | 675 ms/step , 58275.60 GFLOP/s , 532520.5 tokens/s INFO:__main__:2024-10-27 15:26:24 | Epoch: 4 | Step: 99310 | Dataset: 0-536604 | Loss: 2.091 | 673 ms/step , 58373.38 GFLOP/s , 532493.9 tokens/s INFO:__main__:2024-10-27 15:26:32 | Epoch: 4 | Step: 99320 | Dataset: 0-544604 | Loss: 2.110 | 675 ms/step , 58258.29 GFLOP/s , 532028.9 tokens/s INFO:__main__:2024-10-27 15:26:39 | Epoch: 4 | Step: 99330 | Dataset: 0-552604 | Loss: 2.117 | 675 ms/step , 58255.34 GFLOP/s , 531864.1 tokens/s INFO:__main__:2024-10-27 15:26:47 | Epoch: 4 | Step: 99340 | Dataset: 0-560604 | Loss: 2.086 | 676 ms/step , 58179.75 GFLOP/s , 531640.5 tokens/s 
INFO:__main__:2024-10-27 15:26:55 | Epoch: 4 | Step: 99350 | Dataset: 0-568604 | Loss: 2.093 | 674 ms/step , 58319.42 GFLOP/s , 532401.4 tokens/s INFO:__main__:2024-10-27 15:27:02 | Epoch: 4 | Step: 99360 | Dataset: 0-576604 | Loss: 2.045 | 674 ms/step , 58356.76 GFLOP/s , 530382.7 tokens/s INFO:__main__:2024-10-27 15:27:10 | Epoch: 4 | Step: 99370 | Dataset: 0-584604 | Loss: 2.174 | 675 ms/step , 58261.43 GFLOP/s , 532486.6 tokens/s INFO:__main__:2024-10-27 15:27:18 | Epoch: 4 | Step: 99380 | Dataset: 0-592604 | Loss: 2.153 | 676 ms/step , 58185.77 GFLOP/s , 532443.0 tokens/s INFO:__main__:2024-10-27 15:27:26 | Epoch: 4 | Step: 99390 | Dataset: 0-600604 | Loss: 2.019 | 673 ms/step , 58373.20 GFLOP/s , 532684.4 tokens/s INFO:__main__:2024-10-27 15:27:33 | Epoch: 4 | Step: 99400 | Dataset: 0-608604 | Loss: 2.064 | 674 ms/step , 58300.04 GFLOP/s , 532435.3 tokens/s INFO:__main__:2024-10-27 15:27:41 | Epoch: 4 | Step: 99410 | Dataset: 0-616604 | Loss: 2.085 | 675 ms/step , 58206.92 GFLOP/s , 532379.7 tokens/s INFO:__main__:2024-10-27 15:27:49 | Epoch: 4 | Step: 99420 | Dataset: 0-624604 | Loss: 2.065 | 675 ms/step , 58268.25 GFLOP/s , 532430.4 tokens/s INFO:__main__:2024-10-27 15:27:56 | Epoch: 4 | Step: 99430 | Dataset: 0-632604 | Loss: 2.099 | 675 ms/step , 58229.08 GFLOP/s , 532454.2 tokens/s INFO:__main__:2024-10-27 15:28:04 | Epoch: 4 | Step: 99440 | Dataset: 0-640604 | Loss: 2.028 | 674 ms/step , 58288.02 GFLOP/s , 531952.5 tokens/s INFO:__main__:2024-10-27 15:28:12 | Epoch: 4 | Step: 99450 | Dataset: 0-648604 | Loss: 2.036 | 674 ms/step , 58328.42 GFLOP/s , 532728.3 tokens/s INFO:__main__:2024-10-27 15:28:19 | Epoch: 4 | Step: 99460 | Dataset: 0-656604 | Loss: 2.028 | 675 ms/step , 58197.58 GFLOP/s , 532563.0 tokens/s INFO:__main__:2024-10-27 15:28:27 | Epoch: 4 | Step: 99470 | Dataset: 0-664604 | Loss: 2.076 | 675 ms/step , 58270.97 GFLOP/s , 531662.0 tokens/s INFO:__main__:2024-10-27 15:28:35 | Epoch: 4 | Step: 99480 | Dataset: 0-672604 | Loss: 2.119 | 674 
ms/step , 58284.17 GFLOP/s , 532061.9 tokens/s INFO:__main__:2024-10-27 15:28:43 | Epoch: 4 | Step: 99490 | Dataset: 0-680604 | Loss: 2.115 | 676 ms/step , 58161.50 GFLOP/s , 530681.8 tokens/s INFO:__main__:2024-10-27 15:28:50 | Epoch: 4 | Step: 99500 | Dataset: 0-688604 | Loss: 2.165 | 674 ms/step , 58303.89 GFLOP/s , 531967.0 tokens/s INFO:__main__:2024-10-27 15:28:58 | Epoch: 4 | Step: 99510 | Dataset: 0-696604 | Loss: 2.118 | 674 ms/step , 58300.89 GFLOP/s , 532205.3 tokens/s INFO:__main__:2024-10-27 15:29:06 | Epoch: 4 | Step: 99520 | Dataset: 0-704604 | Loss: 2.114 | 675 ms/step , 58240.71 GFLOP/s , 532309.4 tokens/s INFO:__main__:2024-10-27 15:29:13 | Epoch: 4 | Step: 99530 | Dataset: 0-712604 | Loss: 2.204 | 675 ms/step , 58215.14 GFLOP/s , 532030.5 tokens/s INFO:__main__:2024-10-27 15:29:21 | Epoch: 4 | Step: 99540 | Dataset: 0-720604 | Loss: 2.142 | 675 ms/step , 58241.92 GFLOP/s , 532076.2 tokens/s INFO:__main__:2024-10-27 15:29:29 | Epoch: 4 | Step: 99550 | Dataset: 0-728604 | Loss: 2.093 | 676 ms/step , 58144.70 GFLOP/s , 531890.7 tokens/s INFO:__main__:2024-10-27 15:29:36 | Epoch: 4 | Step: 99560 | Dataset: 0-736604 | Loss: 2.001 | 676 ms/step , 58180.48 GFLOP/s , 531768.7 tokens/s INFO:__main__:2024-10-27 15:29:44 | Epoch: 4 | Step: 99570 | Dataset: 0-744604 | Loss: 2.030 | 675 ms/step , 58201.74 GFLOP/s , 531644.4 tokens/s INFO:__main__:2024-10-27 15:29:52 | Epoch: 4 | Step: 99580 | Dataset: 0-752604 | Loss: 2.068 | 676 ms/step , 58154.79 GFLOP/s , 531756.6 tokens/s INFO:__main__:2024-10-27 15:30:00 | Epoch: 4 | Step: 99590 | Dataset: 0-760604 | Loss: 2.057 | 676 ms/step , 58184.39 GFLOP/s , 531811.1 tokens/s INFO:__main__:2024-10-27 15:30:07 | Epoch: 4 | Step: 99600 | Dataset: 0-768604 | Loss: 2.100 | 675 ms/step , 58245.95 GFLOP/s , 531521.5 tokens/s INFO:__main__:2024-10-27 15:30:15 | Epoch: 4 | Step: 99610 | Dataset: 0-776604 | Loss: 2.093 | 676 ms/step , 58181.49 GFLOP/s , 531396.9 tokens/s INFO:__main__:2024-10-27 15:30:23 | Epoch: 4 | Step: 
99620 | Dataset: 0-784604 | Loss: 2.109 | 675 ms/step , 58256.48 GFLOP/s , 531693.9 tokens/s INFO:__main__:2024-10-27 15:30:30 | Epoch: 4 | Step: 99630 | Dataset: 0-792604 | Loss: 2.131 | 675 ms/step , 58226.15 GFLOP/s , 531817.7 tokens/s INFO:__main__:2024-10-27 15:30:38 | Epoch: 4 | Step: 99640 | Dataset: 0-800604 | Loss: 2.128 | 675 ms/step , 58224.53 GFLOP/s , 531544.6 tokens/s INFO:__main__:2024-10-27 15:30:46 | Epoch: 4 | Step: 99650 | Dataset: 0-808604 | Loss: 2.125 | 675 ms/step , 58201.28 GFLOP/s , 531436.2 tokens/s INFO:__main__:2024-10-27 15:30:53 | Epoch: 4 | Step: 99660 | Dataset: 0-816604 | Loss: 2.111 | 676 ms/step , 58155.48 GFLOP/s , 531378.1 tokens/s INFO:__main__:2024-10-27 15:31:01 | Epoch: 4 | Step: 99670 | Dataset: 0-824604 | Loss: 2.064 | 675 ms/step , 58242.32 GFLOP/s , 531611.1 tokens/s INFO:__main__:2024-10-27 15:31:09 | Epoch: 4 | Step: 99680 | Dataset: 0-832604 | Loss: 2.093 | 676 ms/step , 58149.93 GFLOP/s , 531642.9 tokens/s INFO:__main__:2024-10-27 15:31:17 | Epoch: 4 | Step: 99690 | Dataset: 0-840604 | Loss: 2.057 | 676 ms/step , 58190.83 GFLOP/s , 531496.6 tokens/s INFO:__main__:2024-10-27 15:31:24 | Epoch: 4 | Step: 99700 | Dataset: 0-848604 | Loss: 1.989 | 675 ms/step , 58202.71 GFLOP/s , 531632.4 tokens/s INFO:__main__:2024-10-27 15:31:32 | Epoch: 4 | Step: 99710 | Dataset: 0-856604 | Loss: 2.120 | 677 ms/step , 58087.77 GFLOP/s , 531462.5 tokens/s INFO:__main__:2024-10-27 15:31:40 | Epoch: 4 | Step: 99720 | Dataset: 0-864604 | Loss: 2.110 | 677 ms/step , 58081.91 GFLOP/s , 531206.9 tokens/s INFO:__main__:2024-10-27 15:31:47 | Epoch: 4 | Step: 99730 | Dataset: 0-872604 | Loss: 2.109 | 676 ms/step , 58157.66 GFLOP/s , 531192.7 tokens/s INFO:__main__:2024-10-27 15:31:55 | Epoch: 4 | Step: 99740 | Dataset: 0-880604 | Loss: 2.031 | 676 ms/step , 58172.11 GFLOP/s , 531319.1 tokens/s INFO:__main__:2024-10-27 15:32:03 | Epoch: 4 | Step: 99750 | Dataset: 0-888604 | Loss: 2.047 | 675 ms/step , 58218.38 GFLOP/s , 531593.6 tokens/s 
INFO:__main__:2024-10-27 15:32:11 | Epoch: 4 | Step: 99760 | Dataset: 0-896604 | Loss: 2.022 | 675 ms/step , 58199.66 GFLOP/s , 531028.1 tokens/s INFO:__main__:2024-10-27 15:32:18 | Epoch: 4 | Step: 99770 | Dataset: 0-904604 | Loss: 2.032 | 676 ms/step , 58148.92 GFLOP/s , 530990.5 tokens/s INFO:__main__:2024-10-27 15:32:26 | Epoch: 4 | Step: 99780 | Dataset: 0-912604 | Loss: 2.115 | 676 ms/step , 58161.67 GFLOP/s , 530836.3 tokens/s INFO:__main__:2024-10-27 15:32:34 | Epoch: 4 | Step: 99790 | Dataset: 0-920604 | Loss: 2.108 | 675 ms/step , 58204.86 GFLOP/s , 531081.2 tokens/s INFO:__main__:2024-10-27 15:32:41 | Epoch: 4 | Step: 99800 | Dataset: 0-928604 | Loss: 2.166 | 678 ms/step , 57940.70 GFLOP/s , 531117.4 tokens/s INFO:__main__:2024-10-27 15:32:49 | Epoch: 4 | Step: 99810 | Dataset: 0-936604 | Loss: 2.166 | 676 ms/step , 58116.49 GFLOP/s , 531345.5 tokens/s INFO:__main__:2024-10-27 15:32:57 | Epoch: 4 | Step: 99820 | Dataset: 0-944604 | Loss: 2.057 | 677 ms/step , 58065.25 GFLOP/s , 530975.3 tokens/s INFO:__main__:2024-10-27 15:33:05 | Epoch: 4 | Step: 99830 | Dataset: 0-952604 | Loss: 2.015 | 675 ms/step , 58198.22 GFLOP/s , 531445.2 tokens/s INFO:__main__:2024-10-27 15:33:12 | Epoch: 4 | Step: 99840 | Dataset: 0-960604 | Loss: 2.203 | 676 ms/step , 58140.54 GFLOP/s , 531473.4 tokens/s INFO:__main__:2024-10-27 15:33:20 | Epoch: 4 | Step: 99850 | Dataset: 0-968604 | Loss: 1.982 | 676 ms/step , 58159.44 GFLOP/s , 531242.0 tokens/s INFO:__main__:2024-10-27 15:33:28 | Epoch: 4 | Step: 99860 | Dataset: 0-976604 | Loss: 2.092 | 675 ms/step , 58231.17 GFLOP/s , 531505.4 tokens/s INFO:__main__:2024-10-27 15:33:35 | Epoch: 4 | Step: 99870 | Dataset: 0-984604 | Loss: 2.078 | 676 ms/step , 58174.78 GFLOP/s , 531348.7 tokens/s INFO:__main__:2024-10-27 15:33:43 | Epoch: 4 | Step: 99880 | Dataset: 0-992604 | Loss: 2.094 | 675 ms/step , 58213.67 GFLOP/s , 531492.0 tokens/s INFO:__main__:2024-10-27 15:33:51 | Epoch: 4 | Step: 99890 | Dataset: 0-1000604 | Loss: 2.087 | 676 
ms/step , 58159.07 GFLOP/s , 531027.5 tokens/s INFO:__main__:2024-10-27 15:33:58 | Epoch: 4 | Step: 99900 | Dataset: 0-1008604 | Loss: 2.117 | 676 ms/step , 58179.22 GFLOP/s , 530948.8 tokens/s INFO:__main__:2024-10-27 15:34:06 | Epoch: 4 | Step: 99910 | Dataset: 0-1016604 | Loss: 2.047 | 676 ms/step , 58121.50 GFLOP/s , 530376.4 tokens/s INFO:__main__:2024-10-27 15:34:14 | Epoch: 4 | Step: 99920 | Dataset: 0-1024604 | Loss: 2.137 | 676 ms/step , 58174.48 GFLOP/s , 531258.6 tokens/s INFO:__main__:2024-10-27 15:34:22 | Epoch: 4 | Step: 99930 | Dataset: 0-1032604 | Loss: 2.153 | 676 ms/step , 58145.86 GFLOP/s , 531144.7 tokens/s INFO:__main__:2024-10-27 15:34:29 | Epoch: 4 | Step: 99940 | Dataset: 0-1040604 | Loss: 2.034 | 677 ms/step , 58056.66 GFLOP/s , 531280.4 tokens/s INFO:__main__:2024-10-27 15:34:37 | Epoch: 4 | Step: 99950 | Dataset: 0-1048604 | Loss: 2.068 | 674 ms/step , 58303.20 GFLOP/s , 531606.9 tokens/s INFO:__main__:2024-10-27 15:34:45 | Epoch: 4 | Step: 99960 | Dataset: 0-1056604 | Loss: 1.901 | 676 ms/step , 58153.81 GFLOP/s , 531725.8 tokens/s INFO:__main__:2024-10-27 15:34:52 | Epoch: 4 | Step: 99970 | Dataset: 0-1064604 | Loss: 1.838 | 675 ms/step , 58256.29 GFLOP/s , 531478.1 tokens/s INFO:__main__:2024-10-27 15:35:00 | Epoch: 4 | Step: 99980 | Dataset: 0-1072604 | Loss: 1.806 | 676 ms/step , 58160.65 GFLOP/s , 531827.4 tokens/s INFO:__main__:2024-10-27 15:35:08 | Epoch: 4 | Step: 99990 | Dataset: 0-1080604 | Loss: 1.760 | 676 ms/step , 58175.98 GFLOP/s , 531450.3 tokens/s INFO:__main__:2024-10-27 15:35:15 | Validation | Step: 100000 | Val_loss: 2.288 | Best_val_loss: 1.7829 INFO:__main__:2024-10-27 15:35:15 | Saving checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241027_153515_step_100000.pt` INFO:__main__:2024-10-27 15:35:16 | Epoch: 4 | Step: 100000 | Dataset: 0-1088604 | Loss: 1.796 | 674 ms/step , 58316.88 GFLOP/s , 478312.6 tokens/s INFO:__main__:2024-10-27 15:35:24 | Epoch: 4 | Step: 100010 | Dataset: 0-1096604 | Loss: 1.749 | 674 
ms/step , 58286.22 GFLOP/s , 531615.7 tokens/s INFO:__main__:2024-10-27 15:35:32 | Epoch: 4 | Step: 100020 | Dataset: 0-1104604 | Loss: 1.758 | 674 ms/step , 58335.52 GFLOP/s , 531790.0 tokens/s INFO:__main__:2024-10-27 15:35:40 | Epoch: 4 | Step: 100030 | Dataset: 0-1112604 | Loss: 1.769 | 675 ms/step , 58273.77 GFLOP/s , 531790.7 tokens/s INFO:__main__:2024-10-27 15:35:47 | Epoch: 4 | Step: 100040 | Dataset: 0-1120604 | Loss: 2.252 | 674 ms/step , 58303.11 GFLOP/s , 531904.7 tokens/s INFO:__main__:2024-10-27 15:35:55 | Epoch: 4 | Step: 100050 | Dataset: 0-1128604 | Loss: 2.190 | 674 ms/step , 58331.04 GFLOP/s , 532131.1 tokens/s INFO:__main__:2024-10-27 15:36:03 | Epoch: 4 | Step: 100060 | Dataset: 0-1136604 | Loss: 2.171 | 676 ms/step , 58160.64 GFLOP/s , 531956.7 tokens/s INFO:__main__:2024-10-27 15:36:10 | Epoch: 4 | Step: 100070 | Dataset: 0-1144604 | Loss: 2.192 | 676 ms/step , 58176.30 GFLOP/s , 531953.1 tokens/s INFO:__main__:2024-10-27 15:36:18 | Epoch: 4 | Step: 100080 | Dataset: 0-1152604 | Loss: 2.207 | 674 ms/step , 58309.56 GFLOP/s , 531999.3 tokens/s INFO:__main__:2024-10-27 15:36:26 | Epoch: 4 | Step: 100090 | Dataset: 0-1160604 | Loss: 2.181 | 675 ms/step , 58227.29 GFLOP/s , 532102.8 tokens/s INFO:__main__:2024-10-27 15:36:33 | Epoch: 4 | Step: 100100 | Dataset: 0-1168604 | Loss: 2.167 | 675 ms/step , 58215.37 GFLOP/s , 532074.9 tokens/s INFO:__main__:2024-10-27 15:36:41 | Epoch: 4 | Step: 100110 | Dataset: 0-1176604 | Loss: 2.126 | 675 ms/step , 58246.68 GFLOP/s , 531935.3 tokens/s INFO:__main__:2024-10-27 15:36:49 | Epoch: 4 | Step: 100120 | Dataset: 0-1184604 | Loss: 2.093 | 675 ms/step , 58219.22 GFLOP/s , 531969.7 tokens/s INFO:__main__:2024-10-27 15:36:57 | Epoch: 4 | Step: 100130 | Dataset: 0-1192604 | Loss: 2.118 | 675 ms/step , 58216.71 GFLOP/s , 531591.2 tokens/s INFO:__main__:2024-10-27 15:37:04 | Epoch: 4 | Step: 100140 | Dataset: 0-1200604 | Loss: 2.157 | 674 ms/step , 58309.22 GFLOP/s , 532611.5 tokens/s INFO:__main__:2024-10-27 
15:37:12 | Epoch: 4 | Step: 100150 | Dataset: 0-1208604 | Loss: 2.162 | 675 ms/step , 58212.15 GFLOP/s , 531756.9 tokens/s INFO:__main__:2024-10-27 15:37:20 | Epoch: 4 | Step: 100160 | Dataset: 0-1216604 | Loss: 2.160 | 675 ms/step , 58210.39 GFLOP/s , 532208.4 tokens/s INFO:__main__:2024-10-27 15:37:27 | Epoch: 4 | Step: 100170 | Dataset: 0-1224604 | Loss: 2.105 | 676 ms/step , 58189.47 GFLOP/s , 532534.0 tokens/s INFO:__main__:2024-10-27 15:37:35 | Epoch: 4 | Step: 100180 | Dataset: 0-1232604 | Loss: 2.177 | 675 ms/step , 58218.63 GFLOP/s , 532046.8 tokens/s INFO:__main__:2024-10-27 15:37:43 | Epoch: 4 | Step: 100190 | Dataset: 0-1240604 | Loss: 2.107 | 675 ms/step , 58221.04 GFLOP/s , 532053.7 tokens/s INFO:__main__:2024-10-27 15:37:50 | Epoch: 4 | Step: 100200 | Dataset: 0-1248604 | Loss: 2.190 | 675 ms/step , 58220.08 GFLOP/s , 532008.8 tokens/s INFO:__main__:2024-10-27 15:37:58 | Epoch: 4 | Step: 100210 | Dataset: 0-1256604 | Loss: 2.153 | 674 ms/step , 58282.58 GFLOP/s , 532084.1 tokens/s INFO:__main__:2024-10-27 15:38:06 | Epoch: 4 | Step: 100220 | Dataset: 0-1264604 | Loss: 2.179 | 675 ms/step , 58261.77 GFLOP/s , 532490.8 tokens/s INFO:__main__:2024-10-27 15:38:13 | Epoch: 4 | Step: 100230 | Dataset: 0-1272604 | Loss: 2.168 | 675 ms/step , 58251.95 GFLOP/s , 532546.3 tokens/s INFO:__main__:2024-10-27 15:38:21 | Epoch: 4 | Step: 100240 | Dataset: 0-1280604 | Loss: 2.207 | 675 ms/step , 58247.96 GFLOP/s , 532509.2 tokens/s INFO:__main__:2024-10-27 15:38:29 | Epoch: 4 | Step: 100250 | Dataset: 0-1288604 | Loss: 2.195 | 674 ms/step , 58302.68 GFLOP/s , 532258.1 tokens/s INFO:__main__:2024-10-27 15:38:37 | Epoch: 4 | Step: 100260 | Dataset: 0-1296604 | Loss: 2.103 | 675 ms/step , 58197.57 GFLOP/s , 531789.5 tokens/s INFO:__main__:2024-10-27 15:38:44 | Epoch: 4 | Step: 100270 | Dataset: 0-1304604 | Loss: 2.187 | 675 ms/step , 58268.80 GFLOP/s , 532421.6 tokens/s INFO:__main__:2024-10-27 15:38:52 | Epoch: 4 | Step: 100280 | Dataset: 0-1312604 | Loss: 2.139 | 675 
ms/step , 58232.30 GFLOP/s , 532148.1 tokens/s INFO:__main__:2024-10-27 15:39:00 | Epoch: 4 | Step: 100290 | Dataset: 0-1320604 | Loss: 2.093 | 676 ms/step , 58166.43 GFLOP/s , 531716.6 tokens/s INFO:__main__:2024-10-27 15:39:07 | Epoch: 4 | Step: 100300 | Dataset: 0-1328604 | Loss: 2.084 | 675 ms/step , 58223.42 GFLOP/s , 531978.7 tokens/s INFO:__main__:2024-10-27 15:39:15 | Epoch: 4 | Step: 100310 | Dataset: 0-1336604 | Loss: 2.133 | 675 ms/step , 58217.98 GFLOP/s , 531901.7 tokens/s INFO:__main__:2024-10-27 15:39:23 | Epoch: 4 | Step: 100320 | Dataset: 0-1344604 | Loss: 2.171 | 675 ms/step , 58267.04 GFLOP/s , 532142.7 tokens/s INFO:__main__:2024-10-27 15:39:30 | Epoch: 4 | Step: 100330 | Dataset: 0-1352604 | Loss: 2.092 | 675 ms/step , 58235.45 GFLOP/s , 532231.3 tokens/s INFO:__main__:2024-10-27 15:39:38 | Epoch: 4 | Step: 100340 | Dataset: 0-1360604 | Loss: 2.129 | 674 ms/step , 58294.34 GFLOP/s , 532522.3 tokens/s INFO:__main__:2024-10-27 15:39:46 | Epoch: 4 | Step: 100350 | Dataset: 0-1368604 | Loss: 2.110 | 676 ms/step , 58145.94 GFLOP/s , 531886.2 tokens/s INFO:__main__:2024-10-27 15:39:54 | Epoch: 4 | Step: 100360 | Dataset: 0-1376604 | Loss: 2.168 | 675 ms/step , 58233.64 GFLOP/s , 532099.2 tokens/s INFO:__main__:2024-10-27 15:40:01 | Epoch: 4 | Step: 100370 | Dataset: 0-1384604 | Loss: 2.172 | 675 ms/step , 58243.84 GFLOP/s , 532040.7 tokens/s INFO:__main__:2024-10-27 15:40:09 | Epoch: 4 | Step: 100380 | Dataset: 0-1392604 | Loss: 2.159 | 676 ms/step , 58171.79 GFLOP/s , 532418.4 tokens/s INFO:__main__:2024-10-27 15:40:17 | Epoch: 4 | Step: 100390 | Dataset: 0-1400604 | Loss: 2.184 | 675 ms/step , 58265.59 GFLOP/s , 532431.1 tokens/s INFO:__main__:2024-10-27 15:40:24 | Epoch: 4 | Step: 100400 | Dataset: 0-1408604 | Loss: 2.112 | 675 ms/step , 58220.39 GFLOP/s , 532289.0 tokens/s INFO:__main__:2024-10-27 15:40:32 | Epoch: 4 | Step: 100410 | Dataset: 0-1416604 | Loss: 2.120 | 676 ms/step , 58187.28 GFLOP/s , 532216.1 tokens/s INFO:__main__:2024-10-27 
15:40:40 | Epoch: 4 | Step: 100420 | Dataset: 0-1424604 | Loss: 2.116 | 676 ms/step , 58147.35 GFLOP/s , 531944.1 tokens/s INFO:__main__:2024-10-27 15:40:47 | Epoch: 4 | Step: 100430 | Dataset: 0-1432604 | Loss: 2.174 | 676 ms/step , 58116.34 GFLOP/s , 531905.5 tokens/s INFO:__main__:2024-10-27 15:40:55 | Epoch: 4 | Step: 100440 | Dataset: 0-1440604 | Loss: 2.087 | 677 ms/step , 58074.26 GFLOP/s , 531805.8 tokens/s INFO:__main__:2024-10-27 15:41:03 | Epoch: 4 | Step: 100450 | Dataset: 0-1448604 | Loss: 2.010 | 675 ms/step , 58200.52 GFLOP/s , 531830.7 tokens/s INFO:__main__:2024-10-27 15:41:11 | Epoch: 4 | Step: 100460 | Dataset: 0-1456604 | Loss: 2.168 | 675 ms/step , 58248.56 GFLOP/s , 531822.5 tokens/s INFO:__main__:2024-10-27 15:41:18 | Epoch: 4 | Step: 100470 | Dataset: 0-1464604 | Loss: 2.107 | 674 ms/step , 58287.47 GFLOP/s , 532398.5 tokens/s INFO:__main__:2024-10-27 15:41:26 | Epoch: 4 | Step: 100480 | Dataset: 0-1472604 | Loss: 2.169 | 674 ms/step , 58283.30 GFLOP/s , 532203.6 tokens/s INFO:__main__:2024-10-27 15:41:34 | Epoch: 4 | Step: 100490 | Dataset: 0-1480604 | Loss: 2.167 | 676 ms/step , 58139.85 GFLOP/s , 531713.1 tokens/s INFO:__main__:2024-10-27 15:41:41 | Epoch: 4 | Step: 100500 | Dataset: 0-1488604 | Loss: 2.091 | 675 ms/step , 58252.44 GFLOP/s , 532494.0 tokens/s INFO:__main__:2024-10-27 15:41:49 | Epoch: 4 | Step: 100510 | Dataset: 0-1496604 | Loss: 2.103 | 676 ms/step , 58185.80 GFLOP/s , 531738.4 tokens/s INFO:__main__:2024-10-27 15:41:57 | Epoch: 4 | Step: 100520 | Dataset: 0-1504604 | Loss: 2.184 | 675 ms/step , 58250.86 GFLOP/s , 532386.2 tokens/s INFO:__main__:2024-10-27 15:42:04 | Epoch: 4 | Step: 100530 | Dataset: 0-1512604 | Loss: 1.861 | 677 ms/step , 58078.15 GFLOP/s , 531254.2 tokens/s INFO:__main__:2024-10-27 15:42:12 | Epoch: 4 | Step: 100540 | Dataset: 0-1520604 | Loss: 1.804 | 675 ms/step , 58207.87 GFLOP/s , 530035.4 tokens/s INFO:__main__:2024-10-27 15:42:20 | Epoch: 4 | Step: 100550 | Dataset: 0-1528604 | Loss: 1.784 | 678 
ms/step , 57989.62 GFLOP/s , 530575.8 tokens/s INFO:__main__:2024-10-27 15:42:28 | Epoch: 4 | Step: 100560 | Dataset: 0-1536604 | Loss: 1.781 | 676 ms/step , 58161.48 GFLOP/s , 530741.9 tokens/s INFO:__main__:2024-10-27 15:42:35 | Epoch: 4 | Step: 100570 | Dataset: 0-1544604 | Loss: 1.783 | 676 ms/step , 58176.33 GFLOP/s , 531086.1 tokens/s INFO:__main__:2024-10-27 15:42:43 | Epoch: 4 | Step: 100580 | Dataset: 0-1552604 | Loss: 1.770 | 675 ms/step , 58223.70 GFLOP/s , 531476.8 tokens/s INFO:__main__:2024-10-27 15:42:51 | Epoch: 4 | Step: 100590 | Dataset: 0-1560604 | Loss: 1.749 | 677 ms/step , 58081.76 GFLOP/s , 530956.7 tokens/s INFO:__main__:2024-10-27 15:42:58 | Epoch: 4 | Step: 100600 | Dataset: 0-1568604 | Loss: 1.753 | 677 ms/step , 58074.09 GFLOP/s , 529788.1 tokens/s INFO:__main__:2024-10-27 15:43:06 | Epoch: 4 | Step: 100610 | Dataset: 0-1576604 | Loss: 2.327 | 675 ms/step , 58194.47 GFLOP/s , 530831.2 tokens/s INFO:__main__:2024-10-27 15:43:14 | Epoch: 4 | Step: 100620 | Dataset: 0-1584604 | Loss: 2.250 | 676 ms/step , 58149.30 GFLOP/s , 531637.6 tokens/s INFO:__main__:2024-10-27 15:43:22 | Epoch: 4 | Step: 100630 | Dataset: 0-1592604 | Loss: 2.117 | 676 ms/step , 58149.72 GFLOP/s , 531887.3 tokens/s INFO:__main__:2024-10-27 15:43:29 | Epoch: 4 | Step: 100640 | Dataset: 0-1600604 | Loss: 2.177 | 675 ms/step , 58220.88 GFLOP/s , 532028.2 tokens/s INFO:__main__:2024-10-27 15:43:37 | Epoch: 4 | Step: 100650 | Dataset: 0-1608604 | Loss: 2.185 | 676 ms/step , 58183.38 GFLOP/s , 531957.2 tokens/s INFO:__main__:2024-10-27 15:43:45 | Epoch: 4 | Step: 100660 | Dataset: 0-1616604 | Loss: 2.159 | 676 ms/step , 58130.49 GFLOP/s , 531883.0 tokens/s INFO:__main__:2024-10-27 15:43:52 | Epoch: 4 | Step: 100670 | Dataset: 0-1624604 | Loss: 2.184 | 676 ms/step , 58175.16 GFLOP/s , 532302.9 tokens/s INFO:__main__:2024-10-27 15:44:00 | Epoch: 4 | Step: 100680 | Dataset: 0-1632604 | Loss: 2.184 | 676 ms/step , 58168.41 GFLOP/s , 531716.0 tokens/s INFO:__main__:2024-10-27 
15:44:08 | Epoch: 4 | Step: 100690 | Dataset: 0-1640604 | Loss: 2.115 | 675 ms/step , 58251.23 GFLOP/s , 531918.9 tokens/s INFO:__main__:2024-10-27 15:44:15 | Epoch: 4 | Step: 100700 | Dataset: 0-1648604 | Loss: 2.064 | 675 ms/step , 58248.63 GFLOP/s , 532167.3 tokens/s INFO:__main__:2024-10-27 15:44:23 | Epoch: 4 | Step: 100710 | Dataset: 0-1656604 | Loss: 2.090 | 675 ms/step , 58278.32 GFLOP/s , 531640.4 tokens/s INFO:__main__:2024-10-27 15:44:31 | Epoch: 4 | Step: 100720 | Dataset: 0-1664604 | Loss: 2.108 | 675 ms/step , 58228.00 GFLOP/s , 532225.2 tokens/s INFO:__main__:2024-10-27 15:44:39 | Epoch: 4 | Step: 100730 | Dataset: 0-1672604 | Loss: 2.020 | 674 ms/step , 58313.52 GFLOP/s , 532104.7 tokens/s INFO:__main__:2024-10-27 15:44:46 | Epoch: 4 | Step: 100740 | Dataset: 0-1680604 | Loss: 2.144 | 674 ms/step , 58302.65 GFLOP/s , 532312.1 tokens/s INFO:__main__:2024-10-27 15:44:54 | Epoch: 4 | Step: 100750 | Dataset: 0-1688604 | Loss: 2.087 | 674 ms/step , 58352.89 GFLOP/s , 532724.1 tokens/s INFO:__main__:2024-10-27 15:45:02 | Epoch: 4 | Step: 100760 | Dataset: 0-1696604 | Loss: 2.203 | 674 ms/step , 58351.62 GFLOP/s , 532724.4 tokens/s INFO:__main__:2024-10-27 15:45:09 | Epoch: 4 | Step: 100770 | Dataset: 0-1704604 | Loss: 2.113 | 674 ms/step , 58296.33 GFLOP/s , 532568.6 tokens/s INFO:__main__:2024-10-27 15:45:17 | Epoch: 4 | Step: 100780 | Dataset: 0-1712604 | Loss: 2.150 | 674 ms/step , 58288.25 GFLOP/s , 532752.6 tokens/s INFO:__main__:2024-10-27 15:45:25 | Epoch: 4 | Step: 100790 | Dataset: 0-1720604 | Loss: 2.203 | 675 ms/step , 58236.39 GFLOP/s , 532458.9 tokens/s INFO:__main__:2024-10-27 15:45:32 | Epoch: 4 | Step: 100800 | Dataset: 0-1728604 | Loss: 2.188 | 675 ms/step , 58274.60 GFLOP/s , 532339.6 tokens/s INFO:__main__:2024-10-27 15:45:40 | Epoch: 4 | Step: 100810 | Dataset: 0-1736604 | Loss: 2.110 | 676 ms/step , 58148.55 GFLOP/s , 532481.6 tokens/s INFO:__main__:2024-10-27 15:45:48 | Epoch: 4 | Step: 100820 | Dataset: 0-1744604 | Loss: 2.141 | 676 
ms/step , 58185.54 GFLOP/s , 531904.9 tokens/s INFO:__main__:2024-10-27 15:45:56 | Epoch: 4 | Step: 100830 | Dataset: 0-1752604 | Loss: 2.138 | 675 ms/step , 58224.65 GFLOP/s , 532432.4 tokens/s INFO:__main__:2024-10-27 15:46:03 | Epoch: 4 | Step: 100840 | Dataset: 0-1760604 | Loss: 2.223 | 674 ms/step , 58297.73 GFLOP/s , 532459.0 tokens/s INFO:__main__:2024-10-27 15:46:11 | Epoch: 4 | Step: 100850 | Dataset: 0-1768604 | Loss: 2.116 | 675 ms/step , 58256.88 GFLOP/s , 532617.9 tokens/s INFO:__main__:2024-10-27 15:46:19 | Epoch: 4 | Step: 100860 | Dataset: 0-1776604 | Loss: 2.099 | 674 ms/step , 58352.19 GFLOP/s , 532602.0 tokens/s INFO:__main__:2024-10-27 15:46:26 | Epoch: 4 | Step: 100870 | Dataset: 0-1784604 | Loss: 2.196 | 678 ms/step , 58020.66 GFLOP/s , 532267.4 tokens/s INFO:__main__:2024-10-27 15:46:34 | Epoch: 4 | Step: 100880 | Dataset: 0-1792604 | Loss: 2.093 | 677 ms/step , 58029.36 GFLOP/s , 530717.4 tokens/s INFO:__main__:2024-10-27 15:46:42 | Epoch: 4 | Step: 100890 | Dataset: 0-1800604 | Loss: 2.123 | 674 ms/step , 58291.96 GFLOP/s , 532420.7 tokens/s INFO:__main__:2024-10-27 15:46:49 | Epoch: 4 | Step: 100900 | Dataset: 0-1808604 | Loss: 2.118 | 676 ms/step , 58178.20 GFLOP/s , 532460.3 tokens/s INFO:__main__:2024-10-27 15:46:57 | Epoch: 4 | Step: 100910 | Dataset: 0-1816604 | Loss: 2.098 | 676 ms/step , 58192.72 GFLOP/s , 532466.2 tokens/s INFO:__main__:2024-10-27 15:47:05 | Epoch: 4 | Step: 100920 | Dataset: 0-1824604 | Loss: 2.098 | 674 ms/step , 58329.90 GFLOP/s , 532561.4 tokens/s INFO:__main__:2024-10-27 15:47:12 | Epoch: 4 | Step: 100930 | Dataset: 0-1832604 | Loss: 2.284 | 676 ms/step , 58142.64 GFLOP/s , 532203.3 tokens/s INFO:__main__:2024-10-27 15:47:20 | Epoch: 4 | Step: 100940 | Dataset: 0-1840604 | Loss: 1.788 | 676 ms/step , 58183.30 GFLOP/s , 531817.4 tokens/s INFO:__main__:2024-10-27 15:47:28 | Epoch: 4 | Step: 100950 | Dataset: 0-1848604 | Loss: 1.777 | 675 ms/step , 58264.16 GFLOP/s , 531619.3 tokens/s INFO:__main__:2024-10-27 
15:47:36 | Epoch: 4 | Step: 100960 | Dataset: 0-1856604 | Loss: 1.751 | 675 ms/step , 58206.16 GFLOP/s , 531737.9 tokens/s INFO:__main__:2024-10-27 15:47:43 | Epoch: 4 | Step: 100970 | Dataset: 0-1864604 | Loss: 1.713 | 676 ms/step , 58140.83 GFLOP/s , 531676.5 tokens/s