Model: gpt_LinearDiTStudent2
Optimizer: adamw, LR: 2e-05
Best Val Loss: 1.0569, Test Loss: 1.0260

Epoch 1: Train Loss: 7146.0399, Val Loss: 5.1513
Epoch 2: Train Loss: 2.6326, Val Loss: 1.4842
Epoch 3: Train Loss: 1.4425, Val Loss: 1.2182
Epoch 4: Train Loss: 1.2817, Val Loss: 1.1668
Epoch 5: Train Loss: 1.1631, Val Loss: 1.1296
Epoch 6: Train Loss: 1.0963, Val Loss: 1.0863
Epoch 7: Train Loss: 1.2146, Val Loss: 1.0793
Epoch 8: Train Loss: 1.0522, Val Loss: 1.0756
Epoch 9: Train Loss: 1.0450, Val Loss: 1.0707
Epoch 10: Train Loss: 1.1127, Val Loss: 1.0760
Epoch 11: Train Loss: 1.0296, Val Loss: 1.0569
Epoch 12: Train Loss: 1.0260, Val Loss: 1.0654
Epoch 13: Train Loss: 1.0236, Val Loss: 1.0681
Epoch 14: Train Loss: 1.0223, Val Loss: 1.0661
Epoch 15: Train Loss: 1.0216, Val Loss: 1.0714

Final Test Loss: 1.0260
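
The reported best validation loss can be cross-checked against the per-epoch lines. The following is a minimal sketch, assuming the log is available as plain text in the `Epoch N: Train Loss: X, Val Loss: Y` form shown above; the names `LOG`, `parse_epochs`, and `best_epoch` are hypothetical helpers for illustration and are not part of the training code that produced this log.

```python
import re

# Per-epoch lines copied from the log above.
LOG = """\
Epoch 1: Train Loss: 7146.0399, Val Loss: 5.1513
Epoch 2: Train Loss: 2.6326, Val Loss: 1.4842
Epoch 3: Train Loss: 1.4425, Val Loss: 1.2182
Epoch 4: Train Loss: 1.2817, Val Loss: 1.1668
Epoch 5: Train Loss: 1.1631, Val Loss: 1.1296
Epoch 6: Train Loss: 1.0963, Val Loss: 1.0863
Epoch 7: Train Loss: 1.2146, Val Loss: 1.0793
Epoch 8: Train Loss: 1.0522, Val Loss: 1.0756
Epoch 9: Train Loss: 1.0450, Val Loss: 1.0707
Epoch 10: Train Loss: 1.1127, Val Loss: 1.0760
Epoch 11: Train Loss: 1.0296, Val Loss: 1.0569
Epoch 12: Train Loss: 1.0260, Val Loss: 1.0654
Epoch 13: Train Loss: 1.0236, Val Loss: 1.0681
Epoch 14: Train Loss: 1.0223, Val Loss: 1.0661
Epoch 15: Train Loss: 1.0216, Val Loss: 1.0714
"""

# Matches one epoch line; any trailing characters on the line are ignored.
PATTERN = re.compile(r"Epoch (\d+): Train Loss: ([\d.]+), Val Loss: ([\d.]+)")


def parse_epochs(text):
    """Return a list of (epoch, train_loss, val_loss) tuples."""
    return [(int(e), float(t), float(v)) for e, t, v in PATTERN.findall(text)]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


rows = parse_epochs(LOG)
epoch, train_loss, val_loss = best_epoch(rows)
print(f"best epoch: {epoch}, val loss: {val_loss:.4f}")
# prints: best epoch: 11, val loss: 1.0569
```

The minimum falls at epoch 11, which agrees with the `Best Val Loss: 1.0569` summary line.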