Update README.md
README.md
CHANGED
@@ -75,6 +75,7 @@ This model is trained on the World v2.8 with a total of 1.0 trillion tokens.
 - **Training regime:** bfloat16, lr 4e-4 to 1e-5 "delayed" cosine decay, wd 0.1 (with increasing batch sizes during the middle)
 - **Final Loss:** 1.9965
+- **Token Count:** 3.119 trillion

 ## Evaluation
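The training-regime bullet describes a "delayed" cosine decay from lr 4e-4 down to 1e-5. The commit does not define the schedule precisely, so the sketch below is an assumption: hold the peak learning rate for an initial fraction of training (`delay_frac`, a hypothetical parameter), then cosine-decay to the floor. The function name and all parameters are illustrative, not from the model card.

```python
import math

def delayed_cosine_lr(step, total_steps, lr_max=4e-4, lr_min=1e-5, delay_frac=0.1):
    """Illustrative 'delayed' cosine schedule (assumed interpretation):
    hold lr_max for the first delay_frac of training, then cosine-decay
    from lr_max down to lr_min over the remaining steps."""
    delay_steps = int(total_steps * delay_frac)
    if step < delay_steps:
        return lr_max  # constant hold phase before decay begins
    # fraction of the decay phase completed, in [0, 1]
    progress = (step - delay_steps) / max(1, total_steps - delay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The batch-size increase "during the middle" mentioned in the same bullet would be a separate schedule and is not modeled here.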