Update README.md
README.md
CHANGED
@@ -75,6 +75,7 @@ This model is trained on the World v2.8 with a total of 1.0 trillion tokens.
 - **Training regime:** bfloat16, lr 4e-4 to 1e-5 "delayed" cosine decay, wd 0.1 (with increasing batch sizes during the middle)
 - **Final Loss:** 1.9965
+- **Token Count:** 3.119 trillion

 ## Evaluation
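The training-regime bullet describes a "delayed" cosine decay from lr 4e-4 down to 1e-5. The commit does not define the schedule precisely, so the sketch below is an assumption: hold the peak learning rate for an initial fraction of training (`delay_frac`, a hypothetical parameter), then cosine-decay to the floor. The function name and all parameters are illustrative, not from the model card.

```python
import math

def delayed_cosine_lr(step, total_steps, lr_max=4e-4, lr_min=1e-5, delay_frac=0.1):
    """Illustrative 'delayed' cosine schedule (assumed interpretation):
    hold lr_max for the first delay_frac of training, then cosine-decay
    from lr_max down to lr_min over the remaining steps."""
    delay_steps = int(total_steps * delay_frac)
    if step < delay_steps:
        return lr_max  # constant hold phase before decay begins
    # fraction of the decay phase completed, in [0, 1]
    progress = (step - delay_steps) / max(1, total_steps - delay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The batch-size increase "during the middle" mentioned in the same bullet would be a separate schedule and is not modeled here.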