Update README.md
README.md CHANGED
@@ -7,4 +7,11 @@ sdk: static
 pinned: false
 ---
 
-Unusable models, compute optimally 🔥. We hope that buy open-sourcing our compute-optimal trained models, that others can replicate our results and also make no use out of our unusable models. These models are not useful in the slightest, and don't benefit research.
+Unusable models, compute optimally 🔥. We hope that by open-sourcing our compute-optimally trained models, others can replicate our results and also make no use of our unusable models. These models are not useful in the slightest and don't benefit research.
+
+- A-Class Models (Chinchilla-optimal): 20 training tokens per parameter.
+- B-Class Models: 42 training tokens per parameter.
+- C-Class Models: 76 training tokens per parameter.
+- D-Class Models: 142 training tokens per parameter.
+
+The B, C, and D classes are derived from the tokens-per-parameter ratios of the LLaMA models: LLaMA 65B is nearly Chinchilla-optimal at roughly 21 tokens per parameter, and descending through the smaller LLaMA model sizes and their training-set sizes gives the remaining class ratios.
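
For illustration only, here is a minimal Python sketch of the arithmetic behind these classes, assuming the published LLaMA training-set sizes (roughly 1.4T tokens for the 33B and 65B models, roughly 1T tokens for the 7B and 13B models); the function name and the 125M-parameter example are hypothetical and not part of this repository.

```python
# Sketch: where the class ratios come from and how they turn into token budgets.
# Assumed LLaMA figures (rounded): 7B and 13B trained on ~1T tokens,
# 33B and 65B on ~1.4T tokens.
LLAMA_TOKENS = {7e9: 1.0e12, 13e9: 1.0e12, 33e9: 1.4e12, 65e9: 1.4e12}

# Training tokens per model parameter for each class.
CLASS_RATIOS = {
    "A": 20,   # Chinchilla-optimal (~20 tokens per parameter)
    "B": 42,   # ~ LLaMA 33B: 1.4e12 / 33e9 ≈ 42
    "C": 76,   # ~ LLaMA 13B: 1.0e12 / 13e9 ≈ 77
    "D": 142,  # ~ LLaMA 7B:  1.0e12 / 7e9  ≈ 143
}

def training_tokens(params_millions: float, model_class: str) -> float:
    """Training-set size in tokens for a model with `params_millions` million params."""
    return CLASS_RATIOS[model_class] * params_millions * 1e6

if __name__ == "__main__":
    # Derived LLaMA ratios, to show where 42 / 76 / 142 come from.
    for params, tokens in LLAMA_TOKENS.items():
        print(f"LLaMA {params / 1e9:.0f}B: {tokens / params:.1f} tokens per parameter")

    # Hypothetical example: token budget for a 125M-parameter model in each class.
    for cls in CLASS_RATIOS:
        print(f"{cls}-Class, 125M params: {training_tokens(125, cls) / 1e9:.1f}B tokens")
```

Under these ratios, a D-Class model of a given size sees roughly seven times as much training data as an A-Class model of the same size.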