## Overview
We sampled 100 billion tokens from the CCI4.0 dataset and trained a 1.4B-parameter MoE model with 0.4B active parameters. The model, along with the dataset, is open-sourced as a baseline for future experiments in areas such as dataset construction, algorithmic strategies, and parallel training frameworks. The model architecture is the same as that of OpenSeek-Small-v1.
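
For quick experimentation, the checkpoint can be loaded with Hugging Face `transformers`. The sketch below is a minimal example, not an official usage guide: the repository id is a placeholder assumption, and `trust_remote_code=True` is only needed if the MoE architecture ships custom modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id (assumption) -- substitute the actual checkpoint path for this release.
model_id = "BAAI/OpenSeek-Small-v1-Baseline"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Total parameter count should be roughly 1.4B; only about 0.4B parameters
# are active per token because of the MoE routing.
total_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total_params / 1e9:.2f}B")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```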
## Training Data
**Total Volume**: 100B tokens of high-quality pretraining data. The per-source mixing ratios are listed below; a sampling sketch follows the table.

| Name | Ratio |
|-------------------------------------------|---------|
| Nemotron-CC-high-actual-actual-high | 1.1068 |
| Nemotron-CC-high-actual-actual-low | 0.3577 |
| Nemotron-CC-high-actual-actual-mid | 0.7775 |
| Nemotron-CC-high-synthetic-distill-high | 0.2859 |
| Nemotron-CC-high-synthetic-distill-low | 0.1672 |
| Nemotron-CC-high-synthetic-distill-mid | 0.2339 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-high | 0.5397 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-low | 0.4064 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-mid | 0.5005 |
| Nemotron-CC-high-synthetic-extract_knowledge-high | 0.4616 |
| Nemotron-CC-high-synthetic-extract_knowledge-low | 0.0670 |
| Nemotron-CC-high-synthetic-extract_knowledge-mid | 0.3429 |
| Nemotron-CC-high-synthetic-knowledge_list-high | 0.2610 |
| Nemotron-CC-high-synthetic-knowledge_list-low | 0.1824 |
| Nemotron-CC-high-synthetic-knowledge_list-mid | 0.2313 |
| Nemotron-CC-high-synthetic-wrap_medium-high | 0.8237 |
| Nemotron-CC-high-synthetic-wrap_medium-low | 0.2866 |
| Nemotron-CC-high-synthetic-wrap_medium-mid | 0.6670 |
| Nemotron-CC-low-synthetic-wrap_medium-high | 0.4657 |
| Nemotron-CC-low-synthetic-wrap_medium-low | 0.2005 |
| Nemotron-CC-low-synthetic-wrap_medium-mid | 0.4317 |
| Nemotron-CC-medium-actual-actual-high | 1.1397 |
| Nemotron-CC-medium-actual-actual-low | 0.6782 |
| Nemotron-CC-medium-actual-actual-mid | 0.9175 |
| arxiv | 0.6414 |
| books | 0.4696 |
| code-high | 1.0102 |
| code-low | 1.1403 |
| code-mid | 0.9674 |
| cot_synthesis2_CC-high | 0.3755 |
| cot_synthesis2_CC-low | 0.0499 |
| cot_synthesis2_CC-mid | 1.8299 |
| cot_synthesis2_OpenSource-high | 0.2573 |
| cot_synthesis2_OpenSource-low | 0.1638 |
| cot_synthesis2_OpenSource-mid | 0.3251 |
| cot_synthesis2_arxiv-high | 6.0237 |
| cot_synthesis2_arxiv-low | 8.9063 |
| cot_synthesis2_arxiv-mid | 10.1376 |
| cot_synthesis2_code-high | 0.4598 |
| cot_synthesis2_code-low | 0.6857 |
| cot_synthesis2_code-mid | 0.8990 |
| cot_synthesis2_math-high | 1.3135 |
| cot_synthesis2_math-low | 1.6530 |
| cot_synthesis2_math-mid | 0.3536 |
| cot_synthesis2_wiki-high | 0.6314 |
| cot_synthesis2_wiki-low | 0.5978 |
| cot_synthesis2_wiki-mid | 0.7909 |
| cot_synthesis_CC-high | 0.2225 |
| cot_synthesis_CC-low | 0.1797 |
| cot_synthesis_CC-mid | 0.2042 |
| cot_synthesis_OpenSource-high | 0.4081 |
| cot_synthesis_OpenSource-low | 0.1659 |
| cot_synthesis_OpenSource-mid | 1.2828 |
| cot_synthesis_arxiv-high | 5.68 |
| cot_synthesis_arxiv-low | 7.4907 |
| cot_synthesis_arxiv-mid | 8.9359 |
| cot_synthesis_code-high | 0.7663 |
| cot_synthesis_code-low | 0.4052 |
| cot_synthesis_code-mid | 0.1916 |
| cot_synthesis_math-high | 0.5074 |
| cot_synthesis_math-low | 0.6437 |
| cot_synthesis_math-mid | 0.6406 |
| cot_synthesis_wiki-high | 0.4000 |
| cot_synthesis_wiki-low | 0.3564 |
| cot_synthesis_wiki-mid | 0.5768 |
| math-high | 1.8165 |
| math-low | 1.6940 |
| math-mid | 1.6311 |
| pes2o | 6.1982 |
| pes2o-full-train | 1.4257 |
| pes2o-full-val | 0.0143 |
| stack | 0.4229 |
| wiki | 0.4202 |
| zh_cc-high-loss0 | 1.8171 |
| zh_cc-high-loss1 | 0.9776 |
| zh_cc-high-loss2 | 0.3725 |
| zh_cc-medium-loss0 | 0.9492 |
| zh_cc-medium-loss1 | 0.9236 |
| zh_cc-medium-loss2 | 1.0643 |
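
The ratios above are relative sampling weights across data sources. As a rough illustration only (not the exact data pipeline used for this release), the sketch below normalizes such ratios into sampling probabilities and converts them into per-source token budgets for a 100B-token mixture; the few entries shown are copied from the table.

```python
# Hedged sketch: turn the ratio table into per-source token budgets.
TOTAL_TOKENS = 100_000_000_000  # 100B-token budget from the section above

# A few entries copied from the table; extend with the remaining rows as needed.
ratios = {
    "Nemotron-CC-high-actual-actual-high": 1.1068,
    "cot_synthesis2_arxiv-mid": 10.1376,
    "math-high": 1.8165,
    "wiki": 0.4202,
}

total_ratio = sum(ratios.values())
token_budget = {
    name: int(TOTAL_TOKENS * ratio / total_ratio)  # proportional share of the budget
    for name, ratio in ratios.items()
}

for name, tokens in token_budget.items():
    print(f"{name:45s} {tokens / 1e9:6.2f}B tokens")
```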
## Wandb
Our training curves are recorded in [Weights & Biases](https://wandb.ai/openseek-team/OpenSeek-Small-v1-Baseline).
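
For programmatic access, the public W&B API can list the runs in this project and pull logged metrics. This is a minimal sketch; the metric key `train/loss` is an assumption and may differ from the keys actually logged, so inspect `run.summary` for the real names.

```python
import wandb

# Read-only access to the public project linked above.
api = wandb.Api()
runs = api.runs("openseek-team/OpenSeek-Small-v1-Baseline")

for run in runs:
    # "train/loss" is an assumed metric key; check run.summary for actual keys.
    print(run.name, run.state, run.summary.get("train/loss"))
```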