Commit 30b43c7 by ldwang (verified; parent a790f28): Update README.md
Files changed (1): README.md (+84, -0)
## Overview

We sampled 100 billion tokens from the CCI4.0 dataset and trained a 1.4B-parameter MoE model with 0.4B active parameters. This model, along with the dataset, is open-sourced as a baseline for future experiments in areas such as dataset construction, algorithmic strategies, and parallel training frameworks. The model architecture is the same as that of the OpenSeek-Small-v1 model.
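As a rough illustration of how a 1.4B-parameter MoE can run with only 0.4B active parameters per token, the sketch below uses a hypothetical top-k routing layout; the expert count, top-k, and parameter split are illustrative assumptions, not the published OpenSeek-Small-v1 configuration:

```python
# Hypothetical MoE parameter accounting (illustrative numbers only,
# NOT the actual OpenSeek-Small-v1 configuration).

def moe_param_counts(shared_b, expert_b, num_experts, top_k):
    """Return (total, active) parameter counts in billions.

    shared_b:    parameters every token uses (attention, embeddings, router)
    expert_b:    parameters per expert FFN (summed over layers for simplicity)
    num_experts: number of experts
    top_k:       experts routed to per token
    """
    total = shared_b + num_experts * expert_b
    active = shared_b + top_k * expert_b  # only the routed experts run per token
    return total, active

# One hypothetical split that reproduces the README's 1.4B total / 0.4B active:
total, active = moe_param_counts(shared_b=0.0667, expert_b=0.1667,
                                 num_experts=8, top_k=2)
print(f"total={total:.2f}B active={active:.2f}B")
```

The point is only that per-token compute scales with the active parameters, which is why a 1.4B MoE can cost roughly as much to run as a 0.4B dense model.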
## Training Data

**Total volume**: 100B tokens of high-quality pretraining data. The sampling ratio of each data source is listed below.

| Name | Ratio |
|-------------------------------------------|---------|
| Nemotron-CC-high-actual-actual-high | 1.1068 |
| Nemotron-CC-high-actual-actual-low | 0.3577 |
| Nemotron-CC-high-actual-actual-mid | 0.7775 |
| Nemotron-CC-high-synthetic-distill-high | 0.2859 |
| Nemotron-CC-high-synthetic-distill-low | 0.1672 |
| Nemotron-CC-high-synthetic-distill-mid | 0.2339 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-high | 0.5397 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-low | 0.4064 |
| Nemotron-CC-high-synthetic-diverse_qa_pairs-mid | 0.5005 |
| Nemotron-CC-high-synthetic-extract_knowledge-high | 0.4616 |
| Nemotron-CC-high-synthetic-extract_knowledge-low | 0.0670 |
| Nemotron-CC-high-synthetic-extract_knowledge-mid | 0.3429 |
| Nemotron-CC-high-synthetic-knowledge_list-high | 0.2610 |
| Nemotron-CC-high-synthetic-knowledge_list-low | 0.1824 |
| Nemotron-CC-high-synthetic-knowledge_list-mid | 0.2313 |
| Nemotron-CC-high-synthetic-wrap_medium-high | 0.8237 |
| Nemotron-CC-high-synthetic-wrap_medium-low | 0.2866 |
| Nemotron-CC-high-synthetic-wrap_medium-mid | 0.6670 |
| Nemotron-CC-low-synthetic-wrap_medium-high | 0.4657 |
| Nemotron-CC-low-synthetic-wrap_medium-low | 0.2005 |
| Nemotron-CC-low-synthetic-wrap_medium-mid | 0.4317 |
| Nemotron-CC-medium-actual-actual-high | 1.1397 |
| Nemotron-CC-medium-actual-actual-low | 0.6782 |
| Nemotron-CC-medium-actual-actual-mid | 0.9175 |
| arxiv | 0.6414 |
| books | 0.4696 |
| code-high | 1.0102 |
| code-low | 1.1403 |
| code-mid | 0.9674 |
| cot_synthesis2_CC-high | 0.3755 |
| cot_synthesis2_CC-low | 0.0499 |
| cot_synthesis2_CC-mid | 1.8299 |
| cot_synthesis2_OpenSource-high | 0.2573 |
| cot_synthesis2_OpenSource-low | 0.1638 |
| cot_synthesis2_OpenSource-mid | 0.3251 |
| cot_synthesis2_arxiv-high | 6.0237 |
| cot_synthesis2_arxiv-low | 8.9063 |
| cot_synthesis2_arxiv-mid | 10.1376 |
| cot_synthesis2_code-high | 0.4598 |
| cot_synthesis2_code-low | 0.6857 |
| cot_synthesis2_code-mid | 0.8990 |
| cot_synthesis2_math-high | 1.3135 |
| cot_synthesis2_math-low | 1.6530 |
| cot_synthesis2_math-mid | 0.3536 |
| cot_synthesis2_wiki-high | 0.6314 |
| cot_synthesis2_wiki-low | 0.5978 |
| cot_synthesis2_wiki-mid | 0.7909 |
| cot_synthesis_CC-high | 0.2225 |
| cot_synthesis_CC-low | 0.1797 |
| cot_synthesis_CC-mid | 0.2042 |
| cot_synthesis_OpenSource-high | 0.4081 |
| cot_synthesis_OpenSource-low | 0.1659 |
| cot_synthesis_OpenSource-mid | 1.2828 |
| cot_synthesis_arxiv-high | 5.6800 |
| cot_synthesis_arxiv-low | 7.4907 |
| cot_synthesis_arxiv-mid | 8.9359 |
| cot_synthesis_code-high | 0.7663 |
| cot_synthesis_code-low | 0.4052 |
| cot_synthesis_code-mid | 0.1916 |
| cot_synthesis_math-high | 0.5074 |
| cot_synthesis_math-low | 0.6437 |
| cot_synthesis_math-mid | 0.6406 |
| cot_synthesis_wiki-high | 0.4000 |
| cot_synthesis_wiki-low | 0.3564 |
| cot_synthesis_wiki-mid | 0.5768 |
| math-high | 1.8165 |
| math-low | 1.6940 |
| math-mid | 1.6311 |
| pes2o | 6.1982 |
| pes2o-full-train | 1.4257 |
| pes2o-full-val | 0.0143 |
| stack | 0.4229 |
| wiki | 0.4202 |
| zh_cc-high-loss0 | 1.8171 |
| zh_cc-high-loss1 | 0.9776 |
| zh_cc-high-loss2 | 0.3725 |
| zh_cc-medium-loss0 | 0.9492 |
| zh_cc-medium-loss1 | 0.9236 |
| zh_cc-medium-loss2 | 1.0643 |
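If the ratios above are treated as relative sampling weights, drawing documents per source can be sketched as below. This is a minimal illustration, not the actual pipeline: the three-entry `ratios` dict is a subset of the table, and the draw count is arbitrary.

```python
import random

# Illustrative subset of the ratio table (relative sampling weights).
ratios = {
    "Nemotron-CC-high-actual-actual-high": 1.1068,
    "arxiv": 0.6414,
    "code-high": 1.0102,
}

# Normalize the relative weights into sampling probabilities.
total = sum(ratios.values())
probs = {name: w / total for name, w in ratios.items()}

# Draw source names for a (tiny) document budget, proportionally to the weights.
rng = random.Random(0)
names, weights = zip(*ratios.items())
draws = rng.choices(names, weights=weights, k=10)

print(probs)
print(draws)
```

Sources with larger ratios (e.g. `cot_synthesis2_arxiv-mid` at 10.1376) are drawn correspondingly more often relative to their raw size.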
## Wandb

Our training curves have been recorded in Weights & Biases: [wandb](https://wandb.ai/openseek-team/OpenSeek-Small-v1-Baseline).