Update README.md
README.md
CHANGED
|
@@ -161,9 +161,9 @@ Some instruction datasets are added for curiosity sake although model is not tra
|
|
| 161 |
- The model does not perform well enough to tell educational value in instruction datasets.
|
| 162 |
|
| 163 |
# 📈Analysis
|
| 164 |
-
## 🤖Model
|
| 165 |
The expectation is that a model trained with the filter will outperform a model trained without it.
|
| 166 |
-
Fineweb is filtered on the fly with Educational Value >= 1.
|
| 167 |
|
| 168 |
Test 1:
|
| 169 |
Model params: 192M
|
|
@@ -178,7 +178,7 @@ Training token: 3.1B training token, 6000 global steps
|
|
| 178 |
|TruthfulQA| 45.88 | 45.20| 45.97|
|
| 179 |
|Winogrande| 49.49 | 50.59 | 50.67 |
|
| 180 |
|
| 181 |
-
The reasoning and commensense reasoning seems to be better when
|
| 182 |
MMLU is also better; however, it is close to random due to limited compute (both training time and model size).
|
| 183 |
A larger model will be trained to further validate this claim.
|
| 184 |
|
|
@@ -192,3 +192,6 @@ The first 10M records have been analysed. Full file in [here](https://drive.goo
|
|
| 192 |
Below are the top 100 domain names, each with at least 100 records.
|
| 193 |

|
| 194 |
| 161 |
- The model does not perform well enough to tell educational value in instruction datasets.
|
| 162 |
|
| 163 |
# 📈Analysis
|
| 164 |
+
## 🤖Model Training With And Without Classifier
|
| 165 |
The expectation is that a model trained with the filter will outperform a model trained without it.
|
| 166 |
+
Fineweb is filtered on the fly with Educational Value >= 1.0.
|
| 167 |
|
| 168 |
Test 1:
|
| 169 |
Model params: 192M
|
|
|
|
| 178 |
|TruthfulQA| 45.88 | 45.20| 45.97|
|
| 179 |
|Winogrande| 49.49 | 50.59 | 50.67 |
|
| 180 |
|
| 181 |
+
Reasoning and commonsense reasoning seem to be better when the filter is on, which aligns with the expectation. The scores are also close to those of Cosmopedia.
|
| 182 |
MMLU is also better; however, it is close to random due to limited compute (both training time and model size).
|
| 183 |
A larger model will be trained to further validate this claim.
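
The on-the-fly filtering with Educational Value >= 1.0 mentioned above can be sketched as a streaming generator. This is only a minimal illustration, not the training code behind these results: `filter_by_educational_value`, `score_fn`, and the toy records are hypothetical names, and `score_fn` stands in for whatever returns the classifier's Educational Value for a record.

```python
def filter_by_educational_value(records, score_fn, threshold=1.0):
    """Lazily yield only the records whose score is at least `threshold`.

    `records` can be any iterable (for example a streamed dataset), so
    nothing is materialised in memory. `score_fn` is a hypothetical
    callable returning the Educational Value of one record.
    """
    for record in records:
        if score_fn(record) >= threshold:
            yield record


# Toy data with precomputed scores (assumed values, for illustration only).
toy_records = [
    {"text": "intro to photosynthesis", "edu_value": 1.8},
    {"text": "buy cheap watches", "edu_value": 0.2},
    {"text": "history of the printing press", "edu_value": 1.0},
]

kept = list(
    filter_by_educational_value(toy_records, lambda r: r["edu_value"], 1.0)
)
print([r["text"] for r in kept])
```

Because the function is a generator, a record is scored only when the training loop requests it, which is what makes the filter usable on a streamed corpus.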
|
| 184 |
|
|
|
|
| 192 |
Below are the top 100 domain names, each with at least 100 records.
|
| 193 |

|
| 194 |
|
| 195 |
+
## 🧪Classifier Ranking Ordering
|
| 196 |
+
The Spearman rank-order correlation coefficient between the Educational Value and that of the test data is 0.7055, indicating a strong monotonic relationship. The Educational Value can therefore be used for ranking.
|
| 197 |
+

|
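
For readers who want to reproduce a Spearman rank-order correlation like the one reported above, here is a minimal pure-Python sketch. It is not the script used for the 0.7055 figure; `average_ranks`, `spearman`, and the sample scores are hypothetical, and in practice a library routine such as `scipy.stats.spearmanr` would likely be used instead.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical classifier scores and reference labels, for illustration only.
predicted = [0.3, 1.2, 0.9, 2.4, 1.7]
reference = [1, 0, 3, 2, 4]
print(round(spearman(predicted, reference), 4))
```

A value near 1 means the two scores order the records almost identically, which is the property that matters when the score is used for ranking rather than as an absolute measure.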