---
datasets:
- JeanKaddour/minipile
language:
- en
base_model:
- EleutherAI/pythia-1.4b-deduped
---
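The metadata above names the training dataset and the reference checkpoint used for comparison. As a minimal sketch, assuming the standard `datasets` and `transformers` APIs, both artifacts can be pulled down as shown below (the repository id of the MiniPile-trained model itself is not listed in this section, so only the items named in the metadata are loaded; variable names are illustrative):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Training corpus referenced in the metadata: a small subset of the deduplicated Pile.
minipile = load_dataset("JeanKaddour/minipile", split="train")

# Reference model trained on the deduplicated Pile, used as the baseline
# column in the benchmark table below.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b-deduped")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1.4b-deduped")

print(minipile)
print(model.config)
```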
The table below compares EleutherAI/pythia-1.4b-deduped, trained on the deduplicated Pile, with a 1.4B model trained on MiniPile. Arrows mark whether higher (↑) or lower (↓) is better for each measure, and the better of the two scores in each row is bolded.
| Benchmark        | Measure    | Better | 1.4B Pile Deduplicated | 1.4B MiniPile              | Percentage Difference in Means |
| ---------------- | ---------- | ------ | ---------------------- | -------------------------- | ------------------------------ |
| ARC-Challenge    | acc        | ↑      | **0.2600 ± 0.0130**    | 0.1903 ± 0.0115            | -26.8077                       |
| MMLU             | acc        | ↑      | **0.2388 ± 0.0036**    | 0.2295 ± 0.0035            | -3.8945                        |
| HellaSwag        | acc        | ↑      | **0.4177 ± 0.0049**    | 0.2579 ± 0.0044            | -38.2571                       |
| WinoGrande       | acc        | ↑      | **0.5730 ± 0.0140**    | 0.5185 ± 0.0140            | -9.5133                        |
| Lambada (OpenAI) | acc        | ↑      | **0.6202 ± 0.0068**    | 0.0000 ± 0.0000            | -100.0000                      |
| Lambada (OpenAI) | perplexity | ↓      | **6.1041 ± 0.1531**    | 1564928.5258 ± 118691.4565 | 25637234.3458                  |
| Lambada (Std)    | acc        | ↑      | **0.4898 ± 0.0070**    | 0.0000 ± 0.0000            | -100.0000                      |
| Lambada (Std)    | perplexity | ↓      | **11.2448 ± 0.3305**   | 8848600.9409 ± 745031.8900 | 78690503.1312                  |
| BLiMP            | acc        | ↑      | **0.8154 ± 0.0013**    | 0.5483 ± 0.0017            | -32.7569                       |
| ARC-Easy         | acc        | ↑      | **0.6174 ± 0.0100**    | 0.2715 ± 0.0091            | -56.0253                       |
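The last column appears to be the relative change of the MiniPile mean with respect to the Pile Deduplicated mean, expressed in percent. The short sketch below reproduces two of the table's values from the rounded means shown above (the helper `pct_diff` is an illustrative name, not part of any evaluation code):

```python
def pct_diff(pile_dedup_mean: float, minipile_mean: float) -> float:
    """Relative change of the MiniPile mean vs. the Pile Deduplicated mean, in percent."""
    return (minipile_mean - pile_dedup_mean) / pile_dedup_mean * 100.0


# ARC-Challenge accuracy: (0.1903 - 0.2600) / 0.2600 * 100
print(round(pct_diff(0.2600, 0.1903), 4))        # -26.8077

# Lambada (OpenAI) perplexity; lower is better, so a large positive value is a regression.
print(round(pct_diff(6.1041, 1564928.5258), 4))  # ~25637234.35, matching the table up to rounding of the inputs
```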