---
license: mit
datasets:
- tiiuae/falcon-refinedweb
- HuggingFaceFW/fineweb
base_model:
  - cckm/tinymistral_950m
language:
- en
pipeline_tag: text-generation
library_name: pytorch
---

## A deep and narrow Mistral model (950M params)
This checkpoint is a small (950M-parameter), deep and narrow (40 layers, hidden size 1440) Mistral model, as described in this [blog post](https://epsilons.ai/blog.html#post1_3). It is intended for edge applications.
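As a rough sanity check on the 950M figure, the parameters of a Mistral-style stack with these dimensions can be tallied directly. The intermediate size, vocabulary size, and untied embeddings below are illustrative assumptions (the blog post has the exact configuration):

```python
# Back-of-the-envelope parameter count for a deep, narrow Mistral-style model.
hidden = 1440          # from the card
layers = 40            # from the card
intermediate = 3072    # assumption
vocab = 32000          # assumption

attn_per_layer = 4 * hidden * hidden       # q, k, v, o projections
mlp_per_layer = 3 * hidden * intermediate  # gate, up, down projections
embeddings = 2 * vocab * hidden            # input embeddings + untied LM head

total = layers * (attn_per_layer + mlp_per_layer) + embeddings
print(f"{total / 1e6:.0f}M parameters")    # ≈ 955M with these assumptions
```

With these (assumed) settings the tally lands within about 1% of the stated 950M; norm and bias terms, which are negligible at this scale, are omitted.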

It was trained on ~400B tokens from RefinedWeb and ~400B tokens from FineWeb (up to the 2024-18 dump). It is a base model and has not gone through instruct or chat fine-tuning.
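The card itself carries no usage snippet; the following sketch assumes the checkpoint loads with the standard `transformers` text-generation pipeline, using the `cckm/tinymistral_950m` repo id listed under `base_model`:

```python
from transformers import pipeline

# Load the checkpoint with the standard text-generation pipeline
# (assumes the repo id from the base_model field above).
generator = pipeline("text-generation", model="cckm/tinymistral_950m")

# A base model continues text; it will not follow chat-style instructions.
out = generator("The main advantage of small language models is", max_new_tokens=40)
print(out[0]["generated_text"])
```

Because this is a base model, prompts should be phrased as text to be continued rather than as questions or instructions.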

LM Evaluation Harness results:
| Benchmark | Result |
| ----- | ----- |
| arc_c | 0.2884 |
| arc_e | 0.5139 |
| boolq | 0.6089 |
| hellaswag | 0.5888 |
| obqa | 0.3280 |
| piqa | 0.7388 |
| siqa | 0.4038 |
| wino | 0.5627 |