NaiveUser committed · Commit 87575b8 · verified · 1 Parent(s): 30dcb56

Update README.md

Files changed (1): README.md (+61 -3)

---
license: mit
datasets:
- mlfoundations/dclm-baseline-1.0
---

# Morph-1B

Morph-1B is a 1-billion-parameter language model trained on the DCLM-Baseline dataset, which was curated as part of the DataComp for Language Models (DCLM) benchmark.

This model is designed to show that wider, shallower models can yield inference-efficiency gains while preserving accuracy.

## Model Details

### Model Description

- **Developed by:** Song Bian*, Minghao Yan*, Shivaram Venkataraman

### Model Sources

- **Repository:** [open-lm-morph](https://github.com/Waterpine/open-lm-morph)
- **Paper:** [Scaling Inference-Efficient Language Models](https://arxiv.org/pdf/2501.18107)

### Model Architecture

The model architecture is similar to GPT-2 and LLaMA, and the model uses the GPT-NeoX tokenizer.
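
The snippet below is a minimal loading sketch, not an official usage example: it assumes the checkpoint is published on the Hugging Face Hub in a Transformers-compatible format, and the repository id `NaiveUser/Morph-1B` is a placeholder. open_lm-based checkpoints may instead need to be loaded through the [open-lm-morph](https://github.com/Waterpine/open-lm-morph) codebase.

```python
# Minimal usage sketch. Assumptions: the repo id below is a placeholder, and the
# checkpoint may require the open-lm-morph codebase rather than plain Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NaiveUser/Morph-1B"  # placeholder: replace with the actual Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)  # GPT-NeoX tokenizer per this card
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Wider, shallower transformer models can", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```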

### Training Details

We use the [DCLM-Baseline](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) dataset for training.

The training procedure and hyperparameters are detailed in our ICML 2025 paper.
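
As an illustration only (this is not the authors' training pipeline, which lives in the open-lm-morph repository), the corpus can in principle be inspected by streaming it from the Hub with the `datasets` library; whether streaming works out of the box depends on how the dataset files are hosted.

```python
# Sketch for peeking at DCLM-Baseline without downloading it in full.
# Assumption: the Hub repo can be streamed directly via the `datasets` library.
from datasets import load_dataset

ds = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)

# Print the beginning of the first few documents.
for i, example in enumerate(ds):
    print(example.get("text", "")[:200])
    if i >= 2:
        break
```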

## Evaluation

We evaluate the model on the following datasets: ARC-Easy, ARC-Challenge, BoolQ, COPA, HellaSwag, LAMBADA, PIQA, WinoGrande, MMLU, Jeopardy, and Winograd.
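
The exact evaluation setup is described in the paper. As a rough, unofficial reproduction sketch, many of these tasks are available in EleutherAI's `lm-evaluation-harness` (`pip install lm-eval`); the task names below are that harness's identifiers, the repo id is a placeholder, and MMLU, Jeopardy, and Winograd are omitted for brevity.

```python
# Unofficial evaluation sketch using lm-evaluation-harness; not necessarily the
# authors' harness or settings. The repo id is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NaiveUser/Morph-1B,trust_remote_code=True",
    tasks=["arc_easy", "arc_challenge", "boolq", "copa",
           "hellaswag", "lambada_openai", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```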

### Results

| Model | d_model | n_layers | Average accuracy | Latency (s) |
| -------- | ------- | ------- | ------- | ------- |
| Open-LM-1B | 2048 | 24 | 0.49 | 3.61 |
| OPT-1.3B | 2048 | 24 | 0.50 | 2.55 |
| Pythia-1.3B | 2048 | 22 | 0.49 | 3.28 |
| Neox-1.3B | 2048 | 24 | 0.49 | 3.99 |
| OPT-IML-1.3B | 2048 | 24 | 0.54 | 2.54 |
| Morph-1B | 3072 | 12 | 0.52 | 1.96 |

#### Summary

Compared to open-source models of similar scale, Morph-1B improves inference latency by 1.8× while maintaining accuracy on downstream tasks.
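
The card does not state how latency was measured; the sketch below shows one simple way to time single-request generation end to end, and is an assumption rather than the paper's benchmark protocol.

```python
# Illustrative latency timing only -- not the benchmark protocol from the paper.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NaiveUser/Morph-1B"  # placeholder repo id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(device)
model.generate(**inputs, max_new_tokens=16)  # warm-up pass

start = time.perf_counter()
model.generate(**inputs, max_new_tokens=128)
print(f"end-to-end generation latency: {time.perf_counter() - start:.2f}s")
```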

## Citation

**BibTeX:**

@article{bian2025scaling,
  title={Scaling Inference-Efficient Language Models},
  author={Bian, Song and Yan, Minghao and Venkataraman, Shivaram},
  journal={arXiv preprint arXiv:2501.18107},
  year={2025}
}