mav23 committed · Commit 378349e · verified · 1 Parent(s): 1de3836

Upload folder using huggingface_hub

Files changed (3):

1. .gitattributes +1 -0
2. README.md +247 -0
3. stablelm-3b-4e1t.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+stablelm-3b-4e1t.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,247 @@
---
language:
- en
license: cc-by-sa-4.0
tags:
- causal-lm
datasets:
- tiiuae/falcon-refinedweb
- togethercomputer/RedPajama-Data-1T
- CarperAI/pilev2-dev
- bigcode/starcoderdata
- allenai/peS2o
model-index:
- name: stablelm-3b-4e1t
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 46.59
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 75.94
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 45.23
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 37.2
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 71.19
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 3.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=stabilityai/stablelm-3b-4e1t
      name: Open LLM Leaderboard
---
# `StableLM-3B-4E1T`

## Model Description

`StableLM-3B-4E1T` is a 3 billion parameter decoder-only language model pre-trained on 1 trillion tokens of diverse English and code datasets for 4 epochs.

## Usage

Get started generating text with `StableLM-3B-4E1T` by using the following code snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    torch_dtype="auto",
)
model.cuda()
inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
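The `do_sample=True` call above combines temperature scaling with nucleus (top-p) sampling. As an illustration of what the `top_p=0.95` cutoff does, here is a small self-contained sketch in plain Python (this is not the `transformers` implementation, and the function name is ours):

```python
def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize the survivors to sum to 1."""
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:  # nucleus reached: drop the remaining tail
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

# Example: the tail token (p=0.03) falls outside the 0.95 nucleus
# because the first three tokens already sum to 0.97.
filtered = top_p_filter([0.5, 0.3, 0.17, 0.03], top_p=0.95)
```

At generation time the model samples only from the renormalized nucleus, which cuts off low-probability tails while keeping the distribution otherwise intact.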
### Run with Flash Attention 2 ⚡️

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    torch_dtype="auto",
    attn_implementation="flash_attention_2",
)
model.cuda()
inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

</details>
## Model Details

* **Developed by**: [Stability AI](https://stability.ai/)
* **Model type**: `StableLM-3B-4E1T` models are auto-regressive language models based on the transformer decoder architecture.
* **Language(s)**: English
* **Library**: [GPT-NeoX](https://github.com/EleutherAI/gpt-neox)
* **License**: Model checkpoints are licensed under the Creative Commons license ([CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/)). Under this license, you must give [credit](https://creativecommons.org/licenses/by/4.0/#) to Stability AI, provide a link to the license, and [indicate if changes were made](https://creativecommons.org/licenses/by/4.0/#). You may do so in any reasonable manner, but not in any way that suggests that Stability AI endorses you or your use.
* **Contact**: For questions and comments about the model, please email `[email protected]`.
### Model Architecture

The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture with the following modifications:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |

* **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of head embedding dimensions for improved throughput following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
* **Normalization**: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms, as opposed to RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)).
* **Tokenizer**: GPT-NeoX ([Black et al., 2022](https://arxiv.org/abs/2204.06745)).
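The partial-rotary scheme above can be sketched in NumPy: only the first 25% of each head's dimensions are rotated by position-dependent angles, while the remaining dimensions pass through unchanged. This is a toy illustration, not the production kernel; the interleaved pairing convention and the `rotary_pct` split are assumptions for clarity (the exact implementation lives in the GPT-NeoX fork):

```python
import numpy as np

def partial_rope(x, rotary_pct=0.25, base=10000.0):
    """Apply rotary position embeddings to the first rotary_pct of the
    head dimension of x (shape: seq_len x head_dim); pass the rest through."""
    seq_len, head_dim = x.shape
    rot_dim = int(head_dim * rotary_pct)            # e.g. 20 of 80 dims
    # Standard RoPE frequency schedule over the rotated slice.
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2) / rot_dim))
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x_rot, x_pass = x[:, :rot_dim], x[:, rot_dim:]
    x1, x2 = x_rot[:, 0::2], x_rot[:, 1::2]         # interleaved pairs
    rotated = np.empty_like(x_rot)
    rotated[:, 0::2] = x1 * cos - x2 * sin          # 2D rotation per pair
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return np.concatenate([rotated, x_pass], axis=-1)
```

Rotating only a quarter of the dimensions preserves most of RoPE's relative-position signal while reducing the per-token rotation work, which is the throughput motivation cited above.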
## Training

For complete dataset and training details, please see the [StableLM-3B-4E1T Technical Report](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo).

### Training Dataset

The dataset comprises a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), RedPajama-Data ([Together Computer, 2023](https://github.com/togethercomputer/RedPajama-Data)) and The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)), both without the *Books3* subset, and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)).

* Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.
### Training Procedure

The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the NeoX tokenizer with a vocabulary size of 50,257. We outline the complete hyperparameter choices in the project's [GitHub repository - config](https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-3b-4e1t.yml).

### Training Infrastructure

* **Hardware**: `StableLM-3B-4E1T` was trained on the Stability AI cluster across 256 NVIDIA A100 40GB GPUs (AWS P4d instances). Training began on August 23, 2023, and took approximately 30 days to complete.
* **Software**: We use a fork of `gpt-neox` ([EleutherAI, 2021](https://github.com/EleutherAI/gpt-neox)), train under 2D parallelism (data and tensor parallel) with ZeRO-1 ([Rajbhandari et al., 2019](https://arxiv.org/abs/1910.02054v3)), and rely on flash-attention as well as SwiGLU and Rotary Embedding kernels from FlashAttention-2 ([Dao et al., 2023](https://tridao.me/publications/flash2/flash2.pdf)).
## Use and Limitations

### Intended Use

The model is intended to be used as a foundational base model for application-specific fine-tuning. Developers must evaluate and fine-tune the model for safe performance in downstream applications.

### Limitations and Bias

As a base model, this model may exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected through evaluation and fine-tuning prior to deployment. The pre-training dataset may have contained offensive or inappropriate content, even after applying data cleansing filters, which can be reflected in the model-generated text. We recommend that users exercise caution when using these models in production systems. Do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.
## How to Cite

```bibtex
@misc{StableLM-3B-4E1T,
  url={https://huggingface.co/stabilityai/stablelm-3b-4e1t},
  title={StableLM 3B 4E1T},
  author={Tow, Jonathan and Bellagente, Marco and Mahan, Dakota and Riquelme, Carlos}
}
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_stabilityai__stablelm-3b-4e1t).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 46.58 |
| AI2 Reasoning Challenge (25-Shot) | 46.59 |
| HellaSwag (10-Shot)               | 75.94 |
| MMLU (5-Shot)                     | 45.23 |
| TruthfulQA (0-shot)               | 37.20 |
| Winogrande (5-shot)               | 71.19 |
| GSM8k (5-shot)                    |  3.34 |
stablelm-3b-4e1t.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1f10c4a2837c6f29a0f7aafa509ea5e42d74b38f7f3a421cc46dfd0b5de5bc6e
+size 1608571424
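The `.gguf` weight file is stored as a Git LFS pointer, so a completed download can be validated against the `oid` (a SHA-256 digest of the blob) and `size` fields above. A minimal sketch (the helper name is ours):

```python
import hashlib
from pathlib import Path

def verify_lfs_pointer(path, expected_oid, expected_size):
    """Stream-hash a downloaded blob and compare it to the sha256 oid
    and byte size recorded in its Git LFS pointer file."""
    p = Path(path)
    h = hashlib.sha256()
    with p.open("rb") as f:
        # Read in 1 MiB chunks so multi-GB files don't fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_oid and p.stat().st_size == expected_size
```

For this commit, `expected_oid` would be the hex digest after `sha256:` and `expected_size` would be `1608571424`.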