Transformers · falcon3

lucyknada committed 7070aac (verified) · 1 parent: 57694dd

Upload ./README.md with huggingface_hub

Files changed (1): README.md (+276 -0, added)
---
language:
- en
- fr
- es
- pt
tags:
- falcon3
base_model: tiiuae/Falcon3-1B-Base
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
library_name: transformers
---
### exl2 quant (measurement.json in main branch)
---
### check revisions for quants
---
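The quantized weights live on separate revisions (branches) of the quant repository. A minimal sketch for fetching one with `huggingface_hub`; the repo id and revision name below are placeholders, so check the repository's branch list for the actual quant branches:

```python
from huggingface_hub import snapshot_download

# Placeholder values: substitute this repository's id and one of its
# quant branch names (see the "revisions" dropdown on the repo page).
local_dir = snapshot_download(
    repo_id="user/Falcon3-1B-Instruct-exl2",  # hypothetical repo id
    revision="4.0bpw",                        # hypothetical quant branch
)
print(local_dir)  # local path to the downloaded quant files
```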

<div align="center">
<img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
</div>

# Falcon3-1B-Instruct

The **Falcon3** family of open foundation models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

This repository contains **Falcon3-1B-Instruct**, which achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks.
Falcon3-1B-Instruct supports four languages (English, French, Spanish, Portuguese) and a context length of up to 8K tokens.

## Model Details
- Architecture
  - Transformer-based causal decoder-only architecture
  - 18 decoder blocks
  - Grouped-Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
  - Wider head dimension: 256
  - High RoPE base to support long-context understanding: 1000042
  - Uses SwiGLU and RMSNorm
  - 8K context length
  - 131K vocabulary size
- Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 gigatokens of web, code, STEM, high-quality, and multilingual data, using 256 H100 GPUs
- Post-trained on 1.2 million samples of STEM, conversational, code, safety, and function-call data
- Supports EN, FR, ES, PT
- Developed by [Technology Innovation Institute](https://www.tii.ae)
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

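The hyperparameters above can be verified against the published config. A minimal sketch, assuming the Llama-style config attribute names that Falcon3 checkpoints ship with (attribute names can vary across `transformers` versions):

```python
from transformers import AutoConfig

# Read the architectural hyperparameters listed above from the config
config = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Instruct")

print(config.num_hidden_layers)        # 18 decoder blocks
print(config.num_attention_heads)      # 8 query heads (GQA)
print(config.num_key_value_heads)      # 4 key-value heads
print(config.hidden_size // config.num_attention_heads)  # head dim: 256
print(config.rope_theta)               # 1000042
print(config.max_position_embeddings)  # ~8K context
print(config.vocab_size)               # ~131K vocabulary
```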
## Getting started

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-1B-Instruct"

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Strip the prompt tokens so only the generated answer remains
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

</details>

<br>
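For interactive use, the same setup can print tokens as they are generated using `transformers`' built-in `TextStreamer`. A sketch reusing `model`, `tokenizer`, and `model_inputs` from the snippet above:

```python
from transformers import TextStreamer

# Stream generated tokens to stdout, skipping the prompt itself
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    streamer=streamer,
)
```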
## Benchmarks
We report our internal pipeline benchmarks in the table below.
- We use [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness); an illustrative invocation is sketched after the table.
- We report **raw scores** obtained by applying the chat template **without fewshot_as_multiturn** (unlike Llama3.1).
- We use the same batch size across all models.

<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
    <colgroup>
        <col style="width: 10%;">
        <col style="width: 10%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="width: 7%;">
        <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
    </colgroup>
    <thead>
        <tr>
            <th>Category</th>
            <th>Benchmark</th>
            <th>Llama-3.2-1B</th>
            <th>Qwen2.5-1.5B</th>
            <th>SmolLM2-1.7B</th>
            <th>Falcon3-1B-Instruct</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="3">General</td>
            <td>MMLU (5-shot)</td>
            <td>23.4</td>
            <td><b>58.4</b></td>
            <td>48.4</td>
            <td>43.9</td>
        </tr>
        <tr>
            <td>MMLU-PRO (5-shot)</td>
            <td>11.3</td>
            <td><b>21.3</b></td>
            <td>17.2</td>
            <td>18.6</td>
        </tr>
        <tr>
            <td>IFEval</td>
            <td><b>55.8</b></td>
            <td>44.4</td>
            <td>53.0</td>
            <td>54.4</td>
        </tr>
        <tr>
            <td rowspan="3">Math</td>
            <td>GSM8K (5-shot)</td>
            <td>37.4</td>
            <td><b>57.2</b></td>
            <td>43.4</td>
            <td>38.6</td>
        </tr>
        <tr>
            <td>GSM8K (8-shot, COT)</td>
            <td>35.6</td>
            <td><b>62.2</b></td>
            <td>47.2</td>
            <td>41.8</td>
        </tr>
        <tr>
            <td>MATH Lvl-5 (4-shot)</td>
            <td><b>3.9</b></td>
            <td>0.2</td>
            <td>0.1</td>
            <td>1.0</td>
        </tr>
        <tr>
            <td rowspan="6">Reasoning</td>
            <td>Arc Challenge (25-shot)</td>
            <td>34.1</td>
            <td>47.0</td>
            <td><b>47.6</b></td>
            <td>45.9</td>
        </tr>
        <tr>
            <td>GPQA (0-shot)</td>
            <td>25.3</td>
            <td><b>29.6</b></td>
            <td>28.7</td>
            <td>26.5</td>
        </tr>
        <tr>
            <td>GPQA (0-shot, COT)</td>
            <td>13.2</td>
            <td>9.2</td>
            <td>16.0</td>
            <td><b>21.3</b></td>
        </tr>
        <tr>
            <td>MUSR (0-shot)</td>
            <td>32.4</td>
            <td>36.8</td>
            <td>33.0</td>
            <td><b>40.7</b></td>
        </tr>
        <tr>
            <td>BBH (3-shot)</td>
            <td>30.3</td>
            <td><b>38.5</b></td>
            <td>33.1</td>
            <td>35.1</td>
        </tr>
        <tr>
            <td>BBH (3-shot, COT)</td>
            <td>0.0</td>
            <td>20.3</td>
            <td>0.8</td>
            <td><b>30.5</b></td>
        </tr>
        <tr>
            <td rowspan="5">CommonSense Understanding</td>
            <td>PIQA (0-shot)</td>
            <td>72.1</td>
            <td>73.2</td>
            <td><b>74.4</b></td>
            <td>72.0</td>
        </tr>
        <tr>
            <td>SciQ (0-shot)</td>
            <td>61.8</td>
            <td>69.5</td>
            <td>71.4</td>
            <td><b>86.8</b></td>
        </tr>
        <tr>
            <td>Winogrande (0-shot)</td>
            <td>-</td>
            <td>-</td>
            <td>-</td>
            <td><b>60.2</b></td>
        </tr>
        <tr>
            <td>OpenbookQA (0-shot)</td>
            <td>40.2</td>
            <td>40.4</td>
            <td><b>42.8</b></td>
            <td>40.0</td>
        </tr>
        <tr>
            <td>MT-Bench (avg)</td>
            <td>5.4</td>
            <td><b>7.1</b></td>
            <td>6.1</td>
            <td>5.5</td>
        </tr>
        <tr>
            <td rowspan="1">Instructions following</td>
            <td>Alpaca (WC)</td>
            <td><b>8.6</b></td>
            <td><b>8.6</b></td>
            <td>5.4</td>
            <td>6.1</td>
        </tr>
    </tbody>
</table>

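For reference, a harness run matching the setup described above might look like the following. This is an illustrative sketch, not the exact internal pipeline; task names, argument names, and defaults vary across lm-evaluation-harness versions, and the batch size here is a placeholder:

```python
import lm_eval

# Illustrative lm-evaluation-harness run: chat template applied,
# without fewshot_as_multiturn, fixed batch size across models.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-1B-Instruct,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,                 # placeholder value
    apply_chat_template=True,
    fewshot_as_multiturn=False,
)
print(results["results"]["mmlu"])
```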
## Useful links
- View our [release blogpost](https://huggingface.co/blog/falcon3).
- Feel free to join [our Discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.

## Technical Report
Coming soon.

## Citation
If the Falcon3 family of models was helpful to your work, feel free to cite us.

```
@misc{Falcon3,
    title = {The Falcon 3 Family of Open Models},
    url = {https://huggingface.co/blog/falcon3},
    author = {Falcon-LLM Team},
    month = {December},
    year = {2024}
}
```