qqc1989 committed on
Commit
e4ac2c3
·
verified ·
1 Parent(s): be2089b

Update README.md

Files changed (1)
  1. README.md +194 -3
README.md CHANGED
@@ -1,3 +1,194 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model:
5
+ - HuggingFaceTB/SmolLM2-360M-Instruct
6
+ tags:
7
+ - HuggingFaceTB
8
+ - SmolLM2
9
+ - SmolLM2-360M-Instruct
10
+ - Int8
11
+ - M5Stack
12
+ - RaspberryPi 5
13
+ language:
14
+ - en
15
+ ---
16
+
17
+ # SmolLM2-360M-Instruct
18
+
19
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/oWWfzW4RbWkVIo7f-5444.png)
20
+
21
+ This version of SmolLM2-360M-Instruct has been converted to run on the Axera NPU using **w8a16** quantization.
22
+
23
+ This model has been optimized with the following LoRA:
24
+
25
+ Compatible with Pulsar2 version: 3.4 (not released yet)
26
+
27
+ ## Conversion tool links
28
+
29
+ If you are interested in model conversion, you can try exporting the axmodel from the original repo:
30
+ https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct
31
+
32
+ [Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
33
+
34
+ [AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/internvl2)
35
+
36
+ [AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-llm-internvl)
37
+
38
+ ## Supported Platforms
39
+
40
+ - AX650
41
+   - AX650N DEMO Board
42
+   - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
43
+   - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
44
+ - AX630C
45
+   - [AXera-Pi 2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
46
+   - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
47
+   - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
48
+
49
+ |Chips|w8a16|w4a16|
50
+ |--|--|--|
51
+ |AX650| 39 tokens/sec|todo|
52
+ |AX630C| 14 tokens/sec|todo|
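
To put the throughput figures in the table into perspective, here is a rough Python sketch that converts tokens/sec into an approximate decode time for a reply of a given length. The function name `estimate_decode_seconds` and the 256-token reply length are illustrative assumptions, not part of this repository, and the estimate ignores prefill time (time to first token).

```python
def estimate_decode_seconds(num_tokens, tokens_per_second):
    """Approximate time to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# w8a16 decode rates taken from the table above (tokens/sec).
THROUGHPUT = {"AX650": 39.0, "AX630C": 14.0}

for chip, rate in THROUGHPUT.items():
    seconds = estimate_decode_seconds(256, rate)
    print(f"{chip}: ~{seconds:.1f} s for a 256-token reply")
```

Actual latency will vary with prompt length and sampling settings, so treat these numbers as an order-of-magnitude guide only.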
53
+
54
+ ## How to use
55
+
56
+ Download all files from this repository to the device.
57
+
58
+ ```
59
+ root@ax650:/mnt/qtang/llm-test/smollm2-360m# tree -L 1
60
+ .
61
+ |-- main_axcl_aarch64
62
+ |-- main_axcl_x86
63
+ |-- main_prefill
64
+ |-- post_config.json
65
+ |-- run_smollm2_360m_ax630c.sh
66
+ |-- run_smollm2_360m_ax650.sh
67
+ |-- run_smollm2_360m_axcl_aarch64.sh
68
+ |-- run_smollm2_360m_axcl_x86.sh
69
+ |-- smollm2-360m-ax630c
70
+ |-- smollm2-360m-ax650
71
+ |-- smollm2_tokenizer
72
+ `-- smollm2_tokenizer.py
73
+ ```
74
+
75
+ ### Install transformers
76
+
77
+ ```
78
+ pip install transformers==4.41.1
79
+ ```
80
+
81
+ ### Start the Tokenizer service
82
+
83
+ ```
84
+ root@ax650:/mnt/qtang/llm-test/smollm2-360m$ python smollm2_tokenizer.py --port 12345
85
+ 1 <|im_start|> 2 <|im_end|>
86
+ <|im_start|>system
87
+ You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
88
+ <|im_start|>user
89
+ hello world<|im_end|>
90
+ <|im_start|>assistant
91
+
92
+ [1, 9690, 198, 2683, 359, 253, 5356, 5646, 11173, 3365, 3511, 308, 34519, 28, 7018, 411, 407, 19712, 8182, 2, 198, 1, 4093, 198, 28120, 905, 2, 198, 1, 520, 9531, 198]
93
+ http://localhost:12345
94
+ ```
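
The prompt printed by the tokenizer service above follows a `<|im_start|>role ... <|im_end|>` layout. The sketch below rebuilds that layout in plain Python so the structure is easier to see. It is an illustration only: `build_chatml_prompt` is a hypothetical helper, and the real `smollm2_tokenizer.py` may well produce the prompt differently (for example through the tokenizer's own chat template).

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts in the <|im_start|>/<|im_end|> layout."""
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": "You are a helpful AI assistant named SmolLM, trained by Hugging Face",
    },
    {"role": "user", "content": "hello world"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The token ID list in the log is consistent with this layout: ID 1 (`<|im_start|>`) opens each of the three turns, and ID 2 (`<|im_end|>`) closes the system and user turns.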
95
+
96
+ ### Inference with an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or AX650N DEMO Board
97
+
98
+ Open another terminal and run `run_smollm2_360m_ax650.sh`
99
+
100
+ ```
101
+ root@ax650:/mnt/qtang/llm-test/smollm2-360m# ./run_smollm2_360m_ax650.sh
102
+ [I][ Init][ 125]: LLM init start
103
+ bos_id: 1, eos_id: 2
104
+ 2% | █ | 1 / 35 [0.00s<0.14s, 250.00 count/s] tokenizer init ok
105
+ [I][ Init][ 26]: LLaMaEmbedSelector use mmap
106
+ 100% | ████████████████████████████████ | 35 / 35 [0.81s<0.81s, 43.37 count/s] init post axmodel ok,remain_cmm(3339 MB)
107
+ [I][ Init][ 241]: max_token_len : 1023
108
+ [I][ Init][ 246]: kv_cache_size : 320, kv_cache_num: 1023
109
+ [I][ Init][ 254]: prefill_token_num : 128
110
+ [I][ load_config][ 281]: load config:
111
+ {
112
+ "enable_repetition_penalty": false,
113
+ "enable_temperature": true,
114
+ "enable_top_k_sampling": true,
115
+ "enable_top_p_sampling": false,
116
+ "penalty_window": 20,
117
+ "repetition_penalty": 1.2,
118
+ "temperature": 0.9,
119
+ "top_k": 10,
120
+ "top_p": 0.8
121
+ }
122
+
123
+ [I][ Init][ 268]: LLM init ok
124
+ Type "q" to exit, Ctrl+c to stop current running
125
+ >> who are you?
126
+ [I][ Run][ 466]: ttft: 156.63 ms
127
+ I'm a chatbot developed by the Artificial Intelligence Research and Development Lab (AI R&D Lab) at Hugging Face Labs,
128
+ specifically designed to facilitate and augment human-AI conversations. My role is to provide assistance in understanding
129
+ and responding to natural language queries, using advanced language models and AI algorithms to understand context and intent.
130
+
131
+ [N][ Run][ 605]: hit eos,avg 38.70 token/s
132
+
133
+ >> q
134
+
135
+ ```
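
The loaded config in the log above enables temperature scaling (`temperature: 0.9`) and top-k sampling (`top_k: 10`), with top-p and the repetition penalty disabled. As a rough illustration of what those two settings mean, here is a minimal Python sketch of temperature plus top-k sampling over a list of logits. It is not the runtime's implementation; `sample_next_token` and the toy logits are assumptions made for this example.

```python
import math
import random


def sample_next_token(logits, temperature=0.9, top_k=10, rng=random):
    """Pick a token index using temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k highest logits.
    k = max(1, min(top_k, len(logits)))
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the surviving candidates.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy example: four-token vocabulary, restricted to the two best candidates.
logits = [0.1, 2.5, -1.0, 0.7]
rng = random.Random(0)
token = sample_next_token(logits, temperature=0.9, top_k=2, rng=rng)
print(token)
```

Lower temperatures sharpen the distribution toward the highest-scoring token, and a smaller `top_k` narrows the candidate set. With `top_k=1` the sketch reduces to greedy decoding.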
136
+
137
+ ### Inference with M.2 Accelerator card
138
+
139
+ [What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
140
+
141
+ ```
142
+ (base) axera@raspberrypi:~/samples/smollm2-360m $ ./run_smollm2_360m_axcl_aarch64.sh
143
+ build time: Feb 13 2025 15:44:57
144
+ [I][ Init][ 111]: LLM init start
145
+ bos_id: 1, eos_id: 2
146
+ 100% | ████████████████████████████████ | 35 / 35 [18.07s<18.07s, 1.94 count/s] init post axmodel okremain_cmm(6621 MB)
147
+ [I][ Init][ 226]: max_token_len : 1023
148
+ [I][ Init][ 231]: kv_cache_size : 320, kv_cache_num: 1023
149
+ [I][ load_config][ 282]: load config:
150
+ {
151
+ "enable_repetition_penalty": false,
152
+ "enable_temperature": true,
153
+ "enable_top_k_sampling": true,
154
+ "enable_top_p_sampling": false,
155
+ "penalty_window": 20,
156
+ "repetition_penalty": 1.2,
157
+ "temperature": 0.9,
158
+ "top_k": 10,
159
+ "top_p": 0.8
160
+ }
161
+
162
+ [I][ Init][ 288]: LLM init ok
163
+ Type "q" to exit, Ctrl+c to stop current running
164
+ >> who are you?
165
+
166
+ I'm a virtual AI assistant, designed to support users with their questions and tasks.
167
+ I was trained on a vast dataset of text, including text from various sources and
168
+ conversations. This extensive training allows me to understand and respond to a wide range of queries.
169
+ I'm here to be helpful and provide answers to your questions.
170
+
171
+ [N][ Run][ 610]: hit eos,avg 20.81 token/s
172
+
173
+
174
+ >> ^Cq
175
+
176
+ (base) axera@raspberrypi:~ $ axcl-smi
177
+ +------------------------------------------------------------------------------------------------+
178
+ | AXCL-SMI V2.26.0_20250205130139 Driver V2.26.0_20250205130139 |
179
+ +-----------------------------------------+--------------+---------------------------------------+
180
+ | Card Name Firmware | Bus-Id | Memory-Usage |
181
+ | Fan Temp Pwr:Usage/Cap | CPU NPU | CMM-Usage |
182
+ |=========================================+==============+=======================================|
183
+ | 0 AX650N V2.26.0 | 0000:01:00.0 | 171 MiB / 945 MiB |
184
+ | -- 39C -- / -- | 2% 0% | 468 MiB / 7040 MiB |
185
+ +-----------------------------------------+--------------+---------------------------------------+
186
+
187
+ +------------------------------------------------------------------------------------------------+
188
+ | Processes: |
189
+ | Card PID Process Name NPU Memory Usage |
190
+ |================================================================================================|
191
+ | 0 18636 /home/axera/qtang/llm-test/smollm2-360m/main_axcl_aarch64 418580 KiB |
192
+ +------------------------------------------------------------------------------------------------+
193
+ (base) axera@raspberrypi:~ $
194
+ ```