---
library_name: transformers
license: bsd-3-clause
base_model:
- OpenGVLab/InternVL2_5-1B
- OpenGVLab/InternVL2_5-1B-MPO
tags:
- InternVL2_5
- InternVL2_5-1B
- InternVL2_5-1B-MPO
- Int8
- VLM
pipeline_tag: image-text-to-text
---
# InternVL2_5-1B-Int8
This version of InternVL2_5-1B has been converted to run on the Axera NPU using **w8a16** quantization.
Compatible with Pulsar2 version: 3.3
## Convert tools links:
If you are interested in model conversion, you can export the axmodel yourself starting from the original repo:
https://huggingface.co/OpenGVLab/InternVL2_5-1B
[Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
[AXera NPU HOST LLM Runtime](https://github.com/ZHEQIUSHUI/ax-llm/tree/intervl2_pulsarbuild)
[AXera NPU AXCL LLM Runtime](https://github.com/ZHEQIUSHUI/ax-llm/tree/axcl-intervl2.5-1b-pulsarbuild)
## Support Platform
- AX650
- AX650N DEMO Board
- [M4N-Dock(η±θ―ζ΄ΎPro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
- [η±θ―ζ΄Ύ2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
- [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
- [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
|Chip|Image encoder (448) latency|TTFT|Decode speed (w8a16)|
|--|--|--|--|
|AX650| 350 ms | 420 ms | 32 tokens/sec |

|Chip|Image encoder (364) latency|TTFT|Decode speed (w8a16)|
|--|--|--|--|
|AX630C| 1120 ms | 1150 ms | 11 tokens/sec |
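These figures can be combined into a rough end-to-end latency estimate. A minimal sketch in plain Python, using the AX650 row above; the 100-token answer length is an assumption, and whether TTFT already includes part of the image-encode time depends on the runtime, so treat the result as an upper bound:

```python
# Back-of-envelope latency for one image + one answer on AX650 (w8a16),
# using the figures from the table above. The answer length is assumed.
image_encode_s = 0.35      # image encoder (448) latency
ttft_s = 0.42              # time to first token
decode_tok_per_s = 32      # steady-state decode speed
n_new_tokens = 100         # assumed length of a typical description

total_s = image_encode_s + ttft_s + n_new_tokens / decode_tok_per_s
print(f"~{total_s:.1f} s end to end")   # roughly 3.9 s
```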
## How to use
Download all files from this repository to the device.
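One way to fetch the files is with the `huggingface_hub` Python API. A minimal sketch; the repo id and target directory below are placeholders, replace them with this repository's actual id and your local path:

```python
# Download all model files to the device (or to a host, then copy them over).
# The repo_id and local_dir values are placeholders.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="AXERA-TECH/InternVL2_5-1B",   # replace with this repository's id
    local_dir="./InternVL2_5-1B",          # where to put the files
)
```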
**If you are using an AX650 board**
```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# tree -L 1
.
|-- config.json
|-- internvl2_5_1b_448_ax650
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_448.py
|-- main_internvl2_5_448_prefill
|-- run_internvl2_5_448_ax650.sh
`-- ssd_car.jpg
```
**If you are using an AX630C board**
```
root@ax630c:/mnt/qtang/llm-test/internvl2_5-1b-mpo# tree -L 1
.
|-- config.json
|-- internvl2_5_1b_364_ax630c
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_364.py
|-- main
`-- run_internvl2_5_364_ax630c.sh
```
#### Install transformers
```
pip install transformers==4.41.1
```
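A quick sanity check that the pinned version is the one actually in use (optional, not required by the runtime):

```python
# Verify the installed transformers version matches the pinned one.
import transformers
assert transformers.__version__ == "4.41.1", transformers.__version__
print("transformers", transformers.__version__, "OK")
```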
#### Start the Tokenizer service
**If you are using an AX650 board**
```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# python3 internvl2_5_tokenizer_448.py --port 12345
None None 151645 <|im_end|>
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 151665, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
......
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
198, 5501, 7512, 279, 2168, 19620, 13, 151645, 151644, 77091, 198]
310
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 14990, 1879, 151645, 151644, 77091, 198]
47
http://localhost:12345
```
**If you are using an AX630C board**
```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# python internvl2_5_tokenizer_364.py --port 12345
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
None None 151645 <|im_end|>
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 151665, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
......
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151666, 198, 5501, 7512,
279, 2168, 19620, 13, 151645, 151644, 77091, 198]
223
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287, 42140,
53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623, 48692,
100168, 110498, 1773, 151645, 151644, 872, 198, 14990, 1879, 151645, 151644, 77091, 198]
47
http://localhost:12345
```
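The tokenizer scripts wrap a standard Hugging Face tokenizer behind a small HTTP service that the on-device runtime connects to (the `http://127.0.0.1:12345` line in the runtime logs below). The token-id dumps above are what the scripts print at startup; for image prompts they also insert the long runs of image placeholder tokens (151667) visible in the dump. A minimal sketch of the encoding step only, not of the HTTP interface; the tokenizer directory path, the system prompt wording, and the "hello world" test prompt are assumptions:

```python
# Sketch of the startup token-id dump (not the service itself).
# "./internvl2_5_tokenizer" is the tokenizer directory shipped in this repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("./internvl2_5_tokenizer", trust_remote_code=True)

messages = [
    # The real script uses InternVL's own (Chinese) system prompt,
    # so its token count differs from this sketch.
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello world"},
]
ids = tok.apply_chat_template(messages, add_generation_prompt=True)
print(len(ids), ids)          # token count and ids for the short test prompt
print(tok.decode([151645]))   # <|im_end|>, the eos id reported by the runtime
```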
#### Inference with AX650 Host, such as M4N-Dock(η±θ―ζ΄ΎPro) or AX650N DEMO Board
- input text
```
Describe the picture
```
- input image

Open another terminal and run `./run_internvl2_5_448_ax650.sh`
```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# ./run_internvl2_5_448_ax650.sh
[I][ Init][ 127]: LLM init start
bos_id: -1, eos_id: 151645
3% | ββ | 1 / 28 [0.01s<0.14s, 200.00 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ββββββββββββββββββββββββββββββββ | 28 / 28 [1.42s<1.42s, 19.66 count/s] init vpm axmodel ok,remain_cmm(2859 MB)
[I][ Init][ 275]: max_token_len : 1023
[I][ Init][ 280]: kv_cache_size : 128, kv_cache_num: 1023
[I][ Init][ 288]: prefill_token_num : 320
[I][ Init][ 290]: vpm_height : 448,vpm_width : 448
[I][ Init][ 299]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> ssd_car.jpg
[I][ Encode][ 358]: image encode time : 362.987000 ms, size : 229376
[I][ Run][ 569]: ttft: 426.75 ms
The image depicts a scene on a city street with a prominent red double-decker bus in the background.
The bus is adorned with an advertisement that reads, "THINGS GET MORE EXCITING WHEN YOU SAY YES."
The bus is traveling on a road with a white bicycle lane marked on it. The street is lined with buildings,
and there is a black car parked on the side of the road. A woman is standing in the foreground, smiling at the camera.
She is wearing a black jacket and a scarf. The overall atmosphere suggests a typical urban setting,
possibly in a city known for its iconic double-decker buses.
[N][ Run][ 708]: hit eos,avg 31.90 token/s
prompt >> q
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B#
```
#### Inference with M.2 Accelerator card
[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo runs on a Raspberry Pi 5.
- input text
```
Describe the picture
```
- input image

Open another terminal and run `./run_internvl2_5_448_axcl_aarch64.sh`
```
(base) axera@raspberrypi:~/samples/InternVL2_5-1B $ ./run_internvl2_5_448_axcl_aarch64.sh
[I][ Init][ 128]: LLM init start
[I][ Init][ 321]: connect http://127.0.0.1:12345 ok
bos_id: -1, eos_id: 151645
7% | βββ | 2 / 27 [0.13s<1.73s, 15.62 count/s] embed_selector init ok
[I][ run][ 30]: AXCLWorker start with devid 0
100% | ββββββββββββββββββββββββββββββββ | 27 / 27 [8.02s<8.02s, 3.37 count/s] init post axmodel ok,remain_cmm(-1 MB)m(-1 MB)
[I][ Init][ 225]: image_encoder_height : 448, image_encoder_width: 448
[I][ Init][ 227]: max_token_len : 1023
[I][ Init][ 230]: kv_cache_size : 128, kv_cache_num: 1023
[I][ Init][ 238]: prefill_token_num : 320
[I][ Init][ 240]: prefill_max_token_num : 320
________________________
| ID| remain cmm(MB)|
========================
| 0| -1|
Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―
[E][ load_config][ 278]: config file(post_config.json) open failed
[W][ Init][ 333]: load postprocess config(post_config.json) failed
[I][ Init][ 337]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> ssd_car.jpg
[I][ Encode][ 393]: image encode time : 361.53 ms, size : 229376
[I][ Encode][ 453]: offset : 42 out_embed.size() : 275072
[I][ Run][ 481]: input token num : 307, prefill_split_num : 1
[I][ Run][ 604]: ttft: 506.51 ms
The image depicts a scene on a city street with a prominent red double-decker bus in the background.
The bus is adorned with an advertisement that reads, "THINGS GET MORE EXCITING WHEN YOU SAY YES."
The bus is traveling on a road with a white bicycle lane marked on it. The street is lined with buildings,
and there is a black car parked on the side of the road. A woman is standing in the foreground,
smiling at the camera.She is wearing a black jacket and a scarf. The overall atmosphere suggests a typical urban setting,
possibly in a city known for its iconic double-decker buses.
[N][ Run][ 756]: hit eos,avg 20.50 token/s
prompt >> q
[I][ run][ 80]: AXCLWorker exit with devid 0
(base) axera@raspberrypi:~/samples/InternVL2_5-1B $
```
#### Inference with AX630C Host, such as η±θ―ζ΄Ύ2, Module-LLM, LLM630 Compute Kit and AX630C DEMO Board
- input text
```
Describe the picture
```
- input image

Open another terminal and run `./run_internvl2_5_364_ax630c.sh`
```
/mnt/qtang/llm-test/internvl2_5-1b-mpo # ./run_internvl2_5_364_ax630c.sh
[I][ Init][ 106]: LLM init start
bos_id: -1, eos_id: 151645
3% | ββ | 1 / 28 [0.01s<0.14s, 200.00 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ββββββββββββββββββββββββββββββββ | 28 / 28 [9.48s<9.48s, 2.95 count/s] init vpm axmodel ok,remain_cmm(905 MB)
[I][ Init][ 254]: max_token_len : 1023
[I][ Init][ 259]: kv_cache_size : 128, kv_cache_num: 1023
[I][ Init][ 267]: prefill_token_num : 256
[I][ Init][ 269]: vpm_height : 364,vpm_width : 364
[I][ Init][ 278]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Please describe the image
image >> panda.jpg
[I][ Encode][ 337]: image encode time : 1156.637939 ms, size : 151424
[I][ Run][ 548]: ttft: 1120.15 ms
The image features a red panda in a natural setting, likely in a zoo or a forested area.
The red panda has distinctive reddish-brown fur with white markings around its eyes and ears.
It is leaning on a wooden structure, possibly a platform or a log, with a background of green foliage.
The red panda appears to be looking directly at the camera with a calm expression.
[N][ Run][ 687]: hit eos,avg 10.94 token/s
``` |