Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ tags:
|
|
11 |
- Int4
|
12 |
---
|
13 |
|
14 |
-
# Qwen2.5-
|
15 |
|
16 |
This version of Qwen2.5-3B-Instruct-GPTQ-Int4 has been converted to run on the Axera NPU using **w4a16** quantization.
|
17 |
|
@@ -76,10 +76,10 @@ http://localhost:12345
|
|
76 |
|
77 |
#### Inference with AX650 Host, such as M4N-Dock(η±θ―ζ΄ΎPro) or AX650N DEMO Board
|
78 |
|
79 |
-
Open another terminal and run `run_qwen2.
|
80 |
|
81 |
```
|
82 |
-
root@ax650:/mnt/qtang/llm-test/qwen2.5-
|
83 |
[I][ Init][ 125]: LLM init start
|
84 |
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
|
85 |
100% | ββββββββββββββββββββββββββββββββ | 39 / 39 [19.30s<19.30s, 2.02 count/s] init post axmodel ok,remain_cmm(1811 MB)
|
|
|
11 |
- Int4
|
12 |
---
|
13 |
|
14 |
+
# Qwen2.5-3B-Instruct-GPTQ-Int4
|
15 |
|
16 |
This version of Qwen2.5-3B-Instruct-GPTQ-Int4 has been converted to run on the Axera NPU using **w4a16** quantization.
|
17 |
|
|
|
76 |
|
77 |
#### Inference with AX650 Host, such as M4N-Dock(η±θ―ζ΄ΎPro) or AX650N DEMO Board
|
78 |
|
79 |
+
Open another terminal and run `run_qwen2.5_3b_gptq_int4_ax650.sh`
|
80 |
|
81 |
```
|
82 |
+
root@ax650:/mnt/qtang/llm-test/qwen2.5-3b# ./run_qwen2.5_3b_gptq_int4_ax650.sh
|
83 |
[I][ Init][ 125]: LLM init start
|
84 |
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
|
85 |
100% | ββββββββββββββββββββββββββββββββ | 39 / 39 [19.30s<19.30s, 2.02 count/s] init post axmodel ok,remain_cmm(1811 MB)
|