# whisper.cpp for SYCL

[Background](#background)

[OS](#os)

[Intel GPU](#intel-gpu)

[Linux](#linux)

[Environment Variable](#environment-variable)

[Known Issue](#known-issue)

[Todo](#todo)

## Background

SYCL is a higher-level programming model designed to improve programming productivity on various hardware accelerators, such as CPUs, GPUs, and FPGAs. It is a single-source, embedded domain-specific language based on pure C++17.

oneAPI is an open, standards-based specification that supports multiple architecture types, including but not limited to GPUs, CPUs, and FPGAs. The specification covers both direct programming and API-based programming paradigms.

Intel uses SYCL as the direct programming language to support CPUs, GPUs, and FPGAs.

To avoid reinventing the wheel, this code refers to other code paths in llama.cpp (such as OpenBLAS, cuBLAS, and CLBlast). We use the open-source tool [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) (commercial release: [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html)) to migrate the code to SYCL.

whisper.cpp for SYCL is used to support Intel GPUs.

For Intel CPUs, we recommend using whisper.cpp for x86 (Intel MKL build).

## OS

|OS|Status|Verified|
|-|-|-|
|Linux|Support|Ubuntu 22.04|
|Windows|Ongoing| |

## Intel GPU

|Intel GPU| Status | Verified Model|
|-|-|-|
|Intel Data Center Max Series| Support| Max 1550|
|Intel Data Center Flex Series| Support| Flex 170|
|Intel Arc Series| Support| Arc A770|
|Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
|Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|

## Linux

### Setup Environment

1. Install the Intel GPU driver.

a. Install the Intel GPU driver by following the official guide: [Install GPU Drivers](https://dgpu-docs.intel.com/driver/installation.html).

Note: for an iGPU, install the client GPU driver.

b. Add the user to the `video` and `render` groups:

```
sudo usermod -aG render $USER
sudo usermod -aG video $USER
```

Note: log out and log back in for the group change to take effect.
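
To confirm the group change after logging back in, a small check like the following can help. It is a minimal sketch, not part of whisper.cpp: the function name `missing_groups` is hypothetical, and it relies on `id -nG`, which reports the groups of the current session, so it only reflects the change after re-login.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of whisper.cpp): print the required GPU groups
# that are missing from a space-separated group list. With no argument it
# checks the current session's groups as reported by `id -nG`.
missing_groups() {
  local have="${1:-$(id -nG)}"
  local missing=()
  local g
  for g in render video; do
    # Pad both sides with spaces so only whole group names match.
    case " $have " in
      *" $g "*) ;;              # already a member
      *) missing+=("$g") ;;     # not a member (yet)
    esac
  done
  echo "${missing[*]}"
}
```

For example, `[ -z "$(missing_groups)" ] && echo "GPU groups OK"` prints the message only when the session already belongs to both groups.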

c. Verify the installation:

```
sudo apt install clinfo
sudo clinfo -l
```

Example output (a system with a discrete Arc GPU, then a system with an integrated Iris Xe GPU):

```
Platform #0: Intel(R) OpenCL Graphics
`-- Device #0: Intel(R) Arc(TM) A770 Graphics

Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49]
```
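
When scripting this check, the device names can be pulled out of the `clinfo -l` output. The sketch below assumes the output format shown above (one `-- Device #N: name` line per device); `list_clinfo_devices` is a hypothetical helper. An empty result may indicate that the driver is not installed or that the user cannot access the GPU.

```shell
# Hypothetical helper: read `clinfo -l` output on stdin and print one device
# name per line. Returns 1 (with a message on stderr) if no device is listed.
list_clinfo_devices() {
  local names
  # Keep only the text after "-- Device #N: " on device lines.
  names="$(sed -n 's/^.*-- Device #[0-9][0-9]*: //p')"
  if [ -z "$names" ]; then
    echo "no OpenCL devices found; check the GPU driver installation" >&2
    return 1
  fi
  printf '%s\n' "$names"
}
```

Usage: `sudo clinfo -l | list_clinfo_devices`.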

2. Install the Intel® oneAPI Base Toolkit.

a. Follow the procedure in [Get the Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).

We recommend installing it to the default folder: **/opt/intel/oneapi**.

The following guide uses the default folder as an example. If you install to a different folder, adjust the paths in the commands below accordingly.
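
If oneAPI is installed somewhere other than the default folder, one option is to keep the path in a single variable instead of editing every command. The sketch below is an assumption, not an official oneAPI mechanism: `ONEAPI_DIR` and `load_oneapi` are hypothetical names, and only `setvars.sh` itself comes from the toolkit.

```shell
# Hypothetical helper: source setvars.sh from $ONEAPI_DIR, falling back to the
# default install folder used throughout this guide.
load_oneapi() {
  local dir="${ONEAPI_DIR:-/opt/intel/oneapi}"
  if [ ! -f "$dir/setvars.sh" ]; then
    echo "setvars.sh not found under $dir" >&2
    return 1
  fi
  # Sourcing (not executing) keeps the exported variables in this shell.
  source "$dir/setvars.sh"
}
```

For example, `ONEAPI_DIR="$HOME/intel/oneapi" load_oneapi` for a per-user install, or plain `load_oneapi` for the default folder.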

b. Verify the installation:

```
source /opt/intel/oneapi/setvars.sh

sycl-ls
```

There should be one or more Level Zero devices, such as **[ext_oneapi_level_zero:gpu:0]**.

Output (example):

```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
```
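
To check for Level Zero GPUs in a script, the `sycl-ls` output can be filtered on its device prefix. This is a sketch that assumes the bracketed `[ext_oneapi_level_zero:gpu:N]` format shown in the example output; other oneAPI releases may print a different prefix. `list_level_zero_gpus` is a hypothetical name.

```shell
# Hypothetical helper: print the Level Zero GPU lines from `sycl-ls` output
# read on stdin. Returns grep's non-zero status when none are found.
list_level_zero_gpus() {
  grep '^\[ext_oneapi_level_zero:gpu:[0-9][0-9]*]'
}
```

Usage: `sycl-ls | list_level_zero_gpus || echo "no Level Zero GPU found" >&2`.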

3. Build locally:

```
mkdir -p build
cd build
source /opt/intel/oneapi/setvars.sh

# For FP16
#cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWHISPER_SYCL_F16=ON

# For FP32
cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

# Build examples/main only
#cmake --build . --config Release --target main

# Build all binaries
cmake --build . --config Release -v
```

or

```
./examples/sycl/build.sh
```

Note:

- By default, all binaries are built, which takes longer. To reduce build time, we recommend building **examples/main** only.
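
The FP16/FP32 choice above comes down to one extra configure option. A small wrapper can assemble the options so that switching precision does not require editing commented-out lines. This is a hypothetical helper (`sycl_cmake_flags` is not part of whisper.cpp); it only prints the `-D` options already used in this guide.

```shell
# Hypothetical helper: print the CMake configure options from this guide for
# the requested precision ("fp32", the default, or "fp16").
sycl_cmake_flags() {
  local precision="${1:-fp32}"
  local flags="-DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
  case "$precision" in
    fp32) ;;                                       # FP32: no extra option
    fp16) flags="$flags -DWHISPER_SYCL_F16=ON" ;;  # FP16: enable the F16 path
    *) echo "unknown precision: $precision" >&2; return 1 ;;
  esac
  printf '%s\n' "$flags"
}
```

For example, from the `build` folder: `cmake .. $(sycl_cmake_flags fp16)` followed by `cmake --build . --config Release --target main`. The command substitution is left unquoted on purpose so the shell splits it into separate options.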

### Run

1. Put the model file in the **models** folder.

2. Enable the oneAPI runtime environment:

```
source /opt/intel/oneapi/setvars.sh
```

3. List the device IDs

Run without parameters:

```
./build/bin/ls-sycl-device

# or

./build/bin/main
```

Check the IDs in the startup log, for example:

```
found 4 SYCL devices:
Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
```

|Attribute|Note|
|-|-|
|compute capability 1.3|Level Zero runtime, recommended|
|compute capability 3.0|OpenCL runtime, slower than Level Zero in most cases|
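
Following the table above, a script can pick a Level Zero device automatically by looking for `compute capability 1.3` in the startup log. This is a sketch that assumes the log format shown above; `level_zero_device_ids` is a hypothetical helper. Whether the log is written to stdout or stderr is not stated here, so the usage example merges both streams.

```shell
# Hypothetical helper: read the SYCL startup log on stdin and print the IDs of
# devices reported with "compute capability 1.3" (Level Zero), one per line.
level_zero_device_ids() {
  sed -n 's/^ *Device \([0-9][0-9]*\):.*compute capability 1\.3,.*$/\1/p'
}
```

Usage: `export GGML_SYCL_DEVICE="$(./build/bin/ls-sycl-device 2>&1 | level_zero_device_ids | head -n 1)"` selects the first Level Zero device.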

4. Set the device ID and run whisper.cpp

Set the device ID to 0 with **GGML_SYCL_DEVICE=0**:

```
GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav
```

or run the script:

```
./examples/sycl/run_whisper.sh
```

5. Check the device ID in the output, for example:

```
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
```
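
To confirm in a script that the run used the intended device, the ID can be extracted from that line. A minimal sketch, assuming the message format shown above; `main_device_id` is a hypothetical helper.

```shell
# Hypothetical helper: print the device ID from the first
# "Using device N (...) as main device" line in the log read on stdin.
main_device_id() {
  sed -n 's/^.*Using device \([0-9][0-9]*\) (.*) as main device.*$/\1/p' | head -n 1
}
```

For example, capture the run with `out="$(GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav 2>&1)"`, then check `[ "$(printf '%s\n' "$out" | main_device_id)" = 0 ]`.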

## Environment Variable

#### Build

|Name|Value|Function|
|-|-|-|
|WHISPER_SYCL|ON (mandatory)|Enable the build with the SYCL code path. <br>WHISPER_SYCL=ON is mandatory for both FP32 and FP16.|
|WHISPER_SYCL_F16|ON (optional)|Enable the FP16 build with the SYCL code path. For FP32, do not set it.|
|CMAKE_C_COMPILER|icx|Use the icx compiler for the SYCL code path.|
|CMAKE_CXX_COMPILER|icpx|Use the icpx compiler for the SYCL code path.|

#### Running

|Name|Value|Function|
|-|-|-|
|GGML_SYCL_DEVICE|0 (default) or 1|Set the device ID to use. Check the available device IDs in the default startup output.|
|GGML_SYCL_DEBUG|0 (default) or 1|Enable logging via the GGML_SYCL_DEBUG macro.|

## Known Issue

- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.

  The oneAPI runtime environment has not been enabled.

  Install the oneAPI Base Toolkit and enable it with: `source /opt/intel/oneapi/setvars.sh`.
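
This error means the dynamic loader could not find the SYCL runtime library. Sourcing `setvars.sh` adds the oneAPI library directories to `LD_LIBRARY_PATH`, so a quick diagnostic is to look for `libsycl.so*` in those directories. The sketch below only checks `LD_LIBRARY_PATH` (the loader also consults other locations, such as the `ldconfig` cache), and `find_libsycl` is a hypothetical helper.

```shell
# Hypothetical helper: print the first libsycl.so* found in a colon-separated
# directory list (default: $LD_LIBRARY_PATH). Returns 1 if none is found.
find_libsycl() {
  local path_list="${1-${LD_LIBRARY_PATH-}}"
  local dir f
  local IFS=:            # split the list on ':' only inside this function
  for dir in $path_list; do
    [ -n "$dir" ] || continue
    for f in "$dir"/libsycl.so*; do
      # An unmatched glob stays literal, so test that the file exists.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
        return 0
      fi
    done
  done
  return 1
}
```

Usage: `find_libsycl || echo "libsycl not on LD_LIBRARY_PATH; run: source /opt/intel/oneapi/setvars.sh" >&2`.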

- Hang during startup

  llama.cpp uses mmap by default to read the model file and copy it to the GPU. On some systems, memcpy may behave abnormally and block.

  Solution: add **--no-mmap**.

## Todo

- Support building on Windows.

- Support multiple cards.