writinwaters committed
Commit a5da72c · 1 Parent(s): c60e2e3

Updated deploy a local llm using IPEX-LLM (#1578)


### What problem does this PR solve?



### Type of change


- [x] Documentation Update

README.md CHANGED
@@ -63,7 +63,7 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
63
  </div>
64
 
65
 
66
- ## 📌 Latest Updates
67
 
68
  - 2024-07-08 Supports workflow based on [Graph](./graph/README.md).
69
  - 2024-06-27 Supports Markdown and Docx in the Q&A parsing method.
 
63
  </div>
64
 
65
 
66
+ ## 🔥 Latest Updates
67
 
68
  - 2024-07-08 Supports workflow based on [Graph](./graph/README.md).
69
  - 2024-06-27 Supports Markdown and Docx in the Q&A parsing method.
README_ja.md CHANGED
@@ -45,7 +45,7 @@
45
  </div>
46
 
47
 
48
- ## 📌 最新情報
49
  - 2024-07-08 [Graph](./graph/README.md) ベースのワークフローをサポート
50
  - 2024-06-27 Q&A解析方式はMarkdownファイルとDocxファイルをサポートしています。
51
  - 2024-06-27 Docxファイルからの画像の抽出をサポートします。
 
45
  </div>
46
 
47
 
48
+ ## 🔥 最新情報
49
  - 2024-07-08 [Graph](./graph/README.md) ベースのワークフローをサポート
50
  - 2024-06-27 Q&A解析方式はMarkdownファイルとDocxファイルをサポートしています。
51
  - 2024-06-27 Docxファイルからの画像の抽出をサポートします。
README_zh.md CHANGED
@@ -44,7 +44,7 @@
44
  </div>
45
 
46
 
47
- ## 📌 近期更新
48
 
49
  - 2024-07-08 支持 Agentic RAG: 基于 [Graph](./graph/README.md) 的工作流。
50
  - 2024-06-27 Q&A 解析方式支持 Markdown 文件和 Docx 文件。
 
44
  </div>
45
 
46
 
47
+ ## 🔥 近期更新
48
 
49
  - 2024-07-08 支持 Agentic RAG: 基于 [Graph](./graph/README.md) 的工作流。
50
  - 2024-06-27 Q&A 解析方式支持 Markdown 文件和 Docx 文件。
docs/guides/{deploy_local_llm.md → deploy_local_llm.mdx} RENAMED
@@ -4,6 +4,8 @@ slug: /deploy_local_llm
4
  ---
5
 
6
  # Deploy a local LLM
 
 
7
 
8
  RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.
9
 
@@ -108,7 +110,7 @@ Update your chat model accordingly in **Chat Configuration**:
108
 
109
  ## Deploy a local model using Xinference
110
 
111
- Xorbits Inference([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models.
112
 
113
  :::note
114
  - For information about installing Xinference, see [here](https://inference.readthedocs.io/en/latest/getting_started/).
@@ -129,8 +131,8 @@ $ xinference-local --host 0.0.0.0 --port 9997
129
 
130
  ### 3. Launch your local model
131
 
132
- Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method
133
- :
134
  ```bash
135
  $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
136
  ```
@@ -161,9 +163,9 @@ Update your chat model accordingly in **Chat Configuration**:
161
 
162
  ## Deploy a local model using IPEX-LLM
163
 
164
- IPEX-LLM([IPEX-LLM](https://github.com/intel-analytics/ipex-llm)) is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency
165
 
166
- To deploy a local model, eg., **Qwen2**, using IPEX-LLM, follow the steps below:
167
 
168
  ### 1. Check firewall settings
169
 
@@ -173,46 +175,69 @@ Ensure that your host machine's firewall allows inbound connections on port 11434
173
  sudo ufw allow 11434/tcp
174
  ```
175
 
176
- ### 2. Install and Start Ollama serve using IPEX-LLM
177
 
178
  #### 2.1 Install IPEX-LLM for Ollama
179
 
180
- IPEX-LLM's support for `ollama` now is available for Linux system and Windows system.
 
 
181
 
182
- Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md), and follow the instructions in section [Prerequisites](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#0-prerequisites) to setup and section [Install IPEX-LLM cpp](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp) to install the IPEX-LLM with Ollama binaries.
 
 
183
 
184
- **After the installation, you should have created a conda environment, named `llm-cpp` for instance, for running `ollama` commands with IPEX-LLM.**
185
 
186
  #### 2.2 Initialize Ollama
187
 
188
- Activate the `llm-cpp` conda environment and initialize Ollama by executing the commands below. A symbolic link to `ollama` will appear in your current directory.
189
 
190
- - For **Linux users**:
 
 
 
 
 
 
191
 
192
  ```bash
193
  conda activate llm-cpp
194
  init-ollama
195
  ```
 
 
196
 
197
- - For **Windows users**:
198
-
199
- Please run the following command with **administrator privilege in Miniforge Prompt**.
200
 
201
  ```cmd
202
  conda activate llm-cpp
203
  init-ollama.bat
204
  ```
 
 
205
 
206
- > [!NOTE]
207
- > If you have installed higher version `ipex-llm[cpp]` and want to upgrade your ollama binary file, don't forget to remove old binary files first and initialize again with `init-ollama` or `init-ollama.bat`.
 
208
 
209
- **Now you can use this executable file by standard ollama's usage.**
210
 
211
- #### 2.3 Run Ollama Serve
 
212
 
213
- You may launch the Ollama service as below:
 
 
 
214
 
215
- - For **Linux users**:
 
 
 
 
 
 
216
 
217
  ```bash
218
  export OLLAMA_NUM_GPU=999
@@ -224,9 +249,10 @@ You may launch the Ollama service as below:
224
  ./ollama serve
225
  ```
226
 
227
- - For **Windows users**:
 
228
 
229
- Please run the following command in Miniforge Prompt.
230
 
231
  ```cmd
232
  set OLLAMA_NUM_GPU=999
@@ -236,49 +262,54 @@ You may launch the Ollama service as below:
236
 
237
  ollama serve
238
  ```
 
 
239
 
240
 
241
- > Please set environment variable `OLLAMA_NUM_GPU` to `999` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
242
-
243
-
244
- > If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), it is recommended to additionaly set the following environment variable for optimal performance before executing `ollama serve`:
245
- >
246
- > ```bash
247
- > export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
248
- > ```
249
-
250
-
251
- > To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
252
 
253
- The console will display messages similar to the following:
254
 
255
  ![](https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png)
256
 
257
- ### 3. Pull and Run Ollama Model
 
 
258
 
259
- Keep the Ollama service on and open another terminal and run `./ollama pull <model_name>` in Linux (`ollama.exe pull <model_name>` in Windows) to automatically pull a model. e.g. `qwen2:latest`:
260
 
261
  ![](https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png)
262
 
263
- #### Run Ollama Model
 
 
 
 
 
 
 
 
264
 
265
- - For **Linux users**:
266
  ```bash
267
  ./ollama run qwen2:latest
268
  ```
269
-
270
- - For **Windows users**:
 
271
  ```cmd
272
  ollama run qwen2:latest
273
  ```
274
- ### 4. Configure RAGflow to use IPEX-LLM accelerated Ollama
275
-
276
- The confiugraiton follows the steps in
277
 
278
- Ollama Section 4 [Add Ollama](#4-add-ollama),
 
279
 
280
- Section 5 [Complete basic Ollama settings](#5-complete-basic-ollama-settings),
281
 
282
- Section 6 [Update System Model Settings](#6-update-system-model-settings),
283
 
284
- Section 7 [Update Chat Configuration](#7-update-chat-configuration)
 
 
 
 
4
  ---
5
 
6
  # Deploy a local LLM
7
+ import Tabs from '@theme/Tabs';
8
+ import TabItem from '@theme/TabItem';
9
 
10
  RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.
11
 
 
110
 
111
  ## Deploy a local model using Xinference
112
 
113
+ Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models.
114
 
115
  :::note
116
  - For information about installing Xinference, see [here](https://inference.readthedocs.io/en/latest/getting_started/).
 
131
 
132
  ### 3. Launch your local model
133
 
134
+ Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method:
135
+
136
  ```bash
137
  $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
138
  ```
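Once the model is launched, you can optionally confirm that Xinference is serving it. A minimal check, assuming the default local endpoint `http://localhost:9997` (adjust the host and port to match your `xinference-local` settings):

```bash
# List the models currently running on this Xinference instance
xinference list --endpoint http://localhost:9997
```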
 
163
 
164
  ## Deploy a local model using IPEX-LLM
165
 
166
+ [IPEX-LLM](https://github.com/intel-analytics/ipex-llm) is a PyTorch library for running LLMs on local Intel CPUs or GPUs (including iGPU or discrete GPUs like Arc, Flex, and Max) with low latency. It supports Ollama on Linux and Windows systems.
167
 
168
+ To deploy a local model, e.g., **Qwen2**, using IPEX-LLM-accelerated Ollama:
169
 
170
  ### 1. Check firewall settings
171
 
 
175
  sudo ufw allow 11434/tcp
176
  ```
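If you want to confirm that the rule is in place, a quick check (assuming `ufw` is the active firewall on your host):

```bash
# Verify that port 11434 is allowed through the firewall
sudo ufw status | grep 11434
```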
177
 
178
+ ### 2. Launch Ollama service using IPEX-LLM
179
 
180
  #### 2.1 Install IPEX-LLM for Ollama
181
 
182
+ :::tip NOTE
183
+ IPEX-LLM supports Ollama on Linux and Windows systems.
184
+ :::
185
 
186
+ For detailed information about installing IPEX-LLM for Ollama, see [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md):
187
+ - [Prerequisites](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#0-prerequisites)
188
+ - [Install IPEX-LLM cpp with Ollama binaries](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp)
189
 
190
+ *After the installation, you should have created a Conda environment, e.g., `llm-cpp`, for running Ollama commands with IPEX-LLM.*
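As a rough sketch of what the linked guide walks through (package names, Python version, and flags may change, so treat the guide as authoritative):

```bash
# Create and activate a Conda environment for IPEX-LLM's llama.cpp/Ollama support
conda create -n llm-cpp python=3.11
conda activate llm-cpp

# Install IPEX-LLM with the llama.cpp/Ollama binaries
pip install --pre --upgrade ipex-llm[cpp]
```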
191
 
192
  #### 2.2 Initialize Ollama
193
 
194
+ 1. Activate the `llm-cpp` Conda environment and initialize Ollama:
195
 
196
+ <Tabs
197
+ defaultValue="linux"
198
+ values={[
199
+ {label: 'Linux', value: 'linux'},
200
+ {label: 'Windows', value: 'windows'},
201
+ ]}>
202
+ <TabItem value="linux">
203
 
204
  ```bash
205
  conda activate llm-cpp
206
  init-ollama
207
  ```
208
+ </TabItem>
209
+ <TabItem value="windows">
210
 
211
+ Run these commands with *administrator privileges in Miniforge Prompt*:
 
 
212
 
213
  ```cmd
214
  conda activate llm-cpp
215
  init-ollama.bat
216
  ```
217
+ </TabItem>
218
+ </Tabs>
219
 
220
+ 2. If you have upgraded `ipex-llm[cpp]` and want to update the Ollama binary files accordingly, remove the old binary files first and reinitialize Ollama with `init-ollama` (Linux) or `init-ollama.bat` (Windows).
221
+
222
+ *A symbolic link to Ollama appears in your current directory, and you can use this executable file following standard Ollama commands.*
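For instance, on Linux you can confirm the link and query the binary before moving on (a minimal sanity check, not a required step):

```bash
# The symlink should point at the Ollama binary shipped with ipex-llm[cpp]
ls -l ollama
./ollama --version
```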
223
 
224
+ #### 2.3 Launch Ollama service
225
 
226
+ 1. Set the environment variable `OLLAMA_NUM_GPU` to `999` to ensure that all layers of your model run on the Intel GPU; otherwise, some layers may default to CPU.
227
+ 2. For optimal performance on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), set the following environment variable before launching the Ollama service:
228
 
229
+ ```bash
230
+ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
231
+ ```
232
+ 3. Launch the Ollama service:
233
 
234
+ <Tabs
235
+ defaultValue="linux"
236
+ values={[
237
+ {label: 'Linux', value: 'linux'},
238
+ {label: 'Windows', value: 'windows'},
239
+ ]}>
240
+ <TabItem value="linux">
241
 
242
  ```bash
243
  export OLLAMA_NUM_GPU=999
 
249
  ./ollama serve
250
  ```
251
 
252
+ </TabItem>
253
+ <TabItem value="windows">
254
 
255
+ Run the following command *in Miniforge Prompt*:
256
 
257
  ```cmd
258
  set OLLAMA_NUM_GPU=999
 
262
 
263
  ollama serve
264
  ```
265
+ </TabItem>
266
+ </Tabs>
267
 
268
 
269
+ :::tip NOTE
270
+ To enable the Ollama service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` rather than simply `./ollama serve`.
271
+ :::
 
 
 
 
 
 
 
 
272
 
273
+ *The console displays messages similar to the following:*
274
 
275
  ![](https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png)
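To confirm that the service is accepting connections, you can query Ollama's REST API from another terminal (assuming the default port 11434; replace `localhost` with the host's IP if you started the service with `OLLAMA_HOST=0.0.0.0`):

```bash
# Lists the locally available models; the list is empty until a model is pulled
curl http://localhost:11434/api/tags
```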
276
 
277
+ ### 3. Pull and Run Ollama model
278
+
279
+ #### 3.1 Pull Ollama model
280
 
281
+ With the Ollama service running, open a new terminal and run `./ollama pull <model_name>` (Linux) or `ollama.exe pull <model_name>` (Windows) to pull the desired model, e.g., `qwen2:latest`:
282
 
283
  ![](https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png)
284
 
285
+ #### 3.2 Run Ollama model
286
+
287
+ <Tabs
288
+ defaultValue="linux"
289
+ values={[
290
+ {label: 'Linux', value: 'linux'},
291
+ {label: 'Windows', value: 'windows'},
292
+ ]}>
293
+ <TabItem value="linux">
294
 
 
295
  ```bash
296
  ./ollama run qwen2:latest
297
  ```
298
+ </TabItem>
299
+ <TabItem value="windows">
300
+
301
  ```cmd
302
  ollama run qwen2:latest
303
  ```
 
 
 
304
 
305
+ </TabItem>
306
+ </Tabs>
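For a quick smoke test, you can also pass a prompt directly on the command line and let the command exit after the reply (Linux shown; the Windows form is analogous with `ollama run`):

```bash
# One-shot prompt; the model prints its reply and the command exits
./ollama run qwen2:latest "Say hello in one sentence."
```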
307
 
308
+ ### 4. Configure RAGFlow
309
 
310
+ To enable IPEX-LLM-accelerated Ollama in RAGFlow, you must also complete the corresponding configuration in RAGFlow itself. The steps are identical to those outlined in the *Deploy a local model using Ollama* section:
311
 
312
+ 1. [Add Ollama](#4-add-ollama)
313
+ 2. [Complete basic Ollama settings](#5-complete-basic-ollama-settings)
314
+ 3. [Update System Model Settings](#6-update-system-model-settings)
315
+ 4. [Update Chat Configuration](#7-update-chat-configuration)
docs/guides/manage_files.md CHANGED
@@ -43,7 +43,7 @@ You can link your file to one knowledge base or multiple knowledge bases at one
43
 
44
  ![link multiple kb](https://github.com/infiniflow/ragflow/assets/93570324/6c508803-fb1f-435d-b688-683066fd7fff)
45
 
46
- ## Move file to specified folder
47
 
48
  As of RAGFlow v0.8.0, this feature is *not* available.
49
 
 
43
 
44
  ![link multiple kb](https://github.com/infiniflow/ragflow/assets/93570324/6c508803-fb1f-435d-b688-683066fd7fff)
45
 
46
+ ## Move file to a specific folder
47
 
48
  As of RAGFlow v0.8.0, this feature is *not* available.
49