writinwaters committed
Commit · a5da72c
1 Parent(s): c60e2e3

Updated deploy a local llm using IPEX-LLM (#1578)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update

Files changed:
- README.md +1 -1
- README_ja.md +1 -1
- README_zh.md +1 -1
- docs/guides/{deploy_local_llm.md → deploy_local_llm.mdx} +78 -47
- docs/guides/manage_files.md +1 -1
README.md
CHANGED
@@ -63,7 +63,7 @@ Try our demo at [https://demo.ragflow.io](https://demo.ragflow.io).
 </div>
 
 
-##
+## 🔥 Latest Updates
 
 - 2024-07-08 Supports workflow based on [Graph](./graph/README.md).
 - 2024-06-27 Supports Markdown and Docx in the Q&A parsing method.
README_ja.md
CHANGED
@@ -45,7 +45,7 @@
 </div>
 
 
-##
+## 🔥 最新情報
 - 2024-07-08 [Graph](./graph/README.md) ベースのワークフローをサポート
 - 2024-06-27 Q&A解析方式はMarkdownファイルとDocxファイルをサポートしています。
 - 2024-06-27 Docxファイルからの画像の抽出をサポートします。
README_zh.md
CHANGED
@@ -44,7 +44,7 @@
 </div>
 
 
-##
+## 🔥 近期更新
 
 - 2024-07-08 支持 Agentic RAG: 基于 [Graph](./graph/README.md) 的工作流。
 - 2024-06-27 Q&A 解析方式支持 Markdown 文件和 Docx 文件。
docs/guides/{deploy_local_llm.md → deploy_local_llm.mdx}
RENAMED
@@ -4,6 +4,8 @@ slug: /deploy_local_llm
 ---
 
 # Deploy a local LLM
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
 
 RAGFlow supports deploying models locally using Ollama or Xinference. If you have locally deployed models to leverage or wish to enable GPU or CUDA for inference acceleration, you can bind Ollama or Xinference into RAGFlow and use either of them as a local "server" for interacting with your local models.
 
@@ -108,7 +110,7 @@ Update your chat model accordingly in **Chat Configuration**:
 
 ## Deploy a local model using Xinference
 
-Xorbits Inference([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models.
+Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) enables you to unleash the full potential of cutting-edge AI models.
 
 :::note
 - For information about installing Xinference Ollama, see [here](https://inference.readthedocs.io/en/latest/getting_started/).
@@ -129,8 +131,8 @@ $ xinference-local --host 0.0.0.0 --port 9997
 
 ### 3. Launch your local model
 
-Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method
+Launch your local model (**Mistral**), ensuring that you replace `${quantization}` with your chosen quantization method:
 
 ```bash
 $ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
 ```
@@ -161,9 +163,9 @@ Update your chat model accordingly in **Chat Configuration**:
 
 ## Deploy a local model using IPEX-LLM
 
+[IPEX-LLM](https://github.com/intel-analytics/ipex-llm) is a PyTorch library for running LLMs on local Intel CPUs or GPUs (including iGPU or discrete GPUs like Arc, Flex, and Max) with low latency. It supports Ollama on Linux and Windows systems.
 
-To deploy a local model,
+To deploy a local model, e.g., **Qwen2**, using IPEX-LLM-accelerated Ollama:
 
 ### 1. Check firewall settings
 
@@ -173,46 +175,69 @@ Ensure that your host machine's firewall allows inbound connections on port 11434:
 sudo ufw allow 11434/tcp
 ```
 
-### 2.
+### 2. Launch Ollama service using IPEX-LLM
 
 #### 2.1 Install IPEX-LLM for Ollama
 
+:::tip NOTE
+IPEX-LLM supports Ollama on Linux and Windows systems.
+:::
 
+For detailed information about installing IPEX-LLM for Ollama, see [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md):
+- [Prerequisites](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#0-prerequisites)
+- [Install IPEX-LLM cpp with Ollama binaries](https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/Quickstart/llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp)
 
+*After the installation, you should have created a Conda environment, e.g., `llm-cpp`, for running Ollama commands with IPEX-LLM.*
 
 #### 2.2 Initialize Ollama
 
-Activate the `llm-cpp`
+1. Activate the `llm-cpp` Conda environment and initialize Ollama:
 
+<Tabs
+  defaultValue="linux"
+  values={[
+    {label: 'Linux', value: 'linux'},
+    {label: 'Windows', value: 'windows'},
+  ]}>
+  <TabItem value="linux">
 
 ```bash
 conda activate llm-cpp
 init-ollama
 ```
+  </TabItem>
+  <TabItem value="windows">
 
-Please run the following command with **administrator privilege in Miniforge Prompt**.
+   Run these commands with *administrator privileges in Miniforge Prompt*:
 
 ```cmd
 conda activate llm-cpp
 init-ollama.bat
 ```
+  </TabItem>
+</Tabs>
 
+2. If the installed `ipex-llm[cpp]` requires an upgrade to the Ollama binary files, remove the old binary files and reinitialize Ollama using `init-ollama` (Linux) or `init-ollama.bat` (Windows).
+
+*A symbolic link to Ollama appears in your current directory, and you can use this executable file following standard Ollama commands.*
 
+#### 2.3 Launch Ollama service
 
+1. Set the environment variable `OLLAMA_NUM_GPU` to `999` to ensure that all layers of your model run on the Intel GPU; otherwise, some layers may default to CPU.
+2. For optimal performance on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), set the following environment variable before launching the Ollama service:
 
+   ```bash
+   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+   ```
+3. Launch the Ollama service:
 
+<Tabs
+  defaultValue="linux"
+  values={[
+    {label: 'Linux', value: 'linux'},
+    {label: 'Windows', value: 'windows'},
+  ]}>
+  <TabItem value="linux">
 
 ```bash
 export OLLAMA_NUM_GPU=999
@@ -224,9 +249,10 @@ You may launch the Ollama service as below:
 ./ollama serve
 ```
 
+</TabItem>
+<TabItem value="windows">
 
+Run the following command *in Miniforge Prompt*:
 
 ```cmd
 set OLLAMA_NUM_GPU=999
@@ -236,49 +262,54 @@ You may launch the Ollama service as below:
 
 ollama serve
 ```
+</TabItem>
+</Tabs>
 
-> If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS (Kernel 6.2), it is recommended to additionally set the following environment variable for optimal performance before executing `ollama serve`:
->
-> ```bash
-> export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-> ```
-
-> To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
+:::tip NOTE
+To enable the Ollama service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` rather than simply `./ollama serve`.
+:::
 
-The console
+*The console displays messages similar to the following:*
 
 
 
-### 3. Pull and Run Ollama
+### 3. Pull and Run Ollama model
+
+#### 3.1 Pull Ollama model
 
+With the Ollama service running, open a new terminal and run `./ollama pull <model_name>` (Linux) or `ollama.exe pull <model_name>` (Windows) to pull the desired model, e.g., `qwen2:latest`:
 
 
 
-#### Run Ollama
+#### 3.2 Run Ollama model
+
+<Tabs
+  defaultValue="linux"
+  values={[
+    {label: 'Linux', value: 'linux'},
+    {label: 'Windows', value: 'windows'},
+  ]}>
+  <TabItem value="linux">
 
-- For **Linux users**:
 ```bash
 ./ollama run qwen2:latest
 ```
+</TabItem>
+<TabItem value="windows">
 
 ```cmd
 ollama run qwen2:latest
 ```
+</TabItem>
+</Tabs>
 
-### 4. Configure RAGflow to use IPEX-LLM accelerated Ollama
-
-The configuration follows the steps in
+### 4. Configure RAGFlow
 
+To enable IPEX-LLM accelerated Ollama in RAGFlow, you must also complete the configurations in RAGFlow. The steps are identical to those outlined in the *Deploy a local model using Ollama* section:
 
+1. [Add Ollama](#4-add-ollama)
+2. [Complete basic Ollama settings](#5-complete-basic-ollama-settings)
+3. [Update System Model Settings](#6-update-system-model-settings)
+4. [Update Chat Configuration](#7-update-chat-configuration)
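The new section 4 points back to RAGFlow's own Ollama configuration steps. A check that is not part of this PR, but that fits between steps 3 and 4 of the updated guide, is confirming that the IPEX-LLM-accelerated Ollama service is reachable and actually serving the pulled model before entering its details in RAGFlow. The sketch below is a minimal example using Ollama's standard REST endpoints; it assumes the default port 11434 and the `qwen2:latest` model from the guide, and `localhost` should be replaced with the serving machine's IP (with `OLLAMA_HOST=0.0.0.0` set, as the guide notes) if RAGFlow runs on a different host.

```bash
# Sanity-check the Ollama service before configuring RAGFlow.
# Assumptions: default port 11434, model qwen2:latest already pulled (step 3.1),
# and localhost replaced with the server's IP when checking from another machine.
OLLAMA_BASE_URL="http://localhost:11434"

# List the models the service is serving; qwen2:latest should appear after step 3.1.
curl "$OLLAMA_BASE_URL/api/tags"

# Request a short, non-streamed completion to confirm inference works end to end.
curl "$OLLAMA_BASE_URL/api/generate" -d '{
  "model": "qwen2:latest",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'
```

If both calls succeed, the same base URL and model name are what you would enter when adding Ollama and completing the basic Ollama settings in RAGFlow.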
docs/guides/manage_files.md
CHANGED
@@ -43,7 +43,7 @@ You can link your file to one knowledge base or multiple knowledge bases at one
 
 
 
-## Move file to
+## Move file to a specific folder
 
 As of RAGFlow v0.8.0, this feature is *not* available.
 