KevinHuSh committed · Commit 6b3ce5a · 1 parent: 63df91a

Support Xinference (#321)

### What problem does this PR solve?

Issue link: #299

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- docs/xinference.md +43 -0
- rag/llm/cv_model.py +2 -1
docs/xinference.md
ADDED
@@ -0,0 +1,43 @@
# Xinference

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/2c5e86a7-807b-4d29-bd2b-f73fb1018866" width="130"/>
</div>

Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) empowers you to unleash the full potential of cutting-edge AI models.
## Install

- [pip install "xinference[all]"](https://inference.readthedocs.io/en/latest/getting_started/installation.html)
- [Docker](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html)

To start a local instance of Xinference, run the following command:

```bash
$ xinference-local --host 0.0.0.0 --port 9997
```
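Once the server is running, you can sanity-check it over its OpenAI-compatible REST API. A minimal sketch, assuming the default host/port from the command above and the `requests` package:

```python
# List the models currently deployed on the local Xinference server.
# Assumes the defaults used above (port 9997) and an OpenAI-compatible
# /v1/models endpoint; adjust the host if the server runs elsewhere.
import requests

resp = requests.get("http://localhost:9997/v1/models")
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```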
## Launch Xinference

Decide which LLM you want to deploy ([here's a list of supported LLMs](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**.
Execute the following command to launch the model. Remember to replace `${quantization}` with a quantization method supported by your chosen model format:

```bash
$ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
```
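The launched model is served through the same OpenAI-compatible API, so you can exercise it before wiring it into RAGFlow. A hedged sketch, assuming the model UID `mistral` from the launch command above; a local Xinference server does not validate the API key, so any placeholder works:

```python
# Send one chat request to the model launched above via the
# OpenAI-compatible endpoint exposed by Xinference.
from openai import OpenAI

client = OpenAI(api_key="not-used", base_url="http://localhost:9997/v1")
resp = client.chat.completions.create(
    model="mistral",  # the UID passed to `xinference launch -u mistral`
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```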
## Use Xinference in RAGFlow

- Go to 'Settings > Model Providers > Models to be added > Xinference'.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/bcbf4d7a-ade6-44c7-ad5f-0a92c8a73789" width="1300"/>
</div>

> Base URL: Enter the base URL where the Xinference service is accessible, like, http://<your-xinference-endpoint-domain>:9997
- Use Xinference Models.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b01fcb6f-47c9-4777-82e0-f1e947ed615a" width="530"/>
</div>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/1763dcd1-044f-438d-badd-9729f5b3a144" width="530"/>
</div>
rag/llm/cv_model.py
CHANGED
@@ -161,9 +161,10 @@ class OllamaCV(Base):
         except Exception as e:
             return "**ERROR**: " + str(e), 0
 
+
 class XinferenceCV(Base):
     def __init__(self, key, model_name="", lang="Chinese", base_url=""):
-        self.client = OpenAI(api_key=
+        self.client = OpenAI(api_key="xxx", base_url=base_url)
         self.model_name = model_name
         self.lang = lang
 
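For context, the hard-coded placeholder key matches how a self-hosted Xinference server behaves: it does not validate API keys, so the wrapper only needs the base URL. A hypothetical usage sketch (the model name and URL are illustrative assumptions, not from the PR):

```python
# Construct the new XinferenceCV wrapper against a local Xinference server.
from rag.llm.cv_model import XinferenceCV

cv = XinferenceCV(
    key="xxx",                # ignored: __init__ hard-codes a placeholder key
    model_name="qwen-vl-chat",            # assumed vision-model UID on the server
    base_url="http://localhost:9997/v1",  # assumed OpenAI-compatible endpoint
)
```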