KevinHuSh committed
Commit 6b3ce5a · Parent: 63df91a

Support Xinference (#321)


### What problem does this PR solve?

Issue link: #299

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Files changed (2)
  1. docs/xinference.md +43 -0
  2. rag/llm/cv_model.py +2 -1
docs/xinference.md ADDED
# Xinference

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/2c5e86a7-807b-4d29-bd2b-f73fb1018866" width="130"/>
</div>

Xorbits Inference ([Xinference](https://github.com/xorbitsai/inference)) empowers you to unleash the full potential of cutting-edge AI models.

## Install

- [pip install "xinference[all]"](https://inference.readthedocs.io/en/latest/getting_started/installation.html)
- [Docker](https://inference.readthedocs.io/en/latest/getting_started/using_docker_image.html)

To start a local instance of Xinference, run the following command:
```bash
$ xinference-local --host 0.0.0.0 --port 9997
```
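
Once the server is up, you can sanity-check it through the OpenAI-compatible API that Xinference exposes. A minimal sketch, assuming the host and port from the command above and the `openai` Python package installed; Xinference does not require a real API key, so a placeholder works:

```python
# Probe a local Xinference server through its OpenAI-compatible API.
# The endpoint is assumed from the xinference-local command above.
from openai import OpenAI

client = OpenAI(api_key="xxx", base_url="http://localhost:9997/v1")

# List the models currently served by this instance.
for model in client.models.list():
    print(model.id)
```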

## Launch Xinference

Decide which LLM you want to deploy ([here's a list of supported LLMs](https://inference.readthedocs.io/en/latest/models/builtin/)), say, **mistral**.
Execute the following command to launch the model, replacing ${quantization} with your chosen quantization method:
```bash
$ xinference launch -u mistral --model-name mistral-v0.1 --size-in-billions 7 --model-format pytorch --quantization ${quantization}
```
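
Once launched, the model can be queried through the same OpenAI-compatible endpoint. A sketch under stated assumptions: the `mistral` UID comes from the launch command above, and the endpoint assumes a local deployment.

```python
# Send a chat request to the model launched above. The model UID
# ("mistral", set via `xinference launch -u mistral`) and the
# endpoint are assumptions; adjust them to your deployment.
from openai import OpenAI

client = OpenAI(api_key="xxx", base_url="http://localhost:9997/v1")

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```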

## Use Xinference in RAGFlow

- Go to 'Settings > Model Providers > Models to be added > Xinference'.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/bcbf4d7a-ade6-44c7-ad5f-0a92c8a73789" width="1300"/>
</div>

> Base URL: Enter the base URL where the Xinference service is accessible, e.g., http://<your-xinference-endpoint-domain>:9997

- Use Xinference Models.

<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/b01fcb6f-47c9-4777-82e0-f1e947ed615a" width="530"/>
</div>
<div align="center" style="margin-top:20px;margin-bottom:20px;">
<img src="https://github.com/infiniflow/ragflow/assets/12318111/1763dcd1-044f-438d-badd-9729f5b3a144" width="530"/>
</div>
rag/llm/cv_model.py CHANGED
@@ -161,9 +161,10 @@ class OllamaCV(Base):
         except Exception as e:
             return "**ERROR**: " + str(e), 0
 
+
 class XinferenceCV(Base):
     def __init__(self, key, model_name="", lang="Chinese", base_url=""):
-        self.client = OpenAI(api_key=key, base_url=base_url)
+        self.client = OpenAI(api_key="xxx", base_url=base_url)
         self.model_name = model_name
         self.lang = lang
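
The change stops passing the user-supplied key to the OpenAI client: Xinference's OpenAI-compatible server does not require a real key by default, so a fixed placeholder lets the client be constructed even when the key field is left empty in RAGFlow. A hypothetical usage sketch of the patched class; the model name and endpoint below are assumptions, not part of this PR:

```python
# Hypothetical usage of the patched XinferenceCV; the import path
# follows the file touched by this PR (rag/llm/cv_model.py).
from rag.llm.cv_model import XinferenceCV

cv_model = XinferenceCV(
    key="",                               # ignored after this change
    model_name="qwen-vl-chat",            # assumed vision model served by Xinference
    base_url="http://localhost:9997/v1",  # assumed local Xinference endpoint
)
```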