writinwaters committed
Commit 305b8c0 · 1 Parent(s): bf4c34e

Miscellaneous edits to RAGFlow's UI (#3337)


### What problem does this PR solve?



### Type of change

- [x] Documentation Update

agent/templates/investment_advisor.json CHANGED
@@ -1,7 +1,7 @@
 {
   "id": 8,
   "title": "Intelligent investment advisor",
-  "description": "An intelligent investment advisor that can answer your financial questions based on real-time domestic financial data and financial information.",
+  "description": "An intelligent investment advisor that answers your financial questions using real-time domestic financial data.",
   "canvas_type": "chatbot",
   "dsl": {
     "answer": [],
agent/templates/medical_consultation.json CHANGED
@@ -1,7 +1,7 @@
 {
   "id": 7,
   "title": "Medical consultation",
-  "description": "Medical Consultation Assistant, can provide you with some professional consultation suggestions for your reference. Please note that the content provided by the medical assistant is for reference only and may not be authentic or available. Knowledge Base Content Reference: <a href = 'https://huggingface.co/datasets/InfiniFlow/medical_QA/tree/main'> Medical Knowledge Base Reference</a>",
+  "description": "A consultant that offers medical suggestions using an internal QA dataset and PubMed search results. Note that this agent's answers are for reference only and may not be valid. The dataset can be found at https://huggingface.co/datasets/InfiniFlow/medical_QA/tree/main",
   "canvas_type": "chatbot",
   "dsl": {
     "answer": [],
api/db/services/document_service.py CHANGED
@@ -410,7 +410,7 @@ def queue_raptor_tasks(doc):
         "doc_id": doc["id"],
         "from_page": 0,
         "to_page": -1,
-        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing For Tree-Organized Retrieval)."
+        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)."
     }

     task = new_task()
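For clarity, the task payload assembled in the hunk above can be sketched as a standalone function. This is an illustrative reconstruction, not RAGFlow's actual code: the builder function is hypothetical, and only the field names and the page range come from the diff (`to_page=-1` appears to denote "through the last page").

```python
# Hypothetical sketch of the RAPTOR task payload shown in the diff above.
# Only the field names and values are taken from the source; the builder
# function itself is illustrative.
def build_raptor_task(doc: dict) -> dict:
    return {
        "doc_id": doc["id"],
        "from_page": 0,   # start at the first page
        "to_page": -1,    # -1 appears to mean "process through the last page"
        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).",
    }

task = build_raptor_task({"id": "doc-123"})
```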
docs/configurations.md CHANGED
@@ -136,37 +136,44 @@ If you cannot download the RAGFlow Docker image, try the following mirrors.
 
 [service_conf.yaml](https://github.com/infiniflow/ragflow/blob/main/docker/service_conf.yaml) specifies the system-level configuration for RAGFlow and is used by its API server and task executor.
 
-- `ragflow`
-  - `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
-  - `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
-
-- `mysql`
-  - `name`: The MySQL database name. Defaults to `rag_flow`.
-  - `user`: The username for MySQL.
-  - `password`: The password for MySQL. When updated, you must revise the `MYSQL_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
-  - `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
-  - `stale_timeout`: Timeout in seconds.
-
-- `minio`
-  - `user`: The username for MinIO. When updated, you must revise the `MINIO_USER` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `password`: The password for MinIO. When updated, you must revise the `MINIO_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
-
-- `oauth`
-  The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
-  - `github`: The GitHub authentication settings for your application. Visit the [Github Developer Settings](https://github.com/settings/developers) page to obtain your client_id and secret_key.
-
-- `user_default_llm`
-  The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
-  - `factory`: The LLM supplier. Available options:
-    - `"OpenAI"`
-    - `"DeepSeek"`
-    - `"Moonshot"`
-    - `"Tongyi-Qianwen"`
-    - `"VolcEngine"`
-    - `"ZHIPU-AI"`
-  - `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
+### `ragflow`
+
+- `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
+- `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
+
+### `mysql`
+
+- `name`: The MySQL database name. Defaults to `rag_flow`.
+- `user`: The username for MySQL.
+- `password`: The password for MySQL. When updated, you must revise the `MYSQL_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
+- `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
+- `stale_timeout`: Timeout in seconds.
+
+### `minio`
+
+- `user`: The username for MinIO. When updated, you must revise the `MINIO_USER` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `password`: The password for MinIO. When updated, you must revise the `MINIO_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
+
+### `oauth`
+
+The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
+
+- `github`: The GitHub authentication settings for your application. Visit the [GitHub Developer Settings](https://github.com/settings/developers) page to obtain your client_id and secret_key.
+
+### `user_default_llm`
+
+The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
+
+- `factory`: The LLM supplier. Available options:
+  - `"OpenAI"`
+  - `"DeepSeek"`
+  - `"Moonshot"`
+  - `"Tongyi-Qianwen"`
+  - `"VolcEngine"`
+  - `"ZHIPU-AI"`
+- `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
 
 :::tip NOTE
 If you do not set the default LLM here, configure the default LLM on the **Settings** page in the RAGFlow UI.
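The defaults documented above can be overlaid with user-supplied values. A minimal Python sketch, assuming a hypothetical `resolve` helper (not part of RAGFlow) and using only the defaults stated in the docs:

```python
# Documented defaults from service_conf.yaml (per the docs above).
DEFAULTS = {
    "ragflow": {"host": "0.0.0.0", "port": 9380},
    "mysql": {"name": "rag_flow", "port": 3306, "max_connections": 100},
    "minio": {"host": "minio:9000"},
}

def resolve(section: str, user_conf: dict) -> dict:
    """Merge user-supplied settings for a section over the documented defaults.

    `resolve` is a hypothetical helper for illustration; RAGFlow's actual
    config loading may differ.
    """
    merged = dict(DEFAULTS.get(section, {}))
    merged.update(user_conf.get(section, {}))
    return merged

# A user override of the API port keeps the default host.
conf = resolve("ragflow", {"ragflow": {"port": 9381}})
```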
docs/guides/configure_knowledge_base.md CHANGED
@@ -52,13 +52,13 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
 | Picture | | JPEG, JPG, PNG, TIF, GIF |
 | One | The entire document is chunked as one. | DOCX, EXCEL, PDF, TXT |
 
-You can also change the chunk template for a particular file on the **Datasets** page.
+You can also change the chunk template for a particular file on the **Datasets** page.
 
 ![change chunk method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)
 
 ### Select embedding model
 
-An embedding model builds vector index on file chunks. Once you have chosen an embedding model and used it to parse a file, you are no longer allowed to change it. To switch to a different embedding model, you *must* delete all completed file chunks in the knowledge base. The obvious reason is that we must *ensure* that all files in a specific knowledge base are parsed using the *same* embedding model (ensure that they are compared in the same embedding space).
+An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all chunks in the knowledge base. The obvious reason is that we *must* ensure that files in a specific knowledge base are converted to embeddings using the *same* embedding model (ensure that they are compared in the same embedding space).
 
 The following embedding models can be deployed locally:
 
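The "same embedding space" requirement can be illustrated with cosine similarity: a similarity score is only meaningful between vectors produced by the same model. A minimal sketch with made-up vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Chunks embedded by the SAME model live in one space, so scores are comparable.
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
sim = cosine(v1, v2)  # ≈ 0.99: the two chunks are close in this space

# A vector from a DIFFERENT model may point anywhere relative to v1; the
# formula still returns a number, but that number carries no meaning, which
# is why all chunks in a knowledge base must use one embedding model.
```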
web/src/locales/en.ts CHANGED
@@ -157,14 +157,14 @@ export default {
   delimiter: `Delimiter`,
   html4excel: 'Excel to HTML',
   html4excelTip: `Excel will be parsed into HTML table or not. If it's FALSE, every row in Excel will be formed as a chunk.`,
-  autoKeywords: 'Auto keywords',
-  autoKeywordsTip: `Extract N keywords for every chunk to boost their rank score while querying such keywords. Extra tokens will be comsumed for LLM that you set in 'System model settings'. You can check the result in the chunk list.`,
-  autoQuestions: 'Auto questions',
-  autoQuestionsTip: `Extract N questions for every chunk to boost their rank score while querying such questions. Extra tokens will be comsumed for LLM that you set in 'System model settings'. You can check the result in the chunk list. This function will not destroy the entire chunking process if errors occur except adding empty result to the original chunk.`,
+  autoKeywords: 'Auto-keyword',
+  autoKeywordsTip: `Extract N keywords for each chunk to improve their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
+  autoQuestions: 'Auto-question',
+  autoQuestionsTip: `Extract N questions for each chunk to improve their ranking for queries containing those questions. You can check or update the added questions for a chunk from the chunk list. This feature will not disrupt the chunking process if an error occurs, except that it may add an empty result to the original chunk. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
   },
   knowledgeConfiguration: {
     titleDescription:
-      'Update your knowledge base details especially parsing method here.',
+      'Update your knowledge base configurations here, particularly the chunk method.',
     name: 'Knowledge base name',
     photo: 'Knowledge base photo',
     description: 'Description',
@@ -176,13 +176,13 @@
     chunkTokenNumber: 'Chunk token number',
     chunkTokenNumberMessage: 'Chunk token number is required',
     embeddingModelTip:
-      "The embedding model used to embedding chunks. It's unchangable once the knowledgebase has chunks. You need to delete all the chunks if you want to change it.",
+      "The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all chunks in the knowledge base.",
     permissionsTip:
-      "If the permission is 'Team', all the team member can manipulate the knowledgebase.",
+      "If set to 'Team', all team members will be able to manage the knowledge base.",
     chunkTokenNumberTip:
-      'It determine the token number of a chunk approximately.',
+      'It sets the token threshold for a chunk. A paragraph with fewer tokens than this threshold will be combined with the following paragraph until the token count exceeds the threshold, at which point a chunk is created.',
     chunkMethod: 'Chunk method',
-    chunkMethodTip: 'The instruction is at right.',
+    chunkMethodTip: 'Tips are on the right.',
     upload: 'Upload',
     english: 'English',
     chinese: 'Chinese',
@@ -192,11 +192,11 @@
     me: 'Only me',
     team: 'Team',
     cancel: 'Cancel',
-    methodTitle: 'Chunking Method Description',
+    methodTitle: 'Chunk method description',
     methodExamples: 'Examples',
     methodExamplesDescription:
-      'The following screenshots are presented to facilitate understanding.',
-    dialogueExamplesTitle: 'Dialogue Examples',
+      'The following screenshots are provided for clarity.',
+    dialogueExamplesTitle: 'Dialogue examples',
     methodEmpty:
      'This will display a visual explanation of the knowledge base categories',
     book: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.</p><p>
@@ -208,8 +208,7 @@
     The chunk granularity is consistent with 'ARTICLE', and all the upper level text will be included in the chunk.
     </p>`,
     manual: `<p>Only <b>PDF</b> is supported.</p><p>
-    We assume manual has hierarchical section structure. We use the lowest section titles as pivots to slice documents.
-    So, the figures and tables in the same section will not be sliced apart, and chunk size might be large.
+    We assume that the manual has a hierarchical section structure, using the lowest section titles as the basic unit for chunking documents. Therefore, figures and tables in the same section will not be separated, which may result in larger chunk sizes.
     </p>`,
     naive: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML, HTML</b>.</p>
     <p>This method apply the naive ways to chunk files: </p>
@@ -292,7 +291,7 @@ Successive text will be sliced into pieces each of which is around 512 token num
     Mind the entiry type you need to specify.</p>`,
     useRaptor: 'Use RAPTOR to enhance retrieval',
     useRaptorTip:
-      'Recursive Abstractive Processing for Tree-Organized Retrieval, please refer to https://huggingface.co/papers/2401.18059',
+      'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information',
     prompt: 'Prompt',
     promptTip: 'LLM prompt used for summarization.',
     promptMessage: 'Prompt is required',
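The behavior described in the `chunkTokenNumberTip` string above — merging paragraphs until the accumulated token count crosses the threshold — can be sketched as follows. This is a hypothetical illustration using whitespace-separated words as a stand-in for real tokens; it is not RAGFlow's chunking implementation.

```python
def merge_paragraphs(paragraphs, threshold):
    """Greedily merge paragraphs into chunks: a chunk is emitted as soon as
    the accumulated token count reaches the threshold. Whitespace-separated
    words stand in for the real tokenizer here (illustrative only)."""
    chunks, buffer, count = [], [], 0
    for para in paragraphs:
        buffer.append(para)
        count += len(para.split())
        if count >= threshold:
            chunks.append("\n".join(buffer))
            buffer, count = [], 0
    if buffer:  # flush trailing paragraphs that never reached the threshold
        chunks.append("\n".join(buffer))
    return chunks
```

With a threshold of 3, `["a b", "c d", "e f g"]` merges the first two short paragraphs into one chunk and leaves the third on its own.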