writinwaters committed
Commit 305b8c0 · 1 Parent(s): bf4c34e

Miscellaneous edits to RAGFlow's UI (#3337)


### What problem does this PR solve?



### Type of change

- [x] Documentation Update

agent/templates/investment_advisor.json CHANGED
@@ -1,7 +1,7 @@
 {
   "id": 8,
   "title": "Intelligent investment advisor",
-  "description": "An intelligent investment advisor that can answer your financial questions based on real-time domestic financial data and financial information.",
+  "description": "An intelligent investment advisor that answers your financial questions using real-time domestic financial data.",
   "canvas_type": "chatbot",
   "dsl": {
     "answer": [],
agent/templates/medical_consultation.json CHANGED
@@ -1,7 +1,7 @@
 {
   "id": 7,
   "title": "Medical consultation",
-  "description": "Medical Consultation Assistant, can provide you with some professional consultation suggestions for your reference. Please note that the content provided by the medical assistant is for reference only and may not be authentic or available. Knowledge Base Content Reference: <a href = 'https://huggingface.co/datasets/InfiniFlow/medical_QA/tree/main'> Medical Knowledge Base Reference</a>",
+  "description": "A consultant that offers medical suggestions using an internal QA dataset and PubMed search results. Note that this agent's answers are for reference only and may not be valid. The dataset can be found at https://huggingface.co/datasets/InfiniFlow/medical_QA/tree/main",
   "canvas_type": "chatbot",
   "dsl": {
     "answer": [],
api/db/services/document_service.py CHANGED
@@ -410,7 +410,7 @@ def queue_raptor_tasks(doc):
         "doc_id": doc["id"],
         "from_page": 0,
         "to_page": -1,
-        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing For Tree-Organized Retrieval)."
+        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)."
     }

     task = new_task()
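For clarity, the task payload assembled in the hunk above can be sketched as a standalone function. This is an illustrative reconstruction, not RAGFlow's actual code: the builder function is hypothetical, and only the field names and the page range come from the diff (`to_page=-1` appears to denote "through the last page").

```python
# Hypothetical sketch of the RAPTOR task payload shown in the diff above.
# Only the field names and values are taken from the source; the builder
# function itself is illustrative.
def build_raptor_task(doc: dict) -> dict:
    return {
        "doc_id": doc["id"],
        "from_page": 0,   # start at the first page
        "to_page": -1,    # -1 appears to mean "process through the last page"
        "progress_msg": "Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).",
    }

task = build_raptor_task({"id": "doc-123"})
```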
docs/configurations.md CHANGED
@@ -136,37 +136,44 @@ If you cannot download the RAGFlow Docker image, try the following mirrors.
 
 [service_conf.yaml](https://github.com/infiniflow/ragflow/blob/main/docker/service_conf.yaml) specifies the system-level configuration for RAGFlow and is used by its API server and task executor.
 
-- `ragflow`
-  - `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
-  - `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
-
-- `mysql`
-  - `name`: The MySQL database name. Defaults to `rag_flow`.
-  - `user`: The username for MySQL.
-  - `password`: The password for MySQL. When updated, you must revise the `MYSQL_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
-  - `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
-  - `stale_timeout`: Timeout in seconds.
-
-- `minio`
-  - `user`: The username for MinIO. When updated, you must revise the `MINIO_USER` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `password`: The password for MinIO. When updated, you must revise the `MINIO_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
-  - `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
-
-- `oauth`
-  The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
-  - `github`: The GitHub authentication settings for your application. Visit the [Github Developer Settings](https://github.com/settings/developers) page to obtain your client_id and secret_key.
-
-- `user_default_llm`
-  The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
-  - `factory`: The LLM supplier. Available options:
-    - `"OpenAI"`
-    - `"DeepSeek"`
-    - `"Moonshot"`
-    - `"Tongyi-Qianwen"`
-    - `"VolcEngine"`
-    - `"ZHIPU-AI"`
-  - `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
+### `ragflow`
+
+- `host`: The API server's IP address inside the Docker container. Defaults to `0.0.0.0`.
+- `port`: The API server's serving port inside the Docker container. Defaults to `9380`.
+
+### `mysql`
+
+- `name`: The MySQL database name. Defaults to `rag_flow`.
+- `user`: The username for MySQL.
+- `password`: The password for MySQL. When updated, you must revise the `MYSQL_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `port`: The MySQL serving port inside the Docker container. Defaults to `3306`.
+- `max_connections`: The maximum number of concurrent connections to the MySQL database. Defaults to `100`.
+- `stale_timeout`: Timeout in seconds.
+
+### `minio`
+
+- `user`: The username for MinIO. When updated, you must revise the `MINIO_USER` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `password`: The password for MinIO. When updated, you must revise the `MINIO_PASSWORD` variable in [.env](https://github.com/infiniflow/ragflow/blob/main/docker/.env) accordingly.
+- `host`: The MinIO serving IP *and* port inside the Docker container. Defaults to `minio:9000`.
+
+### `oauth`
+
+The OAuth configuration for signing up or signing in to RAGFlow using a third-party account. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
+
+- `github`: The GitHub authentication settings for your application. Visit the [GitHub Developer Settings](https://github.com/settings/developers) page to obtain your client_id and secret_key.
+
+### `user_default_llm`
+
+The default LLM to use for a new RAGFlow user. It is disabled by default. To enable this feature, uncomment the corresponding lines in **service_conf.yaml**.
+
+- `factory`: The LLM supplier. Available options:
+  - `"OpenAI"`
+  - `"DeepSeek"`
+  - `"Moonshot"`
+  - `"Tongyi-Qianwen"`
+  - `"VolcEngine"`
+  - `"ZHIPU-AI"`
+- `api_key`: The API key for the specified LLM. You will need to apply for your model API key online.
 
 :::tip NOTE
 If you do not set the default LLM here, configure the default LLM on the **Settings** page in the RAGFlow UI.
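The defaults documented above can be overlaid with user-supplied values. A minimal Python sketch, assuming a hypothetical `resolve` helper (not part of RAGFlow) and using only the defaults stated in the docs:

```python
# Documented defaults from service_conf.yaml (per the docs above).
DEFAULTS = {
    "ragflow": {"host": "0.0.0.0", "port": 9380},
    "mysql": {"name": "rag_flow", "port": 3306, "max_connections": 100},
    "minio": {"host": "minio:9000"},
}

def resolve(section: str, user_conf: dict) -> dict:
    """Merge user-supplied settings for a section over the documented defaults.

    `resolve` is a hypothetical helper for illustration; RAGFlow's actual
    config loading may differ.
    """
    merged = dict(DEFAULTS.get(section, {}))
    merged.update(user_conf.get(section, {}))
    return merged

# A user override of the API port keeps the default host.
conf = resolve("ragflow", {"ragflow": {"port": 9381}})
```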
docs/guides/configure_knowledge_base.md CHANGED
@@ -52,13 +52,13 @@ RAGFlow offers multiple chunking template to facilitate chunking files of differ
 | Picture | | JPEG, JPG, PNG, TIF, GIF |
 | One | The entire document is chunked as one. | DOCX, EXCEL, PDF, TXT |
 
-You can also change the chunk template for a particular file on the **Datasets** page.
+You can also change the chunk template for a particular file on the **Datasets** page.
 
 ![change chunk method](https://github.com/infiniflow/ragflow/assets/93570324/ac116353-2793-42b2-b181-65e7082bed42)
 
 ### Select embedding model
 
-An embedding model builds vector index on file chunks. Once you have chosen an embedding model and used it to parse a file, you are no longer allowed to change it. To switch to a different embedding model, you *must* delete all completed file chunks in the knowledge base. The obvious reason is that we must *ensure* that all files in a specific knowledge base are parsed using the *same* embedding model (ensure that they are compared in the same embedding space).
+An embedding model converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all chunks in the knowledge base. The obvious reason is that we *must* ensure that files in a specific knowledge base are converted to embeddings using the *same* embedding model (ensure that they are compared in the same embedding space).
 
 The following embedding models can be deployed locally:
 
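The "same embedding space" requirement can be illustrated with cosine similarity: a similarity score is only meaningful between vectors produced by the same model. A minimal sketch with made-up vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Chunks embedded by the SAME model live in one space, so scores are comparable.
v1 = [0.1, 0.9, 0.2]
v2 = [0.1, 0.8, 0.3]
sim = cosine(v1, v2)  # ≈ 0.99: the two chunks are close in this space

# A vector from a DIFFERENT model may point anywhere relative to v1; the
# formula still returns a number, but that number carries no meaning, which
# is why all chunks in a knowledge base must use one embedding model.
```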
web/src/locales/en.ts CHANGED
@@ -157,14 +157,14 @@ export default {
   delimiter: `Delimiter`,
   html4excel: 'Excel to HTML',
   html4excelTip: `Excel will be parsed into HTML table or not. If it's FALSE, every row in Excel will be formed as a chunk.`,
-  autoKeywords: 'Auto keywords',
-  autoKeywordsTip: `Extract N keywords for every chunk to boost their rank score while querying such keywords. Extra tokens will be comsumed for LLM that you set in 'System model settings'. You can check the result in the chunk list.`,
-  autoQuestions: 'Auto questions',
-  autoQuestionsTip: `Extract N questions for every chunk to boost their rank score while querying such questions. Extra tokens will be comsumed for LLM that you set in 'System model settings'. You can check the result in the chunk list. This function will not destroy the entire chunking process if errors occur except adding empty result to the original chunk.`,
+  autoKeywords: 'Auto-keyword',
+  autoKeywordsTip: `Extract N keywords for each chunk to improve their ranking for queries containing those keywords. You can check or update the added keywords for a chunk from the chunk list. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
+  autoQuestions: 'Auto-question',
+  autoQuestionsTip: `Extract N questions for each chunk to improve their ranking for queries containing those questions. You can check or update the added questions for a chunk from the chunk list. This feature will not disrupt the chunking process if an error occurs, except that it may add an empty result to the original chunk. Be aware that extra tokens will be consumed by the LLM specified in 'System model settings'.`,
   },
   knowledgeConfiguration: {
     titleDescription:
-      'Update your knowledge base details especially parsing method here.',
+      'Update your knowledge base configurations here, particularly the chunk method.',
     name: 'Knowledge base name',
     photo: 'Knowledge base photo',
     description: 'Description',
@@ -176,13 +176,13 @@
     chunkTokenNumber: 'Chunk token number',
     chunkTokenNumberMessage: 'Chunk token number is required',
     embeddingModelTip:
-      "The embedding model used to embedding chunks. It's unchangable once the knowledgebase has chunks. You need to delete all the chunks if you want to change it.",
+      "The model that converts chunks into embeddings. It cannot be changed once the knowledge base has chunks. To switch to a different embedding model, you must delete all chunks in the knowledge base.",
     permissionsTip:
-      "If the permission is 'Team', all the team member can manipulate the knowledgebase.",
+      "If set to 'Team', all team members will be able to manage the knowledge base.",
     chunkTokenNumberTip:
-      'It determine the token number of a chunk approximately.',
+      'It sets the token threshold for a chunk. A paragraph with fewer tokens than this threshold will be combined with the following paragraph until the token count exceeds the threshold, at which point a chunk is created.',
     chunkMethod: 'Chunk method',
-    chunkMethodTip: 'The instruction is at right.',
+    chunkMethodTip: 'Tips are on the right.',
     upload: 'Upload',
     english: 'English',
     chinese: 'Chinese',
@@ -192,11 +192,11 @@
     me: 'Only me',
     team: 'Team',
     cancel: 'Cancel',
-    methodTitle: 'Chunking Method Description',
+    methodTitle: 'Chunk method description',
     methodExamples: 'Examples',
     methodExamplesDescription:
-      'The following screenshots are presented to facilitate understanding.',
-    dialogueExamplesTitle: 'Dialogue Examples',
+      'The following screenshots are provided for clarity.',
+    dialogueExamplesTitle: 'Dialogue examples',
     methodEmpty:
      'This will display a visual explanation of the knowledge base categories',
     book: `<p>Supported file formats are <b>DOCX</b>, <b>PDF</b>, <b>TXT</b>.</p><p>
@@ -208,8 +208,7 @@
     The chunk granularity is consistent with 'ARTICLE', and all the upper level text will be included in the chunk.
     </p>`,
     manual: `<p>Only <b>PDF</b> is supported.</p><p>
-    We assume manual has hierarchical section structure. We use the lowest section titles as pivots to slice documents.
-    So, the figures and tables in the same section will not be sliced apart, and chunk size might be large.
+    We assume that the manual has a hierarchical section structure, using the lowest section titles as the basic unit for chunking documents. Therefore, figures and tables in the same section will not be separated, which may result in larger chunk sizes.
     </p>`,
     naive: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML, HTML</b>.</p>
     <p>This method apply the naive ways to chunk files: </p>
@@ -292,7 +291,7 @@ Successive text will be sliced into pieces each of which is around 512 token num
     Mind the entiry type you need to specify.</p>`,
     useRaptor: 'Use RAPTOR to enhance retrieval',
     useRaptorTip:
-      'Recursive Abstractive Processing for Tree-Organized Retrieval, please refer to https://huggingface.co/papers/2401.18059',
+      'Recursive Abstractive Processing for Tree-Organized Retrieval, see https://huggingface.co/papers/2401.18059 for more information',
     prompt: 'Prompt',
     promptTip: 'LLM prompt used for summarization.',
     promptMessage: 'Prompt is required',
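The behavior described in the `chunkTokenNumberTip` string above — merging paragraphs until the accumulated token count crosses the threshold — can be sketched as follows. This is a hypothetical illustration using whitespace-separated words as a stand-in for real tokens; it is not RAGFlow's chunking implementation.

```python
def merge_paragraphs(paragraphs, threshold):
    """Greedily merge paragraphs into chunks: a chunk is emitted as soon as
    the accumulated token count reaches the threshold. Whitespace-separated
    words stand in for the real tokenizer here (illustrative only)."""
    chunks, buffer, count = [], [], 0
    for para in paragraphs:
        buffer.append(para)
        count += len(para.split())
        if count >= threshold:
            chunks.append("\n".join(buffer))
            buffer, count = [], 0
    if buffer:  # flush trailing paragraphs that never reached the threshold
        chunks.append("\n".join(buffer))
    return chunks
```

With a threshold of 3, `["a b", "c d", "e f g"]` merges the first two short paragraphs into one chunk and leaves the third on its own.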