Commit 02e5242 by writinwaters (parent: f859b0d)

Updated retrieval testing UI (#3433)

### What problem does this PR solve?
### Type of change
- [x] Documentation Update
docs/references/http_api_reference.md
CHANGED
@@ -1383,7 +1383,7 @@ curl --request POST \
 The maximum length of the model’s output, measured in the number of tokens (words or pieces of words). Defaults to `512`.
 - `"prompt"`: (*Body parameter*), `object`
 Instructions for the LLM to follow. If it is not explicitly set, a JSON object with the following values will be generated as the default. A `prompt` JSON object contains the following attributes:
-- `"similarity_threshold"`: `float` RAGFlow
+- `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
 - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
 - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
 - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
@@ -1518,7 +1518,7 @@ curl --request PUT \
 The maximum length of the model’s output, measured in the number of tokens (words or pieces of words). Defaults to `512`.
 - `"prompt"`: (*Body parameter*), `object`
 Instructions for the LLM to follow. A `prompt` object contains the following attributes:
-- `"similarity_threshold"`: `float` RAGFlow
+- `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
 - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
 - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
 - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
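For illustration, here is a minimal sketch of how a client might assemble the retrieval-related part of the `prompt` body parameter, starting from the documented defaults (`similarity_threshold` 0.2, `keywords_similarity_weight` 0.7, `top_n` 8). The constant `PROMPT_DEFAULTS` and the helper `build_prompt` are hypothetical and not part of RAGFlow; other `prompt` attributes such as `variables`, and the rest of the request body, are left out.

```python
import json

# Documented defaults for the retrieval-related `prompt` attributes.
# PROMPT_DEFAULTS is a hypothetical client-side constant, not a RAGFlow name.
PROMPT_DEFAULTS = {
    "similarity_threshold": 0.2,        # chunks scoring below this are excluded
    "keywords_similarity_weight": 0.7,  # weight of keyword similarity in the hybrid score
    "top_n": 8,                         # at most this many chunks are fed to the LLM
}


def build_prompt(**overrides):
    """Hypothetical helper: copy the documented defaults, then apply overrides."""
    return {**PROMPT_DEFAULTS, **overrides}


# Only the `prompt` part of the request body is shown here.
body = json.dumps({"prompt": build_prompt(top_n=4)})
print(body)
# {"prompt": {"similarity_threshold": 0.2, "keywords_similarity_weight": 0.7, "top_n": 4}}
```

Merging over a copy of the defaults mirrors the documented behavior that unset attributes fall back to their default values, without mutating the shared defaults.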
docs/references/python_api_reference.md
CHANGED
@@ -957,7 +957,7 @@ The LLM settings for the chat assistant to create. Defaults to `None`. When the
 
 Instructions for the LLM to follow. A `Prompt` object contains the following attributes:
 
-- `similarity_threshold`: `float` RAGFlow
+- `similarity_threshold`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
 - `keywords_similarity_weight`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
 - `top_n`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
 - `variables`: `list[dict[]]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
@@ -1015,7 +1015,7 @@ A dictionary representing the attributes to update, with the following keys:
 - `"frequency penalty"`, `float` Similar to presence penalty, this reduces the model’s tendency to repeat the same words.
 - `"max_token"`, `int` The maximum length of the model’s output, measured in the number of tokens (words or pieces of words).
 - `"prompt"` : Instructions for the LLM to follow.
-- `"similarity_threshold"`: `float` RAGFlow
+- `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
 - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
 - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
 - `"variables"`: `list[dict[]]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
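The interplay between `similarity_threshold` and `top_n` described in these attribute lists can be sketched as a filter followed by a ranking cut. This sketch assumes each chunk already carries a single hybrid similarity score; `select_chunks` is a hypothetical name, and RAGFlow's actual retrieval code may order or break ties differently. Following the wording "falls below this threshold", a chunk whose score exactly equals the threshold is kept.

```python
from typing import List, Tuple


def select_chunks(scored: List[Tuple[str, float]],
                  similarity_threshold: float = 0.2,
                  top_n: int = 8) -> List[Tuple[str, float]]:
    """Hypothetical sketch: drop chunks scoring below the threshold, then keep
    the `top_n` highest-scoring survivors, best first."""
    # Exclude any chunk whose hybrid score falls below the threshold.
    kept = [(chunk_id, score) for chunk_id, score in scored
            if score >= similarity_threshold]
    # Rank the survivors by score, highest first, and cut to top_n.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n]


chunks = [("a", 0.91), ("b", 0.15), ("c", 0.42), ("d", 0.20), ("e", 0.67)]
print(select_chunks(chunks, top_n=3))
# [('a', 0.91), ('e', 0.67), ('c', 0.42)]
```

Here chunk `b` is excluded by the 0.2 threshold before ranking, and `d` survives the threshold but is cut because only the top 3 are requested.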
web/src/locales/en.ts
CHANGED
@@ -102,15 +102,15 @@ export default {
 processDuration: 'Process Duration',
 progressMsg: 'Progress Msg',
 testingDescription:
-'
+'Conduct a retrieval test to check if RAGFlow can recover the intended content for the LLM.',
 similarityThreshold: 'Similarity threshold',
 similarityThresholdTip:
-"
+"RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This parameter sets the threshold for similarities between the user query and chunks. Any chunk with a similarity score below this threshold will be excluded from the results.",
 vectorSimilarityWeight: 'Keywords similarity weight',
 vectorSimilarityWeightTip:
-"
+"This sets the weight of keyword similarity in the combined similarity score, either used with vector cosine similarity or with reranking score. The total of the two weights must equal 1.0.",
 testText: 'Test text',
-testTextPlaceholder: '
+testTextPlaceholder: 'Input your question here!',
 testingLabel: 'Testing',
 similarity: 'Hybrid Similarity',
 termSimilarity: 'Term Similarity',
@@ -152,7 +152,7 @@ export default {
 cancel: 'Cancel',
 rerankModel: 'Rerank Model',
 rerankPlaceholder: 'Please select',
-rerankTip: `If
+rerankTip: `If left empty, RAGFlow will use a combination of weighted keyword similarity and weighted vector cosine similarity; if a rerank model is selected, a weighted reranking score will replace the weighted vector cosine similarity.`,
 topK: 'Top-K',
 topKTip: `K chunks will be fed into rerank models.`,
 delimiter: `Delimiter`,
@@ -277,7 +277,7 @@ export default {
 knowledgeGraph: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML</b>
 
 <p>This approach chunks files using the 'naive'/'General' method. It splits a document into segements and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</p>
-<p>The chunks are then fed to the LLM to extract
+<p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
 <p>Ensure that you set the <b>Entity types</b>.</p>`,
 useRaptor: 'Use RAPTOR to enhance retrieval',
 useRaptorTip:
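The new `rerankTip` and `vectorSimilarityWeightTip` strings describe a weighted sum whose second term is either vector cosine similarity or, when a rerank model is selected, a reranking score. Below is a minimal sketch, assuming a linear combination in which the keyword weight and the other weight total 1.0, and assuming the reranking score is on the same 0-to-1 scale as cosine similarity. `hybrid_similarity` is a hypothetical function, not RAGFlow's implementation.

```python
from typing import Optional


def hybrid_similarity(keyword_sim: float,
                      vector_cosine_sim: float,
                      keywords_similarity_weight: float = 0.7,
                      rerank_score: Optional[float] = None) -> float:
    """Hypothetical sketch of a weighted hybrid similarity score.

    Assumes the keyword weight and the weight of the second term sum to 1.0.
    If a rerank score is given (a rerank model is selected), it replaces the
    vector cosine similarity as the second term.
    """
    if not 0.0 <= keywords_similarity_weight <= 1.0:
        raise ValueError("keywords_similarity_weight must be between 0 and 1")
    second_term = rerank_score if rerank_score is not None else vector_cosine_sim
    other_weight = 1.0 - keywords_similarity_weight  # the two weights total 1.0
    return keywords_similarity_weight * keyword_sim + other_weight * second_term


# No rerank model: keyword similarity blended with vector cosine similarity.
print(round(hybrid_similarity(0.5, 0.9), 4))                    # 0.62
# Rerank model selected: the rerank score replaces the cosine term.
print(round(hybrid_similarity(0.5, 0.9, rerank_score=0.4), 4))  # 0.47
```

With the default keyword weight of 0.7, the first call computes 0.7 × 0.5 + 0.3 × 0.9 = 0.62; the second swaps in the rerank score, giving 0.7 × 0.5 + 0.3 × 0.4 = 0.47.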