writinwaters committed
Commit 02e5242 · 1 Parent(s): f859b0d

Updated retrieval testing UI (#3433)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update

docs/references/http_api_reference.md CHANGED
@@ -1383,7 +1383,7 @@ curl --request POST \
  The maximum length of the model’s output, measured in the number of tokens (words or pieces of words). Defaults to `512`.
  - `"prompt"`: (*Body parameter*), `object`
  Instructions for the LLM to follow. If it is not explicitly set, a JSON object with the following values will be generated as the default. A `prompt` JSON object contains the following attributes:
- - `"similarity_threshold"`: `float` RAGFlow uses a hybrid of weighted keyword similarity and vector cosine similarity during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
+ - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
  - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
  - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
  - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
@@ -1518,7 +1518,7 @@ curl --request PUT \
  The maximum length of the model’s output, measured in the number of tokens (words or pieces of words). Defaults to `512`.
  - `"prompt"`: (*Body parameter*), `object`
  Instructions for the LLM to follow. A `prompt` object contains the following attributes:
- - `"similarity_threshold"`: `float` RAGFlow uses a hybrid of weighted keyword similarity and vector cosine similarity during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
+ - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
  - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
  - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
  - `"variables"`: `object[]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
docs/references/python_api_reference.md CHANGED
@@ -957,7 +957,7 @@ The LLM settings for the chat assistant to create. Defaults to `None`. When the
 
  Instructions for the LLM to follow. A `Prompt` object contains the following attributes:
 
- - `similarity_threshold`: `float` RAGFlow uses a hybrid of weighted keyword similarity and vector cosine similarity during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
+ - `similarity_threshold`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
  - `keywords_similarity_weight`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
  - `top_n`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
  - `variables`: `list[dict[]]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
@@ -1015,7 +1015,7 @@ A dictionary representing the attributes to update, with the following keys:
  - `"frequency penalty"`, `float` Similar to presence penalty, this reduces the model’s tendency to repeat the same words.
  - `"max_token"`, `int` The maximum length of the model’s output, measured in the number of tokens (words or pieces of words).
  - `"prompt"` : Instructions for the LLM to follow.
- - `"similarity_threshold"`: `float` RAGFlow uses a hybrid of weighted keyword similarity and vector cosine similarity during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
+ - `"similarity_threshold"`: `float` RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted rerank score during retrieval. This argument sets the threshold for similarities between the user query and chunks. If a similarity score falls below this threshold, the corresponding chunk will be excluded from the results. The default value is `0.2`.
  - `"keywords_similarity_weight"`: `float` This argument sets the weight of keyword similarity in the hybrid similarity score with vector cosine similarity or reranking model similarity. By adjusting this weight, you can control the influence of keyword similarity in relation to other similarity measures. The default value is `0.7`.
  - `"top_n"`: `int` This argument specifies the number of top chunks with similarity scores above the `similarity_threshold` that are fed to the LLM. The LLM will *only* access these 'top N' chunks. The default value is `8`.
  - `"variables"`: `list[dict[]]` This argument lists the variables to use in the 'System' field of **Chat Configurations**. Note that:
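For orientation, the three retrieval-related `prompt` settings documented in the hunks above can be collected into a plain dictionary. The following is a minimal sketch only: the keys and default values (`0.2`, `0.7`, `8`) come from the documentation text, while `build_prompt_settings` is a hypothetical helper that is not part of the RAGFlow SDK, and the range checks are an assumption rather than documented behavior.

```python
# Defaults as stated in the documentation text above.
DEFAULT_PROMPT_SETTINGS = {
    "similarity_threshold": 0.2,
    "keywords_similarity_weight": 0.7,
    "top_n": 8,
}


def build_prompt_settings(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults.

    The validation rules below are assumptions made for this sketch;
    the source does not state which ranges RAGFlow enforces.
    """
    unknown = set(overrides) - set(DEFAULT_PROMPT_SETTINGS)
    if unknown:
        raise KeyError(f"Unknown prompt settings: {sorted(unknown)}")

    settings = {**DEFAULT_PROMPT_SETTINGS, **overrides}

    # Assumption: both float settings are expected to lie in [0, 1].
    for key in ("similarity_threshold", "keywords_similarity_weight"):
        if not 0.0 <= settings[key] <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1")

    # Assumption: at least one chunk should be passed to the LLM.
    if settings["top_n"] < 1:
        raise ValueError("top_n must be at least 1")

    return settings


if __name__ == "__main__":
    print(build_prompt_settings(top_n=4))
```

The resulting dictionary mirrors the shape of the `prompt` attributes described above; how it is passed to the API is left out because the endpoint details are not shown in this diff.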
web/src/locales/en.ts CHANGED
@@ -102,15 +102,15 @@ export default {
  processDuration: 'Process Duration',
  progressMsg: 'Progress Msg',
  testingDescription:
- 'Final step! After success, leave the rest to Infiniflow AI.',
+ 'Conduct a retrieval test to check if RAGFlow can recover the intended content for the LLM.',
  similarityThreshold: 'Similarity threshold',
  similarityThresholdTip:
- "We use hybrid similarity score to evaluate distance between two lines of text. It's weighted keywords similarity and vector cosine similarity. If the similarity between query and chunk is less than this threshold, the chunk will be filtered out.",
+ "RAGFlow employs either a combination of weighted keyword similarity and weighted vector cosine similarity, or a combination of weighted keyword similarity and weighted reranking score during retrieval. This parameter sets the threshold for similarities between the user query and chunks. Any chunk with a similarity score below this threshold will be excluded from the results.",
  vectorSimilarityWeight: 'Keywords similarity weight',
  vectorSimilarityWeightTip:
- " We use hybrid similarity score to evaluate distance between two lines of text. It's weighted keywords similarity and vector cosine similarity or rerank score(0~1). The sum of both weights is 1.0.",
+ "This sets the weight of keyword similarity in the combined similarity score, either used with vector cosine similarity or with reranking score. The total of the two weights must equal 1.0.",
  testText: 'Test text',
- testTextPlaceholder: 'Please input your question!',
+ testTextPlaceholder: 'Input your question here!',
  testingLabel: 'Testing',
  similarity: 'Hybrid Similarity',
  termSimilarity: 'Term Similarity',
@@ -152,7 +152,7 @@ export default {
  cancel: 'Cancel',
  rerankModel: 'Rerank Model',
  rerankPlaceholder: 'Please select',
- rerankTip: `If it's empty. It uses embeddings of query and chunks to compuste vector cosine similarity. Otherwise, it uses rerank score in place of vector cosine similarity.`,
+ rerankTip: `If left empty, RAGFlow will use a combination of weighted keyword similarity and weighted vector cosine similarity; if a rerank model is selected, a weighted reranking score will replace the weighted vector cosine similarity.`,
  topK: 'Top-K',
  topKTip: `K chunks will be fed into rerank models.`,
  delimiter: `Delimiter`,
@@ -277,7 +277,7 @@ export default {
  knowledgeGraph: `<p>Supported file formats are <b>DOCX, EXCEL, PPT, IMAGE, PDF, TXT, MD, JSON, EML</b>
 
  <p>This approach chunks files using the 'naive'/'General' method. It splits a document into segements and then combines adjacent segments until the token count exceeds the threshold specified by 'Chunk token number', at which point a chunk is created.</p>
- <p>The chunks are then fed to the LLM to extract nodes and relationships for a knowledge graph and a mind map.</p>
+ <p>The chunks are then fed to the LLM to extract entities and relationships for a knowledge graph and a mind map.</p>
  <p>Ensure that you set the <b>Entity types</b>.</p>`,
  useRaptor: 'Use RAPTOR to enhance retrieval',
  useRaptorTip:
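
The updated tooltips and API descriptions in this commit explain the retrieval score in words: a keyword similarity weight, a complementary weight for either vector cosine similarity or a reranking score, a threshold, and a `top_n` cut-off. The sketch below illustrates that description under stated assumptions. It is not RAGFlow's actual implementation: `hybrid_similarity` and `select_chunks` are hypothetical names, the input scores are assumed to be precomputed, and the complementary weight is taken to be `1 - keywords_similarity_weight` because the tooltip says the two weights must total 1.0.

```python
def hybrid_similarity(keyword_sim, other_sim, keywords_similarity_weight=0.7):
    """Combine keyword similarity with a second score.

    `other_sim` stands for vector cosine similarity, or for the reranking
    score when a rerank model is selected. The second weight is assumed to
    be the complement of the keyword weight so that both sum to 1.0.
    """
    other_weight = 1.0 - keywords_similarity_weight
    return keywords_similarity_weight * keyword_sim + other_weight * other_sim


def select_chunks(scored_chunks, similarity_threshold=0.2, top_n=8,
                  keywords_similarity_weight=0.7):
    """Filter and rank chunks as the documentation describes.

    `scored_chunks` is a list of (chunk_id, keyword_sim, other_sim) tuples.
    Chunks whose hybrid score falls below the threshold are excluded, and
    at most `top_n` of the remaining chunks are returned, best first.
    """
    kept = []
    for chunk_id, keyword_sim, other_sim in scored_chunks:
        score = hybrid_similarity(keyword_sim, other_sim,
                                  keywords_similarity_weight)
        if score >= similarity_threshold:
            kept.append((chunk_id, score))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    chunks = [("a", 0.9, 0.8), ("b", 0.1, 0.1), ("c", 0.5, 0.2)]
    for chunk_id, score in select_chunks(chunks):
        print(chunk_id, round(score, 2))
```

With the default weight of `0.7`, chunk `b` scores about 0.10 and is dropped by the `0.2` threshold, while `a` and `c` are kept and ordered by score. Swapping the second input from cosine similarity to a reranking score changes only what `other_sim` represents, which matches the `rerankTip` wording.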