writinwaters committed
Commit ce611dd · Parent: 2ace2a9

Updated HTTP API reference and Python API reference based on test results (#3090)


### What problem does this PR solve?



### Type of change


- [x] Documentation Update

api/http_api_reference.md CHANGED
@@ -94,8 +94,10 @@ curl --request POST \
  The configuration settings for the dataset parser, a JSON object containing the following attributes:
  - `"chunk_token_count"`: Defaults to `128`.
  - `"layout_recognize"`: Defaults to `true`.
+ - `"html4excel"`: Indicates whether to convert Excel documents into HTML format. Defaults to `false`.
  - `"delimiter"`: Defaults to `"\n!?。;!?"`.
- - `"task_page_size"`: Defaults to `12`.
+ - `"task_page_size"`: Defaults to `12`. For PDF only.
+ - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.

  ### Response

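The hunk above adds three `parser_config` attributes. A minimal sketch of how they might be supplied when creating a dataset, assuming the `POST /api/v1/datasets` route and the Bearer-token auth used by the curl examples in this reference; the server address, API key, and dataset name are placeholders:

```python
import requests

BASE_URL = "http://localhost:9380"  # placeholder server address
API_KEY = "YOUR_API_KEY"            # placeholder API key

# Create a dataset whose parser config exercises the newly documented
# attributes: html4excel, the PDF-only task_page_size, and raptor.
response = requests.post(
    f"{BASE_URL}/api/v1/datasets",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "name": "example_dataset",
        "parser_config": {
            "chunk_token_count": 128,
            "layout_recognize": True,
            "html4excel": False,
            "delimiter": "\n!?。;!?",
            "task_page_size": 12,            # honored for PDF documents only
            "raptor": {"use_raptor": False},
        },
    },
)
print(response.json())
```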
@@ -177,7 +179,7 @@ curl --request DELETE \

  #### Request parameters

- - `"ids"`: (*Body parameter*), `list[string]`
+ - `"ids"`: (*Body parameter*), `list[string]`
  The IDs of the datasets to delete. If it is not specified, all datasets will be deleted.

  ### Response
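A brief sketch of the delete call this `"ids"` parameter belongs to, assuming the `DELETE /api/v1/datasets` route from the hunk header; credentials and IDs are placeholders. Note the delete-all behavior when `ids` is omitted:

```python
import requests

BASE_URL = "http://localhost:9380"  # placeholder
API_KEY = "YOUR_API_KEY"            # placeholder

# Delete two specific datasets. Omitting "ids" from the body would
# delete ALL datasets, so always pass explicit IDs in normal use.
response = requests.delete(
    f"{BASE_URL}/api/v1/datasets",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"ids": ["dataset_id_1", "dataset_id_2"]},  # placeholder IDs
)
print(response.json())
```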
@@ -241,7 +243,7 @@ curl --request PUT \
  - `"embedding_model"`: (*Body parameter*), `string`
  The updated embedding model name.
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
- - `"chunk_method"`: (*Body parameter*), `enum<string>`
+ - `"chunk_method"`: (*Body parameter*), `enum<string>`
  The chunking method for the dataset. Available options:
  - `"naive"`: General
  - `"manual`: Manual
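To make the `chunk_count` precondition concrete, a hedged sketch of a dataset update, assuming the `PUT /api/v1/datasets/{dataset_id}` shape implied by the hunk header; the model name and IDs are placeholders:

```python
import requests

BASE_URL = "http://localhost:9380"     # placeholder
API_KEY = "YOUR_API_KEY"               # placeholder
DATASET_ID = "dataset_id_placeholder"

# Switching the embedding model is only valid while the dataset holds
# no chunks (chunk_count == 0); chunk_method takes one of the enum
# values listed above, e.g. "naive" for the General method.
response = requests.put(
    f"{BASE_URL}/api/v1/datasets/{DATASET_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "embedding_model": "example-embedding-model",  # placeholder name
        "chunk_method": "naive",
    },
)
print(response.json())
```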
@@ -510,12 +512,12 @@ curl --request PUT \
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
  - `"email"`: Email
- - `"parser_config"`: (*Body parameter*), `object`
+ - `"parser_config"`: (*Body parameter*), `object`
  The parsing configuration for the document:
  - `"chunk_token_count"`: Defaults to `128`.
  - `"layout_recognize"`: Defaults to `true`.
  - `"delimiter"`: Defaults to `"\n!?。;!?"`.
- - `"task_page_size"`: Defaults to `12`.
+ - `"task_page_size"`: Defaults to `12`. For PDF only.

  ### Response

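Since `"task_page_size"` is now documented as PDF-only, a hedged sketch of re-configuring a document's parser, assuming a `PUT /api/v1/datasets/{dataset_id}/documents/{document_id}` route consistent with the hunk header; all IDs are placeholders:

```python
import requests

BASE_URL = "http://localhost:9380"       # placeholder
API_KEY = "YOUR_API_KEY"                 # placeholder
DATASET_ID = "dataset_id_placeholder"
DOCUMENT_ID = "document_id_placeholder"

# task_page_size controls how many pages are grouped into one parsing
# task; per the updated docs it is ignored for non-PDF documents.
response = requests.put(
    f"{BASE_URL}/api/v1/datasets/{DATASET_ID}/documents/{DOCUMENT_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"parser_config": {"task_page_size": 24}},
)
print(response.json())
```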
@@ -718,7 +720,7 @@ curl --request DELETE \

  - `dataset_id`: (*Path parameter*)
  The associated dataset ID.
- - `"ids"`: (*Body parameter*), `list[string]`
+ - `"ids"`: (*Body parameter*), `list[string]`
  The IDs of the documents to delete. If it is not specified, all documents in the specified dataset will be deleted.

  ### Response
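The same pattern for documents, combining the `dataset_id` path parameter with the optional `"ids"` body; a sketch assuming a `DELETE /api/v1/datasets/{dataset_id}/documents` route:

```python
import requests

BASE_URL = "http://localhost:9380"     # placeholder
API_KEY = "YOUR_API_KEY"               # placeholder
DATASET_ID = "dataset_id_placeholder"

# With "ids" omitted from the body, every document in the dataset is
# deleted; pass explicit document IDs to delete selectively.
response = requests.delete(
    f"{BASE_URL}/api/v1/datasets/{DATASET_ID}/documents",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"ids": ["document_id_placeholder"]},
)
print(response.json())
```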
@@ -1169,7 +1171,7 @@ Failure:

  ## Retrieve chunks

- **GET** `/api/v1/retrieval`
+ **POST** `/api/v1/retrieval`

  Retrieves chunks from specified datasets.

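Because the method changed from GET to POST, the retrieval query now travels in a JSON body. A hedged sketch; the field names (`question`, `dataset_ids`) are assumptions based on the full reference, not shown in this diff:

```python
import requests

BASE_URL = "http://localhost:9380"  # placeholder
API_KEY = "YOUR_API_KEY"            # placeholder

# POST instead of GET: the query is carried in the JSON body rather
# than in URL query parameters. Field names are assumed; see lead-in.
response = requests.post(
    f"{BASE_URL}/api/v1/retrieval",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "question": "What does task_page_size control?",
        "dataset_ids": ["dataset_id_placeholder"],
    },
)
print(response.json())
```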
 
api/python_api_reference.md CHANGED
@@ -1253,7 +1253,7 @@ Asks a question to start an AI-powered conversation.

  #### question: `str` *Required*

- The question to start an AI chat.
+ The question to start an AI-powered conversation.

  #### stream: `bool`

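For context, a minimal sketch of the corrected `question` parameter in use, assuming the `ragflow_sdk` client, chat, and session objects documented elsewhere in this reference; names and addresses are placeholders:

```python
from ragflow_sdk import RAGFlow

# Placeholder credentials and server address.
rag = RAGFlow(api_key="YOUR_API_KEY", base_url="http://localhost:9380")

# Assumes a chat assistant named "example_chat" already exists.
assistant = rag.list_chats(name="example_chat")[0]
session = assistant.create_session()

# question is required; stream=True yields Message objects incrementally.
for message in session.ask("What is RAPTOR?", stream=True):
    print(message.content)
```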
@@ -1286,7 +1286,7 @@ A list of `Chunk` objects representing references to the message, each containin
  - `content` `str`
  The content of the chunk.
  - `image_id` `str`
- The ID of the snapshot of the chunk.
+ The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
  - `document_id` `str`
  The ID of the referenced document.
  - `document_name` `str`
@@ -1295,14 +1295,13 @@ A list of `Chunk` objects representing references to the message, each containin
  The location information of the chunk within the referenced document.
  - `dataset_id` `str`
  The ID of the dataset to which the referenced document belongs.
- - `similarity` `float`
- A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity.
+ - `similarity` `float`
+ A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`.
  - `vector_similarity` `float`
  A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
  - `term_similarity` `float`
  A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.

-
  ### Examples

  ```python
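A tiny worked illustration of the weighted-sum relationship now documented for `similarity`; the weights below are assumed for the example only, as the actual weighting is configured server-side:

```python
# Illustration only: similarity is a weighted sum of the two component
# scores. The 0.3/0.7 split below is an assumed example weighting.
vector_similarity = 0.82
term_similarity = 0.55
vector_weight = 0.3                # assumed for illustration
term_weight = 1 - vector_weight

similarity = vector_weight * vector_similarity + term_weight * term_similarity
print(round(similarity, 3))  # 0.3 * 0.82 + 0.7 * 0.55 = 0.631
```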
 