writinwaters committed on
Commit 7ddda98
Parent(s): 3366eac

Added chunk methods (#3110)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update

- api/http_api_reference.md +4 -2
- api/python_api_reference.md +16 -2
api/http_api_reference.md
@@ -88,6 +88,7 @@ curl --request POST \
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 
 - `"parser_config"`: (*Body parameter*), `object`
 The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
@@ -100,7 +101,7 @@ curl --request POST \
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
 - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
-- If `"chunk_method"` is `"table"` or `"
+- If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
 - `"chunk_token_count"`: Defaults to `128`.
 - `"delimiter"`: Defaults to `"\n!?。;!?"`.
@@ -517,6 +518,7 @@ curl --request PUT \
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 - `"parser_config"`: (*Body parameter*), `object`
 The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
 - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes:
@@ -528,7 +530,7 @@ curl --request PUT \
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
 - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
-- If `"chunk_method"` is `"table"` or `"
+- If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
 - `"chunk_token_count"`: Defaults to `128`.
 - `"delimiter"`: Defaults to `"\n!?。;!?"`.
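The HTTP reference now states that `"table"`, `"picture"`, `"one"`, and `"email"` take an empty `"parser_config"` object on both the create (POST) and update (PUT) requests. The following is a minimal Python sketch of a request body that follows that rule. It makes several assumptions that this diff does not confirm:

- The helper name `build_dataset_request` is hypothetical.
- The `/api/v1/datasets` path, the Bearer `Authorization` header, and the example base URL are illustrative assumptions.
- When no `parser_config` is passed for other methods, the sketch omits the field, assuming the server then applies its documented defaults.

The request is only built here, never sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Per the updated reference, these chunk methods take an empty parser_config object.
EMPTY_CONFIG_METHODS = {"table", "picture", "one", "email"}


def build_dataset_request(
    base_url: str,
    api_key: str,
    name: str,
    chunk_method: str,
    parser_config: Optional[Dict[str, Any]] = None,
    dataset_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build, but do not send, a create (POST) or update (PUT) dataset request.

    Hypothetical helper: the /api/v1/datasets path and the Bearer header are
    assumptions for illustration, not taken from this diff.
    """
    body: Dict[str, Any] = {"name": name, "chunk_method": chunk_method}
    if parser_config is not None:
        body["parser_config"] = parser_config
    elif chunk_method in EMPTY_CONFIG_METHODS:
        # Documented as an empty JSON object for these methods.
        body["parser_config"] = {}
    # Otherwise parser_config is omitted (assumption: the server fills in defaults).

    if dataset_id is None:
        url, method = f"{base_url}/api/v1/datasets", "POST"
    else:
        url, method = f"{base_url}/api/v1/datasets/{dataset_id}", "PUT"

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method=method,
    )
```

Consider `build_dataset_request("http://127.0.0.1:9380", "<YOUR_API_KEY>", "mail_archive", "email")`. It yields a POST whose JSON body carries `"parser_config": {}`. Passing the result to `urllib.request.urlopen` would send it.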
api/python_api_reference.md
@@ -75,12 +75,13 @@ The chunking method of the dataset to create. Available options:
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 
 #### parser_config
 
-The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `
+The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `chunk_method`:
 
-- `
+- `chunk_method`=`"naive"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
 - `chunk_method`=`"qa"`:
 `{"raptor": {"user_raptor": False}}`
@@ -94,12 +95,16 @@ The parser configuration of the dataset. A `ParserConfig` object's attributes va
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"laws"`:
 `{"raptor": {"user_raptor": False}}`
+- `chunk_method`=`"picture"`:
+`None`
 - `chunk_method`=`"presentation"`:
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"one"`:
 `None`
 - `chunk_method`=`"knowledge-graph"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+- `chunk_method`=`"email"`:
+`None`
 
 ### Returns
 
@@ -322,6 +327,7 @@ A dictionary representing the attributes to update, with the following keys:
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document. Its attributes vary based on the selected `"chunk_method"`:
 - `"chunk_method"`=`"naive"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
@@ -339,10 +345,14 @@ A dictionary representing the attributes to update, with the following keys:
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"presentation"`:
 `{"raptor": {"user_raptor": False}}`
+- `chunk_method`=`"picture"`:
+`None`
 - `chunk_method`=`"one"`:
 `None`
 - `chunk_method`=`"knowledge-graph"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+- `chunk_method`=`"email"`:
+`None`
 
 ### Returns
 
@@ -475,10 +485,14 @@ A `Document` object contains the following attributes:
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"presentation"`:
 `{"raptor": {"user_raptor": False}}`
+- `chunk_method`=`"picure"`:
+`None`
 - `chunk_method`=`"one"`:
 `None`
 - `chunk_method`=`"knowledge-graph"`:
 `{"chunk_token_num":128,"delimiter": "\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+- `chunk_method`=`"email"`:
+`None`
 
 ### Examples
 
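The Python reference changes map each `chunk_method` to a default `parser_config`. Among them, `"picture"`, `"one"`, and `"email"` map to `None`. The sketch below transcribes only the cases visible in this diff into a lookup table and builds an update dictionary from it. Several choices are assumptions rather than documented behaviour:

- `DEFAULT_PARSER_CONFIGS`, `default_parser_config`, and `build_update` are hypothetical names, not part of the SDK.
- The table uses the `use_raptor` spelling from the HTTP reference, whereas these Python reference lines show `user_raptor`.
- `"knowledge-graph"` and `"knowledge_graph"` are treated as the same method, because the reference writes both.

```python
import copy
from typing import Any, Dict, Optional

# Hypothetical lookup table transcribed from the Python reference lines in this
# diff. Methods whose defaults are not visible here are deliberately omitted.
# The "use_raptor" spelling follows the HTTP reference.
_RAPTOR_ONLY: Dict[str, Any] = {"raptor": {"use_raptor": False}}

DEFAULT_PARSER_CONFIGS: Dict[str, Optional[Dict[str, Any]]] = {
    "naive": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",  # copied verbatim from the reference
        "html4excel": False,
        "layout_recognize": True,
        "raptor": {"use_raptor": False},
    },
    "qa": _RAPTOR_ONLY,
    "laws": _RAPTOR_ONLY,
    "presentation": _RAPTOR_ONLY,
    "knowledge_graph": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",
        "entity_types": ["organization", "person", "location", "event", "time"],
    },
    # Documented as None: these methods take no parser configuration.
    "picture": None,
    "one": None,
    "email": None,
}


def _normalize(chunk_method: str) -> str:
    # The reference writes both "knowledge_graph" and "knowledge-graph".
    return chunk_method.replace("-", "_")


def default_parser_config(chunk_method: str) -> Optional[Dict[str, Any]]:
    """Return a fresh copy of the documented default, or None where documented as None."""
    key = _normalize(chunk_method)
    if key not in DEFAULT_PARSER_CONFIGS:
        raise ValueError(f"no default shown in this diff for chunk_method={chunk_method!r}")
    # Deep copy so callers can edit the result without touching the shared table.
    return copy.deepcopy(DEFAULT_PARSER_CONFIGS[key])


def build_update(chunk_method: str, **overrides: Any) -> Dict[str, Any]:
    """Build a {"chunk_method", "parser_config"} dict, applying top-level overrides."""
    config = default_parser_config(chunk_method)
    if config is None and overrides:
        raise ValueError(f"{chunk_method!r} takes no parser_config overrides")
    if config is not None:
        config.update(overrides)
    return {"chunk_method": _normalize(chunk_method), "parser_config": config}
```

Returning a deep copy matters because `"qa"`, `"laws"`, and `"presentation"` share one dictionary in the table. Without the copy, editing one caller's result would silently change the others. Passing the result to the SDK's update call is left out, since the exact method signature is not shown in this diff.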