writinwaters committed on
Commit 7ddda98 · 1 Parent(s): 3366eac

Added chunk methods (#3110)

### What problem does this PR solve?



### Type of change


- [x] Documentation Update

api/http_api_reference.md CHANGED
@@ -88,6 +88,7 @@ curl --request POST \
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email
 
  - `"parser_config"`: (*Body parameter*), `object`
  The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
@@ -100,7 +101,7 @@ curl --request POST \
  - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
  - If `"chunk_method"` is `"qa"`, `"manual"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
  - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
- - If `"chunk_method"` is `"table"` or `"one"`, `"parser_config"` is an empty JSON object.
+ - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
  - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
  - `"chunk_token_count"`: Defaults to `128`.
  - `"delimiter"`: Defaults to `"\n!?。;!?"`.
@@ -517,6 +518,7 @@ curl --request PUT \
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email
  - `"parser_config"`: (*Body parameter*), `object`
  The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
  - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes:
@@ -528,7 +530,7 @@ curl --request PUT \
  - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
  - If `"chunk_method"` is `"qa"`, `"manual"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
  - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
- - If `"chunk_method"` is `"table"` or `"one"`, `"parser_config"` is an empty JSON object.
+ - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
  - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
  - `"chunk_token_count"`: Defaults to `128`.
  - `"delimiter"`: Defaults to `"\n!?。;!?"`.
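The `"parser_config"` rules in the HTTP reference can be sketched as a small request-body builder. This is a minimal illustration, not the project's client code: `build_dataset_body` and the `"name"` field are hypothetical, the spelling `"manual"` is assumed to be the intended value, and only the two rule groups fully visible in this diff are encoded.

```python
import json

# Methods the updated reference lists as taking an empty "parser_config".
EMPTY_CONFIG_METHODS = {"table", "picture", "one", "email"}
# Methods the reference lists as taking only the "raptor" attribute.
# The spelling "manual" is an assumption about the intended value.
RAPTOR_ONLY_METHODS = {"qa", "manual", "paper", "book", "laws", "presentation"}


def build_dataset_body(name, chunk_method):
    """Hypothetical helper: assemble a JSON body for the dataset endpoints."""
    # "name" is an assumed body field; it does not appear in this diff.
    body = {"name": name, "chunk_method": chunk_method}
    if chunk_method in EMPTY_CONFIG_METHODS:
        body["parser_config"] = {}
    elif chunk_method in RAPTOR_ONLY_METHODS:
        body["parser_config"] = {"raptor": {"use_raptor": False}}
    # "naive" and "knowledge_graph" are omitted so the server defaults apply.
    return body


print(json.dumps(build_dataset_body("mail_archive", "email")))
```

Leaving `"parser_config"` out for the remaining methods avoids restating defaults that this diff shows only in part.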
api/python_api_reference.md CHANGED
@@ -75,12 +75,13 @@ The chunking method of the dataset to create. Available options:
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email
 
  #### parser_config
 
- The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `"chunk_method"`:
+ The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `chunk_method`:
 
- - `"chunk_method"`=`"naive"`:
+ - `chunk_method`=`"naive"`:
  `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
  - `chunk_method`=`"qa"`:
  `{"raptor": {"user_raptor": False}}`
@@ -94,12 +95,16 @@ The parser configuration of the dataset. A `ParserConfig` object's attributes va
  `{"raptor": {"user_raptor": False}}`
  - `chunk_method`=`"laws"`:
  `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"picture"`:
+ `None`
  - `chunk_method`=`"presentation"`:
  `{"raptor": {"user_raptor": False}}`
  - `chunk_method`=`"one"`:
  `None`
  - `chunk_method`=`"knowledge-graph"`:
  `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+ - `chunk_method`=`"email"`:
+ `None`
 
  ### Returns
 
@@ -322,6 +327,7 @@ A dictionary representing the attributes to update, with the following keys:
  - `"picture"`: Picture
  - `"one"`: One
  - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email
  - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document. Its attributes vary based on the selected `"chunk_method"`:
  - `"chunk_method"`=`"naive"`:
  `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
@@ -339,10 +345,14 @@ A dictionary representing the attributes to update, with the following keys:
  `{"raptor": {"user_raptor": False}}`
  - `chunk_method`=`"presentation"`:
  `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"picture"`:
+ `None`
  - `chunk_method`=`"one"`:
  `None`
  - `chunk_method`=`"knowledge-graph"`:
  `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+ - `chunk_method`=`"email"`:
+ `None`
 
  ### Returns
 
@@ -475,10 +485,14 @@ A `Document` object contains the following attributes:
  `{"raptor": {"user_raptor": False}}`
  - `chunk_method`=`"presentation"`:
  `{"raptor": {"user_raptor": False}}`
+ - `chunk_method`=`"picture"`:
+ `None`
  - `chunk_method`=`"one"`:
  `None`
  - `chunk_method`=`"knowledge-graph"`:
  `{"chunk_token_num":128,"delimiter": "\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+ - `chunk_method`=`"email"`:
+ `None`
 
  ### Examples
 
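The per-method defaults listed in the Python reference can be read as a lookup table. The sketch below transcribes only the entries visible in these hunks. `documented_parser_config` is a hypothetical helper, not part of the SDK, and the key spellings `"user_raptor"` and `"knowledge-graph"` are copied as documented rather than verified against the library.

```python
import copy

# Defaults transcribed from the Python reference hunks; spellings such as
# "user_raptor" and "knowledge-graph" are copied as documented, not verified.
DOCUMENTED_DEFAULTS = {
    "qa": {"raptor": {"user_raptor": False}},
    "laws": {"raptor": {"user_raptor": False}},
    "presentation": {"raptor": {"user_raptor": False}},
    "picture": None,
    "one": None,
    "email": None,
    "knowledge-graph": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",
        "entity_types": ["organization", "person", "location", "event", "time"],
    },
}


def documented_parser_config(chunk_method):
    """Hypothetical lookup: return a copy of the documented default, or None."""
    if chunk_method not in DOCUMENTED_DEFAULTS:
        raise KeyError(f"no default transcribed for {chunk_method!r}")
    # Deep-copy so callers can edit the result without touching the table.
    return copy.deepcopy(DOCUMENTED_DEFAULTS[chunk_method])


for method in ("picture", "email", "qa"):
    print(method, documented_parser_config(method))
```

Returning a deep copy keeps later edits to one dataset's configuration from leaking into the shared table.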