writinwaters committed on
Commit
ad355eb
·
1 Parent(s): c4c45b6

Draft: Updated file management-related APIs (#2882)

### What problem does this PR solve?

Updated file management-related APIs

### Type of change

- [x] Documentation Update

Files changed (1)
  1. api/python_api_reference.md +106 -70
api/python_api_reference.md CHANGED
@@ -3,7 +3,7 @@
3
  **THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**
4
 
5
  :::tip NOTE
6
- Knowledgebase APIs
7
  :::
8
 
9
  ## Create knowledge base
@@ -232,38 +232,46 @@ Updates the current knowledge base.
232
  from ragflow import RAGFlow
233
 
234
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
235
- ds = rag.list_datasets(name="kb_1")
236
- ds.update({"embedding_model":"BAAI/bge-zh-v1.5", "parse_method":"manual"})
237
  ```
 
238
  ---
239
 
240
  :::tip API GROUPING
241
- File management inside knowledge base
242
  :::
243
 
244
- ## Upload document
245
 
246
  ```python
247
  DataSet.upload_documents(document_list: list[dict])
248
  ```
249
 
 
 
250
  ### Parameters
251
 
252
- #### document_list:`list[dict]`
253
- A list composed of dicts containing `name` and `blob`.
 
254
 
 
 
 
255
 
256
  ### Returns
257
- no return
 
 
258
 
259
  ### Examples
260
- ```python
261
- from ragflow import RAGFlow
262
 
263
- rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
264
- ds = rag.create_dataset(name="kb_1")
265
- ds.upload_documents([{name="1.txt", blob="123"}, ...] }
266
  ```
 
267
  ---
268
 
269
  ## Update document
@@ -272,14 +280,18 @@ ds.upload_documents([{name="1.txt", blob="123"}, ...] }
272
  Document.update(update_message:dict)
273
  ```
274
 
 
 
275
  ### Parameters
276
 
277
- #### update_message:`dict`
278
- only `name`,`parser_config`,`parser_method` can be changed
 
279
 
280
  ### Returns
281
 
282
- no return
 
283
 
284
  ### Examples
285
 
@@ -287,11 +299,11 @@ no return
287
  from ragflow import RAGFlow
288
 
289
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
290
- ds=rag.list_datasets(id='id')
291
- ds=ds[0]
292
- doc = ds.list_documents(id="wdfxb5t547d")
293
  doc = doc[0]
294
- doc.update([{"parser_method": "manual"...}])
295
  ```
296
 
297
  ---
@@ -330,46 +342,49 @@ Dataset.list_documents(id:str =None, keywords: str=None, offset: int=0, limit:in
330
 
331
  ### Parameters
332
 
333
- #### id: `str`
334
 
335
  The ID of the document to retrieve.
336
 
337
- #### keywords: `str`
338
 
339
  List documents whose name has the given keywords. Defaults to `None`.
340
 
341
- #### offset: `int`
342
 
343
  The beginning number of records for paging. Defaults to `0`.
344
 
345
- #### limit: `int`
346
 
347
- Records number to return, -1 means all of them. Records number to return, -1 means all of them.
 
 
348
 
349
- #### orderby: `str`
350
  The field by which the records should be sorted. This specifies the attribute or column used to order the results.
351
 
352
- #### desc:`bool`
 
353
  A boolean flag indicating whether the sorting should be in descending order.
 
354
  ### Returns
355
 
356
  list[Document]
357
 
358
  A document object containing the following attributes:
359
 
360
- #### id: `str`
361
 
362
  Id of the retrieved document. Defaults to `""`.
363
 
364
- #### thumbnail: `str`
365
 
366
  Thumbnail image of the retrieved document. Defaults to `""`.
367
 
368
- #### knowledgebase_id: `str`
369
 
370
  Knowledge base ID related to the document. Defaults to `""`.
371
 
372
- #### parser_method: `str`
373
 
374
  Method used to parse the document. Defaults to `""`.
375
 
@@ -377,11 +392,11 @@ Method used to parse the document. Defaults to `""`.
377
 
378
  Configuration object for the parser. Defaults to `None`.
379
 
380
- #### source_type: `str`
381
 
382
  Source type of the document. Defaults to `""`.
383
 
384
- #### type: `str`
385
 
386
  Type or category of the document. Defaults to `""`.
387
 
@@ -389,9 +404,8 @@ Type or category of the document. Defaults to `""`.
389
 
390
  Creator of the document. Defaults to `""`.
391
 
392
- #### name: `str`
393
- string
394
- ''
395
  Name or title of the document. Defaults to `""`.
396
 
397
  #### size: `int`
@@ -428,13 +442,13 @@ Duration of the processing in seconds or minutes. Defaults to `0.0`.
428
  from ragflow import RAGFlow
429
 
430
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
431
- ds = rag.create_dataset(name="kb_1")
432
 
433
  filename1 = "~/ragflow.txt"
434
  blob=open(filename1 , "rb").read()
435
  list_files=[{"name":filename1,"blob":blob}]
436
- ds.upload_documents(list_files)
437
- for d in ds.list_documents(keywords="rag", offset=0, limit=12):
438
  print(d)
439
  ```
440
 
@@ -445,9 +459,11 @@ for d in ds.list_documents(keywords="rag", offset=0, limit=12):
445
  ```python
446
  DataSet.delete_documents(ids: list[str] = None)
447
  ```
 
448
  ### Returns
449
 
450
- no return
 
451
 
452
  ### Examples
453
 
@@ -471,20 +487,22 @@ DataSet.async_cancel_parse_documents(document_ids:list[str])-> None
471
 
472
  ### Parameters
473
 
474
- #### document_ids:`list[str]`
 
475
  The ids of the documents to be parsed
476
  ????????????????????????????????????????????????????
477
 
478
  ### Returns
479
- no return
480
- ????????????????????????????????????????????????????
 
481
 
482
  ### Examples
483
 
484
  ```python
485
  #documents parse and cancel
486
  rag = RAGFlow(API_KEY, HOST_ADDRESS)
487
- ds = rag.create_dataset(name="God5")
488
  documents = [
489
  {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
490
  {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
@@ -501,10 +519,14 @@ ds.async_cancel_parse_documents(ids)
501
  print("Async bulk parsing cancelled")
502
  ```
503
 
 
 
504
  ## List chunks
 
505
  ```python
506
  Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
507
  ```
 
508
  ### Parameters
509
 
510
  - `keywords`: `str`
@@ -522,6 +544,7 @@ Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id
522
  - `id`: `str`
523
  The ID of the chunk to be retrieved
524
  default: `None`
 
525
  ### Returns
526
  list[chunk]
527
 
@@ -536,6 +559,7 @@ ds.async_parse_documents(["wdfxb5t547d"])
536
  for c in doc.list_chunks(keywords="rag", offset=0, limit=12):
537
  print(c)
538
  ```
 
539
  ## Add chunk
540
 
541
  ```python
@@ -545,8 +569,11 @@ Document.add_chunk(content:str) -> Chunk
545
  ### Parameters
546
 
547
  #### content: `str`, *Required*
 
548
  Contains the main text or information of the chunk.
 
549
  #### important_keywords :`list[str]`
 
550
  list the key terms or phrases that are significant or central to the chunk's content.
551
 
552
  ### Returns
@@ -574,12 +601,15 @@ chunk = doc.add_chunk(content="xxxxxxx")
574
  Document.delete_chunks(chunk_ids: list[str])
575
  ```
576
  ### Parameters
 
577
  #### chunk_ids:`list[str]`
 
578
  The list of chunk_id
579
 
580
  ### Returns
581
 
582
- no return
 
583
 
584
  ### Examples
585
 
@@ -614,7 +644,8 @@ Chunk.update(update_message: dict)
614
 
615
  ### Returns
616
 
617
- no return
 
618
 
619
  ### Examples
620
 
@@ -711,7 +742,7 @@ for c in rag.retrieve(question="What's ragflow?",
711
  ---
712
 
713
  :::tip API GROUPING
714
- Chat APIs
715
  :::
716
 
717
  ## Create chat assistant
@@ -1008,6 +1039,8 @@ session.update({"name": "updated_name"})
1008
  Session.ask(question: str, stream: bool = False) -> Optional[Message, iter[Message]]
1009
  ```
1010
 
 
 
1011
  ### Parameters
1012
 
1013
  #### question *Required*
@@ -1016,18 +1049,21 @@ The question to start an AI chat. Defaults to `None`.
1016
 
1017
  #### stream
1018
 
1019
- Indicates whether to output responses in a streaming way. Defaults to `False`.
 
 
 
1020
 
1021
  ### Returns
1022
 
1023
- Optional[Message, iter[Message]]
 
1024
 
1025
- - Message object, if `stream` is set to `False`
1026
- - iter[Message] object, if `stream` is set to `True`
1027
 
1028
  #### id: `str`
1029
 
1030
- The ID of the message. `id` is automatically generated.
1031
 
1032
  #### content: `str`
1033
 
@@ -1035,28 +1071,29 @@ The content of the message. Defaults to `"Hi! I am your assistant, can I help yo
1035
 
1036
  #### reference: `list[Chunk]`
1037
 
1038
- The auto-generated reference of the message. Each `chunk` object includes the following attributes:
1039
 
1040
  - **id**: `str`
1041
- The id of the chunk.
1042
  - **content**: `str`
1043
- The content of the chunk.
 
 
1044
  - **document_id**: `str`
1045
- The ID of the document being referenced.
1046
  - **document_name**: `str`
1047
- The name of the referenced document being referenced.
 
 
1048
  - **knowledgebase_id**: `str`
1049
- The id of the knowledge base to which the relevant document belongs.
1050
- - **image_id**: `str`
1051
- The id of the image related to the chunk.
1052
  - **similarity**: `float`
1053
- A general similarity score, usually a composite score derived from various similarity measures . This score represents the degree of similarity between two objects. The value ranges between 0 and 1, where a value closer to 1 indicates higher similarity.
1054
  - **vector_similarity**: `float`
1055
- A similarity score based on vector representations. This score is obtained by converting texts, words, or objects into vectors and then calculating the cosine similarity or other distance measures between these vectors to determine the similarity in vector space. A higher value indicates greater similarity in the vector space.
1056
  - **term_similarity**: `float`
1057
- The similarity score based on terms or keywords. This score is calculated by comparing the similarity of key terms between texts or datasets, typically measuring how similar two words or phrases are in meaning or context. A higher value indicates a stronger similarity between terms.
1058
- - **position**: `list[string]`
1059
- Indicates the position or index of keywords or specific terms within the text. An array is typically used to mark the location of keywords or specific elements, facilitating precise operations or analysis of the text.
1060
 
1061
  ### Examples
1062
 
@@ -1066,7 +1103,7 @@ from ragflow import RAGFlow
1066
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
1067
  assistant = rag.list_chats(name="Miss R")
1068
  assistant = assistant[0]
1069
- sess = assistant.create_session()
1070
 
1071
  print("\n==================== Miss R =====================\n")
1072
  print(assistant.get_prologue())
@@ -1076,10 +1113,9 @@ while True:
1076
  print("\n==================== Miss R =====================\n")
1077
 
1078
  cont = ""
1079
- for ans in sess.ask(question, stream=True):
1080
- print(ans.content[len(cont):], end='', flush=True)
1081
- cont = ans.content
1082
-
1083
  ```
1084
 
1085
  ---
 
3
  **THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**
4
 
5
  :::tip NOTE
6
+ Knowledge Base Management
7
  :::
8
 
9
  ## Create knowledge base
 
232
  from ragflow import RAGFlow
233
 
234
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
235
+ dataset = rag.list_datasets(name="kb_1")[0]
236
+ dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "parse_method":"manual"})
237
  ```
238
+
239
  ---
240
 
241
  :::tip API GROUPING
242
+ File Management within Knowledge Base
243
  :::
244
 
245
+ ## Upload documents
246
 
247
  ```python
248
  DataSet.upload_documents(document_list: list[dict])
249
  ```
250
 
251
+ Uploads documents to the current knowledge base.
252
+
253
  ### Parameters
254
 
255
+ #### document_list
256
+
257
+ A list of dictionaries representing the documents to upload, each containing the following keys:
258
 
259
+ - `"name"`: (Optional) File path to the document to upload.
260
+ Ensure that each file path has a suffix.
261
+ - `"blob"`: (Optional) The document to upload in binary format.
262
 
263
  ### Returns
264
+
265
+ - Success: No value is returned.
266
+ - Failure: `Exception`
267
 
268
  ### Examples
 
 
269
 
270
+ ```python
271
+ dataset = rag.create_dataset(name="kb_name")
272
+ dataset.upload_documents([{"name": "1.txt", "blob": "123"}, ...])
273
  ```
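
As a minimal sketch of how a `document_list` could be assembled from files on disk: the helper name `build_document_list` is hypothetical and not part of the RAGFlow SDK. It assumes that `"name"` carries the file path and `"blob"` the raw bytes, as described under Parameters, and it rejects paths without a suffix.

```python
from pathlib import Path


def build_document_list(paths):
    """Return one {"name", "blob"} dict per file path (hypothetical helper)."""
    documents = []
    for path in map(Path, paths):
        if not path.suffix:
            # The reference asks that every file path has a suffix.
            raise ValueError(f"{path} has no file suffix")
        # "blob" holds the file content in binary form.
        documents.append({"name": str(path), "blob": path.read_bytes()})
    return documents
```

The resulting list could then be passed to `dataset.upload_documents(...)`; wrapping that call in `try`/`except Exception` matches the failure mode listed under Returns.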
274
+
275
  ---
276
 
277
  ## Update document
 
280
  Document.update(update_message:dict)
281
  ```
282
 
283
+ Updates configurations for the current document.
284
+
285
  ### Parameters
286
 
287
+ #### update_message: `dict`
288
+
289
+ Only `name`, `parser_config`, and `parser_method` can be changed.
290
 
291
  ### Returns
292
 
293
+ - Success: No value is returned.
294
+ - Failure: `Exception`
295
 
296
  ### Examples
297
 
 
299
  from ragflow import RAGFlow
300
 
301
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
302
+ dataset=rag.list_datasets(id='id')
303
+ dataset=dataset[0]
304
+ doc = dataset.list_documents(id="wdfxb5t547d")
305
  doc = doc[0]
306
+ doc.update({"parser_method": "manual"})
307
  ```
308
 
309
  ---
 
342
 
343
  ### Parameters
344
 
345
+ #### id
346
 
347
  The ID of the document to retrieve.
348
 
349
+ #### keywords
350
 
351
  List documents whose name has the given keywords. Defaults to `None`.
352
 
353
+ #### offset
354
 
355
  The beginning number of records for paging. Defaults to `0`.
356
 
357
+ #### limit
358
 
359
+ The number of records to return. `-1` means all of them.
360
+
361
+ #### orderby
362
 
 
363
  The field by which the records should be sorted. This specifies the attribute or column used to order the results.
364
 
365
+ #### desc
366
+
367
  A boolean flag indicating whether the sorting should be in descending order.
368
+
369
  ### Returns
370
 
371
  list[Document]
372
 
373
  A document object containing the following attributes:
374
 
375
+ #### id
376
 
377
  Id of the retrieved document. Defaults to `""`.
378
 
379
+ #### thumbnail
380
 
381
  Thumbnail image of the retrieved document. Defaults to `""`.
382
 
383
+ #### knowledgebase_id
384
 
385
  Knowledge base ID related to the document. Defaults to `""`.
386
 
387
+ #### parser_method
388
 
389
  Method used to parse the document. Defaults to `""`.
390
 
 
392
 
393
  Configuration object for the parser. Defaults to `None`.
394
 
395
+ #### source_type
396
 
397
  Source type of the document. Defaults to `""`.
398
 
399
+ #### type
400
 
401
  Type or category of the document. Defaults to `""`.
402
 
 
404
 
405
  Creator of the document. Defaults to `""`.
406
 
407
+ #### name
408
+
 
409
  Name or title of the document. Defaults to `""`.
410
 
411
  #### size: `int`
 
442
  from ragflow import RAGFlow
443
 
444
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
445
+ dataset = rag.create_dataset(name="kb_1")
446
 
447
  filename1 = "~/ragflow.txt"
448
  blob=open(filename1 , "rb").read()
449
  list_files=[{"name":filename1,"blob":blob}]
450
+ dataset.upload_documents(list_files)
451
+ for d in dataset.list_documents(keywords="rag", offset=0, limit=12):
452
  print(d)
453
  ```
454
 
 
459
  ```python
460
  DataSet.delete_documents(ids: list[str] = None)
461
  ```
462
+
463
  ### Returns
464
 
465
+ - Success: No value is returned.
466
+ - Failure: `Exception`
467
 
468
  ### Examples
469
 
 
487
 
488
  ### Parameters
489
 
490
+ #### document_ids: `list[str]`
491
+
492
  The ids of the documents to be parsed
493
  ????????????????????????????????????????????????????
494
 
495
  ### Returns
496
+
497
+ - Success: No value is returned.
498
+ - Failure: `Exception`
499
 
500
  ### Examples
501
 
502
  ```python
503
  #documents parse and cancel
504
  rag = RAGFlow(API_KEY, HOST_ADDRESS)
505
+ ds = rag.create_dataset(name="dataset_name")
506
  documents = [
507
  {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
508
  {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
 
519
  print("Async bulk parsing cancelled")
520
  ```
521
 
522
+ ---
523
+
524
  ## List chunks
525
+
526
  ```python
527
  Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
528
  ```
529
+
530
  ### Parameters
531
 
532
  - `keywords`: `str`
 
544
  - `id`: `str`
545
  The ID of the chunk to be retrieved
546
  default: `None`
547
+
548
  ### Returns
549
  list[chunk]
550
 
 
559
  for c in doc.list_chunks(keywords="rag", offset=0, limit=12):
560
  print(c)
561
  ```
562
+
563
  ## Add chunk
564
 
565
  ```python
 
569
  ### Parameters
570
 
571
  #### content: `str`, *Required*
572
+
573
  Contains the main text or information of the chunk.
574
+
575
  #### important_keywords :`list[str]`
576
+
577
  list the key terms or phrases that are significant or central to the chunk's content.
578
 
579
  ### Returns
 
601
  Document.delete_chunks(chunk_ids: list[str])
602
  ```
603
  ### Parameters
604
+
605
  #### chunk_ids:`list[str]`
606
+
607
  The list of chunk_id
608
 
609
  ### Returns
610
 
611
+ - Success: No value is returned.
612
+ - Failure: `Exception`
613
 
614
  ### Examples
615
 
 
644
 
645
  ### Returns
646
 
647
+ - Success: No value is returned.
648
+ - Failure: `Exception`
649
 
650
  ### Examples
651
 
 
742
  ---
743
 
744
  :::tip API GROUPING
745
+ Chat Assistant Management
746
  :::
747
 
748
  ## Create chat assistant
 
1039
  Session.ask(question: str, stream: bool = False) -> Optional[Message, iter[Message]]
1040
  ```
1041
 
1042
+ Asks a question to start a conversation.
1043
+
1044
  ### Parameters
1045
 
1046
  #### question *Required*
 
1049
 
1050
  #### stream
1051
 
1052
+ Indicates whether to output responses in a streaming way:
1053
+
1054
+ - `True`: Enable streaming.
1055
+ - `False`: (Default) Disable streaming.
1056
 
1057
  ### Returns
1058
 
1059
+ - A `Message` object containing the response to the question if `stream` is set to `False`
1060
+ - An iterator containing multiple `Message` objects (`iter[Message]`) if `stream` is set to `True`
1061
 
1062
+ The following shows the attributes of a `Message` object:
 
1063
 
1064
  #### id: `str`
1065
 
1066
+ The auto-generated message ID.
1067
 
1068
  #### content: `str`
1069
 
 
1071
 
1072
  #### reference: `list[Chunk]`
1073
 
1074
+ A list of `Chunk` objects representing references to the message, each containing the following attributes:
1075
 
1076
  - **id**: `str`
1077
+ The chunk ID.
1078
  - **content**: `str`
1079
+ The content of the chunk.
1080
+ - **image_id**: `str`
1081
+ The ID of the snapshot of the chunk.
1082
  - **document_id**: `str`
1083
+ The ID of the referenced document.
1084
  - **document_name**: `str`
1085
+ The name of the referenced document.
1086
+ - **position**: `list[str]`
1087
+ The location information of the chunk within the referenced document.
1088
  - **knowledgebase_id**: `str`
1089
+ The ID of the knowledge base to which the referenced document belongs.
 
 
1090
  - **similarity**: `float`
1091
+ A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity.
1092
  - **vector_similarity**: `float`
1093
+ A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
1094
  - **term_similarity**: `float`
1095
+ A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.
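
As an illustration of how these scores might be used, the sketch below keeps only the references above a similarity threshold and orders them from most to least similar. `RefChunk` is a hypothetical stand-in that carries a few of the attributes listed above; it is not the SDK's `Chunk` class, and the threshold and limit values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class RefChunk:
    """Hypothetical stand-in with a subset of the listed chunk attributes."""
    id: str
    document_name: str
    similarity: float


def top_references(chunks, min_similarity=0.5, limit=3):
    """Return up to `limit` chunks with similarity >= min_similarity, best first."""
    kept = [c for c in chunks if c.similarity >= min_similarity]
    return sorted(kept, key=lambda c: c.similarity, reverse=True)[:limit]


references = [
    RefChunk(id="c1", document_name="a.txt", similarity=0.42),
    RefChunk(id="c2", document_name="b.txt", similarity=0.91),
    RefChunk(id="c3", document_name="a.txt", similarity=0.77),
]
best = top_references(references)
```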
1096
+
 
1097
 
1098
  ### Examples
1099
 
 
1103
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
1104
  assistant = rag.list_chats(name="Miss R")
1105
  assistant = assistant[0]
1106
+ session = assistant.create_session()
1107
 
1108
  print("\n==================== Miss R =====================\n")
1109
  print(assistant.get_prologue())
 
1113
  print("\n==================== Miss R =====================\n")
1114
 
1115
  cont = ""
1116
+ for answer in session.ask(question, stream=True):
1117
+ print(answer.content[len(cont):], end='', flush=True)
1118
+ cont = answer.content
 
1119
  ```
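
The slicing in the streaming loop relies on each streamed message carrying the full answer so far, so only the unseen tail needs printing. This is inferred from the example rather than stated in the reference. A self-contained sketch of that logic follows, with `FakeMessage` and `fake_stream` as hypothetical stand-ins for `Message` and `Session.ask(..., stream=True)`.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a streamed Message."""
    content: str


def fake_stream(text, step=4):
    """Yield messages whose content grows cumulatively, `step` characters at a time."""
    for end in range(step, len(text) + step, step):
        yield FakeMessage(content=text[:end])


def iter_deltas(messages):
    """Yield only the newly added text of each cumulative message."""
    seen = ""
    for message in messages:
        yield message.content[len(seen):]
        seen = message.content


for delta in iter_deltas(fake_stream("Hello, RAGFlow!")):
    print(delta, end="", flush=True)
print()
```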
1120
 
1121
  ---