writinwaters committed on
Commit
116c571
·
1 Parent(s): ad355eb

Miscellaneous minor updates (#2885)


### What problem does this PR solve?



### Type of change


- [x] Documentation Update

Files changed (1)
  1. api/python_api_reference.md +86 -142
api/python_api_reference.md CHANGED
@@ -204,10 +204,10 @@ Updates the current knowledge base.
204
  #### update_message: `dict[str, str|int]`, *Required*
205
 
206
  - `"name"`: `str` The name of the knowledge base to update.
207
- - `"tenant_id"`: `str` The `"tenant_id` you get after calling `create_dataset()`.
208
  - `"embedding_model"`: `str` The embedding model for generating vector embeddings.
209
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
210
- - `"parser_method"`: `str`
211
  - `"naive"`: General
212
  - `"manual`: Manual
213
  - `"qa"`: Q&A
@@ -232,7 +232,7 @@ Updates the current knowledge base.
232
  from ragflow import RAGFlow
233
 
234
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
235
- dataset = rag.list_datasets(name="kb_1")
236
  dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "parse_method":"manual"})
237
  ```
238
 
@@ -269,7 +269,7 @@ A list of dictionaries representing the documents to upload, each containing the
269
 
270
  ```python
271
  dataset = rag.create_dataset(name="kb_name")
272
- dataset.upload_documents([{name="1.txt", blob="123"}, ...])
273
  ```
274
 
275
  ---
@@ -284,7 +284,7 @@ Updates configurations for the current document.
284
 
285
  ### Parameters
286
 
287
- #### update_message: `dict`
288
 
289
  Only `name`, `parser_config`, and `parser_method` can be changed.
290
 
@@ -316,7 +316,7 @@ Document.download() -> bytes
316
 
317
  ### Returns
318
 
319
- bytes of the document.
320
 
321
  ### Examples
322
 
@@ -344,7 +344,7 @@ Dataset.list_documents(id:str =None, keywords: str=None, offset: int=0, limit:in
344
 
345
  #### id
346
 
347
- The id of the document to be got
348
 
349
  #### keywords
350
 
@@ -368,73 +368,27 @@ A boolean flag indicating whether the sorting should be in descending order.
368
 
369
  ### Returns
370
 
371
- list[Document]
372
-
373
- A document object containing the following attributes:
374
-
375
- #### id
376
-
377
- Id of the retrieved document. Defaults to `""`.
378
-
379
- #### thumbnail
380
-
381
- Thumbnail image of the retrieved document. Defaults to `""`.
382
-
383
- #### knowledgebase_id
384
-
385
- Knowledge base ID related to the document. Defaults to `""`.
386
-
387
- #### parser_method
388
-
389
- Method used to parse the document. Defaults to `""`.
390
-
391
- #### parser_config: `ParserConfig`
392
-
393
- Configuration object for the parser. Defaults to `None`.
394
-
395
- #### source_type
396
-
397
- Source type of the document. Defaults to `""`.
398
-
399
- #### type
400
-
401
- Type or category of the document. Defaults to `""`.
402
-
403
- #### created_by: `str`
404
-
405
- Creator of the document. Defaults to `""`.
406
-
407
- #### name
408
-
409
- Name or title of the document. Defaults to `""`.
410
-
411
- #### size: `int`
412
-
413
- Size of the document in bytes or some other unit. Defaults to `0`.
414
-
415
- #### token_count: `int`
416
-
417
- Number of tokens in the document. Defaults to `""`.
418
-
419
- #### chunk_count: `int`
420
-
421
- Number of chunks the document is split into. Defaults to `0`.
422
-
423
- #### progress: `float`
424
-
425
- Current processing progress as a percentage. Defaults to `0.0`.
426
-
427
- #### progress_msg: `str`
428
-
429
- Message indicating current progress status. Defaults to `""`.
430
-
431
- #### process_begin_at: `datetime`
432
-
433
- Start time of the document processing. Defaults to `None`.
434
-
435
- #### process_duation: `float`
436
 
437
- Duration of the processing in seconds or minutes. Defaults to `0.0`.
438
 
439
  ### Examples
440
 
@@ -460,6 +414,8 @@ for d in dataset.list_documents(keywords="rag", offset=0, limit=12):
460
  DataSet.delete_documents(ids: list[str] = None)
461
  ```
462
 
 
 
463
  ### Returns
464
 
465
  - Success: No value is returned.
@@ -489,8 +445,7 @@ DataSet.async_cancel_parse_documents(document_ids:list[str])-> None
489
 
490
  #### document_ids: `list[str]`
491
 
492
- The ids of the documents to be parsed
493
- ????????????????????????????????????????????????????
494
 
495
  ### Returns
496
 
@@ -529,26 +484,28 @@ Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id
529
 
530
  ### Parameters
531
 
532
- - `keywords`: `str`
533
- List chunks whose name has the given keywords
534
- default: `None`
 
 
535
 
536
- - `offset`: `int`
537
- The beginning number of records for paging
538
- default: `1`
539
 
540
- - `limit`: `int`
541
- Records number to return
542
- default: `30`
543
 
544
- - `id`: `str`
545
- The ID of the chunk to be retrieved
546
- default: `None`
 
 
547
 
548
  ### Returns
 
549
  `list[Chunk]`
550
 
551
  ### Examples
 
552
  ```python
553
  from ragflow import RAGFlow
554
 
@@ -568,13 +525,13 @@ Document.add_chunk(content:str) -> Chunk
568
 
569
  ### Parameters
570
 
571
- #### content: `str`, *Required*
572
 
573
- Contains the main text or information of the chunk.
574
 
575
  #### important_keywords: `list[str]`
576
 
577
- list the key terms or phrases that are significant or central to the chunk's content.
578
 
579
  ### Returns
580
 
@@ -586,9 +543,9 @@ chunk
586
  from ragflow import RAGFlow
587
 
588
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
589
- ds = rag.list_datasets(id="123")
590
- ds = ds[0]
591
- doc = ds.list_documents(id="wdfxb5t547d")
592
  doc = doc[0]
593
  chunk = doc.add_chunk(content="xxxxxxx")
594
  ```
@@ -600,11 +557,12 @@ chunk = doc.add_chunk(content="xxxxxxx")
600
  ```python
601
  Document.delete_chunks(chunk_ids: list[str])
602
  ```
 
603
  ### Parameters
604
 
605
  #### chunk_ids: `list[str]`
606
 
607
- The list of chunk_id
608
 
609
  ### Returns
610
 
@@ -633,14 +591,12 @@ doc.delete_chunks(["id_1","id_2"])
633
  Chunk.update(update_message: dict)
634
  ```
635
  ### Parameters
636
- - `content`: `str`
637
- Contains the main text or information of the chunk
638
 
639
- - `important_keywords`: `list[str]`
640
- List the key terms or phrases that are significant or central to the chunk's content
641
 
642
- - `available`: `int`
643
- Indicating the availability status, `0` means unavailable and `1` means available
 
644
 
645
  ### Returns
646
 
@@ -653,12 +609,12 @@ Chunk.update(update_message: dict)
653
  from ragflow import RAGFlow
654
 
655
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
656
- ds = rag.list_datasets(id="123")
657
- ds = ds[0]
658
- doc = ds.list_documents(id="wdfxb5t547d")
659
  doc = doc[0]
660
  chunk = doc.add_chunk(content="xxxxxxx")
661
- chunk.update({"content":"sdfx...})
662
  ```
663
 
664
  ---
@@ -764,44 +720,30 @@ Creates a chat assistant.
764
  - Success: A `Chat` object representing the chat assistant.
765
  - Failure: `Exception`
766
 
767
- #### name: `str`
768
-
769
- The name of the chat assistant. Defaults to `"assistant"`.
770
-
771
- #### avatar: `str`
772
-
773
- Base64 encoding of the avatar. Defaults to `""`.
774
-
775
- #### knowledgebases: `list[str]`
776
-
777
- The associated knowledge bases. Defaults to `["kb1"]`.
778
-
779
- #### llm: `LLM`
780
-
781
- The llm of the created chat. Defaults to `None`. When the value is `None`, a dictionary with the following values will be generated as the default.
782
-
783
- - **model_name**, `str`
784
- The chat model name. If it is `None`, the user's default chat model will be returned.
785
- - **temperature**, `float`
786
- Controls the randomness of the model's predictions. A lower temperature increases the model's conficence in its responses; a higher temperature increases creativity and diversity. Defaults to `0.1`.
787
- - **top_p**, `float`
788
- Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`
789
- - **presence_penalty**, `float`
790
- This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`.
791
- - **frequency penalty**, `float`
792
- Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`.
793
- - **max_token**, `int`
794
- This sets the maximum length of the model’s output, measured in the number of tokens (words or pieces of words). Defaults to `512`.
795
-
796
- #### Prompt: `str`
797
-
798
- Instructions for the LLM to follow.
799
-
800
- - `"similarity_threshold"`: `float` A similarity score to evaluate distance between two lines of text. It's weighted keywords similarity and vector cosine similarity. If the similarity between query and chunk is less than this threshold, the chunk will be filtered out. Defaults to `0.2`.
801
- - `"keywords_similarity_weight"`: `float` It's weighted keywords similarity and vector cosine similarity or rerank score (0~1). Defaults to `0.7`.
802
- - `"top_n"`: `int` Not all the chunks whose similarity score is above the 'similarity threshold' will be feed to LLMs. LLM can only see these 'Top N' chunks. Defaults to `8`.
803
- - `"variables"`: `list[dict[]]` If you use dialog APIs, the variables might help you chat with your clients with different strategies. The variables are used to fill in the 'System' part in prompt in order to give LLM a hint. The 'knowledge' is a very special variable which will be filled-in with the retrieved chunks. All the variables in 'System' should be curly bracketed. Defaults to `[{"key": "knowledge", "optional": True}]`
804
- - `"rerank_model"`: `str` If it is not specified, vector cosine similarity will be used; otherwise, reranking score will be used. Defaults to `""`.
805
  - `"empty_response"`: `str` If nothing is retrieved in the knowledge base for the user's question, this will be used as the response. To allow the LLM to improvise when nothing is retrieved, leave this blank. Defaults to `None`.
806
  - `"opener"`: `str` The opening greeting for the user. Defaults to `"Hi! I am your assistant, can I help you?"`.
807
  - `"show_quote`: `bool` Indicates whether the source of text should be displayed Defaults to `True`.
@@ -919,6 +861,8 @@ RAGFlow.list_chats(
919
  ) -> list[Chat]
920
  ```
921
 
 
 
922
  ### Parameters
923
 
924
  #### page
 
204
  #### update_message: `dict[str, str|int]`, *Required*
205
 
206
  - `"name"`: `str` The name of the knowledge base to update.
207
+ - `"tenant_id"`: `str` The `"tenant_id` you get after calling `create_dataset()`. ?????????????????????
208
  - `"embedding_model"`: `str` The embedding model for generating vector embeddings.
209
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
210
+ - `"parser_method"`: `str` The default parsing method for the knowledge base.
211
  - `"naive"`: General
212
  - `"manual`: Manual
213
  - `"qa"`: Q&A
 
232
  from ragflow import RAGFlow
233
 
234
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
235
+ dataset = rag.list_datasets(name="kb_name")[0]
236
  dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "parse_method":"manual"})
237
  ```
238
 
 
269
 
270
  ```python
271
  dataset = rag.create_dataset(name="kb_name")
272
+ dataset.upload_documents([{"name": "1.txt", "blob": "123"}])
273
  ```
274
 
275
  ---
 
284
 
285
  ### Parameters
286
 
287
+ #### update_message: `dict[str, str|int]`, *Required*
288
 
289
  Only `name`, `parser_config`, and `parser_method` can be changed.
290
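A minimal usage sketch (the dataset ID, document ID, and new file name below are placeholders):

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag.list_datasets(id="123")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# Only `name`, `parser_config`, and `parser_method` can be changed.
doc.update({"name": "renamed_1.txt", "parser_method": "manual"})
```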
 
 
316
 
317
  ### Returns
318
 
319
+ Bytes of the document.
320
 
321
  ### Examples
322
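A minimal sketch of saving the returned bytes (the IDs and output file name are placeholders):

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag.list_datasets(id="123")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# download() returns the document content as bytes; write it to a local file.
with open("downloaded_document.txt", "wb") as f:
    f.write(doc.download())
```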
 
 
344
 
345
  #### id
346
 
347
+ The ID of the document to retrieve.
348
 
349
  #### keywords
350
 
 
368
 
369
  ### Returns
370
 
371
+ - Success: A list of `Document` objects.
372
+ - Failure: `Exception`.
373
 
374
+ A `Document` object contains the following attributes:
375
+
376
+ - `id` The ID of the retrieved document. Defaults to `""`.
377
+ - `thumbnail` Thumbnail image of the retrieved document. Defaults to `""`.
378
+ - `knowledgebase_id` Knowledge base ID related to the document. Defaults to `""`.
379
+ - `parser_method` Method used to parse the document. Defaults to `""`.
380
+ - `parser_config`: `ParserConfig` Configuration object for the parser. Defaults to `None`.
381
+ - `source_type`: Source type of the document. Defaults to `""`.
382
+ - `type`: Type or category of the document. Defaults to `""`.
383
+ - `created_by`: `str` Creator of the document. Defaults to `""`.
384
+ - `name` Name or title of the document. Defaults to `""`.
385
+ - `size`: `int` Size of the document in bytes. Defaults to `0`.
386
+ - `token_count`: `int` Number of tokens in the document. Defaults to `0`.
387
+ - `chunk_count`: `int` Number of chunks the document is split into. Defaults to `0`.
388
+ - `progress`: `float` Current processing progress as a percentage. Defaults to `0.0`.
389
+ - `progress_msg`: `str` Message indicating current progress status. Defaults to `""`.
390
+ - `process_begin_at`: `datetime` Start time of the document processing. Defaults to `None`.
391
+ - `process_duation`: `float` Duration of the processing in seconds or minutes. Defaults to `0.0`.
392
 
393
  ### Examples
394
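A minimal sketch of iterating over the returned `Document` objects (the dataset ID is a placeholder):

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag.list_datasets(id="123")
dataset = dataset[0]
# List documents whose name contains "rag", twelve per page.
for doc in dataset.list_documents(keywords="rag", offset=0, limit=12):
    print(doc.id, doc.name, doc.chunk_count)
```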
 
 
414
  DataSet.delete_documents(ids: list[str] = None)
415
  ```
416
 
417
+ Deletes specified documents or all documents from the current knowledge base.
418
+
419
  ### Returns
420
 
421
  - Success: No value is returned.
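A minimal usage sketch (the dataset and document IDs are placeholders):

```python
from ragflow import RAGFlow

rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag.list_datasets(id="123")
dataset = dataset[0]
# Delete two documents by ID; omitting `ids` deletes all documents in the knowledge base.
dataset.delete_documents(ids=["doc_id_1", "doc_id_2"])
```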
 
445
 
446
  #### document_ids: `list[str]`
447
 
448
+ The IDs of the documents to parse.
 
449
 
450
  ### Returns
451
 
 
484
 
485
  ### Parameters
486
 
487
+ #### keywords
488
+
489
+ Lists chunks whose name contains the given keywords. Defaults to `None`.
490
+
491
+ #### offset
492
 
493
+ The starting index of the records to return, used for paging. Defaults to `1`.
 
 
494
 
495
+ #### limit
 
 
496
 
497
+ The maximum number of records to return. Defaults to `30`.
498
+
499
+ #### id
500
+
501
+ The ID of the chunk to retrieve. Defaults to `None`.
502
 
503
  ### Returns
504
+
505
  `list[Chunk]`
506
 
507
  ### Examples
508
+
509
  ```python
510
  from ragflow import RAGFlow
511
 
 
525
 
526
  ### Parameters
527
 
528
+ #### content: `str`, *Required*
529
 
530
+ The main text or information of the chunk.
531
 
532
  #### important_keywords: `list[str]`
533
 
534
+ The key terms or phrases that are significant or central to the chunk's content.
535
 
536
  ### Returns
537
 
 
543
  from ragflow import RAGFlow
544
 
545
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
546
+ dataset = rag.list_datasets(id="123")
547
+ dataset = dataset[0]
548
+ doc = dataset.list_documents(id="wdfxb5t547d")
549
  doc = doc[0]
550
  chunk = doc.add_chunk(content="xxxxxxx")
551
  ```
 
557
  ```python
558
  Document.delete_chunks(chunk_ids: list[str])
559
  ```
560
+
561
  ### Parameters
562
 
563
  #### chunk_ids: `list[str]`
564
 
565
+ A list of the IDs of the chunks to delete.
566
 
567
  ### Returns
568
 
 
591
  Chunk.update(update_message: dict)
592
  ```
593
  ### Parameters
 
 
594
 
595
+ #### update_message: `dict`, *Required*
 
596
 
597
+ - `content`: `str` The main text or information of the chunk.
598
+ - `important_keywords`: `list[str]` The key terms or phrases that are significant or central to the chunk's content.
599
+ - `available`: `int` The availability status of the chunk: `0` means unavailable and `1` means available.
600
 
601
  ### Returns
602
 
 
609
  from ragflow import RAGFlow
610
 
611
  rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
612
+ dataset = rag.list_datasets(id="123")
613
+ dataset = dataset[0]
614
+ doc = dataset.list_documents(id="wdfxb5t547d")
615
  doc = doc[0]
616
  chunk = doc.add_chunk(content="xxxxxxx")
617
+ chunk.update({"content":"sdfx..."})
618
  ```
619
 
620
  ---
 
720
  - Success: A `Chat` object representing the chat assistant.
721
  - Failure: `Exception`
722
 
723
+ The following shows the attributes of a `Chat` object:
724
+
725
+ - `name`: `str` The name of the chat assistant. Defaults to `"assistant"`.
726
+ - `avatar`: `str` Base64 encoding of the avatar. Defaults to `""`.
727
+ - `knowledgebases`: `list[str]` The associated knowledge bases. Defaults to `["kb1"]`.
728
+ - `llm`: `LLM` The LLM settings of the created chat. Defaults to `None`. When the value is `None`, an `LLM` object with the following default values is generated.
729
+ - `model_name`, `str`
730
+ The chat model name. If it is `None`, the user's default chat model is used.
731
+ - `temperature`, `float`
732
+ Controls the randomness of the model's predictions. A lower temperature increases the model's confidence in its responses; a higher temperature increases creativity and diversity. Defaults to `0.1`.
733
+ - `top_p`, `float`
734
+ Also known as “nucleus sampling”, this parameter sets a threshold to select a smaller set of words to sample from. It focuses on the most likely words, cutting off the less probable ones. Defaults to `0.3`.
735
+ - `presence_penalty`, `float`
736
+ This discourages the model from repeating the same information by penalizing words that have already appeared in the conversation. Defaults to `0.2`.
737
+ - `frequency_penalty`, `float`
738
+ Similar to the presence penalty, this reduces the model’s tendency to repeat the same words frequently. Defaults to `0.7`.
739
+ - `max_token`, `int`
740
+ This sets the maximum length of the model’s output, measured in the number of tokens (words or pieces of words). Defaults to `512`.
741
+ - `Prompt`: `Prompt` Instructions for the LLM to follow.
742
+ - `"similarity_threshold"`: `float` A similarity score to evaluate distance between two lines of text. It's weighted keywords similarity and vector cosine similarity. If the similarity between query and chunk is less than this threshold, the chunk will be filtered out. Defaults to `0.2`.
743
+ - `"keywords_similarity_weight"`: `float` It's weighted keywords similarity and vector cosine similarity or rerank score (0~1). Defaults to `0.7`.
744
+ - `"top_n"`: `int` Not all the chunks whose similarity score is above the 'similarity threshold' will be feed to LLMs. LLM can only see these 'Top N' chunks. Defaults to `8`.
745
+ - `"variables"`: `list[dict[]]` If you use dialog APIs, the variables might help you chat with your clients with different strategies. The variables are used to fill in the 'System' part in prompt in order to give LLM a hint. The 'knowledge' is a very special variable which will be filled-in with the retrieved chunks. All the variables in 'System' should be curly bracketed. Defaults to `[{"key": "knowledge", "optional": True}]`
746
+ - `"rerank_model"`: `str` If it is not specified, vector cosine similarity will be used; otherwise, reranking score will be used. Defaults to `""`.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
747
  - `"empty_response"`: `str` If nothing is retrieved in the knowledge base for the user's question, this will be used as the response. To allow the LLM to improvise when nothing is retrieved, leave this blank. Defaults to `None`.
748
  - `"opener"`: `str` The opening greeting for the user. Defaults to `"Hi! I am your assistant, can I help you?"`.
749
  - `"show_quote`: `bool` Indicates whether the source of text should be displayed Defaults to `True`.
 
861
  ) -> list[Chat]
862
  ```
863
 
864
+ Retrieves a list of chat assistants.
865
+
866
  ### Parameters
867
 
868
  #### page