@@ -1,5 +1,14 @@
 ---
 license: apache-2.0
 ---
 # PP-DocBee2-3B
@@ -53,16 +62,19 @@ You can quickly experience the functionality with a single command:
 ```bash
 paddleocr doc_vlm \
     --model_name PP-DocBee2-3B \
-    -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出'}"
 ```
-You can also integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine.
 ```python
 from paddleocr import DocVLM
 model = DocVLM(model_name="PP-DocBee2-3B")
 results = model.predict(
-    input={"image": "medal_table.png", "query": "识别这份表格的内容, 以markdown格式输出"},
     batch_size=1
 )
 for res in results:
@@ -73,29 +85,29 @@ for res in results:
 After running, the obtained result is as follows:
 ```bash
-{'res': {'image': 'medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出', 'result': '| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |\n| --- | --- | --- | --- | --- | --- |\n| 1 | 中国（CHN） | 48 | 22 | 30 | 100 |\n| 2 | 美国（USA） | 36 | 39 | 37 | 112 |\n| 3 | 俄罗斯（RUS） | 24 | 13 | 23 | 60 |\n| 4 | 英国（GBR） | 19 | 13 | 19 | 51 |\n| 5 | 德国（GER） | 16 | 11 | 14 | 41 |\n| 6 | 澳大利亚（AUS） | 14 | 15 | 17 | 46 |\n| 7 | 韩国（KOR） | 13 | 11 | 8 | 32 |\n| 8 | 日本（JPN） | 9 | 8 | 8 | 25 |\n| 9 | 意大利（ITA） | 8 | 9 | 10 | 27 |\n| 10 | 法国（FRA） | 7 | 16 | 20 | 43 |\n| 11 | 荷兰（NED） | 7 | 5 | 4 | 16 |\n| 12 | 乌克兰（UKR） | 7 | 4 | 11 | 22 |\n| 13 | 肯尼亚（KEN） | 6 | 4 | 6 | 16 |\n| 14 | 西班牙（ESP） | 5 | 11 | 3 | 19 |\n| 15 | 牙买加（JAM） | 5 | 4 | 2 | 11 |\n'}}
 ```
 The visualized result is as follows:
 ```bash
-| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |
-| --- | --- | --- | --- | --- | --- |
-| 1 | 中国（CHN） | 48 | 22 | 30 | 100 |
-| 2 | 美国（USA） | 36 | 39 | 37 | 112 |
-| 3 | 俄罗斯（RUS） | 24 | 13 | 23 | 60 |
-| 4 | 英国（GBR） | 19 | 13 | 19 | 51 |
-| 5 | 德国（GER） | 16 | 11 | 14 | 41 |
-| 6 | 澳大利亚（AUS） | 14 | 15 | 17 | 46 |
-| 7 | 韩国（KOR） | 13 | 11 | 8 | 32 |
-| 8 | 日本（JPN） | 9 | 8 | 8 | 25 |
-| 9 | 意大利（ITA） | 8 | 9 | 10 | 27 |
-| 10 | 法国（FRA） | 7 | 16 | 20 | 43 |
-| 11 | 荷兰（NED） | 7 | 5 | 4 | 16 |
-| 12 | 乌克兰（UKR） | 7 | 4 | 11 | 22 |
-| 13 | 肯尼亚（KEN） | 6 | 4 | 6 | 16 |
-| 14 | 西班牙（ESP） | 5 | 11 | 3 | 19 |
-| 15 | 牙买加（JAM） | 5 | 4 | 2 | 11 |
 ```
 For details about the usage commands and parameter descriptions, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/doc_vlm.html#iii-quick-start).
@@ -112,18 +124,18 @@ The document understanding pipeline is an advanced document processing technolog
 Run a single command to quickly experience the OCR pipeline:
 ```bash
-paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出'}"
 ```
 Results are printed to the terminal:
 ```json
-{'res': {'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出', 'result': '| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |\n| --- | --- | --- | --- | --- | --- |\n| 1 | 中国（CHN） | 48 | 22 | 30 | 100 |\n| 2 | 美国（USA） | 36 | 39 | 37 | 112 |\n| 3 | 俄罗斯（RUS） | 24 | 13 | 23 | 60 |\n| 4 | 英国（GBR） | 19 | 13 | 19 | 51 |\n| 5 | 德国（GER） | 16 | 11 | 14 | 41 |\n| 6 | 澳大利亚（AUS） | 14 | 15 | 17 | 46 |\n| 7 | 韩国（KOR） | 13 | 11 | 8 | 32 |\n| 8 | 日本（JPN） | 9 | 8 | 8 | 25 |\n| 9 | 意大利（ITA） | 8 | 9 | 10 | 27 |\n| 10 | 法国（FRA） | 7 | 16 | 20 | 43 |\n| 11 | 荷兰（NED） | 7 | 5 | 4 | 16 |\n| 12 | 乌克兰（UKR） | 7 | 4 | 11 | 22 |\n| 13 | 肯尼亚（KEN） | 6 | 4 | 6 | 16 |\n| 14 | 西班牙（ESP） | 5 | 11 | 3 | 19 |\n| 15 | 牙买加（JAM） | 5 | 4 | 2 | 11 |\n'}}
 ```
 If save_path is specified, the visualization results will be saved under `save_path`. The visualization output is shown below:
-![image/png](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/pipelines/doc_understanding/doc_understanding.png)
 The command-line method is for a quick trial. For project integration, only a few lines of code are needed:
@@ -135,8 +147,8 @@ pipeline = DocUnderstanding(
 )
 output = pipeline.predict(
     {
-        "image": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png",
-        "query": "识别这份表格的内容, 以markdown格式输出"
     }
 )
 for res in output:
@@ -144,7 +156,7 @@ for res in output:
     res.save_to_json("./output/")
 ```
-The default model used in pipeline is `PP-DocBee2-3B`, so it is not necessary that specifing to `PP-DocBee2-3B` by argument `doc_understanding_model_name`. But you can use the local model file by argument `doc_understanding_model_dir`. For details about usage command and descriptions of parameters, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/pipeline_usage/doc_understanding.html#2-quick-start).
 ## Links

 ---
 license: apache-2.0
+library_name: PaddleOCR
+language:
+- en
+- zh
+pipeline_tag: image-to-text
+tags:
+- OCR
+- PaddlePaddle
+- PaddleOCR
 ---
 # PP-DocBee2-3B
 ```bash
 paddleocr doc_vlm \
     --model_name PP-DocBee2-3B \
+    -i "{'image': 'https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png', 'query': 'Recognize the content of this table and output it in markdown format.'}"
 ```
+You can also integrate the model inference of the document visual-language module into your project. Before running the following code, please download the sample image to your local machine.
 ```python
 from paddleocr import DocVLM
 model = DocVLM(model_name="PP-DocBee2-3B")
 results = model.predict(
+    input={
+        "image": "https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png",
+        "query": "Recognize the content of this table and output it in markdown format."
+    },
     batch_size=1
 )
 for res in results:
 After running, the obtained result is as follows:
 ```bash
+{'res': {'image': 'medal_table_en.png', 'query': 'Recognize the content of this table and output it in markdown format', 'result': '| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |\n|---|---|---|---|---|---|\n| 1 | China (CHN) | 48 | 22 | 30 | 100 |\n| 2 | United States (USA) | 36 | 39 | 37 | 112 |\n| 3 | Russia (RUS) | 24 | 13 | 23 | 60 |\n| 4 | Great Britain (GBR) | 19 | 13 | 19 | 51 |\n| 5 | Germany (GER) | 16 | 11 | 14 | 41 |\n| 6 | Australia (AUS) | 14 | 15 | 17 | 46 |\n| 7 | South Korea (KOR) | 13 | 11 | 8 | 32 |\n| 8 | Japan (JPN) | 9 | 8 | 8 | 25 |\n| 9 | Italy (ITA) | 8 | 9 | 10 | 27 |\n| 10 | France (FRA) | 7 | 16 | 20 | 43 |\n| 11 | Netherlands (NED) | 7 | 5 | 4 | 16 |\n| 12 | Ukraine (UKR) | 7 | 4 | 11 | 22 |\n| 13 | Kenya (KEN) | 6 | 4 | 6 | 16 |\n| 14 | Spain (ESP) | 5 | 11 | 3 | 19 |\n| 15 | Jamaica (JAM) | 5 | 4 | 2 | 11 |\n'}}
 ```
 The visualized result is as follows:
 ```bash
+| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |
+|---|---|---|---|---|---|
+| 1 | China (CHN) | 48 | 22 | 30 | 100 |
+| 2 | United States (USA) | 36 | 39 | 37 | 112 |
+| 3 | Russia (RUS) | 24 | 13 | 23 | 60 |
+| 4 | Great Britain (GBR) | 19 | 13 | 19 | 51 |
+| 5 | Germany (GER) | 16 | 11 | 14 | 41 |
+| 6 | Australia (AUS) | 14 | 15 | 17 | 46 |
+| 7 | South Korea (KOR) | 13 | 11 | 8 | 32 |
+| 8 | Japan (JPN) | 9 | 8 | 8 | 25 |
+| 9 | Italy (ITA) | 8 | 9 | 10 | 27 |
+| 10 | France (FRA) | 7 | 16 | 20 | 43 |
+| 11 | Netherlands (NED) | 7 | 5 | 4 | 16 |
+| 12 | Ukraine (UKR) | 7 | 4 | 11 | 22 |
+| 13 | Kenya (KEN) | 6 | 4 | 6 | 16 |
+| 14 | Spain (ESP) | 5 | 11 | 3 | 19 |
+| 15 | Jamaica (JAM) | 5 | 4 | 2 | 11 |
 ```
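The `result` field above is a plain markdown string, so downstream code usually needs to split it into rows before using the values. The following is a minimal sketch of one way to do that with the standard library only; `parse_markdown_table` is a hypothetical helper written for this example and is not part of PaddleOCR, and it assumes a simple pipe-delimited table with a single header row and no escaped pipes inside cells.

```python
def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Convert a pipe-delimited markdown table into a list of row dicts.

    Hypothetical helper for illustration; not a PaddleOCR API.
    """
    lines = [line.strip() for line in markdown.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        return []

    def split_row(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        rows.append(dict(zip(header, split_row(line))))
    return rows


# Shortened copy of the `result` string shown above.
sample = (
    "| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |\n"
    "|---|---|---|---|---|---|\n"
    "| 1 | China (CHN) | 48 | 22 | 30 | 100 |\n"
    "| 2 | United States (USA) | 36 | 39 | 37 | 112 |\n"
)
rows = parse_markdown_table(sample)
print(rows[0]["Country/Region"], rows[0]["Gold"])  # prints "China (CHN) 48"
```

All values stay as strings here; converting the medal counts with `int()` is left to the caller, since the model output is free text and may not always be numeric.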
 For details about the usage commands and parameter descriptions, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/doc_vlm.html#iii-quick-start).
 Run a single command to quickly experience the OCR pipeline:
 ```bash
+paddleocr doc_understanding -i "{'image': 'https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png', 'query': 'Recognize the content of this table and output it in markdown format.'}"
 ```
 Results are printed to the terminal:
 ```json
+{'res': {'image': 'medal_table_en.png', 'query': 'Recognize the content of this table and output it in markdown format', 'result': '| Rank | Country/Region | Gold | Silver | Bronze | Total Medals |\n|---|---|---|---|---|---|\n| 1 | China (CHN) | 48 | 22 | 30 | 100 |\n| 2 | United States (USA) | 36 | 39 | 37 | 112 |\n| 3 | Russia (RUS) | 24 | 13 | 23 | 60 |\n| 4 | Great Britain (GBR) | 19 | 13 | 19 | 51 |\n| 5 | Germany (GER) | 16 | 11 | 14 | 41 |\n| 6 | Australia (AUS) | 14 | 15 | 17 | 46 |\n| 7 | South Korea (KOR) | 13 | 11 | 8 | 32 |\n| 8 | Japan (JPN) | 9 | 8 | 8 | 25 |\n| 9 | Italy (ITA) | 8 | 9 | 10 | 27 |\n| 10 | France (FRA) | 7 | 16 | 20 | 43 |\n| 11 | Netherlands (NED) | 7 | 5 | 4 | 16 |\n| 12 | Ukraine (UKR) | 7 | 4 | 11 | 22 |\n| 13 | Kenya (KEN) | 6 | 4 | 6 | 16 |\n| 14 | Spain (ESP) | 5 | 11 | 3 | 19 |\n| 15 | Jamaica (JAM) | 5 | 4 | 2 | 11 |\n'}}
 ```
 If save_path is specified, the visualization results will be saved under `save_path`. The visualization output is shown below:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/kFGo9nlHuHs2uyN1voSTg.png)
 The command-line method is for a quick trial. For project integration, only a few lines of code are needed:
 )
 output = pipeline.predict(
     {
+        "image": "https://cdn-uploads.huggingface.co/production/uploads/684acf07de103b2d44c85531/l5xpHbfLn75dKInhQZ84I.png",
+        "query": "Recognize the content of this table and output it in markdown format."
     }
 )
 for res in output:
     res.save_to_json("./output/")
 ```
+The default model used in the pipeline is `PP-DocBee2-3B`, so you don't have to specify `PP-DocBee2-3B` for the `doc_understanding_model_name` argument, but you can use a local model file via the `doc_understanding_model_dir` argument. For details about the usage commands and parameter descriptions, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/pipeline_usage/doc_understanding.html#2-quick-start).
 ## Links