Commit 3c23a9a

[cherry-pick] Add methods for table related pipelines (#3617)
* refine table-pipe-v2
* refine table-pipe-v2
* refine table-pipe-v2
* refine codes
* refine codes
* refine codes
* refine codes
* refine table rec v2 pipeline
* refine table rec v2 pipeline
1 parent 80ee29b commit 3c23a9a

10 files changed

+265
-52
lines changed

docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.en.md

Lines changed: 15 additions & 8 deletions
@@ -7,7 +7,7 @@ comments: true
77
## 1. Introduction to General Table Recognition Pipeline
88
Table recognition is a technology that automatically identifies and extracts table content and structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into editable formats, facilitating further processing and analysis of data.
99

10-
The General Table Recognition Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table recognition models. Based on this pipeline, precise predictions of tables can be achieved, covering a wide range of applications in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options, supporting various hardware and programming languages for integration. Moreover, it offers custom development capabilities, allowing you to train and optimize models on your own dataset, which can then be seamlessly integrated.
10+
The General Table Recognition Pipeline is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table structure recognition models. Based on this pipeline, precise predictions of tables can be achieved, covering a wide range of applications in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options, supporting various hardware and programming languages for integration. Moreover, it offers custom development capabilities, allowing you to train and optimize models on your own dataset, which can then be seamlessly integrated.
1111

1212
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/01.png"/>
1313
<b>The General Table Recognition Pipeline includes essential modules for table structure recognition, text detection, and text recognition, as well as optional modules for layout area detection, document image orientation classification, and text image correction.</b>
@@ -16,7 +16,7 @@ The General Table Recognition Pipeline is designed to solve table recognition ta
1616

1717
<details><summary>👉Model List Details</summary>
1818

19-
<p><b>Table Recognition Module Models:</b></p>
19+
<p><b>Table Structure Recognition Module Models:</b></p>
2020
<table>
2121
<tr>
2222
<th>Model</th><th>Model Download Link</th>
@@ -868,6 +868,13 @@ In the above Python script, the following steps are executed:
868868
</td>
869869
<td><code>None</code></td>
870870
</tr>
871+
<td><code>use_table_cells_ocr_results</code></td>
872+
<td>Whether to enable Table-Cells-OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run cell by cell and each result is filled into the HTML table, which increases processing time. The two modes perform differently in different scenarios, so choose according to your actual situation.</td>
873+
<td><code>bool|False</code></td>
874+
<td>
875+
<ul>
876+
<li><b>bool</b>:<code>True</code> or <code>False</code>
877+
<td><code>False</code></td>
871878
</table>
872879

873880
(3) Process the prediction results. Each sample's prediction result is represented as a corresponding Result object, and supports operations such as printing, saving as an image, saving as an `xlsx` file, saving as an `HTML` file, and saving as a `json` file.
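The two fill strategies selected by `use_table_cells_ocr_results` can be pictured with a toy sketch. This is not PaddleX's implementation: the box format, the centre-point matching rule, and the `recognize_cell` callable are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def _center_inside(inner: Box, outer: Box) -> bool:
    """Return True if the centre point of `inner` lies inside `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def fill_from_global_ocr(
    cells: List[Box], ocr_lines: List[Tuple[Box, str]]
) -> List[str]:
    """Global mode (assumed rule): reuse one page-level OCR pass and assign
    each text line to the cell that contains the line's centre point."""
    texts = []
    for cell in cells:
        parts = [text for box, text in ocr_lines if _center_inside(box, cell)]
        texts.append(" ".join(parts))
    return texts


def fill_cell_by_cell(
    cells: List[Box], recognize_cell: Callable[[Box], str]
) -> List[str]:
    """Cell mode: call the recogniser once per cell, so the cost grows
    with the number of cells."""
    return [recognize_cell(cell) for cell in cells]


# Two side-by-side cells and two text lines from a single OCR pass.
cells = [(0, 0, 50, 20), (50, 0, 100, 20)]
ocr_lines = [((5, 5, 30, 15), "Name"), ((55, 5, 90, 15), "Price")]
print(fill_from_global_ocr(cells, ocr_lines))
```

The sketch only shows why the cell-by-cell mode costs more: it issues one recognition call per cell instead of reusing a single page-level pass.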
@@ -1390,12 +1397,12 @@ SubModules:
13901397
LayoutDetection:
13911398
module_name: layout_detection
13921399
model_name: PicoDet_layout_1x_table
1393-
model_dir: null # Replace with the path to the fine-tuned layout area detection model weights
1400+
model_dir: null # Replace with fine-tuned model weight paths
13941401

13951402
TableStructureRecognition:
13961403
module_name: table_structure_recognition
13971404
model_name: SLANet_plus
1398-
model_dir: null # Replace with the path to the fine-tuned table structure recognition model weights
1405+
model_dir: null # Replace with fine-tuned model weight paths
13991406

14001407
SubPipelines:
14011408
DocPreprocessor:
@@ -1406,7 +1413,7 @@ SubPipelines:
14061413
DocOrientationClassify:
14071414
module_name: doc_text_orientation
14081415
model_name: PP-LCNet_x1_0_doc_ori
1409-
model_dir: null # Replace with the path to the fine-tuned document image orientation classification model weights
1416+
model_dir: null # Replace with fine-tuned model weight paths
14101417

14111418
DocUnwarping:
14121419
module_name: image_unwarping
@@ -1422,16 +1429,16 @@ SubPipelines:
14221429
TextDetection:
14231430
module_name: text_detection
14241431
model_name: PP-OCRv4_server_det
1425-
model_dir: null # Replace with the path to the fine-tuned text detection model weights
1432+
model_dir: null # Replace with fine-tuned model weight paths
14261433
limit_side_len: 960
14271434
limit_type: max
14281435
thresh: 0.3
1429-
box_thresh: 0.6
1436+
box_thresh: 0.4
14301437
unclip_ratio: 2.0
14311438
TextRecognition:
14321439
module_name: text_recognition
14331440
model_name: PP-OCRv4_server_rec
1434-
model_dir: null # Replace with the path to the fine-tuned text recognition model weights
1441+
model_dir: null # Replace with fine-tuned model weight paths
14351442
batch_size: 1
14361443
score_thresh: 0
14371444
```

docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ comments: true
77
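## 1. Introduction to the General Table Recognition Pipeline
88
Table recognition is a technology that automatically identifies and extracts table content and structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into editable formats, making it easier for users to further process and analyze the data.
99

10-
The General Table Recognition Pipeline solves table recognition tasks by recognizing tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table recognition models. Based on this pipeline, precise table predictions can be achieved, covering use cases in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options that can be called from multiple programming languages on a variety of hardware. In addition, it offers custom development capabilities: you can train and fine-tune models on your own dataset, and the trained models can be seamlessly integrated.
10+
The General Table Recognition Pipeline solves table recognition tasks by recognizing tables in images and outputting them in HTML format. This pipeline integrates the well-known SLANet and SLANet_plus table structure recognition models. Based on this pipeline, precise table predictions can be achieved, covering use cases in general, manufacturing, finance, transportation, and other fields. The pipeline also provides flexible service deployment options that can be called from multiple programming languages on a variety of hardware. In addition, it offers custom development capabilities: you can train and fine-tune models on your own dataset, and the trained models can be seamlessly integrated.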
1111

1212
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition/01.png"/>
1313
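<b>The General Table Recognition Pipeline includes required modules for table structure recognition, text detection, and text recognition, as well as optional modules for layout area detection, document image orientation classification, and text image correction</b>.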
@@ -16,7 +16,7 @@ comments: true
1616

1717
<details><summary> 👉Model List Details</summary>
1818

19-
<p><b>Table Recognition Module Models:</b></p>
19+
<p><b>Table Structure Recognition Module Models:</b></p>
2020
<table>
2121
<tr>
2222
<th>Model</th><th>Model Download Link</th>
@@ -812,6 +812,14 @@ for res in output:
812812
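<li><b>float</b>: any floating-point number greater than <code>0</code>
813813
<li><b>None</b>: if set to <code>None</code>, the value initialized by the pipeline, <code>0.0</code>, is used by default, i.e. no threshold is applied</li></li></ul></td>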
814814
<td><code>None</code></td>
815+
</tr>
816+
<td><code>use_table_cells_ocr_results</code></td>
817+
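<td>Whether to enable Table-Cells-OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run cell by cell and each result is filled into the HTML table, which increases processing time. The two modes perform differently in different scenarios, so choose according to your actual situation.</td>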
818+
<td><code>bool|False</code></td>
819+
<td>
820+
<ul>
821+
<li><b>bool</b>: <code>True</code> or <code>False</code>
822+
<td><code>False</code></td>
815823

816824
</tr></table>
817825

@@ -1367,7 +1375,7 @@ SubPipelines:
13671375
limit_side_len: 960
13681376
limit_type: max
13691377
thresh: 0.3
1370-
box_thresh: 0.6
1378+
box_thresh: 0.4
13711379
unclip_ratio: 2.0
13721380
TextRecognition:
13731381
module_name: text_recognition

docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md

Lines changed: 45 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,9 +7,7 @@ comments: true
77
## 1. Introduction to General Table Recognition v2 Pipeline
88
Table recognition is a technology that automatically identifies and extracts table content and its structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, making it easier for users to further process and analyze data.
99

10-
The General Table Recognition v2 Pipeline(PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers custom development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models.
11-
12-
<b>❗ The General Table Recognition v2 Pipeline is still being optimized and the final version will be released in the next version of PaddleX. In order to maintain the stability of use, you can use the General Table Recognition Pipeline for table processing first, and we will release a notice when the final version of v2 is open-sourced, so please stay tuned!</b>
10+
The General Table Recognition v2 Pipeline (PP-TableMagic) is designed to solve table recognition tasks by identifying tables in images and outputting them in HTML format. Unlike the General Table Recognition Pipeline, this pipeline introduces two additional modules: table classification and table cell detection, which are linked with the table structure recognition module to complete the table recognition task. This pipeline can achieve accurate table predictions and is applicable in various fields such as general, manufacturing, finance, and transportation. It also provides flexible service deployment options, supporting multiple programming languages on various hardware. Additionally, it offers custom development capabilities, allowing you to train and fine-tune models on your own dataset, with seamless integration of the trained models. <b>In addition, the General Table Recognition v2 Pipeline supports end-to-end table structure recognition models (e.g. SLANet, SLANet_plus) and independent configuration of the table recognition method for wired and wireless tables, allowing developers to freely select and combine the best table recognition solutions.</b>
1311

1412
<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/table_recognition_v2/01.png"/>
1513

@@ -19,7 +17,7 @@ The General Table Recognition v2 Pipeline(PP-TableMagic) is designed to solve ta
1917

2018
<details><summary> 👉Model List Details</summary>
2119

22-
<p><b>Table Recognition Module Models:</b></p>
20+
<p><b>Table Structure Recognition Module Models:</b></p>
2321
<table>
2422
<tr>
2523
<th>Model</th><th>Model Download Link</th>
@@ -894,14 +892,55 @@ In the above Python script, the following steps are executed:
894892
<td><code>None</code></td>
895893
</tr>
896894
<td><code>use_table_cells_ocr_results</code></td>
897-
<td>Whether to enable Table-Cells-OCR mode, when not enabled, use global OCR result to fill to html table, when enabled, do OCR cell by cell and fill to html table. Both of them perform differently in different scenarios, please choose according to the actual situation.</td>
895+
<td>Whether to enable Table-Cells-OCR mode. When disabled, the global OCR result is used to fill the HTML table; when enabled, OCR is run cell by cell and each result is filled into the HTML table, which increases processing time. The two modes perform differently in different scenarios, so choose according to your actual situation.</td>
896+
<td><code>bool|False</code></td>
897+
<td>
898+
<ul>
899+
<li><b>bool</b>:<code>True</code> or <code>False</code>
900+
<td><code>False</code></td>
901+
</tr>
902+
<td><code>use_e2e_wired_table_rec_model</code></td>
903+
<td>Whether to enable end-to-end prediction mode for wired tables. When disabled, the prediction results of the table cells detection model are used to fill the HTML table; when enabled, the cell predictions of the end-to-end table structure recognition model are used instead. The two modes perform differently in different scenarios, so choose according to your actual situation.</td>
904+
<td><code>bool|False</code></td>
905+
<td>
906+
<ul>
907+
<li><b>bool</b>:<code>True</code> or <code>False</code>
908+
<td><code>False</code></td>
909+
</tr>
910+
<td><code>use_e2e_wireless_table_rec_model</code></td>
911+
<td>Whether to enable end-to-end prediction mode for wireless tables. When disabled, the prediction results of the table cells detection model are used to fill the HTML table; when enabled, the cell predictions of the end-to-end table structure recognition model are used instead. The two modes perform differently in different scenarios, so choose according to your actual situation.</td>
898912
<td><code>bool|False</code></td>
899913
<td>
900914
<ul>
901915
<li><b>bool</b>:<code>True</code> or <code>False</code>
902916
<td><code>False</code></td>
903917
</table>
904918

919+
<b>To use an end-to-end table structure recognition model, replace the corresponding table structure recognition model with the end-to-end model in the pipeline config file, load the modified config file, and set the corresponding `predict()` method parameter</b>. For example, to use SLANet_plus for end-to-end recognition of wireless tables, set `model_name` to SLANet_plus under `WirelessTableStructureRecognition` in the config file (as shown below) and pass `use_e2e_wireless_table_rec_model=True` at prediction time. Nothing else needs to change: the wireless table cells detection model no longer takes effect, and SLANet_plus is used directly for end-to-end table recognition.
920+
921+
```yaml
922+
SubModules:
923+
WiredTableStructureRecognition:
924+
module_name: table_structure_recognition
925+
model_name: SLANeXt_wired
926+
model_dir: null
927+
928+
WirelessTableStructureRecognition:
929+
module_name: table_structure_recognition
930+
model_name: SLANet_plus # Replace with the end-to-end table structure recognition model
931+
model_dir: null
932+
933+
WiredTableCellsDetection:
934+
module_name: table_cells_detection
935+
model_name: RT-DETR-L_wired_table_cell_det
936+
model_dir: null
937+
938+
WirelessTableCellsDetection:
939+
module_name: table_cells_detection
940+
model_name: RT-DETR-L_wireless_table_cell_det
941+
model_dir: null
942+
```
943+
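The two steps described above (swap the model in the config, then set the matching `predict()` flag) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the config is held as a dictionary rather than loaded from YAML, `enable_e2e` is a hypothetical helper, and `ORIGINAL_WIRELESS_MODEL` is a placeholder rather than a real model name.

```python
import copy
from typing import Any, Dict, Tuple

# Dictionary mirror of the SubModules fragment of the pipeline config.
# "ORIGINAL_WIRELESS_MODEL" is a placeholder, not a real model name.
BASE_CONFIG: Dict[str, Any] = {
    "SubModules": {
        "WiredTableStructureRecognition": {
            "module_name": "table_structure_recognition",
            "model_name": "SLANeXt_wired",
            "model_dir": None,
        },
        "WirelessTableStructureRecognition": {
            "module_name": "table_structure_recognition",
            "model_name": "ORIGINAL_WIRELESS_MODEL",
            "model_dir": None,
        },
    }
}


def enable_e2e(
    config: Dict[str, Any], table_type: str, e2e_model_name: str
) -> Tuple[Dict[str, Any], Dict[str, bool]]:
    """Hypothetical helper: return a modified copy of `config` plus the
    keyword argument to pass to `predict()` for the given table type."""
    if table_type not in ("wired", "wireless"):
        raise ValueError("table_type must be 'wired' or 'wireless'")
    section = f"{table_type.capitalize()}TableStructureRecognition"
    new_config = copy.deepcopy(config)  # leave the caller's config untouched
    new_config["SubModules"][section]["model_name"] = e2e_model_name
    predict_kwargs = {f"use_e2e_{table_type}_table_rec_model": True}
    return new_config, predict_kwargs


new_config, predict_kwargs = enable_e2e(BASE_CONFIG, "wireless", "SLANet_plus")
print(new_config["SubModules"]["WirelessTableStructureRecognition"]["model_name"])
print(predict_kwargs)
```

In real use, the modified config would be written back to the pipeline YAML file and the returned keyword argument forwarded to `predict()`; those steps are left out here because they depend on the installed PaddleX version.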
905944
(3) Process the prediction results, where each sample's prediction result is represented as a corresponding Result object, and supports operations such as printing, saving as an image, saving as an `xlsx` file, saving as an `HTML` file, and saving as a `json` file:
906945

907946
<table>
@@ -1471,7 +1510,7 @@ SubPipelines:
14711510
limit_side_len: 960
14721511
limit_type: max
14731512
thresh: 0.3
1474-
box_thresh: 0.6
1513+
box_thresh: 0.4
14751514
unclip_ratio: 2.0
14761515
14771516
TextRecognition:
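The change from `box_thresh: 0.6` to `box_thresh: 0.4` lowers the confidence a detected text box needs in order to be kept. A toy sketch of that effect follows, assuming each candidate box carries a score and boxes scoring below the threshold are dropped; the candidate data and the `keep_boxes` helper are hypothetical and are not PaddleX code.

```python
from typing import List, Tuple


def keep_boxes(candidates: List[Tuple[str, float]], box_thresh: float) -> List[str]:
    """Keep the labels of candidate boxes whose score reaches `box_thresh`."""
    return [label for label, score in candidates if score >= box_thresh]


# Hypothetical detections: (label, confidence score)
candidates = [("header", 0.92), ("faint cell text", 0.47), ("noise", 0.21)]

print(keep_boxes(candidates, 0.6))  # old threshold
print(keep_boxes(candidates, 0.4))  # new threshold
```

With these made-up scores, the lower threshold keeps the faint cell text that the old value discarded, at the cost of admitting lower-confidence boxes in general.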
