This repository was archived by the owner on May 8, 2024. It is now read-only.

Commit 47d4fa4

Merge pull request #3 from oneapi-src/main-upstream
refkit-2.1.0
2 parents a75ebb8 + d19ef51 commit 47d4fa4

32 files changed, +56 -42 lines changed

LICENSE

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-Copyright (c) 2022, Intel Corporation
+Copyright (c) 2024, Intel Corporation

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
@@ -21,4 +21,4 @@ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 12 additions & 11 deletions
@@ -13,7 +13,7 @@ Optical Character Recognition (OCR) systems emerge as an automated solution that

 In an OCR pipeline, an input document image flows into a text detection component and next, it is processed by a text recognition component. In the text detection stage, the objective is to localize all text regions within the input document images, where each of these text zones are known as region of interest (ROI). Once the ROIs are detected, they are cropped from the input images and passed to the text recognition component, which is in charge of identifying the text contained in the ROIs and transcribe such text into machine-encoded text. This process is illustrated in the following diagram:

-![ocr-flow](assets/ocr_flow_diagram.png)
+![ocr-flow](assets/ocr_flow_diagram_op.png)

 Nowadays, AI (Artificial Intelligence) methods in the form of cutting-edge deep learning algorithms are commonly incorporated into OCR solutions to increase their efficiency in the processing of scanned files and their accuracy in the text recognition task [[3]](#memon_2020). Deep learning detection models like YOLO variations and CRAFT are frequently used in the text detection module to localize the ROIs, whereas models like Convolutional Recurrent Neural Networks (CRNN) and Transformers are implemented as part of the text recognition stage [[2]](#li_2022)[[4]](#faustomorales_2019).

@@ -51,7 +51,7 @@ Furthermore, avoiding the manual retrieval of some specific information from a m
 * Extracting text information from products to reduce shrinkage loss in grocery stores.
 * Automate the processing of financial documents to combat fraud, increase productivity and improve customer service.

-For more details, visit [Intel® Extension for PyTorch\*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-pytorch.html#gs.5vjhbw), [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html#gs.5vjr1p), [Intel® Distribution of OpenVINO<sup>TM</sup> Toolkit](https://www.intel.com/content/www/us/en/download/753640/intel-distribution-of-openvino-toolkit.html), the [PyTorch\* Historical Assets Document Processing (OCR)]() GitHub repository, and the [EasyOCR](https://github.com/JaidedAI/EasyOCR) GitHub repository.
+For more details, visit [Intel® Extension for PyTorch\*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-pytorch.html#gs.5vjhbw), [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html#gs.5vjr1p), [Intel® Distribution of OpenVINO<sup>TM</sup> Toolkit](https://www.intel.com/content/www/us/en/download/753640/intel-distribution-of-openvino-toolkit.html), the [Historical Assets Document Process]() GitHub repository, and the [EasyOCR](https://github.com/JaidedAI/EasyOCR) GitHub repository.

 ## Solution Technical Details
 In this section, the interested reader can find a more in deep explanation about the text recognition component from the proposed OCR solution. A description of the dataset used to perform training and inference is also presented.
@@ -71,7 +71,7 @@ About the LSTM model used by the CRNN in this project, it works under a bidirect

 Regarding the workflow process of the CRNN, it receives a cropped ROI image from EasyOCR as an input, and the convolutional component proceeds to extract a sequence of feature maps, which are then mapped into a sequence of feature vectors. Next, the bidirectional LSTM makes a prediction for each feature vector. Finally, a post-processing step is carried out to convert the LSTM predictions into a label sequence. The diagram below provides an illustrative reference of this process.

-![ocr-flow](assets/crnn_flow_diagram.png)
+![ocr-flow](assets/crnn_flow_diagram_op.png)

 In terms of model architecture, the CRNN is composed by seven convolutional layers, each of them followed by a max pooling layer. As for the RNN, it is constituted by two bidirectional LSTM layers, each of them followed by a linear layer. The next table summarizes the structure of the CRNN model implemented in this reference kit. "in_maps" stands for "input feature maps", "out_maps" for "output feature maps", "k" for "kernel size", "s" for "stride", "p" for "padding", "in_features" is the size of each input instance and "out_features" is the size of each output instance.
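Reviewer sketch, not part of this commit: the README context in this hunk mentions a post-processing step that converts the per-frame LSTM predictions into a label sequence. Assuming that step is a greedy CTC-style decode (the diff does not confirm this), it could look roughly like the following. The function name `ctc_greedy_decode`, the blank index `0`, and the toy alphabet are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, alphabet, blank=0):
    """Turn per-frame label indices into text (hypothetical helper).

    Assumes index `blank` is the CTC blank symbol and that index i > 0
    maps to alphabet[i - 1]. Neither assumption comes from the commit.
    """
    # Step 1: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop the blank symbol and map the rest to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != blank)


if __name__ == "__main__":
    # Toy per-frame predictions over the alphabet "abc".
    print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0, 3], "abc"))
```

The blank between the two runs of label `1` is what lets a repeated character survive the collapse step.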

@@ -119,7 +119,7 @@ how the workflow is run.

 | Recommended Hardware | Precision
 | ----------------------------------------------------------------|-
-| CPU: Intel® 2th Gen Xeon® Platinum 8280 CPU @ 2.70GHz or higher | FP32, INT8
+| CPU: Intel® 2nd Gen Xeon® Platinum 8280 CPU @ 2.70GHz or higher | FP32, INT8
 | RAM: 187 GB |
 | Recommended Free Disk Space: 20 GB or more |

@@ -128,7 +128,7 @@ Code was tested on Ubuntu\* 22.04 LTS.
 ## How it Works
 The text recognition component enables the training and inference modalities. Furthermore, this reference kit provides the option to incorporate the trained CRNN text recognition model into an end-to-end OCR system to make predictions from a complete document image. All these procedures are optimized using Intel® specialized packages. The next diagram illustrates the workflow of these processes and how the Intel® optimization features are applied in each stage.

-![ocr-flow](assets/e2e_flow_diagram.png)
+![ocr-flow](assets/e2e_flow_diagram_op.png)

 ### Intel® Extension for PyTorch\*
 Training a CRNN model, and making inference with it, usually represent compute-intensive tasks. To address these requirements and to gain a performance boost on Intel® hardware, in this reference kit the training and inference stages of the CRNN model include the implementation of Intel® Extension for PyTorch\*.
@@ -155,9 +155,9 @@ Just like any of the trained CRNN models with Intel® Extension for PyTorch\*, t
 ### Intel® Distribution of OpenVINO™ Toolkit
 Similar to Intel® Neural Compressor, the Intel® Distribution of OpenVINO™ toolkit allows to reduce the model size with post-training quantization, which improves inference performance. By using the Intel® Distribution of OpenVINO™ toolkit post-training quantization, the FP32 CRNN model is converted to INT8. Moreover, the Intel® Distribution of OpenVINO™ toolkit optimizes the CRNN model for deployment in resource-constrained environments, like edge devices.

-In order to quantize the FP32 CRNN model using the Intel® Distribution of OpenVINO™ toolkit, it is necessary to first convert the original FP32 CRNN model into ONNX (Open Neural Network Exchange) model representation. After the model is converted to ONNX, it must be converted into an Intermediate Representation (IR) format, which is an internal Intel® Distribution of OpenVINO™ toolkit model representation. Once the CRNN model is in IR format, the Intel® Distribution of OpenVINO™ toolkit directly quantizes the IR model via the Post-training Optimization (POT) tool and transforms it into an INT8 model. This conversion stages are illustrated in the following diagram.
+In order to quantize the FP32 CRNN model using the Intel® Distribution of OpenVINO™ toolkit, it is necessary to first convert the original FP32 CRNN model into ONNX (Open Neural Network Exchange) model representation. After the model is converted to ONNX, it must be converted into an Intermediate Representation (IR) format, which is an internal Intel® Distribution of OpenVINO™ toolkit model representation. Once the CRNN model is in IR format, the Intel® Distribution of OpenVINO™ toolkit directly quantizes the IR model via the Post-training Optimization (POT) tool and transforms it into an INT8 model. These conversion stages are illustrated in the following diagram.

-![ocr-flow](assets/conversion_stages.png)
+![ocr-flow](assets/conversion_stages_op.png)

 Another benefit from using the Intel® Distribution of OpenVINO™ toolkit is that it enables the use of the benchmark Python\* tool, which is a feature that estimates the inference performance of the corresponding deep learning model on supported devices [[12]](#openvino). The estimated inference performance is calculated in terms of latency and throughput. For this use case, the benchmark Python\* tool is applied on the ONNX, IR and quantized INT8 models.
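Reviewer sketch, not part of this commit: the README text in this hunk describes converting the FP32 CRNN model to INT8 through post-training quantization. To make the idea concrete, here is a minimal pure-Python illustration of symmetric per-tensor quantization on a plain list of floats. It is an assumption-laden toy, not the algorithm used by the POT tool or by Intel® Neural Compressor, and the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with a single shared scale.

    Illustrative symmetric scheme only; real toolkits calibrate on data
    and usually quantize per layer or per channel.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid dividing by zero when every value is 0.0 (or the list is empty).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.25, 1.0]
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Each restored value differs from the original by at most about half of `scale`, which is the accuracy cost paid for the smaller INT8 representation.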

@@ -186,7 +186,7 @@ export OUTPUT_DIR=$WORKSPACE/output
 **OUTPUT_DIR:** This path will contain the multiple outputs generated by the workflow, e.g. FP32 CRNN model and INT8 CRNN model.

 ### Download the Workflow Repository
-Create the workspace directory for the workflow and clone the [PyTorch Historical Assets Document Processing(OCR)]() repository inside it.
+Create the workspace directory for the workflow and clone the [Historical Assets Document Process]() repository inside it.

 [//]: # (capture: baremetal)
```bash
@@ -231,10 +231,10 @@ conda config --set solver libmamba
 | Packages | Version |
 | -------- | ------- |
 | python | 3.9 |
-| intelpython3_core | 2023.2.0 |
+| intelpython3_core | 2024.0.0 |
 | intel-extension-for-pytorch | 2.0.100 |
-| neural-compressor| 2.3 |
-| openvino-dev| 2023.1.0 |
+| neural-compressor| 2.3.1 |
+| openvino-dev| 2023.2.0 |

 The dependencies required to properly execute this workflow can be found in the yml file [$WORKSPACE/env/intel_env.yml](env/intel_env.yml).

@@ -1222,6 +1222,7 @@ For more information about Predictive Asset Maintenance or to read about other r
 If you have questions or issues about this workflow, want help with troubleshooting, want to report a bug or submit enhancement requests, please submit a GitHub issue.

 ## Appendix
+\*Names and brands that may be claimed as the property of others. [Trademarks](https://www.intel.com/content/www/us/en/legal/trademarks.html).

 ### Disclaimer

assets/conversion_stages.png

-89 KB
Binary file not shown.

assets/conversion_stages_op.png

103 KB

assets/crnn_flow_diagram.png

-58.4 KB
Binary file not shown.

assets/crnn_flow_diagram_op.png

114 KB

assets/e2e_flow_diagram.png

-95.6 KB
Binary file not shown.

assets/e2e_flow_diagram_op.png

156 KB

assets/ocr_flow_diagram.png

-41.7 KB
Binary file not shown.

assets/ocr_flow_diagram_op.png

88.2 KB

config/conf.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ version: 1.0

 model:
 name: CRNN
-framework: pytorch_fx
+framework: pytorch_ipex
 evaluation: # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
 accuracy: # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
 metric:

env/intel_env.yml

Lines changed: 10 additions & 8 deletions
@@ -3,15 +3,17 @@ channels:
 - intel
 - conda-forge
 dependencies:
-- intel::intelpython3_core=2023.2.0=py39_0
+- intel::intelpython3_core=2024.0.0
+- intel::python=3.9
 - cpuonly=1.0
-- intel:pip
+- intel-extension-for-pytorch==2.0.100
+- neural-compressor==2.3.1
+- pillow==9.5
+- intel::pip
 - pip:
 - torch==2.0.1
-- easyocr==1.6.2
-- intel-extension-for-pytorch==2.0.100
-- neural-compressor==2.2
+- easyocr==1.7.1
 - trdg==1.8.0
-- opencv-python==4.5.5.64
-- openvino-dev[onnx]==2023.1.0
-- pillow==9.5
+- opencv-python==4.8.1.78
+- openvino-dev[onnx]==2023.2.0
+
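Reviewer sketch, not part of this commit: before this change the README table listed `neural-compressor` 2.3 while `env/intel_env.yml` pinned 2.2; the commit brings both to 2.3.1. A small check along the following lines could catch that kind of drift. The helpers `parse_pin` and `find_mismatches` are hypothetical, and the regular expression only covers the pin shapes that appear in this diff (optional `channel::` prefix, optional `[extras]`, `=` or `==`), not the full conda or pip specifier grammar.

```python
import re

# Matches e.g. "intel::python=3.9" or "openvino-dev[onnx]==2023.2.0".
PIN_RE = re.compile(
    r"^(?:(?P<channel>[\w-]+)::)?"   # optional conda channel prefix
    r"(?P<name>[\w-]+)"              # package name
    r"(?:\[[\w,-]+\])?"              # optional pip extras
    r"={1,2}(?P<version>[\w.]+)"     # "=" (conda) or "==" (pip) and version
)


def parse_pin(spec):
    """Return (name, version) for a pinned spec, or None if unpinned."""
    match = PIN_RE.match(spec.strip())
    if match is None:
        return None
    return match.group("name"), match.group("version")


def find_mismatches(documented, pins):
    """Compare documented versions against pinned specs.

    Returns {name: (documented_version, pinned_version_or_None)} for every
    documented package whose pin is missing or different.
    """
    pinned = dict(p for p in map(parse_pin, pins) if p is not None)
    return {
        name: (version, pinned.get(name))
        for name, version in documented.items()
        if pinned.get(name) != version
    }


if __name__ == "__main__":
    # Values copied from the old side of this diff.
    old_readme = {"neural-compressor": "2.3", "openvino-dev": "2023.1.0"}
    old_env = ["neural-compressor==2.2", "openvino-dev[onnx]==2023.1.0"]
    print(find_mismatches(old_readme, old_env))
```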

env/jake_env.yml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+name: historical_assets_jake
+channels:
+- intel
+- conda-forge
+dependencies:
+- intel::python==3.9.16
+- pillow==9.5
+- intel::pip
+- pip:
+- easyocr==1.7.1
+- trdg==1.8.0

src/DatasetGenerator.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 import os
 import argparse

src/config.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-docstring

src/crnn.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-docstring

src/dataset_gen.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 #!/usr/bin/env bash
 trdg -c 3356 -f 64 -sym -l en -t 8 -na 1 -rbl -rk --output_dir ./data/dataset

src/inc_inference.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/inference.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/keys.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-docstring

src/mydataset.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 #!/usr/bin/python

src/neural_compressor_conversion.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 # pylint: disable=missing-module-docstring
 # pylint: disable=E0401

src/ocr.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/ocr_pipeline.py

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring
@@ -69,7 +69,7 @@ def single_pic_proc(image_file, crnn_model_path, quantized_model_path=None, inte

 if inc_opt:
 intel_opt = True
-assert quantized_model_path is not None
+if (quantized_model_path is None): raise AssertionError('There is not a quantized model')

 image_files = glob(test_images_path+'/*.*')
 print(image_files)
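Reviewer sketch, not part of this commit: the second hunk replaces a bare `assert` with an explicit `raise AssertionError(...)`. One practical difference is that `assert` statements are stripped when Python runs with the `-O` flag, whereas an explicit `raise` always executes. The standalone helper below, `require_quantized_model`, is a hypothetical name used only to isolate that check; it is not a function in `src/ocr_pipeline.py`.

```python
def require_quantized_model(inc_opt, quantized_model_path=None):
    """Fail loudly when INC optimization is requested without a model path.

    Mirrors the check added in this commit, which keeps AssertionError as
    the exception type. An explicit raise still runs under `python -O`,
    unlike an `assert` statement.
    """
    if inc_opt and quantized_model_path is None:
        raise AssertionError('There is not a quantized model')
    return quantized_model_path


if __name__ == "__main__":
    try:
        require_quantized_model(inc_opt=True)
    except AssertionError as exc:
        print(f"rejected: {exc}")
```

If callers do not rely on catching `AssertionError`, a `ValueError` would arguably describe a bad argument more precisely, but that would be a behavior change beyond what this commit does.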

src/ocr_train.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/ocr_train_hp.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause
 # pylint: disable=missing-docstring

src/online_test.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-docstring

src/onnx_convert.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/performance_analysis.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/trans.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 #!/usr/bin/env python

src/trans_utils.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring

src/utils.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (C) 2023 Intel Corporation
+# Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: BSD-3-Clause

 # pylint: disable=missing-module-docstring
