
Commit 22d64ae

Merge pull request opencv#17570 from HannibalAPE:text_det_recog_demo
[GSoC] High Level API and Samples for Scene Text Detection and Recognition

* APIs and samples for scene text detection and recognition
* update APIs and tutorial for Text Detection and Recognition
* API updates: (1) put decodeType into struct Voc; (2) optimize the post-processing of DB
* sample update: (1) add transformation into scene_text_spotting.cpp; (2) modify text_detection.cpp with API update
* update tutorial
* simplify text recognition API, update tutorial
* update impl usage in recognize() and detect()
* dnn: refactoring public API of TextRecognitionModel/TextDetectionModel
* update provided models, update opencv.bib
* dnn: adjust text rectangle angle
* remove points ordering operation in model.cpp
* update gts of DB test in test_model.cpp
* dnn: ensure to keep text rectangle angle - avoid 90/180 degree turns
* dnn(text): use quadrangle result in TextDetectionModel API
* dnn: update Text Detection API: (1) keep points' order consistent with (bl, tl, tr, br) in unclip; (2) update contourScore with boundingRect
1 parent 5ecf693 commit 22d64ae

19 files changed: +2340 −182 lines

doc/opencv.bib

Lines changed: 23 additions & 0 deletions
@@ -1261,3 +1261,26 @@ @inproceedings{forstner1987fast
    pages={281--305},
    year={1987}
}

@inproceedings{liao2020real,
  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
  title={Real-time Scene Text Detection with Differentiable Binarization},
  booktitle={Proc. AAAI},
  year={2020}
}

@article{shi2016end,
  title={An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition},
  author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
  journal={IEEE transactions on pattern analysis and machine intelligence},
  volume={39},
  number={11},
  pages={2298--2304},
  year={2016},
  publisher={IEEE}
}

@inproceedings{zhou2017east,
  title={East: an efficient and accurate scene text detector},
  author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
  booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
  pages={5551--5560},
  year={2017}
}

doc/tutorials/dnn/dnn_OCR/dnn_OCR.markdown

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
# How to run custom OCR model {#tutorial_dnn_OCR}

@prev_tutorial{tutorial_dnn_custom_layers}
@next_tutorial{tutorial_dnn_text_spotting}

## Introduction

@@ -43,4 +44,4 @@ The input of text recognition model is the output of the text detection model, w

DenseNet_CTC has the smallest parameters and best FPS, and it is suitable for edge devices, which are very sensitive to the cost of calculation. If you have limited computing resources and want to achieve better accuracy, VGG_CTC is a good choice.

CRNN_VGG_BiLSTM_CTC is suitable for scenarios that require high recognition accuracy.

doc/tutorials/dnn/dnn_text_spotting/dnn_text_spotting.markdown
Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
# High Level API: TextDetectionModel and TextRecognitionModel {#tutorial_dnn_text_spotting}

@prev_tutorial{tutorial_dnn_OCR}

## Introduction
In this tutorial, we will introduce the APIs for TextRecognitionModel and TextDetectionModel in detail.

---
#### TextRecognitionModel:

In the current version, @ref cv::dnn::TextRecognitionModel only supports CNN+RNN+CTC based algorithms,
and the greedy decoding method for CTC is provided.
For more information, please refer to the [original paper](https://arxiv.org/abs/1507.05717).

Before recognition, you should call `setVocabulary` and `setDecodeType`.
- "CTC-greedy": the output of the text recognition model should be a probability matrix.
  The shape should be `(T, B, Dim)`, where
  - `T` is the sequence length
  - `B` is the batch size (only `B=1` is supported for inference)
  - and `Dim` is the vocabulary length + 1 (the 'Blank' of CTC is at index 0 of `Dim`).

@ref cv::dnn::TextRecognitionModel::recognize() is the main function for text recognition.
- The input image should be a cropped text image or an image with `roiRects`
- Other decoding methods may be supported in the future
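
For intuition, the sketch below shows what the greedy decoder does with such a `(T, B, Dim)` matrix. This step is performed internally by @ref cv::dnn::TextRecognitionModel::recognize(); the standalone helper is only illustrative and is not part of the API.

```cpp
// Illustrative only: greedy CTC decoding of a single batch item (B = 1).
// `probs` is assumed to be a T x Dim CV_32F matrix, `vocabulary` has Dim-1 entries.
std::string ctcGreedyDecode(const cv::Mat& probs, const std::vector<std::string>& vocabulary)
{
    std::string text;
    int prevIndex = 0; // index 0 is the CTC 'Blank' class
    for (int t = 0; t < probs.rows; t++)
    {
        cv::Point maxLoc;
        cv::minMaxLoc(probs.row(t), 0, 0, 0, &maxLoc);
        int index = maxLoc.x; // argmax over Dim at time step t
        // collapse repeated symbols and drop blanks
        if (index != 0 && index != prevIndex)
            text += vocabulary[index - 1]; // characters start at index 1
        prevIndex = index;
    }
    return text;
}
```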

---

#### TextDetectionModel:

@ref cv::dnn::TextDetectionModel API provides these methods for text detection:
- cv::dnn::TextDetectionModel::detect() returns the results in std::vector<std::vector<Point>> (4-point quadrangles)
- cv::dnn::TextDetectionModel::detectTextRectangles() returns the results in std::vector<cv::RotatedRect> (RBOX-like)

In the current version, @ref cv::dnn::TextDetectionModel supports these algorithms:
- use @ref cv::dnn::TextDetectionModel_DB with "DB" models
- and use @ref cv::dnn::TextDetectionModel_EAST with "EAST" models

The pretrained models provided below are variants of DB (without deformable convolution);
their performance is reported in Table 1 of the [paper](https://arxiv.org/abs/1911.08947).
For more information, please refer to the [official code](https://github.com/MhLiao/DB).
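
Besides the quadrangle output used in the detection example below, detectTextRectangles() is called in the same way. The following is only a minimal sketch, not a shipped sample; the model path and thresholds are placeholders taken from the DB example later in this tutorial.

```cpp
// Minimal sketch of the RotatedRect variant of the detection API.
Mat frame = imread("/path/to/text_det_test.png");

TextDetectionModel_DB det("/path/to/DB_TD500_resnet50.onnx");
det.setBinaryThreshold(0.3f)
   .setPolygonThreshold(0.5f)
   .setUnclipRatio(2.0);
det.setInputParams(1.0 / 255.0, Size(736, 736),
                   Scalar(122.67891434, 116.66876762, 104.00698793));

// One RotatedRect per detected text instance
std::vector<RotatedRect> rects;
det.detectTextRectangles(frame, rects);

// Draw the rotated boxes
for (const RotatedRect& box : rects)
{
    Point2f vertices[4];
    box.points(vertices);
    for (int i = 0; i < 4; i++)
        line(frame, vertices[i], vertices[(i + 1) % 4], Scalar(0, 255, 0), 2);
}
```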

---

You can train your own model with more data, and convert it into ONNX format.
We encourage you to add new algorithms to these APIs.


## Pretrained Models

#### TextRecognitionModel:

```
crnn.onnx:
    url: https://drive.google.com/uc?export=dowload&id=1ooaLR-rkTl8jdpGy1DoQs0-X0lQsB6Fj
    sha: 270d92c9ccb670ada2459a25977e8deeaf8380d3
    alphabet_36.txt: https://drive.google.com/uc?export=dowload&id=1oPOYx5rQRp8L6XQciUwmwhMCfX0KyO4b
    parameter setting: -rgb=0;
    description: The classification number of this model is 36 (0~9 + a~z).
                 The training dataset is MJSynth.

crnn_cs.onnx:
    url: https://drive.google.com/uc?export=dowload&id=12diBsVJrS9ZEl6BNUiRp9s0xPALBS7kt
    sha: a641e9c57a5147546f7a2dbea4fd322b47197cd5
    alphabet_94.txt: https://drive.google.com/uc?export=dowload&id=1oKXxXKusquimp7XY1mFvj9nwLzldVgBR
    parameter setting: -rgb=1;
    description: The classification number of this model is 94 (0~9 + a~z + A~Z + punctuation).
                 The training datasets are MJSynth and SynthText.

crnn_cs_CN.onnx:
    url: https://drive.google.com/uc?export=dowload&id=1is4eYEUKH7HR7Gl37Sw4WPXx6Ir8oQEG
    sha: 3940942b85761c7f240494cf662dcbf05dc00d14
    alphabet_3944.txt: https://drive.google.com/uc?export=dowload&id=18IZUUdNzJ44heWTndDO6NNfIpJMmN-ul
    parameter setting: -rgb=1;
    description: The classification number of this model is 3944 (0~9 + a~z + A~Z + Chinese characters + special characters).
                 The training dataset is ReCTS (https://rrc.cvc.uab.es/?ch=12).
```

More models can be found [here](https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing),
which are taken from [clovaai](https://github.com/clovaai/deep-text-recognition-benchmark).
You can train more models with [CRNN](https://github.com/meijieru/crnn.pytorch), and convert them with `torch.onnx.export`.

#### TextDetectionModel:

```
- DB_IC15_resnet50.onnx:
    url: https://drive.google.com/uc?export=dowload&id=17_ABp79PlFt9yPCxSaarVc_DKTmrSGGf
    sha: bef233c28947ef6ec8c663d20a2b326302421fa3
    recommended parameter setting: -inputHeight=736, -inputWidth=1280;
    description: This model is trained on ICDAR2015, so it can only detect English text instances.

- DB_IC15_resnet18.onnx:
    url: https://drive.google.com/uc?export=dowload&id=1sZszH3pEt8hliyBlTmB-iulxHP1dCQWV
    sha: 19543ce09b2efd35f49705c235cc46d0e22df30b
    recommended parameter setting: -inputHeight=736, -inputWidth=1280;
    description: This model is trained on ICDAR2015, so it can only detect English text instances.

- DB_TD500_resnet50.onnx:
    url: https://drive.google.com/uc?export=dowload&id=19YWhArrNccaoSza0CfkXlA8im4-lAGsR
    sha: 1b4dd21a6baa5e3523156776970895bd3db6960a
    recommended parameter setting: -inputHeight=736, -inputWidth=736;
    description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.

- DB_TD500_resnet18.onnx:
    url: https://drive.google.com/uc?export=dowload&id=1vY_KsDZZZb_svd5RT6pjyI8BS1nPbBSX
    sha: 8a3700bdc13e00336a815fc7afff5dcc1ce08546
    recommended parameter setting: -inputHeight=736, -inputWidth=736;
    description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.
```

We will release more models of DB [here](https://drive.google.com/drive/folders/1qzNCHfUJOS0NEUOIKn69eCtxdlNPpWbq?usp=sharing) in the future.

```
- EAST:
    Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1
    This model is based on https://github.com/argman/EAST
```

## Images for Testing

```
Text Recognition:
    url: https://drive.google.com/uc?export=dowload&id=1nMcEy68zDNpIlqAn6xCk_kYcUTIeSOtN
    sha: 89205612ce8dd2251effa16609342b69bff67ca3

Text Detection:
    url: https://drive.google.com/uc?export=dowload&id=149tAhIcvfCYeyufRoZ9tmc2mZDKE_XrF
    sha: ced3c03fb7f8d9608169a913acf7e7b93e07109b
```

## Example for Text Recognition

Step1. Loading images and models with a vocabulary

```cpp
// Load a cropped text line image
// you can find cropped images for testing in "Images for Testing"
int rgb = IMREAD_COLOR; // This should be changed according to the model input requirement.
Mat image = imread("path/to/text_rec_test.png", rgb);

// Load model weights
TextRecognitionModel model("path/to/crnn_cs.onnx");

// The decoding method
// more methods will be supported in future
model.setDecodeType("CTC-greedy");

// Load vocabulary
// vocabulary should be changed according to the text recognition model
std::ifstream vocFile;
vocFile.open("path/to/alphabet_94.txt");
CV_Assert(vocFile.is_open());
String vocLine;
std::vector<String> vocabulary;
while (std::getline(vocFile, vocLine)) {
    vocabulary.push_back(vocLine);
}
model.setVocabulary(vocabulary);
```

Step2. Setting Parameters

```cpp
// Normalization parameters
double scale = 1.0 / 127.5;
Scalar mean = Scalar(127.5, 127.5, 127.5);

// The input shape
Size inputSize = Size(100, 32);

model.setInputParams(scale, inputSize, mean);
```
Step3. Inference
```cpp
std::string recognitionResult = model.recognize(image);
std::cout << "'" << recognitionResult << "'" << std::endl;
```

Input image:

![Picture example](text_rec_test.png)

Output:
```
'welcome'
```

## Example for Text Detection

Step1. Loading images and models
```cpp
// Load an image
// you can find some images for testing in "Images for Testing"
Mat frame = imread("/path/to/text_det_test.png");
```

Step2.a Setting Parameters (DB)
```cpp
// Load model weights
TextDetectionModel_DB model("/path/to/DB_TD500_resnet50.onnx");

// Post-processing parameters
float binThresh = 0.3;
float polyThresh = 0.5;
uint maxCandidates = 200;
double unclipRatio = 2.0;
model.setBinaryThreshold(binThresh)
     .setPolygonThreshold(polyThresh)
     .setMaxCandidates(maxCandidates)
     .setUnclipRatio(unclipRatio)
;

// Normalization parameters
double scale = 1.0 / 255.0;
Scalar mean = Scalar(122.67891434, 116.66876762, 104.00698793);

// The input shape
Size inputSize = Size(736, 736);

model.setInputParams(scale, inputSize, mean);
```

Step2.b Setting Parameters (EAST)
```cpp
TextDetectionModel_EAST model("EAST.pb");

float confThreshold = 0.5;
float nmsThreshold = 0.4;
model.setConfidenceThreshold(confThreshold)
     .setNMSThreshold(nmsThreshold)
;

double detScale = 1.0;
Size detInputSize = Size(320, 320);
Scalar detMean = Scalar(123.68, 116.78, 103.94);
bool swapRB = true;
model.setInputParams(detScale, detInputSize, detMean, swapRB);
```

Step3. Inference
```cpp
std::vector<std::vector<Point>> detResults;
model.detect(frame, detResults);

// Visualization
polylines(frame, detResults, true, Scalar(0, 255, 0), 2);
imshow("Text Detection", frame);
waitKey();
```
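
If per-detection scores are also needed, the detection API provides an overload that additionally fills a confidence vector; a short sketch (not part of the original sample):

```cpp
// Optional: retrieve per-detection confidences along with the quadrangles
std::vector<std::vector<Point>> boxes;
std::vector<float> confidences;
model.detect(frame, boxes, confidences);

for (size_t i = 0; i < boxes.size(); i++)
    std::cout << "box " << i << ": confidence = " << confidences[i] << std::endl;
```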

Output:

![Picture example](text_det_test_results.jpg)

## Example for Text Spotting

After following the steps above, it is easy to get the detection results of an input image.
Then, you can apply a perspective transformation to crop each detected text region for recognition.
For more information, please refer to the **Detailed Sample** section.
```cpp
// Transform and Crop
Mat cropped;
fourPointsTransform(recInput, vertices, cropped);

String recResult = recognizer.recognize(cropped);
```
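
fourPointsTransform() is a helper defined in the samples rather than in the OpenCV API. A possible implementation, together with a loop over the quadrangles returned by detect() (ordered bl, tl, tr, br), might look like the sketch below; it assumes the recognizer expects a 100x32 input.

```cpp
// Sketch only: warp one detected quadrangle to the recognizer input size.
void fourPointsTransform(const Mat& frame, const Point2f vertices[], Mat& result)
{
    const Size outputSize = Size(100, 32);
    // destination corners in (bl, tl, tr, br) order, matching the detector output
    Point2f targetVertices[4] = {
        Point(0, outputSize.height - 1),
        Point(0, 0),
        Point(outputSize.width - 1, 0),
        Point(outputSize.width - 1, outputSize.height - 1)
    };
    Mat rotationMatrix = getPerspectiveTransform(vertices, targetVertices);
    warpPerspective(frame, result, rotationMatrix, outputSize);
}

// ... inside the spotting pipeline, after model.detect(frame, detResults):
for (size_t i = 0; i < detResults.size(); i++)
{
    std::vector<Point2f> quad;
    for (const Point& p : detResults[i])
        quad.push_back(Point2f((float)p.x, (float)p.y));

    Mat cropped;
    fourPointsTransform(frame, quad.data(), cropped);
    String recResult = recognizer.recognize(cropped);
}
```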

Output Examples:

![Picture example](detect_test1.jpg)

![Picture example](detect_test2.jpg)

## Source Code
The [source code](https://github.com/opencv/opencv/blob/master/modules/dnn/src/model.cpp)
of these APIs can be found in the DNN module.

## Detailed Sample
For more information, please refer to:
- [samples/dnn/scene_text_recognition.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_recognition.cpp)
- [samples/dnn/scene_text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_detection.cpp)
- [samples/dnn/text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp)
- [samples/dnn/scene_text_spotting.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_spotting.cpp)

#### Test with an image
Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -i=path/to/an/image -ih=736 -iw=736
example_dnn_scene_text_spotting -dmp=path/to/DB_IC15_resnet50.onnx -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -iw=1280 -ih=736 -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_text_detection -dmp=path/to/EAST.pb -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=path/to/alphabet_94.txt
```

#### Test on public datasets
Text Recognition:

The download link for testing images can be found in the **Images for Testing**

Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_36.txt -rgb=0
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_94.txt -rgb=1
```

Text Detection:

The download links for testing images can be found in the **Images for Testing**

Examples:
```bash
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/TD500 -ih=736 -iw=736
example_dnn_scene_text_detection -mp=path/to/DB_IC15_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/IC15 -ih=736 -iw=1280
```

doc/tutorials/dnn/table_of_content_dnn.markdown

Lines changed: 11 additions & 1 deletion
@@ -79,4 +79,14 @@ Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
*Author:* Zihao Mu

In this tutorial you will learn how to use opencv_dnn module using custom OCR models.

- @subpage tutorial_dnn_text_spotting

  *Languages:* C++

  *Compatibility:* \> OpenCV 4.5

  *Author:* Wenqing Zhang

  In this tutorial, we'll introduce how to use the high-level APIs for text recognition and text detection.
