# High Level API: TextDetectionModel and TextRecognitionModel {#tutorial_dnn_text_spotting}

@prev_tutorial{tutorial_dnn_OCR}

## Introduction
In this tutorial, we will introduce the APIs for TextRecognitionModel and TextDetectionModel in detail.

---
#### TextRecognitionModel:

In the current version, @ref cv::dnn::TextRecognitionModel only supports CNN+RNN+CTC based algorithms
and provides the greedy decoding method for CTC.
For more information, please refer to the [original paper](https://arxiv.org/abs/1507.05717).

Before recognition, you should `setVocabulary` and `setDecodeType`.
- "CTC-greedy": the output of the text recognition model should be a probability matrix
  of shape `(T, B, Dim)`, where
  - `T` is the sequence length,
  - `B` is the batch size (only `B=1` is supported in inference),
  - and `Dim` is the vocabulary length + 1 (the CTC 'Blank' is at index 0 of `Dim`).
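
As an illustration, greedy CTC decoding can be sketched in plain C++ (no OpenCV needed): take the argmax over `Dim` at each timestep, collapse consecutive repeats, and drop the blank at index 0. The function name and the plain `std::vector` probability matrix are illustrative, not part of the OpenCV API.

```cpp
#include <string>
#include <vector>

// Greedy CTC decoding for B = 1: probs[t][k] is the probability of
// class k at timestep t (k = 0 is the CTC blank; class k > 0 maps to
// vocabulary[k - 1]). Argmax each timestep, collapse consecutive
// repeats, and drop blanks.
std::string ctcGreedyDecode(const std::vector<std::vector<float> >& probs,
                            const std::string& vocabulary)
{
    std::string result;
    int prev = 0; // previous argmax, initialized to blank
    for (size_t t = 0; t < probs.size(); t++)
    {
        int best = 0;
        for (size_t k = 1; k < probs[t].size(); k++)
            if (probs[t][k] > probs[t][best])
                best = (int)k;
        if (best != 0 && best != prev)
            result += vocabulary[best - 1];
        prev = best;
    }
    return result;
}
```

Note how a repeated argmax is emitted only once, while a blank between two identical argmaxes separates them into two output characters.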

@ref cv::dnn::TextRecognitionModel::recognize() is the main function for text recognition.
- The input image should be a cropped text image or an image with `roiRects`
- Other decoding methods may be supported in the future

---

#### TextDetectionModel:

@ref cv::dnn::TextDetectionModel API provides these methods for text detection:
- cv::dnn::TextDetectionModel::detect() returns the results as `std::vector<std::vector<Point>>` (4-point quadrangles)
- cv::dnn::TextDetectionModel::detectTextRectangles() returns the results as `std::vector<cv::RotatedRect>` (RBOX-like)
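
To draw a quadrangle from a `detectTextRectangles()` result, the rotated box must first be converted to its 4 corner points (what `cv::RotatedRect::points()` does). A minimal sketch of that conversion, using an illustrative `RBox` struct instead of `cv::RotatedRect` so it runs without OpenCV; the corner order here may differ from OpenCV's convention.

```cpp
#include <cmath>

// Illustrative stand-in for cv::RotatedRect: center, size, angle in degrees.
struct RBox { double cx, cy, w, h, angleDeg; };

// Compute the 4 corners of a rotated box by rotating the axis-aligned
// half-extent offsets around the center.
void rboxPoints(const RBox& b, double xs[4], double ys[4])
{
    const double rad = b.angleDeg * 3.14159265358979323846 / 180.0;
    const double c = std::cos(rad), s = std::sin(rad);
    const double dx[4] = { -b.w / 2,  b.w / 2, b.w / 2, -b.w / 2 };
    const double dy[4] = { -b.h / 2, -b.h / 2, b.h / 2,  b.h / 2 };
    for (int i = 0; i < 4; i++)
    {
        xs[i] = b.cx + dx[i] * c - dy[i] * s;
        ys[i] = b.cy + dx[i] * s + dy[i] * c;
    }
}
```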

In the current version, @ref cv::dnn::TextDetectionModel supports these algorithms:
- use @ref cv::dnn::TextDetectionModel_DB with "DB" models
- use @ref cv::dnn::TextDetectionModel_EAST with "EAST" models

The pretrained models provided below are variants of DB (without deformable convolution),
and their performance can be referred to in Table 1 of the [paper](https://arxiv.org/abs/1911.08947).
For more information, please refer to the [official code](https://github.com/MhLiao/DB).

---

You can train your own model with more data and convert it into ONNX format.
We encourage you to add new algorithms to these APIs.


## Pretrained Models

#### TextRecognitionModel:

```
crnn.onnx:
url: https://drive.google.com/uc?export=dowload&id=1ooaLR-rkTl8jdpGy1DoQs0-X0lQsB6Fj
sha: 270d92c9ccb670ada2459a25977e8deeaf8380d3
alphabet_36.txt: https://drive.google.com/uc?export=dowload&id=1oPOYx5rQRp8L6XQciUwmwhMCfX0KyO4b
parameter setting: -rgb=0;
description: The classification number of this model is 36 (0~9 + a~z).
             The training dataset is MJSynth.

crnn_cs.onnx:
url: https://drive.google.com/uc?export=dowload&id=12diBsVJrS9ZEl6BNUiRp9s0xPALBS7kt
sha: a641e9c57a5147546f7a2dbea4fd322b47197cd5
alphabet_94.txt: https://drive.google.com/uc?export=dowload&id=1oKXxXKusquimp7XY1mFvj9nwLzldVgBR
parameter setting: -rgb=1;
description: The classification number of this model is 94 (0~9 + a~z + A~Z + punctuations).
             The training datasets are MJSynth and SynthText.

crnn_cs_CN.onnx:
url: https://drive.google.com/uc?export=dowload&id=1is4eYEUKH7HR7Gl37Sw4WPXx6Ir8oQEG
sha: 3940942b85761c7f240494cf662dcbf05dc00d14
alphabet_3944.txt: https://drive.google.com/uc?export=dowload&id=18IZUUdNzJ44heWTndDO6NNfIpJMmN-ul
parameter setting: -rgb=1;
description: The classification number of this model is 3944 (0~9 + a~z + A~Z + Chinese characters + special characters).
             The training dataset is ReCTS (https://rrc.cvc.uab.es/?ch=12).
```

More models can be found [here](https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing);
they are taken from [clovaai](https://github.com/clovaai/deep-text-recognition-benchmark).
You can train more models with [CRNN](https://github.com/meijieru/crnn.pytorch) and convert them with `torch.onnx.export`.

#### TextDetectionModel:

```
- DB_IC15_resnet50.onnx:
url: https://drive.google.com/uc?export=dowload&id=17_ABp79PlFt9yPCxSaarVc_DKTmrSGGf
sha: bef233c28947ef6ec8c663d20a2b326302421fa3
recommended parameter setting: -inputHeight=736, -inputWidth=1280;
description: This model is trained on ICDAR2015, so it can only detect English text instances.

- DB_IC15_resnet18.onnx:
url: https://drive.google.com/uc?export=dowload&id=1sZszH3pEt8hliyBlTmB-iulxHP1dCQWV
sha: 19543ce09b2efd35f49705c235cc46d0e22df30b
recommended parameter setting: -inputHeight=736, -inputWidth=1280;
description: This model is trained on ICDAR2015, so it can only detect English text instances.

- DB_TD500_resnet50.onnx:
url: https://drive.google.com/uc?export=dowload&id=19YWhArrNccaoSza0CfkXlA8im4-lAGsR
sha: 1b4dd21a6baa5e3523156776970895bd3db6960a
recommended parameter setting: -inputHeight=736, -inputWidth=736;
description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.

- DB_TD500_resnet18.onnx:
url: https://drive.google.com/uc?export=dowload&id=1vY_KsDZZZb_svd5RT6pjyI8BS1nPbBSX
sha: 8a3700bdc13e00336a815fc7afff5dcc1ce08546
recommended parameter setting: -inputHeight=736, -inputWidth=736;
description: This model is trained on MSRA-TD500, so it can detect both English and Chinese text instances.
```

We will release more models of DB [here](https://drive.google.com/drive/folders/1qzNCHfUJOS0NEUOIKn69eCtxdlNPpWbq?usp=sharing) in the future.

```
- EAST:
Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1
This model is based on https://github.com/argman/EAST
```

## Images for Testing

```
Text Recognition:
url: https://drive.google.com/uc?export=dowload&id=1nMcEy68zDNpIlqAn6xCk_kYcUTIeSOtN
sha: 89205612ce8dd2251effa16609342b69bff67ca3

Text Detection:
url: https://drive.google.com/uc?export=dowload&id=149tAhIcvfCYeyufRoZ9tmc2mZDKE_XrF
sha: ced3c03fb7f8d9608169a913acf7e7b93e07109b
```

## Example for Text Recognition

Step1. Loading images and models with a vocabulary

```cpp
    // Load a cropped text line image
    // you can find cropped images for testing in "Images for Testing"
    int rgb = IMREAD_COLOR; // This should be changed according to the model input requirement.
    Mat image = imread("path/to/text_rec_test.png", rgb);

    // Load model weights
    TextRecognitionModel model("path/to/crnn_cs.onnx");

    // The decoding method
    // more methods will be supported in the future
    model.setDecodeType("CTC-greedy");

    // Load vocabulary
    // vocabulary should be changed according to the text recognition model
    std::ifstream vocFile;
    vocFile.open("path/to/alphabet_94.txt");
    CV_Assert(vocFile.is_open());
    String vocLine;
    std::vector<String> vocabulary;
    while (std::getline(vocFile, vocLine)) {
        vocabulary.push_back(vocLine);
    }
    model.setVocabulary(vocabulary);
```

Step2. Setting Parameters

```cpp
    // Normalization parameters
    double scale = 1.0 / 127.5;
    Scalar mean = Scalar(127.5, 127.5, 127.5);

    // The input shape
    Size inputSize = Size(100, 32);

    model.setInputParams(scale, inputSize, mean);
```
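
These values follow the `(pixel - mean) * scale` preprocessing convention of `blobFromImage`: with `mean = 127.5` and `scale = 1 / 127.5`, the pixel range `[0, 255]` is mapped to `[-1, 1]`, which is what the CRNN models expect. A standalone sketch of the arithmetic:

```cpp
// blobFromImage-style normalization: subtract the mean, then scale.
// With mean = 127.5 and scale = 1/127.5, [0, 255] maps to [-1, 1].
double normalizePixel(double pixel, double mean, double scale)
{
    return (pixel - mean) * scale;
}
```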

Step3. Inference
```cpp
    std::string recognitionResult = model.recognize(image);
    std::cout << "'" << recognitionResult << "'" << std::endl;
```

Input image:



Output:
```
'welcome'
```

## Example for Text Detection

Step1. Loading images and models
```cpp
    // Load an image
    // you can find some images for testing in "Images for Testing"
    Mat frame = imread("/path/to/text_det_test.png");
```

Step2.a Setting Parameters (DB)
```cpp
    // Load model weights
    TextDetectionModel_DB model("/path/to/DB_TD500_resnet50.onnx");

    // Post-processing parameters
    float binThresh = 0.3;
    float polyThresh = 0.5;
    uint maxCandidates = 200;
    double unclipRatio = 2.0;
    model.setBinaryThreshold(binThresh)
         .setPolygonThreshold(polyThresh)
         .setMaxCandidates(maxCandidates)
         .setUnclipRatio(unclipRatio);

    // Normalization parameters
    double scale = 1.0 / 255.0;
    Scalar mean = Scalar(122.67891434, 116.66876762, 104.00698793);

    // The input shape
    Size inputSize = Size(736, 736);

    model.setInputParams(scale, inputSize, mean);
```
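
Of these post-processing parameters, `unclipRatio` is the least obvious: DB shrinks text regions during training, so each detected polygon is expanded ("unclipped") back by an offset distance `d = area * unclipRatio / perimeter`, applied with polygon clipping in the original implementation. A sketch of that offset for a simple axis-aligned `w x h` box; a larger ratio yields looser boxes around the text.

```cpp
// DB-style unclip offset: a detected region is expanded by
// d = area * unclipRatio / perimeter (shown for an axis-aligned box).
double unclipOffset(double w, double h, double unclipRatio)
{
    const double area = w * h;
    const double perimeter = 2.0 * (w + h);
    return area * unclipRatio / perimeter;
}
```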

Step2.b Setting Parameters (EAST)
```cpp
    TextDetectionModel_EAST model("EAST.pb");

    float confThreshold = 0.5;
    float nmsThreshold = 0.4;
    model.setConfidenceThreshold(confThreshold)
         .setNMSThreshold(nmsThreshold);

    double detScale = 1.0;
    Size detInputSize = Size(320, 320);
    Scalar detMean = Scalar(123.68, 116.78, 103.94);
    bool swapRB = true;
    model.setInputParams(detScale, detInputSize, detMean, swapRB);
```
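
`swapRB = true` is needed because OpenCV loads images in BGR channel order while this EAST model expects RGB. During blob creation the first and third channels are exchanged, equivalent to this per-pixel operation:

```cpp
// Exchange the B and R values of one BGR pixel, which is what
// swapRB = true does for every pixel during blob creation.
void swapChannels(double px[3]) // px = {B, G, R}
{
    const double tmp = px[0];
    px[0] = px[2];
    px[2] = tmp;
}
```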


Step3. Inference
```cpp
    std::vector<std::vector<Point>> detResults;
    model.detect(frame, detResults);

    // Visualization
    polylines(frame, detResults, true, Scalar(0, 255, 0), 2);
    imshow("Text Detection", frame);
    waitKey();
```

Output:


## Example for Text Spotting

After following the steps above, you can obtain the detection results for an input image.
Then, you can transform and crop the text regions for recognition.
For more information, please refer to **Detailed Sample**.
```cpp
    // Transform and Crop
    Mat cropped;
    fourPointsTransform(recInput, vertices, cropped);

    String recResult = recognizer.recognize(cropped);
```
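
`fourPointsTransform` is a helper defined in the samples (not a cv:: function) that warps the detected quadrangle into an upright rectangle via a perspective transform. One natural choice of target size is the quadrangle's edge lengths; a sketch of that computation from 4 vertices ordered around the box (the OpenCV samples instead warp directly to the recognizer's fixed input size):

```cpp
#include <cmath>

// Derive a crop size for a detected quadrangle from the lengths of
// its top and right edges, given vertices (x[i], y[i]) in order.
void cropSizeFromQuad(const double x[4], const double y[4],
                      double& width, double& height)
{
    auto edge = [&](int i, int j) {
        return std::sqrt((x[j] - x[i]) * (x[j] - x[i]) +
                         (y[j] - y[i]) * (y[j] - y[i]));
    };
    width  = edge(0, 1); // top edge
    height = edge(1, 2); // right edge
}
```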

Output Examples:




## Source Code
The [source code](https://github.com/opencv/opencv/blob/master/modules/dnn/src/model.cpp)
of these APIs can be found in the DNN module.

## Detailed Sample
For more information, please refer to:
- [samples/dnn/scene_text_recognition.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_recognition.cpp)
- [samples/dnn/scene_text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_detection.cpp)
- [samples/dnn/text_detection.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp)
- [samples/dnn/scene_text_spotting.cpp](https://github.com/opencv/opencv/blob/master/samples/dnn/scene_text_spotting.cpp)

#### Test with an image
Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -i=path/to/an/image -ih=736 -iw=736
example_dnn_scene_text_spotting -dmp=path/to/DB_IC15_resnet50.onnx -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -iw=1280 -ih=736 -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_text_detection -dmp=path/to/EAST.pb -rmp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=path/to/alphabet_94.txt
```

#### Test on public datasets
Text Recognition:

The download link for testing images can be found in **Images for Testing**.

Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_36.txt -rgb=0
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -e=true -edp=path/to/evaluation_data_rec -vp=/path/to/alphabet_94.txt -rgb=1
```

Text Detection:

The download links for testing images can be found in **Images for Testing**.

Examples:
```bash
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/TD500 -ih=736 -iw=736
example_dnn_scene_text_detection -mp=path/to/DB_IC15_resnet50.onnx -e=true -edp=path/to/evaluation_data_det/IC15 -ih=736 -iw=1280
```