Gsoc-2017 Text detect and recognition dnn backend #1348

sghoshcvc · 2017-08-28T17:53:51Z

This PR implements an opencv-dnn module based backend for text detection and recognition in images, using the deep neural network models described in https://arxiv.org/abs/1611.06779 and https://arxiv.org/abs/1406.2227
It is part of GSoC 2017, under the project title "End to End text detection and recognition", with mentorship from @prasannavk

This extends the work of last year's GSOC on holistic word recognition. A part of pull request #761 is included here.

The following gist documents all the details of this implementation.
https://gist.github.com/sghoshcvc/d955c743bade4415f532d07f4cef919f

prasannavk · 2017-08-30T07:22:10Z

modules/text/README.md

+-----
+
+A word spotting CNN is a CNN that takes an image assumed to contain a single word and provides a probabillity over a given vocabulary.
+Although other backends will be supported, for the moment only the Caffe backend is supported.


This sentence needs to be updated to include the DNN backend.

prasannavk · 2017-08-30T07:22:50Z

modules/text/include/opencv2/text.hpp

@@ -92,7 +93,7 @@ grouping horizontally aligned text, and the method proposed by Lluis Gomez and D
 in @cite Gomez13 @cite Gomez14 for grouping arbitrary oriented text (see erGrouping).

 To see the text detector at work, have a look at the textdetection demo:
-<https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp>
+<https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp>



This link may be invalid.

The link is valid.

But it's better to remove all occurrences of Itseez. It doesn't exist anymore and all such links are redirected to the opencv repo.

prasannavk · 2017-09-02T17:44:48Z

modules/text/CMakeLists.txt

+  find_package(Boost 1.46 REQUIRED COMPONENTS system thread filesystem)
+  include_directories(SYSTEM ${Boost_INCLUDE_DIR})
+  include_directories(SYSTEM /usr/local/cuda-8.0/targets/x86_64-linux/include/ usr/local/cuda-8.0/include/ /usr/local/cuda-7.5/targets/x86_64-linux/include/ )
+  link_directories(SYSTEM /usr/local/cuda-8.0/targets/x86_64-linux/lib/ usr/local/cuda-8.0/lib/ /usr/local/cuda-7.5/targets/x86_64-linux/lib/ /usr/lib/openblas-base/lib /usr/local/cuda-8.0/lib64)


This hard-coding needs to be fixed. If possible, we should use the CUDA installation detected by CMake rather than hard-coded paths. The question to answer is what happens when CUDA 9.0 comes out: would this break?

Yes, I'll look into it. We should certainly use the CUDA detected by OpenCV, though this was part of Anguelos's code, so we should also ask his opinion.

prasannavk · 2017-09-02T17:45:59Z

modules/text/README.md

+cd $OPENCV_BUILD_DIR #You must set this
+CAFFEROOT="${HOME}/caffe_inst/" #If you used the previous code to compile Caffe in ubuntu 16.04
+
+cmake  -DCaffe_LIBS:FILEPATH="$CAFFEROOT/caffe/distribute/lib/libcaffe.so" -DBUILD_opencv_ts:BOOL="0" -DBUILD_opencv_dnn:BOOL="0" -DBUILD_opencv_dnn_modern:BOOL="0" -DCaffe_INCLUDE_DIR:PATH="$CAFFEROOT/caffe/distribute/include" -DWITH_MATLAB:BOOL="0" -DBUILD_opencv_cudabgsegm:BOOL="0"  -DWITH_QT:BOOL="1" -DBUILD_opencv_cudaoptflow:BOOL="0" -DBUILD_opencv_cudastereo:BOOL="0" -DBUILD_opencv_cudafilters:BOOL="0" -DBUILD_opencv_cudev:BOOL="1" -DOPENCV_EXTRA_MODULES_PATH:PATH="/home/anguelos/work/projects/opencv_gsoc/opencv_contrib/modules"   ./


The extra modules path has /home/anguelos/work. Needs to become generic.

prasannavk · 2017-09-02T18:00:08Z

modules/text/include/opencv2/text/ocr.hpp

-}
+
+//Classifiers should provide diferent backends
+//For the moment only caffe is implemeted


Isn't this comment inaccurate? Now you have both caffe and DNN implemented.

Updated in new code

prasannavk · 2017-09-02T19:50:08Z

modules/text/include/opencv2/text/textDetector.hpp

+
+/** Generic structure of Deep CNN based Text Detectors
+ * */
+class CV_EXPORTS_W  DeepCNNTextDetector : public TextRegionDetector


So, what is the typical way to use the DeepCNNTextDetector? Is it to call create() and then call detect? Or is this some intermediate class that gets inherited by some other class?

DeepCNNTextDetector is a generic CNN-based text detector. It inherits from TextRegionDetector, which can be any text detector, and implements the detect function to provide bounding boxes of text.

prasannavk · 2017-09-02T19:55:35Z

modules/text/include/opencv2/text/textDetector.hpp

+ *
+ */
+
+class CV_EXPORTS_W textDetector : public BaseDetector


So is this a non-deep method of generating the text boxes? Would be useful to mention what the difference is from the above class in the comments, because to an outsider they look similar.

No, the purpose of this class is to take one TextRegionDetector, be it deep or non-deep, run the detection, and present the output. It works as a common interface to the different TextRegionDetector implementations.
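The relationship described in this reply can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in the PR: `Box`, `RegionDetector`, `FixedBoxDetector`, and `DetectorFrontEnd` are hypothetical stand-ins, whereas the real classes work with `cv::Rect`, `InputArray`, and `Ptr<>`.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for cv::Rect, so the sketch compiles without OpenCV.
struct Box { int x, y, width, height; };

// Plays the role of TextRegionDetector: any detector, deep or not.
class RegionDetector {
public:
    virtual ~RegionDetector() = default;
    virtual void detect(std::vector<Box>& boxes,
                        std::vector<float>& confidence) const = 0;
};

// Plays the role of a concrete detector such as DeepCNNTextDetector.
// It reports one fixed box instead of running a network.
class FixedBoxDetector : public RegionDetector {
public:
    void detect(std::vector<Box>& boxes,
                std::vector<float>& confidence) const override {
        boxes.push_back(Box{10, 20, 100, 32});
        confidence.push_back(0.9f);
    }
};

// Plays the role of textDetector: holds one region detector and forwards to it.
class DetectorFrontEnd {
public:
    explicit DetectorFrontEnd(std::shared_ptr<RegionDetector> impl)
        : impl_(std::move(impl)) {}

    std::vector<Box> run() const {
        std::vector<Box> boxes;
        std::vector<float> confidence;
        impl_->detect(boxes, confidence);
        return boxes;
    }

private:
    std::shared_ptr<RegionDetector> impl_;
};
```

With this shape, a caller constructs a `DetectorFrontEnd` around whichever detector is available and never needs to know whether it is deep or not.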

prasannavk · 2017-09-02T19:59:21Z

modules/text/samples/deeptextdetection.py

+
+print('\nDeeptextdetection.py')
+print('       A demo script of text box alogorithm of the paper:')
+print('       * Minghui Liao et al.: TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://arxiv.org/abs/1611.06779\n')


prasannavk · 2017-09-02T20:16:16Z

modules/text/src/text_detector.cpp

+
+    void textDetectInImage(InputArray inputImage,CV_OUT std::vector<Rect>& Bbox,CV_OUT std::vector<float>& confidence)
+    {
+                Mat netOutput;


This needs to follow the OpenCV C++ style guide for indentation. http://code.opencv.org/projects/opencv/wiki/Coding_Style_Guide

prasannavk · 2017-09-02T20:27:15Z

modules/text/src/ocr_holistic.cpp

+
+    void preprocess_(const Mat& input,Mat& output,Size outputSize,int outputChannels){
+
+        //TODO put all the logic of channel and depth conversions in ImageProcessor class


Can you just call ResizerPreprocessor's preprocess to avoid repeating all this code?

prasannavk · 2017-09-02T20:29:34Z

modules/text/src/ocr_holistic.cpp

+
+
+
+class ResizerPreprocessor: public ImagePreprocessor{


I would move the preprocessing code out of this file. It distracts from the real core of the network setup. Also, the different preprocess_ functions require refactoring. They can be written in fewer lines of code; right now, they are too verbose and repetitive. In a 1000-line source file the main class starts at around line 500. That is not helpful to someone trying to read the code.

prasannavk · 2017-09-02T20:39:57Z

modules/text/src/ocr_holistic.cpp

+    //int channelCount_;
+   // int inputChannel_ ;//=1;
+    const int _inputHeight =32;
+    const int _inputWidth =100;


Hard coding input size is a no-go. Has to be resolved before a merge can happen. Nobody will be able to use this.

I'll update this
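One way the hard-coded size could be lifted into a parameter is sketched below. This is only an illustration under assumptions: `InputSize` and `WordClassifierConfig` are hypothetical names, not classes from the PR, and the defaults simply mirror the 100x32 values in the diff.

```cpp
#include <stdexcept>

// Hypothetical stand-in for cv::Size.
struct InputSize { int width; int height; };

// Hypothetical holder for the network input geometry.
class WordClassifierConfig {
public:
    // Defaults mirror the values hard-coded in the diff; callers can override them.
    explicit WordClassifierConfig(InputSize size = InputSize{100, 32})
        : size_(size) {
        if (size.width <= 0 || size.height <= 0)
            throw std::invalid_argument("input size must be positive");
    }

    int inputWidth() const { return size_.width; }
    int inputHeight() const { return size_.height; }

private:
    InputSize size_;
};
```

A model with a different input layer would then be handled by passing, say, `InputSize{200, 64}` instead of editing the source.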

prasannavk · 2017-09-03T01:16:50Z

modules/text/src/text_detectorCNN.cpp

+    return f.good();
+}
+
+class DeepCNNTextDetectorCaffeImpl: public DeepCNNTextDetector{


If the HAVE_CAFFE macro is not defined, isn't the whole DeepCNNTextDetectorCaffeImpl class useless? Shouldn't you wrap the entire class definition in the macro instead of just sections of it?

In the constructor there is an else section which throws an exception. I think this is better, as the constructor will be available anyway.
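The pattern defended in this reply, keeping the class visible and failing at construction time when the backend is missing, looks roughly like the sketch below. `CaffeBackedDetector` and `caffeAvailable` are hypothetical names used for illustration; only the `HAVE_CAFFE` macro comes from the PR, and it is assumed to be set (or not) by the build system.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical class: always declared, so callers compile either way.
class CaffeBackedDetector {
public:
    explicit CaffeBackedDetector(const std::string& modelFile)
        : modelFile_(modelFile) {
#ifdef HAVE_CAFFE
        // Real code would load the Caffe network from modelFile_ here.
#else
        // Without Caffe the object cannot work, so fail loudly at construction.
        throw std::runtime_error("built without Caffe support: cannot load " + modelFile_);
#endif
    }

private:
    std::string modelFile_;
};

// Hypothetical helper for callers that want to check before constructing.
inline bool caffeAvailable() {
#ifdef HAVE_CAFFE
    return true;
#else
    return false;
#endif
}
```

The trade-off is that a missing backend shows up as a runtime exception rather than a compile or link error, which is the behaviour the reviewer is questioning.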

prasannavk · 2017-09-03T01:30:10Z

modules/text/src/ocr_holistic.cpp

+        this->setPreprocessor(preprocessor);
+#ifdef HAVE_DNN
+
+        this->net_ = makePtr<Net>(readNetFromCaffe(modelArchFilename,modelWeightsFilename));


I think this is the most important part of the whole pull-request! You should mention it right at the top of the class and maybe link to this page - http://docs.opencv.org/3.3.0/d6/d87/group__dnnLayerList.html saying this will work as long as the layer list includes your network's layers. So, did you have to add any new layers in core DNN to get the text network to work? These assumptions should go in the class level comments.

I will update the comment section

prasannavk · 2017-09-03T03:39:55Z

modules/text/text_config.hpp.in

 // HAVE OCR Tesseract
-#cmakedefine HAVE_TESSERACT
+//#cmakedefine HAVE_TESSERACT


I would remove unnecessary lines completely rather than leave them commented out.

prasannavk · 2017-09-03T03:41:01Z

modules/text/src/text_detectorCNN.cpp

+    Mat textbox_mean(1,3,CV_8U);
+    textbox_mean.at<uchar>(0,0)=104;
+    textbox_mean.at<uchar>(0,1)=117;
+    textbox_mean.at<uchar>(0,2)=123;


Where are these magic numbers from? How are they calculated? Maybe this should also be part of the configuration of this class, or be read from a prototxt.

These numbers are mean values as per the author's original implementation and will be fixed.
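If the values stay in the source, giving them a name and a single definition would at least document where they come from. A minimal sketch, assuming hypothetical names `ChannelMean`, `kTextBoxesMean`, and `subtractMean`; only the three numbers are taken from the diff, and how the PR actually applies them is not shown here.

```cpp
#include <array>
#include <cstddef>

// Per-channel mean values copied from the diff (104, 117, 123).
// The struct and constant names are illustrative assumptions.
struct ChannelMean {
    std::array<unsigned char, 3> values;
};

constexpr ChannelMean kTextBoxesMean{{{104, 117, 123}}};

// Subtract the per-channel mean from one 3-channel pixel.
// Results are returned as int so negative differences are preserved.
inline std::array<int, 3> subtractMean(const std::array<unsigned char, 3>& pixel,
                                       const ChannelMean& mean = kTextBoxesMean) {
    std::array<int, 3> out{};
    for (std::size_t c = 0; c < 3; ++c)
        out[c] = static_cast<int>(pixel[c]) - static_cast<int>(mean.values[c]);
    return out;
}
```

Passing the mean as a parameter with a default keeps the author's values as the out-of-the-box behaviour while allowing a different model to supply its own.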

prasannavk · 2017-09-03T04:11:13Z

modules/text/src/text_detectorCNN.cpp

+        this->outputGeometry_.width = net_->output_blobs()[0]->width();
+
+
+


Please remove unnecessary blank lines at the end of code blocks. There should be 1 blank line to separate blocks of code and 2 blank lines at the end of functions. It's a style issue, but being consistent makes the code easier to read.

prasannavk · 2017-09-03T05:40:36Z

Does this PR include work from PR #1287? If so, we should close the other PR.

prasannavk · 2017-09-03T05:57:04Z

Can you also add a link to the summary Gist file for GSoC in the PR description? In addition, can you add links to the tiny-dnn implementation that we explored? If you can explain the max-pooling layer implementation discrepancy you ran into between Caffe and tiny-dnn, it would be valuable for anyone running into the same issue. The roadblocks that led us to choose the DNN module would be especially useful. Also, your code will be a great starting point for someone on the same path.

prasannavk · 2017-09-03T06:10:01Z

Also, there is the question of how to make this code run fully in real time. This could be added to the future extensions part of the gist file. Secondly, if a user of the code would like to add particular training data to improve the model, what approach would you suggest? In other words, we have an end-to-end deep model now, but is there a way to train it further instead of just doing the net_.forward() operation? That could also go in the gist file.

prasannavk · 2017-09-03T17:09:42Z

modules/text/src/ocr_holistic.cpp

+
+Ptr<OCRHolisticWordRecognizer> OCRHolisticWordRecognizer::create(String modelArchFilename, String modelWeightsFilename, String vocabularyFilename)
+{
+    Ptr<ImagePreprocessor> preprocessor=ImagePreprocessor::createImageStandarizer(113);


How do you obtain 113 as a parameter? What does it mean? Can it be a named constant?

This is part of the code written for GSoC'16 by Anguelos. @lluisgomez or @anguelos can shed some light on it.

prasannavk · 2017-09-03T17:10:12Z

modules/text/src/precomp.hpp

@@ -45,7 +45,7 @@

 #include "opencv2/text.hpp"

-#include "text_config.hpp"
+//#include "text_config.hpp"


Please remove dead code rather than leaving it commented out.

sovrasov · 2017-09-28T08:04:20Z

modules/text/CMakeLists.txt

-ocv_add_testdata(samples/ contrib/text
-    FILES_MATCHING PATTERN "*.xml" PATTERN "*.xml.gz" REGEX "scenetext[0-9]+.jpg"
-)
+if(HAVE_opencv_dnn)


If dnn is enabled HAVE_OPENCV_DNN is defined in opencv_modules.hpp, so this definition is useless.

sovrasov · 2017-09-28T09:04:45Z

modules/text/src/text_detectorCNN.cpp

+#include "opencv2/dnn.hpp"
+#endif
+
+using namespace cv::dnn;


This command should be guarded using HAVE_OPENCV_DNN definition.

sovrasov · 2017-09-28T09:24:25Z

modules/text/include/opencv2/text/ocr.hpp

+//Classifiers should provide diferent backends
+
+enum{
+    OCR_HOLISTIC_BACKEND_NONE, //No back end


Why do you need an item representing empty backend?

sovrasov · 2017-09-28T09:55:27Z

modules/text/CMakeLists.txt

+# Using cmake scripts and modules
+list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR})
+
+set(TEXT_DEPS opencv_ml opencv_highgui opencv_imgproc opencv_core opencv_features2d opencv_calib3d)


This variable is unused. Also, all the commented-out code should be removed from the CMake script.

sovrasov · 2017-10-04T11:57:47Z

@sghoshcvc did you check your text detection model works with DNN backend?
I have the following error after textDetectInImage call:

OpenCV Error: Assertion failed ((numPriors * _numLocClasses * 4) == inputs[0][1]) in getMemoryShapes, file /home/vsovrasov/repositories/opencv/modules/dnn/src/layers/detection_output_layer.cpp, line 178
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/vsovrasov/repositories/opencv/modules/dnn/src/layers/detection_output_layer.cpp:178: error: (-215) (numPriors * _numLocClasses * 4) == inputs[0][1] in function getMemoryShapes
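For readers unfamiliar with the message, the failed assertion is a shape consistency check. The sketch below restates the condition from the log in standalone form; `locationShapeMatches` is a hypothetical helper, and reading `inputs[0][1]` as dimension 1 of the first input's shape is an assumption based on the notation, not on the dnn source.

```cpp
#include <vector>

// Restates the condition printed in the assertion message:
//   (numPriors * numLocClasses * 4) == inputs[0][1]
// Each inner vector is one input's shape; the factor 4 is copied from the message.
inline bool locationShapeMatches(const std::vector<std::vector<int>>& inputShapes,
                                 int numPriors, int numLocClasses) {
    if (inputShapes.empty() || inputShapes[0].size() < 2)
        return false;  // not enough shape information to compare
    return numPriors * numLocClasses * 4 == inputShapes[0][1];
}
```

Under that reading, the error means the number of prior boxes computed by the network does not agree with the size of the first input to the detection output layer, which points at the model definition or at how a layer computes its output shape.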

sghoshcvc added 20 commits June 22, 2017 18:31

Text detector class and Custom Image processor Class

9ae765a

Add sample script

40db962

Minor modification

fc9c41b

Added comments

e494efb

added instructions to build

2b8ed12

Modified the class heirarchy

be395e5

Added python sample script

1bc908b

simple cleaning and added comments

73ddeab

Merge branch 'master' into gsoc_textDetect_2017

9071ca7

fix a dependency bug

8cf800e

removed Java Wrapper

a617059

Removed white space errors and platform specific warnings

ca2a2ab

Fixed Doxygen Warning and error

b913cac

Fixed Text box demo error

4c9af58

White Space error in sample python script

103fbaf

Modified to handle windows warning

0e74d63

Modified to silent Clang warnings

111b3be

DNN backend initial commit

a2cab07

added calculation of output size

c697e41

Merge branch 'master' into GSOC_text_detect_DNN_backend

731637e

sovrasov added the GSoC label Aug 29, 2017

prasannavk reviewed Aug 30, 2017

prasannavk reviewed Sep 2, 2017

prasannavk reviewed Sep 3, 2017

sghoshcvc added 7 commits September 5, 2017 06:16

removed blanks, fixed Cmake issue

dc48968

Merge branch 'GSOC_text_detect_DNN_backend' of https://github.com/sgh…

e98f42e

…oshcvc/opencv_contrib into GSOC_text_detect_DNN_backend merge conflict

seperate image pre-processing from ocr code

af536b1

removed hard coding height and width

efc864c

removed hard codinginput parameters

887e6e5

modified initializers

878258b

Modified initializers list

bf630be

sovrasov mentioned this pull request Sep 21, 2017

Gsoc text detect merge #1287

Closed

sovrasov reviewed Sep 28, 2017

mshabunin mentioned this pull request Sep 28, 2017

GSOC 2016 Holistic word spotter #723

Closed

sovrasov mentioned this pull request Oct 9, 2017

CNN-based text detector #1399

Merged

sovrasov closed this Oct 10, 2017

