Skip to content

Gsoc-2017 Text detect and recognition dnn backend #1348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 27 commits into from

Conversation

sghoshcvc
Copy link
Contributor

@sghoshcvc sghoshcvc commented Aug 28, 2017

Implementation of opencv-dnn module based backend for text detection and recognition in images based on a deep neural network model described in https://arxiv.org/abs/1611.06779 and https://arxiv.org/abs/1406.2227
This is part of GSOC 2017 with project title End to End text detection and recognition under mentorship of @prasannavk

This extends the work of last year's GSOC on holistic word recognition. A part of pull request #761 is included here.

The following gist documents all the details of this implementation.
https://gist.github.com/sghoshcvc/d955c743bade4415f532d07f4cef919f

@sovrasov sovrasov added the GSoC label Aug 29, 2017
-----

A word spotting CNN is a CNN that takes an image assumed to contain a single word and provides a probabillity over a given vocabulary.
Although other backends will be supported, for the moment only the Caffe backend is supported.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence needs to be updated to include the DNN backend.

@@ -92,7 +93,7 @@ grouping horizontally aligned text, and the method proposed by Lluis Gomez and D
in @cite Gomez13 @cite Gomez14 for grouping arbitrary oriented text (see erGrouping).

To see the text detector at work, have a look at the textdetection demo:
<https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp>
<https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this link may be invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link is valid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's better to remove all the occurrences of itseez. It doesn't exist anymore and all such links are redirected to opencv repo.

find_package(Boost 1.46 REQUIRED COMPONENTS system thread filesystem)
include_directories(SYSTEM ${Boost_INCLUDE_DIR})
include_directories(SYSTEM /usr/local/cuda-8.0/targets/x86_64-linux/include/ usr/local/cuda-8.0/include/ /usr/local/cuda-7.5/targets/x86_64-linux/include/ )
link_directories(SYSTEM /usr/local/cuda-8.0/targets/x86_64-linux/lib/ usr/local/cuda-8.0/lib/ /usr/local/cuda-7.5/targets/x86_64-linux/lib/ /usr/lib/openblas-base/lib /usr/local/cuda-8.0/lib64)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hard-coding is something to fix. If possible we should use the cuda detected by CMake rather than hard coding. The question to answer is what happens when cuda 9.0 comes out. Would this be broken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll look into it, we should certainly use CUDA detected by opencv, Though this was part of Anguelos's code so we should also ask his opinion.

cd $OPENCV_BUILD_DIR #You must set this
CAFFEROOT="${HOME}/caffe_inst/" #If you used the previous code to compile Caffe in ubuntu 16.04

cmake -DCaffe_LIBS:FILEPATH="$CAFFEROOT/caffe/distribute/lib/libcaffe.so" -DBUILD_opencv_ts:BOOL="0" -DBUILD_opencv_dnn:BOOL="0" -DBUILD_opencv_dnn_modern:BOOL="0" -DCaffe_INCLUDE_DIR:PATH="$CAFFEROOT/caffe/distribute/include" -DWITH_MATLAB:BOOL="0" -DBUILD_opencv_cudabgsegm:BOOL="0" -DWITH_QT:BOOL="1" -DBUILD_opencv_cudaoptflow:BOOL="0" -DBUILD_opencv_cudastereo:BOOL="0" -DBUILD_opencv_cudafilters:BOOL="0" -DBUILD_opencv_cudev:BOOL="1" -DOPENCV_EXTRA_MODULES_PATH:PATH="/home/anguelos/work/projects/opencv_gsoc/opencv_contrib/modules" ./

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra modules path has /home/anguelos/work. Needs to become generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

}

//Classifiers should provide diferent backends
//For the moment only caffe is implemeted

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this comment inaccurate? Now you have both caffe and DNN implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in new code


/** Generic structure of Deep CNN based Text Detectors
* */
class CV_EXPORTS_W DeepCNNTextDetector : public TextRegionDetector

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what is the typical way to use the DeepCNNTextDetector? Is it to call create() and then call detect? Or is this some intermediate class that gets inherited by some other class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeepCNNTextDetector is created as a generic CNN based text detector, it inherits TextRegionDetector, which can be any text detector. DeepCNNTextDetector implements detect function to provide bounding boxes of text.

*
*/

class CV_EXPORTS_W textDetector : public BaseDetector
Copy link

@prasannavk prasannavk Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this a non-deep method of generating the text boxes? Would be useful to mention what the difference is from the above class in the comments, because to an outsider they look similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, purpose of this class is to take one textRegionDetector be it deep or non deep, do the detection and present the output. This works as an common interface of different textRegionDetector.


print('\nDeeptextdetection.py')
print(' A demo script of text box alogorithm of the paper:')
print(' * Minghui Liao et al.: TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://arxiv.org/abs/1611.06779\n')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍


void textDetectInImage(InputArray inputImage,CV_OUT std::vector<Rect>& Bbox,CV_OUT std::vector<float>& confidence)
{
Mat netOutput;
Copy link

@prasannavk prasannavk Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to follow the opencv c++ style guide for indentation. http://code.opencv.org/projects/opencv/wiki/Coding_Style_Guide

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


void preprocess_(const Mat& input,Mat& output,Size outputSize,int outputChannels){

//TODO put all the logic of channel and depth conversions in ImageProcessor class

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just call ResizerPreprocessor's preprocess to avoid repeating all this code?




class ResizerPreprocessor: public ImagePreprocessor{
Copy link

@prasannavk prasannavk Sep 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate out the preprocessing code out of this file. It distracts away from the real core of the network setup. Also, the different preprocess_ functions requires refactoring. They can be achieved in smaller lines of code. Right now, they are too verbose and repetitive. In a 1000 line src file the main class starts at around line 500. That is not useful someone trying to read the code.

//int channelCount_;
// int inputChannel_ ;//=1;
const int _inputHeight =32;
const int _inputWidth =100;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard coding input size is a no-go. Has to be resolved before a merge can happen. Nobody will be able to use this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this

return f.good();
}

class DeepCNNTextDetectorCaffeImpl: public DeepCNNTextDetector{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If HAVE_CAFFE macro is not defined isn't the whole DeepCNNTextDetectorCaffeImpl class not of use? Shouldn't you just envelope all of the class definition with the macro instead of just sections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In constructor there is an else section which throws an exception. I think this is better as constructor will be available anyway.

this->setPreprocessor(preprocessor);
#ifdef HAVE_DNN

this->net_ = makePtr<Net>(readNetFromCaffe(modelArchFilename,modelWeightsFilename));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the most important part of the whole pull-request! You should mention it right at the top of the class and maybe link to this page - http://docs.opencv.org/3.3.0/d6/d87/group__dnnLayerList.html saying this will work as long as the layer list includes your network's layers. So, did you have to add any new layers in core DNN to get the text network to work? These assumptions should go in the class level comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the comment section

// HAVE OCR Tesseract
#cmakedefine HAVE_TESSERACT
//#cmakedefine HAVE_TESSERACT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove lines that are not necessary completely and not leave it commented.

Mat textbox_mean(1,3,CV_8U);
textbox_mean.at<uchar>(0,0)=104;
textbox_mean.at<uchar>(0,1)=117;
textbox_mean.at<uchar>(0,2)=123;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these magic numbers from? Or how are they calculated? Maybe this should also be part of configuration of this class or read from a prototxt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These numbers are mean values as per the author's original implementation and will be fixed.

this->outputGeometry_.width = net_->output_blobs()[0]->width();



Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove unnecessary blank lines at that end of code blocks. There should be 1 blank line to separate blocks of code and 2 blank lines at the end of functions. It's a style issue but being consistent makes it easier to read.

@prasannavk
Copy link

Does this PR include work from PR #1287 ? Is so, we should close the other PR.

@prasannavk
Copy link

prasannavk commented Sep 3, 2017

Can you also add a link to the summary Gist file for GSoC in the PR description? In addition, can you add links to the tiny-dnn implementation that we explored? If you can explain the max-pooling layer implementation discrepancy you ran into between caffe and tiny-dnn, it would be valuable for anyone running into the same issue. The roadblocks due to which we had to choose the DNN would be especially be useful. Also, your code will be great to start from for someone on the same path.

@prasannavk
Copy link

prasannavk commented Sep 3, 2017

Also, the question of how to make this code run totally real time? This could be added to the future extensions part of the gist file. Secondly, if a user of the code would like to add particular training data to improve the model, what approach would you suggest? In other words, we have an end-to-end deep model now, but is there a way to train better instead just doing the net_.forward() operation. That could also go on the gist file.


Ptr<OCRHolisticWordRecognizer> OCRHolisticWordRecognizer::create(String modelArchFilename, String modelWeightsFilename, String vocabularyFilename)
{
Ptr<ImagePreprocessor> preprocessor=ImagePreprocessor::createImageStandarizer(113);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you obtain 113 as a parameter? What does it mean? Can it be a named constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of code written as part of GSOC'16 by Anguelos, @lluisgomez or @anguelos can throw some light.

@@ -45,7 +45,7 @@

#include "opencv2/text.hpp"

#include "text_config.hpp"
//#include "text_config.hpp"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove dead code and not leave them commented.

@sovrasov sovrasov mentioned this pull request Sep 21, 2017
ocv_add_testdata(samples/ contrib/text
FILES_MATCHING PATTERN "*.xml" PATTERN "*.xml.gz" REGEX "scenetext[0-9]+.jpg"
)
if(HAVE_opencv_dnn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dnn is enabled HAVE_OPENCV_DNN is defined in opencv_modules.hpp, so this definition is useless.

#include "opencv2/dnn.hpp"
#endif

using namespace cv::dnn;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command should be guarded using HAVE_OPENCV_DNN definition.

//Classifiers should provide diferent backends

enum{
OCR_HOLISTIC_BACKEND_NONE, //No back end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need an item representing empty backend?

# Using cmake scripts and modules
list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR})

set(TEXT_DEPS opencv_ml opencv_highgui opencv_imgproc opencv_core opencv_features2d opencv_calib3d)
Copy link
Contributor

@sovrasov sovrasov Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is unused. Also all the commented code should be removed form the cmake script.

@sovrasov
Copy link
Contributor

sovrasov commented Oct 4, 2017

@sghoshcvc did you check your text detection model works with DNN backend?
I have the following error after textDetectInImage call:

OpenCV Error: Assertion failed ((numPriors * _numLocClasses * 4) == inputs[0][1]) in getMemoryShapes, file /home/vsovrasov/repositories/opencv/modules/dnn/src/layers/detection_output_layer.cpp, line 178
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/vsovrasov/repositories/opencv/modules/dnn/src/layers/detection_output_layer.cpp:178: error: (-215) (numPriors * _numLocClasses * 4) == inputs[0][1] in function getMemoryShapes

@sovrasov sovrasov mentioned this pull request Oct 9, 2017
@sovrasov sovrasov closed this Oct 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants