-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Gsoc-2017 Text detect and recognition dnn backend #1348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
modules/text/README.md
Outdated
----- | ||
|
||
A word spotting CNN is a CNN that takes an image assumed to contain a single word and provides a probabillity over a given vocabulary. | ||
Although other backends will be supported, for the moment only the Caffe backend is supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence needs to be updated to include the DNN backend.
@@ -92,7 +93,7 @@ grouping horizontally aligned text, and the method proposed by Lluis Gomez and D | |||
in @cite Gomez13 @cite Gomez14 for grouping arbitrary oriented text (see erGrouping). | |||
|
|||
To see the text detector at work, have a look at the textdetection demo: | |||
<https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp> | |||
<https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this link may be invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link is valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's better to remove all the occurrences of itseez. It doesn't exist anymore and all such links are redirected to opencv repo.
modules/text/CMakeLists.txt
Outdated
find_package(Boost 1.46 REQUIRED COMPONENTS system thread filesystem) | ||
include_directories(SYSTEM ${Boost_INCLUDE_DIR}) | ||
include_directories(SYSTEM /usr/local/cuda-8.0/targets/x86_64-linux/include/ usr/local/cuda-8.0/include/ /usr/local/cuda-7.5/targets/x86_64-linux/include/ ) | ||
link_directories(SYSTEM /usr/local/cuda-8.0/targets/x86_64-linux/lib/ usr/local/cuda-8.0/lib/ /usr/local/cuda-7.5/targets/x86_64-linux/lib/ /usr/lib/openblas-base/lib /usr/local/cuda-8.0/lib64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hard-coding is something to fix. If possible we should use the cuda detected by CMake rather than hard coding. The question to answer is what happens when cuda 9.0 comes out. Would this be broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll look into it, we should certainly use CUDA detected by opencv, Though this was part of Anguelos's code so we should also ask his opinion.
modules/text/README.md
Outdated
cd $OPENCV_BUILD_DIR #You must set this | ||
CAFFEROOT="${HOME}/caffe_inst/" #If you used the previous code to compile Caffe in ubuntu 16.04 | ||
|
||
cmake -DCaffe_LIBS:FILEPATH="$CAFFEROOT/caffe/distribute/lib/libcaffe.so" -DBUILD_opencv_ts:BOOL="0" -DBUILD_opencv_dnn:BOOL="0" -DBUILD_opencv_dnn_modern:BOOL="0" -DCaffe_INCLUDE_DIR:PATH="$CAFFEROOT/caffe/distribute/include" -DWITH_MATLAB:BOOL="0" -DBUILD_opencv_cudabgsegm:BOOL="0" -DWITH_QT:BOOL="1" -DBUILD_opencv_cudaoptflow:BOOL="0" -DBUILD_opencv_cudastereo:BOOL="0" -DBUILD_opencv_cudafilters:BOOL="0" -DBUILD_opencv_cudev:BOOL="1" -DOPENCV_EXTRA_MODULES_PATH:PATH="/home/anguelos/work/projects/opencv_gsoc/opencv_contrib/modules" ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra modules path has /home/anguelos/work
. Needs to become generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
} | ||
|
||
//Classifiers should provide diferent backends | ||
//For the moment only caffe is implemeted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this comment inaccurate? Now you have both caffe and DNN implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in new code
|
||
/** Generic structure of Deep CNN based Text Detectors | ||
* */ | ||
class CV_EXPORTS_W DeepCNNTextDetector : public TextRegionDetector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what is the typical way to use the DeepCNNTextDetector
? Is it to call create()
and then call detect? Or is this some intermediate class that gets inherited by some other class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeepCNNTextDetector is created as a generic CNN based text detector, it inherits TextRegionDetector, which can be any text detector. DeepCNNTextDetector implements detect function to provide bounding boxes of text.
* | ||
*/ | ||
|
||
class CV_EXPORTS_W textDetector : public BaseDetector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this a non-deep method of generating the text boxes? Would be useful to mention what the difference is from the above class in the comments, because to an outsider they look similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, purpose of this class is to take one textRegionDetector be it deep or non deep, do the detection and present the output. This works as an common interface of different textRegionDetector.
|
||
print('\nDeeptextdetection.py') | ||
print(' A demo script of text box alogorithm of the paper:') | ||
print(' * Minghui Liao et al.: TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://arxiv.org/abs/1611.06779\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 👍
|
||
void textDetectInImage(InputArray inputImage,CV_OUT std::vector<Rect>& Bbox,CV_OUT std::vector<float>& confidence) | ||
{ | ||
Mat netOutput; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to follow the opencv c++ style guide for indentation. http://code.opencv.org/projects/opencv/wiki/Coding_Style_Guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
modules/text/src/ocr_holistic.cpp
Outdated
|
||
void preprocess_(const Mat& input,Mat& output,Size outputSize,int outputChannels){ | ||
|
||
//TODO put all the logic of channel and depth conversions in ImageProcessor class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just call ResizerPreprocessor
's preprocess to avoid repeating all this code?
modules/text/src/ocr_holistic.cpp
Outdated
|
||
|
||
|
||
class ResizerPreprocessor: public ImagePreprocessor{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would separate out the preprocessing code out of this file. It distracts away from the real core of the network setup. Also, the different preprocess_
functions requires refactoring. They can be achieved in smaller lines of code. Right now, they are too verbose and repetitive. In a 1000 line src file the main class starts at around line 500. That is not useful someone trying to read the code.
modules/text/src/ocr_holistic.cpp
Outdated
//int channelCount_; | ||
// int inputChannel_ ;//=1; | ||
const int _inputHeight =32; | ||
const int _inputWidth =100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard coding input size is a no-go. Has to be resolved before a merge can happen. Nobody will be able to use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update this
return f.good(); | ||
} | ||
|
||
class DeepCNNTextDetectorCaffeImpl: public DeepCNNTextDetector{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If HAVE_CAFFE
macro is not defined isn't the whole DeepCNNTextDetectorCaffeImpl
class not of use? Shouldn't you just envelope all of the class definition with the macro instead of just sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In constructor there is an else section which throws an exception. I think this is better as constructor will be available anyway.
this->setPreprocessor(preprocessor); | ||
#ifdef HAVE_DNN | ||
|
||
this->net_ = makePtr<Net>(readNetFromCaffe(modelArchFilename,modelWeightsFilename)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the most important part of the whole pull-request! You should mention it right at the top of the class and maybe link to this page - http://docs.opencv.org/3.3.0/d6/d87/group__dnnLayerList.html saying this will work as long as the layer list includes your network's layers. So, did you have to add any new layers in core DNN to get the text network to work? These assumptions should go in the class level comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the comment section
modules/text/text_config.hpp.in
Outdated
// HAVE OCR Tesseract | ||
#cmakedefine HAVE_TESSERACT | ||
//#cmakedefine HAVE_TESSERACT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove lines that are not necessary completely and not leave it commented.
Mat textbox_mean(1,3,CV_8U); | ||
textbox_mean.at<uchar>(0,0)=104; | ||
textbox_mean.at<uchar>(0,1)=117; | ||
textbox_mean.at<uchar>(0,2)=123; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where are these magic numbers from? Or how are they calculated? Maybe this should also be part of configuration of this class or read from a prototxt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These numbers are mean values as per the author's original implementation and will be fixed.
this->outputGeometry_.width = net_->output_blobs()[0]->width(); | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove unnecessary blank lines at that end of code blocks. There should be 1 blank line to separate blocks of code and 2 blank lines at the end of functions. It's a style issue but being consistent makes it easier to read.
Does this PR include work from PR #1287 ? Is so, we should close the other PR. |
Can you also add a link to the summary Gist file for GSoC in the PR description? In addition, can you add links to the tiny-dnn implementation that we explored? If you can explain the max-pooling layer implementation discrepancy you ran into between caffe and tiny-dnn, it would be valuable for anyone running into the same issue. The roadblocks due to which we had to choose the DNN would be especially be useful. Also, your code will be great to start from for someone on the same path. |
Also, the question of how to make this code run totally real time? This could be added to the future extensions part of the gist file. Secondly, if a user of the code would like to add particular training data to improve the model, what approach would you suggest? In other words, we have an end-to-end deep model now, but is there a way to train better instead just doing the |
|
||
Ptr<OCRHolisticWordRecognizer> OCRHolisticWordRecognizer::create(String modelArchFilename, String modelWeightsFilename, String vocabularyFilename) | ||
{ | ||
Ptr<ImagePreprocessor> preprocessor=ImagePreprocessor::createImageStandarizer(113); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you obtain 113 as a parameter? What does it mean? Can it be a named constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of code written as part of GSOC'16 by Anguelos, @lluisgomez or @anguelos can throw some light.
modules/text/src/precomp.hpp
Outdated
@@ -45,7 +45,7 @@ | |||
|
|||
#include "opencv2/text.hpp" | |||
|
|||
#include "text_config.hpp" | |||
//#include "text_config.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove dead code and not leave them commented.
…oshcvc/opencv_contrib into GSOC_text_detect_DNN_backend merge conflict
ocv_add_testdata(samples/ contrib/text | ||
FILES_MATCHING PATTERN "*.xml" PATTERN "*.xml.gz" REGEX "scenetext[0-9]+.jpg" | ||
) | ||
if(HAVE_opencv_dnn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If dnn is enabled HAVE_OPENCV_DNN
is defined in opencv_modules.hpp
, so this definition is useless.
#include "opencv2/dnn.hpp" | ||
#endif | ||
|
||
using namespace cv::dnn; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command should be guarded using HAVE_OPENCV_DNN
definition.
//Classifiers should provide diferent backends | ||
|
||
enum{ | ||
OCR_HOLISTIC_BACKEND_NONE, //No back end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need an item representing empty backend?
# Using cmake scripts and modules | ||
list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}) | ||
|
||
set(TEXT_DEPS opencv_ml opencv_highgui opencv_imgproc opencv_core opencv_features2d opencv_calib3d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is unused. Also all the commented code should be removed form the cmake script.
@sghoshcvc did you check your text detection model works with DNN backend?
|
Implementation of opencv-dnn module based backend for text detection and recognition in images based on a deep neural network model described in https://arxiv.org/abs/1611.06779 and https://arxiv.org/abs/1406.2227
This is part of GSOC 2017 with project title End to End text detection and recognition under mentorship of @prasannavk
This extends the work of last year's GSOC on holistic word recognition. A part of pull request #761 is included here.
The following gist documents all the details of this implementation.
https://gist.github.com/sghoshcvc/d955c743bade4415f532d07f4cef919f