MMOCR Release v0.5.0
Highlights
- MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert recognition annotations that contain spaces from a plain `.txt` file to JSON line format `.jsonl`, and then revise a few configurations to enable the `LineJsonParser`. For more information, please read our step-by-step tutorial.
- Tesseract is now available in MMOCR! While MMOCR is flexible enough to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in `mmocr.utils.ocr` by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object with `MMOCR(det='Tesseract', recog='Tesseract')`. Credit to @garvan2021
- We release data converters for 16 widely used OCR datasets, covering multiple scenarios such as document, handwritten, and scene text. It is now much more convenient to generate annotation files for these datasets. Check the dataset zoo (Det & Recog) for further information.
- Special thanks to @EighteenSprings, @BeyondYourself and @yangrisheng, who have actively participated in documentation translation!
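As a rough sketch of the space-recognition workflow (the `filename`/`text` keys and file name below are illustrative, not the tutorial's exact settings), each `.jsonl` annotation holds one JSON object per line, and the dataset config switches its parser to `LineJsonParser`:

```python
import json

# One annotation per line of e.g. train_label.jsonl (illustrative):
line = '{"filename": "img_21.jpg", "text": "HELLO WORLD"}'
ann = json.loads(line)
print(ann['text'])  # HELLO WORLD  (the space survives JSON parsing)

# Config sketch: replace the plain-text parser with LineJsonParser
parser = dict(type='LineJsonParser', keys=['filename', 'text'])
```

Unlike a whitespace-separated `.txt` annotation, the JSON string keeps the space inside the transcription intact.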
Migration Guide - ResNet
Some refactoring processes are still going on. For text recognition models, we unified the ResNet-like architectures used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is flexible enough to support existing models, like ResNet31 and ResNet45, as well as future designs of ResNet variants.
Plugin

- `Plugin` is a module category inherited from MMCV's implementation of `PLUGIN_LAYERS`, which can be inserted between the stages of ResNet or into a BasicBlock. You can find a simple plugin implementation in mmocr/models/textrecog/plugins/common.py.

Plugin Example
```python
@PLUGIN_LAYERS.register_module()
class Maxpool2d(nn.Module):
    """A wrapper around nn.Maxpool2d().

    Args:
        kernel_size (int or tuple(int)): Kernel size for max pooling layer
        stride (int or tuple(int)): Stride for max pooling layer
        padding (int or tuple(int)): Padding for pooling layer
    """

    def __init__(self, kernel_size, stride, padding=0, **kwargs):
        super(Maxpool2d, self).__init__()
        self.model = nn.MaxPool2d(kernel_size, stride, padding)

    def forward(self, x):
        """
        Args:
            x (Tensor): Input feature map

        Returns:
            Tensor: The tensor after Maxpooling layer.
        """
        return self.model(x)
```
Stage-wise Plugins

- ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of BasicBlocks. For each stage, we provide two ports to insert stage-wise plugins by passing the `plugins` parameter to ResNet:

  [port1: before stage] ---> [stage] ---> [port2: after stage]

- E.g., take a ResNet with four stages as an example. Suppose we want to insert an additional convolution layer before each stage, and an additional convolution layer after stages 1, 2 and 4. Then you can define the special ResNet18 like this:
```python
resnet18_special = ResNet(
    # for simplicity, some required
    # parameters are omitted
    plugins=[
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='before_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, False, True),
            position='after_stage')
    ])
```
- You can also insert more than one plugin in each port, and those plugins will be executed in order. Let's take the ResNet in MASTER as an example:

Multiple Plugins Example
- The ResNet in MASTER is based on ResNet31, and after each stage a module named `GCAModule` is used. The `GCAModule` is inserted before the stage-wise convolution layer in ResNet31, so there are two plugins at the `after_stage` port at the same time.

```python
resnet_master = ResNet(
    # for simplicity, some required
    # parameters are omitted
    plugins=[
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
            stages=(True, True, False, False),
            position='before_stage'),
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
            stages=(False, False, True, False),
            position='before_stage'),
        dict(
            cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
            stages=[True, True, True, True],
            position='after_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='after_stage')
    ])
```
- In each plugin, we will pass two parameters (`in_channels`, `out_channels`) to support operations that need the information of current channels.
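As a minimal sketch of that mechanism (the `make_plugin` helper below is hypothetical; MMOCR's actual builder goes through the MMCV registry), the backbone can copy each plugin's `cfg` and fill in the current channel counts before building it:

```python
def make_plugin(plugin, in_channels, out_channels):
    """Hypothetical sketch: copy a plugin cfg and inject the current
    channel counts before the registry builds the module."""
    cfg = dict(plugin['cfg'])  # copy, so the user's config is not mutated
    cfg.update(in_channels=in_channels, out_channels=out_channels)
    return cfg

plugin = dict(
    cfg=dict(type='ConvModule', kernel_size=3, stride=1, padding=1),
    stages=(True, True, True, True),
    position='after_stage')

# E.g., at the first stage of ResNet31 both channel counts are 256
built_cfg = make_plugin(plugin, in_channels=256, out_channels=256)
print(built_cfg['type'], built_cfg['in_channels'])  # ConvModule 256
```

This is why a plugin's `cfg` does not need to spell out channel numbers per stage: the backbone supplies them at build time.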
Block-wise Plugin (Experimental)
- We also refactored the `BasicBlock` used in ResNet. Now it can be customized with block-wise plugins.

- A BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports to insert plugins:

  [port1: before_conv1] ---> [conv1] ---> [port2: after_conv1] ---> [conv2] ---> [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
- In each plugin, we will pass a parameter `in_channels` to support operations that need the information of current channels.

- E.g., build a ResNet with a customized BasicBlock that has an additional convolution layer before conv1:

Block-wise Plugin Example
```python
resnet_31 = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(
        type='BasicBlock',
        plugins=dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            position='before_conv1')),
    arch_layers=[1, 2, 5, 3],
    arch_channels=[256, 256, 512, 512],
    strides=[1, 1, 1, 1])
```
Full Examples
ResNet without plugins
- ResNet45 is used in ASTER and ABINet without any plugins.

```python
resnet45_aster = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
    arch_layers=[3, 4, 6, 6, 3],
    arch_channels=[32, 64, 128, 256, 512],
    strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])

resnet45_abi = ResNet(
    in_channels=3,
    stem_channels=32,
    block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
    arch_layers=[3, 4, 6, 6, 3],
    arch_channels=[32, 64, 128, 256, 512],
    strides=[2, 1, 2, 1, 1])
```
ResNet with plugins
- ResNet31 is a typical architecture using stage-wise plugins. Before each of the first three stages, a max pooling layer is used. After each stage, a convolution layer with BN and ReLU is used.

```python
resnet_31 = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(type='BasicBlock'),
    arch_layers=[1, 2, 5, 3],
    arch_channels=[256, 256, 512, 512],
    strides=[1, 1, 1, 1],
    plugins=[
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
            stages=(True, True, False, False),
            position='before_stage'),
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
            stages=(False, False, True, False),
            position='before_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='after_stage')
    ])
```
Migration Guide - Dataset Annotation Loader
The annotation loaders, `LmdbLoader` and `HardDiskLoader`, are unified into `AnnFileLoader` for a more consistent design and wider support of different file formats and storage backends. `AnnFileLoader` can load annotations from the `disk` (default), `http` and `petrel` backends, and parse annotations in `txt` or `lmdb` format. `LmdbLoader` and `HardDiskLoader` are deprecated, and users are recommended to modify their configs to use the new `AnnFileLoader`. Users can migrate their legacy `HardDiskLoader` configs referring to the following example:
```python
# Legacy config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='HardDiskLoader',
        ...))

# Suggested config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='AnnFileLoader',
        file_storage_backend='disk',
        file_format='txt',
        ...))
```
Similarly, using `AnnFileLoader` with `file_format='lmdb'` instead of `LmdbLoader` is strongly recommended.
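For instance, an lmdb-backed config could be migrated along the same lines as the example above (a sketch: the `ann_file` path and the `LineStrParser` parser entry are illustrative, not taken from any particular config):

```python
# Legacy config (illustrative fields)
train = dict(
    type='OCRDataset',
    ann_file='data/train_label.lmdb',  # illustrative path
    loader=dict(
        type='LmdbLoader',
        parser=dict(type='LineStrParser')))

# Suggested config: same dataset, loaded via AnnFileLoader
train = dict(
    type='OCRDataset',
    ann_file='data/train_label.lmdb',  # illustrative path
    loader=dict(
        type='AnnFileLoader',
        file_storage_backend='disk',
        file_format='lmdb',
        parser=dict(type='LineStrParser')))
```

Only the loader `type` and the new `file_format`/`file_storage_backend` fields change; the parser and dataset settings carry over.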
New Features & Enhancements
- Update mmcv install by @Harold-lkk in #775
- Upgrade isort by @gaotongxiao in #771
- Automatically infer device for inference if not specified by @gaotongxiao in #781
- Add open-mmlab precommit hooks by @gaotongxiao in #787
- Add windows CI by @gaotongxiao in #790
- Add CurvedSyntext150k Converter by @gaotongxiao in #719
- Add FUNSD Converter by @xinke-wang in #808
- Support loading annotation file with petrel/http backend by @cuhk-hbsun in #793
- Support different seeds on different ranks by @gaotongxiao in #820
- Support json in recognition converter by @Mountchicken in #844
- Add args and docs for multi-machine training/testing by @gaotongxiao in #849
- Add warning info for LineStrParser by @xinke-wang in #850
- Deploy openmmlab-bot by @gaotongxiao in #876
- Add Tesserocr Inference by @garvan2021 in #814
- Add LV Dataset Converter by @xinke-wang in #871
- Add SROIE Converter by @xinke-wang in #810
- Add NAF Converter by @xinke-wang in #815
- Add DeText Converter by @xinke-wang in #818
- Add IMGUR Converter by @xinke-wang in #825
- Add ILST Converter by @Mountchicken in #833
- Add KAIST Converter by @xinke-wang in #835
- Add IC11 (Born-digital Images) Data Converter by @xinke-wang in #857
- Add IC13 (Focused Scene Text) Data Converter by @xinke-wang in #861
- Add BID Converter by @Mountchicken in #862
- Add Vintext Converter by @Mountchicken in #864
- Add MTWI Data Converter by @xinke-wang in #867
- Add COCO Text v2 Data Converter by @xinke-wang in #872
- Add ReCTS Data Converter by @xinke-wang in #892
- Refactor ResNets by @Mountchicken in #809
Bug Fixes
- Bump mmdet version to 2.20.0 in Dockerfile by @GPhilo in #763
- Update mmdet version limit by @cuhk-hbsun in #773
- Minimum version requirement of albumentations by @gaotongxiao in #769
- Disable worker in the dataloader of gpu unit test by @gaotongxiao in #780
- Standardize the type of torch.device in ocr.py by @gaotongxiao in #800
- Use RECOGNIZER instead of DETECTORS by @cuhk-hbsun in #685
- Add num_classes to configs of ABINet by @gaotongxiao in #805
- Support loading space character from dict file by @gaotongxiao in #854
- Description in tools/data/utils/txt2lmdb.py by @Mountchicken in #870
- ignore_index in SARLoss by @Mountchicken in #869
- Fix a bug that may cause inplace operation error by @Mountchicken in #884
- Use hyphen instead of underscores in script args by @gaotongxiao in #890
Docs
- Add deprecation message for deploy tools by @xinke-wang in #801
- Reorganizing OpenMMLab projects in readme by @xinke-wang in #806
- Add demo/README_zh.md by @EighteenSprings in #802
- Add detailed version requirement table by @gaotongxiao in #778
- Correct misleading section title in training.md by @gaotongxiao in #819
- Update README_zh-CN document URL by @BeyondYourself in #823
- Translate testing.md by @yangrisheng in #822
- Fix confused description for load-from and resume-from by @xinke-wang in #842
- Add documents getting_started in docs/zh by @BeyondYourself in #841
- Add the model serving translation document by @BeyondYourself in #845
- Update docs about installation on Windows by @Mountchicken in #852
- Update tutorial notebook by @gaotongxiao in #853
- Update Instructions for New Data Converters by @xinke-wang in #900
- Brief installation instruction in README by @Harold-lkk in #897
- Update doc for ILST, VinText, BID by @Mountchicken in #902
- Fix typos in readme by @gaotongxiao in #903
- Recog dataset doc by @Harold-lkk in #893
- Reorganize the directory structure section in det.md by @gaotongxiao in #894
New Contributors
- @GPhilo made their first contribution in #763
- @xinke-wang made their first contribution in #801
- @EighteenSprings made their first contribution in #802
- @BeyondYourself made their first contribution in #823
- @yangrisheng made their first contribution in #822
- @Mountchicken made their first contribution in #844
- @garvan2021 made their first contribution in #814
Full Changelog: v0.4.1...v0.5.0