Skip to content

MMOCR Release v0.5.0

Compare
Choose a tag to compare
@gaotongxiao gaotongxiao released this 31 Mar 09:50
· 666 commits to main since this release
0546134

Highlights

  1. MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain .txt file to JSON line format .jsonl, and then revise a few configurations to enable the LineJsonParser. For more information, please read our step-by-step tutorial.
  2. Tesseract is now available in MMOCR! While MMOCR is more flexible to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in mmocr.utils.ocr by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object by MMOCR(det=’Tesseract’, recog=’Tesseract’). Credit to @garvan2021
  3. We release data converters for 16 widely used OCR datasets, including multiple scenarios such as document, handwritten, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo ( Det & Recog ) to explore further information.
  4. Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who had actively participated in documentation translation!

Migration Guide - ResNet

Some refactoring processes are still going on. For text recognition models, we unified the ResNet-like architectures which are used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is highly flexible to support existing models, like ResNet31 and ResNet45, and other future designs of ResNet variants.

Plugin

  • Plugin is a module category inherited from MMCV's implementation of PLUGIN_LAYERS, which can be inserted between each stage of ResNet or into a basicblock. You can find a simple implementation of plugin at mmocr/models/textrecog/plugins/common.py, or click the button below.

    Plugin Example
    @PLUGIN_LAYERS.register_module()
    class Maxpool2d(nn.Module):
        """A wrapper around nn.Maxpool2d().
    
        Args:
            kernel_size (int or tuple(int)): Kernel size for max pooling layer
            stride (int or tuple(int)): Stride for max pooling layer
            padding (int or tuple(int)): Padding for pooling layer
        """
    
        def __init__(self, kernel_size, stride, padding=0, **kwargs):
            super(Maxpool2d, self).__init__()
            self.model = nn.MaxPool2d(kernel_size, stride, padding)
    
        def forward(self, x):
            """
            Args:
                x (Tensor): Input feature map
    
            Returns:
                Tensor: The tensor after Maxpooling layer.
            """
            return self.model(x)

Stage-wise Plugins

  • ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of basicblocks. For each stage, we provide two ports to insert stage-wise plugins by giving plugins parameters in ResNet.

    [port1: before stage] ---> [stage] ---> [port2: after stage]
    
  • E.g. Using a ResNet with four stages as example. Suppose we want to insert an additional convolution layer before each stage, and an additional convolution layer at stage 1, 2, 4. Then you can define the special ResNet18 like this

    resnet18_speical = ResNet(
            # for simplicity, some required
            # parameters are omitted
            plugins=[
                dict(
                    cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                    stages=(True, True, True, True),
                    position='before_stage')
                dict(
                    cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                    stages=(True, True, False, True),
                    position='after_stage')
            ])
  • You can also insert more than one plugin in each port and those plugins will be executed in order. Let's take ResNet in MASTER as an example:

    Multiple Plugins Example
    • ResNet in Master is based on ResNet31. And after each stage, a module named GCAModule will be used. The GCAModule is inserted before the stage-wise convolution layer in ResNet31. In conlusion, there will be two plugins at after_stage port in the same time.

      resnet_master = ResNet(
                      # for simplicity, some required
                      # parameters are omitted
                      plugins=[
                          dict(
                              cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
                              stages=(True, True, False, False),
                              position='before_stage'),
                          dict(
                              cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
                              stages=(False, False, True, False),
                              position='before_stage'),
                          dict(
                              cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
                              stages=[True, True, True, True],
                              position='after_stage'),
                          dict(
                              cfg=dict(
                                  type='ConvModule',
                                  kernel_size=3,
                                  stride=1,
                                  padding=1,
                                  norm_cfg=dict(type='BN'),
                                  act_cfg=dict(type='ReLU')),
                              stages=(True, True, True, True),
                              position='after_stage')
                      ])
  • In each plugin, we will pass two parameters (in_channels, out_channels) to support operations that need the information of current channels.

Block-wise Plugin (Experimental)

  • We also refactored the BasicBlock used in ResNet. Now it can be customized with block-wise plugins. Check here for more details.

  • BasicBlock is composed of two convolution layer in the main branch and a shortcut branch. We provide four ports to insert plugins.

        [port1: before_conv1] ---> [conv1] --->
        [port2: after_conv1] ---> [conv2] --->
        [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
    
  • In each plugin, we will pass a parameter in_channels to support operations that need the information of current channels.

  • E.g. Build a ResNet with customized BasicBlock with an additional convolution layer before conv1:

    Block-wise Plugin Example
    resnet_31 = ResNet(
            in_channels=3,
            stem_channels=[64, 128],
            block_cfgs=dict(type='BasicBlock'),
            arch_layers=[1, 2, 5, 3],
            arch_channels=[256, 256, 512, 512],
            strides=[1, 1, 1, 1],
            plugins=[
                dict(
                    cfg=dict(type='Maxpool2d',
                    kernel_size=2,
                    stride=(2, 2)),
                    stages=(True, True, False, False),
                    position='before_stage'),
                dict(
                    cfg=dict(type='Maxpool2d',
                    kernel_size=(2, 1),
                    stride=(2, 1)),
                    stages=(False, False, True, False),
                    position='before_stage'),
                dict(
                    cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                    stages=(True, True, True, True),
                    position='after_stage')
            ])

Full Examples

ResNet without plugins
  • ResNet45 is used in ASTER and ABINet without any plugins.

    resnet45_aster = ResNet(
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(type='BasicBlock', use_conv1x1='True'),
        arch_layers=[3, 4, 6, 6, 3],
        arch_channels=[32, 64, 128, 256, 512],
        strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])
    
    resnet45_abi = ResNet(
        in_channels=3,
        stem_channels=32,
        block_cfgs=dict(type='BasicBlock', use_conv1x1='True'),
        arch_layers=[3, 4, 6, 6, 3],
        arch_channels=[32, 64, 128, 256, 512],
        strides=[2, 1, 2, 1, 1])
ResNet with plugins
  • ResNet31 is a typical architecture to use stage-wise plugins. Before the first three stages, Maxpooling layer is used. After each stage, a convolution layer with BN and ReLU is used.

    resnet_31 = ResNet(
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(type='BasicBlock'),
        arch_layers=[1, 2, 5, 3],
        arch_channels=[256, 256, 512, 512],
        strides=[1, 1, 1, 1],
        plugins=[
            dict(
                cfg=dict(type='Maxpool2d',
                kernel_size=2,
                stride=(2, 2)),
                stages=(True, True, False, False),
                position='before_stage'),
            dict(
                cfg=dict(type='Maxpool2d',
                kernel_size=(2, 1),
                stride=(2, 1)),
                stages=(False, False, True, False),
                position='before_stage'),
            dict(
                cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
                stages=(True, True, True, True),
                position='after_stage')
        ])

Migration Guide - Dataset Annotation Loader

The annotation loaders, LmdbLoader and HardDiskLoader, are unified into AnnFileLoader for a more consistent design and wider support on different file formats and storage backends. AnnFileLoader can load the annotations from disk(default), http and petrel backend, and parse the annotation in txt or lmdb format. LmdbLoader and HardDiskLoader are deprecated, and users are recommended to modify their configs to use the new AnnFileLoader. Users can migrate their legacy loader HardDiskLoader referring to the following example:

# Legacy config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='HardDiskLoader',
        ...))

# Suggested config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='AnnFileLoader',
        file_storage_backend='disk',
        file_format='txt',
        ...))

Similarly, using AnnFileLoader with file_format='lmdb' instead of LmdbLoader is strongly recommended.

New Features & Enhancements

Bug Fixes

Docs

New Contributors

Full Changelog: v0.4.1...v0.5.0