# HaloNet

> [Scaling Local Self-Attention for Parameter Efficient Visual Backbones](https://arxiv.org/abs/2103.12731)

## Introduction

Researchers from Google Research and UC Berkeley have developed a new form of self-attention that can outperform standard baseline models and even high-performance convolutional models.[[1](#references)]

Blocked Self-Attention: The input image is divided into multiple blocks and self-attention is applied within each block. However, considering only the information inside each block inevitably leads to information loss. Therefore, before computing self-attention, a haloing operation is performed on each block: the block is padded with a ring of pixels (a "halo") taken from the original feature map, so that the receptive field of each block becomes appropriately larger and attends to more information.

<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257577202-3ac43b82-785a-42c5-9b6c-ca58b0fa7ab8.png" width=800 />
</p>
<p align="center">
  <em>Figure 1. Architecture of Blocked Self-Attention [<a href="#references">1</a>] </em>
</p>
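
The snippet below is a minimal NumPy sketch of this idea, not the model's actual implementation (which uses learned query/key/value projections, multiple heads, and relative position encodings): queries come from each non-overlapping block, while keys and values come from the same block extended by a halo of neighboring pixels. All names and sizes are illustrative.

```python
import numpy as np

def blocked_halo_attention(x, block=4, halo=1):
    """Single-head sketch: queries from each block, keys/values from the haloed block."""
    H, W, C = x.shape
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))  # zero-pad borders so every block can be haloed
    out = np.zeros_like(x)
    for i in range(0, H, block):
        for j in range(0, W, block):
            q = x[i:i + block, j:j + block].reshape(-1, C)                          # queries: the block itself
            kv = xp[i:i + block + 2 * halo, j:j + block + 2 * halo].reshape(-1, C)  # keys/values: block + halo
            scores = q @ kv.T / np.sqrt(C)                                          # scaled dot-product scores
            scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn = scores / scores.sum(axis=-1, keepdims=True)                      # softmax over the haloed window
            out[i:i + block, j:j + block] = (attn @ kv).reshape(block, block, C)
    return out

y = blocked_halo_attention(np.random.rand(8, 8, 16), block=4, halo=1)
print(y.shape)  # (8, 8, 16) -- same resolution as the input
```

Because attention is restricted to each haloed block, the cost scales with the block and halo sizes rather than with the full image, which is what makes this local formulation scalable.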

Down Sampling: To reduce the amount of computation, the queries within each block are subsampled, and attention is computed with these subsampled queries, which achieves the effect of down sampling.

<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257578183-fe45c2c2-5006-492b-b30a-5b049a0e2531.png" width=800 />
</p>
<p align="center">
  <em>Figure 2. Architecture of Down Sampling [<a href="#references">1</a>] </em>
</p>
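
Under the same simplifying assumptions as the previous sketch, down sampling only changes how the queries are gathered: striding the query positions inside each block shrinks the output resolution without a separate pooling layer. Again, the names and sizes are illustrative rather than the model's actual implementation.

```python
import numpy as np

def halo_downsample_attention(x, block=4, halo=1, stride=2):
    """Sketch of attention-based down sampling: strided queries, full haloed keys/values."""
    H, W, C = x.shape
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))
    out = np.zeros((H // stride, W // stride, C))
    for i in range(0, H, block):
        for j in range(0, W, block):
            q = x[i:i + block:stride, j:j + block:stride].reshape(-1, C)            # subsampled queries
            kv = xp[i:i + block + 2 * halo, j:j + block + 2 * halo].reshape(-1, C)  # keys/values still cover block + halo
            scores = q @ kv.T / np.sqrt(C)
            scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn = scores / scores.sum(axis=-1, keepdims=True)
            oi, oj, bs = i // stride, j // stride, block // stride
            out[oi:oi + bs, oj:oj + bs] = (attn @ kv).reshape(bs, bs, C)
    return out

y = halo_downsample_attention(np.random.rand(8, 8, 16))
print(y.shape)  # (4, 4, 16) -- output resolution halved by the query stride
```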

## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
| ----------- | -------- | --------- | --------- | ---------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| halonet_50t | D910X8-G | 79.53 | 94.79 | 22.79 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/halonet/halonet_50t_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/halonet/halonet_50t-533da6be.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to scale the learning rate linearly with the new global batch size.
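
For instance, a back-of-the-envelope way to apply this linear scaling rule (the numbers below are placeholders; take the actual batch size and learning rate from the yaml recipe):

```python
# Hypothetical numbers for illustration only -- read the real values from
# configs/halonet/halonet_50t_ascend.yaml before adapting the recipe.
base_lr = 0.05            # learning rate of the reference recipe (placeholder)
base_global_bs = 64 * 8   # per-device batch size x number of devices in the recipe (placeholder)
new_global_bs = 64 * 4    # e.g. the same per-device batch size on 4 devices

scaled_lr = base_lr * new_global_bs / base_global_bs
print(scaled_lr)  # 0.025
```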

* Standalone Training

If you want to train or fine-tune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
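
If you prefer to load the trained checkpoint directly in Python (for example, for ad-hoc inference), a minimal sketch is shown below. It assumes that `halonet_50t` is registered with MindCV's `create_model` factory, that `checkpoint_path` behaves as for other MindCV models, and that the recipe's input resolution is 256x256; check the yaml and the MindCV API docs before relying on these assumptions.

```python
import numpy as np
import mindspore as ms
from mindcv.models import create_model

# Build the network and load the trained weights (the path is a placeholder).
model = create_model("halonet_50t", num_classes=1000, checkpoint_path="/path/to/ckpt")
model.set_train(False)  # switch to evaluation behaviour (BatchNorm, Dropout, etc.)

# Dummy NCHW input at the assumed training resolution; replace with a real, preprocessed image.
dummy = ms.Tensor(np.random.rand(1, 3, 256, 256), ms.float32)
logits = model(dummy)
print(logits.shape)  # expected: (1, 1000)
```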

### Deployment

Please refer to the [deployment tutorial](https://mindspore-lab.github.io/mindcv/tutorials/deployment/) in MindCV.

## References

[1] Vaswani A, Ramachandran P, Srinivas A, et al. Scaling local self-attention for parameter efficient visual backbones[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 12894-12904.