# HaloNet

> [Scaling Local Self-Attention for Parameter Efficient Visual Backbones](https://arxiv.org/abs/2103.12731)

## Introduction

Researchers from Google Research and UC Berkeley have developed a new form of self-attention that can outperform standard baseline models and even high-performance convolutional models.[[1](#references)]

Blocked Self-Attention: The input image is divided into multiple blocks and self-attention is applied within each block. However, considering only the information inside each block inevitably leads to information loss. Therefore, before computing self-attention, a haloing operation is performed on each block: the block is padded with a ring of pixels (a "halo") taken from the original feature map, so that the receptive field of each block becomes appropriately larger and attends to more information.

<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257577202-3ac43b82-785a-42c5-9b6c-ca58b0fa7ab8.png" width=800 />
</p>
<p align="center">
  <em>Figure 1. Architecture of Blocked Self-Attention [<a href="#references">1</a>] </em>
</p>
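
The snippet below is a minimal NumPy sketch of this idea, not the model's actual implementation (which uses learned query/key/value projections, multiple heads, and relative position encodings): queries come from each non-overlapping block, while keys and values come from the same block extended by a halo of neighboring pixels. All names and sizes are illustrative.

```python
import numpy as np

def blocked_halo_attention(x, block=4, halo=1):
    """Single-head sketch: queries from each block, keys/values from the haloed block."""
    H, W, C = x.shape
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))  # zero-pad borders so every block can be haloed
    out = np.zeros_like(x)
    for i in range(0, H, block):
        for j in range(0, W, block):
            q = x[i:i + block, j:j + block].reshape(-1, C)                          # queries: the block itself
            kv = xp[i:i + block + 2 * halo, j:j + block + 2 * halo].reshape(-1, C)  # keys/values: block + halo
            scores = q @ kv.T / np.sqrt(C)                                          # scaled dot-product scores
            scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn = scores / scores.sum(axis=-1, keepdims=True)                      # softmax over the haloed window
            out[i:i + block, j:j + block] = (attn @ kv).reshape(block, block, C)
    return out

y = blocked_halo_attention(np.random.rand(8, 8, 16), block=4, halo=1)
print(y.shape)  # (8, 8, 16) -- same resolution as the input
```

Because attention is restricted to each haloed block, the cost scales with the block and halo sizes rather than with the full image, which is what makes this local formulation scalable.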

Down Sampling: To reduce the amount of computation, the queries within each block are subsampled, and attention is computed with these subsampled queries, which achieves the effect of down sampling.

<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257578183-fe45c2c2-5006-492b-b30a-5b049a0e2531.png" width=800 />
</p>
<p align="center">
  <em>Figure 2. Architecture of Down Sampling [<a href="#references">1</a>] </em>
</p>
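
Under the same simplifying assumptions as the previous sketch, down sampling only changes how the queries are gathered: striding the query positions inside each block shrinks the output resolution without a separate pooling layer. Again, the names and sizes are illustrative rather than the model's actual implementation.

```python
import numpy as np

def halo_downsample_attention(x, block=4, halo=1, stride=2):
    """Sketch of attention-based down sampling: strided queries, full haloed keys/values."""
    H, W, C = x.shape
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))
    out = np.zeros((H // stride, W // stride, C))
    for i in range(0, H, block):
        for j in range(0, W, block):
            q = x[i:i + block:stride, j:j + block:stride].reshape(-1, C)            # subsampled queries
            kv = xp[i:i + block + 2 * halo, j:j + block + 2 * halo].reshape(-1, C)  # keys/values still cover block + halo
            scores = q @ kv.T / np.sqrt(C)
            scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
            attn = scores / scores.sum(axis=-1, keepdims=True)
            oi, oj, bs = i // stride, j // stride, block // stride
            out[oi:oi + bs, oj:oj + bs] = (attn @ kv).reshape(bs, bs, C)
    return out

y = halo_downsample_attention(np.random.rand(8, 8, 16))
print(y.shape)  # (4, 4, 16) -- output resolution halved by the query stride
```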

## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model | Context | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
| ----------- | -------- | --------- | --------- | ---------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| halonet_50t | D910X8-G | 79.53 | 94.79 | 22.79 | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/halonet/halonet_50t_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/halonet/halonet_50t-533da6be.ckpt) |

</div>

#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (PyNative mode with ms_function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.

## Quick Start

### Preparation

#### Installation
Please refer to the [installation instruction](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation
Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.

### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to scale the learning rate linearly with the new global batch size.
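
For instance, a back-of-the-envelope way to apply this linear scaling rule (the numbers below are placeholders; take the actual batch size and learning rate from the yaml recipe):

```python
# Hypothetical numbers for illustration only -- read the real values from
# configs/halonet/halonet_50t_ascend.yaml before adapting the recipe.
base_lr = 0.05            # learning rate of the reference recipe (placeholder)
base_global_bs = 64 * 8   # per-device batch size x number of devices in the recipe (placeholder)
new_global_bs = 64 * 4    # e.g. the same per-device batch size on 4 devices

scaled_lr = base_lr * new_global_bs / base_global_bs
print(scaled_lr)  # 0.025
```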

* Standalone Training

If you want to train or fine-tune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/dataset --distribute False
```

### Validation

To validate the accuracy of the trained model, you can use `validate.py` and pass the checkpoint path with `--ckpt_path`.

```shell
python validate.py -c configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
```
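
If you prefer to load the trained checkpoint directly in Python (for example, for ad-hoc inference), a minimal sketch is shown below. It assumes that `halonet_50t` is registered with MindCV's `create_model` factory, that `checkpoint_path` behaves as for other MindCV models, and that the recipe's input resolution is 256x256; check the yaml and the MindCV API docs before relying on these assumptions.

```python
import numpy as np
import mindspore as ms
from mindcv.models import create_model

# Build the network and load the trained weights (the path is a placeholder).
model = create_model("halonet_50t", num_classes=1000, checkpoint_path="/path/to/ckpt")
model.set_train(False)  # switch to evaluation behaviour (BatchNorm, Dropout, etc.)

# Dummy NCHW input at the assumed training resolution; replace with a real, preprocessed image.
dummy = ms.Tensor(np.random.rand(1, 3, 256, 256), ms.float32)
logits = model(dummy)
print(logits.shape)  # expected: (1, 1000)
```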

### Deployment

Please refer to the [deployment tutorial](https://mindspore-lab.github.io/mindcv/tutorials/deployment/) in MindCV.

## References

[1] Vaswani A, Ramachandran P, Srinivas A, et al. Scaling local self-attention for parameter efficient visual backbones[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 12894-12904.