Commit 94ef850

feat: add model script, training recipe and pretrained weight of halonet (#720)
1 parent 2e03511 commit 94ef850

File tree

4 files changed: +811 -0 lines changed

configs/halonet/README.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# HaloNet

> [Scaling Local Self-Attention for Parameter Efficient Visual Backbones](https://arxiv.org/abs/2103.12731)

## Introduction

Researchers from Google Research and UC Berkeley have developed a new family of self-attention models that can outperform standard baselines and even high-performance convolutional models. [[1](#references)]
Blocked Self-Attention: The whole input image is divided into multiple blocks and self-attention is applied within each block. However, if only the information inside a block is considered, information is inevitably lost. Therefore, before computing self-attention, a haloing operation is performed on each block: each block is padded with a band of pixels from the surrounding image, so that the receptive field of each block becomes appropriately larger and covers more information.
<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257577202-3ac43b82-785a-42c5-9b6c-ca58b0fa7ab8.png" width=800 />
</p>
<p align="center">
  <em>Figure 1. Architecture of Blocked Self-Attention [<a href="#references">1</a>]</em>
</p>
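The haloed-block gathering described above can be sketched in a few lines of NumPy. This is only an illustration (the block and halo sizes are arbitrary), not the `halonet` implementation added in this commit:

```python
# Minimal NumPy sketch of blocked self-attention neighborhoods (illustrative only).
# The feature map is split into b x b blocks; each block's keys/values come from a
# (b + 2*halo) x (b + 2*halo) window around it, with the map borders zero-padded.
import numpy as np

def haloed_blocks(x, block=8, halo=3):
    """x: (H, W, C) feature map -> per-block queries and haloed key/value windows."""
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by the block size"
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))   # pad a halo band around the map
    queries, windows = [], []
    for i in range(0, h, block):
        for j in range(0, w, block):
            queries.append(x[i:i + block, j:j + block])                          # (b, b, C)
            windows.append(xp[i:i + block + 2 * halo, j:j + block + 2 * halo])   # (b+2h, b+2h, C)
    return np.stack(queries), np.stack(windows)

q, kv = haloed_blocks(np.random.rand(32, 32, 64), block=8, halo=3)
print(q.shape, kv.shape)  # (16, 8, 8, 64) (16, 14, 14, 64)
```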
Down Sampling: To reduce the amount of computation, each block is subsampled, and attention is then performed on this subsampled information, which achieves the effect of down sampling.
<p align="center">
  <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/50255437/257578183-fe45c2c2-5006-492b-b30a-5b049a0e2531.png" width=800 />
</p>
<p align="center">
  <em>Figure 2. Architecture of Down Sampling [<a href="#references">1</a>]</em>
</p>
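The down-sampling step can be sketched in the same spirit: only a strided subset of query positions inside each block is kept, while the keys/values still cover the full haloed window, so the output resolution shrinks by the stride. Again, this is an illustrative sketch (linear projections and multi-head details are omitted), not the code added by this commit:

```python
# Minimal NumPy sketch of attention down sampling (illustrative only): queries are
# subsampled with a stride inside a block, while keys/values still cover the full
# haloed window, so the spatial output shrinks by the stride.
import numpy as np

def downsampled_block_attention(q_block, kv_window, stride=2):
    """q_block: (b, b, C) block; kv_window: (b + 2*halo, b + 2*halo, C) haloed window."""
    b, _, c = q_block.shape
    q = q_block[::stride, ::stride].reshape(-1, c)        # subsample query positions
    kv = kv_window.reshape(-1, c)                          # flatten the haloed window
    scores = q @ kv.T / np.sqrt(c)                         # scaled dot-product scores
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = scores / scores.sum(axis=-1, keepdims=True)     # softmax over the window
    out = attn @ kv                                        # one output per kept query
    return out.reshape(b // stride, b // stride, c)

block = np.random.rand(8, 8, 64)       # an 8x8 block of features
window = np.random.rand(14, 14, 64)    # its haloed 14x14 neighborhood (halo = 3)
print(downsampled_block_attention(block, window).shape)  # (4, 4, 64)
```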
## Results

Our reproduced model performance on ImageNet-1K is reported as follows.

<div align="center">

| Model       | Context  | Top-1 (%) | Top-5 (%) | Params (M) | Recipe | Download |
| ----------- | -------- | --------- | --------- | ---------- | ------ | -------- |
| halonet_50t | D910x8-G | 79.53     | 94.79     | 22.79      | [yaml](https://github.com/mindspore-lab/mindcv/blob/main/configs/halonet/halonet_50t_ascend.yaml) | [weights](https://download.mindspore.cn/toolkits/mindcv/halonet/halonet_50t-533da6be.ckpt) |

</div>
#### Notes

- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
## Quick Start

### Preparation

#### Installation

Please refer to the [installation instructions](https://github.com/mindspore-ecosystem/mindcv#installation) in MindCV.

#### Dataset Preparation

Please download the [ImageNet-1K](https://www.image-net.org/challenges/LSVRC/2012/index.php) dataset for model training and validation.
### Training

* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run

```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above `mpirun` command.

For a detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindcv/blob/main/config.py).

**Note:** As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to adjust the learning rate linearly to the new global batch size.
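For example, with the values in this recipe the reference global batch size is 64 x 8 = 512 with `lr: 0.00125`; training on 4 devices with `batch_size: 64` gives a global batch size of 256, and linear scaling suggests roughly `lr = 0.00125 * 256 / 512 = 0.000625`. Treat this as a rule of thumb rather than a tuned value.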
* Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/dataset --distribute False
```
83+
### Validation
84+
85+
To validate the accuracy of the trained model, you can use `validate.py` and parse the checkpoint path with `--ckpt_path`.
86+
87+
```shell
88+
python validate.py -c configs/halonet/halonet_50t_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
89+
```
90+
91+
### Deployment

Please refer to the [deployment tutorial](https://mindspore-lab.github.io/mindcv/tutorials/deployment/) in MindCV.

## References

[1] Vaswani A, Ramachandran P, Srinivas A, et al. Scaling local self-attention for parameter efficient visual backbones[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 12894-12904.

configs/halonet/halonet_50t_ascend.yaml

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# system
mode: 0
distribute: True
num_parallel_workers: 8
val_while_train: True

# dataset
dataset: 'imagenet'
data_dir: '/path/to/imagenet'
shuffle: True
dataset_download: False
batch_size: 64
drop_remainder: True
val_split: val

# augmentation
image_resize: 256
scale: [0.08, 1.0]
ratio: [0.75, 1.333]
hflip: 0.5
interpolation: 'bilinear'
crop_pct: 0.95

#color_jitter:
auto_augment: 'randaug-m9-n2-mstd0.5-inc1'
re_prob: 0.25
re_max_attempts: 1
mixup: 0.8
color_jitter: 0.4

# model
model: 'halonet_50t'
num_classes: 1000
pretrained: False
ckpt_path: ''
keep_checkpoint_max: 20
val_interval: 5
ckpt_save_dir: './ckpt'
epoch_size: 300
dataset_sink_mode: True
amp_level: 'O3'
val_amp_level: 'O2'

# optimizer
opt: 'adamw'
filter_bias_and_bn: True
weight_decay: 0.04
loss_scale: 1024
use_nesterov: False

# lr scheduler
scheduler: 'warmup_cosine_decay'
min_lr: 0.000006
lr: 0.00125
warmup_epochs: 3
decay_epochs: 297

# loss
loss: 'CE'
label_smoothing: 0.1

mindcv/models/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -13,6 +13,7 @@
     efficientnet,
     ghostnet,
     googlenet,
+    halonet,
     hrnet,
     inceptionv3,
     inceptionv4,
@@ -67,6 +68,7 @@
 from .efficientnet import *
 from .ghostnet import *
 from .googlenet import *
+from .halonet import *
 from .helpers import *
 from .hrnet import *
 from .inceptionv3 import *
@@ -125,6 +127,7 @@
 __all__.extend(efficientnet.__all__)
 __all__.extend(ghostnet.__all__)
 __all__.extend(["GoogLeNet", "googlenet"])
+__all__.extend(halonet.__all__)
 __all__.extend(hrnet.__all__)
 __all__.extend(["InceptionV3", "inception_v3"])
 __all__.extend(["InceptionV4", "inception_v4"])
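With `halonet` registered in `mindcv/models/__init__.py` as above, the new variant should be reachable through MindCV's usual model factory. A minimal usage sketch, assuming the standard `create_model` interface:

```python
# Sketch: instantiate the newly registered model via MindCV's factory function.
# "halonet_50t" is the variant added by this commit; set pretrained=True to load
# the released checkpoint listed in the README (network access required).
import mindcv

model = mindcv.create_model("halonet_50t", num_classes=1000, pretrained=False)
print(type(model))
```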
