
Commit 16b40fd

readme-updates (#874)
1 parent 8236a87 commit 16b40fd

File tree

1 file changed: +21 -9 lines changed
  • torchao/sparsity/prototype/superblock


torchao/sparsity/prototype/superblock/README.md

Lines changed: 21 additions & 9 deletions
@@ -47,9 +47,8 @@ At least one GPU:
 Baseline:
 ```
 python benchmark.py \
-  --model vit_b_16 \
+  --model vit_h_14 \
   --batch-size 256 \
-  > /dev/null
 ```
 Result:
 ```
@@ -59,19 +58,27 @@ Result:
 
 80% sparsity, block size 64 (random weights):
 ```
-python benchmark.py --model vit_b_16 \
+python benchmark.py \
+  --model vit_h_14 \
   --batch-size 256 \
   --sparsity-linear 0.8 \
   --sp-linear-tile-size 64 \
-  --sparsify-weights \
   --bsr 64 \
-  > /dev/null
+  --sparsity bsr
 ```
 Result:
 ```
 393.864453125 ms
 ```
 
+Semi-structured sparsity
+```
+python benchmark.py \
+  --model vit_h_14 \
+  --batch-size 256 \
+  --sparsity semi_structured
+```
+
 
 ## Training
 Please refer to [TRAINING.md](TRAINING.md) for training from scratch. We use [Torchvision](https://github.com/pytorch/vision/tree/main/references/classification) as our framework for training. Supermask can be applied during training.
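
Note: the `--sparsity bsr` and `--sparsity semi_structured` flags added above select which sparse weight layout the benchmark exercises. As a rough illustration (not superblock's actual code), the two layouts can be produced in plain PyTorch as sketched below; the shapes, the random block mask, and the fixed 2:4 pattern are hypothetical, and the 2:4 conversion assumes torch >= 2.1 on an Ampere-or-newer GPU:

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Hypothetical 1024x1024 fp16 linear weight, pruned to 80% sparsity in 64x64 blocks
# (the structure that --sparsity-linear 0.8 / --sp-linear-tile-size 64 target).
weight = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")
block_mask = torch.rand(16, 16, device="cuda") > 0.8  # keep ~20% of the 64x64 blocks
mask = block_mask.repeat_interleave(64, dim=0).repeat_interleave(64, dim=1)
weight = weight * mask

# Block-sparse-row storage, matching --bsr 64 / --sparsity bsr.
bsr_weight = weight.to_sparse_bsr((64, 64))

# 2:4 semi-structured layout, matching --sparsity semi_structured:
# exactly two of every four consecutive elements along a row are zero.
dense = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")
pattern = torch.tensor([1, 1, 0, 0], dtype=torch.float16, device="cuda").tile(1024, 256)
semi_structured_weight = to_sparse_semi_structured(dense * pattern)

print(bsr_weight.layout, semi_structured_weight.shape)
```
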
@@ -102,11 +109,11 @@ To apply supermask, we have the following arguments at our disposal,
 For example, if you would like to train a `vit_b_16` from scratch using Supermask, you can use the respective torchvision command found in [TRAINING.md](TRAINING.md) and append the supermask arguments:
 ```
 torchrun --nproc_per_node=8 train.py\
-  --model vit_b_16 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
+  --model vit_h_14 --epochs 3 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
   --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
-  --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
-  --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema\
-  --sparsity-linear 0.9 --sp-linear-tile-size 32
+  --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 \
+  --clip-grad-norm 1 --cutmix-alpha 1.0 --model-ema\
+  --sparsity semi_structured --data-path $IMAGENET_PATH
 ```
 Through this command, we are training a `vit_b_16` with 90% sparsity to linear layers using 32x32 tiles.
 
@@ -134,6 +141,11 @@ NGPUS=1 # put number of available GPUS here
 ```
 This is similar to the previous command, but it does not apply offline sparsification or BSR conversion. Instead, the sparsity is applied on-the-fly during evaluation.
 
+* Semi-structured sparsity
+```
+python evaluate.py --model vit_b_16 --batch-size 256 --data-path $IMAGENET_PATH --weights-path checkpoints/2x4_sparse_ft_1_epoch.pth --sparsity semi_structured --skip-last-layer-sparsity
+```
+
 Please run `python evaluate.py --help` for a full list of available arguments.
 
 Results (1x A100):
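
Note: for a sense of what the `--sparsity semi_structured` evaluation path does with a linear layer at run time, here is a small self-contained sketch in plain PyTorch (again not superblock's code; the shapes are made up and an Ampere-or-newer GPU with torch >= 2.1 is assumed). A 2:4 weight plugs straight into `F.linear`, which dispatches to the sparse kernel:

```python
import torch
import torch.nn.functional as F
from torch.sparse import to_sparse_semi_structured

# Hypothetical fp16 linear weight already pruned to the 2:4 pattern.
weight = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile(4096, 1024)
weight = weight * mask

x = torch.randn(256, 4096, dtype=torch.float16, device="cuda")

dense_out = F.linear(x, weight)                              # dense baseline
sparse_out = F.linear(x, to_sparse_semi_structured(weight))  # 2:4 sparse kernel

# The two paths should agree up to fp16 rounding.
print((dense_out - sparse_out).abs().max())
```
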
