`torchao/sparsity/prototype/superblock/README.md`: 21 additions, 9 deletions
````diff
@@ -47,9 +47,8 @@ At least one GPU:
 Baseline:
 ```
 python benchmark.py \
-  --model vit_b_16 \
+  --model vit_h_14 \
   --batch-size 256 \
-  > /dev/null
 ```
 Result:
 ```
````
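The baseline command above prints a single latency figure in milliseconds. As a rough sketch of how such a number is typically produced (illustrative only, not superblock's actual `benchmark.py`; `benchmark_ms` and the dummy workload are made up for this example, and a real GPU benchmark would also synchronize the device around each timed region):

```python
import time

def benchmark_ms(fn, warmup=3, iters=10):
    """Return the median latency of fn() in milliseconds.

    Hypothetical sketch: warm-up runs are excluded, and the median
    is reported because it is robust to scheduling outliers.
    """
    for _ in range(warmup):  # untimed warm-up iterations
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return times[len(times) // 2]  # median latency in ms

# A dummy workload standing in for model inference:
latency = benchmark_ms(lambda: sum(i * i for i in range(10_000)))
```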
````diff
@@ -59,19 +58,27 @@ Result:
 
 80% sparsity, block size 64 (random weights):
 ```
-python benchmark.py --model vit_b_16 \
+python benchmark.py \
+  --model vit_h_14 \
   --batch-size 256 \
   --sparsity-linear 0.8 \
   --sp-linear-tile-size 64 \
-  --sparsify-weights \
   --bsr 64 \
-  > /dev/null
+  --sparsity bsr
 ```
 Result:
 ```
 393.864453125 ms
 ```
 
+Semi-structured sparsity:
+```
+python benchmark.py \
+  --model vit_h_14 \
+  --batch-size 256 \
+  --sparsity semi_structured
+```
+
 
````
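`--sparsity semi_structured` selects 2:4 semi-structured sparsity: in every group of four consecutive weights, the two smallest-magnitude values are zeroed, a pattern that NVIDIA sparse tensor cores can accelerate. A minimal pure-Python sketch of that pruning rule (`prune_2_4` is a hypothetical helper for illustration; the real flow operates on torch tensors via `torch.sparse`'s semi-structured support):

```python
def prune_2_4(weights):
    """Zero the two smallest-magnitude entries in each group of 4.

    Toy sketch of the 2:4 semi-structured pattern; real code works
    on torch tensors and converts the result to a hardware-friendly
    sparse layout.
    """
    assert len(weights) % 4 == 0
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude values in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]))[2:]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

pruned = prune_2_4([0.9, -0.1, 0.5, 0.05, -2.0, 1.0, 0.2, -0.3])
# Exactly half of the entries are zeroed, two per group of four.
```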
## Training
Please refer to [TRAINING.md](TRAINING.md) for training from scratch. We use [Torchvision](https://github.com/pytorch/vision/tree/main/references/classification) as our framework for training. Supermask can be applied during training.
````diff
@@ -102,11 +109,11 @@ To apply supermask, we have the following arguments at our disposal,
 For example, if you would like to train a `vit_b_16` from scratch using Supermask, you can use the respective torchvision command found in [TRAINING.md](TRAINING.md) and append the supermask arguments:
 Through this command, we are training a `vit_b_16` with 90% sparsity on linear layers using 32x32 tiles.
````
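Supermask learns a score alongside each weight and keeps only the top-scoring fraction, so 90% sparsity with 32x32 tiles means roughly nine of every ten tiles end up masked out. A toy sketch of that selection step (`topk_tile_mask` is a hypothetical helper with one score per tile; the actual training-time method differs in detail):

```python
def topk_tile_mask(scores, sparsity):
    """Keep the (1 - sparsity) fraction of highest-scoring tiles.

    `scores` holds one learned score per tile; the returned mask has
    1 for kept tiles and 0 for pruned ones. Ties at the threshold
    may keep slightly more tiles than requested.
    """
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    threshold = sorted(scores, reverse=True)[n_keep - 1]
    return [1 if s >= threshold else 0 for s in scores]

# 10 tiles at 90% sparsity -> only the single best-scoring tile survives
mask = topk_tile_mask([0.1, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.05, 0.8, 0.5], 0.9)
```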
````diff
@@ -134,6 +141,11 @@ NGPUS=1 # put number of available GPUS here
 ```
 This is similar to the previous command, but it does not apply offline sparsification or BSR conversion. Instead, the sparsity is applied on-the-fly during evaluation.
````
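The offline path, by contrast, stores the pruned weight in block sparse row (BSR) format: only the nonzero blocks are kept, plus two index arrays that let sparse kernels skip whole tiles. A pure-Python illustration of that layout, using 2x2 blocks instead of 64x64 for readability (`dense_to_bsr` is a made-up helper; in PyTorch the conversion is done with `Tensor.to_sparse_bsr(blocksize)`):

```python
def dense_to_bsr(mat, bs):
    """Convert a dense matrix (list of lists) to BSR-style arrays.

    Returns (crow_indices, col_indices, values): for block row i,
    the slice col_indices[crow[i]:crow[i+1]] lists the block columns
    holding nonzero blocks, and values holds those blocks. Toy
    version of the layout PyTorch's to_sparse_bsr produces.
    """
    crow, cols, vals = [0], [], []
    for bi in range(len(mat) // bs):
        for bj in range(len(mat[0]) // bs):
            block = [row[bj * bs:(bj + 1) * bs]
                     for row in mat[bi * bs:(bi + 1) * bs]]
            if any(v != 0 for row in block for v in row):
                cols.append(bj)
                vals.append(block)
        crow.append(len(cols))  # running count closes this block row
    return crow, cols, vals

# 4x4 matrix whose only nonzero 2x2 block sits in the top-left corner
dense = [[1, 2, 0, 0],
         [3, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
crow, cols, vals = dense_to_bsr(dense, 2)
```

Because whole zero blocks are never stored, an 80%-sparse weight with 64x64 tiles keeps only a fifth of its blocks, which is where the benchmark speedups come from.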