I tried to pull as much shared code as possible into utils.py; now both benchmark.py and evaluate.py are single-function files.
I also added a `block_sparse_weight` function to the `BlockSparse` subclass. We should probably make this a public API before PTC; I might try to turn this into a good-first-task kind of thing.
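For context, a purely hypothetical sketch of what the public entry point could eventually look like, following the existing `sparsify_` pattern in torchao (the import path and `blocksize` argument here are assumptions, not the current prototype API):

```python
import torch
# Hypothetical imports: block_sparse_weight is still a prototype here,
# and the eventual public name/signature may differ.
from torchao.sparsity import sparsify_, block_sparse_weight

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()

# Swap the dense Linear weights for block-sparse (BSR) weights.
sparsify_(model, block_sparse_weight(blocksize=64))
```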
Additionally, fixed a bug so `FakeSparsity` parametrizations now return a state_dict, so the masks are present in the dumped file.
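A minimal sketch of the behavior after the fix, using a `FakeSparsity`-style parametrization (illustrative, not the exact torchao implementation):

```python
import torch
from torch import nn
from torch.nn.utils import parametrize

class FakeSparsity(nn.Module):
    """Parametrization that multiplies a weight by a fixed binary mask.

    Because the mask is registered as a buffer and state_dict() is not
    overridden to return an empty dict, the mask is serialized along
    with the rest of the model.
    """
    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask)

    def forward(self, weight):
        return self.mask * weight

linear = nn.Linear(8, 8)
parametrize.register_parametrization(linear, "weight", FakeSparsity(torch.ones(8, 8)))

# The mask now shows up in the dumped state_dict:
assert "parametrizations.weight.0.mask" in linear.state_dict()
```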
torchao/sparsity/prototype/superblock/README.md: 7 additions & 7 deletions
# SuperBlock

SuperBlock combines two techniques for efficient neural network training and inference: Supermask and Block Compressed Sparse Row (BSR).
The techniques are described in this [blog post](https://pytorch.org/blog/speeding-up-vits/).

### Supermask
[Supermask](https://arxiv.org/abs/2207.00670) is a technique for applying structured sparsity to neural networks using a learned mask. It works by learning a continuous mask (scores) that is applied element-wise to the weights of a neural network layer. The mask scores are learned separately from the weights and are thresholded based on a target sparsity level to obtain a binary mask. The mask determines which weights are kept and which are pruned, and is learned during training.

During inference, the binary mask is applied element-wise to the weights, pruning the weights that correspond to a 0 in the mask, resulting in a sparse network that can be efficiently computed.
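A minimal sketch of the thresholding step (`binarize_scores` is a hypothetical helper for illustration, not SuperBlock's actual implementation):

```python
import torch

def binarize_scores(scores: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Turn continuous mask scores into a binary mask at a target sparsity."""
    k = int(sparsity * scores.numel())  # number of weights to prune
    if k == 0:
        return torch.ones_like(scores)
    # Prune the k lowest-scoring entries; keep the rest.
    threshold = scores.flatten().kthvalue(k).values
    return (scores > threshold).to(scores.dtype)

scores = torch.rand(32, 32)           # learned mask scores
mask = binarize_scores(scores, 0.9)   # ~90% of entries become 0
sparse_weight = mask * torch.randn(32, 32)  # element-wise application
```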
### Block Compressed Sparse Row Format (BSR)
[The BSR format](https://pytorch.org/docs/main/sparse.html#sparse-bsr-tensor) is a sparse matrix representation that stores dense sub-blocks of non-zero elements instead of individual non-zero elements. The matrix is divided into equal-sized blocks, and only the non-zero blocks are stored.
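For example, PyTorch can convert a dense tensor to BSR directly with `Tensor.to_sparse_bsr` (the weight values here are made up for illustration):

```python
import torch

# A 64x64 weight where only two of the four 32x32 tiles are non-zero.
w = torch.zeros(64, 64)
w[:32, :32] = torch.randn(32, 32)
w[32:, 32:] = torch.randn(32, 32)

# Only the non-zero 32x32 blocks are stored.
w_bsr = w.to_sparse_bsr((32, 32))
print(w_bsr.values().shape)  # torch.Size([2, 32, 32])
```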
This command applies 90% sparsity to linear layers using 32x32 tiles, loads the model weights from ${MODEL_PATH}, loads the ImageNet validation set located at the specified path, applies offline sparsification to the weights, and converts the sparse weights to BSR format with a block size of 32. It is recommended to set `--bsr` to the same value as the tile size.
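The command itself is not shown in this hunk; a hypothetical invocation consistent with the description might look like the following (only `--bsr` appears in the diff below; the script name and the other flag names are assumptions):

```bash
# All flags other than --bsr are hypothetical.
python evaluate.py --sparsity-linear 0.9 --sp-linear-tile-size 32 \
    --weights-path ${MODEL_PATH} --data-path ${IMAGENET_PATH} --bsr 32
```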
parser.add_argument("--skip-last-layer-sparsity", action="store_true", help="Skip applying sparsity to the last linear layer (for vit only)")
150
103
parser.add_argument("--skip-first-transformer-sparsity", action="store_true", help="Skip applying sparsity to the first transformer layer (for vit only)")
151
-
parser.add_argument('--sparsify-weights', action='store_true', help='Apply weight sparsification in evaluation mode')
152
104
parser.add_argument('--bsr', type=int, nargs='?', const=256, default=None, help='Convert sparsified weights to BSR format with optional block size (default: 256)')