Commit c240bf6

modify sparsity docs (#1319)

* init commit
* Updated hyperlinks
* Update datapoints
* Update sparsity.md
* Update sparsity.md
* Update README.md
* Update README.md
* Update sparsity.md
* space adjustment
* Update README.md change 2:4 to structure
* Update README.md bert large pattern 2x1
* add 2in4 sparsity demo image
* Update README.md change to 1x2 as following the config
* additional 2in4 sparsity explanation
* Update sparsity.md
* Update README.md revert
* Update sparsity.md
* Update sparsity.md
* Update README.md
* Update sparsity.md
* Update sparsity.md
* Update sparsity.md

Co-authored-by: wenhuach21 <108330088+wenhuach21@users.noreply.github.com>
1 parent 716c3ee commit c240bf6

File tree

3 files changed (+29 −16 lines)


docs/imgs/2in4_sparsity_demo.png (4.53 KB)

docs/sparsity.md

Lines changed: 25 additions & 12 deletions
```diff
@@ -4,10 +4,16 @@ Sparsity is one of promising model compression techniques that can be used to ac
 
 The document describes the sparsity definition, sparsity training flow, validated models, and performance benefit using software sparsity. Note that the document discusses the sparse weight (with dense activation) for inference acceleration. Sparse activation or sparse embedding for inference acceleration or training acceleration is out of the scope.
 
-> **Note**: training for sparsity with 2:4 or similar structured pattern is under development
+> **Note**: training for sparsity with a 2:4 or similar structured pattern is supported; please refer to our new [API](../neural_compressor/experimental/pytorch_pruner/), [question-answering examples](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/pytorch_pruner/eager) and [text-classification examples](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager)
 
 ## Sparsity Definition
-Different from structured sparsity pattern [2:4](https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/) what NVidia proposed in Ampere architecture, we propose the block-wise structured sparsity patterns that we are able to demonstrate the performance benefits on existing Intel hardwares even without the support of hardware sparsity. A block-wise sparsity pattern with block size ```S``` means the contiguous ```S``` elements in this block are all zero values.
+NVIDIA proposed [2:4 sparsity](https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/) (also known as "2in4 sparsity") in the Ampere architecture: for every 4 contiguous elements in a matrix, two are zero and the other two are non-zero.
+
+<a target="_blank" href="./docs/imgs/2in4_sparsity_demo.png">
+<img src="../docs/imgs/2in4_sparsity_demo.png" width=600 height=200 alt="Sparsity Pattern">
+</a>
+
+Different from the 2:4 sparsity above, we propose block-wise structured sparsity patterns that demonstrate performance benefits on existing Intel hardware even without hardware sparsity support. A block-wise sparsity pattern with block size ```S``` means the contiguous ```S``` elements in this block are all zero values.
 
 For a typical GEMM, the weight dimension is ```IC``` x ```OC```, where ```IC``` is the number of input channels and ```OC``` is the number of output channels. Note that sometimes ```IC``` is also called dimension ```K```, and ```OC``` is called dimension ```N```. The sparsity dimension is on ```OC``` (or ```N```).
 
```
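The 2:4 pattern described above can be illustrated with a minimal NumPy sketch. This is not Neural Compressor's API, just a hypothetical magnitude-based pruner: in every group of 4 contiguous elements, the 2 smallest-magnitude elements are zeroed.

```python
import numpy as np

def prune_2in4(weight):
    """Illustrative 2:4 (2-in-4) magnitude pruning: zero out the 2
    smallest-magnitude elements in every group of 4 contiguous elements.
    Assumes weight.size is divisible by 4."""
    w = weight.reshape(-1, 4).copy()
    # column indices of the 2 smallest-magnitude elements in each group
    drop = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weight.shape)

w = np.arange(1.0, 9.0).reshape(2, 4)   # [[1,2,3,4],[5,6,7,8]]
sparse = prune_2in4(w)
print(sparse)                # [[0. 0. 3. 4.] [0. 0. 7. 8.]]
print((sparse == 0).mean())  # 0.5 -> exactly 50% sparsity, by construction
```

Note that 2:4 sparsity always yields exactly 50% sparsity, which is why the table below lists a fixed 50% ratio for the "2 in 4" rows.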

```diff
@@ -45,16 +51,23 @@ def train():
 
 We validate the sparsity on typical models across different domains (including CV, NLP, and Recommendation System). The below table shows the sparsity pattern, sparsity ratio, and accuracy of sparse and dense (Reference) model for each model. We also provide a simplified [BERT example](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/group_lasso/eager) with only one sparse layer.
 
-| Model | Sparsity Pattern | Sparsity Ratio | Accuracy (Sparse Model) | Accuracy (Dense Model) |
-|-----------|:----------------:|:--------------:|:-----------------------:|:-----------------------:|
-| Bert Large| ***2***x1 | 70% | 90.70% | 91.34% |
-| DLRM | 4x***16*** | 85% | 80.29% | 80.25% |
-| Bert Mini | ***16***x1 | 81% | 81.89% | 82.93% |
-|ResNet50 v1.5 | ***2***x1 | 78% | 75.3% | 76.13% |
-|SSD-ResNet34 | ***2***x1 | 75% | 22.85% | 23% |
-|ResNext101| ***2***x1 | 73% | 79.14% | 79.37% |
-
-Note: ***bold*** means the sparsity dimension (```OC```).
+| Model | Sparsity Pattern | Sparsity Ratio | Dataset | Accuracy (Sparse Model) | Accuracy (Dense Model) |
+|-----------|:----------------:|:--------------:|:-------------:|:-----------------------:|:-----------------------:|
+| Bert Large | [***2***x1](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/group_lasso/eager) | 70% | SQuAD | 90.70% | 91.34% |
+| DLRM | 4x***16*** | 85% | Criteo Terabyte | 80.29% | 80.25% |
+| Bert Mini | [***4***x1](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager) | 90% | MRPC | 87.22% | 87.52% |
+| Bert Mini | [***4***x1](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager) | 90% | SST-2 | 86.92% | 87.61% |
+| Bert Mini | [***4***x1](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/pytorch_pruner/eager) | 90% | SQuAD | 76.27% | 76.87% |
+| Bert Mini | [2 in ***4***](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager) | 50% | MRPC | 86.95% | 87.52% |
+| Bert Mini | [2 in ***4***](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager) | 50% | SST-2 | 86.93% | 87.61% |
+| Bert Mini | [2 in ***4***](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/pytorch_pruner/eager) | 50% | SQuAD | 76.85% | 76.87% |
+| ResNet50 v1.5 | [***2***x1](../examples/pytorch/image_recognition/torchvision_models/pruning/magnitude/eager) | 78% | ImageNet | 75.3% | 76.13% |
+| SSD-ResNet34 | ***2***x1 | 75% | COCO | 22.85% | 23% |
+| ResNext101 | ***2***x1 | 73% | ImageNet | 79.14% | 79.37% |
+
+Note:
+* ***bold*** marks the sparsity dimension (```OC```).
+* Bert-Mini examples are built on our [PyTorch Pruner API](../neural_compressor/experimental/pytorch_pruner/); see the [question answering](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/pytorch_pruner/eager) and [text classification](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager) examples.
 
 ## Performance
 
```
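The block-wise patterns in the table (e.g. ***2***x1, ***4***x1 with the block taken along ```OC```) can be made concrete with a small NumPy sketch. The function name and layout convention here are illustrative assumptions, not part of Neural Compressor: it measures what fraction of the weight is covered by all-zero blocks of ```S``` contiguous elements along the OC dimension.

```python
import numpy as np

def block_sparsity_ratio(weight, block_size=2):
    """Fraction of (block_size x 1) blocks along OC that are entirely zero.
    Assumes weight shape (OC, IC) with OC divisible by block_size."""
    oc, ic = weight.shape
    # group contiguous rows (OC dimension) into blocks of block_size
    blocks = weight.reshape(oc // block_size, block_size, ic)
    zero_blocks = np.all(blocks == 0, axis=1)  # one flag per block per column
    return zero_blocks.mean()

w = np.ones((8, 4))
w[0:4, 0] = 0.0  # one run of 4 contiguous zeros along OC in column 0
print(block_sparsity_ratio(w, block_size=4))  # 0.125: 1 of 8 blocks is zero
```

Because only whole zero blocks count toward the pattern, scattered zeros that do not fill a complete ```S```-element block contribute nothing to the structured sparsity ratio.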

examples/README.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -575,7 +575,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
 <tr>
 <td>BERT large</td>
 <td>Natural Language Processing</td>
-<td>Structured</td>
+<td>Structured (2x1)</td>
 <td>Group Lasso</td>
 <td><a href="./pytorch/nlp/huggingface_models/question-answering/pruning/group_lasso/eager">eager</a></td>
 </tr>
@@ -589,7 +589,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
 <tr>
 <td>bert-base-uncased</td>
 <td>Natural Language Processing</td>
-<td>Structured</td>
+<td>Structured (Filter/Channel-wise)</td>
 <td>Gradient Sensitivity</td>
 <td><a href="./pytorch/nlp/huggingface_models/text-classification/pruning/gradient_sensitivity/eager">eager</a></td>
 </tr>
@@ -610,14 +610,14 @@ Intel® Neural Compressor validated examples with multiple compression technique
 <tr>
 <td>Bert-mini</td>
 <td>Natural Language Processing (text classification)</td>
-<td>Structured</td>
+<td>Structured (4x1, 2in4), Unstructured</td>
 <td>Snip-momentum</td>
 <td><a href="./pytorch/nlp/huggingface_models/text-classification/pruning/pytorch_pruner/eager">eager</a></td>
 </tr>
 <tr>
 <td>Bert-mini</td>
 <td>Natural Language Processing (question answering)</td>
-<td>Structured</td>
+<td>Structured (4x1, 2in4), Unstructured</td>
 <td>Snip-momentum</td>
 <td><a href="./pytorch/nlp/huggingface_models/question-answering/pruning/pytorch_pruner/eager">eager</a></td>
 </tr>
```

0 commit comments
