
Commit 06d21ea

Create distillation_for_quantization documentation (#1321)
* Create distillation_quantization.md
* Update distillation_quantization.md
* Update README.md
* Update distillation_quantization.md
* Update distillation_quantization.md test lpot-ut
* Update distillation_quantization.md
* Update distillation_quantization.md
1 parent 10276c0 commit 06d21ea

File tree

2 files changed: +20 −0 lines changed


README.md

Lines changed: 4 additions & 0 deletions
@@ -207,6 +207,10 @@ Intel® Neural Compressor validated 420+ [examples](./examples) for quantization
 <td colspan="2" align="center"><a href="docs/model_conversion.md">Model Conversion</a></td>
 <td colspan="2" align="center"><a href="docs/tensorboard.md">TensorBoard</a></td>
 </tr>
+<tr>
+<td colspan="3" align="center"><a href="docs/distillation_quantization.md">Distillation for Quantization</a></td>
+</tr>
+
 </tbody>
 <thead>
 <tr>

docs/distillation_quantization.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+Distillation for Quantization
+============
+
+### Introduction
+
+Distillation and quantization are both promising methods for reducing the computational and memory footprint of large transformer-based networks. Quantization reduces the bit precision of both activations and weights. Distillation transfers knowledge from a heavy teacher model to a lightweight student model, and can be used as an accuracy booster in low-bit quantization. Quantization-aware training recovers the accuracy degradation caused by the loss of representational precision through retraining, and typically performs better than post-training quantization.
+Intel provides a quantization-aware training (QAT) method that incorporates a novel layer-by-layer knowledge distillation step for INT8 quantization pipelines.
+
+### User-defined YAML
+
+The configurations of distillation and QAT are specified in distillation.yaml and qat.yaml, respectively.
+
+
+### Examples
+
+For examples of distillation for quantization, please refer to the [distillation-for-quantization examples](../examples/pytorch/nlp/huggingface_models/text-classification/optimization_pipeline/distillation_for_quantization/fx/README.md).
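
For reference, the distillation.yaml and qat.yaml files named in the new doc follow Neural Compressor's user-facing yaml schema. Below is a minimal sketch of what such a pair might contain; the model names, optimizer settings, loss weights, and tuning thresholds are illustrative assumptions, not values taken from this commit:

```yaml
# distillation.yaml -- illustrative sketch, not from this commit
model:
  name: student_model          # hypothetical model name
  framework: pytorch_fx

distillation:
  train:
    optimizer:
      SGD:
        learning_rate: 0.001   # assumed value
    criterion:
      KnowledgeDistillationLoss:
        temperature: 1.0
        loss_types: ['CE', 'KL']   # hard-label CE plus KL divergence to the teacher
        loss_weights: [0.5, 0.5]   # assumed equal weighting
```

```yaml
# qat.yaml -- illustrative sketch, not from this commit
model:
  name: student_model          # hypothetical model name
  framework: pytorch_fx

quantization:
  approach: quant_aware_training

tuning:
  accuracy_criterion:
    relative: 0.01             # tolerate up to 1% relative accuracy loss (assumed)
  exit_policy:
    timeout: 0                 # stop at the first config that meets the criterion
```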
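
To show how the two yaml files would be consumed together, here is a minimal sketch of the one-shot distillation-plus-QAT flow using the experimental Scheduler API from Neural Compressor 1.x; the model objects and file paths are placeholders, and the exact API surface should be checked against the linked example:

```python
# Minimal sketch of one-shot distillation + QAT, assuming the
# neural_compressor 1.x experimental API; teacher_model and student_model
# are placeholder objects, and training/eval setup is assumed to come
# from the yaml files or user-registered functions.
from neural_compressor.experimental import Distillation, Quantization, Scheduler, common

distiller = Distillation('distillation.yaml')           # knowledge-distillation settings
distiller.teacher_model = common.Model(teacher_model)   # fine-tuned FP32 teacher (placeholder)

quantizer = Quantization('qat.yaml')                    # quant-aware-training settings

scheduler = Scheduler()
scheduler.model = common.Model(student_model)           # student to distill and quantize (placeholder)
combination = scheduler.combine(distiller, quantizer)   # fuse both into a single training loop
scheduler.append(combination)
opt_model = scheduler.fit()                             # returns the distilled INT8 student
```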
