Distillation and quantization are both promising methods for reducing the computational and memory footprint of large transformer-based networks. Quantization reduces the bit precision of both activations and weights. Distillation transfers knowledge from a heavy teacher model to a lightweight student model and can serve as a performance booster for lower-bit quantization. Quantization-aware training (QAT) recovers the accuracy degradation caused by the loss of representation precision during retraining, and typically yields better performance than post-training quantization.
Intel provides a QAT method that incorporates a novel layer-by-layer knowledge distillation step for INT8 quantization pipelines.
### User-defined YAML
The configurations for distillation and QAT are specified in `distillation.yaml` and `qat.yaml`, respectively.
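
Below is an illustrative sketch of what these two files might contain. The key layout follows Intel Neural Compressor's configuration schema, but the model names, hyperparameters, and thresholds are placeholder assumptions to adapt to your own setup.

```yaml
# distillation.yaml -- illustrative sketch; names and values are placeholders
model:
  name: distillation_example        # placeholder model name
  framework: pytorch

distillation:
  train:
    optimizer:
      SGD:
        learning_rate: 0.001
    criterion:
      KnowledgeDistillationLoss:
        temperature: 1.0            # softening temperature for teacher logits
        loss_types: ['CE', 'KL']    # cross-entropy on labels + KL against teacher
        loss_weights: [0.5, 0.5]    # relative weight of each loss term
```

```yaml
# qat.yaml -- illustrative sketch; names and values are placeholders
model:
  name: qat_example                 # placeholder model name
  framework: pytorch_fx

quantization:
  approach: quant_aware_training    # INT8 quantization-aware training

tuning:
  accuracy_criterion:
    relative: 0.01                  # tolerate at most 1% relative accuracy drop
  exit_policy:
    timeout: 0                      # 0 means run until tuning finishes
```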
### Examples
For examples of distillation for quantization, please refer to the [distillation-for-quantization examples](../examples/pytorch/nlp/huggingface_models/text-classification/optimization_pipeline/distillation_for_quantization/fx/README.md).
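
As a quick orientation before diving into those examples, here is a minimal sketch of wiring the two steps together, assuming Intel Neural Compressor's 1.x experimental orchestration API; `model` (the FP32 student), `teacher_model`, and the output path are placeholders for your own objects and paths.

```python
# Minimal sketch, assuming the neural_compressor 1.x experimental API;
# `model` and `teacher_model` are placeholders you supply yourself.
from neural_compressor.experimental import Distillation, Quantization, common
from neural_compressor.experimental.scheduler import Scheduler

distiller = Distillation('./distillation.yaml')        # knowledge-distillation step
distiller.teacher_model = common.Model(teacher_model)  # heavy teacher network

quantizer = Quantization('./qat.yaml')                 # INT8 QAT step

scheduler = Scheduler()
scheduler.model = common.Model(model)                  # FP32 student to optimize
combination = scheduler.combine(distiller, quantizer)  # fuse both into one training pass
scheduler.append(combination)
opt_model = scheduler.fit()                            # returns the optimized model

opt_model.save('./int8_model')                         # placeholder output path
```

Combining the two components in a single `fit` pass is what lets the distillation loss guide the quantization-aware retraining, rather than running the two optimizations back to back.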