The `Basic` strategy is designed for quantizing most models. There are several stages executed by the `Basic` strategy sequentially, and the tuning process ends once the exit policy condition is met. The diagram below illustrates each stage, with additional details provided for each annotated step.
> `Opt` stands for optional, which means the stage can be skipped.
> `*` INC detects the block pattern for [transformer-like](https://arxiv.org/abs/1706.03762) models by default.
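
Before these stages run, the strategy and its exit policy are configured up front. The snippet below is a minimal sketch assuming the Intel Neural Compressor 2.x Python config API; the trial budget and tolerance values are illustrative.

```python
# A minimal sketch assuming the INC 2.x config API; the values are illustrative.
from neural_compressor.config import TuningCriterion, AccuracyCriterion

# Select the Basic strategy and bound the search (part of the exit policy).
tuning_criterion = TuningCriterion(strategy="basic", max_trials=100, timeout=0)

# Tuning also stops once the relative accuracy loss stays within this tolerance.
accuracy_criterion = AccuracyCriterion(criterion="relative", tolerable_loss=0.01)
```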
**1.** Default quantization
At this stage, it attempts to quantize OPs with the default quantization configuration which is consistent with the framework's behavior.
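
As a hedged illustration, assuming the INC 2.x API, the first trial is simply the quantization call itself with a plain `PostTrainingQuantConfig`; `model`, `calib_dataloader`, and `eval_func` are user-supplied placeholders, and the criteria come from the sketch above.

```python
# A minimal sketch assuming the INC 2.x API; model, calib_dataloader, and
# eval_func are placeholders the user supplies.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    tuning_criterion=tuning_criterion,      # strategy="basic" from the sketch above
    accuracy_criterion=accuracy_criterion,
)
# The first trial quantizes with the framework-consistent default configuration.
q_model = quantization.fit(model=model, conf=conf,
                           calib_dataloader=calib_dataloader, eval_func=eval_func)
```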
**2.** Apply all recipes
At this stage, it tries to apply all recipes at once. This stage is skipped if the user has already assigned the usage of all recipes.
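
As a hedged example, assuming the `recipes` argument of `PostTrainingQuantConfig` in INC 2.x, a user-pinned recipe setting looks roughly like the following; the recipe names are illustrative assumptions rather than an authoritative list.

```python
# A hedged example assuming PostTrainingQuantConfig(recipes=...); the recipe
# names are illustrative assumptions, not an authoritative list.
from neural_compressor import PostTrainingQuantConfig

conf = PostTrainingQuantConfig(
    recipes={
        "fast_bias_correction": True,   # user-pinned recipe (assumed key)
        "weight_correction": False,     # explicitly disabled (assumed key)
    }
)
```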
**3.** OP-Type-Wise Tuning
At this stage, it tries to quantize as many OPs as possible and traverses all OP-type-wise tuning configs. Note that each OP is initialized with a different quantization mode according to the quantization approach, as sketched after the list below.
a. `post_training_static_quant`: Quantize all OPs that support PTQ static.
b. `post_training_dynamic_quant`: Quantize all OPs that support PTQ dynamic.
c. `post_training_auto_quant`: Quantize all OPs that support PTQ static or PTQ dynamic. For OPs supporting both PTQ static and PTQ dynamic, PTQ static is tried first, and PTQ dynamic is tried only when none of the OP-type-wise tuning configs meet the accuracy loss criteria.
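
The sketch below, plain Python rather than INC's actual code, shows how the approach determines the initial mode per OP type; the capability table is made up for illustration.

```python
# A minimal sketch (not INC's code) of how the approach picks the initial
# quantization mode per OP type; the capability table is made up.
OP_CAPABILITIES = {
    "Conv2D": {"static"},             # supports PTQ static only
    "MatMul": {"static", "dynamic"},  # supports both
    "LSTM":   {"dynamic"},            # supports PTQ dynamic only
}

def init_modes(approach, capabilities):
    modes = {}
    for op_type, supported in capabilities.items():
        if approach == "post_training_static_quant" and "static" in supported:
            modes[op_type] = "static"
        elif approach == "post_training_dynamic_quant" and "dynamic" in supported:
            modes[op_type] = "dynamic"
        elif approach == "post_training_auto_quant":
            # Static is preferred; dynamic is tried only when no static
            # OP-type-wise config meets the accuracy loss criteria.
            modes[op_type] = "static" if "static" in supported else "dynamic"
    return modes

print(init_modes("post_training_auto_quant", OP_CAPABILITIES))
# {'Conv2D': 'static', 'MatMul': 'static', 'LSTM': 'dynamic'}
```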
**4.** Try Recipes One by One
At this stage, it sequentially tries each remaining recipe on top of the tuning config with the best result from the previous stage. Recipes already specified by the user are skipped.
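
The loop below is an illustrative Python sketch of this stage, not INC's implementation; `evaluate` and the recipe names are stand-ins.

```python
# A sketch (not INC's code) of trying recipes one at a time on top of the best
# config so far; evaluate() and the recipe names are stand-ins.
def evaluate(config):
    gains = {"fast_bias_correction": 0.002, "weight_correction": -0.001}
    return 0.75 + sum(gains.get(r, 0.0) for r, on in config["recipes"].items() if on)

best = {"recipes": {"fast_bias_correction": False, "weight_correction": False}}
best_acc = evaluate(best)

for recipe in list(best["recipes"]):
    trial = {"recipes": {**best["recipes"], recipe: True}}
    acc = evaluate(trial)
    if acc > best_acc:                 # keep the recipe only if it helps
        best, best_acc = trial, acc

print(best, round(best_acc, 4))
```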
If the above trials do not meet the accuracy requirements, the strategy starts to perform fallback, which means converting quantized OP(s) back into high precision (FP32, BF16, ...).
**5.1** Block-wise fallback*
For [transformer-like](https://arxiv.org/abs/1706.03762) models, it uses the detected transformer blocks by default and conducts block-wise fallback. In each trial, all OPs within a block are reverted to high precision.
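
The following is an illustrative Python sketch of one block-wise trial, not INC's code; the block contents and the accuracy check are made up.

```python
# A sketch (not INC's code): each trial reverts every OP in one detected
# transformer block to FP32; blocks and the accuracy check are made up.
blocks = {
    "block_0": ["attn.q_matmul", "attn.k_matmul", "attn.v_matmul", "ffn.matmul_1"],
    "block_1": ["attn.out_matmul", "ffn.matmul_2"],
}
quant_config = {op: "int8" for ops in blocks.values() for op in ops}

def meets_accuracy(config):
    # Stand-in for evaluating against the accuracy criterion.
    return config["ffn.matmul_1"] == "fp32"

for name, ops in blocks.items():
    trial = {**quant_config, **{op: "fp32" for op in ops}}  # whole block back to FP32
    if meets_accuracy(trial):
        print(f"accuracy criterion met after falling back {name}")
        break
```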
**5.2** Instance-wise fallback
At this stage, it performs high-precision OP (FP32, BF16 ...) fallbacks one by one based on the tuning config with the best result in the previous stage, and records the impact of each OP.
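
A small Python sketch of the per-OP measurement follows; it is not INC's code, and the accuracy numbers are made up.

```python
# A sketch (not INC's code): revert one quantized OP at a time to FP32 and
# record how much accuracy it recovers; the numbers are made up.
quantized_ops = ["conv_1", "matmul_3", "conv_7"]
baseline_acc = 0.760                     # accuracy of the best config so far

def evaluate(fp32_ops):
    recovered = {"conv_1": 0.008, "matmul_3": 0.001, "conv_7": 0.015}
    return baseline_acc + sum(recovered[op] for op in fp32_ops)

# Larger impact means quantizing that OP costs more accuracy.
impact = {op: evaluate([op]) - baseline_acc for op in quantized_ops}
print(impact)
```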
**5.3** Accumulated fallback
At the final stage, it first sorts the OP list according to the impact scores recorded in stage 5.2, and then tries to incrementally fall back multiple OPs to high precision according to the sorted OP list.
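
The accumulated pass can be sketched as below in plain Python, reusing the impact scores from the previous sketch; the target accuracy and numbers are made up, and this is not INC's implementation.

```python
# A sketch (not INC's code): sort OPs by the impact measured instance-wise,
# then fall back a growing prefix of that list until the criterion is met.
baseline_acc, target_acc = 0.760, 0.780
impact = {"conv_7": 0.015, "conv_1": 0.008, "matmul_3": 0.001}  # from the previous sketch

def evaluate(fp32_ops):
    return baseline_acc + sum(impact[op] for op in fp32_ops)

fallback = []
for op in sorted(impact, key=impact.get, reverse=True):  # most impactful first
    fallback.append(op)        # accumulate fallbacks instead of retrying one at a time
    if evaluate(fallback) >= target_acc:
        print("accuracy criterion met with", fallback)
        break
```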