| use_optimum_format | True | Whether to use the popular format used in [Optimum](https://github.com/huggingface/optimum/blob/e0927976d06d163ed09fe5bd80d013e1cfa0c463/docs/source/llm_quantization/usage_guides/quantization.mdx#L5) |
| sym_full_range | False | Whether to leverage the full compression range under symmetric quantization |
| compression_dtype | torch.int32 | Data type for the compressed weight; select from [torch.int8\|16\|32\|64]. It is torch.int32 when use_optimum_format=True |
| compression_dim | 1 | 0 means output channel while 1 means input channel. It is 1 for weight and 0 for zero-point when use_optimum_format=True |
| scale_dtype | torch.float32 | Data type for scale and bias. It is torch.float16 when use_optimum_format=True |
| qweight_config_path | None | Set the path of qconfig.json if you want to export the model with a JSON file |
| gptq_config_path | None | Set the path of gptq_config.json if you need to export a GPTQ-quantized model with the fp32_model and a JSON file |
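As a rough illustration of what `compression_dtype` controls, the sketch below (plain Python with a hypothetical `pack_int4` helper, not the library's actual implementation) packs 4-bit quantized weights into 32-bit integer containers, so a torch.int32 element holds 32 // 4 = 8 weights:

```python
def pack_int4(values, container_bits=32):
    """Pack unsigned 4-bit values into integers of `container_bits` bits.

    Hypothetical helper: illustrates why compression_dtype=torch.int32
    stores eight 4-bit weights per element.
    """
    per_container = container_bits // 4
    packed = []
    for i in range(0, len(values), per_container):
        word = 0
        for j, v in enumerate(values[i:i + per_container]):
            assert 0 <= v < 16, "4-bit value out of range"
            word |= v << (4 * j)  # place each nibble at its offset
        packed.append(word)
    return packed

# Eight 4-bit weights fit into a single 32-bit container.
print(pack_int4([0, 1, 2, 3, 4, 5, 6, 7]))  # → [1985229328], i.e. 0x76543210
```

Choosing a smaller container (e.g. torch.int8) packs only two 4-bit weights per element; `compression_dim` then decides whether this packing runs along the input or the output channel.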
**Note:** The format used in Optimum is accepted by Transformers, which makes it easy to use. However, this format is rather special; the main differences are as follows:

> 1: Compression Dimension: weight = 1, zero = 0, and both are transposed.

> 2: Zero Point: zero_point -= 1 before compression; zero_point is always required, even for symmetric quantization.

> 3: Group Index: use the same number for a group instead of recording the channel order.
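Point 2 can be sketched as follows. This is a minimal illustration, assuming the usual convention that symmetric quantization to `bits` bits centers the unsigned range at `2**(bits - 1)`; it is not the library's code:

```python
def sym_zero_point_stored(bits):
    """Zero point as stored in the Optimum format for symmetric quantization.

    Symmetric quantization centers the unsigned range at 2**(bits - 1)
    (assumed convention); the format still records this zero point and
    applies zero_point -= 1 before compression.
    """
    zero_point = 2 ** (bits - 1)  # e.g. 8 for 4-bit
    return zero_point - 1         # the format's zero_point -= 1 adjustment

print(sym_zero_point_stored(4))  # → 7
```

So even a symmetric 4-bit model carries a stored zero point (7 after the adjustment) in this format, rather than omitting it.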