How to disable deepspeed quantization during inference? #3567
Unanswered
yuchen2580 asked this question in Q&A
Replies: 1 comment
-
@yuchen2580 Have you figured out this problem? I have the same confusion.
-
I followed the tutorial and used the following code for inference:
However, when I examined the log output, I found:
[2023-05-18 11:50:58,071] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,073] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
05/18/2023 11:50:58 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/ethany/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/bcb9b8b48fdeae767d48b3ce9341d5b691048450328db6e6f1a9583eb759599a/cache-9164b0c08bb4f7d7.arrow
[2023-05-18 11:50:58,074] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-05-18 11:50:58,084] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,086] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,086] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-05-18 11:50:58,087] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-05-18 11:50:58,087] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.9.1, git-hash=unknown, git-branch=unknown
[2023-05-18 11:50:58,088] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-05-18 11:50:58,088] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2023-05-18 11:50:58,089] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2023-05-18 11:50:58,089] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['self_attn.out_proj', '.fc2'])]
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['.fc2', 'self_attn.out_proj'])]
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['.fc2', 'self_attn.out_proj'])]
AutoTP: [(<class 'transformers.models.opt.modeling_opt.OPTDecoderLayer'>, ['self_attn.out_proj', '.fc2'])]
This suggests it was running with quantize_bits=8? How can I disable it? It drops a lot of accuracy for the OPT and GPT-Neo models I grabbed from Hugging Face.
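For context, in DeepSpeed inference the quantization path is normally controlled by the dtype passed to deepspeed.init_inference: dtype=torch.int8 is what enables 8-bit quantization, while torch.float16 keeps the weights unquantized. A minimal sketch of such a config (not the asker's actual code, which is not shown above; the tensor_parallel key is the replacement for the deprecated mp_size argument flagged in the log):

```python
import torch

# Sketch of kwargs for deepspeed.init_inference (DeepSpeed 0.9.x) that should
# avoid int8 quantization: dtype=torch.int8 is what switches on quantize_bits=8,
# so passing fp16 here keeps the model unquantized.
inference_config = dict(
    dtype=torch.float16,             # fp16 inference; torch.int8 would quantize
    tensor_parallel={"tp_size": 1},  # replaces the deprecated mp_size argument
    replace_with_kernel_inject=True,
)

# engine = deepspeed.init_inference(model, **inference_config)
```

If the script already passes a half-precision dtype, the quantize_bits = 8 log line may simply be the default quantization settings being printed at init rather than proof that quantization is active, so it is worth checking which dtype the tutorial code actually passes.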