Add debug hook to support dump tensor data and add new debug functions easily #5182
Conversation
Add context manager method to enable debugger
Signed-off-by: Hui Gao <huig@nvidia.com>
```python
self.layer_inner_counter = []

self.module_forward_hook_handle = None
self.module_forward_pre_hook_handle = None
```
Since the outputs of layer n are typically the inputs of layer n+1, do we need to dump both the inputs and outputs of every layer at the same time? That would produce a lot of duplicated results.
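For illustration, a minimal sketch (hypothetical two-layer model, not code from this PR) of why dumping both sides duplicates data: with PyTorch's standard hook APIs, layer n's captured output is the very same tensor object as layer n+1's captured input.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
captured = {}

def make_pre_hook(name):
    def pre_hook(module, args):
        # args is the tuple of positional inputs to the module
        captured[f"{name}.input"] = args[0]
    return pre_hook

def make_post_hook(name):
    def post_hook(module, args, output):
        captured[f"{name}.output"] = output
    return post_hook

for name, m in model.named_children():
    m.register_forward_pre_hook(make_pre_hook(name))
    m.register_forward_hook(make_post_hook(name))

model(torch.randn(2, 8))
# The same tensor object is captured twice:
assert captured["0.output"] is captured["1.input"]
```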
```python
self.log_folder = dest_folder
self.is_forward_pre = True
self.dump_style: DumpStyle = DumpStyle.BINARY
self.log_folder_inited: bool = False
```
Let's unify the naming style: `is_forward_pre`, `is_log_folder_inited`.
```python
return

if self.log_folder is None:
    import os
```
Please move `import os` to the beginning of the file.
```python
debug_ctx.get_current_modules_tree().clear()
debug_ctx.get_module_indices_tree().clear()
for name, submodule in model.named_modules():
```
If an `MLP` module contains 2 `Linear` modules, then the output of the `MLP` module should be the same as the output of the second `Linear` module. Is my understanding correct? If so, there will be duplicated dumped tensors.
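A minimal sketch (hypothetical `MLP`, not code from this PR) confirming the nesting concern: when hooks are attached to every entry from `named_modules()`, the parent module's output is the same tensor object as its last child's output, so both would be dumped.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 8)

    def forward(self, x):
        return self.fc2(self.fc1(x))

mlp = MLP()
outputs = {}

def make_hook(name):
    def hook(module, args, output):
        outputs[name] = output
    return hook

# named_modules() yields the root module itself under the name ""
for name, m in mlp.named_modules():
    m.register_forward_hook(make_hook(name))

mlp(torch.randn(2, 8))
# Dumping both entries would write the identical tensor twice:
assert outputs[""] is outputs["fc2"]
```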
```python
# position_ids=position_ids,
# attn_metadata=attn_metadata)

@contextmanager
def debugger_addon(model, dest_folder: Optional[str] = None, filter=None):
```
How about renaming `debugger_addon` to `debug_mode`?
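Whatever name is chosen, a context manager of roughly this shape keeps hook registration scoped to the `with` block. This is only a sketch under assumed behavior, not the PR's actual implementation; the body here registers placeholder hooks where the real code would set up its dump context.

```python
from contextlib import contextmanager
from typing import Optional

@contextmanager
def debug_mode(model, dest_folder: Optional[str] = None, filter=None):
    # On entry: register hooks on every submodule. A real implementation
    # would also set up the dump context (dest_folder, filter, counters).
    handles = [
        m.register_forward_hook(lambda mod, args, out: None)  # placeholder hook
        for _, m in model.named_modules()
    ]
    try:
        yield model
    finally:
        # Hooks are removed even if forward() raises inside the block,
        # so the model is never left in a hooked state.
        for h in handles:
            h.remove()
```

Usage would then read naturally as `with debug_mode(model, DATA_FOLDER): model(x)`.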
```python
tensor_counter += 1
module_path = "-".join([module_path, tensor_name])
from pathlib import Path
```
Please move `from pathlib import Path` to the beginning of this file.
Add a debug hook to support dumping tensor data and adding new debug functions easily.

To enable dumping tensor data:

```python
from tensorrt_llm._torch.debug.debug_hook import debugger_addon, register_tensor_dump_hook

with debugger_addon(model, DATA_FOLDER):
    register_tensor_dump_hook()
    model.forward()
```

The dumped data is placed under `DATA_FOLDER/rank[ID]/...`. The data file names follow the pattern:

```
[LOOP_COUNT].[model_name]-[OPIDX_IN_MODEL].[OPNAME]-[OPIDX_IN_PRE_OP].[OPNAME]-[input|output].[PARA_NAME].pt
```

such as `1.LlamaModel-24.LlamaDecoderLayer-2.LlamaAttention-2.Linear-1.AllReduce-input.input.pt`.
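For readers unfamiliar with the mechanism, the sketch below shows the general idea behind such a dump hook; it is not this PR's implementation, and the helper `attach_dump_hooks` and its file-naming scheme are hypothetical, only approximating the per-rank layout and pattern described above.

```python
import os
import torch
import torch.distributed as dist

def attach_dump_hooks(model, data_folder):
    # One subfolder per rank, mirroring the DATA_FOLDER/rank[ID]/ layout
    rank = dist.get_rank() if dist.is_initialized() else 0
    folder = os.path.join(data_folder, f"rank{rank}")
    os.makedirs(folder, exist_ok=True)
    counter = {"n": 0}

    def dump(name, output):
        # Only plain tensors are handled in this sketch; the real hook
        # would also walk tuples, lists, and dataclass outputs.
        if isinstance(output, torch.Tensor):
            fname = f"{counter['n']}.{name}-output.pt"
            torch.save(output.detach().cpu(), os.path.join(folder, fname))
            counter["n"] += 1

    handles = [
        m.register_forward_hook(
            lambda mod, args, out, name=name: dump(name, out))
        for name, m in model.named_modules()
    ]
    return handles  # call handle.remove() on each to detach the hooks
```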