-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add debug information for runtime async methods #120303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add new JIT-EE API to report back debug information about the generated state machine and continuations - Refactor debug info storage on VM side to be more easily extensible. The new format has either a thin or fat header. The fat header is used when we have either uninstrumented bounds, patchpoint info, rich debug info or async debug info, and stores the blob sizes of all of those components in addition to the bounds and vars. - Add new async debug information to the storage on the VM side - Set get target method desc for async resumption stubs, to be used for mapping from continuations back to the async IL function that it will resume.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
CONTRACTL | ||
{ | ||
NOTHROW; | ||
THROWS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RestorePatchpointInfo
is called from JitPatchpointWorker
, which has STANDARD_VM_CONTRACT
. I think throwing should be ok, since we only throw here on an internal inconsistency in the compressed data when encountered by NibbleReader
. That's consistent with the other Restore
routines and saves us having to write separate decoding routines for the patchpoint info.
CompressDebugInfo::RestoreRichDebugInfo( | ||
fpNew, pNewData, | ||
pDebugInfo, | ||
ppInlineTree, pNumInlineTree, | ||
ppRichMappings, pNumRichMappings); | ||
|
||
return TRUE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured I might as well hook this one up in ReadyToRunJitManager
too since the format now technically allows for R2R images that contain the rich debug info, eeven if crossgen2 doesn't produce it.
The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work, giving me an address that is about 0x1000 different from method start. However, in Windbg I am able to get the proper IP by going through the DAC, specifically by going through NativeCodeVersion::GetNativeCode(). Do you know why this is? |
One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). The correct way to do this is to go from IP to debug info like what (The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.) |
Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping. The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch |
Good point -- to get from the resumption stub back to the original code we need a lookup that gives back the IP of the exact code version we resume in. Today that gets allocated here while we JIT: runtime/src/coreclr/vm/jitinterface.cpp Lines 14674 to 14676 in 2787545
I think we can subclass |
@rcj1 I pushed a commit that adds a new |
static BOOL GetAsyncDebugInfo( | ||
const DebugInfoRequest & request, | ||
IN FP_IDS_NEW fpNew, IN void * pNewData, | ||
OUT ICorDebugInfo::AsyncInfo* pAsyncInfo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we keep the number of suspension points in AsyncInfo and number of vars as an out parameter? Since we have the AsyncInfo struct, wouldn't it make sense to put all the out parameters inside that struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just the fact that the length of the async vars array is not an interesting piece of semantic information about the async method, while the number of suspension points is. So I included the length of that array in the normal "API hygienic" way, while I put the number of suspension points inside ICorDebugInfo::AsyncInfo
which contains the semantically interesting method-level information.
I also considered duplicating the length of the suspension points array in the API signature, for API hygiene/consistency, but it feels redundant/confusing the have the same number twice.
OUT ICorDebugInfo::RichOffsetMapping** ppRichMappings, | ||
OUT ULONG32* pNumRichMappings); | ||
|
||
static BOOL GetAsyncDebugInfo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to get an optimized version of this function? It will be a common scenario when stackwalking to just request data for a specific continuation resume state index and we are only interested in the suspension point data and not local vars. If we could scope it down to just one item, then I could have a custom fpNew and pNewData using a stack allocated ICorDebugInfo::AsyncSuspensionPoint, meaning there is no need for any dynamic memory allocation, and we could skip to the requested index in async debug info and only extract requested information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into a way to extract the native offset of a particular state number in constant time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stackwalking to just request data for a specific continuation resume state index
It still feels wrong for stackwalking to parse the debug info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should be treating the state index <-> native IP mapping as new unwind data rather than new debug data? Alternately if the Continuations aren't shared across different async methods then putting the info directly in the MethodTable is an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure that we are using the same terminology, there are two steps:
- Stack walking: Populates
Exception._stackTrace
with raw data. For async methods, the raw data is(Resume, State)
pair and potential keep alive root. We should not need debug info to find(Resume, State)
pair. Is that correct? - Stack trace formatting: Converting
Exception._stackTrace
to a string, like whatException.ToString()
does. This is several orders more expansive than (1). It can use metadata, debug info, etc.
(There is similar two-step process with other diagnostic scenarios, e.g. CPU profiling.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not suggesting to make State part of the contract. I am saying we have the option to avoid creating IL<->IP mappings for the trampolines and instead map from trampoline IP to join IP via the async debug info. This avoids the ambiguity in IL<->IP mappings.
Ah, I misunderstood you, sorry about that! In the case of EventPipe I have been treating 'stack trace formatting' as 'the profiler does it in a separate process after reading the trace file'. So doing a trampoline IP -> join IP conversion would require we add the async debug info to the serialized trace data and update all profilers to understand how to do that mapping. Its not as disruptive as including state index because it doesn't require a trace file format update but it does require updates to every profiler to understand how to extract and apply an additional mapping. If the debug data in the runtime stores resume IP -> join IP and join ip -> IL offset mappings separately then the EventPipe code would just merge those two mappings into a single native->IL mapping every time we emitted them. We'd probably wind up doing the same thing when reporting the mappings via the ICorDebug interface or ICorProfiler interface. Its probably simpler to store it in the merged format rather than merge it on demand everywhere we report it but both options seem doable if we find benefits to keeping the storage separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we are converging on a plan!
The trampoline IP to friendly IP mapping can be possible to do via the async debug info as part of stack trace formatting. Is it good enough to keep this as the way to map
I think it is fine to start with the trampoline mapping on the side. We will have to extend the debug info for async in some ways, and this is part of the extension. I do not have an opinion about the details - I might have an opinion once I see the stack formatting code.
unlike synchronous stack walking the async stack walking has not paid to access unwind info
It is a good question what a reasonable budget per frame for (sync) stack walking is. There are several ways to do sync stackwalking for diagnostics: unwind infos, frame-chains, shadow stacks. Frame-chains walk a link list, shadow stacks copy a memory block - both of these have amortized cost per frame measured in nanoseconds. SFrame does tricks that circumvent the costs of traditional unwind infos to get close (I do not recall seeing numbers), so you do not have a lot to play with if you want to be the state of the art.
I do think it will be useful to have something in our side-table data (maybe debug, maybe elsewhere) which identifies this range of trampolines as a funclet so that we can filter the stack trace frame and avoid binding breakpoints there.
The trampolines are tail-calls so they won't show up in the stack trace. The raw stack trace will have the auto-generated stubs that are separate dynamic methods that should get filtered in other ways.
The trampolines should be in "no gc" region that should prevent managed debuggers from considering them for breakpoints, similar to how instructions in prologs/epilogs are not considered for breakpoint locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestions above are focused on avoiding creating fake debug information. The resumption trampoline is not user written code. It does not make sense to me to create an IL<->IP mapping for it when it has no IL location.
I am also not suggesting that you would introduce the fake mapping on the debugger side or for the ETW events. I am suggesting that we go a step further for the continuation -> IP mapping before we fire stack trace events. Instead of stopping at the trampoline IP, we can continue mapping and stop at the join IP. The join IP is inside user written code, so it has natural IL<->IP mappings for it.
Note that we would still stop at the trampoline IP for the async stack walking inside the runtime. We just would map the trampoline IP to the more friendly join IP before firing off diagnostics containing stack traces, like for EventPipe stack traces.
It seems to me people are preferring (trampoline IP, mappings with fake mapping) over (join IP, true mappings) and I am not sure I understand why, especially if the former requires us to build something so that the debugger knows to ignore the fake mapping. Are we really worried about the performance of doing the trampoline IP -> join IP
mapping in EventPipe? It is what I am unsure about because it seems to me that doing that mapping has the same cost as the existing unwind info access for synchronous stack walking.
it does require updates to every profiler to understand how to extract and apply an additional mapping
Maybe I can put my confusion in a different way: how does the profiler access the stack trace in a way where async frames ended up stopped at the trampoline IP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make sense to me to create an IL<->IP mapping for it when it has no IL location.
I think it has fairly natural IL location (the resume point in IL). For example, CPU sampling profiler should associate the samples at trampoline IP with the resume point IL offset (if not, what else you would associate these CPU samples with?).
I agree that the debugger may need some work since it has multiple different use cases for the mapping.
Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe?
Both performance and layering, in particular in NAOT. Debug info is strongly separated and optional at runtime in NAOT. If we were to do this mapping at stackwalking time, it would lead to creating a new special info and/or folding it into GC info that's accessible by stackwalkng on NAOT today... .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am suggesting that we go a step further for the continuation -> IP mapping before we fire stack trace events
My bad. Above you said "The trampoline IP to friendly IP mapping can be possible to do ... as part of stack trace formatting". In the context of EventPipe I thought we had defined:
- 'stack trace formatting' = profiler work out-of-proc
- 'stack walking' = runtime work in-proc.
Thats why I interpretted it as a suggestion to do the work in the profiler.
Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe?
There are probably acceptable performance options that do the mapping at runtime when generating the event. However if I get to choose between stackwalks that cost 0.1us vs. 1us and there are minimal downsides I prefer the faster option. Do you think there is an advantage we are overlooking to do the conversion at runtime? When we were talking about state index -> IP mapping doing it at runtime avoids breaking the trace format which is a big advantage. But once state index is out of the picture now my reason to do it at runtime is gone.
Maybe I can put my confusion in a different way: how does the profiler access the stack trace in a way where async frames ended up stopped at the trampoline IP?
The scenario where I expect it to show up is an async stackwalk where we are walking the linked list of Continuations. At runtime an async-aware stackwalker would be emitting a sequence of IPs by doing:
for(var cont = leaf_continuation; cont != null; cont = cont.Next)
Append(cont.ResumeIP);
That IP sequence gets serialized as (part of) the stacktrace for an event in the trace file. Then later a profiler is parsing those events from the file and it converts each of those IPs from absolute IP -> method relative native offset -> IL offset -> file/line number.
I pushed a commit that creates the unique trampolines for each suspension point we talked about in comments above. It means each The trampolines are not directly reflected in the IL<->IP mappings (see comments above for my thinking about them), so for stack trace formatting it's still necessary to map these back to the join point. I renamed |
PTR_LoaderHeap m_loaderHeap; | ||
}; | ||
|
||
class AsyncResumeILStubResolver : public ILStubResolver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this with the latest scheme?
Uh oh!
There was an error while loading. Please reload this page.