Skip to content

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Oct 1, 2025

  • Add new JIT-EE API to report back debug information about the generated state machine and continuations
  • Refactor debug info storage on VM side to be more easily extensible. The new format has either a thin or fat header. The fat header is used when we have either uninstrumented bounds, patchpoint info, rich debug info or async debug info, and stores the blob sizes of all of those components in addition to the bounds and vars. It is indicated by the first field (size of bounds) having value 0, which is an uncommon value for this field.
  • Add new async debug information to the storage on the VM side
  • Set target method desc for async resumption stubs, to be used for mapping from continuations back to the async IL function that it will resume.
  • Implement new format in R2R as well, bump R2R major version (might as well do this now as we expect to need to store async debug info in R2R during .NET 11 anyway)

- Add new JIT-EE API to report back debug information about the
  generated state machine and continuations
- Refactor debug info storage on VM side to be more easily extensible.
  The new format has either a thin or fat header. The fat header is used
  when we have either uninstrumented bounds, patchpoint info, rich debug
  info or async debug info, and stores the blob sizes of all of those
  components in addition to the bounds and vars.
- Add new async debug information to the storage on the VM side
- Set get target method desc for async resumption stubs, to be used for
  mapping from continuations back to the async IL function that it will
  resume.
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 1, 2025
@jakobbotsch jakobbotsch changed the title Add debug information for runtime async information Add debug information for runtime async methods Oct 1, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

CONTRACTL
{
NOTHROW;
THROWS;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RestorePatchpointInfo is called from JitPatchpointWorker, which has STANDARD_VM_CONTRACT. I think throwing should be ok, since we only throw here on an internal inconsistency in the compressed data when encountered by NibbleReader. That's consistent with the other Restore routines and saves us having to write separate decoding routines for the patchpoint info.

Comment on lines +6548 to +6555
CompressDebugInfo::RestoreRichDebugInfo(
fpNew, pNewData,
pDebugInfo,
ppInlineTree, pNumInlineTree,
ppRichMappings, pNumRichMappings);

return TRUE;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured I might as well hook this one up in ReadyToRunJitManager too since the format now technically allows for R2R images that contain the rich debug info, eeven if crossgen2 doesn't produce it.

@rcj1
Copy link
Contributor

rcj1 commented Oct 14, 2025

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work, giving me an address that is about 0x1000 different from method start. However, in Windbg I am able to get the proper IP by going through the DAC, specifically by going through NativeCodeVersion::GetNativeCode().

Do you know why this is?

@jkotas
Copy link
Member

jkotas commented Oct 14, 2025

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work anymore

One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). pMD->GetNativeCode() will give you the most recent instance of the native code, but it may not match the native code that you are trying to map the offset for.

The correct way to do this is to go from IP to debug info like what DebugInfoManager::GetBoundariesAndVars does, and never roundtrip through MethodDesc since it may give you mismatched debug info.

(The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.)

@rcj1
Copy link
Contributor

rcj1 commented Oct 14, 2025

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work anymore

One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). pMD->GetNativeCode() will give you the most recent instance of the native code, but it may not match the native code that you are trying to map the offset for.

The correct way to do this is to go from IP to debug info like what DebugInfoManager::GetBoundariesAndVars does, and never roundtrip through MethodDesc since it may give you mismatched debug info.

(The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.)

Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping.

The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch

@jakobbotsch
Copy link
Member Author

Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping.

The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch

Good point -- to get from the resumption stub back to the original code we need a lookup that gives back the IP of the exact code version we resume in. Today that gets allocated here while we JIT:

{
m_finalCodeAddressSlot = (PCODE*)amTracker.Track(m_pMethodBeingCompiled->GetLoaderAllocator()->GetHighFrequencyHeap()->AllocMem(S_SIZE_T(sizeof(PCODE))));
}

I think we can subclass ILStubResolver and keep it there for the resumption IL stubs, then get rid of that loader heap allocation. Let me look into this.

@jakobbotsch
Copy link
Member Author

@rcj1 I pushed a commit that adds a new AsyncResumeILStubResolver, and the async resumption stubs will have this resolver. There is an AsyncResumeILStubResolver::GetFinalResumeMethodStartAddress() that can be used to retrieve the start address of the method that resumption is going to end up in.

static BOOL GetAsyncDebugInfo(
const DebugInfoRequest & request,
IN FP_IDS_NEW fpNew, IN void * pNewData,
OUT ICorDebugInfo::AsyncInfo* pAsyncInfo,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we keep the number of suspension points in AsyncInfo and number of vars as an out parameter? Since we have the AsyncInfo struct, wouldn't it make sense to put all the out parameters inside that struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just the fact that the length of the async vars array is not an interesting piece of semantic information about the async method, while the number of suspension points is. So I included the length of that array in the normal "API hygienic" way, while I put the number of suspension points inside ICorDebugInfo::AsyncInfo which contains the semantically interesting method-level information.

I also considered duplicating the length of the suspension points array in the API signature, for API hygiene/consistency, but it feels redundant/confusing the have the same number twice.

OUT ICorDebugInfo::RichOffsetMapping** ppRichMappings,
OUT ULONG32* pNumRichMappings);

static BOOL GetAsyncDebugInfo(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to get an optimized version of this function? It will be a common scenario when stackwalking to just request data for a specific continuation resume state index and we are only interested in the suspension point data and not local vars. If we could scope it down to just one item, then I could have a custom fpNew and pNewData using a stack allocated ICorDebugInfo::AsyncSuspensionPoint, meaning there is no need for any dynamic memory allocation, and we could skip to the requested index in async debug info and only extract requested information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into a way to extract the native offset of a particular state number in constant time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stackwalking to just request data for a specific continuation resume state index

It still feels wrong for stackwalking to parse the debug info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should be treating the state index <-> native IP mapping as new unwind data rather than new debug data? Alternately if the Continuations aren't shared across different async methods then putting the info directly in the MethodTable is an option.

Copy link
Member

@jkotas jkotas Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure that we are using the same terminology, there are two steps:

  1. Stack walking: Populates Exception._stackTrace with raw data. For async methods, the raw data is (Resume, State) pair and potential keep alive root. We should not need debug info to find (Resume, State) pair. Is that correct?
  2. Stack trace formatting: Converting Exception._stackTrace to a string, like what Exception.ToString() does. This is several orders more expansive than (1). It can use metadata, debug info, etc.

(There is similar two-step process with other diagnostic scenarios, e.g. CPU profiling.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not suggesting to make State part of the contract. I am saying we have the option to avoid creating IL<->IP mappings for the trampolines and instead map from trampoline IP to join IP via the async debug info. This avoids the ambiguity in IL<->IP mappings.

Ah, I misunderstood you, sorry about that! In the case of EventPipe I have been treating 'stack trace formatting' as 'the profiler does it in a separate process after reading the trace file'. So doing a trampoline IP -> join IP conversion would require we add the async debug info to the serialized trace data and update all profilers to understand how to do that mapping. Its not as disruptive as including state index because it doesn't require a trace file format update but it does require updates to every profiler to understand how to extract and apply an additional mapping. If the debug data in the runtime stores resume IP -> join IP and join ip -> IL offset mappings separately then the EventPipe code would just merge those two mappings into a single native->IL mapping every time we emitted them. We'd probably wind up doing the same thing when reporting the mappings via the ICorDebug interface or ICorProfiler interface. Its probably simpler to store it in the merged format rather than merge it on demand everywhere we report it but both options seem doable if we find benefits to keeping the storage separated.

Copy link
Member

@jkotas jkotas Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we are converging on a plan!

The trampoline IP to friendly IP mapping can be possible to do via the async debug info as part of stack trace formatting. Is it good enough to keep this as the way to map

I think it is fine to start with the trampoline mapping on the side. We will have to extend the debug info for async in some ways, and this is part of the extension. I do not have an opinion about the details - I might have an opinion once I see the stack formatting code.

unlike synchronous stack walking the async stack walking has not paid to access unwind info

It is a good question what a reasonable budget per frame for (sync) stack walking is. There are several ways to do sync stackwalking for diagnostics: unwind infos, frame-chains, shadow stacks. Frame-chains walk a link list, shadow stacks copy a memory block - both of these have amortized cost per frame measured in nanoseconds. SFrame does tricks that circumvent the costs of traditional unwind infos to get close (I do not recall seeing numbers), so you do not have a lot to play with if you want to be the state of the art.

I do think it will be useful to have something in our side-table data (maybe debug, maybe elsewhere) which identifies this range of trampolines as a funclet so that we can filter the stack trace frame and avoid binding breakpoints there.

The trampolines are tail-calls so they won't show up in the stack trace. The raw stack trace will have the auto-generated stubs that are separate dynamic methods that should get filtered in other ways.

The trampolines should be in "no gc" region that should prevent managed debuggers from considering them for breakpoints, similar to how instructions in prologs/epilogs are not considered for breakpoint locations.

Copy link
Member Author

@jakobbotsch jakobbotsch Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions above are focused on avoiding creating fake debug information. The resumption trampoline is not user written code. It does not make sense to me to create an IL<->IP mapping for it when it has no IL location.

I am also not suggesting that you would introduce the fake mapping on the debugger side or for the ETW events. I am suggesting that we go a step further for the continuation -> IP mapping before we fire stack trace events. Instead of stopping at the trampoline IP, we can continue mapping and stop at the join IP. The join IP is inside user written code, so it has natural IL<->IP mappings for it.

Note that we would still stop at the trampoline IP for the async stack walking inside the runtime. We just would map the trampoline IP to the more friendly join IP before firing off diagnostics containing stack traces, like for EventPipe stack traces.

It seems to me people are preferring (trampoline IP, mappings with fake mapping) over (join IP, true mappings) and I am not sure I understand why, especially if the former requires us to build something so that the debugger knows to ignore the fake mapping. Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe? It is what I am unsure about because it seems to me that doing that mapping has the same cost as the existing unwind info access for synchronous stack walking.

it does require updates to every profiler to understand how to extract and apply an additional mapping

Maybe I can put my confusion in a different way: how does the profiler access the stack trace in a way where async frames ended up stopped at the trampoline IP?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense to me to create an IL<->IP mapping for it when it has no IL location.

I think it has fairly natural IL location (the resume point in IL). For example, CPU sampling profiler should associate the samples at trampoline IP with the resume point IL offset (if not, what else you would associate these CPU samples with?).

I agree that the debugger may need some work since it has multiple different use cases for the mapping.

Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe?

Both performance and layering, in particular in NAOT. Debug info is strongly separated and optional at runtime in NAOT. If we were to do this mapping at stackwalking time, it would lead to creating a new special info and/or folding it into GC info that's accessible by stackwalkng on NAOT today... .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am suggesting that we go a step further for the continuation -> IP mapping before we fire stack trace events

My bad. Above you said "The trampoline IP to friendly IP mapping can be possible to do ... as part of stack trace formatting". In the context of EventPipe I thought we had defined:

  • 'stack trace formatting' = profiler work out-of-proc
  • 'stack walking' = runtime work in-proc.

Thats why I interpretted it as a suggestion to do the work in the profiler.

Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe?

There are probably acceptable performance options that do the mapping at runtime when generating the event. However if I get to choose between stackwalks that cost 0.1us vs. 1us and there are minimal downsides I prefer the faster option. Do you think there is an advantage we are overlooking to do the conversion at runtime? When we were talking about state index -> IP mapping doing it at runtime avoids breaking the trace format which is a big advantage. But once state index is out of the picture now my reason to do it at runtime is gone.

Maybe I can put my confusion in a different way: how does the profiler access the stack trace in a way where async frames ended up stopped at the trampoline IP?

The scenario where I expect it to show up is an async stackwalk where we are walking the linked list of Continuations. At runtime an async-aware stackwalker would be emitting a sequence of IPs by doing:

for(var cont = leaf_continuation; cont != null; cont = cont.Next)
   Append(cont.ResumeIP);

That IP sequence gets serialized as (part of) the stacktrace for an event in the trace file. Then later a profiler is parsing those events from the file and it converts each of those IPs from absolute IP -> method relative native offset -> IL offset -> file/line number.

@jakobbotsch
Copy link
Member Author

I pushed a commit that creates the unique trampolines for each suspension point we talked about in comments above. It means each Continuation now has a unique Resume for each State, and that Resume directly points into the runtime async method that suspended and that will resume.

The trampolines are not directly reflected in the IL<->IP mappings (see comments above for my thinking about them), so for stack trace formatting it's still necessary to map these back to the join point. I renamed ICorDebugInfo::AsyncSuspensionPoint::NativeOffset to NativeJoinOffset and there is now an additional NativeResumeOffset that gives the offset that Continuation.Resume will point to (not super useful when you already have the Continuation as you can get the state from there, but will be useful if you only stored Continuation.Resume).

PTR_LoaderHeap m_loaderHeap;
};

class AsyncResumeILStubResolver : public ILStubResolver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this with the latest scheme?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI runtime-async

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants