Add debug information for runtime async methods #120303

jakobbotsch · 2025-10-01T19:35:35Z

- Add new JIT-EE API to report back debug information about the generated state machine and continuations.
- Refactor debug info storage on VM side to be more easily extensible. The new format has either a thin or fat header. The fat header is used when we have either uninstrumented bounds, patchpoint info, rich debug info or async debug info, and stores the blob sizes of all of those components in addition to the bounds and vars. It is indicated by the first field (size of bounds) having value 0, which is an uncommon value for this field.
- Add new async debug information to the storage on the VM side.
- Set target method desc for async resumption stubs, to be used for mapping from continuations back to the async IL function that it will resume.
- Implement new format in R2R as well, bump R2R major version (might as well do this now as we expect to need to store async debug info in R2R during .NET 11 anyway).
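
To make the thin/fat header idea concrete, here is a minimal sketch of how a reader might tell the two forms apart. It assumes the header has already been decoded into plain 32-bit fields; the `DebugInfoHeader` struct, the `ReadHeader` function, and the order of the extra sizes are hypothetical and chosen only for illustration. The actual storage is a compressed encoding that is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical decoded header. Field names and order are assumptions.
struct DebugInfoHeader
{
    bool     isFat;
    uint32_t cbBounds;
    uint32_t cbVars;
    uint32_t cbUninstrumentedBounds;
    uint32_t cbPatchpointInfo;
    uint32_t cbRichDebugInfo;
    uint32_t cbAsyncInfo;
};

// Reads a header from already-decoded fields.
// Thin form: [cbBounds, cbVars], where cbBounds is nonzero.
// Fat form:  [0, cbBounds, cbVars, and four further blob sizes].
bool ReadHeader(const uint32_t* fields, size_t count, DebugInfoHeader* out)
{
    *out = DebugInfoHeader(); // zero every field
    if (count < 2)
        return false;

    if (fields[0] != 0)
    {
        // Thin header: only the bounds and vars sizes are stored.
        out->cbBounds = fields[0];
        out->cbVars = fields[1];
        return true;
    }

    // A leading 0 is the sentinel for the fat header: six sizes follow it.
    if (count < 7)
        return false;

    out->isFat = true;
    out->cbBounds = fields[1];
    out->cbVars = fields[2];
    out->cbUninstrumentedBounds = fields[3];
    out->cbPatchpointInfo = fields[4];
    out->cbRichDebugInfo = fields[5];
    out->cbAsyncInfo = fields[6];
    return true;
}
```

In this sketch a method with only bounds and vars pays for two fields, while a method with any of the optional components pays for the sentinel plus all six sizes.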


dotnet-policy-service · 2025-10-01T19:36:36Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/inc/cordebuginfo.h

jakobbotsch · 2025-10-02T10:20:48Z

src/coreclr/vm/debuginfostore.cpp

    CONTRACTL
    {
-        NOTHROW;
+        THROWS;


RestorePatchpointInfo is called from JitPatchpointWorker, which has STANDARD_VM_CONTRACT. I think throwing should be ok, since we only throw here on an internal inconsistency in the compressed data when encountered by NibbleReader. That's consistent with the other Restore routines and saves us having to write separate decoding routines for the patchpoint info.

src/coreclr/vm/debuginfostore.h

jakobbotsch · 2025-10-02T12:58:03Z

src/coreclr/vm/codeman.cpp

+    CompressDebugInfo::RestoreRichDebugInfo(
+        fpNew, pNewData,
+        pDebugInfo,
+        ppInlineTree, pNumInlineTree,
+        ppRichMappings, pNumRichMappings);
+
+    return TRUE;
+}


I figured I might as well hook this one up in ReadyToRunJitManager too, since the format now technically allows for R2R images that contain the rich debug info, even if crossgen2 doesn't produce it.


src/coreclr/jit/async.cpp


rcj1 · 2025-10-14T02:32:29Z

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work, giving me an address that is about 0x1000 different from method start. However, in Windbg I am able to get the proper IP by going through the DAC, specifically by going through NativeCodeVersion::GetNativeCode().

Do you know why this is?

jkotas · 2025-10-14T03:21:42Z

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work anymore

One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). pMD->GetNativeCode() will give you the most recent instance of the native code, but it may not match the native code that you are trying to map the offset for.

The correct way to do this is to go from IP to debug info like what DebugInfoManager::GetBoundariesAndVars does, and never roundtrip through MethodDesc since it may give you mismatched debug info.

(The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.)
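
The point about code versioning can be illustrated with a small sketch: if each jitted copy of a method is registered by its own address range, a lookup keyed on the IP always finds the copy that the IP actually belongs to, whereas a lookup keyed on the method can only return one of the copies. The `CodeVersionInfo` and `CodeRangeMap` types below are hypothetical stand-ins written for this example and are not the runtime's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical record for one jitted copy of a method.
struct CodeVersionInfo
{
    uintptr_t start;       // first byte of the native code
    uintptr_t size;        // size of the native code in bytes
    int       debugInfoId; // stand-in for the debug info of this exact copy
};

// Maps code ranges to their records, keyed by start address.
class CodeRangeMap
{
public:
    void Add(const CodeVersionInfo& info) { m_byStart[info.start] = info; }

    // Returns the code copy whose range contains ip, or nullptr if none does.
    const CodeVersionInfo* FindByIP(uintptr_t ip) const
    {
        // upper_bound gives the first range that starts after ip,
        // so the candidate is the range just before it.
        auto it = m_byStart.upper_bound(ip);
        if (it == m_byStart.begin())
            return nullptr;
        --it;
        const CodeVersionInfo& info = it->second;
        return (ip - info.start < info.size) ? &info : nullptr;
    }

private:
    std::map<uintptr_t, CodeVersionInfo> m_byStart;
};
```

With two tiers of the same method registered at different addresses, `FindByIP` returns the debug info for whichever tier the IP is in, which is the property the IP-first approach relies on.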

rcj1 · 2025-10-14T13:09:00Z

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work anymore

One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). pMD->GetNativeCode() will give you the most recent instance of the native code, but it may not match the native code that you are trying to map the offset for.

The correct way to do this is to go from IP to debug info like what DebugInfoManager::GetBoundariesAndVars does, and never roundtrip through MethodDesc since it may give you mismatched debug info.

(The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.)

Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping.

The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch

jakobbotsch · 2025-10-14T13:33:10Z

Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping.

The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch

Good point -- to get from the resumption stub back to the original code we need a lookup that gives back the IP of the exact code version we resume in. Today that gets allocated here while we JIT:

runtime/src/coreclr/vm/jitinterface.cpp

Lines 14674 to 14676 in 2787545

    
    {
        m_finalCodeAddressSlot = (PCODE*)amTracker.Track(m_pMethodBeingCompiled->GetLoaderAllocator()->GetHighFrequencyHeap()->AllocMem(S_SIZE_T(sizeof(PCODE))));
    }

I think we can subclass ILStubResolver and keep it there for the resumption IL stubs, then get rid of that loader heap allocation. Let me look into this.

jakobbotsch · 2025-10-14T14:11:58Z

@rcj1 I pushed a commit that adds a new AsyncResumeILStubResolver, and the async resumption stubs will have this resolver. There is an AsyncResumeILStubResolver::GetFinalResumeMethodStartAddress() that can be used to retrieve the start address of the method that resumption is going to end up in.

lateralusX · 2025-10-14T15:29:08Z

src/coreclr/vm/debuginfostore.h

+    static BOOL GetAsyncDebugInfo(
+        const DebugInfoRequest & request,
+        IN FP_IDS_NEW fpNew, IN void * pNewData,
+        OUT ICorDebugInfo::AsyncInfo* pAsyncInfo,


Any reason why we keep the number of suspension points in AsyncInfo and number of vars as an out parameter? Since we have the AsyncInfo struct, wouldn't it make sense to put all the out parameters inside that struct?

It is just the fact that the length of the async vars array is not an interesting piece of semantic information about the async method, while the number of suspension points is. So I included the length of that array in the normal "API hygienic" way, while I put the number of suspension points inside ICorDebugInfo::AsyncInfo which contains the semantically interesting method-level information.

I also considered duplicating the length of the suspension points array in the API signature, for API hygiene/consistency, but it feels redundant/confusing to have the same number twice.

lateralusX · 2025-10-14T15:35:29Z

src/coreclr/vm/debuginfostore.h

        OUT ICorDebugInfo::RichOffsetMapping** ppRichMappings,
        OUT ULONG32*                           pNumRichMappings);

+    static BOOL GetAsyncDebugInfo(


Would it be possible to get an optimized version of this function? It will be a common scenario when stackwalking to just request data for a specific continuation resume state index and we are only interested in the suspension point data and not local vars. If we could scope it down to just one item, then I could have a custom fpNew and pNewData using a stack allocated ICorDebugInfo::AsyncSuspensionPoint, meaning there is no need for any dynamic memory allocation, and we could skip to the requested index in async debug info and only extract requested information.

I will look into a way to extract the native offset of a particular state number in constant time.
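
As a rough sketch of what a constant-time, allocation-free lookup could look like, the example below assumes the suspension points are stored as fixed-width records so that a state index maps directly to a byte offset. The `SuspensionPointRecord` layout and the `TryReadSuspensionPoint` function are hypothetical; the debug info in this PR uses a compressed encoding, so a real implementation would need a different scheme or an index on the side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-width record for one suspension point.
struct SuspensionPointRecord
{
    uint32_t nativeOffset; // native offset associated with this state
    uint32_t numVars;      // number of live vars recorded for this state
};

// Copies the record for stateIndex into *out, which the caller can place on
// the stack. Returns false if the index is out of range.
// Runs in O(1) and performs no dynamic allocation.
bool TryReadSuspensionPoint(const uint8_t* table, size_t tableSize,
                            uint32_t stateIndex, SuspensionPointRecord* out)
{
    const size_t recordSize = sizeof(SuspensionPointRecord);
    if (stateIndex >= tableSize / recordSize)
        return false;

    // memcpy avoids alignment assumptions about the raw table.
    std::memcpy(out, table + stateIndex * recordSize, recordSize);
    return true;
}
```

The caller passes a stack-allocated `SuspensionPointRecord`, which matches the request above to avoid a custom allocator for the single-item case.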

stackwalking to just request data for a specific continuation resume state index

It still feels wrong for stackwalking to parse the debug info.

Perhaps we should be treating the state index <-> native IP mapping as new unwind data rather than new debug data? Alternately if the Continuations aren't shared across different async methods then putting the info directly in the MethodTable is an option.

To make sure that we are using the same terminology, there are two steps:

1. Stack walking: Populates Exception._stackTrace with raw data. For async methods, the raw data is a (Resume, State) pair and a potential keep-alive root. We should not need debug info to find the (Resume, State) pair. Is that correct?

2. Stack trace formatting: Converting Exception._stackTrace to a string, like what Exception.ToString() does. This is several orders of magnitude more expensive than (1). It can use metadata, debug info, etc.

(There is similar two-step process with other diagnostic scenarios, e.g. CPU profiling.)

I am not suggesting to make State part of the contract. I am saying we have the option to avoid creating IL<->IP mappings for the trampolines and instead map from trampoline IP to join IP via the async debug info. This avoids the ambiguity in IL<->IP mappings.

Ah, I misunderstood you, sorry about that! In the case of EventPipe I have been treating 'stack trace formatting' as 'the profiler does it in a separate process after reading the trace file'. So doing a trampoline IP -> join IP conversion would require that we add the async debug info to the serialized trace data and update all profilers to understand how to do that mapping. It's not as disruptive as including the state index because it doesn't require a trace file format update, but it does require updates to every profiler to understand how to extract and apply an additional mapping. If the debug data in the runtime stores resume IP -> join IP and join IP -> IL offset mappings separately, then the EventPipe code would just merge those two mappings into a single native->IL mapping every time we emitted them. We'd probably wind up doing the same thing when reporting the mappings via the ICorDebug interface or ICorProfiler interface. It's probably simpler to store it in the merged format rather than merge it on demand everywhere we report it, but both options seem doable if we find benefits to keeping the storage separated.
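
The merge step described in this comment could look roughly like the sketch below. It is only an illustration of one of the options under discussion, not what the PR implements: the `NativeToIL` and `ResumeToJoin` structs and the `MergeMappings` function are hypothetical, and offsets are treated as plain method-relative integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mapping entries; offsets are relative to the method start.
struct NativeToIL   { uint32_t nativeOffset; uint32_t ilOffset; };
struct ResumeToJoin { uint32_t resumeOffset; uint32_t joinOffset; };

static bool ByNativeOffset(const NativeToIL& a, const NativeToIL& b)
{
    return a.nativeOffset < b.nativeOffset;
}

// Finds the IL offset of the last entry at or before nativeOffset.
// `bounds` must be sorted by native offset.
static bool LookupIL(const std::vector<NativeToIL>& bounds,
                     uint32_t nativeOffset, uint32_t* ilOffset)
{
    NativeToIL key = { nativeOffset, 0 };
    auto it = std::upper_bound(bounds.begin(), bounds.end(), key, ByNativeOffset);
    if (it == bounds.begin())
        return false;
    *ilOffset = (it - 1)->ilOffset;
    return true;
}

// Produces a single native->IL table in which every resume offset is given
// the IL offset of the join point it jumps to.
std::vector<NativeToIL> MergeMappings(std::vector<NativeToIL> bounds,
                                      const std::vector<ResumeToJoin>& trampolines)
{
    std::sort(bounds.begin(), bounds.end(), ByNativeOffset);
    std::vector<NativeToIL> merged = bounds;
    for (const ResumeToJoin& t : trampolines)
    {
        uint32_t il;
        if (LookupIL(bounds, t.joinOffset, &il))
            merged.push_back({ t.resumeOffset, il });
    }
    std::sort(merged.begin(), merged.end(), ByNativeOffset);
    return merged;
}
```

Keeping the two inputs separate in storage and merging on demand means every reporting path pays for this step, which is the trade-off the comment weighs.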

Sounds like we are converging on a plan!

The trampoline IP to friendly IP mapping can be possible to do via the async debug info as part of stack trace formatting. Is it good enough to keep this as the way to map

I think it is fine to start with the trampoline mapping on the side. We will have to extend the debug info for async in some ways, and this is part of the extension. I do not have an opinion about the details - I might have an opinion once I see the stack formatting code.

unlike synchronous stack walking the async stack walking has not paid to access unwind info

It is a good question what a reasonable budget per frame for (sync) stack walking is. There are several ways to do sync stackwalking for diagnostics: unwind infos, frame-chains, shadow stacks. Frame-chains walk a linked list, shadow stacks copy a memory block - both of these have amortized cost per frame measured in nanoseconds. SFrame does tricks that circumvent the costs of traditional unwind infos to get close (I do not recall seeing numbers), so you do not have a lot to play with if you want to be the state of the art.

I do think it will be useful to have something in our side-table data (maybe debug, maybe elsewhere) which identifies this range of trampolines as a funclet so that we can filter the stack trace frame and avoid binding breakpoints there.

The trampolines are tail-calls so they won't show up in the stack trace. The raw stack trace will have the auto-generated stubs that are separate dynamic methods that should get filtered in other ways.

The trampolines should be in "no gc" region that should prevent managed debuggers from considering them for breakpoints, similar to how instructions in prologs/epilogs are not considered for breakpoint locations.

My suggestions above are focused on avoiding creating fake debug information. The resumption trampoline is not user written code. It does not make sense to me to create an IL<->IP mapping for it when it has no IL location.

I am also not suggesting that you would introduce the fake mapping on the debugger side or for the ETW events. I am suggesting that we go a step further for the continuation -> IP mapping before we fire stack trace events. Instead of stopping at the trampoline IP, we can continue mapping and stop at the join IP. The join IP is inside user written code, so it has natural IL<->IP mappings for it.

Note that we would still stop at the trampoline IP for the async stack walking inside the runtime. We just would map the trampoline IP to the more friendly join IP before firing off diagnostics containing stack traces, like for EventPipe stack traces.

It seems to me people are preferring (trampoline IP, mappings with fake mapping) over (join IP, true mappings) and I am not sure I understand why, especially if the former requires us to build something so that the debugger knows to ignore the fake mapping. Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe? It is what I am unsure about because it seems to me that doing that mapping has the same cost as the existing unwind info access for synchronous stack walking.

it does require updates to every profiler to understand how to extract and apply an additional mapping

Maybe I can put my confusion in a different way: how does the profiler access the stack trace in a way where async frames ended up stopped at the trampoline IP?

It does not make sense to me to create an IL<->IP mapping for it when it has no IL location.

I think it has fairly natural IL location (the resume point in IL). For example, CPU sampling profiler should associate the samples at trampoline IP with the resume point IL offset (if not, what else you would associate these CPU samples with?).

I agree that the debugger may need some work since it has multiple different use cases for the mapping.

Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe?

Both performance and layering, in particular in NAOT. Debug info is strongly separated and optional at runtime in NAOT. If we were to do this mapping at stackwalking time, it would lead to creating a new special info and/or folding it into GC info that's accessible by stackwalking on NAOT today.

I am suggesting that we go a step further for the continuation -> IP mapping before we fire stack trace events

My bad. Above you said "The trampoline IP to friendly IP mapping can be possible to do ... as part of stack trace formatting". In the context of EventPipe I thought we had defined:

'stack trace formatting' = profiler work out-of-proc

'stack walking' = runtime work in-proc.

That's why I interpreted it as a suggestion to do the work in the profiler.

Are we really worried about the performance of doing the trampoline IP -> join IP mapping in EventPipe?

There are probably acceptable performance options that do the mapping at runtime when generating the event. However if I get to choose between stackwalks that cost 0.1us vs. 1us and there are minimal downsides I prefer the faster option. Do you think there is an advantage we are overlooking to do the conversion at runtime? When we were talking about state index -> IP mapping doing it at runtime avoids breaking the trace format which is a big advantage. But once state index is out of the picture now my reason to do it at runtime is gone.

Maybe I can put my confusion in a different way: how does the profiler access the stack trace in a way where async frames ended up stopped at the trampoline IP?

The scenario where I expect it to show up is an async stackwalk where we are walking the linked list of Continuations. At runtime an async-aware stackwalker would be emitting a sequence of IPs by doing:

    for (var cont = leaf_continuation; cont != null; cont = cont.Next)
        Append(cont.ResumeIP);

That IP sequence gets serialized as (part of) the stacktrace for an event in the trace file. Then later a profiler is parsing those events from the file and it converts each of those IPs from absolute IP -> method relative native offset -> IL offset -> file/line number.
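
The first two steps of that pipeline (collecting the resume IPs and turning an absolute IP into a method-relative native offset) can be sketched as follows. The `Continuation` struct mirrors the `Next` and `ResumeIP` names from the loop above but is otherwise a hypothetical stand-in, as are `CollectResumeIPs` and `ToNativeOffset`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a continuation node in the linked list.
struct Continuation
{
    const Continuation* next; // next continuation in the chain, or nullptr
    uintptr_t resumeIP;       // absolute IP at which this continuation resumes
};

// Walks the chain from the leaf and records one IP per continuation.
std::vector<uintptr_t> CollectResumeIPs(const Continuation* leaf)
{
    std::vector<uintptr_t> ips;
    for (const Continuation* c = leaf; c != nullptr; c = c->next)
        ips.push_back(c->resumeIP);
    return ips;
}

// Converts an absolute IP to an offset relative to the start of a method.
// Returns false if the IP lies outside [methodStart, methodStart + methodSize).
bool ToNativeOffset(uintptr_t ip, uintptr_t methodStart, uintptr_t methodSize,
                    uint32_t* offset)
{
    if (ip < methodStart || ip - methodStart >= methodSize)
        return false;
    *offset = static_cast<uint32_t>(ip - methodStart);
    return true;
}
```

The remaining steps (native offset to IL offset, then IL offset to file and line) depend on the debug info and symbol data, so they are left out of this sketch.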


jakobbotsch · 2025-10-17T21:22:37Z

I pushed a commit that creates the unique trampolines for each suspension point we talked about in comments above. It means each Continuation now has a unique Resume for each State, and that Resume directly points into the runtime async method that suspended and that will resume.

The trampolines are not directly reflected in the IL<->IP mappings (see comments above for my thinking about them), so for stack trace formatting it's still necessary to map these back to the join point. I renamed ICorDebugInfo::AsyncSuspensionPoint::NativeOffset to NativeJoinOffset and there is now an additional NativeResumeOffset that gives the offset that Continuation.Resume will point to (not super useful when you already have the Continuation as you can get the state from there, but will be useful if you only stored Continuation.Resume).

jkotas · 2025-10-18T01:06:06Z

src/coreclr/vm/ilstubresolver.h

    PTR_LoaderHeap          m_loaderHeap;
 };

+class AsyncResumeILStubResolver : public ILStubResolver


Do we still need this with the latest scheme?

jakobbotsch added 3 commits October 1, 2025 14:54

Set target method desc for resumption stubs

b7bb968

Add JIT-EE boilerplate

d5d0864

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 1, 2025

jakobbotsch changed the title ~~Add debug information for runtime async information~~ Add debug information for runtime async methods Oct 1, 2025

dotnet-policy-service bot assigned jakobbotsch Oct 1, 2025

jakobbotsch commented Oct 1, 2025

src/coreclr/inc/cordebuginfo.h (outdated)

jakobbotsch added 8 commits October 1, 2025 21:42

Update managed view

6addd0e

Remove TODOs

6a5bf19

Leaf transition

262260a

Comment

4e2a829

Delete unused enum

820afa9

Restore comment

3621eae

Fix osx build

f597424

Fix GCC build

6bb6cee

am11 added the runtime-async label Oct 1, 2025

build-analysis bot mentioned this pull request Oct 2, 2025

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

Change sentinel value, fix contract

f580e3f

jakobbotsch commented Oct 2, 2025

src/coreclr/vm/debuginfostore.h

jakobbotsch added 3 commits October 2, 2025 13:41

Bump R2R

55b9522

Clean up

c9c64be

Expose async debug info accessor APIs

db77446

jakobbotsch commented Oct 2, 2025


jakobbotsch added 5 commits October 2, 2025 16:18

Missed bumping R2R version for naot

c808390

Fix reverse mapping to IL local nums

3ec5160

Fix monotonicity for async vars

bb12c77

Code style

bf3364b

Fix JIT-EE prompt tools from Egor's instructions, allow use of experi…

b49f38f

…mental APIs in async tests

lateralusX reviewed Oct 8, 2025

src/coreclr/jit/async.cpp (outdated)

jakobbotsch added 6 commits October 8, 2025 18:03

Publish NextContinuation in TLS

c056793

Rename ThunkTask -> RuntimeAsyncTask

9679f91

Merge branch 'main' of github.com:dotnet/runtime into jit-async-diagn…

7a34475

…ostics

Report native offsets instead

e839409

Run jit-format

d8ad0c1

Print reported async debug info, always report it

9fdd8a7

This was referenced Oct 9, 2025

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

[android] Android.Device_Emulator.JIT.Test failing on emulators with CoreCLR #112633

Open

Store target IPs in AsyncResumeILStubResolver

69e02db

Fix bug

68c245b

lateralusX reviewed Oct 14, 2025


Remove BBF_INTERNAL from rethrow BB to avoid broken mappings

2031f08

max-charlamb mentioned this pull request Oct 16, 2025

[cDAC] Runtime Change Backlog #120797

Open

2 tasks

jakobbotsch mentioned this pull request Oct 16, 2025

Follow-up work for new profiler ClassLoad events in .NET 11 #120799

Open

jakobbotsch added 5 commits October 17, 2025 23:07

Trampolines for resumption

b7576b2

Merge branch 'main' of github.com:dotnet/runtime into jit-async-diagn…

03f85ec

…ostics

Fix after merge

06fea81

Enable runtime async testing

29d8998

Undo changes

cb7a943

jakobbotsch added 2 commits October 17, 2025 23:46

Hacky late Friday GC issue fix

879ec4f

Fix 32 bit build

0749688

jkotas reviewed Oct 18, 2025

