[clr-interp] Add a scheme for adding peephole optimizations for known sequences of IL opcodes #120827

davidwrighton · 2025-10-16T23:50:17Z

This is needed to ensure that some of our SIMD tests finish in a vaguely reasonable timeframe.
The peeps implemented here make that almost possible, although the behavior in CI hasn't yet been verified.

Peeps implemented:
- stloc/ldloc - This is a minor improvement, but will be needed to handle a future optimization around allowing the IL stack to have constant values.
- box/unbox.any - Removes unnecessary boxing/unboxing.
- typeof(T)==typeof(Y) - This allows us to handle the type-testing specialization behavior that is used heavily in the BCL.
- typeof(T).IsValueType - Used in the Unsafe.BitCast function.
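For readers unfamiliar with table-driven peepholes, the list above can be pictured roughly as follows. This is only a minimal sketch, not the code in this PR: the `Op` enum, `PeepId`, the layout of `OpcodePeep`, and the `FindPeep`/`DefaultPeeps` helpers are all hypothetical names chosen for illustration. It matches on opcodes only; a real peep would also have to compare operands (the same local index, the same type token) and presumably bail out when a branch targets the middle of the sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode subset; the real compiler reads raw IL bytes.
enum class Op : uint8_t { Nop, Ldloc, Stloc, Box, UnboxAny, Ldtoken, Call, Ceq, Add };

// Hypothetical identifiers, one per rewrite the compiler knows how to apply.
enum class PeepId { None, StlocLdloc, BoxUnboxAny };

// One table entry: a fixed opcode sequence and the rewrite it triggers.
// (Assumed layout; the PR's actual OpcodePeep structure may differ.)
struct OpcodePeep {
    std::vector<Op> pattern;
    PeepId id;
};

// Returns the first peep whose pattern matches the opcodes starting at `ip`.
// `matched` receives the pattern length so the caller can skip those opcodes,
// or 0 when nothing matched.
inline PeepId FindPeep(const std::vector<OpcodePeep>& peeps,
                       const std::vector<Op>& il, size_t ip, size_t& matched)
{
    assert(ip <= il.size());
    for (const OpcodePeep& peep : peeps) {
        const size_t n = peep.pattern.size();
        if (n == 0 || n > il.size() - ip)
            continue; // pattern would run past the end of the method body
        bool ok = true;
        for (size_t i = 0; i < n && ok; ++i)
            ok = (il[ip + i] == peep.pattern[i]);
        if (ok) {
            matched = n;
            return peep.id;
        }
    }
    matched = 0;
    return PeepId::None;
}

// Two of the sequences named in the description, opcodes only.
inline std::vector<OpcodePeep> DefaultPeeps()
{
    return {
        { { Op::Stloc, Op::Ldloc }, PeepId::StlocLdloc },
        { { Op::Box, Op::UnboxAny }, PeepId::BoxUnboxAny },
    };
}
```

A code generation loop would call `FindPeep` at each IL offset before falling back to the normal per-opcode path. The linear scan over the table is the simplest possible choice; its cost grows with the number of peeps.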

Not yet implemented is giving the interpreter stack a concept of constant values, so that we can optimize brtrue/brfalse and friends into unconditional branches. With this set of changes the if (typeof(T)==typeof(int)) { ... } else if (typeof(T)==typeof(float)) pattern is much faster than before, but the not-taken paths are still fully generated as interpreter code, and there are still many branches. That work will follow in a future PR once we agree on the structure of this one.
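To make the follow-up idea concrete, here is a rough sketch of how a constant-aware stack slot could turn a conditional branch into an unconditional one (or into nothing). It is an illustration under assumptions, not the planned design: `StackSlot`, `BranchKind`, `FoldTypeEquality`, and `ClassifyBranch` are hypothetical names, and type handles are modeled as plain pointers.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of one IL evaluation stack slot. When `constant` holds a
// value, the compiler knows the slot's contents at compile time.
struct StackSlot {
    std::optional<int> constant;
};

// How a brtrue/brfalse consuming a slot could be emitted.
enum class BranchKind { Conditional, Always, Never };

// Result of a typeof(A)==typeof(B) peep when both type handles are known
// while compiling: the comparison collapses to the constant 1 or 0.
inline StackSlot FoldTypeEquality(const void* typeA, const void* typeB)
{
    assert(typeA != nullptr && typeB != nullptr);
    return StackSlot{ typeA == typeB ? 1 : 0 };
}

// Classifies brtrue (branchIfTrue == true) or brfalse (branchIfTrue == false).
inline BranchKind ClassifyBranch(const StackSlot& cond, bool branchIfTrue)
{
    if (!cond.constant.has_value())
        return BranchKind::Conditional; // value unknown: keep the runtime test
    const bool isTrue = (*cond.constant != 0);
    return (isTrue == branchIfTrue) ? BranchKind::Always : BranchKind::Never;
}
```

With `Always`, the compiler could emit an unconditional branch; with `Never`, it could emit no branch at all. Skipping code generation for the not-taken path is a separate step that this sketch does not cover.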

Copilot

Pull Request Overview

Adds peephole optimization infrastructure to the interpreter to recognize and replace specific IL opcode sequences (store/load locals, type equality/value type checks, box/unbox.any) with more efficient forms to improve performance (notably for SIMD-related scenarios).

- Introduces OpcodePeep data structures and matching/apply logic.
- Implements specific peepholes: stloc/ldloc, typeof(T)==typeof(U), box/unbox.any, typeof(T).IsValueType.
- Extends intrinsic recognition for System.Type methods.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File	Description
src/coreclr/interpreter/intrinsics.cpp	Adds intrinsic mapping for System.Type methods used by new peepholes.
src/coreclr/interpreter/compiler.h	Declares peephole pattern structures and adds methods for identifying/applying optimizations.
src/coreclr/interpreter/compiler.cpp	Defines peephole patterns and logic; integrates peephole application into code generation loop.

dotnet-policy-service · 2025-10-17T00:46:22Z

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

kg

The performance of this will probably get bad as we add more peeps, but it looks fine to me (EDIT: Minus the stuff Copilot found). Is there a specific reason why we're peephole optimizing the IL (and having to do redundant ResolveTokens and getCallInfos) instead of the interpreter opcodes? I can imagine either approach being the best one.

hez2010 · 2025-10-17T05:00:39Z

Maybe a better approach is to build the interpreter as a codegen backend of RyuJIT so that we get all the optimizations for free?

janvorli · 2025-10-17T11:48:26Z

> Maybe a better approach is to build the interpreter as a codegen backend of RyuJIT so that we get all the optimizations for free?

Something along these lines has actually been the plan so far: stay away from optimizations in the interpreter and use RyuJIT to do them.

@davidwrighton do you see this change as a temporary thing until we can do that in RyuJIT?

jkotas · 2025-10-17T15:17:18Z

This pattern matching is done by RyuJIT even with optimizations off. It is table stakes rather than an optimization. It is similar to must-expand intrinsics that are not an optimization either.

The bytecode generated by the interpreter today is super inefficient in places. We need to do some work like this to make it more reasonable. We shouldn't be introducing new IRs or passes; that would make it seem like something suited for RyuJIT.

davidwrighton · 2025-10-17T18:27:56Z

@kg I want to do this optimization at the IL level, since the interpreter compiler doesn't have a functional general-purpose optimization framework, and so the way to avoid operations in it is to never emit the interpreter opcodes. We may want to do a few tweaks around mov instructions, and we have a small number of existing cases where we can omit branch instructions after emission, but in general we can't look back more than 1 instruction or so. So it's best to avoid emitting the opcodes in the first place.

@janvorli I think having this sort of optimization is a long-term play. Notably, this gets us substantial wins in a few targeted scenarios; it is not a general-purpose optimization system. I see us adding a few more details, such as a concept of constants/possible ranges on the IL stack, which will allow us to elide various conversions and branches. And I think we may also want to add a very simple inliner. (For the inliner, if we inline only functions which have no branches and use a restricted set of instructions, it's possible to get VERY big wins on performance without much complexity by making property accessors cheap, and having something which can inline the behavior of the various Unsafe.As/Cast/etc. functions would be much cheaper than actually writing the giant stacks of intrinsics that we will otherwise require to get decent performance out of the BCL. This scheme is based on work Brian Smith did years ago when building the JIT for the Compact Framework. That JIT was actually very similar to this interpreter compiler, and the simple inliner produced extraordinary performance wins relative to its level of complexity. I have yet to discuss with @jkotas the value of this sort of inliner, but I think it would be reasonable.)

@hez2010 Yeah, RyuJIT would be a better optimizing backend than writing a new optimizer. There is NO desire to write a high-quality optimizing backend here, and we will not support work to do that, but work to make normal C# logic, and logic of the sort found in important parts of the BCL, compile into something reasonable is in scope.
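The inliner criteria mentioned in the reply to @janvorli (no branches, a restricted instruction set) could be checked with something as small as the sketch below. Everything in it is an assumption made for illustration: the `IlOp` enum, the particular allow-list, the `maxOps` size limit, and the function names are hypothetical and do not come from this PR or from the Compact Framework JIT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode subset for illustration only.
enum class IlOp : uint8_t { Ldarg, Ldfld, Stfld, Ldc, Add, Ret, Br, Brtrue, Brfalse, Call, Throw };

// Assumed allow-list: straight-line loads, stores, and arithmetic. Branches,
// calls, and throws are rejected, which keeps the inlinee free of control flow.
inline bool IsAllowedInInlinee(IlOp op)
{
    switch (op) {
        case IlOp::Ldarg:
        case IlOp::Ldfld:
        case IlOp::Stfld:
        case IlOp::Ldc:
        case IlOp::Add:
        case IlOp::Ret:
            return true;
        default:
            return false;
    }
}

// A body qualifies when it is short, uses only allowed opcodes, and ends in a
// single trailing ret. The size limit is an arbitrary illustrative default.
inline bool IsSimpleInlineCandidate(const std::vector<IlOp>& body, size_t maxOps = 16)
{
    assert(maxOps > 0);
    if (body.empty() || body.size() > maxOps)
        return false;
    for (size_t i = 0; i < body.size(); ++i) {
        if (!IsAllowedInInlinee(body[i]))
            return false;
        if (body[i] == IlOp::Ret && i + 1 != body.size())
            return false; // ret must be the last opcode
    }
    return body.back() == IlOp::Ret;
}
```

A typical property getter (`ldarg`, `ldfld`, `ret`) passes this check, while anything containing a branch or a call is rejected.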

jkotas · 2025-10-17T18:57:59Z

> This scheme is based on work Brian Smith did years ago when building the JIT for the Compact Framework. That JIT was actually very similar to this interpreter compiler, and the simple inliner produced extraordinary performance wins relative to its level of complexity. I have yet to discuss with @jkotas the value of this sort of inliner, but I think it would be reasonable

Sounds reasonable to me. We will want to do this only for optimized code (the interpreter does not have a concept of optimized code today) so that it does not impact managed debugging.

janvorli · 2025-10-17T18:59:27Z

> such as a concept of constants

Mono already has this concept AFAIK, so it might be useful to take some inspiration from there.

janvorli

LGTM, thank you.

Copilot AI review requested due to automatic review settings October 16, 2025 23:50

davidwrighton requested review from BrzVlad, janvorli and kg as code owners October 16, 2025 23:50

github-actions bot added the needs-area-label label Oct 16, 2025

dotnet-policy-service bot assigned davidwrighton Oct 16, 2025

Copilot AI reviewed Oct 16, 2025

jkotas added the area-CodeGen-Interpreter-coreclr label and removed the needs-area-label label Oct 17, 2025

kg reviewed Oct 17, 2025

kg reviewed Oct 17, 2025

kg approved these changes Oct 17, 2025

This was referenced Oct 17, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

BrzVlad reviewed Oct 17, 2025

Apply suggestions from code review

4141fcd

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

janvorli approved these changes Oct 17, 2025

Code review

4f83f2c

davidwrighton enabled auto-merge (squash) October 17, 2025 20:35

janvorli approved these changes Oct 17, 2025

davidwrighton merged commit d148b5c into dotnet:main Oct 17, 2025
93 of 95 checks passed


6 participants
