Skip to content

Conversation

davidwrighton
Copy link
Member

  • This is needed to ensure that some of our SIMD tests finish in a vaguely reasonable timeframe
  • The peeps implemented here make that almost possible, although the behavior in CI hasn't yet been verified
  • Peeps implemented
    • stloc/ldloc ... This is a minor improvement, but will be needed to handle a future optimization around allowing the il stack to have constant values
    • box/unbox.any - Removes unnecessary boxing/unboxing
    • typeof(T)==typeof(Y) - This allows us to handle the type testing specialization behavior that is used heavily in the BCL
    • typeof(T).IsValueType - Used in the Unsafe.BitCast function

Not yet implemented is giving the interpreter stack a concept of constant values, so that we can optimize brtrue/brfalse and friends into unconditional branches. With this set of changes the if (typeof(T)==typeof(int)) { ... } else if (typeof(T)==typeof(float)) pattern is much faster than before, but the not taken paths are still fully generated interpreter code, and there are still many branches. That work will follow on in a future PR once we agree on the structure of this one

… sequences of IL opcodes

- This is needed to ensure that some of our SIMD tests finish in a vaguely reasonable timeframe
- The peeps implmented here make that almost possible, although the behavior in CI hasn't yet been verified
- Peeps implemented
  - stloc/ldloc  ... This is a minor improvement, but will be needed to handle a future optimization around allowing the il stack to have constant values
  - box/unbox.any - Removes unnecessary boxing/unboxing
  - typeof(T)==typeof(Y) - This allows us to handle the type testing specialization behavior that is used heavily in the BCL
  - typeof(T).IsValueType - Used in the Unsafe.BitCast function

Not yet implemented is giving the interpreter stack a concept of constant values, so that we can optimize brtrue/brfalse and friends into unconditional branches. With this set of changes the if (typeof(T)==typeof(int)) { ... } else if (typeof(T)==typeof(float)) pattern is *much* faster than before, but the not taken paths are still fully generated interpreter code, and there are still many branches.
@Copilot Copilot AI review requested due to automatic review settings October 16, 2025 23:50
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 16, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds peephole optimization infrastructure to the interpreter to recognize and replace specific IL opcode sequences (store/load locals, type equality/value type checks, box/unbox.any) with more efficient forms to improve performance (notably for SIMD-related scenarios).

  • Introduces OpcodePeep data structures and matching/apply logic.
  • Implements specific peepholes: stloc/ldloc, typeof(T)==typeof(U), box/unbox.any, typeof(T).IsValueType.
  • Extends intrinsic recognition for System.Type methods.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
src/coreclr/interpreter/intrinsics.cpp Adds intrinsic mapping for System.Type methods used by new peepholes.
src/coreclr/interpreter/compiler.h Declares peephole pattern structures and adds methods for identifying/applying optimizations.
src/coreclr/interpreter/compiler.cpp Defines peephole patterns and logic; integrates peephole application into code generation loop.

@jkotas jkotas added area-CodeGen-Interpreter-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 17, 2025
Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@kg kg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance of this will probably get bad as we add more peeps, but it looks fine to me (EDIT: Minus the stuff Copilot found). Is there a specific reason why we're peephole optimizing the IL (and having to do redundant ResolveTokens and getCallInfos) instead of the interpreter opcodes? I can imagine either approach being the best one.

@hez2010
Copy link
Contributor

hez2010 commented Oct 17, 2025

Maybe a better approach is to build the interpreter as a codegen backend of RyuJIT so that we get all the optimizations for free?

@janvorli
Copy link
Member

Maybe a better approach is to build the interpreter as a codegen backend of RyuJIT so that we get all the optimizations for free?

Something along these lines has been actually the plan so far - stay away from optimizations in the interpreter and use RyuJIT to do them.

@davidwrighton do you see this change as a temporary thing until we can do that in RyuJIT?

@jkotas
Copy link
Member

jkotas commented Oct 17, 2025

This pattern matching is done by RyuJIT even with optimizations off. It is table stakes rather than an optimization. It is similar to must-expand intrinsics that are not an optimization either.

The byte code generated by interpreter today is super inefficient in places. We need to do some work like this to make it more reasonable. We shouldn’t be introducing new IRs or passes - that would make it seem like something suited for RyuJIT..

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@davidwrighton
Copy link
Member Author

@kg I want to do this optimization at the IL level, since the interpreter compiler doesn't have a functional general purpose optimization framework, and so the way to avoid operations in it is to never emit the interpreter opcodes. We may want to do a few tweaks around mov instructions, and we have a small number of existing details where we can omit branch instructions in some cases after emission, but we can't in general look back more than 1 instruction or so. So its best to avoid emitting the opcodes in the first place.

@janvorli I think this is a long term play to have this sort of optimization. Notably this is allowing us to get substantial wins in a few targeted scenarios, not a general purpose optimization system. I see us adding a few more details, such as a concept of constants/possible ranges on the IL stack, which will allow us to elide emitting various conversion, and branches. And I think we may also want to add a very simple inliner. (For the inliner, if we inline only functions which have no branches, and use a restricted set of instructions it's possible to get VERY big wins on performance without much complexity by making property accessors cheap, and having something which can inline the behavior of the various Unsafe.As/Cast/etc functions would be much cheaper than actually writing the giant stacks of intrinsics that we will otherwise require to get decent performance out of the BCL. This scheme is based on work Brian Smith did years ago when building the Compact Framework when building the JIT for it. That jit was actually very similar to this interpreter compiler, and the simple inliner produced extraordinary performance wins relative to its level of complexity. I have yet to discuss with @jkotas the value of this sort of inliner, but I think it would be reasonable.)

@hez2010 Yeah, RyuJit would be a better optimizing backend than writing a new optimizer. There is NO desire to write a high quality optimizing backend here, and we will not support work to do that, but work to make normal C# logic and logic of the sort that is in important parts of the BCL compile into something reasonable is in scope.

@jkotas
Copy link
Member

jkotas commented Oct 17, 2025

This scheme is based on work Brian Smith did years ago when building the Compact Framework when building the JIT for it. That jit was actually very similar to this interpreter compiler, and the simple inliner produced extraordinary performance wins relative to its level of complexity. I have yet to discuss with @jkotas the value of this sort of inliner, but I think it would be reasonable

Sounds reasonable to me. We will want to do this only for optimized code (interpreter does not have concept of optimized code today) so that it does not impact managed debugging.

@janvorli
Copy link
Member

such as a concept of constants

Mono already has this concept AFAIK, so it might be useful to take some inspiration from there.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

@davidwrighton davidwrighton enabled auto-merge (squash) October 17, 2025 20:35
@davidwrighton davidwrighton merged commit d148b5c into dotnet:main Oct 17, 2025
93 of 95 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants