Enable 'partial' AOT of the CLI #48790

Draft. agocke wants to merge 6 commits into main.

@agocke agocke commented May 1, 2025

This PR demonstrates how the SDK could incrementally move pieces to Native AOT, while still keeping support for things that are incompatible.

The basic idea is as follows:

  • Move existing code for the CLI into Dotnet.Shared.dll, except for Main
  • Leave Main in the existing dotnet.dll
  • Add a new dotnet-aot.csproj that compiles with Native AOT to a native dotnet_aot.dll. This DLL implements a custom .NET host by loading hostfxr, starting the runtime, and running dotnet.dll.

All the existing code continues to be accessible through the dotnet.dll Main, while dotnet_aot.dll provides a new native entry point. Currently the muxer (dotnet.exe) calls into dotnet.dll, but it could be modified to opportunistically load dotnet_aot.dll instead. dotnet_aot.dll would then have the opportunity to examine the arguments and execute any code before running dotnet.dll.

With this implementation, the SDK could gradually and opportunistically move code to Native AOT, while still keeping incompatible pieces functioning like before.
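
The flow described above can be sketched roughly as follows. This is an illustrative Python model, not code from the PR: the handler table, `run_managed_cli`, and the version string are hypothetical stand-ins for the native fast path and for the hostfxr-based fallback into dotnet.dll.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the slow path: in the PR this would mean loading
# hostfxr, starting the runtime, and running dotnet.dll's Main.
def run_managed_cli(args: List[str]) -> str:
    return "managed:" + " ".join(args)

# Hypothetical table of commands the AOT front end can serve by itself.
# It is expected to start small and grow as more code becomes AOT-compatible.
AOT_HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "--version": lambda args: "aot:10.0.100-example",  # made-up version string
}

def aot_entry(args: List[str]) -> str:
    """Examine the arguments and serve them natively when possible."""
    handler = AOT_HANDLERS.get(args[0]) if args else None
    if handler is not None:
        return handler(args)
    # Anything unknown or incompatible takes the existing path unchanged.
    return run_managed_cli(args)
```

Under these assumptions, `aot_entry(["--version"])` is answered without starting the runtime, while `aot_entry(["build", "app.csproj"])` falls through to the managed CLI exactly as before.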

agocke commented May 1, 2025

@marcpopMSFT this is what I was mentioning earlier

@nagilson This could be used to AOT the --info command that we were discussing

@marcpopMSFT marcpopMSFT requested review from MiYanni and nagilson May 1, 2025 20:51

@nagilson nagilson left a comment

I think this is a smart strategy and respect that you even have a proposed PR with changes to make this happen. 👏

I would consult @MiYanni before merging as this may conflict with some of his restructuring work.

One thing I don't understand yet at a quicker glance, is how the muxer would be determined and how it'd know which dll to run (the aot dll seems to pick based on its list of known sdkArgs, but that does not replace the actual muxer, right?) Would there be a shared data source that is versioned with the muxer, or the muxer would have an option to always try the aot dll first? Otherwise, there is a big issue where the muxer would need to know that .NET SDK 10.0.0 Preview Blah can only aot dotnet --version but Preview Blah+1 can aot dotnet --info etc.

One nit: It might be possible to use IntPtr instead of the byte**? I might be wrong. I am familiar with c++ but not native programming using c#.

agocke commented May 2, 2025

> One thing I don't understand yet at a quicker glance, is how the muxer would be determined and how it'd know which dll to run (the aot dll seems to pick based on its list of known sdkArgs, but that does not replace the actual muxer, right?)

Right -- the muxer (dotnet.exe) would still exist and we (the host team) would produce it. It would still be the entry point for all user actions. However, we would change the muxer to opportunistically load dotnet_aot.dll. Right now the muxer looks at the command line and tries to determine whether it's a command to the muxer itself (e.g. dotnet myapp.dll) or a command to the SDK. If it's a command to the SDK, it calculates the appropriate dotnet.dll target, finds the appropriate dotnet runtime, loads the runtime, and enters dotnet.dll.

The change here is that we would alter the muxer to first look for a libdotnet_aot.so next to dotnet.dll. If present, the muxer would just dlopen(libdotnet_aot.so), find the entry point, then call it, passing down the path to the appropriate dotnet.dll and libhostfxr.so. The AOT CLI would then try to execute the command itself. If it can't, it would use the parameters passed in to do the final step that the muxer used to do, which is start the runtime and launch dotnet.dll.
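
A minimal sketch of that probing step, assuming the only signal is whether the library file exists next to dotnet.dll. The function name and return values are hypothetical, and the real muxer is native code that would `dlopen` the library rather than just check for it.

```python
import tempfile
from pathlib import Path
from typing import Tuple

# File name taken from the comment above; the real name would vary by platform.
AOT_LIBRARY_NAME = "libdotnet_aot.so"

def choose_launch_path(sdk_dir: Path) -> str:
    """Return which path the muxer would take for an SDK command."""
    aot_library = sdk_dir / AOT_LIBRARY_NAME
    if aot_library.is_file():
        # Real muxer: dlopen the library, resolve its entry point, and pass
        # along the paths to dotnet.dll and libhostfxr.so.
        return "aot"
    # No AOT library in this SDK: start the runtime and enter dotnet.dll as today.
    return "managed"

def demo_probe() -> Tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        sdk_dir = Path(tmp)
        (sdk_dir / "dotnet.dll").touch()
        before = choose_launch_path(sdk_dir)   # SDK without the AOT library
        (sdk_dir / AOT_LIBRARY_NAME).touch()
        after = choose_launch_path(sdk_dir)    # SDK that ships the AOT library
    return before, after
```

Because the probe runs per SDK directory, the muxer in this sketch needs no knowledge of which commands a given SDK version has moved to AOT; that decision stays inside the library.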

> One nit: It might be possible to use IntPtr instead of the byte**? I might be wrong. I am familiar with c++ but not native programming using c#.

Yup you could. I like to use pointers when possible because C# at least forces you to explicitly cast between them if they're not the same. IntPtr doesn't have any static type checking so you're on your own.

@marcpopMSFT

Do we have any sense for what this will end up saving us? Our most common commands are restore, build, and test which I imagine all take a significant amount of time. Would we expect to eventually AOT those upstream dependencies as well? Would this mainly just save on overhead and speed up commands that should be faster like --info? Do we know the rough size of that savings?

As for drawbacks, what's the impact on SDK size as well as SDK maintenance?

@MiYanni MiYanni left a comment

I would not plan on making something named dotnet-shared right now. To give some high-level context, we are planning on creating a distributable CLI package. This package would add a layer on top of System.CommandLine that has features specific to the dotnet CLI. This will require a bunch of refactoring for the CLI in this repo. This is to help encourage consistency for anything wanting to integrate with the dotnet CLI.

That endeavor will take time and has the potential to increase some performance aspects of the CLI as the code gets cleaned up and consolidated. Adding this additional shim-esque project and renaming everything else will make it more complicated to do this kind of refactoring.

If the main goal here is performance, my approach to performance would be to:

  • Measure the performance of high-usage commands (based on telemetry)
  • Determine the "long tails" of those commands and see what can be streamlined to reduce performance overhead for those aspects
  • If startup perf is the only focus, then we'd also need to look at the runtime muxer to see if something can be made more streamlined there

My concerns about the approach proposed in this PR:

  • Traditionally, the CLI has been the "wild west" and many parts of the code just sit for many years with no significant changes or focus. If you make this proposal about "converting command piecemeal", it has a likelihood that the conversion process would be started, but never completed simply because priorities change.
  • I question the actual perf impact versus the engineering overhead incurred here. I fully believe that actual aspects of the running commands themselves can be improved to give greater performance gain.
  • Lastly, I still plan on changing the dotnet project to Microsoft.DotNet.Cli at some point, without changing the output DLL name. My initial PR to do this was rejected but as I create the externally shared library, there will likely be a migration to the project space as that name. I haven't determined the name for the external library yet, though.

agocke commented May 3, 2025

Thanks for reviewing!

> This will require a bunch of refactoring for the CLI in this repo

If there are specific operational concerns, it makes sense to work around them. I would leave it up to you to decide precisely when to merge this and how to structure the dependencies. As long as the native entry point is well known, everything downstream only matters to the CLI.

> Traditionally, the CLI has been the "wild west" and many parts of the code just sit for many years with no significant changes or focus. If you make this proposal about "converting command piecemeal", it has a likelihood that the conversion process would be started, but never completed simply because priorities change.

This actually sounds like the right solution to me -- if it's never finished, that's because some pieces are too hard to move over or not worth it to move over. That sounds fine.

> If the main goal here is performance, my approach to performance would be to:
>
>   • Measure the performance of high-usage commands (based on telemetry)
>   • Determine the "long tails" of those commands and see what can be streamlined to reduce performance overhead for those aspects
>   • If startup perf is the only focus, then we'd also need to look at the runtime muxer to see if something can be made more streamlined there

These are good suggestions but they're a form of "top-down" optimization. Top-down optimization looks at hot spots and is primarily useful for addressing the general goal of "improving performance." But the goal here isn't general performance optimization, it's fitting the SDK into a specific "performance profile." For that we need to do "bottom-up" optimization, where we define the goal, list all the work we need to do (and what we don't need to do), and allocate a budget for each piece of work. The profile we're trying to fit into is defined by two things:

  1. Competitive analysis with other language tooling
  2. Rules of UI response times

For (1), I looked at other language stacks for building hello-world: #33741. The results were that every competitor was < 1s, and some were <500ms. Scripting languages would of course be even faster. dotnet build was 1.7s. The lesson from this is that to be competitive we need to make things at least 2x faster, ideally 3x or more. In that budget we would need everything to add up to 500ms, including host, CLI, no-op restore, MSBuild, and csc. As the component doing the most work, we can assume that csc should have most of the budget, say 100-200ms. MSBuild does most of the remaining work, meaning it should receive most of the rest of the budget -- say 200ms. That leaves, at most, 100ms for the host, NuGet, and the CLI.
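
The budget arithmetic in that paragraph can be checked with a few lines. The figures are the ones quoted above; the variable and function names are only for illustration.

```python
# Figures quoted in the comment above, in milliseconds.
CURRENT_BUILD_MS = 1700      # measured `dotnet build` for hello-world
TARGET_TOTAL_MS = 500        # competitive end-to-end target
CSC_BUDGET_MS = (100, 200)   # suggested range for csc
MSBUILD_BUDGET_MS = 200      # suggested share for MSBuild

def remaining_budget_ms(csc_ms: int) -> int:
    """Budget left for the host, NuGet, and the CLI together."""
    return TARGET_TOTAL_MS - csc_ms - MSBUILD_BUDGET_MS

required_speedup = CURRENT_BUILD_MS / TARGET_TOTAL_MS
print(f"speedup needed to reach {TARGET_TOTAL_MS}ms: {required_speedup:.1f}x")
for csc_ms in CSC_BUDGET_MS:
    print(f"csc={csc_ms}ms -> {remaining_budget_ms(csc_ms)}ms for host + NuGet + CLI")
```

With csc at the upper end of its range (200ms), 100ms remains for the host, NuGet, and the CLI combined, which is the figure used in the comment.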

For (2), we're interested in the time limits associated with different classes of user experience. Per https://www.nngroup.com/articles/response-times-3-important-limits/, these are 100ms for instantaneous response, 1s for uninterrupted flow of thought, and 10s for keeping the user engaged. For most operations we definitely want to stay under 1s. There are certain operations (build, publish) where that's not realistic because they do significant compilation that scales with project size. In those cases we should focus on providing a progress indicator instead. But for all other operations: dotnet new, dotnet --info, even dotnet run for things that have already been built, I think we should be targeting the instantaneous response class, meaning < 100ms. The CLI will be many people's first interaction with .NET -- we want things to immediately feel fast.
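
As a small illustration of those three limits, the sketch below classifies an elapsed time into a response class. The thresholds come from the article cited above; the class labels, the treatment of the boundaries as inclusive, and the name for anything above 10s are assumptions made for the example.

```python
# Thresholds from the nngroup article cited above, in milliseconds.
INSTANT_MS = 100
FLOW_MS = 1_000
ENGAGED_MS = 10_000

def response_class(elapsed_ms: float) -> str:
    """Map an elapsed time to the response class it falls into."""
    if elapsed_ms <= INSTANT_MS:
        return "instantaneous"
    if elapsed_ms <= FLOW_MS:
        return "uninterrupted flow"
    if elapsed_ms <= ENGAGED_MS:
        return "engaged"
    return "attention lost"  # label assumed; beyond the 10s limit
```

For example, the 1.7s hello-world `dotnet build` mentioned earlier lands in the "engaged" class, while a command that finishes in 50ms would count as "instantaneous".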

So, the conclusion of both of those things seems to point in the same direction -- we should aim for a budget of significantly < 100ms for the CLI alone, probably closer to 50ms. This simply isn't going to be possible without AOT. The CLR load time for "hello world" is 30ms on a fast machine in the best case. The CLI is more code, more JITing, and more work. Right now it's 110ms in the best case on my fastest machines, just doing --version. As a long-term architecture I don't think there's an alternative to AOT for many parts of the experience.

nagilson commented May 6, 2025

We will be discussing this during our Wednesday design meeting tomorrow, I believe.

@marcpopMSFT

Results of discussion:
I think we want to understand the list of commands that we would want to prioritize for this effort and determine what capacity we have. We know that anything that already takes time (like build) is not the initial priority, though with build server hopefully coming later this release, there is potentially some savings there.

From the list, the ones that pop out to us as interesting are dotnet run and dotnet new, where the non-AOT overhead could be a larger share of the overall command. The hello world run, for example, drops from 1800ms for a full run to 700ms for a --no-build run to 70ms for a direct call to the .exe file. For new, we do an implicit restore, so we'd have to discuss what an AOT CLI command for restore would do.
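
To make the hello-world numbers easier to compare, here is the same data as differences. The timings are the ones quoted above; reading the gap between --no-build and the direct .exe call as launch overhead is an interpretation, since it covers everything `dotnet run --no-build` does beyond starting the app.

```python
# Hello-world `dotnet run` timings quoted above, in milliseconds.
FULL_RUN_MS = 1800       # dotnet run
NO_BUILD_RUN_MS = 700    # dotnet run --no-build
DIRECT_EXE_MS = 70       # launching the built .exe directly

# Time attributable to building.
build_cost_ms = FULL_RUN_MS - NO_BUILD_RUN_MS
# Time spent before the app starts even when nothing needs building.
launch_overhead_ms = NO_BUILD_RUN_MS - DIRECT_EXE_MS
# Fraction of a --no-build run that is not the app itself.
overhead_share = launch_overhead_ms / NO_BUILD_RUN_MS

print(f"build: {build_cost_ms}ms, launch overhead: {launch_overhead_ms}ms "
      f"({overhead_share:.0%} of a --no-build run)")
```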

Main issues that need to be resolved for some of the initial scenarios:

  • Can we do MSBuild evaluation with MSBuild server from AOT? If MSBuild server is not running, what's the impact of spinning it up?
  • Application Insights doesn't support AOT and has no plans to. The recommendation there would be to switch telemetry implementations, which we don't have plans to do. The alternative proposal we came up with was to write out the telemetry and then send it on a future command that wasn't AOTed (like build, if build is one of the last ones we AOT).
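
The deferred-telemetry idea in the second bullet could look roughly like the sketch below. Everything here is hypothetical: the spool file name, the JSON-lines format, and the `send` callback standing in for the existing telemetry client are assumptions, not anything the SDK does today.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List

def spool_event(spool_file: Path, event: dict) -> None:
    """Called from an AOT command: append the event locally, send nothing."""
    with spool_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def flush_spool(spool_file: Path, send: Callable[[dict], None]) -> int:
    """Called from a later non-AOT command: send spooled events, then clear."""
    if not spool_file.exists():
        return 0
    lines = spool_file.read_text(encoding="utf-8").splitlines()
    events = [json.loads(line) for line in lines if line.strip()]
    for event in events:
        send(event)
    spool_file.unlink()
    return len(events)

def demo_spool() -> List[dict]:
    sent: List[dict] = []
    with tempfile.TemporaryDirectory() as tmp:
        spool = Path(tmp) / "telemetry.jsonl"
        spool_event(spool, {"command": "--info"})   # AOT command 1
        spool_event(spool, {"command": "new"})      # AOT command 2
        flush_spool(spool, sent.append)             # e.g. a later `dotnet build`
    return sent
```

A real implementation would also have to deal with concurrent CLI invocations, partially written files, and events that are never flushed, all of which this sketch ignores.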

Other open questions:

  • How does this impact our plans to potentially create a library for the bootstrapper?
  • How does this impact our other CLI work?
  • What capacity do we have for a limited set of commands in .NET 10 (i.e., would we be able to do run, --info, and new)?
