windows: Use new hints.mostly-unused
#3660
Wow that's encouraging! |
Thanks! That seems like a great improvement. Two quick questions:
|
I kicked off #3661 to see whether we can identify an observable improvement with this flag. Perhaps I'm doing something wrong but it doesn't appear to help. Without (master): https://github.com/microsoft/windows-rs/actions/runs/16221219482 Thoughts? |
@kennykerr That CI job is doing |
Because it's a performance loss if applied when it isn't a good fit. If you apply it to a crate with 1000 items of which almost every user uses ~10, it's a big win. If you apply it to a crate with 10 items of which the average user uses most of them, it's not just neutral, it'll likely make compilation time worse. |
I don't know if it would make sense. It's worth testing. |
@kennykerr Ah, I see; I saw the titles all saying "check" and made an incorrect assumption. It can benefit tests, but only if the tests exercise only a small fraction of the API surface area. If the tests are anywhere near comprehensive, then they won't demonstrate any benefit. The performance win comes from real-world crates using |
The tests are tiny, but so is the function space, due to the limited number of features they enable. I wonder if these speed improvements are specific to crates bringing in |
Oh, I just realized the likely problem. This is being trialed in nightly, so you'll need to use a nightly rustc/cargo, and pass |
I used https://github.com/microsoft/windows-rs/pull/3661/files Should I instead use |
Setting |
Thanks, I made the suggested changes to #3661 but I don't see a noticeable improvement. |
@kennykerr 🤦 I just realized what the problem is here. https://github.com/microsoft/windows-rs/actions/runs/16304789650/job/46048204434?pr=3661#step:162:7
Before putting out a call for testing, it would have been good to make sure the change in cargo was synced to It looks like this might take until the 2025-07-17 nightly. I'll update the blog post. |
No problem, we can kick that PR again when the latest nightly is available. |
Most users of the `windows` crate will use a fraction of its API surface area.
Nightly rustc provides an option `-Zhint-mostly-unused` to tell it to defer as much compilation as possible, which provides a substantial performance improvement if most of that compilation doesn't end up happening. Cargo plumbs this option through using the new `[hints]` table.
This will cause users of the `windows` crate to default to setting `hint-mostly-unused`. (Top-level crates can override this if they wish, using a new profile option.)
Note that setting this hint does not increase the MSRV of the Windows crate, as old versions of Cargo will ignore it. New versions of Cargo will respect it automatically (and, until we stabilize it, Cargo will do nothing unless you pass `-Zprofile-hint-mostly-unused` to cargo).
Some sample performance numbers: this takes `windows` compilation time with all Graphics and UI features enabled from 18.3s to 10.7s (a 42% improvement), and takes compilation time with *all* features enabled from 3m48s to 2m55s (a 23% improvement).
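For anyone following along, here is a minimal sketch of what the two settings described above might look like in a manifest. The `[hints]` table and its `mostly-unused` key follow this PR's title. The `[profile.dev.package.windows]` override with `hint-mostly-unused = false` is my reading of Cargo's unstable documentation for the profile option, so treat the exact key and its placement as an assumption that may change before stabilization.

```toml
# In the library's own Cargo.toml (here, the `windows` crate):
# tell Cargo that most consumers use only a small part of this crate.
[hints]
mostly-unused = true

# In a top-level crate's Cargo.toml (a separate file), to opt back out
# for this one dependency. Assumption: the profile key is spelled
# `hint-mostly-unused`, as in Cargo's unstable docs at the time of writing.
[profile.dev.package.windows]
hint-mostly-unused = false
```

Per the description above, Cargo only acts on either setting when it is passed `-Zprofile-hint-mostly-unused` on a nightly toolchain (for example, `cargo +nightly build -Zprofile-hint-mostly-unused`); older Cargo versions ignore the `[hints]` table.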
e2a8025 to afc1e77
@kennykerr Current nightly as of today should now work. Give it another try? |
I reran https://github.com/microsoft/windows-rs/actions/runs/16361241438 but still don't see any noticeable improvement. |
Can you share an example where this clearly helps? |
By example I mean something like this. Before:
After:
The change is harmless enough, but we need a compelling example that illustrates to early adopters how it might be beneficial for them in general. This example, which is very representative, just does not bring the advertised 20-40% improvement. |
@kennykerr The net effect of the change is larger the more feature flags you have enabled on the That said, you'll notice the effect more strongly in release builds:
(Also note the difference in "User" time, which reflects CPU time used by all threads.) The effect becomes larger the more features you enable; for instance, if I enable all of |
@joshtriplett Are you testing on Windows? macOS? Other? Just wanted to follow along and make sure I'm on the same machine. |
I'm cross-compiling from Linux. |
I have tried release builds as well and it makes no difference. Perhaps it is unique to GNU or Linux builds. |
Can you post the output from the same Also, how many CPUs are you building on? |
Running with the correct branch this time.
Windows 26200.5702 / msvc 17.14.5-pre1
Run without flag
Run with flag against
Run with flag against modified
Re-run (to eliminate outliers warning) with flag against modified
|
Can't tell anything from this data, it's too noisy. Will try cranking up the number of runs. |
@riverar The "User" time numbers are pretty definitive already. Is it possible that the wall-clock numbers are being affected by other tasks happening on your system? (Also, the run you're doing labeled "Run without flag" is testing the same build twice. Each hyperfine invocation is already comparing results with and without the flag.) |
Those numbers are misleading--with a huge ~±0.50 variance, the faster results could actually be much slower (e.g., 30%). (The user time difference does look more promising, agree.) I'm trying to complete 100 runs but statistical outliers keep showing up. My dev drive (specialized ReFS) or system must be unstable/noisy.
Understood. That was just a run to get an idea of how unstable the tests were. I was expecting that run to come out closer to 1.0x than it did. (Done with edits.) |
Thanks for the contribution.