windows: Use new `hints.mostly-unused` #3660

joshtriplett · 2025-07-13T07:35:41Z

Most users of the windows crate will use a fraction of its API surface
area.

Nightly rustc provides an option `-Zhint-mostly-unused` to tell it to
defer as much compilation as possible, which provides a substantial
performance improvement if most of that compilation doesn't end up
happening. Cargo plumbs this option through using the new `[hints]`
table. This will cause users of the `windows` crate to default to
setting `hint-mostly-unused`. (Top-level crates can override this if
they wish, using a new profile option.)
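
For readers who have not seen the new table, here is a minimal sketch of where the hint lives in a library's manifest. It is written as a shell heredoc into a scratch directory so it can be run as-is. The crate name `big-api-crate` and the scratch directory are hypothetical, and the boolean form `mostly-unused = true` is assumed from the `hints.mostly-unused` key named in the PR title.

```shell
# Scratch manifest for a hypothetical library crate; this is not the real windows manifest.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/Cargo.toml" <<'EOF'
[package]
name = "big-api-crate"
version = "0.1.0"
edition = "2021"

# Hint from the library author: most dependents use only a small part of the API.
[hints]
mostly-unused = true
EOF

# Print everything from the [hints] table to the end of the file.
sed -n '/^\[hints\]/,$p' "$demo_dir/Cargo.toml"
```

Because the hint sits in the dependency's own manifest, dependents do not need to change anything to pick it up.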

Note that setting this hint does not increase the MSRV of the `windows`
crate, as old versions of Cargo will ignore it. New versions of Cargo
will respect it automatically (and, until we stabilize it, Cargo will do
nothing unless you pass `-Zprofile-hint-mostly-unused` to cargo).

Some sample performance numbers:

| Dependency Crate | Before | `hint-mostly-unused` | Delta |
|---|---|---|---|
| `windows`, all Graphics/UI features | 18.3s | 10.7s | -42% |
| `windows`, all features | 3m 48s | 2m 55s | -23% |
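
As a quick sanity check on the Delta column, the percentages can be recomputed from the before and after times. This is only a sketch: `percent_change` is a hypothetical helper name, and the all-features times are converted to seconds by hand (3m 48s = 228s, 2m 55s = 175s).

```shell
# Percentage change from a "before" time to an "after" time, both in seconds,
# rounded to the nearest whole percent.
percent_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (after - before) / before * 100 }'
}

percent_change 18.3 10.7   # all Graphics/UI features
percent_change 228 175     # all features
```

The two results round to -42 and -23, matching the table.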

riverar · 2025-07-13T07:43:40Z

Wow that's encouraging!

kennykerr · 2025-07-14T14:55:25Z

Thanks! That seems like a great improvement. Two quick questions:

Why is this not enabled by default when available, as opposed to requiring the hint?
Would it make sense to add this hint to the windows-sys crate as well or are the savings mostly around function bodies?

kennykerr · 2025-07-14T17:00:51Z

I kicked off #3661 to see whether we can identify an observable improvement with this flag. Perhaps I'm doing something wrong but it doesn't appear to help.

Without (master): https://github.com/microsoft/windows-rs/actions/runs/16221219482
With hint (this PR): https://github.com/microsoft/windows-rs/actions/runs/16272221637

Thoughts?

joshtriplett · 2025-07-15T17:32:34Z

@kennykerr That CI job is doing `check`, which already skips code generation. This hint matters for folks who are doing `build` (and especially `build -r`).

kennykerr · 2025-07-15T17:38:16Z

Thanks Josh, I appreciate that clarification. Sorry if it is a little misleading, but that workflow runs `cargo test` individually on all of the crates in the repo (which is a lot). Doesn't testing require code generation, or does this hint truly only benefit `cargo build`?

joshtriplett · 2025-07-15T17:41:12Z

> Why is this not enabled by default when available, as opposed to requiring the hint?

Because it's a performance loss if applied when it isn't a good fit.

If you apply it to a crate with 1000 items of which almost every user uses ~10, it's a big win. If you apply it to a crate with 10 items of which the average user uses most of them, it's not just neutral, it'll likely make compilation time worse.

joshtriplett · 2025-07-15T17:43:02Z

> Would it make sense to add this hint to the windows-sys crate as well or are the savings mostly around function bodies?

I don't know if it would make sense. It's worth testing.

joshtriplett · 2025-07-15T18:16:04Z

@kennykerr Ah, I see; I saw the titles all saying "check" and made an incorrect assumption.

It can benefit test, but only if the tests exercise only a small fraction of the API surface area. If the tests are anywhere near comprehensive, then they won't demonstrate any benefit.

The performance win comes from real-world crates using windows, which often pull it in and only call a few functions.

riverar · 2025-07-15T18:46:21Z

> It can benefit test, but only if the tests exercise only a small fraction of the API surface area. If the tests are anywhere near comprehensive, then they won't demonstrate any benefit.

The tests are tiny, but so is the function space, due to the limited number of features enabled. I wonder if these speed improvements are specific to crates bringing in `windows` with features like `Windows_Win32_UI_Shell` enabled.

joshtriplett · 2025-07-15T19:04:55Z

Oh, I just realized the likely problem. This is being trialed in nightly, so you'll need to use a nightly rustc/cargo, and pass -Zprofile-hint-mostly-unused to cargo, or it'll do nothing.

kennykerr · 2025-07-15T19:07:58Z

I used -Zhint-mostly-unused here:

https://github.com/microsoft/windows-rs/pull/3661/files

Should I instead use -Zprofile-hint-mostly-unused?

joshtriplett · 2025-07-15T20:38:14Z

> I used -Zhint-mostly-unused here:
>
> https://github.com/microsoft/windows-rs/pull/3661/files
>
> Should I instead use -Zprofile-hint-mostly-unused?

Setting RUSTFLAGS=-Zhint-mostly-unused will have the net effect of setting it for every dependency; that may not be a good idea. I would suggest setting the hint and then using cargo -Zprofile-hint-mostly-unused.

kennykerr · 2025-07-15T22:10:38Z

Thanks, I made the suggested changes to #3661 but I don't see a noticeable improvement.

joshtriplett · 2025-07-16T03:06:49Z

@kennykerr 🤦 I just realized what the problem is here.

https://github.com/microsoft/windows-rs/actions/runs/16304789650/job/46048204434?pr=3661#step:162:7

warning: D:\a\windows-rs\windows-rs\crates\libs\windows\Cargo.toml: unused manifest key: hints

Before putting out a call for testing, it would have been good to make sure the change in cargo was synced to rust-lang/rust (which is a manual process).

rust-lang/rust#143998

It looks like this might take until the 2025-07-17 nightly. I'll update the blog post.

kennykerr · 2025-07-16T18:14:39Z

No problem, we can kick that PR again when the latest nightly is available.

joshtriplett · 2025-07-17T21:35:36Z

@kennykerr Current nightly as of today should now work. Give it another try?

kennykerr · 2025-07-21T13:47:50Z

I reran https://github.com/microsoft/windows-rs/actions/runs/16361241438 but still don't see any noticeable improvement.

kennykerr · 2025-07-21T18:07:50Z

Can you share an example where this clearly helps?

kennykerr · 2025-07-22T15:52:43Z

By example I mean something like this.

Before:

E:\git\windows-rs>cls && cargo clean && cargo build -p sample_direct2d
     Removed 644 files, 405.6MiB total
warning: windows@0.61.3: ignoring 'hints.mostly-unused', pass `-Zprofile-hint-mostly-unused` to enable it
<snip>
   Compiling sample_direct2d v0.0.0 (E:\git\windows-rs\crates\samples\windows\direct2d)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 13.97s

After:

E:\git\windows-rs>cls && cargo clean && cargo build -p sample_direct2d -Zprofile-hint-mostly-unused
     Removed 681 files, 452.4MiB total
<snip>
   Compiling sample_direct2d v0.0.0 (E:\git\windows-rs\crates\samples\windows\direct2d)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 13.86s

The change is harmless enough, but we need a compelling example that illustrates to early adopters how it might be beneficial for them in general. This example, which is very representative, just does not bring the advertised 20-40% improvement.

joshtriplett · 2025-07-22T17:11:53Z

@kennykerr The net effect of the change is larger the more feature flags you have enabled on the `windows` crate. If you enable very few features, the codegen time is already small enough that the savings are hard to measure (but it does no harm). If you enable more features (or features that gate large API surfaces), the savings become more obvious.

That said, you'll notice the effect more strongly in release builds:

~/src/windows-rs$ hyperfine -M 4 -p 'cargo clean' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-gnu -Zprofile-hint-mostly-unused' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-gnu'
Benchmark 1: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-gnu -Zprofile-hint-mostly-unused
  Time (mean ± σ):      8.458 s ±  0.086 s    [User: 10.598 s, System: 1.248 s]
  Range (min … max):    8.362 s …  8.555 s    4 runs
 
Benchmark 2: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-gnu
  Time (mean ± σ):      9.011 s ±  0.081 s    [User: 13.206 s, System: 1.287 s]
  Range (min … max):    8.903 s …  9.098 s    4 runs
 
Summary
  cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-gnu -Zprofile-hint-mostly-unused ran
    1.07 ± 0.01 times faster than cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-gnu

(Also note the difference in "User" time, which reflects CPU time used by all threads.)

The effect becomes larger the more features you enable; for instance, if I enable all of Graphics_*, I get a 1.43x difference rather than 1.07x. Data_* shows 1.28x.
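
To make the wall-clock versus "User" comparison concrete, here is a small sketch that recomputes both ratios from the hyperfine output above. The `ratio` helper is a hypothetical name introduced for illustration; the four timings are copied from the two benchmark summaries.

```shell
# Ratio of a slower timing to a faster one, rounded to two decimals.
ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

ratio 9.011 8.458     # wall-clock mean: without flag / with flag
ratio 13.206 10.598   # "User" CPU time: without flag / with flag
```

The first result reproduces hyperfine's own 1.07 summary. The second comes out at 1.25, so the CPU-time ratio is noticeably larger than the wall-clock ratio for this sample.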

riverar · 2025-07-22T17:38:16Z

@joshtriplett Are you testing on Windows? macOS? Other? Just wanted to follow along and make sure I'm on the same machine.

joshtriplett · 2025-07-22T18:01:56Z

> @joshtriplett Are you testing on Windows? macOS? Other? Just wanted to follow along and make sure I'm on the same machine.

I'm cross-compiling from Linux.

kennykerr · 2025-07-22T18:04:25Z

I have tried release builds as well and it makes no difference. Perhaps it is unique to GNU or Linux builds.

joshtriplett · 2025-07-22T18:08:58Z

> I have tried release builds as well and it makes no difference. Perhaps it is unique to GNU or Linux builds.

Can you post the output from the same hyperfine command I ran (but for whichever target you prefer, e.g. -msvc)?

Also, how many CPUs are you building on?

riverar · 2025-07-22T18:46:35Z

Running with the correct branch this time (`test-hint-mostly-unused`).

Windows 26200.5702 / msvc 17.14.5-pre1
rustc 1.90.0-nightly (9748d87dc 2025-07-21)
32 logical / 16 physical cores

Run without flag -Zprofile-hint-mostly-unused:

hyperfine -M 4 -p 'cargo clean' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc'
Benchmark 1: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc
  Time (mean ± σ):     18.961 s ±  4.717 s    [User: 12.921 s, System: 2.600 s]
  Range (min … max):   12.208 s … 23.201 s    4 runs

Benchmark 2: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc
  Time (mean ± σ):     16.071 s ±  3.417 s    [User: 12.917 s, System: 2.600 s]
  Range (min … max):   12.405 s … 20.577 s    4 runs

Summary
  cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc ran
    1.18 ± 0.39 times faster than cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc

Run with flag against sample_direct2d:

hyperfine -M 4 -p 'cargo clean' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc'
Benchmark 1: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused
  Time (mean ± σ):     14.953 s ±  3.616 s    [User: 10.967 s, System: 2.299 s]
  Range (min … max):   12.395 s … 20.173 s    4 runs

Benchmark 2: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc
  Time (mean ± σ):     19.158 s ±  4.786 s    [User: 12.862 s, System: 2.900 s]
  Range (min … max):   12.989 s … 23.831 s    4 runs

Summary
  cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused ran
    1.28 ± 0.45 times faster than cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc

Run with flag against modified sample_direct2d (including all Graphics_* features):

hyperfine -M 4 -p 'cargo clean' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc'
Benchmark 1: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused
  Time (mean ± σ):     21.136 s ±  7.984 s    [User: 12.952 s, System: 2.495 s]
  Range (min … max):   14.721 s … 32.063 s    4 runs

Benchmark 2: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc
  Time (mean ± σ):     19.191 s ±  6.038 s    [User: 19.584 s, System: 3.147 s]
  Range (min … max):   16.033 s … 28.246 s    4 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc ran
    1.10 ± 0.54 times faster than cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused

Re-run (to eliminate outliers warning) with flag against modified sample_direct2d (including all Graphics_* features):

hyperfine -M 4 -p 'cargo clean' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused' 'cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc'
Benchmark 1: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused
  Time (mean ± σ):     21.034 s ±  6.907 s    [User: 12.627 s, System: 2.999 s]
  Range (min … max):   15.290 s … 30.111 s    4 runs

Benchmark 2: cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc
  Time (mean ± σ):     22.919 s ±  6.668 s    [User: 19.834 s, System: 3.850 s]
  Range (min … max):   16.242 s … 32.130 s    4 runs

Summary
  cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc -Zprofile-hint-mostly-unused ran
    1.09 ± 0.48 times faster than cargo +nightly build -r -p sample_direct2d --target x86_64-pc-windows-msvc

riverar · 2025-07-22T18:57:05Z

Can't tell anything from this data, it's too noisy. Will try cranking up the number of runs.

joshtriplett · 2025-07-22T20:15:07Z

@riverar The "User" time numbers are pretty definitive already. Is it possible that the wall-clock numbers are being affected by other tasks happening on your system?

(Also, the run you're doing labeled "Run without flag" is testing the same build twice. Each hyperfine invocation is already comparing results with and without the flag.)
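
One way to read the "User" columns in the three with/without runs above is to divide the unflagged figure by the flagged one. The sketch below does that; `user_ratio` is a hypothetical helper name, and the inputs are the "User" seconds copied from riverar's output.

```shell
# Ratio of "User" CPU seconds without the flag to "User" CPU seconds with it.
user_ratio() {
    awk -v off="$1" -v on="$2" 'BEGIN { printf "%.2f\n", off / on }'
}

user_ratio 12.862 10.967   # sample_direct2d, default features
user_ratio 19.584 12.952   # sample_direct2d, all Graphics_* features
user_ratio 19.834 12.627   # sample_direct2d, all Graphics_* features (re-run)
```

All three ratios are above 1, and the two Graphics_* runs agree with each other far more closely than their wall-clock means do.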

riverar · 2025-07-22T20:20:36Z

Those numbers are misleading: with a huge ~±0.50 variance, the faster results could actually be much slower (e.g., 30%). (The user time difference does look more promising, agreed.)
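
A rough sketch of what that uncertainty implies: treating hyperfine's "ratio ± uncertainty" summary as a simple interval (a simplification, not a formal confidence interval), the lower bound falls below 1.0 for both flagged runs. `ratio_bounds` is a hypothetical helper name.

```shell
# Lower and upper bounds implied by a "ratio ± uncertainty" summary line.
ratio_bounds() {
    awk -v r="$1" -v u="$2" 'BEGIN { printf "%.2f %.2f\n", r - u, r + u }'
}

ratio_bounds 1.28 0.45   # sample_direct2d, default features
ratio_bounds 1.09 0.48   # sample_direct2d, all Graphics_* features (re-run)
```

A lower bound under 1.0 means the wall-clock data alone cannot rule out the flagged build being slower.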

I'm trying to complete 100 runs but statistical outliers keep showing up. My dev drive (specialized ReFS) or system must be unstable/noisy.

> (Also, the run you're doing labeled "Run without flag" is testing the same build twice. Each hyperfine invocation is already comparing results with and without the flag.)

Understood. That was just a run to get an idea of how unstable the tests were. I was expecting that run to come out closer to 1.0x than it did.

(Done with edits.)

kennykerr

Thanks for the contribution.

kennykerr mentioned this pull request Jul 14, 2025

Hopefully speed up the test workflow via hint-mostly-unused #3661

Closed

ChrisDenton mentioned this pull request Jul 15, 2025

Experiment with the mostly-unused Cargo hint to optimize compile times for downstream users #3664

Closed

joshtriplett force-pushed the hint-mostly-unused branch from e2a8025 to afc1e77 Compare July 17, 2025 21:33

Merge branch 'master' into hint-mostly-unused

8abc640

kennykerr approved these changes Jul 23, 2025

kennykerr merged commit cff9e38 into microsoft:master Jul 23, 2025
29 checks passed
