add some "no invalidations" tests for the sysimage #57884

nsajko · 2025-03-25T12:18:58Z

Test that defining certain handpicked type-method pairs (as a user) doesn't result in any invalidation of the sysimage.

Some of the tests are marked as broken.

Merging this will make Julia's contribution process more demanding: when someone makes a PR they might need to fix unrelated type stability issues, or decrease max_methods for some function, before being able to merge their initial PR. However that's better than ignoring the constant danger of type instability/invalidation regressions making the sysimage and precompilation less useful.

This change should only be merged while up-to-date with the master branch.

Fixes #47022

test/no_invalidations.jl

nsajko · 2025-06-10T13:40:35Z

Actually I had a discussion with @timholy on Slack regarding this PR back in March, amazingly it seems I can still read it:

https://julialang.slack.com/archives/C6FGJ8REC/p1743088607215269?thread_ts=1740925822.280469&cid=C6FGJ8REC

@timholy was basically advocating to check for good inference instead of checking for invalidations, although TBH I'm not sure they really checked out this PR.

I'm pretty sure:

The approach by Tim Holy would be good:
- to track stats across each commit on the master branch
- to calculate stats for each PR, similar to how Codecov posts a message with the coverage stats on some repos
The approach in this PR could also be good for the above, the difference is that any test failure here is a regression, so it should block a PR from being merged. Thus putting it in the test suite.

nsajko · 2025-06-10T13:55:11Z

Test that defining certain handpicked type-method pairs (as a user) doesn't result in any invalidation of the sysimage.

To expand on this: I want to ensure that user packages can create new types without causing invalidation of the sysimage. Otherwise there's a conflict between wanting to use Julia naturally in one's package and having to "ensure" the sysimage doesn't get invalidated (which is not something package authors should need to worry about).

This PR does not forbid all invalidation or all world-splitting.

nsajko · 2025-06-10T13:57:40Z

Furthermore, as can be seen from the test set names, this PR currently only encompasses some examples of adding subtypes of Integer or of AbstractString. The intention is for new test sets to be added over time.

timholy

There's no denying that this tests for something practically important: that certain common extensions don't trigger invalidation and thus longer TTFX. My concern is that it will catch issues for which there is no obvious fault and no sensible resolution.

For example, suppose someone submits a PR that breaks this test; let's say that adding a new foo method invalidates the compiled code for bar. Any one of the following changes in the PR would "fix" this test, but only the ones marked (*) represent an improvement in Julia:

add a dummy foo method to Base, pushing the number of applicable methods above max_methods for our default AbstractInterpreter (currently 3)
if there are already >3 foo methods but the invalidation in bar happens because bar is partially inferrable so that you at least know you're dealing with foo(::Real) and there are only a couple of applicable methods, you could worsen inference in bar so that inference only knows that it's foo(::Any). Thus more than 3 foo methods are potentially applicable at that call site and you've exceeded max_methods by making bar worse.
stop precompiling bar so it can't be invalidated
(maybe *) add an invokelatest or Base.invoke_in_world in bar
(*) improve inference in bar

I'm not saying there isn't a reason to consider any but the last; indeed, there was at least one case where we did the first of these because it prevented a common cause of massive invalidations. But this case can also be easily tested: @test length(methods, (foo, T)) > 3.

I think it's fair to say that the only case that's generically beneficial is the last one. To me this argues that we need an "inference quality" test. This PR is a very indirect way of testing inference quality, and I think we'd be better served by directly testing what matters.

nsajko · 2025-06-12T10:56:53Z

I disagree on some points:

You say that this PR is "a very indirect way of testing inference quality", and that's a valid interpretation. However, I'm pretty sure the approach in this PR is the only known approach of its kind with certain guarantees that are necessary for a test in the test suite: The approach in this PR guarantees that a test failure for a commit is caused by that commit. In contrast, your approach in Add a generic code-inferrability test #37193 (the only known alternative to this PR AFAIK) is based on aggregate information and arbitrary thresholds, so it might cause a delayed test failure, where a test failure only happens after several commits, the first of which was bad and some of which might even have been good! Correct me if I misunderstood your PR.
- So I'd say that this PR is the one "directly testing what matters".
Regarding your last proposed solution to a test failure:

(*) improve inference in bar

Indeed, sometimes it would be necessary to fix the inference in unrelated code after a test failure here. That was already noted in the PR description anyway, and, although this is a lot to demand of contributors, it seems necessary, because otherwise the sysimage and package precompilation are kinda useless, being able to be invalidated by a package author defining a new type, with some methods.

Among your proposed solutions to a test failure, you omitted one: decrease the max_methods value for a function. This can be a valid solution, of the "improvement to Julia" kind. I think it's already the case that a lot of (or all) functions which are meant to have methods added to by third parties should have their max_methods decreased to, e.g., 2. This is because it doesn't make sense to bake into the sysimage the assumption that no more methods will be added to a function, if the functions is expressly meant to have an arbitrary number of methods added to it by package authors. There's one place in Base where this is already done, for the copyto! function, which has its max_methods value set to 1, originally introduced in more latency improvements for #33615 #33779:

julia/base/abstractarray.jl

Lines 936 to 938 in c0ecfe9

    
           # This is `Experimental.@max_methods 1 function copyto! end`, which is not 
        
           # defined at this point in bootstrap. 
        
           typeof(function copyto! end).name.max_methods = UInt8(1)

nsajko · 2025-06-12T11:02:11Z

My concern is that it will catch issues for which there is no obvious fault and no sensible resolution.

I don't understand this conclusion. You yourself list "improve inference in bar" as a solution that's an improvement to Julia. In any case, invalidation of the sysimage is most definitely a fault.

adienes · 2025-06-12T13:16:41Z

incoming opinion from the peanut gallery: I generally associate unit tests (aka everything running under test/runtests.jl) to test mechanics and correctness of the code. then other passes can handle other measures of code quality, of which Julia already has quite a few (codecov, formatting checks, BaseBenchmarks)

something like this of course a bit blurs the line between testing "correctness" vs testing "quality" because changes in inference can and do change the results of operations. but I think personally speaking I would still associate this more with a "quality" pass than something to be put in unit tests.

it could even still be in CI, just under [Check] or [Allow Fail] rather than [Test]

This PR is a very indirect way of testing inference quality, and I think we'd be better served by directly testing what matters.

I think I tend to disagree with this sentiment, in that why not do both!

nsajko · 2025-06-12T13:28:26Z

something like this of course a bit blurs the line between testing "correctness" vs testing "quality"

I agree, speaking generally. The issue is that the sysimage and precompilation are really essential features for Julia, otherwise TTFX becomes unbearable. So I believe that sysimage invalidation should block a PR from being merged, thus these tests should be in Julia's test suite.

why not do both

Agree, apart from getting this PR merged, I definitely want more metrics available in CI. Of both the "inference quality" and of "invalidation count" kinds.

adienes · 2025-06-20T20:50:32Z

ref: #47093

The `eltype` function was one of the few functions in the sysimage with a `max_methods` value (the world-splitting threshold) greater than the default. This was a workaround for the unnecessarily large number of methods of `eltype(::Type{<:Tuple})`. The `max_methods` value was increased in PR JuliaLang#48322 to help effect inference. Reduce the number of `eltype(::Type{<:Tuple})` methods, which then also allows keeping the `max_methods` for `eltype` at the default value. The intent here is to guard against unnecessary invalidation of the sysimage or other precompiled code, which might require decreasing the `max_methods` value to the natural minimum for many interface functions, by which I mean "generic function meant to have methods added to it by package authors". See also: PR JuliaLang#57884.

The `eltype` function was one of the few functions in the sysimage with a `max_methods` value (the world-splitting threshold) greater than the default. This was a workaround for the unnecessarily large number of methods of `eltype(::Type{<:Tuple})`. The `max_methods` value was increased in PR JuliaLang#48322 to help effect inference. Reduce the number of `eltype(::Type{<:Tuple})` methods, which then also allows keeping the `max_methods` for `eltype` at the default value. The intent here is to guard against unnecessary invalidation of the sysimage or other precompiled code, which might require decreasing the `max_methods` value to the natural minimum for many interface functions, by which I mean "generic function meant to have methods added to it by package authors". I intend to approach this issue in following PRs. See also: PR JuliaLang#57884. Regarding future work: I consider the "natural minimum" value for interface functions taking types to be **two**, because the bottom type subtypes each type. If the interface function doesn't accept types, the natural minimum should be **one**.

Test that defining certain handpicked type-method pairs (as a user) doesn't result in any invalidation of the sysimage. Some of the tests are marked as broken. Merging this will make the contribution process more demanding: when someone makes a PR they might need to fix unrelated type stability issues, or decrease `max_methods` for some function, before being able to merge their initial PR. However that's better than ignoring the constant danger of type instability/invalidation regressions making the sysimage and precompilation less useful. This change should only be merged while up-to-date with the `master` branch. Fixes JuliaLang#47022

The `eltype` function was one of the few functions in the sysimage with a `max_methods` value (the world-splitting threshold) greater than the default. This was a workaround for the unnecessarily large number of methods of `eltype(::Type{<:Tuple})`. The `max_methods` value was increased in PR JuliaLang#48322 to help effect inference. Reduce the number of `eltype(::Type{<:Tuple})` methods, which then also allows keeping the `max_methods` for `eltype` at the default value. The intent here is to guard against unnecessary invalidation of the sysimage or other precompiled code, which might require decreasing the `max_methods` value to the natural minimum for many interface functions, by which I mean "generic function meant to have methods added to it by package authors". I intend to approach this issue in following PRs. See also: PR JuliaLang#57884. Regarding future work: I consider the "natural minimum" value for interface functions taking types to be **two**, because the bottom type subtypes each type. If the interface function doesn't accept types, the natural minimum should be **one**.

nsajko added test This change adds or pertains to unit tests invalidations labels Mar 25, 2025

nsajko force-pushed the test_for_no_invalidations branch from a09863b to 885c03e Compare March 25, 2025 22:22

nsajko changed the title ~~test for no invalidations~~ add some "no invalidations" tests Mar 26, 2025

nsajko force-pushed the test_for_no_invalidations branch 2 times, most recently from 02674e3 to 26efc09 Compare March 27, 2025 09:17

nsajko marked this pull request as draft March 28, 2025 13:24

nsajko force-pushed the test_for_no_invalidations branch from 1efd53f to 2fc945e Compare March 28, 2025 14:29

nsajko changed the title ~~add some "no invalidations" tests~~ add some "no invalidations" tests for the sysimage Mar 28, 2025

nsajko force-pushed the test_for_no_invalidations branch from 2fc945e to 18504c1 Compare March 28, 2025 14:33

nsajko marked this pull request as ready for review March 28, 2025 14:34

nsajko force-pushed the test_for_no_invalidations branch 4 times, most recently from 115b46a to 9cd212b Compare April 1, 2025 08:26

nsajko commented Apr 2, 2025

View reviewed changes

test/no_invalidations.jl Show resolved Hide resolved

nsajko added the speculative Whether the change will be implemented is speculative label Apr 14, 2025

nsajko force-pushed the test_for_no_invalidations branch 2 times, most recently from d99e0a8 to 8abdb2c Compare June 10, 2025 05:03

vchuravy requested a review from timholy June 10, 2025 09:00

timholy reviewed Jun 12, 2025

View reviewed changes

nsajko force-pushed the test_for_no_invalidations branch from 91b09e0 to dec4a8b Compare June 12, 2025 10:57

nsajko force-pushed the test_for_no_invalidations branch from 1266a12 to bd269ba Compare June 14, 2025 08:59

nsajko force-pushed the test_for_no_invalidations branch from 6d07c9d to 8967b5e Compare June 17, 2025 19:10

nsajko force-pushed the test_for_no_invalidations branch from f233ddd to 9c8c8a1 Compare June 21, 2025 02:04

nsajko mentioned this pull request Jun 21, 2025

decrease method count for eltype for Tuple, decrease max_methods #58788

Open

nsajko force-pushed the test_for_no_invalidations branch from 6c43f23 to bc6ab5c Compare June 21, 2025 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

add some "no invalidations" tests for the sysimage #57884

add some "no invalidations" tests for the sysimage #57884

Uh oh!

nsajko commented Mar 25, 2025 •

edited

Loading

Uh oh!

Uh oh!

nsajko commented Jun 10, 2025

Uh oh!

nsajko commented Jun 10, 2025

Uh oh!

nsajko commented Jun 10, 2025

Uh oh!

timholy left a comment •

edited

Loading

Uh oh!

nsajko commented Jun 12, 2025

Uh oh!

nsajko commented Jun 12, 2025

Uh oh!

adienes commented Jun 12, 2025 •

edited

Loading

Uh oh!

nsajko commented Jun 12, 2025 •

edited

Loading

Uh oh!

adienes commented Jun 20, 2025

Uh oh!

Uh oh!

Uh oh!

add some "no invalidations" tests for the sysimage #57884

Are you sure you want to change the base?

add some "no invalidations" tests for the sysimage #57884

Uh oh!

Conversation

nsajko commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

nsajko commented Jun 10, 2025

Uh oh!

nsajko commented Jun 10, 2025

Uh oh!

nsajko commented Jun 10, 2025

Uh oh!

timholy left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nsajko commented Jun 12, 2025

Uh oh!

nsajko commented Jun 12, 2025

Uh oh!

adienes commented Jun 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nsajko commented Jun 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adienes commented Jun 20, 2025

Uh oh!

Uh oh!

nsajko commented Mar 25, 2025 •

edited

Loading

timholy left a comment •

edited

Loading

adienes commented Jun 12, 2025 •

edited

Loading

nsajko commented Jun 12, 2025 •

edited

Loading