Skip to content

add some "no invalidations" tests for the sysimage #57884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nsajko
Copy link
Contributor

@nsajko nsajko commented Mar 25, 2025

Test that defining certain handpicked type-method pairs (as a user) doesn't result in any invalidation of the sysimage.

Some of the tests are marked as broken.

Merging this will make Julia's contribution process more demanding: when someone makes a PR they might need to fix unrelated type stability issues, or decrease max_methods for some function, before being able to merge their initial PR. However that's better than ignoring the constant danger of type instability/invalidation regressions making the sysimage and precompilation less useful.

This change should only be merged while up-to-date with the master branch.

Fixes #47022

@nsajko nsajko added test This change adds or pertains to unit tests invalidations labels Mar 25, 2025
@nsajko nsajko force-pushed the test_for_no_invalidations branch from a09863b to 885c03e Compare March 25, 2025 22:22
@nsajko nsajko changed the title test for no invalidations add some "no invalidations" tests Mar 26, 2025
@nsajko nsajko force-pushed the test_for_no_invalidations branch 2 times, most recently from 02674e3 to 26efc09 Compare March 27, 2025 09:17
@nsajko nsajko marked this pull request as draft March 28, 2025 13:24
@nsajko nsajko force-pushed the test_for_no_invalidations branch from 1efd53f to 2fc945e Compare March 28, 2025 14:29
@nsajko nsajko changed the title add some "no invalidations" tests add some "no invalidations" tests for the sysimage Mar 28, 2025
@nsajko nsajko force-pushed the test_for_no_invalidations branch from 2fc945e to 18504c1 Compare March 28, 2025 14:33
@nsajko nsajko marked this pull request as ready for review March 28, 2025 14:34
@nsajko nsajko force-pushed the test_for_no_invalidations branch 4 times, most recently from 115b46a to 9cd212b Compare April 1, 2025 08:26
@nsajko nsajko added the speculative Whether the change will be implemented is speculative label Apr 14, 2025
@nsajko nsajko force-pushed the test_for_no_invalidations branch 2 times, most recently from d99e0a8 to 8abdb2c Compare June 10, 2025 05:03
@vchuravy vchuravy requested a review from timholy June 10, 2025 09:00
@nsajko
Copy link
Contributor Author

nsajko commented Jun 10, 2025

Actually I had a discussion with @timholy on Slack regarding this PR back in March, amazingly it seems I can still read it:

@timholy was basically advocating to check for good inference instead of checking for invalidations, although TBH I'm not sure they really checked out this PR.

I'm pretty sure:

  • The approach by Tim Holy would be good:

    • to track stats across each commit on the master branch

    • to calculate stats for each PR, similar to how Codecov posts a message with the coverage stats on some repos

  • The approach in this PR could also be good for the above, the difference is that any test failure here is a regression, so it should block a PR from being merged. Thus putting it in the test suite.

@nsajko
Copy link
Contributor Author

nsajko commented Jun 10, 2025

Test that defining certain handpicked type-method pairs (as a user) doesn't result in any invalidation of the sysimage.

To expand on this: I want to ensure that user packages can create new types without causing invalidation of the sysimage. Otherwise there's a conflict between wanting to use Julia naturally in one's package and having to "ensure" the sysimage doesn't get invalidated (which is not something package authors should need to worry about).

This PR does not forbid all invalidation or all world-splitting.

@nsajko
Copy link
Contributor Author

nsajko commented Jun 10, 2025

Furthermore, as can be seen from the test set names, this PR currently only encompasses some examples of adding subtypes of Integer or of AbstractString. The intention is for new test sets to be added over time.

Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no denying that this tests for something practically important: that certain common extensions don't trigger invalidation and thus longer TTFX. My concern is that it will catch issues for which there is no obvious fault and no sensible resolution.

For example, suppose someone submits a PR that breaks this test; let's say that adding a new foo method invalidates the compiled code for bar. Any one of the following changes in the PR would "fix" this test, but only the ones marked (*) represent an improvement in Julia:

  • add a dummy foo method to Base, pushing the number of applicable methods above max_methods for our default AbstractInterpreter (currently 3)
  • if there are already >3 foo methods but the invalidation in bar happens because bar is partially inferrable so that you at least know you're dealing with foo(::Real) and there are only a couple of applicable methods, you could worsen inference in bar so that inference only knows that it's foo(::Any). Thus more than 3 foo methods are potentially applicable at that call site and you've exceeded max_methods by making bar worse.
  • stop precompiling bar so it can't be invalidated
  • (maybe *) add an invokelatest or Base.invoke_in_world in bar
  • (*) improve inference in bar

I'm not saying there isn't a reason to consider any but the last; indeed, there was at least one case where we did the first of these because it prevented a common cause of massive invalidations. But this case can also be easily tested: @test length(methods, (foo, T)) > 3.

I think it's fair to say that the only case that's generically beneficial is the last one. To me this argues that we need an "inference quality" test. This PR is a very indirect way of testing inference quality, and I think we'd be better served by directly testing what matters.

@nsajko
Copy link
Contributor Author

nsajko commented Jun 12, 2025

I disagree on some points:

  • You say that this PR is "a very indirect way of testing inference quality", and that's a valid interpretation. However, I'm pretty sure the approach in this PR is the only known approach of its kind with certain guarantees that are necessary for a test in the test suite: The approach in this PR guarantees that a test failure for a commit is caused by that commit. In contrast, your approach in Add a generic code-inferrability test #37193 (the only known alternative to this PR AFAIK) is based on aggregate information and arbitrary thresholds, so it might cause a delayed test failure, where a test failure only happens after several commits, the first of which was bad and some of which might even have been good! Correct me if I misunderstood your PR.

    • So I'd say that this PR is the one "directly testing what matters".
  • Regarding your last proposed solution to a test failure:

    (*) improve inference in bar

    Indeed, sometimes it would be necessary to fix the inference in unrelated code after a test failure here. That was already noted in the PR description anyway, and, although this is a lot to demand of contributors, it seems necessary, because otherwise the sysimage and package precompilation are kinda useless, being able to be invalidated by a package author defining a new type, with some methods.

  • Among your proposed solutions to a test failure, you omitted one: decrease the max_methods value for a function. This can be a valid solution, of the "improvement to Julia" kind. I think it's already the case that a lot of (or all) functions which are meant to have methods added to by third parties should have their max_methods decreased to, e.g., 2. This is because it doesn't make sense to bake into the sysimage the assumption that no more methods will be added to a function, if the functions is expressly meant to have an arbitrary number of methods added to it by package authors. There's one place in Base where this is already done, for the copyto! function, which has its max_methods value set to 1, originally introduced in more latency improvements for #33615 #33779:

    julia/base/abstractarray.jl

    Lines 936 to 938 in c0ecfe9

    # This is `Experimental.@max_methods 1 function copyto! end`, which is not
    # defined at this point in bootstrap.
    typeof(function copyto! end).name.max_methods = UInt8(1)

@nsajko nsajko force-pushed the test_for_no_invalidations branch from 91b09e0 to dec4a8b Compare June 12, 2025 10:57
@nsajko
Copy link
Contributor Author

nsajko commented Jun 12, 2025

My concern is that it will catch issues for which there is no obvious fault and no sensible resolution.

I don't understand this conclusion. You yourself list "improve inference in bar" as a solution that's an improvement to Julia. In any case, invalidation of the sysimage is most definitely a fault.

@adienes
Copy link
Member

adienes commented Jun 12, 2025

incoming opinion from the peanut gallery: I generally associate unit tests (aka everything running under test/runtests.jl) to test mechanics and correctness of the code. then other passes can handle other measures of code quality, of which Julia already has quite a few (codecov, formatting checks, BaseBenchmarks)

something like this of course a bit blurs the line between testing "correctness" vs testing "quality" because changes in inference can and do change the results of operations. but I think personally speaking I would still associate this more with a "quality" pass than something to be put in unit tests.

it could even still be in CI, just under [Check] or [Allow Fail] rather than [Test]

This PR is a very indirect way of testing inference quality, and I think we'd be better served by directly testing what matters.

I think I tend to disagree with this sentiment, in that why not do both!

@nsajko
Copy link
Contributor Author

nsajko commented Jun 12, 2025

something like this of course a bit blurs the line between testing "correctness" vs testing "quality"

I agree, speaking generally. The issue is that the sysimage and precompilation are really essential features for Julia, otherwise TTFX becomes unbearable. So I believe that sysimage invalidation should block a PR from being merged, thus these tests should be in Julia's test suite.

why not do both

Agree, apart from getting this PR merged, I definitely want more metrics available in CI. Of both the "inference quality" and of "invalidation count" kinds.

@nsajko nsajko force-pushed the test_for_no_invalidations branch from 1266a12 to bd269ba Compare June 14, 2025 08:59
@nsajko nsajko force-pushed the test_for_no_invalidations branch from 6d07c9d to 8967b5e Compare June 17, 2025 19:10
@adienes
Copy link
Member

adienes commented Jun 20, 2025

ref: #47093

@nsajko nsajko force-pushed the test_for_no_invalidations branch from f233ddd to 9c8c8a1 Compare June 21, 2025 02:04
nsajko added a commit to nsajko/julia that referenced this pull request Jun 21, 2025
The `eltype` function was one of the few functions in the sysimage with
a `max_methods` value (the world-splitting threshold) greater than the
default. This was a workaround for the unnecessarily large number of
methods of `eltype(::Type{<:Tuple})`. The `max_methods` value was
increased in PR JuliaLang#48322 to help effect inference.

Reduce the number of `eltype(::Type{<:Tuple})` methods, which then also
allows keeping the `max_methods` for `eltype` at the default value.

The intent here is to guard against unnecessary invalidation of the
sysimage or other precompiled code, which might require decreasing the
`max_methods` value to the natural minimum for many interface functions,
by which I mean "generic function meant to have methods added to it by
package authors". See also: PR JuliaLang#57884.
nsajko added a commit to nsajko/julia that referenced this pull request Jun 21, 2025
The `eltype` function was one of the few functions in the sysimage with
a `max_methods` value (the world-splitting threshold) greater than the
default. This was a workaround for the unnecessarily large number of
methods of `eltype(::Type{<:Tuple})`. The `max_methods` value was
increased in PR JuliaLang#48322 to help effect inference.

Reduce the number of `eltype(::Type{<:Tuple})` methods, which then also
allows keeping the `max_methods` for `eltype` at the default value.

The intent here is to guard against unnecessary invalidation of the
sysimage or other precompiled code, which might require decreasing the
`max_methods` value to the natural minimum for many interface
functions, by which I mean "generic function meant to have methods
added to it by package authors". I intend to approach this issue in
following PRs. See also: PR JuliaLang#57884.

Regarding future work: I consider the "natural minimum" value for
interface functions taking types to be **two**, because the bottom type
subtypes each type. If the interface function doesn't accept types, the
natural minimum should be **one**.
Test that defining certain handpicked type-method pairs (as a user)
doesn't result in any invalidation of the sysimage.

Some of the tests are marked as broken.

Merging this will make the contribution process more demanding: when
someone makes a PR they might need to fix unrelated type stability
issues, or decrease `max_methods` for some function, before being able
to merge their initial PR. However that's better than ignoring the
constant danger of type instability/invalidation regressions making the
sysimage and precompilation less useful.

This change should only be merged while up-to-date with the `master`
branch.

Fixes JuliaLang#47022
@nsajko nsajko force-pushed the test_for_no_invalidations branch from 6c43f23 to bc6ab5c Compare June 21, 2025 14:46
nsajko added a commit to nsajko/julia that referenced this pull request Jun 21, 2025
The `eltype` function was one of the few functions in the sysimage with
a `max_methods` value (the world-splitting threshold) greater than the
default. This was a workaround for the unnecessarily large number of
methods of `eltype(::Type{<:Tuple})`. The `max_methods` value was
increased in PR JuliaLang#48322 to help effect inference.

Reduce the number of `eltype(::Type{<:Tuple})` methods, which then also
allows keeping the `max_methods` for `eltype` at the default value.

The intent here is to guard against unnecessary invalidation of the
sysimage or other precompiled code, which might require decreasing the
`max_methods` value to the natural minimum for many interface
functions, by which I mean "generic function meant to have methods
added to it by package authors". I intend to approach this issue in
following PRs. See also: PR JuliaLang#57884.

Regarding future work: I consider the "natural minimum" value for
interface functions taking types to be **two**, because the bottom type
subtypes each type. If the interface function doesn't accept types, the
natural minimum should be **one**.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalidations speculative Whether the change will be implemented is speculative test This change adds or pertains to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests for invalidation fixes
3 participants