blog/2020/05/invalidations.md
36 additions & 35 deletions
@@ -4,6 +4,7 @@
4
4
@def rss_pubdate = Date(2020, 5, 11)
5
5
@def rss = """Julia is fast, but compiling Julia code takes time. This post analyzes why it's sometimes necessary to repeat that work, and what might be done to fix it."""
6
6
7
+
\toc
7
8
8
9
[The Julia programming language][Julia] has wonderful flexibility with types, and this allows you to combine packages in unanticipated ways to solve new kinds of problems.
9
10
Crucially, it achieves this flexibility without sacrificing performance.
@@ -24,7 +25,7 @@ Recently I got interested in a specific source of this latency, and this blog po
24
25
25
26
When Julia compiles a method for specific types, it saves the resulting code
26
27
so that it can be used by any caller.
27
-
This is crucial to performance, because it means generally compilation has to be done only once for a specific type.
28
+
This is crucial to performance, because it means that compilation generally has to be done only once for a specific type.
28
29
To keep things really simple, I'm going to use a very artificial example.
29
30
30
31
```
@@ -146,8 +147,7 @@ However, if you try `@code_typed applyf(c)` again, you'll notice something curio
146
147
Julia has gone to the trouble to create a new-and-improved implementation of `applyf`,
147
148
one which also union-splits for `String`.
148
149
This brings us to the topic of this blog post: the old compiled method has been *invalidated*.
149
-
Given new information---which here comes from defining or loading new methods---
150
-
Julia changes its mind about how things should be implemented,
150
+
Given new information--which here comes from defining or loading new methods--Julia changes its mind about how things should be implemented,
151
151
and this forces Julia to recompile `applyf`.
152
152
153
153
If you add fourth and fifth methods,
@@ -173,14 +173,14 @@ CodeInfo(
173
173
174
174
There are now so many possibilities that Julia just gives up and
175
175
uses "runtime dispatch" to decide what method of `f` to call.
176
-
It doesn't even try to enforce the fact that `f` returns and `Int`,
176
+
It doesn't even try to enforce the fact that `f` returns an `Int`,
177
177
in part because determining such facts takes time (adding to compiler latency)
178
178
and because functions with many methods typically tend to return multiple types
179
179
anyway.
180
180
181
181
Compiling each of these new implementations takes JIT-time.
182
182
If Julia knew in advance that you'd arrive at this place, it would never have bothered to produce that first, heavily-optimized version of `applyf`.
183
-
But the performance benefits of such optimizations are so large that, when applicable, they are well worth it.
183
+
But the performance benefits of such optimizations are so large that, when applicable, they can be well worth it.
184
184
For example, if you start a fresh Julia session and just define the `f(::Int)`
185
185
and `f(::Bool)` methods, then
186
186
@@ -214,9 +214,9 @@ If method invalidation happens often, this might contribute to making Julia "fee
214
214
Unfortunately, method invalidation is pretty common.
215
215
First, let's get some baseline statistics.
216
216
Using the [MethodAnalysis] package (which is at a very early stage of development
217
-
at the time of this writing), you can find out that a fresh Julia session
218
-
(albeit one that has loaded the MethodAnalysis package and used it to perform some analysis) has almost 50,000 `MethodInstance`s tucked away in its cache.
217
+
at the time of this writing), you can find out that a fresh Julia session has almost 50,000 `MethodInstance`s tucked away in its cache.
219
218
These are mostly for `Base` and the standard libraries.
219
+
(There are some additional `MethodInstance`s that get created to load the MethodAnalysis package and do this analysis, but these are surely a very small fraction of the total.)
220
220
221
221
Using some not-yet merged work in both Julia itself and [SnoopCompile], we can count the number of invalidations when we load various packages into a fresh Julia session:
222
222
@@ -237,8 +237,8 @@ Using some not-yet merged work in both Julia itself and [SnoopCompile], we can c
237
237
| DifferentialEquations | 6.13.0 | 6777 |
238
238
239
239
You can see that key packages used by large portions of the Julia ecosystem invalidate
240
-
hundreds or thousands of MethodInstances, sometimes more than 10% of the total
241
-
number of MethodInstances present before loading the package.
240
+
hundreds or thousands of `MethodInstance`s, sometimes more than 10% of the total
241
+
number of `MethodInstance`s present before loading the package.
`@snoopr` turns on some debugging code inside Julia, and then executes the supplied statement;
346
346
it returns a fairly opaque list that can be parsed by `invalidation_trees`.
347
-
Entries in the returned array correspond to method additions (or deletions, if relevant) that trigger one or more invalidations.
347
+
Entries in the array returned by `invalidation_trees` correspond to method additions (or deletions, if relevant) that trigger one or more invalidations.
348
348
In this case, the output means that the new `f(x::String)` method triggered an invalidation of `applyf(::Array{Any,1})`,
349
349
due to intersection with the signature `f(::Any)`.
350
350
`(0 children)` means that `applyf(::Vector{Any})` does not yet have any callers that would in turn need to be invalidated.
351
-
Finally, `more specific` (which is printed in cyan) indicate that the new method was strictly more specific than the one that got invalidated.
351
+
Finally, `more specific` (which is printed in cyan) indicates that the new method `f(::String)` was strictly more specific than the signature `f(::Any)` used by the `applyf` `MethodInstance` that got invalidated.
352
352
353
353
As we mentioned above, there are good reasons to think this invalidation is "necessary," meaning that it is an unavoidable consequence of the choices made to optimize runtime performance while also allowing one to dynamically extend functions.
354
354
However, that doesn't mean there is nothing that you, as a developer, could do to eliminate this invalidation.
355
355
Perhaps there is no real need to ever call `applyf` with a `Vector{Any}`;
356
356
perhaps you can fix one of its upstream callers to supply a concretely-typed vector.
357
+
Or perhaps you could define more `f` methods at the outset, so that Julia has a better understanding of the different types that `applyf` needs to handle.
357
358
In some cases, though, you might really need to call `applyf` with a `Vector{Any}`, in which case the best choice is to accept this invalidation as necessary and move on.
358
359
359
360
### New methods with ambiguous specificity
@@ -389,7 +390,7 @@ julia> trees = invalidation_trees(@snoopr using FixedPointNumbers)
389
390
```
390
391
391
392
This list is ordered from least- to most-consequential in terms of total number of invalidations.
392
-
The final entry, for `(::Type{X})(x::Real) where X<:FixedPoint`, triggered the invalidation of what nominally appear to be more than 350 MethodInstances.
393
+
The final entry, for `(::Type{X})(x::Real) where X<:FixedPoint`, triggered the invalidation of what nominally appear to be more than 350 `MethodInstance`s.
393
394
(There is no guarantee that these methods are all disjoint from one another;
394
395
the results are represented as a tree, where each node links to its callers.)
395
396
In contrast, the first entry is responsible for just two invalidations.
@@ -450,8 +451,8 @@ Type{Union{}}
450
451
451
452
which shows that there is one Type, the "empty Type", that lies in their intersection.
452
453
453
-
There are good reasons to believe that the right way to fix such methods is to exclude ambiguous pairs from invalidation---if it were to be called by the compiled code, it would trigger an error anyway.
454
-
If such a change gets made to Julia, then all the ones marked "ambiguous" should magically disappear.
454
+
There are good reasons to believe that the right way to fix such methods is to exclude ambiguous pairs from invalidation--if it were to be called by the compiled code, it would trigger an error anyway.
455
+
If this gets changed in Julia, then all the ones marked "ambiguous" should magically disappear.
455
456
Consequently, we can turn our attention to other cases.
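
To make the "it would trigger an error anyway" point concrete, here is a minimal sketch of a method ambiguity. The function `g` and its two methods are hypothetical, chosen only for illustration; they are not taken from any of the packages analyzed here.

```julia
# Hypothetical function `g`, used only for illustration.
g(x::Int, y) = "first argument is an Int"
g(x, y::Int) = "second argument is an Int"

# The two signatures overlap, and neither is more specific than the other.
overlap = typeintersect(Tuple{Int,Any}, Tuple{Any,Int})

# A call that falls inside the overlap cannot be resolved, so Julia throws.
err = try
    g(1, 2)
catch e
    e
end

println(overlap)              # the region where both methods apply
println(err isa MethodError)  # the ambiguous call is reported as a MethodError
println(g(1, "a"))            # outside the overlap, dispatch is unambiguous
```

Since the ambiguous call can only ever raise an error, compiled code that might reach it arguably has nothing useful to lose from being left alone.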
456
457
457
458
Let's look at the next item up the list:
@@ -537,9 +538,9 @@ In the statistics below, we'll lump partial specialization in with ambiguity.
537
538
538
539
### Some summary statistics
539
540
540
-
Let's go back to our table above, and augment it with "sources" of invalidation:
541
+
Let's go back to our table above, and count the number of invalidations in each of these categories:
@@ -565,23 +566,22 @@ However, it appears that there will need to be a second round in which package d
565
566
You may have noticed that two packages, `Example` and `Revise`, trigger far fewer invalidations than the rest of the packages in our analysis.
566
567
`Example` is quite trivial, but `Revise` and its dependencies are quite large.
567
568
How does it avoid this problem?
568
-
First, Revise does not extending very many Base methods;
569
-
most of its methods are to functions it "owns," and the same is true for its dependencies.
569
+
First, Revise does not extend very many Base methods;
570
+
most of its methods are for functions it "owns," and the same is true for its dependencies.
570
571
Second, in the closing days of Julia 1.5's merge window,
571
572
Revise (and Julia) underwent a process of tracking down invalidations and eliminating them;
572
573
for comparison, on Julia 1.4, Revise triggers more than 1000 non-unique invalidations.
573
574
The success of this effort gives one hope that other packages too may one day have fewer invalidations.
574
575
575
576
As stated above, there is reason to hope that most of the invalidations marked as "ambiguous" will be fixed by changes to Julia's compiler.
576
-
Here our focus is on those marked "more specific," since those are cases where it is hard to imagine a generic fix.
577
+
Here our focus is on those marked "more specific," since those are cases where it is harder to imagine a generic fix.
577
578
578
-
### Fixing a case of type-instability
579
+
### Fixing type instabilities
579
580
580
-
In engineering Julia and Revise to reduce invalidations, at least two cases were fixed by resolving a type-instability.
581
+
In engineering Julia and Revise to reduce invalidations, at least two cases were fixed by resolving type-instabilities.
581
582
For example, one set of invalidations happened because `CodeTracking`, a dependency of Revise's, defines new methods for `Base.PkgId`.
582
-
It turns out that this triggered an invalidation of `_tryrequire_from_serialized`, which is used to load packages;
583
-
a negative consequence is that Revise introduced a slight latency upon loading the *next* package.
584
-
However, it turned out to be an easy fix: one section of `_tryrequire_from_serialized` had a passage
583
+
It turns out that this triggered an invalidation of `_tryrequire_from_serialized`, which is used to load packages.
584
+
Fortunately, it turned out to be an easy fix: one section of `_tryrequire_from_serialized` had a passage
585
585
586
586
```
587
587
for M in mod::Vector{Any}
@@ -601,9 +601,11 @@ It sufficed to add
601
601
immediately after the `for` statement to fix the problem.
602
602
Not only does this fix the invalidation, but it lets the compiler generate better code.
603
603
604
-
The other case was similar: a call from `Pkg` of `keys` on an AbstractDict of unknown type
605
-
(due to a higher `@nospecialize` call).
606
-
Replacing `keys(dct)` with `Base.KeySet(dct)` (which is the default consequence of calling `keys`) eliminated a very consequential invalidation, one that triggered seconds-long latencies in the next `Pkg` command after loading Revise.
604
+
The other case was a call from `Pkg` of `keys` on an AbstractDict of unknown type
605
+
(due to a caller's `@nospecialize` annotation).
606
+
Replacing `keys(dct)` with `Base.KeySet(dct)` (which is the default return value of `keys`) eliminated a very consequential invalidation, one that triggered seconds-long latencies in the next `Pkg` command after loading Revise.
607
+
The benefits of this change in Pkg's code went far beyond helping Revise; any package depending on the OrderedCollections package (which is a dependency of Revise and what actually triggered the invalidation) got the same benefit.
608
+
With these and a few other relatively simple changes, loading Revise no longer forces Julia to recompile much of Pkg's code the next time you try to update packages.
607
609
608
610
### Redirecting call chains
609
611
@@ -627,7 +629,7 @@ If you look up this definition, you'll see it's
627
629
reduce_empty(op, T) = _empty_reduce_error()
628
630
```
629
631
630
-
which indicates that it is the fallback method for reducing over an empty collection, and indeed calling this results in an error:
632
+
which indicates that it is the fallback method for reducing over an empty collection, and as you might expect from the name, calling it results in an error:
631
633
632
634
```julia-repl
633
635
julia> op = Base.BottomRF(Base.max)
@@ -642,7 +644,7 @@ Stacktrace:
642
644
[4] top-level scope at REPL[36]:1
643
645
```
644
646
645
-
This essentially means that no "identity element" has been defined for this operation and type.
647
+
This essentially means that no "neutral element" has been defined for this operation and type.
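
As a rough illustration of what a neutral element buys you, the sketch below compares `+` and `max` on an empty `Vector{Int}`. It uses only the public `reduce` function and its `init` keyword, not the internal `reduce_empty` machinery discussed here, and the exact exception type may differ between Julia versions.

```julia
# `+` has a neutral element for Int, so an empty reduction is well defined.
s = reduce(+, Int[])

# `max` has no neutral element registered for Int, so this call throws.
failed = try
    reduce(max, Int[])
    false
catch
    true
end

# Supplying `init` gives the reduction a starting value and sidesteps the error.
m = reduce(max, Int[]; init = typemin(Int))

println(s)
println(failed)
println(m == typemin(Int))
```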
646
648
647
649
Can we avoid this fallback?
648
650
One approach is to define the method directly: modify Julia to add
MethodInstance for reduce_empty_iter(::Base.BottomRF{typeof(max)}, ::Set{VersionNumber}, ::Base.HasEltype) at depth 1 with 38 children
679
681
```
680
682
681
-
We can display the whole tree using `show(node)`:
683
+
We can display this whole branch of the tree using `show(node)`:
682
684
683
685
```julia-repl
684
686
julia> show(node)
@@ -751,8 +753,8 @@ Julia's remarkable flexibility and outstanding code-generation open many new hor
751
753
These advantages come with a few costs, and here we've explored one of them, method invalidation.
752
754
While Julia's core developers have been aware of its cost for a long time,
753
755
we're only now starting to get tools to analyze it in a manner suitable for a larger population of users and developers.
754
-
Because it's not been easy to measure previously, it would not be surprising if there are numerous opportunities to reduce it, waiting to be discovered.
755
-
One might hope that the next period of development might see significant strides in new ways of getting packages to work together without stomping on each other's toes.
756
+
Because it's not been easy to measure previously, it would not be surprising if there are numerous opportunities for improvement waiting to be discovered.
757
+
One might hope that the next period of development might see significant improvement in getting packages to work together without stomping on each other's toes.