Commit 72bc02c

Add table of contents and polish
1 parent c16bd67 commit 72bc02c

1 file changed

+36
-35
lines changed

blog/2020/05/invalidations.md

Lines changed: 36 additions & 35 deletions
Original file line number | Diff line number | Diff line change
@@ -4,6 +4,7 @@
44
@def rss_pubdate = Date(2020, 5, 11)
55
@def rss = """Julia is fast, but compiling Julia code takes time. This post analyzes why it's sometimes necessary to repeat that work, and what might be done to fix it."""
66

7+
\toc
78

89
[The Julia programming language][Julia] has wonderful flexibility with types, and this allows you to combine packages in unanticipated ways to solve new kinds of problems.
910
Crucially, it achieves this flexibility without sacrificing performance.
@@ -24,7 +25,7 @@ Recently I got interested in a specific source of this latency, and this blog po
2425

2526
When Julia compiles a method for specific types, it saves the resulting code
2627
so that it can be used by any caller.
27-
This is crucial to performance, because it means generally compilation has to be done only once for a specific type.
28+
This is crucial to performance, because it means that compilation generally has to be done only once for a specific type.
2829
To keep things really simple, I'm going to use a very artificial example.
2930

3031
```
@@ -146,8 +147,7 @@ However, if you try `@code_typed applyf(c)` again, you'll notice something curio
146147
Julia has gone to the trouble to create a new-and-improved implementation of `applyf`,
147148
one which also union-splits for `String`.
148149
This brings us to the topic of this blog post: the old compiled method has been *invalidated*.
149-
Given new information---which here comes from defining or loading new methods---
150-
Julia changes its mind about how things should be implemented,
150+
Given new information--which here comes from defining or loading new methods--Julia changes its mind about how things should be implemented,
151151
and this forces Julia to recompile `applyf`.
152152

153153
If you add fourth and fifth methods,
@@ -173,14 +173,14 @@ CodeInfo(
173173

174174
There are now so many possibilities that Julia just gives up and
175175
uses "runtime dispatch" to decide what method of `f` to call.
176-
It doesn't even try to enforce the fact that `f` returns and `Int`,
176+
It doesn't even try to enforce the fact that `f` returns an `Int`,
177177
in part because determining such facts takes time (adding to compiler latency)
178178
and because functions with many methods typically tend to return multiple types
179179
anyway.
180180

181181
Compiling each of these new implementations takes JIT-time.
182182
If Julia knew in advance that you'd arrive at this place, it would never have bothered to produce that first, heavily-optimized version of `applyf`.
183-
But the performance benefits of such optimizations are so large that, when applicable, they are well worth it.
183+
But the performance benefits of such optimizations are so large that, when applicable, they can be well worth it.
184184
For example, if you start a fresh Julia session and just define the `f(::Int)`
185185
and `f(::Bool)` methods, then
186186

@@ -214,9 +214,9 @@ If method invalidation happens often, this might contribute to making Julia "fee
214214
Unfortunately, method invalidation is pretty common.
215215
First, let's get some baseline statistics.
216216
Using the [MethodAnalysis] package (which is at a very early stage of development
217-
at the time of this writing), you can find out that a fresh Julia session
218-
(albeit one that has loaded the MethodAnalysis package and used it to perform some analysis) has almost 50,000 `MethodInstance`s tucked away in its cache.
217+
at the time of this writing), you can find out that a fresh Julia session has almost 50,000 `MethodInstance`s tucked away in its cache.
219218
These are mostly for `Base` and the standard libraries.
219+
(There are some additional `MethodInstance`s that get created to load the MethodAnalysis package and do this analysis, but these are surely a very small fraction of the total.)
220220

221221
Using some not-yet merged work in both Julia itself and [SnoopCompile], we can count the number of invalidations when we load various packages into a fresh Julia session:
222222

@@ -237,8 +237,8 @@ Using some not-yet merged work in both Julia itself and [SnoopCompile], we can c
237237
| DifferentialEquations | 6.13.0 | 6777 |
238238

239239
You can see that key packages used by large portions of the Julia ecosystem invalidate
240-
hundreds or thousands of MethodInstances, sometimes more than 10% of the total
241-
number of MethodInstances present before loading the package.
240+
hundreds or thousands of `MethodInstance`s, sometimes more than 10% of the total
241+
number of `MethodInstance`s present before loading the package.
242242

243243
## How serious is method invalidation?
244244

@@ -328,13 +328,13 @@ function applyf(container)
328328
end
329329
c = Any[1, false];
330330
applyf(c)
331-
332-
using SnoopCompile
333331
```
334332

335333
Then,
336334

337335
```julia-repl
336+
julia> using SnoopCompile
337+
338338
julia> invalidation_trees(@snoopr f(x::String) = 3)
339339
1-element Array{SnoopCompile.MethodInvalidations,1}:
340340
insert f(x::String) in Main at REPL[7]:1 invalidated:
@@ -344,16 +344,17 @@ julia> invalidation_trees(@snoopr f(x::String) = 3)
344344
Let's walk through this output a bit.
345345
`@snoopr` turns on some debugging code inside Julia, and then executes the supplied statement;
346346
it returns a fairly opaque list that can be parsed by `invalidation_trees`.
347-
Entries in the returned array correspond to method additions (or deletions, if relevant) that trigger one or more invalidations.
347+
Entries in the array returned by `invalidation_trees` correspond to method additions (or deletions, if relevant) that trigger one or more invalidations.
348348
In this case, the output means that the new `f(x::String)` method triggered an invalidation of `applyf(::Array{Any,1})`,
349349
due to intersection with the signature `f(::Any)`.
350350
`(0 children)` means that `applyf(::Vector{Any})` does not yet have any methods that called it and which in turn need to be invalidated.
351-
Finally, `more specific` (which is printed in cyan) indicate that the new method was strictly more specific than the one that got invalidated.
351+
Finally, `more specific` (which is printed in cyan) indicates that the new method `f(::String)` was strictly more specific than the signature `f(::Any)` used by the `applyf` `MethodInstance` that got invalidated.
352352

353353
As we mentioned above, there are good reasons to think this invalidation is "necessary," meaning that it is an unavoidable consequence of the choices made to optimize runtime performance while also allowing one to dynamically extend functions.
354354
However, that doesn't mean there is nothing that you, as a developer, could do to eliminate this invalidation.
355355
Perhaps there is no real need to ever call `applyf` with a `Vector{Any}`;
356356
perhaps you can fix one of its upstream callers to supply a concretely-typed vector.
357+
Or perhaps you could define more `f` methods at the outset, so that Julia has a better understanding of the different types that `applyf` needs to handle.
357358
In some cases, though, you might really need to call `applyf` with a `Vector{Any}`, in which case the best choice is to accept this invalidation as necessary and move on.
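
The "concretely-typed vector" suggestion can be illustrated with a minimal sketch. The two `f` methods and the body of `applyf` are assumptions made for this example (the diff does not show them), and `c_any` and `c_union` are hypothetical names. The idea is that a `Vector{Union{Int,Bool}}` tells the compiler which `f` methods can apply, so a later `f(::String)` should not intersect with the calls it compiled; `@snoopr` can be used to check this in a given Julia version.

```julia
# Assumed definitions for illustration; the post's own bodies are not shown in this diff.
f(x::Int) = 1
f(x::Bool) = 2

function applyf(container)
    x1 = f(container[1])
    x2 = f(container[2])
    return x1 + x2
end

# Abstractly-typed container: elements could be anything, so the compiled
# code for applyf(::Vector{Any}) depends on every method of `f`.
c_any = Any[1, false]

# Concretely-typed alternative: only Int and Bool elements are possible.
c_union = Union{Int,Bool}[1, false]

result_any = applyf(c_any)
result_union = applyf(c_union)

# Adding a new method afterwards does not change either answer.
f(x::String) = 3
result_after = applyf(c_union)
```

Both containers give the same result; only the amount of information available to inference differs.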
358359

359360
### New methods with ambiguous specificity
@@ -389,7 +390,7 @@ julia> trees = invalidation_trees(@snoopr using FixedPointNumbers)
389390
```
390391

391392
This list is ordered from least- to most-consequential in terms of total number of invalidations.
392-
The final entry, for `(::Type{X})(x::Real) where X<:FixedPoint`, triggered the invalidation of what nominally appear to be more than 350 MethodInstances.
393+
The final entry, for `(::Type{X})(x::Real) where X<:FixedPoint`, triggered the invalidation of what nominally appear to be more than 350 `MethodInstance`s.
393394
(There is no guarantee that these methods are all disjoint from one another;
394395
the results are represented as a tree, where each node links to its callers.)
395396
In contrast, the first entry is responsible for just two invalidations.
@@ -450,8 +451,8 @@ Type{Union{}}
450451

451452
which shows that there is one Type, the "empty Type", that lies in their intersection.
452453

453-
There are good reasons to believe that the right way to fix such methods is to exclude ambiguous pairs from invalidation---if it were to be called by the compiled code, it would trigger an error anyway.
454-
If such a change gets made to Julia, then all the ones marked "ambiguous" should magically disappear.
454+
There are good reasons to believe that the right way to fix such methods is to exclude ambiguous pairs from invalidation--if it were to be called by the compiled code, it would trigger an error anyway.
455+
If this gets changed in Julia, then all the ones marked "ambiguous" should magically disappear.
455456
Consequently, we can turn our attention to other cases.
456457

457458
Let's look at the next item up the list:
@@ -537,9 +538,9 @@ In the statistics below, we'll lump partial specialization in with ambiguity.
537538

538539
### Some summary statistics
539540

540-
Let's go back to our table above, and augment it with "sources" of invalidation:
541+
Let's go back to our table above, and count the number of invalidations in each of these categories:
541542

542-
| Package | greater specificity | lesser specificity | ambiguity |
543+
| Package | more specific | less specific | ambiguous |
543544
|:------- | ------------------:| --------:| -----:|
544545
| Example | 0 | 0 | 0 |
545546
| Revise | 6 | 0 | 0 |
@@ -565,23 +566,22 @@ However, it appears that there will need to be a second round in which package d
565566
You may have noticed that two packages, `Example` and `Revise`, trigger far fewer invalidations than the rest of the packages in our analysis.
566567
`Example` is quite trivial, but `Revise` and its dependencies are quite large.
567568
How does it avoid this problem?
568-
First, Revise does not extending very many Base methods;
569-
most of its methods are to functions it "owns," and the same is true for its dependencies.
569+
First, Revise does not extend very many Base methods;
570+
most of its methods are for functions it "owns," and the same is true for its dependencies.
570571
Second, in the closing days of Julia 1.5's merge window,
571572
Revise (and Julia) underwent a process of tracking down invalidations and eliminating them;
572573
for comparison, on Julia 1.4, Revise triggers more than 1000 non-unique invalidations.
573574
The success of this effort gives one hope that other packages too may one day have fewer invalidations.
574575

575576
As stated above, there is reason to hope that most of the invalidations marked as "ambiguous" will be fixed by changes to Julia's compiler.
576-
Here our focus is on those marked "more specific," since those are cases where it is hard to imagine a generic fix.
577+
Here our focus is on those marked "more specific," since those are cases where it is harder to imagine a generic fix.
577578

578-
### Fixing a case of type-instability
579+
### Fixing type instabilities
579580

580-
In engineering Julia and Revise to reduce invalidations, at least two cases were fixed by resolving a type-instability.
581+
In engineering Julia and Revise to reduce invalidations, at least two cases were fixed by resolving type-instabilities.
581582
For example, one set of invalidations happened because `CodeTracking`, a dependency of Revise's, defines new methods for `Base.PkgId`.
582-
It turns out that this triggered an invalidation of `_tryrequire_from_serialized`, which is used to load packages;
583-
a negative consequence is that Revise introduced a slight latency upon loading the *next* package.
584-
However, it turned out to be an easy fix: one section of `_tryrequire_from_serialized` had a passage
583+
It turns out that this triggered an invalidation of `_tryrequire_from_serialized`, which is used to load packages.
584+
Fortunately, it turned out to be an easy fix: one section of `_tryrequire_from_serialized` had a passage
585585

586586
```
587587
for M in mod::Vector{Any}
@@ -601,9 +601,11 @@ It sufficed to add
601601
immediately after the `for` statement to fix the problem.
602602
Not only does this fix the invalidation, but it lets the compiler generate better code.
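
As a generic illustration of this kind of fix (not the actual code in `_tryrequire_from_serialized`), the sketch below type-asserts a loop variable drawn from a `Vector{Any}`. The function `count_even` and the choice of `Int` are hypothetical. After the assertion the compiler knows the element's type, so calls on it no longer depend on methods for unrelated types.

```julia
# Hypothetical example: iterate over an abstractly-typed vector,
# asserting the element type immediately after the `for` statement.
function count_even(items::Vector{Any})
    n = 0
    for x in items
        x = x::Int          # throws a TypeError if an element is not an Int
        n += iseven(x) ? 1 : 0
    end
    return n
end

n_even = count_even(Any[1, 2, 3, 4])
```

The trade-off is that the assertion turns an unexpected element type into an immediate error, which is only appropriate when the type really is guaranteed by the surrounding code.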
603603

604-
The other case was similar: a call from `Pkg` of `keys` on an AbstractDict of unknown type
605-
(due to a higher `@nospecialize` call).
606-
Replacing `keys(dct)` with `Base.KeySet(dct)` (which is the default consequence of calling `keys`) eliminated a very consequential invalidation, one that triggered seconds-long latencies in the next `Pkg` command after loading Revise.
604+
The other case was a call from `Pkg` of `keys` on an AbstractDict of unknown type
605+
(due to a caller's `@nospecialize` annotation).
606+
Replacing `keys(dct)` with `Base.KeySet(dct)` (which is the default return value of `keys`) eliminated a very consequential invalidation, one that triggered seconds-long latencies in the next `Pkg` command after loading Revise.
607+
The benefits of this change in Pkg's code went far beyond helping Revise; any package depending on the OrderedCollections package (which is a dependency of Revise and what actually triggered the invalidation) got the same benefit.
608+
With these and a few other relatively simple changes, loading Revise no longer forces Julia to recompile much of Pkg's code the next time you try to update packages.
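
A small sketch of the `keys` replacement described above. The helper `keycount` is hypothetical and is not Pkg's code; it assumes only that `keys` on a `Dict` returns a `Base.KeySet`, which the text describes as the default.

```julia
# Hypothetical helper: `dct` is deliberately left unspecialized, so the
# compiler only knows it is some AbstractDict.
function keycount(@nospecialize(dct::AbstractDict))
    # Constructing the KeySet directly avoids a dispatch on `keys`, whose
    # method table other packages may extend for their own dictionary types.
    ks = Base.KeySet(dct)
    return length(ks)
end

d = Dict(:a => 1, :b => 2)
default_is_keyset = keys(d) isa Base.KeySet
nkeys = keycount(d)
```

This only preserves behavior for dictionary types that rely on the default `keys`; a type with its own specialized `keys` method would be bypassed.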
607609

608610
### Redirecting call chains
609611

@@ -627,7 +629,7 @@ If you look up this definition, you'll see it's
627629
reduce_empty(op, T) = _empty_reduce_error()
628630
```
629631

630-
which indicates that it is the fallback method for reducing over an empty collection, and indeed calling this results in an error:
632+
which indicates that it is the fallback method for reducing over an empty collection, and as you might expect from the name, calling it results in an error:
631633

632634
```julia-repl
633635
julia> op = Base.BottomRF(Base.max)
@@ -642,7 +644,7 @@ Stacktrace:
642644
[4] top-level scope at REPL[36]:1
643645
```
644646

645-
This essentially means that no "identity element" has been defined for this operation and type.
647+
This essentially means that no "neutral element" has been defined for this operation and type.
646648

647649
Can we avoid this fallback?
648650
One approach is to define the method directly: modify Julia to add
@@ -678,7 +680,7 @@ julia> node = invtree.children[1]
678680
MethodInstance for reduce_empty_iter(::Base.BottomRF{typeof(max)}, ::Set{VersionNumber}, ::Base.HasEltype) at depth 1 with 38 children
679681
```
680682

681-
We can display the whole tree using `show(node)`:
683+
We can display this whole branch of the tree using `show(node)`:
682684

683685
```julia-repl
684686
julia> show(node)
@@ -751,8 +753,8 @@ Julia's remarkable flexibility and outstanding code-generation open many new hor
751753
These advantages come with a few costs, and here we've explored one of them, method invalidation.
752754
While Julia's core developers have been aware of its cost for a long time,
753755
we're only now starting to get tools to analyze it in a manner suitable for a larger population of users and developers.
754-
Because it's not been easy to measure previously, it would not be surprising if there are numerous opportunities to reduce it, waiting to be discovered.
755-
One might hope that the next period of development might see significant strides in new ways of getting packages to work together without stomping on each other's toes.
756+
Because it's not been easy to measure previously, it would not be surprising if there are numerous opportunities for improvement waiting to be discovered.
757+
One might hope that the next period of development might see significant improvement in getting packages to work together without stomping on each other's toes.
756758

757759
[Julia]: https://julialang.org/
758760
[union-splitting]: https://julialang.org/blog/2018/08/union-splitting/
@@ -761,4 +763,3 @@ One might hope that the next period of development might see significant strides
761763
[PRJulia]: https://github.com/JuliaLang/julia/pull/35768
762764
[PRSC]: https://github.com/timholy/SnoopCompile.jl/pull/79
763765
[method ambiguity]: https://docs.julialang.org/en/latest/manual/methods/#man-ambiguities-1
764-
[sentinel]: https://en.wikipedia.org/wiki/Sentinel_value
