-
Notifications
You must be signed in to change notification settings - Fork 187
Expand precompilation #1015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand precompilation #1015
Conversation
src/precompile.jl
Outdated
precompile(Tuple{typeof(convert), Type{Integer}, PyObject}) | ||
precompile(Tuple{typeof(convert), Type{Float32}, PyObject}) | ||
precompile(Tuple{typeof(convert), Type{Float64}, PyObject}) | ||
precompile(Tuple{typeof(convert), Type{Array{T, N} where N where T}, PyObject}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this one does. Can that be done for all the array types, or does this precompile statement have no effect?
Do you notice any improvement in load times or time-to-first-X? |
It's roughly a 30% improvement in startup time on this quick benchmark which does some simple calls. Code and timings below.
This is the benchmarked code (in Python) from julia import Main as jl
from julia import LinearAlgebra
# Arrays:
x = jl.randn(jl.Float32, 10_000)
y = jl.randn(jl.Float32, 10_000)
# Functions:
LinearAlgebra.dot(x, y)
# String:
jl.s = "test"
# Tuple:
jl.t = (1, 2, 3)
# Dict:
jl.d = {"a": 1, "b": 2} Code:Run with:
from julia import Main as jl
from julia import LinearAlgebra
import time
def f():
# Arrays:
x = jl.randn(jl.Float32, 10_000)
y = jl.randn(jl.Float32, 10_000)
# Functions:
LinearAlgebra.dot(x, y)
# String:
jl.s = "test"
# Tuple:
jl.t = (1, 2, 3)
# Dict:
jl.d = {"a": 1, "b": 2}
start = time.time()
f()
end = time.time()
print(end - start, end=",")
start = time.time()
f()
end = time.time()
print(end - start, end="")
using Statistics
NTRIALS = parse(Int, ARGS[1])
function run_bench()
out = read(`python bench_pycall.py`, String)
out = split(out, ',')
return [parse(Float64, x) for x in out]
end
recordings = []
for i in 1:NTRIALS
push!(recordings, run_bench())
println("Trial $i done, with $(recordings[end])")
end
first_run = median(rec[1] for rec in recordings)
second_run = median(rec[2] for rec in recordings)
print("Median first: $(first_run) s, median second: $(second_run) s") |
I just added more precompilation calls to this PR using PyCall.jl's test suite. I now get a 38% speedup at that startup test. (now a 0.469 s first run, 0.071 s second run). To get the extra precompilation calls, I replaced By the way, I thought a bit more about how we could use SnoopCompile.jl instead, and think it will be difficult, because one needs to call |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1015 +/- ##
==========================================
- Coverage 68.36% 67.21% -1.16%
==========================================
Files 20 20
Lines 2039 2019 -20
==========================================
- Hits 1394 1357 -37
- Misses 645 662 +17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Looks like |
Thanks, I removed those lines and now it's working on Julia >=1.4. I will set that as the Project.toml bound to be safe. Implemented the CI changes too. Edit: Fixed by #1016. |
Okay it looks like the other errors are unrelated to this PR - venv test broken on master too: https://github.com/MilesCranmer/PyCall.jl/actions/runs/3606216417/jobs/6077346109. Let me know if the PR is good to go. |
@stevengj do you think this could be merged and released? It would be awesome to take full advantage of Julia 1.9 startup times on PyJulia-based apps. |
Should be auto merged when tests complete. |
I think auto-merge doesn't work when the CI is changed (Julia 1.0 was removed from the testing). The rest look like they pass though. |
Not sure why the tests are waiting forever to run? Seems like an unrelated CI problem. |
I think when a PR changes the CI settings (here, removing one of the Julia versions), then the other jobs are left as "waiting to complete." I've seen this on other PRs too. I think it is simply to prevent auto-merging when CI is accidentally turned off. |
This adds more extensive precompilation.
Eventually it would be better to switch to SnoopCompile.jl, but for some reason I kept running into segfaults with it when trying to snoop compile some example code. I think using SnoopCompile.jl with calls to the Python interpreter triggers some unexpected behavior, and will take some fixing before it works.
So in the meantime, this is something to improve the results of precompilation. As noted in MilesCranmer/PySR#230, it seems like PyCall.jl is a source of slow startup for Python-Julia sysimages, so these additional precompile statements should help quite a bit.
It was generated with
--trace-compile=stderr
on a run of PySR and then manually parsed down and sorted. Thus it does not hit all functions by any means, but the precompilation statements are general and should help any PyCall.jl package.Let me know what you think.
Cheers,
Miles