Commit 393543b

Document compilecache race condition issue and fix (#399)
Addresses #398.
Parent: 33543a7

1 file changed (+36, -2 lines)

docs/src/knownissues.md

# Known issues

## Julia module precompilation

If multiple MPI ranks trigger Julia's module precompilation at the same time, a race condition can result in an error such as:

```
ERROR: LoadError: IOError: mkdir: file already exists (EEXIST)
Stacktrace:
[1] uv_error at ./libuv.jl:97 [inlined]
[2] mkdir(::String; mode::UInt16) at ./file.jl:177
[3] mkpath(::String; mode::UInt16) at ./file.jl:227
[4] mkpath at ./file.jl:222 [inlined]
[5] compilecache_path(::Base.PkgId) at ./loading.jl:1210
[6] compilecache(::Base.PkgId, ::String) at ./loading.jl:1240
[7] _require(::Base.PkgId) at ./loading.jl:1029
[8] require(::Base.PkgId) at ./loading.jl:927
[9] require(::Module, ::Symbol) at ./loading.jl:922
[10] include(::Module, ::String) at ./Base.jl:377
[11] exec_options(::Base.JLOptions) at ./client.jl:288
[12] _start() at ./client.jl:484
```

See [Julia pull request #30174](https://github.com/JuliaLang/julia/pull/30174) for more discussion of this problem. Pkg operations have similar issues; see [Pkg issue #1219](https://github.com/JuliaLang/Pkg.jl/issues/1219).

This can be worked around by either:

1. Triggering precompilation before launching MPI processes, for example:

   ```
   julia --project -e 'using Pkg; pkg"instantiate"'
   julia --project -e 'using Pkg; pkg"precompile"'
   mpiexec julia --project script.jl
   ```
2. Launching julia with the `--compiled-modules=no` option. This can result in much longer package load times.
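The first workaround above can be wrapped in a small launch script so that precompilation always finishes on a single process before any MPI rank starts. The following is a minimal POSIX `sh` sketch, not something shipped with MPI.jl: the function name `run_mpi_job`, the `JULIA`/`MPIEXEC` override variables and the use of `mpiexec -n` are assumptions, and it runs `Pkg.instantiate()` and `Pkg.precompile()` in one Julia process rather than two.

```shell
#!/bin/sh
# Hypothetical launcher sketch (not part of MPI.jl): precompile serially, then start the MPI job.
# JULIA and MPIEXEC may be set to point at specific executables; the defaults are assumptions.

run_mpi_job() {
    # $1: number of MPI ranks; remaining arguments: the Julia script and its arguments
    nranks="$1"
    shift
    julia_cmd="${JULIA:-julia}"
    mpiexec_cmd="${MPIEXEC:-mpiexec}"

    # Step 1: resolve dependencies and precompile on a single process,
    # so no two ranks race to create the same compile-cache directory.
    # If this step fails, return early and never start the MPI job.
    "$julia_cmd" --project -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()' || return 1

    # Step 2: launch the ranks; they can now load the cache files written
    # in step 1 instead of each trying to precompile.
    "$mpiexec_cmd" -n "$nranks" "$julia_cmd" --project "$@"
}
```

After sourcing the file (for example `. ./launch.sh`, a hypothetical file name), `run_mpi_job 4 script.jl` runs `script.jl` on 4 ranks. If precompiling up front is not possible, the second workaround corresponds to skipping step 1 and launching with `mpiexec julia --compiled-modules=no --project script.jl`, at the cost of longer package load times.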
## UCX

[UCX](https://www.openucx.org/) is a communication framework used by several MPI implementations.

### Memory cache

When used with CUDA, UCX intercepts `cudaMalloc` so it can determine whether the pointer passed to MPI is on the host (main memory) or the device (GPU). Unfortunately, there are several known issues with how this works with Julia:

- [UCX issue #5061](https://github.com/openucx/ucx/issues/5061)
- [UCX issue #4001](https://github.com/openucx/ucx/issues/4001) (fixed in UCX v1.7.0)

By default, MPI.jl disables this by setting
```
