From 94cadce98af265e7fa058719f74b8cdad770bcca Mon Sep 17 00:00:00 2001
From: Valentin Churavy
Date: Mon, 31 Mar 2025 15:50:37 +0200
Subject: [PATCH 1/2] add note about CUDA.jl own use of threads

---
 docs/src/knownissues.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/docs/src/knownissues.md b/docs/src/knownissues.md
index 660651dfc..a73687a46 100644
--- a/docs/src/knownissues.md
+++ b/docs/src/knownissues.md
@@ -145,6 +145,26 @@ before calling `mpiexec`.
 
 ## CUDA-aware MPI
 
+[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) starts its own internal threads, thus the notes above apply.
+In particular you may see an error like:
+
+```
+Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x151a22497008)
+==== backtrace (tid: 226485) ====
+ 0 0x000000000003e730 __GI___sigaction() :0
+ 1 0x0000000001db5d4c julia_multiq_check_empty_75441() ./partr.jl:186
+ 2 0x000000000045dab8 jfptr_multiq_check_empty_75442.1() :0
+ 3 0x000000000004706e _jl_invoke() /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-10/src/gf.c:2895
+ 4 0x000000000009888e check_empty() /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-10/src/partr.c:348
+ 5 0x0000000002205ca8 julia_poptask_75610() ./task.jl:999
+ 6 0x0000000002205ca8 julia_poptask_75610() ./task.jl:1001
+ 7 0x0000000000b3b5c2 julia_wait_74913() ./task.jl:1008
+ 8 0x0000000001be39e3 julia_#wait#645_74932() ./condition.jl:130
+ 9 0x000000000004706e _jl_invoke() /cache/build/builder-amdci5-6/julialang/julia-release-1-dot-10/src/gf.c:2895
+10 0x0000000000035ca1 jlcapi_synchronization_worker_11948() text:0
+=================================
+```
+
 ### Memory pool
 
 Using CUDA-aware MPI on multi-GPU nodes with recent CUDA.jl may trigger (see [here](https://github.com/JuliaGPU/CUDA.jl/issues/1053#issue-946826096))

From 3e92e126ed7530cddfeb4f37229b2f7c666a54b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mos=C3=A8=20Giordano?= <765740+giordano@users.noreply.github.com>
Date: Mon, 31 Mar 2025 16:00:12 +0200
Subject: [PATCH 2/2] Update docs/src/knownissues.md

Co-authored-by: Michael Schlottke-Lakemper
---
 docs/src/knownissues.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/knownissues.md b/docs/src/knownissues.md
index a73687a46..155688ece 100644
--- a/docs/src/knownissues.md
+++ b/docs/src/knownissues.md
@@ -145,7 +145,7 @@ before calling `mpiexec`.
 
 ## CUDA-aware MPI
 
-[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) starts its own internal threads, thus the notes above apply.
+[CUDA.jl](https://github.com/JuliaGPU/CUDA.jl) starts its own internal threads, thus the notes in the section about [Multi-threading and signal handling](@ref) apply.
 In particular you may see an error like:
 
 ```