Labels
compiler:codegen (Generation of LLVM IR and native code)
Description
I recently observed a deadlock that seems to occur when we attempt to JIT-compile a function during the emission of Julia code.
LLVM.jl installs an error handler that roughly looks like this:
function handle_error(reason::Cstring)
    throw(LLVMException(unsafe_string(reason)))
end
function _install_handlers()
    handler = @cfunction(handle_error, Cvoid, (Cstring,))
    ccall((:LLVMInstallFatalErrorHandler, libllvm), Cvoid, (Ptr{Cvoid},), handler)
end
Using the profiler to get a backtrace:
cmd: /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/julia 18641 running 2 of 2
signal (10): User defined signal 1
unknown function (ip: 0x7c1c1496f10e)
pthread_mutex_lock at /usr/lib/libc.so.6 (unknown line)
__gthread_mutex_lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/x86_64-linux-gnu/bits/gthr-default.h:749 [inlined]
__gthread_recursive_mutex_lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/x86_64-linux-gnu/bits/gthr-default.h:811 [inlined]
lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/mutex:106 [inlined]
lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/bits/unique_lock.h:141 [inlined]
unique_lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/bits/unique_lock.h:71 [inlined]
Lock at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:42 [inlined]
getLock at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:69
jl_codegen_params_t at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jitlayers.h:258 [inlined]
_jl_compile_codeinst at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jitlayers.cpp:213
jl_generate_fptr_impl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jitlayers.cpp:528
jl_compile_method_internal at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gf.c:2534 [inlined]
jl_compile_method_internal at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gf.c:2421
_jl_invoke at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gf.c:2938 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/gf.c:3123
handle_error at /home/vchuravy/.julia/packages/LLVM/bzSzE/src/core/context.jl:168
jfptr_handle_error_5213 at /home/vchuravy/.julia/compiled/v1.11/LLVM/e8NBy_INkA2.so (unknown line)
jlcapi_handle_error_5773 at /home/vchuravy/.julia/compiled/v1.11/LLVM/e8NBy_INkA2.so (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.0 at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE.part.0 at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/vchuravy/.julia/juliaup/julia-1.11.0-beta1+0.x64.linux.gnu/bin/../lib/julia/libLLVM-16jl.so (unknown line)
add_output_impl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/aotcompile.cpp:1171
operator() at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/aotcompile.cpp:1477
operator() at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/bits/std_function.h:690 [inlined]
lambda_trampoline at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/aotcompile.cpp:1347
unknown function (ip: 0x7c1c14972559)
unknown function (ip: 0x7c1c149efa3b)
unknown function (ip: (nil))
unknown function (ip: 0x7c1c1496eebc)
unknown function (ip: 0x7c1c149740e2)
uv_thread_join at /workspace/srcdir/libuv/src/unix/thread.c:294
add_output<jl_dump_native_impl(void*, char const*, char const*, char const*, char const*, ios_t*, ios_t*, jl_emission_params_t*)::<lambda(llvm::Module&)> > at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/aotcompile.cpp:1485
operator()<jl_dump_native_impl(void*, char const*, char const*, char const*, char const*, ios_t*, ios_t*, jl_emission_params_t*)::<lambda(llvm::Module&)> > at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/aotcompile.cpp:1645 [inlined]
jl_dump_native_impl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/aotcompile.cpp:1790
ijl_write_compiler_output at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/precompile.c:168
ijl_atexit_hook at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/init.c:285
jl_repl_entrypoint at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:1060
main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x7c1c1490cccf)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
unknown function (ip: (nil))
My hypothesis is that the two locks involved are:
Lock at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:42
and
Lines 1785 to 1786 in 08e1fc0
auto lock = TSCtx.getLock();
auto dataM = data->M.getModuleUnlocked();
and that we end up re-using the context and therefore the lock.
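To make the suspected pattern concrete, here is a minimal pure-Julia sketch of it. It is not the actual code path: `ctx_lock`, `handler_can_compile`, and `emit_with_lock_held` are hypothetical stand-ins for the context lock, the error handler that triggers compilation, and the emitting thread that joins its workers. It uses `trylock` so that it reports the conflict instead of hanging.

```julia
# Hypothetical stand-in for the error handler: to "JIT compile" it needs the
# context lock. trylock is used so the sketch returns instead of blocking.
function handler_can_compile(ctx_lock::ReentrantLock)
    if trylock(ctx_lock)
        unlock(ctx_lock)
        return true
    end
    return false
end

# Hypothetical stand-in for the emitting thread: it takes the context lock,
# starts a worker, and waits for the worker while still holding the lock
# (analogous to uv_thread_join in the backtrace).
function emit_with_lock_held(ctx_lock::ReentrantLock)
    lock(ctx_lock) do
        worker = Threads.@spawn handler_can_compile(ctx_lock)
        fetch(worker)
    end
end

ctx_lock = ReentrantLock()           # assumed to model the shared context's lock
blocked = !emit_with_lock_held(ctx_lock)
println("handler blocked while emitter holds the lock: ", blocked)
```

If the handler called `lock` instead of `trylock`, the worker would wait for a lock that the emitter only releases after the worker finishes, so neither side could make progress. That is the shape of the deadlock I suspect here, assuming both sides really do share one context.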
@pchintalapudi any thoughts?