Hello, thanks for the great work!
I'm currently trying to wrap a PyTorch model in a Flux-based training setup.
Training runs fine for a few epochs, but then a segmentation fault occurs, seemingly at random (see below).
I don't have a good MWE yet (I'll still try to make one), but perhaps we can already draw some conclusions from the stack trace, which in this case appeared after about seven epochs:
[56770] signal (11.1): Segmentation fault
in expression starting at /home/romeo/Documents/Stanford/google_ood/DisentanglingVAE.jl/scripts/vae_CUB.jl:213
PyErr_Occurred at /usr/lib/libpython3.10.so.1.0 (unknown line)
pyerr_occurred at /home/romeo/.julia/packages/PyCall/twYvK/src/exception.jl:69 [inlined]
pyerr_check at /home/romeo/.julia/packages/PyCall/twYvK/src/exception.jl:75 [inlined]
############# LOOK HERE vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
share at /home/romeo/.julia/packages/DLPack/SUhao/src/pycall.jl:109
#13 at /home/romeo/.julia/packages/PyCallChainRules/YR5iR/src/pytorch.jl:59
#########################^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
unknown function (ip: 0x7ff3e1725d52)
map at ./tuple.jl:292
unknown function (ip: 0x7ff3e1723e23)
_jl_invoke at /cache/build/default-amdci4-7/julialang/julia-release-1-dot-9/src/gf.c:2681 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-7/julialang/julia-release-1-dot-9/src/gf.c:2863
#rrule#12 at /home/romeo/.julia/packages/PyCallChainRules/YR5iR/src/pytorch.jl:59
rrule at /home/romeo/.julia/packages/PyCallChainRules/YR5iR/src/pytorch.jl:56 [inlined]
rrule at /home/romeo/.julia/packages/ChainRulesCore/a4mIA/src/rules.jl:134 [inlined]
chain_rrule at /home/romeo/.julia/packages/Zygote/xGkZ5/src/compiler/chainrules.jl:218 [inlined]
macro expansion at /home/romeo/.julia/packages/Zygote/xGkZ5/src/compiler/interface2.jl:0 [inlined]
_pullback at /home/romeo/.julia/packages/Zygote/xGkZ5/src/compiler/interface2.jl:9
unknown function (ip: 0x7ff3e1723a4d)
_jl_invoke at /cache/build/default-amdci4-7/julialang/julia-release-1-dot-9/src/gf.c:2681 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-7/julialang/julia-release-1-dot-9/src/gf.c:2863
_pullback at /home/romeo/Documents/Stanford/google_ood/DisentanglingVAE.jl/scripts/vae_CUB.jl:166 [inlined]
Here are the code snippets referenced in the stack trace:
PyCallChainRules.jl/src/pytorch.jl
Lines 56 to 64 in 1723781
function ChainRulesCore.rrule(wrap::TorchModuleWrapper, args...; kwargs...)
    T = typeof(first(wrap.params))
    params = wrap.params
    pyparams = Tuple(map(x -> DLPack.share(x, PyObject, pyfrom_dlpack).requires_grad_(true), params))
    pyargs = fmap(x -> DLPack.share(x, PyObject, pyfrom_dlpack).requires_grad_(true), args)
    torch_primal, torch_vjpfun = functorch.vjp(py"buffer_implicit"(wrap.torch_stateless_module, wrap.buffers), pyparams, pyargs...; kwargs...)
    project = ProjectTo(args)
    function TorchModuleWrapper_pullback(Δ)
and
https://github.com/pabloferz/DLPack.jl/blob/61f48ee6b5e4f56d9b8525fa6ef9b613242160b8/src/pycall.jl#L98-L116