Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.1.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from a source tarball.
Please describe the system on which you are running
- Operating system/version: CentOS 7.9
- Computer hardware: AMD EPYC Milan
- Network type: Mellanox ConnectX-6
Details of the problem
We are seeing problems with Open MPI 4.1.2 and UCX trying to call ibv_fork_init(). Originally, we were seeing occasional UCX warnings
ib_md.c:1161 UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
ib_md.c:1162 UCX WARN IB: data corruption might occur when using registered memory.
which under some circumstances led to aborts or hangs. As expected, setting UCX_IB_FORK_INIT=yes instead of the default try leads to reproducible aborts. Interestingly, the issue does not appear when using Open MPI 4.0.3 instead of 4.1.2. A bit of digging around brought me to openucx/ucx#686, where it says that UCX always uses ibv_fork_init(), but that it could happen that
OpenMPI is [...] using verbs without calling ibv_fork_init(), and then UCX fails when it calls ibv_fork_init()
So it seems there might be issues when verbs functions are called before UCX tries to initialize with ibv_fork_init(). Using a debugger with a breakpoint at ibv_open_device() showed me that this function is indeed called before UCX calls ibv_fork_init(). In fact, it is called by BTL/ofi, which is new in Open MPI 4.1.x; this in turn explains why we are not seeing the same issue with Open MPI 4.0.3.
Since we are relying on UCX and don't do one-sided communication, I don't think we are actually using BTL/ofi. Rather, it seems to be initialized during Open MPI's init process and then kept active. Disabling this component with -mca btl ^ofi avoids the UCX warnings as well as all downstream problems.
There are a number of other MCA components that call ibv_open_device() (e.g., BTL/usnic, MTL/ofi), but these do not lead to the same issues, possibly because they are not kept active during the application run.
While we can just disable BTL/ofi and should be all set for Open MPI 4.1.2, I am wondering about two things:
- How can we make sure that newly added MCA components do not lead to the same issue going forward?
- Could this be improved in future versions of Open MPI? If UCX tries to call ibv_fork_init(), which requires that no non-fork-safe IBV component is active, maybe there is a way to make sure UCX goes first (if present), since it has the strictest requirements? Alternatively, how about a "common" MCA parameter to enable fork-safe IBV initialization in all MCA components, which should work regardless of the initialization order?
Thanks,
Moritz