Is the error "Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2 CUDA-aware support is disabled." due to unavailability of module nv_peer_mem or nvidia-peermem in the nvidia-driver #13326

@Dcn303

Hello everyone, I am facing the error shown below:

An error occurred while trying to map in the address of a function.
Function Name: cuIpcOpenMemHandle_v2
Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.

I was trying to benchmark CUDA-aware openmpi-4.1.8 linked with CUDA-aware ucx-1.19.x, using the OSU benchmarks from https://mvapich.cse.ohio-state.edu/benchmarks/
Things I have done so far:

  1. Built the cuda-11.8 toolkit using gcc-8.2.0, then exported its lib64 and bin.
  2. Built ucx-1.19.x as CUDA-aware against that cuda-11.8, then exported its lib and bin (gcc-8.2.0 compiler used).
  3. Built openmpi-4.1.8 against cuda-11.8 to make it CUDA-aware, and also linked it with the CUDA-aware ucx-1.19.x (gcc-8.2.0 compiler used).
  4. Built the OSU benchmarks with that CUDA-aware openmpi-4.1.8 (linked with the CUDA-aware ucx-1.19.x) and with
    cuda-11.8 (gcc-8.2.0 compiler used).
  5. Picked osu_bw as the OSU program to benchmark;
    after executing it, I get the above error.
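
One way to double-check step 3 is to ask the Open MPI installation whether it was configured with CUDA support. The sketch below assumes that `ompi_info --parsable --all` from the openmpi-4.1.8 build is on PATH and that it prints a line containing `mpi_built_with_cuda_support:value:true` or `...:false`; the helper names `built_with_cuda` and `query_ompi_info` are made up for illustration.

```python
import subprocess


def built_with_cuda(ompi_info_output: str) -> bool:
    """Return True if parsable ompi_info output reports CUDA support."""
    key = "mpi_built_with_cuda_support:value:"
    for line in ompi_info_output.splitlines():
        if key in line:
            # The value is the last colon-separated field on the line.
            return line.strip().endswith(":true")
    return False  # key not found: treat as "not reported"


def query_ompi_info() -> str:
    """Run `ompi_info --parsable --all` (assumed to be on PATH)."""
    result = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `built_with_cuda(query_ompi_info())` on the build machine should then return a boolean. Note that this only reflects how the library was configured; the message in this report is printed at run time, so a `True` result here would not by itself rule the error out.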

One thing I noticed in the CUDA-aware ucx-1.19.x build is that the gdr_copy transport is missing,
though cuda_copy and cuda_ipc are listed when checking for CUDA support with "ucx_info -d | grep -i cuda".
I have heard that the gdr_copy transport should also be present if UCX is CUDA-aware,
and that this transport depends on a kernel module called nv_peer_mem or nvidia-peermem.
Later I found that my driver installation is missing the nv_peer_mem / nvidia-peermem module.
Could this also be the reason for the above error, i.e.

An error occurred while trying to map in the address of a function.
Function Name: cuIpcOpenMemHandle_v2
Error string: /lib64/libcuda.so.1: undefined symbol: cuIpcOpenMemHandle_v2
CUDA-aware support is disabled.
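
The message names a symbol that could not be resolved in /lib64/libcuda.so.1, which is the library shipped with the NVIDIA driver rather than part of the toolkit build described above. A minimal sketch for checking what that library actually exports is shown here; it relies on Python's `ctypes`, and the helper name `exports_symbol` is hypothetical. The library path and symbol names are taken from the error text.

```python
import ctypes


def exports_symbol(library: str, symbol: str) -> bool:
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return False  # library missing or not loadable
    # ctypes looks the name up in the loaded library on attribute access;
    # a missing symbol raises AttributeError, which hasattr() turns into False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # Path and symbol names copied from the error message in this report.
    for name in ("cuIpcOpenMemHandle", "cuIpcOpenMemHandle_v2"):
        print(name, exports_symbol("/lib64/libcuda.so.1", name))
```

If the unversioned name resolves but the `_v2` name does not, that might suggest the installed driver library is older than what the cuda-11.8 build expects. This is only a guess to narrow things down, not a confirmed diagnosis, and it does not by itself say anything about nv_peer_mem or nvidia-peermem.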

Thanks a lot for taking the time to read.
