Commit 9bcc823

osc/rdma: add local leader's pid in shm file name to make it unique
In osc/rdma, each communicator assigns one rank per node as the local leader. The local leader creates a shm file, and the other ranks in the same communicator on the same node attach to it.

For osc/rdma to work properly, the shm file name must be unique for each communicator/node pair. To achieve that, osc/rdma used the following shm file name:

    osc_rdma.<node_name>.<job_id>.<comm_id>

However, this name format did not achieve the goal, because comm_id is only unique at the process level. Different communicators can have the same comm_id as long as they do not share a process.

To address this issue, this patch adds the local leader's pid to the shm file name to make it unique.

Signed-off-by: Wei Zhang <wzam@amazon.com>
1 parent 855b523 commit 9bcc823

1 file changed: +2 -2 lines changed


ompi/mca/osc/rdma/osc_rdma_component.c

Lines changed: 2 additions & 2 deletions
@@ -650,9 +650,9 @@ static int allocate_state_shared (ompi_osc_rdma_module_t *module, void **base, s
 
     if (0 == local_rank) {
         /* allocate the shared memory segment */
-        ret = opal_asprintf (&data_file, "%s" OPAL_PATH_SEP "osc_rdma.%s.%x.%d",
+        ret = opal_asprintf (&data_file, "%s" OPAL_PATH_SEP "osc_rdma.%s.%x.%d.%d",
                              mca_osc_rdma_component.backing_directory, ompi_process_info.nodename,
-                             OMPI_PROC_MY_NAME->jobid, ompi_comm_get_cid(module->comm));
+                             OMPI_PROC_MY_NAME->jobid, ompi_comm_get_cid(module->comm), getpid());
         if (0 > ret) {
             ret = OMPI_ERR_OUT_OF_RESOURCE;
         } else {
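
The effect of the new format string can be illustrated outside Open MPI with a minimal, standalone sketch. It is not the osc/rdma code: `build_shm_name` is a hypothetical helper, plain `snprintf` stands in for `opal_asprintf`, a literal `/` stands in for `OPAL_PATH_SEP`, and the values in the usage note are made up. The patch passes `getpid()` directly to `%d`; the sketch casts to `long` and uses `%ld` instead, as one portable alternative, since the width of `pid_t` is implementation-defined.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not part of Open MPI: writes a name in the
 * post-patch format <dir>/osc_rdma.<node>.<jobid>.<cid>.<pid> into buf.
 * Returns 0 on success, -1 on an encoding error or if buf is too small. */
static int build_shm_name(char *buf, size_t len, const char *backing_dir,
                          const char *nodename, unsigned int jobid,
                          int cid, pid_t leader_pid)
{
    /* Cast the pid to long so the argument matches %ld on every platform. */
    int n = snprintf(buf, len, "%s/osc_rdma.%s.%x.%d.%ld",
                     backing_dir, nodename, jobid, cid, (long) leader_pid);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With assumed inputs such as directory `/dev/shm`, node `node0`, job id `0x1a2b`, and communicator id `3`, two local leaders with pids 1001 and 1002 would get `/dev/shm/osc_rdma.node0.1a2b.3.1001` and `/dev/shm/osc_rdma.node0.1a2b.3.1002`. Under the old four-field format both would have produced the same name. In real use the local leader would pass `getpid()` as the last argument.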
