
Commit 6754bf1

SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to the barrier function, but the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, so the barrier function used a memory location outside the pSync array. In particular, this location was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier algorithm so that it never completed: one PE could read 0 from its peer, assume the peer had already started the barrier, and write 1 to the peer. The peer would then enter the barrier, overwrite the 1 with 0, and wait forever to see 1 in its pSync.

Found with the shmem_verifier test suite.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
1 parent f2e6d78 commit 6754bf1
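
The hang described in the commit message can be replayed in a single process. The following is a minimal sketch, not the actual barrier code in Open MPI: the flag values 0 and 1 come from the commit message, while `SYNC_VALUE`, `try_signal`, `enter_barrier`, `is_released`, and `peer_completes` are hypothetical names and values chosen for illustration.

```c
#include <stdbool.h>

/* Stand-in for _SHMEM_SYNC_VALUE; the numeric value is an assumption. */
#define SYNC_VALUE (-1L)

/* Flag values taken from the commit message's description. */
enum { WAITING = 0, SIGNALLED = 1 };

/* Signalling PE: signal only if the peer appears to be waiting already. */
static bool try_signal(long *peer_word)
{
    if (*peer_word != WAITING) {
        return false;           /* peer has not entered the barrier yet */
    }
    *peer_word = SIGNALLED;
    return true;
}

/* Peer PE: announce arrival by writing WAITING into its own sync word. */
static void enter_barrier(long *own_word)
{
    *own_word = WAITING;
}

/* Peer PE: the barrier completes once the word reads SIGNALLED. */
static bool is_released(const long *own_word)
{
    return *own_word == SIGNALLED;
}

/*
 * Replays the interleaving from the commit message: the signalling PE
 * runs before the peer enters the barrier. Returns true if the peer is
 * released, false if it would wait forever.
 */
bool peer_completes(long initial_word)
{
    long peer_word = initial_word;

    bool sent = try_signal(&peer_word);  /* signaller runs first */
    enter_barrier(&peer_word);           /* peer arrives late */
    if (!sent) {
        sent = try_signal(&peer_word);   /* retry only if not yet signalled */
    }
    return is_released(&peer_word);
}
```

With a properly initialized word, `peer_completes(SYNC_VALUE)` returns true because the signaller waits for the peer to arrive. With a stale 0, as at the uninitialized location past the end of pSync, `peer_completes(0)` returns false: the early signal is overwritten and never resent.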


1 file changed: +1 -1 lines changed

oshmem/mca/scoll/basic/scoll_basic_alltoall.c

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ int mca_scoll_basic_alltoall(struct oshmem_group_t *group,
 
     /* Wait for operation completion */
     SCOLL_VERBOSE(14, "[#%d] Wait for operation completion", group->my_pe);
-    rc = BARRIER_FUNC(group, pSync + 1, SCOLL_DEFAULT_ALG);
+    rc = BARRIER_FUNC(group, pSync, SCOLL_DEFAULT_ALG);
 
     /* Restore initial values */
     SCOLL_VERBOSE(12, "PE#%d Restore special synchronization array",
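
The one-line change above comes down to a bounds question: how many elements remain in pSync after the offset is applied. A small sketch of that check follows, assuming the array size of 1 stated in the commit message; `sync_offset_is_valid` and its `needed` parameter are hypothetical and do not exist in the Open MPI source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value of _SHMEM_ALLTOALL_SYNC_SIZE as stated in the commit message. */
#define ALLTOALL_SYNC_SIZE ((size_t)1)

/*
 * True when (pSync + offset) still leaves at least `needed` elements
 * inside the caller's pSync array. `needed` is a hypothetical count of
 * elements the callee will touch.
 */
bool sync_offset_is_valid(size_t offset, size_t needed)
{
    return offset <= ALLTOALL_SYNC_SIZE &&
           needed <= ALLTOALL_SYNC_SIZE - offset;
}
```

Under these assumptions, offset 0 with one needed element passes, while offset 1 (the old `pSync + 1`) fails because no elements remain.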
