Commit b8d6b6b

libnbc fix for iallreduce count*extent overflowing int
In the libnbc iallreduce ring algorithm at -np 4, if the datatype is MPI_LONG_LONG (8 bytes) and the count is around 1.5 billion, so that the buffer totals 12 GB, the byte offsets computed for some of the iterations overflowed int and went negative.

Gist for the testcase: https://gist.github.com/markalle/61e05fed6de4cd201d5e7d22b0c175a1

% mpicc -o x iallreduce_overflow.c
% mpirun -np 4 --mca coll_libnbc_iallreduce_algorithm 1 ./x 12000000000

The testcase picks a random number of bytes for the allreduce buffer if one isn't specified on the command line.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
1 parent 4ddb5a8 commit b8d6b6b

1 file changed (+5, -5)

ompi/mca/coll/libnbc/nbc_iallreduce.c

Lines changed: 5 additions & 5 deletions
@@ -9,7 +9,7 @@
  * reserved.
  * Copyright (c) 2014-2018 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
- * Copyright (c) 2017 IBM Corporation. All rights reserved.
+ * Copyright (c) 2017-2022 IBM Corporation. All rights reserved.
  * Copyright (c) 2018 FUJITSU LIMITED. All rights reserved.
  * $COPYRIGHT$
  *
@@ -768,9 +768,9 @@ allred_sched_ring(int r, int p,
   /* first p-1 rounds are reductions */
   for (int round = 0 ; round < p - 1 ; ++round) {
     int selement = (r+1-round + 2*p /*2*p avoids negative mod*/)%p; /* the element I am sending */
-    int soffset = segoffsets[selement]*ext;
+    size_t soffset = segoffsets[selement]*(size_t)ext;
     int relement = (r-round + 2*p /*2*p avoids negative mod*/)%p; /* the element that I receive from my neighbor */
-    int roffset = segoffsets[relement]*ext;
+    size_t roffset = segoffsets[relement]*(size_t)ext;
 
     /* first message come out of sendbuf */
     if (round == 0) {
@@ -807,9 +807,9 @@ allred_sched_ring(int r, int p,
   }
   for (int round = p - 1 ; round < 2 * p - 2 ; ++round) {
     int selement = (r+1-round + 2*p /*2*p avoids negative mod*/)%p; /* the element I am sending */
-    int soffset = segoffsets[selement]*ext;
+    size_t soffset = segoffsets[selement]*(size_t)ext;
     int relement = (r-round + 2*p /*2*p avoids negative mod*/)%p; /* the element that I receive from my neighbor */
-    int roffset = segoffsets[relement]*ext;
+    size_t roffset = segoffsets[relement]*(size_t)ext;
 
     res = NBC_Sched_send ((char *) recvbuf + soffset, false, segsizes[selement], datatype, speer,
                           schedule, false);
