Commit 9afcb8e

Merge pull request #8113 from rajachan/empty-error-cq
mtl/ofi: Do not fail if error CQ is empty
2 parents 47c03bc + 415dddb commit 9afcb8e

1 file changed: +11 -0 lines changed

ompi/mca/mtl/ofi/mtl_ofi.h

Lines changed: 11 additions & 0 deletions
@@ -137,6 +137,17 @@ ompi_mtl_ofi_context_progress(int ctxt_id)
                                 &error,
                                 0);
             if (0 > ret) {
+                /*
+                 * In multi-threaded scenarios, any thread that attempts to read
+                 * a CQ when there's a pending error CQ entry gets an
+                 * -FI_EAVAIL. Without any serialization here (which is okay,
+                 * since libfabric will protect access to critical CQ objects),
+                 * all threads proceed to read from the error CQ, but only one
+                 * thread fetches the entry while others get -FI_EAGAIN
+                 * indicating an empty queue, which is not erroneous.
+                 */
+                if (ret == -FI_EAGAIN)
+                    return count;
                 opal_output(0, "%s:%d: Error returned from fi_cq_readerr: %s(%zd).\n"
                                "*** The Open MPI OFI MTL is aborting the MPI job (via exit(3)).\n",
                                __FILE__, __LINE__, fi_strerror(-ret), ret);
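
The behavior described in the comment can be sketched in isolation, without libfabric. The following is only an illustration under stated assumptions: `mock_err_cq`, `mock_cq_readerr`, `read_error_cq`, and the `MOCK_*` constants are hypothetical stand-ins invented for this sketch, not part of libfabric or Open MPI, and the "threads" are modeled as successive calls rather than real concurrency. It shows a reader that finds the error queue already drained treating that as a non-error, while any other negative return is still fatal.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for -FI_EAGAIN and a generic failure code;
 * the numeric values are illustrative only. */
#define MOCK_EAGAIN 11
#define MOCK_EIO     5

/* Hypothetical mock of an error completion queue. */
struct mock_err_cq {
    int pending;   /* number of error entries waiting to be read */
    int broken;    /* nonzero simulates a genuine read failure   */
};

/* Mimics the return convention described in the commit comment:
 * 1 when an entry is fetched, -MOCK_EAGAIN when the queue is empty,
 * some other negative value on a real failure. */
static long mock_cq_readerr(struct mock_err_cq *cq)
{
    assert(cq != NULL);
    if (cq->broken)
        return -MOCK_EIO;
    if (cq->pending == 0)
        return -MOCK_EAGAIN;
    cq->pending--;
    return 1;
}

enum err_cq_status { ERR_CQ_EMPTY, ERR_CQ_ENTRY, ERR_CQ_FAILED };

/* Reads the error queue once. An empty queue means another reader
 * already consumed the entry, so it is reported as ERR_CQ_EMPTY
 * rather than as a failure. */
static enum err_cq_status read_error_cq(struct mock_err_cq *cq)
{
    long ret = mock_cq_readerr(cq);
    if (ret < 0) {
        if (ret == -MOCK_EAGAIN)
            return ERR_CQ_EMPTY;
        fprintf(stderr, "error CQ read failed: %ld\n", ret);
        return ERR_CQ_FAILED;
    }
    return ERR_CQ_ENTRY;
}
```

With one pending entry, the first call to `read_error_cq` returns `ERR_CQ_ENTRY` and the second returns `ERR_CQ_EMPTY`, which corresponds to the early `return count;` added by this commit. Setting `broken` makes the call return `ERR_CQ_FAILED`, which corresponds to the path that still falls through to the abort message.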
