Commit 020e83f

pml/ob1: ensure RDMA fragments are released in the get -> send/recv fallback
Under a number of circumstances it may be necessary to abandon an RDMA get in ob1. In some cases it falls back to put, but it may also fall back to using send/recv. If that happens, we may either crash or leak RDMA fragments because they are still attached to the send request. Debug builds will crash due to a check on rdma_frag when they are returned.

This CL fixes the flaw by releasing any RDMA fragment when scheduling sends.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
1 parent 5f00259 commit 020e83f

1 file changed: +7 -0 lines changed

ompi/mca/pml/ob1/pml_ob1_sendreq.c

Lines changed: 7 additions & 0 deletions
@@ -22,6 +22,7 @@
  * Copyright (c) 2018-2019 Triad National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2022 IBM Corporation. All rights reserved.
+ * Copyright (c) 2024 Google, LLC. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -1110,6 +1111,12 @@ mca_pml_ob1_send_request_schedule_once(mca_pml_ob1_send_request_t* sendreq)
 
     range = get_send_range(sendreq);
 
+    if (NULL != sendreq->rdma_frag) {
+        /* this request was first attempted with RDMA but is now using send/recv */
+        MCA_PML_OB1_RDMA_FRAG_RETURN(sendreq->rdma_frag);
+        sendreq->rdma_frag = NULL;
+    }
+
     while(range && (false == sendreq->req_throttle_sends ||
                     sendreq->req_pipeline_depth < mca_pml_ob1.send_pipeline_depth)) {
         mca_pml_ob1_frag_hdr_t* hdr;
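
The release-and-clear pattern this hunk adds can be illustrated outside of Open MPI with a small, self-contained sketch. Everything below is hypothetical: the `toy_*` types, the outstanding-fragment counter, and the return-time check are stand-ins invented for illustration and are not ob1 APIs. The sketch only assumes what the commit message states, namely that a fragment left attached to a request is leaked or trips a check when the request is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not real ob1 types. */
typedef struct toy_frag {
    int in_use;
} toy_frag_t;

typedef struct toy_request {
    toy_frag_t *rdma_frag;  /* non-NULL while an RDMA get attempt is attached */
    size_t bytes_left;      /* payload still to be sent */
} toy_request_t;

/* Counts fragments handed out but not yet returned (a leak detector). */
static int toy_frags_outstanding = 0;

static toy_frag_t *toy_frag_alloc(void)
{
    toy_frag_t *frag = malloc(sizeof(*frag));
    if (NULL != frag) {
        frag->in_use = 1;
        ++toy_frags_outstanding;
    }
    return frag;
}

static void toy_frag_return(toy_frag_t *frag)
{
    assert(frag->in_use);   /* catches a double return */
    frag->in_use = 0;
    --toy_frags_outstanding;
    free(frag);
}

/* Send/recv scheduling path: drop any fragment left over from an
 * abandoned RDMA attempt before scheduling the sends. */
static void toy_schedule_once(toy_request_t *req)
{
    if (NULL != req->rdma_frag) {
        toy_frag_return(req->rdma_frag);
        req->rdma_frag = NULL;  /* prevents a later double return */
    }
    req->bytes_left = 0;        /* pretend the whole payload was sent */
}

/* Models the return-time check: 0 if the request is clean,
 * -1 if a fragment is still attached. */
static int toy_request_return(const toy_request_t *req)
{
    return (NULL == req->rdma_frag) ? 0 : -1;
}
```

In this toy model, a request that allocated a fragment and then went through `toy_schedule_once` ends with `rdma_frag == NULL` and an outstanding count of zero, so `toy_request_return` reports it as clean. Clearing the pointer right after the return is what keeps any later cleanup path from releasing the same fragment twice.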
