Commit a74f31d

OFI/MTL: bypass HMEM mem reg for CXI provider
The OFI MTL code that registers accelerator memory does not work well with providers such as CXI, which require the libfabric consumer to specify the requested_key field in the fi_mr_attr argument to fi_mr_regattr. There is a simple workaround for this provider, though: it does not actually need accelerator memory to be registered for FI_MSG/FI_TAGGED type transfers. This patch restores support for transfers to/from accelerator memory using the OFI MTL on SS11 (CXI provider).

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
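The gating behavior described in the commit message can be sketched in isolation as follows. This is only an illustration, not the Open MPI code: `SKETCH_CONVERTOR_ACCELERATOR`, `struct sketch_mtl_module`, and `sketch_should_register` are hypothetical stand-ins, and the flag value is an assumption. Only the `hmem_needs_reg` name and the shape of the condition come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the convertor flag bit; the real constant
 * is defined in the Open MPI headers and may have a different value. */
#define SKETCH_CONVERTOR_ACCELERATOR 0x1u

/* Hypothetical, reduced module state: only the flag this commit adds. */
struct sketch_mtl_module {
    bool hmem_needs_reg;
};

/* Returns true when a buffer should go through memory registration:
 * it must live in accelerator memory AND the provider must need it. */
static bool sketch_should_register(const struct sketch_mtl_module *mod,
                                   uint32_t convertor_flags)
{
    assert(mod != NULL);
    return (convertor_flags & SKETCH_CONVERTOR_ACCELERATOR) != 0
           && mod->hmem_needs_reg;
}
```

With `hmem_needs_reg` set to false, as the patch does for CXI, an accelerator buffer skips registration. Host buffers skip it regardless of the flag.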
1 parent 56e8c08 commit a74f31d

3 files changed: 18 additions, 2 deletions
ompi/mca/mtl/ofi/mtl_ofi.h

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
  * Copyright (c) 2013-2018 Intel, Inc. All rights reserved
  * Copyright (c) 2017 Los Alamos National Security, LLC. All rights
  * reserved.
- * Copyright (c) 2019-2021 Triad National Security, LLC. All rights
+ * Copyright (c) 2019-2022 Triad National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2018-2022 Amazon.com, Inc. or its affiliates. All Rights reserved.
  * reserved.
@@ -306,7 +306,7 @@ int ompi_mtl_ofi_register_buffer(struct opal_convertor_t *convertor,
         return OMPI_SUCCESS;
     }
 
-    if (convertor->flags & CONVERTOR_ACCELERATOR) {
+    if ((convertor->flags & CONVERTOR_ACCELERATOR) && ompi_mtl_ofi.hmem_needs_reg) {
         /* Register buffer */
         int ret;
         struct fi_mr_attr attr = {0};

ompi/mca/mtl/ofi/mtl_ofi_component.c

Lines changed: 13 additions & 0 deletions
@@ -777,6 +777,19 @@ ompi_mtl_ofi_component_init(bool enable_progress_threads,
             *accelerator_support = false;
         } else {
             *accelerator_support = true;
+            ompi_mtl_ofi.hmem_needs_reg = true;
+            /*
+             * Workaround for the fact that the CXI provider actually doesn't need for accelerator memory to be registered
+             * for local buffers, but if one does do so using fi_mr_regattr, one actually needs to manage the
+             * requested_key field in the fi_mr_attr attr argument, and the OFI MTL doesn't track which requested_keys
+             * have already been registered. So just set a flag to disable local registration. Note the OFI BTL doesn't
+             * have a problem here since it uses fi_mr_regattr only within the context of an rcache, and manages the
+             * requested_key field in this way.
+             */
+            if (!strncasecmp(prov->fabric_attr->prov_name, "cxi", 3)) {
+                ompi_mtl_ofi.hmem_needs_reg = false;
+            }
+
         }
 
     /**
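
The provider check in this hunk compares only the first three characters of the provider name, ignoring case. A minimal standalone sketch of that detection step follows. The helper name `sketch_hmem_needs_reg` is hypothetical and takes a plain string rather than the libfabric `fi_info` structure; `strncasecmp` is the POSIX function from `<strings.h>`.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations such as strncasecmp */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper mirroring the patch's check: registration of
 * accelerator memory is assumed necessary unless the provider name
 * starts with "cxi" (case-insensitive, first 3 characters only). */
static bool sketch_hmem_needs_reg(const char *prov_name)
{
    assert(prov_name != NULL);
    return strncasecmp(prov_name, "cxi", 3) != 0;
}
```

Because this is a prefix match, any name that begins with "cxi" in any letter case disables registration, not only the exact string "cxi".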

ompi/mca/mtl/ofi/mtl_ofi_types.h

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,8 @@
  * Copyright (c) 2013-2018 Intel, Inc. All rights reserved
  *
  * Copyright (c) 2015 Cisco Systems, Inc. All rights reserved.
+ * Copyright (c) 2022 Triad National Security, LLC. All rights
+ * reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -97,6 +99,7 @@ typedef struct mca_mtl_ofi_module_t {
 
     bool is_initialized;
     bool has_posted_initial_buffer;
+    bool hmem_needs_reg;
 
 } mca_mtl_ofi_module_t;
 