
Commit a7dcfd9

btl/ofi: Disable EFA provider in versions earlier than libfabric 1.12.0
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric versions. Although the provider advertises FI_DELIVERY_COMPLETE, its completions are reported too early because they do not account for bounce buffers on the receive side. The BTL would therefore receive early completions, leading to correctness issues. The mtl/ofi is not affected because it does not require FI_DELIVERY_COMPLETE.

Signed-off-by: William Zhang <wilzhang@amazon.com>
1 parent 41df122 commit a7dcfd9


1 file changed (+11, -0 lines)


opal/mca/btl/ofi/btl_ofi_component.c

Lines changed: 11 additions & 0 deletions
```diff
@@ -59,6 +59,17 @@ static int validate_info(struct fi_info *info, uint64_t required_caps)
 
     BTL_VERBOSE(("validating device: %s", info->domain_attr->name));
 
+    /* EFA does not fulfill FI_DELIVERY_COMPLETE requirements in prior libfabric
+     * versions. The prov version is set as:
+     * FI_VERSION(FI_MAJOR_VERSION * 100 + FI_MINOR_VERSION, FI_REVISION_VERSION * 10)
+     * Thus, FI_VERSION(112,0) corresponds to libfabric 1.12.0
+     */
+    if (!strncasecmp(info->fabric_attr->prov_name, "efa", 3)
+        && FI_VERSION_LT(info->fabric_attr->prov_version, FI_VERSION(112,0))) {
+        BTL_VERBOSE(("unsupported libfabric efa version"));
+        return OPAL_ERROR;
+    }
+
     /* we need exactly all the required bits */
     if ((info->caps & required_caps) != required_caps) {
         BTL_VERBOSE(("unsupported caps"));
```
