Commit 2acc4b7

btl tcp: Add workaround for "dropped connection" issue
Work around a race condition in the TCP BTL's proc setup code. The Cisco MTT results have been failing on TCP tests due to a "dropped connection" message some percentage of the time. Some digging shows that the issue occurs when multiple NICs are combined with multiple threads. The race is detailed in #3035 (comment).

This patch doesn't fix the race, but avoids it by forcing the MPI layer to complete all calls to add_procs across the entire job before any process leaves MPI_INIT. It also reduces the scalability of the TCP BTL by increasing start-up time, but that is better than hanging.

The long-term fix is to do all endpoint setup in the first call to add_procs for a given remote proc, removing the race. This patch is a workaround until that fix can be developed.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
1 parent 37a3f32 commit 2acc4b7
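
The race described in the commit message can be pictured with a small toy model. The sketch below is not Open MPI code: the `toy_*` names, the `known` table, and the fixed module and proc counts are all hypothetical, and the interleaving is replayed sequentially rather than with real threads. It only illustrates why a receive-side lookup can fail when per-module registration is not atomic across modules, and why finishing every registration first avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NUM_MODULES 2  /* assumption: one BTL module per NIC */
#define TOY_NUM_PROCS   4  /* assumption: small fixed job size */

/* known[m][p] is true once module m has finished registering proc p. */
static bool known[TOY_NUM_MODULES][TOY_NUM_PROCS];

static void toy_reset(void)
{
    memset(known, 0, sizeof(known));
}

/* Hypothetical stand-in for one per-module add_procs() call. */
static void toy_add_procs(int module, int proc)
{
    assert(module >= 0 && module < TOY_NUM_MODULES);
    assert(proc >= 0 && proc < TOY_NUM_PROCS);
    known[module][proc] = true;
}

/* Hypothetical stand-in for the receive-side lookup.
   Returns true if the connection is accepted, false if it is "dropped". */
static bool toy_recv_lookup(int module, int proc)
{
    assert(module >= 0 && module < TOY_NUM_MODULES);
    assert(proc >= 0 && proc < TOY_NUM_PROCS);
    return known[module][proc];
}

/* Interleaving the race allows: a connection aimed at module 1 arrives
   after module 0 has registered the proc but before module 1 has. */
static bool toy_racy_order(void)
{
    toy_reset();
    toy_add_procs(0, 3);
    bool accepted = toy_recv_lookup(1, 3);
    toy_add_procs(1, 3);
    return accepted;
}

/* Ordering the workaround aims for: every module registers the proc
   before any incoming connection is looked up. */
static bool toy_serialized_order(void)
{
    toy_reset();
    for (int m = 0; m < TOY_NUM_MODULES; m++) {
        toy_add_procs(m, 3);
    }
    return toy_recv_lookup(1, 3);
}
```

In this model `toy_racy_order()` reports a dropped connection while `toy_serialized_order()` accepts it, which mirrors the trade-off in the commit message: start-up waits for all registrations, but the lookup no longer sees a half-initialized state.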

1 file changed (+18, -0)
opal/mca/btl/tcp/btl_tcp_component.c

@@ -1300,6 +1300,24 @@ mca_btl_base_module_t** mca_btl_tcp_component_init(int *num_btl_modules,
         }
     }

+    /* Avoid a race in wire-up when using threads (progress or user)
+       and multiple BTL modules. The details of the race are in
+       https://github.com/open-mpi/ompi/issues/3035#issuecomment-429500032,
+       but the summary is that the lookup code in
+       component_recv_handler() below assumes that add_procs() is
+       atomic across all active TCP BTL modules, but in multi-threaded
+       code, that isn't guaranteed, because the locking is inside
+       add_procs(), and add_procs() is called once per module. This
+       isn't a proper fix, but will solve the "dropped connection"
+       problem until we can come up with a more complete fix to how we
+       initialize procs, endpoints, and modules in the TCP BTL. */
+    if (mca_btl_tcp_component.tcp_num_btls > 1 &&
+        (enable_mpi_threads || 0 < mca_btl_tcp_progress_thread_trigger)) {
+        for( i = 0; i < mca_btl_tcp_component.tcp_num_btls; i++) {
+            mca_btl_tcp_component.tcp_btls[i]->super.btl_flags |= MCA_BTL_FLAGS_SINGLE_ADD_PROCS;
+        }
+    }
+
 #if OPAL_CUDA_SUPPORT
     mca_common_cuda_stage_one_init();
 #endif /* OPAL_CUDA_SUPPORT */
