Description
@jeffhammond brought this up during today's WG call, and after some thought I really like the idea, so I have written down my thoughts on it.
`MPI_Accumulate` is the root of all evil when it comes to the performance of atomic operations in MPI. It allows users to mutate an unbounded number of elements with element-wise atomicity guarantees, and those guarantees span all accumulate functions (including the single-element `MPI_Fetch_and_op`). No hardware available today (and likely none in the future) can efficiently accumulate more than a few elements at a time. This forces implementations to fall back to software emulation to guarantee atomicity between `MPI_Accumulate` and `MPI_Fetch_and_op`, which prevents `MPI_Fetch_and_op` from making proper use of network hardware and has been a source of great frustration.
In essence, the MPI standard contains a function that prevents us from using low-level hardware features. It has spurred a line of proposals to mitigate its impact (#8, https://github.com/mpi-forum/mpi-standard/pull/93) that went nowhere and are merely band-aids. It's also one of the main drivers for the new allocation function (#22). Instead of spending another decade trying to overcome these shortcomings, we should remove multi-element accumulate.
But I want to accumulate megabytes of data?!
Sure. MPI RMA provides all the functions needed to implement get-reduce-put, with the hardware supporting the data movement. We also provide mutual exclusion. With continuations (https://github.com/mpiwg-hybrid/mpi-standard/pull/1), you could even do that without blocking on the get or put. Or, if it fits your needs, you can implement something akin to active messages (AMs) using send/recv. A function that cannot make proper use of hardware capabilities, and prevents others from doing so, has no place in an API that aims to expose low-level hardware features. You wouldn't accept, for any serious systems programming, a language that cannot use the CPU's atomic memory operations (AMOs), either.
To summarize:
- Deprecate `MPI_Accumulate`, `MPI_Raccumulate`, `MPI_Get_accumulate`, and `MPI_Rget_accumulate`.
- Introduce request-based fetch-op (https://github.com/mpi-forum/mpi-standard/pull/107) to provide an alternative to `MPI_Rget_accumulate` for single elements.
- To bridge the time until removal, add an info assertion that you won't use `MPI_Accumulate` anymore so that we can ignore it.