Deprecate (and remove) MPI_Accumulate

This was mentioned during the WG call today by @jeffhammond and after some thought I really like the idea, so I put down my thoughts on it.

`MPI_Accumulate` is the root of all evil when it comes to atomic operation performance in MPI. It allows users to mutate an unbounded number of elements with element-wise atomicity guarantees, and those guarantees span all accumulate functions (including single-element `MPI_Fetch_and_op`). No hardware in existence today (and likely none in the future) provides efficient atomic accumulation of more than a few elements at a time, so implementations are forced to fall back to software emulation to guarantee atomicity between `MPI_Accumulate` and `MPI_Fetch_and_op`. This prevents `MPI_Fetch_and_op` from making proper use of network hardware and has been a source of great frustration.

In essence, the MPI standard contains a function that prevents us from using low-level hardware features. It has spurred a line of proposals to mitigate its impact (https://github.com/mpiwg-rma/rma-issues/issues/8, https://github.com/mpi-forum/mpi-standard/pull/93) that went nowhere and are merely band-aids. It's also one of the main drivers for the new allocation function (https://github.com/mpiwg-rma/rma-issues/issues/22). Instead of spending another decade trying to overcome these shortcomings, we should remove multi-element accumulate.

*But I want to accumulate megabytes of data?!*

Sure, MPI RMA provides you with all the functions needed to implement get-reduce-put, with hardware support for the data movement. We also provide mutual exclusion. With continuations (https://github.com/mpiwg-hybrid/mpi-standard/pull/1), you could even do that without blocking on the get or put. Or you can implement something akin to active messages (AMs) using send/recv, if that fits your needs. A function that cannot make proper use of hardware capabilities (and inhibits other functions from doing so) has no place in an API that aims at exposing low-level hardware features. You wouldn't accept a language that cannot make use of CPU AMOs for any reasonable system-level coding, either.

*To summarize*
1) Deprecate `MPI_Accumulate`, `MPI_Raccumulate`, `MPI_Get_accumulate`, and `MPI_Rget_accumulate`.
2) Introduce request-based fetch-op (https://github.com/mpi-forum/mpi-standard/pull/107) to provide an alternative to `MPI_Rget_accumulate` for single elements.
3) To bridge the time until removal, add an info assertion that you won't use `MPI_Accumulate` anymore so that we can ignore it.
