Permute (distributed): memory allocation at schedule-time represents a limiting factor for bigger runs #855

@albestro

The current implementation of distributed Permute, i.e. permuteOnCPU, allocates memory at schedule-time for:

  • inverse_perms
  • local2global_index
  • packing_index
  • unpacking_index
  • mat_send
  • mat_recv

The index buffers are a waste but not really a concern. mat_send and mat_recv, however, are more "expensive" and might become a limiting factor for big runs.

For example, the D&C tridiagonal solver calls this function multiple times on different sub-matrices obtained by splitting the original one in different ways. This results in allocating a lot of "support memory" all at once at schedule-time, when it could, at least in principle, be allocated:

  • just when needed (at runtime),
  • or, even better, re-used among different calls, which will not run in parallel anyway due to the nature of the algorithm.
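
To make the difference concrete, here is a minimal sketch in plain standard C++ (it does not use the DLA-Future API). It compares the peak "support memory" of the two strategies and shows a grow-only workspace that could be shared among calls. PermuteCall, supportBytes, peakEager, peakShared and Workspace are hypothetical names introduced only for illustration. The sketch also assumes that mat_send and mat_recv each have the shape of the local sub-matrix and hold doubles.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one scheduled permutation call: only the shape of
// the local sub-matrix matters for this estimate.
struct PermuteCall {
  std::size_t rows;
  std::size_t cols;
};

// Bytes needed by mat_send plus mat_recv for one call, ASSUMING both have the
// same shape as the local sub-matrix and store doubles.
inline std::size_t supportBytes(const PermuteCall& c) {
  return 2 * c.rows * c.cols * sizeof(double);
}

// Schedule-time allocation: every call owns its buffers as soon as it is
// scheduled, so all of them are alive at the same time.
inline std::size_t peakEager(const std::vector<PermuteCall>& calls) {
  std::size_t total = 0;
  for (const auto& c : calls)
    total += supportBytes(c);
  return total;
}

// Shared workspace: calls that never overlap can reuse one allocation, so the
// peak is the largest single request.
inline std::size_t peakShared(const std::vector<PermuteCall>& calls) {
  std::size_t peak = 0;
  for (const auto& c : calls)
    peak = std::max(peak, supportBytes(c));
  return peak;
}

// Hypothetical grow-only workspace: memory is allocated on first use (at
// runtime) and reallocated only when a later call asks for more elements.
class Workspace {
public:
  double* acquire(std::size_t n) {
    if (buffer_.size() < n)
      buffer_.resize(n);
    return buffer_.data();
  }

  std::size_t capacityBytes() const {
    return buffer_.size() * sizeof(double);
  }

private:
  std::vector<double> buffer_;
};
```

For three calls with local shapes 4x4, 2x2 and 2x2, peakEager accounts for 48 doubles while peakShared accounts for 32, and the gap grows with the number of scheduled calls. In an asynchronous setting such a workspace would have to be threaded through the task dependencies so that consecutive calls are ordered. This relies on the observation above that these calls do not run in parallel anyway.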
