The current implementation of distributed Permute, i.e. permuteOnCPU, allocates memory at schedule time for:
- `inverse_perms`
- `local2global_index`
- `packing_index`
- `unpacking_index`
- `mat_send`
- `mat_recv`
While the index buffers are a minor waste and not really a concern, `mat_send` and `mat_recv` are more "expensive" and might become a limiting factor for big runs.
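
For illustration, here is a minimal sketch of the eager pattern described above (hypothetical names and sizing, not the actual permuteOnCPU code): every buffer, including the two matrix-sized ones, is allocated when the operation is scheduled rather than when it runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the eager pattern: all buffers are allocated up
// front at schedule time, even though mat_send / mat_recv are only needed
// once the permutation actually runs.
struct PermuteWorkspaceEager {
  std::vector<int>    inverse_perms;
  std::vector<int>    local2global_index;
  std::vector<int>    packing_index;
  std::vector<int>    unpacking_index;
  std::vector<double> mat_send;  // sized like the local sub-matrix: the expensive part
  std::vector<double> mat_recv;  // same size as mat_send

  PermuteWorkspaceEager(std::size_t n_indices, std::size_t local_matrix_elems)
      : inverse_perms(n_indices),
        local2global_index(n_indices),
        packing_index(n_indices),
        unpacking_index(n_indices),
        mat_send(local_matrix_elems),
        mat_recv(local_matrix_elems) {}
};
```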
For example, the D&C tridiagonal solver calls this function multiple times on different sub-matrices obtained by splitting the original matrix in different ways. This results in allocating a lot of "support memory" all at once at schedule time, when it could, at least in principle, be allocated:
- just when needed (at runtime),
- and, even better, re-used among different calls, which will not run in parallel anyway due to the nature of the algorithm (see the sketch below).
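
One possible shape for both points, sketched under the assumption that the calls are strictly sequential (names such as `Workspace` and `permuteSubMatrix` are illustrative only, not existing API): a caller-owned workspace that grows lazily on first use and is passed from one call to the next.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of lazy allocation plus reuse across sequential calls.
class Workspace {
 public:
  // Return a buffer of at least `elems` doubles, growing the backing
  // storage only when a larger request than any previous one arrives.
  std::vector<double>& sendBuffer(std::size_t elems) { return grow(send_, elems); }
  std::vector<double>& recvBuffer(std::size_t elems) { return grow(recv_, elems); }

 private:
  static std::vector<double>& grow(std::vector<double>& buf, std::size_t elems) {
    if (buf.size() < elems) buf.resize(elems);  // allocate only when actually needed
    return buf;
  }
  std::vector<double> send_;
  std::vector<double> recv_;
};

// Each call borrows mat_send / mat_recv from the caller-owned workspace
// instead of allocating its own buffers at schedule time.
void permuteSubMatrix(/* sub-matrix view, permutation, ... */ Workspace& ws,
                      std::size_t local_matrix_elems) {
  auto& mat_send = ws.sendBuffer(local_matrix_elems);
  auto& mat_recv = ws.recvBuffer(local_matrix_elems);
  // ... pack into mat_send, communicate, unpack from mat_recv ...
  (void)mat_send;
  (void)mat_recv;
}
```

With this pattern the D&C solver would end up with at most one pair of send/receive buffers, sized for the largest sub-matrix it ever permutes, instead of one pair per scheduled call.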