
Conversation

Collaborator

@fritzgoebel fritzgoebel commented Oct 31, 2024

This PR adds a new distributed matrix format useful for domain decomposition preconditioners such as BDDC. It is in preparation for eventually adding BDDC to Ginkgo.

Instead of assembling the global matrix, each rank stores its local contribution to the global matrix in a globally non-assembled state, where degrees of freedom on the subdomain interfaces are "shared" between multiple ranks. The global matrix application uses a restriction matrix R that maps into an enriched space in which each shared degree of freedom appears with the multiplicity of the ranks sharing it. In this space, the local contributions can be applied independently of each other before R^T is applied to sum up the local parts of the result vector.
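
Written as a formula, the global operator application described above is (a sketch of the idea; A_p denotes rank p's local contribution and A_BD the block-diagonal operator formed from them):

```
y = R^T * A_BD * R * x,   with A_BD = blockdiag(A_1, ..., A_P)
```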

This PR relies on:

TODO:

  • General Tests for matrix setup

@fritzgoebel fritzgoebel added is:new-feature A request or implementation of a feature that does not exist yet. type:matrix-format This is related to the Matrix formats 1:ST:WIP This PR is a work in progress. Not ready for review. is:experimental This is an experimental feature/PR/issue/module. mod:all This touches all Ginkgo modules. type:distributed-functionality labels Oct 31, 2024
@fritzgoebel fritzgoebel self-assigned this Oct 31, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers labels Oct 31, 2024
@fritzgoebel fritzgoebel marked this pull request as draft October 31, 2024 12:05
@fritzgoebel fritzgoebel changed the base branch from develop to dd_base October 31, 2024 13:54
@pratikvn pratikvn self-requested a review January 22, 2025 13:29
@MarcelKoch MarcelKoch added this to the Ginkgo 1.10.0 milestone Mar 13, 2025
@fritzgoebel fritzgoebel changed the base branch from dd_base to develop March 17, 2025 17:57
@fritzgoebel fritzgoebel marked this pull request as ready for review March 17, 2025 17:58
@MarcelKoch MarcelKoch self-requested a review March 18, 2025 07:59
Member

@pratikvn pratikvn left a comment

Nice work on the documentation! Mostly looks good to me. A few comments.

Comment on lines 383 to 387
check_and_adjust_buffer_size(const size_type nrhs) const
{
auto exec = this->get_executor();
auto comm = this->get_communicator();
}
Member

Seems to be incomplete? It does not use nrhs.

GKO_ASSERT_ARRAY_EQ(d_non_local_col_idxs, non_local_col_idxs);
}

std::default_random_engine engine;
Member

Not needed here.

TupleTypenameNameGenerator);


TYPED_TEST(DdMatrix, ReadsDistributed)
Member

I think tests for different row and col partitions would also be useful to have.

* 1 | 0 1 ! 1 0 | | 1 0 | | 1 |
* 1 | 0 0 ! 0 1 | | 0 1 | | 0 |
* ```
* With these operators and ablock diagonal 4x4 matrix A_BD
Member

Suggested change
* With these operators and ablock diagonal 4x4 matrix A_BD
* With these operators and a block diagonal 4x4 matrix A_BD

Comment on lines 89 to 95
* second and third, this would lead to a restriction operator R
* ```
* Part-Id Global Local Non-Local
* 0 | 1 0 ! 0 | | 1 0 | | |
* 0 | 0 1 ! 0 | | 0 1 | | |
* |---------| ---->
* 1 | 0 1 ! 0 | | 0 | | 1 |
* 1 | 0 0 ! 1 | | 1 | | 0 |
Member

I think this would be the prolongation operator, right? It's mapping from a coarser space to a finer space with a left apply.

Collaborator Author

It restricts the global vector to the local problems. Agreed, the global space for the block-diagonal matrix is larger, but for each rank it's a restriction. We can discuss this though if you disagree.
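
A small worked illustration of the quoted 4x3 operator R (a sketch based on the example above, where x1 is the degree of freedom shared between parts 0 and 1):

```
R   * (x0, x1, x2)^T     = (x0, x1, x1, x2)^T    duplicates the shared entry x1
R^T * (y0, y1, y2, y3)^T = (y0, y1 + y2, y3)^T   sums the two contributions to the shared entry
```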

Comment on lines 211 to 213
* by the process. The local matrix still considers these and the
* restriction and prolongation operators take care of fetching /
* re-distributing the corresponding vector entries.
Member

Suggested change
* by the process. The local matrix still considers these and the
* restriction and prolongation operators take care of fetching /
* re-distributing the corresponding vector entries.
* by the process. The local matrix still considers these and the
* restriction and prolongation operators take care of fetching /
* re-distributing the corresponding vector entries.

* @param data The device_matrix_data structure.
* @param partition The global left and right partition.
*
* @return the index_map induced by the partitions and the matrix structure
Member

Does not return anything. Same for all read_distributed functions below.

@fritzgoebel
Collaborator Author

format!

Co-authored-by: fritzgoebel <fritz.goebel@tum.de>
Member

@pratikvn pratikvn left a comment

Some questions and minor issues.

Comment on lines +116 to +139
* auto part = Partition<...>::build_from_mapping(...);
* auto mat = Matrix<...>::create(exec, comm);
* mat->read_distributed(matrix_data, part);
* ```
* This will set the dimensions of the global and local matrices and generate
* the restriction and prolongation matrices automatically by deducing the sizes
* from the partition.
*
* By default the Matrix type uses Csr for the local matrix and the storage of
* the local and non-local parts of the restriction and prolongation matrices.
* It is possible to explicitly change the datatype for the local matrices, with
* the constraint that the new type should implement the LinOp and
* ReadableFromMatrixData interface. The type can be set by:
* ```
* auto mat = Matrix<ValueType, LocalIndexType[, ...]>::create(
* exec, comm,
* Coo<ValueType, LocalIndexType>::create(exec).get());
* ```
* Alternatively, the helper function with_matrix_type can be used:
* ```
* auto mat = Matrix<ValueType, LocalIndexType>::create(
* exec, comm,
* with_matrix_type<Coo>());
* ```
Member

This needs to be updated? It still uses Matrix instead of DdMatrix.
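
For illustration, the first quoted snippet would presumably read as follows with the new class name (a sketch, assuming DdMatrix keeps the create/read_distributed interface used in the quoted documentation):

```
auto part = Partition<...>::build_from_mapping(...);
auto mat = DdMatrix<...>::create(exec, comm);
mat->read_distributed(matrix_data, part);
```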

Comment on lines +144 to +145
* experimental::distributed::Matrix *A; // distributed matrix
* experimental::distributed::Vector *b, *x; // distributed multi-vectors
Member

Same here

*/
template <typename ValueType = default_precision,
typename LocalIndexType = int32, typename GlobalIndexType = int64>
class DdMatrix
Member

I feel that DdMatrix might be a bit vague. What do you think about UnassembledMatrix or maybe SubstructuredMatrix?

Comment on lines +136 to +137
data, make_temporary_clone(exec, partition).get(),
make_temporary_clone(exec, partition).get(), local_part,
Member

Do you still need both row and column partitions here? The kernel can probably be simplified to use just one partition.

auto input = gko::test::generate_random_device_matrix_data<
value_type, global_index_type>(
num_rows, num_cols,
std::uniform_int_distribution<int>(static_cast<int>(num_cols - 1),
Member

Did you mean to have fully dense rows here?

Comment on lines +245 to +246
std::uniform_int_distribution<int>(static_cast<int>(num_cols),
static_cast<int>(num_cols)),
Member

Same question here

}


TYPED_TEST(DdMatrix, CanApplyToSingleVector)
Member

Probably also makes sense to add some apply tests to ensure exceptions are thrown in case of dimension mismatches.

Member

@MarcelKoch MarcelKoch left a comment

Looks mostly good. The most important remarks are about clarifying the documentation a bit more.

* restriction and prolongation operators take care of fetching /
* re-distributing the corresponding vector entries.
*
* @param data The device_matrix_data structure.
Member

It should be noted somewhere that the data has to use global indexing. Maybe you could also add a small note on how to get to global indexing via the index map if only local indexing is available.

Comment on lines +73 to +74
* operator, which are distributed matrices
* (gko::experimental::distributed::Matrix) defined through the global indices
Member

This sounds more like an implementation detail. I don't think it's relevant for users that the internal data structure for the restriction/prolongation operator is a distributed matrix. It might even make sense to switch that to a distributed::RowGatherer. All I'm saying is that we should not unnecessarily tie ourselves to a particular internal data structure now.



/**
* The DdMatrix class defines a (MPI-)distributed matrix.
Member

There should be a big note that this type only supports additive overlap. What I mean by that is that local rows on different processes which map to the same global row have to be added up. This is the correct behavior for non-overlapping DD methods, but it doesn't match overlapping ones. For those, some local rows may only be read from other processes and not added up.
I think it's fine to not support those kinds of overlapping DD methods for now, we just have to make clear that the current implementation has this restriction.
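
Restating the comment as a formula (a sketch of what additive overlap means here):

```
A(i, :) = sum_p A_p(i, :)   over all processes p holding a local row that maps to global row i
```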

* With a partition where rank 0 owns the first two rows and rank 1 the
* third, this would lead to a restriction operator R
* ```
* Part-Id Global Local Non-Local
Member

TBH, I don't think differentiating between local and non-local is necessary. If users are interested in that, they can look at the docs for the distributed::Matrix.

Comment on lines +83 to +85
* | 4 -2 0 | | 0 0 0 | | 4 -2 0 |
* | -2 2 0 | | 0 2 -2 | | -2 4 -2 |
* | 0 0 0 | | 0 -2 4 | | 0 -2 4 |
Member

Maybe note that the zero rows are not stored locally, i.e. each rank only stores a 2x2 matrix.
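
For illustration, with that convention the quoted example would be stored as (a sketch derived from the matrices above; rank 0 owns rows/columns 0-1, rank 1 owns rows/columns 1-2):

```
rank 0 stores  |  4 -2 |      rank 1 stores  |  2 -2 |
               | -2  2 |                     | -2  4 |
```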

Comment on lines +132 to +133
device_matrix_data<value_type, global_index_type> data_copy{exec, data};
auto arrays = data_copy.empty_out();
Member

This copy is not necessary, right? You only need a copy of the values for the local data; the indices are only read during the mapping to local indexing.

make_temporary_clone(exec, partition).get(), local_part,
non_owning_row_idxs, non_owning_col_idxs));

auto map = gko::experimental::distributed::index_map<LocalIndexType,
Member

nit:

Suggested change
auto map = gko::experimental::distributed::index_map<LocalIndexType,
auto map = index_map<LocalIndexType,

Comment on lines +147 to +149
arrays.col_idxs, gko::experimental::distributed::index_space::combined);
auto local_row_idxs = map.map_to_local(
arrays.row_idxs, gko::experimental::distributed::index_space::combined);
Member

nit

Suggested change
arrays.col_idxs, gko::experimental::distributed::index_space::combined);
auto local_row_idxs = map.map_to_local(
arrays.row_idxs, gko::experimental::distributed::index_space::combined);
arrays.col_idxs, index_space::combined);
auto local_row_idxs = map.map_to_local(
arrays.row_idxs, index_space::combined);

TupleTypenameNameGenerator);


TYPED_TEST(DdMatrix, BuildsEmptyIsSameAsRef)
Member

This test isn't building anything; it tests the filter, so maybe rename it (and the other tests):

Suggested change
TYPED_TEST(DdMatrix, BuildsEmptyIsSameAsRef)
TYPED_TEST(DdMatrix, FiltersEmptyIsSameAsRef)

@@ -1,4 +1,5 @@
ginkgo_create_common_and_reference_test(assembly MPI_SIZE 3)
ginkgo_create_common_and_reference_test(dd_matrix MPI_SIZE 3)
Member

You can use this instead of the #ifndef GKO_COMPILING_DPCPP macro:

Suggested change
ginkgo_create_common_and_reference_test(dd_matrix MPI_SIZE 3)
ginkgo_create_common_and_reference_test(dd_matrix MPI_SIZE 3 DISABLE_EXECUTORS dpcpp)

@MarcelKoch MarcelKoch removed this from the Ginkgo 1.10.0 milestone May 21, 2025