Add a Domain decomposition Matrix format #1719
base: develop
Conversation
Nice work on the documentation! Mostly looks good to me. A few comments.
core/distributed/dd_matrix.cpp
Outdated
check_and_adjust_buffer_size(const size_type nrhs) const
{
    auto exec = this->get_executor();
    auto comm = this->get_communicator();
}
Seems to be incomplete? It does not use nrhs.
    GKO_ASSERT_ARRAY_EQ(d_non_local_col_idxs, non_local_col_idxs);
}

std::default_random_engine engine;
Not needed here.
    TupleTypenameNameGenerator);


TYPED_TEST(DdMatrix, ReadsDistributed)
I think tests for different row and col partitions would also be useful to have.
 *    1     | 0 1 ! 1 0 |    | 1 0 |    | 1 |
 *    1     | 0 0 ! 0 1 |    | 0 1 |    | 0 |
 * ```
 * With these operators and ablock diagonal 4x4 matrix A_BD
Suggested change:
- * With these operators and ablock diagonal 4x4 matrix A_BD
+ * With these operators and a block diagonal 4x4 matrix A_BD
 * second and third, this would lead to a restriction operator R
 * ```
 * Part-Id   Global        Local    Non-Local
 *    0    | 1 0 ! 0 |    | 1 0 |    |   |
 *    0    | 0 1 ! 0 |    | 0 1 |    |   |
 *         |---------|    ---->
 *    1    | 0 1 ! 0 |    | 0 |      | 1 |
 *    1    | 0 0 ! 1 |    | 1 |      | 0 |
I think this would be the prolongation operator, right? It's mapping from a coarser space to a finer space with a left apply.
It restricts the global vector to the local problems. Agreed, the global space for the block-diagonal matrix is larger, but for each rank it's a restriction. We can discuss this though if you disagree.
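For concreteness, here is the operator R from the docstring excerpt quoted above written out (my own illustration, not part of the PR): rank 0 works on global dofs 0 and 1, rank 1 on dofs 1 and 2, so the shared dof x_1 appears on both ranks.

```latex
R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
R \begin{pmatrix} x_0 \\ x_1 \\ x_2 \end{pmatrix}
  = \begin{pmatrix} x_0 \\ x_1 \\ x_1 \\ x_2 \end{pmatrix}
  \quad\text{(rank 0 holds } (x_0, x_1),\ \text{rank 1 holds } (x_1, x_2)\text{)},
\qquad
R^T \begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{pmatrix}
  = \begin{pmatrix} y_0 \\ y_1 + y_2 \\ y_3 \end{pmatrix}.
```

Each rank only sees a restriction of the global vector, while the map as a whole goes into the larger enriched space, which is where the prolongation reading comes from.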
 * by the process. The local matrix still considers these and the
 * restriction and prolongation operators take care of fetching /
 * re-distributing the corresponding vector entries.
 * @param data The device_matrix_data structure.
 * @param partition The global left and right partition.
 *
 * @return the index_map induced by the partitions and the matrix structure
Does not return anything. Same for all read_distributed functions below.
format!
Co-authored-by: fritzgoebel <fritz.goebel@tum.de>
Some questions and minor issues.
 * auto part = Partition<...>::build_from_mapping(...);
 * auto mat = Matrix<...>::create(exec, comm);
 * mat->read_distributed(matrix_data, part);
 * ```
 * This will set the dimensions of the global and local matrices and generate
 * the restriction and prolongation matrices automatically by deducing the sizes
 * from the partition.
 *
 * By default the Matrix type uses Csr for the local matrix and the storage of
 * the local and non-local parts of the restriction and prolongation matrices.
 * It is possible to explicitly change the datatype for the local matrices, with
 * the constraint that the new type should implement the LinOp and
 * ReadableFromMatrixData interface. The type can be set by:
 * ```
 * auto mat = Matrix<ValueType, LocalIndexType[, ...]>::create(
 *     exec, comm,
 *     Coo<ValueType, LocalIndexType>::create(exec).get());
 * ```
 * Alternatively, the helper function with_matrix_type can be used:
 * ```
 * auto mat = Matrix<ValueType, LocalIndexType>::create(
 *     exec, comm,
 *     with_matrix_type<Coo>());
 * ```
This needs to be updated? It still uses Matrix instead of DdMatrix.
 * experimental::distributed::Matrix *A;       // distributed matrix
 * experimental::distributed::Vector *b, *x;   // distributed multi-vectors
Same here
 */
template <typename ValueType = default_precision,
          typename LocalIndexType = int32, typename GlobalIndexType = int64>
class DdMatrix
I feel that DdMatrix might be a bit vague. What do you think about UnassembledMatrix or maybe SubstructuredMatrix?
    data, make_temporary_clone(exec, partition).get(),
    make_temporary_clone(exec, partition).get(), local_part,
Do you still need both row and column partitions here? The kernel can probably be simplified to use just one partition.
auto input = gko::test::generate_random_device_matrix_data<
    value_type, global_index_type>(
    num_rows, num_cols,
    std::uniform_int_distribution<int>(static_cast<int>(num_cols - 1),
Did you mean to have fully dense rows here?
std::uniform_int_distribution<int>(static_cast<int>(num_cols),
                                   static_cast<int>(num_cols)),
Same question here
}


TYPED_TEST(DdMatrix, CanApplyToSingleVector)
It probably also makes sense to add some apply tests that ensure exceptions are thrown in case of dimension mismatches.
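A hedged sketch of what such a test could look like, assuming the fixture exposes the distributed matrix as dist_mat and executor/communicator members as in the other apply tests (names and the mismatched sizes are illustrative, not taken from the PR):

```cpp
TYPED_TEST(DdMatrix, ApplyThrowsOnDimensionMismatch)
{
    using value_type = typename TestFixture::value_type;
    using dist_vec = gko::experimental::distributed::Vector<value_type>;
    // Global/local sizes chosen so they cannot match the matrix built in the
    // fixture; dist_mat, exec and comm are assumed fixture members.
    auto wrong_b = dist_vec::create(this->exec, this->comm, gko::dim<2>{1, 1},
                                    gko::dim<2>{1, 1});
    auto x = dist_vec::create(this->exec, this->comm);

    ASSERT_THROW(this->dist_mat->apply(wrong_b, x), gko::DimensionMismatch);
}
```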
Looks mostly good. The most important remarks are about clarifying the documentation a bit more.
 * restriction and prolongation operators take care of fetching /
 * re-distributing the corresponding vector entries.
 *
 * @param data The device_matrix_data structure.
It should be noted somewhere that the data has to use global indexing. Maybe you could also add a small note on how to get to global indexing via the index map if only local indexing is available.
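A hedged sketch of what such a note could contain, assuming the entries are only available with local indices and that the index_map built from the partition offers an inverse of map_to_local (the map_to_global call and all variable names here are assumptions for illustration, not verified against this PR):

```cpp
// Assumed to exist on this rank: exec, comm, partition, recv_connections,
// local_row_idxs, local_col_idxs, local_values, global_size, dd_mat.
auto map = gko::experimental::distributed::index_map<LocalIndexType,
                                                     GlobalIndexType>(
    exec, partition, comm.rank(), recv_connections);
// Map the locally indexed entries back to global indices ...
auto global_rows = map.map_to_global(
    local_row_idxs, gko::experimental::distributed::index_space::combined);
auto global_cols = map.map_to_global(
    local_col_idxs, gko::experimental::distributed::index_space::combined);
// ... and feed globally indexed data to read_distributed.
auto data = gko::device_matrix_data<ValueType, GlobalIndexType>(
    exec, global_size, global_rows, global_cols, local_values);
dd_mat->read_distributed(data, partition);
```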
 * operator, which are distributed matrices
 * (gko::experimental::distributed::Matrix) defined through the global indices
This sounds more like an implementation detail. I don't think it's relevant for the users that the internal data structure for the restriction/prolongation operator is a distributed matrix. It might even make sense to switch that to a distributed::RowGatherer. All I'm saying is that we should not unnecessarily restrict ourselves now to an internal data structure.


/**
 * The DdMatrix class defines a (MPI-)distributed matrix.
There should be a big note that this type only supports additive overlap. What I mean by that is that local rows on different processes which map to the same global row have to be added up. This is the correct behavior for non-overlapping DD methods, but it doesn't match overlapping ones. For those, some local rows may only be read from other processes and not added up.
I think it's fine to not support those kinds of overlapping DD methods for now; we just have to make clear that the current implementation has this restriction.
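To make the additive-overlap semantics concrete, using the example matrices from the class documentation quoted further down (rank 0 and rank 1 each contribute a local block, and the shared middle row is summed):

```latex
A = A_0 + A_1 =
\begin{pmatrix} 4 & -2 & 0 \\ -2 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
+
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -2 \\ 0 & -2 & 4 \end{pmatrix}
=
\begin{pmatrix} 4 & -2 & 0 \\ -2 & 4 & -2 \\ 0 & -2 & 4 \end{pmatrix}.
```

The shared diagonal entry 4 only exists after adding the two local values of 2; an overlapping DD method that expects to read the neighbor's row unchanged would not get that behavior from this format.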
 * With a partition where rank 0 owns the first two rows and rank 1 the
 * third, this would lead to a restriction operator R
 * ```
 * Part-Id   Global        Local    Non-Local
TBH, I don't think differentiating between local and non-local is necessary. If users are interested in that, they can look at the docs for distributed::Matrix.
 * |  4 -2  0 |    |  0  0  0 |    |  4 -2  0 |
 * | -2  2  0 |    |  0  2 -2 |    | -2  4 -2 |
 * |  0  0  0 |    |  0 -2  4 |    |  0 -2  4 |
Maybe note that the zero rows are not stored locally, i.e. each rank only stores a 2x2 matrix.
device_matrix_data<value_type, global_index_type> data_copy{exec, data};
auto arrays = data_copy.empty_out();
This copy is not necessary, right? You only need a copy of the values for the local data; the indices are only read during the mapping to local indexing.
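A hedged sketch of what the suggestion could look like, keeping the input data untouched and copying only the values (variable names are illustrative; whether the rest of the routine can work purely with const views is an assumption):

```cpp
const auto nnz = data.get_num_stored_elements();
// Only the values need an owning copy for the local output ...
auto local_values = gko::array<value_type>(exec, nnz);
exec->copy(nnz, data.get_const_values(), local_values.get_data());
// ... while the index arrays are only read, so non-owning const views suffice
// as input to the local-index mapping.
auto global_row_idxs =
    gko::make_const_array_view(exec, nnz, data.get_const_row_idxs());
auto global_col_idxs =
    gko::make_const_array_view(exec, nnz, data.get_const_col_idxs());
```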
    make_temporary_clone(exec, partition).get(), local_part,
    non_owning_row_idxs, non_owning_col_idxs));

auto map = gko::experimental::distributed::index_map<LocalIndexType,
nit:
Suggested change:
- auto map = gko::experimental::distributed::index_map<LocalIndexType,
+ auto map = index_map<LocalIndexType,
    arrays.col_idxs, gko::experimental::distributed::index_space::combined);
auto local_row_idxs = map.map_to_local(
    arrays.row_idxs, gko::experimental::distributed::index_space::combined);
nit:
Suggested change:
-     arrays.col_idxs, gko::experimental::distributed::index_space::combined);
- auto local_row_idxs = map.map_to_local(
-     arrays.row_idxs, gko::experimental::distributed::index_space::combined);
+     arrays.col_idxs, index_space::combined);
+ auto local_row_idxs = map.map_to_local(
+     arrays.row_idxs, index_space::combined);
    TupleTypenameNameGenerator);


TYPED_TEST(DdMatrix, BuildsEmptyIsSameAsRef)
This test isn't building anything; it tests the filter, so maybe rename it (and the other tests):
Suggested change:
- TYPED_TEST(DdMatrix, BuildsEmptyIsSameAsRef)
+ TYPED_TEST(DdMatrix, FiltersEmptyIsSameAsRef)
@@ -1,4 +1,5 @@
ginkgo_create_common_and_reference_test(assembly MPI_SIZE 3)
ginkgo_create_common_and_reference_test(dd_matrix MPI_SIZE 3)
You can use this instead of the #ifndef GKO_COMPILING_DPCPP macro:
Suggested change:
- ginkgo_create_common_and_reference_test(dd_matrix MPI_SIZE 3)
+ ginkgo_create_common_and_reference_test(dd_matrix MPI_SIZE 3 DISABLE_EXECUTORS dpcpp)
This PR adds a new distributed matrix format useful for domain decomposition preconditioners such as BDDC. It is in preparation for eventually adding BDDC to Ginkgo.
Instead of assembling the global matrix, each rank stores its local contribution to the global matrix in a globally non-assembled state, where degrees of freedom on the subdomain interfaces are "shared" between multiple ranks. The global matrix application uses a restriction matrix R that maps into an enriched space where the shared degrees of freedom have the multiplicity of the ranks sharing them. In this space, the local contributions can be applied independently of each other before applying R^T to sum up the local parts of the result vector (see the usage sketch at the end of this description).

This PR relies on:
TODO:
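For reference, a hedged usage sketch of the new format, following the create / read_distributed pattern quoted from the class documentation in the review above (with Matrix replaced by DdMatrix, as requested there); exec, comm, mapping, num_parts, matrix_data, b and x are assumed to be set up as in the existing distributed examples:

```cpp
using ValueType = double;
using LocalIndexType = gko::int32;
using GlobalIndexType = gko::int64;
namespace dist = gko::experimental::distributed;

// Each rank passes its local, unassembled contribution with global indices.
auto part = gko::share(
    dist::Partition<LocalIndexType, GlobalIndexType>::build_from_mapping(
        exec, mapping, num_parts));
auto mat = dist::DdMatrix<ValueType, LocalIndexType, GlobalIndexType>::create(
    exec, comm);
mat->read_distributed(matrix_data, part);

// The apply acts like the assembled global operator: x = R^T * A_BD * R * b.
mat->apply(b, x);
```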