@upsj upsj commented Apr 20, 2025

This conversion can suffer from load imbalance on matrices with imbalanced row lengths. The new algorithm is always perfectly load-balanced.

@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Apr 20, 2025
@upsj upsj requested a review from a team April 20, 2025 14:55
@upsj upsj self-assigned this Apr 20, 2025
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. mod:cuda This is related to the CUDA module. mod:openmp This is related to the OpenMP module. mod:hip This is related to the HIP module. labels Apr 20, 2025
@upsj upsj force-pushed the bitvector branch 4 times, most recently from 0f06710 to f0c5d97 Compare April 20, 2025 21:06
@upsj upsj force-pushed the improved_coo_conversion branch from 2d05e9f to 2d1113a Compare April 20, 2025 21:54
@MarcelKoch MarcelKoch self-requested a review April 24, 2025 12:25
@upsj upsj force-pushed the improved_coo_conversion branch from 2d1113a to 0f03ce2 Compare April 30, 2025 14:50
@upsj upsj changed the base branch from bitvector to transform_iterator April 30, 2025 14:52
@upsj upsj force-pushed the transform_iterator branch 2 times, most recently from 34f6a8c to a94c5a8 Compare April 30, 2025 15:45
@upsj upsj force-pushed the improved_coo_conversion branch from 0f03ce2 to 867c39a Compare April 30, 2025 15:46

@yhmtsai yhmtsai left a comment

LGTM!

const RowPtrType* ptrs, size_type num_blocks,
IndexType* idxs)
{
const auto num_elements = exec->copy_val_to_host(ptrs + num_blocks);

Maybe we can just pass the idxs size in from outside. We should already have it from allocating the idxs array.

{
const auto num_elements = exec->copy_val_to_host(ptrs + num_blocks);
// transform the ptrs to a bitvector in unary delta encoding, i.e.
// every row with n elements is encoded as 1 0 ... n times ... 0

Suggested change
// every row with n elements is encoded as 1 0 ... n times ... 0
// every row with n elements is encoded as 1 0 ... n times ... 0
// we only process positions where bv is zero; the prefix sum of bv minus 1,
// i.e. get_rank() - 1, is the row index.
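
To make the encoding in the comment concrete, here is a minimal sequential sketch in plain C++. It is not the code from this PR: `encode_rows` and `ptrs_to_idxs` are hypothetical names, `std::vector<bool>` stands in for the bitvector type, and a running counter stands in for `get_rank()`. It only illustrates that every zero bit maps to exactly one output entry, with the row index given by the number of ones up to that position minus 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encode CSR row pointers as a bitvector in unary delta
// encoding, i.e. every row contributes a 1 followed by one 0 per stored
// element.
std::vector<bool> encode_rows(const std::vector<std::int64_t>& ptrs)
{
    if (ptrs.empty()) {
        return {};
    }
    const auto num_rows = ptrs.size() - 1;
    const auto num_elements = static_cast<std::size_t>(ptrs.back());
    std::vector<bool> bv(num_rows + num_elements, false);
    for (std::size_t row = 0; row < num_rows; ++row) {
        // The marker of `row` comes after `row` earlier markers and
        // ptrs[row] earlier zeros.
        bv[row + static_cast<std::size_t>(ptrs[row])] = true;
    }
    return bv;
}

// Hypothetical helper: expand row pointers into one row index per element.
std::vector<std::int64_t> ptrs_to_idxs(const std::vector<std::int64_t>& ptrs)
{
    if (ptrs.empty()) {
        return {};
    }
    const auto bv = encode_rows(ptrs);
    std::vector<std::int64_t> idxs(static_cast<std::size_t>(ptrs.back()));
    std::int64_t rank = 0;  // number of 1 bits seen so far (assumed stand-in
                            // for a rank query)
    std::size_t out = 0;
    for (std::size_t i = 0; i < bv.size(); ++i) {
        if (bv[i]) {
            ++rank;
        } else {
            // Zero bit: one stored element belonging to row `rank - 1`.
            idxs[out++] = rank - 1;
        }
    }
    return idxs;
}
```

For `ptrs = {0, 2, 2, 5}` the bitvector is `1 0 0 1 1 0 0 0` and the resulting row indices are `0 0 2 2 2`. In a parallel setting the running counter would be replaced by a rank lookup per position, so each work item handles a fixed number of bits regardless of how the elements are distributed over rows.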

Labels

1:ST:ready-for-review This PR is ready for review mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:openmp This is related to the OpenMP module. reg:build This is related to the build system.

4 participants