Conversation

upsj
Member

@upsj upsj commented Jan 29, 2025

This adds a primitive that distributes variable-sized chunks of work across a warp to improve memory coalescing and warp utilization. It can serve as a component in operations like SpGEMM or symbolic Cholesky and, combined with a segmented reduction, could also implement the inner loop of an SpMV similar to MergePath. Looking for feedback and performance benchmark suggestions :)
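For readers unfamiliar with the idea, here is a minimal host-side C++ sketch of this kind of load balancing. It is not the code from this PR: the names `warp_size`, `work_item` and `balance_chunks` are hypothetical, and the warp is only simulated with a loop. It assumes each lane brings one chunk of arbitrary size, flattens all chunks with an exclusive prefix sum, and then lets consecutive lanes take consecutive flattened work items, locating the owning chunk by binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a warp on the host; a real kernel would use
// warp-level primitives instead of these loops.
constexpr int warp_size = 32;

struct work_item {
    int chunk;  // index of the chunk (i.e. the lane that contributed it)
    int local;  // position of this item inside that chunk
};

// For every step, returns the work item handled by each active lane.
// Lane l in step s handles flattened item s * warp_size + l, so neighboring
// lanes always touch neighboring items, regardless of the chunk sizes.
std::vector<std::vector<work_item>> balance_chunks(
    const std::vector<int>& chunk_sizes)
{
    assert(chunk_sizes.size() <= static_cast<std::size_t>(warp_size));
    // Exclusive prefix sum: offsets[i] is where chunk i starts in the
    // flattened work list, offsets.back() is the total amount of work.
    std::vector<int> offsets(chunk_sizes.size() + 1, 0);
    for (std::size_t i = 0; i < chunk_sizes.size(); ++i) {
        offsets[i + 1] = offsets[i] + chunk_sizes[i];
    }
    const int total = offsets.back();
    std::vector<std::vector<work_item>> steps;
    for (int base = 0; base < total; base += warp_size) {
        std::vector<work_item> step;
        for (int lane = 0; lane < warp_size && base + lane < total; ++lane) {
            const int global = base + lane;
            // Find the last chunk whose start offset is <= global.
            // upper_bound skips over empty chunks automatically.
            const auto it =
                std::upper_bound(offsets.begin(), offsets.end(), global);
            const int chunk = static_cast<int>(it - offsets.begin()) - 1;
            step.push_back({chunk, global - offsets[chunk]});
        }
        steps.push_back(step);
    }
    return steps;
}
```

With chunk sizes `{40, 0, 1, 3}`, a naive one-lane-per-chunk loop would run 40 iterations with mostly idle lanes, while this sketch needs two steps: the first is fully occupied by chunk 0, the second covers the rest of chunk 0 followed by chunks 2 and 3.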

@upsj upsj requested review from a team and yhmtsai January 29, 2025 22:36
@upsj upsj self-assigned this Jan 29, 2025
@upsj upsj marked this pull request as ready for review January 29, 2025 22:36
@ginkgo-bot ginkgo-bot added the reg:build, reg:testing, mod:core, mod:cuda and mod:hip labels Jan 29, 2025
@upsj
Member Author

upsj commented Jan 29, 2025

If the performance is good, this could also be used as an internal component for merge path SpMV

@ginkgo-bot
Member

Error: The following files need to be formatted:

CMakeLists.txt
benchmark/CMakeLists.txt
benchmark/blas/blas_common.hpp
benchmark/conversion/conversion.cpp
benchmark/matrix_statistics/matrix_statistics.cpp
benchmark/preconditioner/preconditioner.cpp
benchmark/solver/distributed/solver.cpp
benchmark/solver/solver_common.hpp
benchmark/sparse_blas/sparse_blas.cpp
benchmark/spmv/spmv_common.hpp
benchmark/test/CMakeLists.txt
benchmark/utils/general.hpp
benchmark/utils/generator.hpp
benchmark/utils/stencil_matrix.hpp
cmake/CTestScript.cmake
cmake/Modules/CudaArchitectureSelector.cmake
cmake/Modules/FindHWLOC.cmake
cmake/Modules/FindMETIS.cmake
cmake/Modules/FindNUMA.cmake
cmake/Modules/FindNVTX.cmake
cmake/Modules/FindPAPI.cmake
cmake/Modules/FindROCTX.cmake
cmake/Modules/FindVTune.cmake
cmake/Modules/hwloc_helpers.cmake
cmake/autodetect_executors.cmake
cmake/autodetect_system_libs.cmake
cmake/build_helpers.cmake
cmake/build_type_helpers.cmake
cmake/compiler_features.cmake
cmake/create_test.cmake
cmake/cuda.cmake
cmake/get_info.cmake
cmake/hip.cmake
cmake/hip_helpers.cmake
cmake/information_helpers.cmake
cmake/install_helpers.cmake
cmake/rename.cmake
cmake/sycl.cmake
cmake/template_instantiation.cmake
common/cuda_hip/CMakeLists.txt
common/cuda_hip/base/device_matrix_data_kernels.cpp
common/cuda_hip/distributed/assembly_kernels.cpp
common/cuda_hip/distributed/index_map_kernels.cpp
common/cuda_hip/distributed/matrix_kernels.cpp
common/cuda_hip/distributed/partition_kernels.cpp
common/cuda_hip/distributed/vector_kernels.cpp
common/cuda_hip/matrix/ell_kernels.cpp
common/cuda_hip/matrix/sellp_kernels.cpp
common/cuda_hip/matrix/sparsity_csr_kernels.cpp
common/cuda_hip/multigrid/pgm_kernels.cpp
common/unified/CMakeLists.txt
common/unified/distributed/assembly_kernels.cpp
common/unified/matrix/coo_kernels.cpp
common/unified/matrix/csr_kernels.cpp
common/unified/matrix/dense_kernels.template.cpp
common/unified/preconditioner/jacobi_kernels.cpp
core/CMakeLists.txt
core/base/device_matrix_data.cpp
core/base/device_matrix_data_kernels.hpp
core/base/segmented_array.hpp
core/config/config_helper.cpp
core/config/config_helper.hpp
core/config/property_tree.cpp
core/config/registry.cpp
core/config/solver_config.cpp
core/config/solver_config.hpp
core/device_hooks/CMakeLists.txt
core/distributed/assembly.cpp
core/distributed/helpers.hpp
core/distributed/index_map.cpp
core/distributed/index_map_kernels.hpp
core/distributed/matrix.cpp
core/distributed/partition.cpp
core/distributed/partition_kernels.hpp
core/distributed/preconditioner/schwarz.cpp
core/distributed/vector.cpp
core/distributed/vector_cache.cpp
core/matrix/coo.cpp
core/matrix/coo_kernels.hpp
core/matrix/csr.cpp
core/matrix/csr_kernels.hpp
core/multigrid/pgm.cpp
core/multigrid/pgm_kernels.hpp
core/preconditioner/jacobi.cpp
core/preconditioner/jacobi_kernels.hpp
core/solver/gmres.cpp
core/solver/ir.cpp
core/solver/multigrid.cpp
core/test/accessor/CMakeLists.txt
core/test/config/config.cpp
core/test/config/preconditioner.cpp
core/test/config/property_tree.cpp
core/test/config/solver.cpp
core/test/gtest/CMakeLists.txt
core/test/log/CMakeLists.txt
core/test/matrix/csr.cpp
core/test/mpi/base/bindings.cpp
core/test/mpi/distributed/matrix.cpp
core/test/mpi/distributed/preconditioner/schwarz.cpp
core/test/preconditioner/jacobi.cpp
core/test/utils.hpp
core/test/utils/assertions_test.cpp
cuda/CMakeLists.txt
devices/cuda/CMakeLists.txt
devices/dpcpp/CMakeLists.txt
devices/hip/CMakeLists.txt
devices/omp/CMakeLists.txt
devices/reference/CMakeLists.txt
doc/CMakeLists.txt
doc/examples/CMakeLists.txt
dpcpp/CMakeLists.txt
dpcpp/base/device_matrix_data_kernels.dp.cpp
dpcpp/distributed/assembly_kernels.dp.cpp
dpcpp/distributed/index_map_kernels.dp.cpp
dpcpp/distributed/matrix_kernels.dp.cpp
dpcpp/distributed/partition_kernels.dp.cpp
dpcpp/distributed/vector_kernels.dp.cpp
dpcpp/matrix/csr_kernels.dp.cpp
dpcpp/matrix/dense_kernels.dp.cpp
dpcpp/matrix/ell_kernels.dp.cpp
dpcpp/matrix/sellp_kernels.dp.cpp
dpcpp/matrix/sparsity_csr_kernels.dp.cpp
dpcpp/multigrid/pgm_kernels.dp.cpp
dpcpp/test/base/CMakeLists.txt
examples/CMakeLists.txt
examples/adaptiveprecision-blockjacobi/CMakeLists.txt
examples/batched-solver/CMakeLists.txt
examples/cb-gmres/CMakeLists.txt
examples/custom-logger/CMakeLists.txt
examples/custom-matrix-format/CMakeLists.txt
examples/custom-stopping-criterion/CMakeLists.txt
examples/distributed-solver/distributed-solver.cpp
examples/external-lib-interfacing/CMakeLists.txt
examples/file-config-solver/CMakeLists.txt
examples/ginkgo-overhead/CMakeLists.txt
examples/ginkgo-ranges/CMakeLists.txt
examples/heat-equation/CMakeLists.txt
examples/ilu-preconditioned-solver/CMakeLists.txt
examples/inverse-iteration/CMakeLists.txt
examples/ir-ilu-preconditioned-solver/CMakeLists.txt
examples/iterative-refinement/CMakeLists.txt
examples/minimal-cuda-solver/CMakeLists.txt
examples/mixed-multigrid-preconditioned-solver/CMakeLists.txt
examples/mixed-multigrid-solver/CMakeLists.txt
examples/mixed-precision-ir/CMakeLists.txt
examples/mixed-spmv/CMakeLists.txt
examples/multigrid-preconditioned-solver-customized/CMakeLists.txt
examples/multigrid-preconditioned-solver/CMakeLists.txt
examples/nine-pt-stencil-solver/CMakeLists.txt
examples/papi-logging/CMakeLists.txt
examples/par-ilu-convergence/CMakeLists.txt
examples/performance-debugging/CMakeLists.txt
examples/poisson-solver/CMakeLists.txt
examples/preconditioned-solver/CMakeLists.txt
examples/preconditioner-export/CMakeLists.txt
examples/reordered-preconditioned-solver/CMakeLists.txt
examples/schroedinger-splitting/CMakeLists.txt
examples/simple-solver-logging/CMakeLists.txt
examples/simple-solver/CMakeLists.txt
examples/three-pt-stencil-solver/CMakeLists.txt
extensions/CMakeLists.txt
extensions/test/CMakeLists.txt
extensions/test/config/CMakeLists.txt
extensions/test/kokkos/CMakeLists.txt
hip/CMakeLists.txt
hip/test/matrix/CMakeLists.txt
include/CMakeLists.txt
include/ginkgo/core/base/mpi.hpp
include/ginkgo/core/base/precision_dispatch.hpp
include/ginkgo/core/base/std_extensions.hpp
include/ginkgo/core/base/types.hpp
include/ginkgo/core/config/config.hpp
include/ginkgo/core/config/property_tree.hpp
include/ginkgo/core/distributed/index_map.hpp
include/ginkgo/core/distributed/matrix.hpp
include/ginkgo/core/distributed/partition.hpp
include/ginkgo/core/distributed/preconditioner/schwarz.hpp
include/ginkgo/core/distributed/vector.hpp
include/ginkgo/core/distributed/vector_cache.hpp
include/ginkgo/core/matrix/coo.hpp
include/ginkgo/core/matrix/csr.hpp
include/ginkgo/core/matrix/dense.hpp
include/ginkgo/core/multigrid/pgm.hpp
include/ginkgo/core/preconditioner/jacobi.hpp
include/ginkgo/ginkgo.hpp
matrices/CMakeLists.txt
omp/CMakeLists.txt
omp/base/device_matrix_data_kernels.cpp
omp/distributed/assembly_kernels.cpp
omp/distributed/index_map_kernels.cpp
omp/distributed/matrix_kernels.cpp
omp/distributed/partition_kernels.cpp
omp/distributed/vector_kernels.cpp
omp/matrix/csr_kernels.cpp
omp/matrix/dense_kernels.cpp
omp/matrix/ell_kernels.cpp
omp/matrix/fbcsr_kernels.cpp
omp/matrix/sellp_kernels.cpp
omp/matrix/sparsity_csr_kernels.cpp
omp/multigrid/pgm_kernels.cpp
reference/CMakeLists.txt
reference/base/device_matrix_data_kernels.cpp
reference/distributed/assembly_kernels.cpp
reference/distributed/index_map_kernels.cpp
reference/distributed/matrix_kernels.cpp
reference/distributed/partition_helpers.hpp
reference/distributed/partition_kernels.cpp
reference/distributed/vector_kernels.cpp
reference/matrix/coo_kernels.cpp
reference/matrix/csr_kernels.cpp
reference/matrix/dense_kernels.cpp
reference/matrix/ell_kernels.cpp
reference/matrix/fbcsr_kernels.cpp
reference/matrix/sellp_kernels.cpp
reference/matrix/sparsity_csr_kernels.cpp
reference/multigrid/pgm_kernels.cpp
reference/preconditioner/jacobi_kernels.cpp
reference/test/distributed/assembly_kernels.cpp
reference/test/distributed/index_map_kernels.cpp
reference/test/distributed/matrix_kernels.cpp
reference/test/distributed/partition_kernels.cpp
reference/test/distributed/vector_kernels.cpp
reference/test/log/CMakeLists.txt
reference/test/matrix/coo_kernels.cpp
reference/test/matrix/csr_kernels.cpp
reference/test/matrix/dense_kernels.cpp
reference/test/matrix/ell_kernels.cpp
reference/test/matrix/sellp_kernels.cpp
reference/test/matrix/sparsity_csr_kernels.cpp
reference/test/preconditioner/jacobi_kernels.cpp
reference/test/solver/ir_kernels.cpp
test/distributed/assembly_kernels.cpp
test/distributed/index_map_kernels.cpp
test/distributed/matrix_kernels.cpp
test/distributed/partition_kernels.cpp
test/distributed/vector_kernels.cpp
test/factorization/ic_kernels.cpp
test/factorization/ilu_kernels.cpp
test/matrix/CMakeLists.txt
test/matrix/coo_kernels.cpp
test/matrix/csr_kernels.cpp
test/matrix/csr_kernels2.cpp
test/matrix/matrix.cpp
test/mpi/assembly.cpp
test/mpi/matrix.cpp
test/mpi/multigrid/pgm.cpp
test/mpi/preconditioner/schwarz.cpp
test/mpi/solver/solver.cpp
test/preconditioner/jacobi_kernels.cpp
test/reorder/CMakeLists.txt
test/solver/CMakeLists.txt
test/solver/solver.cpp
test/test_exportbuild/CMakeLists.txt
test/test_install/CMakeLists.txt
test/test_install/test_install.cpp
test/test_pkgconfig/CMakeLists.txt
test/test_subdir/CMakeLists.txt
third_party/CMakeLists.txt
third_party/dummy-hook/CMakeLists.txt
third_party/gflags/CMakeLists.txt
third_party/gtest/CMakeLists.txt
third_party/identify_stream_usage/CMakeLists.txt
third_party/nlohmann_json/CMakeLists.txt

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo
