17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,22 @@
# Change Log

## [4.7.01](https://github.com/kokkos/kokkos-kernels/tree/4.7.01)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.7.00...4.7.01)

### New Features and Enhancements
- First implementation of recursive coordinate bisection (RCB) in graph [\#2708](https://github.com/kokkos/kokkos-kernels/pull/2708)
- Update the bisect break condition in RCB [\#2766](https://github.com/kokkos/kokkos-kernels/pull/2766)
- Add setNumRows, setNumCols to sparse matrix structures [\#2700](https://github.com/kokkos/kokkos-kernels/pull/2700)
- Add optional argument to configure sorting algorithm used in KokkosSparse::sort_crs_matrix [\#2714](https://github.com/kokkos/kokkos-kernels/pull/2714)

### Bug Fixes:
- Batched - SVD: adding iteration limits [\#2706](https://github.com/kokkos/kokkos-kernels/pull/2706)
- Minor fix for coefficient type in SPMV_Functor [\#2730](https://github.com/kokkos/kokkos-kernels/pull/2730)
- cusparse spmv_mv: use native fallback if y not 16B aligned [\#2746](https://github.com/kokkos/kokkos-kernels/pull/2746)
- Fix a CMake error when benchmarks and perf-tests enabled [\#2729](https://github.com/kokkos/kokkos-kernels/pull/2729)
- Clean up old SortCrs workaround, update RCB for host_mirror_type renaming [\#2721](https://github.com/kokkos/kokkos-kernels/pull/2721)
- Drop deprecated double4 type for CUDA 13 [\#2718](https://github.com/kokkos/kokkos-kernels/pull/2718)

## [4.7.00](https://github.com/kokkos/kokkos-kernels/tree/4.7.00)
[Full Changelog](https://github.com/kokkos/kokkos-kernels/compare/4.6.02...4.7.00)

2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -11,7 +11,7 @@ SET(KOKKOSKERNELS_TOP_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})

SET(KokkosKernels_VERSION_MAJOR 4)
SET(KokkosKernels_VERSION_MINOR 7)
SET(KokkosKernels_VERSION_PATCH 00)
SET(KokkosKernels_VERSION_PATCH 01)
SET(KokkosKernels_VERSION "${KokkosKernels_VERSION_MAJOR}.${KokkosKernels_VERSION_MINOR}.${KokkosKernels_VERSION_PATCH}")

#Set variables for config file

9 changes: 5 additions & 4 deletions batched/dense/impl/KokkosBatched_SVD_Serial_Impl.hpp
@@ -25,7 +25,7 @@ namespace KokkosBatched {
template <typename AViewType, typename UViewType, typename VViewType, typename SViewType, typename WViewType>
KOKKOS_INLINE_FUNCTION int SerialSVD::invoke(SVD_USV_Tag, const AViewType &A, const UViewType &U,
const SViewType &sigma, const VViewType &Vt, const WViewType &work,
typename AViewType::const_value_type tol) {
typename AViewType::const_value_type tol, int max_iters) {
static_assert(Kokkos::is_view_v<AViewType> && AViewType::rank == 2, "SVD: A must be a rank-2 view");
static_assert(Kokkos::is_view_v<UViewType> && UViewType::rank == 2, "SVD: U must be a rank-2 view");
static_assert(Kokkos::is_view_v<SViewType> && SViewType::rank == 1, "SVD: s must be a rank-1 view");
@@ -36,13 +36,14 @@ KOKKOS_INLINE_FUNCTION int SerialSVD::invoke(SVD_USV_Tag, const AViewType &A, co
using value_type = typename AViewType::non_const_value_type;
return KokkosBatched::SerialSVDInternal::invoke<value_type>(
A.extent(0), A.extent(1), A.data(), A.stride(0), A.stride(1), U.data(), U.stride(0), U.stride(1), Vt.data(),
Vt.stride(0), Vt.stride(1), sigma.data(), sigma.stride(0), work.data(), tol);
Vt.stride(0), Vt.stride(1), sigma.data(), sigma.stride(0), work.data(), tol, max_iters);
}

// Version which computes only singular values
template <typename AViewType, typename SViewType, typename WViewType>
KOKKOS_INLINE_FUNCTION int SerialSVD::invoke(SVD_S_Tag, const AViewType &A, const SViewType &sigma,
const WViewType &work, typename AViewType::const_value_type tol) {
const WViewType &work, typename AViewType::const_value_type tol,
int max_iters) {
static_assert(Kokkos::is_view_v<AViewType> && AViewType::rank == 2, "SVD: A must be a rank-2 view");
static_assert(Kokkos::is_view_v<SViewType> && SViewType::rank == 1, "SVD: s must be a rank-1 view");
static_assert(Kokkos::is_view_v<WViewType> && WViewType::rank == 1, "SVD: W must be a rank-1 view");
@@ -51,7 +52,7 @@ KOKKOS_INLINE_FUNCTION int SerialSVD::invoke(SVD_S_Tag, const AViewType &A, cons
using value_type = typename AViewType::non_const_value_type;
return KokkosBatched::SerialSVDInternal::invoke<value_type>(A.extent(0), A.extent(1), A.data(), A.stride(0),
A.stride(1), nullptr, 0, 0, nullptr, 0, 0, sigma.data(),
sigma.stride(0), work.data(), tol);
sigma.stride(0), work.data(), tol, max_iters);
}

} // namespace KokkosBatched

21 changes: 14 additions & 7 deletions batched/dense/impl/KokkosBatched_SVD_Serial_Internal.hpp
@@ -212,14 +212,14 @@ struct SerialSVDInternal {
// U and Vt to maintain the product U*B*Vt. At the end, the singular values
// are copied to sigma.
template <typename value_type>
KOKKOS_INLINE_FUNCTION static void bidiSVD(int m, int n, value_type* B, int Bs0, int Bs1, value_type* U, int Us0,
int Us1, value_type* Vt, int Vts0, int Vts1, value_type* sigma, int ss,
const value_type& tol) {
KOKKOS_INLINE_FUNCTION static int bidiSVD(int m, int n, value_type* B, int Bs0, int Bs1, value_type* U, int Us0,
int Us1, value_type* Vt, int Vts0, int Vts1, value_type* sigma, int ss,
const value_type& tol, int max_iters) {
using KAT = Kokkos::ArithTraits<value_type>;
const value_type eps = Kokkos::ArithTraits<value_type>::epsilon();
int p = 0;
int q = 0;
while (true) {
for (int iters = 0; iters < max_iters; ++iters) {
// Zero out tiny superdiagonal entries
for (int i = 0; i < n - 1; i++) {
if (Kokkos::abs(SVDIND(B, i, i + 1)) <
@@ -271,10 +271,16 @@ struct SerialSVDInternal {
}
// B22 is nsub * nsub, Usub is m * nsub, and Vtsub is nsub * n
svdStep(Bsub, Usub, Vtsub, m, n, nsub, Bs0, Bs1, Us0, Us1, Vts0, Vts1);

if (iters + 1 == max_iters) {
return -1;
}
}
for (int i = 0; i < n; i++) {
sigma[i * ss] = SVDIND(B, i, i);
}

return 0;
}

// Convert SVD into conventional form: singular values positive and in
@@ -322,7 +328,8 @@ struct SerialSVDInternal {
template <typename value_type>
KOKKOS_INLINE_FUNCTION static int invoke(int m, int n, value_type* A, int As0, int As1, value_type* U, int Us0,
int Us1, value_type* Vt, int Vts0, int Vts1, value_type* sigma, int ss,
value_type* work, value_type tol = Kokkos::ArithTraits<value_type>::zero()) {
value_type* work, value_type tol = Kokkos::ArithTraits<value_type>::zero(),
int max_iters = 1000000000) {
// First, if m < n, need to instead compute (V, s, U^T) = A^T.
// This just means swapping U & Vt, and implicitly transposing A, U and Vt.
if (m < n) {
@@ -345,9 +352,9 @@ struct SerialSVDInternal {
return 0;
}
bidiagonalize(m, n, A, As0, As1, U, Us0, Us1, Vt, Vts0, Vts1, work);
bidiSVD(m, n, A, As0, As1, U, Us0, Us1, Vt, Vts0, Vts1, sigma, ss, tol);
int iter_err = bidiSVD(m, n, A, As0, As1, U, Us0, Us1, Vt, Vts0, Vts1, sigma, ss, tol, max_iters);
postprocessSVD(m, n, U, Us0, Us1, Vt, Vts0, Vts1, sigma, ss);
return 0;
return iter_err;
}
};
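The `max_iters` change replaces the unbounded `while (true)` in `bidiSVD` with a capped loop that reports non-convergence through its return code. The standalone sketch below shows the same control flow on a much simpler problem; it is not Kokkos Kernels code, and `bounded_sqrt` is a hypothetical Newton iteration for the square root chosen only because it is easy to check.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical example (not Kokkos Kernels code): Newton's method for sqrt(x),
// capped at max_iters. Like bidiSVD in this PR, it returns 0 on convergence
// and -1 if the iteration limit is reached first.
int bounded_sqrt(double x, double tol, int max_iters, double& result) {
  assert(x >= 0.0 && tol > 0.0 && max_iters >= 0);
  double r = (x > 1.0) ? x : 1.0;  // starting guess, never below sqrt(x)
  for (int iters = 0; iters < max_iters; ++iters) {
    const double next = 0.5 * (r + x / r);  // one Newton step
    if (std::fabs(next - r) <= tol) {
      result = next;
      return 0;  // converged within the limit
    }
    r = next;
  }
  result = r;  // last iterate; the caller decides whether it is usable
  return -1;   // hit max_iters without converging
}
```

The default `max_iters = 1000000000` in the declarations keeps the previous, effectively unbounded behavior, so only callers that pass a smaller limit see the new `-1` path. On that path `bidiSVD` returns before copying the diagonal of `B` into `sigma`, and `invoke` still calls `postprocessSVD` before returning `iter_err`, so the outputs should not be used when `-1` is returned.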

8 changes: 4 additions & 4 deletions batched/dense/impl/KokkosBatched_Vector_SIMD_Arith.hpp
@@ -105,7 +105,7 @@ static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(float, 4) operator+(const Vector<SIM
KOKKOS_FORCEINLINE_FUNCTION
static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4) operator+(const Vector<SIMD<double>, 4> &a,
const Vector<SIMD<double>, 4> &b) {
double4 r_val;
KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4)::data_type r_val;
r_val.x = a.double4().x + b.double4().x;
r_val.y = a.double4().y + b.double4().y;
r_val.z = a.double4().z + b.double4().z;
@@ -281,7 +281,7 @@ static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(float, 4) operator-(const Vector<SIM
KOKKOS_FORCEINLINE_FUNCTION
static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4) operator-(const Vector<SIMD<double>, 4> &a,
const Vector<SIMD<double>, 4> &b) {
double4 r_val;
KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4)::data_type r_val;
r_val.x = a.double4().x - b.double4().x;
r_val.y = a.double4().y - b.double4().y;
r_val.z = a.double4().z - b.double4().z;
@@ -487,7 +487,7 @@ static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(float, 4) operator*(const Vector<SIM
KOKKOS_FORCEINLINE_FUNCTION
static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4) operator*(const Vector<SIMD<double>, 4> &a,
const Vector<SIMD<double>, 4> &b) {
double4 r_val;
KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4)::data_type r_val;
r_val.x = a.double4().x * b.double4().x;
r_val.y = a.double4().y * b.double4().y;
r_val.z = a.double4().z * b.double4().z;
@@ -729,7 +729,7 @@ static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(float, 4) operator/(const Vector<SIM
KOKKOS_FORCEINLINE_FUNCTION
static KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4) operator/(const Vector<SIMD<double>, 4> &a,
const Vector<SIMD<double>, 4> &b) {
double4 r_val;
KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4)::data_type r_val;
r_val.x = a.double4().x / b.double4().x;
r_val.y = a.double4().y / b.double4().y;
r_val.z = a.double4().z / b.double4().z;

6 changes: 4 additions & 2 deletions batched/dense/src/KokkosBatched_SVD_Decl.hpp
@@ -59,13 +59,15 @@ struct SerialSVD {
template <typename AViewType, typename UViewType, typename VtViewType, typename SViewType, typename WViewType>
KOKKOS_INLINE_FUNCTION static int invoke(
SVD_USV_Tag, const AViewType &A, const UViewType &U, const SViewType &s, const VtViewType &Vt, const WViewType &W,
typename AViewType::const_value_type tol = Kokkos::ArithTraits<typename AViewType::value_type>::zero());
typename AViewType::const_value_type tol = Kokkos::ArithTraits<typename AViewType::value_type>::zero(),
int max_iters = 1000000000);

// Version which computes only singular values
template <typename AViewType, typename SViewType, typename WViewType>
KOKKOS_INLINE_FUNCTION static int invoke(
SVD_S_Tag, const AViewType &A, const SViewType &s, const WViewType &W,
typename AViewType::const_value_type tol = Kokkos::ArithTraits<typename AViewType::value_type>::zero());
typename AViewType::const_value_type tol = Kokkos::ArithTraits<typename AViewType::value_type>::zero(),
int max_iters = 1000000000);
};

} // namespace KokkosBatched

10 changes: 7 additions & 3 deletions batched/dense/src/KokkosBatched_Vector_SIMD.hpp
@@ -392,7 +392,11 @@ class Vector<SIMD<double>, 4> {
using mag_type = double;

enum : int { vector_length = 4 };
#if CUDA_VERSION >= 13000
typedef double4_16a data_type;
#else
typedef double4 data_type;
#endif

KOKKOS_INLINE_FUNCTION
static const char *label() { return "GpuDouble4"; }
@@ -422,7 +426,7 @@ class Vector<SIMD<double>, 4> {
_data.z = b._data.z;
_data.w = b._data.w;
}
KOKKOS_INLINE_FUNCTION Vector(const double4 &val) {
KOKKOS_INLINE_FUNCTION Vector(const data_type &val) {
_data.x = val.x;
_data.y = val.y;
_data.z = val.z;
@@ -446,7 +450,7 @@ class Vector<SIMD<double>, 4> {
}

KOKKOS_INLINE_FUNCTION
type &operator=(const double4 &val) {
type &operator=(const data_type &val) {
_data.x = val.x;
_data.y = val.y;
_data.z = val.z;
@@ -455,7 +459,7 @@ class Vector<SIMD<double>, 4> {
}

KOKKOS_INLINE_FUNCTION
double4 double4() const { return _data; }
data_type double4() const { return _data; }

KOKKOS_INLINE_FUNCTION
type &loadAligned(const value_type *p) {
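The CUDA 13 fix (dropping the deprecated `double4`) works because every use of the vector's storage now goes through the nested `data_type` alias, so a single `#if CUDA_VERSION >= 13000` decides between `double4_16a` and `double4`. The sketch below shows the same pattern in plain C++ so it compiles without a GPU toolchain; `Double4Legacy`, `Double4Aligned16`, `Vec4`, and the `USE_ALIGNED_DOUBLE4` macro are hypothetical stand-ins for the CUDA types, `Vector<SIMD<double>, 4>`, and the `CUDA_VERSION` check.

```cpp
// Hypothetical stand-ins for CUDA's double4 and double4_16a.
struct Double4Legacy {
  double x, y, z, w;
};
struct alignas(16) Double4Aligned16 {
  double x, y, z, w;
};

class Vec4 {
 public:
  // The only place that knows which concrete struct backs the vector
  // (USE_ALIGNED_DOUBLE4 plays the role of CUDA_VERSION >= 13000).
#if defined(USE_ALIGNED_DOUBLE4)
  using data_type = Double4Aligned16;
#else
  using data_type = Double4Legacy;
#endif

  explicit Vec4(const data_type& val) : _data(val) {}
  data_type raw() const { return _data; }  // analogous to double4() in the PR

 private:
  data_type _data;
};

// The operator names the storage type only through Vec4::data_type, like the
// KOKKOSKERNELS_SIMD_ARITH_RETURN_TYPE(double, 4)::data_type change above.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
  Vec4::data_type r_val;
  r_val.x = a.raw().x + b.raw().x;
  r_val.y = a.raw().y + b.raw().y;
  r_val.z = a.raw().z + b.raw().z;
  r_val.w = a.raw().w + b.raw().w;
  return Vec4(r_val);
}
```

Compiling with `-DUSE_ALIGNED_DOUBLE4` switches the storage type without touching `operator+`, which is the property the arithmetic operators rely on once the CUDA version check selects `double4_16a`.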

79 changes: 59 additions & 20 deletions batched/dense/unit_test/Test_Batched_SerialSVD.hpp
@@ -56,34 +56,36 @@ typename V::non_const_value_type simpleNorm2(const V& v) {

// Check that all columns of X are unit length and pairwise orthogonal
template <typename Mat>
void verifyOrthogonal(const Mat& X) {
using Scalar = typename Mat::non_const_value_type;
int k = X.extent(1);
void verifyOrthogonal(const Mat& X, const double epsilon = -1) {
using Scalar = typename Mat::non_const_value_type;
int k = X.extent(1);
const double tol = (epsilon <= 0 ? Test::svdEpsilon<Scalar>() : epsilon);
for (int i = 0; i < k; i++) {
auto col1 = Kokkos::subview(X, Kokkos::ALL(), i);
double len = simpleNorm2(col1);
Test::EXPECT_NEAR_KK(len, 1.0, Test::svdEpsilon<Scalar>());
Test::EXPECT_NEAR_KK(len, 1.0, tol);
for (int j = 0; j < i; j++) {
auto col2 = Kokkos::subview(X, Kokkos::ALL(), j);
double d = Kokkos::ArithTraits<Scalar>::abs(simpleDot(col1, col2));
Test::EXPECT_NEAR_KK(d, 0.0, Test::svdEpsilon<Scalar>());
Test::EXPECT_NEAR_KK(d, 0.0, tol);
}
}
}

template <typename AView, typename UView, typename VtView, typename SigmaView>
void verifySVD(const AView& A, const UView& U, const VtView& Vt, const SigmaView& sigma) {
void verifySVD(const AView& A, const UView& U, const VtView& Vt, const SigmaView& sigma, const double epsilon = -1) {
using Scalar = typename AView::non_const_value_type;
using KAT = Kokkos::ArithTraits<Scalar>;
// Check that U/V columns are unit length and orthogonal, and that U *
// diag(sigma) * V^T == A
int m = A.extent(0);
int n = A.extent(1);
int maxrank = std::min(m, n);
verifyOrthogonal(U);
// Check that U/V columns are unit length and orthogonal
// and that: U * diag(sigma) * V^T == A
int m = A.extent(0);
int n = A.extent(1);
int maxrank = std::min(m, n);
const double tol = (epsilon <= 0 ? Test::svdEpsilon<Scalar>() : epsilon);
verifyOrthogonal(U, epsilon);
// NOTE: V^T being square and orthonormal implies that V is, so we don't have
// to transpose it here.
verifyOrthogonal(Vt);
verifyOrthogonal(Vt, epsilon);
Kokkos::View<Scalar**, typename AView::device_type> usvt("USV^T", m, n);
for (int i = 0; i < maxrank; i++) {
auto Ucol = Kokkos::subview(U, Kokkos::ALL(), Kokkos::make_pair<int>(i, i + 1));
@@ -92,7 +94,7 @@ void verifySVD(const AView& A, const UView& U, const VtView& Vt, const SigmaView
}
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
Test::EXPECT_NEAR_KK(usvt(i, j), A(i, j), Test::svdEpsilon<Scalar>());
Test::EXPECT_NEAR_KK(usvt(i, j), A(i, j), tol);
}
}
// Make sure all singular values are positive
@@ -109,19 +111,26 @@ template <typename Matrix, typename Vector>
struct SerialSVDFunctor_Full {
SerialSVDFunctor_Full(const Matrix& A_, const Matrix& U_, const Matrix& Vt_, const Vector& sigma_,
const Vector& work_)
: A(A_), U(U_), Vt(Vt_), sigma(sigma_), work(work_) {}
: A(A_), U(U_), Vt(Vt_), sigma(sigma_), work(work_) {
tol = Kokkos::ArithTraits<double>::zero();
}

SerialSVDFunctor_Full(const Matrix& A_, const Matrix& U_, const Matrix& Vt_, const Vector& sigma_,
const Vector& work_, const double tol_)
: A(A_), U(U_), Vt(Vt_), sigma(sigma_), work(work_), tol(tol_) {}

// NOTE: this functor is only meant to be launched with a single element range
// policy
KOKKOS_INLINE_FUNCTION void operator()(int) const {
KokkosBatched::SerialSVD::invoke(KokkosBatched::SVD_USV_Tag(), A, U, sigma, Vt, work);
KokkosBatched::SerialSVD::invoke(KokkosBatched::SVD_USV_Tag(), A, U, sigma, Vt, work, tol);
}

Matrix A;
Matrix U;
Matrix Vt;
Vector sigma;
Vector work;
double tol;
};

template <typename Matrix, typename Vector>
@@ -497,6 +506,27 @@ Kokkos::View<Scalar**, Layout, Device> getTestCase(int testCase) {
Ahost = MatrixHost("A5", m, n);
break;
}
case 6: {
m = 3;
n = 6;
Ahost = MatrixHost("A6", m, n);
Ahost(0, 0) = -2.3588494081694974e-03;
Ahost(0, 1) = -2.3602176428346553e-03;
Ahost(0, 2) = -3.3360574050870077e-03;
Ahost(0, 3) = -2.3589487578561312e-03;
Ahost(0, 4) = -3.3359167956075490e-03;
Ahost(0, 5) = -3.3378517656821728e-03;
Ahost(1, 0) = 3.3359168246290603e-03;
Ahost(1, 1) = 3.3378518006490351e-03;
Ahost(1, 3) = 3.3360573263032968e-03;
Ahost(2, 0) = -2.3588494081695022e-03;
Ahost(2, 1) = -2.3602176428346587e-03;
Ahost(2, 2) = 3.3360574050869769e-03;
Ahost(2, 3) = -2.3589487578561286e-03;
Ahost(2, 4) = 3.3359167956075399e-03;
Ahost(2, 5) = 3.3378517656821581e-03;
break;
}
default: throw std::runtime_error("Test case out of bounds.");
}
Kokkos::View<Scalar**, Layout, Device> A(Ahost.label(), m, n);
@@ -509,7 +539,7 @@ void testSpecialCases() {
using Matrix = Kokkos::View<Scalar**, Layout, Device>;
using Vector = Kokkos::View<Scalar*, Device>;
using ExecSpace = typename Device::execution_space;
for (int i = 0; i < 6; i++) {
for (int i = 0; i < 7; i++) {
Matrix A = getTestCase<Scalar, Layout, Device>(i);
int m = A.extent(0);
int n = A.extent(1);
@@ -527,15 +557,24 @@ void testSpecialCases() {
typename Matrix::HostMirror Acopy("Acopy", m, n);
Kokkos::deep_copy(Acopy, A);
// Run the SVD
Kokkos::parallel_for(Kokkos::RangePolicy<ExecSpace>(0, 1),
SerialSVDFunctor_Full<Matrix, Vector>(A, U, Vt, sigma, work));
if (std::is_same_v<Scalar, double> && i == 6) {
Kokkos::parallel_for(Kokkos::RangePolicy<ExecSpace>(0, 1),
SerialSVDFunctor_Full<Matrix, Vector>(A, U, Vt, sigma, work, 1e-9));
} else {
Kokkos::parallel_for(Kokkos::RangePolicy<ExecSpace>(0, 1),
SerialSVDFunctor_Full<Matrix, Vector>(A, U, Vt, sigma, work));
}
// Get the results back
auto Uhost = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), U);
auto Vthost = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), Vt);
auto sigmaHost = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), sigma);

// Verify the SVD is correct
verifySVD(Acopy, Uhost, Vthost, sigmaHost);
if (std::is_same_v<Scalar, double> && i == 6) {
verifySVD(Acopy, Uhost, Vthost, sigmaHost, 1e-11);
} else {
verifySVD(Acopy, Uhost, Vthost, sigmaHost);
}
}
}
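The updated `verifyOrthogonal` and `verifySVD` take an optional `epsilon`, where any non-positive value falls back to `Test::svdEpsilon<Scalar>()`. This is how test case 6 (a 3x6 matrix, so m < n) checks double results at `1e-11` after solving with `tol = 1e-9`, while every other case keeps the default. A host-only sketch of the orthonormality check with the same sentinel convention follows; it uses `std::vector` in column-major order instead of Kokkos views, and `kDefaultTol` is a hypothetical stand-in for `Test::svdEpsilon<Scalar>()` with an arbitrary value.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for Test::svdEpsilon<Scalar>().
constexpr double kDefaultTol = 1e-6;

// Same sentinel convention as the PR: epsilon <= 0 selects the default.
inline double resolve_tol(double epsilon) { return epsilon <= 0 ? kDefaultTol : epsilon; }

// X is rows x cols, stored column-major: X[i + j * rows].
// Returns true if every column has unit 2-norm and all column pairs are orthogonal.
bool is_orthonormal(const std::vector<double>& X, int rows, int cols, double epsilon = -1) {
  assert(static_cast<int>(X.size()) == rows * cols);
  const double tol = resolve_tol(epsilon);
  auto dot = [&](int a, int b) {
    double s = 0.0;
    for (int i = 0; i < rows; ++i) s += X[i + a * rows] * X[i + b * rows];
    return s;
  };
  for (int i = 0; i < cols; ++i) {
    if (std::fabs(std::sqrt(dot(i, i)) - 1.0) > tol) return false;  // unit length
    for (int j = 0; j < i; ++j) {
      if (std::fabs(dot(i, j)) > tol) return false;  // pairwise orthogonal
    }
  }
  return true;
}
```

For example, a 45-degree rotation matrix passes at the default tolerance, while a matrix whose two columns have a dot product of `1e-8` passes at the default but fails when `epsilon = 1e-10` is passed explicitly.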

10 changes: 9 additions & 1 deletion docs/source/API/graph-index.rst
@@ -7,11 +7,12 @@ API: Graph

graph/distance1_color
graph/distance2_color
graph/rcb

Graph
=====

These algorithms provide generic graph coloring capabilities used by Gauss-Seidel, multigrid aggregation, etc... We distinguish to main categories of algorithms, distance 1 coloring and distance 2 coloring. We also provide a coloring handle that allows users to easily control the behavior of the algorithms.
These algorithms provide generic graph coloring capabilities used by Gauss-Seidel, multigrid aggregation, etc. We distinguish two main categories of algorithms: distance 1 coloring and distance 2 coloring. We also provide a coloring handle that allows users to easily control the behavior of the algorithms. We also include an implementation of the recursive coordinate bisection (RCB) algorithm.

Graph coloring handle
=====================
@@ -31,3 +32,10 @@ Distance 2 and One-sided Bipartite graph coloring
Distance 2 coloring algorithms will ensure that each node has a different color than its neighbors and its neighbors' neighbors.

- :doc:`Distance-2 Graph Coloring <graph/distance2_color>`

Recursive Coordinate Bisection (RCB)
====================================

RCB performs recursive partitioning on a set of coordinates of the mesh points.

- :doc:`Recursive Coordinate Bisection <graph/rcb>`
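As a rough illustration of the idea only, the sketch below bisects a set of 2-D points at the median of the coordinate axis with the largest extent and recurses, producing up to `2^num_levels` parts. This is one common RCB variant written from scratch with hypothetical names (`Point2`, `rcb`, `rcb_recurse`); it is not the Kokkos Kernels interface or implementation, whose splitting and stopping rules may differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point2 = std::array<double, 2>;

// Bisect ids[begin, end) for `levels` more levels. Leaves of this subtree get
// part ids in [first_part, first_part + 2^levels).
void rcb_recurse(const std::vector<Point2>& pts, std::vector<int>& ids, int begin, int end,
                 int levels, int first_part, std::vector<int>& part) {
  if (levels == 0 || end - begin < 2) {
    for (int k = begin; k < end; ++k) part[ids[k]] = first_part;
    return;
  }
  // Choose the axis along which the current points are most spread out.
  int axis = 0;
  double best = -1.0;
  for (int d = 0; d < 2; ++d) {
    auto [lo, hi] = std::minmax_element(ids.begin() + begin, ids.begin() + end,
                                        [&](int a, int b) { return pts[a][d] < pts[b][d]; });
    const double extent = pts[*hi][d] - pts[*lo][d];
    if (extent > best) {
      best = extent;
      axis = d;
    }
  }
  // Median split: the lower half (by coordinate along `axis`) lands in [begin, mid).
  const int mid = begin + (end - begin) / 2;
  std::nth_element(ids.begin() + begin, ids.begin() + mid, ids.begin() + end,
                   [&](int a, int b) { return pts[a][axis] < pts[b][axis]; });
  rcb_recurse(pts, ids, begin, mid, levels - 1, first_part, part);
  rcb_recurse(pts, ids, mid, end, levels - 1, first_part + (1 << (levels - 1)), part);
}

// Returns the part id (0 .. 2^num_levels - 1) assigned to each point.
std::vector<int> rcb(const std::vector<Point2>& pts, int num_levels) {
  assert(num_levels >= 0);
  std::vector<int> ids(pts.size());
  for (std::size_t i = 0; i < ids.size(); ++i) ids[i] = static_cast<int>(i);
  std::vector<int> part(pts.size(), 0);
  rcb_recurse(pts, ids, 0, static_cast<int>(pts.size()), num_levels, 0, part);
  return part;
}
```

For example, eight points evenly spaced along the x axis with `num_levels = 2` are assigned part ids 0, 0, 1, 1, 2, 2, 3, 3.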